File Language Analyzer

File Language Analyzer is a suite of Python modules that provides objects, constants and functions to recognise the language of a file, analyze its information, and process (elaborate and create) .csv letter-frequency tables.

Keep in mind that this project is poorly programmed; however, the logic behind the adopted method is interesting.

Project Status

Features

Recognise the language of a file
Convert .csv frequency table to Python dictionary
Convert Python dictionary to .csv frequency table
Generate frequency table starting from a set of Twitter messages
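The two conversion features (.csv to dictionary and back) can be pictured with a minimal sketch based on Python's standard csv module. It assumes a hypothetical layout in which the first row names the languages and each following row holds one letter and its frequency in each language; the names `csv_to_dict`, `dict_to_csv` and `FrequencyTable` are illustrative and are not the project's own API.

```python
import csv
from typing import Dict, TextIO

# Assumed layout (hypothetical, not the project's documented format):
#   letter,english,italian
#   a,8.2,11.7
#   b,1.5,0.9
FrequencyTable = Dict[str, Dict[str, float]]  # language -> letter -> frequency


def csv_to_dict(stream: TextIO) -> FrequencyTable:
    """Read a letter-frequency table into {language: {letter: frequency}}."""
    reader = csv.reader(stream)
    header = next(reader)
    languages = header[1:]  # the first column holds the letters
    table: FrequencyTable = {lang: {} for lang in languages}
    for row in reader:
        if not row:
            continue  # skip blank lines
        letter, values = row[0], row[1:]
        for lang, value in zip(languages, values):
            table[lang][letter] = float(value)
    return table


def dict_to_csv(table: FrequencyTable, stream: TextIO) -> None:
    """Write {language: {letter: frequency}} back as a letter-frequency table."""
    languages = sorted(table)
    letters = sorted({letter for column in table.values() for letter in column})
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["letter", *languages])
    for letter in letters:
        # Missing letters are written as 0.0 so every row has one value per language.
        writer.writerow([letter, *(table[lang].get(letter, 0.0) for lang in languages)])


if __name__ == "__main__":
    import io

    sample = "letter,english,italian\na,8.2,11.7\nb,1.5,0.9\n"
    table = csv_to_dict(io.StringIO(sample))
    out = io.StringIO()
    dict_to_csv(table, out)
    print(out.getvalue() == sample)  # prints True: the round trip preserves the table
```

Keeping one column per language matches the way the table is read during classification: each language becomes a single vector of letter frequencies.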

Math behind it

By analyzing the frequency of every single letter, it is possible to detect the language of a given text.
Once the characters' frequencies have been extracted, they can be used as a representation of the text.
To find out its language, we have to determine which column of the frequency table (one column per language) has the nearest values.
To accomplish that, we can use the Pythagorean theorem extended to 26 dimensions, one for each letter of the Latin alphabet, which gives the Euclidean distance between two sets of frequencies.
By computing the distance between the given text and each language in the table, it is possible to determine the nearest language.
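Written out, the distance between a text t and a language l is d(t, l) = sqrt( sum over the 26 letters c of (f_t(c) - f_l(c))^2 ), where f_t(c) is the frequency of letter c in the text and f_l(c) its frequency in language l; the predicted language is the one with the smallest d. The following Python sketch implements that idea under a few assumptions: frequencies are percentages (if a table stores fractions instead, scale accordingly), accented letters and non-letters are ignored, and the table is shaped as {language: {letter: frequency}}. The names `letter_frequencies`, `distance` and `nearest_language` are hypothetical, not taken from the project.

```python
import math
import string
from collections import Counter
from typing import Dict

LETTERS = string.ascii_lowercase  # the 26 letters of the basic Latin alphabet


def letter_frequencies(text: str) -> Dict[str, float]:
    """Percentage of each Latin letter in text; every other character is ignored."""
    counts = Counter(ch for ch in text.lower() if ch in LETTERS)
    total = sum(counts.values())
    if total == 0:
        return {letter: 0.0 for letter in LETTERS}
    return {letter: 100.0 * counts[letter] / total for letter in LETTERS}


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance in 26 dimensions: Pythagoras with one axis per letter."""
    return math.sqrt(sum((a.get(letter, 0.0) - b.get(letter, 0.0)) ** 2 for letter in LETTERS))


def nearest_language(text: str, table: Dict[str, Dict[str, float]]) -> str:
    """Return the language (table column) whose frequencies are closest to the text's."""
    profile = letter_frequencies(text)
    return min(table, key=lambda language: distance(profile, table[language]))


# Toy two-language table, for illustration only (not real frequencies).
toy_table = {"toy_a": {"a": 100.0}, "toy_b": {"b": 100.0}}
print(nearest_language("aaab", toy_table))  # prints toy_a
```

Because only relative frequencies are compared, the text length does not matter directly, although very short texts give noisy profiles.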

Technologies

Python 3.x
Python built-in libraries
Twitter API, wrapped by the tweepy library
wikipedia-api module
Flask

Requirements

Use one of the following commands (according to the configuration of your environment):

$ pip install -r requirements.txt

or

$ py -m pip install -r requirements.txt

Launch

If you are in a Bash-like environment with Python installed, you can run it directly by typing:

$ ./Main.py

Otherwise, depending on your Python interpreter installation and your OS:

$ python Main.py

or

$ py Main.py

After that, go to http://127.0.0.1:5000 or http://localhost:5000 and try out the web interface.

The default frequency table is letters_frequency_twitter.csv.

Usage

If you want to use the functions in tweetrain.py, you have to insert your personal Twitter tokens: find the first four uppercase variables in the file and fill in the double quotes with the proper values.
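Once messages have been downloaded, turning them into a frequency table is mostly a matter of pooling letter counts per language. The sketch below makes no Twitter API calls: it assumes the messages are already available as hypothetical (language_code, text) pairs, for instance using a language label attached to each tweet. `table_from_messages` is an illustrative name, not a function of tweetrain.py.

```python
import string
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

LETTERS = string.ascii_lowercase  # only the 26 basic Latin letters are counted


def table_from_messages(messages: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Build {language: {letter: percent}} from (language, message_text) pairs.

    Letters from all messages of the same language are pooled before normalising,
    so longer messages weigh more than shorter ones.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for language, text in messages:
        counts[language].update(ch for ch in text.lower() if ch in LETTERS)
    table: Dict[str, Dict[str, float]] = {}
    for language, counter in counts.items():
        total = sum(counter.values())
        if total == 0:
            continue  # no Latin letters at all for this language: leave it out
        table[language] = {letter: 100.0 * counter[letter] / total for letter in LETTERS}
    return table


print(table_from_messages([("en", "Hello"), ("it", "Ciao")])["it"]["c"])  # prints 25.0
```

Pooling counts, rather than averaging per-message percentages, keeps very short tweets from dominating the profile; the resulting dictionary could then be saved as a .csv frequency table.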

mc-cat-tty/Language-Classification
