Author-Identification-from-text

Authorship identification is an important and practical problem in Natural Language Processing. The task is to identify the author of a document from a given list of candidate authors. A large body of work on this problem exists in the literature. We build on ideas from this work to develop our own model for authorship identification, and we take one model from this work as a baseline for comparing results. Our model is a text classifier based on logistic regression that uses n-grams, style markers and document fingerprinting as features.
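
To make the feature types mentioned above more concrete, here is a minimal sketch of n-gram and style-marker extraction using only the Python standard library. It is not the code in learner.py: the function names (char_ngrams, style_markers, extract_features), the choice of character trigrams and the four style markers are all assumptions made for illustration. Document fingerprinting is not covered.

```python
import re
import string
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def style_markers(text):
    """Compute a few simple style markers (hypothetical choices, not the repo's)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "punctuation_rate": sum(ch in string.punctuation for ch in text)
        / max(len(text), 1),
    }


def extract_features(text, n=3):
    """Merge n-gram counts and style markers into one feature dictionary."""
    features = {f"ngram={g}": float(c) for g, c in char_ngrams(text, n).items()}
    features.update({f"style={k}": v for k, v in style_markers(text).items()})
    return features


sample = "The market rose. Traders were pleased!"
features = extract_features(sample)
```

A dictionary like this could then be turned into a numeric matrix (for example with scikit-learn's DictVectorizer) and passed to a logistic regression classifier such as sklearn.linear_model.LogisticRegression. That is one possible wiring, not necessarily the one used in this repository.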

Dataset

The dataset used is Reuter_50_50. It is present in the directories training/, testing/ and all. It contains 50 text files, one for each of the 50 authors. Each text file contains several lines of text by that author.
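
The layout described above (one text file per author, several lines per file) could be read with a small loader like the sketch below. It assumes that each file has a .txt extension, that the file name identifies the author, and that every non-empty line is one sample. None of this is confirmed by the repository, and load_author_lines is a hypothetical helper, not a function from learner.py.

```python
from pathlib import Path


def load_author_lines(data_dir):
    """Return (texts, labels) read from one text file per author."""
    texts, labels = [], []
    for path in sorted(Path(data_dir).glob("*.txt")):
        author = path.stem  # assumption: the file name identifies the author
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    texts.append(line)
                    labels.append(author)
    return texts, labels
```

For example, `texts, labels = load_author_lines("training")` would give two parallel lists that can be used for training, with the same call on testing/ for evaluation.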

Requirements

Python with common ML and NLP libraries such as scikit-learn, Theano and NLTK.

Organization

learner.py is the main file. Run it to see the output.

More Details

Coming soon.

Contributing

Fork it!
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request :D

Credits

Devansh Dalal
Abhishek

