Word counter for PDF files


A program that counts the words in PDF files using word tokenization.

Note that this program does not handle scanned (image-only) PDF files.
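A scanned page is stored as an image, so there is usually no text layer for the extractor to read. The following is a minimal sketch of how such a file could be recognized, assuming PyPDF2 2.x or later (`PdfReader` and `extract_text`). The helper names `has_extractable_text` and `pdf_page_texts` are hypothetical and are not part of this project:

```python
def has_extractable_text(page_texts):
    """Return True if at least one page yielded non-whitespace text."""
    return any(text and text.strip() for text in page_texts)


def pdf_page_texts(filename):
    """Extract the text of each page (assumes PyPDF2 2.x or later)."""
    # Imported here so has_extractable_text() works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(filename)
    # Image-only pages typically yield an empty string.
    return [page.extract_text() or "" for page in reader.pages]
```

If `has_extractable_text(pdf_page_texts('Your_file.pdf'))` is false, the file is probably scanned and the word count would be empty.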

How to run

To run the program, follow the steps below:

  • Install Python 3, pip, PyPDF2, and nltk.
  • Clone the Word_counter project.

Tip

The NLTK data packages are required.

To install them on your system, run the following code:

import nltk
nltk.download('stopwords')
nltk.download('punkt')
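The downloads above only need to happen once. As a hedged sketch, the script could check for the data first and download only what is missing. The names `REQUIRED`, `missing_packages`, and `ensure_nltk_data` are hypothetical; `nltk.data.find` and `nltk.download` are the NLTK calls assumed here:

```python
# Resource path checked by nltk.data.find -> package name for nltk.download
REQUIRED = {
    "corpora/stopwords": "stopwords",
    "tokenizers/punkt": "punkt",
}


def missing_packages(find, required=REQUIRED):
    """Return the package names whose resource path `find` cannot locate."""
    missing = []
    for path, name in required.items():
        try:
            find(path)
        except LookupError:  # nltk.data.find raises this when data is absent
            missing.append(name)
    return missing


def ensure_nltk_data():
    """Download only the NLTK packages that are not installed yet."""
    import nltk  # imported here so missing_packages() has no dependencies

    for name in missing_packages(nltk.data.find):
        nltk.download(name)
```

Calling `ensure_nltk_data()` at the start of the script would then replace the manual download step.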

Parameters

To change the input file, modify the filename variable:

filename = 'Your_file.pdf'

To change the number of words shown in the output, modify the count_word variable:

count_word = 30
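To show how these two parameters might fit together, here is a minimal sketch, not the project's actual code. It assumes PyPDF2 2.x or later and NLTK's `word_tokenize`, and it assumes that English stop words are removed before counting. The function names `top_words` and `count_pdf_words` are hypothetical:

```python
from collections import Counter

filename = 'Your_file.pdf'  # input PDF
count_word = 30             # number of most frequent words to report


def top_words(tokens, count_word, stop_words=()):
    """Return the `count_word` most frequent alphabetic tokens, lowercased."""
    stop_words = set(stop_words)
    words = [t.lower() for t in tokens if t.isalpha()]  # drop punctuation, numbers
    words = [w for w in words if w not in stop_words]
    return Counter(words).most_common(count_word)


def count_pdf_words(filename, count_word):
    """Read a PDF, tokenize its text with NLTK, and return the top words."""
    # Third-party imports live here so top_words() can be used without them.
    from PyPDF2 import PdfReader
    from nltk import word_tokenize
    from nltk.corpus import stopwords

    reader = PdfReader(filename)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    tokens = word_tokenize(text)
    return top_words(tokens, count_word, stopwords.words('english'))
```

Calling `count_pdf_words(filename, count_word)` would return a list of `(word, frequency)` pairs, most frequent first.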

TODO List

  • Create a CSV file
  • Create word clouds