
Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer #28

Open
TianlinZhang668 opened this issue Apr 9, 2019 · 8 comments

Comments

@TianlinZhang668

I ran make_datafiles.py, but it fails with this error:
Preparing to tokenize /home/ztl/Downloads/cnn_stories/cnn/stories to cnn_stories_tokenized...
Making list of files to tokenize...
Tokenizing 92579 files in /home/ztl/Downloads/cnn_stories/cnn/stories and saving in cnn_stories_tokenized...
Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.process.PTBTokenizer
Stanford CoreNLP Tokenizer has finished.
Traceback (most recent call last):

However, I can run echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer in the root directory.
I don't know how to deal with this. Thanks a lot.

@TianlinZhang668
Author

I am running corenlp-3.9.2.jar.

@ubaidsworld

You need stanford-corenlp-3.7.0.jar. See this: https://github.com/abisee/cnn-dailymail#2-download-stanford-corenlp
Please read the README.md file.
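For anyone else hitting the ClassNotFoundException: java can only find PTBTokenizer if the CoreNLP jar is on the CLASSPATH of the shell that runs make_datafiles.py. A minimal preflight sketch, assuming Python 3 and that the jar sits at the path shown (adjust to your install):

    import os
    import subprocess

    # Jar name from the README; the directory here is an assumption.
    JAR = "/path/to/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0.jar"

    if JAR not in os.environ.get("CLASSPATH", ""):
        os.environ["CLASSPATH"] = JAR  # inherited by the java subprocess below

    # Smoke test: should print the tokens of the sample sentence.
    subprocess.run(
        ["java", "edu.stanford.nlp.process.PTBTokenizer"],
        input=b"Please tokenize this text.",
        check=True,
    )

If the smoke test prints the tokens, running make_datafiles.py from that same shell should get past the tokenization step.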

@TianlinZhang668
Author

Successfully finished tokenizing /home/ztl/Downloads/cnn_stories/cnn/stories to cnn_stories_tokenized.

Making bin file for URLs listed in url_lists/all_test.txt...
Traceback (most recent call last):
File "make_datafiles.py", line 239, in
write_to_bin(all_test_urls, os.path.join(finished_files_dir, "test.bin"))
File "make_datafiles.py", line 154, in write_to_bin
url_hashes = get_url_hashes(url_list)
File "make_datafiles.py", line 106, in get_url_hashes
return [hashhex(url) for url in url_list]
File "make_datafiles.py", line 106, in
return [hashhex(url) for url in url_list]
File "make_datafiles.py", line 101, in hashhex
h.update(s)
TypeError: Unicode-objects must be encoded before hashing

I have got the tokenization done, but the next step fails ...
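That TypeError comes from running the Python 2 script under Python 3: hashlib's update() needs bytes, not str, in Python 3. A minimal patch to hashhex in make_datafiles.py (encoding the URL as UTF-8 is my assumption; the URL lists are plain ASCII, so the hashes come out the same either way):

    import hashlib

    def hashhex(s):
        """Returns a hex-formatted SHA1 hash of the input string."""
        h = hashlib.sha1()
        h.update(s.encode("utf-8"))  # str -> bytes before hashing (Python 3)
        return h.hexdigest()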

@JafferWilson

Try this: https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail
I guess it will solve your tokenization and the rest of your issues.

@quanghuynguyen1902

What if the content of my article doesn't follow the same structure as the CNN articles?

@JafferWilson

@quanghuynguyen1902 I guess you have already opened a new issue, #29.
Let's go there. Could someone please close this issue?

@mooncrater31

I am facing the same issue here.

@SpaceTime1999

source ./.bash_profile
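(Presumably this points at the same CLASSPATH problem: if the export CLASSPATH=... line lives in ~/.bash_profile, re-sourcing the file, or opening a fresh terminal, is what makes the variable visible to the shell that launches make_datafiles.py.)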
