
Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer #28

Open
TianlinZhang668 opened this issue Apr 9, 2019 · 8 comments

Comments

@TianlinZhang668

I ran make_datafiles.py, but it fails with this error:
Preparing to tokenize /home/ztl/Downloads/cnn_stories/cnn/stories to cnn_stories_tokenized...
Making list of files to tokenize...
Tokenizing 92579 files in /home/ztl/Downloads/cnn_stories/cnn/stories and saving in cnn_stories_tokenized...
Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.process.PTBTokenizer
Stanford CoreNLP Tokenizer has finished.
Traceback (most recent call last):

However, I can run echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer in the root directory.
I don't know how to deal with this. Thanks a lot.

@TianlinZhang668
Author

I am running corenlp-3.9.2.jar.

@ubaidsworld

You need stanford-corenlp-3.7.0.jar. See this: https://github.com/abisee/cnn-dailymail#2-download-stanford-corenlp
Please read the README.md file.
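For anyone else hitting the ClassNotFoundException: java can only find PTBTokenizer if the CoreNLP jar is on the CLASSPATH of the shell that runs make_datafiles.py. A minimal preflight sketch, assuming Python 3 and that the jar sits at the path shown (adjust to your install):

    import os
    import subprocess

    # Jar name from the README; the directory here is an assumption.
    JAR = "/path/to/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0.jar"

    if JAR not in os.environ.get("CLASSPATH", ""):
        os.environ["CLASSPATH"] = JAR  # inherited by the java subprocess below

    # Smoke test: should print the tokens of the sample sentence.
    subprocess.run(
        ["java", "edu.stanford.nlp.process.PTBTokenizer"],
        input=b"Please tokenize this text.",
        check=True,
    )

If the smoke test prints the tokens, running make_datafiles.py from that same shell should get past the tokenization step.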

@TianlinZhang668
Author

Successfully finished tokenizing /home/ztl/Downloads/cnn_stories/cnn/stories to cnn_stories_tokenized.

Making bin file for URLs listed in url_lists/all_test.txt...
Traceback (most recent call last):
File "make_datafiles.py", line 239, in
write_to_bin(all_test_urls, os.path.join(finished_files_dir, "test.bin"))
File "make_datafiles.py", line 154, in write_to_bin
url_hashes = get_url_hashes(url_list)
File "make_datafiles.py", line 106, in get_url_hashes
return [hashhex(url) for url in url_list]
File "make_datafiles.py", line 106, in
return [hashhex(url) for url in url_list]
File "make_datafiles.py", line 101, in hashhex
h.update(s)
TypeError: Unicode-objects must be encoded before hashing

I have got the tokenization done, but the next step fails ...
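That TypeError comes from running the Python 2 script under Python 3: hashlib's update() needs bytes, not str, in Python 3. A minimal patch to hashhex in make_datafiles.py (encoding the URL as UTF-8 is my assumption; the URL lists are plain ASCII, so the hashes come out the same either way):

    import hashlib

    def hashhex(s):
        """Returns a hex-formatted SHA1 hash of the input string."""
        h = hashlib.sha1()
        h.update(s.encode("utf-8"))  # str -> bytes before hashing (Python 3)
        return h.hexdigest()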

@JafferWilson

Try this: https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail
I guess it will solve your tokenization and the rest of your issues.

@quanghuynguyen1902

What if the content of my article doesn't follow the same structure as the CNN articles?

@JafferWilson

@quanghuynguyen1902 I guess you have already opened a new issue, #29.
Let's go there. Could someone please close this issue?

@mooncrater31

I am facing the same issue here.

@SpaceTime1999

source ./.bash_profile
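(Presumably this points at the same CLASSPATH problem: if the export CLASSPATH=... line lives in ~/.bash_profile, re-sourcing the file, or opening a fresh terminal, is what makes the variable visible to the shell that launches make_datafiles.py.)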
