I ran make_datafiles.py to generate the raw text files for BART preprocessing, but I hit the following error:
python make_datafiles.py ./cnn/stories ./dailymail/stories/
Making bin file for URLs listed in url_lists/all_test.txt...
Traceback (most recent call last):
  File "make_datafiles.py", line 138, in <module>
    write_to_bin(all_test_urls, os.path.join(finished_files_dir, "test"))
  File "make_datafiles.py", line 84, in write_to_bin
    url_list = read_text_file(url_file)
  File "make_datafiles.py", line 26, in read_text_file
    with open(text_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'url_lists/all_test.txt'
I assumed this was because all_test_urls does not point to the URL file shipped with the dataset, i.e., wayback_test_urls.txt. So I renamed that file to all_test.txt and put it in the folder ./cnn/url_lists. The script still gave the same error, so I checked the source again and found that the problem is in the following line:
    url_list = read_text_file(url_file)
I changed it to:
    url_list = read_text_file(os.path.join('./cnn', url_file))
With this change, I think all the source and target files are generated from the CNN dataset only. Am I right?
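To make the path change above less ad hoc, here is a minimal sketch of how the lookup could be written. The helper name `resolve_url_file` and its `base_dirs` default are made up for illustration and are not part of make_datafiles.py. It only locates the list file; it does not change which stories that list refers to.

```python
import os


def resolve_url_file(url_file, base_dirs=(".", "./cnn", "./dailymail")):
    """Return the first existing path for url_file under base_dirs.

    Hypothetical helper: the name and the default base_dirs are
    assumptions for this sketch, not something the script defines.
    """
    for base in base_dirs:
        candidate = os.path.join(base, url_file)
        if os.path.isfile(candidate):
            return candidate
    # Fail with a message that lists every directory that was searched.
    raise FileNotFoundError(
        "Could not find %r under any of: %s" % (url_file, ", ".join(base_dirs))
    )
```

Inside write_to_bin this would presumably be used as `url_list = read_text_file(resolve_url_file(url_file))`, so the error message at least shows where the script looked.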