Update make_datafiles.py #25

Open
wants to merge 1 commit into
base: master
from

Commits on Nov 27, 2018

  1. Update make_datafiles.py

    Fixes the following errors.
    ERROR 1:
    Traceback (most recent call last):
      File "make_datafiles.py", line 239, in <module>
        write_to_bin(all_test_urls, os.path.join(finished_files_dir, "test.bin"))
      File "make_datafiles.py", line 154, in write_to_bin
        url_hashes = get_url_hashes(url_list)
      File "make_datafiles.py", line 106, in get_url_hashes
        return [hashhex(url) for url in url_list]
      File "make_datafiles.py", line 106, in <listcomp>
        return [hashhex(url) for url in url_list]
      File "make_datafiles.py", line 101, in hashhex
        h.update(s)
    TypeError: Unicode-objects must be encoded before hashing
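
The first traceback comes from passing a Python 3 `str` to a hash object's `update()`, which accepts only bytes. A minimal sketch of one possible fix follows, assuming the URLs are plain text that can be encoded as UTF-8. The choice of SHA-1 is an assumption made for illustration, since the traceback shows only the `h.update(s)` call and not which hash `hashhex` constructs.

```python
import hashlib


def hashhex(s):
    """Return the hex digest of a hash of s, accepting either str or bytes."""
    # hashlib objects only accept bytes in Python 3, so encode text first.
    if isinstance(s, str):
        s = s.encode("utf-8")
    # Assumption: SHA-1 is used here only for illustration.
    h = hashlib.sha1()
    h.update(s)
    return h.hexdigest()


def get_url_hashes(url_list):
    """Hash every URL in url_list, mirroring the call shown in the traceback."""
    return [hashhex(url) for url in url_list]
```

Encoding inside `hashhex` rather than at each call site keeps `get_url_hashes` unchanged and lets callers that already hold bytes keep working.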
    
    ERROR 2:
    PTBTokenizer tokenized 203071165 tokens at 1811476.32 tokens per second.
    Stanford CoreNLP Tokenizer has finished.
    Successfully finished tokenizing dailymail/stories/ to dm_stories_tokenized.
    
    Making bin file for URLs listed in url_lists/all_test.txt...
    Writing story 0 of 11490; 0.00 percent done
    Traceback (most recent call last):
      File "make_datafiles.py", line 239, in <module>
        write_to_bin(all_test_urls, os.path.join(finished_files_dir, "test.bin"))
      File "make_datafiles.py", line 184, in write_to_bin
        tf_example.features.feature['article'].bytes_list.value.extend([article])
    TypeError: "marseille , france -lrb- cnn -rrb- the french prosecutor leading an investigation into the crash of has type str, but expected one of: bytes
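
The second traceback has the same root cause: the protobuf `bytes_list` field rejects `str` values under Python 3. A rough sketch of one way to handle it follows. The helper name `to_bytes` is hypothetical and not part of `make_datafiles.py`, and UTF-8 is an assumed encoding.

```python
def to_bytes(text, encoding="utf-8"):
    """Return text as bytes, leaving values that are already bytes untouched."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


# Sample value taken from the start of the article text in the traceback.
article = "marseille , france -lrb- cnn -rrb-"
article_bytes = to_bytes(article)

# In write_to_bin, the encoded value would then be what is handed to the
# bytes_list field, for example:
#   tf_example.features.feature['article'].bytes_list.value.extend([to_bytes(article)])
```

Any other string stored in a `bytes_list` in the same function would presumably need the same treatment.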
    the-black-knight-01 authored Nov 27, 2018
    6861dab