Commit

change logic for adding split texts of long documents to 'document_list' to version of 'hboisgibault'

Signed-off-by: Tim Schopf <tim.schopf@t-online.de>
TimSchopf committed Jun 18, 2022
1 parent 8b4cdd4 commit bf8a697
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions keyphrase_vectorizers/keyphrase_vectorizer_mixin.py
@@ -307,10 +307,10 @@ def _get_pos_keyphrases(self, document_list: List[str], stop_words: str, spacy_p
         max_doc_length = 1000000
         for document in document_list:
             if len(document) > max_doc_length:
-                docs_list.append(self._split_long_document(text=document, max_text_length=max_doc_length))
+                docs_list.extend(self._split_long_document(text=document, max_text_length=max_doc_length))
             else:
-                docs_list.append([document])
-        document_list = [text for split_text in docs_list for text in split_text]
+                docs_list.append(document)
+        document_list = docs_list
         del docs_list
 
         # increase max length of documents that spaCy can parse
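
The change swaps a "list of lists, then flatten" approach for building the flat list directly with `extend`. The sketch below is a standalone illustration, not the library's code: `split_long_document` is a hypothetical stand-in that cuts fixed-width character chunks (the real `_split_long_document` is not shown in this diff and may split differently), and `flatten_old` / `flatten_new` are names invented here to contrast the two versions.

```python
from typing import List


def split_long_document(text: str, max_text_length: int) -> List[str]:
    # Hypothetical stand-in for `_split_long_document`: fixed-width character
    # chunks. The library's actual splitting strategy may differ.
    return [text[i:i + max_text_length] for i in range(0, len(text), max_text_length)]


def flatten_old(document_list: List[str], max_doc_length: int) -> List[str]:
    # Pre-commit logic: collect a list of lists, then flatten it afterwards.
    docs_list = []
    for document in document_list:
        if len(document) > max_doc_length:
            docs_list.append(split_long_document(text=document, max_text_length=max_doc_length))
        else:
            docs_list.append([document])
    return [text for split_text in docs_list for text in split_text]


def flatten_new(document_list: List[str], max_doc_length: int) -> List[str]:
    # Post-commit logic: `extend` adds the split parts one by one, so the
    # list is already flat and no second pass is needed.
    docs_list = []
    for document in document_list:
        if len(document) > max_doc_length:
            docs_list.extend(split_long_document(text=document, max_text_length=max_doc_length))
        else:
            docs_list.append(document)
    return docs_list


documents = ["short", "abcdefghij"]
old_result = flatten_old(documents, max_doc_length=6)
new_result = flatten_new(documents, max_doc_length=6)
print(new_result)
```

Under these assumptions both versions return the same flat list of strings; the newer one avoids wrapping every short document in a single-element list and skips the flattening comprehension.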