I was looking into this dataset and have a couple of questions.
The given article in the Oracle folder for the story dataset is a single long text, whereas the retrieved article is a list of texts separated by a token. How should I compare retriever performance between these two formats?
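For concreteness, one comparison I considered is splitting the retrieved output on the separator and checking each passage against the oracle text. The sketch below is only illustrative: I don't know the dataset's actual separator, so "</s>" is a placeholder, and passage_hit_rate is my own name for the metric.

```python
def passage_hit_rate(oracle_text: str, retrieved_text: str, sep: str = "</s>") -> float:
    """Fraction of retrieved passages that appear verbatim in the oracle article.

    `sep` is a placeholder; the dataset's actual separator token may differ.
    """
    passages = [p.strip() for p in retrieved_text.split(sep) if p.strip()]
    if not passages:
        return 0.0
    hits = sum(1 for p in passages if p in oracle_text)
    return hits / len(passages)
```

Is something along these lines the intended evaluation, or is there an official scoring script?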
I was also wondering how to construct a corpus of documents so that I can use a different retriever model. I found the divided_documents in the Raw folder, but the longest divided document is still quite long (6,221 words according to nltk word_tokenize), which exceeds the input length limit of the Contriever retriever. What is the best way to divide the documents so that each one fits within Contriever's input limit? A re-chunking sketch of what I have in mind follows.
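Since Contriever is BERT-based with a 512-token limit, my current workaround is to re-chunk the divided documents by token count with a sliding window, rather than relying on the existing splits. This is a minimal sketch assuming the Hugging Face checkpoint facebook/contriever; chunk_document and the window sizes are my own choices, not from the dataset.

```python
from transformers import AutoTokenizer

# Assumption: the facebook/contriever checkpoint (BERT-based, 512-token limit).
tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")

def chunk_document(text: str, max_tokens: int = 500, stride: int = 100) -> list[str]:
    """Split one long document into overlapping chunks, each short enough to
    fit within Contriever's 512-token limit (500 leaves room for [CLS]/[SEP]).
    `max_tokens` and `stride` are illustrative defaults, not dataset settings.
    """
    token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = []
    step = max_tokens - stride  # overlap consecutive windows by `stride` tokens
    for start in range(0, len(token_ids), step):
        window = token_ids[start:start + max_tokens]
        chunks.append(tokenizer.decode(window, skip_special_tokens=True))
        if start + max_tokens >= len(token_ids):
            break  # last window already covers the tail of the document
    return chunks
```

Would chunking like this break any alignment with the dataset's labels, or is it safe to re-split the documents this way?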
Thanks for this great work. I would appreciate your time. Looking forward to hearing from you.