There is a mismatch between output["topic-word-matrix"] and dataset.get_vocabulary() in terms of words? #86
Comments
There should be a one-to-one correspondence between the two. It's difficult to say what is wrong. Can you share more details about the problem?
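A quick sanity check can rule out the simplest ways the correspondence could break. The sketch below is only an illustration: `check_correspondence` is a hypothetical helper, not part of the library, and the toy data stands in for `output["topic-word-matrix"]` and `dataset.get_vocabulary()`. It assumes the matrix has one row per topic and one column per word.

```python
def check_correspondence(matrix, vocabulary):
    """Return a list of problems that rule out a one-to-one mapping.

    Hypothetical helper: `matrix` is a sequence of topic rows and
    `vocabulary` is a sequence of words.
    """
    problems = []
    # Duplicate words would make the column labels ambiguous.
    if len(set(vocabulary)) != len(vocabulary):
        problems.append("vocabulary contains duplicate words")
    # Every topic row needs exactly one weight per vocabulary word.
    for topic_id, row in enumerate(matrix):
        if len(row) != len(vocabulary):
            problems.append(
                f"topic {topic_id}: {len(row)} weights for {len(vocabulary)} words"
            )
    return problems


# Synthetic stand-ins for the real matrix and vocabulary.
toy_vocabulary = ["apple", "banana", "cherry"]
toy_matrix = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]
print(check_correspondence(toy_matrix, toy_vocabulary))  # prints []
```

An empty result only shows that the sizes agree; it does not show that column j of the matrix actually belongs to word j of the vocabulary.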
Good day Dr. Silvia, nice to see you again, and thank you for your reply. Here are the details of the issue. :)

First, I created a dataset folder containing two files: the corpus file and the vocabulary file (sorted alphabetically).

Second, I loaded the dataset and trained LDA models on it.

Third, after training, I imported one of the LDA models and built a data frame with the model's topic-word-matrix as the data and the dataset's vocabulary as the columns. The resulting data frame is shown in the figure below:

[Figure: the resulting data frame]

Last, the top 5 words of the data frame's first topic are different from the top 5 words of the model's first topic. I can't determine why the top words of the topics differ.

With appreciation,
Benz
Hi Benz,
Thanks for your patience.
Silvia
Hi, Let me know :)
I created a DataFrame as follows:
df = pd.DataFrame(data=output["topic-word-matrix"], columns=dataset.get_vocabulary()).T
When I sort the DataFrame by a topic number to get the top words for a topic, why do the results differ from output["topics"][i]?
Thank you!
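One way to narrow this down is to recompute the top words of each topic from the matrix and compare them with the reported topics. The sketch below uses plain Python and synthetic data. `top_words` and `mismatched_topics` are hypothetical helpers, and the toy lists stand in for `output["topic-word-matrix"]`, `dataset.get_vocabulary()`, and `output["topics"]`. It illustrates one possible cause, namely that the matrix columns follow a different word order than the vocabulary list. That is an assumption here, not something confirmed in this thread.

```python
def top_words(weights, vocabulary, k=5):
    """Return the k vocabulary words with the largest weights."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return [vocabulary[j] for j in ranked[:k]]


def mismatched_topics(matrix, vocabulary, topics, k=5):
    """Return indices of topics whose recomputed top-k words differ from `topics`."""
    mismatches = []
    for i, (weights, reported) in enumerate(zip(matrix, topics)):
        if top_words(weights, vocabulary, k) != list(reported[:k]):
            mismatches.append(i)
    return mismatches


# Synthetic data: the columns of `matrix` follow `vocab_model_order`.
vocab_model_order = ["cat", "dog", "fish", "bird"]
vocab_sorted = sorted(vocab_model_order)  # alphabetical, like a sorted vocabulary file
matrix = [[0.1, 0.6, 0.05, 0.25], [0.4, 0.1, 0.3, 0.2]]
topics = [["dog", "bird"], ["cat", "fish"]]

print(mismatched_topics(matrix, vocab_model_order, topics, k=2))  # prints []
print(mismatched_topics(matrix, vocab_sorted, topics, k=2))       # prints [0, 1]
```

If labelling the columns with the vocabulary reproduces the reported topics for no topic at all, a reordering of the columns is a more likely explanation than a problem with the weights themselves.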