I scraped over 7,000 Japanese-language reviews from an online platform and loaded them into a Pandas DataFrame. I preprocessed and tokenized the Japanese text using specialized libraries, visualized word embeddings, and applied topic modeling to classify the articles/reviews into categories.

lethuyngocan/Japanese-Text-Similarity

Japanese-Text-Similarity

  1. Text Extraction:

1.1. Question (1):

1.2. Solution: Uncompress the compressed file

  2. Understanding the data:

2.1. Understand the problem statement:

2.2. Basic EDA and Visualization pre-cleaning process:

  3. Text Preprocessing:

3.1. Removing Noise:

3.2. Removing Punctuation:

3.3. Tokenization:

3.4. Removing Stopwords:

  4. Embedded Representation

4.1. Question (2.1)

4.2. Solution: Embedding Visualization

4.3. Question (2.2)

4.4. Solution: Query similarity with gensim

  5. Text Classification:

5.1. Question (2.3):

5.2. Solution: Text Classification with Naive Bayes (NB)

5.3. Question (2.4):

5.4. Solution: Improve the accuracy of the model

  6. Extraction of Characteristic words

6.1. Question (3):

6.2. Solution: Topic Modelling with LDA

  7. Conclusion
