This is my final project for the Information Retrieval course.
Want to give it a try? Try it now on
Indonesian News Title by Ibrahim on Kaggle. This dataset contains 91,017 Indonesian news titles categorized into 9 labels ('finance', 'food', 'health', 'hot', 'inet', 'news', 'oto', 'sport', 'travel').
Date | Url | Title | Category |
---|---|---|---|
02/26/2020 | https://finance.detik.com/berita-ekonomi-bisnis/d-4916114/kemnaker-awasi-tka-di-meikarta | Kemnaker Awasi TKA di Meikarta | finance |
02/27/2020 | https://sport.detik.com/detiktv/d-4916359/kangen-rooney-lihat-nih-gol-gol-kerennya | Kangen Rooney? Lihat Nih Gol-gol Kerennya | sport |
04/22/2020 | https://travel.detik.com/travel-news/d-4985787/jadikan-rumah-serasa-tempat-travelling-ini-tipsnya | Jadikan Rumah Serasa Tempat Travelling, Ini Tipsnya | travel |
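
As a quick sanity check on the data layout, the sketch below parses rows shaped like the sample above and counts titles per label. It uses only the standard library and an inline sample; the column names `Date`, `Url`, `Title`, `Category` are taken from the table header and are an assumption about the real header of `datasets/indonesian-news-title.csv`. The helper names `load_rows` and `label_counts` are hypothetical and not part of this repository.

```python
import csv
import io
from collections import Counter

# The nine labels listed in the dataset description.
LABELS = ["finance", "food", "health", "hot", "inet", "news", "oto", "sport", "travel"]

# Three sample rows copied from the table above.
# Assumption: the real CSV header may use different column names.
SAMPLE_CSV = """Date,Url,Title,Category
02/26/2020,https://finance.detik.com/berita-ekonomi-bisnis/d-4916114/kemnaker-awasi-tka-di-meikarta,Kemnaker Awasi TKA di Meikarta,finance
02/27/2020,https://sport.detik.com/detiktv/d-4916359/kangen-rooney-lihat-nih-gol-gol-kerennya,Kangen Rooney? Lihat Nih Gol-gol Kerennya,sport
04/22/2020,https://travel.detik.com/travel-news/d-4985787/jadikan-rumah-serasa-tempat-travelling-ini-tipsnya,"Jadikan Rumah Serasa Tempat Travelling, Ini Tipsnya",travel
"""


def load_rows(handle):
    """Read rows as dicts and keep only those whose category is a known label."""
    return [row for row in csv.DictReader(handle) if row["Category"] in LABELS]


def label_counts(rows):
    """Count how many titles fall under each label."""
    return Counter(row["Category"] for row in rows)


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

To run the same check on the full dataset, pass an open file handle for the CSV to `load_rows` instead of the in-memory sample, adjusting the column names if the header differs.
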
Model | Test Accuracy | Train Loss | Validation Loss |
---|---|---|---|
Word2Vec + LSTM | 89.16% | 0.4327 | 0.4741 |
Word2Vec + CNN 1D | 88.53% | 0.4324 | 0.5803 |
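
Both models take titles as sequences of word embeddings, so each title first has to become a fixed-length list of integer ids that index into an embedding matrix. The sketch below illustrates that step in plain Python. It is not the code in `utils/preprocessor.py`: the functions `tokenize`, `build_vocab`, and `encode`, the reserved ids, the `max_len` value, and the choice to pad at the end are all assumptions made for illustration.

```python
import re

# Reserved ids (assumed): 0 for padding, 1 for out-of-vocabulary tokens.
PAD_ID, OOV_ID = 0, 1


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def build_vocab(titles):
    """Assign each distinct token an integer id, starting after the reserved ids."""
    vocab = {}
    for title in titles:
        for token in tokenize(title):
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def encode(title, vocab, max_len=8):
    """Map a title to exactly max_len ids, truncating or padding at the end."""
    ids = [vocab.get(token, OOV_ID) for token in tokenize(title)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


titles = [
    "Kemnaker Awasi TKA di Meikarta",
    "Kangen Rooney? Lihat Nih Gol-gol Kerennya",
]
vocab = build_vocab(titles)
print(encode("Kemnaker Awasi Rooney", vocab))
```

In a setup like this, row `i` of the embedding matrix would hold the Word2Vec vector for the token with id `i`, with the padding row left as zeros.
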
```
├── datasets
│   ├── preprocessed
│   │   ├── data.pkl
│   └── indonesian-news-title.csv
├── models
│   ├── cnn.py
│   ├── lstm.py
│   ├── inference.py
│   └── w2v_matrix.pkl
├── utils
│   ├── preprocessor.py
│   └── tokenizer.pkl
├── app.py
├── train_cnn.py
├── train_lstm.py
├── requirements.txt
├── README.md
└── .gitignore
```
```
streamlit run app.py
```
Last update: 22 November 2022