adirizq/indonesian-news-title-category-classifier

Indonesian News Title Category Classifier

This is my final project for the Information Retrieval course.

Want to give it a try? A live demo is available on Streamlit Cloud.

Dataset

This project uses the Indonesian News Title dataset by Ibrahim on Kaggle. It contains 91,017 Indonesian news titles, each categorized into one of 9 labels ('finance', 'food', 'health', 'hot', 'inet', 'news', 'oto', 'sport', 'travel').

Example

| Date | Url | Title | Category |
|---|---|---|---|
| 02/26/2020 | https://finance.detik.com/berita-ekonomi-bisnis/d-4916114/kemnaker-awasi-tka-di-meikarta | Kemnaker Awasi TKA di Meikarta | finance |
| 02/27/2020 | https://sport.detik.com/detiktv/d-4916359/kangen-rooney-lihat-nih-gol-gol-kerennya | Kangen Rooney? Lihat Nih Gol-gol Kerennya | sport |
| 04/22/2020 | https://travel.detik.com/travel-news/d-4985787/jadikan-rumah-serasa-tempat-travelling-ini-tipsnya | Jadikan Rumah Serasa Tempat Travelling, Ini Tipsnya | travel |
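
A minimal sketch for loading the CSV with Python's standard `csv` module and checking the label distribution. It assumes the file header uses lowercase column names (`date`, `url`, `title`, `category`); the actual Kaggle header may differ, and `load_titles` is a hypothetical helper, not part of this repository.

```python
import csv
import io
from collections import Counter

# Assumption: the CSV header is "date,url,title,category" (lowercase).
# Adjust the column names if the Kaggle file uses different ones.
LABELS = ("finance", "food", "health", "hot", "inet",
          "news", "oto", "sport", "travel")


def load_titles(fileobj):
    """Return (titles, categories) read from an open CSV file object."""
    reader = csv.DictReader(fileobj)
    titles, categories = [], []
    for row in reader:
        title = (row.get("title") or "").strip()
        category = (row.get("category") or "").strip()
        # Skip rows with an empty title or a label outside the 9 known ones.
        if title and category in LABELS:
            titles.append(title)
            categories.append(category)
    return titles, categories


# Usage with the three example rows; the third title contains a comma,
# so it is quoted as the CSV format requires.
sample = io.StringIO(
    "date,url,title,category\n"
    "02/26/2020,https://finance.detik.com/berita-ekonomi-bisnis/d-4916114/"
    "kemnaker-awasi-tka-di-meikarta,Kemnaker Awasi TKA di Meikarta,finance\n"
    "02/27/2020,https://sport.detik.com/detiktv/d-4916359/"
    "kangen-rooney-lihat-nih-gol-gol-kerennya,"
    "Kangen Rooney? Lihat Nih Gol-gol Kerennya,sport\n"
    "04/22/2020,https://travel.detik.com/travel-news/d-4985787/"
    "jadikan-rumah-serasa-tempat-travelling-ini-tipsnya,"
    '"Jadikan Rumah Serasa Tempat Travelling, Ini Tipsnya",travel\n'
)
titles, categories = load_titles(sample)
print(Counter(categories))
```

Against the real file, the same helper could be called inside `with open("datasets/indonesian-news-title.csv", newline="", encoding="utf-8") as f:` (path taken from the project structure). Filtering on the nine known labels also drops rows with an empty or unexpected category.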

Result

| Model | Test Accuracy | Train Loss | Validation Loss |
|---|---|---|---|
| Word2Vec + LSTM | 89.16% | 0.4327 | 0.4741 |
| Word2Vec + CNN 1D | 88.53% | 0.4324 | 0.5803 |
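
Both models combine Word2Vec word embeddings with a neural classifier (LSTM or 1D CNN), and the project structure includes a `w2v_matrix.pkl`. The sketch below shows one common way such an embedding matrix is assembled; it is an assumption about the approach, not the repository's code. A plain dict stands in for the trained Word2Vec vectors, and `build_embedding_matrix` is a hypothetical name.

```python
# Hypothetical sketch: combine a word -> index vocabulary (as a Keras-style
# tokenizer would produce, indices starting at 1) with a word -> vector
# lookup (as a trained Word2Vec model would provide) into an embedding
# matrix. Row i holds the vector for the word with index i.


def build_embedding_matrix(word_index, vectors, dim):
    """Return len(word_index) + 1 rows of length dim.

    Row 0 is reserved for padding. Words missing from `vectors`
    (or with a vector of the wrong size) keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = [float(x) for x in vec]
    return matrix


# Usage with toy 3-dimensional vectors.
word_index = {"kemnaker": 1, "awasi": 2, "tka": 3, "meikarta": 4}
vectors = {"kemnaker": [0.1, 0.2, 0.3], "tka": [0.4, 0.5, 0.6]}
matrix = build_embedding_matrix(word_index, vectors, dim=3)
```

Row 0 stays zero for padding, and words without a Word2Vec vector also get zero rows. With gensim, a compatible lookup could be built as `{w: kv[w] for w in word_index if w in kv}` for a `KeyedVectors` instance `kv`; the resulting matrix is the kind of object typically used to initialise the embedding layer in front of the LSTM or 1D CNN.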

Project Structure

├── datasets
│   ├── preprocessed
│   │   └── data.pkl
│   └── indonesian-news-title.csv
├── models
│   ├── cnn.py
│   ├── lstm.py
│   ├── inference.py
│   └── w2v_matrix.pkl
├── utils
│   ├── preprocessor.py
│   └── tokenizer.pkl
├── app.py
├── train_cnn.py
├── train_lstm.py
├── requirements.txt
├── README.md
└── .gitignore
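
`utils/preprocessor.py` and `utils/tokenizer.pkl` suggest a clean, tokenize, and pad pipeline in front of the models. The following is a hypothetical, standard-library-only stand-in for that pipeline: the cleaning rule (lowercase, keep only ASCII letters and digits), the frequency-ranked vocabulary starting at index 1, and padding at the end of the sequence are all assumptions, not the repository's confirmed behavior. `clean_title`, `fit_word_index`, and `to_padded_sequence` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical stand-ins for utils/preprocessor.py and utils/tokenizer.pkl;
# the repository's actual cleaning rules may differ.


def clean_title(title):
    """Lowercase a title and replace anything but a-z / 0-9 with spaces."""
    title = title.lower()
    title = re.sub(r"[^a-z0-9]+", " ", title)
    return title.strip()


def fit_word_index(titles):
    """Map words to indices, most frequent first, starting at 1 (0 = padding).

    Ties in frequency are broken alphabetically so the result is deterministic.
    """
    counts = Counter(word for t in titles for word in clean_title(t).split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ranked, start=1)}


def to_padded_sequence(title, word_index, maxlen):
    """Encode a title as indices, dropping unknown words, then pad/truncate."""
    ids = [word_index[w] for w in clean_title(title).split() if w in word_index]
    ids = ids[:maxlen]                       # truncate at the end
    return ids + [0] * (maxlen - len(ids))   # pad at the end with zeros


# Usage with two example titles.
titles = [
    "Kemnaker Awasi TKA di Meikarta",
    "Kangen Rooney? Lihat Nih Gol-gol Kerennya",
]
word_index = fit_word_index(titles)
seq = to_padded_sequence("TKA di Meikarta", word_index, maxlen=6)
```

Keras's `pad_sequences` pads and truncates at the front by default; either convention works as long as training and inference use the same one. The `[^a-z0-9]` rule also strips any non-ASCII characters.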

Streamlit Demo

streamlit run app.py
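
The demo presumably runs each input title through the same preprocessing, feeds it to a trained model, and maps the model's 9 output scores back to a category name (the project structure lists `models/inference.py` for this). A minimal sketch of that last step, assuming the output units follow alphabetical label order (the order the README lists them in, and the order scikit-learn's `LabelEncoder` would produce); `decode_prediction` and `top_k` are hypothetical helpers.

```python
# Assumption: output unit i of the model corresponds to LABELS[i].
LABELS = ("finance", "food", "health", "hot", "inet",
          "news", "oto", "sport", "travel")


def decode_prediction(probabilities, labels=LABELS):
    """Return (label, score) for the highest-scoring class."""
    if len(probabilities) != len(labels):
        raise ValueError(f"expected {len(labels)} scores, got {len(probabilities)}")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], float(probabilities[best])


def top_k(probabilities, k=3, labels=LABELS):
    """Return the k most likely (label, score) pairs, highest first."""
    ranked = sorted(zip(labels, probabilities), key=lambda p: p[1], reverse=True)
    return [(label, float(p)) for label, p in ranked[:k]]


# Usage with a made-up softmax output for a sports headline.
probs = [0.02, 0.01, 0.03, 0.06, 0.04, 0.05, 0.02, 0.74, 0.03]
label, score = decode_prediction(probs)
best_two = top_k(probs, k=2)
```

Showing the top few labels rather than only the argmax can help in a demo, since some categories (for example 'hot' and 'news') may overlap in practice.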

Author

Rizky Adi

Last update: 22 November 2022
