Netifier: Negativity Classifier

Introduction

The rapid spread of information through the internet has benefited our lives in many ways. However, it has also introduced problems, one of which is the spread of negative content online. The presence of 'toxic' posts makes it hard for people to have effective conversations.

Inspired by the Toxic Comment Classification Challenge, we decided to do something similar using data from Indonesian social media. Our goal is to analyze this data and build a multi-label text toxicity classifier using machine learning.

Contributions

Created an Indonesian Social Media Text Toxicity Dataset
Created a Pipeline for the Task: Exploratory Data Analysis, Data Preprocessing, and Modelling
Compared the Performance of Various Machine Learning Models on This Task
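This README does not spell out which preprocessing steps the pipeline applies. As a rough illustration only, a cleaning step for social media posts often resembles the sketch below. The function name `clean_post` and the specific rules (lowercasing, dropping URLs and @mentions, squashing letters repeated three or more times, removing punctuation) are assumptions and do not come from the project's notebooks.

```python
import re

# Hypothetical cleaning rules; the project's real preprocessing may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated 3+ times, e.g. "bagusss"
NON_WORD_RE = re.compile(r"[^\w\s]")      # punctuation, emoji and symbols such as '#'
SPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Normalize a raw social media post before tokenization (illustrative sketch)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)   # "bagusss" -> "bagus"
    text = NON_WORD_RE.sub(" ", text)   # "#keren" -> " keren"
    return SPACE_RE.sub(" ", text).strip()


print(clean_post("@budi Ini BAGUSSS banget!!! https://t.co/xyz #keren"))
# ini bagus banget keren
```

Squashing repeated letters gives up a little information (emphasis) in exchange for a smaller vocabulary. Only letters are squashed, so numbers such as "1000" stay intact.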

Dataset

As far as we know, there is no publicly available dataset on Indonesian text toxicity, so we decided to collect the data ourselves. We scraped posts from popular Indonesian social media sites such as Instagram, Twitter, and Kaskus, then manually labelled ~7,000 samples into 4 categories: pornography, hate speech, racism, and radicalism.
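A single post can belong to several of the four categories at once, or to none of them. For that reason, each sample's target is naturally a vector of four binary flags rather than a single class. The sketch below shows this encoding. The label identifiers and their order are assumptions, because the column names used in the processed data are not documented here.

```python
# Assumed label order; the dataset's actual column names may differ.
LABELS = ["pornography", "hate_speech", "racism", "radicalism"]


def encode_labels(tags: set[str]) -> list[int]:
    """Map a set of category names to a 4-element 0/1 indicator vector."""
    unknown = tags - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in tags else 0 for label in LABELS]


def decode_labels(vector: list[int]) -> set[str]:
    """Inverse of encode_labels: recover the category names flagged with 1."""
    return {label for label, flag in zip(LABELS, vector) if flag}


print(encode_labels({"hate_speech", "racism"}))  # [0, 1, 1, 0]
print(encode_labels(set()))                      # [0, 0, 0, 0] -> non-toxic post
```

With targets in this form, a multi-label model can be trained either as one binary classifier per label or as a single model with four independent sigmoid outputs.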

We also attempted to collect more data using a semi-supervised method, which yielded ~20,000 additional samples. All of the data can be downloaded from this repository (ahmadizzan/netifier).
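The README does not say which semi-supervised method was used. One common option is self-training (pseudo-labelling). In this approach, a model trained on the manually labelled posts scores unlabelled posts, and only predictions that are confident for every label are added to the training set. The sketch below shows just that filtering step. `pseudo_label`, the 0.9 threshold, and the placeholder scores are all hypothetical.

```python
from typing import Callable, Iterable

Probs = list[float]  # one probability per label, in a fixed label order


def pseudo_label(
    texts: Iterable[str],
    predict_proba: Callable[[str], Probs],
    threshold: float = 0.9,
) -> list[tuple[str, list[int]]]:
    """Keep texts whose predicted probability is confident (near 0 or 1) for every label."""
    accepted = []
    for text in texts:
        probs = predict_proba(text)
        if all(p >= threshold or p <= 1 - threshold for p in probs):
            accepted.append((text, [1 if p >= threshold else 0 for p in probs]))
    return accepted


# Placeholder scores standing in for a trained model's output.
fake_scores = {
    "post a": [0.02, 0.97, 0.05, 0.01],  # confident: second label only
    "post b": [0.40, 0.10, 0.05, 0.02],  # unsure on the first label -> rejected
    "post c": [0.01, 0.02, 0.03, 0.04],  # confident: no toxicity label
}
print(pseudo_label(fake_scores, fake_scores.__getitem__))
# [('post a', [0, 1, 0, 0]), ('post c', [0, 0, 0, 0])]
```

If the threshold is too low, model errors leak into the training data. If it is too high, few new samples are added.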

Project Organization

├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
└── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering)
                          and a short description.

Project Members

Ahmad Izzan
Christian Wibisono
Ilham Firdausi Putra
