Extreme Multi-label Classification

In this project, we experimented with various extreme multi-label classification algorithms on a large-scale data set.

Task

Extreme classification is a multi-label classification problem that annotates a data point with the most relevant subset of labels from an extremely large label set. It has wide applications in diverse areas such as dynamic search advertising, text classification, and recommender systems. The main technical challenges include improving the prediction accuracy and reducing the training time, prediction time and model size.

Data

In this project, we performed extreme multi-label classification on the EURLex-4K dataset, a collection of documents about European Union law annotated with 3993 categories.
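A minimal loader sketch, assuming the data come in the plain-text layout distributed by the Extreme Classification Repository: a header line giving the number of points, features and labels, then one line per point of the form `l1,l2,... f:v f:v ...`. Whether this project stored EURLex-4K in exactly that layout is an assumption, and `parse_xc_file` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def parse_xc_file(text: str) -> Tuple[List[List[int]], List[Dict[int, float]], int, int]:
    """Parse the assumed 'header + labels feat:val ...' format (hypothetical helper)."""
    lines = text.strip().splitlines()
    n_points, n_features, n_labels = map(int, lines[0].split())
    labels_per_point: List[List[int]] = []
    features_per_point: List[Dict[int, float]] = []
    for line in lines[1:n_points + 1]:
        tokens = line.split()
        # A point with no labels starts directly with a 'feature:value' token.
        if tokens and ":" not in tokens[0]:
            labels = [int(label) for label in tokens[0].split(",")]
            feature_tokens = tokens[1:]
        else:
            labels = []
            feature_tokens = tokens
        features: Dict[int, float] = {}
        for tok in feature_tokens:
            idx, val = tok.split(":")
            features[int(idx)] = float(val)
        labels_per_point.append(labels)
        features_per_point.append(features)
    return labels_per_point, features_per_point, n_features, n_labels
```

Keeping features as sparse dictionaries mirrors the file format; converting to a dense matrix is only practical for small feature spaces.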

Methods

We first applied traditional multi-label algorithms as baselines, trying two families of methods:

Problem Transformation (Binary Relevance or Classifier Chains combined with traditional ML algorithms such as RF and KNN)
Algorithm Adaptation (adapted KNN, SVM, etc.)
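To illustrate the problem-transformation idea, here is a minimal Binary Relevance sketch: the multi-label task is split into one independent binary problem per label. For brevity it assumes dense NumPy arrays and swaps in ridge regression on {-1, +1} targets as the base learner instead of RF/KNN; the function names are hypothetical.

```python
import numpy as np


def fit_binary_relevance(X: np.ndarray, Y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Binary relevance: fit one independent linear scorer per label.

    Hypothetical helper; ridge regression stands in for RF/KNN for brevity.
    """
    n_features = X.shape[1]
    # The regularized Gram matrix is shared, but each label is solved on its own.
    A = X.T @ X + lam * np.eye(n_features)
    weights = []
    for j in range(Y.shape[1]):
        target = 2.0 * Y[:, j] - 1.0  # map {0, 1} -> {-1, +1}
        weights.append(np.linalg.solve(A, X.T @ target))
    return np.stack(weights, axis=1)  # shape (n_features, n_labels)


def predict_binary_relevance(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Return real-valued label scores; thresholding at 0 gives binary predictions."""
    return X @ W
```

A Classifier Chain would differ only in that each later label's model also receives the earlier labels' predictions as extra input features.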

We then implemented the embedding-based models Principal Label Space Transformation (PLST) and Sparse Local Embeddings for Extreme Multi-label Classification (SLEEC), and modified these existing algorithms to improve them.
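The core of PLST can be sketched as follows: center the label matrix, keep its top-k right singular vectors as a low-dimensional label code, regress features onto that code, then decode and round back to {0, 1}. This is a minimal sketch assuming dense arrays and ridge regression as the regressor; `fit_plst`, `predict_plst`, `k` and `lam` are hypothetical names, not this project's code.

```python
from typing import Tuple

import numpy as np


def fit_plst(X: np.ndarray, Y: np.ndarray, k: int, lam: float = 1.0):
    """Hypothetical PLST sketch: SVD label compression + ridge regression."""
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean
    # Top-k right singular vectors of the centered labels span the principal label subspace.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    V = Vt[:k].T  # (n_labels, k)
    Z = Yc @ V  # encoded targets (n_samples, k)
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)  # (n_features, k)
    return W, V, y_mean


def predict_plst(X: np.ndarray, W: np.ndarray, V: np.ndarray,
                 y_mean: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Decode predicted codes back to label space and round at 0.5."""
    scores = X @ W @ V.T + y_mean
    return scores, (scores >= 0.5).astype(int)
```

The real-valued `scores` can be fed directly to a ranking metric such as LRAP, while the rounded output gives a hard label set.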

Finally, we focused on Partitioned Label Trees (Parabel), one of the leading one-vs-all extreme classifiers.
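Parabel's speed comes from organizing labels into a balanced tree before training classifiers. The sketch below covers only that tree-construction step, under stated assumptions: each label is represented by the normalized sum of the feature vectors of its positive points, and labels are split recursively by a balanced spherical 2-means. The node classifiers that Parabel trains on top of the tree are omitted, and all function names and the `max_leaf` value are hypothetical.

```python
import numpy as np


def label_representations(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Represent each label by the unit-normalized sum of its positive points' features."""
    V = Y.T @ X  # (n_labels, n_features)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.maximum(norms, 1e-12)


def balanced_2means(V: np.ndarray, labels: np.ndarray, n_iter: int = 10, rng=None):
    """Split `labels` into two halves of (near) equal size with spherical balanced 2-means."""
    rng = np.random.default_rng(0) if rng is None else rng
    sub = V[labels]
    c = sub[rng.choice(len(labels), size=2, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        # Rank labels by how much closer they are to centroid 0 than to centroid 1,
        # then force balance by sending the top half to the left child.
        pref = sub @ c[0] - sub @ c[1]
        order = np.argsort(-pref)
        half = len(labels) // 2
        left, right = order[:half], order[half:]
        for idx, part in enumerate((left, right)):
            centroid = sub[part].sum(axis=0)
            c[idx] = centroid / max(np.linalg.norm(centroid), 1e-12)
    return labels[left], labels[right]


def build_label_tree(V: np.ndarray, labels: np.ndarray, max_leaf: int = 100, rng=None) -> dict:
    """Recursively partition labels until each leaf holds at most `max_leaf` labels."""
    if len(labels) <= max_leaf:
        return {"labels": labels.tolist()}
    left, right = balanced_2means(V, labels, rng=rng)
    return {"children": [build_label_tree(V, left, max_leaf, rng),
                         build_label_tree(V, right, max_leaf, rng)]}
```

Forcing an equal split keeps the tree depth logarithmic in the number of labels, which is what keeps both training and prediction cheap.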

Evaluation

LRAP

We used label ranking average precision (LRAP) as our evaluation metric to assess label ranking performance.
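For each sample, LRAP averages over its relevant labels the fraction of labels ranked at or above that label that are themselves relevant, then averages over samples. A dependency-free sketch follows; scikit-learn's `sklearn.metrics.label_ranking_average_precision_score` computes the same quantity and is a convenient cross-check. The convention of scoring rows with no relevant labels (or all labels relevant) as 1.0 follows scikit-learn; the helper name `lrap` is hypothetical.

```python
from typing import Sequence


def lrap(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Label ranking average precision (hypothetical helper)."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        if len(relevant) == 0 or len(relevant) == len(truth):
            total += 1.0  # degenerate row: scikit-learn's convention
            continue
        sample_score = 0.0
        for j in relevant:
            # Rank of label j: labels scored at least as high (ties count against j).
            rank_all = sum(1 for s in scores if s >= scores[j])
            rank_relevant = sum(1 for k in relevant if scores[k] >= scores[j])
            sample_score += rank_relevant / rank_all
        total += sample_score / len(relevant)
    return total / len(y_true)
```

For example, with `y_true = [[1, 0, 0], [0, 0, 1]]` and `y_score = [[0.75, 0.5, 1], [1, 0.2, 0.1]]`, the first sample scores 1/2 and the second 1/3, giving an LRAP of 5/12.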

Training Time

We also recorded training times to evaluate model efficiency.
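One simple way to record training time is to wrap each model's fit call with a monotonic, high-resolution clock. This is a minimal sketch; `timed_fit` is a hypothetical helper, and how the project actually measured time is not specified here.

```python
import time
from typing import Any, Callable, Tuple


def timed_fit(fit_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run a training function and return (model, elapsed wall-clock seconds)."""
    start = time.perf_counter()  # monotonic clock, unaffected by system time changes
    model = fit_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return model, elapsed
```

Wall-clock time depends on hardware and thread counts, so comparisons are only meaningful when all models run on the same machine under the same settings.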

Conclusion

The results show that Parabel achieves both the highest LRAP score and the shortest training time among all the algorithms we experimented with.

Contributors

Man Jin (mj1637@nyu.edu)
Florence Denglin Jiang (florence.jiang@nyu.edu)
Hong Gong (hg1153@nyu.edu)
Jacqueline Yuwei Wang (yw1854@nyu.edu)
Yi Xu (yx2090@nyu.edu)
