Budgets for schools and school districts are huge, complex, and unwieldy. It's no easy task to digest where and how schools are using their resources. Education Resource Strategies is a non-profit that tackles just this task with the goal of letting districts be smarter, more strategic, and more effective in their spending.
Your task is a multi-class-multi-label classification problem with the goal of attaching canonical labels to the freeform text in budget line items. These labels let ERS understand how schools are spending money and tailor their strategy recommendations to improve outcomes for students, teachers, and administrators.
This repository contains code volunteered from leading competitors in the Box-Plots for Education on DrivenData.
Winning code for other DrivenData competitions is available in the competition-winners repository.
Place | Team or User | Public Score | Private Score | Summary of Model |
---|---|---|---|---|
1 | quocnle | 0.3665 | 0.3650 | My model is based on Online Learning, specifically a Logistic Regression model that uses the hashing trick and stochastic gradient descent with an adaptive learning rate. |
2 | Abhishek | 0.4409 | 0.4388 | The problem was treated as an NLP problem rather than a machine learning problem with some structured dataset. |
3 | giba | 0.4551 | 0.4534 | My approach is based in a Gradient Boosted Machine, so all text must be converted to an identification id (number). |