Merge pull request #9 from adamingas/development
Development
adamingas authored Jan 20, 2024
2 parents 047267e + 6856561 commit acc72e7
Showing 4 changed files with 71 additions and 36 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/deploymemt.yml
@@ -0,0 +1,51 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python application

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_run:
    workflows: [ "ci" ]
    types:
      - completed

permissions:
  contents: read

jobs:
  cd:
    # Only run this job if new work is pushed to "main"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    # Set up operating system
    runs-on: ubuntu-latest
    permissions:
      id-token: write
    environment:
      name: pypi
      url: https://pypi.org/p/ordinalgbt
    # Define job steps
    steps:
      - name: Set up Python 3.9
        uses: actions/setup-python@v3
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - uses: actions/checkout@v3
      # Here we run build to create a wheel and a
      # .tar.gz source distribution.
      - name: Build package
        run: python -m build --sdist --wheel
      # Finally, we use a pre-defined action to publish
      # our package in place of twine.
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
      - name: Test install from PyPI
        run: |
          pip install ordinalgbt
35 changes: 1 addition & 34 deletions .github/workflows/python-app.yml
@@ -37,37 +37,4 @@ jobs:
    - uses: chartboost/ruff-action@v1
    - name: Test with pytest
      run: |
        pytest
  cd:
    needs: ci
    # Only run this job if new work is pushed to "main"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    # Set up operating system
    runs-on: ubuntu-latest
    permissions:
      id-token: write
    environment:
      name: pypi
      url: https://pypi.org/p/ordinalgbt
    # Define job steps
    steps:
      - name: Set up Python 3.9
        uses: actions/setup-python@v3
        with:
          python-version: 3.9
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - uses: actions/checkout@v3
      # Here we run build to create a wheel and a
      # .tar.gz source distribution.
      - name: Build package
        run: python -m build --sdist --wheel
      # Finally, we use a pre-defined action to publish
      # our package in place of twine.
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
      - name: Test install from PyPi
        run: |
          pip install ordinalgbt
          pytest
11 changes: 10 additions & 1 deletion README.md
@@ -56,4 +56,13 @@ The `predict_proba` method can be used to get the probabilities of each class:
y_proba = model.predict_proba(X_new)

print(y_proba)
```
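As a sketch of what that output looks like and how it can be post-processed (the probability values and `numpy` usage here are hypothetical, assuming `predict_proba` returns one column per ordered class, e.g. low/medium/high, with rows summing to 1):

```python
import numpy as np

# Hypothetical predict_proba output for two samples and three
# ordered classes (low, medium, high); each row sums to 1.
y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]])

# Most likely class index per sample
y_pred = y_proba.argmax(axis=1)  # array([0, 2])

# Cumulative probabilities P(y <= k), the quantity ordinal
# models threshold on
cum_proba = np.cumsum(y_proba, axis=1)
print(y_pred, cum_proba)
```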

## TODOs
* Create XGBoost and Catboost implementations
* Bring test coverage to 100%
* Implement the all-thresholds loss function
* Implement the ordistic loss function
* Create more stable sigmoid calculation
* Experiment with bounded and unbounded optimisation for the thresholds
* Identify way to reduce jumps due to large gradient
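The "more stable sigmoid" item above usually refers to avoiding overflow in `exp` for large-magnitude inputs. One common piecewise formulation, shown here as a sketch (the function name and use of NumPy are illustrative, not the package's actual implementation):

```python
import numpy as np

def stable_sigmoid(x):
    """Numerically stable sigmoid.

    Never evaluates exp on a large positive argument, so it
    cannot overflow: for x >= 0 use 1/(1+exp(-x)); for x < 0
    use the equivalent exp(x)/(1+exp(x)).
    """
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    expx = np.exp(x[~pos])  # safe: x[~pos] is strictly negative
    out[~pos] = expx / (1.0 + expx)
    return out
```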
10 changes: 9 additions & 1 deletion docs/motivation.ipynb
@@ -9,7 +9,7 @@
"\n",
"Usually when faced with prediction problems involving ordered labels (i.e. low, medium, high) and tabular data, data scientists turn to regular multinomial classifiers from the gradient boosted tree family of models, because of their ease of use, speed of fitting, and good performance. Parametric ordinal models have been around for a while, but they have not been popular because of their poor performance compared to the gradient boosted models, especially for larger datasets.\n",
"\n",
"Although classifiers can predict ordinal labels adequately, they require building as many classifiers as there are labels to predict. This approach, however, leads to slower training times, and confusing feature interpretations. For example, a feature which is positively associated with the increasing order of the label set (i.e. as the feature's value grows, so do the probabilities of the higher ordered labels), will have a positive association with the highest ordered label, negative with the lowest ordered, and a \"concave\" association with the middle ones."
"Although classifiers can predict ordinal labels adequately, they require building as many classifiers as there are labels to predict. This approach, however, leads to slower training times, and confusing feature interpretations. For example, a feature which is positively associated with the increasing order of the label set (i.e. as the feature's value grows, so do the probabilities of the higher ordered labels), will have a positive association with the highest ordered label, negative with the lowest ordered, and a \"concave\" association with the middle ones.\n"
]
},
{
@@ -33,6 +33,14 @@
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"There have been recurring requests from the community for an ordinal loss implementation in all of the major gradient boosting model frameworks ([LightGBM](https://github.com/microsoft/LightGBM/issues/5882), [XGBoost](https://github.com/dmlc/xgboost/issues/5243), [XGBoost](https://github.com/dmlc/xgboost/issues/695), [CatBoost](https://github.com/catboost/catboost/issues/1994))."
]
},
{
"cell_type": "markdown",
"metadata": {
