Hello everybody!
We've just released NannyML 0.9.0! In this release we're smashing some bugs, improving some docs and introducing two new calculator types: the summary stats calculators and the data quality calculators!
Installing / upgrading
You can get this latest version by using pip:
```shell
pip install -U nannyml
```
Or conda:
```shell
conda install -c conda-forge nannyml
```
What’s new?
Data quality calculators
We've added our first two data quality metrics to track over time: the number of missing values and the number of unseen values.
The missing values calculator returns the number of missing values in a column for a given chunk. This allows you to track that number over time and compare it to the number of missing values for that column in your reference data.
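For intuition, here is a minimal pure-Python sketch of the idea behind a per-chunk missing value count. It is not NannyML's implementation: it assumes fixed-size chunks, treats `None` as the only missing marker, and uses hypothetical helpers (`chunk`, `missing_counts`) on made-up data.

```python
from typing import Any, List, Optional, Sequence


def chunk(values: Sequence[Any], size: int) -> List[Sequence[Any]]:
    """Split a column into consecutive fixed-size chunks (hypothetical helper)."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def missing_counts(values: Sequence[Optional[Any]], size: int) -> List[int]:
    """Count missing entries (assumed to be None) in each chunk."""
    return [sum(v is None for v in c) for c in chunk(values, size)]


# Toy data: one column from a "reference" period and one from an "analysis" period
reference_col = [1.0, None, 3.0, 4.0, 5.0, 6.0, None, 8.0]
analysis_col = [1.0, None, None, 4.0, None, None, None, 8.0]

print(missing_counts(reference_col, size=4))  # [1, 1]
print(missing_counts(analysis_col, size=4))   # [2, 3]
```

Comparing the analysis counts with the reference counts is what makes a jump in missingness stand out.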
The unseen values calculator checks for any values in categorical features that have not occurred in your reference data.
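Conceptually, the check boils down to "remember every category seen in reference, then count anything new in each chunk". The sketch below illustrates that under our own assumptions: fixed-size chunks, a hypothetical `unseen_counts` helper, and toy `Embarked`-style values where `'X'` and `'Y'` are invented. It does not reproduce NannyML's code.

```python
from typing import Hashable, List, Sequence


def unseen_counts(
    reference: Sequence[Hashable],
    analysis: Sequence[Hashable],
    size: int,
) -> List[int]:
    """Count analysis values per chunk that never occur in the reference data."""
    seen = set(reference)  # the "fit" step: remember every category in reference
    chunks = [analysis[i:i + size] for i in range(0, len(analysis), size)]
    return [sum(v not in seen for v in c) for c in chunks]


reference_embarked = ['S', 'C', 'Q', 'S', 'C']
analysis_embarked = ['S', 'X', 'C', 'Y', 'Y', 'S']

print(unseen_counts(reference_embarked, analysis_embarked, size=3))  # [1, 2]
```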
The following snippet illustrates how to set up unseen values tracking:
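```python
import nannyml as nml
from IPython.display import display

# Load the Titanic example dataset bundled with NannyML
reference, analysis, analysis_targets = nml.load_titanic_dataset()
display(reference.head())

# Categorical columns to check for unseen values
selected_columns = [
    'Sex', 'Ticket', 'Cabin', 'Embarked',
]

calc = nml.UnseenValuesCalculator(
    column_names=selected_columns,
)

# Learn which values occur in the reference data...
calc.fit(reference)
# ...then track values in each analysis chunk that never appeared in reference
results = calc.calculate(analysis)
display(results.filter(period='all').to_df())

# One plot per monitored column
for column_name in results.column_names:
    results.filter(column_names=column_name).plot().show()
```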
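You can read more about the data quality calculators in the missing values calculator tutorial and the unseen values calculator tutorial.
Summary stats calculators
With these calculators you can track the evolution of summary statistics over time. Currently supported summary stats are: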
Summation
Average
Standard Deviation
Median
Row count
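To illustrate what these statistics look like when tracked per chunk, here is a small sketch using Python's standard `statistics` module. The helper `chunk_summary_stats`, the fixed chunk size and the toy `car_values` data are hypothetical, and using the sample standard deviation (`statistics.stdev`) is our assumption, not a statement about how NannyML computes it.

```python
import statistics
from typing import Dict, List, Sequence


def chunk_summary_stats(values: Sequence[float], size: int) -> List[Dict[str, float]]:
    """Compute the listed summary stats for each fixed-size chunk (hypothetical helper)."""
    stats_per_chunk = []
    for start in range(0, len(values), size):
        c = values[start:start + size]
        stats_per_chunk.append({
            'sum': sum(c),
            'mean': statistics.mean(c),
            # Sample standard deviation (assumption); needs at least two values per chunk
            'std': statistics.stdev(c),
            'median': statistics.median(c),
            'row_count': len(c),
        })
    return stats_per_chunk


# Toy values for a single continuous column
car_values = [10.0, 12.0, 14.0, 16.0, 20.0, 30.0, 40.0, 50.0]

for stats in chunk_summary_stats(car_values, size=4):
    print(stats['median'], stats['row_count'])
# 13.0 4
# 35.0 4
```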
NannyML determines thresholds for each summary statistic from the reference period data and raises an alert when values in the analysis period fall outside those thresholds.
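A rough sketch of this fit-then-alert flow for the median is shown below. It assumes thresholds of mean ± 3 standard deviations of the per-chunk reference values; treat that rule, the helper names (`chunk_medians`, `fit_thresholds`, `alerts`) and the toy data as illustrative assumptions rather than NannyML's exact behaviour.

```python
import statistics
from typing import List, Sequence, Tuple


def chunk_medians(values: Sequence[float], size: int) -> List[float]:
    """Median of each consecutive fixed-size chunk."""
    return [statistics.median(values[i:i + size]) for i in range(0, len(values), size)]


def fit_thresholds(reference_stats: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Assumed rule: mean +/- k sample standard deviations of the reference chunk stats."""
    mu = statistics.mean(reference_stats)
    sigma = statistics.stdev(reference_stats)
    return mu - k * sigma, mu + k * sigma


def alerts(analysis_stats: Sequence[float], lower: float, upper: float) -> List[bool]:
    """True wherever a chunk's statistic falls outside [lower, upper]."""
    return [not (lower <= s <= upper) for s in analysis_stats]


reference_values = [9, 10, 11, 11, 12, 13, 10, 11, 12, 12, 13, 14, 8, 9, 10]
analysis_values = [10, 11, 12, 13, 14, 15, 19, 20, 21]

ref_medians = chunk_medians(reference_values, size=3)  # [10, 12, 11, 13, 9]
lower, upper = fit_thresholds(ref_medians)
print(alerts(chunk_medians(analysis_values, size=3), lower, upper))
# [False, False, True]
```

Only the last analysis chunk, with a median of 20, lands outside the band and would raise an alert.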
The following snippet shows how to set up monitoring of the median:
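```python
import nannyml as nml
from IPython.display import display

# Load the synthetic car loan example dataset bundled with NannyML
reference, analysis, analysis_targets = nml.load_synthetic_car_loan_dataset()
display(reference.head())

# Continuous columns whose median we want to monitor
selected_columns = [
    'car_value', 'debt_to_income_ratio', 'driver_tenure'
]

calc = nml.SummaryStatsMedianCalculator(
    column_names=selected_columns,
)

# Derive thresholds from the reference period...
calc.fit(reference)
# ...then compute the median per chunk for the analysis period
results = calc.calculate(analysis)
display(results.filter(period='all').to_df())

# One plot per monitored column
for column_name in results.column_names:
    results.filter(column_names=column_name).plot().show()
```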
Read more about it in the summary stats tutorials.
What's next?
We have multiple proverbial irons in the fire, so library development has slowed down a bit. We'll be picking up the pace soon with some fundamental changes!
We hope our new functionality improves your quality of life (and that of your deployed models). As always, any feedback is encouraged!
Reach out in our community Slack, log a bug or feature request in our repository, or just leave us a star for positive holiday vibes!