Release 0.6.2 #122

nnansters · 2022-09-16T17:55:38Z

nnansters
Sep 16, 2022
Maintainer

Hey everybody,

Niels from NannyML engineering here to tell you about our 0.6.2 release.

Installing / upgrading

You can get this latest version by using pip:

pip install -U nannyml

Or conda:

conda install -c conda-forge nannyml

What's new?

In this release we've focused on lowering the threshold for trying out NannyML even further: the timestamp_column_name argument is now optional. A quick refresher maybe?

The timestamp column is a column in your dataset that represents the time your model was invoked, i.e. the time at which it made a prediction for a given set of features. Having this timestamp allows NannyML to calculate, visualize and track your model performance over time. In a production setting you'll most likely have access to this information, since you're gathering the inputs and outputs of a model deployed somewhere.

However, not everyone is using NannyML within a production context. You might just want to evaluate it. From this version on you'll no longer have to craft an artificial timestamp column just to be able to use NannyML.

There are some side-effects of doing this of course.

We've provided an alternative way of plotting the results: when there is no timestamp, plots no longer rely on a time-based X-axis but use the index of each chunk as the X-axis instead. By splitting your data into chunks, you impose an ordering onto your data. Metrics will be plotted in that order.
When no timestamp column is given, you can no longer use the PeriodBasedChunker to chunk your data according to a particular date offset.
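To make those two side-effects a bit more concrete, here is a minimal sketch in plain Python. It is not NannyML's implementation: the helpers chunk_by_size and chunk_by_period and the toy rows are hypothetical, made up for illustration only. It shows that size-based chunks get their order (and so their X-axis position) from the row order, while period-based chunking has nothing to group on without a timestamp.

```python
from statistics import mean


def chunk_by_size(rows, chunk_size):
    """Split rows into consecutive chunks; the chunk index follows row order."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def chunk_by_period(rows, timestamp_key):
    """Group rows by calendar month; every row needs a 'YYYY-MM-DD...' timestamp."""
    if any(timestamp_key not in row for row in rows):
        raise ValueError("period-based chunking requires a timestamp column")
    chunks = {}
    for row in rows:
        chunks.setdefault(row[timestamp_key][:7], []).append(row)  # key: 'YYYY-MM'
    return [chunks[period] for period in sorted(chunks)]


# Hypothetical data without any timestamp: just one prediction score per row.
rows = [{"y_pred_proba": p / 10} for p in range(10)]

chunks = chunk_by_size(rows, chunk_size=4)

# X-axis: the chunk index. Y-axis: some per-chunk metric (here simply the mean score).
x_axis = list(range(len(chunks)))
y_axis = [mean(row["y_pred_proba"] for row in chunk) for chunk in chunks]

# Period-based chunking cannot work on these rows, since there is no timestamp.
try:
    chunk_by_period(rows, timestamp_key="timestamp")
except ValueError as error:
    period_error = str(error)
```

The point of the sketch is only the ordering: without timestamps, the order of your rows decides which chunk comes first on the plot, so make sure your data is sorted in a way that makes sense to you.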

So, what does that mean for you?

Any code you currently have will still work. 🥳
You can drop the timestamp_column_name argument from any calculator or estimator initializer.

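import nannyml as nml

chunk_size = 5000  # example value, assumed here; keep whatever chunk size you were using

# This will still work, as before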
estimator_with_timestamp = nml.CBPE(
    timestamp_column_name='timestamp',
    y_pred_proba='y_pred_proba',
    y_pred='y_pred',
    y_true='work_home_actual',
    metrics=['roc_auc'],
    chunk_size=chunk_size,
    problem_type='classification_binary',
)

# But this is also valid now!
# Initialize the estimator with the required data columns only, no timestamp column
estimator = nml.CBPE(
    y_pred_proba='y_pred_proba',
    y_pred='y_pred',
    y_true='work_home_actual',
    metrics=['roc_auc'],
    chunk_size=chunk_size,
    problem_type='classification_binary',
)

We documented this behavior a bit more in our data requirements docs.

What's changed?

We've added the missing s3fs dependency that caused our CLI to fail when trying to work with S3 buckets for reading/writing data.
We've fixed some outdated plotting kind constants in the Runner class used by the CLI, which caused some plots to fail to render.
We've made some documentation fixes.
We've added a load of tests, mainly concerning plotting and the Runner class.

What's up next?

We're now officially in 🌴 downtime 🍸 , meaning we get to work on some of our passion projects ❤️
The results of those will be announced soon.

I wish you all a fully recharging weekend 🔋

Niels
