
Improving the developer experience - what do we need? #566

Open
PGijsbers opened this issue Jul 18, 2023 · 3 comments
Labels
  • Benchmark Design: Discussion or problems with the design of the benchmark itself.
  • enhancement: New feature or request
  • quality: Something needs to be tested, keep the app functional
  • question: Further information is requested

Comments

@PGijsbers
Collaborator

I think it is time we invest in the developer experience before adding more features. To open the issue I am sharing my thoughts on where we should put our effort, but I welcome feedback and suggestions! I do not propose to halt all work in other directions, but contributors who want to focus on the longevity of the benchmark should ideally direct their efforts here first. My hope is that by addressing these issues, future features and frameworks will be easier to add and quicker to review.

In my opinion, this project is missing several aspects that I would expect from a modern Python project:

  • Type hints
  • pre-commit for auto-formatting and linting (e.g., ruff, black)
  • Good unit test coverage: fast, isolated tests for things like data ingestion, configuration parsing, and result processing. Because framework installation is time consuming and results are stochastic, full integration tests are probably best left to CI by default.
  • Clear contribution guidelines
  • Step-by-step tutorials for different ways to extend the benchmark (meaning benchmark, constraint, framework)
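To make the unit-testing point concrete, here is a minimal sketch of the kind of fast, isolated test meant above. The `Constraint` dataclass and `parse_constraint` helper are hypothetical illustrations, not the actual amlb API:

```python
# Hypothetical example of a fast, isolated unit test for configuration
# parsing. `Constraint` and `parse_constraint` are illustrative names,
# not part of the actual amlb codebase.
from dataclasses import dataclass


@dataclass
class Constraint:
    name: str
    max_runtime_seconds: int
    cores: int


def parse_constraint(raw: dict) -> Constraint:
    """Validate a raw config dict and convert it into a typed Constraint."""
    return Constraint(
        name=raw["name"],
        max_runtime_seconds=int(raw["max_runtime_seconds"]),
        cores=int(raw.get("cores", 1)),
    )


def test_parse_constraint_defaults_cores_to_one():
    constraint = parse_constraint({"name": "1h4c", "max_runtime_seconds": 3600})
    assert constraint.cores == 1
    assert constraint.max_runtime_seconds == 3600
```

Tests like this need no framework installation and run in milliseconds, so they can gate every pull request locally via pre-commit or pytest.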

With that in place (or at least the first three), I think the following refactorings should be highest priority:

  • Refactor the pervasive use of Namespaces. Namespaces make it much harder to see from a function's signature what information it expects, and at times the code has to be carefully analyzed (or inspected at runtime) to determine the exact content of arguments. I propose replacing (most) Namespaces with dataclasses or pydantic models wherever dynamic assignment isn't necessary. We might need something like Pydantic's TypedDict support to get merging out of the box.
  • Break down large functions and modules that contain multiple classes and responsibilities (e.g., amlb/runners/aws.py or amlb/datasets/file.py) into multiple submodules to make the code easier to navigate.
  • Prefer immutable data structures to mutable ones.
  • Define abstractions for framework integration (#279) that also allow recording intermediate results. E.g., a train -> save artifacts -> test -> inference-time pipeline would more reliably provide partial results while still attempting to measure inference time.
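The Namespace point above can be sketched in a few lines. The field names below are illustrative, not the actual amlb task schema; the point is that a frozen dataclass makes the function's contract visible in its signature:

```python
# Sketch of replacing a dynamic Namespace with a typed, frozen dataclass.
# Field names (name, max_runtime_seconds, metric) are illustrative only.
from argparse import Namespace
from dataclasses import dataclass


# Before: callers must read the implementation (or inspect runtime state)
# to learn which attributes are expected on `task`.
def run_task_untyped(task: Namespace) -> str:
    return f"{task.name}: {task.max_runtime_seconds}s"


# After: the signature documents exactly what the function needs, a type
# checker can verify call sites, and frozen=True makes accidental
# mutation raise an error instead of silently corrupting shared config.
@dataclass(frozen=True)
class TaskConfig:
    name: str
    max_runtime_seconds: int
    metric: str = "auc"


def run_task(task: TaskConfig) -> str:
    return f"{task.name}: {task.max_runtime_seconds}s ({task.metric})"
```

Pydantic models would additionally validate types at construction time, which matters when the config originates from YAML.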
@PGijsbers added the enhancement, question, Benchmark Design, and quality labels on Jul 18, 2023
@PGijsbers
Collaborator Author

Along the same lines, we could reduce developer overhead by externalizing the framework integrations: #571

This hopefully has the benefit that framework authors take more ownership of their integrations and keep them updated themselves.

@mfeurer
Contributor

mfeurer commented Nov 3, 2023

I fully agree with everything you wrote and the plan you laid out here! This will make AMLB easier to understand, use, and contribute to.

@Innixma
Collaborator

Innixma commented Jan 9, 2024

Huge +1 to type hints, especially in code that is executed in a subprocess and thus is hard to debug (such as in the exec.py scripts).
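A brief sketch of the debugging benefit claimed above: with annotations on an exec.py-style entry point, a type checker catches mistakes before the code ever runs inside a subprocess. The `Result` and `run` names here are hypothetical, not the actual amlb exec interface:

```python
# Hypothetical sketch of a typed exec.py-style entry point. With these
# annotations, mypy flags a misspelled field or a str/int mix-up
# statically, instead of it surfacing as a crash inside a subprocess
# where tracebacks are hard to capture.
from dataclasses import dataclass


@dataclass
class Result:
    predictions_path: str
    duration_seconds: float


def run(dataset_path: str, time_limit_seconds: int) -> Result:
    # Placeholder body: a real integration would train and predict here.
    return Result(
        predictions_path=f"{dataset_path}.preds",
        duration_seconds=0.0,
    )
```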
