
Improving the developer experience - what do we need? #566

Open
PGijsbers opened this issue Jul 18, 2023 · 3 comments
Labels
  • Benchmark Design: Discussion or problems with the design of the benchmark itself.
  • enhancement: New feature or request
  • quality: Something needs to be tested, keep the app functional
  • question: Further information is requested

Comments

@PGijsbers
Collaborator

I think it is time we invest in the developer experience before adding more features. To open the issue I am sharing my thoughts on where we should put our effort, but I welcome feedback and suggestions! I do not propose to halt all work in other directions, but contributors who want to focus on the longevity of the benchmark should ideally direct their efforts here first. My hope is that by addressing these issues, future features and frameworks will be easier to add and quicker to review.

In my opinion, this project is missing several aspects that I would expect from a modern Python project:

  • Type hints
  • pre-commit for auto-formatting and linting (e.g., ruff, black)
  • Good unit test coverage: fast, isolated tests for things like data ingestion, configuration parsing, and result processing. Because framework installation is time consuming and results are stochastic, full integration tests are probably best left to CI by default.
  • Clear contribution guidelines
  • Step-by-step tutorials for different ways to extend the benchmark (meaning benchmark, constraint, framework)
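To make the unit-testing point concrete, here is a minimal sketch of the kind of fast, isolated test meant above. The `Constraint` dataclass and `parse_constraint` helper are hypothetical illustrations, not the actual amlb API:

```python
# Hypothetical example of a fast, isolated unit test for configuration
# parsing. `Constraint` and `parse_constraint` are illustrative names,
# not part of the actual amlb codebase.
from dataclasses import dataclass


@dataclass
class Constraint:
    name: str
    max_runtime_seconds: int
    cores: int


def parse_constraint(raw: dict) -> Constraint:
    """Validate a raw config dict and convert it into a typed Constraint."""
    return Constraint(
        name=raw["name"],
        max_runtime_seconds=int(raw["max_runtime_seconds"]),
        cores=int(raw.get("cores", 1)),
    )


def test_parse_constraint_defaults_cores_to_one():
    constraint = parse_constraint({"name": "1h4c", "max_runtime_seconds": 3600})
    assert constraint.cores == 1
    assert constraint.max_runtime_seconds == 3600
```

Tests like this need no framework installation and run in milliseconds, so they can gate every pull request locally via pre-commit or pytest.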

With that in place (or at least the first three), I think the following refactorings should be highest priority:

  • Refactor the pervasive use of Namespaces. Namespaces make it much harder to see from a function's signature what information it expects, and at times the code has to be carefully analyzed (or inspected at runtime) to determine the exact content of arguments. I propose replacing (most) Namespaces with dataclasses or pydantic models wherever dynamic assignment isn't necessary. We might need something like Pydantic's TypedDict support to get merging out of the box.
  • Break down large functions and modules that contain multiple classes and responsibilities (e.g., amlb/runners/aws.py or amlb/datasets/file.py) into multiple submodules to make the code easier to navigate.
  • Prefer immutable data structures to mutable ones.
  • Define abstractions for framework integration (#279) that also allow recording intermediate results. E.g., a train -> save artifacts -> test -> inference-time pipeline would more reliably provide partial results while still attempting to measure inference time.
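The Namespace point above can be sketched in a few lines. The field names below are illustrative, not the actual amlb task schema; the point is that a frozen dataclass makes the function's contract visible in its signature:

```python
# Sketch of replacing a dynamic Namespace with a typed, frozen dataclass.
# Field names (name, max_runtime_seconds, metric) are illustrative only.
from argparse import Namespace
from dataclasses import dataclass


# Before: callers must read the implementation (or inspect runtime state)
# to learn which attributes are expected on `task`.
def run_task_untyped(task: Namespace) -> str:
    return f"{task.name}: {task.max_runtime_seconds}s"


# After: the signature documents exactly what the function needs, a type
# checker can verify call sites, and frozen=True makes accidental
# mutation raise an error instead of silently corrupting shared config.
@dataclass(frozen=True)
class TaskConfig:
    name: str
    max_runtime_seconds: int
    metric: str = "auc"


def run_task(task: TaskConfig) -> str:
    return f"{task.name}: {task.max_runtime_seconds}s ({task.metric})"
```

Pydantic models would additionally validate types at construction time, which matters when the config originates from YAML.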
@PGijsbers added the enhancement, question, Benchmark Design, and quality labels on Jul 18, 2023
@PGijsbers
Collaborator Author

Along the same lines, we could reduce developer overhead by externalizing the framework integrations: #571

This hopefully has the benefit that framework authors take more ownership of their integrations and keep them updated themselves.

@mfeurer
Contributor

mfeurer commented Nov 3, 2023

I fully agree with everything you wrote and the plan you laid out here! This will make AMLB easier to understand, use, and contribute to.

@Innixma
Collaborator

Innixma commented Jan 9, 2024

Huge +1 to type hints, especially in code that is executed in a subprocess and thus is hard to debug (such as in the exec.py scripts).
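A brief sketch of the debugging benefit claimed above: with annotations on an exec.py-style entry point, a type checker catches mistakes before the code ever runs inside a subprocess. The `Result` and `run` names here are hypothetical, not the actual amlb exec interface:

```python
# Hypothetical sketch of a typed exec.py-style entry point. With these
# annotations, mypy flags a misspelled field or a str/int mix-up
# statically, instead of it surfacing as a crash inside a subprocess
# where tracebacks are hard to capture.
from dataclasses import dataclass


@dataclass
class Result:
    predictions_path: str
    duration_seconds: float


def run(dataset_path: str, time_limit_seconds: int) -> Result:
    # Placeholder body: a real integration would train and predict here.
    return Result(
        predictions_path=f"{dataset_path}.preds",
        duration_seconds=0.0,
    )
```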
