Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further standardize the layout and packaging #271

Open
mjpost opened this issue Jul 29, 2024 · 3 comments
Open

Further standardize the layout and packaging #271

mjpost opened this issue Jul 29, 2024 · 3 comments

Comments

@mjpost
Copy link
Owner

mjpost commented Jul 29, 2024

          > The build now [fails with](https://github.com/mjpost/sacrebleu/actions/runs/10117535642/job/28001732454?pr=270#step:12:29):
/tmp/build-env-uf1gjo1m/lib/python3.12/site-packages/setuptools_scm/git.py:167: UserWarning: "/home/runner/work/sacrebleu/sacrebleu" is shallow and may cause errors
  warnings.warn(f'"{wd.path}" is shallow and may cause errors')
error: Multiple top-level packages discovered in a flat-layout: ['data', 'sacrebleu'].

I've been trying to keep this change as small as possible, but there seem to be a few non-standard things about the layout and packaging of the source tree. I could go all in and fix all that, if you're interested? I don't want to overstep, though.

Originally posted by @dhellmann in #270 (comment)

@dhellmann
Copy link
Contributor

Some of the things I had in mind include

  • moving source into the src directory, where packaging tools look for it by default (not required, now that we got the finders settings right, but it lets you package the tests separately from the source)
  • setting up tox with environments for testing more easily locally (we can keep the make targets as entry points)
  • setting up [test] extras for installing test dependencies accurately

Let me know if any of that looks like it wouldn't be useful.

@martinpopel
Copy link
Collaborator

  • Ad src: I see that src/ is now the recommended way and personally, I like it. (But wasn't it the package name few years ago?)

There are users who prefer to use sacrebleu without installation like

git clone https://github.com/mjpost/sacrebleu
./sacrebleu/sacrebleu/sacrebleu.py [...]

So these users will need to switch to ./sacrebleu/src/sacrebleu.py [...], which is hopefully no problem for them, but we should consider releasing a version with a deprecation warning first (at least if it is easy for us).

@dhellmann
Copy link
Contributor

  • Ad src: I see that src/ is now the recommended way and personally, I like it. (But wasn't it the package name few years ago?)

It was. I don't know the full history, and honestly it took me a while to get used to having the extra directory there. This isn't the most important change, so we could skip it, especially if you have users who regularly use sacrebleu from a source repo instead of package.

  • Ad tox: Currently, we test using python -m pytest and ./test.sh. Do you plan tox to execute the Bash script or do you plan to rewrite it to pure Python? There are tests such as sacrebleu -t wmt17 -l en-fi --echo ref | sacrebleu -b -t wmt17/B -l en-fi --detail, which test if two sacrebleu instances can be executed concurrently in a pipeline because it is a common use case and there were bugs in this part (I think the first sacrebleu process locked some files which caused errors in the second process).

Tox would manage a virtualenv with sacrebleu and its dependencies, then run whatever commands are needed to test that code. Both pytest and test.sh could be run through tox. The main benefit is ensuring you're testing with the right dependencies without having to manually set up those dependencies, but it also provides a nice layer to describe various test scenarios that have to be completely independent. The linter and runtime dependencies don't have to be installed into the same virtualenv, for example. For most of my python projects, I use tox.ini in the way you're using Makefile, to provide a documented set of steps for running a task related to maintaining the project.

Ah, yeah, you have "dev" already, which is effectively the same thing. The name "test" is a common standard. I usually have "test" and leave the build and linter dependencies to either tox or the GitHub action definition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants