-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to support Python 3.9-3.12 #131
Conversation
Shouldn't these other packages be updated as well? Now is the time to see if we can update all the packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ddaspit, @johnml1135, and @TaperChipmunk32)
.devcontainer/dockerfile
line 38 at r1 (raw file):
&& pip install poetry==${POETRY_VERSION} black pipenv virtualenv clearml \ && poetry config virtualenvs.in-project true \ && pip install cffi
What package is depending on cffi? And is there a reason it needs to be in the dockerfile rather than pyproject.toml?
.devcontainer/dockerfile-Python312
line 29 at r1 (raw file):
# make some useful symlinks that are expected to exist RUN ln -sfn /usr/bin/python${PYTHON_VERSION} /usr/bin/python3 & \
We're keeping the 3.9 version as the default, so you can probably just remove the 3.12 version of the dev docker container, unless someone else has a reason we should keep it.
.vscode/PythonImportHelper-v2-Completion.json
line 0 at r1 (raw file):
This file probably shouldn't be needed.
Without cffi installed in the docker container, poetry throws many warnings and errors about missing the cffi backend. Also, I did not realize the import helper got pushed, I'll add it to .gitignore. |
I was going to keep transformers downgraded because of some changes made to the NLLBTokenizers in later versions, but I will work on upgrading it instead. I've also started looking into upgrading the other packages mentioned. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #131 +/- ##
==========================================
+ Coverage 88.07% 88.14% +0.07%
==========================================
Files 273 273
Lines 16041 16050 +9
==========================================
+ Hits 14128 14148 +20
+ Misses 1913 1902 -11 ☔ View full report in Codecov by Sentry. |
186527d
to
d18efa3
Compare
I have figured out the change that was made to transformers in version 4.42.0 that breaks our code. It is from this pr. The PR removes some deprecated features that we were still using, instead of the new attributes. I am going to update transformers to the latest minor version and add it to this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these changes mean that we can no longer support transformers<4.42.0?
Reviewed 14 of 16 files at r1, 7 of 11 files at r2, 21 of 23 files at r3, 6 of 8 files at r4, 2 of 3 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @johnml1135, @mshannon-sil, and @TaperChipmunk32)
.gitignore
line 141 at r6 (raw file):
#VS Code .vscode
The .vscode
directory shouldn't be ignored.
machine/corpora/parallel_text_corpus.py
line 472 at r6 (raw file):
def iterable() -> Iterable[Tuple[Union[str, int], dict]]: key = ""
Why was this change made?
machine/corpora/parallel_text_corpus.py
line 621 at r6 (raw file):
return len(self._df) return len(self._df[(self._df[self._source_column] != "") & (self._df[self._target_column] != "")]) return len(self._df[self._df[self._source_column].isin(cast(Sequence[str], text_ids))]) & (
Why is this necessary?
machine/jobs/huggingface/hugging_face_nmt_model_factory.py
line 42 at r6 (raw file):
and "clearml" in self._training_args.report_to ): if isinstance(self._training_args.report_to, List):
We typically check isinstance
against the list
class instead of the List
type.
machine/jobs/huggingface/hugging_face_nmt_model_factory.py
line 45 at r6 (raw file):
self._training_args.report_to.remove("clearml") elif isinstance(self._training_args.report_to, str): self._training_args.report_to = None
You should only set this to None
if the string is equal to "clearml"
.
machine/translation/thot/simplex_model_weight_tuner.py
line 86 at r6 (raw file):
model = load_smt_model(self._word_alignment_model_type, parameters) decoder = load_smt_decoder(model, parameters) if decoder is not None:
This is caused by an issue in the Thot type stubs. I will fix that and then these errors should go away.
machine/translation/thot/thot_smt_model.py
line 317 at r6 (raw file):
wa_matrix = WordAlignmentMatrix.from_word_pairs(1, 1, {(0, 0)}) else: src_phrase = list(normalized_source_tokens[source_start_index : source_start_index + src_phrase_len])
Why was this necessary?
tests/corpora/test_dbl_bundle_text_corpus.py
line 1 at r6 (raw file):
from tests.testutils.dbl_bundle_test_environment import DblBundleTestEnvironment
If you add extraPaths = ["tests"]
in the tools.pyright
section of the pyproject.toml file, then it will fix all of these errors.
I believe that the tokenizer fix just merged in mandates that |
This should be reversed. If you want, create a CPU only dev environment that doesn't have the runArgs. |
Previously, ddaspit (Damien Daspit) wrote…
We're these changes due to the updates in python? |
Why are these changes needed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 16 files at r1, 5 of 11 files at r2, 3 of 8 files at r4, 2 of 3 files at r5, 1 of 1 files at r6, 30 of 30 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @ddaspit, @mshannon-sil, and @TaperChipmunk32)
@ddaspit @johnml1135 Huggingface just released transformers version 4.46.0, which deprecates the argument "tokenizer" and renames it to "processing_class" for Trainer. This breaks some of our code that uses the Seq2SeqTrainer. So for now I am limiting support to >=4.38.0 and <4.46.0. We can either support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's update it to support transformers = ">=4.38.0,<4.46.0"
. Also, you updated the sentencepiece, datasets, and sacremoses versions. Was that necessary?
Reviewed 28 of 30 files at r7, 1 of 1 files at r8, 2 of 2 files at r9, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @johnml1135, @mshannon-sil, and @TaperChipmunk32)
pyproject.toml
line 67 at r1 (raw file):
Previously, TaperChipmunk32 (Matthew Beech) wrote…
I was going to keep transformers downgraded because of some changes made to the NLLBTokenizers in later versions, but I will work on upgrading it instead.
I've also started looking into upgrading the other packages mentioned.
There is no need to upgrade the packages if they work. The versions specified here are intended to be the minimum version that Machine supports.
pyproject.toml
line 66 at r9 (raw file):
sil-thot = "^3.4.4" transformers = "^4.38.0, <4.46.0"
The version should be specified as >=4.38.0,<4.46.0
.
machine/corpora/parallel_text_corpus.py
line 621 at r6 (raw file):
Previously, johnml1135 (John Lambert) wrote…
We're these changes due to the updates in python?
I don't receive a type error here. What error are you getting?
machine/translation/thot/simplex_model_weight_tuner.py
line 86 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This is caused by an issue in the Thot type stubs. I will fix that and then these errors should go away.
There is a new version of Thot (3.4.5) that should fix these issues. Also, you should add reportMissingModuleSource = false
in the tool.pyright
section of the pyproject.toml. It will get rid of the spurious warnings when importing Thot modules.
machine/translation/thot/thot_smt_model.py
line 319 at r9 (raw file):
src_phrase = [""] * src_phrase_len for i in range(src_phrase_len): src_phrase[i] = normalized_source_tokens[source_start_index + i] # type: ignore
I don't receive a type error here. What error are you getting?
tests/corpora/test_dbl_bundle_text_corpus.py
line 1 at r6 (raw file):
Previously, TaperChipmunk32 (Matthew Beech) wrote…
- I'll remove .vscode from .gitignore
- It was left over from another change I made to try and remove an error. It was throwing another error about all code paths must return the correct type and not None. I was able to revert it with no issue.
- isin does not accept set[str], it only accepts Series | DataFrame | Sequence[Unknown] | Mapping[Unknown, Unknown]
- I'll fix that
- I'll fix that
- Alright, I'll wait on those changes to Thot then and revert my changes
- An error was occurring about "Argument of type "Sequence[str]" cannot be assigned to parameter "value" of type "str"". Another option is to just add type: ignore and tests still pass.
- I'll add that in
Can you reply to each individual comment? It is difficult for me to figure out what you are replying to for each comment here.
I've removed it now
It was left over from another change I made to try and remove an error. It was throwing another error about all code paths must return the correct type and not None. I was able to revert it with no issue.
isin does not accept set[str], it only accepts Series | DataFrame | Sequence[Unknown] | Mapping[Unknown, Unknown]
I fixed this
I fixed this
Alright, I updated Thot to the newest version
An error was occurring about "Argument of type "Sequence[str]" cannot be assigned to parameter "value" of type "str"". Another option is to just add type: ignore and tests still pass.
I've added that and reverted those changes. |
I've updated support of transformers to those versions. The other updates were not necessary, so I reverted the change.
isin does not accept set[str], it only accepts Series | DataFrame | Sequence[Unknown] | Mapping[Unknown, Unknown]. I do not get the error on vscode, but in pyright during the build process.
Done, I've updated Thot and added that line to pyproject.toml. I also reverted the changes I made caused by the Thot typing issues.
I also can no longer recreate the error, I'm not sure what was causing it.
I've deleted that old reply and made another that is more readable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r10, 1 of 2 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @johnml1135, @mshannon-sil, and @TaperChipmunk32)
machine/corpora/parallel_text_corpus.py
line 621 at r6 (raw file):
Previously, TaperChipmunk32 (Matthew Beech) wrote…
isin does not accept set[str], it only accepts Series | DataFrame | Sequence[Unknown] | Mapping[Unknown, Unknown]. I do not get the error on vscode, but in pyright during the build process.
Ok, I was able to reproduce the issue. Rather than casting to Sequence
, you should convert text_ids
to a list and then pass it in to the isin
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r13, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @johnml1135, @mshannon-sil, and @TaperChipmunk32)
@ddaspit @johnml1135 I keep getting an error with Windows directory permission denied that I cannot seem to fix, does anyone have any ideas? You can see the error in the recent build failure. All other OS builds complete successfully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured out the issue. It looks like the trainer and engine aren't getting cleaned up properly in the unit test. I will push a commit with the fix.
Reviewed all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @johnml1135, @mshannon-sil, and @TaperChipmunk32)
Previously, ddaspit (Damien Daspit) wrote…
ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r10.
Reviewable status: 44 of 48 files reviewed, 5 unresolved discussions (waiting on @ddaspit, @mshannon-sil, and @TaperChipmunk32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you caught the unit test that I missed. Also, can you update to the latest version of the actions in the CI build to get rid of the warnings.
Reviewed 4 of 4 files at r15, 6 of 6 files at r16, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @mshannon-sil and @TaperChipmunk32)
dceea74
to
ce6d5bc
Compare
I updated github actions and the warnings are now gone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r17, 32 of 32 files at r18, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @mshannon-sil and @TaperChipmunk32)
@mshannon-sil - are you ok with all the changes? |
Updated Ubuntu version to noble (24.04), along with CUDA version to match Ubuntu Updated dev docker container to Python 3.9 and Ubuntu noble Updated non-dev docker container to Python 3.12 and Ubuntu noble Updated CI to build and test Python 3.9-3.12 Updated transformers support to versions ">=4.38.0,<4.46"
dd290e7
to
d21158a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 11 files at r2, 1 of 30 files at r7, 1 of 5 files at r10, 1 of 1 files at r13, 2 of 4 files at r15, 1 of 6 files at r16, 3 of 5 files at r17, 32 of 32 files at r18, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @TaperChipmunk32)
This PR is ready to be merged. When this is merged, do I need to send out a message telling everyone to rebuild their dev container and update their poetry environment? @ddaspit |
-Updated machine.py to support Python 3.9-3.12
-Updated Ubuntu version to noble (24.04), along with CUDA version to match Ubuntu
-Updated dev docker container to Python 3.9 and Ubuntu noble
-Updated non-dev docker container to Python 3.12 and Ubuntu noble
-Updated CI to build and test Python 3.9-3.12
-Updated transformers support to versions ">=4.38.0,<4.46"
This change is