Add T5 LM v1.1 encoder #550
Conversation
512356f to 7c88eb3
.github/workflows/ci_eval.yaml
Outdated
- name: Run long running tests
  run: |
    pytest --longrun \
How long does this take to run, such that it needs to fall under longrun? Ideally we should be testing the models on presubmit if possible.
It takes 30 seconds. These are 2 models: small and xxl. Maybe I can leave only the small one on presubmit.
If it takes only 30 seconds each, you might as well run both and keep them on presubmit instead of longrun.
These presubmit tests will need to run on llama-mi300x-3
or other machines that have the files available.
I refactored a bit. Added an option --with-t5-data that enables tests that depend on T5 data. Made only the XXL eager-mode test longrun, and added a presubmit job that runs the small model test.
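For context, a minimal sketch of how a `--with-t5-data` pytest option could gate such tests (the option name comes from this thread; the marker name and the skip logic below are illustrative, not the exact implementation in this PR):

```python
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--with-t5-data",
        action="store_true",
        default=False,
        help="Run tests that need T5 model data on the local filesystem.",
    )


def pytest_configure(config):
    config.addinivalue_line(
        "markers", "t5_data: test depends on T5 model data being available"
    )


def pytest_collection_modifyitems(config, items):
    # Without --with-t5-data, skip anything marked as needing the T5 files.
    if config.getoption("--with-t5-data"):
        return
    skip_t5 = pytest.mark.skip(reason="needs --with-t5-data")
    for item in items:
        if "t5_data" in item.keywords:
            item.add_marker(skip_t5)
```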
XXL is more important for Flux-like models, so it might be better to ensure that one runs on presubmit.
Done.
@@ -256,6 +272,16 @@ def get_model_artifacts(request: FixtureRequest):
    model_path["llama3_405b_fp8_model_path"] = set_fixture_from_cli_option(
        request, "--llama3-405b-fp8-model-path", "llama3_405b_fp8_model"
    )
    model_path["google__t5_v1_1_small_fp32_model_path"] = set_fixture_from_cli_option(
Do we need 2 separate model_path flags for small vs xxl models instead of just calling this twice from our tests?
This follows the already accepted nomenclature for the Llama variants. I think we will get a lot more variants, like fp16 and other quantizations.
We probably want sane defaults for all files so that you can run pytest sharktank/tests
if you already have the files at their expected places. It is important to have a simple command to run all tests.
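As a rough illustration of the per-variant flag pattern (the small option name follows the diff above; the xxl option name and the fixtures are assumptions, not the PR's actual code):

```python
import pytest


@pytest.fixture
def t5_small_model_path(request):
    return request.config.getoption("--google-t5-v1-1-small-fp32-model-path")


@pytest.fixture
def t5_xxl_model_path(request):
    return request.config.getoption("--google-t5-v1-1-xxl-fp32-model-path")


def test_t5_small_eager(t5_small_model_path):
    # With a sane default registered in pytest_addoption, a plain
    # `pytest sharktank/tests` run resolves this path without extra flags.
    assert t5_small_model_path is not None
```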
    for arg in args:
        res = elementwise(operator, res, arg)
    return res


def elementwise_trenary(operator, x, y, z, *args, **kwargs):
Why is this change needed? Also should this be elementwise_ternary?
I also expect this should be InferenceTensor
Fixed the typo.
Originally I added this for convenience. Here it is removed because it conflicts with dispatching to ternary ops. If the user wants a binary op + folding, they should do that themselves.
@KyleHerndon, I think this should not accept just any InferenceTensor, as we do unboxing, which would be undesirable for some tensor types. The unboxing here is fine as it is a no-op.
If someone wants an implementation, for example for quantized tensors, they should write one.
I guess it comes down to whether we want to perform a potentially very inefficient computation or to fail.
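To make the conflict concrete, a small sketch (plain torch tensors, with the sharktank dispatch machinery elided; function names here are illustrative) of why a variadic binary fold and a genuinely ternary op cannot share the same entry point:

```python
import operator

import torch


def elementwise_fold(op, *args):
    # The removed convenience: treat extra arguments as a left fold of a
    # binary op, i.e. elementwise(add, a, b, c) == add(add(a, b), c).
    res = op(args[0], args[1])
    for arg in args[2:]:
        res = op(res, arg)
    return res


def elementwise_ternary(op, x, y, z):
    # A genuinely ternary op such as torch.lerp(input, end, weight) needs all
    # three operands at once; folding it as repeated binary calls is wrong.
    return op(x, y, z)


a, b, c = torch.rand(4), torch.rand(4), torch.rand(4)
print(elementwise_fold(operator.add, a, b, c))    # == (a + b) + c
print(elementwise_ternary(torch.lerp, a, b, c))   # == torch.lerp(a, b, c)
```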
    def testV1_1Fp32CompareTorchEagerAgainstHuggingFace(self, huggingface_repo_id: str):
        get_dataset(
            huggingface_repo_id,
        ).download()
Should we just cache the model for the CI instead of downloading it each time? We may end up with occasional failures due to corrupted downloads.
This uses Hugging Face's cache underneath. If CI starts failing due to a suspected corrupted cache, someone will need to clear it manually.
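For what it's worth, a minimal sketch of the caching behavior being relied on, using huggingface_hub directly (the repo id and filename are just examples):

```python
# Repeated downloads of the same file resolve to the local Hugging Face cache
# (~/.cache/huggingface by default, configurable via HF_HOME), so CI only pays
# the download cost once per runner unless the cache is cleared or corrupted.
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="google/t5-v1_1-small", filename="config.json")
print(path)  # points inside the local cache

# A second call returns the same cached file instead of re-downloading it.
assert path == hf_hub_download(repo_id="google/t5-v1_1-small", filename="config.json")
```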
@@ -35,26 +34,6 @@ def testBroadcastDims(self):
        assert res[1] == 2


class ElementwiseTest(unittest.TestCase):
Why are we no longer testing this?
This tested the binary op + folding, which is no longer present.
    parser.addoption(
        "--google-t5-v1-1-small-fp32-model-path",
        type=Path,
        default="/data/t5/small/google__t5-v1_1-small_fp32.gguf",
We don't hardcode any Llama model/tokenizer paths here anymore. You can pass them as args directly to pytest.
Why not have a default that allows you to run pytest sharktank/tests if you have the data at the default paths?
For someone running these tests on a machine that does not have the data in the default paths, we should have at least a comment with a link or something for how/where to get this data.
I added a comment.
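Roughly what the resolved option registration looks like with a default and a pointer for people who don't have the data locally (the default path is taken from the diff above; the help text and comment wording are illustrative, not the exact text added):

```python
from pathlib import Path


def pytest_addoption(parser):
    # The T5 GGUF files are expected to already be present on the test machine
    # (e.g. the llama-mi300x-3 runner mentioned above). If they are not at the
    # default locations, pass explicit paths on the pytest command line.
    parser.addoption(
        "--google-t5-v1-1-small-fp32-model-path",
        type=Path,
        default="/data/t5/small/google__t5-v1_1-small_fp32.gguf",
        help="Path to the google/t5-v1_1-small fp32 GGUF file.",
    )
```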
The encoder shares much of the underlying stack with the decoder. Here only the encoder is presented as a class.
I have not gone out of my way to strip all decoder-related stuff from the stack. Things like checkpointing and dropout are stripped.
The author attribution is added to the license of the T5 model file, as this seems like a derivative work. They are both Apache 2.0.
There are a few tests of the various components and 2 tests for the entire encoder, for the small and xxl variants. They rely on Hugging Face, and the models are downloaded on the fly into the cache. The tests expect the corresponding GGUF files to be already present and available on the file system.