Add T5 LM v1.1 encoder #550

Merged
sogartar merged 12 commits into nod-ai:main on Nov 20, 2024
Conversation

sogartar
Contributor

The encoder shares much of the underlying stack with the decoder. Here only the encoder is presented as a class.
I have not gone out of my way to strip all decoder-related stuff from the stack. Things like checkpointing and dropout are stripped.

The author attribution is added to the license of the T5 model file, as this seems like a derivative work. They are both Apache 2.0.

There are a few tests of the various components and 2 tests for the entire encoder, for the small and xxl variants. They rely on Hugging Face, and the models are downloaded on the fly into the cache. The tests expect the corresponding GGUF files to already be present and available on the file system.
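For context, here is a minimal sketch of the Hugging Face eager path that such comparison tests can check against; the repo id, prompt, and output handling below are assumptions for illustration, not the PR's actual test code:

```python
# Hedged sketch: running the upstream T5 v1.1 encoder via transformers as an
# eager reference. The actual tests load the sharktank encoder from a GGUF
# file and compare its outputs against this kind of baseline.
import torch
from transformers import AutoTokenizer, T5EncoderModel

repo_id = "google/t5-v1_1-small"  # or "google/t5-v1_1-xxl" for the large variant
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5EncoderModel.from_pretrained(repo_id)

inputs = tokenizer("Translate English to German: The house is wonderful.", return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state  # [batch, seq_len, d_model]
print(last_hidden.shape)
```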

@sogartar sogartar changed the title Add T5 LM v1.1 encoder WIP Add T5 LM v1.1 encoder Nov 15, 2024
@sogartar sogartar marked this pull request as draft November 15, 2024 20:07
@sogartar sogartar changed the title WIP Add T5 LM v1.1 encoder Add T5 LM v1.1 encoder Nov 15, 2024
@sogartar sogartar marked this pull request as ready for review November 15, 2024 23:25

- name: Run long running tests
run: |
pytest --longrun \
Contributor

How long does this take to run, that it has to fall under longrun? Ideally we should be testing the models on presubmit if possible.

Contributor Author

It takes 30 seconds. There are 2 models, small and xxl. Maybe I can leave only the small one on presubmit.

Contributor

If each takes only 30 seconds, you might as well run both and keep them on presubmit instead of longrun.

Contributor Author

These presubmit tests will need to run on llama-mi300x-3 or other machines that have the files available.

Contributor Author

I refactored a bit. I added an option --with-t5-data that enables tests that depend on T5 data, made only the XXL eager mode test longrun, and added a presubmit job that runs the small model test.
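A minimal conftest.py sketch of how such an opt-in flag can be wired up in pytest; the flag name matches the PR, but the marker name requires_t5_data and the overall structure are assumptions:

```python
# Hedged sketch: gate T5-data tests behind an explicit --with-t5-data flag.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--with-t5-data",
        action="store_true",
        default=False,
        help="Enable tests that depend on T5 model data being available.",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--with-t5-data"):
        return
    skip_t5 = pytest.mark.skip(reason="needs --with-t5-data to run")
    for item in items:
        # "requires_t5_data" is a hypothetical marker name for illustration.
        if "requires_t5_data" in item.keywords:
            item.add_marker(skip_t5)
```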

Contributor

XXL is more important for Flux-like models, so it might be better to ensure that one is covered on presubmit.

Contributor Author

Done.

@@ -256,6 +272,16 @@ def get_model_artifacts(request: FixtureRequest):
model_path["llama3_405b_fp8_model_path"] = set_fixture_from_cli_option(
request, "--llama3-405b-fp8-model-path", "llama3_405b_fp8_model"
)
model_path["google__t5_v1_1_small_fp32_model_path"] = set_fixture_from_cli_option(
Contributor

Do we need 2 separate model_path flags for small vs xxl models instead of just calling this twice from our tests?

Contributor Author
@sogartar sogartar Nov 18, 2024

This follows the already accepted nomenclature for the Llama variants. I think we will get a lot more variants, like fp16 and other quantizations.
We probably want sane defaults for all files, so that you can run pytest sharktank/tests if you already have the files at their expected places. It is important to have a simple command to run all tests.
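A sketch of the conftest plumbing this refers to; the option name and default path come from the diff below, while the body of set_fixture_from_cli_option is a guess at its plausible shape, not the actual implementation:

```python
# Hedged sketch of a default-carrying CLI option plus the kind of helper the
# diff calls. The helper body is an assumption; only its call signature is
# taken from the PR.
from pathlib import Path

def pytest_addoption(parser):
    parser.addoption(
        "--google-t5-v1-1-small-fp32-model-path",
        type=Path,
        default="/data/t5/small/google__t5-v1_1-small_fp32.gguf",
        help="Path to the T5 v1.1 small fp32 GGUF file.",
    )

def set_fixture_from_cli_option(request, option_name, fixture_name):
    # Plausible behavior: read the CLI option and, for class-based tests,
    # also expose it as an attribute named fixture_name on the test class.
    value = request.config.getoption(option_name)
    if request.cls is not None:
        setattr(request.cls, fixture_name, value)
    return value
```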

for arg in args:
res = elementwise(operator, res, arg)
return res
def elementwise_trenary(operator, x, y, z, *args, **kwargs):
Contributor

Why is this change needed? Also should this be elementwise_ternary?

Contributor

I also expect this should be InferenceTensor

Contributor Author

Fixed the typo.
Originally I added this for convenience. Here it is removed, as it conflicts with dispatching to ternary ops. If the user wants a binary op + folding, they should do that themselves.
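For callers that still want the old convenience, the folding can live on the caller side. Here is a sketch, where apply_binary stands in for the project's binary elementwise dispatch (not the actual sharktank op):

```python
# Hedged sketch: fold a binary elementwise application over extra operands,
# roughly what the removed convenience path did inside the op itself.
from functools import reduce
import operator
import torch

def fold_elementwise(apply_binary, op, first, *rest):
    # Chain the binary op left to right: ((first op a) op b) op ...
    return reduce(lambda acc, arg: apply_binary(op, acc, arg), rest, first)

# Usage with plain torch tensors and operator.add as the elementwise op:
x, y, z = torch.ones(2), torch.full((2,), 2.0), torch.full((2,), 3.0)
print(fold_elementwise(lambda op, a, b: op(a, b), operator.add, x, y, z))  # tensor([6., 6.])
```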

Contributor Author

@KyleHerndon, I think this should not accept any InferenceTensor, as we do unboxing, which would be undesirable for some tensor types. The unboxing here is fine as it is a no-op.
If someone wants an implementation, for example for quantized tensors, they should write one.
I guess it comes down to whether we want to perform a potentially very inefficient computation or to fail.

sharktank/sharktank/ops/signatures.py (resolved)
def testV1_1Fp32CompareTorchEagerAgainstHuggingFace(self, huggingface_repo_id: str):
get_dataset(
huggingface_repo_id,
).download()
Contributor

Should we just cache the model for the CI instead of downloading it each time? We may end up with failures due to corrupted downloads on occasion.

Contributor Author

This uses Hugging Face's cache underneath. If some CI jobs start failing due to a suspected corrupted cache, someone will need to clear it manually.
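If a manual clear is ever needed, the huggingface_hub API can do it programmatically. A sketch, where the repo ids are assumptions about which checkpoints the tests pull:

```python
# Hedged sketch: remove suspect repos from the Hugging Face cache.
from huggingface_hub import scan_cache_dir

suspect_repos = {"google/t5-v1_1-small", "google/t5-v1_1-xxl"}  # assumption

cache_info = scan_cache_dir()
revisions_to_delete = [
    revision.commit_hash
    for repo in cache_info.repos
    if repo.repo_id in suspect_repos
    for revision in repo.revisions
]

strategy = cache_info.delete_revisions(*revisions_to_delete)
print(f"Will free {strategy.expected_freed_size_str}")
strategy.execute()
```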

@@ -35,26 +34,6 @@ def testBroadcastDims(self):
assert res[1] == 2


class ElementwiseTest(unittest.TestCase):
Contributor

Why are we no longer testing this?

Contributor Author

This tested the binary op + folding, which is no longer present.

@sogartar sogartar requested a review from IanNod November 18, 2024 19:41
parser.addoption(
"--google-t5-v1-1-small-fp32-model-path",
type=Path,
default="/data/t5/small/google__t5-v1_1-small_fp32.gguf",
Collaborator

We don't hardcode any Llama model/tokenizer paths here anymore. You can pass them as args directly to pytest.

Contributor Author

Why not have a default that allows you to run pytest sharktank/tests if you have the data at the default paths?

Contributor

For someone running these tests on a machine that does not have the data in the default paths, we should have at least a comment with a link or something for how/where to get this data.

Contributor Author

I added a comment.

.github/workflows/ci-sharktank.yml (resolved, outdated)
sharktank/sharktank/models/t5/t5.py (resolved)
@sogartar sogartar requested a review from IanNod November 20, 2024 21:02
@sogartar sogartar merged commit 9535984 into nod-ai:main Nov 20, 2024
6 checks passed