Add T5 LM v1.1 encoder #550

Merged
sogartar merged 12 commits into nod-ai:main on Nov 20, 2024
Conversation

sogartar
Contributor

The encoder shares much of the underlying stack with the decoder. Here only the encoder is presented as a class.
I have not gone out of my way to strip all decoder-related stuff from the stack. Things like checkpointing and dropout are stripped.

The author attribution is added to the license of the T5 model file, as this seems like a derivative work. They are both Apache 2.0.

There are a few tests of the various components and 2 tests for the entire encoder, for the small and xxl variants. They rely on Hugging Face, and the models are downloaded on the fly into the cache. The tests expect the corresponding GGUF files to already be present and available on the file system.
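For context, here is a minimal sketch of the Hugging Face eager path that such comparison tests can check against; the repo id, prompt, and output handling below are assumptions for illustration, not the PR's actual test code:

```python
# Hedged sketch: running the upstream T5 v1.1 encoder via transformers as an
# eager reference. The actual tests load the sharktank encoder from a GGUF
# file and compare its outputs against this kind of baseline.
import torch
from transformers import AutoTokenizer, T5EncoderModel

repo_id = "google/t5-v1_1-small"  # or "google/t5-v1_1-xxl" for the large variant
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5EncoderModel.from_pretrained(repo_id)

inputs = tokenizer("Translate English to German: The house is wonderful.", return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state  # [batch, seq_len, d_model]
print(last_hidden.shape)
```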

@sogartar sogartar changed the title Add T5 LM v1.1 encoder WIP Add T5 LM v1.1 encoder Nov 15, 2024
@sogartar sogartar marked this pull request as draft November 15, 2024 20:07
@sogartar sogartar changed the title WIP Add T5 LM v1.1 encoder Add T5 LM v1.1 encoder Nov 15, 2024
@sogartar sogartar marked this pull request as ready for review November 15, 2024 23:25

- name: Run long running tests
run: |
pytest --longrun \
Contributor

How long does this take to run, that it has to fall under longrun? Ideally we should be testing the models on presubmit if possible.

Contributor Author

It takes 30 seconds. There are 2 models, small and xxl. Maybe I can leave only the small one on presubmit.

Contributor

If each takes only 30 seconds, you might as well run both and keep them on presubmit instead of longrun.

Contributor Author

These presubmit tests will need to run on llama-mi300x-3 or other machines that have the files available.

Contributor Author

I refactored a bit. I added an option --with-t5-data that enables tests that depend on T5 data, made only the XXL eager mode test longrun, and added a presubmit job that runs the small model test.
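A minimal conftest.py sketch of how such an opt-in flag can be wired up in pytest; the flag name matches the PR, but the marker name requires_t5_data and the overall structure are assumptions:

```python
# Hedged sketch: gate T5-data tests behind an explicit --with-t5-data flag.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--with-t5-data",
        action="store_true",
        default=False,
        help="Enable tests that depend on T5 model data being available.",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--with-t5-data"):
        return
    skip_t5 = pytest.mark.skip(reason="needs --with-t5-data to run")
    for item in items:
        # "requires_t5_data" is a hypothetical marker name for illustration.
        if "requires_t5_data" in item.keywords:
            item.add_marker(skip_t5)
```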

Contributor

XXL is more important for Flux-like models, so it might be better to ensure that one is covered on presubmit.

Contributor Author

Done.

@@ -256,6 +272,16 @@ def get_model_artifacts(request: FixtureRequest):
model_path["llama3_405b_fp8_model_path"] = set_fixture_from_cli_option(
request, "--llama3-405b-fp8-model-path", "llama3_405b_fp8_model"
)
model_path["google__t5_v1_1_small_fp32_model_path"] = set_fixture_from_cli_option(
Contributor

Do we need 2 separate model_path flags for small vs xxl models instead of just calling this twice from our tests?

Contributor Author
@sogartar sogartar Nov 18, 2024

This follows the already accepted nomenclature for the Llama variants. I think we will get a lot more variants, like fp16 and other quantizations.
We probably want sane defaults for all files, so that you can run pytest sharktank/tests if you already have the files at their expected places. It is important to have a simple command to run all tests.
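A sketch of the conftest plumbing this refers to; the option name and default path come from the diff below, while the body of set_fixture_from_cli_option is a guess at its plausible shape, not the actual implementation:

```python
# Hedged sketch of a default-carrying CLI option plus the kind of helper the
# diff calls. The helper body is an assumption; only its call signature is
# taken from the PR.
from pathlib import Path

def pytest_addoption(parser):
    parser.addoption(
        "--google-t5-v1-1-small-fp32-model-path",
        type=Path,
        default="/data/t5/small/google__t5-v1_1-small_fp32.gguf",
        help="Path to the T5 v1.1 small fp32 GGUF file.",
    )

def set_fixture_from_cli_option(request, option_name, fixture_name):
    # Plausible behavior: read the CLI option and, for class-based tests,
    # also expose it as an attribute named fixture_name on the test class.
    value = request.config.getoption(option_name)
    if request.cls is not None:
        setattr(request.cls, fixture_name, value)
    return value
```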

for arg in args:
res = elementwise(operator, res, arg)
return res
def elementwise_trenary(operator, x, y, z, *args, **kwargs):
Contributor

Why is this change needed? Also should this be elementwise_ternary?

Contributor

I also expect this should be InferenceTensor

Contributor Author

Fixed the typo.
Originally I added this for convenience. Here it is removed, as it conflicts with dispatching to ternary ops. If the user wants a binary op + folding, they should do that themselves.
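For callers that still want the old convenience, the folding can live on the caller side. Here is a sketch, where apply_binary stands in for the project's binary elementwise dispatch (not the actual sharktank op):

```python
# Hedged sketch: fold a binary elementwise application over extra operands,
# roughly what the removed convenience path did inside the op itself.
from functools import reduce
import operator
import torch

def fold_elementwise(apply_binary, op, first, *rest):
    # Chain the binary op left to right: ((first op a) op b) op ...
    return reduce(lambda acc, arg: apply_binary(op, acc, arg), rest, first)

# Usage with plain torch tensors and operator.add as the elementwise op:
x, y, z = torch.ones(2), torch.full((2,), 2.0), torch.full((2,), 3.0)
print(fold_elementwise(lambda op, a, b: op(a, b), operator.add, x, y, z))  # tensor([6., 6.])
```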

Contributor Author

@KyleHerndon, I think this should not accept any InferenceTensor, as we do unboxing, which would be undesirable for some tensor types. The unboxing here is fine as it is a no-op.
If someone wants an implementation, for example for quantized tensors, they should write one.
I guess it comes down to whether we want to perform a potentially very inefficient computation or to fail.

sharktank/sharktank/ops/signatures.py (resolved)
def testV1_1Fp32CompareTorchEagerAgainstHuggingFace(self, huggingface_repo_id: str):
get_dataset(
huggingface_repo_id,
).download()
Contributor

Should we just cache the model for the CI instead of downloading it each time? We may end up with failures due to corrupted downloads on occasion.

Contributor Author

This uses Hugging Face's cache underneath. If some CI jobs start failing due to a suspected corrupted cache, someone will need to clear it manually.
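If a manual clear is ever needed, the huggingface_hub API can do it programmatically. A sketch, where the repo ids are assumptions about which checkpoints the tests pull:

```python
# Hedged sketch: remove suspect repos from the Hugging Face cache.
from huggingface_hub import scan_cache_dir

suspect_repos = {"google/t5-v1_1-small", "google/t5-v1_1-xxl"}  # assumption

cache_info = scan_cache_dir()
revisions_to_delete = [
    revision.commit_hash
    for repo in cache_info.repos
    if repo.repo_id in suspect_repos
    for revision in repo.revisions
]

strategy = cache_info.delete_revisions(*revisions_to_delete)
print(f"Will free {strategy.expected_freed_size_str}")
strategy.execute()
```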

@@ -35,26 +34,6 @@ def testBroadcastDims(self):
assert res[1] == 2


class ElementwiseTest(unittest.TestCase):
Contributor

Why are we no longer testing this?

Contributor Author

This tested the binary op + folding, which is no longer present.

@sogartar sogartar requested a review from IanNod November 18, 2024 19:41
parser.addoption(
"--google-t5-v1-1-small-fp32-model-path",
type=Path,
default="/data/t5/small/google__t5-v1_1-small_fp32.gguf",
Collaborator

We don't hardcode any Llama model/tokenizer paths here anymore. You can pass them as args directly to pytest.

Contributor Author

Why not have a default that allows you to run pytest sharktank/tests if you have the data at the default paths?

Contributor

For someone running these tests on a machine that does not have the data in the default paths, we should have at least a comment with a link or something for how/where to get this data.

Contributor Author

I added a comment.

.github/workflows/ci-sharktank.yml (resolved, outdated)
sharktank/sharktank/models/t5/t5.py (resolved)
@sogartar sogartar requested a review from IanNod November 20, 2024 21:02
@sogartar sogartar merged commit 9535984 into nod-ai:main Nov 20, 2024
6 checks passed