diff --git a/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt b/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt
index 4cb02e6a0..e9690b677 100644
--- a/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt
+++ b/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt
@@ -30,14 +30,14 @@ Verified for distributed inference mode via DeepSpeed

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16). Work is in progress to better support the models in the tables with various data types. In addition, more models will be optimized in the future.

-Please check `LLM best known practice <../../examples/cpu/inference/python/llm>`_ for instructions to install/setup environment and example scripts.
+Please check `LLM best known practice `_ for instructions to install/setup environment and example scripts.

 Module Level Optimization API for customized LLM (Prototype)
 ------------------------------------------------------------

 In the past year, LLMs have been flourishing, with many open-sourced models contributed to the community, while researchers are building their own LLMs from transformer blocks with variants in implementation details. To help LLM researchers and developers improve their productivity, Intel® Extension for PyTorch* provides module level optimizations for commonly used LLM modules and functionalities, which are operators or certain operator combinations in nature.

-Please check `LLM module level optimization practice <../../examples/cpu/inference/python/llm-modeling>`_ to better understand how to use `module level APIs `_ to optimize your LLM and achieve better performance.
+Please check `LLM module level optimization practice `_ to better understand how to use `module level APIs `_ to optimize your LLM and achieve better performance.

 Demos
 -----
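For context on the frontend that the hunk above refers to, here is a minimal sketch of running a verified model through ``ipex.llm.optimize``; the model id, prompt, and generation arguments are illustrative assumptions, not content taken from this diff::

    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "facebook/opt-125m"  # illustrative model choice, not from the diff
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True
    )
    model.eval()

    # Applies the LLM-specific optimizations named in the note above
    # (indirect-access KV cache, fused ROPE, prepacked TPP Linear).
    model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
    with torch.inference_mode(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
        gen_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.batch_decode(gen_ids, skip_special_tokens=True)[0])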
diff --git a/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html b/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html
index 42911773f..759ecfb0d 100644
--- a/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html
+++ b/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html
@@ -125,7 +125,7 @@
 [one-line change in the generated Sphinx/Read the Docs footer of the "Intel® Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc" page; the removed/added markup was not preserved in this extract]
diff --git a/cpu/2.3.0+cpu/genindex.html b/cpu/2.3.0+cpu/genindex.html
index e881b05a6..ef43fe926 100644
--- a/cpu/2.3.0+cpu/genindex.html
+++ b/cpu/2.3.0+cpu/genindex.html
@@ -375,7 +375,7 @@
 [same one-line footer change in the generated general index page]
diff --git a/cpu/2.3.0+cpu/index.html b/cpu/2.3.0+cpu/index.html
index faeb6e121..b05e1f672 100644
--- a/cpu/2.3.0+cpu/index.html
+++ b/cpu/2.3.0+cpu/index.html
@@ -182,7 +182,7 @@
 [same one-line footer change in the generated landing page]
diff --git a/cpu/2.3.0+cpu/py-modindex.html b/cpu/2.3.0+cpu/py-modindex.html
index 7aab41737..323572252 100644
--- a/cpu/2.3.0+cpu/py-modindex.html
+++ b/cpu/2.3.0+cpu/py-modindex.html
@@ -165,7 +165,7 @@
 [same one-line footer change in the generated Python Module Index page]
diff --git a/cpu/2.3.0+cpu/search.html b/cpu/2.3.0+cpu/search.html
index a1f97b24d..d94b3c725 100644
--- a/cpu/2.3.0+cpu/search.html
+++ b/cpu/2.3.0+cpu/search.html
@@ -133,7 +133,7 @@
 [same one-line footer change in the generated search page]
diff --git a/cpu/2.3.0+cpu/searchindex.js b/cpu/2.3.0+cpu/searchindex.js
index c8847b61b..e55de425d 100644
--- a/cpu/2.3.0+cpu/searchindex.js
+++ b/cpu/2.3.0+cpu/searchindex.js
@@ -1 +1 @@
 [regenerated single-line Search.setIndex({...}) payload omitted: machine-generated search-index data, truncated in this extract and not human-reviewable]
17, "mamx": 17, "tile": 17, "dcpu_capability_amx": 17, "mavx512fp16": 17, "dcpu_capability_avx512_fp16": 17, "align": [17, 18, 21, 34], "stead": 17, "sleef": 17, "width": [17, 18], "isa_nam": 17, "inlin": 17, "compat": [17, 21], "definit": [17, 21], "Such": 17, "But": [17, 18], "tip": 17, "newkernelkrnl": 17, "newkernel": 17, "header": 17, "special": [17, 18, 28], "fastest": 17, "cpuinfo": 17, "mykernel": 17, "fn_type": 17, "void": 17, "ipex_declare_dispatch": 17, "ipex_define_dispatch": 17, "ipex_register_dispatch": 17, "kcpu": 17, "declar": 17, "ideep": [17, 18], "common": [17, 21, 28, 31, 33], "intrins": 17, "cvtfp32tobf16": 17, "pragma": 17, "torch_ipex": [17, 34], "cvt_fp32_to_bf16": 17, "dst": 17, "cvt_fp32_to_bf16_kernel_impl": 17, "cvt_fp32_to_bf16_kernel_fn": 17, "cvt_fp32_to_bf16_kernel_stub": 17, "macro": 17, "cpu_capability_avx512": 17, "cpu_capability_avx512_bf16": 17, "hav": 17, "cvtfp32tobf16krnl": 17, "vec512": 17, "vec256": 17, "endif": 17, "immintrin": 17, "__m256i": 17, "_cvt_fp32_to_bf16": 17, "__m512": 17, "reinterpret_cast": 17, "_mm512_cvtneps_pbh": 17, "__m512i": 17, "_mm512_castps_si512": 17, "nan": [17, 34], "_mm512_set1_epi32": 17, "0xffff": 17, "mask_valu": 17, "_mm512_cmp_ps_mask": 17, "_cmp_ord_q": 17, "0x1": 17, "vec_bia": 17, "0x7fff": 17, "uint32_t": 17, "lsb": 17, "t_valu": 17, "_mm512_and_si512": 17, "_mm512_srli_epi32": 17, "rounding_bia": 17, "_mm512_add_epi32": 17, "_mm512_mask_blend_epi32": 17, "_mm512_cvtusepi32_epi16": 17, "f32": [17, 18], "_mm512_loadu_p": 17, "_mm256_storeu_si256": 17, "_mm512_maskz_loadu_p": 17, "_mm256_mask_storeu_epi16": 17, "getveclength": 17, "get_cpp_typesize_and_vecs": 17, "scalartyp": 17, "get_cpp_typesize_and_vecsize_kernel_impl": 17, "get_cpp_typesize_and_vecsize_kernel_fn": 17, "get_cpp_typesize_and_vecsize_kernel_stub": 17, "types": 17, "vectors": 17, "getveclengthkrnl": 17, "doubl": 17, "make_tupl": 17, "sizeof": 17, "complexdoubl": 17, "complex": 17, "complexfloat": 17, "decltyp": 17, "impl": 17, "scalartypetocpptyp": 17, "torch_check": 17, "09": [17, 31], "58": [17, 31], "anaconda": 17, "copyright": [17, 27], "credit": 17, "licens": 17, "_c": [17, 26], "_get_current_isa_level": 17, "_get_highest_cpu_support_isa_level": 17, "_get_highest_binary_support_isa_level": 17, "quit": [17, 34], "By": [17, 31, 33], "aten_cpu_cap": 17, "effect": [17, 21, 26, 32, 33], "intern": [17, 18, 20, 32], "purpos": [17, 31, 32, 33], "addtion": 17, "tool": [17, 33, 34], "subfold": 17, "rh": 17, "toolset": 17, "33": [17, 31, 32], "cmakefil": 17, "cpu_featur": 17, "dir": [17, 31], "66": [17, 31, 34], "cpu_feature_main": 17, "xcr0": 17, "00000000000602e7": 17, "mmx": 17, "sse": 17, "sse2": 17, "sse3": 17, "ssse3": 17, "sse4_1": 17, "sse4_2": 17, "aes_ni": 17, "sha": 17, "xsave": 17, "fma": 17, "f16c": 17, "avx_vnni": 17, "avx512_f": 17, "avx512_cd": 17, "avx512_pf": 17, "avx512_er": 17, "avx512_vl": 17, "avx512_bw": 17, "avx512_dq": 17, "avx512_ifma": 17, "avx512_vbmi": 17, "avx512_vpopcntdq": 17, "avx512_4fmap": 17, "avx512_4vnniw": 17, "avx512_vbmi2": 17, "avx512_vpclmul": 17, "avx512_bitalg": 17, "avx512_vp2intersect": 17, "amx_bf16": 17, "amx_til": 17, "amx_int8": 17, "prefetchw": 17, "prefetchwt1": 17, "represent": 18, "multidimension": 18, "arrai": 18, "nd": 18, "1d": 18, "semant": 18, "attribut": 18, "coo": 18, "canon": 18, "assign": [18, 32, 33], "2d": 18, "height": 18, "illustr": [18, 19, 21, 31, 33], "actual": [18, 21], "bmp": 18, "contiguous_format": [18, 33], "tensorflow": 18, "close": [18, 31, 33], "to_mkldnn": 18, "difficult": 
18, "manipul": 18, "to_dens": 18, "natur": [18, 21, 28], "hold": [18, 33], "secret": 18, "ingredi": 18, "almost": 18, "foundat": [18, 33], "upper": [18, 33], "fact": [18, 33], "expens": 18, "benefici": 18, "nb": 18, "me": 18, "roughli": 18, "50": [18, 31, 32], "mkldnn": 18, "mkldnn_util": 18, "subsequ": [18, 33], "concept": [18, 33], "diagram": [18, 33], "hard": [18, 26], "conclus": 18, "necessari": 18, "neglig": 18, "move": [18, 33], "organ": 18, "question": [18, 30], "reinterpret": 18, "answer": [18, 30], "chw": 18, "hw": 18, "stride_n": 18, "stride_c": 18, "stride_h": 18, "stride_w": 18, "merit": 18, "express": [18, 34], "noncontigu": 18, "n1": 18, "n2": 18, "mind": [18, 32], "someth": 18, "reli": [18, 20], "rfc": 18, "hwc": 18, "wc": 18, "chwn": 18, "hwn": 18, "wn": 18, "empti": [18, 31], "outplac": [18, 34], "is_contigu": 18, "_appli": 18, "brief": [18, 28, 34], "imagenet": [18, 30], "spontan": 18, "tell": [18, 20, 33], "NOT": [18, 31], "compris": 18, "explicit": [18, 20, 33], "implicit": 18, "tensoriter": 18, "guidelin": 18, "awar": [18, 20, 31, 32], "my": 18, "upsampl": [18, 34], "cudnn": 18, "accommod": 18, "md": 18, "format_tag": 18, "src_md": 18, "desc": 18, "data_typ": 18, "src_mem": 18, "src_data_ptr": 18, "card": 18, "hwio": 18, "resnext101": [18, 34], "detectron2": 18, "8x": 18, "lamb": [19, 21], "adagrad": [19, 21], "clr": 19, "lr_decai": 19, "state_sum": 19, "addcmul_": 19, "add_": 19, "addcdiv_": 19, "whole": [19, 20, 33], "storag": 19, "onboard": [19, 33], "third": [19, 34], "high": [19, 21, 33], "bound": [19, 20, 28, 33], "bottl": 19, "neck": 19, "prevent": 19, "pseudo": [19, 21, 34], "adagrad_fused_step": 19, "group": [19, 20, 33], "grad0": 19, "grad1": 19, "grad_n": 19, "param_n": 19, "state_sum_n": 19, "adagrad_step": 19, "grad_i": 19, "param_i": 19, "state_sum_i": 19, "other_arg": 19, "coupl": [20, 33, 34], "omp": [20, 26, 31, 32, 33, 34], "ld_preload": [20, 31, 32, 33], "libiomp5": [20, 31, 32, 33], "model_script": 20, "examplenet": 20, "examplenet1": 20, "x1": 20, "start_dim": 20, "examplenet2": 20, "conv2": 20, "x2": 20, "y1": 20, "y2": 20, "model1": 20, "traced_model1": 20, "model2": 20, "traced_model2": 20, "multi_stream_model": [20, 34], "datatyp": [20, 34], "receipt": 20, "steam": [20, 34], "input_hint": 20, "output_hint": 20, "pthread": 20, "async": [20, 34], "wake": 20, "synchron": [20, 26, 34], "imper": [20, 34], "suffer": 20, "gil": 20, "hurt": 20, "mitig": [20, 30], "omp_num_thread": [20, 26, 31, 32, 34], "phase": 20, "s1": 20, "c1": 20, "numactl": [20, 31, 32], "outsid": 20, "superset": 20, "undefin": [20, 33], "gb": 20, "simultan": 20, "correspond": [20, 31, 34], "cpu_pool1": 20, "cpu_pool2": 20, "task1": 20, "task2": 20, "y1_futur": 20, "y2_futur": 20, "y_runtim": 20, "kmp_": 20, "fulfil": 20, "worker": [20, 31], "serv": [20, 34], "sub": [20, 28, 33], "wait": [20, 33], "futuretensor": 20, "didn": 20, "dlopen": 20, "symbol": 20, "bottom": 21, "bit": [21, 28], "sign": 21, "expon": 21, "mantissa": 21, "23": [21, 31, 32], "capac": [21, 30], "digit": 21, "shorter": [21, 28], "fewer": 21, "neg": 21, "disadvantag": 21, "shift": 21, "left": [21, 28, 32], "lose": 21, "decim": 21, "valid": [21, 34], "1234500000": 21, "0000012345": 21, "1234512345": 21, "sens": 21, "fraction": 21, "12345": 21, "00000": 21, "signific": 21, "bui": 21, "involv": 21, "ground": 21, "truth": 21, "chain": 21, "rule": [21, 34], "meet": [21, 33, 34], "wide": [21, 34], "understand": [21, 28, 33], "formula": 21, "\u03b1": 21, "gw": 21, "denot": 21, "receiv": 21, "rate": 21, "earlier": 21, 
"inaccur": 21, "exactli": 21, "kept": 21, "halv": 21, "recov": 21, "fp32_w": 21, "concat_fp32_from_bf16": 21, "bf16_w": 21, "fp32_gw": 21, "bf16_gw": 21, "weight_dacai": 21, "split_bf16_from_fp32": 21, "ratio": [22, 30, 34], "beta": [23, 26], "demostr": 23, "cheat": 23, "sheet": 23, "pypi": [26, 34], "occupi": 26, "remark": [26, 30, 33], "__name__": [26, 34], "__main__": [26, 31, 32, 34], "112": [26, 30, 33, 34], "nnc": 26, "poor": [26, 34], "xlm": 26, "roberta": [26, 34], "casual": 26, "gpt2": 26, "summar": 26, "classif": [26, 30], "allenai": 26, "longform": 26, "409": 26, "workaround": [26, 34], "_jit_set_texpr_fuser_en": 26, "csrc": 26, "tensorexpr_fus": 26, "settensorexprfuseren": 26, "longer": [26, 30], "complic": [26, 31, 33], "undergo": [26, 29], "runtimeerror": [26, 34], "overflow": [26, 34], "unpack": [26, 34], "exce": [26, 30, 33, 34], "quantize_per_tensor": 26, "pseudocod": [26, 34], "omp_num_threa": 26, "set_num_thread": [26, 34], "freezed_model": [26, 34], "run_benchmark": [26, 34], "flow": 26, "bag": [26, 34], "progress": [26, 28, 34], "abnorm": [26, 34], "tbd": 26, "transformerencoderlay": 26, "encount": [26, 34], "rnnt": [26, 34], "joint_net": [26, 34], "caller": [26, 34], "apach": [27, 32], "notic": [27, 31, 32], "term": 27, "condit": 27, "multiheadattent": 28, "feedforward": 28, "lot": [28, 34], "besid": [28, 33, 34], "adopt": [28, 34], "modelfamili": 28, "hub": 28, "staticquantizationint8": 28, "onlyquantizationint8": 28, "onlyquantizationint4": 28, "13b": [28, 30, 34], "70b": [28, 34], "8b": 28, "20b": 28, "dolli": [28, 34], "databrick": 28, "v2": [28, 30, 34], "12b": 28, "tiiuae": 28, "40b": 28, "30b": 28, "3b": 28, "bigscienc": 28, "1b7": 28, "salesforc": 28, "2b": 28, "baichuan2": [28, 34], "chat": 28, "thudm": 28, "chatglm3": [28, 34], "chatglm2": [28, 34], "bigcod": 28, "starcod": [28, 34], "flan": 28, "xl": 28, "mosaicml": 28, "mistralai": 28, "v0": 28, "8x7b": 28, "stabilityai": 28, "1_6b": 28, "liuhaotian": 28, "v1": [28, 34], "microsoft": 28, "ieityuan": 28, "yuan2": 28, "102b": 28, "signifi": 28, "perfect": 28, "codellama": 28, "rope": 28, "past": 28, "year": 28, "flourish": 28, "contribut": [28, 31, 34], "research": 28, "web": 28, "legend": 28, "autotp": 28, "obviou": 28, "hotspot": 28, "lead": 28, "significantli": [28, 34], "heavier": 28, "io": 28, "occurr": 28, "ship": 28, "2nd": 28, "4th": [28, 30], "except": [28, 31], "beeter": 28, "Its": 28, "seen": 28, "woq": 28, "integ": [28, 33], "bandwidth": 28, "reorder_cach": 28, "beam_width": 28, "secondli": 28, "elimin": 28, "shard": 28, "content": [29, 34], "your_calibration_dataset": 29, "calib_sampl": 29, "calibration_model": 29, "qconfig_summary_file_path": 29, "nf4": 29, "init_distribut": 29, "get_acceler": 29, "communication_backend_nam": 29, "var": 29, "ondevic": 29, "init_infer": 29, "mp_size": 29, "base_dir": 29, "repo_root": 29, "checkpoints_json": 29, "zone": [30, 34], "articl": [30, 33], "llama2": [30, 34], "1024": [30, 33], "were": [30, 31, 32, 33], "carri": 30, "m7i": 30, "m6i": [30, 32], "47x": 30, "62x": 30, "57x": 30, "58x": 30, "85x": 30, "27x": 30, "38x": 30, "29x": 30, "36x": 30, "conclud": [30, 34], "respons": 30, "session": 30, "exhibit": 30, "wherea": 30, "p90": 30, "26x": 30, "sec": 30, "39": [30, 31, 32, 34], "26": [30, 31, 32], "49": [30, 31, 32], "170": 30, "21": [30, 31, 32], "measur": [30, 34], "17th": 30, "16xlarg": 30, "u": [30, 32], "west": 30, "ubuntu": 30, "04": [30, 31], "1009": 30, "sw": 30, "workload1": 30, "inference2": 30, "realtim": 30, "inference3": 30, "tunabl": [30, 32], 
"8380": 30, "30ghz": 30, "83x": 30, "44x": 30, "ssd": [30, 34], "resnet34": [30, 34], "16x": 30, "coco": 30, "1200": 30, "resnext": 30, "32x16d": 30, "81x": 30, "21x": 30, "vgg": 30, "75x": 30, "19x": 30, "shufflenetv2_x1": 30, "07x": 30, "78x": 30, "04x": 30, "max_seq_len": 30, "384task": 30, "jemalloc": [30, 32, 34], "05x": 30, "96x": 30, "mrpc": 30, "128task": 30, "distilbert": 30, "12x": 30, "dnnl": 30, "base_text_classif": 30, "f1": 30, "81": [30, 31], "79": [30, 31], "93": 30, "02": [30, 32], "85": [30, 31], "86": [30, 31], "top1": 30, "76": [30, 31], "75": [30, 31], "98": 30, "78": [30, 31], "199": 30, "48": [30, 31, 32], "vgg11": 30, "69": [30, 31], "67": [30, 31, 34], "96": 30, "44": [30, 31, 32], "36": [30, 31, 32], "92": 30, "97": 30, "shufflenet": 30, "histogram": [30, 34], "40": [30, 31, 32, 34], "ucod": 30, "0xd0002a0": 30, "ON": 30, "turboboost": 30, "bio": 30, "ddr": 30, "16gb": 30, "3200": 30, "dcpmm": 30, "256gb": 30, "host": [30, 34], "cento": 30, "2105": 30, "18": [30, 31, 32], "305": 30, "el8_4": 30, "x86_64": 30, "docker": [30, 34], "spectr": 30, "meltdown": 30, "24x": 30, "31x": 30, "15x": 30, "30x": 30, "mobilenet": 30, "08x": 30, "03x": 30, "09x": 30, "39x": 30, "35x": 30, "160": 30, "55x": 30, "06x": 30, "fpn": 30, "71x": 30, "20x": 30, "13x": 30, "32x": 30, "48x": 30, "11x": 30, "terabyt": 30, "14x": 30, "02x": 30, "10x": 30, "33x": 30, "8380h": 30, "90ghz": 30, "56": [30, 31, 32, 33], "67x": 30, "45x": 30, "77x": 30, "18x": 30, "formerli": [30, 33, 34], "0x700001c": 30, "wlydcrb1": 30, "sy": 30, "0016": 30, "p29": 30, "2006080250": 30, "64gb": 30, "768gb": 30, "influenc": [31, 33], "properli": 31, "themselv": [31, 34], "free": [31, 34], "mainli": [31, 34], "around": 31, "interpret": 31, "prefix": 31, "cross": [31, 32, 33, 34], "taskset": 31, "malloc_conf": [31, 33], "crash": [31, 33, 34], "nnode": 31, "nproc": 31, "count": 31, "addr": 31, "ip": 31, "hostnam": 31, "proc": 31, "port": 31, "hostfil": 31, "mpi": 31, "mpiexec": 31, "hydra": 31, "ppn": 31, "genv": 31, "i_mpi_pin_domain": 31, "codeless": 31, "ut": 31, "exclus": 31, "mutual": 31, "ld": 31, "favorit": 31, "kmp": [31, 33], "granular": [31, 32, 33], "compact": [31, 32, 33], "stdout": 31, "afterward": [31, 33], "undesir": 31, "_timestamp_inst": 31, "_timestamp_instance_": 31, "_core": 31, "run_20210712212258_inst": 31, "run_20210712212258_instance_0_cores_0": 31, "gif": 31, "07": 31, "764": 31, "conda_prefix": [31, 32], "virtual_env": [31, 32], "lib64": [31, 32], "home": [31, 32], "drop": [31, 32], "kmp_affin": [31, 32, 33], "kmp_blocktim": [31, 32, 33], "14": [31, 32, 34], "24": [31, 32], "25": [31, 32], "27": [31, 32, 33], "30": [31, 32], "31": [31, 32], "34": [31, 32], "35": [31, 32], "37": [31, 32, 34], "41": [31, 32], "42": [31, 32], "tee": 31, "run_20210712223308_inst": 31, "run_20210712223308_instance_0_cores_0": 31, "87": 31, "08": 31, "117": 31, "88": 31, "118": 31, "45": [31, 32], "46": [31, 32], "47": [31, 32], "51": [31, 32], "52": [31, 32], "53": [31, 32], "54": [31, 32], "55": [31, 32, 33], "57": 31, "59": 31, "60": 31, "61": 31, "62": 31, "63": [31, 34], "65": 31, "68": [31, 34], "70": 31, "71": 31, "72": 31, "73": 31, "74": 31, "77": 31, "82": 31, "83": [31, 33], "run_20210712214504_inst": 31, "run_20210712214504_instance_0_cores_22": 31, "513": 31, "run_20210712220928_inst": 31, "run_20210712220928_instance_0_cores_0": 31, "355": 31, "356": 31, "deduct": 31, "run_20210712221615_inst": 31, "run_20210712221615_instance_0_cores_11": 31, "591": 31, "run_20210712221150_inst": 31, 
"run_20210712221150_instance_0_cores_0": 31, "run_20210712221150_instance_1_cores_22": 31, "233": 31, "236": 31, "run_20210712221415_inst": 31, "run_20210712221415_instance_0_cores_0": 31, "run_20210712221415_instance_1_cores_4": 31, "run_20210712221415_instance_2_cores_8": 31, "run_20210712221415_instance_3_cores_12": 31, "run_20210712221415_instance_4_cores_16": 31, "run_20210712221415_instance_5_cores_20": 31, "run_20210712221415_instance_6_cores_24": 31, "run_20210712221415_instance_7_cores_28": 31, "run_20210712221415_instance_8_cores_32": 31, "run_20210712221415_instance_9_cores_36": 31, "run_20210712221415_instance_10_cores_40": 31, "140": 31, "143": 31, "146": 31, "149": 31, "151": 31, "154": 31, "157": 31, "159": 31, "162": 31, "164": 31, "167": 31, "run_20210712221305_inst": 31, "run_20210712221305_instance_0_cores_0": 31, "run_20210712221305_instance_1_cores_11": 31, "run_20210712221305_instance_2_cores_22": 31, "run_20210712221305_instance_3_cores_33": 31, "470": 31, "471": 31, "473": 31, "476": 31, "479": 31, "instance_idx": 31, "independ": 31, "confirm": 31, "175": 31, "176": 31, "177": 31, "run_20220106130151_instance_0_cores_0": 31, "sometim": [31, 33], "235": 31, "jemallocl": 31, "oversize_threshold": [31, 33], "background_thread": [31, 33], "metadata_thp": [31, 33], "dirty_decay_m": [31, 33], "9000000000": [31, 33], "muzzy_decay_m": [31, 33], "libjemalloc": 31, "run_20210713153048_instance_0_cores_0": 31, "654": 31, "libtcmalloc": [31, 32], "655": 31, "run_20210713153333_instance_0_cores_0": 31, "784": 31, "run_20210713153659_instance_0_cores_0": 31, "blocktim": 31, "00": [31, 34], "760": [31, 32], "761": [31, 32], "omp_schedul": [31, 33], "omp_proc_bind": [31, 33], "run_20210713152500_instance_0_cores_0": 31, "give": [32, 34], "ipex_en": 32, "procedur": 32, "tunin": 32, "dramat": [32, 33], "cpu_launcher_en": 32, "cpu_launcher_arg": 32, "hyperthread": 32, "present": 32, "ital": 32, "ptmalloc": 32, "use_default_alloc": [32, 34], "tcmalloc": 32, "enable_tcmalloc": 32, "enable_jemalloc": 32, "nth": [32, 33], "uniform": 32, "overlap": 32, "signficantli": 32, "8180": 32, "affinit": 32, "addition": 32, "kill": 32, "unutil": 32, "restart": 32, "remain": 32, "aliv": 32, "taken": 32, "care": 32, "worri": 32, "continu": [32, 34], "Then": 32, "interrupt": 32, "dummi": 32, "dummy_tensor": 32, "scheme": 32, "bert_int8_jit": 32, "n_iter": 32, "rn50_int8_jit": 32, "usus": 32, "rn50_ipex_int8": 32, "handler": 32, "image_classifi": 32, "similarli": 32, "bert_ipex_int8": 32, "transformer_handler_gener": 32, "setup_config": 32, "seq_classification_artifact": 32, "index_to_nam": 32, "nc": 32, "model_stor": 32, "server": [32, 33], "rest": 32, "model_log": 32, "096": 32, "8375c": 32, "03": 32, "981": 32, "982": 32, "previous": 32, "cases": 32, "223": 32, "site": 32, "model_service_work": 32, "sock": 32, "unix": 32, "9000": 32, "762": 32, "763": 32, "9001": 32, "274": 32, "9002": 32, "975": 32, "9003": 32, "bench": 32, "amazon": 32, "ec2": 32, "24xlarg": 32, "reproduc": 32, "url": [32, 34], "modelurl": 32, "inputpath": 32, "concurr": [32, 33], "huggingface_transform": 32, "sample_text_captum_input": 32, "graphic": 33, "xe": 33, "briefli": 33, "background": 33, "knowledg": 33, "c620": 33, "seri": 33, "chipset": 33, "purlei": 33, "chip": 33, "inclus": 33, "1mb": 33, "l2": 33, "2666": 33, "mhz": 33, "ddr4": 33, "six": 33, "ultra": 33, "interconnect": 33, "upi": 33, "microarchitectur": 33, "connect": 33, "transfer": 33, "equip": 33, "motherboard": 33, "attach": 33, "remot": 33, "asu": 33, "z11pa": 
33, "d8": 33, "competit": 33, "stall": 33, "busi": 33, "uma": 33, "lscpu": 33, "retriev": 33, "111": 33, "50ghz": 33, "node0": 33, "node1": 33, "sophist": 33, "brought": [33, 34], "polici": 33, "put": 33, "sysctl": 33, "great": 33, "placement": 33, "cpunodebind": 33, "membind": 33, "multithread": 33, "primari": 33, "consecut": 33, "join": 33, "libgomp": 33, "libiomp": 33, "hang": [33, 34], "gomp_cpu_affin": 33, "comma": 33, "invalid": 33, "thrash": 33, "did": [33, 34], "compet": 33, "unus": 33, "proclist": 33, "millisecond": 33, "sleep": 33, "200m": 33, "period": 33, "elaps": 33, "overal": 33, "appropri": 33, "reserv": 33, "sole": 33, "penal": 33, "role": 33, "unnecessari": 33, "destruct": 33, "emphas": 33, "fragment": 33, "mmuzzy_decay_m": 33, "forg": 33, "dealloc": 33, "costli": 33, "gpertool": 33, "plu": 33, "pretti": 33, "nifti": 33, "analysi": 33, "gperftool": 33, "set_flush_denorm": 33, "warm": 33, "therefor": 33, "threshold": 33, "usuali": 33, "come": 33, "maskrcnn": [33, 34], "wav2vec2": 33, "recognit": 33, "onednn_primitive_cache_capac": 33, "65536": 33, "voic": 33, "excit": 34, "announc": 34, "accompani": 34, "privat": 34, "broader": 34, "sincer": 34, "encourag": 34, "feedback": 34, "creator": 34, "reach": 34, "hf_beam_sampl": 34, "hf_beam_search": 34, "hf_greedy_search": 34, "hf_sampl": 34, "walk": 34, "2561": 34, "2584": 34, "2617": 34, "2663": 34, "2733": 34, "act": 34, "2550": 34, "2568": 34, "2641": 34, "2675": 34, "2613": 34, "upgrad": 34, "v3": 34, "2747": 34, "misc": 34, "2468": 34, "2627": 34, "2631": 34, "2704": 34, "changelog": 34, "optimize_transform": 34, "your_generation_param": 34, "newli": 34, "varianc": 34, "encod": 34, "2349": 34, "2412": 34, "2469": 34, "2476": 34, "flash": 34, "2317": 34, "2334": 34, "2392": 34, "2480": 34, "elser": 34, "2491": 34, "public": 34, "2473": 34, "2511": 34, "2433": 34, "2253": 34, "2251": 34, "2236": 34, "2278": 34, "2257": 34, "dockerfil": 34, "ux": 34, "2229": 34, "2195": 34, "2299": 34, "2315": 34, "2283": 34, "2280": 34, "2292": 34, "2275": 34, "2319": 34, "2198": 34, "2264": 34, "2290": 34, "experiment": 34, "workflow": 34, "1563": 34, "excess": 34, "1677": 34, "1688": 34, "1664": 34, "lar": 34, "1695": 34, "dictionari": 34, "1682": 34, "2137": 34, "1568": 34, "1585": 34, "1590": 34, "1587": 34, "1594": 34, "old": 34, "hypervisor": 34, "vm": 34, "1513": 34, "1593": 34, "padding_mod": 34, "1580": 34, "1566": 34, "transnetv2": 34, "1564": 34, "rnn": 34, "avx512_core_vnni": 34, "1592": 34, "1589": 34, "1517": 34, "hero": 34, "inspir": 34, "stanford": 34, "consumpt": 34, "ve": 34, "1341": 34, "instancenorm": 34, "1330": 34, "1414": 34, "1473": 34, "1419": 34, "1488": 34, "webpag": 34, "1318": 34, "1353": 34, "1328": 34, "1355": 34, "1367": 34, "1384": 34, "1295": 34, "1392": 34, "1376": 34, "1373": 34, "1338": 34, "1391": 34, "1322": 34, "usabl": 34, "effort": 34, "cv": 34, "refin": 34, "identifi": 34, "torchrun": 34, "shortcut": 34, "mkl": 34, "sgemm": 34, "geomean": 34, "auto_ipex": 34, "hood": 34, "calibrated_model": 34, "model_to_be_calibr": 34, "992": 34, "64byte": 34, "addlayernorm": 34, "retinanet": 34, "1032": 34, "1053": 34, "1074": 34, "tightli": 34, "matur": 34, "offlin": 34, "becam": 34, "bake": 34, "wave2vec": 34, "albert": 34, "facilit": 34, "minmax": 34, "movingaverageminmax": 34, "polish": 34, "flexibl": 34, "quantconf": 34, "multi_stream_input_hint": 34, "multi_stream_output_hint": 34, "adam": 34, "822": 34, "3d": 34, "642": 34, "deconv3d": 34, "692": 34, "787": 34, "swish": 34, "fsi": 34, "risk": 34, "551": 34, 
"leakyrelu": 34, "589": 34, "407": 34, "647": 34, "convolution1d": 34, "657": 34, "einsum": 34, "alphafold2": 34, "674": 34, "711": 34, "threa": 34, "slow": 34, "equival": 34, "joint": 34, "net": 34, "pend": 34, "648": 34, "684": 34, "685": 34, "dockerhub": 34, "wheel": 34, "sdk": 34, "2x": 34, "5x": 34, "reduct": 34, "center": 34, "deploi": 34, "u8": 34, "s8": 34, "satur": 34, "occur": 34, "u7": 34, "unsign": 34, "s7": 34, "worth": 34, "upload": 34, "pip3": 34, "whl": 34, "220mb": 34, "5mb": 34, "dep": 34, "220m": 34, "cxx11": 34, "224m": 34, "7m": 34, "5m": 34, "qkv": 34, "278": 34, "531": 34, "432": 34, "438": 34, "602": 34, "sliu": 34, "hardsigmoid": 34, "relu6": 34, "selu": 34, "524": 34, "452": 34, "425": 34, "100mb": 34, "40mb": 34, "meant": 34, "resolv": 34, "te": 34, "wrap": 34, "bactchnorm": 34, "205": 34, "straightforward": 34, "underhood": 34, "torchvison": 34, "hugginfac": 34, "legal": 34, "resnet18": 34, "resnet18_xpu": 34, "enable_auto_mixed_precis": 34, "mixed_dtyp": 34, "mymodel": 34, "xx_c": 34, "xx_v": 34, "clibrat": 34, "ampconf": 34, "automixprecis": 34, "running_mod": 34, "cali_dataset": 34, "trace_model": 34, "omp_set_num_thread": 34, "model_execut": 34, "same_model_execution_again": 34, "descriptor": 34, "rc3": 34, "parti": 34, "49786": 34, "rc": 34, "readm": 34, "stakehold": 34, "5rc3": 34, "dpcpp": 34, "heterogen": 34, "bfp16": 34, "proper": 34, "tacotron2": 34, "frozenbatchnorm": 34, "embeddingbad": 34, "daili": 34, "resnext3d": 34, "maskrnn": 34, "codenam": 34, "mlp": 34, "eltwis": 34, "7x": 34, "enable_auto_optim": 34, "streamlin": 34, "enable_auto_mix_precis": 34, "inject": 34, "resnet3d": 34, "fb": 34, "yolov3": 34, "maxpool": 34}, "objects": {"": [[2, 0, 0, "-", "intel_extension_for_pytorch"]], "intel_extension_for_pytorch.cpu": [[2, 0, 0, "-", "runtime"]], "intel_extension_for_pytorch.cpu.runtime": [[2, 1, 1, "", "CPUPool"], [2, 1, 1, "", "MultiStreamModule"], [2, 1, 1, "", "MultiStreamModuleHint"], [2, 1, 1, "", "Task"], [2, 2, 1, "", "get_core_list_of_node_id"], [2, 2, 1, "", "is_runtime_ext_enabled"], [2, 1, 1, "", "pin"]], "intel_extension_for_pytorch": [[2, 2, 1, "", "enable_onednn_fusion"], [2, 2, 1, "", "fast_bert"], [2, 0, 0, "-", "llm"], [2, 2, 1, "", "optimize"], [2, 0, 0, "-", "quantization"], [2, 1, 1, "", "verbose"]], "intel_extension_for_pytorch.llm": [[2, 0, 0, "-", "functional"], [2, 0, 0, "-", "modules"], [2, 2, 1, "", "optimize"]], "intel_extension_for_pytorch.llm.functional": [[2, 2, 1, "", "fast_layer_norm"], [2, 2, 1, "", "indirect_access_kv_cache_attention"], [2, 2, 1, "", "rms_norm"], [2, 2, 1, "", "rotary_embedding"], [2, 2, 1, "", "varlen_attention"]], "intel_extension_for_pytorch.llm.modules": [[2, 1, 1, "", "FastLayerNorm"], [2, 1, 1, "", "IndirectAccessKVCacheAttention"], [2, 1, 1, "", "Linear2SiluMul"], [2, 1, 1, "", "LinearAdd"], [2, 1, 1, "", "LinearAddAdd"], [2, 1, 1, "", "LinearGelu"], [2, 1, 1, "", "LinearMul"], [2, 1, 1, "", "LinearNewGelu"], [2, 1, 1, "", "LinearRelu"], [2, 1, 1, "", "LinearSilu"], [2, 1, 1, "", "LinearSiluMul"], [2, 1, 1, "", "PagedAttention"], [2, 1, 1, "", "RMSNorm"], [2, 1, 1, "", "RotaryEmbedding"], [2, 1, 1, "", "VarlenAttention"]], "intel_extension_for_pytorch.nn": [[7, 1, 1, "", "FrozenBatchNorm2d"]], "intel_extension_for_pytorch.nn.functional": [[7, 2, 1, "", "interaction"]], "intel_extension_for_pytorch.nn.modules": [[7, 1, 1, "", "MergedEmbeddingBag"], [7, 1, 1, "", "MergedEmbeddingBagWithSGD"]], "intel_extension_for_pytorch.quantization": [[2, 2, 1, "", "autotune"], [2, 2, 1, "", "convert"], 
[2, 2, 1, "", "get_smooth_quant_qconfig_mapping"], [2, 2, 1, "", "prepare"]]}, "objtypes": {"0": "py:module", "1": "py:class", "2": "py:function"}, "objnames": {"0": ["py", "module", "Python module"], "1": ["py", "class", "Python class"], "2": ["py", "function", "Python function"]}, "titleterms": {"intel": [0, 1, 5, 6, 15, 30, 31, 32, 33], "extens": [0, 1, 5, 7, 15, 20, 26, 32], "pytorch": [0, 1, 5, 15, 18, 32], "cpu": [0, 2, 17, 18, 33], "isa": [0, 7, 17], "dynam": [0, 6, 7, 15, 17, 26], "dispatch": [0, 7, 17], "design": [0, 17, 20, 31], "doc": 0, "architectur": 1, "support": [1, 8, 10], "api": [2, 7, 9, 13, 16, 17, 18, 22, 25, 28, 29], "document": [2, 5, 25, 32, 33], "gener": [2, 26], "llm": [2, 6, 7, 23, 28, 30], "modul": [2, 10, 20, 28], "level": [2, 17, 28], "optim": [2, 7, 10, 13, 15, 19, 28, 29], "prototyp": [2, 6, 7, 10, 11, 12, 14, 16, 22, 28], "fast": [2, 6, 7, 11], "bert": [2, 6, 7, 11, 32], "graph": [2, 7, 12, 13, 28], "quantiz": [2, 6, 7, 15, 16, 29], "runtim": [2, 7, 20, 26], "blog": 3, "public": 3, "cheat": 4, "sheet": 4, "contribut": 5, "develop": 5, "tip": 5, "debug": [5, 17], "unit": 5, "test": 5, "python": [5, 6, 7], "better": 5, "local": 5, "pytest": 5, "lint": 5, "c": [5, 6, 18], "write": [5, 18], "build": [5, 17], "exampl": [6, 10, 11, 12, 14, 16, 17, 20, 31], "train": [6, 8], "singl": [6, 28, 31], "instanc": [6, 28, 30, 31], "float32": [6, 8], "bfloat16": [6, 8, 21, 26, 30], "distribut": [6, 28, 29], "infer": [6, 8, 28, 29, 31, 32], "eager": [6, 8], "mode": [6, 28, 31], "resnet50": [6, 32], "torchscript": [6, 8], "torchdynamo": [6, 26], "beta": [6, 7], "new": [6, 7, 34], "featur": [6, 7, 11, 12, 17], "from": [6, 7], "2": [6, 7, 14, 32, 34], "0": [6, 7, 34], "int8": [6, 7, 13, 16, 26, 30, 32], "static": [6, 15], "calibr": [6, 15], "deploy": 6, "larg": [6, 7, 28], "languag": [6, 7, 28], "model": [6, 7, 13, 15, 18, 20, 28, 32], "fp32": [6, 10, 13, 29, 30], "bf16": [6, 10, 13, 29], "smooth": [6, 16, 22], "weight": [6, 29], "onli": [6, 29], "int4": 6, "ai": [6, 30], "refer": [6, 8], "easi": 7, "us": [7, 8, 9, 10, 13, 16, 20, 31], "1": [7, 14, 32, 34], "torch": 7, "compil": [7, 17], "auto": [7, 8, 9, 16, 20], "channel": [7, 9, 18, 33], "last": [7, 9, 18, 33], "mix": [7, 8], "precis": [7, 8, 28], "amp": [7, 8], "oper": [7, 18, 19, 28], "codeless": [7, 10], "13": [7, 34], "captur": [7, 12], "hypertun": [7, 14], "introduct": [8, 19, 25], "case": [8, 10, 20], "default": [8, 9, 14, 18, 31], "path": 8, "autocast": 8, "op": 8, "elig": 8, "specif": [8, 17], "behavior": 8, "can": 8, "promot": 8, "widest": 8, "input": [8, 20], "type": [8, 28], "eas": [9, 13], "enabl": 9, "disabl": 9, "known": [9, 20, 34], "issu": [9, 20, 34], "motiv": 10, "usag": [10, 11, 12, 14, 16, 20, 26, 29, 31], "huggingfac": 10, "The": 10, "origin": 10, "command": 10, "ipex": [10, 28], "launch": [10, 31], "appli": 10, "forward": 10, "method": 10, "explicitli": 10, "instead": 10, "__call__": 10, "attr": 10, "alreadi": 10, "jit": 10, "trace": 10, "descript": [11, 12], "prerequisit": 11, "methodologi": [13, 28], "fusion": [13, 19], "pattern": 13, "fold": 13, "your_conf_fil": 14, "hyperparamet": 14, "launcher": [14, 32], "defin": [14, 15], "search": 14, "space": 14, "tune": [14, 16, 22, 33], "user": 14, "your_python_script": 14, "qconfig": 15, "prepar": 15, "do": 15, "convert": 15, "deploi": [15, 32], "recip": [16, 20, 22], "autotun": 16, "algorithm": 16, "alpha": [16, 34], "fix": 16, "determin": 16, "through": 16, "overview": [17, 28, 30, 31, 33], "requir": [17, 20], "code": 17, "folder": 17, "struct": 17, 
"kernel": [17, 18], "implement": [17, 20], "csrc": 17, "aten": [17, 18], "xyzkrnl": 17, "cpp": 17, "stub": 17, "xyz": 17, "h": 17, "dyndisp": 17, "dispatchstub": 17, "codegen": 17, "process": 17, "add": 17, "custom": [17, 28], "intrin": 17, "vec": 17, "privat": 17, "select": 17, "manual": 17, "check": 17, "what": [18, 34], "i": [18, 20, 31], "memori": [18, 31, 33], "format": 18, "all": [18, 31], "That": 18, "matter": 18, "nchw": 18, "b": 18, "nhwc": 18, "wip": 18, "block": 18, "nchw16c": 18, "stride": 18, "layout": 18, "tensor": 18, "creation": 18, "convers": 18, "d": 18, "coverag": 18, "statu": 18, "regist": [18, 32], "nativ": 18, "manner": 18, "onednn": [18, 33], "creat": [18, 32], "convolut": 18, "primit": [18, 33], "target": 18, "multistream": 20, "examples1": 20, "basic": 20, "examples2": 20, "set": 20, "examples3": 20, "structur": [20, 33], "output": 20, "perform": [20, 26, 30, 32, 33, 34], "asynchron": 20, "task": 20, "configur": [20, 30, 33], "core": [20, 31, 32], "bind": 20, "detail": 20, "how": 20, "iomp": 20, "preload": 20, "load": 20, "dure": 20, "split": 21, "sgd": 21, "stochast": 21, "gradient": 21, "descent": 21, "quant": 22, "quick": 23, "start": [23, 25, 32], "instal": [24, 32], "get": 25, "troubleshoot": 26, "regress": 26, "shape": 26, "result": [26, 34], "correct": 26, "licens": 27, "list": 28, "verifi": 28, "via": 28, "deepspe": [28, 29], "demo": 28, "linear": 28, "low": 28, "data": [28, 30], "indirect": 28, "access": [28, 33], "kv": 28, "cach": [28, 33], "transform": 29, "frontend": 29, "pseudocod": 29, "common": 29, "scenario": 29, "smoothquant": 29, "woq": 29, "center": 30, "product": 30, "v1": 30, "11": [30, 34], "number": [30, 31, 33], "accuraci": 30, "softwar": [30, 33], "version": 30, "hardwar": [30, 33], "200": [30, 34], "an": 30, "aw": 30, "ec2": 30, "c6i": 30, "2xlarg": 30, "10": [30, 34], "script": 31, "guid": [31, 33], "physic": 31, "ii": 31, "includ": 31, "logic": 31, "iii": 31, "node": 31, "iv": 31, "your": 31, "multipl": 31, "v": 31, "throughput": 31, "vi": 31, "latenc": 31, "vii": 31, "viii": 31, "index": 31, "jemalloc": [31, 33], "tcmalloc": [31, 33], "alloc": [31, 33], "openmp": [31, 33], "librari": 31, "gnu": [31, 33], "torchserv": 32, "content": [32, 33], "thi": [32, 33], "serv": 32, "pin": 32, "boost": 32, "multi": 32, "worker": 32, "scale": 32, "export": 32, "serial": 32, "file": 32, "archiv": 32, "3": [32, 34], "4": 32, "benchmark": 32, "non": 33, "uniform": 33, "numa": 33, "numactl": 33, "omp_num_thread": 33, "omp_thread_limit": 33, "denorm": 33, "releas": 34, "highlight": 34, "100": 34, "12": 34, "300": 34, "": 34, "chang": 34, "9": 34, "8": 34, "improv": 34, "other": 34, "note": 34}, "envversion": {"sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 58}, "alltitles": {"Intel\u00ae Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc": [[0, "intel-extension-for-pytorch-cpu-isa-dynamic-dispatch-design-doc"]], "Intel\u00ae Extension for PyTorch*": [[1, "intel-extension-for-pytorch"]], "Architecture": [[1, "architecture"]], "Support": [[1, "support"]], "API Documentation": [[2, "api-documentation"], [25, "api-documentation"]], "General": [[2, "general"]], "LLM Module Level Optimizations (Prototype)": [[2, "llm-module-level-optimizations-prototype"]], "Fast Bert (Prototype)": [[2, "fast-bert-prototype"], [6, 
"fast-bert-prototype"]], "Graph Optimization": [[2, "graph-optimization"], [7, "graph-optimization"], [13, "graph-optimization"], [28, "graph-optimization"]], "Quantization": [[2, "module-intel_extension_for_pytorch.quantization"]], "CPU Runtime": [[2, "module-intel_extension_for_pytorch.cpu.runtime"]], "Blogs & Publications": [[3, "blogs-publications"]], "Cheat Sheet": [[4, "cheat-sheet"]], "Contribution": [[5, "contribution"]], "Contributing to Intel\u00ae Extension for PyTorch*": [[5, "contributing-to-intel-extension-for-pytorch"]], "Developing Intel\u00ae Extension for PyTorch*": [[5, "developing-intel-extension-for-pytorch"]], "Tips and Debugging": [[5, "tips-and-debugging"]], "Unit testing": [[5, "unit-testing"]], "Python Unit Testing": [[5, "python-unit-testing"]], "Better local unit tests with pytest": [[5, "better-local-unit-tests-with-pytest"]], "Local linting": [[5, "local-linting"]], "C++ Unit Testing": [[5, "c-unit-testing"]], "Writing documentation": [[5, "writing-documentation"]], "Building documentation": [[5, "building-documentation"]], "Tips": [[5, "tips"]], "Examples": [[6, "examples"]], "Python": [[6, "python"]], "Training": [[6, "training"]], "Single-instance Training": [[6, "single-instance-training"]], "Float32": [[6, "float32"], [6, "id1"]], "BFloat16": [[6, "bfloat16"], [6, "id6"], [21, "bfloat16"], [26, "bfloat16"]], "Distributed Training": [[6, "distributed-training"]], "Inference": [[6, "inference"]], "Eager Mode": [[6, "eager-mode"], [6, "id7"]], "Resnet50": [[6, "resnet50"], [6, "id2"], [6, "id4"], [6, "id8"], [6, "id11"], [6, "id14"]], "BERT": [[6, "bert"], [6, "id3"], [6, "id5"], [6, "id9"], [6, "id12"], [6, "id15"], [32, "bert"]], "TorchScript Mode": [[6, "torchscript-mode"], [6, "id10"]], "TorchDynamo Mode (Beta, NEW feature from 2.0.0)": [[6, "torchdynamo-mode-beta-new-feature-from-2-0-0"], [6, "id13"]], "INT8": [[6, "int8"], [26, "int8"]], "Static Quantization": [[6, "static-quantization"], [15, "static-quantization"]], "Calibration": [[6, "calibration"]], "Deployment": [[6, "deployment"]], "Dynamic Quantization": [[6, "dynamic-quantization"], [15, "dynamic-quantization"]], "Large Language Model (LLM)": [[6, "large-language-model-llm"]], "FP32/BF16": [[6, "fp32-bf16"], [29, "fp32-bf16"]], "Smooth Quantization INT8": [[6, "smooth-quantization-int8"]], "Weight Only Quantization INT8/INT4": [[6, "weight-only-quantization-int8-int4"]], "C++": [[6, "c"]], "Intel\u00ae AI Reference Models": [[6, "intel-ai-reference-models"]], "Features": [[7, "features"]], "Easy-to-use Python API": [[7, "easy-to-use-python-api"]], "Large Language Models (LLM, NEW feature from 2.1.0)": [[7, "large-language-models-llm-new-feature-from-2-1-0"]], "torch.compile (Beta, NEW feature from 2.0.0)": [[7, "torch-compile-beta-new-feature-from-2-0-0"]], "ISA Dynamic Dispatching": [[7, "isa-dynamic-dispatching"], [17, "isa-dynamic-dispatching"]], "Auto Channels Last": [[7, "auto-channels-last"], [9, "auto-channels-last"]], "Auto Mixed Precision (AMP)": [[7, "auto-mixed-precision-amp"], [8, "auto-mixed-precision-amp"]], "Operator Optimization": [[7, "operator-optimization"]], "Optimizer Optimization": [[7, "optimizer-optimization"]], "Runtime Extension": [[7, "runtime-extension"], [20, "runtime-extension"], [26, "runtime-extension"]], "INT8 Quantization": [[7, "int8-quantization"]], "Codeless Optimization (Prototype, NEW feature from 1.13.0)": [[7, "codeless-optimization-prototype-new-feature-from-1-13-0"]], "Graph Capture (Prototype, NEW feature from 1.13.0)": [[7, 
"graph-capture-prototype-new-feature-from-1-13-0"]], "HyperTune (Prototype, NEW feature from 1.13.0)": [[7, "hypertune-prototype-new-feature-from-1-13-0"]], "Fast BERT Optimization (Prototype, NEW feature from 2.0.0)": [[7, "fast-bert-optimization-prototype-new-feature-from-2-0-0"]], "Introduction": [[8, "introduction"], [19, "introduction"], [25, "introduction"]], "Use Case": [[8, "use-case"]], "Default Precision": [[8, "default-precision"]], "Inference with Eager Path": [[8, "inference-with-eager-path"]], "Inference with TorchScript Path": [[8, "inference-with-torchscript-path"]], "Training Support": [[8, "training-support"]], "Autocast Op Reference": [[8, "autocast-op-reference"]], "Op Eligibility": [[8, "op-eligibility"]], "Op-Specific Behavior": [[8, "op-specific-behavior"]], "Ops that can autocast to bfloat16": [[8, "ops-that-can-autocast-to-bfloat16"]], "Ops that can autocast to float32": [[8, "ops-that-can-autocast-to-float32"]], "Ops that promote to the widest input type": [[8, "ops-that-promote-to-the-widest-input-type"]], "Ease-of-use auto channels last API": [[9, "ease-of-use-auto-channels-last-api"]], "default": [[9, "default"]], "enable": [[9, "enable"]], "disable": [[9, "disable"]], "Known issue": [[9, "known-issue"], [34, "known-issue"], [34, "id43"]], "Codeless Optimization (Prototype)": [[10, "codeless-optimization-prototype"]], "Motivation": [[10, "motivation"]], "Example Usage with HuggingFace": [[10, "example-usage-with-huggingface"]], "The origin command with ipex launch": [[10, "the-origin-command-with-ipex-launch"]], "Command to apply ipex optimization for FP32": [[10, "command-to-apply-ipex-optimization-for-fp32"]], "Command to apply ipex optimization for BF16": [[10, "command-to-apply-ipex-optimization-for-bf16"]], "Use Case not supported": [[10, "use-case-not-supported"]], "Module uses forward method explicitly instead of the __call__ attr": [[10, "module-uses-forward-method-explicitly-instead-of-the-call-attr"]], "Already using ipex.optimize": [[10, "already-using-ipex-optimize"]], "Already using Jit Trace": [[10, "already-using-jit-trace"]], "Fast BERT (Prototype)": [[11, "fast-bert-prototype"]], "Feature Description": [[11, "feature-description"], [12, "feature-description"]], "Prerequisite": [[11, "prerequisite"]], "Usage Example": [[11, "usage-example"], [12, "usage-example"], [16, "usage-example"]], "Graph Capture (Prototype)": [[12, "graph-capture-prototype"]], "Ease-of-use graph optimization API": [[13, "ease-of-use-graph-optimization-api"]], "FP32 and BF16 models": [[13, "fp32-and-bf16-models"]], "INT8 models": [[13, "int8-models"]], "Methodology": [[13, "methodology"]], "Fusion": [[13, "fusion"]], "FP32 and BF16 fusion patterns": [[13, "fp32-and-bf16-fusion-patterns"]], "INT8 fusion patterns": [[13, "int8-fusion-patterns"]], "Folding": [[13, "folding"]], "HyperTune (Prototype)": [[14, "hypertune-prototype"]], "Usage of Hypertune": [[14, "usage-of-hypertune"]], "your_conf_file": [[14, "your-conf-file"]], "Hyperparameters": [[14, "hyperparameters"]], "Launcher Hyperparameters": [[14, "launcher-hyperparameters"]], "Defining hyperparameters and their search spaces": [[14, "defining-hyperparameters-and-their-search-spaces"]], "1. Defining hyperparameters to tune:": [[14, "defining-hyperparameters-to-tune"]], "2. 
Defining the search spaces of the hyperparameters:": [[14, "defining-the-search-spaces-of-the-hyperparameters"]], "Default search space": [[14, "default-search-space"]], "User defined search space": [[14, "user-defined-search-space"]], "": [[14, "your-python-script"]], "Usage Examples": [[14, "usage-examples"], [31, "usage-examples"]], "Intel\u00ae Extension for PyTorch* optimizations for quantization": [[15, "intel-extension-for-pytorch-optimizations-for-quantization"]], "Define qconfig": [[15, "define-qconfig"]], "Prepare Model and Do Calibration": [[15, "prepare-model-and-do-calibration"]], "Convert to Static Quantized Model and Deploy": [[15, "convert-to-static-quantized-model-and-deploy"]], "Define QConfig": [[15, "id1"]], "Prepare Model": [[15, "prepare-model"]], "Convert to Dynamic Quantized Model and Deploy": [[15, "convert-to-dynamic-quantized-model-and-deploy"]], "INT8 Recipe Tuning API (Prototype)": [[16, "int8-recipe-tuning-api-prototype"]], "Smooth Quantization Autotune": [[16, "smooth-quantization-autotune"]], "Algorithm: Auto-tuning of $\\alpha$.": [[16, "algorithm-auto-tuning-of-alpha"]], "$\\alpha$ Usage": [[16, "alpha-usage"]], "Using a fixed alpha": [[16, "using-a-fixed-alpha"]], "Determining the alpha through auto-tuning": [[16, "determining-the-alpha-through-auto-tuning"]], "Overview": [[17, "overview"], [30, "overview"], [31, "overview"], [33, "overview"]], "CPU ISA build compiler requirement": [[17, "cpu-isa-build-compiler-requirement"]], "Dynamic Dispatch Design": [[17, "dynamic-dispatch-design"]], "Code Folder Struct": [[17, "code-folder-struct"]], "Kernel implementation: csrc/cpu/aten/kernels/xyzKrnl.cpp": [[17, "kernel-implementation-csrc-cpu-aten-kernels-xyzkrnl-cpp"]], "Kernel Stub: csrc/cpu/aten/xyz.cpp and csrc/cpu/aten/xyz.h": [[17, "kernel-stub-csrc-cpu-aten-xyz-cpp-and-csrc-cpu-aten-xyz-h"]], "Dispatch Stub implementation: csrc/cpu/dyndisp/DispatchStub.cpp and csrc/cpu/dyndisp/DispatchStub.h": [[17, "dispatch-stub-implementation-csrc-cpu-dyndisp-dispatchstub-cpp-and-csrc-cpu-dyndisp-dispatchstub-h"]], "CodeGen Process": [[17, "codegen-process"]], "Add Custom Kernel": [[17, "add-custom-kernel"]], "ISA intrinics specific kernel example:": [[17, "isa-intrinics-specific-kernel-example"]], "Vec specific kernel example:": [[17, "vec-specific-kernel-example"]], "Private Debug APIs": [[17, "private-debug-apis"]], "Example:": [[17, "example"], [17, "id1"]], "Select ISA level manually.": [[17, "select-isa-level-manually"]], "CPU feature check": [[17, "cpu-feature-check"]], "Channels Last": [[18, "channels-last"], [33, "channels-last"]], "What is Channels Last": [[18, "what-is-channels-last"]], "Memory Format Is All That Matters": [[18, "memory-format-is-all-that-matters"]], "a. NCHW (default)": [[18, "a-nchw-default"]], "b. NHWC (WIP for CPU)": [[18, "b-nhwc-wip-for-cpu"]], "c. Blocked (nChw16c)": [[18, "c-blocked-nchw16c"]], "PyTorch Strided Layout": [[18, "pytorch-strided-layout"]], "PyTorch Channels Last Memory Format APIs": [[18, "pytorch-channels-last-memory-format-apis"]], "a. tensor creation": [[18, "a-tensor-creation"]], "b. tensor conversion": [[18, "b-tensor-conversion"]], "c. model conversion": [[18, "c-model-conversion"]], "d. operator coverage": [[18, "d-operator-coverage"]], "Writing Channels Last Kernels": [[18, "writing-channels-last-kernels"]], "a. Status on CPU": [[18, "a-status-on-cpu"]], "b. Register Channels Last Kernel in ATen Native Manner": [[18, "b-register-channels-last-kernel-in-aten-native-manner"]], "c. 
Register oneDNN Kernel on Channels Last": [[18, "c-register-onednn-kernel-on-channels-last"]], "oneDNN NHWC APIs": [[18, "onednn-nhwc-apis"]], "a. Create NHWC Memory": [[18, "a-create-nhwc-memory"]], "b. Create Convolution Primitive": [[18, "b-create-convolution-primitive"]], "CPU Channels Last Targets": [[18, "cpu-channels-last-targets"]], "Optimizer Fusion": [[19, "optimizer-fusion"]], "Operation Fusion": [[19, "operation-fusion"]], "Requirements": [[20, "requirements"]], "Use Cases": [[20, "use-cases"]], "Example of MultiStream Module": [[20, "example-of-multistream-module"]], "Examples1: Basic Usage": [[20, "examples1-basic-usage"]], "Examples2: Usage with \u201cAUTO\u201d setting": [[20, "examples2-usage-with-auto-setting"]], "Examples3: Usage for models with structure inputs/outputs": [[20, "examples3-usage-for-models-with-structure-inputs-outputs"]], "Performance recipes": [[20, "performance-recipes"]], "Known issues": [[20, "known-issues"], [34, "id37"]], "Example of asynchronous task": [[20, "example-of-asynchronous-task"]], "Example of configuring core binding": [[20, "example-of-configuring-core-binding"]], "Detail Design": [[20, "detail-design"]], "How the core binding is implemented": [[20, "how-the-core-binding-is-implemented"]], "Design of Task": [[20, "design-of-task"]], "IOMP preload or load during the runtime": [[20, "iomp-preload-or-load-during-the-runtime"]], "Split SGD": [[21, "split-sgd"], [21, "id2"]], "Stochastic Gradient Descent (SGD)": [[21, "stochastic-gradient-descent-sgd"]], "Smooth Quant Recipe Tuning API (Prototype)": [[22, "smooth-quant-recipe-tuning-api-prototype"]], "Quick Start": [[23, "quick-start"]], "LLM Quick Start": [[23, "llm-quick-start"]], "Installation": [[24, "installation"]], "Get Started": [[25, "get-started"]], "Troubleshooting": [[26, "troubleshooting"]], "General Usage": [[26, "general-usage"]], "Performance Regression": [[26, "performance-regression"]], "TorchDynamo": [[26, "torchdynamo"]], "Dynamic Shape": [[26, "dynamic-shape"]], "Result Correctness": [[26, "result-correctness"]], "License": [[27, "license"]], "Large Language Models (LLM) Optimization Overview": [[28, "large-language-models-llm-optimization-overview"]], "ipex.llm Optimized Model List": [[28, "ipex-llm-optimized-model-list"]], "Verified for single instance mode": [[28, "verified-for-single-instance-mode"]], "Verified for distributed inference mode via DeepSpeed": [[28, "verified-for-distributed-inference-mode-via-deepspeed"]], "Module Level Optimization API for customized LLM (Prototype)": [[28, "module-level-optimization-api-for-customized-llm-prototype"]], "Demos": [[28, "demos"]], "Optimization Methodologies": [[28, "optimization-methodologies"]], "Linear Operator Optimization": [[28, "linear-operator-optimization"]], "Low Precision Data Types": [[28, "low-precision-data-types"]], "Indirect Access KV Cache": [[28, "indirect-access-kv-cache"]], "Distributed Inference": [[28, "distributed-inference"]], "Transformers Optimization Frontend API": [[29, "transformers-optimization-frontend-api"]], "Pseudocode of Common Usage Scenarios": [[29, "pseudocode-of-common-usage-scenarios"]], "SmoothQuant": [[29, "smoothquant"]], "Weight Only Quantization (WOQ)": [[29, "weight-only-quantization-woq"]], "Distributed Inference with DeepSpeed": [[29, "distributed-inference-with-deepspeed"]], "Performance": [[30, "performance"], [34, "performance"]], "Performance Data for Intel\u00ae AI Data Center Products": [[30, "performance-data-for-intel-ai-data-center-products"]], "LLM Performance": 
[[30, "llm-performance"]], "INT8 with v1.11": [[30, "int8-with-v1-11"]], "Performance Numbers": [[30, "performance-numbers"], [30, "id1"], [30, "id4"]], "Accuracy": [[30, "accuracy"]], "Configuration": [[30, "configuration"], [30, "id2"], [30, "id5"]], "Software Version": [[30, "software-version"], [30, "id3"], [30, "id6"]], "Hardware Configuration": [[30, "hardware-configuration"], [30, "id7"], [33, "hardware-configuration"]], "FP32 with v1.11.200 on an AWS EC2 C6i.2xlarge instance": [[30, "fp32-with-v1-11-200-on-an-aws-ec2-c6i-2xlarge-instance"]], "FP32 and BFloat16 with v1.10": [[30, "fp32-and-bfloat16-with-v1-10"]], "Launch Script Usage Guide": [[31, "launch-script-usage-guide"]], "Usage of launch script": [[31, "usage-of-launch-script"]], "Single instance for inference": [[31, "single-instance-for-inference"]], "I. Use all physical cores": [[31, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[31, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on designated nodes": [[31, "iii-use-physical-cores-on-designated-nodes"]], "IV. Use your designated number of cores": [[31, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[31, "multiple-instances-for-inference"]], "V. Throughput mode": [[31, "v-throughput-mode"]], "VI. Latency mode": [[31, "vi-latency-mode"]], "VII. Your designated number of instances": [[31, "vii-your-designated-number-of-instances"]], "VIII. Your designated number of instances and instance index": [[31, "viii-your-designated-number-of-instances-and-instance-index"]], "Usage of Jemalloc/TCMalloc/Default memory allocator": [[31, "usage-of-jemalloc-tcmalloc-default-memory-allocator"]], "Jemalloc": [[31, "jemalloc"], [33, "jemalloc"]], "TCMalloc": [[31, "tcmalloc"], [33, "tcmalloc"]], "Default memory allocator": [[31, "default-memory-allocator"]], "Usage of OpenMP library": [[31, "usage-of-openmp-library"]], "Intel OpenMP Library": [[31, "intel-openmp-library"]], "GNU OpenMP Library": [[31, "gnu-openmp-library"]], "TorchServe with Intel\u00ae Extension for PyTorch*": [[32, "torchserve-with-intel-extension-for-pytorch"]], "Contents of this Document": [[32, "contents-of-this-document"], [33, "contents-of-this-document"]], "Install Intel\u00ae Extension for PyTorch*": [[32, "install-intel-extension-for-pytorch"]], "Serving model with Intel\u00ae Extension for PyTorch*": [[32, "serving-model-with-intel-extension-for-pytorch"]], "TorchServe with Launcher": [[32, "torchserve-with-launcher"]], "Launcher Core Pinning to Boost Performance of TorchServe Multi Worker Inference": [[32, "launcher-core-pinning-to-boost-performance-of-torchserve-multi-worker-inference"]], "Scaling workers": [[32, "scaling-workers"]], "Creating and Exporting INT8 model for Intel\u00ae Extension for PyTorch*": [[32, "creating-and-exporting-int8-model-for-intel-extension-for-pytorch"]], "1. Creating a serialized file": [[32, "creating-a-serialized-file"]], "ResNet50": [[32, "resnet50"]], "2. Creating a Model Archive": [[32, "creating-a-model-archive"]], "3. Start TorchServe to serve the model": [[32, "start-torchserve-to-serve-the-model"]], "4. 
Registering and Deploying model": [[32, "registering-and-deploying-model"]], "Benchmarking with Launcher": [[32, "benchmarking-with-launcher"]], "Benchmarking with Launcher Core Pinning": [[32, "benchmarking-with-launcher-core-pinning"]], "Performance Boost with Intel\u00ae Extension for PyTorch* and Launcher": [[32, "performance-boost-with-intel-extension-for-pytorch-and-launcher"]], "Performance Tuning Guide": [[33, "performance-tuning-guide"]], "Intel CPU Structure": [[33, "intel-cpu-structure"]], "Non-Uniform Memory Access (NUMA)": [[33, "non-uniform-memory-access-numa"]], "Software Configuration": [[33, "software-configuration"]], "Numactl": [[33, "numactl"]], "OpenMP": [[33, "openmp"]], "OMP_NUM_THREADS": [[33, "omp-num-threads"]], "OMP_THREAD_LIMIT": [[33, "omp-thread-limit"]], "GNU OpenMP": [[33, "gnu-openmp"]], "Intel OpenMP": [[33, "intel-openmp"]], "Memory Allocator": [[33, "memory-allocator"]], "Denormal Number": [[33, "denormal-number"]], "OneDNN primitive cache": [[33, "onednn-primitive-cache"]], "Releases": [[34, "releases"]], "2.3.0": [[34, "id1"]], "Highlights": [[34, "highlights"], [34, "id3"], [34, "id5"], [34, "id7"], [34, "id9"], [34, "id11"], [34, "id13"], [34, "id15"], [34, "id18"], [34, "id21"], [34, "id24"], [34, "id26"], [34, "id29"]], "2.2.0": [[34, "id2"]], "2.1.100": [[34, "id4"]], "2.1.0": [[34, "id6"]], "2.0.100": [[34, "id8"]], "2.0.0": [[34, "id10"]], "Known Issues": [[34, "known-issues"], [34, "id16"], [34, "id22"], [34, "id30"]], "1.13.100": [[34, "id12"]], "1.13.0": [[34, "id14"]], "1.12.300": [[34, "id17"]], "1.12.100": [[34, "id19"]], "1.12.0": [[34, "id20"]], "1.11.200": [[34, "id23"]], "1.11.0": [[34, "id25"]], "What\u2019s Changed": [[34, "what-s-changed"], [34, "id31"]], "1.10.100": [[34, "id27"]], "1.10.0": [[34, "id28"]], "1.9.0": [[34, "id32"]], "What\u2019s New": [[34, "what-s-new"], [34, "id34"], [34, "id36"], [34, "id39"], [34, "id42"]], "1.8.0": [[34, "id33"]], "1.2.0": [[34, "id35"]], "Performance Improvement": [[34, "performance-improvement"]], "Others": [[34, "others"]], "1.1.0": [[34, "id38"]], "1.0.2": [[34, "id40"]], "1.0.1-Alpha": [[34, "alpha"]], "1.0.0-Alpha": [[34, "id41"]], "Performance Result": [[34, "performance-result"]], "NOTE": [[34, "note"]]}, "indexentries": {"cpupool (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.CPUPool"]], "fastlayernorm (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.FastLayerNorm"]], "indirectaccesskvcacheattention (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.IndirectAccessKVCacheAttention"]], "linear2silumul (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.Linear2SiluMul"]], "linearadd (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearAdd"]], "linearaddadd (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearAddAdd"]], "lineargelu (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearGelu"]], "linearmul (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearMul"]], "linearnewgelu (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearNewGelu"]], "linearrelu (class in intel_extension_for_pytorch.llm.modules)": [[2, 
"intel_extension_for_pytorch.llm.modules.LinearRelu"]], "linearsilu (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearSilu"]], "linearsilumul (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearSiluMul"]], "multistreammodule (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.MultiStreamModule"]], "multistreammodulehint (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.MultiStreamModuleHint"]], "pagedattention (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.PagedAttention"]], "rmsnorm (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.RMSNorm"]], "rotaryembedding (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.RotaryEmbedding"]], "task (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.Task"]], "varlenattention (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.VarlenAttention"]], "autotune() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.autotune"]], "convert() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.convert"]], "enable_onednn_fusion() (in module intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.enable_onednn_fusion"]], "fast_bert() (in module intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.fast_bert"]], "fast_layer_norm() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.fast_layer_norm"]], "get_core_list_of_node_id() (in module intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.get_core_list_of_node_id"]], "get_smooth_quant_qconfig_mapping() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.get_smooth_quant_qconfig_mapping"]], "indirect_access_kv_cache_attention() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.indirect_access_kv_cache_attention"]], "intel_extension_for_pytorch": [[2, "module-intel_extension_for_pytorch"]], "intel_extension_for_pytorch.cpu.runtime": [[2, "module-intel_extension_for_pytorch.cpu.runtime"]], "intel_extension_for_pytorch.llm": [[2, "module-intel_extension_for_pytorch.llm"]], "intel_extension_for_pytorch.llm.functional": [[2, "module-intel_extension_for_pytorch.llm.functional"]], "intel_extension_for_pytorch.llm.modules": [[2, "module-intel_extension_for_pytorch.llm.modules"]], "intel_extension_for_pytorch.quantization": [[2, "module-intel_extension_for_pytorch.quantization"]], "is_runtime_ext_enabled() (in module intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.is_runtime_ext_enabled"]], "module": [[2, "module-intel_extension_for_pytorch"], [2, "module-intel_extension_for_pytorch.cpu.runtime"], [2, "module-intel_extension_for_pytorch.llm"], [2, "module-intel_extension_for_pytorch.llm.functional"], [2, "module-intel_extension_for_pytorch.llm.modules"], [2, "module-intel_extension_for_pytorch.quantization"]], "optimize() (in module intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.optimize"]], "optimize() (in module 
intel_extension_for_pytorch.llm)": [[2, "intel_extension_for_pytorch.llm.optimize"]], "pin (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.pin"]], "prepare() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.prepare"]], "rms_norm() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.rms_norm"]], "rotary_embedding() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.rotary_embedding"]], "varlen_attention() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.varlen_attention"]], "verbose (class in intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.verbose"]], "frozenbatchnorm2d (class in intel_extension_for_pytorch.nn)": [[7, "intel_extension_for_pytorch.nn.FrozenBatchNorm2d"]], "mergedembeddingbag (class in intel_extension_for_pytorch.nn.modules)": [[7, "intel_extension_for_pytorch.nn.modules.MergedEmbeddingBag"]], "mergedembeddingbagwithsgd (class in intel_extension_for_pytorch.nn.modules)": [[7, "intel_extension_for_pytorch.nn.modules.MergedEmbeddingBagWithSGD"]], "interaction() (in module intel_extension_for_pytorch.nn.functional)": [[7, "intel_extension_for_pytorch.nn.functional.interaction"]]}}) \ No newline at end of file +Search.setIndex({"docnames": ["design_doc/cpu/isa_dyndisp", "index", "tutorials/api_doc", "tutorials/blogs_publications", "tutorials/cheat_sheet", "tutorials/contribution", "tutorials/examples", "tutorials/features", "tutorials/features/amp", "tutorials/features/auto_channels_last", "tutorials/features/codeless_optimization", "tutorials/features/fast_bert", "tutorials/features/graph_capture", "tutorials/features/graph_optimization", "tutorials/features/hypertune", "tutorials/features/int8_overview", "tutorials/features/int8_recipe_tuning_api", "tutorials/features/isa_dynamic_dispatch", "tutorials/features/nhwc", "tutorials/features/optimizer_fusion", "tutorials/features/runtime_extension", "tutorials/features/split_sgd", "tutorials/features/sq_recipe_tuning_api", "tutorials/getting_started", "tutorials/installation", "tutorials/introduction", "tutorials/known_issues", "tutorials/license", "tutorials/llm", "tutorials/llm/llm_optimize", "tutorials/performance", "tutorials/performance_tuning/launch_script", "tutorials/performance_tuning/torchserve", "tutorials/performance_tuning/tuning_guide", "tutorials/releases"], "filenames": ["design_doc/cpu/isa_dyndisp.md", "index.rst", "tutorials/api_doc.rst", "tutorials/blogs_publications.md", "tutorials/cheat_sheet.md", "tutorials/contribution.md", "tutorials/examples.md", "tutorials/features.rst", "tutorials/features/amp.md", "tutorials/features/auto_channels_last.md", "tutorials/features/codeless_optimization.md", "tutorials/features/fast_bert.md", "tutorials/features/graph_capture.md", "tutorials/features/graph_optimization.md", "tutorials/features/hypertune.md", "tutorials/features/int8_overview.md", "tutorials/features/int8_recipe_tuning_api.md", "tutorials/features/isa_dynamic_dispatch.md", "tutorials/features/nhwc.md", "tutorials/features/optimizer_fusion.md", "tutorials/features/runtime_extension.md", "tutorials/features/split_sgd.rst", "tutorials/features/sq_recipe_tuning_api.md", "tutorials/getting_started.md", "tutorials/installation.md", "tutorials/introduction.rst", "tutorials/known_issues.md", "tutorials/license.md", "tutorials/llm.rst", 
"tutorials/llm/llm_optimize.md", "tutorials/performance.md", "tutorials/performance_tuning/launch_script.md", "tutorials/performance_tuning/torchserve.md", "tutorials/performance_tuning/tuning_guide.md", "tutorials/releases.md"], "titles": ["Intel\u00ae Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc", "Intel\u00ae Extension for PyTorch*", "API Documentation", "Blogs & Publications", "Cheat Sheet", "Contribution", "Examples", "Features", "Auto Mixed Precision (AMP)", "Auto Channels Last", "Codeless Optimization (Prototype)", "Fast BERT (Prototype)", "Graph Capture (Prototype)", "Graph Optimization", "HyperTune (Prototype)", "Intel\u00ae Extension for PyTorch* optimizations for quantization", "INT8 Recipe Tuning API (Prototype)", "ISA Dynamic Dispatching", "Channels Last", "Optimizer Fusion", "Runtime Extension", "Split SGD", "Smooth Quant Recipe Tuning API (Prototype)", "Quick Start", "Installation", "Introduction", "Troubleshooting", "License", "Large Language Models (LLM) Optimization Overview", "Transformers Optimization Frontend API", "Performance", "Launch Script Usage Guide", "TorchServe with Intel\u00ae Extension for PyTorch*", "Performance Tuning Guide", "Releases"], "terms": {"The": [0, 1, 2, 5, 6, 7, 8, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 31, 32, 33, 34], "document": [0, 7, 17, 20, 29, 34], "i": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 23, 26, 27, 28, 29, 30, 32, 33, 34], "redirect": 0, "thi": [0, 2, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 26, 27, 28, 29, 30, 31, 34], "link": [0, 1, 6, 17, 34], "now": [0, 2, 7, 15, 18, 32, 33, 34], "intel optim": 1, "intel\u00ae extension for pytorch*": 1, "gpu": [1, 3, 18, 34], "discrete gpu": 1, "intel discrete gpu": 1, "extend": [1, 18, 25, 33, 34], "latest": [1, 2, 25, 28, 30, 34], "perform": [1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 15, 16, 18, 19, 21, 25, 28, 29, 31], "optim": [1, 3, 4, 6, 8, 9, 11, 12, 14, 16, 18, 20, 21, 23, 25, 26, 31, 32, 33, 34], "hardwar": [1, 3, 17, 25, 28, 32, 34], "take": [1, 2, 7, 8, 10, 12, 13, 14, 18, 21, 25, 26, 30, 31, 33], "advantag": [1, 2, 7, 9, 12, 18, 21, 25, 30, 31, 33], "advanc": [1, 2, 6, 7, 16, 25, 28], "vector": [1, 2, 6, 17, 18, 25, 28], "512": [1, 6, 11, 16, 25, 28, 31], "avx": [1, 6, 17, 25, 28], "neural": [1, 3, 7, 16, 22, 25, 28, 33, 34], "network": [1, 3, 7, 8, 20, 25, 28, 33], "instruct": [1, 5, 6, 7, 8, 17, 21, 23, 24, 25, 28, 30, 33, 34], "vnni": [1, 15, 17, 25, 28], "matrix": [1, 6, 7, 25, 28], "amx": [1, 3, 6, 7, 17, 25, 28, 30], "cpu": [1, 3, 4, 5, 6, 7, 8, 10, 14, 15, 16, 19, 20, 23, 25, 26, 28, 30, 31, 32, 34], "well": [1, 2, 5, 6, 7, 11, 16, 20, 21, 24, 28, 32, 33, 34], "x": [1, 5, 6, 8, 10, 13, 15, 16, 17, 18, 20, 21, 23, 26, 34], "e": [1, 2, 6, 7, 8, 12, 16, 17, 18, 28, 31, 33, 34], "xmx": 1, "ai": [1, 2, 3, 7, 28], "engin": [1, 6, 18, 33], "discret": 1, "moreov": [1, 2, 28], "provid": [1, 2, 5, 6, 7, 8, 11, 12, 13, 14, 16, 20, 22, 24, 26, 28, 29, 31, 32, 33, 34], "easi": [1, 3, 21], "acceler": [1, 2, 3, 6, 7, 13, 28, 29, 30, 34], "through": [1, 2, 6, 7, 8, 12, 25, 28, 33, 34], "xpu": [1, 2, 3, 34], "devic": [1, 2, 15, 29, 31, 34], "In": [1, 2, 6, 7, 8, 12, 16, 17, 18, 19, 21, 23, 28, 31, 32, 33, 34], "current": [1, 2, 5, 7, 11, 13, 14, 15, 16, 17, 19, 20, 26, 28, 29, 34], "technolog": [1, 7, 28], "landscap": [1, 7, 28], "gener": [1, 5, 6, 7, 10, 12, 16, 17, 18, 21, 23, 28, 29, 30, 31, 32, 33, 34], "genai": [1, 7, 28], "workload": [1, 6, 7, 8, 10, 11, 12, 21, 26, 28, 29, 30, 31, 33, 34], "model": [1, 
2, 3, 4, 8, 9, 10, 11, 12, 14, 16, 23, 24, 25, 26, 29, 30, 33, 34], "have": [1, 2, 5, 6, 7, 9, 14, 17, 18, 20, 21, 23, 26, 27, 28, 30, 31, 32, 33, 34], "gain": [1, 7, 26, 28, 34], "widespread": [1, 7, 28], "attent": [1, 2, 7, 28, 34], "popular": [1, 7, 22, 28, 30, 34], "larg": [1, 2, 19, 23, 24, 25, 26, 29, 30, 33, 34], "languag": [1, 2, 23, 24, 25, 26, 29, 34], "llm": [1, 16, 22, 24, 25, 29, 34], "emerg": [1, 7, 28], "domin": [1, 7, 28], "drive": [1, 7, 28], "applic": [1, 2, 7, 20, 28, 32, 33], "start": [1, 3, 4, 5, 6, 7, 10, 20, 24, 34], "from": [1, 2, 3, 4, 5, 8, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 23, 25, 28, 29, 31, 32, 33, 34], "2": [1, 2, 3, 8, 10, 16, 17, 18, 20, 21, 25, 26, 27, 28, 29, 30, 31, 33], "1": [1, 2, 3, 4, 6, 8, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 31, 33], "0": [1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 30, 31, 32, 33], "specif": [1, 2, 5, 6, 7, 12, 18, 20, 26, 28, 31, 33, 34], "certain": [1, 7, 26, 28, 29, 31, 33], "ar": [1, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 31, 32, 33, 34], "introduc": [1, 3, 7, 15, 18, 21, 22, 31, 33, 34], "For": [1, 2, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 31, 32, 33, 34], "more": [1, 2, 5, 6, 7, 8, 10, 11, 13, 16, 17, 19, 20, 21, 23, 26, 28, 32, 33, 34], "inform": [1, 2, 6, 7, 14, 17, 18, 28, 31, 32, 33, 34], "refer": [1, 7, 9, 13, 14, 16, 17, 18, 20, 22, 23, 24, 25, 32, 34], "section": [1, 6, 7, 8, 14, 20, 23, 24, 25, 28, 29, 32, 33, 34], "can": [1, 2, 5, 6, 7, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 26, 28, 29, 30, 31, 32, 33, 34], "load": [1, 2, 6, 7, 13, 15, 16, 17, 23, 29, 32, 34], "python": [1, 2, 4, 10, 14, 17, 20, 26, 28, 29, 31, 32, 33, 34], "modul": [1, 6, 7, 8, 13, 16, 17, 26, 29, 31, 34], "program": [1, 5, 7, 11, 20, 31, 33, 34], "c": [1, 7, 8, 16, 17, 20, 26, 28, 31, 32, 33, 34], "librari": [1, 2, 5, 6, 7, 17, 20, 32, 33, 34], "script": [1, 2, 3, 4, 5, 6, 7, 8, 10, 14, 17, 20, 23, 24, 26, 28, 29, 30, 32, 33, 34], "user": [1, 2, 7, 9, 10, 12, 13, 15, 16, 18, 20, 26, 31, 32, 33, 34], "enabl": [1, 2, 3, 4, 6, 7, 8, 10, 13, 16, 18, 20, 22, 23, 26, 28, 31, 32, 33, 34], "dynam": [1, 4, 20, 28, 32, 33, 34], "import": [1, 2, 4, 5, 6, 7, 10, 11, 12, 13, 15, 16, 17, 18, 20, 21, 23, 25, 26, 28, 29, 32, 33, 34], "intel_extension_for_pytorch": [1, 2, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 20, 23, 25, 29, 32, 34], "featur": [1, 2, 3, 5, 8, 10, 13, 14, 18, 20, 23, 25, 26, 28, 30, 31, 32, 33, 34], "includ": [1, 2, 5, 6, 7, 10, 14, 15, 17, 23, 26, 27, 28, 30, 34], "onli": [1, 2, 5, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 20, 21, 26, 28, 31, 32, 34], "packag": [1, 2, 5, 6, 7, 10, 23, 25, 26, 32, 33, 34], "mai": [1, 2, 3, 5, 6, 7, 8, 9, 16, 17, 18, 20, 26, 28, 31, 32, 33, 34], "newer": [1, 28, 33], "code": [1, 2, 5, 6, 7, 10, 11, 12, 13, 18, 19, 21, 23, 24, 26, 27, 29, 33, 34], "base": [1, 2, 3, 4, 5, 6, 7, 10, 11, 17, 20, 21, 26, 28, 29, 30, 32, 33, 34], "due": [1, 8, 10, 17, 20, 26], "differ": [1, 2, 6, 7, 15, 16, 17, 18, 20, 28, 31, 32, 33, 34], "develop": [1, 3, 6, 28, 30, 33, 34], "schedul": [1, 2, 13, 20, 31, 33], "ha": [1, 2, 7, 10, 14, 17, 18, 20, 21, 26, 28, 30, 31, 33, 34], "been": [1, 6, 7, 10, 17, 18, 28, 31, 33, 34], "releas": [1, 17, 18, 26, 30, 33], "an": [1, 2, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 19, 20, 21, 26, 31, 32, 33, 34], "open": [1, 16, 28, 33], "sourc": [1, 5, 6, 17, 27, 28, 33, 34], "project": [1, 6], "github": [1, 2, 5, 6, 7, 8, 34], "you": [1, 2, 5, 6, 7, 8, 13, 
14, 15, 17, 18, 20, 23, 25, 26, 28, 29, 31, 33, 34], "find": [1, 2, 6, 7, 14, 16, 23, 26, 30, 31, 34], "how": [1, 2, 6, 10, 15, 17, 18, 23, 28, 32, 33, 34], "get": [1, 2, 3, 4, 6, 7, 10, 11, 15, 17, 20, 21, 22, 26, 28, 29, 30, 31, 33, 34], "main": [1, 2, 5, 6, 14, 20, 31, 32], "branch": [1, 7, 30], "quick": [1, 20, 24, 25], "about": [1, 2, 5, 7, 13, 16, 32, 33, 34], "product": [1, 2, 7, 14, 28, 34], "structur": [1, 18, 31], "shown": [1, 6, 18, 28, 31, 32], "follow": [1, 2, 4, 5, 6, 7, 8, 11, 14, 15, 16, 17, 18, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34], "figur": [1, 2, 21, 28, 33], "eager": [1, 7, 12, 23, 32, 34], "mode": [1, 2, 5, 7, 10, 12, 18, 20, 23, 26, 32, 34], "frontend": [1, 2, 7, 20, 28, 34], "custom": [1, 2, 7, 26, 34], "fusion": [1, 2, 7, 10, 21, 28, 34], "int8": [1, 2, 3, 4, 17, 18, 20, 22, 28, 29, 34], "quantiz": [1, 3, 4, 13, 22, 26, 28, 30, 32, 34], "api": [1, 3, 6, 10, 11, 15, 20, 26, 33, 34], "further": [1, 2, 5, 6, 7, 18, 20, 28, 33, 34], "improv": [1, 3, 7, 8, 13, 20, 22, 28, 30, 32, 33], "achiev": [1, 2, 6, 7, 28, 33, 34], "convert": [1, 2, 4, 6, 7, 8, 9, 10, 13, 16, 17, 18, 20, 23, 26, 32, 34], "graph": [1, 4, 8, 10, 16, 23, 26, 31, 34], "us": [1, 2, 3, 4, 5, 6, 11, 14, 15, 17, 18, 19, 21, 23, 24, 25, 26, 27, 28, 32, 33, 34], "pass": [1, 2, 5, 10, 17, 20, 26, 32, 34], "reduc": [1, 2, 7, 15, 19, 20, 21, 22, 26, 28, 33, 34], "oper": [1, 2, 6, 8, 13, 15, 21, 32, 33, 34], "kernel": [1, 2, 7, 20, 26, 28, 30, 33, 34], "invoc": [1, 7], "overhead": [1, 2, 7, 10, 19, 20, 26, 28, 33, 34], "result": [1, 2, 6, 10, 12, 14, 16, 18, 20, 21, 30, 31, 32, 33], "compar": [1, 2, 7, 13, 18, 21, 26, 28, 30, 31, 33, 34], "normal": [1, 2, 6, 7, 13, 20, 28, 33, 34], "yield": [1, 7, 33], "better": [1, 2, 6, 7, 15, 18, 20, 28, 31, 32, 33, 34], "techniqu": [1, 2, 7, 11, 12, 28, 34], "like": [1, 2, 3, 5, 6, 7, 8, 14, 18, 19, 21, 26, 28, 31, 33, 34], "amplifi": 1, "them": [1, 5, 7, 18, 19, 28, 31, 33], "comprehens": [1, 34], "both": [1, 2, 6, 7, 16, 18, 19, 21, 28, 29, 31, 32, 33, 34], "torchscript": [1, 2, 5, 7, 10, 11, 12, 19, 23, 26, 32, 34], "torchdynamo": [1, 7, 12, 23, 34], "With": [1, 2, 7, 10, 20, 31, 34], "we": [1, 2, 5, 6, 7, 8, 9, 10, 14, 15, 16, 17, 18, 19, 20, 21, 23, 28, 30, 32, 33, 34], "recommend": [1, 5, 6, 7, 9, 10, 15, 16, 20, 23, 30, 31, 33, 34], "torch": [1, 2, 4, 6, 8, 10, 11, 12, 13, 15, 16, 18, 20, 23, 26, 29, 32, 33, 34], "jit": [1, 2, 5, 6, 7, 8, 13, 15, 16, 18, 20, 23, 26, 32, 34], "trace": [1, 6, 7, 8, 12, 13, 15, 16, 20, 23, 26, 32, 34], "your": [1, 5, 6, 7, 8, 10, 14, 15, 20, 23, 24, 26, 27, 28, 29, 34], "prefer": [1, 7, 8, 15, 24], "option": [1, 2, 5, 7, 10, 14, 15, 16, 29, 31, 34], "wider": 1, "rang": [1, 6, 7, 15, 16, 19, 21, 26, 31, 32, 34], "ipex": [1, 2, 3, 4, 6, 7, 9, 11, 12, 13, 15, 16, 17, 19, 20, 23, 26, 29, 31, 32, 34], "backend": [1, 2, 3, 6, 7, 12, 13, 16, 17, 23, 26, 28, 31, 33, 34], "avail": [1, 2, 6, 7, 11, 17, 20, 22, 23, 29, 31, 33, 34], "good": [1, 2, 5, 7, 12, 18, 19, 28, 33, 34], "On": [1, 2, 7, 18, 28, 33], "automat": [1, 2, 6, 7, 9, 10, 12, 13, 15, 16, 18, 22, 28, 31, 32, 33, 34], "dispatch": [1, 34], "underli": [1, 17, 28], "detect": [1, 6, 12, 17, 26, 33, 34], "set": [1, 2, 4, 5, 6, 7, 8, 14, 15, 16, 17, 21, 24, 26, 28, 30, 31, 32, 33, 34], "isa": [1, 34], "leverag": [1, 7, 11, 28, 32, 34], "unit": [1, 2, 33], "runtim": [1, 8, 13, 17, 31, 33, 34], "offer": [1, 5, 33], "finer": [1, 7, 20], "grain": [1, 3, 7, 20], "thread": [1, 2, 7, 20, 26, 30, 31, 32, 33, 34], "control": [1, 2, 7, 20, 26, 31, 33, 34], "weight": [1, 2, 7, 10, 12, 13, 
15, 16, 18, 20, 22, 23, 26, 28, 34], "share": [1, 5, 6, 16, 20, 32, 33, 34], "increas": [1, 2, 3, 21, 26, 28, 30, 33, 34], "effici": [1, 7, 11, 19, 20, 28, 31, 33, 34], "implement": [1, 5, 7, 11, 19, 26, 28, 33, 34], "regist": [1, 7, 10, 16, 17, 34], "mechan": [1, 7, 17, 21, 34], "These": [1, 5, 6, 7, 8, 13, 28], "nativ": [1, 6, 7, 8, 17, 19, 21, 26, 28, 34], "calcul": [1, 2, 8, 16, 21, 22], "util": [1, 6, 7, 10, 13, 15, 16, 18, 21, 28, 31, 33, 34], "dpc": 1, "compil": [1, 5, 6, 23, 26, 33, 34], "sycl": 1, "standard": [1, 34], "also": [1, 2, 6, 7, 10, 13, 14, 16, 18, 19, 28, 30, 31, 33, 34], "number": [1, 2, 5, 6, 7, 14, 16, 19, 20, 21, 26, 32, 34], "which": [1, 2, 5, 7, 8, 10, 14, 15, 16, 17, 18, 20, 26, 28, 30, 31, 32, 33, 34], "found": [1, 6, 7, 14, 16, 18, 29, 31, 32, 33, 34], "doc": [1, 2, 5, 11, 29, 34], "directori": [1, 5, 6, 14, 29, 31, 32], "team": [1, 5], "track": 1, "bug": [1, 5, 34], "enhanc": [1, 3, 28, 34], "request": [1, 5, 20, 32], "issu": [1, 2, 5, 8, 21, 26, 33], "befor": [1, 2, 5, 6, 13, 14, 17, 18, 20, 31, 33, 34], "submit": [1, 5, 7, 20], "suggest": [1, 2, 15, 18, 20, 33, 34], "report": [1, 17], "search": [1, 2, 4, 5, 7, 16, 22, 28, 31], "exist": [1, 5, 7, 13, 26, 31, 33], "see": [1, 2, 5, 8, 14, 34], "alreadi": [1, 5, 6, 18, 28, 33], "pytorch": [2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 16, 17, 20, 23, 25, 26, 27, 28, 29, 30, 31, 33, 34], "dtype": [2, 4, 6, 7, 8, 10, 11, 13, 15, 16, 17, 23, 26, 29, 31, 34], "none": [2, 6, 29, 31], "o1": [2, 26, 34], "inplac": [2, 4, 6, 13, 15, 18, 23, 32], "fals": [2, 4, 6, 7, 8, 13, 14, 15, 16, 17, 20, 22, 23, 26, 31, 32, 34], "conv_bn_fold": [2, 26, 34], "linear_bn_fold": 2, "weights_prepack": [2, 6, 7, 23, 26], "replace_dropout_with_ident": 2, "optimize_lstm": 2, "split_master_weight_for_bf16": 2, "fuse_update_step": 2, "auto_kernel_select": [2, 7, 30], "sample_input": [2, 9, 34], "graph_mod": [2, 4, 7, 12, 34], "concat_linear": 2, "appli": [2, 6, 7, 8, 12, 13, 16, 18, 19, 21, 23, 26, 28, 29, 31, 34], "given": [2, 6, 13, 14, 16, 28], "nn": [2, 6, 7, 8, 10, 13, 15, 16, 18, 20, 26, 34], "If": [2, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 20, 26, 31, 32, 33, 34], "train": [2, 3, 4, 7, 11, 13, 15, 16, 18, 21, 23, 26, 28, 29, 31, 34], "otherwis": [2, 7, 20], "infer": [2, 3, 4, 7, 10, 11, 12, 15, 18, 20, 21, 23, 26, 30, 33, 34], "conv": [2, 8, 10, 13, 15, 20, 26, 34], "bn": [2, 10, 15, 26, 34], "fold": [2, 10, 15, 16, 26, 34], "prepack": [2, 6, 10, 18, 26, 28, 34], "so": [2, 5, 6, 7, 8, 15, 17, 18, 20, 30, 31, 32, 33, 34], "onednn": [2, 3, 13, 17, 26, 28, 34], "order": [2, 17, 18, 21, 31, 33, 34], "cach": [2, 5, 7, 19, 20, 30, 34], "reus": [2, 33], "memori": [2, 6, 7, 8, 9, 10, 13, 19, 20, 21, 26, 28, 30, 32, 34], "layout": [2, 26, 34], "call": [2, 6, 8, 13, 17, 18, 21, 26, 32, 33, 34], "block": [2, 5, 16, 20, 22, 28, 33, 34], "although": [2, 33], "itself": [2, 5, 18], "enough": [2, 7, 19], "usag": [2, 6, 7, 8, 23, 25, 32, 33, 34], "perspect": [2, 13, 18, 21, 28, 31, 33], "drawback": [2, 21], "run": [2, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 26, 30, 31, 32, 33, 34], "split": [2, 6, 7, 16, 17, 19, 20, 26, 34], "one": [2, 5, 7, 12, 13, 14, 16, 18, 19, 20, 26, 29, 31, 33, 34], "sever": [2, 7, 10, 19, 30, 31, 34], "dimens": [2, 18, 26], "data": [2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 23, 26, 31, 32, 34], "fix": [2, 5, 7, 34], "size": [2, 6, 7, 11, 15, 16, 17, 18, 23, 26, 28, 30, 32, 33, 34], "each": [2, 8, 14, 16, 17, 19, 20, 21, 31, 32, 33, 34], "time": [2, 5, 7, 14, 16, 17, 18, 19, 26, 28, 30, 33, 34], "execut": [2, 4, 6, 7, 8, 10, 11, 
12, 13, 14, 16, 17, 19, 20, 26, 31, 32, 33, 34], "detail": [2, 5, 6, 7, 8, 9, 11, 13, 17, 18, 24, 25, 26, 28, 30, 32, 33, 34], "mermori": 2, "format": [2, 5, 6, 7, 9, 14, 22, 26, 28, 31, 33, 34], "manual": [2, 7, 10, 14, 18, 20, 34], "To": [2, 5, 6, 7, 10, 13, 15, 16, 17, 18, 20, 21, 23, 28, 32, 33, 34], "predefin": 2, "shape": [2, 6, 7, 16, 20, 23, 30, 33, 34], "prior": [2, 23], "match": [2, 8, 17, 31], "requir": [2, 5, 6, 8, 10, 16, 18, 21, 26, 28, 29, 31, 32, 34], "won": [2, 7, 8, 17, 26], "t": [2, 5, 7, 8, 14, 15, 16, 17, 18, 20, 26, 32, 34], "convers": [2, 8, 13, 34], "directli": [2, 6, 33, 34], "go": [2, 5, 8], "methodologi": [2, 6, 7, 19, 33], "possibl": [2, 14, 15, 19, 28, 33, 34], "avoid": [2, 10, 20, 21, 26, 31, 32, 33, 34], "thu": [2, 7, 8, 10, 18, 20, 21, 28, 31, 32, 33], "paramet": [2, 6, 7, 8, 10, 16, 17, 19, 20, 21, 26, 28, 29, 30, 31, 33, 34], "work": [2, 5, 6, 7, 14, 15, 17, 20, 26, 28, 29, 31, 33, 34], "bfloat16": [2, 3, 4, 7, 10, 11, 17, 18, 23, 29, 31, 34], "half": [2, 7, 17, 21], "k": [2, 5], "float16": [2, 8], "cast": [2, 8, 21, 28], "accord": [2, 13, 28, 33, 34], "default": [2, 4, 6, 7, 10, 12, 13, 15, 16, 17, 20, 22, 23, 26, 28, 30, 32, 33, 34], "valu": [2, 6, 10, 14, 16, 17, 19, 20, 21, 22, 26, 28, 31, 32, 33, 34], "mean": [2, 16, 17, 18, 20, 22, 28, 34], "do": [2, 5, 8, 16, 18, 20, 21, 26, 28, 30, 31, 32, 33, 34], "noth": 2, "note": [2, 3, 5, 6, 15, 16, 17, 18, 20, 22, 24, 28, 30, 31, 32, 33], "type": [2, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21, 23, 30, 31, 32, 34], "conv2d": [2, 7, 8, 10, 13, 18, 20, 26, 34], "linear": [2, 6, 7, 8, 13, 15, 16, 18, 26, 33, 34], "convtranspose2d": [2, 13], "case": [2, 6, 7, 9, 12, 16, 17, 18, 28, 31, 33, 34], "addit": [2, 6, 7, 17, 21, 28, 34], "embed": [2, 7, 28, 34], "lstm": [2, 10, 15, 34], "sgd": [2, 6, 7, 8, 16, 19], "string": [2, 31], "o0": [2, 26, 34], "No": [2, 18, 34], "function": [2, 5, 6, 7, 8, 10, 11, 12, 14, 15, 17, 20, 21, 23, 26, 28, 29, 31, 33, 34], "just": [2, 14, 29, 34], "return": [2, 6, 7, 8, 10, 16, 17, 20, 26, 34], "origin": [2, 6, 7, 12, 13, 15, 17, 20, 29, 34], "dropout": [2, 10], "remov": [2, 5, 21, 34], "inferenc": 2, "master": [2, 7, 21, 31], "fuse": [2, 7, 13, 16, 19, 28, 34], "updat": [2, 5, 7, 16, 19, 21, 22, 34], "step": [2, 5, 6, 7, 8, 14, 16, 19, 21, 32], "overridden": [2, 17], "explicitli": [2, 8, 16, 20, 26, 31, 34], "bool": [2, 14], "whether": [2, 6, 8, 16, 18, 22, 23, 33], "conv_bn": 2, "It": [2, 6, 7, 8, 10, 13, 17, 18, 20, 21, 23, 26, 29, 31, 33, 34], "knob": [2, 4, 12, 31], "overwrit": [2, 31], "configur": [2, 4, 6, 7, 14, 15, 16, 17, 31, 32, 34], "linear_bn": 2, "convolut": [2, 6, 7, 13, 20, 33, 34], "reorder": [2, 18, 28], "doesn": [2, 15, 16, 18, 26, 34], "support": [2, 5, 6, 7, 13, 15, 16, 17, 18, 19, 20, 21, 25, 26, 28, 29, 31, 32, 33, 34], "replac": [2, 5, 7, 10, 26, 34], "ident": [2, 10, 18], "aten": [2, 6, 7, 34], "opportunit": 2, "bf16": [2, 3, 7, 17, 19, 21, 23, 26, 28, 30, 34], "save": [2, 5, 6, 7, 13, 14, 15, 16, 18, 21, 28, 32, 34], "solut": [2, 7, 26, 28, 34], "all": [2, 5, 6, 8, 13, 14, 17, 19, 20, 28, 29, 32, 33, 34], "param": [2, 19, 31], "tupl": [2, 6, 17, 20], "tensor": [2, 6, 7, 8, 11, 15, 16, 17, 20, 26, 28, 32, 34], "feed": [2, 9, 18], "sampl": [2, 6, 9, 14, 16, 17, 29, 33], "input": [2, 6, 7, 9, 10, 13, 15, 16, 17, 18, 22, 23, 26, 29, 30, 32, 33, 34], "impact": [2, 7, 20], "pack": [2, 20, 34], "intel": [2, 3, 4, 7, 8, 9, 10, 11, 13, 14, 16, 17, 20, 21, 22, 23, 25, 26, 27, 28, 29, 34], "extens": [2, 3, 4, 6, 9, 10, 13, 14, 16, 17, 23, 24, 25, 27, 28, 29, 30, 31, 33, 34], 
"per": [2, 10, 15, 16, 20, 30, 31, 32, 33, 34], "some": [2, 5, 7, 8, 13, 16, 17, 18, 20, 26, 28, 31, 32, 33, 34], "heurist": [2, 20, 34], "real": [2, 7, 14, 15, 30, 34], "best": [2, 6, 7, 8, 14, 16, 17, 22, 24, 28, 33, 34], "try": [2, 5, 6, 7, 12, 14, 16, 26, 31, 33, 34], "select": [2, 5, 7, 13, 24, 34], "true": [2, 4, 6, 10, 12, 13, 14, 15, 16, 17, 22, 23, 31, 32, 33, 34], "might": [2, 7, 18, 26, 33, 34], "cost": [2, 6, 28, 30, 33], "extra": [2, 5, 10, 20, 31, 32], "combin": [2, 12, 14, 28, 31, 34], "method": [2, 8, 15, 16, 18, 22, 26, 33, 34], "multipl": [2, 5, 7, 8, 16, 17, 18, 26, 28, 30, 32, 33, 34], "subgraph": 2, "modifi": [2, 5, 6], "other": [2, 6, 7, 8, 14, 17, 18, 19, 23, 28, 31, 33], "place": [2, 8, 28, 33, 34], "scenario": [2, 6, 7, 18, 33, 34], "convolutuon": 2, "counterpart": [2, 7, 18, 34], "pleas": [2, 6, 7, 11, 16, 22, 26, 28, 31, 33, 34], "invok": [2, 6, 8, 10, 13, 20, 23, 26, 29, 34], "ddp": [2, 6], "distribut": [2, 3, 7, 16, 31, 32, 33], "deepcopi": 2, "rather": [2, 18], "than": [2, 5, 7, 17, 18, 20, 21, 26, 33, 34], "allreduc": 2, "caus": [2, 7, 21, 26, 28, 31, 33, 34], "unpredict": 2, "accuraci": [2, 3, 6, 7, 8, 15, 16, 21, 22, 26, 28, 34], "loss": [2, 5, 6, 8, 16, 18, 21, 26], "exampl": [2, 5, 7, 8, 13, 18, 19, 21, 22, 23, 24, 25, 28, 29, 32, 33, 34], "load_state_dict": [2, 34], "path": [2, 6, 7, 14, 18, 20, 23, 31, 33, 34], "eval": [2, 4, 6, 8, 10, 11, 12, 13, 15, 16, 20, 23, 26, 29, 32, 34], "optimized_model": [2, 34], "evalu": [2, 16, 34], "optimized_optim": 2, "altern": [2, 6, 18], "motiv": [2, 20], "ad": [2, 7, 10, 33, 34], "alia": 2, "unifi": [2, 31], "style": [2, 5], "modular": 2, "float32": [2, 13, 21, 23, 26, 30, 31, 34], "quantization_config": [2, 6, 29], "qconfig_summary_fil": [2, 6, 29], "low_precision_checkpoint": [2, 6, 29], "deployment_mod": [2, 6, 23], "transform": [2, 3, 4, 6, 10, 11, 13, 16, 18, 22, 23, 28, 32, 33, 34], "focu": [2, 10, 18, 29, 34], "especi": [2, 5, 28, 34], "task": [2, 7, 28, 31, 33, 34], "famili": [2, 28, 33], "full": [2, 5, 18, 32, 33, 34], "llama": [2, 3, 6, 28], "gpt": [2, 28, 30], "j": [2, 5, 17, 28, 30], "neox": [2, 28], "opt": [2, 6, 17, 28], "falcon": [2, 28], "bloom": [2, 28], "codegen": [2, 28, 34], "baichuan": [2, 28, 34], "chatglm": [2, 28], "gptbigcod": [2, 28], "t5": [2, 26, 28, 34], "mistral": [2, 28, 34], "mpt": [2, 28, 34], "mixtral": [2, 28], "stablelm": [2, 28], "qwen": [2, 28], "git": [2, 5, 28], "llava": [2, 28], "yuan": [2, 28], "phi": [2, 28], "scope": [2, 7, 8, 21, 34], "abov": [2, 5, 10, 19, 28, 30, 31, 32], "transpar": [2, 7, 29, 33, 34], "benifit": 2, "float": [2, 6, 7, 8, 14, 15, 16, 17, 21, 29, 34], "when": [2, 5, 6, 7, 8, 9, 14, 18, 19, 20, 21, 22, 25, 26, 28, 30, 31, 32, 33, 34], "mix": [2, 6, 13, 23, 26, 28, 34], "str": [2, 6, 14, 23, 31], "specifi": [2, 5, 6, 14, 20, 31, 33, 34], "either": [2, 26, 31], "object": [2, 6, 7, 14, 17, 20, 33, 34], "defin": [2, 5, 6, 7, 8, 10, 16, 17, 18, 22, 32], "recip": [2, 4, 7, 13, 15, 26, 28, 34], "quant": [2, 16], "static": [2, 4, 16, 26, 28, 31, 32, 33, 34], "onc": [2, 5, 6, 14, 17, 18, 20, 21, 32, 33], "quantizat": 2, "config": [2, 6, 11, 23, 31, 32], "json": [2, 6, 15, 16, 32, 34], "file": [2, 4, 5, 6, 8, 14, 15, 16, 17, 18, 31, 34], "under": [2, 6, 8, 18, 20, 27, 31, 34], "need": [2, 5, 6, 7, 10, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 29, 31, 32, 33, 34], "calibr": [2, 13, 22, 26, 29, 30, 32, 34], "dict": [2, 6, 23], "int4": [2, 28, 29, 34], "": [2, 3, 5, 8, 10, 14, 15, 18, 19, 20, 21, 22, 26, 31, 32, 33], "should": [2, 5, 8, 15, 20, 28, 31, 33], "state_dict": 
[2, 6], "checkpoint": [2, 6, 29], "pt": [2, 6, 13, 14, 15, 23, 32, 34], "gptq": [2, 6, 34], "etc": [2, 5, 6, 17, 34], "where": [2, 5, 7, 16, 21, 33], "kei": [2, 7, 28, 34], "scale": [2, 3, 6, 15, 28], "zero": [2, 6, 15, 34], "point": [2, 6, 8, 15, 21, 33, 34], "bia": [2, 8, 20, 34], "weight_kei": 2, "packed_weight": 2, "scale_kei": 2, "zero_point_kei": 2, "packed_zp": 2, "bias_kei": 2, "chang": [2, 5, 6, 7, 8, 10, 11, 12, 15, 17, 18, 20, 23, 25, 26, 29, 31], "make": [2, 5, 6, 7, 14, 15, 17, 21, 23, 28, 32, 33], "n": [2, 6, 7, 16, 18, 19, 20, 26, 32, 33, 34], "thei": [2, 7, 8, 31, 33], "uint4": 2, "compress": 2, "along": [2, 5, 6, 21, 33, 34], "store": [2, 17, 18, 19, 21, 28, 31, 32, 33, 34], "int32": 2, "state": [2, 15, 19, 28], "automaticlli": 2, "deploy": [2, 7, 13, 34], "torchscirpt": 2, "workabl": 2, "forward": [2, 6, 8, 13, 16, 20, 21, 26, 32, 33, 34], "after": [2, 5, 7, 13, 20, 21, 23, 24, 32, 33, 34], "deepspe": [2, 34], "parallel": [2, 5, 6, 7, 28, 33, 34], "class": [2, 5, 6, 7, 8, 10, 16, 20, 26, 34], "verbos": [2, 4, 31], "demand": [2, 7], "easier": [2, 18, 21], "debug": [2, 31], "dump": [2, 31], "messag": [2, 6, 10, 12, 18, 31], "contain": [2, 5, 6, 13, 17, 26, 31, 32, 33, 34], "durat": [2, 21], "while": [2, 7, 8, 11, 12, 18, 21, 26, 28, 32, 33, 34], "via": [2, 5, 6, 7, 18, 20, 30, 31, 33, 34], "environ": [2, 5, 6, 17, 20, 24, 28, 30, 31, 32, 33], "variabl": [2, 5, 17, 30, 31, 32, 33, 34], "name": [2, 5, 7, 14, 17, 25, 28, 31, 32, 33, 34], "dnnl_verbos": 2, "howev": [2, 5, 7, 8, 9, 16, 20, 26, 28, 31, 33, 34], "those": [2, 15, 33], "amount": [2, 16, 26, 28, 33], "investig": [2, 31], "singl": [2, 7, 13, 14, 16, 19, 20, 30, 32, 34], "iter": [2, 16, 21, 28, 34], "out": [2, 5, 6, 7, 8, 10, 13, 16, 19, 20, 30, 31, 33, 34], "second": [2, 10, 28, 32, 33], "verbose_on": 2, "verbose_off": 2, "disabl": [2, 6, 7, 13, 26, 31, 33, 34], "verbose_on_cr": 2, "creation": 2, "linearsilu": [2, 34], "silu": [2, 13], "http": [2, 5, 16, 34], "org": [2, 7, 16, 26, 34], "stabl": [2, 3, 8, 34], "html": [2, 5, 16], "output": [2, 6, 7, 8, 13, 14, 16, 18, 23, 26, 34], "same": [2, 5, 7, 10, 15, 16, 17, 18, 20, 21, 28, 31, 32, 33, 34], "init": [2, 5, 15, 34], "linear_modul": 2, "4096": [2, 33], "ipex_fus": 2, "randn": [2, 10, 13, 16, 18, 32, 34], "linearsilumul": [2, 34], "multipli": 2, "mul": [2, 13, 16], "linear2silumul": [2, 34], "linear_": 2, "linear_m": 2, "two": [2, 7, 14, 16, 20, 21, 28, 32, 33, 34], "linear_s_modul": 2, "linear_m_modul": 2, "linearrelu": [2, 34], "relu": [2, 7, 13, 16, 18, 26, 34], "linearnewgelu": [2, 34], "newgeluactiv": 2, "com": [2, 5, 34], "huggingfac": [2, 6, 26, 28, 32, 34], "blob": 2, "src": [2, 17], "activ": [2, 6, 7, 15, 16, 20, 28, 31, 33], "py": [2, 5, 10, 14, 20, 31, 32, 34], "l50": 2, "new_gelu": 2, "lineargelu": [2, 34], "gelu": [2, 13, 34], "linearmul": [2, 34], "linearadd": [2, 34], "add": [2, 5, 7, 8, 13, 14, 19, 21, 32, 34], "linearaddadd": [2, 34], "other_1": 2, "other_2": 2, "rotaryembed": [2, 34], "max_position_embed": 2, "int": [2, 6, 7, 14, 17, 23, 26, 29, 31, 34], "pos_embd_dim": 2, "10000": 2, "backbon": 2, "co": 2, "paper": [2, 34], "2104": 2, "09864": 2, "queri": [2, 17, 18], "multi": [2, 7, 14, 20, 28, 31, 33, 34], "head": [2, 34], "comput": [2, 6, 7, 13, 15, 16, 18, 20, 21, 28, 30, 31, 32, 33, 34], "max": [2, 6, 16, 17, 22, 23, 26, 34], "posit": [2, 28, 33, 34], "frequenc": [2, 30], "exact": 2, "g": [2, 7, 8, 16, 17, 18, 28, 34], "gptjforcausallm": 2, "architectur": [2, 28, 30, 33], "eleutherai": [2, 28], "6b": [2, 28, 30], "l4": 2, "batch": [2, 6, 7, 
13, 16, 18, 20, 23, 26, 30, 32, 34], "sequenc": [2, 18, 21, 28, 34], "length": [2, 5, 14, 21, 26, 30, 34], "num_head": 2, "num_kv_head": 2, "head_dim": 2, "position_id": [2, 6], "element": [2, 18, 19], "past_kv_length": 2, "id": [2, 31, 32], "construct": [2, 7, 13], "current_posit": 2, "num": [2, 20, 32, 33, 34], "dim": [2, 6, 18, 23], "offset": [2, 18, 28], "sin": 2, "neighbor": 2, "rotary_dim": 2, "rotary_ndim": 2, "rotari": [2, 28], "64": [2, 8, 10, 16, 20, 30, 31, 34], "gptj": 2, "rope_modul": 2, "2048": [2, 6], "32": [2, 6, 18, 21, 23, 30, 31, 32], "16": [2, 17, 20, 21, 30, 31, 32], "256": [2, 30], "arang": [2, 6, 16], "unsqueez": 2, "query_roteri": 2, "direct": [2, 5, 13], "apply_funct": 2, "without": [2, 5, 6, 7, 8, 10, 16, 20, 21, 26, 32, 34], "initi": [2, 20, 32], "assum": [2, 7, 8, 23, 32, 33, 34], "num_token": 2, "rotary_half": 2, "rmsnorm": [2, 28, 34], "hidden_s": [2, 6], "ep": [2, 7, 10, 19], "1e": [2, 7, 10, 16], "06": [2, 31, 32], "hidden": [2, 18, 28], "modeling_llama": 2, "l76": 2, "variance_epsilon": 2, "6": [2, 5, 7, 11, 14, 20, 30, 31, 32, 33, 34], "ones": [2, 6, 17], "hidden_st": 2, "usual": [2, 18, 20, 33], "rmsnorm_modul": 2, "fastlayernorm": [2, 34], "normalized_shap": 2, "layernorm": [2, 13, 16, 22, 34], "list": [2, 5, 7, 8, 13, 14, 16, 18, 25, 29, 31, 32, 33, 34], "denomin": 2, "numer": [2, 8, 33], "stabil": [2, 8, 34], "layernorm_modul": 2, "05": [2, 7, 10, 30, 31], "indirectaccesskvcacheattent": [2, 34], "text_max_length": 2, "kv_cach": [2, 28], "decod": [2, 28, 30, 34], "layer": [2, 16, 20, 22, 28, 34], "bring": [2, 6, 7, 9, 15, 16, 21, 28, 31, 33, 34], "beam": [2, 28], "idx": [2, 28, 31], "concat": [2, 20, 26, 28, 34], "entir": [2, 16, 28], "context": [2, 5, 6, 8, 20, 28, 33], "dot": [2, 7, 18, 28], "veri": [2, 5, 15, 18, 28], "long": [2, 6, 18, 21, 26, 28, 34], "bottleneck": [2, 28], "indirect": 2, "access": [2, 6, 7, 18, 19, 32], "iakv": [2, 28], "firstli": [2, 28], "pre": [2, 28, 34], "alloc": [2, 10, 20, 28, 30, 32, 34], "buffer": [2, 28], "index": [2, 5, 18, 28, 33], "histori": [2, 14, 28], "decid": [2, 15, 20, 28], "timestamp": [2, 28], "max_seq": 2, "head_num": 2, "head_siz": 2, "token": [2, 6, 23, 28, 30], "everi": [2, 28], "kv": 2, "seq_len": [2, 30], "scale_attn": 2, "sqrt": [2, 13, 19], "layer_past": 2, "seq_info": 2, "key_cach": 2, "value_cach": 2, "info": [2, 6, 17, 26, 31, 32, 34], "head_mask": 2, "mask": [2, 7, 17, 26], "yet": [2, 6, 26, 34], "attention_mask": [2, 6], "attn_weight": 2, "first": [2, 3, 5, 6, 7, 9, 10, 12, 16, 19, 20, 21, 26, 31, 32, 33], "matmul": [2, 8, 13, 26, 34], "new_layer_past": 2, "attn_output": 2, "l1318": 2, "def": [2, 6, 8, 10, 16, 20, 26, 34], "_reorder_cach": 2, "self": [2, 6, 8, 10, 16, 20, 26, 34], "past_key_valu": [2, 6], "beam_idx": 2, "len": [2, 6, 7, 13, 16, 17], "4": [2, 6, 11, 13, 14, 18, 20, 23, 28, 30, 31, 33, 34], "3": [2, 5, 6, 7, 8, 10, 12, 13, 14, 16, 17, 18, 20, 21, 28, 30, 31, 33], "pagedattent": [2, 34], "vllm": 2, "blog": [2, 34], "2023": [2, 3, 30], "20": [2, 7, 18, 30, 31, 32, 34], "page": [2, 6, 13, 20, 24, 29, 30, 33, 34], "num_block": 2, "block_siz": 2, "basic": [2, 4, 16, 21, 33], "logic": [2, 14, 18, 32, 33], "dram": 2, "manag": [2, 8, 13, 20, 28, 31], "slot": [2, 30], "reshape_and_cach": 2, "single_query_cached_kv_attent": 2, "mha": [2, 34], "intra": 2, "tabl": [2, 7, 17, 28, 30, 34], "map": [2, 6, 18, 30], "physic": [2, 14, 20, 32, 33], "slot_map": 2, "allcat": 2, "keytensor": 2, "num_seq": 2, "block_numb": 2, "head_map": 2, "block_tabl": 2, "context_len": 2, "max_context_len": 2, 
"alibi_slop": 2, "5": [2, 6, 10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 26, 28, 30, 31, 32, 33, 34], "max_num_blocks_per_seq": 2, "optin": 2, "alibi": 2, "slope": 2, "varlenattent": [2, 34], "scaled_dot_product_attent": 2, "accept": [2, 34], "variant": [2, 8, 28], "among": [2, 31, 32, 33], "doe": [2, 7, 13, 18, 20, 26, 34], "arg": [2, 4, 6, 7, 14, 16, 19, 23, 31, 32, 34], "query_token": 2, "total": [2, 6, 30, 33], "key_token": 2, "value_token": 2, "seqlen_q": 2, "batch_siz": [2, 6, 11, 13, 16, 18, 23, 32], "seqlen_k": 2, "max_seqlen_q": 2, "max_seqlen_k": 2, "pdropout": 2, "probabl": 2, "greater": 2, "softmax_scal": 2, "factor": [2, 6, 16, 31], "softmax": [2, 13, 34], "is_caus": 2, "causal": 2, "varlenattention_modul": 2, "emply_lik": 2, "rotary_embed": [2, 34], "rms_norm": [2, 34], "fast_layer_norm": [2, 34], "expect": [2, 7, 30, 34], "indirect_access_kv_cache_attent": [2, 34], "add_casual_mask": 2, "varlen_attent": [2, 34], "zero_tensor": 2, "return_softmax": 2, "gen_": 2, "fast_bert": [2, 4, 6, 7, 11, 34], "unpad": 2, "tpp": [2, 28], "speedup": [2, 6, 8, 28, 30, 34], "still": [2, 5, 7, 8, 13, 16, 18, 21, 26, 34], "squenc": 2, "sparsiti": 2, "seed": 2, "libxsmm": 2, "though": [2, 7], "peak": [2, 7, 11, 34], "enable_onednn_fus": [2, 13], "get_smooth_quant_qconfig_map": [2, 6, 29], "alpha": [2, 6, 19, 22], "act_observ": 2, "act_ic_observ": 2, "wei_observ": 2, "wei_ic_observ": 2, "share_weight_observ": 2, "smoothquant": [2, 6, 7, 16, 22, 28, 34], "arxiv": 2, "pdf": 2, "2211": 2, "10438": 2, "hyper": [2, 30, 33, 34], "observ": [2, 9, 13, 15, 34], "op": [2, 7, 15, 16, 22, 28, 34], "histogramobserv": [2, 15], "q": [2, 28], "min": [2, 16, 22, 26, 34], "affect": [2, 31], "argument": [2, 6, 7, 22, 26, 31], "ao": [2, 6, 15], "minmaxobserv": [2, 6, 15], "channel": [2, 3, 10, 15, 16, 26, 34], "perchannelminmaxobserv": [2, 6, 15], "with_arg": [2, 6, 15], "ch_axi": 2, "qint8": [2, 6, 15], "qscheme": [2, 6, 15, 34], "per_channel_symmetr": [2, 6, 15], "qconfig": [2, 4, 6, 13, 16, 26, 29, 32, 34], "prepar": [2, 4, 6, 13, 16, 26, 29, 32, 34], "example_input": [2, 4, 6, 13, 15, 29, 32, 34], "bn_fold": 2, "example_kwarg_input": 2, "fp32": [2, 4, 16, 17, 19, 21, 23, 28, 34], "A": [2, 5, 6, 7, 10, 11, 17, 26, 28, 31, 33, 34], "even": [2, 5, 7, 33, 34], "prepared_model": [2, 4, 6, 13, 15, 16, 26, 29, 34], "original_model": 2, "later": [2, 7, 25, 33], "unexpect": 2, "behavior": [2, 20, 31, 33], "insert": [2, 16], "fake": 2, "introduct": [2, 7, 28, 33, 34], "avaiabl": 2, "autotun": [2, 4, 22, 34], "calib_dataload": [2, 6, 16, 34], "calib_func": 2, "eval_func": [2, 16, 34], "op_type_dict": 2, "smoothquant_arg": [2, 16], "sampling_s": [2, 4, 16, 34], "accuracy_criterion": [2, 4, 16, 34], "tuning_tim": [2, 4, 16, 34], "driven": 2, "tune": [2, 3, 4, 7, 8, 15, 20, 26, 28, 31, 32, 34], "help": [2, 5, 6, 17, 23, 28, 31, 33, 34], "quickli": 2, "dataload": [2, 6, 10, 13, 16, 20, 22, 29, 34], "post": [2, 4, 5, 7, 15, 28, 34], "process": [2, 6, 7, 11, 12, 14, 16, 19, 20, 21, 26, 31, 32, 33], "metric": [2, 16, 30], "scalar": 2, "higher": [2, 7, 13, 17, 18, 28], "constraint": [2, 34], "optyp": 2, "wise": [2, 16, 19, 22, 29, 34], "space": [2, 7, 16, 18, 22, 33], "global": [2, 20, 22, 34], "algorithm": [2, 13, 18, 30, 34], "would": [2, 5, 6, 14, 16, 17, 18, 30, 31, 32, 33, 34], "explor": 2, "100": [2, 4, 14, 16, 17, 30, 32], "accuracy_criterion_typ": 2, "rel": [2, 4, 16, 31, 34], "absolut": [2, 31], "accuracy_criterion_valu": 2, "maximum": [2, 16, 17], "allow": [2, 8, 14, 16, 22, 31, 33, 34], "01": [2, 4, 7, 16, 31, 32, 34], 
"timeout": [2, 5, 21], "earli": [2, 34], "stop": [2, 33], "is_runtime_ext_en": 2, "helper": 2, "check": [2, 5, 6, 7, 13, 18, 28, 29, 31, 34], "exetens": 2, "openmp": [2, 7, 20, 26, 30, 32, 34], "preload": [2, 31], "cpupool": [2, 20, 34], "core_id": [2, 20, 31], "node_id": [2, 20, 31, 32, 34], "abstract": [2, 11, 20], "pool": [2, 20, 34], "core": [2, 7, 14, 17, 30, 33, 34], "numa": [2, 20, 31, 32, 34], "node": [2, 20, 30, 32, 33, 34], "pin": [2, 20], "cpu_pool": [2, 20, 34], "region": [2, 8, 17, 33], "design": [2, 5, 8, 18, 21, 29, 34], "decor": 2, "multistreammodulehint": [2, 20, 34], "kwarg": [2, 29], "hint": [2, 20], "multistreammodul": [2, 7, 20, 26, 34], "its": [2, 6, 7, 8, 14, 17, 21, 28, 30, 31, 32, 33, 34], "arbitrari": 2, "keyword": 2, "num_stream": [2, 20, 34], "auto": [2, 6, 10, 17, 18, 22, 23, 26, 28, 31, 33, 34], "concat_output": 2, "input_split_hint": [2, 20], "multi_stream": 2, "output_concat_hint": [2, 20], "stream": [2, 7, 20, 34], "throughput": [2, 3, 18, 20, 26, 28, 30, 34], "insid": [2, 5, 20, 31], "divis": [2, 20], "equal": [2, 15, 20, 32, 33], "remaind": [2, 20], "divisor": [2, 20], "batchsiz": [2, 20], "larger": [2, 20, 30, 33], "piec": [2, 20], "less": [2, 8, 18, 20, 26, 34], "mini": [2, 20, 34], "don": [2, 5, 8, 14, 17, 34], "want": [2, 5, 7, 14, 15, 17, 20, 31, 34], "leav": [2, 20, 33], "scriptmodul": [2, 13, 20], "union": 2, "instanc": [2, 7, 10, 14, 32, 34], "reason": [2, 10, 18, 20, 34], "flag": [2, 5, 7, 17, 20, 31, 34], "indic": [2, 6, 18, 28], "concaten": [2, 21], "raw": 2, "asynchron": [2, 7], "get_core_list_of_node_id": 2, "softwar": [3, 27, 34], "jul": 3, "deep": [3, 7, 8, 11, 13, 14, 21, 33], "learn": [3, 7, 8, 11, 13, 14, 21, 31, 33], "boost": [3, 6, 7, 9, 21, 30, 31, 33, 34], "dl": [3, 7, 34], "hug": 3, "face": 3, "bert": [3, 4, 10, 30, 34], "googl": [3, 5, 28], "cloud": 3, "platform": [3, 7, 18, 32, 33, 34], "gcp": 3, "technologi": [3, 7], "guid": [3, 6, 7, 17, 32, 34], "apr": 3, "mar": [3, 32], "new": [3, 5, 12, 16, 17, 18, 20, 23, 26, 29, 33], "x86": 3, "sapphir": 3, "rapid": 3, "part": [3, 5, 7, 8, 18, 21, 26, 33, 34], "jan": 3, "secur": 3, "torchserv": [3, 34], "confer": 3, "dec": 3, "2022": [3, 31, 32], "what": [3, 5, 6, 8, 23], "pyg": 3, "diffus": [3, 34], "arc": 3, "nov": 3, "13": [3, 10, 17, 30, 31, 32, 33], "potenti": [3, 7, 34], "fine": [3, 20, 31, 32, 33, 34], "fx": [3, 7, 10, 26, 34], "sep": [3, 17], "empow": 3, "xeon": [3, 7, 14, 21, 28, 30, 32, 33, 34], "scalabl": [3, 7, 21, 28, 30, 33, 34], "processor": [3, 7, 19, 21, 28, 30, 33, 34], "aug": [3, 30], "vision": [3, 6, 30], "last": [3, 10, 21, 26, 34], "One": [3, 18, 19, 31, 33], "click": 3, "compressor": [3, 7, 16, 22, 34], "4x": 3, "jun": 3, "grokk": 3, "principl": [3, 18], "kt": 3, "person": 3, "text": [3, 6, 26, 28, 30, 33], "speech": [3, 33], "2021": [3, 17, 31, 32], "up": [3, 7, 11, 20, 24, 28, 33, 34], "modern": 3, "naver": 3, "low": [3, 4, 6, 7, 21, 23, 31, 33, 34], "latenc": [3, 14, 18, 28, 30, 32, 34], "machin": [3, 5, 6, 7, 14, 17, 26, 31, 32, 33, 34], "feb": 3, "dlrm": [3, 7, 26, 30, 34], "oneccl": [3, 6, 31, 34], "mention": [3, 10, 20, 21, 34], "deprec": [3, 26], "facebook": [3, 6, 28], "3rd": [3, 7, 21, 30, 34], "gen": [3, 30, 34], "capabl": [3, 17, 34], "2020": 3, "collabor": 3, "2019": 3, "caff": 3, "2017": 3, "command": [4, 5, 6, 14, 23, 31, 32, 33, 34], "descript": [4, 7, 16, 18, 20, 25, 33, 34], "instal": [4, 5, 6, 23, 25, 26, 28, 33, 34], "m": [4, 14, 20, 26, 31, 32, 33, 34], "pip": [4, 5, 34], "captur": [4, 34], "log": [4, 6, 13, 31, 32, 34], "prompt": [4, 6, 23, 34], 
"export": [4, 31, 33], "onednn_verbos": 4, "dure": [4, 6, 7, 10, 13, 16, 21, 31, 33, 34], "precis": [4, 6, 13, 21, 23, 26, 30, 34], "no_grad": [4, 6, 10, 11, 12, 13, 15, 16, 20, 23, 26, 29, 32, 34], "amp": [4, 6, 10, 23, 26, 34], "autocast": [4, 6, 7, 10, 23, 34], "prototyp": [4, 13, 20, 26, 34], "fast": [4, 12, 33, 34], "bertmodelmodel": 4, "bertmodel": [4, 6, 11, 32], "from_pretrain": [4, 6, 11, 23, 29, 32], "uncas": [4, 6, 10, 11, 32, 34], "launch": [4, 6, 20, 32, 34], "autom": [4, 7, 8, 14, 31, 32, 34], "ipexrun": [4, 10, 31, 34], "lt": [4, 28, 30], "your_pytorch_script": [4, 31], "gt": [4, 14, 28, 33], "hypertun": [4, 34], "hyperparamet": [4, 7], "conf": [4, 13, 14, 31, 34], "your_conf_fil": [4, 34], "your_python_script": [4, 34], "default_static_qconfigprepared_model": 4, "anyplac": 4, "d": [4, 5, 6, 7, 8, 13, 26, 28, 34], "calibration_data_load": [4, 6, 13], "converted_model": [4, 6, 26, 34], "default_dynamic_qconfigprepared_model": 4, "tuned_model": [4, 16, 34], "eval_funct": 4, "convert_model": [4, 13, 15, 16], "thank": [5, 34], "interest": 5, "begin": 5, "intent": 5, "propos": [5, 7, 11, 16, 18, 21], "intend": 5, "shall": [5, 18, 33], "discuss": [5, 18, 33], "agre": 5, "plan": [5, 7, 10], "look": [5, 14, 16, 18], "ahead": 5, "outstand": 5, "pick": 5, "comment": [5, 14, 17, 22, 34], "particular": [5, 6, 8, 29, 34], "ask": 5, "pull": 5, "here": [5, 8, 10, 13, 16, 17, 18, 20, 26, 32, 33, 34], "uninstal": 5, "ll": [5, 32, 33], "know": 5, "fulli": [5, 15, 17, 21, 33, 34], "warn": [5, 6, 12, 31, 32, 34], "skip": [5, 6, 17, 18, 31], "few": [5, 7, 9, 13, 16, 18, 32, 34], "alwai": [5, 6, 7, 8, 18, 31, 33, 34], "loop": [5, 21, 29], "re": [5, 8, 32, 33], "feel": [5, 18, 34], "lazi": 5, "ye": 5, "clone": 5, "copi": [5, 17, 18], "cd": [5, 6], "rebas": [5, 34], "submodul": 5, "sync": [5, 20], "recurs": 5, "job": 5, "setup": [5, 6, 28, 34], "symlink": 5, "tree": [5, 6], "reinstal": [5, 26], "again": [5, 19, 32], "__init__": [5, 6, 8, 10, 16, 20, 26, 34], "repeatedli": 5, "interfac": [5, 6, 18, 26, 28], "pyi": 5, "non": [5, 8, 13, 18, 30, 32, 34], "cpp": [5, 6, 33], "cc": [5, 6, 17], "cu": 5, "h": [5, 6, 7, 16, 18, 26, 31, 32], "sure": [5, 14, 15, 32, 33], "until": [5, 20, 21, 33], "next": [5, 7, 34], "clean": 5, "cmake": [5, 6, 17, 34], "must": [5, 14, 17, 19], "maco": 5, "linux": [5, 6, 17, 30, 31, 33], "homebrew": 5, "brew": 5, "our": [5, 16, 19, 28, 33, 34], "error": [5, 6, 7, 10, 16, 18, 21, 22, 26, 34], "printf": 5, "stdio": 5, "nint": 5, "hello": 5, "world": [5, 7], "clang": 5, "simpl": [5, 7, 8, 11, 18, 33, 34], "binari": [5, 6, 7, 8, 17, 34], "folder": 5, "mani": [5, 14, 28, 31, 33, 34], "wai": [5, 10, 16, 18, 28, 34], "rm": 5, "rf": 5, "toplevel": 5, "over": [5, 7, 8, 9, 16, 18, 30, 31, 34], "made": [5, 34], "edit": [5, 26, 34], "repo": [5, 6, 7], "commit": 5, "ani": [5, 8, 10, 17, 18, 32, 34], "keep": [5, 12, 18, 21, 28, 32, 33, 34], "realli": 5, "untrack": 5, "deinit": 5, "f": [5, 6, 13, 16, 28, 34], "xdf": 5, "within": [5, 16, 21, 29, 33, 34], "experi": [5, 7, 10, 12, 16, 18, 26, 33, 34], "env_key1": 5, "env_val1": 5, "env_key2": 5, "env_val2": 5, "suit": 5, "locat": [5, 17, 34], "test_": 5, "individu": [5, 30], "filenam": 5, "repres": [5, 7, 21], "wish": [5, 7], "test_jit": 5, "narrow": 5, "down": [5, 32, 34], "testclassnam": 5, "testnam": 5, "let": [5, 10, 18, 19, 20, 21], "sai": 5, "test_sequenti": 5, "testjit": 5, "expecttest": 5, "hypothesi": 5, "mypi": 5, "depend": [5, 7, 17, 18, 25, 26, 33, 34], "conda": [5, 33], "offici": [5, 32, 33, 34], "unittest": 5, "substr": 5, 
"test_nn": 5, "v": 5, "testnn": 5, "test_bceloss": 5, "test_mseloss": 5, "keystrok": 5, "ci": 5, "quicklint": 5, "aren": 5, "setup_lint": 5, "target": [5, 6, 10, 13, 14, 17, 34], "makefil": 5, "complet": [5, 6, 14, 18, 29, 33], "tab": 5, "trail": [5, 21], "newlin": 5, "quick_check": 5, "flake8": 5, "cmakelint": 5, "tidi": 5, "changed_onli": 5, "written": [5, 6, 17], "framework": [5, 34], "runner": 5, "bin": [5, 6, 17, 31, 32], "gtest_filt": 5, "testsuit": 5, "maycontainalia": 5, "containeraliasingtest": 5, "test_alias_analysi": 5, "docstr": 5, "line": [5, 10, 13, 18, 31, 32, 33], "limit": [5, 8, 10, 20, 26, 32, 33, 34], "80": [5, 30, 31], "charact": 5, "fit": [5, 7, 33, 34], "jupyt": 5, "popup": 5, "prerequisit": [5, 6], "r": [5, 6, 7, 14, 23, 30, 32, 33], "txt": [5, 6, 32], "_build": 5, "rst": 5, "live": 5, "tutori": [5, 6, 15, 16, 34], "autofunct": 5, "autoclass": 5, "shorten": 5, "sphinx": 5, "produc": [5, 8], "miss": 5, "relat": [6, 13, 17, 31, 33, 34], "demonstr": [6, 18, 26, 32], "box": [6, 10, 33], "benefit": [6, 7, 8, 10, 20, 21, 28, 32, 33, 34], "against": 6, "below": [6, 8, 10, 14, 19, 20, 21, 22, 23, 26, 28, 31, 32, 33, 34], "criterion": [6, 8, 16, 22], "zero_grad": [6, 7, 16], "torchvis": [6, 10, 12, 13, 16, 18, 32, 34], "lr": [6, 7, 8, 16, 19], "001": [6, 8], "download": [6, 13, 16], "dataset": [6, 13, 16, 29, 30, 33, 34], "cifar10": [6, 13], "compos": [6, 13], "resiz": [6, 13], "224": [6, 8, 10, 12, 13, 30, 32, 34], "totensor": [6, 13, 16], "train_dataset": [6, 13], "root": [6, 13, 16, 17, 28], "train_load": [6, 8], "128": [6, 8, 10, 13, 20, 30, 34], "crossentropyloss": [6, 16], "momentum": [6, 10, 21], "9": [6, 7, 14, 17, 23, 25, 31, 32], "uncom": 6, "batch_idx": [6, 13], "enumer": [6, 13, 16, 29], "backward": [6, 7, 8, 16, 21, 33, 34], "print": [6, 11, 12, 13, 14, 16, 17, 23, 31], "model_state_dict": 6, "optimizer_state_dict": 6, "pth": 6, "finish": [6, 11, 12, 13, 16, 20], "noqa": [6, 11, 12, 13, 16, 23, 29], "f401": [6, 11, 12, 13, 16, 23, 29], "oneapi": [6, 33], "collect": [6, 32, 33, 34], "commun": [6, 28, 31, 32, 33, 34], "bind": [6, 7, 31, 32, 33, 34], "o": [6, 17, 23, 30], "dist": 6, "oneccl_bindings_for_pytorch": 6, "torch_ccl": 6, "master_addr": 6, "127": [6, 31, 34], "master_port": 6, "29500": [6, 31], "rank": [6, 31, 34], "pmi_rank": 6, "world_siz": [6, 29], "pmi_siz": [6, 29], "init_process_group": 6, "ccl": [6, 31, 34], "init_method": 6, "env": [6, 29], "dist_sampl": 6, "distributedsampl": 6, "sampler": 6, "distributeddataparallel": 6, "batch_id": 6, "destroy_process_group": 6, "nlp": [6, 7, 26, 30, 34], "resnet50_weight": [6, 12, 13], "rand": [6, 8, 12, 13, 20, 26, 34], "vocab_s": [6, 11, 32], "seq_length": [6, 11, 32], "randint": [6, 11, 32], "freez": [6, 8, 10, 13, 15, 16, 20, 23, 26, 32, 34], "check_trac": [6, 13, 32], "strict": [6, 32], "sinc": [6, 7, 18, 19, 20, 21, 26, 33, 34], "manual_se": [6, 11], "43": [6, 11, 31, 32], "12": [6, 10, 14, 17, 30, 31, 32], "instanti": 6, "qconfig_map": 6, "default_static_qconfig_map": 6, "own": [6, 15, 28], "qconfigmap": 6, "per_tensor_affin": [6, 15, 34], "quint8": [6, 15], "set_glob": 6, "traced_model": [6, 10, 13, 15, 16, 26, 34], "static_quantized_model": 6, "local": [6, 20, 28, 31, 32, 33], "default_dynamic_qconfig_map": 6, "placeholderobserv": [6, 15], "is_dynam": [6, 15], "dynamic_quantized_model": 6, "dedic": [6, 28, 34], "faster": [6, 7, 8, 30, 33], "variou": [6, 7, 14, 28, 33, 34], "38": [6, 11, 31, 32], "account": 6, "pretrain": [6, 32, 34], "login": 6, "argpars": [6, 23], "autoconfig": [6, 23], 
"automodelforcausallm": [6, 23, 29, 34], "autotoken": [6, 23], "parser": [6, 23], "argumentpars": [6, 23], "add_help": [6, 23], "add_argu": [6, 23], "choic": [6, 21, 23, 31], "choos": [6, 8, 20, 23, 31, 33, 34], "dinner": [6, 23], "greedi": [6, 23], "action": [6, 23], "store_tru": [6, 23], "parse_arg": [6, 23], "amp_en": [6, 23], "els": [6, 14, 17, 18, 23], "amp_dtyp": [6, 23], "getattr": [6, 23], "model_id": [6, 23], "125m": 6, "trust_remote_cod": [6, 23], "torch_dtyp": [6, 23], "low_cpu_mem_usag": [6, 23], "memory_format": [6, 7, 18, 23], "channels_last": [6, 7, 18, 23, 33, 34], "num_beam": [6, 23], "generate_kwarg": [6, 23], "do_sampl": [6, 23], "temperatur": [6, 23], "input_s": [6, 23], "return_tensor": [6, 23], "input_id": [6, 23], "inference_mod": [6, 23, 29], "gen_id": [6, 23], "max_new_token": [6, 23], "gen_text": [6, 23], "batch_decod": [6, 23], "skip_special_token": [6, 23], "input_tokens_length": [6, 23], "output_tokens_length": [6, 23], "total_new_token": [6, 23], "zip": [6, 23, 34], "flush": [6, 23], "typic": [6, 10, 28, 33, 34], "summari": [6, 34], "narg": 6, "neelnanda": 6, "pile": 6, "10k": 6, "meta": [6, 18, 28, 29], "7b": [6, 28, 30], "hf": [6, 28], "beam_idx_tmp": 6, "contigu": [6, 13, 18, 33, 34], "global_past_key_valu": 6, "num_attention_head": 6, "user_model": [6, 15], "num_hidden_lay": 6, "pad_val": 6, "pad_max": 6, "tokenize_funct": 6, "set_format": 6, "column": 6, "elif": 6, "collate_batch": 6, "position_ids_pad": 6, "input_ids_pad": 6, "last_ind": 6, "attention_mask_pad": 6, "append": [6, 7], "vstack": 6, "calib_dataset": [6, 29], "load_dataset": 6, "calib_evalu": 6, "shuffl": 6, "collate_fn": 6, "break": [6, 16, 34], "calibration_sampl": 6, "save_qconf_summari": [6, 15, 16, 29], "qconf_summari": [6, 15, 16, 29], "int8_qconfig": 6, "done": [6, 10, 16, 17, 26, 33, 34], "Will": [6, 18], "exit": [6, 31], "benchmark": [6, 26, 30, 31, 34], "lowp": 6, "fp16": [6, 17, 29], "unrel": 6, "lowp_mod": [6, 29], "fall": [6, 12], "back": [6, 12, 17, 18, 21, 26], "implicitli": 6, "determin": [6, 17, 21, 33], "woqweightdtyp": [6, 29], "weight_dtyp": [6, 29], "woqlowpmod": [6, 29], "get_weight_only_quant_qconfig_map": [6, 29], "known": [6, 10, 28], "practic": [6, 21, 24, 28, 33], "libtorch": [6, 34], "suppos": [6, 14, 33], "handl": [6, 18, 33], "servic": [6, 28, 30, 33], "regular": [6, 21], "unlik": 6, "app": [6, 34], "iostream": 6, "argc": 6, "const": [6, 17], "char": 6, "argv": 6, "catch": 6, "c10": [6, 17], "std": [6, 17, 19], "cerr": 6, "ivalu": 6, "push_back": 6, "cout": 6, "slice": [6, 18], "end": [6, 13, 20, 34], "endl": 6, "cmakelist": 6, "cmake_minimum_requir": 6, "version": [6, 7, 16, 17, 25, 26, 27, 32, 33, 34], "fatal_error": 6, "find_packag": 6, "add_execut": 6, "target_link_librari": 6, "torch_ipex_librari": 6, "set_properti": 6, "properti": [6, 32], "cxx_standard": 6, "17": [6, 30, 31, 32], "mkdir": 6, "build": [6, 28, 33, 34], "dcmake_prefix_path": 6, "libpytorch_path": 6, "had": [6, 33], "verifi": [6, 7], "ldd": 6, "workspac": 6, "identif": [6, 17], "gnu": [6, 17, 32], "xx": 6, "cxx": [6, 17], "abi": [6, 17, 34], "usr": [6, 17, 31, 32], "torchconfig": 6, "22": [6, 30, 31, 32], "kineto_librari": 6, "notfound": 6, "stack": [6, 8], "most": [6, 7, 13, 21, 28, 30, 32, 33, 34], "recent": [6, 7, 18], "append_torchlib_if_found": 6, "ipexconfig": 6, "84": [6, 30, 31, 33], "lib": [6, 31, 32], "libintel": [6, 34], "ext": [6, 34], "0x00007f3cf98e0000": 6, "libc10": 6, "0x00007f3cf985a000": 6, "0x00007f3cf70fc000": 6, "libtorch_cpu": 6, "0x00007f3ce16ac000": 6, "libdnnl_graph": 
6, "0x00007f3cde954000": 6, "former": 6, "zoo": [6, 30], "simpli": [6, 7, 26, 31], "overview": [7, 25, 29, 34], "three": [7, 16, 17], "claus": [7, 10, 19], "guidanc": 7, "intel_pytorch_extens": [7, 25, 26, 34], "10": [7, 14, 16, 17, 18, 21, 25, 26, 31, 32, 33], "correct": [7, 18, 25, 34], "speed": [7, 11, 19, 28, 33, 34], "happen": 7, "inductor": [7, 34], "level": [7, 10, 13, 16, 18, 20, 21, 26, 33, 34], "migrat": 7, "pattern": [7, 11, 18, 28, 34], "highli": [7, 23, 28, 33, 34], "adapt": 7, "nchw": [7, 33], "nhwc": [7, 33, 34], "could": [7, 13, 16, 18, 26, 32, 33, 34], "anymor": [7, 34], "aka": [7, 18], "cooper": [7, 30, 34], "lake": [7, 30, 34], "avx512": [7, 17, 18, 32, 34], "partial": 7, "upstream": [7, 18, 34], "land": [7, 34], "pr": [7, 18, 34], "being": [7, 33], "review": [7, 34], "instead": [7, 8, 14, 19, 20, 29, 30, 31, 32, 33, 34], "device_nam": [7, 8], "conduct": 7, "frequent": 7, "websit": 7, "registr": 7, "topologi": [7, 18, 19, 26, 30, 31, 33, 34], "roialign": [7, 34], "nm": [7, 34], "cnn": [7, 18, 26, 30, 33, 34], "frozenbatchnorm2d": 7, "num_featur": 7, "batchnorm2d": [7, 10, 26, 34], "statist": 7, "affin": [7, 10, 15, 20, 31, 32, 33], "w": [7, 16, 18, 21, 30, 32], "interact": [7, 34], "beyond": 7, "kind": 7, "gender": 7, "hobbi": 7, "between": [7, 8, 17, 20, 33, 34], "man": [7, 33], "plai": [7, 33], "footbal": 7, "b": [7, 8, 16, 28], "mergedembeddingbag": 7, "embedding_spec": 7, "embeddingspec": 7, "merg": [7, 34], "embeddingbag": [7, 26, 34], "At": [7, 17], "stage": [7, 10, 19, 20, 29, 33, 34], "spars": [7, 18, 34], "dens": [7, 18], "gradient": 7, "mergedembeddingbagwithsgd": 7, "emblist": 7, "modulist": 7, "emb1": 7, "emb2": 7, "emb3": 7, "emb_m": 7, "in1": 7, "in2": 7, "in3": 7, "in_m": 7, "emb": 7, "in_i": 7, "merged_emb": 7, "from_embeddingbag_list": 7, "minim": [7, 14, 17, 33], "heavi": 7, "big": [7, 18], "read": [7, 19], "futur": [7, 28, 34], "visit": [7, 33], "mergedembeddingbagwith": 7, "weight_decai": [7, 19], "grad": [7, 19], "creat": [7, 16, 20, 33, 34], "decai": 7, "to_bfloat16_train": 7, "merged_input": 7, "linearize_indices_and_offset": 7, "need_linearize_indices_and_offset": 7, "booltensor": 7, "becom": [7, 28, 33], "balanc": [7, 16, 22, 33], "embedingbag": 7, "often": 7, "categor": 7, "power": [7, 33, 34], "law": 7, "ag": 7, "video": 7, "game": 7, "19": [7, 30, 31, 32, 34], "29": [7, 31, 32], "row": 7, "write": [7, 17], "address": [7, 18, 31, 32, 33, 34], "conflict": [7, 17], "solv": [7, 19, 33], "togeth": [7, 14, 20, 33, 34], "immedi": 7, "right": [7, 21, 23, 28], "friendli": [7, 33], "gemm": [7, 18, 26, 28, 34], "aim": [7, 10, 16, 33], "math": 7, "wa": [7, 31, 32, 33, 34], "test": [7, 16, 17, 30, 34], "broad": [7, 9, 34], "toggl": 7, "switch": [7, 17, 31, 33, 34], "concern": 7, "footprint": [7, 21, 28, 34], "stick": 7, "splitsgd": [7, 21], "spawn": [7, 20], "subject": [7, 17, 20, 27, 34], "built": [7, 17, 20, 34], "deliv": [7, 28, 34], "separ": [7, 19, 27, 33], "smooth": 7, "ptq": 7, "tackl": 7, "problem": [7, 19, 26, 32, 33], "systemat": 7, "outlier": [7, 16], "commonli": [7, 28, 33, 34], "hopefulli": 7, "eas": [7, 18, 34], "small": [7, 19, 33, 34], "turn": [7, 34], "boolean": [7, 34], "off": [7, 8, 21, 28, 30, 34], "area": [7, 14], "extrem": [7, 14, 33], "situat": [7, 14], "huge": [7, 14, 33], "impract": [7, 14], "consum": [7, 14], "launcher": [7, 13, 31, 33, 34], "integr": [7, 18, 28, 33, 34], "conveni": [8, 34], "lower": [8, 17, 21, 28, 34], "becaus": [8, 17, 18, 21, 28, 33, 34], "lighter": 8, "smaller": [8, 17], "sacrif": 8, "trade": [8, 28, 30, 34], 
"slower": [8, 33, 34], "accur": 8, "primarili": [8, 34], "show": [8, 17, 21, 28, 29, 30, 31, 32, 33, 34], "simplenet": [8, 34], "super": [8, 10, 16, 20, 26, 34], "stride": [8, 10, 20, 34], "pad": [8, 10, 20, 34], "y": [8, 15, 16, 20, 21, 34], "chosen": [8, 14, 17], "maintain": 8, "categori": [8, 34], "circumst": 8, "imag": [8, 13, 18, 33, 34], "label": 8, "float64": 8, "suppli": 8, "addmm": 8, "addmm_": 8, "cannot": [8, 19, 26, 34], "describ": [8, 13, 18, 21, 32, 33], "expos": 8, "namespac": [8, 17], "regardless": [8, 34], "unlist": 8, "downstream": 8, "believ": [8, 18], "unstabl": 8, "conv1d": [8, 13], "conv3d": [8, 13, 34], "conv_transpose1d": 8, "conv_transpose2d": 8, "conv_transpose3d": 8, "bmm": [8, 34], "mm": 8, "baddbmm": 8, "addbmm": 8, "conv_tbc": 8, "group_norm": 8, "_native_multi_head_attent": 8, "avg_pool3d": 8, "binary_cross_entropi": 8, "grid_sampl": 8, "polar": 8, "prod": 8, "quantil": 8, "nanquantil": 8, "stft": 8, "cdist": 8, "view_as_complex": 8, "choleski": 8, "cholesky_invers": 8, "cholesky_solv": 8, "invers": 8, "lu_solv": 8, "matrix_rank": 8, "orgqr": 8, "ormqr": 8, "pinvers": 8, "max_unpool2d": 8, "max_unpool3d": 8, "adaptive_avg_pool3d": 8, "reflection_pad1d": 8, "reflection_pad2d": 8, "replication_pad1d": 8, "replication_pad2d": 8, "replication_pad3d": 8, "mse_loss": 8, "cosine_embedding_loss": 8, "nll_loss": 8, "nll_loss2d": 8, "hinge_embedding_loss": 8, "poisson_nll_loss": 8, "smooth_l1_loss": 8, "cross_entropy_loss": 8, "l1_loss": 8, "huber_loss": 8, "margin_ranking_loss": 8, "soft_margin_loss": 8, "triplet_margin_loss": 8, "multi_margin_loss": 8, "ctc_loss": 8, "kl_div": 8, "multilabel_margin_loss": 8, "binary_cross_entropy_with_logit": 8, "fft_fft": 8, "fft_ifft": 8, "fft_fft2": 8, "fft_ifft2": 8, "fft_fftn": 8, "fft_ifftn": 8, "fft_rfft": 8, "fft_irfft": 8, "fft_rfft2": 8, "fft_irfft2": 8, "fft_rfftn": 8, "fft_irfftn": 8, "fft_hfft": 8, "fft_ihfft": 8, "linalg_cond": 8, "linalg_matrix_rank": 8, "linalg_solv": 8, "linalg_choleski": 8, "linalg_svdv": 8, "linalg_eigv": 8, "linalg_eigvalsh": 8, "linalg_inv": 8, "linalg_householder_product": 8, "linalg_tensorinv": 8, "linalg_tensorsolv": 8, "fake_quantize_per_tensor_affin": 8, "eig": 8, "geqrf": 8, "lstsq": 8, "_lu_with_info": 8, "qr": 8, "svd": 8, "symeig": 8, "triangular_solv": 8, "fractional_max_pool2d": 8, "fractional_max_pool3d": 8, "adaptive_max_pool3d": 8, "multilabel_margin_loss_forward": 8, "linalg_qr": 8, "linalg_cholesky_ex": 8, "linalg_svd": 8, "linalg_eig": 8, "linalg_eigh": 8, "linalg_lstsq": 8, "linalg_inv_ex": 8, "cat": [8, 31, 32, 34], "index_copi": 8, "intervent": 8, "mixtur": [8, 34], "enable_auto_channels_last": 9, "disable_auto_channels_last": 9, "regress": [9, 34], "rais": 10, "oob": [10, 34], "easili": [10, 15], "who": 10, "inevit": 10, "simplifi": [10, 34], "snippet": [10, 29], "optimum": 10, "monkei": 10, "patch": [10, 34], "embedding_bag": 10, "qa": [10, 34], "clear": 10, "ninstanc": [10, 14, 31, 34], "ncore": [10, 31], "28": [10, 14, 16, 30, 31, 32, 33, 34], "run_qa": [10, 34], "model_name_or_path": [10, 29, 34], "dataset_nam": [10, 34], "squad": [10, 30, 34], "do_ev": [10, 34], "per_device_train_batch_s": [10, 34], "learning_r": [10, 34], "3e": [10, 34], "num_train_epoch": [10, 34], "max_seq_length": [10, 34], "384": [10, 32, 34], "doc_strid": [10, 34], "output_dir": [10, 14, 34], "tmp": [10, 32, 34], "debug_squad": [10, 34], "dummymodul": 10, "input1": 10, "kernel_s": 10, "7": [10, 14, 17, 20, 21, 31, 32, 34], "track_running_stat": 10, "customized_forward": 10, "method1": 10, 
"success": [10, 24], "method2": 10, "fail": [10, 26, 34], "top": [10, 21, 34], "unabl": 10, "hook": [10, 16], "As": [10, 19, 20, 28, 31, 32, 33, 34], "behaviour": 10, "repeat": [10, 18, 21], "feasibl": 10, "idea": [11, 21, 33], "primit": [11, 20, 30, 34], "portabl": 11, "hpc": 11, "ensur": [11, 19, 20, 32], "perf": [11, 18], "tri": 12, "failur": [12, 34], "incorrect": [12, 26, 34], "trigger": 12, "meanwhil": [12, 33, 34], "resnet50": [12, 13, 14, 18, 30, 31, 33, 34], "dag": 13, "acycl": 13, "straight": [13, 33], "cover": [13, 18, 31], "constant": 13, "resourc": [13, 20, 28, 32, 33], "focus": [13, 34], "front": [13, 34], "batchnorm": [13, 17, 18, 26, 34], "propag": [13, 21, 33], "graph_for": 13, "regard": 13, "rn50": [13, 34], "sum": [13, 16, 18, 19, 34], "convrelu": 13, "convsumrelu": 13, "default_static_qconfig": [13, 15, 32, 34], "quantized_model": [13, 15, 34], "244": 13, "convtranspose3d": 13, "ab": [13, 32], "clamp": 13, "elu": 13, "exp": 13, "hardtanh": 13, "hardswish": [13, 34], "mish": 13, "sigmoid": [13, 34], "pow": 13, "round": [13, 21], "squar": [13, 28], "tanh": [13, 34], "leaki": 13, "_": [13, 15, 16, 17, 18, 20, 30, 31, 32, 33, 34], "div": 13, "view": [13, 18, 20, 21], "transpos": [13, 34], "dequant": [13, 16], "partit": [13, 33], "leaky_relu": 13, "___": 13, "divid": [13, 32, 33, 34], "maxpool2d": 13, "_____": 13, "stock": [13, 30, 34], "owner": 13, "otheriws": 13, "compuat": 13, "wikipedia": [13, 33], "There": [14, 16, 20, 33, 34], "thing": [14, 33], "yaml": 14, "strategi": [14, 33, 34], "grid": 14, "random": 14, "max_trial": 14, "trial": 14, "record": [14, 32], "csv": 14, "hyperparam": 14, "mandatori": 14, "hp": 14, "ncores_per_inst": 14, "all_physical_cor": 14, "ncore_per_inst": [14, 34], "all_logical_cor": 14, "use_all_nod": 14, "num_nod": 14, "use_logical_cor": [14, 32], "is_hyperthreading_en": 14, "disable_numactl": [14, 32], "disable_iomp": [14, 32], "malloc": [14, 31, 33], "tc": 14, "je": 14, "previou": [14, 16, 18, 33, 34], "hyperparamt": 14, "8": [14, 16, 30, 31, 32, 33], "respect": [14, 16, 30, 31, 34], "maxim": 14, "statement": [14, 17], "higher_is_bett": 14, "target_v": 14, "inf": 14, "minimum": [14, 16, 18], "platinum": [14, 30, 32, 33], "8180m": [14, 33], "socket": [14, 30, 32, 33, 34], "anoth": [14, 31, 33, 34], "conf_fil": [14, 34], "hypertune_directori": 14, "termin": 14, "15": [14, 17, 30, 31, 32], "339081764221191": 14, "gave": 14, "side": [15, 33], "compon": [15, 26, 27, 28], "much": [15, 18, 21, 28, 33], "abl": 15, "similar": [15, 17, 33], "satisfi": [15, 26], "tradeoff": 15, "reduce_rang": 15, "methond": 15, "obsev": 15, "symmetr": 15, "sete": 15, "skylak": 15, "quant_stat": 15, "calibration_data_set": [15, 34], "qparam": 15, "And": [15, 20, 32, 34], "achang": 15, "overrid": 15, "load_qconf_summari": 15, "dynamic_qconfig": 15, "default_dynamic_qconfig": [15, 32], "per_tensor_symmetr": 15, "gru": 15, "lstmcell": 15, "rnncell": 15, "grucel": 15, "bother": 16, "desir": [16, 31], "receip": [16, 20], "sq": 16, "difficulti": 16, "vari": 16, "across": [16, 31], "herebi": 16, "obtain": 16, "abil": 16, "optdecoderlay": 16, "blockwis": 16, "consist": [16, 28, 33, 34], "major": 16, "adjust": 16, "accordingli": 16, "predict": 16, "criteria": 16, "consider": 16, "numpi": 16, "np": [16, 31], "tolist": 16, "auto_alpha_arg": 16, "init_alpha": [16, 22], "baselin": [16, 22, 34], "alpha_min": [16, 22], "alpha_max": [16, 22], "99": [16, 30, 34], "alpha_step": [16, 22], "step_siz": [16, 22], "shared_criterion": [16, 22], "enable_blockwise_loss": [16, 22], "portion": 16, 
"beginn": 16, "quickstart_tutori": 16, "training_data": 16, "fashionmnist": 16, "test_data": 16, "loader": 16, "train_dataload": 16, "test_dataload": 16, "neuralnetwork": 16, "flatten": [16, 20], "linear_relu_stack": 16, "sequenti": 16, "logit": 16, "loss_fn": 16, "pred": 16, "backpropag": 16, "item": 16, "7f": 16, "5d": 16, "epoch": 16, "argmax": 16, "inc": [16, 17, 22, 28], "accu": 16, "tuned_conf": 16, "explain": [17, 18, 21], "fork": [17, 33], "avx512_vnni": 17, "avx512_bf16": 17, "avx2": [17, 26, 34], "avx2_vnni": 17, "avx512_fp16": 17, "11": [17, 31, 32], "gcc": 17, "findavx": 17, "bodi": 17, "anonym": 17, "virtual": 17, "polymorph": 17, "pertain": 17, "cpuid": 17, "statu": 17, "pointer": 17, "system": [17, 33], "specifii": 17, "complier": 17, "isacodegen": 17, "suffix": 17, "adaptiveaveragepoolingkrnl": 17, "isa_codegen": 17, "o3": 17, "d__avx__": 17, "dcpu_capability_avx2": 17, "mavx2": 17, "mfma": 17, "mno": 17, "avx256": 17, "unalign": [17, 34], "dcpu_cap": 17, "dcpu_capability_default": 17, "d__avx512f__": 17, "mavx512f": 17, "mavx512bw": 17, "mavx512vl": 17, "mavx512dq": 17, "dcpu_capability_avx512": 17, "mavx512vnni": 17, "dcpu_capability_avx512_vnni": 17, "mavx512bf16": 17, "dcpu_capability_avx512_bf16": 17, "mamx": 17, "tile": 17, "dcpu_capability_amx": 17, "mavx512fp16": 17, "dcpu_capability_avx512_fp16": 17, "align": [17, 18, 21, 34], "stead": 17, "sleef": 17, "width": [17, 18], "isa_nam": 17, "inlin": 17, "compat": [17, 21], "definit": [17, 21], "Such": 17, "But": [17, 18], "tip": 17, "newkernelkrnl": 17, "newkernel": 17, "header": 17, "special": [17, 18, 28], "fastest": 17, "cpuinfo": 17, "mykernel": 17, "fn_type": 17, "void": 17, "ipex_declare_dispatch": 17, "ipex_define_dispatch": 17, "ipex_register_dispatch": 17, "kcpu": 17, "declar": 17, "ideep": [17, 18], "common": [17, 21, 28, 31, 33], "intrins": 17, "cvtfp32tobf16": 17, "pragma": 17, "torch_ipex": [17, 34], "cvt_fp32_to_bf16": 17, "dst": 17, "cvt_fp32_to_bf16_kernel_impl": 17, "cvt_fp32_to_bf16_kernel_fn": 17, "cvt_fp32_to_bf16_kernel_stub": 17, "macro": 17, "cpu_capability_avx512": 17, "cpu_capability_avx512_bf16": 17, "hav": 17, "cvtfp32tobf16krnl": 17, "vec512": 17, "vec256": 17, "endif": 17, "immintrin": 17, "__m256i": 17, "_cvt_fp32_to_bf16": 17, "__m512": 17, "reinterpret_cast": 17, "_mm512_cvtneps_pbh": 17, "__m512i": 17, "_mm512_castps_si512": 17, "nan": [17, 34], "_mm512_set1_epi32": 17, "0xffff": 17, "mask_valu": 17, "_mm512_cmp_ps_mask": 17, "_cmp_ord_q": 17, "0x1": 17, "vec_bia": 17, "0x7fff": 17, "uint32_t": 17, "lsb": 17, "t_valu": 17, "_mm512_and_si512": 17, "_mm512_srli_epi32": 17, "rounding_bia": 17, "_mm512_add_epi32": 17, "_mm512_mask_blend_epi32": 17, "_mm512_cvtusepi32_epi16": 17, "f32": [17, 18], "_mm512_loadu_p": 17, "_mm256_storeu_si256": 17, "_mm512_maskz_loadu_p": 17, "_mm256_mask_storeu_epi16": 17, "getveclength": 17, "get_cpp_typesize_and_vecs": 17, "scalartyp": 17, "get_cpp_typesize_and_vecsize_kernel_impl": 17, "get_cpp_typesize_and_vecsize_kernel_fn": 17, "get_cpp_typesize_and_vecsize_kernel_stub": 17, "types": 17, "vectors": 17, "getveclengthkrnl": 17, "doubl": 17, "make_tupl": 17, "sizeof": 17, "complexdoubl": 17, "complex": 17, "complexfloat": 17, "decltyp": 17, "impl": 17, "scalartypetocpptyp": 17, "torch_check": 17, "09": [17, 31], "58": [17, 31], "anaconda": 17, "copyright": [17, 27], "credit": 17, "licens": 17, "_c": [17, 26], "_get_current_isa_level": 17, "_get_highest_cpu_support_isa_level": 17, "_get_highest_binary_support_isa_level": 17, "quit": [17, 34], "By": [17, 31, 
33], "aten_cpu_cap": 17, "effect": [17, 21, 26, 32, 33], "intern": [17, 18, 20, 32], "purpos": [17, 31, 32, 33], "addtion": 17, "tool": [17, 33, 34], "subfold": 17, "rh": 17, "toolset": 17, "33": [17, 31, 32], "cmakefil": 17, "cpu_featur": 17, "dir": [17, 31], "66": [17, 31, 34], "cpu_feature_main": 17, "xcr0": 17, "00000000000602e7": 17, "mmx": 17, "sse": 17, "sse2": 17, "sse3": 17, "ssse3": 17, "sse4_1": 17, "sse4_2": 17, "aes_ni": 17, "sha": 17, "xsave": 17, "fma": 17, "f16c": 17, "avx_vnni": 17, "avx512_f": 17, "avx512_cd": 17, "avx512_pf": 17, "avx512_er": 17, "avx512_vl": 17, "avx512_bw": 17, "avx512_dq": 17, "avx512_ifma": 17, "avx512_vbmi": 17, "avx512_vpopcntdq": 17, "avx512_4fmap": 17, "avx512_4vnniw": 17, "avx512_vbmi2": 17, "avx512_vpclmul": 17, "avx512_bitalg": 17, "avx512_vp2intersect": 17, "amx_bf16": 17, "amx_til": 17, "amx_int8": 17, "prefetchw": 17, "prefetchwt1": 17, "represent": 18, "multidimension": 18, "arrai": 18, "nd": 18, "1d": 18, "semant": 18, "attribut": 18, "coo": 18, "canon": 18, "assign": [18, 32, 33], "2d": 18, "height": 18, "illustr": [18, 19, 21, 31, 33], "actual": [18, 21], "bmp": 18, "contiguous_format": [18, 33], "tensorflow": 18, "close": [18, 31, 33], "to_mkldnn": 18, "difficult": 18, "manipul": 18, "to_dens": 18, "natur": [18, 21, 28], "hold": [18, 33], "secret": 18, "ingredi": 18, "almost": 18, "foundat": [18, 33], "upper": [18, 33], "fact": [18, 33], "expens": 18, "benefici": 18, "nb": 18, "me": 18, "roughli": 18, "50": [18, 31, 32], "mkldnn": 18, "mkldnn_util": 18, "subsequ": [18, 33], "concept": [18, 33], "diagram": [18, 33], "hard": [18, 26], "conclus": 18, "necessari": 18, "neglig": 18, "move": [18, 33], "organ": 18, "question": [18, 30], "reinterpret": 18, "answer": [18, 30], "chw": 18, "hw": 18, "stride_n": 18, "stride_c": 18, "stride_h": 18, "stride_w": 18, "merit": 18, "express": [18, 34], "noncontigu": 18, "n1": 18, "n2": 18, "mind": [18, 32], "someth": 18, "reli": [18, 20], "rfc": 18, "hwc": 18, "wc": 18, "chwn": 18, "hwn": 18, "wn": 18, "empti": [18, 31], "outplac": [18, 34], "is_contigu": 18, "_appli": 18, "brief": [18, 28, 34], "imagenet": [18, 30], "spontan": 18, "tell": [18, 20, 33], "NOT": [18, 31], "compris": 18, "explicit": [18, 20, 33], "implicit": 18, "tensoriter": 18, "guidelin": 18, "awar": [18, 20, 31, 32], "my": 18, "upsampl": [18, 34], "cudnn": 18, "accommod": 18, "md": 18, "format_tag": 18, "src_md": 18, "desc": 18, "data_typ": 18, "src_mem": 18, "src_data_ptr": 18, "card": 18, "hwio": 18, "resnext101": [18, 34], "detectron2": 18, "8x": 18, "lamb": [19, 21], "adagrad": [19, 21], "clr": 19, "lr_decai": 19, "state_sum": 19, "addcmul_": 19, "add_": 19, "addcdiv_": 19, "whole": [19, 20, 33], "storag": 19, "onboard": [19, 33], "third": [19, 34], "high": [19, 21, 33], "bound": [19, 20, 28, 33], "bottl": 19, "neck": 19, "prevent": 19, "pseudo": [19, 21, 34], "adagrad_fused_step": 19, "group": [19, 20, 33], "grad0": 19, "grad1": 19, "grad_n": 19, "param_n": 19, "state_sum_n": 19, "adagrad_step": 19, "grad_i": 19, "param_i": 19, "state_sum_i": 19, "other_arg": 19, "coupl": [20, 33, 34], "omp": [20, 26, 31, 32, 33, 34], "ld_preload": [20, 31, 32, 33], "libiomp5": [20, 31, 32, 33], "model_script": 20, "examplenet": 20, "examplenet1": 20, "x1": 20, "start_dim": 20, "examplenet2": 20, "conv2": 20, "x2": 20, "y1": 20, "y2": 20, "model1": 20, "traced_model1": 20, "model2": 20, "traced_model2": 20, "multi_stream_model": [20, 34], "datatyp": [20, 34], "receipt": 20, "steam": [20, 34], "input_hint": 20, "output_hint": 20, "pthread": 20, 
"async": [20, 34], "wake": 20, "synchron": [20, 26, 34], "imper": [20, 34], "suffer": 20, "gil": 20, "hurt": 20, "mitig": [20, 30], "omp_num_thread": [20, 26, 31, 32, 34], "phase": 20, "s1": 20, "c1": 20, "numactl": [20, 31, 32], "outsid": 20, "superset": 20, "undefin": [20, 33], "gb": 20, "simultan": 20, "correspond": [20, 31, 34], "cpu_pool1": 20, "cpu_pool2": 20, "task1": 20, "task2": 20, "y1_futur": 20, "y2_futur": 20, "y_runtim": 20, "kmp_": 20, "fulfil": 20, "worker": [20, 31], "serv": [20, 34], "sub": [20, 28, 33], "wait": [20, 33], "futuretensor": 20, "didn": 20, "dlopen": 20, "symbol": 20, "bottom": 21, "bit": [21, 28], "sign": 21, "expon": 21, "mantissa": 21, "23": [21, 31, 32], "capac": [21, 30], "digit": 21, "shorter": [21, 28], "fewer": 21, "neg": 21, "disadvantag": 21, "shift": 21, "left": [21, 28, 32], "lose": 21, "decim": 21, "valid": [21, 34], "1234500000": 21, "0000012345": 21, "1234512345": 21, "sens": 21, "fraction": 21, "12345": 21, "00000": 21, "signific": 21, "bui": 21, "involv": 21, "ground": 21, "truth": 21, "chain": 21, "rule": [21, 34], "meet": [21, 33, 34], "wide": [21, 34], "understand": [21, 28, 33], "formula": 21, "\u03b1": 21, "gw": 21, "denot": 21, "receiv": 21, "rate": 21, "earlier": 21, "inaccur": 21, "exactli": 21, "kept": 21, "halv": 21, "recov": 21, "fp32_w": 21, "concat_fp32_from_bf16": 21, "bf16_w": 21, "fp32_gw": 21, "bf16_gw": 21, "weight_dacai": 21, "split_bf16_from_fp32": 21, "ratio": [22, 30, 34], "beta": [23, 26], "demostr": 23, "cheat": 23, "sheet": 23, "pypi": [26, 34], "occupi": 26, "remark": [26, 30, 33], "__name__": [26, 34], "__main__": [26, 31, 32, 34], "112": [26, 30, 33, 34], "nnc": 26, "poor": [26, 34], "xlm": 26, "roberta": [26, 34], "casual": 26, "gpt2": 26, "summar": 26, "classif": [26, 30], "allenai": 26, "longform": 26, "409": 26, "workaround": [26, 34], "_jit_set_texpr_fuser_en": 26, "csrc": 26, "tensorexpr_fus": 26, "settensorexprfuseren": 26, "longer": [26, 30], "complic": [26, 31, 33], "undergo": [26, 29], "runtimeerror": [26, 34], "overflow": [26, 34], "unpack": [26, 34], "exce": [26, 30, 33, 34], "quantize_per_tensor": 26, "pseudocod": [26, 34], "omp_num_threa": 26, "set_num_thread": [26, 34], "freezed_model": [26, 34], "run_benchmark": [26, 34], "flow": 26, "bag": [26, 34], "progress": [26, 28, 34], "abnorm": [26, 34], "tbd": 26, "transformerencoderlay": 26, "encount": [26, 34], "rnnt": [26, 34], "joint_net": [26, 34], "caller": [26, 34], "apach": [27, 32], "notic": [27, 31, 32], "term": 27, "condit": 27, "multiheadattent": 28, "feedforward": 28, "lot": [28, 34], "besid": [28, 33, 34], "adopt": [28, 34], "modelfamili": 28, "hub": 28, "staticquantizationint8": 28, "onlyquantizationint8": 28, "onlyquantizationint4": 28, "13b": [28, 30, 34], "70b": [28, 34], "8b": 28, "20b": 28, "dolli": [28, 34], "databrick": 28, "v2": [28, 30, 34], "12b": 28, "tiiuae": 28, "40b": 28, "30b": 28, "3b": 28, "bigscienc": 28, "1b7": 28, "salesforc": 28, "2b": 28, "baichuan2": [28, 34], "chat": 28, "thudm": 28, "chatglm3": [28, 34], "chatglm2": [28, 34], "bigcod": 28, "starcod": [28, 34], "flan": 28, "xl": 28, "mosaicml": 28, "mistralai": 28, "v0": 28, "8x7b": 28, "stabilityai": 28, "1_6b": 28, "liuhaotian": 28, "v1": [28, 34], "microsoft": 28, "ieityuan": 28, "yuan2": 28, "102b": 28, "signifi": 28, "perfect": 28, "codellama": 28, "rope": 28, "past": 28, "year": 28, "flourish": 28, "contribut": [28, 31, 34], "research": 28, "web": 28, "legend": 28, "autotp": 28, "obviou": 28, "hotspot": 28, "lead": 28, "significantli": [28, 34], "heavier": 28, 
"io": 28, "occurr": 28, "ship": 28, "2nd": 28, "4th": [28, 30], "except": [28, 31], "beeter": 28, "Its": 28, "seen": 28, "woq": 28, "integ": [28, 33], "bandwidth": 28, "reorder_cach": 28, "beam_width": 28, "secondli": 28, "elimin": 28, "shard": 28, "content": [29, 34], "your_calibration_dataset": 29, "calib_sampl": 29, "calibration_model": 29, "qconfig_summary_file_path": 29, "nf4": 29, "init_distribut": 29, "get_acceler": 29, "communication_backend_nam": 29, "var": 29, "ondevic": 29, "init_infer": 29, "mp_size": 29, "base_dir": 29, "repo_root": 29, "checkpoints_json": 29, "zone": [30, 34], "articl": [30, 33], "llama2": [30, 34], "1024": [30, 33], "were": [30, 31, 32, 33], "carri": 30, "m7i": 30, "m6i": [30, 32], "47x": 30, "62x": 30, "57x": 30, "58x": 30, "85x": 30, "27x": 30, "38x": 30, "29x": 30, "36x": 30, "conclud": [30, 34], "respons": 30, "session": 30, "exhibit": 30, "wherea": 30, "p90": 30, "26x": 30, "sec": 30, "39": [30, 31, 32, 34], "26": [30, 31, 32], "49": [30, 31, 32], "170": 30, "21": [30, 31, 32], "measur": [30, 34], "17th": 30, "16xlarg": 30, "u": [30, 32], "west": 30, "ubuntu": 30, "04": [30, 31], "1009": 30, "sw": 30, "workload1": 30, "inference2": 30, "realtim": 30, "inference3": 30, "tunabl": [30, 32], "8380": 30, "30ghz": 30, "83x": 30, "44x": 30, "ssd": [30, 34], "resnet34": [30, 34], "16x": 30, "coco": 30, "1200": 30, "resnext": 30, "32x16d": 30, "81x": 30, "21x": 30, "vgg": 30, "75x": 30, "19x": 30, "shufflenetv2_x1": 30, "07x": 30, "78x": 30, "04x": 30, "max_seq_len": 30, "384task": 30, "jemalloc": [30, 32, 34], "05x": 30, "96x": 30, "mrpc": 30, "128task": 30, "distilbert": 30, "12x": 30, "dnnl": 30, "base_text_classif": 30, "f1": 30, "81": [30, 31], "79": [30, 31], "93": 30, "02": [30, 32], "85": [30, 31], "86": [30, 31], "top1": 30, "76": [30, 31], "75": [30, 31], "98": 30, "78": [30, 31], "199": 30, "48": [30, 31, 32], "vgg11": 30, "69": [30, 31], "67": [30, 31, 34], "96": 30, "44": [30, 31, 32], "36": [30, 31, 32], "92": 30, "97": 30, "shufflenet": 30, "histogram": [30, 34], "40": [30, 31, 32, 34], "ucod": 30, "0xd0002a0": 30, "ON": 30, "turboboost": 30, "bio": 30, "ddr": 30, "16gb": 30, "3200": 30, "dcpmm": 30, "256gb": 30, "host": [30, 34], "cento": 30, "2105": 30, "18": [30, 31, 32], "305": 30, "el8_4": 30, "x86_64": 30, "docker": [30, 34], "spectr": 30, "meltdown": 30, "24x": 30, "31x": 30, "15x": 30, "30x": 30, "mobilenet": 30, "08x": 30, "03x": 30, "09x": 30, "39x": 30, "35x": 30, "160": 30, "55x": 30, "06x": 30, "fpn": 30, "71x": 30, "20x": 30, "13x": 30, "32x": 30, "48x": 30, "11x": 30, "terabyt": 30, "14x": 30, "02x": 30, "10x": 30, "33x": 30, "8380h": 30, "90ghz": 30, "56": [30, 31, 32, 33], "67x": 30, "45x": 30, "77x": 30, "18x": 30, "formerli": [30, 33, 34], "0x700001c": 30, "wlydcrb1": 30, "sy": 30, "0016": 30, "p29": 30, "2006080250": 30, "64gb": 30, "768gb": 30, "influenc": [31, 33], "properli": 31, "themselv": [31, 34], "free": [31, 34], "mainli": [31, 34], "around": 31, "interpret": 31, "prefix": 31, "cross": [31, 32, 33, 34], "taskset": 31, "malloc_conf": [31, 33], "crash": [31, 33, 34], "nnode": 31, "nproc": 31, "count": 31, "addr": 31, "ip": 31, "hostnam": 31, "proc": 31, "port": 31, "hostfil": 31, "mpi": 31, "mpiexec": 31, "hydra": 31, "ppn": 31, "genv": 31, "i_mpi_pin_domain": 31, "codeless": 31, "ut": 31, "exclus": 31, "mutual": 31, "ld": 31, "favorit": 31, "kmp": [31, 33], "granular": [31, 32, 33], "compact": [31, 32, 33], "stdout": 31, "afterward": [31, 33], "undesir": 31, "_timestamp_inst": 31, "_timestamp_instance_": 31, "_core": 
31, "run_20210712212258_inst": 31, "run_20210712212258_instance_0_cores_0": 31, "gif": 31, "07": 31, "764": 31, "conda_prefix": [31, 32], "virtual_env": [31, 32], "lib64": [31, 32], "home": [31, 32], "drop": [31, 32], "kmp_affin": [31, 32, 33], "kmp_blocktim": [31, 32, 33], "14": [31, 32, 34], "24": [31, 32], "25": [31, 32], "27": [31, 32, 33], "30": [31, 32], "31": [31, 32], "34": [31, 32], "35": [31, 32], "37": [31, 32, 34], "41": [31, 32], "42": [31, 32], "tee": 31, "run_20210712223308_inst": 31, "run_20210712223308_instance_0_cores_0": 31, "87": 31, "08": 31, "117": 31, "88": 31, "118": 31, "45": [31, 32], "46": [31, 32], "47": [31, 32], "51": [31, 32], "52": [31, 32], "53": [31, 32], "54": [31, 32], "55": [31, 32, 33], "57": 31, "59": 31, "60": 31, "61": 31, "62": 31, "63": [31, 34], "65": 31, "68": [31, 34], "70": 31, "71": 31, "72": 31, "73": 31, "74": 31, "77": 31, "82": 31, "83": [31, 33], "run_20210712214504_inst": 31, "run_20210712214504_instance_0_cores_22": 31, "513": 31, "run_20210712220928_inst": 31, "run_20210712220928_instance_0_cores_0": 31, "355": 31, "356": 31, "deduct": 31, "run_20210712221615_inst": 31, "run_20210712221615_instance_0_cores_11": 31, "591": 31, "run_20210712221150_inst": 31, "run_20210712221150_instance_0_cores_0": 31, "run_20210712221150_instance_1_cores_22": 31, "233": 31, "236": 31, "run_20210712221415_inst": 31, "run_20210712221415_instance_0_cores_0": 31, "run_20210712221415_instance_1_cores_4": 31, "run_20210712221415_instance_2_cores_8": 31, "run_20210712221415_instance_3_cores_12": 31, "run_20210712221415_instance_4_cores_16": 31, "run_20210712221415_instance_5_cores_20": 31, "run_20210712221415_instance_6_cores_24": 31, "run_20210712221415_instance_7_cores_28": 31, "run_20210712221415_instance_8_cores_32": 31, "run_20210712221415_instance_9_cores_36": 31, "run_20210712221415_instance_10_cores_40": 31, "140": 31, "143": 31, "146": 31, "149": 31, "151": 31, "154": 31, "157": 31, "159": 31, "162": 31, "164": 31, "167": 31, "run_20210712221305_inst": 31, "run_20210712221305_instance_0_cores_0": 31, "run_20210712221305_instance_1_cores_11": 31, "run_20210712221305_instance_2_cores_22": 31, "run_20210712221305_instance_3_cores_33": 31, "470": 31, "471": 31, "473": 31, "476": 31, "479": 31, "instance_idx": 31, "independ": 31, "confirm": 31, "175": 31, "176": 31, "177": 31, "run_20220106130151_instance_0_cores_0": 31, "sometim": [31, 33], "235": 31, "jemallocl": 31, "oversize_threshold": [31, 33], "background_thread": [31, 33], "metadata_thp": [31, 33], "dirty_decay_m": [31, 33], "9000000000": [31, 33], "muzzy_decay_m": [31, 33], "libjemalloc": 31, "run_20210713153048_instance_0_cores_0": 31, "654": 31, "libtcmalloc": [31, 32], "655": 31, "run_20210713153333_instance_0_cores_0": 31, "784": 31, "run_20210713153659_instance_0_cores_0": 31, "blocktim": 31, "00": [31, 34], "760": [31, 32], "761": [31, 32], "omp_schedul": [31, 33], "omp_proc_bind": [31, 33], "run_20210713152500_instance_0_cores_0": 31, "give": [32, 34], "ipex_en": 32, "procedur": 32, "tunin": 32, "dramat": [32, 33], "cpu_launcher_en": 32, "cpu_launcher_arg": 32, "hyperthread": 32, "present": 32, "ital": 32, "ptmalloc": 32, "use_default_alloc": [32, 34], "tcmalloc": 32, "enable_tcmalloc": 32, "enable_jemalloc": 32, "nth": [32, 33], "uniform": 32, "overlap": 32, "signficantli": 32, "8180": 32, "affinit": 32, "addition": 32, "kill": 32, "unutil": 32, "restart": 32, "remain": 32, "aliv": 32, "taken": 32, "care": 32, "worri": 32, "continu": [32, 34], "Then": 32, "interrupt": 32, "dummi": 32, 
"dummy_tensor": 32, "scheme": 32, "bert_int8_jit": 32, "n_iter": 32, "rn50_int8_jit": 32, "usus": 32, "rn50_ipex_int8": 32, "handler": 32, "image_classifi": 32, "similarli": 32, "bert_ipex_int8": 32, "transformer_handler_gener": 32, "setup_config": 32, "seq_classification_artifact": 32, "index_to_nam": 32, "nc": 32, "model_stor": 32, "server": [32, 33], "rest": 32, "model_log": 32, "096": 32, "8375c": 32, "03": 32, "981": 32, "982": 32, "previous": 32, "cases": 32, "223": 32, "site": 32, "model_service_work": 32, "sock": 32, "unix": 32, "9000": 32, "762": 32, "763": 32, "9001": 32, "274": 32, "9002": 32, "975": 32, "9003": 32, "bench": 32, "amazon": 32, "ec2": 32, "24xlarg": 32, "reproduc": 32, "url": [32, 34], "modelurl": 32, "inputpath": 32, "concurr": [32, 33], "huggingface_transform": 32, "sample_text_captum_input": 32, "graphic": 33, "xe": 33, "briefli": 33, "background": 33, "knowledg": 33, "c620": 33, "seri": 33, "chipset": 33, "purlei": 33, "chip": 33, "inclus": 33, "1mb": 33, "l2": 33, "2666": 33, "mhz": 33, "ddr4": 33, "six": 33, "ultra": 33, "interconnect": 33, "upi": 33, "microarchitectur": 33, "connect": 33, "transfer": 33, "equip": 33, "motherboard": 33, "attach": 33, "remot": 33, "asu": 33, "z11pa": 33, "d8": 33, "competit": 33, "stall": 33, "busi": 33, "uma": 33, "lscpu": 33, "retriev": 33, "111": 33, "50ghz": 33, "node0": 33, "node1": 33, "sophist": 33, "brought": [33, 34], "polici": 33, "put": 33, "sysctl": 33, "great": 33, "placement": 33, "cpunodebind": 33, "membind": 33, "multithread": 33, "primari": 33, "consecut": 33, "join": 33, "libgomp": 33, "libiomp": 33, "hang": [33, 34], "gomp_cpu_affin": 33, "comma": 33, "invalid": 33, "thrash": 33, "did": [33, 34], "compet": 33, "unus": 33, "proclist": 33, "millisecond": 33, "sleep": 33, "200m": 33, "period": 33, "elaps": 33, "overal": 33, "appropri": 33, "reserv": 33, "sole": 33, "penal": 33, "role": 33, "unnecessari": 33, "destruct": 33, "emphas": 33, "fragment": 33, "mmuzzy_decay_m": 33, "forg": 33, "dealloc": 33, "costli": 33, "gpertool": 33, "plu": 33, "pretti": 33, "nifti": 33, "analysi": 33, "gperftool": 33, "set_flush_denorm": 33, "warm": 33, "therefor": 33, "threshold": 33, "usuali": 33, "come": 33, "maskrcnn": [33, 34], "wav2vec2": 33, "recognit": 33, "onednn_primitive_cache_capac": 33, "65536": 33, "voic": 33, "excit": 34, "announc": 34, "accompani": 34, "privat": 34, "broader": 34, "sincer": 34, "encourag": 34, "feedback": 34, "creator": 34, "reach": 34, "hf_beam_sampl": 34, "hf_beam_search": 34, "hf_greedy_search": 34, "hf_sampl": 34, "walk": 34, "2561": 34, "2584": 34, "2617": 34, "2663": 34, "2733": 34, "act": 34, "2550": 34, "2568": 34, "2641": 34, "2675": 34, "2613": 34, "upgrad": 34, "v3": 34, "2747": 34, "misc": 34, "2468": 34, "2627": 34, "2631": 34, "2704": 34, "changelog": 34, "optimize_transform": 34, "your_generation_param": 34, "newli": 34, "varianc": 34, "encod": 34, "2349": 34, "2412": 34, "2469": 34, "2476": 34, "flash": 34, "2317": 34, "2334": 34, "2392": 34, "2480": 34, "elser": 34, "2491": 34, "public": 34, "2473": 34, "2511": 34, "2433": 34, "2253": 34, "2251": 34, "2236": 34, "2278": 34, "2257": 34, "dockerfil": 34, "ux": 34, "2229": 34, "2195": 34, "2299": 34, "2315": 34, "2283": 34, "2280": 34, "2292": 34, "2275": 34, "2319": 34, "2198": 34, "2264": 34, "2290": 34, "experiment": 34, "workflow": 34, "1563": 34, "excess": 34, "1677": 34, "1688": 34, "1664": 34, "lar": 34, "1695": 34, "dictionari": 34, "1682": 34, "2137": 34, "1568": 34, "1585": 34, "1590": 34, "1587": 34, "1594": 34, "old": 
34, "hypervisor": 34, "vm": 34, "1513": 34, "1593": 34, "padding_mod": 34, "1580": 34, "1566": 34, "transnetv2": 34, "1564": 34, "rnn": 34, "avx512_core_vnni": 34, "1592": 34, "1589": 34, "1517": 34, "hero": 34, "inspir": 34, "stanford": 34, "consumpt": 34, "ve": 34, "1341": 34, "instancenorm": 34, "1330": 34, "1414": 34, "1473": 34, "1419": 34, "1488": 34, "webpag": 34, "1318": 34, "1353": 34, "1328": 34, "1355": 34, "1367": 34, "1384": 34, "1295": 34, "1392": 34, "1376": 34, "1373": 34, "1338": 34, "1391": 34, "1322": 34, "usabl": 34, "effort": 34, "cv": 34, "refin": 34, "identifi": 34, "torchrun": 34, "shortcut": 34, "mkl": 34, "sgemm": 34, "geomean": 34, "auto_ipex": 34, "hood": 34, "calibrated_model": 34, "model_to_be_calibr": 34, "992": 34, "64byte": 34, "addlayernorm": 34, "retinanet": 34, "1032": 34, "1053": 34, "1074": 34, "tightli": 34, "matur": 34, "offlin": 34, "becam": 34, "bake": 34, "wave2vec": 34, "albert": 34, "facilit": 34, "minmax": 34, "movingaverageminmax": 34, "polish": 34, "flexibl": 34, "quantconf": 34, "multi_stream_input_hint": 34, "multi_stream_output_hint": 34, "adam": 34, "822": 34, "3d": 34, "642": 34, "deconv3d": 34, "692": 34, "787": 34, "swish": 34, "fsi": 34, "risk": 34, "551": 34, "leakyrelu": 34, "589": 34, "407": 34, "647": 34, "convolution1d": 34, "657": 34, "einsum": 34, "alphafold2": 34, "674": 34, "711": 34, "threa": 34, "slow": 34, "equival": 34, "joint": 34, "net": 34, "pend": 34, "648": 34, "684": 34, "685": 34, "dockerhub": 34, "wheel": 34, "sdk": 34, "2x": 34, "5x": 34, "reduct": 34, "center": 34, "deploi": 34, "u8": 34, "s8": 34, "satur": 34, "occur": 34, "u7": 34, "unsign": 34, "s7": 34, "worth": 34, "upload": 34, "pip3": 34, "whl": 34, "220mb": 34, "5mb": 34, "dep": 34, "220m": 34, "cxx11": 34, "224m": 34, "7m": 34, "5m": 34, "qkv": 34, "278": 34, "531": 34, "432": 34, "438": 34, "602": 34, "sliu": 34, "hardsigmoid": 34, "relu6": 34, "selu": 34, "524": 34, "452": 34, "425": 34, "100mb": 34, "40mb": 34, "meant": 34, "resolv": 34, "te": 34, "wrap": 34, "bactchnorm": 34, "205": 34, "straightforward": 34, "underhood": 34, "torchvison": 34, "hugginfac": 34, "legal": 34, "resnet18": 34, "resnet18_xpu": 34, "enable_auto_mixed_precis": 34, "mixed_dtyp": 34, "mymodel": 34, "xx_c": 34, "xx_v": 34, "clibrat": 34, "ampconf": 34, "automixprecis": 34, "running_mod": 34, "cali_dataset": 34, "trace_model": 34, "omp_set_num_thread": 34, "model_execut": 34, "same_model_execution_again": 34, "descriptor": 34, "rc3": 34, "parti": 34, "49786": 34, "rc": 34, "readm": 34, "stakehold": 34, "5rc3": 34, "dpcpp": 34, "heterogen": 34, "bfp16": 34, "proper": 34, "tacotron2": 34, "frozenbatchnorm": 34, "embeddingbad": 34, "daili": 34, "resnext3d": 34, "maskrnn": 34, "codenam": 34, "mlp": 34, "eltwis": 34, "7x": 34, "enable_auto_optim": 34, "streamlin": 34, "enable_auto_mix_precis": 34, "inject": 34, "resnet3d": 34, "fb": 34, "yolov3": 34, "maxpool": 34}, "objects": {"": [[2, 0, 0, "-", "intel_extension_for_pytorch"]], "intel_extension_for_pytorch.cpu": [[2, 0, 0, "-", "runtime"]], "intel_extension_for_pytorch.cpu.runtime": [[2, 1, 1, "", "CPUPool"], [2, 1, 1, "", "MultiStreamModule"], [2, 1, 1, "", "MultiStreamModuleHint"], [2, 1, 1, "", "Task"], [2, 2, 1, "", "get_core_list_of_node_id"], [2, 2, 1, "", "is_runtime_ext_enabled"], [2, 1, 1, "", "pin"]], "intel_extension_for_pytorch": [[2, 2, 1, "", "enable_onednn_fusion"], [2, 2, 1, "", "fast_bert"], [2, 0, 0, "-", "llm"], [2, 2, 1, "", "optimize"], [2, 0, 0, "-", "quantization"], [2, 1, 1, "", "verbose"]], 
"intel_extension_for_pytorch.llm": [[2, 0, 0, "-", "functional"], [2, 0, 0, "-", "modules"], [2, 2, 1, "", "optimize"]], "intel_extension_for_pytorch.llm.functional": [[2, 2, 1, "", "fast_layer_norm"], [2, 2, 1, "", "indirect_access_kv_cache_attention"], [2, 2, 1, "", "rms_norm"], [2, 2, 1, "", "rotary_embedding"], [2, 2, 1, "", "varlen_attention"]], "intel_extension_for_pytorch.llm.modules": [[2, 1, 1, "", "FastLayerNorm"], [2, 1, 1, "", "IndirectAccessKVCacheAttention"], [2, 1, 1, "", "Linear2SiluMul"], [2, 1, 1, "", "LinearAdd"], [2, 1, 1, "", "LinearAddAdd"], [2, 1, 1, "", "LinearGelu"], [2, 1, 1, "", "LinearMul"], [2, 1, 1, "", "LinearNewGelu"], [2, 1, 1, "", "LinearRelu"], [2, 1, 1, "", "LinearSilu"], [2, 1, 1, "", "LinearSiluMul"], [2, 1, 1, "", "PagedAttention"], [2, 1, 1, "", "RMSNorm"], [2, 1, 1, "", "RotaryEmbedding"], [2, 1, 1, "", "VarlenAttention"]], "intel_extension_for_pytorch.nn": [[7, 1, 1, "", "FrozenBatchNorm2d"]], "intel_extension_for_pytorch.nn.functional": [[7, 2, 1, "", "interaction"]], "intel_extension_for_pytorch.nn.modules": [[7, 1, 1, "", "MergedEmbeddingBag"], [7, 1, 1, "", "MergedEmbeddingBagWithSGD"]], "intel_extension_for_pytorch.quantization": [[2, 2, 1, "", "autotune"], [2, 2, 1, "", "convert"], [2, 2, 1, "", "get_smooth_quant_qconfig_mapping"], [2, 2, 1, "", "prepare"]]}, "objtypes": {"0": "py:module", "1": "py:class", "2": "py:function"}, "objnames": {"0": ["py", "module", "Python module"], "1": ["py", "class", "Python class"], "2": ["py", "function", "Python function"]}, "titleterms": {"intel": [0, 1, 5, 6, 15, 30, 31, 32, 33], "extens": [0, 1, 5, 7, 15, 20, 26, 32], "pytorch": [0, 1, 5, 15, 18, 32], "cpu": [0, 2, 17, 18, 33], "isa": [0, 7, 17], "dynam": [0, 6, 7, 15, 17, 26], "dispatch": [0, 7, 17], "design": [0, 17, 20, 31], "doc": 0, "architectur": 1, "support": [1, 8, 10], "api": [2, 7, 9, 13, 16, 17, 18, 22, 25, 28, 29], "document": [2, 5, 25, 32, 33], "gener": [2, 26], "llm": [2, 6, 7, 23, 28, 30], "modul": [2, 10, 20, 28], "level": [2, 17, 28], "optim": [2, 7, 10, 13, 15, 19, 28, 29], "prototyp": [2, 6, 7, 10, 11, 12, 14, 16, 22, 28], "fast": [2, 6, 7, 11], "bert": [2, 6, 7, 11, 32], "graph": [2, 7, 12, 13, 28], "quantiz": [2, 6, 7, 15, 16, 29], "runtim": [2, 7, 20, 26], "blog": 3, "public": 3, "cheat": 4, "sheet": 4, "contribut": 5, "develop": 5, "tip": 5, "debug": [5, 17], "unit": 5, "test": 5, "python": [5, 6, 7], "better": 5, "local": 5, "pytest": 5, "lint": 5, "c": [5, 6, 18], "write": [5, 18], "build": [5, 17], "exampl": [6, 10, 11, 12, 14, 16, 17, 20, 31], "train": [6, 8], "singl": [6, 28, 31], "instanc": [6, 28, 30, 31], "float32": [6, 8], "bfloat16": [6, 8, 21, 26, 30], "distribut": [6, 28, 29], "infer": [6, 8, 28, 29, 31, 32], "eager": [6, 8], "mode": [6, 28, 31], "resnet50": [6, 32], "torchscript": [6, 8], "torchdynamo": [6, 26], "beta": [6, 7], "new": [6, 7, 34], "featur": [6, 7, 11, 12, 17], "from": [6, 7], "2": [6, 7, 14, 32, 34], "0": [6, 7, 34], "int8": [6, 7, 13, 16, 26, 30, 32], "static": [6, 15], "calibr": [6, 15], "deploy": 6, "larg": [6, 7, 28], "languag": [6, 7, 28], "model": [6, 7, 13, 15, 18, 20, 28, 32], "fp32": [6, 10, 13, 29, 30], "bf16": [6, 10, 13, 29], "smooth": [6, 16, 22], "weight": [6, 29], "onli": [6, 29], "int4": 6, "ai": [6, 30], "refer": [6, 8], "easi": 7, "us": [7, 8, 9, 10, 13, 16, 20, 31], "1": [7, 14, 32, 34], "torch": 7, "compil": [7, 17], "auto": [7, 8, 9, 16, 20], "channel": [7, 9, 18, 33], "last": [7, 9, 18, 33], "mix": [7, 8], "precis": [7, 8, 28], "amp": [7, 8], "oper": [7, 18, 19, 28], "codeless": 
[7, 10], "13": [7, 34], "captur": [7, 12], "hypertun": [7, 14], "introduct": [8, 19, 25], "case": [8, 10, 20], "default": [8, 9, 14, 18, 31], "path": 8, "autocast": 8, "op": 8, "elig": 8, "specif": [8, 17], "behavior": 8, "can": 8, "promot": 8, "widest": 8, "input": [8, 20], "type": [8, 28], "eas": [9, 13], "enabl": 9, "disabl": 9, "known": [9, 20, 34], "issu": [9, 20, 34], "motiv": 10, "usag": [10, 11, 12, 14, 16, 20, 26, 29, 31], "huggingfac": 10, "The": 10, "origin": 10, "command": 10, "ipex": [10, 28], "launch": [10, 31], "appli": 10, "forward": 10, "method": 10, "explicitli": 10, "instead": 10, "__call__": 10, "attr": 10, "alreadi": 10, "jit": 10, "trace": 10, "descript": [11, 12], "prerequisit": 11, "methodologi": [13, 28], "fusion": [13, 19], "pattern": 13, "fold": 13, "your_conf_fil": 14, "hyperparamet": 14, "launcher": [14, 32], "defin": [14, 15], "search": 14, "space": 14, "tune": [14, 16, 22, 33], "user": 14, "your_python_script": 14, "qconfig": 15, "prepar": 15, "do": 15, "convert": 15, "deploi": [15, 32], "recip": [16, 20, 22], "autotun": 16, "algorithm": 16, "alpha": [16, 34], "fix": 16, "determin": 16, "through": 16, "overview": [17, 28, 30, 31, 33], "requir": [17, 20], "code": 17, "folder": 17, "struct": 17, "kernel": [17, 18], "implement": [17, 20], "csrc": 17, "aten": [17, 18], "xyzkrnl": 17, "cpp": 17, "stub": 17, "xyz": 17, "h": 17, "dyndisp": 17, "dispatchstub": 17, "codegen": 17, "process": 17, "add": 17, "custom": [17, 28], "intrin": 17, "vec": 17, "privat": 17, "select": 17, "manual": 17, "check": 17, "what": [18, 34], "i": [18, 20, 31], "memori": [18, 31, 33], "format": 18, "all": [18, 31], "That": 18, "matter": 18, "nchw": 18, "b": 18, "nhwc": 18, "wip": 18, "block": 18, "nchw16c": 18, "stride": 18, "layout": 18, "tensor": 18, "creation": 18, "convers": 18, "d": 18, "coverag": 18, "statu": 18, "regist": [18, 32], "nativ": 18, "manner": 18, "onednn": [18, 33], "creat": [18, 32], "convolut": 18, "primit": [18, 33], "target": 18, "multistream": 20, "examples1": 20, "basic": 20, "examples2": 20, "set": 20, "examples3": 20, "structur": [20, 33], "output": 20, "perform": [20, 26, 30, 32, 33, 34], "asynchron": 20, "task": 20, "configur": [20, 30, 33], "core": [20, 31, 32], "bind": 20, "detail": 20, "how": 20, "iomp": 20, "preload": 20, "load": 20, "dure": 20, "split": 21, "sgd": 21, "stochast": 21, "gradient": 21, "descent": 21, "quant": 22, "quick": 23, "start": [23, 25, 32], "instal": [24, 32], "get": 25, "troubleshoot": 26, "regress": 26, "shape": 26, "result": [26, 34], "correct": 26, "licens": 27, "list": 28, "verifi": 28, "via": 28, "deepspe": [28, 29], "demo": 28, "linear": 28, "low": 28, "data": [28, 30], "indirect": 28, "access": [28, 33], "kv": 28, "cach": [28, 33], "transform": 29, "frontend": 29, "pseudocod": 29, "common": 29, "scenario": 29, "smoothquant": 29, "woq": 29, "center": 30, "product": 30, "v1": 30, "11": [30, 34], "number": [30, 31, 33], "accuraci": 30, "softwar": [30, 33], "version": 30, "hardwar": [30, 33], "200": [30, 34], "an": 30, "aw": 30, "ec2": 30, "c6i": 30, "2xlarg": 30, "10": [30, 34], "script": 31, "guid": [31, 33], "physic": 31, "ii": 31, "includ": 31, "logic": 31, "iii": 31, "node": 31, "iv": 31, "your": 31, "multipl": 31, "v": 31, "throughput": 31, "vi": 31, "latenc": 31, "vii": 31, "viii": 31, "index": 31, "jemalloc": [31, 33], "tcmalloc": [31, 33], "alloc": [31, 33], "openmp": [31, 33], "librari": 31, "gnu": [31, 33], "torchserv": 32, "content": [32, 33], "thi": [32, 33], "serv": 32, "pin": 32, "boost": 32, "multi": 32, "worker": 
32, "scale": 32, "export": 32, "serial": 32, "file": 32, "archiv": 32, "3": [32, 34], "4": 32, "benchmark": 32, "non": 33, "uniform": 33, "numa": 33, "numactl": 33, "omp_num_thread": 33, "omp_thread_limit": 33, "denorm": 33, "releas": 34, "highlight": 34, "100": 34, "12": 34, "300": 34, "": 34, "chang": 34, "9": 34, "8": 34, "improv": 34, "other": 34, "note": 34}, "envversion": {"sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 58}, "alltitles": {"Intel\u00ae Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc": [[0, "intel-extension-for-pytorch-cpu-isa-dynamic-dispatch-design-doc"]], "Intel\u00ae Extension for PyTorch*": [[1, "intel-extension-for-pytorch"]], "Architecture": [[1, "architecture"]], "Support": [[1, "support"]], "API Documentation": [[2, "api-documentation"], [25, "api-documentation"]], "General": [[2, "general"]], "LLM Module Level Optimizations (Prototype)": [[2, "llm-module-level-optimizations-prototype"]], "Fast Bert (Prototype)": [[2, "fast-bert-prototype"], [6, "fast-bert-prototype"]], "Graph Optimization": [[2, "graph-optimization"], [7, "graph-optimization"], [13, "graph-optimization"], [28, "graph-optimization"]], "Quantization": [[2, "module-intel_extension_for_pytorch.quantization"]], "CPU Runtime": [[2, "module-intel_extension_for_pytorch.cpu.runtime"]], "Blogs & Publications": [[3, "blogs-publications"]], "Cheat Sheet": [[4, "cheat-sheet"]], "Contribution": [[5, "contribution"]], "Contributing to Intel\u00ae Extension for PyTorch*": [[5, "contributing-to-intel-extension-for-pytorch"]], "Developing Intel\u00ae Extension for PyTorch*": [[5, "developing-intel-extension-for-pytorch"]], "Tips and Debugging": [[5, "tips-and-debugging"]], "Unit testing": [[5, "unit-testing"]], "Python Unit Testing": [[5, "python-unit-testing"]], "Better local unit tests with pytest": [[5, "better-local-unit-tests-with-pytest"]], "Local linting": [[5, "local-linting"]], "C++ Unit Testing": [[5, "c-unit-testing"]], "Writing documentation": [[5, "writing-documentation"]], "Building documentation": [[5, "building-documentation"]], "Tips": [[5, "tips"]], "Examples": [[6, "examples"]], "Python": [[6, "python"]], "Training": [[6, "training"]], "Single-instance Training": [[6, "single-instance-training"]], "Float32": [[6, "float32"], [6, "id1"]], "BFloat16": [[6, "bfloat16"], [6, "id6"], [21, "bfloat16"], [26, "bfloat16"]], "Distributed Training": [[6, "distributed-training"]], "Inference": [[6, "inference"]], "Eager Mode": [[6, "eager-mode"], [6, "id7"]], "Resnet50": [[6, "resnet50"], [6, "id2"], [6, "id4"], [6, "id8"], [6, "id11"], [6, "id14"]], "BERT": [[6, "bert"], [6, "id3"], [6, "id5"], [6, "id9"], [6, "id12"], [6, "id15"], [32, "bert"]], "TorchScript Mode": [[6, "torchscript-mode"], [6, "id10"]], "TorchDynamo Mode (Beta, NEW feature from 2.0.0)": [[6, "torchdynamo-mode-beta-new-feature-from-2-0-0"], [6, "id13"]], "INT8": [[6, "int8"], [26, "int8"]], "Static Quantization": [[6, "static-quantization"], [15, "static-quantization"]], "Calibration": [[6, "calibration"]], "Deployment": [[6, "deployment"]], "Dynamic Quantization": [[6, "dynamic-quantization"], [15, "dynamic-quantization"]], "Large Language Model (LLM)": [[6, "large-language-model-llm"]], "FP32/BF16": [[6, "fp32-bf16"], [29, "fp32-bf16"]], "Smooth Quantization INT8": [[6, 
"smooth-quantization-int8"]], "Weight Only Quantization INT8/INT4": [[6, "weight-only-quantization-int8-int4"]], "C++": [[6, "c"]], "Intel\u00ae AI Reference Models": [[6, "intel-ai-reference-models"]], "Features": [[7, "features"]], "Easy-to-use Python API": [[7, "easy-to-use-python-api"]], "Large Language Models (LLM, NEW feature from 2.1.0)": [[7, "large-language-models-llm-new-feature-from-2-1-0"]], "torch.compile (Beta, NEW feature from 2.0.0)": [[7, "torch-compile-beta-new-feature-from-2-0-0"]], "ISA Dynamic Dispatching": [[7, "isa-dynamic-dispatching"], [17, "isa-dynamic-dispatching"]], "Auto Channels Last": [[7, "auto-channels-last"], [9, "auto-channels-last"]], "Auto Mixed Precision (AMP)": [[7, "auto-mixed-precision-amp"], [8, "auto-mixed-precision-amp"]], "Operator Optimization": [[7, "operator-optimization"]], "Optimizer Optimization": [[7, "optimizer-optimization"]], "Runtime Extension": [[7, "runtime-extension"], [20, "runtime-extension"], [26, "runtime-extension"]], "INT8 Quantization": [[7, "int8-quantization"]], "Codeless Optimization (Prototype, NEW feature from 1.13.0)": [[7, "codeless-optimization-prototype-new-feature-from-1-13-0"]], "Graph Capture (Prototype, NEW feature from 1.13.0)": [[7, "graph-capture-prototype-new-feature-from-1-13-0"]], "HyperTune (Prototype, NEW feature from 1.13.0)": [[7, "hypertune-prototype-new-feature-from-1-13-0"]], "Fast BERT Optimization (Prototype, NEW feature from 2.0.0)": [[7, "fast-bert-optimization-prototype-new-feature-from-2-0-0"]], "Introduction": [[8, "introduction"], [19, "introduction"], [25, "introduction"]], "Use Case": [[8, "use-case"]], "Default Precision": [[8, "default-precision"]], "Inference with Eager Path": [[8, "inference-with-eager-path"]], "Inference with TorchScript Path": [[8, "inference-with-torchscript-path"]], "Training Support": [[8, "training-support"]], "Autocast Op Reference": [[8, "autocast-op-reference"]], "Op Eligibility": [[8, "op-eligibility"]], "Op-Specific Behavior": [[8, "op-specific-behavior"]], "Ops that can autocast to bfloat16": [[8, "ops-that-can-autocast-to-bfloat16"]], "Ops that can autocast to float32": [[8, "ops-that-can-autocast-to-float32"]], "Ops that promote to the widest input type": [[8, "ops-that-promote-to-the-widest-input-type"]], "Ease-of-use auto channels last API": [[9, "ease-of-use-auto-channels-last-api"]], "default": [[9, "default"]], "enable": [[9, "enable"]], "disable": [[9, "disable"]], "Known issue": [[9, "known-issue"], [34, "known-issue"], [34, "id43"]], "Codeless Optimization (Prototype)": [[10, "codeless-optimization-prototype"]], "Motivation": [[10, "motivation"]], "Example Usage with HuggingFace": [[10, "example-usage-with-huggingface"]], "The origin command with ipex launch": [[10, "the-origin-command-with-ipex-launch"]], "Command to apply ipex optimization for FP32": [[10, "command-to-apply-ipex-optimization-for-fp32"]], "Command to apply ipex optimization for BF16": [[10, "command-to-apply-ipex-optimization-for-bf16"]], "Use Case not supported": [[10, "use-case-not-supported"]], "Module uses forward method explicitly instead of the __call__ attr": [[10, "module-uses-forward-method-explicitly-instead-of-the-call-attr"]], "Already using ipex.optimize": [[10, "already-using-ipex-optimize"]], "Already using Jit Trace": [[10, "already-using-jit-trace"]], "Fast BERT (Prototype)": [[11, "fast-bert-prototype"]], "Feature Description": [[11, "feature-description"], [12, "feature-description"]], "Prerequisite": [[11, "prerequisite"]], "Usage Example": [[11, 
"usage-example"], [12, "usage-example"], [16, "usage-example"]], "Graph Capture (Prototype)": [[12, "graph-capture-prototype"]], "Ease-of-use graph optimization API": [[13, "ease-of-use-graph-optimization-api"]], "FP32 and BF16 models": [[13, "fp32-and-bf16-models"]], "INT8 models": [[13, "int8-models"]], "Methodology": [[13, "methodology"]], "Fusion": [[13, "fusion"]], "FP32 and BF16 fusion patterns": [[13, "fp32-and-bf16-fusion-patterns"]], "INT8 fusion patterns": [[13, "int8-fusion-patterns"]], "Folding": [[13, "folding"]], "HyperTune (Prototype)": [[14, "hypertune-prototype"]], "Usage of Hypertune": [[14, "usage-of-hypertune"]], "your_conf_file": [[14, "your-conf-file"]], "Hyperparameters": [[14, "hyperparameters"]], "Launcher Hyperparameters": [[14, "launcher-hyperparameters"]], "Defining hyperparameters and their search spaces": [[14, "defining-hyperparameters-and-their-search-spaces"]], "1. Defining hyperparameters to tune:": [[14, "defining-hyperparameters-to-tune"]], "2. Defining the search spaces of the hyperparameters:": [[14, "defining-the-search-spaces-of-the-hyperparameters"]], "Default search space": [[14, "default-search-space"]], "User defined search space": [[14, "user-defined-search-space"]], "": [[14, "your-python-script"]], "Usage Examples": [[14, "usage-examples"], [31, "usage-examples"]], "Intel\u00ae Extension for PyTorch* optimizations for quantization": [[15, "intel-extension-for-pytorch-optimizations-for-quantization"]], "Define qconfig": [[15, "define-qconfig"]], "Prepare Model and Do Calibration": [[15, "prepare-model-and-do-calibration"]], "Convert to Static Quantized Model and Deploy": [[15, "convert-to-static-quantized-model-and-deploy"]], "Define QConfig": [[15, "id1"]], "Prepare Model": [[15, "prepare-model"]], "Convert to Dynamic Quantized Model and Deploy": [[15, "convert-to-dynamic-quantized-model-and-deploy"]], "INT8 Recipe Tuning API (Prototype)": [[16, "int8-recipe-tuning-api-prototype"]], "Smooth Quantization Autotune": [[16, "smooth-quantization-autotune"]], "Algorithm: Auto-tuning of $\\alpha$.": [[16, "algorithm-auto-tuning-of-alpha"]], "$\\alpha$ Usage": [[16, "alpha-usage"]], "Using a fixed alpha": [[16, "using-a-fixed-alpha"]], "Determining the alpha through auto-tuning": [[16, "determining-the-alpha-through-auto-tuning"]], "Overview": [[17, "overview"], [30, "overview"], [31, "overview"], [33, "overview"]], "CPU ISA build compiler requirement": [[17, "cpu-isa-build-compiler-requirement"]], "Dynamic Dispatch Design": [[17, "dynamic-dispatch-design"]], "Code Folder Struct": [[17, "code-folder-struct"]], "Kernel implementation: csrc/cpu/aten/kernels/xyzKrnl.cpp": [[17, "kernel-implementation-csrc-cpu-aten-kernels-xyzkrnl-cpp"]], "Kernel Stub: csrc/cpu/aten/xyz.cpp and csrc/cpu/aten/xyz.h": [[17, "kernel-stub-csrc-cpu-aten-xyz-cpp-and-csrc-cpu-aten-xyz-h"]], "Dispatch Stub implementation: csrc/cpu/dyndisp/DispatchStub.cpp and csrc/cpu/dyndisp/DispatchStub.h": [[17, "dispatch-stub-implementation-csrc-cpu-dyndisp-dispatchstub-cpp-and-csrc-cpu-dyndisp-dispatchstub-h"]], "CodeGen Process": [[17, "codegen-process"]], "Add Custom Kernel": [[17, "add-custom-kernel"]], "ISA intrinics specific kernel example:": [[17, "isa-intrinics-specific-kernel-example"]], "Vec specific kernel example:": [[17, "vec-specific-kernel-example"]], "Private Debug APIs": [[17, "private-debug-apis"]], "Example:": [[17, "example"], [17, "id1"]], "Select ISA level manually.": [[17, "select-isa-level-manually"]], "CPU feature check": [[17, "cpu-feature-check"]], "Channels Last": 
[[18, "channels-last"], [33, "channels-last"]], "What is Channels Last": [[18, "what-is-channels-last"]], "Memory Format Is All That Matters": [[18, "memory-format-is-all-that-matters"]], "a. NCHW (default)": [[18, "a-nchw-default"]], "b. NHWC (WIP for CPU)": [[18, "b-nhwc-wip-for-cpu"]], "c. Blocked (nChw16c)": [[18, "c-blocked-nchw16c"]], "PyTorch Strided Layout": [[18, "pytorch-strided-layout"]], "PyTorch Channels Last Memory Format APIs": [[18, "pytorch-channels-last-memory-format-apis"]], "a. tensor creation": [[18, "a-tensor-creation"]], "b. tensor conversion": [[18, "b-tensor-conversion"]], "c. model conversion": [[18, "c-model-conversion"]], "d. operator coverage": [[18, "d-operator-coverage"]], "Writing Channels Last Kernels": [[18, "writing-channels-last-kernels"]], "a. Status on CPU": [[18, "a-status-on-cpu"]], "b. Register Channels Last Kernel in ATen Native Manner": [[18, "b-register-channels-last-kernel-in-aten-native-manner"]], "c. Register oneDNN Kernel on Channels Last": [[18, "c-register-onednn-kernel-on-channels-last"]], "oneDNN NHWC APIs": [[18, "onednn-nhwc-apis"]], "a. Create NHWC Memory": [[18, "a-create-nhwc-memory"]], "b. Create Convolution Primitive": [[18, "b-create-convolution-primitive"]], "CPU Channels Last Targets": [[18, "cpu-channels-last-targets"]], "Optimizer Fusion": [[19, "optimizer-fusion"]], "Operation Fusion": [[19, "operation-fusion"]], "Requirements": [[20, "requirements"]], "Use Cases": [[20, "use-cases"]], "Example of MultiStream Module": [[20, "example-of-multistream-module"]], "Examples1: Basic Usage": [[20, "examples1-basic-usage"]], "Examples2: Usage with \u201cAUTO\u201d setting": [[20, "examples2-usage-with-auto-setting"]], "Examples3: Usage for models with structure inputs/outputs": [[20, "examples3-usage-for-models-with-structure-inputs-outputs"]], "Performance recipes": [[20, "performance-recipes"]], "Known issues": [[20, "known-issues"], [34, "id37"]], "Example of asynchronous task": [[20, "example-of-asynchronous-task"]], "Example of configuring core binding": [[20, "example-of-configuring-core-binding"]], "Detail Design": [[20, "detail-design"]], "How the core binding is implemented": [[20, "how-the-core-binding-is-implemented"]], "Design of Task": [[20, "design-of-task"]], "IOMP preload or load during the runtime": [[20, "iomp-preload-or-load-during-the-runtime"]], "Split SGD": [[21, "split-sgd"], [21, "id2"]], "Stochastic Gradient Descent (SGD)": [[21, "stochastic-gradient-descent-sgd"]], "Smooth Quant Recipe Tuning API (Prototype)": [[22, "smooth-quant-recipe-tuning-api-prototype"]], "Quick Start": [[23, "quick-start"]], "LLM Quick Start": [[23, "llm-quick-start"]], "Installation": [[24, "installation"]], "Get Started": [[25, "get-started"]], "Troubleshooting": [[26, "troubleshooting"]], "General Usage": [[26, "general-usage"]], "Performance Regression": [[26, "performance-regression"]], "TorchDynamo": [[26, "torchdynamo"]], "Dynamic Shape": [[26, "dynamic-shape"]], "Result Correctness": [[26, "result-correctness"]], "License": [[27, "license"]], "Large Language Models (LLM) Optimization Overview": [[28, "large-language-models-llm-optimization-overview"]], "ipex.llm Optimized Model List": [[28, "ipex-llm-optimized-model-list"]], "Verified for single instance mode": [[28, "verified-for-single-instance-mode"]], "Verified for distributed inference mode via DeepSpeed": [[28, "verified-for-distributed-inference-mode-via-deepspeed"]], "Module Level Optimization API for customized LLM (Prototype)": [[28, 
"module-level-optimization-api-for-customized-llm-prototype"]], "Demos": [[28, "demos"]], "Optimization Methodologies": [[28, "optimization-methodologies"]], "Linear Operator Optimization": [[28, "linear-operator-optimization"]], "Low Precision Data Types": [[28, "low-precision-data-types"]], "Indirect Access KV Cache": [[28, "indirect-access-kv-cache"]], "Distributed Inference": [[28, "distributed-inference"]], "Transformers Optimization Frontend API": [[29, "transformers-optimization-frontend-api"]], "Pseudocode of Common Usage Scenarios": [[29, "pseudocode-of-common-usage-scenarios"]], "SmoothQuant": [[29, "smoothquant"]], "Weight Only Quantization (WOQ)": [[29, "weight-only-quantization-woq"]], "Distributed Inference with DeepSpeed": [[29, "distributed-inference-with-deepspeed"]], "Performance": [[30, "performance"], [34, "performance"]], "Performance Data for Intel\u00ae AI Data Center Products": [[30, "performance-data-for-intel-ai-data-center-products"]], "LLM Performance": [[30, "llm-performance"]], "INT8 with v1.11": [[30, "int8-with-v1-11"]], "Performance Numbers": [[30, "performance-numbers"], [30, "id1"], [30, "id4"]], "Accuracy": [[30, "accuracy"]], "Configuration": [[30, "configuration"], [30, "id2"], [30, "id5"]], "Software Version": [[30, "software-version"], [30, "id3"], [30, "id6"]], "Hardware Configuration": [[30, "hardware-configuration"], [30, "id7"], [33, "hardware-configuration"]], "FP32 with v1.11.200 on an AWS EC2 C6i.2xlarge instance": [[30, "fp32-with-v1-11-200-on-an-aws-ec2-c6i-2xlarge-instance"]], "FP32 and BFloat16 with v1.10": [[30, "fp32-and-bfloat16-with-v1-10"]], "Launch Script Usage Guide": [[31, "launch-script-usage-guide"]], "Usage of launch script": [[31, "usage-of-launch-script"]], "Single instance for inference": [[31, "single-instance-for-inference"]], "I. Use all physical cores": [[31, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[31, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on designated nodes": [[31, "iii-use-physical-cores-on-designated-nodes"]], "IV. Use your designated number of cores": [[31, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[31, "multiple-instances-for-inference"]], "V. Throughput mode": [[31, "v-throughput-mode"]], "VI. Latency mode": [[31, "vi-latency-mode"]], "VII. Your designated number of instances": [[31, "vii-your-designated-number-of-instances"]], "VIII. 
diff --git a/cpu/2.3.0+cpu/tutorials/api_doc.html b/cpu/2.3.0+cpu/tutorials/api_doc.html
index 2be3c8ff2..5424314e8 100644
--- a/cpu/2.3.0+cpu/tutorials/api_doc.html
+++ b/cpu/2.3.0+cpu/tutorials/api_doc.html
@@ -421,13 +421,15 @@

class ipex.llm.modules.LinearSilu(linear)

Applies a linear transformation to the input data, and then applies PyTorch SiLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.silu.html)
on the result:

result = torch.nn.functional.silu(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with silu.
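A minimal usage sketch (an editor's illustration, not from the upstream docs; the layer sizes and the ipex import alias are assumed, and the other Linear* fusion modules below follow the same pattern):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> linear = torch.nn.Linear(4096, 11008)
>>> fused = ipex.llm.modules.LinearSilu(linear)
>>> x = torch.randn(1, 32, 4096)
>>> y = fused(x)  # same result as torch.nn.functional.silu(linear(x))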

@@ -451,9 +453,9 @@

class ipex.llm.modules.LinearSiluMul(linear)

Applies a linear transformation to the input data, then applies PyTorch SiLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.silu.html)
on the result, and multiplies the result by other:

result = torch.nn.functional.silu(linear(input)) * other

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with silu and mul.

@@ -479,17 +481,20 @@

class ipex.llm.modules.Linear2SiluMul(linear_s, linear_m)

Applies two linear transformations to the input data (linear_s and linear_m),
then applies PyTorch SiLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.silu.html)
on the result from linear_s, and multiplies it by the result from linear_m:

result = torch.nn.functional.silu(linear_s(input)) * linear_m(input)

Parameters:
  • linear_s (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with silu.
  • linear_m (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with mul.
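A usage sketch under the same assumptions as above (illustrative sizes; this mirrors the gate/up projection pattern of LLaMA-style MLP blocks):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> linear_s = torch.nn.Linear(4096, 11008)  # branch fused with silu
>>> linear_m = torch.nn.Linear(4096, 11008)  # branch fused with mul
>>> fused = ipex.llm.modules.Linear2SiluMul(linear_s, linear_m)
>>> x = torch.randn(1, 32, 4096)
>>> y = fused(x)  # silu(linear_s(x)) * linear_m(x)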

@@ -513,13 +518,15 @@

class ipex.llm.modules.LinearRelu(linear)

Applies a linear transformation to the input data, and then applies PyTorch ReLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html)
on the result:

result = torch.nn.functional.relu(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with relu.

@@ -543,12 +550,13 @@

class ipex.llm.modules.LinearNewGelu(linear)

Applies a linear transformation to the input data, and then applies NewGELUActivation
(see https://github.com/huggingface/transformers/blob/main/src/transformers/activations.py#L50)
on the result:

result = NewGELUActivation(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with new_gelu.

@@ -570,13 +578,15 @@

class ipex.llm.modules.LinearGelu(linear)

Applies a linear transformation to the input data, and then applies PyTorch GELU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.gelu.html)
on the result:

result = torch.nn.functional.gelu(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with gelu.

@@ -597,13 +607,15 @@

class ipex.llm.modules.LinearMul(linear)

Applies a linear transformation to the input data, and then multiplies the result by other:

result = linear(input) * other

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with mul.

@@ -625,13 +637,15 @@

class ipex.llm.modules.LinearAdd(linear)

Applies a linear transformation to the input data, and then adds other to the result:

result = linear(input) + other

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with add.
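A sketch of the fusion modules that take an extra operand (an editor's illustration; the forward call is assumed to accept the extra tensor as a second argument, e.g., a residual branch):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> fused = ipex.llm.modules.LinearAdd(torch.nn.Linear(4096, 4096))
>>> x = torch.randn(1, 32, 4096)
>>> residual = torch.randn(1, 32, 4096)
>>> y = fused(x, residual)  # assumed call pattern: linear(x) + residual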

@@ -653,13 +667,15 @@

class ipex.llm.modules.LinearAddAdd(linear)

Applies a linear transformation to the input data, and then adds other_1 and other_2 to the result:

result = linear(input) + other_1 + other_2

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with add and add.

@@ -682,34 +698,38 @@

class ipex.llm.modules.RotaryEmbedding(max_position_embeddings: int, pos_embd_dim: int, base=10000, backbone: str | None = None)

[module init and forward] Applies RotaryEmbedding (see https://huggingface.co/papers/2104.09864)
on the query or key before their multi-head attention computation.

module init

Parameters:
  • max_position_embeddings (int) – size (max) of the position embeddings.
  • pos_embd_dim (int) – dimension of the position embeddings.
  • base (int) – Default: 10000. Base to generate the frequency of position embeddings.
  • backbone (str) – Default: None. The exact transformers model backbone
(e.g., "GPTJForCausalLM", get from model.config.architectures[0], see
https://huggingface.co/EleutherAI/gpt-j-6b/blob/main/config.json#L4).

forward()

Parameters:
  • input (torch.Tensor) – input to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim] (as well as the output shape).
  • position_ids (torch.Tensor) – the according position_ids for the input; the shape should be
[batch size, sequence length]. In some cases, there is only one element, which is the
past_kv_length, and the position id can be constructed by past_kv_length + current_position.
  • num_head (int) – head num from the input shape.
  • head_dim (int) – head dim from the input shape.
  • offset (int) – the offset value. e.g., for GPT-J 6B/ChatGLM, cos/sin is applied to the
neighboring 2 elements, so the offset is 1; for llama, cos/sin is applied to the neighboring
rotary_dim elements, so the offset is rotary_dim/2.
  • rotary_ndims (int) – the rotary dimension. e.g., 64 for GPTJ. head size for LLama.

@@ -722,60 +742,62 @@

>>> query_rotary = rope_module(query, position_ids, 16, 256, 1, 64)

[Direct function call] This module also provides a .apply_function function call
to be used on query and key at the same time without initializing the module
(assuming the rotary embedding sin/cos values are provided).

apply_function()

Parameters:
  • query (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • key (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • sin/cos (torch.Tensor) – [num_tokens, rotary_dim] the sin/cos value tensor generated to be applied on query/key.
  • rotary_ndims (int) – the rotary dimension. e.g., 64 for GPTJ. head size for LLama.
  • head_dim (int) – head dim from the input shape.
  • rotary_half (bool) – if False, e.g., GPT-J 6B/ChatGLM, cos/sin is applied to the neighboring 2 elements, so the offset is 1;
if True, e.g., for llama, cos/sin is applied to the neighboring rotary_dim elements, so the offset is rotary_dim/2.
  • position_ids (torch.Tensor) – Default is None and optional if sin/cos is provided.
The according position_ids for the input; the shape should be [batch size, sequence length].

Returns:
query, key (torch.Tensor) – [batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim].
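A usage sketch based on the forward signature above (an editor's illustration; the forward call mirrors the docs' own example, while GPT-J-like sizes are assumed: 16 heads, head_dim 256, rotary_ndims 64, offset 1):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> rope_module = ipex.llm.modules.RotaryEmbedding(2048, 64, base=10000, backbone="GPTJForCausalLM")
>>> query = torch.randn(1, 32, 16, 256)
>>> position_ids = torch.arange(32).unsqueeze(0)
>>> query_rotary = rope_module(query, position_ids, 16, 256, 1, 64)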

class ipex.llm.modules.RMSNorm(hidden_size: int, eps: float = 1e-06, weight: Tensor | None = None)

[module init and forward] Applies RMSnorm on the input (hidden states)
(see https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L76).

module init

Parameters:
  • hidden_size (int) – the size of the hidden states.
  • eps (float) – the variance_epsilon to apply RMSnorm, default 1e-6.
  • weight (torch.Tensor) – the weight to apply RMSnorm, default None,
in which case torch.ones(hidden_size) is used.

forward()

Parameters:
hidden_states (torch.Tensor) – input to be applied RMSnorm, usually taking shape of
[batch size, sequence length, hidden_size] (as well as the output shape).

Examples

>>> # module init:
@@ -785,36 +807,42 @@ 

>>> result = rmsnorm_module(input)

[Direct function call] This module also provides a .apply_function function call
to apply RMSNorm without initializing the module.

apply_function()

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply RMSNorm.
  • weight (torch.Tensor) – the weight to apply RMSnorm.
  • eps (float) – the variance_epsilon to apply RMSnorm.
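A usage sketch for both call styles (an editor's illustration; the hidden size is assumed, and calling apply_function at class level without an instance is an assumption based on the description above):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> rmsnorm_module = ipex.llm.modules.RMSNorm(hidden_size=4096, eps=1e-6)
>>> hidden_states = torch.randn(1, 32, 4096)
>>> result = rmsnorm_module(hidden_states)
>>> # direct function call, no module init (weight supplied explicitly):
>>> result2 = ipex.llm.modules.RMSNorm.apply_function(hidden_states, torch.ones(4096), 1e-6)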

class ipex.llm.modules.FastLayerNorm(normalized_shape: Tuple[int, ...], eps: float, weight: Tensor, bias: Tensor | None = None)

[module init and forward] Applies PyTorch Layernorm
(see https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html)
on the input (hidden states).

module init

Parameters:
  • normalized_shape ((int or list) or torch.Size) – input shape from an expected input of size.
  • eps (float) – a value added to the denominator for numerical stability.
  • weight (torch.Tensor) – the weight of Layernorm to apply normalization.
  • bias (torch.Tensor) – an additive bias for normalization.

forward()

Parameters:
hidden_states (torch.Tensor) – input to be applied Layernorm, usually taking shape of
[batch size, sequence length, hidden_size] (as well as the output shape).

Examples

>>> # module init:
@@ -825,16 +853,20 @@ 

>>> result = layernorm_module(input)

[Direct function call] This module also provides a .apply_function function call
to apply fast layernorm without initializing the module.

apply_function()

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply normalization.
  • normalized_shape ((int or list) or torch.Size) – input shape from an expected input of size.
  • weight (torch.Tensor) – the weight to apply normalization.
  • bias (torch.Tensor) – an additive bias for normalization.
  • eps (float) – a value added to the denominator for numerical stability.
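A usage sketch (an editor's illustration; the hidden size, eps value, and weight/bias initialization are assumed):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> hidden_size = 4096
>>> weight, bias = torch.ones(hidden_size), torch.zeros(hidden_size)
>>> layernorm_module = ipex.llm.modules.FastLayerNorm((hidden_size,), 1e-5, weight, bias)
>>> hidden_states = torch.randn(1, 32, hidden_size)
>>> result = layernorm_module(hidden_states)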

@@ -847,65 +879,67 @@

module init

Parameters:
text_max_length (int) – the max length of kv cache to be used for generation
(allocate the pre-cache buffer).

forward()

Parameters:
  • query (torch.Tensor) – Query tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • key (torch.Tensor) – Key tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • value (torch.Tensor) – Value tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • scale_attn (float) – scale used by the attention layer; should be sqrt(head_size).
  • layer_past (tuple(torch.Tensor)) – tuple(seq_info, key_cache, value_cache, beam-idx).
    • key_cache: key cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • value_cache: value cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • beam-idx: history beam idx, shape: (max_seq, beam*batch);
    • seq_info: sequence info tensor, shape: (1, 1, max_seq, max_seq).
  • head_mask (torch.Tensor) – Head mask tensor, which is not supported by the kernel yet.
  • attention_mask (torch.Tensor) – Attention mask information.

Returns:
attn_output – weighted value which is the output of scale dot product;
shape: (beam*batch, seq_len, head_num, head_size).
attn_weights – the output tensor of the first matmul in scale dot product,
which is not supported by the kernel now.
new_layer_past – updated layer_past (seq_info, key_cache, value_cache, beam-idx).

Notes

How to reorder the KV cache when using the IndirectAccessKVCacheAttention format
(e.g., on the llama model, see
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1318):

def _reorder_cache(
    self, past_key_values: Tuple[Tuple[torch.Tensor]], beam_idx: torch.Tensor
) -> Tuple[Tuple[torch.Tensor]]:
    if (
        len(past_key_values[0]) == 4 and past_key_values[0][0].shape[-1] == 1
    ):
        for layer_past in past_key_values:
            layer_past[3][layer_past[0].size(-2) - 1] = beam_idx
        return past_key_values

[Direct function call] This module also provides a .apply_function function call
to apply IndirectAccessKVCacheAttention without initializing the module.

The parameters of apply_function() are the same as the forward() call.
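A module-init sketch (an editor's illustration; only the documented text_max_length argument is shown, since a valid layer_past tuple must be built by the caller as described above):

>>> import intel_extension_for_pytorch as ipex
>>> iakv_attn = ipex.llm.modules.IndirectAccessKVCacheAttention(text_max_length=2048)
>>> # forward(query, key, value, scale_attn, layer_past, head_mask, attention_mask),
>>> # with query/key/value shaped (beam*batch, seq_len, head_num, head_dim).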

    @@ -916,109 +950,98 @@

[class method]: reshape_and_cache

ipex.llm.modules.PagedAttention.reshape_and_cache(key, value, key_cache, value_cache, slot_mapping)

This operator is used to store the key/value token states into the pre-allocated
kv_cache buffers of paged attention.

Parameters:
  • key (torch.Tensor) – The key tensor. The shape should be [num_seqs, num_heads, head_size].
  • value (torch.Tensor) – The value tensor. The shape should be [num_seqs, num_heads, head_size].
  • key_cache (torch.Tensor) – The pre-allocated buffer to store the key cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • value_cache (torch.Tensor) – The pre-allocated buffer to store the value cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • slot_mapping (torch.Tensor) – It stores the position at which to store each key/value
in the pre-allocated buffers. The shape should be the number of sequences. For sequence i,
slot_mapping[i] // block_number gives the block index, and slot_mapping[i] % block_size
gives the offset within this block.

[class method]: single_query_cached_kv_attention

ipex.llm.modules.PagedAttention.single_query_cached_kv_attention(out, query, key_cache, value_cache, head_mapping, scale, block_tables, context_lens, block_size, max_context_len, alibi_slopes)

This operator computes the scale-dot-product based on the paged attention.

Parameters:
  • out (torch.Tensor) – The output tensor, with shape [num_seqs, num_heads, head_size],
where num_seqs is the number of sequences in this batch, num_heads is the number of
query heads, and head_size is the head dimension.
  • query (torch.Tensor) – The query tensor. The shape should be [num_seqs, num_heads, head_size].
  • key_cache (torch.Tensor) – The pre-allocated buffer to store the key cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • value_cache (torch.Tensor) – The pre-allocated buffer to store the value cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • head_mapping (torch.Tensor) – The mapping from the query head to the kv head.
The shape should be the number of query heads.
  • scale (float) – The scale used by the scale-dot-product.
In general, it is: float(1.0 / (head_size ** 0.5)).
  • block_tables (torch.Tensor) – The mapping table used to map the logical sequence
to the physical sequence. The shape should be [num_seqs, max_num_blocks_per_seq].
  • context_lens (torch.Tensor) – The sequence length for every sequence. The size is [num_seqs].
  • block_size (int) – The block size, i.e., the number of tokens in every block.
  • max_context_len (int) – The max sequence length.
  • alibi_slopes (torch.Tensor, optional) – the alibi slope, with the shape of (num_heads).
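A sketch of filling the paged KV cache (an editor's illustration; the buffer sizes, slot values, and the integer dtype of slot_mapping are assumptions):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> num_seqs, num_heads, head_size = 4, 16, 128
>>> num_blocks, block_size = 8, 16
>>> key = torch.randn(num_seqs, num_heads, head_size)
>>> value = torch.randn(num_seqs, num_heads, head_size)
>>> key_cache = torch.zeros(num_blocks, block_size, num_heads, head_size)
>>> value_cache = torch.zeros(num_blocks, block_size, num_heads, head_size)
>>> slot_mapping = torch.tensor([0, 1, 16, 17], dtype=torch.long)  # one cache slot per sequence
>>> ipex.llm.modules.PagedAttention.reshape_and_cache(key, value, key_cache, value_cache, slot_mapping)
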
    class ipex.llm.modules.VarlenAttention
[module init and forward] Applies PyTorch scaled_dot_product_attention on the inputs of
query, key and value
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html),
and accepts variant (different) sequence lengths among the query, key and value.

This module does not have args for module init.

forward()

Parameters:
  • query (torch.Tensor) – shape [query_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • key (torch.Tensor) – shape [key_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • value (torch.Tensor) – shape [value_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • out (torch.Tensor) – buffer to get the results; the shape is the same as query.
  • seqlen_q (torch.Tensor) – shape [batch_size + 1]; points to the current query_tokens among the total sequence length.
  • seqlen_k (torch.Tensor) – shape [batch_size + 1]; points to the current key_tokens among the total sequence length.
  • max_seqlen_q (int) – max/total sequence length of query.
  • max_seqlen_k (int) – max/total sequence length of key.
  • pdropout (float) – dropout probability; if greater than 0.0, dropout is applied; default is 0.0.
  • softmax_scale (float) – scaling factor applied prior to softmax.
  • is_causal (bool) – whether to apply causal attention masking; default is True.

    @@ -1039,73 +1062,78 @@

>>> varlenAttention_module(query, key, value, out, seqlen_q, seqlen_k, max_seqlen_q, max_seqlen_k, pdropout, softmax_scale)

[Direct function call] This module also provides a .apply_function function call
to apply VarlenAttention without initializing the module.

The parameters of apply_function() are the same as the forward() call.
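A fuller sketch of the call above (an editor's illustration; the int32 dtype of seqlen_q/seqlen_k, the cumulative-offset layout, and the scale value are assumptions):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> varlenAttention_module = ipex.llm.modules.VarlenAttention()
>>> batch_size, seq_len, num_head, head_size = 2, 32, 16, 128
>>> total = batch_size * seq_len
>>> query = torch.randn(total, num_head, head_size)
>>> key = torch.randn(total, num_head, head_size)
>>> value = torch.randn(total, num_head, head_size)
>>> out = torch.empty_like(query)
>>> seqlen_q = torch.tensor([0, seq_len, total], dtype=torch.int32)  # cumulative offsets
>>> seqlen_k = seqlen_q.clone()
>>> varlenAttention_module(query, key, value, out, seqlen_q, seqlen_k, seq_len, seq_len, 0.0, head_size ** -0.5)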

    ipex.llm.functional.rotary_embedding(query: Tensor, key: Tensor, sin: Tensor, cos: Tensor, rotary_dim: int, rotary_half: bool, position_ids: Tensor | None = None)
Applies RotaryEmbedding (see https://huggingface.co/papers/2104.09864)
on the query or key before their multi-head attention computation.

Parameters:
  • query (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • key (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • sin/cos (torch.Tensor) – [num_tokens, rotary_dim] the sin/cos value tensor generated to be applied on query/key.
  • rotary_ndims (int) – the rotary dimension. e.g., 64 for GPTJ. head size for LLama.
  • head_dim (int) – head dim from the input shape.
  • rotary_half (bool) – if False, e.g., GPT-J 6B/ChatGLM, cos/sin is applied to the neighboring 2 elements, so the offset is 1;
if True, e.g., for llama, cos/sin is applied to the neighboring rotary_dim elements, so the offset is rotary_dim/2.
  • position_ids (torch.Tensor) – Default is None and optional if sin/cos is provided.
The according position_ids for the input; the shape should be [batch size, sequence length].

Returns:
query, key (torch.Tensor) – [batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim].
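A usage sketch of the functional form (an editor's illustration; the random sin/cos tables stand in for real precomputed rotary tables, and the sizes are assumed):

>>> import torch
>>> from intel_extension_for_pytorch.llm.functional import rotary_embedding
>>> num_tokens, num_head, head_dim, rotary_dim = 32, 16, 256, 64
>>> query = torch.randn(num_tokens, num_head, head_dim)
>>> key = torch.randn(num_tokens, num_head, head_dim)
>>> sin = torch.randn(num_tokens, rotary_dim)  # placeholder for a real sin table
>>> cos = torch.randn(num_tokens, rotary_dim)  # placeholder for a real cos table
>>> query, key = rotary_embedding(query, key, sin, cos, rotary_dim, False)
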
    ipex.llm.functional.rms_norm(hidden_states: Tensor, weight: Tensor, eps: float)

Applies RMSnorm on the input (hidden states)
(see https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L76).

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply RMSNorm.
  • weight (torch.Tensor) – the weight to apply RMSnorm.
  • eps (float) – the variance_epsilon to apply RMSnorm.
    ipex.llm.functional.fast_layer_norm(hidden_states: Tensor, normalized_shape: Tuple[int, ...], weight: Tensor, bias: Tensor, eps: float)

Applies PyTorch Layernorm (see https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html)
on the input (hidden states).

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply normalization.
  • normalized_shape ((int or list) or torch.Size) – input shape from an expected input of size.
  • weight (torch.Tensor) – the weight to apply normalization.
  • bias (torch.Tensor) – an additive bias for normalization.
  • eps (float) – a value added to the denominator for numerical stability.
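A usage sketch covering both functional norms (an editor's illustration; the sizes, eps values, and weight/bias initialization are assumed):

>>> import torch
>>> from intel_extension_for_pytorch.llm.functional import rms_norm, fast_layer_norm
>>> hidden_size = 4096
>>> hidden_states = torch.randn(1, 32, hidden_size)
>>> weight, bias = torch.ones(hidden_size), torch.zeros(hidden_size)
>>> out_rms = rms_norm(hidden_states, weight, 1e-6)
>>> out_ln = fast_layer_norm(hidden_states, (hidden_size,), weight, bias, 1e-5)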
    @@ -1118,82 +1146,87 @@

ipex.llm.functional.indirect_access_kv_cache_attention

Parameters:
  • query (torch.Tensor) – Query tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • key (torch.Tensor) – Key tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • value (torch.Tensor) – Value tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • scale_attn (float) – scale used by the attention layer; should be the sqrt(head_size).
  • layer_past (tuple(torch.Tensor)) – tuple(seq_info, key_cache, value_cache, beam-idx).
    • key_cache: key cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • value_cache: value cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • beam-idx: history beam idx, shape: (max_seq, beam*batch);
    • seq_info: sequence info tensor, shape: (1, 1, max_seq, max_seq).
  • head_mask (torch.Tensor) – Head mask tensor, which is not supported by the kernel yet.
  • attention_mask (torch.Tensor) – Attention mask information.
  • text_max_length (int) – the max length of kv cache to be used for generation
(allocate the pre-cache buffer).

Returns:
attn_output – weighted value which is the output of scale dot product;
shape: (beam*batch, seq_len, head_num, head_size).
attn_weights – the output tensor of the first matmul in scale dot product,
which is not supported by the kernel now.
new_layer_past – updated layer_past (seq_info, key_cache, value_cache, beam-idx).

Notes

How to reorder the KV cache when using the IndirectAccessKVCacheAttention format
(e.g., on the llama model, see
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1318):

def _reorder_cache(
    self, past_key_values: Tuple[Tuple[torch.Tensor]], beam_idx: torch.Tensor
) -> Tuple[Tuple[torch.Tensor]]:
    if (
        len(past_key_values[0]) == 4 and past_key_values[0][0].shape[-1] == 1
    ):
        for layer_past in past_key_values:
            layer_past[3][layer_past[0].size(-2) - 1] = beam_idx
        return past_key_values
    ipex.llm.functional.varlen_attention(query: Tensor, key: Tensor, value: Tensor, out: Tensor, seqlen_q: Tensor, seqlen_k: Tensor, max_seqlen_q: int, max_seqlen_k: int, pdropout: float, softmax_scale: float, zero_tensors: bool, is_causal: bool, return_softmax: bool, gen_: Generator)
Applies PyTorch scaled_dot_product_attention on the inputs of query, key and value
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html),
and accepts variant (different) sequence lengths among the query, key and value.

Parameters:
  • query (torch.Tensor) – shape [query_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • key (torch.Tensor) – shape [key_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • value (torch.Tensor) – shape [value_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • out (torch.Tensor) – buffer to get the results; the shape is the same as query.
  • seqlen_q (torch.Tensor) – shape [batch_size + 1]; points to the current query_tokens among the total sequence length.
  • seqlen_k (torch.Tensor) – shape [batch_size + 1]; points to the current key_tokens among the total sequence length.
  • max_seqlen_q (int) – max/total sequence length of query.
  • max_seqlen_k (int) – max/total sequence length of key.
  • pdropout (float) – dropout probability; if greater than 0.0, dropout is applied; default is 0.0.
  • softmax_scale (float) – scaling factor applied prior to softmax.
  • is_causal (bool) – whether to apply causal attention masking; default is True.
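A usage sketch of the functional form (an editor's illustration; passing gen_=None assumes no RNG is needed when pdropout is 0.0, and the zero_tensors/return_softmax values are guesses from the signature):

>>> import torch
>>> from intel_extension_for_pytorch.llm.functional import varlen_attention
>>> batch_size, seq_len, num_head, head_size = 2, 32, 16, 128
>>> total = batch_size * seq_len
>>> query = torch.randn(total, num_head, head_size)
>>> key = torch.randn(total, num_head, head_size)
>>> value = torch.randn(total, num_head, head_size)
>>> out = torch.empty_like(query)
>>> seqlen_q = torch.tensor([0, seq_len, total], dtype=torch.int32)
>>> seqlen_k = seqlen_q.clone()
>>> varlen_attention(query, key, value, out, seqlen_q, seqlen_k, seq_len, seq_len,
...                  0.0, head_size ** -0.5, False, True, False, None)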

    @@ -1607,7 +1640,7 @@

    Graph OptimizationSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/blogs_publications.html b/cpu/2.3.0+cpu/tutorials/blogs_publications.html index b4704c089..90a817af4 100644 --- a/cpu/2.3.0+cpu/tutorials/blogs_publications.html +++ b/cpu/2.3.0+cpu/tutorials/blogs_publications.html @@ -167,7 +167,7 @@

    Blogs & PublicationsSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/cheat_sheet.html b/cpu/2.3.0+cpu/tutorials/cheat_sheet.html index 1e5479f88..c81b37a92 100644 --- a/cpu/2.3.0+cpu/tutorials/cheat_sheet.html +++ b/cpu/2.3.0+cpu/tutorials/cheat_sheet.html @@ -195,7 +195,7 @@

    Cheat SheetSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/contribution.html b/cpu/2.3.0+cpu/tutorials/contribution.html index 38cc63505..4b70ed395 100644 --- a/cpu/2.3.0+cpu/tutorials/contribution.html +++ b/cpu/2.3.0+cpu/tutorials/contribution.html @@ -331,7 +331,7 @@

    Tips Built with Sphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/examples.html b/cpu/2.3.0+cpu/tutorials/examples.html index d279b6cf7..6a1894c46 100644 --- a/cpu/2.3.0+cpu/tutorials/examples.html +++ b/cpu/2.3.0+cpu/tutorials/examples.html @@ -1567,7 +1567,7 @@

    Intel® AI Reference ModelsSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features.html b/cpu/2.3.0+cpu/tutorials/features.html index 864d49ac5..5335882f9 100644 --- a/cpu/2.3.0+cpu/tutorials/features.html +++ b/cpu/2.3.0+cpu/tutorials/features.html @@ -440,7 +440,7 @@

    Fast BERT Optimization (Prototype, NEW feature from 2.0.0)Sphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/amp.html b/cpu/2.3.0+cpu/tutorials/features/amp.html index 6e33bccec..f608418d7 100644 --- a/cpu/2.3.0+cpu/tutorials/features/amp.html +++ b/cpu/2.3.0+cpu/tutorials/features/amp.html @@ -262,7 +262,7 @@

    Ops that promote to the widest input typeSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html b/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html index 97c304be0..a22880be5 100644 --- a/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html +++ b/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html @@ -192,7 +192,7 @@

    Known issueSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html b/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html index 04709f865..2b2487f3d 100644 --- a/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html +++ b/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html @@ -280,7 +280,7 @@

    Already using Jit TraceSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/fast_bert.html b/cpu/2.3.0+cpu/tutorials/features/fast_bert.html index edaf2fadc..c854584a5 100644 --- a/cpu/2.3.0+cpu/tutorials/features/fast_bert.html +++ b/cpu/2.3.0+cpu/tutorials/features/fast_bert.html @@ -193,7 +193,7 @@

    Usage ExampleSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/graph_capture.html b/cpu/2.3.0+cpu/tutorials/features/graph_capture.html index 87687ec74..fbff2dcb3 100644 --- a/cpu/2.3.0+cpu/tutorials/features/graph_capture.html +++ b/cpu/2.3.0+cpu/tutorials/features/graph_capture.html @@ -179,7 +179,7 @@

    Usage ExampleSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html b/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html index 5db6c7370..2a65a0391 100644 --- a/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html +++ b/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html @@ -390,7 +390,7 @@

    FoldingSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/hypertune.html b/cpu/2.3.0+cpu/tutorials/features/hypertune.html index a8ea0d1fc..a1f42dd36 100644 --- a/cpu/2.3.0+cpu/tutorials/features/hypertune.html +++ b/cpu/2.3.0+cpu/tutorials/features/hypertune.html @@ -330,7 +330,7 @@

    Usage ExamplesSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/int8_overview.html b/cpu/2.3.0+cpu/tutorials/features/int8_overview.html index 96345893a..fc9b5c40a 100644 --- a/cpu/2.3.0+cpu/tutorials/features/int8_overview.html +++ b/cpu/2.3.0+cpu/tutorials/features/int8_overview.html @@ -300,7 +300,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html b/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html index e02145b99..28a1cbf52 100644 --- a/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html +++ b/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html @@ -378,7 +378,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html b/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html index b89bfa622..0f1e5f25a 100644 --- a/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html +++ b/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html @@ -742,7 +742,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/nhwc.html b/cpu/2.3.0+cpu/tutorials/features/nhwc.html index 8f50bde4b..7a6f937fd 100644 --- a/cpu/2.3.0+cpu/tutorials/features/nhwc.html +++ b/cpu/2.3.0+cpu/tutorials/features/nhwc.html @@ -370,7 +370,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html b/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html index aefc11f04..2e6815cf8 100644 --- a/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html +++ b/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html @@ -184,7 +184,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html b/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html index 24a28544a..ab46da07d 100644 --- a/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html +++ b/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html @@ -347,7 +347,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/split_sgd.html b/cpu/2.3.0+cpu/tutorials/features/split_sgd.html index 42b4054c5..6926fbb58 100644 --- a/cpu/2.3.0+cpu/tutorials/features/split_sgd.html +++ b/cpu/2.3.0+cpu/tutorials/features/split_sgd.html @@ -218,7 +218,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html b/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html index b1fce77e9..cc2926a9a 100644 --- a/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html +++ b/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html @@ -209,7 +209,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/getting_started.html b/cpu/2.3.0+cpu/tutorials/getting_started.html index d16fe6df4..79461e521 100644 --- a/cpu/2.3.0+cpu/tutorials/getting_started.html +++ b/cpu/2.3.0+cpu/tutorials/getting_started.html @@ -282,7 +282,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/installation.html b/cpu/2.3.0+cpu/tutorials/installation.html index 8835193ef..7956f4699 100644 --- a/cpu/2.3.0+cpu/tutorials/installation.html +++ b/cpu/2.3.0+cpu/tutorials/installation.html @@ -132,7 +132,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/introduction.html b/cpu/2.3.0+cpu/tutorials/introduction.html index 1f2723f33..54b0b891e 100644 --- a/cpu/2.3.0+cpu/tutorials/introduction.html +++ b/cpu/2.3.0+cpu/tutorials/introduction.html @@ -156,7 +156,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/known_issues.html b/cpu/2.3.0+cpu/tutorials/known_issues.html index 330a7f99c..4ece9f8c2 100644 --- a/cpu/2.3.0+cpu/tutorials/known_issues.html +++ b/cpu/2.3.0+cpu/tutorials/known_issues.html @@ -316,7 +316,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/license.html b/cpu/2.3.0+cpu/tutorials/license.html index 8eab4ec9a..43ac073d6 100644 --- a/cpu/2.3.0+cpu/tutorials/license.html +++ b/cpu/2.3.0+cpu/tutorials/license.html @@ -132,7 +132,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/llm.html b/cpu/2.3.0+cpu/tutorials/llm.html index 3228a7eaa..9a32a6a92 100644 --- a/cpu/2.3.0+cpu/tutorials/llm.html +++ b/cpu/2.3.0+cpu/tutorials/llm.html @@ -585,13 +585,13 @@

    Verified for distributed inference mode via DeepSpeed

🟨 signifies that the model can perform well, though accuracy may not be in a perfect state (>1% difference compared with FP32).

Note: The above verified models (including other models in the same model family, like “codellama/CodeLlama-7b-hf” from the LLAMA family) are well supported with all optimizations, such as indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16). Work is in progress to better support the models in the tables with various data types. In addition, more models will be optimized in the future.

-Please check LLM best known practice for instructions on installing/setting up the environment and for example scripts.

+Please check LLM best known practice for instructions on installing/setting up the environment and for example scripts.
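To make the note above concrete, here is a minimal sketch (not part of this diff) of enabling those optimizations through the ipex.llm.optimize entry point that the LLM best known practice walks through; the model name and generation settings below are illustrative assumptions.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import intel_extension_for_pytorch as ipex

    # Illustrative choice; any member of a verified model family should work.
    model_id = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    # Single entry point; dtype selects the bf16 path, under which the
    # optimizations named above (indirect access KV cache, fused ROPE,
    # prepacked TPP Linear) are applied where supported.
    model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

    prompt = tokenizer("What is PyTorch?", return_tensors="pt")
    with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
        output = model.generate(**prompt, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))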

    Module Level Optimization API for customized LLM (Prototype)

In the past year, LLMs have flourished, with many open-source models contributed to the community, while researchers build their own LLMs from transformer blocks with variants in implementation details. To help LLM researchers and developers improve their productivity, Intel® Extension for PyTorch* provides module level optimizations for commonly used LLM modules and functionalities, which are in nature operators or certain operator combinations.

-Please check LLM module level optimization practice to better understand how to use module level APIs to optimize your LLM and achieve better performance.

+Please check LLM module level optimization practice to better understand how to use module level APIs to optimize your LLM and achieve better performance.
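For a sense of what the module level APIs look like when building a customized LLM, the following is a hedged sketch using one fused module from ipex.llm.modules; the exact constructor signature is an assumption and should be confirmed against the module level API docs.

    import torch
    import intel_extension_for_pytorch as ipex

    hidden_size = 4096
    # Fused RMSNorm from the module level API; intended as a drop-in
    # replacement for a hand-written RMSNorm inside a custom decoder layer.
    norm = ipex.llm.modules.RMSNorm(hidden_size)

    x = torch.randn(1, 32, hidden_size)
    y = norm(x)
    print(y.shape)  # torch.Size([1, 32, 4096])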

    Demos

    diff --git a/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html b/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html index c2823a660..f49dcb9d6 100644 --- a/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html +++ b/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html @@ -266,7 +266,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance.html b/cpu/2.3.0+cpu/tutorials/performance.html index 815f64565..c9ed2e7f8 100644 --- a/cpu/2.3.0+cpu/tutorials/performance.html +++ b/cpu/2.3.0+cpu/tutorials/performance.html @@ -1038,7 +1038,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html b/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html index 297780342..f38cebcac 100644 --- a/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html +++ b/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html @@ -829,7 +829,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html b/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html index 80dcd7dae..e38ab0543 100644 --- a/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html +++ b/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html @@ -462,7 +462,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html b/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html index 963dbdcd1..04ee95764 100644 --- a/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html +++ b/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html @@ -365,7 +365,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/releases.html b/cpu/2.3.0+cpu/tutorials/releases.html index abca321a6..929e04127 100644 --- a/cpu/2.3.0+cpu/tutorials/releases.html +++ b/cpu/2.3.0+cpu/tutorials/releases.html @@ -1337,7 +1337,7 @@

    NOTE Built with Sphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.