diff --git a/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt b/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt
index 4cb02e6a0..e9690b677 100644
--- a/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt
+++ b/cpu/2.3.0+cpu/_sources/tutorials/llm.rst.txt
@@ -30,14 +30,14 @@ Verified for distributed inference mode via DeepSpeed

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16). Work is in progress to better support the models in the tables with various data types. In addition, more models will be optimized in the future.

-Please check `LLM best known practice <../../examples/cpu/inference/python/llm>`_ for instructions to install/setup environment and example scripts.
+Please check `LLM best known practice `_ for instructions to install/setup environment and example scripts.

 Module Level Optimization API for customized LLM (Prototype)
 ------------------------------------------------------------

 In the past year, LLMs have been flourishing, with many open-sourced models contributed to the community, while researchers are building their own LLMs from transformer blocks with variants in implementation details. To help LLM researchers and developers improve their productivity, Intel® Extension for PyTorch* provides module level optimizations for commonly used LLM modules and functionalities, which are operators or certain operator combinations in nature.

-Please check `LLM module level optimization practice <../../examples/cpu/inference/python/llm-modeling>`_ to better understand how to use `module level APIs `_ to optimize your LLM and achieve better performance.
+Please check `LLM module level optimization practice `_ to better understand how to use `module level APIs `_ to optimize your LLM and achieve better performance.

 Demos
 -----
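For context on the frontend that the hunk above refers to, here is a minimal sketch of running a verified model through ``ipex.llm.optimize``; the model id, prompt, and generation arguments are illustrative assumptions, not content taken from this diff::

    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "facebook/opt-125m"  # illustrative model choice, not from the diff
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True
    )
    model.eval()

    # Applies the LLM-specific optimizations named in the note above
    # (indirect-access KV cache, fused ROPE, prepacked TPP Linear).
    model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
    with torch.inference_mode(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
        gen_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.batch_decode(gen_ids, skip_special_tokens=True)[0])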
diff --git a/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html b/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html
index 42911773f..759ecfb0d 100644
--- a/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html
+++ b/cpu/2.3.0+cpu/design_doc/cpu/isa_dyndisp.html
@@ -125,7 +125,7 @@
 [one-line change in the generated Sphinx/Read the Docs footer of the "Intel® Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc" page; the removed/added markup was not preserved in this extract]
diff --git a/cpu/2.3.0+cpu/genindex.html b/cpu/2.3.0+cpu/genindex.html
index e881b05a6..ef43fe926 100644
--- a/cpu/2.3.0+cpu/genindex.html
+++ b/cpu/2.3.0+cpu/genindex.html
@@ -375,7 +375,7 @@
 [same one-line footer change in the generated general index page]
diff --git a/cpu/2.3.0+cpu/index.html b/cpu/2.3.0+cpu/index.html
index faeb6e121..b05e1f672 100644
--- a/cpu/2.3.0+cpu/index.html
+++ b/cpu/2.3.0+cpu/index.html
@@ -182,7 +182,7 @@
 [same one-line footer change in the generated landing page]
diff --git a/cpu/2.3.0+cpu/py-modindex.html b/cpu/2.3.0+cpu/py-modindex.html
index 7aab41737..323572252 100644
--- a/cpu/2.3.0+cpu/py-modindex.html
+++ b/cpu/2.3.0+cpu/py-modindex.html
@@ -165,7 +165,7 @@
 [same one-line footer change in the generated Python Module Index page]
diff --git a/cpu/2.3.0+cpu/search.html b/cpu/2.3.0+cpu/search.html
index a1f97b24d..d94b3c725 100644
--- a/cpu/2.3.0+cpu/search.html
+++ b/cpu/2.3.0+cpu/search.html
@@ -133,7 +133,7 @@
 [same one-line footer change in the generated search page]
diff --git a/cpu/2.3.0+cpu/searchindex.js b/cpu/2.3.0+cpu/searchindex.js
index c8847b61b..e55de425d 100644
--- a/cpu/2.3.0+cpu/searchindex.js
+++ b/cpu/2.3.0+cpu/searchindex.js
@@ -1 +1 @@
 [regenerated single-line Search.setIndex({...}) payload omitted: machine-generated search-index data, truncated in this extract and not human-reviewable]
17, "mamx": 17, "tile": 17, "dcpu_capability_amx": 17, "mavx512fp16": 17, "dcpu_capability_avx512_fp16": 17, "align": [17, 18, 21, 34], "stead": 17, "sleef": 17, "width": [17, 18], "isa_nam": 17, "inlin": 17, "compat": [17, 21], "definit": [17, 21], "Such": 17, "But": [17, 18], "tip": 17, "newkernelkrnl": 17, "newkernel": 17, "header": 17, "special": [17, 18, 28], "fastest": 17, "cpuinfo": 17, "mykernel": 17, "fn_type": 17, "void": 17, "ipex_declare_dispatch": 17, "ipex_define_dispatch": 17, "ipex_register_dispatch": 17, "kcpu": 17, "declar": 17, "ideep": [17, 18], "common": [17, 21, 28, 31, 33], "intrins": 17, "cvtfp32tobf16": 17, "pragma": 17, "torch_ipex": [17, 34], "cvt_fp32_to_bf16": 17, "dst": 17, "cvt_fp32_to_bf16_kernel_impl": 17, "cvt_fp32_to_bf16_kernel_fn": 17, "cvt_fp32_to_bf16_kernel_stub": 17, "macro": 17, "cpu_capability_avx512": 17, "cpu_capability_avx512_bf16": 17, "hav": 17, "cvtfp32tobf16krnl": 17, "vec512": 17, "vec256": 17, "endif": 17, "immintrin": 17, "__m256i": 17, "_cvt_fp32_to_bf16": 17, "__m512": 17, "reinterpret_cast": 17, "_mm512_cvtneps_pbh": 17, "__m512i": 17, "_mm512_castps_si512": 17, "nan": [17, 34], "_mm512_set1_epi32": 17, "0xffff": 17, "mask_valu": 17, "_mm512_cmp_ps_mask": 17, "_cmp_ord_q": 17, "0x1": 17, "vec_bia": 17, "0x7fff": 17, "uint32_t": 17, "lsb": 17, "t_valu": 17, "_mm512_and_si512": 17, "_mm512_srli_epi32": 17, "rounding_bia": 17, "_mm512_add_epi32": 17, "_mm512_mask_blend_epi32": 17, "_mm512_cvtusepi32_epi16": 17, "f32": [17, 18], "_mm512_loadu_p": 17, "_mm256_storeu_si256": 17, "_mm512_maskz_loadu_p": 17, "_mm256_mask_storeu_epi16": 17, "getveclength": 17, "get_cpp_typesize_and_vecs": 17, "scalartyp": 17, "get_cpp_typesize_and_vecsize_kernel_impl": 17, "get_cpp_typesize_and_vecsize_kernel_fn": 17, "get_cpp_typesize_and_vecsize_kernel_stub": 17, "types": 17, "vectors": 17, "getveclengthkrnl": 17, "doubl": 17, "make_tupl": 17, "sizeof": 17, "complexdoubl": 17, "complex": 17, "complexfloat": 17, "decltyp": 17, "impl": 17, "scalartypetocpptyp": 17, "torch_check": 17, "09": [17, 31], "58": [17, 31], "anaconda": 17, "copyright": [17, 27], "credit": 17, "licens": 17, "_c": [17, 26], "_get_current_isa_level": 17, "_get_highest_cpu_support_isa_level": 17, "_get_highest_binary_support_isa_level": 17, "quit": [17, 34], "By": [17, 31, 33], "aten_cpu_cap": 17, "effect": [17, 21, 26, 32, 33], "intern": [17, 18, 20, 32], "purpos": [17, 31, 32, 33], "addtion": 17, "tool": [17, 33, 34], "subfold": 17, "rh": 17, "toolset": 17, "33": [17, 31, 32], "cmakefil": 17, "cpu_featur": 17, "dir": [17, 31], "66": [17, 31, 34], "cpu_feature_main": 17, "xcr0": 17, "00000000000602e7": 17, "mmx": 17, "sse": 17, "sse2": 17, "sse3": 17, "ssse3": 17, "sse4_1": 17, "sse4_2": 17, "aes_ni": 17, "sha": 17, "xsave": 17, "fma": 17, "f16c": 17, "avx_vnni": 17, "avx512_f": 17, "avx512_cd": 17, "avx512_pf": 17, "avx512_er": 17, "avx512_vl": 17, "avx512_bw": 17, "avx512_dq": 17, "avx512_ifma": 17, "avx512_vbmi": 17, "avx512_vpopcntdq": 17, "avx512_4fmap": 17, "avx512_4vnniw": 17, "avx512_vbmi2": 17, "avx512_vpclmul": 17, "avx512_bitalg": 17, "avx512_vp2intersect": 17, "amx_bf16": 17, "amx_til": 17, "amx_int8": 17, "prefetchw": 17, "prefetchwt1": 17, "represent": 18, "multidimension": 18, "arrai": 18, "nd": 18, "1d": 18, "semant": 18, "attribut": 18, "coo": 18, "canon": 18, "assign": [18, 32, 33], "2d": 18, "height": 18, "illustr": [18, 19, 21, 31, 33], "actual": [18, 21], "bmp": 18, "contiguous_format": [18, 33], "tensorflow": 18, "close": [18, 31, 33], "to_mkldnn": 18, "difficult": 
18, "manipul": 18, "to_dens": 18, "natur": [18, 21, 28], "hold": [18, 33], "secret": 18, "ingredi": 18, "almost": 18, "foundat": [18, 33], "upper": [18, 33], "fact": [18, 33], "expens": 18, "benefici": 18, "nb": 18, "me": 18, "roughli": 18, "50": [18, 31, 32], "mkldnn": 18, "mkldnn_util": 18, "subsequ": [18, 33], "concept": [18, 33], "diagram": [18, 33], "hard": [18, 26], "conclus": 18, "necessari": 18, "neglig": 18, "move": [18, 33], "organ": 18, "question": [18, 30], "reinterpret": 18, "answer": [18, 30], "chw": 18, "hw": 18, "stride_n": 18, "stride_c": 18, "stride_h": 18, "stride_w": 18, "merit": 18, "express": [18, 34], "noncontigu": 18, "n1": 18, "n2": 18, "mind": [18, 32], "someth": 18, "reli": [18, 20], "rfc": 18, "hwc": 18, "wc": 18, "chwn": 18, "hwn": 18, "wn": 18, "empti": [18, 31], "outplac": [18, 34], "is_contigu": 18, "_appli": 18, "brief": [18, 28, 34], "imagenet": [18, 30], "spontan": 18, "tell": [18, 20, 33], "NOT": [18, 31], "compris": 18, "explicit": [18, 20, 33], "implicit": 18, "tensoriter": 18, "guidelin": 18, "awar": [18, 20, 31, 32], "my": 18, "upsampl": [18, 34], "cudnn": 18, "accommod": 18, "md": 18, "format_tag": 18, "src_md": 18, "desc": 18, "data_typ": 18, "src_mem": 18, "src_data_ptr": 18, "card": 18, "hwio": 18, "resnext101": [18, 34], "detectron2": 18, "8x": 18, "lamb": [19, 21], "adagrad": [19, 21], "clr": 19, "lr_decai": 19, "state_sum": 19, "addcmul_": 19, "add_": 19, "addcdiv_": 19, "whole": [19, 20, 33], "storag": 19, "onboard": [19, 33], "third": [19, 34], "high": [19, 21, 33], "bound": [19, 20, 28, 33], "bottl": 19, "neck": 19, "prevent": 19, "pseudo": [19, 21, 34], "adagrad_fused_step": 19, "group": [19, 20, 33], "grad0": 19, "grad1": 19, "grad_n": 19, "param_n": 19, "state_sum_n": 19, "adagrad_step": 19, "grad_i": 19, "param_i": 19, "state_sum_i": 19, "other_arg": 19, "coupl": [20, 33, 34], "omp": [20, 26, 31, 32, 33, 34], "ld_preload": [20, 31, 32, 33], "libiomp5": [20, 31, 32, 33], "model_script": 20, "examplenet": 20, "examplenet1": 20, "x1": 20, "start_dim": 20, "examplenet2": 20, "conv2": 20, "x2": 20, "y1": 20, "y2": 20, "model1": 20, "traced_model1": 20, "model2": 20, "traced_model2": 20, "multi_stream_model": [20, 34], "datatyp": [20, 34], "receipt": 20, "steam": [20, 34], "input_hint": 20, "output_hint": 20, "pthread": 20, "async": [20, 34], "wake": 20, "synchron": [20, 26, 34], "imper": [20, 34], "suffer": 20, "gil": 20, "hurt": 20, "mitig": [20, 30], "omp_num_thread": [20, 26, 31, 32, 34], "phase": 20, "s1": 20, "c1": 20, "numactl": [20, 31, 32], "outsid": 20, "superset": 20, "undefin": [20, 33], "gb": 20, "simultan": 20, "correspond": [20, 31, 34], "cpu_pool1": 20, "cpu_pool2": 20, "task1": 20, "task2": 20, "y1_futur": 20, "y2_futur": 20, "y_runtim": 20, "kmp_": 20, "fulfil": 20, "worker": [20, 31], "serv": [20, 34], "sub": [20, 28, 33], "wait": [20, 33], "futuretensor": 20, "didn": 20, "dlopen": 20, "symbol": 20, "bottom": 21, "bit": [21, 28], "sign": 21, "expon": 21, "mantissa": 21, "23": [21, 31, 32], "capac": [21, 30], "digit": 21, "shorter": [21, 28], "fewer": 21, "neg": 21, "disadvantag": 21, "shift": 21, "left": [21, 28, 32], "lose": 21, "decim": 21, "valid": [21, 34], "1234500000": 21, "0000012345": 21, "1234512345": 21, "sens": 21, "fraction": 21, "12345": 21, "00000": 21, "signific": 21, "bui": 21, "involv": 21, "ground": 21, "truth": 21, "chain": 21, "rule": [21, 34], "meet": [21, 33, 34], "wide": [21, 34], "understand": [21, 28, 33], "formula": 21, "\u03b1": 21, "gw": 21, "denot": 21, "receiv": 21, "rate": 21, "earlier": 21, 
"inaccur": 21, "exactli": 21, "kept": 21, "halv": 21, "recov": 21, "fp32_w": 21, "concat_fp32_from_bf16": 21, "bf16_w": 21, "fp32_gw": 21, "bf16_gw": 21, "weight_dacai": 21, "split_bf16_from_fp32": 21, "ratio": [22, 30, 34], "beta": [23, 26], "demostr": 23, "cheat": 23, "sheet": 23, "pypi": [26, 34], "occupi": 26, "remark": [26, 30, 33], "__name__": [26, 34], "__main__": [26, 31, 32, 34], "112": [26, 30, 33, 34], "nnc": 26, "poor": [26, 34], "xlm": 26, "roberta": [26, 34], "casual": 26, "gpt2": 26, "summar": 26, "classif": [26, 30], "allenai": 26, "longform": 26, "409": 26, "workaround": [26, 34], "_jit_set_texpr_fuser_en": 26, "csrc": 26, "tensorexpr_fus": 26, "settensorexprfuseren": 26, "longer": [26, 30], "complic": [26, 31, 33], "undergo": [26, 29], "runtimeerror": [26, 34], "overflow": [26, 34], "unpack": [26, 34], "exce": [26, 30, 33, 34], "quantize_per_tensor": 26, "pseudocod": [26, 34], "omp_num_threa": 26, "set_num_thread": [26, 34], "freezed_model": [26, 34], "run_benchmark": [26, 34], "flow": 26, "bag": [26, 34], "progress": [26, 28, 34], "abnorm": [26, 34], "tbd": 26, "transformerencoderlay": 26, "encount": [26, 34], "rnnt": [26, 34], "joint_net": [26, 34], "caller": [26, 34], "apach": [27, 32], "notic": [27, 31, 32], "term": 27, "condit": 27, "multiheadattent": 28, "feedforward": 28, "lot": [28, 34], "besid": [28, 33, 34], "adopt": [28, 34], "modelfamili": 28, "hub": 28, "staticquantizationint8": 28, "onlyquantizationint8": 28, "onlyquantizationint4": 28, "13b": [28, 30, 34], "70b": [28, 34], "8b": 28, "20b": 28, "dolli": [28, 34], "databrick": 28, "v2": [28, 30, 34], "12b": 28, "tiiuae": 28, "40b": 28, "30b": 28, "3b": 28, "bigscienc": 28, "1b7": 28, "salesforc": 28, "2b": 28, "baichuan2": [28, 34], "chat": 28, "thudm": 28, "chatglm3": [28, 34], "chatglm2": [28, 34], "bigcod": 28, "starcod": [28, 34], "flan": 28, "xl": 28, "mosaicml": 28, "mistralai": 28, "v0": 28, "8x7b": 28, "stabilityai": 28, "1_6b": 28, "liuhaotian": 28, "v1": [28, 34], "microsoft": 28, "ieityuan": 28, "yuan2": 28, "102b": 28, "signifi": 28, "perfect": 28, "codellama": 28, "rope": 28, "past": 28, "year": 28, "flourish": 28, "contribut": [28, 31, 34], "research": 28, "web": 28, "legend": 28, "autotp": 28, "obviou": 28, "hotspot": 28, "lead": 28, "significantli": [28, 34], "heavier": 28, "io": 28, "occurr": 28, "ship": 28, "2nd": 28, "4th": [28, 30], "except": [28, 31], "beeter": 28, "Its": 28, "seen": 28, "woq": 28, "integ": [28, 33], "bandwidth": 28, "reorder_cach": 28, "beam_width": 28, "secondli": 28, "elimin": 28, "shard": 28, "content": [29, 34], "your_calibration_dataset": 29, "calib_sampl": 29, "calibration_model": 29, "qconfig_summary_file_path": 29, "nf4": 29, "init_distribut": 29, "get_acceler": 29, "communication_backend_nam": 29, "var": 29, "ondevic": 29, "init_infer": 29, "mp_size": 29, "base_dir": 29, "repo_root": 29, "checkpoints_json": 29, "zone": [30, 34], "articl": [30, 33], "llama2": [30, 34], "1024": [30, 33], "were": [30, 31, 32, 33], "carri": 30, "m7i": 30, "m6i": [30, 32], "47x": 30, "62x": 30, "57x": 30, "58x": 30, "85x": 30, "27x": 30, "38x": 30, "29x": 30, "36x": 30, "conclud": [30, 34], "respons": 30, "session": 30, "exhibit": 30, "wherea": 30, "p90": 30, "26x": 30, "sec": 30, "39": [30, 31, 32, 34], "26": [30, 31, 32], "49": [30, 31, 32], "170": 30, "21": [30, 31, 32], "measur": [30, 34], "17th": 30, "16xlarg": 30, "u": [30, 32], "west": 30, "ubuntu": 30, "04": [30, 31], "1009": 30, "sw": 30, "workload1": 30, "inference2": 30, "realtim": 30, "inference3": 30, "tunabl": [30, 32], 
"8380": 30, "30ghz": 30, "83x": 30, "44x": 30, "ssd": [30, 34], "resnet34": [30, 34], "16x": 30, "coco": 30, "1200": 30, "resnext": 30, "32x16d": 30, "81x": 30, "21x": 30, "vgg": 30, "75x": 30, "19x": 30, "shufflenetv2_x1": 30, "07x": 30, "78x": 30, "04x": 30, "max_seq_len": 30, "384task": 30, "jemalloc": [30, 32, 34], "05x": 30, "96x": 30, "mrpc": 30, "128task": 30, "distilbert": 30, "12x": 30, "dnnl": 30, "base_text_classif": 30, "f1": 30, "81": [30, 31], "79": [30, 31], "93": 30, "02": [30, 32], "85": [30, 31], "86": [30, 31], "top1": 30, "76": [30, 31], "75": [30, 31], "98": 30, "78": [30, 31], "199": 30, "48": [30, 31, 32], "vgg11": 30, "69": [30, 31], "67": [30, 31, 34], "96": 30, "44": [30, 31, 32], "36": [30, 31, 32], "92": 30, "97": 30, "shufflenet": 30, "histogram": [30, 34], "40": [30, 31, 32, 34], "ucod": 30, "0xd0002a0": 30, "ON": 30, "turboboost": 30, "bio": 30, "ddr": 30, "16gb": 30, "3200": 30, "dcpmm": 30, "256gb": 30, "host": [30, 34], "cento": 30, "2105": 30, "18": [30, 31, 32], "305": 30, "el8_4": 30, "x86_64": 30, "docker": [30, 34], "spectr": 30, "meltdown": 30, "24x": 30, "31x": 30, "15x": 30, "30x": 30, "mobilenet": 30, "08x": 30, "03x": 30, "09x": 30, "39x": 30, "35x": 30, "160": 30, "55x": 30, "06x": 30, "fpn": 30, "71x": 30, "20x": 30, "13x": 30, "32x": 30, "48x": 30, "11x": 30, "terabyt": 30, "14x": 30, "02x": 30, "10x": 30, "33x": 30, "8380h": 30, "90ghz": 30, "56": [30, 31, 32, 33], "67x": 30, "45x": 30, "77x": 30, "18x": 30, "formerli": [30, 33, 34], "0x700001c": 30, "wlydcrb1": 30, "sy": 30, "0016": 30, "p29": 30, "2006080250": 30, "64gb": 30, "768gb": 30, "influenc": [31, 33], "properli": 31, "themselv": [31, 34], "free": [31, 34], "mainli": [31, 34], "around": 31, "interpret": 31, "prefix": 31, "cross": [31, 32, 33, 34], "taskset": 31, "malloc_conf": [31, 33], "crash": [31, 33, 34], "nnode": 31, "nproc": 31, "count": 31, "addr": 31, "ip": 31, "hostnam": 31, "proc": 31, "port": 31, "hostfil": 31, "mpi": 31, "mpiexec": 31, "hydra": 31, "ppn": 31, "genv": 31, "i_mpi_pin_domain": 31, "codeless": 31, "ut": 31, "exclus": 31, "mutual": 31, "ld": 31, "favorit": 31, "kmp": [31, 33], "granular": [31, 32, 33], "compact": [31, 32, 33], "stdout": 31, "afterward": [31, 33], "undesir": 31, "_timestamp_inst": 31, "_timestamp_instance_": 31, "_core": 31, "run_20210712212258_inst": 31, "run_20210712212258_instance_0_cores_0": 31, "gif": 31, "07": 31, "764": 31, "conda_prefix": [31, 32], "virtual_env": [31, 32], "lib64": [31, 32], "home": [31, 32], "drop": [31, 32], "kmp_affin": [31, 32, 33], "kmp_blocktim": [31, 32, 33], "14": [31, 32, 34], "24": [31, 32], "25": [31, 32], "27": [31, 32, 33], "30": [31, 32], "31": [31, 32], "34": [31, 32], "35": [31, 32], "37": [31, 32, 34], "41": [31, 32], "42": [31, 32], "tee": 31, "run_20210712223308_inst": 31, "run_20210712223308_instance_0_cores_0": 31, "87": 31, "08": 31, "117": 31, "88": 31, "118": 31, "45": [31, 32], "46": [31, 32], "47": [31, 32], "51": [31, 32], "52": [31, 32], "53": [31, 32], "54": [31, 32], "55": [31, 32, 33], "57": 31, "59": 31, "60": 31, "61": 31, "62": 31, "63": [31, 34], "65": 31, "68": [31, 34], "70": 31, "71": 31, "72": 31, "73": 31, "74": 31, "77": 31, "82": 31, "83": [31, 33], "run_20210712214504_inst": 31, "run_20210712214504_instance_0_cores_22": 31, "513": 31, "run_20210712220928_inst": 31, "run_20210712220928_instance_0_cores_0": 31, "355": 31, "356": 31, "deduct": 31, "run_20210712221615_inst": 31, "run_20210712221615_instance_0_cores_11": 31, "591": 31, "run_20210712221150_inst": 31, 
"run_20210712221150_instance_0_cores_0": 31, "run_20210712221150_instance_1_cores_22": 31, "233": 31, "236": 31, "run_20210712221415_inst": 31, "run_20210712221415_instance_0_cores_0": 31, "run_20210712221415_instance_1_cores_4": 31, "run_20210712221415_instance_2_cores_8": 31, "run_20210712221415_instance_3_cores_12": 31, "run_20210712221415_instance_4_cores_16": 31, "run_20210712221415_instance_5_cores_20": 31, "run_20210712221415_instance_6_cores_24": 31, "run_20210712221415_instance_7_cores_28": 31, "run_20210712221415_instance_8_cores_32": 31, "run_20210712221415_instance_9_cores_36": 31, "run_20210712221415_instance_10_cores_40": 31, "140": 31, "143": 31, "146": 31, "149": 31, "151": 31, "154": 31, "157": 31, "159": 31, "162": 31, "164": 31, "167": 31, "run_20210712221305_inst": 31, "run_20210712221305_instance_0_cores_0": 31, "run_20210712221305_instance_1_cores_11": 31, "run_20210712221305_instance_2_cores_22": 31, "run_20210712221305_instance_3_cores_33": 31, "470": 31, "471": 31, "473": 31, "476": 31, "479": 31, "instance_idx": 31, "independ": 31, "confirm": 31, "175": 31, "176": 31, "177": 31, "run_20220106130151_instance_0_cores_0": 31, "sometim": [31, 33], "235": 31, "jemallocl": 31, "oversize_threshold": [31, 33], "background_thread": [31, 33], "metadata_thp": [31, 33], "dirty_decay_m": [31, 33], "9000000000": [31, 33], "muzzy_decay_m": [31, 33], "libjemalloc": 31, "run_20210713153048_instance_0_cores_0": 31, "654": 31, "libtcmalloc": [31, 32], "655": 31, "run_20210713153333_instance_0_cores_0": 31, "784": 31, "run_20210713153659_instance_0_cores_0": 31, "blocktim": 31, "00": [31, 34], "760": [31, 32], "761": [31, 32], "omp_schedul": [31, 33], "omp_proc_bind": [31, 33], "run_20210713152500_instance_0_cores_0": 31, "give": [32, 34], "ipex_en": 32, "procedur": 32, "tunin": 32, "dramat": [32, 33], "cpu_launcher_en": 32, "cpu_launcher_arg": 32, "hyperthread": 32, "present": 32, "ital": 32, "ptmalloc": 32, "use_default_alloc": [32, 34], "tcmalloc": 32, "enable_tcmalloc": 32, "enable_jemalloc": 32, "nth": [32, 33], "uniform": 32, "overlap": 32, "signficantli": 32, "8180": 32, "affinit": 32, "addition": 32, "kill": 32, "unutil": 32, "restart": 32, "remain": 32, "aliv": 32, "taken": 32, "care": 32, "worri": 32, "continu": [32, 34], "Then": 32, "interrupt": 32, "dummi": 32, "dummy_tensor": 32, "scheme": 32, "bert_int8_jit": 32, "n_iter": 32, "rn50_int8_jit": 32, "usus": 32, "rn50_ipex_int8": 32, "handler": 32, "image_classifi": 32, "similarli": 32, "bert_ipex_int8": 32, "transformer_handler_gener": 32, "setup_config": 32, "seq_classification_artifact": 32, "index_to_nam": 32, "nc": 32, "model_stor": 32, "server": [32, 33], "rest": 32, "model_log": 32, "096": 32, "8375c": 32, "03": 32, "981": 32, "982": 32, "previous": 32, "cases": 32, "223": 32, "site": 32, "model_service_work": 32, "sock": 32, "unix": 32, "9000": 32, "762": 32, "763": 32, "9001": 32, "274": 32, "9002": 32, "975": 32, "9003": 32, "bench": 32, "amazon": 32, "ec2": 32, "24xlarg": 32, "reproduc": 32, "url": [32, 34], "modelurl": 32, "inputpath": 32, "concurr": [32, 33], "huggingface_transform": 32, "sample_text_captum_input": 32, "graphic": 33, "xe": 33, "briefli": 33, "background": 33, "knowledg": 33, "c620": 33, "seri": 33, "chipset": 33, "purlei": 33, "chip": 33, "inclus": 33, "1mb": 33, "l2": 33, "2666": 33, "mhz": 33, "ddr4": 33, "six": 33, "ultra": 33, "interconnect": 33, "upi": 33, "microarchitectur": 33, "connect": 33, "transfer": 33, "equip": 33, "motherboard": 33, "attach": 33, "remot": 33, "asu": 33, "z11pa": 
33, "d8": 33, "competit": 33, "stall": 33, "busi": 33, "uma": 33, "lscpu": 33, "retriev": 33, "111": 33, "50ghz": 33, "node0": 33, "node1": 33, "sophist": 33, "brought": [33, 34], "polici": 33, "put": 33, "sysctl": 33, "great": 33, "placement": 33, "cpunodebind": 33, "membind": 33, "multithread": 33, "primari": 33, "consecut": 33, "join": 33, "libgomp": 33, "libiomp": 33, "hang": [33, 34], "gomp_cpu_affin": 33, "comma": 33, "invalid": 33, "thrash": 33, "did": [33, 34], "compet": 33, "unus": 33, "proclist": 33, "millisecond": 33, "sleep": 33, "200m": 33, "period": 33, "elaps": 33, "overal": 33, "appropri": 33, "reserv": 33, "sole": 33, "penal": 33, "role": 33, "unnecessari": 33, "destruct": 33, "emphas": 33, "fragment": 33, "mmuzzy_decay_m": 33, "forg": 33, "dealloc": 33, "costli": 33, "gpertool": 33, "plu": 33, "pretti": 33, "nifti": 33, "analysi": 33, "gperftool": 33, "set_flush_denorm": 33, "warm": 33, "therefor": 33, "threshold": 33, "usuali": 33, "come": 33, "maskrcnn": [33, 34], "wav2vec2": 33, "recognit": 33, "onednn_primitive_cache_capac": 33, "65536": 33, "voic": 33, "excit": 34, "announc": 34, "accompani": 34, "privat": 34, "broader": 34, "sincer": 34, "encourag": 34, "feedback": 34, "creator": 34, "reach": 34, "hf_beam_sampl": 34, "hf_beam_search": 34, "hf_greedy_search": 34, "hf_sampl": 34, "walk": 34, "2561": 34, "2584": 34, "2617": 34, "2663": 34, "2733": 34, "act": 34, "2550": 34, "2568": 34, "2641": 34, "2675": 34, "2613": 34, "upgrad": 34, "v3": 34, "2747": 34, "misc": 34, "2468": 34, "2627": 34, "2631": 34, "2704": 34, "changelog": 34, "optimize_transform": 34, "your_generation_param": 34, "newli": 34, "varianc": 34, "encod": 34, "2349": 34, "2412": 34, "2469": 34, "2476": 34, "flash": 34, "2317": 34, "2334": 34, "2392": 34, "2480": 34, "elser": 34, "2491": 34, "public": 34, "2473": 34, "2511": 34, "2433": 34, "2253": 34, "2251": 34, "2236": 34, "2278": 34, "2257": 34, "dockerfil": 34, "ux": 34, "2229": 34, "2195": 34, "2299": 34, "2315": 34, "2283": 34, "2280": 34, "2292": 34, "2275": 34, "2319": 34, "2198": 34, "2264": 34, "2290": 34, "experiment": 34, "workflow": 34, "1563": 34, "excess": 34, "1677": 34, "1688": 34, "1664": 34, "lar": 34, "1695": 34, "dictionari": 34, "1682": 34, "2137": 34, "1568": 34, "1585": 34, "1590": 34, "1587": 34, "1594": 34, "old": 34, "hypervisor": 34, "vm": 34, "1513": 34, "1593": 34, "padding_mod": 34, "1580": 34, "1566": 34, "transnetv2": 34, "1564": 34, "rnn": 34, "avx512_core_vnni": 34, "1592": 34, "1589": 34, "1517": 34, "hero": 34, "inspir": 34, "stanford": 34, "consumpt": 34, "ve": 34, "1341": 34, "instancenorm": 34, "1330": 34, "1414": 34, "1473": 34, "1419": 34, "1488": 34, "webpag": 34, "1318": 34, "1353": 34, "1328": 34, "1355": 34, "1367": 34, "1384": 34, "1295": 34, "1392": 34, "1376": 34, "1373": 34, "1338": 34, "1391": 34, "1322": 34, "usabl": 34, "effort": 34, "cv": 34, "refin": 34, "identifi": 34, "torchrun": 34, "shortcut": 34, "mkl": 34, "sgemm": 34, "geomean": 34, "auto_ipex": 34, "hood": 34, "calibrated_model": 34, "model_to_be_calibr": 34, "992": 34, "64byte": 34, "addlayernorm": 34, "retinanet": 34, "1032": 34, "1053": 34, "1074": 34, "tightli": 34, "matur": 34, "offlin": 34, "becam": 34, "bake": 34, "wave2vec": 34, "albert": 34, "facilit": 34, "minmax": 34, "movingaverageminmax": 34, "polish": 34, "flexibl": 34, "quantconf": 34, "multi_stream_input_hint": 34, "multi_stream_output_hint": 34, "adam": 34, "822": 34, "3d": 34, "642": 34, "deconv3d": 34, "692": 34, "787": 34, "swish": 34, "fsi": 34, "risk": 34, "551": 34, 
"leakyrelu": 34, "589": 34, "407": 34, "647": 34, "convolution1d": 34, "657": 34, "einsum": 34, "alphafold2": 34, "674": 34, "711": 34, "threa": 34, "slow": 34, "equival": 34, "joint": 34, "net": 34, "pend": 34, "648": 34, "684": 34, "685": 34, "dockerhub": 34, "wheel": 34, "sdk": 34, "2x": 34, "5x": 34, "reduct": 34, "center": 34, "deploi": 34, "u8": 34, "s8": 34, "satur": 34, "occur": 34, "u7": 34, "unsign": 34, "s7": 34, "worth": 34, "upload": 34, "pip3": 34, "whl": 34, "220mb": 34, "5mb": 34, "dep": 34, "220m": 34, "cxx11": 34, "224m": 34, "7m": 34, "5m": 34, "qkv": 34, "278": 34, "531": 34, "432": 34, "438": 34, "602": 34, "sliu": 34, "hardsigmoid": 34, "relu6": 34, "selu": 34, "524": 34, "452": 34, "425": 34, "100mb": 34, "40mb": 34, "meant": 34, "resolv": 34, "te": 34, "wrap": 34, "bactchnorm": 34, "205": 34, "straightforward": 34, "underhood": 34, "torchvison": 34, "hugginfac": 34, "legal": 34, "resnet18": 34, "resnet18_xpu": 34, "enable_auto_mixed_precis": 34, "mixed_dtyp": 34, "mymodel": 34, "xx_c": 34, "xx_v": 34, "clibrat": 34, "ampconf": 34, "automixprecis": 34, "running_mod": 34, "cali_dataset": 34, "trace_model": 34, "omp_set_num_thread": 34, "model_execut": 34, "same_model_execution_again": 34, "descriptor": 34, "rc3": 34, "parti": 34, "49786": 34, "rc": 34, "readm": 34, "stakehold": 34, "5rc3": 34, "dpcpp": 34, "heterogen": 34, "bfp16": 34, "proper": 34, "tacotron2": 34, "frozenbatchnorm": 34, "embeddingbad": 34, "daili": 34, "resnext3d": 34, "maskrnn": 34, "codenam": 34, "mlp": 34, "eltwis": 34, "7x": 34, "enable_auto_optim": 34, "streamlin": 34, "enable_auto_mix_precis": 34, "inject": 34, "resnet3d": 34, "fb": 34, "yolov3": 34, "maxpool": 34}, "objects": {"": [[2, 0, 0, "-", "intel_extension_for_pytorch"]], "intel_extension_for_pytorch.cpu": [[2, 0, 0, "-", "runtime"]], "intel_extension_for_pytorch.cpu.runtime": [[2, 1, 1, "", "CPUPool"], [2, 1, 1, "", "MultiStreamModule"], [2, 1, 1, "", "MultiStreamModuleHint"], [2, 1, 1, "", "Task"], [2, 2, 1, "", "get_core_list_of_node_id"], [2, 2, 1, "", "is_runtime_ext_enabled"], [2, 1, 1, "", "pin"]], "intel_extension_for_pytorch": [[2, 2, 1, "", "enable_onednn_fusion"], [2, 2, 1, "", "fast_bert"], [2, 0, 0, "-", "llm"], [2, 2, 1, "", "optimize"], [2, 0, 0, "-", "quantization"], [2, 1, 1, "", "verbose"]], "intel_extension_for_pytorch.llm": [[2, 0, 0, "-", "functional"], [2, 0, 0, "-", "modules"], [2, 2, 1, "", "optimize"]], "intel_extension_for_pytorch.llm.functional": [[2, 2, 1, "", "fast_layer_norm"], [2, 2, 1, "", "indirect_access_kv_cache_attention"], [2, 2, 1, "", "rms_norm"], [2, 2, 1, "", "rotary_embedding"], [2, 2, 1, "", "varlen_attention"]], "intel_extension_for_pytorch.llm.modules": [[2, 1, 1, "", "FastLayerNorm"], [2, 1, 1, "", "IndirectAccessKVCacheAttention"], [2, 1, 1, "", "Linear2SiluMul"], [2, 1, 1, "", "LinearAdd"], [2, 1, 1, "", "LinearAddAdd"], [2, 1, 1, "", "LinearGelu"], [2, 1, 1, "", "LinearMul"], [2, 1, 1, "", "LinearNewGelu"], [2, 1, 1, "", "LinearRelu"], [2, 1, 1, "", "LinearSilu"], [2, 1, 1, "", "LinearSiluMul"], [2, 1, 1, "", "PagedAttention"], [2, 1, 1, "", "RMSNorm"], [2, 1, 1, "", "RotaryEmbedding"], [2, 1, 1, "", "VarlenAttention"]], "intel_extension_for_pytorch.nn": [[7, 1, 1, "", "FrozenBatchNorm2d"]], "intel_extension_for_pytorch.nn.functional": [[7, 2, 1, "", "interaction"]], "intel_extension_for_pytorch.nn.modules": [[7, 1, 1, "", "MergedEmbeddingBag"], [7, 1, 1, "", "MergedEmbeddingBagWithSGD"]], "intel_extension_for_pytorch.quantization": [[2, 2, 1, "", "autotune"], [2, 2, 1, "", "convert"], 
[2, 2, 1, "", "get_smooth_quant_qconfig_mapping"], [2, 2, 1, "", "prepare"]]}, "objtypes": {"0": "py:module", "1": "py:class", "2": "py:function"}, "objnames": {"0": ["py", "module", "Python module"], "1": ["py", "class", "Python class"], "2": ["py", "function", "Python function"]}, "titleterms": {"intel": [0, 1, 5, 6, 15, 30, 31, 32, 33], "extens": [0, 1, 5, 7, 15, 20, 26, 32], "pytorch": [0, 1, 5, 15, 18, 32], "cpu": [0, 2, 17, 18, 33], "isa": [0, 7, 17], "dynam": [0, 6, 7, 15, 17, 26], "dispatch": [0, 7, 17], "design": [0, 17, 20, 31], "doc": 0, "architectur": 1, "support": [1, 8, 10], "api": [2, 7, 9, 13, 16, 17, 18, 22, 25, 28, 29], "document": [2, 5, 25, 32, 33], "gener": [2, 26], "llm": [2, 6, 7, 23, 28, 30], "modul": [2, 10, 20, 28], "level": [2, 17, 28], "optim": [2, 7, 10, 13, 15, 19, 28, 29], "prototyp": [2, 6, 7, 10, 11, 12, 14, 16, 22, 28], "fast": [2, 6, 7, 11], "bert": [2, 6, 7, 11, 32], "graph": [2, 7, 12, 13, 28], "quantiz": [2, 6, 7, 15, 16, 29], "runtim": [2, 7, 20, 26], "blog": 3, "public": 3, "cheat": 4, "sheet": 4, "contribut": 5, "develop": 5, "tip": 5, "debug": [5, 17], "unit": 5, "test": 5, "python": [5, 6, 7], "better": 5, "local": 5, "pytest": 5, "lint": 5, "c": [5, 6, 18], "write": [5, 18], "build": [5, 17], "exampl": [6, 10, 11, 12, 14, 16, 17, 20, 31], "train": [6, 8], "singl": [6, 28, 31], "instanc": [6, 28, 30, 31], "float32": [6, 8], "bfloat16": [6, 8, 21, 26, 30], "distribut": [6, 28, 29], "infer": [6, 8, 28, 29, 31, 32], "eager": [6, 8], "mode": [6, 28, 31], "resnet50": [6, 32], "torchscript": [6, 8], "torchdynamo": [6, 26], "beta": [6, 7], "new": [6, 7, 34], "featur": [6, 7, 11, 12, 17], "from": [6, 7], "2": [6, 7, 14, 32, 34], "0": [6, 7, 34], "int8": [6, 7, 13, 16, 26, 30, 32], "static": [6, 15], "calibr": [6, 15], "deploy": 6, "larg": [6, 7, 28], "languag": [6, 7, 28], "model": [6, 7, 13, 15, 18, 20, 28, 32], "fp32": [6, 10, 13, 29, 30], "bf16": [6, 10, 13, 29], "smooth": [6, 16, 22], "weight": [6, 29], "onli": [6, 29], "int4": 6, "ai": [6, 30], "refer": [6, 8], "easi": 7, "us": [7, 8, 9, 10, 13, 16, 20, 31], "1": [7, 14, 32, 34], "torch": 7, "compil": [7, 17], "auto": [7, 8, 9, 16, 20], "channel": [7, 9, 18, 33], "last": [7, 9, 18, 33], "mix": [7, 8], "precis": [7, 8, 28], "amp": [7, 8], "oper": [7, 18, 19, 28], "codeless": [7, 10], "13": [7, 34], "captur": [7, 12], "hypertun": [7, 14], "introduct": [8, 19, 25], "case": [8, 10, 20], "default": [8, 9, 14, 18, 31], "path": 8, "autocast": 8, "op": 8, "elig": 8, "specif": [8, 17], "behavior": 8, "can": 8, "promot": 8, "widest": 8, "input": [8, 20], "type": [8, 28], "eas": [9, 13], "enabl": 9, "disabl": 9, "known": [9, 20, 34], "issu": [9, 20, 34], "motiv": 10, "usag": [10, 11, 12, 14, 16, 20, 26, 29, 31], "huggingfac": 10, "The": 10, "origin": 10, "command": 10, "ipex": [10, 28], "launch": [10, 31], "appli": 10, "forward": 10, "method": 10, "explicitli": 10, "instead": 10, "__call__": 10, "attr": 10, "alreadi": 10, "jit": 10, "trace": 10, "descript": [11, 12], "prerequisit": 11, "methodologi": [13, 28], "fusion": [13, 19], "pattern": 13, "fold": 13, "your_conf_fil": 14, "hyperparamet": 14, "launcher": [14, 32], "defin": [14, 15], "search": 14, "space": 14, "tune": [14, 16, 22, 33], "user": 14, "your_python_script": 14, "qconfig": 15, "prepar": 15, "do": 15, "convert": 15, "deploi": [15, 32], "recip": [16, 20, 22], "autotun": 16, "algorithm": 16, "alpha": [16, 34], "fix": 16, "determin": 16, "through": 16, "overview": [17, 28, 30, 31, 33], "requir": [17, 20], "code": 17, "folder": 17, "struct": 17, 
"kernel": [17, 18], "implement": [17, 20], "csrc": 17, "aten": [17, 18], "xyzkrnl": 17, "cpp": 17, "stub": 17, "xyz": 17, "h": 17, "dyndisp": 17, "dispatchstub": 17, "codegen": 17, "process": 17, "add": 17, "custom": [17, 28], "intrin": 17, "vec": 17, "privat": 17, "select": 17, "manual": 17, "check": 17, "what": [18, 34], "i": [18, 20, 31], "memori": [18, 31, 33], "format": 18, "all": [18, 31], "That": 18, "matter": 18, "nchw": 18, "b": 18, "nhwc": 18, "wip": 18, "block": 18, "nchw16c": 18, "stride": 18, "layout": 18, "tensor": 18, "creation": 18, "convers": 18, "d": 18, "coverag": 18, "statu": 18, "regist": [18, 32], "nativ": 18, "manner": 18, "onednn": [18, 33], "creat": [18, 32], "convolut": 18, "primit": [18, 33], "target": 18, "multistream": 20, "examples1": 20, "basic": 20, "examples2": 20, "set": 20, "examples3": 20, "structur": [20, 33], "output": 20, "perform": [20, 26, 30, 32, 33, 34], "asynchron": 20, "task": 20, "configur": [20, 30, 33], "core": [20, 31, 32], "bind": 20, "detail": 20, "how": 20, "iomp": 20, "preload": 20, "load": 20, "dure": 20, "split": 21, "sgd": 21, "stochast": 21, "gradient": 21, "descent": 21, "quant": 22, "quick": 23, "start": [23, 25, 32], "instal": [24, 32], "get": 25, "troubleshoot": 26, "regress": 26, "shape": 26, "result": [26, 34], "correct": 26, "licens": 27, "list": 28, "verifi": 28, "via": 28, "deepspe": [28, 29], "demo": 28, "linear": 28, "low": 28, "data": [28, 30], "indirect": 28, "access": [28, 33], "kv": 28, "cach": [28, 33], "transform": 29, "frontend": 29, "pseudocod": 29, "common": 29, "scenario": 29, "smoothquant": 29, "woq": 29, "center": 30, "product": 30, "v1": 30, "11": [30, 34], "number": [30, 31, 33], "accuraci": 30, "softwar": [30, 33], "version": 30, "hardwar": [30, 33], "200": [30, 34], "an": 30, "aw": 30, "ec2": 30, "c6i": 30, "2xlarg": 30, "10": [30, 34], "script": 31, "guid": [31, 33], "physic": 31, "ii": 31, "includ": 31, "logic": 31, "iii": 31, "node": 31, "iv": 31, "your": 31, "multipl": 31, "v": 31, "throughput": 31, "vi": 31, "latenc": 31, "vii": 31, "viii": 31, "index": 31, "jemalloc": [31, 33], "tcmalloc": [31, 33], "alloc": [31, 33], "openmp": [31, 33], "librari": 31, "gnu": [31, 33], "torchserv": 32, "content": [32, 33], "thi": [32, 33], "serv": 32, "pin": 32, "boost": 32, "multi": 32, "worker": 32, "scale": 32, "export": 32, "serial": 32, "file": 32, "archiv": 32, "3": [32, 34], "4": 32, "benchmark": 32, "non": 33, "uniform": 33, "numa": 33, "numactl": 33, "omp_num_thread": 33, "omp_thread_limit": 33, "denorm": 33, "releas": 34, "highlight": 34, "100": 34, "12": 34, "300": 34, "": 34, "chang": 34, "9": 34, "8": 34, "improv": 34, "other": 34, "note": 34}, "envversion": {"sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 58}, "alltitles": {"Intel\u00ae Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc": [[0, "intel-extension-for-pytorch-cpu-isa-dynamic-dispatch-design-doc"]], "Intel\u00ae Extension for PyTorch*": [[1, "intel-extension-for-pytorch"]], "Architecture": [[1, "architecture"]], "Support": [[1, "support"]], "API Documentation": [[2, "api-documentation"], [25, "api-documentation"]], "General": [[2, "general"]], "LLM Module Level Optimizations (Prototype)": [[2, "llm-module-level-optimizations-prototype"]], "Fast Bert (Prototype)": [[2, "fast-bert-prototype"], [6, 
"fast-bert-prototype"]], "Graph Optimization": [[2, "graph-optimization"], [7, "graph-optimization"], [13, "graph-optimization"], [28, "graph-optimization"]], "Quantization": [[2, "module-intel_extension_for_pytorch.quantization"]], "CPU Runtime": [[2, "module-intel_extension_for_pytorch.cpu.runtime"]], "Blogs & Publications": [[3, "blogs-publications"]], "Cheat Sheet": [[4, "cheat-sheet"]], "Contribution": [[5, "contribution"]], "Contributing to Intel\u00ae Extension for PyTorch*": [[5, "contributing-to-intel-extension-for-pytorch"]], "Developing Intel\u00ae Extension for PyTorch*": [[5, "developing-intel-extension-for-pytorch"]], "Tips and Debugging": [[5, "tips-and-debugging"]], "Unit testing": [[5, "unit-testing"]], "Python Unit Testing": [[5, "python-unit-testing"]], "Better local unit tests with pytest": [[5, "better-local-unit-tests-with-pytest"]], "Local linting": [[5, "local-linting"]], "C++ Unit Testing": [[5, "c-unit-testing"]], "Writing documentation": [[5, "writing-documentation"]], "Building documentation": [[5, "building-documentation"]], "Tips": [[5, "tips"]], "Examples": [[6, "examples"]], "Python": [[6, "python"]], "Training": [[6, "training"]], "Single-instance Training": [[6, "single-instance-training"]], "Float32": [[6, "float32"], [6, "id1"]], "BFloat16": [[6, "bfloat16"], [6, "id6"], [21, "bfloat16"], [26, "bfloat16"]], "Distributed Training": [[6, "distributed-training"]], "Inference": [[6, "inference"]], "Eager Mode": [[6, "eager-mode"], [6, "id7"]], "Resnet50": [[6, "resnet50"], [6, "id2"], [6, "id4"], [6, "id8"], [6, "id11"], [6, "id14"]], "BERT": [[6, "bert"], [6, "id3"], [6, "id5"], [6, "id9"], [6, "id12"], [6, "id15"], [32, "bert"]], "TorchScript Mode": [[6, "torchscript-mode"], [6, "id10"]], "TorchDynamo Mode (Beta, NEW feature from 2.0.0)": [[6, "torchdynamo-mode-beta-new-feature-from-2-0-0"], [6, "id13"]], "INT8": [[6, "int8"], [26, "int8"]], "Static Quantization": [[6, "static-quantization"], [15, "static-quantization"]], "Calibration": [[6, "calibration"]], "Deployment": [[6, "deployment"]], "Dynamic Quantization": [[6, "dynamic-quantization"], [15, "dynamic-quantization"]], "Large Language Model (LLM)": [[6, "large-language-model-llm"]], "FP32/BF16": [[6, "fp32-bf16"], [29, "fp32-bf16"]], "Smooth Quantization INT8": [[6, "smooth-quantization-int8"]], "Weight Only Quantization INT8/INT4": [[6, "weight-only-quantization-int8-int4"]], "C++": [[6, "c"]], "Intel\u00ae AI Reference Models": [[6, "intel-ai-reference-models"]], "Features": [[7, "features"]], "Easy-to-use Python API": [[7, "easy-to-use-python-api"]], "Large Language Models (LLM, NEW feature from 2.1.0)": [[7, "large-language-models-llm-new-feature-from-2-1-0"]], "torch.compile (Beta, NEW feature from 2.0.0)": [[7, "torch-compile-beta-new-feature-from-2-0-0"]], "ISA Dynamic Dispatching": [[7, "isa-dynamic-dispatching"], [17, "isa-dynamic-dispatching"]], "Auto Channels Last": [[7, "auto-channels-last"], [9, "auto-channels-last"]], "Auto Mixed Precision (AMP)": [[7, "auto-mixed-precision-amp"], [8, "auto-mixed-precision-amp"]], "Operator Optimization": [[7, "operator-optimization"]], "Optimizer Optimization": [[7, "optimizer-optimization"]], "Runtime Extension": [[7, "runtime-extension"], [20, "runtime-extension"], [26, "runtime-extension"]], "INT8 Quantization": [[7, "int8-quantization"]], "Codeless Optimization (Prototype, NEW feature from 1.13.0)": [[7, "codeless-optimization-prototype-new-feature-from-1-13-0"]], "Graph Capture (Prototype, NEW feature from 1.13.0)": [[7, 
"graph-capture-prototype-new-feature-from-1-13-0"]], "HyperTune (Prototype, NEW feature from 1.13.0)": [[7, "hypertune-prototype-new-feature-from-1-13-0"]], "Fast BERT Optimization (Prototype, NEW feature from 2.0.0)": [[7, "fast-bert-optimization-prototype-new-feature-from-2-0-0"]], "Introduction": [[8, "introduction"], [19, "introduction"], [25, "introduction"]], "Use Case": [[8, "use-case"]], "Default Precision": [[8, "default-precision"]], "Inference with Eager Path": [[8, "inference-with-eager-path"]], "Inference with TorchScript Path": [[8, "inference-with-torchscript-path"]], "Training Support": [[8, "training-support"]], "Autocast Op Reference": [[8, "autocast-op-reference"]], "Op Eligibility": [[8, "op-eligibility"]], "Op-Specific Behavior": [[8, "op-specific-behavior"]], "Ops that can autocast to bfloat16": [[8, "ops-that-can-autocast-to-bfloat16"]], "Ops that can autocast to float32": [[8, "ops-that-can-autocast-to-float32"]], "Ops that promote to the widest input type": [[8, "ops-that-promote-to-the-widest-input-type"]], "Ease-of-use auto channels last API": [[9, "ease-of-use-auto-channels-last-api"]], "default": [[9, "default"]], "enable": [[9, "enable"]], "disable": [[9, "disable"]], "Known issue": [[9, "known-issue"], [34, "known-issue"], [34, "id43"]], "Codeless Optimization (Prototype)": [[10, "codeless-optimization-prototype"]], "Motivation": [[10, "motivation"]], "Example Usage with HuggingFace": [[10, "example-usage-with-huggingface"]], "The origin command with ipex launch": [[10, "the-origin-command-with-ipex-launch"]], "Command to apply ipex optimization for FP32": [[10, "command-to-apply-ipex-optimization-for-fp32"]], "Command to apply ipex optimization for BF16": [[10, "command-to-apply-ipex-optimization-for-bf16"]], "Use Case not supported": [[10, "use-case-not-supported"]], "Module uses forward method explicitly instead of the __call__ attr": [[10, "module-uses-forward-method-explicitly-instead-of-the-call-attr"]], "Already using ipex.optimize": [[10, "already-using-ipex-optimize"]], "Already using Jit Trace": [[10, "already-using-jit-trace"]], "Fast BERT (Prototype)": [[11, "fast-bert-prototype"]], "Feature Description": [[11, "feature-description"], [12, "feature-description"]], "Prerequisite": [[11, "prerequisite"]], "Usage Example": [[11, "usage-example"], [12, "usage-example"], [16, "usage-example"]], "Graph Capture (Prototype)": [[12, "graph-capture-prototype"]], "Ease-of-use graph optimization API": [[13, "ease-of-use-graph-optimization-api"]], "FP32 and BF16 models": [[13, "fp32-and-bf16-models"]], "INT8 models": [[13, "int8-models"]], "Methodology": [[13, "methodology"]], "Fusion": [[13, "fusion"]], "FP32 and BF16 fusion patterns": [[13, "fp32-and-bf16-fusion-patterns"]], "INT8 fusion patterns": [[13, "int8-fusion-patterns"]], "Folding": [[13, "folding"]], "HyperTune (Prototype)": [[14, "hypertune-prototype"]], "Usage of Hypertune": [[14, "usage-of-hypertune"]], "your_conf_file": [[14, "your-conf-file"]], "Hyperparameters": [[14, "hyperparameters"]], "Launcher Hyperparameters": [[14, "launcher-hyperparameters"]], "Defining hyperparameters and their search spaces": [[14, "defining-hyperparameters-and-their-search-spaces"]], "1. Defining hyperparameters to tune:": [[14, "defining-hyperparameters-to-tune"]], "2. 
Defining the search spaces of the hyperparameters:": [[14, "defining-the-search-spaces-of-the-hyperparameters"]], "Default search space": [[14, "default-search-space"]], "User defined search space": [[14, "user-defined-search-space"]], "": [[14, "your-python-script"]], "Usage Examples": [[14, "usage-examples"], [31, "usage-examples"]], "Intel\u00ae Extension for PyTorch* optimizations for quantization": [[15, "intel-extension-for-pytorch-optimizations-for-quantization"]], "Define qconfig": [[15, "define-qconfig"]], "Prepare Model and Do Calibration": [[15, "prepare-model-and-do-calibration"]], "Convert to Static Quantized Model and Deploy": [[15, "convert-to-static-quantized-model-and-deploy"]], "Define QConfig": [[15, "id1"]], "Prepare Model": [[15, "prepare-model"]], "Convert to Dynamic Quantized Model and Deploy": [[15, "convert-to-dynamic-quantized-model-and-deploy"]], "INT8 Recipe Tuning API (Prototype)": [[16, "int8-recipe-tuning-api-prototype"]], "Smooth Quantization Autotune": [[16, "smooth-quantization-autotune"]], "Algorithm: Auto-tuning of $\\alpha$.": [[16, "algorithm-auto-tuning-of-alpha"]], "$\\alpha$ Usage": [[16, "alpha-usage"]], "Using a fixed alpha": [[16, "using-a-fixed-alpha"]], "Determining the alpha through auto-tuning": [[16, "determining-the-alpha-through-auto-tuning"]], "Overview": [[17, "overview"], [30, "overview"], [31, "overview"], [33, "overview"]], "CPU ISA build compiler requirement": [[17, "cpu-isa-build-compiler-requirement"]], "Dynamic Dispatch Design": [[17, "dynamic-dispatch-design"]], "Code Folder Struct": [[17, "code-folder-struct"]], "Kernel implementation: csrc/cpu/aten/kernels/xyzKrnl.cpp": [[17, "kernel-implementation-csrc-cpu-aten-kernels-xyzkrnl-cpp"]], "Kernel Stub: csrc/cpu/aten/xyz.cpp and csrc/cpu/aten/xyz.h": [[17, "kernel-stub-csrc-cpu-aten-xyz-cpp-and-csrc-cpu-aten-xyz-h"]], "Dispatch Stub implementation: csrc/cpu/dyndisp/DispatchStub.cpp and csrc/cpu/dyndisp/DispatchStub.h": [[17, "dispatch-stub-implementation-csrc-cpu-dyndisp-dispatchstub-cpp-and-csrc-cpu-dyndisp-dispatchstub-h"]], "CodeGen Process": [[17, "codegen-process"]], "Add Custom Kernel": [[17, "add-custom-kernel"]], "ISA intrinics specific kernel example:": [[17, "isa-intrinics-specific-kernel-example"]], "Vec specific kernel example:": [[17, "vec-specific-kernel-example"]], "Private Debug APIs": [[17, "private-debug-apis"]], "Example:": [[17, "example"], [17, "id1"]], "Select ISA level manually.": [[17, "select-isa-level-manually"]], "CPU feature check": [[17, "cpu-feature-check"]], "Channels Last": [[18, "channels-last"], [33, "channels-last"]], "What is Channels Last": [[18, "what-is-channels-last"]], "Memory Format Is All That Matters": [[18, "memory-format-is-all-that-matters"]], "a. NCHW (default)": [[18, "a-nchw-default"]], "b. NHWC (WIP for CPU)": [[18, "b-nhwc-wip-for-cpu"]], "c. Blocked (nChw16c)": [[18, "c-blocked-nchw16c"]], "PyTorch Strided Layout": [[18, "pytorch-strided-layout"]], "PyTorch Channels Last Memory Format APIs": [[18, "pytorch-channels-last-memory-format-apis"]], "a. tensor creation": [[18, "a-tensor-creation"]], "b. tensor conversion": [[18, "b-tensor-conversion"]], "c. model conversion": [[18, "c-model-conversion"]], "d. operator coverage": [[18, "d-operator-coverage"]], "Writing Channels Last Kernels": [[18, "writing-channels-last-kernels"]], "a. Status on CPU": [[18, "a-status-on-cpu"]], "b. Register Channels Last Kernel in ATen Native Manner": [[18, "b-register-channels-last-kernel-in-aten-native-manner"]], "c. 
Register oneDNN Kernel on Channels Last": [[18, "c-register-onednn-kernel-on-channels-last"]], "oneDNN NHWC APIs": [[18, "onednn-nhwc-apis"]], "a. Create NHWC Memory": [[18, "a-create-nhwc-memory"]], "b. Create Convolution Primitive": [[18, "b-create-convolution-primitive"]], "CPU Channels Last Targets": [[18, "cpu-channels-last-targets"]], "Optimizer Fusion": [[19, "optimizer-fusion"]], "Operation Fusion": [[19, "operation-fusion"]], "Requirements": [[20, "requirements"]], "Use Cases": [[20, "use-cases"]], "Example of MultiStream Module": [[20, "example-of-multistream-module"]], "Examples1: Basic Usage": [[20, "examples1-basic-usage"]], "Examples2: Usage with \u201cAUTO\u201d setting": [[20, "examples2-usage-with-auto-setting"]], "Examples3: Usage for models with structure inputs/outputs": [[20, "examples3-usage-for-models-with-structure-inputs-outputs"]], "Performance recipes": [[20, "performance-recipes"]], "Known issues": [[20, "known-issues"], [34, "id37"]], "Example of asynchronous task": [[20, "example-of-asynchronous-task"]], "Example of configuring core binding": [[20, "example-of-configuring-core-binding"]], "Detail Design": [[20, "detail-design"]], "How the core binding is implemented": [[20, "how-the-core-binding-is-implemented"]], "Design of Task": [[20, "design-of-task"]], "IOMP preload or load during the runtime": [[20, "iomp-preload-or-load-during-the-runtime"]], "Split SGD": [[21, "split-sgd"], [21, "id2"]], "Stochastic Gradient Descent (SGD)": [[21, "stochastic-gradient-descent-sgd"]], "Smooth Quant Recipe Tuning API (Prototype)": [[22, "smooth-quant-recipe-tuning-api-prototype"]], "Quick Start": [[23, "quick-start"]], "LLM Quick Start": [[23, "llm-quick-start"]], "Installation": [[24, "installation"]], "Get Started": [[25, "get-started"]], "Troubleshooting": [[26, "troubleshooting"]], "General Usage": [[26, "general-usage"]], "Performance Regression": [[26, "performance-regression"]], "TorchDynamo": [[26, "torchdynamo"]], "Dynamic Shape": [[26, "dynamic-shape"]], "Result Correctness": [[26, "result-correctness"]], "License": [[27, "license"]], "Large Language Models (LLM) Optimization Overview": [[28, "large-language-models-llm-optimization-overview"]], "ipex.llm Optimized Model List": [[28, "ipex-llm-optimized-model-list"]], "Verified for single instance mode": [[28, "verified-for-single-instance-mode"]], "Verified for distributed inference mode via DeepSpeed": [[28, "verified-for-distributed-inference-mode-via-deepspeed"]], "Module Level Optimization API for customized LLM (Prototype)": [[28, "module-level-optimization-api-for-customized-llm-prototype"]], "Demos": [[28, "demos"]], "Optimization Methodologies": [[28, "optimization-methodologies"]], "Linear Operator Optimization": [[28, "linear-operator-optimization"]], "Low Precision Data Types": [[28, "low-precision-data-types"]], "Indirect Access KV Cache": [[28, "indirect-access-kv-cache"]], "Distributed Inference": [[28, "distributed-inference"]], "Transformers Optimization Frontend API": [[29, "transformers-optimization-frontend-api"]], "Pseudocode of Common Usage Scenarios": [[29, "pseudocode-of-common-usage-scenarios"]], "SmoothQuant": [[29, "smoothquant"]], "Weight Only Quantization (WOQ)": [[29, "weight-only-quantization-woq"]], "Distributed Inference with DeepSpeed": [[29, "distributed-inference-with-deepspeed"]], "Performance": [[30, "performance"], [34, "performance"]], "Performance Data for Intel\u00ae AI Data Center Products": [[30, "performance-data-for-intel-ai-data-center-products"]], "LLM Performance": 
[[30, "llm-performance"]], "INT8 with v1.11": [[30, "int8-with-v1-11"]], "Performance Numbers": [[30, "performance-numbers"], [30, "id1"], [30, "id4"]], "Accuracy": [[30, "accuracy"]], "Configuration": [[30, "configuration"], [30, "id2"], [30, "id5"]], "Software Version": [[30, "software-version"], [30, "id3"], [30, "id6"]], "Hardware Configuration": [[30, "hardware-configuration"], [30, "id7"], [33, "hardware-configuration"]], "FP32 with v1.11.200 on an AWS EC2 C6i.2xlarge instance": [[30, "fp32-with-v1-11-200-on-an-aws-ec2-c6i-2xlarge-instance"]], "FP32 and BFloat16 with v1.10": [[30, "fp32-and-bfloat16-with-v1-10"]], "Launch Script Usage Guide": [[31, "launch-script-usage-guide"]], "Usage of launch script": [[31, "usage-of-launch-script"]], "Single instance for inference": [[31, "single-instance-for-inference"]], "I. Use all physical cores": [[31, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[31, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on designated nodes": [[31, "iii-use-physical-cores-on-designated-nodes"]], "IV. Use your designated number of cores": [[31, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[31, "multiple-instances-for-inference"]], "V. Throughput mode": [[31, "v-throughput-mode"]], "VI. Latency mode": [[31, "vi-latency-mode"]], "VII. Your designated number of instances": [[31, "vii-your-designated-number-of-instances"]], "VIII. Your designated number of instances and instance index": [[31, "viii-your-designated-number-of-instances-and-instance-index"]], "Usage of Jemalloc/TCMalloc/Default memory allocator": [[31, "usage-of-jemalloc-tcmalloc-default-memory-allocator"]], "Jemalloc": [[31, "jemalloc"], [33, "jemalloc"]], "TCMalloc": [[31, "tcmalloc"], [33, "tcmalloc"]], "Default memory allocator": [[31, "default-memory-allocator"]], "Usage of OpenMP library": [[31, "usage-of-openmp-library"]], "Intel OpenMP Library": [[31, "intel-openmp-library"]], "GNU OpenMP Library": [[31, "gnu-openmp-library"]], "TorchServe with Intel\u00ae Extension for PyTorch*": [[32, "torchserve-with-intel-extension-for-pytorch"]], "Contents of this Document": [[32, "contents-of-this-document"], [33, "contents-of-this-document"]], "Install Intel\u00ae Extension for PyTorch*": [[32, "install-intel-extension-for-pytorch"]], "Serving model with Intel\u00ae Extension for PyTorch*": [[32, "serving-model-with-intel-extension-for-pytorch"]], "TorchServe with Launcher": [[32, "torchserve-with-launcher"]], "Launcher Core Pinning to Boost Performance of TorchServe Multi Worker Inference": [[32, "launcher-core-pinning-to-boost-performance-of-torchserve-multi-worker-inference"]], "Scaling workers": [[32, "scaling-workers"]], "Creating and Exporting INT8 model for Intel\u00ae Extension for PyTorch*": [[32, "creating-and-exporting-int8-model-for-intel-extension-for-pytorch"]], "1. Creating a serialized file": [[32, "creating-a-serialized-file"]], "ResNet50": [[32, "resnet50"]], "2. Creating a Model Archive": [[32, "creating-a-model-archive"]], "3. Start TorchServe to serve the model": [[32, "start-torchserve-to-serve-the-model"]], "4. 
Registering and Deploying model": [[32, "registering-and-deploying-model"]], "Benchmarking with Launcher": [[32, "benchmarking-with-launcher"]], "Benchmarking with Launcher Core Pinning": [[32, "benchmarking-with-launcher-core-pinning"]], "Performance Boost with Intel\u00ae Extension for PyTorch* and Launcher": [[32, "performance-boost-with-intel-extension-for-pytorch-and-launcher"]], "Performance Tuning Guide": [[33, "performance-tuning-guide"]], "Intel CPU Structure": [[33, "intel-cpu-structure"]], "Non-Uniform Memory Access (NUMA)": [[33, "non-uniform-memory-access-numa"]], "Software Configuration": [[33, "software-configuration"]], "Numactl": [[33, "numactl"]], "OpenMP": [[33, "openmp"]], "OMP_NUM_THREADS": [[33, "omp-num-threads"]], "OMP_THREAD_LIMIT": [[33, "omp-thread-limit"]], "GNU OpenMP": [[33, "gnu-openmp"]], "Intel OpenMP": [[33, "intel-openmp"]], "Memory Allocator": [[33, "memory-allocator"]], "Denormal Number": [[33, "denormal-number"]], "OneDNN primitive cache": [[33, "onednn-primitive-cache"]], "Releases": [[34, "releases"]], "2.3.0": [[34, "id1"]], "Highlights": [[34, "highlights"], [34, "id3"], [34, "id5"], [34, "id7"], [34, "id9"], [34, "id11"], [34, "id13"], [34, "id15"], [34, "id18"], [34, "id21"], [34, "id24"], [34, "id26"], [34, "id29"]], "2.2.0": [[34, "id2"]], "2.1.100": [[34, "id4"]], "2.1.0": [[34, "id6"]], "2.0.100": [[34, "id8"]], "2.0.0": [[34, "id10"]], "Known Issues": [[34, "known-issues"], [34, "id16"], [34, "id22"], [34, "id30"]], "1.13.100": [[34, "id12"]], "1.13.0": [[34, "id14"]], "1.12.300": [[34, "id17"]], "1.12.100": [[34, "id19"]], "1.12.0": [[34, "id20"]], "1.11.200": [[34, "id23"]], "1.11.0": [[34, "id25"]], "What\u2019s Changed": [[34, "what-s-changed"], [34, "id31"]], "1.10.100": [[34, "id27"]], "1.10.0": [[34, "id28"]], "1.9.0": [[34, "id32"]], "What\u2019s New": [[34, "what-s-new"], [34, "id34"], [34, "id36"], [34, "id39"], [34, "id42"]], "1.8.0": [[34, "id33"]], "1.2.0": [[34, "id35"]], "Performance Improvement": [[34, "performance-improvement"]], "Others": [[34, "others"]], "1.1.0": [[34, "id38"]], "1.0.2": [[34, "id40"]], "1.0.1-Alpha": [[34, "alpha"]], "1.0.0-Alpha": [[34, "id41"]], "Performance Result": [[34, "performance-result"]], "NOTE": [[34, "note"]]}, "indexentries": {"cpupool (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.CPUPool"]], "fastlayernorm (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.FastLayerNorm"]], "indirectaccesskvcacheattention (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.IndirectAccessKVCacheAttention"]], "linear2silumul (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.Linear2SiluMul"]], "linearadd (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearAdd"]], "linearaddadd (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearAddAdd"]], "lineargelu (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearGelu"]], "linearmul (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearMul"]], "linearnewgelu (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearNewGelu"]], "linearrelu (class in intel_extension_for_pytorch.llm.modules)": [[2, 
"intel_extension_for_pytorch.llm.modules.LinearRelu"]], "linearsilu (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearSilu"]], "linearsilumul (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.LinearSiluMul"]], "multistreammodule (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.MultiStreamModule"]], "multistreammodulehint (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.MultiStreamModuleHint"]], "pagedattention (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.PagedAttention"]], "rmsnorm (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.RMSNorm"]], "rotaryembedding (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.RotaryEmbedding"]], "task (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.Task"]], "varlenattention (class in intel_extension_for_pytorch.llm.modules)": [[2, "intel_extension_for_pytorch.llm.modules.VarlenAttention"]], "autotune() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.autotune"]], "convert() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.convert"]], "enable_onednn_fusion() (in module intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.enable_onednn_fusion"]], "fast_bert() (in module intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.fast_bert"]], "fast_layer_norm() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.fast_layer_norm"]], "get_core_list_of_node_id() (in module intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.get_core_list_of_node_id"]], "get_smooth_quant_qconfig_mapping() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.get_smooth_quant_qconfig_mapping"]], "indirect_access_kv_cache_attention() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.indirect_access_kv_cache_attention"]], "intel_extension_for_pytorch": [[2, "module-intel_extension_for_pytorch"]], "intel_extension_for_pytorch.cpu.runtime": [[2, "module-intel_extension_for_pytorch.cpu.runtime"]], "intel_extension_for_pytorch.llm": [[2, "module-intel_extension_for_pytorch.llm"]], "intel_extension_for_pytorch.llm.functional": [[2, "module-intel_extension_for_pytorch.llm.functional"]], "intel_extension_for_pytorch.llm.modules": [[2, "module-intel_extension_for_pytorch.llm.modules"]], "intel_extension_for_pytorch.quantization": [[2, "module-intel_extension_for_pytorch.quantization"]], "is_runtime_ext_enabled() (in module intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.is_runtime_ext_enabled"]], "module": [[2, "module-intel_extension_for_pytorch"], [2, "module-intel_extension_for_pytorch.cpu.runtime"], [2, "module-intel_extension_for_pytorch.llm"], [2, "module-intel_extension_for_pytorch.llm.functional"], [2, "module-intel_extension_for_pytorch.llm.modules"], [2, "module-intel_extension_for_pytorch.quantization"]], "optimize() (in module intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.optimize"]], "optimize() (in module 
intel_extension_for_pytorch.llm)": [[2, "intel_extension_for_pytorch.llm.optimize"]], "pin (class in intel_extension_for_pytorch.cpu.runtime)": [[2, "intel_extension_for_pytorch.cpu.runtime.pin"]], "prepare() (in module intel_extension_for_pytorch.quantization)": [[2, "intel_extension_for_pytorch.quantization.prepare"]], "rms_norm() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.rms_norm"]], "rotary_embedding() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.rotary_embedding"]], "varlen_attention() (in module intel_extension_for_pytorch.llm.functional)": [[2, "intel_extension_for_pytorch.llm.functional.varlen_attention"]], "verbose (class in intel_extension_for_pytorch)": [[2, "intel_extension_for_pytorch.verbose"]], "frozenbatchnorm2d (class in intel_extension_for_pytorch.nn)": [[7, "intel_extension_for_pytorch.nn.FrozenBatchNorm2d"]], "mergedembeddingbag (class in intel_extension_for_pytorch.nn.modules)": [[7, "intel_extension_for_pytorch.nn.modules.MergedEmbeddingBag"]], "mergedembeddingbagwithsgd (class in intel_extension_for_pytorch.nn.modules)": [[7, "intel_extension_for_pytorch.nn.modules.MergedEmbeddingBagWithSGD"]], "interaction() (in module intel_extension_for_pytorch.nn.functional)": [[7, "intel_extension_for_pytorch.nn.functional.interaction"]]}}) \ No newline at end of file +Search.setIndex({"docnames": ["design_doc/cpu/isa_dyndisp", "index", "tutorials/api_doc", "tutorials/blogs_publications", "tutorials/cheat_sheet", "tutorials/contribution", "tutorials/examples", "tutorials/features", "tutorials/features/amp", "tutorials/features/auto_channels_last", "tutorials/features/codeless_optimization", "tutorials/features/fast_bert", "tutorials/features/graph_capture", "tutorials/features/graph_optimization", "tutorials/features/hypertune", "tutorials/features/int8_overview", "tutorials/features/int8_recipe_tuning_api", "tutorials/features/isa_dynamic_dispatch", "tutorials/features/nhwc", "tutorials/features/optimizer_fusion", "tutorials/features/runtime_extension", "tutorials/features/split_sgd", "tutorials/features/sq_recipe_tuning_api", "tutorials/getting_started", "tutorials/installation", "tutorials/introduction", "tutorials/known_issues", "tutorials/license", "tutorials/llm", "tutorials/llm/llm_optimize", "tutorials/performance", "tutorials/performance_tuning/launch_script", "tutorials/performance_tuning/torchserve", "tutorials/performance_tuning/tuning_guide", "tutorials/releases"], "filenames": ["design_doc/cpu/isa_dyndisp.md", "index.rst", "tutorials/api_doc.rst", "tutorials/blogs_publications.md", "tutorials/cheat_sheet.md", "tutorials/contribution.md", "tutorials/examples.md", "tutorials/features.rst", "tutorials/features/amp.md", "tutorials/features/auto_channels_last.md", "tutorials/features/codeless_optimization.md", "tutorials/features/fast_bert.md", "tutorials/features/graph_capture.md", "tutorials/features/graph_optimization.md", "tutorials/features/hypertune.md", "tutorials/features/int8_overview.md", "tutorials/features/int8_recipe_tuning_api.md", "tutorials/features/isa_dynamic_dispatch.md", "tutorials/features/nhwc.md", "tutorials/features/optimizer_fusion.md", "tutorials/features/runtime_extension.md", "tutorials/features/split_sgd.rst", "tutorials/features/sq_recipe_tuning_api.md", "tutorials/getting_started.md", "tutorials/installation.md", "tutorials/introduction.rst", "tutorials/known_issues.md", "tutorials/license.md", "tutorials/llm.rst", 
"tutorials/llm/llm_optimize.md", "tutorials/performance.md", "tutorials/performance_tuning/launch_script.md", "tutorials/performance_tuning/torchserve.md", "tutorials/performance_tuning/tuning_guide.md", "tutorials/releases.md"], "titles": ["Intel\u00ae Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc", "Intel\u00ae Extension for PyTorch*", "API Documentation", "Blogs & Publications", "Cheat Sheet", "Contribution", "Examples", "Features", "Auto Mixed Precision (AMP)", "Auto Channels Last", "Codeless Optimization (Prototype)", "Fast BERT (Prototype)", "Graph Capture (Prototype)", "Graph Optimization", "HyperTune (Prototype)", "Intel\u00ae Extension for PyTorch* optimizations for quantization", "INT8 Recipe Tuning API (Prototype)", "ISA Dynamic Dispatching", "Channels Last", "Optimizer Fusion", "Runtime Extension", "Split SGD", "Smooth Quant Recipe Tuning API (Prototype)", "Quick Start", "Installation", "Introduction", "Troubleshooting", "License", "Large Language Models (LLM) Optimization Overview", "Transformers Optimization Frontend API", "Performance", "Launch Script Usage Guide", "TorchServe with Intel\u00ae Extension for PyTorch*", "Performance Tuning Guide", "Releases"], "terms": {"The": [0, 1, 2, 5, 6, 7, 8, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 31, 32, 33, 34], "document": [0, 7, 17, 20, 29, 34], "i": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 23, 26, 27, 28, 29, 30, 32, 33, 34], "redirect": 0, "thi": [0, 2, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 26, 27, 28, 29, 30, 31, 34], "link": [0, 1, 6, 17, 34], "now": [0, 2, 7, 15, 18, 32, 33, 34], "intel optim": 1, "intel\u00ae extension for pytorch*": 1, "gpu": [1, 3, 18, 34], "discrete gpu": 1, "intel discrete gpu": 1, "extend": [1, 18, 25, 33, 34], "latest": [1, 2, 25, 28, 30, 34], "perform": [1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 15, 16, 18, 19, 21, 25, 28, 29, 31], "optim": [1, 3, 4, 6, 8, 9, 11, 12, 14, 16, 18, 20, 21, 23, 25, 26, 31, 32, 33, 34], "hardwar": [1, 3, 17, 25, 28, 32, 34], "take": [1, 2, 7, 8, 10, 12, 13, 14, 18, 21, 25, 26, 30, 31, 33], "advantag": [1, 2, 7, 9, 12, 18, 21, 25, 30, 31, 33], "advanc": [1, 2, 6, 7, 16, 25, 28], "vector": [1, 2, 6, 17, 18, 25, 28], "512": [1, 6, 11, 16, 25, 28, 31], "avx": [1, 6, 17, 25, 28], "neural": [1, 3, 7, 16, 22, 25, 28, 33, 34], "network": [1, 3, 7, 8, 20, 25, 28, 33], "instruct": [1, 5, 6, 7, 8, 17, 21, 23, 24, 25, 28, 30, 33, 34], "vnni": [1, 15, 17, 25, 28], "matrix": [1, 6, 7, 25, 28], "amx": [1, 3, 6, 7, 17, 25, 28, 30], "cpu": [1, 3, 4, 5, 6, 7, 8, 10, 14, 15, 16, 19, 20, 23, 25, 26, 28, 30, 31, 32, 34], "well": [1, 2, 5, 6, 7, 11, 16, 20, 21, 24, 28, 32, 33, 34], "x": [1, 5, 6, 8, 10, 13, 15, 16, 17, 18, 20, 21, 23, 26, 34], "e": [1, 2, 6, 7, 8, 12, 16, 17, 18, 28, 31, 33, 34], "xmx": 1, "ai": [1, 2, 3, 7, 28], "engin": [1, 6, 18, 33], "discret": 1, "moreov": [1, 2, 28], "provid": [1, 2, 5, 6, 7, 8, 11, 12, 13, 14, 16, 20, 22, 24, 26, 28, 29, 31, 32, 33, 34], "easi": [1, 3, 21], "acceler": [1, 2, 3, 6, 7, 13, 28, 29, 30, 34], "through": [1, 2, 6, 7, 8, 12, 25, 28, 33, 34], "xpu": [1, 2, 3, 34], "devic": [1, 2, 15, 29, 31, 34], "In": [1, 2, 6, 7, 8, 12, 16, 17, 18, 19, 21, 23, 28, 31, 32, 33, 34], "current": [1, 2, 5, 7, 11, 13, 14, 15, 16, 17, 19, 20, 26, 28, 29, 34], "technolog": [1, 7, 28], "landscap": [1, 7, 28], "gener": [1, 5, 6, 7, 10, 12, 16, 17, 18, 21, 23, 28, 29, 30, 31, 32, 33, 34], "genai": [1, 7, 28], "workload": [1, 6, 7, 8, 10, 11, 12, 21, 26, 28, 29, 30, 31, 33, 34], "model": [1, 
2, 3, 4, 8, 9, 10, 11, 12, 14, 16, 23, 24, 25, 26, 29, 30, 33, 34], "have": [1, 2, 5, 6, 7, 9, 14, 17, 18, 20, 21, 23, 26, 27, 28, 30, 31, 32, 33, 34], "gain": [1, 7, 26, 28, 34], "widespread": [1, 7, 28], "attent": [1, 2, 7, 28, 34], "popular": [1, 7, 22, 28, 30, 34], "larg": [1, 2, 19, 23, 24, 25, 26, 29, 30, 33, 34], "languag": [1, 2, 23, 24, 25, 26, 29, 34], "llm": [1, 16, 22, 24, 25, 29, 34], "emerg": [1, 7, 28], "domin": [1, 7, 28], "drive": [1, 7, 28], "applic": [1, 2, 7, 20, 28, 32, 33], "start": [1, 3, 4, 5, 6, 7, 10, 20, 24, 34], "from": [1, 2, 3, 4, 5, 8, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 23, 25, 28, 29, 31, 32, 33, 34], "2": [1, 2, 3, 8, 10, 16, 17, 18, 20, 21, 25, 26, 27, 28, 29, 30, 31, 33], "1": [1, 2, 3, 4, 6, 8, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 31, 33], "0": [1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 30, 31, 32, 33], "specif": [1, 2, 5, 6, 7, 12, 18, 20, 26, 28, 31, 33, 34], "certain": [1, 7, 26, 28, 29, 31, 33], "ar": [1, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 31, 32, 33, 34], "introduc": [1, 3, 7, 15, 18, 21, 22, 31, 33, 34], "For": [1, 2, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 31, 32, 33, 34], "more": [1, 2, 5, 6, 7, 8, 10, 11, 13, 16, 17, 19, 20, 21, 23, 26, 28, 32, 33, 34], "inform": [1, 2, 6, 7, 14, 17, 18, 28, 31, 32, 33, 34], "refer": [1, 7, 9, 13, 14, 16, 17, 18, 20, 22, 23, 24, 25, 32, 34], "section": [1, 6, 7, 8, 14, 20, 23, 24, 25, 28, 29, 32, 33, 34], "can": [1, 2, 5, 6, 7, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 26, 28, 29, 30, 31, 32, 33, 34], "load": [1, 2, 6, 7, 13, 15, 16, 17, 23, 29, 32, 34], "python": [1, 2, 4, 10, 14, 17, 20, 26, 28, 29, 31, 32, 33, 34], "modul": [1, 6, 7, 8, 13, 16, 17, 26, 29, 31, 34], "program": [1, 5, 7, 11, 20, 31, 33, 34], "c": [1, 7, 8, 16, 17, 20, 26, 28, 31, 32, 33, 34], "librari": [1, 2, 5, 6, 7, 17, 20, 32, 33, 34], "script": [1, 2, 3, 4, 5, 6, 7, 8, 10, 14, 17, 20, 23, 24, 26, 28, 29, 30, 32, 33, 34], "user": [1, 2, 7, 9, 10, 12, 13, 15, 16, 18, 20, 26, 31, 32, 33, 34], "enabl": [1, 2, 3, 4, 6, 7, 8, 10, 13, 16, 18, 20, 22, 23, 26, 28, 31, 32, 33, 34], "dynam": [1, 4, 20, 28, 32, 33, 34], "import": [1, 2, 4, 5, 6, 7, 10, 11, 12, 13, 15, 16, 17, 18, 20, 21, 23, 25, 26, 28, 29, 32, 33, 34], "intel_extension_for_pytorch": [1, 2, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 20, 23, 25, 29, 32, 34], "featur": [1, 2, 3, 5, 8, 10, 13, 14, 18, 20, 23, 25, 26, 28, 30, 31, 32, 33, 34], "includ": [1, 2, 5, 6, 7, 10, 14, 15, 17, 23, 26, 27, 28, 30, 34], "onli": [1, 2, 5, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 20, 21, 26, 28, 31, 32, 34], "packag": [1, 2, 5, 6, 7, 10, 23, 25, 26, 32, 33, 34], "mai": [1, 2, 3, 5, 6, 7, 8, 9, 16, 17, 18, 20, 26, 28, 31, 32, 33, 34], "newer": [1, 28, 33], "code": [1, 2, 5, 6, 7, 10, 11, 12, 13, 18, 19, 21, 23, 24, 26, 27, 29, 33, 34], "base": [1, 2, 3, 4, 5, 6, 7, 10, 11, 17, 20, 21, 26, 28, 29, 30, 32, 33, 34], "due": [1, 8, 10, 17, 20, 26], "differ": [1, 2, 6, 7, 15, 16, 17, 18, 20, 28, 31, 32, 33, 34], "develop": [1, 3, 6, 28, 30, 33, 34], "schedul": [1, 2, 13, 20, 31, 33], "ha": [1, 2, 7, 10, 14, 17, 18, 20, 21, 26, 28, 30, 31, 33, 34], "been": [1, 6, 7, 10, 17, 18, 28, 31, 33, 34], "releas": [1, 17, 18, 26, 30, 33], "an": [1, 2, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 19, 20, 21, 26, 31, 32, 33, 34], "open": [1, 16, 28, 33], "sourc": [1, 5, 6, 17, 27, 28, 33, 34], "project": [1, 6], "github": [1, 2, 5, 6, 7, 8, 34], "you": [1, 2, 5, 6, 7, 8, 13, 
14, 15, 17, 18, 20, 23, 25, 26, 28, 29, 31, 33, 34], "find": [1, 2, 6, 7, 14, 16, 23, 26, 30, 31, 34], "how": [1, 2, 6, 10, 15, 17, 18, 23, 28, 32, 33, 34], "get": [1, 2, 3, 4, 6, 7, 10, 11, 15, 17, 20, 21, 22, 26, 28, 29, 30, 31, 33, 34], "main": [1, 2, 5, 6, 14, 20, 31, 32], "branch": [1, 7, 30], "quick": [1, 20, 24, 25], "about": [1, 2, 5, 7, 13, 16, 32, 33, 34], "product": [1, 2, 7, 14, 28, 34], "structur": [1, 18, 31], "shown": [1, 6, 18, 28, 31, 32], "follow": [1, 2, 4, 5, 6, 7, 8, 11, 14, 15, 16, 17, 18, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34], "figur": [1, 2, 21, 28, 33], "eager": [1, 7, 12, 23, 32, 34], "mode": [1, 2, 5, 7, 10, 12, 18, 20, 23, 26, 32, 34], "frontend": [1, 2, 7, 20, 28, 34], "custom": [1, 2, 7, 26, 34], "fusion": [1, 2, 7, 10, 21, 28, 34], "int8": [1, 2, 3, 4, 17, 18, 20, 22, 28, 29, 34], "quantiz": [1, 3, 4, 13, 22, 26, 28, 30, 32, 34], "api": [1, 3, 6, 10, 11, 15, 20, 26, 33, 34], "further": [1, 2, 5, 6, 7, 18, 20, 28, 33, 34], "improv": [1, 3, 7, 8, 13, 20, 22, 28, 30, 32, 33], "achiev": [1, 2, 6, 7, 28, 33, 34], "convert": [1, 2, 4, 6, 7, 8, 9, 10, 13, 16, 17, 18, 20, 23, 26, 32, 34], "graph": [1, 4, 8, 10, 16, 23, 26, 31, 34], "us": [1, 2, 3, 4, 5, 6, 11, 14, 15, 17, 18, 19, 21, 23, 24, 25, 26, 27, 28, 32, 33, 34], "pass": [1, 2, 5, 10, 17, 20, 26, 32, 34], "reduc": [1, 2, 7, 15, 19, 20, 21, 22, 26, 28, 33, 34], "oper": [1, 2, 6, 8, 13, 15, 21, 32, 33, 34], "kernel": [1, 2, 7, 20, 26, 28, 30, 33, 34], "invoc": [1, 7], "overhead": [1, 2, 7, 10, 19, 20, 26, 28, 33, 34], "result": [1, 2, 6, 10, 12, 14, 16, 18, 20, 21, 30, 31, 32, 33], "compar": [1, 2, 7, 13, 18, 21, 26, 28, 30, 31, 33, 34], "normal": [1, 2, 6, 7, 13, 20, 28, 33, 34], "yield": [1, 7, 33], "better": [1, 2, 6, 7, 15, 18, 20, 28, 31, 32, 33, 34], "techniqu": [1, 2, 7, 11, 12, 28, 34], "like": [1, 2, 3, 5, 6, 7, 8, 14, 18, 19, 21, 26, 28, 31, 33, 34], "amplifi": 1, "them": [1, 5, 7, 18, 19, 28, 31, 33], "comprehens": [1, 34], "both": [1, 2, 6, 7, 16, 18, 19, 21, 28, 29, 31, 32, 33, 34], "torchscript": [1, 2, 5, 7, 10, 11, 12, 19, 23, 26, 32, 34], "torchdynamo": [1, 7, 12, 23, 34], "With": [1, 2, 7, 10, 20, 31, 34], "we": [1, 2, 5, 6, 7, 8, 9, 10, 14, 15, 16, 17, 18, 19, 20, 21, 23, 28, 30, 32, 33, 34], "recommend": [1, 5, 6, 7, 9, 10, 15, 16, 20, 23, 30, 31, 33, 34], "torch": [1, 2, 4, 6, 8, 10, 11, 12, 13, 15, 16, 18, 20, 23, 26, 29, 32, 33, 34], "jit": [1, 2, 5, 6, 7, 8, 13, 15, 16, 18, 20, 23, 26, 32, 34], "trace": [1, 6, 7, 8, 12, 13, 15, 16, 20, 23, 26, 32, 34], "your": [1, 5, 6, 7, 8, 10, 14, 15, 20, 23, 24, 26, 27, 28, 29, 34], "prefer": [1, 7, 8, 15, 24], "option": [1, 2, 5, 7, 10, 14, 15, 16, 29, 31, 34], "wider": 1, "rang": [1, 6, 7, 15, 16, 19, 21, 26, 31, 32, 34], "ipex": [1, 2, 3, 4, 6, 7, 9, 11, 12, 13, 15, 16, 17, 19, 20, 23, 26, 29, 31, 32, 34], "backend": [1, 2, 3, 6, 7, 12, 13, 16, 17, 23, 26, 28, 31, 33, 34], "avail": [1, 2, 6, 7, 11, 17, 20, 22, 23, 29, 31, 33, 34], "good": [1, 2, 5, 7, 12, 18, 19, 28, 33, 34], "On": [1, 2, 7, 18, 28, 33], "automat": [1, 2, 6, 7, 9, 10, 12, 13, 15, 16, 18, 22, 28, 31, 32, 33, 34], "dispatch": [1, 34], "underli": [1, 17, 28], "detect": [1, 6, 12, 17, 26, 33, 34], "set": [1, 2, 4, 5, 6, 7, 8, 14, 15, 16, 17, 21, 24, 26, 28, 30, 31, 32, 33, 34], "isa": [1, 34], "leverag": [1, 7, 11, 28, 32, 34], "unit": [1, 2, 33], "runtim": [1, 8, 13, 17, 31, 33, 34], "offer": [1, 5, 33], "finer": [1, 7, 20], "grain": [1, 3, 7, 20], "thread": [1, 2, 7, 20, 26, 30, 31, 32, 33, 34], "control": [1, 2, 7, 20, 26, 31, 33, 34], "weight": [1, 2, 7, 10, 12, 13, 
15, 16, 18, 20, 22, 23, 26, 28, 34], "share": [1, 5, 6, 16, 20, 32, 33, 34], "increas": [1, 2, 3, 21, 26, 28, 30, 33, 34], "effici": [1, 7, 11, 19, 20, 28, 31, 33, 34], "implement": [1, 5, 7, 11, 19, 26, 28, 33, 34], "regist": [1, 7, 10, 16, 17, 34], "mechan": [1, 7, 17, 21, 34], "These": [1, 5, 6, 7, 8, 13, 28], "nativ": [1, 6, 7, 8, 17, 19, 21, 26, 28, 34], "calcul": [1, 2, 8, 16, 21, 22], "util": [1, 6, 7, 10, 13, 15, 16, 18, 21, 28, 31, 33, 34], "dpc": 1, "compil": [1, 5, 6, 23, 26, 33, 34], "sycl": 1, "standard": [1, 34], "also": [1, 2, 6, 7, 10, 13, 14, 16, 18, 19, 28, 30, 31, 33, 34], "number": [1, 2, 5, 6, 7, 14, 16, 19, 20, 21, 26, 32, 34], "which": [1, 2, 5, 7, 8, 10, 14, 15, 16, 17, 18, 20, 26, 28, 30, 31, 32, 33, 34], "found": [1, 6, 7, 14, 16, 18, 29, 31, 32, 33, 34], "doc": [1, 2, 5, 11, 29, 34], "directori": [1, 5, 6, 14, 29, 31, 32], "team": [1, 5], "track": 1, "bug": [1, 5, 34], "enhanc": [1, 3, 28, 34], "request": [1, 5, 20, 32], "issu": [1, 2, 5, 8, 21, 26, 33], "befor": [1, 2, 5, 6, 13, 14, 17, 18, 20, 31, 33, 34], "submit": [1, 5, 7, 20], "suggest": [1, 2, 15, 18, 20, 33, 34], "report": [1, 17], "search": [1, 2, 4, 5, 7, 16, 22, 28, 31], "exist": [1, 5, 7, 13, 26, 31, 33], "see": [1, 2, 5, 8, 14, 34], "alreadi": [1, 5, 6, 18, 28, 33], "pytorch": [2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 16, 17, 20, 23, 25, 26, 27, 28, 29, 30, 31, 33, 34], "dtype": [2, 4, 6, 7, 8, 10, 11, 13, 15, 16, 17, 23, 26, 29, 31, 34], "none": [2, 6, 29, 31], "o1": [2, 26, 34], "inplac": [2, 4, 6, 13, 15, 18, 23, 32], "fals": [2, 4, 6, 7, 8, 13, 14, 15, 16, 17, 20, 22, 23, 26, 31, 32, 34], "conv_bn_fold": [2, 26, 34], "linear_bn_fold": 2, "weights_prepack": [2, 6, 7, 23, 26], "replace_dropout_with_ident": 2, "optimize_lstm": 2, "split_master_weight_for_bf16": 2, "fuse_update_step": 2, "auto_kernel_select": [2, 7, 30], "sample_input": [2, 9, 34], "graph_mod": [2, 4, 7, 12, 34], "concat_linear": 2, "appli": [2, 6, 7, 8, 12, 13, 16, 18, 19, 21, 23, 26, 28, 29, 31, 34], "given": [2, 6, 13, 14, 16, 28], "nn": [2, 6, 7, 8, 10, 13, 15, 16, 18, 20, 26, 34], "If": [2, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 20, 26, 31, 32, 33, 34], "train": [2, 3, 4, 7, 11, 13, 15, 16, 18, 21, 23, 26, 28, 29, 31, 34], "otherwis": [2, 7, 20], "infer": [2, 3, 4, 7, 10, 11, 12, 15, 18, 20, 21, 23, 26, 30, 33, 34], "conv": [2, 8, 10, 13, 15, 20, 26, 34], "bn": [2, 10, 15, 26, 34], "fold": [2, 10, 15, 16, 26, 34], "prepack": [2, 6, 10, 18, 26, 28, 34], "so": [2, 5, 6, 7, 8, 15, 17, 18, 20, 30, 31, 32, 33, 34], "onednn": [2, 3, 13, 17, 26, 28, 34], "order": [2, 17, 18, 21, 31, 33, 34], "cach": [2, 5, 7, 19, 20, 30, 34], "reus": [2, 33], "memori": [2, 6, 7, 8, 9, 10, 13, 19, 20, 21, 26, 28, 30, 32, 34], "layout": [2, 26, 34], "call": [2, 6, 8, 13, 17, 18, 21, 26, 32, 33, 34], "block": [2, 5, 16, 20, 22, 28, 33, 34], "although": [2, 33], "itself": [2, 5, 18], "enough": [2, 7, 19], "usag": [2, 6, 7, 8, 23, 25, 32, 33, 34], "perspect": [2, 13, 18, 21, 28, 31, 33], "drawback": [2, 21], "run": [2, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 26, 30, 31, 32, 33, 34], "split": [2, 6, 7, 16, 17, 19, 20, 26, 34], "one": [2, 5, 7, 12, 13, 14, 16, 18, 19, 20, 26, 29, 31, 33, 34], "sever": [2, 7, 10, 19, 30, 31, 34], "dimens": [2, 18, 26], "data": [2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 23, 26, 31, 32, 34], "fix": [2, 5, 7, 34], "size": [2, 6, 7, 11, 15, 16, 17, 18, 23, 26, 28, 30, 32, 33, 34], "each": [2, 8, 14, 16, 17, 19, 20, 21, 31, 32, 33, 34], "time": [2, 5, 7, 14, 16, 17, 18, 19, 26, 28, 30, 33, 34], "execut": [2, 4, 6, 7, 8, 10, 11, 
12, 13, 14, 16, 17, 19, 20, 26, 31, 32, 33, 34], "detail": [2, 5, 6, 7, 8, 9, 11, 13, 17, 18, 24, 25, 26, 28, 30, 32, 33, 34], "mermori": 2, "format": [2, 5, 6, 7, 9, 14, 22, 26, 28, 31, 33, 34], "manual": [2, 7, 10, 14, 18, 20, 34], "To": [2, 5, 6, 7, 10, 13, 15, 16, 17, 18, 20, 21, 23, 28, 32, 33, 34], "predefin": 2, "shape": [2, 6, 7, 16, 20, 23, 30, 33, 34], "prior": [2, 23], "match": [2, 8, 17, 31], "requir": [2, 5, 6, 8, 10, 16, 18, 21, 26, 28, 29, 31, 32, 34], "won": [2, 7, 8, 17, 26], "t": [2, 5, 7, 8, 14, 15, 16, 17, 18, 20, 26, 32, 34], "convers": [2, 8, 13, 34], "directli": [2, 6, 33, 34], "go": [2, 5, 8], "methodologi": [2, 6, 7, 19, 33], "possibl": [2, 14, 15, 19, 28, 33, 34], "avoid": [2, 10, 20, 21, 26, 31, 32, 33, 34], "thu": [2, 7, 8, 10, 18, 20, 21, 28, 31, 32, 33], "paramet": [2, 6, 7, 8, 10, 16, 17, 19, 20, 21, 26, 28, 29, 30, 31, 33, 34], "work": [2, 5, 6, 7, 14, 15, 17, 20, 26, 28, 29, 31, 33, 34], "bfloat16": [2, 3, 4, 7, 10, 11, 17, 18, 23, 29, 31, 34], "half": [2, 7, 17, 21], "k": [2, 5], "float16": [2, 8], "cast": [2, 8, 21, 28], "accord": [2, 13, 28, 33, 34], "default": [2, 4, 6, 7, 10, 12, 13, 15, 16, 17, 20, 22, 23, 26, 28, 30, 32, 33, 34], "valu": [2, 6, 10, 14, 16, 17, 19, 20, 21, 22, 26, 28, 31, 32, 33, 34], "mean": [2, 16, 17, 18, 20, 22, 28, 34], "do": [2, 5, 8, 16, 18, 20, 21, 26, 28, 30, 31, 32, 33, 34], "noth": 2, "note": [2, 3, 5, 6, 15, 16, 17, 18, 20, 22, 24, 28, 30, 31, 32, 33], "type": [2, 4, 5, 6, 7, 10, 16, 17, 18, 20, 21, 23, 30, 31, 32, 34], "conv2d": [2, 7, 8, 10, 13, 18, 20, 26, 34], "linear": [2, 6, 7, 8, 13, 15, 16, 18, 26, 33, 34], "convtranspose2d": [2, 13], "case": [2, 6, 7, 9, 12, 16, 17, 18, 28, 31, 33, 34], "addit": [2, 6, 7, 17, 21, 28, 34], "embed": [2, 7, 28, 34], "lstm": [2, 10, 15, 34], "sgd": [2, 6, 7, 8, 16, 19], "string": [2, 31], "o0": [2, 26, 34], "No": [2, 18, 34], "function": [2, 5, 6, 7, 8, 10, 11, 12, 14, 15, 17, 20, 21, 23, 26, 28, 29, 31, 33, 34], "just": [2, 14, 29, 34], "return": [2, 6, 7, 8, 10, 16, 17, 20, 26, 34], "origin": [2, 6, 7, 12, 13, 15, 17, 20, 29, 34], "dropout": [2, 10], "remov": [2, 5, 21, 34], "inferenc": 2, "master": [2, 7, 21, 31], "fuse": [2, 7, 13, 16, 19, 28, 34], "updat": [2, 5, 7, 16, 19, 21, 22, 34], "step": [2, 5, 6, 7, 8, 14, 16, 19, 21, 32], "overridden": [2, 17], "explicitli": [2, 8, 16, 20, 26, 31, 34], "bool": [2, 14], "whether": [2, 6, 8, 16, 18, 22, 23, 33], "conv_bn": 2, "It": [2, 6, 7, 8, 10, 13, 17, 18, 20, 21, 23, 26, 29, 31, 33, 34], "knob": [2, 4, 12, 31], "overwrit": [2, 31], "configur": [2, 4, 6, 7, 14, 15, 16, 17, 31, 32, 34], "linear_bn": 2, "convolut": [2, 6, 7, 13, 20, 33, 34], "reorder": [2, 18, 28], "doesn": [2, 15, 16, 18, 26, 34], "support": [2, 5, 6, 7, 13, 15, 16, 17, 18, 19, 20, 21, 25, 26, 28, 29, 31, 32, 33, 34], "replac": [2, 5, 7, 10, 26, 34], "ident": [2, 10, 18], "aten": [2, 6, 7, 34], "opportunit": 2, "bf16": [2, 3, 7, 17, 19, 21, 23, 26, 28, 30, 34], "save": [2, 5, 6, 7, 13, 14, 15, 16, 18, 21, 28, 32, 34], "solut": [2, 7, 26, 28, 34], "all": [2, 5, 6, 8, 13, 14, 17, 19, 20, 28, 29, 32, 33, 34], "param": [2, 19, 31], "tupl": [2, 6, 17, 20], "tensor": [2, 6, 7, 8, 11, 15, 16, 17, 20, 26, 28, 32, 34], "feed": [2, 9, 18], "sampl": [2, 6, 9, 14, 16, 17, 29, 33], "input": [2, 6, 7, 9, 10, 13, 15, 16, 17, 18, 22, 23, 26, 29, 30, 32, 33, 34], "impact": [2, 7, 20], "pack": [2, 20, 34], "intel": [2, 3, 4, 7, 8, 9, 10, 11, 13, 14, 16, 17, 20, 21, 22, 23, 25, 26, 27, 28, 29, 34], "extens": [2, 3, 4, 6, 9, 10, 13, 14, 16, 17, 23, 24, 25, 27, 28, 29, 30, 31, 33, 34], 
"per": [2, 10, 15, 16, 20, 30, 31, 32, 33, 34], "some": [2, 5, 7, 8, 13, 16, 17, 18, 20, 26, 28, 31, 32, 33, 34], "heurist": [2, 20, 34], "real": [2, 7, 14, 15, 30, 34], "best": [2, 6, 7, 8, 14, 16, 17, 22, 24, 28, 33, 34], "try": [2, 5, 6, 7, 12, 14, 16, 26, 31, 33, 34], "select": [2, 5, 7, 13, 24, 34], "true": [2, 4, 6, 10, 12, 13, 14, 15, 16, 17, 22, 23, 31, 32, 33, 34], "might": [2, 7, 18, 26, 33, 34], "cost": [2, 6, 28, 30, 33], "extra": [2, 5, 10, 20, 31, 32], "combin": [2, 12, 14, 28, 31, 34], "method": [2, 8, 15, 16, 18, 22, 26, 33, 34], "multipl": [2, 5, 7, 8, 16, 17, 18, 26, 28, 30, 32, 33, 34], "subgraph": 2, "modifi": [2, 5, 6], "other": [2, 6, 7, 8, 14, 17, 18, 19, 23, 28, 31, 33], "place": [2, 8, 28, 33, 34], "scenario": [2, 6, 7, 18, 33, 34], "convolutuon": 2, "counterpart": [2, 7, 18, 34], "pleas": [2, 6, 7, 11, 16, 22, 26, 28, 31, 33, 34], "invok": [2, 6, 8, 10, 13, 20, 23, 26, 29, 34], "ddp": [2, 6], "distribut": [2, 3, 7, 16, 31, 32, 33], "deepcopi": 2, "rather": [2, 18], "than": [2, 5, 7, 17, 18, 20, 21, 26, 33, 34], "allreduc": 2, "caus": [2, 7, 21, 26, 28, 31, 33, 34], "unpredict": 2, "accuraci": [2, 3, 6, 7, 8, 15, 16, 21, 22, 26, 28, 34], "loss": [2, 5, 6, 8, 16, 18, 21, 26], "exampl": [2, 5, 7, 8, 13, 18, 19, 21, 22, 23, 24, 25, 28, 29, 32, 33, 34], "load_state_dict": [2, 34], "path": [2, 6, 7, 14, 18, 20, 23, 31, 33, 34], "eval": [2, 4, 6, 8, 10, 11, 12, 13, 15, 16, 20, 23, 26, 29, 32, 34], "optimized_model": [2, 34], "evalu": [2, 16, 34], "optimized_optim": 2, "altern": [2, 6, 18], "motiv": [2, 20], "ad": [2, 7, 10, 33, 34], "alia": 2, "unifi": [2, 31], "style": [2, 5], "modular": 2, "float32": [2, 13, 21, 23, 26, 30, 31, 34], "quantization_config": [2, 6, 29], "qconfig_summary_fil": [2, 6, 29], "low_precision_checkpoint": [2, 6, 29], "deployment_mod": [2, 6, 23], "transform": [2, 3, 4, 6, 10, 11, 13, 16, 18, 22, 23, 28, 32, 33, 34], "focu": [2, 10, 18, 29, 34], "especi": [2, 5, 28, 34], "task": [2, 7, 28, 31, 33, 34], "famili": [2, 28, 33], "full": [2, 5, 18, 32, 33, 34], "llama": [2, 3, 6, 28], "gpt": [2, 28, 30], "j": [2, 5, 17, 28, 30], "neox": [2, 28], "opt": [2, 6, 17, 28], "falcon": [2, 28], "bloom": [2, 28], "codegen": [2, 28, 34], "baichuan": [2, 28, 34], "chatglm": [2, 28], "gptbigcod": [2, 28], "t5": [2, 26, 28, 34], "mistral": [2, 28, 34], "mpt": [2, 28, 34], "mixtral": [2, 28], "stablelm": [2, 28], "qwen": [2, 28], "git": [2, 5, 28], "llava": [2, 28], "yuan": [2, 28], "phi": [2, 28], "scope": [2, 7, 8, 21, 34], "abov": [2, 5, 10, 19, 28, 30, 31, 32], "transpar": [2, 7, 29, 33, 34], "benifit": 2, "float": [2, 6, 7, 8, 14, 15, 16, 17, 21, 29, 34], "when": [2, 5, 6, 7, 8, 9, 14, 18, 19, 20, 21, 22, 25, 26, 28, 30, 31, 32, 33, 34], "mix": [2, 6, 13, 23, 26, 28, 34], "str": [2, 6, 14, 23, 31], "specifi": [2, 5, 6, 14, 20, 31, 33, 34], "either": [2, 26, 31], "object": [2, 6, 7, 14, 17, 20, 33, 34], "defin": [2, 5, 6, 7, 8, 10, 16, 17, 18, 22, 32], "recip": [2, 4, 7, 13, 15, 26, 28, 34], "quant": [2, 16], "static": [2, 4, 16, 26, 28, 31, 32, 33, 34], "onc": [2, 5, 6, 14, 17, 18, 20, 21, 32, 33], "quantizat": 2, "config": [2, 6, 11, 23, 31, 32], "json": [2, 6, 15, 16, 32, 34], "file": [2, 4, 5, 6, 8, 14, 15, 16, 17, 18, 31, 34], "under": [2, 6, 8, 18, 20, 27, 31, 34], "need": [2, 5, 6, 7, 10, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 29, 31, 32, 33, 34], "calibr": [2, 13, 22, 26, 29, 30, 32, 34], "dict": [2, 6, 23], "int4": [2, 28, 29, 34], "": [2, 3, 5, 8, 10, 14, 15, 18, 19, 20, 21, 22, 26, 31, 32, 33], "should": [2, 5, 8, 15, 20, 28, 31, 33], "state_dict": 
[2, 6], "checkpoint": [2, 6, 29], "pt": [2, 6, 13, 14, 15, 23, 32, 34], "gptq": [2, 6, 34], "etc": [2, 5, 6, 17, 34], "where": [2, 5, 7, 16, 21, 33], "kei": [2, 7, 28, 34], "scale": [2, 3, 6, 15, 28], "zero": [2, 6, 15, 34], "point": [2, 6, 8, 15, 21, 33, 34], "bia": [2, 8, 20, 34], "weight_kei": 2, "packed_weight": 2, "scale_kei": 2, "zero_point_kei": 2, "packed_zp": 2, "bias_kei": 2, "chang": [2, 5, 6, 7, 8, 10, 11, 12, 15, 17, 18, 20, 23, 25, 26, 29, 31], "make": [2, 5, 6, 7, 14, 15, 17, 21, 23, 28, 32, 33], "n": [2, 6, 7, 16, 18, 19, 20, 26, 32, 33, 34], "thei": [2, 7, 8, 31, 33], "uint4": 2, "compress": 2, "along": [2, 5, 6, 21, 33, 34], "store": [2, 17, 18, 19, 21, 28, 31, 32, 33, 34], "int32": 2, "state": [2, 15, 19, 28], "automaticlli": 2, "deploy": [2, 7, 13, 34], "torchscirpt": 2, "workabl": 2, "forward": [2, 6, 8, 13, 16, 20, 21, 26, 32, 33, 34], "after": [2, 5, 7, 13, 20, 21, 23, 24, 32, 33, 34], "deepspe": [2, 34], "parallel": [2, 5, 6, 7, 28, 33, 34], "class": [2, 5, 6, 7, 8, 10, 16, 20, 26, 34], "verbos": [2, 4, 31], "demand": [2, 7], "easier": [2, 18, 21], "debug": [2, 31], "dump": [2, 31], "messag": [2, 6, 10, 12, 18, 31], "contain": [2, 5, 6, 13, 17, 26, 31, 32, 33, 34], "durat": [2, 21], "while": [2, 7, 8, 11, 12, 18, 21, 26, 28, 32, 33, 34], "via": [2, 5, 6, 7, 18, 20, 30, 31, 33, 34], "environ": [2, 5, 6, 17, 20, 24, 28, 30, 31, 32, 33], "variabl": [2, 5, 17, 30, 31, 32, 33, 34], "name": [2, 5, 7, 14, 17, 25, 28, 31, 32, 33, 34], "dnnl_verbos": 2, "howev": [2, 5, 7, 8, 9, 16, 20, 26, 28, 31, 33, 34], "those": [2, 15, 33], "amount": [2, 16, 26, 28, 33], "investig": [2, 31], "singl": [2, 7, 13, 14, 16, 19, 20, 30, 32, 34], "iter": [2, 16, 21, 28, 34], "out": [2, 5, 6, 7, 8, 10, 13, 16, 19, 20, 30, 31, 33, 34], "second": [2, 10, 28, 32, 33], "verbose_on": 2, "verbose_off": 2, "disabl": [2, 6, 7, 13, 26, 31, 33, 34], "verbose_on_cr": 2, "creation": 2, "linearsilu": [2, 34], "silu": [2, 13], "http": [2, 5, 16, 34], "org": [2, 7, 16, 26, 34], "stabl": [2, 3, 8, 34], "html": [2, 5, 16], "output": [2, 6, 7, 8, 13, 14, 16, 18, 23, 26, 34], "same": [2, 5, 7, 10, 15, 16, 17, 18, 20, 21, 28, 31, 32, 33, 34], "init": [2, 5, 15, 34], "linear_modul": 2, "4096": [2, 33], "ipex_fus": 2, "randn": [2, 10, 13, 16, 18, 32, 34], "linearsilumul": [2, 34], "multipli": 2, "mul": [2, 13, 16], "linear2silumul": [2, 34], "linear_": 2, "linear_m": 2, "two": [2, 7, 14, 16, 20, 21, 28, 32, 33, 34], "linear_s_modul": 2, "linear_m_modul": 2, "linearrelu": [2, 34], "relu": [2, 7, 13, 16, 18, 26, 34], "linearnewgelu": [2, 34], "newgeluactiv": 2, "com": [2, 5, 34], "huggingfac": [2, 6, 26, 28, 32, 34], "blob": 2, "src": [2, 17], "activ": [2, 6, 7, 15, 16, 20, 28, 31, 33], "py": [2, 5, 10, 14, 20, 31, 32, 34], "l50": 2, "new_gelu": 2, "lineargelu": [2, 34], "gelu": [2, 13, 34], "linearmul": [2, 34], "linearadd": [2, 34], "add": [2, 5, 7, 8, 13, 14, 19, 21, 32, 34], "linearaddadd": [2, 34], "other_1": 2, "other_2": 2, "rotaryembed": [2, 34], "max_position_embed": 2, "int": [2, 6, 7, 14, 17, 23, 26, 29, 31, 34], "pos_embd_dim": 2, "10000": 2, "backbon": 2, "co": 2, "paper": [2, 34], "2104": 2, "09864": 2, "queri": [2, 17, 18], "multi": [2, 7, 14, 20, 28, 31, 33, 34], "head": [2, 34], "comput": [2, 6, 7, 13, 15, 16, 18, 20, 21, 28, 30, 31, 32, 33, 34], "max": [2, 6, 16, 17, 22, 23, 26, 34], "posit": [2, 28, 33, 34], "frequenc": [2, 30], "exact": 2, "g": [2, 7, 8, 16, 17, 18, 28, 34], "gptjforcausallm": 2, "architectur": [2, 28, 30, 33], "eleutherai": [2, 28], "6b": [2, 28, 30], "l4": 2, "batch": [2, 6, 7, 
13, 16, 18, 20, 23, 26, 30, 32, 34], "sequenc": [2, 18, 21, 28, 34], "length": [2, 5, 14, 21, 26, 30, 34], "num_head": 2, "num_kv_head": 2, "head_dim": 2, "position_id": [2, 6], "element": [2, 18, 19], "past_kv_length": 2, "id": [2, 31, 32], "construct": [2, 7, 13], "current_posit": 2, "num": [2, 20, 32, 33, 34], "dim": [2, 6, 18, 23], "offset": [2, 18, 28], "sin": 2, "neighbor": 2, "rotary_dim": 2, "rotary_ndim": 2, "rotari": [2, 28], "64": [2, 8, 10, 16, 20, 30, 31, 34], "gptj": 2, "rope_modul": 2, "2048": [2, 6], "32": [2, 6, 18, 21, 23, 30, 31, 32], "16": [2, 17, 20, 21, 30, 31, 32], "256": [2, 30], "arang": [2, 6, 16], "unsqueez": 2, "query_roteri": 2, "direct": [2, 5, 13], "apply_funct": 2, "without": [2, 5, 6, 7, 8, 10, 16, 20, 21, 26, 32, 34], "initi": [2, 20, 32], "assum": [2, 7, 8, 23, 32, 33, 34], "num_token": 2, "rotary_half": 2, "rmsnorm": [2, 28, 34], "hidden_s": [2, 6], "ep": [2, 7, 10, 19], "1e": [2, 7, 10, 16], "06": [2, 31, 32], "hidden": [2, 18, 28], "modeling_llama": 2, "l76": 2, "variance_epsilon": 2, "6": [2, 5, 7, 11, 14, 20, 30, 31, 32, 33, 34], "ones": [2, 6, 17], "hidden_st": 2, "usual": [2, 18, 20, 33], "rmsnorm_modul": 2, "fastlayernorm": [2, 34], "normalized_shap": 2, "layernorm": [2, 13, 16, 22, 34], "list": [2, 5, 7, 8, 13, 14, 16, 18, 25, 29, 31, 32, 33, 34], "denomin": 2, "numer": [2, 8, 33], "stabil": [2, 8, 34], "layernorm_modul": 2, "05": [2, 7, 10, 30, 31], "indirectaccesskvcacheattent": [2, 34], "text_max_length": 2, "kv_cach": [2, 28], "decod": [2, 28, 30, 34], "layer": [2, 16, 20, 22, 28, 34], "bring": [2, 6, 7, 9, 15, 16, 21, 28, 31, 33, 34], "beam": [2, 28], "idx": [2, 28, 31], "concat": [2, 20, 26, 28, 34], "entir": [2, 16, 28], "context": [2, 5, 6, 8, 20, 28, 33], "dot": [2, 7, 18, 28], "veri": [2, 5, 15, 18, 28], "long": [2, 6, 18, 21, 26, 28, 34], "bottleneck": [2, 28], "indirect": 2, "access": [2, 6, 7, 18, 19, 32], "iakv": [2, 28], "firstli": [2, 28], "pre": [2, 28, 34], "alloc": [2, 10, 20, 28, 30, 32, 34], "buffer": [2, 28], "index": [2, 5, 18, 28, 33], "histori": [2, 14, 28], "decid": [2, 15, 20, 28], "timestamp": [2, 28], "max_seq": 2, "head_num": 2, "head_siz": 2, "token": [2, 6, 23, 28, 30], "everi": [2, 28], "kv": 2, "seq_len": [2, 30], "scale_attn": 2, "sqrt": [2, 13, 19], "layer_past": 2, "seq_info": 2, "key_cach": 2, "value_cach": 2, "info": [2, 6, 17, 26, 31, 32, 34], "head_mask": 2, "mask": [2, 7, 17, 26], "yet": [2, 6, 26, 34], "attention_mask": [2, 6], "attn_weight": 2, "first": [2, 3, 5, 6, 7, 9, 10, 12, 16, 19, 20, 21, 26, 31, 32, 33], "matmul": [2, 8, 13, 26, 34], "new_layer_past": 2, "attn_output": 2, "l1318": 2, "def": [2, 6, 8, 10, 16, 20, 26, 34], "_reorder_cach": 2, "self": [2, 6, 8, 10, 16, 20, 26, 34], "past_key_valu": [2, 6], "beam_idx": 2, "len": [2, 6, 7, 13, 16, 17], "4": [2, 6, 11, 13, 14, 18, 20, 23, 28, 30, 31, 33, 34], "3": [2, 5, 6, 7, 8, 10, 12, 13, 14, 16, 17, 18, 20, 21, 28, 30, 31, 33], "pagedattent": [2, 34], "vllm": 2, "blog": [2, 34], "2023": [2, 3, 30], "20": [2, 7, 18, 30, 31, 32, 34], "page": [2, 6, 13, 20, 24, 29, 30, 33, 34], "num_block": 2, "block_siz": 2, "basic": [2, 4, 16, 21, 33], "logic": [2, 14, 18, 32, 33], "dram": 2, "manag": [2, 8, 13, 20, 28, 31], "slot": [2, 30], "reshape_and_cach": 2, "single_query_cached_kv_attent": 2, "mha": [2, 34], "intra": 2, "tabl": [2, 7, 17, 28, 30, 34], "map": [2, 6, 18, 30], "physic": [2, 14, 20, 32, 33], "slot_map": 2, "allcat": 2, "keytensor": 2, "num_seq": 2, "block_numb": 2, "head_map": 2, "block_tabl": 2, "context_len": 2, "max_context_len": 2, 
"alibi_slop": 2, "5": [2, 6, 10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 26, 28, 30, 31, 32, 33, 34], "max_num_blocks_per_seq": 2, "optin": 2, "alibi": 2, "slope": 2, "varlenattent": [2, 34], "scaled_dot_product_attent": 2, "accept": [2, 34], "variant": [2, 8, 28], "among": [2, 31, 32, 33], "doe": [2, 7, 13, 18, 20, 26, 34], "arg": [2, 4, 6, 7, 14, 16, 19, 23, 31, 32, 34], "query_token": 2, "total": [2, 6, 30, 33], "key_token": 2, "value_token": 2, "seqlen_q": 2, "batch_siz": [2, 6, 11, 13, 16, 18, 23, 32], "seqlen_k": 2, "max_seqlen_q": 2, "max_seqlen_k": 2, "pdropout": 2, "probabl": 2, "greater": 2, "softmax_scal": 2, "factor": [2, 6, 16, 31], "softmax": [2, 13, 34], "is_caus": 2, "causal": 2, "varlenattention_modul": 2, "emply_lik": 2, "rotary_embed": [2, 34], "rms_norm": [2, 34], "fast_layer_norm": [2, 34], "expect": [2, 7, 30, 34], "indirect_access_kv_cache_attent": [2, 34], "add_casual_mask": 2, "varlen_attent": [2, 34], "zero_tensor": 2, "return_softmax": 2, "gen_": 2, "fast_bert": [2, 4, 6, 7, 11, 34], "unpad": 2, "tpp": [2, 28], "speedup": [2, 6, 8, 28, 30, 34], "still": [2, 5, 7, 8, 13, 16, 18, 21, 26, 34], "squenc": 2, "sparsiti": 2, "seed": 2, "libxsmm": 2, "though": [2, 7], "peak": [2, 7, 11, 34], "enable_onednn_fus": [2, 13], "get_smooth_quant_qconfig_map": [2, 6, 29], "alpha": [2, 6, 19, 22], "act_observ": 2, "act_ic_observ": 2, "wei_observ": 2, "wei_ic_observ": 2, "share_weight_observ": 2, "smoothquant": [2, 6, 7, 16, 22, 28, 34], "arxiv": 2, "pdf": 2, "2211": 2, "10438": 2, "hyper": [2, 30, 33, 34], "observ": [2, 9, 13, 15, 34], "op": [2, 7, 15, 16, 22, 28, 34], "histogramobserv": [2, 15], "q": [2, 28], "min": [2, 16, 22, 26, 34], "affect": [2, 31], "argument": [2, 6, 7, 22, 26, 31], "ao": [2, 6, 15], "minmaxobserv": [2, 6, 15], "channel": [2, 3, 10, 15, 16, 26, 34], "perchannelminmaxobserv": [2, 6, 15], "with_arg": [2, 6, 15], "ch_axi": 2, "qint8": [2, 6, 15], "qscheme": [2, 6, 15, 34], "per_channel_symmetr": [2, 6, 15], "qconfig": [2, 4, 6, 13, 16, 26, 29, 32, 34], "prepar": [2, 4, 6, 13, 16, 26, 29, 32, 34], "example_input": [2, 4, 6, 13, 15, 29, 32, 34], "bn_fold": 2, "example_kwarg_input": 2, "fp32": [2, 4, 16, 17, 19, 21, 23, 28, 34], "A": [2, 5, 6, 7, 10, 11, 17, 26, 28, 31, 33, 34], "even": [2, 5, 7, 33, 34], "prepared_model": [2, 4, 6, 13, 15, 16, 26, 29, 34], "original_model": 2, "later": [2, 7, 25, 33], "unexpect": 2, "behavior": [2, 20, 31, 33], "insert": [2, 16], "fake": 2, "introduct": [2, 7, 28, 33, 34], "avaiabl": 2, "autotun": [2, 4, 22, 34], "calib_dataload": [2, 6, 16, 34], "calib_func": 2, "eval_func": [2, 16, 34], "op_type_dict": 2, "smoothquant_arg": [2, 16], "sampling_s": [2, 4, 16, 34], "accuracy_criterion": [2, 4, 16, 34], "tuning_tim": [2, 4, 16, 34], "driven": 2, "tune": [2, 3, 4, 7, 8, 15, 20, 26, 28, 31, 32, 34], "help": [2, 5, 6, 17, 23, 28, 31, 33, 34], "quickli": 2, "dataload": [2, 6, 10, 13, 16, 20, 22, 29, 34], "post": [2, 4, 5, 7, 15, 28, 34], "process": [2, 6, 7, 11, 12, 14, 16, 19, 20, 21, 26, 31, 32, 33], "metric": [2, 16, 30], "scalar": 2, "higher": [2, 7, 13, 17, 18, 28], "constraint": [2, 34], "optyp": 2, "wise": [2, 16, 19, 22, 29, 34], "space": [2, 7, 16, 18, 22, 33], "global": [2, 20, 22, 34], "algorithm": [2, 13, 18, 30, 34], "would": [2, 5, 6, 14, 16, 17, 18, 30, 31, 32, 33, 34], "explor": 2, "100": [2, 4, 14, 16, 17, 30, 32], "accuracy_criterion_typ": 2, "rel": [2, 4, 16, 31, 34], "absolut": [2, 31], "accuracy_criterion_valu": 2, "maximum": [2, 16, 17], "allow": [2, 8, 14, 16, 22, 31, 33, 34], "01": [2, 4, 7, 16, 31, 32, 34], 
"timeout": [2, 5, 21], "earli": [2, 34], "stop": [2, 33], "is_runtime_ext_en": 2, "helper": 2, "check": [2, 5, 6, 7, 13, 18, 28, 29, 31, 34], "exetens": 2, "openmp": [2, 7, 20, 26, 30, 32, 34], "preload": [2, 31], "cpupool": [2, 20, 34], "core_id": [2, 20, 31], "node_id": [2, 20, 31, 32, 34], "abstract": [2, 11, 20], "pool": [2, 20, 34], "core": [2, 7, 14, 17, 30, 33, 34], "numa": [2, 20, 31, 32, 34], "node": [2, 20, 30, 32, 33, 34], "pin": [2, 20], "cpu_pool": [2, 20, 34], "region": [2, 8, 17, 33], "design": [2, 5, 8, 18, 21, 29, 34], "decor": 2, "multistreammodulehint": [2, 20, 34], "kwarg": [2, 29], "hint": [2, 20], "multistreammodul": [2, 7, 20, 26, 34], "its": [2, 6, 7, 8, 14, 17, 21, 28, 30, 31, 32, 33, 34], "arbitrari": 2, "keyword": 2, "num_stream": [2, 20, 34], "auto": [2, 6, 10, 17, 18, 22, 23, 26, 28, 31, 33, 34], "concat_output": 2, "input_split_hint": [2, 20], "multi_stream": 2, "output_concat_hint": [2, 20], "stream": [2, 7, 20, 34], "throughput": [2, 3, 18, 20, 26, 28, 30, 34], "insid": [2, 5, 20, 31], "divis": [2, 20], "equal": [2, 15, 20, 32, 33], "remaind": [2, 20], "divisor": [2, 20], "batchsiz": [2, 20], "larger": [2, 20, 30, 33], "piec": [2, 20], "less": [2, 8, 18, 20, 26, 34], "mini": [2, 20, 34], "don": [2, 5, 8, 14, 17, 34], "want": [2, 5, 7, 14, 15, 17, 20, 31, 34], "leav": [2, 20, 33], "scriptmodul": [2, 13, 20], "union": 2, "instanc": [2, 7, 10, 14, 32, 34], "reason": [2, 10, 18, 20, 34], "flag": [2, 5, 7, 17, 20, 31, 34], "indic": [2, 6, 18, 28], "concaten": [2, 21], "raw": 2, "asynchron": [2, 7], "get_core_list_of_node_id": 2, "softwar": [3, 27, 34], "jul": 3, "deep": [3, 7, 8, 11, 13, 14, 21, 33], "learn": [3, 7, 8, 11, 13, 14, 21, 31, 33], "boost": [3, 6, 7, 9, 21, 30, 31, 33, 34], "dl": [3, 7, 34], "hug": 3, "face": 3, "bert": [3, 4, 10, 30, 34], "googl": [3, 5, 28], "cloud": 3, "platform": [3, 7, 18, 32, 33, 34], "gcp": 3, "technologi": [3, 7], "guid": [3, 6, 7, 17, 32, 34], "apr": 3, "mar": [3, 32], "new": [3, 5, 12, 16, 17, 18, 20, 23, 26, 29, 33], "x86": 3, "sapphir": 3, "rapid": 3, "part": [3, 5, 7, 8, 18, 21, 26, 33, 34], "jan": 3, "secur": 3, "torchserv": [3, 34], "confer": 3, "dec": 3, "2022": [3, 31, 32], "what": [3, 5, 6, 8, 23], "pyg": 3, "diffus": [3, 34], "arc": 3, "nov": 3, "13": [3, 10, 17, 30, 31, 32, 33], "potenti": [3, 7, 34], "fine": [3, 20, 31, 32, 33, 34], "fx": [3, 7, 10, 26, 34], "sep": [3, 17], "empow": 3, "xeon": [3, 7, 14, 21, 28, 30, 32, 33, 34], "scalabl": [3, 7, 21, 28, 30, 33, 34], "processor": [3, 7, 19, 21, 28, 30, 33, 34], "aug": [3, 30], "vision": [3, 6, 30], "last": [3, 10, 21, 26, 34], "One": [3, 18, 19, 31, 33], "click": 3, "compressor": [3, 7, 16, 22, 34], "4x": 3, "jun": 3, "grokk": 3, "principl": [3, 18], "kt": 3, "person": 3, "text": [3, 6, 26, 28, 30, 33], "speech": [3, 33], "2021": [3, 17, 31, 32], "up": [3, 7, 11, 20, 24, 28, 33, 34], "modern": 3, "naver": 3, "low": [3, 4, 6, 7, 21, 23, 31, 33, 34], "latenc": [3, 14, 18, 28, 30, 32, 34], "machin": [3, 5, 6, 7, 14, 17, 26, 31, 32, 33, 34], "feb": 3, "dlrm": [3, 7, 26, 30, 34], "oneccl": [3, 6, 31, 34], "mention": [3, 10, 20, 21, 34], "deprec": [3, 26], "facebook": [3, 6, 28], "3rd": [3, 7, 21, 30, 34], "gen": [3, 30, 34], "capabl": [3, 17, 34], "2020": 3, "collabor": 3, "2019": 3, "caff": 3, "2017": 3, "command": [4, 5, 6, 14, 23, 31, 32, 33, 34], "descript": [4, 7, 16, 18, 20, 25, 33, 34], "instal": [4, 5, 6, 23, 25, 26, 28, 33, 34], "m": [4, 14, 20, 26, 31, 32, 33, 34], "pip": [4, 5, 34], "captur": [4, 34], "log": [4, 6, 13, 31, 32, 34], "prompt": [4, 6, 23, 34], 
"export": [4, 31, 33], "onednn_verbos": 4, "dure": [4, 6, 7, 10, 13, 16, 21, 31, 33, 34], "precis": [4, 6, 13, 21, 23, 26, 30, 34], "no_grad": [4, 6, 10, 11, 12, 13, 15, 16, 20, 23, 26, 29, 32, 34], "amp": [4, 6, 10, 23, 26, 34], "autocast": [4, 6, 7, 10, 23, 34], "prototyp": [4, 13, 20, 26, 34], "fast": [4, 12, 33, 34], "bertmodelmodel": 4, "bertmodel": [4, 6, 11, 32], "from_pretrain": [4, 6, 11, 23, 29, 32], "uncas": [4, 6, 10, 11, 32, 34], "launch": [4, 6, 20, 32, 34], "autom": [4, 7, 8, 14, 31, 32, 34], "ipexrun": [4, 10, 31, 34], "lt": [4, 28, 30], "your_pytorch_script": [4, 31], "gt": [4, 14, 28, 33], "hypertun": [4, 34], "hyperparamet": [4, 7], "conf": [4, 13, 14, 31, 34], "your_conf_fil": [4, 34], "your_python_script": [4, 34], "default_static_qconfigprepared_model": 4, "anyplac": 4, "d": [4, 5, 6, 7, 8, 13, 26, 28, 34], "calibration_data_load": [4, 6, 13], "converted_model": [4, 6, 26, 34], "default_dynamic_qconfigprepared_model": 4, "tuned_model": [4, 16, 34], "eval_funct": 4, "convert_model": [4, 13, 15, 16], "thank": [5, 34], "interest": 5, "begin": 5, "intent": 5, "propos": [5, 7, 11, 16, 18, 21], "intend": 5, "shall": [5, 18, 33], "discuss": [5, 18, 33], "agre": 5, "plan": [5, 7, 10], "look": [5, 14, 16, 18], "ahead": 5, "outstand": 5, "pick": 5, "comment": [5, 14, 17, 22, 34], "particular": [5, 6, 8, 29, 34], "ask": 5, "pull": 5, "here": [5, 8, 10, 13, 16, 17, 18, 20, 26, 32, 33, 34], "uninstal": 5, "ll": [5, 32, 33], "know": 5, "fulli": [5, 15, 17, 21, 33, 34], "warn": [5, 6, 12, 31, 32, 34], "skip": [5, 6, 17, 18, 31], "few": [5, 7, 9, 13, 16, 18, 32, 34], "alwai": [5, 6, 7, 8, 18, 31, 33, 34], "loop": [5, 21, 29], "re": [5, 8, 32, 33], "feel": [5, 18, 34], "lazi": 5, "ye": 5, "clone": 5, "copi": [5, 17, 18], "cd": [5, 6], "rebas": [5, 34], "submodul": 5, "sync": [5, 20], "recurs": 5, "job": 5, "setup": [5, 6, 28, 34], "symlink": 5, "tree": [5, 6], "reinstal": [5, 26], "again": [5, 19, 32], "__init__": [5, 6, 8, 10, 16, 20, 26, 34], "repeatedli": 5, "interfac": [5, 6, 18, 26, 28], "pyi": 5, "non": [5, 8, 13, 18, 30, 32, 34], "cpp": [5, 6, 33], "cc": [5, 6, 17], "cu": 5, "h": [5, 6, 7, 16, 18, 26, 31, 32], "sure": [5, 14, 15, 32, 33], "until": [5, 20, 21, 33], "next": [5, 7, 34], "clean": 5, "cmake": [5, 6, 17, 34], "must": [5, 14, 17, 19], "maco": 5, "linux": [5, 6, 17, 30, 31, 33], "homebrew": 5, "brew": 5, "our": [5, 16, 19, 28, 33, 34], "error": [5, 6, 7, 10, 16, 18, 21, 22, 26, 34], "printf": 5, "stdio": 5, "nint": 5, "hello": 5, "world": [5, 7], "clang": 5, "simpl": [5, 7, 8, 11, 18, 33, 34], "binari": [5, 6, 7, 8, 17, 34], "folder": 5, "mani": [5, 14, 28, 31, 33, 34], "wai": [5, 10, 16, 18, 28, 34], "rm": 5, "rf": 5, "toplevel": 5, "over": [5, 7, 8, 9, 16, 18, 30, 31, 34], "made": [5, 34], "edit": [5, 26, 34], "repo": [5, 6, 7], "commit": 5, "ani": [5, 8, 10, 17, 18, 32, 34], "keep": [5, 12, 18, 21, 28, 32, 33, 34], "realli": 5, "untrack": 5, "deinit": 5, "f": [5, 6, 13, 16, 28, 34], "xdf": 5, "within": [5, 16, 21, 29, 33, 34], "experi": [5, 7, 10, 12, 16, 18, 26, 33, 34], "env_key1": 5, "env_val1": 5, "env_key2": 5, "env_val2": 5, "suit": 5, "locat": [5, 17, 34], "test_": 5, "individu": [5, 30], "filenam": 5, "repres": [5, 7, 21], "wish": [5, 7], "test_jit": 5, "narrow": 5, "down": [5, 32, 34], "testclassnam": 5, "testnam": 5, "let": [5, 10, 18, 19, 20, 21], "sai": 5, "test_sequenti": 5, "testjit": 5, "expecttest": 5, "hypothesi": 5, "mypi": 5, "depend": [5, 7, 17, 18, 25, 26, 33, 34], "conda": [5, 33], "offici": [5, 32, 33, 34], "unittest": 5, "substr": 5, 
"test_nn": 5, "v": 5, "testnn": 5, "test_bceloss": 5, "test_mseloss": 5, "keystrok": 5, "ci": 5, "quicklint": 5, "aren": 5, "setup_lint": 5, "target": [5, 6, 10, 13, 14, 17, 34], "makefil": 5, "complet": [5, 6, 14, 18, 29, 33], "tab": 5, "trail": [5, 21], "newlin": 5, "quick_check": 5, "flake8": 5, "cmakelint": 5, "tidi": 5, "changed_onli": 5, "written": [5, 6, 17], "framework": [5, 34], "runner": 5, "bin": [5, 6, 17, 31, 32], "gtest_filt": 5, "testsuit": 5, "maycontainalia": 5, "containeraliasingtest": 5, "test_alias_analysi": 5, "docstr": 5, "line": [5, 10, 13, 18, 31, 32, 33], "limit": [5, 8, 10, 20, 26, 32, 33, 34], "80": [5, 30, 31], "charact": 5, "fit": [5, 7, 33, 34], "jupyt": 5, "popup": 5, "prerequisit": [5, 6], "r": [5, 6, 7, 14, 23, 30, 32, 33], "txt": [5, 6, 32], "_build": 5, "rst": 5, "live": 5, "tutori": [5, 6, 15, 16, 34], "autofunct": 5, "autoclass": 5, "shorten": 5, "sphinx": 5, "produc": [5, 8], "miss": 5, "relat": [6, 13, 17, 31, 33, 34], "demonstr": [6, 18, 26, 32], "box": [6, 10, 33], "benefit": [6, 7, 8, 10, 20, 21, 28, 32, 33, 34], "against": 6, "below": [6, 8, 10, 14, 19, 20, 21, 22, 23, 26, 28, 31, 32, 33, 34], "criterion": [6, 8, 16, 22], "zero_grad": [6, 7, 16], "torchvis": [6, 10, 12, 13, 16, 18, 32, 34], "lr": [6, 7, 8, 16, 19], "001": [6, 8], "download": [6, 13, 16], "dataset": [6, 13, 16, 29, 30, 33, 34], "cifar10": [6, 13], "compos": [6, 13], "resiz": [6, 13], "224": [6, 8, 10, 12, 13, 30, 32, 34], "totensor": [6, 13, 16], "train_dataset": [6, 13], "root": [6, 13, 16, 17, 28], "train_load": [6, 8], "128": [6, 8, 10, 13, 20, 30, 34], "crossentropyloss": [6, 16], "momentum": [6, 10, 21], "9": [6, 7, 14, 17, 23, 25, 31, 32], "uncom": 6, "batch_idx": [6, 13], "enumer": [6, 13, 16, 29], "backward": [6, 7, 8, 16, 21, 33, 34], "print": [6, 11, 12, 13, 14, 16, 17, 23, 31], "model_state_dict": 6, "optimizer_state_dict": 6, "pth": 6, "finish": [6, 11, 12, 13, 16, 20], "noqa": [6, 11, 12, 13, 16, 23, 29], "f401": [6, 11, 12, 13, 16, 23, 29], "oneapi": [6, 33], "collect": [6, 32, 33, 34], "commun": [6, 28, 31, 32, 33, 34], "bind": [6, 7, 31, 32, 33, 34], "o": [6, 17, 23, 30], "dist": 6, "oneccl_bindings_for_pytorch": 6, "torch_ccl": 6, "master_addr": 6, "127": [6, 31, 34], "master_port": 6, "29500": [6, 31], "rank": [6, 31, 34], "pmi_rank": 6, "world_siz": [6, 29], "pmi_siz": [6, 29], "init_process_group": 6, "ccl": [6, 31, 34], "init_method": 6, "env": [6, 29], "dist_sampl": 6, "distributedsampl": 6, "sampler": 6, "distributeddataparallel": 6, "batch_id": 6, "destroy_process_group": 6, "nlp": [6, 7, 26, 30, 34], "resnet50_weight": [6, 12, 13], "rand": [6, 8, 12, 13, 20, 26, 34], "vocab_s": [6, 11, 32], "seq_length": [6, 11, 32], "randint": [6, 11, 32], "freez": [6, 8, 10, 13, 15, 16, 20, 23, 26, 32, 34], "check_trac": [6, 13, 32], "strict": [6, 32], "sinc": [6, 7, 18, 19, 20, 21, 26, 33, 34], "manual_se": [6, 11], "43": [6, 11, 31, 32], "12": [6, 10, 14, 17, 30, 31, 32], "instanti": 6, "qconfig_map": 6, "default_static_qconfig_map": 6, "own": [6, 15, 28], "qconfigmap": 6, "per_tensor_affin": [6, 15, 34], "quint8": [6, 15], "set_glob": 6, "traced_model": [6, 10, 13, 15, 16, 26, 34], "static_quantized_model": 6, "local": [6, 20, 28, 31, 32, 33], "default_dynamic_qconfig_map": 6, "placeholderobserv": [6, 15], "is_dynam": [6, 15], "dynamic_quantized_model": 6, "dedic": [6, 28, 34], "faster": [6, 7, 8, 30, 33], "variou": [6, 7, 14, 28, 33, 34], "38": [6, 11, 31, 32], "account": 6, "pretrain": [6, 32, 34], "login": 6, "argpars": [6, 23], "autoconfig": [6, 23], 
"automodelforcausallm": [6, 23, 29, 34], "autotoken": [6, 23], "parser": [6, 23], "argumentpars": [6, 23], "add_help": [6, 23], "add_argu": [6, 23], "choic": [6, 21, 23, 31], "choos": [6, 8, 20, 23, 31, 33, 34], "dinner": [6, 23], "greedi": [6, 23], "action": [6, 23], "store_tru": [6, 23], "parse_arg": [6, 23], "amp_en": [6, 23], "els": [6, 14, 17, 18, 23], "amp_dtyp": [6, 23], "getattr": [6, 23], "model_id": [6, 23], "125m": 6, "trust_remote_cod": [6, 23], "torch_dtyp": [6, 23], "low_cpu_mem_usag": [6, 23], "memory_format": [6, 7, 18, 23], "channels_last": [6, 7, 18, 23, 33, 34], "num_beam": [6, 23], "generate_kwarg": [6, 23], "do_sampl": [6, 23], "temperatur": [6, 23], "input_s": [6, 23], "return_tensor": [6, 23], "input_id": [6, 23], "inference_mod": [6, 23, 29], "gen_id": [6, 23], "max_new_token": [6, 23], "gen_text": [6, 23], "batch_decod": [6, 23], "skip_special_token": [6, 23], "input_tokens_length": [6, 23], "output_tokens_length": [6, 23], "total_new_token": [6, 23], "zip": [6, 23, 34], "flush": [6, 23], "typic": [6, 10, 28, 33, 34], "summari": [6, 34], "narg": 6, "neelnanda": 6, "pile": 6, "10k": 6, "meta": [6, 18, 28, 29], "7b": [6, 28, 30], "hf": [6, 28], "beam_idx_tmp": 6, "contigu": [6, 13, 18, 33, 34], "global_past_key_valu": 6, "num_attention_head": 6, "user_model": [6, 15], "num_hidden_lay": 6, "pad_val": 6, "pad_max": 6, "tokenize_funct": 6, "set_format": 6, "column": 6, "elif": 6, "collate_batch": 6, "position_ids_pad": 6, "input_ids_pad": 6, "last_ind": 6, "attention_mask_pad": 6, "append": [6, 7], "vstack": 6, "calib_dataset": [6, 29], "load_dataset": 6, "calib_evalu": 6, "shuffl": 6, "collate_fn": 6, "break": [6, 16, 34], "calibration_sampl": 6, "save_qconf_summari": [6, 15, 16, 29], "qconf_summari": [6, 15, 16, 29], "int8_qconfig": 6, "done": [6, 10, 16, 17, 26, 33, 34], "Will": [6, 18], "exit": [6, 31], "benchmark": [6, 26, 30, 31, 34], "lowp": 6, "fp16": [6, 17, 29], "unrel": 6, "lowp_mod": [6, 29], "fall": [6, 12], "back": [6, 12, 17, 18, 21, 26], "implicitli": 6, "determin": [6, 17, 21, 33], "woqweightdtyp": [6, 29], "weight_dtyp": [6, 29], "woqlowpmod": [6, 29], "get_weight_only_quant_qconfig_map": [6, 29], "known": [6, 10, 28], "practic": [6, 21, 24, 28, 33], "libtorch": [6, 34], "suppos": [6, 14, 33], "handl": [6, 18, 33], "servic": [6, 28, 30, 33], "regular": [6, 21], "unlik": 6, "app": [6, 34], "iostream": 6, "argc": 6, "const": [6, 17], "char": 6, "argv": 6, "catch": 6, "c10": [6, 17], "std": [6, 17, 19], "cerr": 6, "ivalu": 6, "push_back": 6, "cout": 6, "slice": [6, 18], "end": [6, 13, 20, 34], "endl": 6, "cmakelist": 6, "cmake_minimum_requir": 6, "version": [6, 7, 16, 17, 25, 26, 27, 32, 33, 34], "fatal_error": 6, "find_packag": 6, "add_execut": 6, "target_link_librari": 6, "torch_ipex_librari": 6, "set_properti": 6, "properti": [6, 32], "cxx_standard": 6, "17": [6, 30, 31, 32], "mkdir": 6, "build": [6, 28, 33, 34], "dcmake_prefix_path": 6, "libpytorch_path": 6, "had": [6, 33], "verifi": [6, 7], "ldd": 6, "workspac": 6, "identif": [6, 17], "gnu": [6, 17, 32], "xx": 6, "cxx": [6, 17], "abi": [6, 17, 34], "usr": [6, 17, 31, 32], "torchconfig": 6, "22": [6, 30, 31, 32], "kineto_librari": 6, "notfound": 6, "stack": [6, 8], "most": [6, 7, 13, 21, 28, 30, 32, 33, 34], "recent": [6, 7, 18], "append_torchlib_if_found": 6, "ipexconfig": 6, "84": [6, 30, 31, 33], "lib": [6, 31, 32], "libintel": [6, 34], "ext": [6, 34], "0x00007f3cf98e0000": 6, "libc10": 6, "0x00007f3cf985a000": 6, "0x00007f3cf70fc000": 6, "libtorch_cpu": 6, "0x00007f3ce16ac000": 6, "libdnnl_graph": 
6, "0x00007f3cde954000": 6, "former": 6, "zoo": [6, 30], "simpli": [6, 7, 26, 31], "overview": [7, 25, 29, 34], "three": [7, 16, 17], "claus": [7, 10, 19], "guidanc": 7, "intel_pytorch_extens": [7, 25, 26, 34], "10": [7, 14, 16, 17, 18, 21, 25, 26, 31, 32, 33], "correct": [7, 18, 25, 34], "speed": [7, 11, 19, 28, 33, 34], "happen": 7, "inductor": [7, 34], "level": [7, 10, 13, 16, 18, 20, 21, 26, 33, 34], "migrat": 7, "pattern": [7, 11, 18, 28, 34], "highli": [7, 23, 28, 33, 34], "adapt": 7, "nchw": [7, 33], "nhwc": [7, 33, 34], "could": [7, 13, 16, 18, 26, 32, 33, 34], "anymor": [7, 34], "aka": [7, 18], "cooper": [7, 30, 34], "lake": [7, 30, 34], "avx512": [7, 17, 18, 32, 34], "partial": 7, "upstream": [7, 18, 34], "land": [7, 34], "pr": [7, 18, 34], "being": [7, 33], "review": [7, 34], "instead": [7, 8, 14, 19, 20, 29, 30, 31, 32, 33, 34], "device_nam": [7, 8], "conduct": 7, "frequent": 7, "websit": 7, "registr": 7, "topologi": [7, 18, 19, 26, 30, 31, 33, 34], "roialign": [7, 34], "nm": [7, 34], "cnn": [7, 18, 26, 30, 33, 34], "frozenbatchnorm2d": 7, "num_featur": 7, "batchnorm2d": [7, 10, 26, 34], "statist": 7, "affin": [7, 10, 15, 20, 31, 32, 33], "w": [7, 16, 18, 21, 30, 32], "interact": [7, 34], "beyond": 7, "kind": 7, "gender": 7, "hobbi": 7, "between": [7, 8, 17, 20, 33, 34], "man": [7, 33], "plai": [7, 33], "footbal": 7, "b": [7, 8, 16, 28], "mergedembeddingbag": 7, "embedding_spec": 7, "embeddingspec": 7, "merg": [7, 34], "embeddingbag": [7, 26, 34], "At": [7, 17], "stage": [7, 10, 19, 20, 29, 33, 34], "spars": [7, 18, 34], "dens": [7, 18], "gradient": 7, "mergedembeddingbagwithsgd": 7, "emblist": 7, "modulist": 7, "emb1": 7, "emb2": 7, "emb3": 7, "emb_m": 7, "in1": 7, "in2": 7, "in3": 7, "in_m": 7, "emb": 7, "in_i": 7, "merged_emb": 7, "from_embeddingbag_list": 7, "minim": [7, 14, 17, 33], "heavi": 7, "big": [7, 18], "read": [7, 19], "futur": [7, 28, 34], "visit": [7, 33], "mergedembeddingbagwith": 7, "weight_decai": [7, 19], "grad": [7, 19], "creat": [7, 16, 20, 33, 34], "decai": 7, "to_bfloat16_train": 7, "merged_input": 7, "linearize_indices_and_offset": 7, "need_linearize_indices_and_offset": 7, "booltensor": 7, "becom": [7, 28, 33], "balanc": [7, 16, 22, 33], "embedingbag": 7, "often": 7, "categor": 7, "power": [7, 33, 34], "law": 7, "ag": 7, "video": 7, "game": 7, "19": [7, 30, 31, 32, 34], "29": [7, 31, 32], "row": 7, "write": [7, 17], "address": [7, 18, 31, 32, 33, 34], "conflict": [7, 17], "solv": [7, 19, 33], "togeth": [7, 14, 20, 33, 34], "immedi": 7, "right": [7, 21, 23, 28], "friendli": [7, 33], "gemm": [7, 18, 26, 28, 34], "aim": [7, 10, 16, 33], "math": 7, "wa": [7, 31, 32, 33, 34], "test": [7, 16, 17, 30, 34], "broad": [7, 9, 34], "toggl": 7, "switch": [7, 17, 31, 33, 34], "concern": 7, "footprint": [7, 21, 28, 34], "stick": 7, "splitsgd": [7, 21], "spawn": [7, 20], "subject": [7, 17, 20, 27, 34], "built": [7, 17, 20, 34], "deliv": [7, 28, 34], "separ": [7, 19, 27, 33], "smooth": 7, "ptq": 7, "tackl": 7, "problem": [7, 19, 26, 32, 33], "systemat": 7, "outlier": [7, 16], "commonli": [7, 28, 33, 34], "hopefulli": 7, "eas": [7, 18, 34], "small": [7, 19, 33, 34], "turn": [7, 34], "boolean": [7, 34], "off": [7, 8, 21, 28, 30, 34], "area": [7, 14], "extrem": [7, 14, 33], "situat": [7, 14], "huge": [7, 14, 33], "impract": [7, 14], "consum": [7, 14], "launcher": [7, 13, 31, 33, 34], "integr": [7, 18, 28, 33, 34], "conveni": [8, 34], "lower": [8, 17, 21, 28, 34], "becaus": [8, 17, 18, 21, 28, 33, 34], "lighter": 8, "smaller": [8, 17], "sacrif": 8, "trade": [8, 28, 30, 34], 
"slower": [8, 33, 34], "accur": 8, "primarili": [8, 34], "show": [8, 17, 21, 28, 29, 30, 31, 32, 33, 34], "simplenet": [8, 34], "super": [8, 10, 16, 20, 26, 34], "stride": [8, 10, 20, 34], "pad": [8, 10, 20, 34], "y": [8, 15, 16, 20, 21, 34], "chosen": [8, 14, 17], "maintain": 8, "categori": [8, 34], "circumst": 8, "imag": [8, 13, 18, 33, 34], "label": 8, "float64": 8, "suppli": 8, "addmm": 8, "addmm_": 8, "cannot": [8, 19, 26, 34], "describ": [8, 13, 18, 21, 32, 33], "expos": 8, "namespac": [8, 17], "regardless": [8, 34], "unlist": 8, "downstream": 8, "believ": [8, 18], "unstabl": 8, "conv1d": [8, 13], "conv3d": [8, 13, 34], "conv_transpose1d": 8, "conv_transpose2d": 8, "conv_transpose3d": 8, "bmm": [8, 34], "mm": 8, "baddbmm": 8, "addbmm": 8, "conv_tbc": 8, "group_norm": 8, "_native_multi_head_attent": 8, "avg_pool3d": 8, "binary_cross_entropi": 8, "grid_sampl": 8, "polar": 8, "prod": 8, "quantil": 8, "nanquantil": 8, "stft": 8, "cdist": 8, "view_as_complex": 8, "choleski": 8, "cholesky_invers": 8, "cholesky_solv": 8, "invers": 8, "lu_solv": 8, "matrix_rank": 8, "orgqr": 8, "ormqr": 8, "pinvers": 8, "max_unpool2d": 8, "max_unpool3d": 8, "adaptive_avg_pool3d": 8, "reflection_pad1d": 8, "reflection_pad2d": 8, "replication_pad1d": 8, "replication_pad2d": 8, "replication_pad3d": 8, "mse_loss": 8, "cosine_embedding_loss": 8, "nll_loss": 8, "nll_loss2d": 8, "hinge_embedding_loss": 8, "poisson_nll_loss": 8, "smooth_l1_loss": 8, "cross_entropy_loss": 8, "l1_loss": 8, "huber_loss": 8, "margin_ranking_loss": 8, "soft_margin_loss": 8, "triplet_margin_loss": 8, "multi_margin_loss": 8, "ctc_loss": 8, "kl_div": 8, "multilabel_margin_loss": 8, "binary_cross_entropy_with_logit": 8, "fft_fft": 8, "fft_ifft": 8, "fft_fft2": 8, "fft_ifft2": 8, "fft_fftn": 8, "fft_ifftn": 8, "fft_rfft": 8, "fft_irfft": 8, "fft_rfft2": 8, "fft_irfft2": 8, "fft_rfftn": 8, "fft_irfftn": 8, "fft_hfft": 8, "fft_ihfft": 8, "linalg_cond": 8, "linalg_matrix_rank": 8, "linalg_solv": 8, "linalg_choleski": 8, "linalg_svdv": 8, "linalg_eigv": 8, "linalg_eigvalsh": 8, "linalg_inv": 8, "linalg_householder_product": 8, "linalg_tensorinv": 8, "linalg_tensorsolv": 8, "fake_quantize_per_tensor_affin": 8, "eig": 8, "geqrf": 8, "lstsq": 8, "_lu_with_info": 8, "qr": 8, "svd": 8, "symeig": 8, "triangular_solv": 8, "fractional_max_pool2d": 8, "fractional_max_pool3d": 8, "adaptive_max_pool3d": 8, "multilabel_margin_loss_forward": 8, "linalg_qr": 8, "linalg_cholesky_ex": 8, "linalg_svd": 8, "linalg_eig": 8, "linalg_eigh": 8, "linalg_lstsq": 8, "linalg_inv_ex": 8, "cat": [8, 31, 32, 34], "index_copi": 8, "intervent": 8, "mixtur": [8, 34], "enable_auto_channels_last": 9, "disable_auto_channels_last": 9, "regress": [9, 34], "rais": 10, "oob": [10, 34], "easili": [10, 15], "who": 10, "inevit": 10, "simplifi": [10, 34], "snippet": [10, 29], "optimum": 10, "monkei": 10, "patch": [10, 34], "embedding_bag": 10, "qa": [10, 34], "clear": 10, "ninstanc": [10, 14, 31, 34], "ncore": [10, 31], "28": [10, 14, 16, 30, 31, 32, 33, 34], "run_qa": [10, 34], "model_name_or_path": [10, 29, 34], "dataset_nam": [10, 34], "squad": [10, 30, 34], "do_ev": [10, 34], "per_device_train_batch_s": [10, 34], "learning_r": [10, 34], "3e": [10, 34], "num_train_epoch": [10, 34], "max_seq_length": [10, 34], "384": [10, 32, 34], "doc_strid": [10, 34], "output_dir": [10, 14, 34], "tmp": [10, 32, 34], "debug_squad": [10, 34], "dummymodul": 10, "input1": 10, "kernel_s": 10, "7": [10, 14, 17, 20, 21, 31, 32, 34], "track_running_stat": 10, "customized_forward": 10, "method1": 10, 
"success": [10, 24], "method2": 10, "fail": [10, 26, 34], "top": [10, 21, 34], "unabl": 10, "hook": [10, 16], "As": [10, 19, 20, 28, 31, 32, 33, 34], "behaviour": 10, "repeat": [10, 18, 21], "feasibl": 10, "idea": [11, 21, 33], "primit": [11, 20, 30, 34], "portabl": 11, "hpc": 11, "ensur": [11, 19, 20, 32], "perf": [11, 18], "tri": 12, "failur": [12, 34], "incorrect": [12, 26, 34], "trigger": 12, "meanwhil": [12, 33, 34], "resnet50": [12, 13, 14, 18, 30, 31, 33, 34], "dag": 13, "acycl": 13, "straight": [13, 33], "cover": [13, 18, 31], "constant": 13, "resourc": [13, 20, 28, 32, 33], "focus": [13, 34], "front": [13, 34], "batchnorm": [13, 17, 18, 26, 34], "propag": [13, 21, 33], "graph_for": 13, "regard": 13, "rn50": [13, 34], "sum": [13, 16, 18, 19, 34], "convrelu": 13, "convsumrelu": 13, "default_static_qconfig": [13, 15, 32, 34], "quantized_model": [13, 15, 34], "244": 13, "convtranspose3d": 13, "ab": [13, 32], "clamp": 13, "elu": 13, "exp": 13, "hardtanh": 13, "hardswish": [13, 34], "mish": 13, "sigmoid": [13, 34], "pow": 13, "round": [13, 21], "squar": [13, 28], "tanh": [13, 34], "leaki": 13, "_": [13, 15, 16, 17, 18, 20, 30, 31, 32, 33, 34], "div": 13, "view": [13, 18, 20, 21], "transpos": [13, 34], "dequant": [13, 16], "partit": [13, 33], "leaky_relu": 13, "___": 13, "divid": [13, 32, 33, 34], "maxpool2d": 13, "_____": 13, "stock": [13, 30, 34], "owner": 13, "otheriws": 13, "compuat": 13, "wikipedia": [13, 33], "There": [14, 16, 20, 33, 34], "thing": [14, 33], "yaml": 14, "strategi": [14, 33, 34], "grid": 14, "random": 14, "max_trial": 14, "trial": 14, "record": [14, 32], "csv": 14, "hyperparam": 14, "mandatori": 14, "hp": 14, "ncores_per_inst": 14, "all_physical_cor": 14, "ncore_per_inst": [14, 34], "all_logical_cor": 14, "use_all_nod": 14, "num_nod": 14, "use_logical_cor": [14, 32], "is_hyperthreading_en": 14, "disable_numactl": [14, 32], "disable_iomp": [14, 32], "malloc": [14, 31, 33], "tc": 14, "je": 14, "previou": [14, 16, 18, 33, 34], "hyperparamt": 14, "8": [14, 16, 30, 31, 32, 33], "respect": [14, 16, 30, 31, 34], "maxim": 14, "statement": [14, 17], "higher_is_bett": 14, "target_v": 14, "inf": 14, "minimum": [14, 16, 18], "platinum": [14, 30, 32, 33], "8180m": [14, 33], "socket": [14, 30, 32, 33, 34], "anoth": [14, 31, 33, 34], "conf_fil": [14, 34], "hypertune_directori": 14, "termin": 14, "15": [14, 17, 30, 31, 32], "339081764221191": 14, "gave": 14, "side": [15, 33], "compon": [15, 26, 27, 28], "much": [15, 18, 21, 28, 33], "abl": 15, "similar": [15, 17, 33], "satisfi": [15, 26], "tradeoff": 15, "reduce_rang": 15, "methond": 15, "obsev": 15, "symmetr": 15, "sete": 15, "skylak": 15, "quant_stat": 15, "calibration_data_set": [15, 34], "qparam": 15, "And": [15, 20, 32, 34], "achang": 15, "overrid": 15, "load_qconf_summari": 15, "dynamic_qconfig": 15, "default_dynamic_qconfig": [15, 32], "per_tensor_symmetr": 15, "gru": 15, "lstmcell": 15, "rnncell": 15, "grucel": 15, "bother": 16, "desir": [16, 31], "receip": [16, 20], "sq": 16, "difficulti": 16, "vari": 16, "across": [16, 31], "herebi": 16, "obtain": 16, "abil": 16, "optdecoderlay": 16, "blockwis": 16, "consist": [16, 28, 33, 34], "major": 16, "adjust": 16, "accordingli": 16, "predict": 16, "criteria": 16, "consider": 16, "numpi": 16, "np": [16, 31], "tolist": 16, "auto_alpha_arg": 16, "init_alpha": [16, 22], "baselin": [16, 22, 34], "alpha_min": [16, 22], "alpha_max": [16, 22], "99": [16, 30, 34], "alpha_step": [16, 22], "step_siz": [16, 22], "shared_criterion": [16, 22], "enable_blockwise_loss": [16, 22], "portion": 16, 
"beginn": 16, "quickstart_tutori": 16, "training_data": 16, "fashionmnist": 16, "test_data": 16, "loader": 16, "train_dataload": 16, "test_dataload": 16, "neuralnetwork": 16, "flatten": [16, 20], "linear_relu_stack": 16, "sequenti": 16, "logit": 16, "loss_fn": 16, "pred": 16, "backpropag": 16, "item": 16, "7f": 16, "5d": 16, "epoch": 16, "argmax": 16, "inc": [16, 17, 22, 28], "accu": 16, "tuned_conf": 16, "explain": [17, 18, 21], "fork": [17, 33], "avx512_vnni": 17, "avx512_bf16": 17, "avx2": [17, 26, 34], "avx2_vnni": 17, "avx512_fp16": 17, "11": [17, 31, 32], "gcc": 17, "findavx": 17, "bodi": 17, "anonym": 17, "virtual": 17, "polymorph": 17, "pertain": 17, "cpuid": 17, "statu": 17, "pointer": 17, "system": [17, 33], "specifii": 17, "complier": 17, "isacodegen": 17, "suffix": 17, "adaptiveaveragepoolingkrnl": 17, "isa_codegen": 17, "o3": 17, "d__avx__": 17, "dcpu_capability_avx2": 17, "mavx2": 17, "mfma": 17, "mno": 17, "avx256": 17, "unalign": [17, 34], "dcpu_cap": 17, "dcpu_capability_default": 17, "d__avx512f__": 17, "mavx512f": 17, "mavx512bw": 17, "mavx512vl": 17, "mavx512dq": 17, "dcpu_capability_avx512": 17, "mavx512vnni": 17, "dcpu_capability_avx512_vnni": 17, "mavx512bf16": 17, "dcpu_capability_avx512_bf16": 17, "mamx": 17, "tile": 17, "dcpu_capability_amx": 17, "mavx512fp16": 17, "dcpu_capability_avx512_fp16": 17, "align": [17, 18, 21, 34], "stead": 17, "sleef": 17, "width": [17, 18], "isa_nam": 17, "inlin": 17, "compat": [17, 21], "definit": [17, 21], "Such": 17, "But": [17, 18], "tip": 17, "newkernelkrnl": 17, "newkernel": 17, "header": 17, "special": [17, 18, 28], "fastest": 17, "cpuinfo": 17, "mykernel": 17, "fn_type": 17, "void": 17, "ipex_declare_dispatch": 17, "ipex_define_dispatch": 17, "ipex_register_dispatch": 17, "kcpu": 17, "declar": 17, "ideep": [17, 18], "common": [17, 21, 28, 31, 33], "intrins": 17, "cvtfp32tobf16": 17, "pragma": 17, "torch_ipex": [17, 34], "cvt_fp32_to_bf16": 17, "dst": 17, "cvt_fp32_to_bf16_kernel_impl": 17, "cvt_fp32_to_bf16_kernel_fn": 17, "cvt_fp32_to_bf16_kernel_stub": 17, "macro": 17, "cpu_capability_avx512": 17, "cpu_capability_avx512_bf16": 17, "hav": 17, "cvtfp32tobf16krnl": 17, "vec512": 17, "vec256": 17, "endif": 17, "immintrin": 17, "__m256i": 17, "_cvt_fp32_to_bf16": 17, "__m512": 17, "reinterpret_cast": 17, "_mm512_cvtneps_pbh": 17, "__m512i": 17, "_mm512_castps_si512": 17, "nan": [17, 34], "_mm512_set1_epi32": 17, "0xffff": 17, "mask_valu": 17, "_mm512_cmp_ps_mask": 17, "_cmp_ord_q": 17, "0x1": 17, "vec_bia": 17, "0x7fff": 17, "uint32_t": 17, "lsb": 17, "t_valu": 17, "_mm512_and_si512": 17, "_mm512_srli_epi32": 17, "rounding_bia": 17, "_mm512_add_epi32": 17, "_mm512_mask_blend_epi32": 17, "_mm512_cvtusepi32_epi16": 17, "f32": [17, 18], "_mm512_loadu_p": 17, "_mm256_storeu_si256": 17, "_mm512_maskz_loadu_p": 17, "_mm256_mask_storeu_epi16": 17, "getveclength": 17, "get_cpp_typesize_and_vecs": 17, "scalartyp": 17, "get_cpp_typesize_and_vecsize_kernel_impl": 17, "get_cpp_typesize_and_vecsize_kernel_fn": 17, "get_cpp_typesize_and_vecsize_kernel_stub": 17, "types": 17, "vectors": 17, "getveclengthkrnl": 17, "doubl": 17, "make_tupl": 17, "sizeof": 17, "complexdoubl": 17, "complex": 17, "complexfloat": 17, "decltyp": 17, "impl": 17, "scalartypetocpptyp": 17, "torch_check": 17, "09": [17, 31], "58": [17, 31], "anaconda": 17, "copyright": [17, 27], "credit": 17, "licens": 17, "_c": [17, 26], "_get_current_isa_level": 17, "_get_highest_cpu_support_isa_level": 17, "_get_highest_binary_support_isa_level": 17, "quit": [17, 34], "By": [17, 31, 
33], "aten_cpu_cap": 17, "effect": [17, 21, 26, 32, 33], "intern": [17, 18, 20, 32], "purpos": [17, 31, 32, 33], "addtion": 17, "tool": [17, 33, 34], "subfold": 17, "rh": 17, "toolset": 17, "33": [17, 31, 32], "cmakefil": 17, "cpu_featur": 17, "dir": [17, 31], "66": [17, 31, 34], "cpu_feature_main": 17, "xcr0": 17, "00000000000602e7": 17, "mmx": 17, "sse": 17, "sse2": 17, "sse3": 17, "ssse3": 17, "sse4_1": 17, "sse4_2": 17, "aes_ni": 17, "sha": 17, "xsave": 17, "fma": 17, "f16c": 17, "avx_vnni": 17, "avx512_f": 17, "avx512_cd": 17, "avx512_pf": 17, "avx512_er": 17, "avx512_vl": 17, "avx512_bw": 17, "avx512_dq": 17, "avx512_ifma": 17, "avx512_vbmi": 17, "avx512_vpopcntdq": 17, "avx512_4fmap": 17, "avx512_4vnniw": 17, "avx512_vbmi2": 17, "avx512_vpclmul": 17, "avx512_bitalg": 17, "avx512_vp2intersect": 17, "amx_bf16": 17, "amx_til": 17, "amx_int8": 17, "prefetchw": 17, "prefetchwt1": 17, "represent": 18, "multidimension": 18, "arrai": 18, "nd": 18, "1d": 18, "semant": 18, "attribut": 18, "coo": 18, "canon": 18, "assign": [18, 32, 33], "2d": 18, "height": 18, "illustr": [18, 19, 21, 31, 33], "actual": [18, 21], "bmp": 18, "contiguous_format": [18, 33], "tensorflow": 18, "close": [18, 31, 33], "to_mkldnn": 18, "difficult": 18, "manipul": 18, "to_dens": 18, "natur": [18, 21, 28], "hold": [18, 33], "secret": 18, "ingredi": 18, "almost": 18, "foundat": [18, 33], "upper": [18, 33], "fact": [18, 33], "expens": 18, "benefici": 18, "nb": 18, "me": 18, "roughli": 18, "50": [18, 31, 32], "mkldnn": 18, "mkldnn_util": 18, "subsequ": [18, 33], "concept": [18, 33], "diagram": [18, 33], "hard": [18, 26], "conclus": 18, "necessari": 18, "neglig": 18, "move": [18, 33], "organ": 18, "question": [18, 30], "reinterpret": 18, "answer": [18, 30], "chw": 18, "hw": 18, "stride_n": 18, "stride_c": 18, "stride_h": 18, "stride_w": 18, "merit": 18, "express": [18, 34], "noncontigu": 18, "n1": 18, "n2": 18, "mind": [18, 32], "someth": 18, "reli": [18, 20], "rfc": 18, "hwc": 18, "wc": 18, "chwn": 18, "hwn": 18, "wn": 18, "empti": [18, 31], "outplac": [18, 34], "is_contigu": 18, "_appli": 18, "brief": [18, 28, 34], "imagenet": [18, 30], "spontan": 18, "tell": [18, 20, 33], "NOT": [18, 31], "compris": 18, "explicit": [18, 20, 33], "implicit": 18, "tensoriter": 18, "guidelin": 18, "awar": [18, 20, 31, 32], "my": 18, "upsampl": [18, 34], "cudnn": 18, "accommod": 18, "md": 18, "format_tag": 18, "src_md": 18, "desc": 18, "data_typ": 18, "src_mem": 18, "src_data_ptr": 18, "card": 18, "hwio": 18, "resnext101": [18, 34], "detectron2": 18, "8x": 18, "lamb": [19, 21], "adagrad": [19, 21], "clr": 19, "lr_decai": 19, "state_sum": 19, "addcmul_": 19, "add_": 19, "addcdiv_": 19, "whole": [19, 20, 33], "storag": 19, "onboard": [19, 33], "third": [19, 34], "high": [19, 21, 33], "bound": [19, 20, 28, 33], "bottl": 19, "neck": 19, "prevent": 19, "pseudo": [19, 21, 34], "adagrad_fused_step": 19, "group": [19, 20, 33], "grad0": 19, "grad1": 19, "grad_n": 19, "param_n": 19, "state_sum_n": 19, "adagrad_step": 19, "grad_i": 19, "param_i": 19, "state_sum_i": 19, "other_arg": 19, "coupl": [20, 33, 34], "omp": [20, 26, 31, 32, 33, 34], "ld_preload": [20, 31, 32, 33], "libiomp5": [20, 31, 32, 33], "model_script": 20, "examplenet": 20, "examplenet1": 20, "x1": 20, "start_dim": 20, "examplenet2": 20, "conv2": 20, "x2": 20, "y1": 20, "y2": 20, "model1": 20, "traced_model1": 20, "model2": 20, "traced_model2": 20, "multi_stream_model": [20, 34], "datatyp": [20, 34], "receipt": 20, "steam": [20, 34], "input_hint": 20, "output_hint": 20, "pthread": 20, 
"async": [20, 34], "wake": 20, "synchron": [20, 26, 34], "imper": [20, 34], "suffer": 20, "gil": 20, "hurt": 20, "mitig": [20, 30], "omp_num_thread": [20, 26, 31, 32, 34], "phase": 20, "s1": 20, "c1": 20, "numactl": [20, 31, 32], "outsid": 20, "superset": 20, "undefin": [20, 33], "gb": 20, "simultan": 20, "correspond": [20, 31, 34], "cpu_pool1": 20, "cpu_pool2": 20, "task1": 20, "task2": 20, "y1_futur": 20, "y2_futur": 20, "y_runtim": 20, "kmp_": 20, "fulfil": 20, "worker": [20, 31], "serv": [20, 34], "sub": [20, 28, 33], "wait": [20, 33], "futuretensor": 20, "didn": 20, "dlopen": 20, "symbol": 20, "bottom": 21, "bit": [21, 28], "sign": 21, "expon": 21, "mantissa": 21, "23": [21, 31, 32], "capac": [21, 30], "digit": 21, "shorter": [21, 28], "fewer": 21, "neg": 21, "disadvantag": 21, "shift": 21, "left": [21, 28, 32], "lose": 21, "decim": 21, "valid": [21, 34], "1234500000": 21, "0000012345": 21, "1234512345": 21, "sens": 21, "fraction": 21, "12345": 21, "00000": 21, "signific": 21, "bui": 21, "involv": 21, "ground": 21, "truth": 21, "chain": 21, "rule": [21, 34], "meet": [21, 33, 34], "wide": [21, 34], "understand": [21, 28, 33], "formula": 21, "\u03b1": 21, "gw": 21, "denot": 21, "receiv": 21, "rate": 21, "earlier": 21, "inaccur": 21, "exactli": 21, "kept": 21, "halv": 21, "recov": 21, "fp32_w": 21, "concat_fp32_from_bf16": 21, "bf16_w": 21, "fp32_gw": 21, "bf16_gw": 21, "weight_dacai": 21, "split_bf16_from_fp32": 21, "ratio": [22, 30, 34], "beta": [23, 26], "demostr": 23, "cheat": 23, "sheet": 23, "pypi": [26, 34], "occupi": 26, "remark": [26, 30, 33], "__name__": [26, 34], "__main__": [26, 31, 32, 34], "112": [26, 30, 33, 34], "nnc": 26, "poor": [26, 34], "xlm": 26, "roberta": [26, 34], "casual": 26, "gpt2": 26, "summar": 26, "classif": [26, 30], "allenai": 26, "longform": 26, "409": 26, "workaround": [26, 34], "_jit_set_texpr_fuser_en": 26, "csrc": 26, "tensorexpr_fus": 26, "settensorexprfuseren": 26, "longer": [26, 30], "complic": [26, 31, 33], "undergo": [26, 29], "runtimeerror": [26, 34], "overflow": [26, 34], "unpack": [26, 34], "exce": [26, 30, 33, 34], "quantize_per_tensor": 26, "pseudocod": [26, 34], "omp_num_threa": 26, "set_num_thread": [26, 34], "freezed_model": [26, 34], "run_benchmark": [26, 34], "flow": 26, "bag": [26, 34], "progress": [26, 28, 34], "abnorm": [26, 34], "tbd": 26, "transformerencoderlay": 26, "encount": [26, 34], "rnnt": [26, 34], "joint_net": [26, 34], "caller": [26, 34], "apach": [27, 32], "notic": [27, 31, 32], "term": 27, "condit": 27, "multiheadattent": 28, "feedforward": 28, "lot": [28, 34], "besid": [28, 33, 34], "adopt": [28, 34], "modelfamili": 28, "hub": 28, "staticquantizationint8": 28, "onlyquantizationint8": 28, "onlyquantizationint4": 28, "13b": [28, 30, 34], "70b": [28, 34], "8b": 28, "20b": 28, "dolli": [28, 34], "databrick": 28, "v2": [28, 30, 34], "12b": 28, "tiiuae": 28, "40b": 28, "30b": 28, "3b": 28, "bigscienc": 28, "1b7": 28, "salesforc": 28, "2b": 28, "baichuan2": [28, 34], "chat": 28, "thudm": 28, "chatglm3": [28, 34], "chatglm2": [28, 34], "bigcod": 28, "starcod": [28, 34], "flan": 28, "xl": 28, "mosaicml": 28, "mistralai": 28, "v0": 28, "8x7b": 28, "stabilityai": 28, "1_6b": 28, "liuhaotian": 28, "v1": [28, 34], "microsoft": 28, "ieityuan": 28, "yuan2": 28, "102b": 28, "signifi": 28, "perfect": 28, "codellama": 28, "rope": 28, "past": 28, "year": 28, "flourish": 28, "contribut": [28, 31, 34], "research": 28, "web": 28, "legend": 28, "autotp": 28, "obviou": 28, "hotspot": 28, "lead": 28, "significantli": [28, 34], "heavier": 28, 
"io": 28, "occurr": 28, "ship": 28, "2nd": 28, "4th": [28, 30], "except": [28, 31], "beeter": 28, "Its": 28, "seen": 28, "woq": 28, "integ": [28, 33], "bandwidth": 28, "reorder_cach": 28, "beam_width": 28, "secondli": 28, "elimin": 28, "shard": 28, "content": [29, 34], "your_calibration_dataset": 29, "calib_sampl": 29, "calibration_model": 29, "qconfig_summary_file_path": 29, "nf4": 29, "init_distribut": 29, "get_acceler": 29, "communication_backend_nam": 29, "var": 29, "ondevic": 29, "init_infer": 29, "mp_size": 29, "base_dir": 29, "repo_root": 29, "checkpoints_json": 29, "zone": [30, 34], "articl": [30, 33], "llama2": [30, 34], "1024": [30, 33], "were": [30, 31, 32, 33], "carri": 30, "m7i": 30, "m6i": [30, 32], "47x": 30, "62x": 30, "57x": 30, "58x": 30, "85x": 30, "27x": 30, "38x": 30, "29x": 30, "36x": 30, "conclud": [30, 34], "respons": 30, "session": 30, "exhibit": 30, "wherea": 30, "p90": 30, "26x": 30, "sec": 30, "39": [30, 31, 32, 34], "26": [30, 31, 32], "49": [30, 31, 32], "170": 30, "21": [30, 31, 32], "measur": [30, 34], "17th": 30, "16xlarg": 30, "u": [30, 32], "west": 30, "ubuntu": 30, "04": [30, 31], "1009": 30, "sw": 30, "workload1": 30, "inference2": 30, "realtim": 30, "inference3": 30, "tunabl": [30, 32], "8380": 30, "30ghz": 30, "83x": 30, "44x": 30, "ssd": [30, 34], "resnet34": [30, 34], "16x": 30, "coco": 30, "1200": 30, "resnext": 30, "32x16d": 30, "81x": 30, "21x": 30, "vgg": 30, "75x": 30, "19x": 30, "shufflenetv2_x1": 30, "07x": 30, "78x": 30, "04x": 30, "max_seq_len": 30, "384task": 30, "jemalloc": [30, 32, 34], "05x": 30, "96x": 30, "mrpc": 30, "128task": 30, "distilbert": 30, "12x": 30, "dnnl": 30, "base_text_classif": 30, "f1": 30, "81": [30, 31], "79": [30, 31], "93": 30, "02": [30, 32], "85": [30, 31], "86": [30, 31], "top1": 30, "76": [30, 31], "75": [30, 31], "98": 30, "78": [30, 31], "199": 30, "48": [30, 31, 32], "vgg11": 30, "69": [30, 31], "67": [30, 31, 34], "96": 30, "44": [30, 31, 32], "36": [30, 31, 32], "92": 30, "97": 30, "shufflenet": 30, "histogram": [30, 34], "40": [30, 31, 32, 34], "ucod": 30, "0xd0002a0": 30, "ON": 30, "turboboost": 30, "bio": 30, "ddr": 30, "16gb": 30, "3200": 30, "dcpmm": 30, "256gb": 30, "host": [30, 34], "cento": 30, "2105": 30, "18": [30, 31, 32], "305": 30, "el8_4": 30, "x86_64": 30, "docker": [30, 34], "spectr": 30, "meltdown": 30, "24x": 30, "31x": 30, "15x": 30, "30x": 30, "mobilenet": 30, "08x": 30, "03x": 30, "09x": 30, "39x": 30, "35x": 30, "160": 30, "55x": 30, "06x": 30, "fpn": 30, "71x": 30, "20x": 30, "13x": 30, "32x": 30, "48x": 30, "11x": 30, "terabyt": 30, "14x": 30, "02x": 30, "10x": 30, "33x": 30, "8380h": 30, "90ghz": 30, "56": [30, 31, 32, 33], "67x": 30, "45x": 30, "77x": 30, "18x": 30, "formerli": [30, 33, 34], "0x700001c": 30, "wlydcrb1": 30, "sy": 30, "0016": 30, "p29": 30, "2006080250": 30, "64gb": 30, "768gb": 30, "influenc": [31, 33], "properli": 31, "themselv": [31, 34], "free": [31, 34], "mainli": [31, 34], "around": 31, "interpret": 31, "prefix": 31, "cross": [31, 32, 33, 34], "taskset": 31, "malloc_conf": [31, 33], "crash": [31, 33, 34], "nnode": 31, "nproc": 31, "count": 31, "addr": 31, "ip": 31, "hostnam": 31, "proc": 31, "port": 31, "hostfil": 31, "mpi": 31, "mpiexec": 31, "hydra": 31, "ppn": 31, "genv": 31, "i_mpi_pin_domain": 31, "codeless": 31, "ut": 31, "exclus": 31, "mutual": 31, "ld": 31, "favorit": 31, "kmp": [31, 33], "granular": [31, 32, 33], "compact": [31, 32, 33], "stdout": 31, "afterward": [31, 33], "undesir": 31, "_timestamp_inst": 31, "_timestamp_instance_": 31, "_core": 
31, "run_20210712212258_inst": 31, "run_20210712212258_instance_0_cores_0": 31, "gif": 31, "07": 31, "764": 31, "conda_prefix": [31, 32], "virtual_env": [31, 32], "lib64": [31, 32], "home": [31, 32], "drop": [31, 32], "kmp_affin": [31, 32, 33], "kmp_blocktim": [31, 32, 33], "14": [31, 32, 34], "24": [31, 32], "25": [31, 32], "27": [31, 32, 33], "30": [31, 32], "31": [31, 32], "34": [31, 32], "35": [31, 32], "37": [31, 32, 34], "41": [31, 32], "42": [31, 32], "tee": 31, "run_20210712223308_inst": 31, "run_20210712223308_instance_0_cores_0": 31, "87": 31, "08": 31, "117": 31, "88": 31, "118": 31, "45": [31, 32], "46": [31, 32], "47": [31, 32], "51": [31, 32], "52": [31, 32], "53": [31, 32], "54": [31, 32], "55": [31, 32, 33], "57": 31, "59": 31, "60": 31, "61": 31, "62": 31, "63": [31, 34], "65": 31, "68": [31, 34], "70": 31, "71": 31, "72": 31, "73": 31, "74": 31, "77": 31, "82": 31, "83": [31, 33], "run_20210712214504_inst": 31, "run_20210712214504_instance_0_cores_22": 31, "513": 31, "run_20210712220928_inst": 31, "run_20210712220928_instance_0_cores_0": 31, "355": 31, "356": 31, "deduct": 31, "run_20210712221615_inst": 31, "run_20210712221615_instance_0_cores_11": 31, "591": 31, "run_20210712221150_inst": 31, "run_20210712221150_instance_0_cores_0": 31, "run_20210712221150_instance_1_cores_22": 31, "233": 31, "236": 31, "run_20210712221415_inst": 31, "run_20210712221415_instance_0_cores_0": 31, "run_20210712221415_instance_1_cores_4": 31, "run_20210712221415_instance_2_cores_8": 31, "run_20210712221415_instance_3_cores_12": 31, "run_20210712221415_instance_4_cores_16": 31, "run_20210712221415_instance_5_cores_20": 31, "run_20210712221415_instance_6_cores_24": 31, "run_20210712221415_instance_7_cores_28": 31, "run_20210712221415_instance_8_cores_32": 31, "run_20210712221415_instance_9_cores_36": 31, "run_20210712221415_instance_10_cores_40": 31, "140": 31, "143": 31, "146": 31, "149": 31, "151": 31, "154": 31, "157": 31, "159": 31, "162": 31, "164": 31, "167": 31, "run_20210712221305_inst": 31, "run_20210712221305_instance_0_cores_0": 31, "run_20210712221305_instance_1_cores_11": 31, "run_20210712221305_instance_2_cores_22": 31, "run_20210712221305_instance_3_cores_33": 31, "470": 31, "471": 31, "473": 31, "476": 31, "479": 31, "instance_idx": 31, "independ": 31, "confirm": 31, "175": 31, "176": 31, "177": 31, "run_20220106130151_instance_0_cores_0": 31, "sometim": [31, 33], "235": 31, "jemallocl": 31, "oversize_threshold": [31, 33], "background_thread": [31, 33], "metadata_thp": [31, 33], "dirty_decay_m": [31, 33], "9000000000": [31, 33], "muzzy_decay_m": [31, 33], "libjemalloc": 31, "run_20210713153048_instance_0_cores_0": 31, "654": 31, "libtcmalloc": [31, 32], "655": 31, "run_20210713153333_instance_0_cores_0": 31, "784": 31, "run_20210713153659_instance_0_cores_0": 31, "blocktim": 31, "00": [31, 34], "760": [31, 32], "761": [31, 32], "omp_schedul": [31, 33], "omp_proc_bind": [31, 33], "run_20210713152500_instance_0_cores_0": 31, "give": [32, 34], "ipex_en": 32, "procedur": 32, "tunin": 32, "dramat": [32, 33], "cpu_launcher_en": 32, "cpu_launcher_arg": 32, "hyperthread": 32, "present": 32, "ital": 32, "ptmalloc": 32, "use_default_alloc": [32, 34], "tcmalloc": 32, "enable_tcmalloc": 32, "enable_jemalloc": 32, "nth": [32, 33], "uniform": 32, "overlap": 32, "signficantli": 32, "8180": 32, "affinit": 32, "addition": 32, "kill": 32, "unutil": 32, "restart": 32, "remain": 32, "aliv": 32, "taken": 32, "care": 32, "worri": 32, "continu": [32, 34], "Then": 32, "interrupt": 32, "dummi": 32, 
"dummy_tensor": 32, "scheme": 32, "bert_int8_jit": 32, "n_iter": 32, "rn50_int8_jit": 32, "usus": 32, "rn50_ipex_int8": 32, "handler": 32, "image_classifi": 32, "similarli": 32, "bert_ipex_int8": 32, "transformer_handler_gener": 32, "setup_config": 32, "seq_classification_artifact": 32, "index_to_nam": 32, "nc": 32, "model_stor": 32, "server": [32, 33], "rest": 32, "model_log": 32, "096": 32, "8375c": 32, "03": 32, "981": 32, "982": 32, "previous": 32, "cases": 32, "223": 32, "site": 32, "model_service_work": 32, "sock": 32, "unix": 32, "9000": 32, "762": 32, "763": 32, "9001": 32, "274": 32, "9002": 32, "975": 32, "9003": 32, "bench": 32, "amazon": 32, "ec2": 32, "24xlarg": 32, "reproduc": 32, "url": [32, 34], "modelurl": 32, "inputpath": 32, "concurr": [32, 33], "huggingface_transform": 32, "sample_text_captum_input": 32, "graphic": 33, "xe": 33, "briefli": 33, "background": 33, "knowledg": 33, "c620": 33, "seri": 33, "chipset": 33, "purlei": 33, "chip": 33, "inclus": 33, "1mb": 33, "l2": 33, "2666": 33, "mhz": 33, "ddr4": 33, "six": 33, "ultra": 33, "interconnect": 33, "upi": 33, "microarchitectur": 33, "connect": 33, "transfer": 33, "equip": 33, "motherboard": 33, "attach": 33, "remot": 33, "asu": 33, "z11pa": 33, "d8": 33, "competit": 33, "stall": 33, "busi": 33, "uma": 33, "lscpu": 33, "retriev": 33, "111": 33, "50ghz": 33, "node0": 33, "node1": 33, "sophist": 33, "brought": [33, 34], "polici": 33, "put": 33, "sysctl": 33, "great": 33, "placement": 33, "cpunodebind": 33, "membind": 33, "multithread": 33, "primari": 33, "consecut": 33, "join": 33, "libgomp": 33, "libiomp": 33, "hang": [33, 34], "gomp_cpu_affin": 33, "comma": 33, "invalid": 33, "thrash": 33, "did": [33, 34], "compet": 33, "unus": 33, "proclist": 33, "millisecond": 33, "sleep": 33, "200m": 33, "period": 33, "elaps": 33, "overal": 33, "appropri": 33, "reserv": 33, "sole": 33, "penal": 33, "role": 33, "unnecessari": 33, "destruct": 33, "emphas": 33, "fragment": 33, "mmuzzy_decay_m": 33, "forg": 33, "dealloc": 33, "costli": 33, "gpertool": 33, "plu": 33, "pretti": 33, "nifti": 33, "analysi": 33, "gperftool": 33, "set_flush_denorm": 33, "warm": 33, "therefor": 33, "threshold": 33, "usuali": 33, "come": 33, "maskrcnn": [33, 34], "wav2vec2": 33, "recognit": 33, "onednn_primitive_cache_capac": 33, "65536": 33, "voic": 33, "excit": 34, "announc": 34, "accompani": 34, "privat": 34, "broader": 34, "sincer": 34, "encourag": 34, "feedback": 34, "creator": 34, "reach": 34, "hf_beam_sampl": 34, "hf_beam_search": 34, "hf_greedy_search": 34, "hf_sampl": 34, "walk": 34, "2561": 34, "2584": 34, "2617": 34, "2663": 34, "2733": 34, "act": 34, "2550": 34, "2568": 34, "2641": 34, "2675": 34, "2613": 34, "upgrad": 34, "v3": 34, "2747": 34, "misc": 34, "2468": 34, "2627": 34, "2631": 34, "2704": 34, "changelog": 34, "optimize_transform": 34, "your_generation_param": 34, "newli": 34, "varianc": 34, "encod": 34, "2349": 34, "2412": 34, "2469": 34, "2476": 34, "flash": 34, "2317": 34, "2334": 34, "2392": 34, "2480": 34, "elser": 34, "2491": 34, "public": 34, "2473": 34, "2511": 34, "2433": 34, "2253": 34, "2251": 34, "2236": 34, "2278": 34, "2257": 34, "dockerfil": 34, "ux": 34, "2229": 34, "2195": 34, "2299": 34, "2315": 34, "2283": 34, "2280": 34, "2292": 34, "2275": 34, "2319": 34, "2198": 34, "2264": 34, "2290": 34, "experiment": 34, "workflow": 34, "1563": 34, "excess": 34, "1677": 34, "1688": 34, "1664": 34, "lar": 34, "1695": 34, "dictionari": 34, "1682": 34, "2137": 34, "1568": 34, "1585": 34, "1590": 34, "1587": 34, "1594": 34, "old": 
34, "hypervisor": 34, "vm": 34, "1513": 34, "1593": 34, "padding_mod": 34, "1580": 34, "1566": 34, "transnetv2": 34, "1564": 34, "rnn": 34, "avx512_core_vnni": 34, "1592": 34, "1589": 34, "1517": 34, "hero": 34, "inspir": 34, "stanford": 34, "consumpt": 34, "ve": 34, "1341": 34, "instancenorm": 34, "1330": 34, "1414": 34, "1473": 34, "1419": 34, "1488": 34, "webpag": 34, "1318": 34, "1353": 34, "1328": 34, "1355": 34, "1367": 34, "1384": 34, "1295": 34, "1392": 34, "1376": 34, "1373": 34, "1338": 34, "1391": 34, "1322": 34, "usabl": 34, "effort": 34, "cv": 34, "refin": 34, "identifi": 34, "torchrun": 34, "shortcut": 34, "mkl": 34, "sgemm": 34, "geomean": 34, "auto_ipex": 34, "hood": 34, "calibrated_model": 34, "model_to_be_calibr": 34, "992": 34, "64byte": 34, "addlayernorm": 34, "retinanet": 34, "1032": 34, "1053": 34, "1074": 34, "tightli": 34, "matur": 34, "offlin": 34, "becam": 34, "bake": 34, "wave2vec": 34, "albert": 34, "facilit": 34, "minmax": 34, "movingaverageminmax": 34, "polish": 34, "flexibl": 34, "quantconf": 34, "multi_stream_input_hint": 34, "multi_stream_output_hint": 34, "adam": 34, "822": 34, "3d": 34, "642": 34, "deconv3d": 34, "692": 34, "787": 34, "swish": 34, "fsi": 34, "risk": 34, "551": 34, "leakyrelu": 34, "589": 34, "407": 34, "647": 34, "convolution1d": 34, "657": 34, "einsum": 34, "alphafold2": 34, "674": 34, "711": 34, "threa": 34, "slow": 34, "equival": 34, "joint": 34, "net": 34, "pend": 34, "648": 34, "684": 34, "685": 34, "dockerhub": 34, "wheel": 34, "sdk": 34, "2x": 34, "5x": 34, "reduct": 34, "center": 34, "deploi": 34, "u8": 34, "s8": 34, "satur": 34, "occur": 34, "u7": 34, "unsign": 34, "s7": 34, "worth": 34, "upload": 34, "pip3": 34, "whl": 34, "220mb": 34, "5mb": 34, "dep": 34, "220m": 34, "cxx11": 34, "224m": 34, "7m": 34, "5m": 34, "qkv": 34, "278": 34, "531": 34, "432": 34, "438": 34, "602": 34, "sliu": 34, "hardsigmoid": 34, "relu6": 34, "selu": 34, "524": 34, "452": 34, "425": 34, "100mb": 34, "40mb": 34, "meant": 34, "resolv": 34, "te": 34, "wrap": 34, "bactchnorm": 34, "205": 34, "straightforward": 34, "underhood": 34, "torchvison": 34, "hugginfac": 34, "legal": 34, "resnet18": 34, "resnet18_xpu": 34, "enable_auto_mixed_precis": 34, "mixed_dtyp": 34, "mymodel": 34, "xx_c": 34, "xx_v": 34, "clibrat": 34, "ampconf": 34, "automixprecis": 34, "running_mod": 34, "cali_dataset": 34, "trace_model": 34, "omp_set_num_thread": 34, "model_execut": 34, "same_model_execution_again": 34, "descriptor": 34, "rc3": 34, "parti": 34, "49786": 34, "rc": 34, "readm": 34, "stakehold": 34, "5rc3": 34, "dpcpp": 34, "heterogen": 34, "bfp16": 34, "proper": 34, "tacotron2": 34, "frozenbatchnorm": 34, "embeddingbad": 34, "daili": 34, "resnext3d": 34, "maskrnn": 34, "codenam": 34, "mlp": 34, "eltwis": 34, "7x": 34, "enable_auto_optim": 34, "streamlin": 34, "enable_auto_mix_precis": 34, "inject": 34, "resnet3d": 34, "fb": 34, "yolov3": 34, "maxpool": 34}, "objects": {"": [[2, 0, 0, "-", "intel_extension_for_pytorch"]], "intel_extension_for_pytorch.cpu": [[2, 0, 0, "-", "runtime"]], "intel_extension_for_pytorch.cpu.runtime": [[2, 1, 1, "", "CPUPool"], [2, 1, 1, "", "MultiStreamModule"], [2, 1, 1, "", "MultiStreamModuleHint"], [2, 1, 1, "", "Task"], [2, 2, 1, "", "get_core_list_of_node_id"], [2, 2, 1, "", "is_runtime_ext_enabled"], [2, 1, 1, "", "pin"]], "intel_extension_for_pytorch": [[2, 2, 1, "", "enable_onednn_fusion"], [2, 2, 1, "", "fast_bert"], [2, 0, 0, "-", "llm"], [2, 2, 1, "", "optimize"], [2, 0, 0, "-", "quantization"], [2, 1, 1, "", "verbose"]], 
"intel_extension_for_pytorch.llm": [[2, 0, 0, "-", "functional"], [2, 0, 0, "-", "modules"], [2, 2, 1, "", "optimize"]], "intel_extension_for_pytorch.llm.functional": [[2, 2, 1, "", "fast_layer_norm"], [2, 2, 1, "", "indirect_access_kv_cache_attention"], [2, 2, 1, "", "rms_norm"], [2, 2, 1, "", "rotary_embedding"], [2, 2, 1, "", "varlen_attention"]], "intel_extension_for_pytorch.llm.modules": [[2, 1, 1, "", "FastLayerNorm"], [2, 1, 1, "", "IndirectAccessKVCacheAttention"], [2, 1, 1, "", "Linear2SiluMul"], [2, 1, 1, "", "LinearAdd"], [2, 1, 1, "", "LinearAddAdd"], [2, 1, 1, "", "LinearGelu"], [2, 1, 1, "", "LinearMul"], [2, 1, 1, "", "LinearNewGelu"], [2, 1, 1, "", "LinearRelu"], [2, 1, 1, "", "LinearSilu"], [2, 1, 1, "", "LinearSiluMul"], [2, 1, 1, "", "PagedAttention"], [2, 1, 1, "", "RMSNorm"], [2, 1, 1, "", "RotaryEmbedding"], [2, 1, 1, "", "VarlenAttention"]], "intel_extension_for_pytorch.nn": [[7, 1, 1, "", "FrozenBatchNorm2d"]], "intel_extension_for_pytorch.nn.functional": [[7, 2, 1, "", "interaction"]], "intel_extension_for_pytorch.nn.modules": [[7, 1, 1, "", "MergedEmbeddingBag"], [7, 1, 1, "", "MergedEmbeddingBagWithSGD"]], "intel_extension_for_pytorch.quantization": [[2, 2, 1, "", "autotune"], [2, 2, 1, "", "convert"], [2, 2, 1, "", "get_smooth_quant_qconfig_mapping"], [2, 2, 1, "", "prepare"]]}, "objtypes": {"0": "py:module", "1": "py:class", "2": "py:function"}, "objnames": {"0": ["py", "module", "Python module"], "1": ["py", "class", "Python class"], "2": ["py", "function", "Python function"]}, "titleterms": {"intel": [0, 1, 5, 6, 15, 30, 31, 32, 33], "extens": [0, 1, 5, 7, 15, 20, 26, 32], "pytorch": [0, 1, 5, 15, 18, 32], "cpu": [0, 2, 17, 18, 33], "isa": [0, 7, 17], "dynam": [0, 6, 7, 15, 17, 26], "dispatch": [0, 7, 17], "design": [0, 17, 20, 31], "doc": 0, "architectur": 1, "support": [1, 8, 10], "api": [2, 7, 9, 13, 16, 17, 18, 22, 25, 28, 29], "document": [2, 5, 25, 32, 33], "gener": [2, 26], "llm": [2, 6, 7, 23, 28, 30], "modul": [2, 10, 20, 28], "level": [2, 17, 28], "optim": [2, 7, 10, 13, 15, 19, 28, 29], "prototyp": [2, 6, 7, 10, 11, 12, 14, 16, 22, 28], "fast": [2, 6, 7, 11], "bert": [2, 6, 7, 11, 32], "graph": [2, 7, 12, 13, 28], "quantiz": [2, 6, 7, 15, 16, 29], "runtim": [2, 7, 20, 26], "blog": 3, "public": 3, "cheat": 4, "sheet": 4, "contribut": 5, "develop": 5, "tip": 5, "debug": [5, 17], "unit": 5, "test": 5, "python": [5, 6, 7], "better": 5, "local": 5, "pytest": 5, "lint": 5, "c": [5, 6, 18], "write": [5, 18], "build": [5, 17], "exampl": [6, 10, 11, 12, 14, 16, 17, 20, 31], "train": [6, 8], "singl": [6, 28, 31], "instanc": [6, 28, 30, 31], "float32": [6, 8], "bfloat16": [6, 8, 21, 26, 30], "distribut": [6, 28, 29], "infer": [6, 8, 28, 29, 31, 32], "eager": [6, 8], "mode": [6, 28, 31], "resnet50": [6, 32], "torchscript": [6, 8], "torchdynamo": [6, 26], "beta": [6, 7], "new": [6, 7, 34], "featur": [6, 7, 11, 12, 17], "from": [6, 7], "2": [6, 7, 14, 32, 34], "0": [6, 7, 34], "int8": [6, 7, 13, 16, 26, 30, 32], "static": [6, 15], "calibr": [6, 15], "deploy": 6, "larg": [6, 7, 28], "languag": [6, 7, 28], "model": [6, 7, 13, 15, 18, 20, 28, 32], "fp32": [6, 10, 13, 29, 30], "bf16": [6, 10, 13, 29], "smooth": [6, 16, 22], "weight": [6, 29], "onli": [6, 29], "int4": 6, "ai": [6, 30], "refer": [6, 8], "easi": 7, "us": [7, 8, 9, 10, 13, 16, 20, 31], "1": [7, 14, 32, 34], "torch": 7, "compil": [7, 17], "auto": [7, 8, 9, 16, 20], "channel": [7, 9, 18, 33], "last": [7, 9, 18, 33], "mix": [7, 8], "precis": [7, 8, 28], "amp": [7, 8], "oper": [7, 18, 19, 28], "codeless": 
[7, 10], "13": [7, 34], "captur": [7, 12], "hypertun": [7, 14], "introduct": [8, 19, 25], "case": [8, 10, 20], "default": [8, 9, 14, 18, 31], "path": 8, "autocast": 8, "op": 8, "elig": 8, "specif": [8, 17], "behavior": 8, "can": 8, "promot": 8, "widest": 8, "input": [8, 20], "type": [8, 28], "eas": [9, 13], "enabl": 9, "disabl": 9, "known": [9, 20, 34], "issu": [9, 20, 34], "motiv": 10, "usag": [10, 11, 12, 14, 16, 20, 26, 29, 31], "huggingfac": 10, "The": 10, "origin": 10, "command": 10, "ipex": [10, 28], "launch": [10, 31], "appli": 10, "forward": 10, "method": 10, "explicitli": 10, "instead": 10, "__call__": 10, "attr": 10, "alreadi": 10, "jit": 10, "trace": 10, "descript": [11, 12], "prerequisit": 11, "methodologi": [13, 28], "fusion": [13, 19], "pattern": 13, "fold": 13, "your_conf_fil": 14, "hyperparamet": 14, "launcher": [14, 32], "defin": [14, 15], "search": 14, "space": 14, "tune": [14, 16, 22, 33], "user": 14, "your_python_script": 14, "qconfig": 15, "prepar": 15, "do": 15, "convert": 15, "deploi": [15, 32], "recip": [16, 20, 22], "autotun": 16, "algorithm": 16, "alpha": [16, 34], "fix": 16, "determin": 16, "through": 16, "overview": [17, 28, 30, 31, 33], "requir": [17, 20], "code": 17, "folder": 17, "struct": 17, "kernel": [17, 18], "implement": [17, 20], "csrc": 17, "aten": [17, 18], "xyzkrnl": 17, "cpp": 17, "stub": 17, "xyz": 17, "h": 17, "dyndisp": 17, "dispatchstub": 17, "codegen": 17, "process": 17, "add": 17, "custom": [17, 28], "intrin": 17, "vec": 17, "privat": 17, "select": 17, "manual": 17, "check": 17, "what": [18, 34], "i": [18, 20, 31], "memori": [18, 31, 33], "format": 18, "all": [18, 31], "That": 18, "matter": 18, "nchw": 18, "b": 18, "nhwc": 18, "wip": 18, "block": 18, "nchw16c": 18, "stride": 18, "layout": 18, "tensor": 18, "creation": 18, "convers": 18, "d": 18, "coverag": 18, "statu": 18, "regist": [18, 32], "nativ": 18, "manner": 18, "onednn": [18, 33], "creat": [18, 32], "convolut": 18, "primit": [18, 33], "target": 18, "multistream": 20, "examples1": 20, "basic": 20, "examples2": 20, "set": 20, "examples3": 20, "structur": [20, 33], "output": 20, "perform": [20, 26, 30, 32, 33, 34], "asynchron": 20, "task": 20, "configur": [20, 30, 33], "core": [20, 31, 32], "bind": 20, "detail": 20, "how": 20, "iomp": 20, "preload": 20, "load": 20, "dure": 20, "split": 21, "sgd": 21, "stochast": 21, "gradient": 21, "descent": 21, "quant": 22, "quick": 23, "start": [23, 25, 32], "instal": [24, 32], "get": 25, "troubleshoot": 26, "regress": 26, "shape": 26, "result": [26, 34], "correct": 26, "licens": 27, "list": 28, "verifi": 28, "via": 28, "deepspe": [28, 29], "demo": 28, "linear": 28, "low": 28, "data": [28, 30], "indirect": 28, "access": [28, 33], "kv": 28, "cach": [28, 33], "transform": 29, "frontend": 29, "pseudocod": 29, "common": 29, "scenario": 29, "smoothquant": 29, "woq": 29, "center": 30, "product": 30, "v1": 30, "11": [30, 34], "number": [30, 31, 33], "accuraci": 30, "softwar": [30, 33], "version": 30, "hardwar": [30, 33], "200": [30, 34], "an": 30, "aw": 30, "ec2": 30, "c6i": 30, "2xlarg": 30, "10": [30, 34], "script": 31, "guid": [31, 33], "physic": 31, "ii": 31, "includ": 31, "logic": 31, "iii": 31, "node": 31, "iv": 31, "your": 31, "multipl": 31, "v": 31, "throughput": 31, "vi": 31, "latenc": 31, "vii": 31, "viii": 31, "index": 31, "jemalloc": [31, 33], "tcmalloc": [31, 33], "alloc": [31, 33], "openmp": [31, 33], "librari": 31, "gnu": [31, 33], "torchserv": 32, "content": [32, 33], "thi": [32, 33], "serv": 32, "pin": 32, "boost": 32, "multi": 32, "worker": 
32, "scale": 32, "export": 32, "serial": 32, "file": 32, "archiv": 32, "3": [32, 34], "4": 32, "benchmark": 32, "non": 33, "uniform": 33, "numa": 33, "numactl": 33, "omp_num_thread": 33, "omp_thread_limit": 33, "denorm": 33, "releas": 34, "highlight": 34, "100": 34, "12": 34, "300": 34, "": 34, "chang": 34, "9": 34, "8": 34, "improv": 34, "other": 34, "note": 34}, "envversion": {"sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 58}, "alltitles": {"Intel\u00ae Extension for PyTorch* CPU ISA Dynamic Dispatch Design Doc": [[0, "intel-extension-for-pytorch-cpu-isa-dynamic-dispatch-design-doc"]], "Intel\u00ae Extension for PyTorch*": [[1, "intel-extension-for-pytorch"]], "Architecture": [[1, "architecture"]], "Support": [[1, "support"]], "API Documentation": [[2, "api-documentation"], [25, "api-documentation"]], "General": [[2, "general"]], "LLM Module Level Optimizations (Prototype)": [[2, "llm-module-level-optimizations-prototype"]], "Fast Bert (Prototype)": [[2, "fast-bert-prototype"], [6, "fast-bert-prototype"]], "Graph Optimization": [[2, "graph-optimization"], [7, "graph-optimization"], [13, "graph-optimization"], [28, "graph-optimization"]], "Quantization": [[2, "module-intel_extension_for_pytorch.quantization"]], "CPU Runtime": [[2, "module-intel_extension_for_pytorch.cpu.runtime"]], "Blogs & Publications": [[3, "blogs-publications"]], "Cheat Sheet": [[4, "cheat-sheet"]], "Contribution": [[5, "contribution"]], "Contributing to Intel\u00ae Extension for PyTorch*": [[5, "contributing-to-intel-extension-for-pytorch"]], "Developing Intel\u00ae Extension for PyTorch*": [[5, "developing-intel-extension-for-pytorch"]], "Tips and Debugging": [[5, "tips-and-debugging"]], "Unit testing": [[5, "unit-testing"]], "Python Unit Testing": [[5, "python-unit-testing"]], "Better local unit tests with pytest": [[5, "better-local-unit-tests-with-pytest"]], "Local linting": [[5, "local-linting"]], "C++ Unit Testing": [[5, "c-unit-testing"]], "Writing documentation": [[5, "writing-documentation"]], "Building documentation": [[5, "building-documentation"]], "Tips": [[5, "tips"]], "Examples": [[6, "examples"]], "Python": [[6, "python"]], "Training": [[6, "training"]], "Single-instance Training": [[6, "single-instance-training"]], "Float32": [[6, "float32"], [6, "id1"]], "BFloat16": [[6, "bfloat16"], [6, "id6"], [21, "bfloat16"], [26, "bfloat16"]], "Distributed Training": [[6, "distributed-training"]], "Inference": [[6, "inference"]], "Eager Mode": [[6, "eager-mode"], [6, "id7"]], "Resnet50": [[6, "resnet50"], [6, "id2"], [6, "id4"], [6, "id8"], [6, "id11"], [6, "id14"]], "BERT": [[6, "bert"], [6, "id3"], [6, "id5"], [6, "id9"], [6, "id12"], [6, "id15"], [32, "bert"]], "TorchScript Mode": [[6, "torchscript-mode"], [6, "id10"]], "TorchDynamo Mode (Beta, NEW feature from 2.0.0)": [[6, "torchdynamo-mode-beta-new-feature-from-2-0-0"], [6, "id13"]], "INT8": [[6, "int8"], [26, "int8"]], "Static Quantization": [[6, "static-quantization"], [15, "static-quantization"]], "Calibration": [[6, "calibration"]], "Deployment": [[6, "deployment"]], "Dynamic Quantization": [[6, "dynamic-quantization"], [15, "dynamic-quantization"]], "Large Language Model (LLM)": [[6, "large-language-model-llm"]], "FP32/BF16": [[6, "fp32-bf16"], [29, "fp32-bf16"]], "Smooth Quantization INT8": [[6, 
"smooth-quantization-int8"]], "Weight Only Quantization INT8/INT4": [[6, "weight-only-quantization-int8-int4"]], "C++": [[6, "c"]], "Intel\u00ae AI Reference Models": [[6, "intel-ai-reference-models"]], "Features": [[7, "features"]], "Easy-to-use Python API": [[7, "easy-to-use-python-api"]], "Large Language Models (LLM, NEW feature from 2.1.0)": [[7, "large-language-models-llm-new-feature-from-2-1-0"]], "torch.compile (Beta, NEW feature from 2.0.0)": [[7, "torch-compile-beta-new-feature-from-2-0-0"]], "ISA Dynamic Dispatching": [[7, "isa-dynamic-dispatching"], [17, "isa-dynamic-dispatching"]], "Auto Channels Last": [[7, "auto-channels-last"], [9, "auto-channels-last"]], "Auto Mixed Precision (AMP)": [[7, "auto-mixed-precision-amp"], [8, "auto-mixed-precision-amp"]], "Operator Optimization": [[7, "operator-optimization"]], "Optimizer Optimization": [[7, "optimizer-optimization"]], "Runtime Extension": [[7, "runtime-extension"], [20, "runtime-extension"], [26, "runtime-extension"]], "INT8 Quantization": [[7, "int8-quantization"]], "Codeless Optimization (Prototype, NEW feature from 1.13.0)": [[7, "codeless-optimization-prototype-new-feature-from-1-13-0"]], "Graph Capture (Prototype, NEW feature from 1.13.0)": [[7, "graph-capture-prototype-new-feature-from-1-13-0"]], "HyperTune (Prototype, NEW feature from 1.13.0)": [[7, "hypertune-prototype-new-feature-from-1-13-0"]], "Fast BERT Optimization (Prototype, NEW feature from 2.0.0)": [[7, "fast-bert-optimization-prototype-new-feature-from-2-0-0"]], "Introduction": [[8, "introduction"], [19, "introduction"], [25, "introduction"]], "Use Case": [[8, "use-case"]], "Default Precision": [[8, "default-precision"]], "Inference with Eager Path": [[8, "inference-with-eager-path"]], "Inference with TorchScript Path": [[8, "inference-with-torchscript-path"]], "Training Support": [[8, "training-support"]], "Autocast Op Reference": [[8, "autocast-op-reference"]], "Op Eligibility": [[8, "op-eligibility"]], "Op-Specific Behavior": [[8, "op-specific-behavior"]], "Ops that can autocast to bfloat16": [[8, "ops-that-can-autocast-to-bfloat16"]], "Ops that can autocast to float32": [[8, "ops-that-can-autocast-to-float32"]], "Ops that promote to the widest input type": [[8, "ops-that-promote-to-the-widest-input-type"]], "Ease-of-use auto channels last API": [[9, "ease-of-use-auto-channels-last-api"]], "default": [[9, "default"]], "enable": [[9, "enable"]], "disable": [[9, "disable"]], "Known issue": [[9, "known-issue"], [34, "known-issue"], [34, "id43"]], "Codeless Optimization (Prototype)": [[10, "codeless-optimization-prototype"]], "Motivation": [[10, "motivation"]], "Example Usage with HuggingFace": [[10, "example-usage-with-huggingface"]], "The origin command with ipex launch": [[10, "the-origin-command-with-ipex-launch"]], "Command to apply ipex optimization for FP32": [[10, "command-to-apply-ipex-optimization-for-fp32"]], "Command to apply ipex optimization for BF16": [[10, "command-to-apply-ipex-optimization-for-bf16"]], "Use Case not supported": [[10, "use-case-not-supported"]], "Module uses forward method explicitly instead of the __call__ attr": [[10, "module-uses-forward-method-explicitly-instead-of-the-call-attr"]], "Already using ipex.optimize": [[10, "already-using-ipex-optimize"]], "Already using Jit Trace": [[10, "already-using-jit-trace"]], "Fast BERT (Prototype)": [[11, "fast-bert-prototype"]], "Feature Description": [[11, "feature-description"], [12, "feature-description"]], "Prerequisite": [[11, "prerequisite"]], "Usage Example": [[11, 
"usage-example"], [12, "usage-example"], [16, "usage-example"]], "Graph Capture (Prototype)": [[12, "graph-capture-prototype"]], "Ease-of-use graph optimization API": [[13, "ease-of-use-graph-optimization-api"]], "FP32 and BF16 models": [[13, "fp32-and-bf16-models"]], "INT8 models": [[13, "int8-models"]], "Methodology": [[13, "methodology"]], "Fusion": [[13, "fusion"]], "FP32 and BF16 fusion patterns": [[13, "fp32-and-bf16-fusion-patterns"]], "INT8 fusion patterns": [[13, "int8-fusion-patterns"]], "Folding": [[13, "folding"]], "HyperTune (Prototype)": [[14, "hypertune-prototype"]], "Usage of Hypertune": [[14, "usage-of-hypertune"]], "your_conf_file": [[14, "your-conf-file"]], "Hyperparameters": [[14, "hyperparameters"]], "Launcher Hyperparameters": [[14, "launcher-hyperparameters"]], "Defining hyperparameters and their search spaces": [[14, "defining-hyperparameters-and-their-search-spaces"]], "1. Defining hyperparameters to tune:": [[14, "defining-hyperparameters-to-tune"]], "2. Defining the search spaces of the hyperparameters:": [[14, "defining-the-search-spaces-of-the-hyperparameters"]], "Default search space": [[14, "default-search-space"]], "User defined search space": [[14, "user-defined-search-space"]], "": [[14, "your-python-script"]], "Usage Examples": [[14, "usage-examples"], [31, "usage-examples"]], "Intel\u00ae Extension for PyTorch* optimizations for quantization": [[15, "intel-extension-for-pytorch-optimizations-for-quantization"]], "Define qconfig": [[15, "define-qconfig"]], "Prepare Model and Do Calibration": [[15, "prepare-model-and-do-calibration"]], "Convert to Static Quantized Model and Deploy": [[15, "convert-to-static-quantized-model-and-deploy"]], "Define QConfig": [[15, "id1"]], "Prepare Model": [[15, "prepare-model"]], "Convert to Dynamic Quantized Model and Deploy": [[15, "convert-to-dynamic-quantized-model-and-deploy"]], "INT8 Recipe Tuning API (Prototype)": [[16, "int8-recipe-tuning-api-prototype"]], "Smooth Quantization Autotune": [[16, "smooth-quantization-autotune"]], "Algorithm: Auto-tuning of $\\alpha$.": [[16, "algorithm-auto-tuning-of-alpha"]], "$\\alpha$ Usage": [[16, "alpha-usage"]], "Using a fixed alpha": [[16, "using-a-fixed-alpha"]], "Determining the alpha through auto-tuning": [[16, "determining-the-alpha-through-auto-tuning"]], "Overview": [[17, "overview"], [30, "overview"], [31, "overview"], [33, "overview"]], "CPU ISA build compiler requirement": [[17, "cpu-isa-build-compiler-requirement"]], "Dynamic Dispatch Design": [[17, "dynamic-dispatch-design"]], "Code Folder Struct": [[17, "code-folder-struct"]], "Kernel implementation: csrc/cpu/aten/kernels/xyzKrnl.cpp": [[17, "kernel-implementation-csrc-cpu-aten-kernels-xyzkrnl-cpp"]], "Kernel Stub: csrc/cpu/aten/xyz.cpp and csrc/cpu/aten/xyz.h": [[17, "kernel-stub-csrc-cpu-aten-xyz-cpp-and-csrc-cpu-aten-xyz-h"]], "Dispatch Stub implementation: csrc/cpu/dyndisp/DispatchStub.cpp and csrc/cpu/dyndisp/DispatchStub.h": [[17, "dispatch-stub-implementation-csrc-cpu-dyndisp-dispatchstub-cpp-and-csrc-cpu-dyndisp-dispatchstub-h"]], "CodeGen Process": [[17, "codegen-process"]], "Add Custom Kernel": [[17, "add-custom-kernel"]], "ISA intrinics specific kernel example:": [[17, "isa-intrinics-specific-kernel-example"]], "Vec specific kernel example:": [[17, "vec-specific-kernel-example"]], "Private Debug APIs": [[17, "private-debug-apis"]], "Example:": [[17, "example"], [17, "id1"]], "Select ISA level manually.": [[17, "select-isa-level-manually"]], "CPU feature check": [[17, "cpu-feature-check"]], "Channels Last": 
[[18, "channels-last"], [33, "channels-last"]], "What is Channels Last": [[18, "what-is-channels-last"]], "Memory Format Is All That Matters": [[18, "memory-format-is-all-that-matters"]], "a. NCHW (default)": [[18, "a-nchw-default"]], "b. NHWC (WIP for CPU)": [[18, "b-nhwc-wip-for-cpu"]], "c. Blocked (nChw16c)": [[18, "c-blocked-nchw16c"]], "PyTorch Strided Layout": [[18, "pytorch-strided-layout"]], "PyTorch Channels Last Memory Format APIs": [[18, "pytorch-channels-last-memory-format-apis"]], "a. tensor creation": [[18, "a-tensor-creation"]], "b. tensor conversion": [[18, "b-tensor-conversion"]], "c. model conversion": [[18, "c-model-conversion"]], "d. operator coverage": [[18, "d-operator-coverage"]], "Writing Channels Last Kernels": [[18, "writing-channels-last-kernels"]], "a. Status on CPU": [[18, "a-status-on-cpu"]], "b. Register Channels Last Kernel in ATen Native Manner": [[18, "b-register-channels-last-kernel-in-aten-native-manner"]], "c. Register oneDNN Kernel on Channels Last": [[18, "c-register-onednn-kernel-on-channels-last"]], "oneDNN NHWC APIs": [[18, "onednn-nhwc-apis"]], "a. Create NHWC Memory": [[18, "a-create-nhwc-memory"]], "b. Create Convolution Primitive": [[18, "b-create-convolution-primitive"]], "CPU Channels Last Targets": [[18, "cpu-channels-last-targets"]], "Optimizer Fusion": [[19, "optimizer-fusion"]], "Operation Fusion": [[19, "operation-fusion"]], "Requirements": [[20, "requirements"]], "Use Cases": [[20, "use-cases"]], "Example of MultiStream Module": [[20, "example-of-multistream-module"]], "Examples1: Basic Usage": [[20, "examples1-basic-usage"]], "Examples2: Usage with \u201cAUTO\u201d setting": [[20, "examples2-usage-with-auto-setting"]], "Examples3: Usage for models with structure inputs/outputs": [[20, "examples3-usage-for-models-with-structure-inputs-outputs"]], "Performance recipes": [[20, "performance-recipes"]], "Known issues": [[20, "known-issues"], [34, "id37"]], "Example of asynchronous task": [[20, "example-of-asynchronous-task"]], "Example of configuring core binding": [[20, "example-of-configuring-core-binding"]], "Detail Design": [[20, "detail-design"]], "How the core binding is implemented": [[20, "how-the-core-binding-is-implemented"]], "Design of Task": [[20, "design-of-task"]], "IOMP preload or load during the runtime": [[20, "iomp-preload-or-load-during-the-runtime"]], "Split SGD": [[21, "split-sgd"], [21, "id2"]], "Stochastic Gradient Descent (SGD)": [[21, "stochastic-gradient-descent-sgd"]], "Smooth Quant Recipe Tuning API (Prototype)": [[22, "smooth-quant-recipe-tuning-api-prototype"]], "Quick Start": [[23, "quick-start"]], "LLM Quick Start": [[23, "llm-quick-start"]], "Installation": [[24, "installation"]], "Get Started": [[25, "get-started"]], "Troubleshooting": [[26, "troubleshooting"]], "General Usage": [[26, "general-usage"]], "Performance Regression": [[26, "performance-regression"]], "TorchDynamo": [[26, "torchdynamo"]], "Dynamic Shape": [[26, "dynamic-shape"]], "Result Correctness": [[26, "result-correctness"]], "License": [[27, "license"]], "Large Language Models (LLM) Optimization Overview": [[28, "large-language-models-llm-optimization-overview"]], "ipex.llm Optimized Model List": [[28, "ipex-llm-optimized-model-list"]], "Verified for single instance mode": [[28, "verified-for-single-instance-mode"]], "Verified for distributed inference mode via DeepSpeed": [[28, "verified-for-distributed-inference-mode-via-deepspeed"]], "Module Level Optimization API for customized LLM (Prototype)": [[28, 
"module-level-optimization-api-for-customized-llm-prototype"]], "Demos": [[28, "demos"]], "Optimization Methodologies": [[28, "optimization-methodologies"]], "Linear Operator Optimization": [[28, "linear-operator-optimization"]], "Low Precision Data Types": [[28, "low-precision-data-types"]], "Indirect Access KV Cache": [[28, "indirect-access-kv-cache"]], "Distributed Inference": [[28, "distributed-inference"]], "Transformers Optimization Frontend API": [[29, "transformers-optimization-frontend-api"]], "Pseudocode of Common Usage Scenarios": [[29, "pseudocode-of-common-usage-scenarios"]], "SmoothQuant": [[29, "smoothquant"]], "Weight Only Quantization (WOQ)": [[29, "weight-only-quantization-woq"]], "Distributed Inference with DeepSpeed": [[29, "distributed-inference-with-deepspeed"]], "Performance": [[30, "performance"], [34, "performance"]], "Performance Data for Intel\u00ae AI Data Center Products": [[30, "performance-data-for-intel-ai-data-center-products"]], "LLM Performance": [[30, "llm-performance"]], "INT8 with v1.11": [[30, "int8-with-v1-11"]], "Performance Numbers": [[30, "performance-numbers"], [30, "id1"], [30, "id4"]], "Accuracy": [[30, "accuracy"]], "Configuration": [[30, "configuration"], [30, "id2"], [30, "id5"]], "Software Version": [[30, "software-version"], [30, "id3"], [30, "id6"]], "Hardware Configuration": [[30, "hardware-configuration"], [30, "id7"], [33, "hardware-configuration"]], "FP32 with v1.11.200 on an AWS EC2 C6i.2xlarge instance": [[30, "fp32-with-v1-11-200-on-an-aws-ec2-c6i-2xlarge-instance"]], "FP32 and BFloat16 with v1.10": [[30, "fp32-and-bfloat16-with-v1-10"]], "Launch Script Usage Guide": [[31, "launch-script-usage-guide"]], "Usage of launch script": [[31, "usage-of-launch-script"]], "Single instance for inference": [[31, "single-instance-for-inference"]], "I. Use all physical cores": [[31, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[31, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on designated nodes": [[31, "iii-use-physical-cores-on-designated-nodes"]], "IV. Use your designated number of cores": [[31, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[31, "multiple-instances-for-inference"]], "V. Throughput mode": [[31, "v-throughput-mode"]], "VI. Latency mode": [[31, "vi-latency-mode"]], "VII. Your designated number of instances": [[31, "vii-your-designated-number-of-instances"]], "VIII. 
diff --git a/cpu/2.3.0+cpu/tutorials/api_doc.html b/cpu/2.3.0+cpu/tutorials/api_doc.html
index 2be3c8ff2..5424314e8 100644
--- a/cpu/2.3.0+cpu/tutorials/api_doc.html
+++ b/cpu/2.3.0+cpu/tutorials/api_doc.html
@@ -421,13 +421,15 @@

class ipex.llm.modules.LinearSilu(linear)

Applies a linear transformation to the input data, and then applies PyTorch SiLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.silu.html)
on the result:

result = torch.nn.functional.silu(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with silu.
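A minimal usage sketch (an editor's illustration, not from the upstream docs; the layer sizes and the ipex import alias are assumed, and the other Linear* fusion modules below follow the same pattern):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> linear = torch.nn.Linear(4096, 11008)
>>> fused = ipex.llm.modules.LinearSilu(linear)
>>> x = torch.randn(1, 32, 4096)
>>> y = fused(x)  # same result as torch.nn.functional.silu(linear(x))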

@@ -451,9 +453,9 @@

class ipex.llm.modules.LinearSiluMul(linear)

Applies a linear transformation to the input data, then applies PyTorch SiLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.silu.html)
on the result, and multiplies the result by other:

result = torch.nn.functional.silu(linear(input)) * other

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with silu and mul.

@@ -479,17 +481,20 @@

class ipex.llm.modules.Linear2SiluMul(linear_s, linear_m)

Applies two linear transformations to the input data (linear_s and linear_m),
then applies PyTorch SiLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.silu.html)
on the result from linear_s, and multiplies it by the result from linear_m:

result = torch.nn.functional.silu(linear_s(input)) * linear_m(input)

Parameters:
  • linear_s (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with silu.
  • linear_m (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with mul.
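A usage sketch under the same assumptions as above (illustrative sizes; this mirrors the gate/up projection pattern of LLaMA-style MLP blocks):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> linear_s = torch.nn.Linear(4096, 11008)  # branch fused with silu
>>> linear_m = torch.nn.Linear(4096, 11008)  # branch fused with mul
>>> fused = ipex.llm.modules.Linear2SiluMul(linear_s, linear_m)
>>> x = torch.randn(1, 32, 4096)
>>> y = fused(x)  # silu(linear_s(x)) * linear_m(x)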

@@ -513,13 +518,15 @@

class ipex.llm.modules.LinearRelu(linear)

Applies a linear transformation to the input data, and then applies PyTorch ReLU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html)
on the result:

result = torch.nn.functional.relu(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with relu.

@@ -543,12 +550,13 @@

class ipex.llm.modules.LinearNewGelu(linear)

Applies a linear transformation to the input data, and then applies NewGELUActivation
(see https://github.com/huggingface/transformers/blob/main/src/transformers/activations.py#L50)
on the result:

result = NewGELUActivation(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with new_gelu.

@@ -570,13 +578,15 @@

class ipex.llm.modules.LinearGelu(linear)

Applies a linear transformation to the input data, and then applies PyTorch GELU
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.gelu.html)
on the result:

result = torch.nn.functional.gelu(linear(input))

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with gelu.

@@ -597,13 +607,15 @@

class ipex.llm.modules.LinearMul(linear)

Applies a linear transformation to the input data, and then multiplies the result by other:

result = linear(input) * other

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with mul.

@@ -625,13 +637,15 @@

class ipex.llm.modules.LinearAdd(linear)

Applies a linear transformation to the input data, and then adds other to the result:

result = linear(input) + other

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with add.
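A sketch of the fusion modules that take an extra operand (an editor's illustration; the forward call is assumed to accept the extra tensor as a second argument, e.g., a residual branch):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> fused = ipex.llm.modules.LinearAdd(torch.nn.Linear(4096, 4096))
>>> x = torch.randn(1, 32, 4096)
>>> residual = torch.randn(1, 32, 4096)
>>> y = fused(x, residual)  # assumed call pattern: linear(x) + residual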

@@ -653,13 +667,15 @@

class ipex.llm.modules.LinearAddAdd(linear)

Applies a linear transformation to the input data, and then adds other_1 and other_2 to the result:

result = linear(input) + other_1 + other_2

Parameters:
linear (torch.nn.Linear module) – the original torch.nn.Linear module to be fused with add and add.

@@ -682,34 +698,38 @@

class ipex.llm.modules.RotaryEmbedding(max_position_embeddings: int, pos_embd_dim: int, base=10000, backbone: str | None = None)

[module init and forward] Applies RotaryEmbedding (see https://huggingface.co/papers/2104.09864)
on the query or key before their multi-head attention computation.

module init

Parameters:
  • max_position_embeddings (int) – size (max) of the position embeddings.
  • pos_embd_dim (int) – dimension of the position embeddings.
  • base (int) – Default: 10000. Base to generate the frequency of position embeddings.
  • backbone (str) – Default: None. The exact transformers model backbone
(e.g., "GPTJForCausalLM", get from model.config.architectures[0], see
https://huggingface.co/EleutherAI/gpt-j-6b/blob/main/config.json#L4).

forward()

Parameters:
  • input (torch.Tensor) – input to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim] (as well as the output shape).
  • position_ids (torch.Tensor) – the according position_ids for the input; the shape should be
[batch size, sequence length]. In some cases, there is only one element, which is the
past_kv_length, and the position id can be constructed by past_kv_length + current_position.
  • num_head (int) – head num from the input shape.
  • head_dim (int) – head dim from the input shape.
  • offset (int) – the offset value. e.g., for GPT-J 6B/ChatGLM, cos/sin is applied to the
neighboring 2 elements, so the offset is 1; for llama, cos/sin is applied to the neighboring
rotary_dim elements, so the offset is rotary_dim/2.
  • rotary_ndims (int) – the rotary dimension. e.g., 64 for GPTJ. head size for LLama.

@@ -722,60 +742,62 @@

>>> query_rotary = rope_module(query, position_ids, 16, 256, 1, 64)

[Direct function call] This module also provides a .apply_function function call
to be used on query and key at the same time without initializing the module
(assuming the rotary embedding sin/cos values are provided).

apply_function()

Parameters:
  • query (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • key (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • sin/cos (torch.Tensor) – [num_tokens, rotary_dim] the sin/cos value tensor generated to be applied on query/key.
  • rotary_ndims (int) – the rotary dimension. e.g., 64 for GPTJ. head size for LLama.
  • head_dim (int) – head dim from the input shape.
  • rotary_half (bool) – if False, e.g., GPT-J 6B/ChatGLM, cos/sin is applied to the neighboring 2 elements, so the offset is 1;
if True, e.g., for llama, cos/sin is applied to the neighboring rotary_dim elements, so the offset is rotary_dim/2.
  • position_ids (torch.Tensor) – Default is None and optional if sin/cos is provided.
The according position_ids for the input; the shape should be [batch size, sequence length].

Returns:
query, key (torch.Tensor) – [batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim].
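A usage sketch based on the forward signature above (an editor's illustration; the forward call mirrors the docs' own example, while GPT-J-like sizes are assumed: 16 heads, head_dim 256, rotary_ndims 64, offset 1):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> rope_module = ipex.llm.modules.RotaryEmbedding(2048, 64, base=10000, backbone="GPTJForCausalLM")
>>> query = torch.randn(1, 32, 16, 256)
>>> position_ids = torch.arange(32).unsqueeze(0)
>>> query_rotary = rope_module(query, position_ids, 16, 256, 1, 64)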

class ipex.llm.modules.RMSNorm(hidden_size: int, eps: float = 1e-06, weight: Tensor | None = None)

[module init and forward] Applies RMSnorm on the input (hidden states)
(see https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L76).

module init

Parameters:
  • hidden_size (int) – the size of the hidden states.
  • eps (float) – the variance_epsilon to apply RMSnorm, default 1e-6.
  • weight (torch.Tensor) – the weight to apply RMSnorm, default None,
in which case torch.ones(hidden_size) is used.

forward()

Parameters:
hidden_states (torch.Tensor) – input to be applied RMSnorm, usually taking shape of
[batch size, sequence length, hidden_size] (as well as the output shape).

Examples

>>> # module init:
@@ -785,36 +807,42 @@ 

>>> result = rmsnorm_module(input)

[Direct function call] This module also provides a .apply_function function call
to apply RMSNorm without initializing the module.

apply_function()

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply RMSNorm.
  • weight (torch.Tensor) – the weight to apply RMSnorm.
  • eps (float) – the variance_epsilon to apply RMSnorm.
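A usage sketch for both call styles (an editor's illustration; the hidden size is assumed, and calling apply_function at class level without an instance is an assumption based on the description above):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> rmsnorm_module = ipex.llm.modules.RMSNorm(hidden_size=4096, eps=1e-6)
>>> hidden_states = torch.randn(1, 32, 4096)
>>> result = rmsnorm_module(hidden_states)
>>> # direct function call, no module init (weight supplied explicitly):
>>> result2 = ipex.llm.modules.RMSNorm.apply_function(hidden_states, torch.ones(4096), 1e-6)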

class ipex.llm.modules.FastLayerNorm(normalized_shape: Tuple[int, ...], eps: float, weight: Tensor, bias: Tensor | None = None)

[module init and forward] Applies PyTorch Layernorm
(see https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html)
on the input (hidden states).

module init

Parameters:
  • normalized_shape ((int or list) or torch.Size) – input shape from an expected input of size.
  • eps (float) – a value added to the denominator for numerical stability.
  • weight (torch.Tensor) – the weight of Layernorm to apply normalization.
  • bias (torch.Tensor) – an additive bias for normalization.

forward()

Parameters:
hidden_states (torch.Tensor) – input to be applied Layernorm, usually taking shape of
[batch size, sequence length, hidden_size] (as well as the output shape).

Examples

>>> # module init:
@@ -825,16 +853,20 @@ 

>>> result = layernorm_module(input)

[Direct function call] This module also provides a .apply_function function call
to apply fast layernorm without initializing the module.

apply_function()

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply normalization.
  • normalized_shape ((int or list) or torch.Size) – input shape from an expected input of size.
  • weight (torch.Tensor) – the weight to apply normalization.
  • bias (torch.Tensor) – an additive bias for normalization.
  • eps (float) – a value added to the denominator for numerical stability.
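A usage sketch (an editor's illustration; the hidden size, eps value, and weight/bias initialization are assumed):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> hidden_size = 4096
>>> weight, bias = torch.ones(hidden_size), torch.zeros(hidden_size)
>>> layernorm_module = ipex.llm.modules.FastLayerNorm((hidden_size,), 1e-5, weight, bias)
>>> hidden_states = torch.randn(1, 32, hidden_size)
>>> result = layernorm_module(hidden_states)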

@@ -847,65 +879,67 @@

module init

Parameters:
text_max_length (int) – the max length of kv cache to be used for generation
(allocate the pre-cache buffer).

forward()

Parameters:
  • query (torch.Tensor) – Query tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • key (torch.Tensor) – Key tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • value (torch.Tensor) – Value tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • scale_attn (float) – scale used by the attention layer; should be sqrt(head_size).
  • layer_past (tuple(torch.Tensor)) – tuple(seq_info, key_cache, value_cache, beam-idx).
    • key_cache: key cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • value_cache: value cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • beam-idx: history beam idx, shape: (max_seq, beam*batch);
    • seq_info: sequence info tensor, shape: (1, 1, max_seq, max_seq).
  • head_mask (torch.Tensor) – Head mask tensor, which is not supported by the kernel yet.
  • attention_mask (torch.Tensor) – Attention mask information.

Returns:
attn_output – weighted value which is the output of scale dot product;
shape: (beam*batch, seq_len, head_num, head_size).
attn_weights – the output tensor of the first matmul in scale dot product,
which is not supported by the kernel now.
new_layer_past – updated layer_past (seq_info, key_cache, value_cache, beam-idx).

Notes

How to reorder the KV cache when using the IndirectAccessKVCacheAttention format
(e.g., on the llama model, see
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1318):

def _reorder_cache(
    self, past_key_values: Tuple[Tuple[torch.Tensor]], beam_idx: torch.Tensor
) -> Tuple[Tuple[torch.Tensor]]:
    if (
        len(past_key_values[0]) == 4 and past_key_values[0][0].shape[-1] == 1
    ):
        for layer_past in past_key_values:
            layer_past[3][layer_past[0].size(-2) - 1] = beam_idx
        return past_key_values

[Direct function call] This module also provides a .apply_function function call
to apply IndirectAccessKVCacheAttention without initializing the module.

The parameters of apply_function() are the same as the forward() call.
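A module-init sketch (an editor's illustration; only the documented text_max_length argument is shown, since a valid layer_past tuple must be built by the caller as described above):

>>> import intel_extension_for_pytorch as ipex
>>> iakv_attn = ipex.llm.modules.IndirectAccessKVCacheAttention(text_max_length=2048)
>>> # forward(query, key, value, scale_attn, layer_past, head_mask, attention_mask),
>>> # with query/key/value shaped (beam*batch, seq_len, head_num, head_dim).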

    @@ -916,109 +950,98 @@

[class method]: reshape_and_cache

ipex.llm.modules.PagedAttention.reshape_and_cache(key, value, key_cache, value_cache, slot_mapping)

This operator is used to store the key/value token states into the pre-allocated
kv_cache buffers of paged attention.

Parameters:
  • key (torch.Tensor) – The key tensor. The shape should be [num_seqs, num_heads, head_size].
  • value (torch.Tensor) – The value tensor. The shape should be [num_seqs, num_heads, head_size].
  • key_cache (torch.Tensor) – The pre-allocated buffer to store the key cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • value_cache (torch.Tensor) – The pre-allocated buffer to store the value cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • slot_mapping (torch.Tensor) – It stores the position at which to store each key/value
in the pre-allocated buffers. The shape should be the number of sequences. For sequence i,
slot_mapping[i] // block_number gives the block index, and slot_mapping[i] % block_size
gives the offset within this block.

[class method]: single_query_cached_kv_attention

ipex.llm.modules.PagedAttention.single_query_cached_kv_attention(out, query, key_cache, value_cache, head_mapping, scale, block_tables, context_lens, block_size, max_context_len, alibi_slopes)

This operator computes the scale-dot-product based on the paged attention.

Parameters:
  • out (torch.Tensor) – The output tensor, with shape [num_seqs, num_heads, head_size],
where num_seqs is the number of sequences in this batch, num_heads is the number of
query heads, and head_size is the head dimension.
  • query (torch.Tensor) – The query tensor. The shape should be [num_seqs, num_heads, head_size].
  • key_cache (torch.Tensor) – The pre-allocated buffer to store the key cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • value_cache (torch.Tensor) – The pre-allocated buffer to store the value cache.
The shape should be [num_blocks, block_size, num_heads, head_size].
  • head_mapping (torch.Tensor) – The mapping from the query head to the kv head.
The shape should be the number of query heads.
  • scale (float) – The scale used by the scale-dot-product.
In general, it is: float(1.0 / (head_size ** 0.5)).
  • block_tables (torch.Tensor) – The mapping table used to map the logical sequence
to the physical sequence. The shape should be [num_seqs, max_num_blocks_per_seq].
  • context_lens (torch.Tensor) – The sequence length for every sequence. The size is [num_seqs].
  • block_size (int) – The block size, i.e., the number of tokens in every block.
  • max_context_len (int) – The max sequence length.
  • alibi_slopes (torch.Tensor, optional) – the alibi slope, with the shape of (num_heads).
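A sketch of filling the paged KV cache (an editor's illustration; the buffer sizes, slot values, and the integer dtype of slot_mapping are assumptions):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> num_seqs, num_heads, head_size = 4, 16, 128
>>> num_blocks, block_size = 8, 16
>>> key = torch.randn(num_seqs, num_heads, head_size)
>>> value = torch.randn(num_seqs, num_heads, head_size)
>>> key_cache = torch.zeros(num_blocks, block_size, num_heads, head_size)
>>> value_cache = torch.zeros(num_blocks, block_size, num_heads, head_size)
>>> slot_mapping = torch.tensor([0, 1, 16, 17], dtype=torch.long)  # one cache slot per sequence
>>> ipex.llm.modules.PagedAttention.reshape_and_cache(key, value, key_cache, value_cache, slot_mapping)
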
    class ipex.llm.modules.VarlenAttention
[module init and forward] Applies PyTorch scaled_dot_product_attention on the inputs of
query, key and value
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html),
and accepts variant (different) sequence lengths among the query, key and value.

This module does not have args for module init.

forward()

Parameters:
  • query (torch.Tensor) – shape [query_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • key (torch.Tensor) – shape [key_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • value (torch.Tensor) – shape [value_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • out (torch.Tensor) – buffer to get the results; the shape is the same as query.
  • seqlen_q (torch.Tensor) – shape [batch_size + 1]; points to the current query_tokens among the total sequence length.
  • seqlen_k (torch.Tensor) – shape [batch_size + 1]; points to the current key_tokens among the total sequence length.
  • max_seqlen_q (int) – max/total sequence length of query.
  • max_seqlen_k (int) – max/total sequence length of key.
  • pdropout (float) – dropout probability; if greater than 0.0, dropout is applied; default is 0.0.
  • softmax_scale (float) – scaling factor applied prior to softmax.
  • is_causal (bool) – whether to apply causal attention masking; default is True.

    @@ -1039,73 +1062,78 @@

>>> varlenAttention_module(query, key, value, out, seqlen_q, seqlen_k, max_seqlen_q, max_seqlen_k, pdropout, softmax_scale)

[Direct function call] This module also provides a .apply_function function call
to apply VarlenAttention without initializing the module.

The parameters of apply_function() are the same as the forward() call.
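A fuller sketch of the call above (an editor's illustration; the int32 dtype of seqlen_q/seqlen_k, the cumulative-offset layout, and the scale value are assumptions):

>>> import torch
>>> import intel_extension_for_pytorch as ipex
>>> varlenAttention_module = ipex.llm.modules.VarlenAttention()
>>> batch_size, seq_len, num_head, head_size = 2, 32, 16, 128
>>> total = batch_size * seq_len
>>> query = torch.randn(total, num_head, head_size)
>>> key = torch.randn(total, num_head, head_size)
>>> value = torch.randn(total, num_head, head_size)
>>> out = torch.empty_like(query)
>>> seqlen_q = torch.tensor([0, seq_len, total], dtype=torch.int32)  # cumulative offsets
>>> seqlen_k = seqlen_q.clone()
>>> varlenAttention_module(query, key, value, out, seqlen_q, seqlen_k, seq_len, seq_len, 0.0, head_size ** -0.5)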

    ipex.llm.functional.rotary_embedding(query: Tensor, key: Tensor, sin: Tensor, cos: Tensor, rotary_dim: int, rotary_half: bool, position_ids: Tensor | None = None)
Applies RotaryEmbedding (see https://huggingface.co/papers/2104.09864)
on the query or key before their multi-head attention computation.

Parameters:
  • query (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • key (torch.Tensor) – inputs to be applied with position embeddings, taking shape of
[batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim] (as well as the output shape).
  • sin/cos (torch.Tensor) – [num_tokens, rotary_dim] the sin/cos value tensor generated to be applied on query/key.
  • rotary_ndims (int) – the rotary dimension. e.g., 64 for GPTJ. head size for LLama.
  • head_dim (int) – head dim from the input shape.
  • rotary_half (bool) – if False, e.g., GPT-J 6B/ChatGLM, cos/sin is applied to the neighboring 2 elements, so the offset is 1;
if True, e.g., for llama, cos/sin is applied to the neighboring rotary_dim elements, so the offset is rotary_dim/2.
  • position_ids (torch.Tensor) – Default is None and optional if sin/cos is provided.
The according position_ids for the input; the shape should be [batch size, sequence length].

Returns:
query, key (torch.Tensor) – [batch size, sequence length, num_head/num_kv_head, head_dim]
or [num_tokens, num_head/num_kv_head, head_dim].
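A usage sketch of the functional form (an editor's illustration; the random sin/cos tables stand in for real precomputed rotary tables, and the sizes are assumed):

>>> import torch
>>> from intel_extension_for_pytorch.llm.functional import rotary_embedding
>>> num_tokens, num_head, head_dim, rotary_dim = 32, 16, 256, 64
>>> query = torch.randn(num_tokens, num_head, head_dim)
>>> key = torch.randn(num_tokens, num_head, head_dim)
>>> sin = torch.randn(num_tokens, rotary_dim)  # placeholder for a real sin table
>>> cos = torch.randn(num_tokens, rotary_dim)  # placeholder for a real cos table
>>> query, key = rotary_embedding(query, key, sin, cos, rotary_dim, False)
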
    ipex.llm.functional.rms_norm(hidden_states: Tensor, weight: Tensor, eps: float)

Applies RMSnorm on the input (hidden states)
(see https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L76).

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply RMSNorm.
  • weight (torch.Tensor) – the weight to apply RMSnorm.
  • eps (float) – the variance_epsilon to apply RMSnorm.
    ipex.llm.functional.fast_layer_norm(hidden_states: Tensor, normalized_shape: Tuple[int, ...], weight: Tensor, bias: Tensor, eps: float)

Applies PyTorch Layernorm (see https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html)
on the input (hidden states).

Parameters:
  • hidden_states (torch.Tensor) – the input tensor to apply normalization.
  • normalized_shape ((int or list) or torch.Size) – input shape from an expected input of size.
  • weight (torch.Tensor) – the weight to apply normalization.
  • bias (torch.Tensor) – an additive bias for normalization.
  • eps (float) – a value added to the denominator for numerical stability.
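A usage sketch covering both functional norms (an editor's illustration; the sizes, eps values, and weight/bias initialization are assumed):

>>> import torch
>>> from intel_extension_for_pytorch.llm.functional import rms_norm, fast_layer_norm
>>> hidden_size = 4096
>>> hidden_states = torch.randn(1, 32, hidden_size)
>>> weight, bias = torch.ones(hidden_size), torch.zeros(hidden_size)
>>> out_rms = rms_norm(hidden_states, weight, 1e-6)
>>> out_ln = fast_layer_norm(hidden_states, (hidden_size,), weight, bias, 1e-5)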
    @@ -1118,82 +1146,87 @@

ipex.llm.functional.indirect_access_kv_cache_attention

Parameters:
  • query (torch.Tensor) – Query tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • key (torch.Tensor) – Key tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • value (torch.Tensor) – Value tensor; shape: (beam*batch, seq_len, head_num, head_dim).
  • scale_attn (float) – scale used by the attention layer; should be the sqrt(head_size).
  • layer_past (tuple(torch.Tensor)) – tuple(seq_info, key_cache, value_cache, beam-idx).
    • key_cache: key cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • value_cache: value cache tensor, shape: (max_seq, beam*batch, head_num, head_dim);
    • beam-idx: history beam idx, shape: (max_seq, beam*batch);
    • seq_info: sequence info tensor, shape: (1, 1, max_seq, max_seq).
  • head_mask (torch.Tensor) – Head mask tensor, which is not supported by the kernel yet.
  • attention_mask (torch.Tensor) – Attention mask information.
  • text_max_length (int) – the max length of kv cache to be used for generation
(allocate the pre-cache buffer).

Returns:
attn_output – weighted value which is the output of scale dot product;
shape: (beam*batch, seq_len, head_num, head_size).
attn_weights – the output tensor of the first matmul in scale dot product,
which is not supported by the kernel now.
new_layer_past – updated layer_past (seq_info, key_cache, value_cache, beam-idx).

Notes

How to reorder the KV cache when using the IndirectAccessKVCacheAttention format
(e.g., on the llama model, see
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1318):

def _reorder_cache(
    self, past_key_values: Tuple[Tuple[torch.Tensor]], beam_idx: torch.Tensor
) -> Tuple[Tuple[torch.Tensor]]:
    if (
        len(past_key_values[0]) == 4 and past_key_values[0][0].shape[-1] == 1
    ):
        for layer_past in past_key_values:
            layer_past[3][layer_past[0].size(-2) - 1] = beam_idx
        return past_key_values
    ipex.llm.functional.varlen_attention(query: Tensor, key: Tensor, value: Tensor, out: Tensor, seqlen_q: Tensor, seqlen_k: Tensor, max_seqlen_q: int, max_seqlen_k: int, pdropout: float, softmax_scale: float, zero_tensors: bool, is_causal: bool, return_softmax: bool, gen_: Generator)
Applies PyTorch scaled_dot_product_attention on the inputs of query, key and value
(see https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html),
and accepts variant (different) sequence lengths among the query, key and value.

Parameters:
  • query (torch.Tensor) – shape [query_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • key (torch.Tensor) – shape [key_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • value (torch.Tensor) – shape [value_tokens, num_head, head_size], where tokens is the total sequence length among the batch.
  • out (torch.Tensor) – buffer to get the results; the shape is the same as query.
  • seqlen_q (torch.Tensor) – shape [batch_size + 1]; points to the current query_tokens among the total sequence length.
  • seqlen_k (torch.Tensor) – shape [batch_size + 1]; points to the current key_tokens among the total sequence length.
  • max_seqlen_q (int) – max/total sequence length of query.
  • max_seqlen_k (int) – max/total sequence length of key.
  • pdropout (float) – dropout probability; if greater than 0.0, dropout is applied; default is 0.0.
  • softmax_scale (float) – scaling factor applied prior to softmax.
  • is_causal (bool) – whether to apply causal attention masking; default is True.
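A usage sketch of the functional form (an editor's illustration; passing gen_=None assumes no RNG is needed when pdropout is 0.0, and the zero_tensors/return_softmax values are guesses from the signature):

>>> import torch
>>> from intel_extension_for_pytorch.llm.functional import varlen_attention
>>> batch_size, seq_len, num_head, head_size = 2, 32, 16, 128
>>> total = batch_size * seq_len
>>> query = torch.randn(total, num_head, head_size)
>>> key = torch.randn(total, num_head, head_size)
>>> value = torch.randn(total, num_head, head_size)
>>> out = torch.empty_like(query)
>>> seqlen_q = torch.tensor([0, seq_len, total], dtype=torch.int32)
>>> seqlen_k = seqlen_q.clone()
>>> varlen_attention(query, key, value, out, seqlen_q, seqlen_k, seq_len, seq_len,
...                  0.0, head_size ** -0.5, False, True, False, None)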

    @@ -1607,7 +1640,7 @@

    Graph OptimizationSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/blogs_publications.html b/cpu/2.3.0+cpu/tutorials/blogs_publications.html index b4704c089..90a817af4 100644 --- a/cpu/2.3.0+cpu/tutorials/blogs_publications.html +++ b/cpu/2.3.0+cpu/tutorials/blogs_publications.html @@ -167,7 +167,7 @@

    Blogs & PublicationsSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/cheat_sheet.html b/cpu/2.3.0+cpu/tutorials/cheat_sheet.html index 1e5479f88..c81b37a92 100644 --- a/cpu/2.3.0+cpu/tutorials/cheat_sheet.html +++ b/cpu/2.3.0+cpu/tutorials/cheat_sheet.html @@ -195,7 +195,7 @@

    Cheat SheetSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/contribution.html b/cpu/2.3.0+cpu/tutorials/contribution.html index 38cc63505..4b70ed395 100644 --- a/cpu/2.3.0+cpu/tutorials/contribution.html +++ b/cpu/2.3.0+cpu/tutorials/contribution.html @@ -331,7 +331,7 @@

    Tips Built with Sphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/examples.html b/cpu/2.3.0+cpu/tutorials/examples.html index d279b6cf7..6a1894c46 100644 --- a/cpu/2.3.0+cpu/tutorials/examples.html +++ b/cpu/2.3.0+cpu/tutorials/examples.html @@ -1567,7 +1567,7 @@

    Intel® AI Reference ModelsSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features.html b/cpu/2.3.0+cpu/tutorials/features.html index 864d49ac5..5335882f9 100644 --- a/cpu/2.3.0+cpu/tutorials/features.html +++ b/cpu/2.3.0+cpu/tutorials/features.html @@ -440,7 +440,7 @@

    Fast BERT Optimization (Prototype, NEW feature from 2.0.0)Sphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/amp.html b/cpu/2.3.0+cpu/tutorials/features/amp.html index 6e33bccec..f608418d7 100644 --- a/cpu/2.3.0+cpu/tutorials/features/amp.html +++ b/cpu/2.3.0+cpu/tutorials/features/amp.html @@ -262,7 +262,7 @@

    Ops that promote to the widest input typeSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html b/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html index 97c304be0..a22880be5 100644 --- a/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html +++ b/cpu/2.3.0+cpu/tutorials/features/auto_channels_last.html @@ -192,7 +192,7 @@

    Known issueSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html b/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html index 04709f865..2b2487f3d 100644 --- a/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html +++ b/cpu/2.3.0+cpu/tutorials/features/codeless_optimization.html @@ -280,7 +280,7 @@

    Already using Jit TraceSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/fast_bert.html b/cpu/2.3.0+cpu/tutorials/features/fast_bert.html index edaf2fadc..c854584a5 100644 --- a/cpu/2.3.0+cpu/tutorials/features/fast_bert.html +++ b/cpu/2.3.0+cpu/tutorials/features/fast_bert.html @@ -193,7 +193,7 @@

    Usage ExampleSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/graph_capture.html b/cpu/2.3.0+cpu/tutorials/features/graph_capture.html index 87687ec74..fbff2dcb3 100644 --- a/cpu/2.3.0+cpu/tutorials/features/graph_capture.html +++ b/cpu/2.3.0+cpu/tutorials/features/graph_capture.html @@ -179,7 +179,7 @@

    Usage ExampleSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html b/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html index 5db6c7370..2a65a0391 100644 --- a/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html +++ b/cpu/2.3.0+cpu/tutorials/features/graph_optimization.html @@ -390,7 +390,7 @@

    FoldingSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/hypertune.html b/cpu/2.3.0+cpu/tutorials/features/hypertune.html index a8ea0d1fc..a1f42dd36 100644 --- a/cpu/2.3.0+cpu/tutorials/features/hypertune.html +++ b/cpu/2.3.0+cpu/tutorials/features/hypertune.html @@ -330,7 +330,7 @@

    Usage ExamplesSphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.
    diff --git a/cpu/2.3.0+cpu/tutorials/features/int8_overview.html b/cpu/2.3.0+cpu/tutorials/features/int8_overview.html index 96345893a..fc9b5c40a 100644 --- a/cpu/2.3.0+cpu/tutorials/features/int8_overview.html +++ b/cpu/2.3.0+cpu/tutorials/features/int8_overview.html @@ -300,7 +300,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html b/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html index e02145b99..28a1cbf52 100644 --- a/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html +++ b/cpu/2.3.0+cpu/tutorials/features/int8_recipe_tuning_api.html @@ -378,7 +378,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html b/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html index b89bfa622..0f1e5f25a 100644 --- a/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html +++ b/cpu/2.3.0+cpu/tutorials/features/isa_dynamic_dispatch.html @@ -742,7 +742,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/nhwc.html b/cpu/2.3.0+cpu/tutorials/features/nhwc.html index 8f50bde4b..7a6f937fd 100644 --- a/cpu/2.3.0+cpu/tutorials/features/nhwc.html +++ b/cpu/2.3.0+cpu/tutorials/features/nhwc.html @@ -370,7 +370,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html b/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html index aefc11f04..2e6815cf8 100644 --- a/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html +++ b/cpu/2.3.0+cpu/tutorials/features/optimizer_fusion.html @@ -184,7 +184,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html b/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html index 24a28544a..ab46da07d 100644 --- a/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html +++ b/cpu/2.3.0+cpu/tutorials/features/runtime_extension.html @@ -347,7 +347,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/split_sgd.html b/cpu/2.3.0+cpu/tutorials/features/split_sgd.html index 42b4054c5..6926fbb58 100644 --- a/cpu/2.3.0+cpu/tutorials/features/split_sgd.html +++ b/cpu/2.3.0+cpu/tutorials/features/split_sgd.html @@ -218,7 +218,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html b/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html index b1fce77e9..cc2926a9a 100644 --- a/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html +++ b/cpu/2.3.0+cpu/tutorials/features/sq_recipe_tuning_api.html @@ -209,7 +209,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/getting_started.html b/cpu/2.3.0+cpu/tutorials/getting_started.html index d16fe6df4..79461e521 100644 --- a/cpu/2.3.0+cpu/tutorials/getting_started.html +++ b/cpu/2.3.0+cpu/tutorials/getting_started.html @@ -282,7 +282,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/installation.html b/cpu/2.3.0+cpu/tutorials/installation.html index 8835193ef..7956f4699 100644 --- a/cpu/2.3.0+cpu/tutorials/installation.html +++ b/cpu/2.3.0+cpu/tutorials/installation.html @@ -132,7 +132,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/introduction.html b/cpu/2.3.0+cpu/tutorials/introduction.html index 1f2723f33..54b0b891e 100644 --- a/cpu/2.3.0+cpu/tutorials/introduction.html +++ b/cpu/2.3.0+cpu/tutorials/introduction.html @@ -156,7 +156,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/known_issues.html b/cpu/2.3.0+cpu/tutorials/known_issues.html index 330a7f99c..4ece9f8c2 100644 --- a/cpu/2.3.0+cpu/tutorials/known_issues.html +++ b/cpu/2.3.0+cpu/tutorials/known_issues.html @@ -316,7 +316,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/license.html b/cpu/2.3.0+cpu/tutorials/license.html index 8eab4ec9a..43ac073d6 100644 --- a/cpu/2.3.0+cpu/tutorials/license.html +++ b/cpu/2.3.0+cpu/tutorials/license.html @@ -132,7 +132,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/llm.html b/cpu/2.3.0+cpu/tutorials/llm.html index 3228a7eaa..9a32a6a92 100644 --- a/cpu/2.3.0+cpu/tutorials/llm.html +++ b/cpu/2.3.0+cpu/tutorials/llm.html @@ -585,13 +585,13 @@

    Verified for distributed inference mode via DeepSpeed

🟨 signifies that the model can perform well, though accuracy may not be in a perfect state (>1% difference compared with FP32).

Note: The above verified models (including other models in the same model family, like “codellama/CodeLlama-7b-hf” from the LLAMA family) are well supported with all optimizations, such as indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16). Work is in progress to better support the models in the tables with various data types. In addition, more models will be optimized in the future.

-Please check LLM best known practice for instructions on installing/setting up the environment and for example scripts.

+Please check LLM best known practice for instructions on installing/setting up the environment and for example scripts.
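To make the note above concrete, here is a minimal sketch (not part of this diff) of enabling those optimizations through the ipex.llm.optimize entry point that the LLM best known practice walks through; the model name and generation settings below are illustrative assumptions.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import intel_extension_for_pytorch as ipex

    # Illustrative choice; any member of a verified model family should work.
    model_id = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    # Single entry point; dtype selects the bf16 path, under which the
    # optimizations named above (indirect access KV cache, fused ROPE,
    # prepacked TPP Linear) are applied where supported.
    model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

    prompt = tokenizer("What is PyTorch?", return_tensors="pt")
    with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
        output = model.generate(**prompt, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))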

    Module Level Optimization API for customized LLM (Prototype)

In the past year, LLMs have flourished, with many open-source models contributed to the community, while researchers build their own LLMs from transformer blocks with variants in implementation details. To help LLM researchers and developers improve their productivity, Intel® Extension for PyTorch* provides module level optimizations for commonly used LLM modules and functionalities, which are in nature operators or certain operator combinations.

-Please check LLM module level optimization practice to better understand how to use module level APIs to optimize your LLM and achieve better performance.

+Please check LLM module level optimization practice to better understand how to use module level APIs to optimize your LLM and achieve better performance.
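For a sense of what the module level APIs look like when building a customized LLM, the following is a hedged sketch using one fused module from ipex.llm.modules; the exact constructor signature is an assumption and should be confirmed against the module level API docs.

    import torch
    import intel_extension_for_pytorch as ipex

    hidden_size = 4096
    # Fused RMSNorm from the module level API; intended as a drop-in
    # replacement for a hand-written RMSNorm inside a custom decoder layer.
    norm = ipex.llm.modules.RMSNorm(hidden_size)

    x = torch.randn(1, 32, hidden_size)
    y = norm(x)
    print(y.shape)  # torch.Size([1, 32, 4096])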

    Demos

    diff --git a/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html b/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html index c2823a660..f49dcb9d6 100644 --- a/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html +++ b/cpu/2.3.0+cpu/tutorials/llm/llm_optimize.html @@ -266,7 +266,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance.html b/cpu/2.3.0+cpu/tutorials/performance.html index 815f64565..c9ed2e7f8 100644 --- a/cpu/2.3.0+cpu/tutorials/performance.html +++ b/cpu/2.3.0+cpu/tutorials/performance.html @@ -1038,7 +1038,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html b/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html index 297780342..f38cebcac 100644 --- a/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html +++ b/cpu/2.3.0+cpu/tutorials/performance_tuning/launch_script.html @@ -829,7 +829,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html b/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html index 80dcd7dae..e38ab0543 100644 --- a/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html +++ b/cpu/2.3.0+cpu/tutorials/performance_tuning/torchserve.html @@ -462,7 +462,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html b/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html index 963dbdcd1..04ee95764 100644 --- a/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html +++ b/cpu/2.3.0+cpu/tutorials/performance_tuning/tuning_guide.html @@ -365,7 +365,7 @@

    diff --git a/cpu/2.3.0+cpu/tutorials/releases.html b/cpu/2.3.0+cpu/tutorials/releases.html index abca321a6..929e04127 100644 --- a/cpu/2.3.0+cpu/tutorials/releases.html +++ b/cpu/2.3.0+cpu/tutorials/releases.html @@ -1337,7 +1337,7 @@

    NOTE Built with Sphinx using a theme provided by Read the Docs. - +

    © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), http://opensource.org/licenses/0BSD.