update dependency version (#3895)
* add torch-ccl into compile bundle

* fix dead link in doc

* update footer link

* update deepspeed dependency version, remove cpu related md files from build_doc.sh

* add xpu perf

* version to 2.1.20

* fix example import

* update torch ccl version

* add mpi path in the scripts

* update dependency version

* move known issue to tutorial repo

* update known issue link

* add note that CPU features are not included

* update log version

* update feature and example doc

* update model zoo version

* add paper to publications

* remove cheat sheet

---------

Co-authored-by: Zheng, Zhaoqiong <zhaoqiong.zheng@intel.com>
Co-authored-by: Ye Ting <ting.ye@intel.com>
3 people authored Mar 30, 2024
1 parent 716d786 commit cc1a83e
Showing 24 changed files with 222 additions and 419 deletions.
14 changes: 7 additions & 7 deletions dependency_version.yml
@@ -4,21 +4,21 @@ gcc:
 llvm:
   version: 16.0.6
 pytorch:
-  version: 2.1.0a0
+  version: 2.1.0.post0+cxx11.abi
   commit: v2.1.0
 torchaudio:
-  version: 2.1.0a0
+  version: 2.1.0.post0+cxx11.abi
   commit: v2.1.0
 torchvision:
-  version: 0.16.0a0
+  version: 0.16.0.post0+cxx11.abi
   commit: v0.16.0
 torch-ccl:
   repo: https://github.com/intel/torch-ccl.git
-  commit: 5f20135ccf8f828738cb3bc5a5ae7816df8100ae
-  version: 2.1.100+xpu
+  commit: 5ee65b42c42a0d91c4cf459d9be40020274003b6
+  version: 2.1.200+xpu
 deepspeed:
   repo: https://github.com/microsoft/DeepSpeed.git
-  version:
+  version: v0.11.2
   commit: 4fc181b01077521ba42379013ce91a1c294e5d8e
 intel-extension-for-deepspeed:
   repo: https://github.com/intel/intel-extension-for-deepspeed.git
@@ -28,7 +28,7 @@ transformers:
   commit: v4.31.0
 protobuf:
   version: 3.20.3
-llm_eval:
+lm_eval:
   version: 0.3.0
 basekit:
   dpcpp-cpp-rt:
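The file is plain YAML, so tooling can consume these pins directly. A minimal sketch (hypothetical, not part of this commit; assumes PyYAML is installed):

```python
# Hypothetical snippet, not from this commit: look up the pinned
# versions in dependency_version.yml. Requires PyYAML (pip install pyyaml).
import yaml

with open("dependency_version.yml") as f:
    deps = yaml.safe_load(f)

print(deps["pytorch"]["version"])    # 2.1.0.post0+cxx11.abi
print(deps["torch-ccl"]["version"])  # 2.1.200+xpu
print(deps["deepspeed"]["commit"])   # pinned DeepSpeed commit hash
```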
3 changes: 3 additions & 0 deletions docs/_static/custom.css
@@ -15,6 +15,9 @@
 a#wap_dns {
   display: none;
 }
+a#wap_nac {
+  display: none;
+}
 
 /* replace the copyright to eliminate the copyright symbol enforced by
    the ReadTheDocs theme */
2 changes: 1 addition & 1 deletion docs/_templates/footer.html
@@ -1,3 +1,3 @@
 {% extends '!footer.html' %} {% block extrafooter %} {{super}}
-<p></p><div><a href='https://www.intel.com/content/www/us/en/privacy/intel-cookie-notice.html' data-cookie-notice='true'>Cookies</a> <a href='https://www.intel.com/content/www/us/en/privacy/intel-privacy-notice.html'>| Privacy</a> <a data-wap_ref='dns' id='wap_dns' href='https://www.intel.com/content/www/us/en/privacy/intel-cookie-notice.html'>| Do Not Share My Personal Information</a> </div> <p></p> <div>&copy; Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), <a href='http://opensource.org/licenses/0BSD'>http://opensource.org/licenses/0BSD</a>. </div>
+<p></p><div><a href='https://www.intel.com/content/www/us/en/privacy/intel-cookie-notice.html' data-cookie-notice='true'>Cookies</a> <a href='https://www.intel.com/content/www/us/en/privacy/intel-privacy-notice.html'>| Privacy</a> <a href="/#" data-wap_ref="dns" id="wap_dns"><small>Your Privacy Choices</small></a> <a href=https://www.intel.com/content/www/us/en/privacy/privacy-residents-certain-states.html data-wap_ref="nac" id="wap_nac"><small>Notice at Collection</small></a> </div> <p></p> <div>&copy; Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (OBSD), <a href='http://opensource.org/licenses/0BSD'>http://opensource.org/licenses/0BSD</a>. </div>
 {% endblock %}
9 changes: 4 additions & 5 deletions docs/index.rst
@@ -15,7 +15,7 @@
 The extension can be loaded as a Python module for Python programs or linked as a C++ library for C++ programs. In Python scripts, users can enable it dynamically by importing ``intel_extension_for_pytorch``.
 
 .. note::
-
+   - CPU features are not included in GPU-only packages.
    - GPU features are not included in CPU-only packages.
    - Optimizations for CPU-only may have a newer code base due to different development schedules.
 
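As a minimal sketch of that dynamic enabling (illustrative, not text from this commit; it assumes a working XPU installation and driver):

```python
import torch
import intel_extension_for_pytorch as ipex  # the import itself enables the extension

# After the import, the XPU device is registered with PyTorch
# (assumes an Intel GPU and a matching driver are present).
print(ipex.__version__)
print(torch.xpu.is_available())
```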
@@ -26,8 +26,8 @@
 
 You can find more information about the product at:
 
-- `Features <https://intel.github.io/intel-extension-for-pytorch/gpu/latest/tutorials/features>`_
-- `Performance <./tutorials/performance.html>`_
+- `Features <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features>`_
+- `Performance <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/performance>`_
 
 Architecture
 ------------
@@ -62,7 +62,7 @@
    tutorials/performance
    tutorials/technical_details
    tutorials/releases
-   tutorials/performance_tuning/known_issues
+   tutorials/known_issues
    tutorials/blogs_publications
    tutorials/license
 
@@ -74,7 +74,6 @@
    tutorials/installation
    tutorials/getting_started
    tutorials/examples
-   tutorials/cheat_sheet
 
 .. toctree::
    :maxdepth: 3
5 changes: 2 additions & 3 deletions docs/tutorials/api_doc.rst
@@ -9,7 +9,7 @@ Device-Agnostic
 .. autofunction:: optimize_transformers
 .. autofunction:: get_fp32_math_mode
 .. autofunction:: set_fp32_math_mode
-.. autoclass:: verbose
+
 
 GPU-Specific
 ************
@@ -43,8 +43,7 @@ Miscellaneous
 
 .. currentmodule:: intel_extension_for_pytorch.xpu.fp8.fp8
 .. autofunction:: fp8_autocast
-.. currentmodule:: intel_extension_for_pytorch.quantization
-.. autofunction:: _gptq
+
 
 Random Number Generator
 =======================
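For orientation, the device-agnostic functions listed above are called through the top-level module. A hedged sketch (verify exact signatures against the rendered API reference):

```python
import intel_extension_for_pytorch as ipex

# Illustrative use of the math-mode accessors documented above;
# signatures should be checked against the rendered API reference.
ipex.set_fp32_math_mode(mode=ipex.FP32MathMode.TF32, device="xpu")
print(ipex.get_fp32_math_mode(device="xpu"))
```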
1 change: 1 addition & 0 deletions docs/tutorials/blogs_publications.md
@@ -1,6 +1,7 @@
 Blogs & Publications
 ====================
 
+* [LLM inference solution on Intel GPU, Dec 2023](https://arxiv.org/abs/2401.05391)
 * [Accelerate Llama 2 with Intel AI Hardware and Software Optimizations, Jul 2023](https://www.intel.com/content/www/us/en/developer/articles/news/llama2.html)
 * [Accelerate PyTorch\* Training and Inference Performance using Intel® AMX, Jul 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-training-inference-on-amx.html)
 * [Intel® Deep Learning Boost (Intel® DL Boost) - Improve Inference Performance of Hugging Face BERT Base Model in Google Cloud Platform (GCP) Technology Guide, Apr 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-intel-dl-boost-improve-inference-performance-of-hugging-face-bert-base-model-in-google-cloud-platform-gcp-technology-guide)
23 changes: 0 additions & 23 deletions docs/tutorials/cheat_sheet.md

This file was deleted.

36 changes: 18 additions & 18 deletions docs/tutorials/examples.md
@@ -4,8 +4,6 @@ Examples
 These examples will help you get started using Intel® Extension for PyTorch\*
 with Intel GPUs.
 
-For examples on Intel CPUs, check the [CPU examples](../../../cpu/latest/tutorials/examples.html).
-
 **Prerequisites**:
 Before running these examples, install the `torchvision` and `transformers` Python packages.

@@ -27,7 +25,7 @@
 To use Intel® Extension for PyTorch\* on training, you need to make the following changes in your code:
 
 1. Import `intel_extension_for_pytorch` as `ipex`.
-2. Use the `ipex.optimize` function, which applies optimizations against the model object, as well as an optimizer object.
+2. Use the `ipex.optimize` function for an additional performance boost; it applies optimizations to both the model object and an optimizer object.
 3. Use Auto Mixed Precision (AMP) with BFloat16 data type.
 4. Convert input tensors, loss criterion and model to XPU, as shown below:

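A minimal sketch of those four steps (illustrative only: the model, optimizer, and data are hypothetical stand-ins, and an XPU-enabled installation is assumed):

```python
import torch
import intel_extension_for_pytorch as ipex

# Hypothetical toy model, loss, and optimizer for illustration.
model = torch.nn.Linear(128, 10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Step 4: move the model and loss criterion to the XPU device.
model = model.to("xpu")
criterion = criterion.to("xpu")

# Step 2: apply ipex.optimize to the model together with its optimizer.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

# Step 4 (continued): input tensors on the XPU device.
data = torch.rand(32, 128, device="xpu")
target = torch.randint(0, 10, (32,), device="xpu")

optimizer.zero_grad()
# Step 3: run the forward pass under BFloat16 Auto Mixed Precision.
with torch.xpu.amp.autocast(dtype=torch.bfloat16):
    output = model(data)
    loss = criterion(output, target)
loss.backward()
optimizer.step()
```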
@@ -219,18 +217,20 @@
 
 If *Found IPEX* is shown as dynamic library paths, the extension was linked into the binary. This can be verified with the Linux command *ldd*.
 
+The values of x, y, and z in the following log will change depending on the version you choose.
+
 ```bash
 $ CC=icx CXX=icpx cmake -DCMAKE_PREFIX_PATH=/workspace/libtorch ..
--- The C compiler identification is IntelLLVM 2024.0.0
--- The CXX compiler identification is IntelLLVM 2024.0.0
+-- The C compiler identification is IntelLLVM 202x.y.z
+-- The CXX compiler identification is IntelLLVM 202x.y.z
 -- Detecting C compiler ABI info
 -- Detecting C compiler ABI info - done
--- Check for working C compiler: /workspace/intel/oneapi/compiler/2024.0.0/linux/bin/icx - skipped
+-- Check for working C compiler: /workspace/intel/oneapi/compiler/202x.y.z/linux/bin/icx - skipped
 -- Detecting C compile features
 -- Detecting C compile features - done
 -- Detecting CXX compiler ABI info
 -- Detecting CXX compiler ABI info - done
--- Check for working CXX compiler: /workspace/intel/oneapi/compiler/2024.0.0/linux/bin/icpx - skipped
+-- Check for working CXX compiler: /workspace/intel/oneapi/compiler/202x.y.z/linux/bin/icpx - skipped
 -- Detecting CXX compile features
 -- Detecting CXX compile features - done
 -- Looking for pthread.h
@@ -252,16 +252,16 @@ $ ldd example-app
 libintel-ext-pt-cpu.so => /workspace/libtorch/lib/libintel-ext-pt-cpu.so (0x00007fd5a1a1b000)
 libintel-ext-pt-gpu.so => /workspace/libtorch/lib/libintel-ext-pt-gpu.so (0x00007fd5862b0000)
 ...
-libmkl_intel_lp64.so.2 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_intel_lp64.so.2 (0x00007fd584ab0000)
-libmkl_core.so.2 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_core.so.2 (0x00007fd5806cc000)
-libmkl_gnu_thread.so.2 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_gnu_thread.so.2 (0x00007fd57eb1d000)
-libmkl_sycl.so.3 => /workspace/intel/oneapi/mkl/2024.0.0/lib/intel64/libmkl_sycl.so.3 (0x00007fd55512c000)
-libOpenCL.so.1 => /workspace/intel/oneapi/compiler/2024.0.0/linux/lib/libOpenCL.so.1 (0x00007fd55511d000)
-libsvml.so => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fd553b11000)
-libirng.so => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libirng.so (0x00007fd553600000)
-libimf.so => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libimf.so (0x00007fd55321b000)
-libintlc.so.5 => /workspace/intel/oneapi/compiler/2024.0.0/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fd553a9c000)
-libsycl.so.6 => /workspace/intel/oneapi/compiler/2024.0.0/linux/lib/libsycl.so.6 (0x00007fd552f36000)
+libmkl_intel_lp64.so.2 => /workspace/intel/oneapi/mkl/202x.y.z/lib/intel64/libmkl_intel_lp64.so.2 (0x00007fd584ab0000)
+libmkl_core.so.2 => /workspace/intel/oneapi/mkl/202x.y.z/lib/intel64/libmkl_core.so.2 (0x00007fd5806cc000)
+libmkl_gnu_thread.so.2 => /workspace/intel/oneapi/mkl/202x.y.z/lib/intel64/libmkl_gnu_thread.so.2 (0x00007fd57eb1d000)
+libmkl_sycl.so.3 => /workspace/intel/oneapi/mkl/202x.y.z/lib/intel64/libmkl_sycl.so.3 (0x00007fd55512c000)
+libOpenCL.so.1 => /workspace/intel/oneapi/compiler/202x.y.z/linux/lib/libOpenCL.so.1 (0x00007fd55511d000)
+libsvml.so => /workspace/intel/oneapi/compiler/202x.y.z/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fd553b11000)
+libirng.so => /workspace/intel/oneapi/compiler/202x.y.z/linux/compiler/lib/intel64_lin/libirng.so (0x00007fd553600000)
+libimf.so => /workspace/intel/oneapi/compiler/202x.y.z/linux/compiler/lib/intel64_lin/libimf.so (0x00007fd55321b000)
+libintlc.so.5 => /workspace/intel/oneapi/compiler/202x.y.z/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fd553a9c000)
+libsycl.so.6 => /workspace/intel/oneapi/compiler/202x.y.z/linux/lib/libsycl.so.6 (0x00007fd552f36000)
 ...
 ```

@@ -286,4 +286,4 @@
 
 ## Intel® AI Reference Models
 
-Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/v2.12.0) (former Model Zoo). A number of PyTorch use cases for benchmarking are also available in the [Use Cases](https://github.com/IntelAI/models/tree/v2.12.0#use-cases) section. Models verified on Intel GPUs are marked in the `Model Documentation` column. You can get performance benefits out-of-the-box by simply running scripts in the Intel® AI Reference Models.
+Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/v3.1.1) (formerly Model Zoo). A number of PyTorch use cases for benchmarking are also available in the [Use Cases](https://github.com/IntelAI/models/tree/v3.1.1?tab=readme-ov-file#use-cases) section. Models verified on Intel GPUs are marked in the `Model Documentation` column. You can get performance benefits out of the box by simply running scripts in the Intel® AI Reference Models.
25 changes: 11 additions & 14 deletions docs/tutorials/features.rst
@@ -1,8 +1,8 @@
 Features
 ========
 
-Device-Agnostic
-***************
+GPU-Specific
+************
 
 Easy-to-use Python API
 ----------------------
@@ -46,16 +46,15 @@ Quantization
 
 Intel® Extension for PyTorch* currently supports imperative mode and TorchScript mode for post-training static quantization on GPU. This section illustrates the quantization workflow on Intel GPUs.
 
-Check more detailed information for `INT8 Quantization [XPU] <features/int8_overview_xpu.md>`_.
+Check more detailed information for `INT8 Quantization <features/int8_overview_xpu.md>`_.
 
-On Intel® GPUs, Intel® Extension for PyTorch* also provides INT4 and FP8 Quantization. Check more detailed information for `FP8 Quantization <./features/float8.md>`_ and `INT4 Quantization <./features/int4.md>`_
+On Intel® GPUs, Intel® Extension for PyTorch* also provides FP8 Quantization. Check more detailed information for `FP8 Quantization <./features/float8.md>`_.
 
 .. toctree::
    :hidden:
    :maxdepth: 1
 
    features/int8_overview_xpu
-   features/int4
   features/float8


@@ -74,9 +73,6 @@
    features/horovod
 
 
-GPU-Specific
-************
-
 DLPack Solution
 ---------------

@@ -131,11 +127,12 @@ For more detailed information, check `FSDP <features/FSDP.md>`_.
 
    features/FSDP
 
-Inductor
---------
-
-For more detailed information, check `Inductor <features/torch_compile_gpu.md>`_.
+torch.compile for GPU (Beta)
+----------------------------
+
+Intel® Extension for PyTorch\* now empowers users to seamlessly harness graph compilation capabilities for optimal PyTorch model performance on Intel GPU via the flagship `torch.compile <https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile>`_ API through the default "inductor" backend (`TorchInductor <https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747/1>`_).
+
+For more detailed information, check `torch.compile for GPU <features/torch_compile_gpu.md>`_.
 
 .. toctree::
    :hidden:
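Usage follows stock `torch.compile`. A short sketch, assuming an XPU-enabled build (model and shapes are illustrative):

```python
import torch
import intel_extension_for_pytorch  # registers the XPU backend used by torch.compile

# Hypothetical toy model for illustration.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).to("xpu")
compiled_model = torch.compile(model)  # default "inductor" backend

x = torch.randn(8, 64, device="xpu")
y = compiled_model(x)  # first call compiles; later calls reuse the compiled graph
```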
@@ -144,7 +141,7 @@
 
    features/torch_compile_gpu
 
 Legacy Profiler Tool (Prototype)
------------------------------------
+--------------------------------
 
 The legacy profiler tool is an extension of PyTorch* legacy profiler for profiling operators' overhead on XPU devices. With this tool, you can get the information in many fields of the run models or code scripts. Build Intel® Extension for PyTorch* with profiler support as default and enable this tool by adding a `with` statement before the code segment.

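Enabling it looks roughly like this (a sketch of the prototype's documented pattern; confirm the exact flag names on the feature page):

```python
import torch
import intel_extension_for_pytorch  # builds ship with profiler support by default

# Hypothetical workload for illustration.
model = torch.nn.Linear(64, 64).to("xpu")
x = torch.randn(8, 64, device="xpu")

# Wrap the code segment of interest in the profiler's `with` statement;
# `use_xpu` is the prototype's XPU switch (confirm on the feature page).
with torch.autograd.profiler_legacy.profile(use_xpu=True) as prof:
    y = model(x)

print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```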
@@ -157,7 +154,7 @@
 
    features/profiler_legacy
 
 Simple Trace Tool (Prototype)
---------------------------------
+-----------------------------
 
 Simple Trace is a built-in debugging tool that lets you control printing out the call stack for a piece of code. Once enabled, it can automatically print out verbose messages of called operators in a stack format with indenting to distinguish the context.

@@ -170,7 +167,7 @@
 
    features/simple_trace
 
 Kineto Supported Profiler Tool (Prototype)
---------------------------------------------
+------------------------------------------
 
 The Kineto supported profiler tool is an extension of PyTorch\* profiler for profiling operators' executing time cost on GPU devices. With this tool, you can get information in many fields of the run models or code scripts. Build Intel® Extension for PyTorch\* with Kineto support as default and enable this tool using the `with` statement before the code segment.
