Doc content finetune (#4215)
update known issues, profiler and torch.compile doc contents.

---------

Co-authored-by: Ye Ting <ting.ye@intel.com>
ZhaoqiongZ and tye1 authored Apr 30, 2024
1 parent e8317aa commit 8bd78d6
Showing 13 changed files with 173 additions and 192 deletions.
2 changes: 1 addition & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ Intel® Extension for PyTorch* has been released as an open-source project at

You can find more information about the product at:

- `Features <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features>`_
- `Performance <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/performance>`_

Architecture
22 changes: 4 additions & 18 deletions docs/tutorials/features.rst
@@ -137,19 +137,6 @@ For more detailed information, check `torch.compile for GPU <features/torch_comp

features/torch_compile_gpu

Legacy Profiler Tool (Prototype)
--------------------------------

The legacy profiler tool is an extension of the PyTorch* legacy profiler for profiling operators' overhead on XPU devices. With this tool, you can collect profiling information in many fields for the executed models or code scripts. Intel® Extension for PyTorch* is built with profiler support by default; enable this tool by adding a `with` statement before the code segment.

For more detailed information, check `Legacy Profiler Tool <features/profiler_legacy.md>`_.

.. toctree::
:hidden:
:maxdepth: 1

features/profiler_legacy

Simple Trace Tool (Prototype)
-----------------------------

@@ -191,14 +178,13 @@ For more detailed information, check `Compute Engine <features/compute_engine.md
features/compute_engine


``IPEX_LOGGING`` (Prototype feature for debug)
----------------------------------------------

``IPEX_LOGGING`` provides the capability to log verbose information from Intel® Extension for PyTorch\*. Use ``IPEX_LOGGING`` to get log information or trace execution from Intel® Extension for PyTorch\*, and continue using PyTorch\* macros such as ``TORCH_CHECK`` and ``TORCH_ERROR`` to get log information from PyTorch\*.

For more detailed information, check `IPEX_LOGGING <features/ipex_log.md>`_.

.. toctree::
:hidden:
89 changes: 46 additions & 43 deletions docs/tutorials/features/ipex_log.md
@@ -1,54 +1,57 @@
<style>
table {
margin: auto;
}

</style>

`IPEX_LOGGING` (Prototype)
==========================

## Introduction

`IPEX_LOGGING` provides the capability to log verbose information from Intel® Extension for PyTorch\*. Use `IPEX_LOGGING` to get log information or trace execution from Intel® Extension for PyTorch\*, and continue using PyTorch\* macros such as `TORCH_CHECK` and `TORCH_ERROR` to get log information from PyTorch\*.

## `IPEX_LOGGING` Definition
### Log Level
The supported log levels are defined as follows. The default log level is `DISABLED`:

| log level | number | usage |
| :----: | :----: | :----: |
| DISABLED | -1 | Disable the logging |
| TRACE | 0 | Reserved for future use |
| DEBUG | 1 | Provide the whole calling stack info |
| INFO | 2 | Record calling info to other library functions and environment variable settings |
| WARN | 3 | Warn on the second attempt of an action, such as memory reallocation |
| ERR | 4 | Report errors caught in try catch |
| CRITICAL | 5 | Reserved for future use |
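As a rough illustration of how these numeric levels could gate output, here is a hypothetical pure-Python sketch (the names and the exact gating rule are assumptions for illustration, not IPEX internals): a message is assumed to be emitted when its level is at or above the configured level, and `DISABLED` suppresses everything.

```python
# Hypothetical sketch of log-level gating; not actual IPEX code.
LEVELS = {"DISABLED": -1, "TRACE": 0, "DEBUG": 1, "INFO": 2,
          "WARN": 3, "ERR": 4, "CRITICAL": 5}

def should_emit(configured: int, message_level: int) -> bool:
    # Assumed rule: DISABLED (-1) suppresses all logging; otherwise a
    # message is shown when its level is at or above the configured one.
    if configured == LEVELS["DISABLED"]:
        return False
    return message_level >= configured

# With the default DISABLED level, nothing is emitted.
print(should_emit(LEVELS["DISABLED"], LEVELS["CRITICAL"]))  # False
# With WARN configured, ERR messages pass through.
print(should_emit(LEVELS["WARN"], LEVELS["ERR"]))  # True
```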

### Log Component
The log component specifies which part of Intel® Extension for PyTorch\* a log message belongs to. The supported log components are defined as follows:

| log component | description |
| :----: | :----: |
| OPS | Launch SYCL, oneDNN, and oneMKL operators |
| SYNGRAPH | Syngraph related |
| MEMORY | Allocate/Free memory, Allocate/Free cache |
| RUNTIME | Device / Queue related |
| ALL | All output log |

## Usage in C++
All the usages are defined in `utils/LogUtils.h`. Currently Intel® Extension for PyTorch\* supports two types of logging: simple log and event log.

### Simple Log
You can use `IPEX_XXX_LOG`, where XXX represents one of the log levels mentioned above. There are four parameters defined for simple log:
- Log component, representing which part of Intel® Extension for PyTorch\* this log belongs to.
- Log sub component, pass an empty string ("") for general usage. For `SYNGRAPH` you can add any log sub component, with no restriction on categories.
- Log message template format string, which can be used like a Python format string; see also the [fmt library](https://github.com/fmtlib/fmt).
- Log name.

Below is an example of using simple log inside the abs kernel:
``` c++
IPEX_INFO_LOG("OPS", "", "Add a log for inside ops {}", "abs");
```
### Event Log
Event log is used for recording a whole event, such as an operator calculation. The whole event is identified by a unique `event_id`: it must stay the same across the whole event, and must not duplicate any other `event_id`, otherwise the behavior is undefined. You can also mark each step by using a `step_id`; there is no limitation on `step_id`. Use `IPEX_XXX_EVENT_END()` to complete the logging of the whole event, where XXX is the log level mentioned above.
Below is an example of using event log:
```c++
IPEX_EVENT_END("OPS", "", "record_avg_pool", "start", "Here record the time start with arg:{}", arg);
prepare_data();
@@ -58,23 +61,23 @@
IPEX_INFO_EVENT_END("OPS", "", "record_avg_pool", "finish conv", "Here record th
```

## Environment Settings
Intel® Extension for PyTorch\* provides five environment variables for configuring log output:

- `IPEX_LOGGING_LEVEL`, accepts an integer or string, default is -1 for `DISABLED`.
- `IPEX_LOG_COMPONENT`, accepts a string specifying the log component and log sub component you would like to log, default is "ALL". The log component and log sub component are separated by `/`. You can also specify several log components, such as "OPS;MEMORY".
- `IPEX_LOG_OUTPUT`, accepts a string. If `IPEX_LOG_OUTPUT` is set, all logs will be recorded in a file rather than printed on the console. Example: `export IPEX_LOG_OUTPUT="./ipex.log"`.
- `IPEX_LOG_ROTATE_SIZE`, accepts an integer, default is 10. Valid only with `IPEX_LOG_OUTPUT`; specifies how large a file (in MB) is used when rotating the log.
- `IPEX_LOG_SPLIT_SIZE`, accepts an integer, default is null. Valid only with `IPEX_LOG_OUTPUT`; specifies how large a file (in MB) is used when splitting the logs.

## Usage in Python
- `torch.xpu.set_log_level(log_level)` and `torch.xpu.get_log_level()`: set and get the log level.
- `torch.xpu.set_log_output_file_path(log_path)` and `torch.xpu.get_log_output_file_path()`: set and get the log output file path. Once the output file path is set, logs will be recorded in the file only, not printed on the console.
- `torch.xpu.set_log_rotate_file_size(file_size)` and `torch.xpu.get_log_rotate_file_size()`: set and get the log rotate file size. Valid only when the output file path is set.
- `torch.xpu.set_log_split_file_size(file_size)` and `torch.xpu.get_log_split_file_size()`: set and get the log split file size. Valid only when the output file path is set.
- `torch.xpu.set_log_component(log_component)` and `torch.xpu.get_log_component()`: set and get the log component. The log component strings are the same as defined in the environment settings.
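The difference between the rotate size and the split size can be sketched in pure Python (a hypothetical illustration of the assumed semantics, operating on entry counts instead of MB; not IPEX code): rotating caps a single log at a fixed size, dropping the oldest entries, while splitting starts a new file each time the size limit is reached.

```python
# Hypothetical illustration of rotate-vs-split semantics; real IPEX logs
# are sized in MB, not entry counts.
def rotate(entries, max_entries):
    # One capped "file": keep only the newest max_entries entries.
    return entries[-max_entries:]

def split(entries, max_entries):
    # A list of "files", each holding up to max_entries entries.
    return [entries[i:i + max_entries]
            for i in range(0, len(entries), max_entries)]

print(rotate(list(range(10)), 4))   # [6, 7, 8, 9]
print(split(list(range(5)), 2))     # [[0, 1], [2, 3], [4]]
```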

## Replace `IPEX_SIMPLE_TRACE`
`IPEX_SIMPLE_TRACE` is deprecated. Use `torch.xpu.set_log_level(0)` to get logs equivalent to the previous `IPEX_SIMPLE_TRACE` output.

## Replace `IPEX_VERBOSE`
`IPEX_VERBOSE` is deprecated. Use `torch.xpu.set_log_level(1)` to get logs equivalent to the previous `IPEX_VERBOSE` output.
10 changes: 10 additions & 0 deletions docs/tutorials/features/profiler_kineto.md
@@ -168,3 +168,13 @@
prof.export_chrome_trace("trace_file.json")
You can examine the sequence of profiled operators, runtime functions and XPU kernels in these trace viewers. Here shows a trace result for ResNet50 run on XPU backend viewed by Perfetto viewer:

![profiler_kineto_result_perfetto_viewer](../../images/profiler_kineto/profiler_kineto_result_perfetto_viewer.png)

## Known issues

When using the Kineto profiler based on oneTrace, you may meet an issue where profiling information for XPU kernels and device memory operations cannot be collected due to failures in creating the tracers. If you meet such failures, where a tracer or collector cannot be successfully created, try the following workaround.

```bash
export ZE_ENABLE_TRACING_LAYER=1
```

> Note that this environment variable should be set globally before running any user-level applications.
91 changes: 2 additions & 89 deletions docs/tutorials/features/profiler_legacy.md
@@ -1,93 +1,6 @@
Legacy Profiler Tool (Deprecated)
=================================

## Introduction

The legacy profiler tool is an extension of the PyTorch\* legacy profiler for profiling operators' overhead on XPU devices. With this tool, users can collect profiling information in many fields for the executed models or code scripts. Intel® Extension for PyTorch\* is built with profiler support by default; enable this tool with a `with` statement before the code segment.

## Use Case

To use the legacy profiler tool, you need to build Intel® Extension for PyTorch\* from source or install it via a prebuilt wheel. You also have various methods to disable this tool.

### Build Tool

The build option `BUILD_PROFILER` is switched on by default, but you can switch it off via setting `BUILD_PROFILER=OFF` while building Intel® Extension for PyTorch\* from source. With `BUILD_PROFILER=OFF`, no profiler code will be compiled and all Python scripts using the profiler with XPU support will raise a runtime error to the user.

```bash
[BUILD_PROFILER=ON] python setup.py install # build from source with profiler tool
BUILD_PROFILER=OFF python setup.py install # build from source without profiler tool
```

### Use Tool

In your model script, write `with` statement to enable the legacy profiler tool ahead of your code snippets, as shown in the following example:

```python
# import all necessary libraries
import torch
import intel_extension_for_pytorch

# these lines won't be profiled before enabling profiler tool
input_tensor = torch.randn(1024, dtype=torch.float32, device='xpu:0')

# enable legacy profiler tool with a `with` statement
with torch.autograd.profiler_legacy.profile(use_xpu=True) as prof:
# do what you want to profile here after the `with` statement with proper indent
output_tensor_1 = torch.nonzero(input_tensor)
output_tensor_2 = torch.unique(input_tensor)

# print the result table formatted by the legacy profiler tool as your wish
print(prof.key_averages().table())
```

There are a number of useful parameters defined in `torch.autograd.profiler_legacy.profile()`. Many of them are aligned with usages defined in PyTorch\*'s official profiler, such as `record_shapes`, a very useful parameter to control whether to record the shape of input tensors for each operator. To enable legacy profiler on XPU devices, pass `use_xpu=True`. For the usage of more parameters, please refer to [PyTorch\*'s tutorial page](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html).

### Disable Tool in Model Script

To disable the legacy profiler tool temporarily in your model script, pass `enabled=False` to `torch.autograd.profiler_legacy.profile()`:

```python
with torch.autograd.profiler_legacy.profile(enabled=False, use_xpu=True) as prof:
# as `enabled` is set to false, the profiler won't work on these lines of code
output_tensor_1 = torch.nonzero(input_tensor)
output_tensor_2 = torch.unique(input_tensor)

# This print will raise an error to user as the profiler was disabled
print(prof.key_averages().table())
```

### Results

Using the script shown above in **Use Tool** part, you'll see the result table printed out to the console as below:

![Legacy_profiler_result_1](../../images/profiler_legacy/Legacy_profiler_result_1.png)

In this result, you can find several fields like:

- `Name`: the name of the operators run
- `Self CPU %`, `Self CPU`: the time consumed by the operator itself on the host, excluding its children operator calls. The column marked with a percentage sign shows the proportion of time to total self CPU time. When an operator is called more than once in a run, the self CPU time may increase in this field.
- `CPU total %`, `CPU total`: the time consumed by the operator on the host, including its children operator calls. The column marked with a percentage sign shows the proportion of time to total CPU time. When an operator is called more than once in a run, the CPU time may increase in this field.
- `CPU time avg`: the average time consumed by each call of the operator on the host. This average is calculated on the CPU total time.
- `Self XPU`, `Self XPU %`: similar to `Self CPU (%)` but shows the time consumption on XPU devices.
- `XPU total`: similar to `CPU total` but shows the time consumption on XPU devices.
- `XPU time avg`: similar to `CPU time avg` but shows the average time consumption on XPU devices. This average is calculated on the XPU total time.
- `# of Calls`: the number of calls for each operator in a run.
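As a quick sanity check on how the averaged columns relate to the totals (plain arithmetic with made-up numbers, not real profiler output): `CPU time avg` is `CPU total` divided by `# of Calls`, and likewise for `XPU time avg`.

```python
# Hypothetical numbers illustrating the relation between the columns.
cpu_total_us = 1200.0   # "CPU total" for one operator, in microseconds
num_calls = 4           # "# of Calls" column
cpu_time_avg_us = cpu_total_us / num_calls
print(cpu_time_avg_us)  # 300.0
```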

You can print the result table in different styles, such as sorting all called operators in reverse order via `print(prof.table(sort_by='id'))`:

![Legacy_profiler_result_2](../../images/profiler_legacy/Legacy_profiler_result_2.png)

### Export to Chrome Trace

You can export the result to a JSON file and then load it in the Chrome trace viewer (`chrome://tracing`) by adding this line to your model script:

```python
prof.export_chrome_trace("trace_file.json")
```

In Chrome trace viewer, you may find the result shows like:

![Legacy_profiler_result_3](../../images/profiler_legacy/Legacy_profiler_result_3.png)

For more example results, please refer to [PyTorch\*'s tutorial page](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html).

The legacy profiler tool will be deprecated from Intel® Extension for PyTorch* very soon. Please use [Kineto Supported Profiler Tool](./profiler_kineto.md) instead for profiling operators' executing time cost on Intel® GPU devices.