Releases: alibaba/heterogeneity-aware-lowering-and-optimization
IPU_STABLE_SDK_2.2.2_v3
close floating point check in popart (#665)
Create pipeline resources when the computation is created, not when the cached computation item is set
fix pixel bert detect application runtime error
add weiming build script fix
Co-authored-by: yanwei yw01041751@alibaba-inc.com
Combine config cache (#659) (#663)
fix poplar sdk path symbol
deserialize config in cache file
Add error-handling code
Co-authored-by: yanwei yw01041751@alibaba-inc.com
Co-authored-by: gcuser jackz@alibaba-inc.com
Co-authored-by: gcuser gcuser@alibaba-inc.com
(cherry picked from commit 0b006c0)
Co-authored-by: yanwei-gr 64010848+yanwei-gr@users.noreply.github.com
IPU_STABLE_SDK_2.3.0_v2
update popart api to sdk2.3 & remove custom erf, already use popart o…
IPU_STABLE_SDK_2.3.0_v1
sdk2.3 v1
v0.7.2
This release contains the following major changes since v0.7.1:
- Enhance ops support, including
  - RNN, GRU, LSTM
  - More arithmetic ops and logical ops
- ODLA runtime library supports TensorRT 8.0.3
- Initial Python interface support
- Bug fixes
IPU_STABLE_SDK_2.2.2_v2
Cherry-pick master code onto the SDK2.2.2 branch (#559)
* Add Custom Op for Yolov3 Post Process (#512)
  * add custom op for yolov3
  * reset submodule onnx
  * reset tensorrt
  * delete build
  * merge odla_ops_nn
  * modify for passing link-check
  Co-authored-by: gcuser jackz@graphcore.ai
  (cherry picked from commit 5847cd338e12b7154107ea0346b113605bb1223b)
* ODLA popART pipeline function (#522)
  * First runnable with single thread & test context
  * MNIST runnable demo to test the pipeline
  * Multiple threads put the data to the session run
  * Simple bash script to compile and run the test
  * An example of how to use the callback in the pipeline
  * Multiple threads using local Ctx
  * Can run with pipeline setting in onnx file
  * Refactored and added multi-thread mode without pipeline
  * Move code to odla_pipeline.h and .cc
  * Make single empty/zero data, and delete the context for empty data after getting the result
  * Add mutex to serialize the compute requests
  * Merge the changes for attention mask & previous changes
  * Test code for timing
  * Change the CMakeList so that pipeline.cc and the new custom op are compiled
  * Successfully run on 24L with attention mask custom OP
  * custom op attention_mask test code
  * Add name scope to each node in the model
  * Try throughput test with MLPerf model
  * Only set AMP on feed forward matmul
  * Run the online pipelining with config hard coded in the config read class
  * Compile with SDK 2.2 with pipeline online setting
  * Add config file for pipeline stage setting
  * Run pipeline with performance similar to popart
  * Change some names & make AMP all 0.445
  * Add amp parameter in config file
  * Detach device and clear session when DestroyComputation
  * Make batch_per_step take effect in execution mode SEQUENCE to pass enough data
  * Add the new lock free queue and logging
  * Fix bug on empty data visit counter
  * Delete the empty context
  * Add some pipeline sync
  * Make thread sleep for 5 ms when no task in the queue
  * Change the size() of LockFreeQueue to tail-wait
  * [CI] Make the call by main work with npz files
  * Move the computation init to create context
  * Add common functions to common.h and common.cc
  * Move the computation init out
  * Move common functions to the test folder
  * Test the config of ODLA popART and make no configuration act as before
  * Add tests for calling the model.cc
  * Add FP32 to save as result
  * Some changes on LockFreeQueue and tests
  * Fix the rsqrt wrong-result problem, and remove std cout & cerr to avoid crash
  * Fix the accuracy problem of large bps
  * Add thread check for context & computation holding to avoid conflicts
  * Add the batch tools to help the test generate the model, build and run
  * Decrease the amount of empty data put
  * Temporary commit to migrate crashed system
  * Set pipeline information on the fly; change the mixed style of class members; add debug setting and default to false to make the opts set by API; remove the old pipeline set API
  * Fixed the mixed code style and removed redundant code
  * Remove the function test code of odla_popart
  * Remove some redundant code and files
  * Changed the CACHE STRING to CACHE PATH
  * Move ENGINE_CACHE_PATH to odla_popart.cc
  * Format the code with the clang-format-9 -i command
  * Move json.hpp to third party
  * Set virtualgraph for model not using pipeline in set_session_opts
  * Add virtual graph attribute when _odla_computation is constructed
  * Check the shape before extending it with batches_per_step
  Co-authored-by: gcuser gcuser@alibaba-inc.com
  (cherry picked from commit 6095bdf246c3a4d9d686f2802cb6955cb7d70f79)
* Fix default configuration & computation destruction
  (cherry picked from commit 40b9fc840e76ed139d6038bc72f7cd4da03a7b52)
* Definitions for static variables
  (cherry picked from commit 18e0e83a9b4721624c291777c02fbecf189350fb)
* Disable test case test_constant_popart.cc
Co-authored-by: Zars19 1036473307@qq.com
Co-authored-by: jackzipu 74961298+jackzipu@users.noreply.github.com
Co-authored-by: gcuser jackz@graphcore.ai
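Two of the pipeline items in this entry, serializing compute requests with a mutex and letting the worker thread sleep for 5 ms when the queue is empty, can be illustrated with a minimal sketch. This is not the code from odla_pipeline.cc: `TaskQueue`, `run_worker` and `process_all` are hypothetical names, the queue is a plain mutex-guarded `std::queue` rather than the lock-free queue the entry mentions, and doubling an integer stands in for running a request.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical request queue; a mutex serializes access from all threads.
class TaskQueue {
 public:
  void push(int task) {
    std::lock_guard<std::mutex> lock(mu_);
    tasks_.push(task);
  }
  // Returns false when the queue is empty instead of blocking.
  bool try_pop(int& task) {
    std::lock_guard<std::mutex> lock(mu_);
    if (tasks_.empty()) return false;
    task = tasks_.front();
    tasks_.pop();
    return true;
  }

 private:
  std::mutex mu_;
  std::queue<int> tasks_;
};

// Worker loop: process tasks in FIFO order, sleep 5 ms when idle,
// and exit once `stop` was set and the queue has been drained.
std::vector<int> run_worker(TaskQueue& queue, std::atomic<bool>& stop) {
  std::vector<int> results;
  for (;;) {
    // Read the flag before popping so tasks pushed before stop are not missed.
    const bool stopping = stop.load();
    int task = 0;
    if (queue.try_pop(task)) {
      results.push_back(task * 2);  // placeholder for the real computation
    } else if (stopping) {
      break;
    } else {
      std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
  }
  return results;
}

// Push n tasks from the calling thread while a single worker consumes them.
std::vector<int> process_all(int n) {
  TaskQueue queue;
  std::atomic<bool> stop{false};
  std::vector<int> results;
  std::thread worker([&] { results = run_worker(queue, stop); });
  for (int i = 0; i < n; ++i) queue.push(i);
  stop.store(true);
  worker.join();
  return results;
}
```

Polling with a short sleep trades a few milliseconds of latency for simplicity; a condition variable would wake the worker immediately, at the cost of more synchronization code.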
IPU_STABLE_SDK_2.2.2_v1
code review (cherry picked from commit b9f8a69edd6f71d8e645311d01f3f1ac386d535d)
IPU_STABLE_SDK_2.1.0_v3
- Fix axis attribute for reduction instrs
IPU_STABLE_SDK_2.1.0_v2
- add constant decombine pass
v0.7.1
This release contains the following major changes since v0.7.0:
- Enhance ONNX ops with ODLA/DNNL runtime library, including:
  - reduction ops
  - arg min, arg max ops
  - Hardmax op
- Improve "double" data type (FP64) support
- Switch to LLVM 12.0.0
- Improve error handling for ODLA APIs and code generation.
IPU_STABLE_SDK_2.1.0_v1
[CodeGen] Check odla status after odla API calls
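As a rough illustration of what checking a status after every API call can look like in generated code, here is a minimal sketch. It does not use the real ODLA headers: `demo_status`, `demo_api_call`, `CHECK_STATUS` and `run_model` are hypothetical stand-ins invented for this example, and the code the generator actually emits may differ.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical status type standing in for the runtime's status enum.
enum demo_status { DEMO_SUCCESS = 0, DEMO_FAILURE = 1 };

// Hypothetical API call that fails for negative input.
demo_status demo_api_call(int value) {
  return value >= 0 ? DEMO_SUCCESS : DEMO_FAILURE;
}

// Evaluate the call once; on failure, log it and return the status
// from the enclosing function so later calls are skipped.
#define CHECK_STATUS(call)                                  \
  do {                                                      \
    demo_status status_ = (call);                           \
    if (status_ != DEMO_SUCCESS) {                          \
      std::fprintf(stderr, "%s failed with status %d\n",    \
                   #call, static_cast<int>(status_));       \
      return status_;                                       \
    }                                                       \
  } while (0)

// Shape that generated code might take: every call is checked in order.
demo_status run_model(int a, int b) {
  CHECK_STATUS(demo_api_call(a));
  CHECK_STATUS(demo_api_call(b));
  return DEMO_SUCCESS;
}
```

Wrapping the check in a macro keeps the generated call sequence readable while ensuring the first failing call stops execution and its status reaches the caller.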