Add optional C++ extension to CMSIS-DSP

ARM-software · Mar 6, 2024 · a277827
2 parents 8821c46 + 4d7d599
Showing 170 changed files with 39,432 additions and 6 deletions.
diff --git a/Documentation/Doxygen/dsp.dxy.in b/Documentation/Doxygen/dsp.dxy.in
@@ -573,14 +573,14 @@ HIDE_UNDOC_MEMBERS = YES
 # if EXTRACT_ALL is enabled.
 # The default value is: NO.
 
-HIDE_UNDOC_CLASSES = NO
+HIDE_UNDOC_CLASSES = YES
 
 # If the HIDE_FRIEND_COMPOUNDS tag is set to YES, doxygen will hide all friend
 # declarations. If set to NO, these declarations will be included in the
 # documentation.
 # The default value is: NO.
 
-HIDE_FRIEND_COMPOUNDS = NO
+HIDE_FRIEND_COMPOUNDS = YES
 
 # If the HIDE_IN_BODY_DOCS tag is set to YES, doxygen will hide any
 # documentation blocks found inside the body of a function. If set to NO, these
@@ -773,7 +773,7 @@ SHOW_FILES = YES
 # Folder Tree View (if specified).
 # The default value is: YES.
 
-SHOW_NAMESPACES = YES
+SHOW_NAMESPACES = NO
 
 # The FILE_VERSION_FILTER tag can be used to specify a program or script that
 # doxygen should invoke to get the current version for each file (typically from
@@ -919,11 +919,24 @@ WARN_LOGFILE =
 # Note: If this tag is empty the current directory is searched.
 
 INPUT = ./src/mainpage.md \
+ ./src/dsppp_main.md \
+ ./src/introduction.md \
+ ./src/template.md \
+ ./src/guidelines.md \
+ ./src/vectorop.md \
+ ./src/memory_allocator.md \
+ ./src/memory_static_dynamic.md \
+ ./src/code_size.md \
+ ./src/fusion.md \
+ ./src/vector.md \
+ ./src/matrix.md \
+ ./src/building.md \
  ./src/history.md \
  ./src/history.txt \
  ../../Examples/ARM \
  ../../Include/ \
- ../../Source/ \
+ ../../Source/ \
+ ../../dsppp/Include
 
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -2417,7 +2430,7 @@ INCLUDE_FILE_PATTERNS =
 # recursively expanded use the := operator instead of the = operator.
 # This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
 
-PREDEFINED = ARM_MATH_NEON=1 ARM_FLOAT16_SUPPORTED=1 __STATIC_FORCEINLINE= __ALIGNED(x)=
+PREDEFINED = DOXYGEN HAS_VECTOR HAS_PREDICATED_LOOP ARM_MATH_NEON=1 ARM_FLOAT16_SUPPORTED=1 __STATIC_FORCEINLINE= __ALIGNED(x)=
 
 # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this
 # tag can be used to specify a list of macro names that should be expanded. The

diff --git a/Documentation/Doxygen/src/building.md b/Documentation/Doxygen/src/building.md
@@ -0,0 +1,29 @@
+# Building and running examples {#dsppp_building}
+
+## To build
+
+First time:
+
+```shell
+cbuild -O cprj test.csolution.yml --toolchain AC6 -c example.Release+VHT-Corstone-300 -p -r --update-rte
+
+```
+
+Other times:
+
+```shell
+cbuild -O cprj test.csolution.yml --toolchain AC6 -c example.Release+VHT-Corstone-300
+```
+
+If you want to select another test, edit the file `example.cproject.yml` and uncomment the test.
+
+## To run
+
+If the tools have been installed with `vcpkg`:
+
+```
+FVP_Corstone_SSE-300_Ethos-U55.exe -f fvp_configs/VHT-Corstone-300.txt -a cpu0=cprj\out\example\VHT-Corstone-300\Release\example.axf
+```
+
+Otherwise, you'll need to use the path to your FVP installation.
+
diff --git a/Documentation/Doxygen/src/code_size.md b/Documentation/Doxygen/src/code_size.md
@@ -0,0 +1,14 @@
+# Code size {#dsppp_code_size}
+
+As explained in previous sections, `Vector<T,NB1>` and `Vector<T,NB2>` are different types if `NB1` and `NB2` differ.
+
+A template algorithm acts like a code generator: it generates different code for different values of its template arguments (here, the types).
+
+If you use a template algorithm with two different vector datatypes, it generates different code for each of them. The generated code is specialized for the specific datatype used and is thus likely to be more efficient.
+
+The downside is that each specialization is a separate implementation, which increases code size.
+
+If your system uses many different vector sizes, the code size may grow too large. In that case, it may be better to use dynamic objects instead of static ones.
+
+Dynamic objects are less efficient, so this is a trade-off between code size and speed.
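
The trade-off described in `code_size.md` can be illustrated with a minimal sketch in standard C++. It does not use the DSP++ headers: `StaticVec`, `sum_static` and `sum_dynamic` are hypothetical names chosen for this example, with `std::array` standing in for a vector whose length is part of its type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a static vector: the length NB is part of the type.
template <typename T, std::size_t NB>
using StaticVec = std::array<T, NB>;

// Two different lengths give two unrelated types.
static_assert(!std::is_same<StaticVec<float, 4>, StaticVec<float, 8>>::value,
              "different lengths are different types");

// Static version: the compiler generates one copy of this function per
// (T, NB) pair that is actually used. The loop bound is a compile-time constant.
template <typename T, std::size_t NB>
T sum_static(const StaticVec<T, NB>& v)
{
    T acc = T(0);
    for (std::size_t i = 0; i < NB; ++i) {
        acc += v[i];
    }
    return acc;
}

// Dynamic version: one copy per element type T, whatever the length,
// because the length is only known at run time.
template <typename T>
T sum_dynamic(const T* data, std::size_t nb)
{
    assert(data != nullptr || nb == 0);
    T acc = T(0);
    for (std::size_t i = 0; i < nb; ++i) {
        acc += data[i];
    }
    return acc;
}
```

Calling `sum_static` on a `StaticVec<float,4>` and on a `StaticVec<float,8>` instantiates two functions, whereas `sum_dynamic<float>` is instantiated once and shared by both lengths.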
+
diff --git a/Documentation/Doxygen/src/dsppp_main.md b/Documentation/Doxygen/src/dsppp_main.md
@@ -0,0 +1,18 @@
+# DSP++ extension {#dsppp_main}
+
+C++ extensions to CMSIS-DSP using C++ template meta-programming (headers only).
+
+The headers are not yet part of the CMSIS-DSP pack since they are experimental. You can get them from the [CMSIS-DSP GitHub](https://github.com/ARM-software/CMSIS-DSP/dsppp/Include). There is nothing to build: just include the headers when you want to use this framework.
+
+* @subpage dsppp_intro "Introduction"
+* @subpage dsppp_template "C++ template for C programmer"
+* @subpage dsppp_vector_example "Vector operation example"
+* @subpage dsppp_memory_allocator "Memory allocation"
+* @subpage dsppp_memory_static_dynamic "Static / Dynamic objects"
+* @subpage dsppp_code_size "Code size"
+* @subpage dsppp_fusion "Fusion mechanism"
+* @subpage dsppp_vector "Vector operators"
+* @subpage dsppp_matrix "Matrix operators"
+* @subpage dsppp_building "Building and running examples"
+* @subpage dsppp_guidelines "Usage guidelines"
+
diff --git a/Documentation/Doxygen/src/fusion.md b/Documentation/Doxygen/src/fusion.md
@@ -0,0 +1,39 @@
+# Fusion {#dsppp_fusion}
+
+```cpp
+Vector<float32_t,NB> d = a + b * c;
+```
+
+With this line of code, there is loop fusion: instead of having one loop per operator, there is one loop for the whole computation.
+
+It is important to have some idea of how this works in order to avoid mistakes when using the library.
+
+In the code above, `a + b * c` does not compute anything!
+`a + b * c` creates a representation of the expression, an abstract syntax tree (AST), at build time.
+
+When this AST is assigned to the variable `d`, it is evaluated.
+The evaluation forces the inlining of the expression operators into one loop. The generated code thus contains only one loop, with all the operators (`+` and `*`) fused.
+
+The library supports virtual vectors. A virtual vector is a view onto part of an existing vector. For instance, you can use a virtual vector to read or write samples with a stride. A virtual vector does not own its memory.
+
+If you write:
+```cpp
+d = a;
+```
+
+and `d` and `a` are virtual vectors, then nothing will be written to `d`!
+
+`d` will become `a`, and `a` will no longer be valid.
+
+If you want to copy a virtual vector you need to make an expression and write:
+
+```cpp
+d = copy(a);
+```
+
+Note that this problem occurs only for virtual vectors, which do not own their memory.
+
+For real vectors, a copy would occur. But since adding `copy` has no overhead, it is better to use it to avoid problems.
+
+
+
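
The mechanism described in `fusion.md` (an expression builds a tree at compile time and is evaluated only on assignment) is commonly implemented with expression templates. The following is a minimal sketch of that general technique in standard C++, not the DSP++ implementation: the `sketch` namespace and the names `AddExpr`, `MulExpr`, `Vec` and `demo` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

// Node for "lhs + rhs". It only stores references; nothing is computed
// until an element is requested through operator[].
template <typename L, typename R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Node for the element-wise product "lhs * rhs".
template <typename L, typename R>
struct MulExpr {
    const L& lhs;
    const R& rhs;
    float operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
};

// Vector that owns its storage.
template <std::size_t NB>
struct Vec {
    std::array<float, NB> data{};

    float operator[](std::size_t i) const { return data[i]; }
    float& operator[](std::size_t i) { return data[i]; }

    // Assigning an expression evaluates it: one loop for the whole tree.
    template <typename E>
    Vec& operator=(const E& expr)
    {
        for (std::size_t i = 0; i < NB; ++i) {
            data[i] = expr[i];
        }
        return *this;
    }
};

// The operators build tree nodes; they do not touch the data.
// The nodes hold references, so an expression must be assigned
// in the same statement that creates it.
template <typename L, typename R>
AddExpr<L, R> operator+(const L& l, const R& r) { return {l, r}; }

template <typename L, typename R>
MulExpr<L, R> operator*(const L& l, const R& r) { return {l, r}; }

// Small self-check showing the intended use.
inline void demo()
{
    Vec<3> a, b, c, d;
    a.data = {1.0f, 2.0f, 3.0f};
    b.data = {2.0f, 2.0f, 2.0f};
    c.data = {3.0f, 4.0f, 5.0f};
    d = a + b * c;  // single loop: d[i] = a[i] + b[i] * c[i]
    assert(d[0] == 7.0f && d[1] == 10.0f && d[2] == 13.0f);
}

} // namespace sketch
```

In this sketch the type of `a + b * c` is `AddExpr<Vec<3>, MulExpr<Vec<3>, Vec<3>>>`, so the shape of the expression is known to the compiler, which can inline the nested `operator[]` calls into the assignment loop.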
diff --git a/Documentation/Doxygen/src/guidelines.md b/Documentation/Doxygen/src/guidelines.md
@@ -0,0 +1 @@
+# Guidelines {#dsppp_guidelines}
diff --git a/Documentation/Doxygen/src/introduction.md b/Documentation/Doxygen/src/introduction.md
@@ -0,0 +1,64 @@
+## Introduction {#dsppp_intro}
+
+### Dot product example
+
+If you want to compute the dot product:
+
+\f[
+
+<scale*(\overrightarrow{a}+\overrightarrow{b}),\overrightarrow{c}*\overrightarrow{d}>
+
+\f]
+
+with CMSIS-DSP, you would write:
+
+```c
+arm_add_f32(a,b,tmp1,NB);
+arm_scale_f32(tmp1,scale,tmp2,NB);
+arm_mult_f32(c,d,tmp3,NB);
+arm_dot_prod_f32(tmp2,tmp3,NB,&r);
+```
+
+There are several limitations with this way of writing the code:
+
+1. The code needs to be rewritten and the `_f32` suffix changed if the developer wants to use another datatype
+
+2. Temporary buffers need to be allocated and managed (`tmp1`,`tmp2`,`tmp3`)
+
+3. The four function calls are four separate loops, so the computation is not done in one pass. This is bad for data locality and cache usage
+
+4. Each loop contains only a few instructions. For instance, `arm_add_f32` has two loads, an add instruction and a store. That is not enough for the compiler to reorder instructions and improve performance
+
+With this new C++ template library, you can write:
+
+
+```cpp
+r = dot(scale*(a+b),c*d);
+```
+
+The code generated by this line computes the dot product in one pass with all the operators (`+`, `*`) included in the loop.
+Temporary buffers are no longer needed.
+
+### Vector operations
+
+Let's look at another example:
+
+\f[
+
+\overrightarrow{d} = \overrightarrow{a} + \overrightarrow{b} * \overrightarrow{c}
+
+\f]
+
+With the C++ library, it can be written as:
+
+
+```cpp
+Vector<float32_t,NB> d = a + b * c;
+```
+
+Here again, all the vector operations (`+`,`*`) are done in one pass with one loop. No temporary buffer is needed.
+
+If you're coming from C and do not know anything about C++ templates, we have a very quick introduction: @ref dsppp_template "The minimum you need to know about C++ template to use this library".
+
+You can also jump directly to an @ref dsppp_vector_example "example with vector operations".
+
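
To make the dot product example in `introduction.md` concrete, here is a hand-written sketch of the two approaches in standard C++. It calls neither CMSIS-DSP nor DSP++: plain loops stand in for the four CMSIS-DSP calls, and the function names `four_pass_dot` and `fused_dot` are hypothetical. The fused version shows the kind of single loop the text describes, not the code the library actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Four passes with three temporary buffers, mirroring the structure of the
// add / scale / mult / dot_prod call sequence.
inline float four_pass_dot(const float* a, const float* b,
                           const float* c, const float* d,
                           float scale, std::size_t nb)
{
    assert(nb == 0 || (a && b && c && d));
    std::vector<float> tmp1(nb), tmp2(nb), tmp3(nb);
    for (std::size_t i = 0; i < nb; ++i) tmp1[i] = a[i] + b[i];
    for (std::size_t i = 0; i < nb; ++i) tmp2[i] = tmp1[i] * scale;
    for (std::size_t i = 0; i < nb; ++i) tmp3[i] = c[i] * d[i];
    float r = 0.0f;
    for (std::size_t i = 0; i < nb; ++i) r += tmp2[i] * tmp3[i];
    return r;
}

// One pass, no temporaries: every operator is applied inside the same loop.
inline float fused_dot(const float* a, const float* b,
                       const float* c, const float* d,
                       float scale, std::size_t nb)
{
    assert(nb == 0 || (a && b && c && d));
    float r = 0.0f;
    for (std::size_t i = 0; i < nb; ++i) {
        r += (scale * (a[i] + b[i])) * (c[i] * d[i]);
    }
    return r;
}
```

Each input element is loaded once in `fused_dot`, while `four_pass_dot` writes and re-reads three intermediate buffers of `nb` elements.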
diff --git a/Documentation/Doxygen/src/mainpage.md b/Documentation/Doxygen/src/mainpage.md
@@ -1,5 +1,7 @@
 # Overview {#mainpage}
 
+## Introduction
+
 This user manual describes the CMSIS DSP software library, a suite of common compute processing functions for use on Cortex-M and Cortex-A processor based devices.
 
 The library is divided into a number of functions each covering a specific category:
@@ -26,9 +28,21 @@ The library is providing vectorized versions of most algorithms for Helium and o
 
 When using a vectorized version, provide a little bit of padding after the end of a buffer (3 words) because the vectorized code may read a little bit after the end of a buffer. You don't have to modify your buffers but just ensure that the end of buffer + padding is not outside of a memory region.
 
+## Related projects
+
+### Python wrapper
+
 A Python wrapper is also available with a Python API as close as possible to the C one. It can be used to start developing and testing an algorithm with NumPy and SciPy before writing the C version. It is available on [PyPI.org](https://pypi.org/project/cmsisdsp/). It can be installed with: `pip install cmsisdsp`.
 
-## Using the Library {#using}
+### Experimental C++ template extension
+
+This extension is a set of C++ headers. They just need to be included to start using the features.
+
+Those headers are not yet part of the pack and you need to get them from the [GitHub repository](https://github.com/ARM-software/CMSIS-DSP/tree/main/Include).
+
+More documentation is available for the @ref dsppp_main "DSP++" extension.
+
+## Using the CMSIS-DSP Library {#using}
 
 The library is released in source form. It is strongly advised to compile the library using `-Ofast` optimization to have the best performances.
 
@@ -56,6 +70,7 @@ The table below explains the content of **ARM::CMSIS-DSP** pack.
  📂 Include | Include files for using and building the lib
  📂 PrivateInclude | Private include files for building the lib
  📂 Source | Source files
+ 📂 dsppp | Experimental C++ template extension
  📄 ARM.CMSIS-DSP.pdsc | CMSIS-Pack description file
  📄 LICENSE | License Agreement (Apache 2.0)