Releases: ermig1979/Simd
Simd v6.1.143
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution16bNhwcDepthwise.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for class SynetConvolution32fNhwcDepthwise.
- AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w4 for class SynetMergedConvolution16b.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for class SynetConvolution32fNhwcDepthwise.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for class SynetConvolution32fNhwcDepthwise.
- AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w6 for class SynetMergedConvolution16b.
- AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w8 for class SynetMergedConvolution16b.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for framework SynetMergedConvolution32f.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for framework SynetMergedConvolution32f.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for framework SynetMergedConvolution32f.
- AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w8 for class SynetMergedConvolution16b.
- Base implementation of function SimdYuv444pToRgbaV2.
Improving
- AVX-512BW optimizations of function Convolution32fNhwcDepthwiseDefault.
- AMX-BF16 optimizations of function DepthwiseConvolutionLargePad.
Bug fixing
- Error in Base implementation of class SynetDeconvolution16bNhwcGemm.
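For readers unfamiliar with the kernels listed above, the following is a minimal scalar sketch of a depthwise convolution in NHWC layout. It is not Simd code: the function name DepthwiseConvNhwc and its signature are hypothetical, and reading the suffix k7p3d1s1 as kernel 7, padding 3, dilation 1, stride 1 is an assumption (the meaning of the trailing w4/w6/w8 is not stated in these notes and is not modelled here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference, NOT the Simd API.
// src:    H x W x C values in NHWC layout (batch of one).
// weight: K x K x C values, one K x K filter per channel.
// Stride 1 and dilation 1 are assumed; P is the zero padding on every side.
// Returns outH x outW x C values, where outH = H + 2*P - K + 1 (same for outW).
std::vector<float> DepthwiseConvNhwc(const std::vector<float>& src,
                                     std::size_t H, std::size_t W, std::size_t C,
                                     const std::vector<float>& weight,
                                     std::size_t K, std::size_t P)
{
    assert(src.size() == H * W * C && weight.size() == K * K * C);
    assert(H + 2 * P >= K && W + 2 * P >= K);
    const std::size_t outH = H + 2 * P - K + 1;
    const std::size_t outW = W + 2 * P - K + 1;
    std::vector<float> dst(outH * outW * C, 0.0f);
    for (std::size_t dy = 0; dy < outH; ++dy)
    {
        for (std::size_t dx = 0; dx < outW; ++dx)
        {
            float* d = dst.data() + (dy * outW + dx) * C;
            for (std::size_t ky = 0; ky < K; ++ky)
            {
                // Source row for this kernel row; rows inside the padding are skipped.
                const std::ptrdiff_t sy = std::ptrdiff_t(dy + ky) - std::ptrdiff_t(P);
                if (sy < 0 || sy >= std::ptrdiff_t(H))
                    continue;
                for (std::size_t kx = 0; kx < K; ++kx)
                {
                    const std::ptrdiff_t sx = std::ptrdiff_t(dx + kx) - std::ptrdiff_t(P);
                    if (sx < 0 || sx >= std::ptrdiff_t(W))
                        continue;
                    const float* s = src.data() + (std::size_t(sy) * W + std::size_t(sx)) * C;
                    const float* w = weight.data() + (ky * K + kx) * C;
                    // Depthwise: each channel is filtered independently of the others.
                    for (std::size_t c = 0; c < C; ++c)
                        d[c] += s[c] * w[c];
                }
            }
        }
    }
    return dst;
}
```

With K = 7 and P = 3 the output has the same height and width as the input, which is presumably why that combination gets dedicated kernels.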
Test framework
New features
- Tests for verifying functionality of function SimdYuv444pToRgbaV2.
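As an illustration of what a planar YUV 4:4:4 to RGBA conversion does, here is a minimal scalar sketch. It does not reproduce SimdYuv444pToRgbaV2: the function name Yuv444pToRgbaSketch and its parameter list are hypothetical, and the BT.601 limited-range integer coefficients are an assumption, since these notes give neither the signature nor the colour matrix of the library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Clamp an intermediate value to the 0..255 range of one colour channel.
static inline std::uint8_t ClampToByte(int value)
{
    return std::uint8_t(std::min(255, std::max(0, value)));
}

// Hypothetical scalar reference, NOT the Simd API.
// Y, U and V are separate full-resolution planes (4:4:4); strides are in bytes.
// Assumes BT.601 limited range (Y in 16..235, U and V in 16..240).
void Yuv444pToRgbaSketch(const std::uint8_t* y, std::size_t yStride,
                         const std::uint8_t* u, std::size_t uStride,
                         const std::uint8_t* v, std::size_t vStride,
                         std::size_t width, std::size_t height,
                         std::uint8_t* rgba, std::size_t rgbaStride,
                         std::uint8_t alpha)
{
    assert(y && u && v && rgba && rgbaStride >= width * 4);
    for (std::size_t row = 0; row < height; ++row)
    {
        for (std::size_t col = 0; col < width; ++col)
        {
            const int c = int(y[col]) - 16;
            const int d = int(u[col]) - 128;
            const int e = int(v[col]) - 128;
            std::uint8_t* out = rgba + col * 4;
            // Fixed-point BT.601 coefficients scaled by 256, with +128 for rounding.
            out[0] = ClampToByte((298 * c + 409 * e + 128) / 256);
            out[1] = ClampToByte((298 * c - 100 * d - 208 * e + 128) / 256);
            out[2] = ClampToByte((298 * c + 516 * d + 128) / 256);
            out[3] = alpha;
        }
        y += yStride;
        u += uStride;
        v += vStride;
        rgba += rgbaStride;
    }
}
```

A scalar reference of this kind is useful mainly as a baseline against which an optimized implementation can be compared pixel by pixel.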
Simd v6.1.142
Algorithms
New features
- Base implementation of class SynetDeconvolution16bGemm.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetDeconvolution16bNhwcGemm.
- AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveUv.
- AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveBgr.
- AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveBgra.
Improving
- AVX-512BW optimizations of function ConvolutionDirectNhwcConvolutionBiasActivationDepthwise.
Removing
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
- Base implementation of class SynetConvolution32fBf16Gemm.
- Parameter 'compatibility' from function SynetConvolution32fInit.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
- Base implementation of class SynetMergedConvolution32fBf16.
- Parameter 'compatibility' from function SynetMergedConvolution32fInit.
Test framework
New features
- Tests for verifying functionality of SynetDeconvolution16b framework.
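The deinterleave functions optimized in this release split an interleaved pixel buffer into separate planes. A minimal scalar sketch of the two-channel UV case is shown below; the name DeinterleaveUvSketch and the std::vector interface are hypothetical and chosen only for illustration, they are not the library's signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference, NOT the Simd API.
// Splits an interleaved U0 V0 U1 V1 ... buffer into separate U and V planes.
void DeinterleaveUvSketch(const std::vector<std::uint8_t>& uv,
                          std::vector<std::uint8_t>& u,
                          std::vector<std::uint8_t>& v)
{
    assert(uv.size() % 2 == 0);
    const std::size_t count = uv.size() / 2;
    u.resize(count);
    v.resize(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        u[i] = uv[2 * i + 0];  // even bytes belong to the U plane
        v[i] = uv[2 * i + 1];  // odd bytes belong to the V plane
    }
}
```

The BGR and BGRA variants follow the same pattern with three and four output planes.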
Simd v6.1.141
Algorithms
New features
- Support of BFloat16 in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class ResizerNearest.
Bug fixing
- Compiler warning in function Simd::LitterCpuCache.
- Error in AVX-512BW optimizations of class SynetInnerProduct16bGemmNN.
Simd v6.1.140
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetRelu16b.
- API of SynetAdd16b framework.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetAdd16bUniform.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNchwGemm.
Improving
- AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
- Error in Base implementation of class SynetMergedConvolution16bCdc.
- Error in Base implementation of class SynetMergedConvolution16bDc.
- Error in Base implementation of class SynetInnerProduct16bGemmNN.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Float32ToBFloat16.
Test framework
New features
- Tests for verifying functionality of function SynetRelu16b.
- Tests for verifying functionality of SynetAdd16b framework.
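Several items in this release operate on 16-bit BFloat16 data. The sketch below shows one common way to convert between float and BFloat16 and to apply ReLU to a BFloat16 buffer. All three function names are hypothetical, and round-to-nearest-even is an assumption: these notes do not say which rounding mode the library's Float32ToBFloat16 uses, nor what parameters SynetRelu16b takes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers, NOT the Simd API.

// Convert float to BFloat16 (the upper 16 bits of the IEEE-754 binary32 pattern),
// assuming round-to-nearest-even.
std::uint16_t Float32ToBFloat16Sketch(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)              // NaN: keep it a quiet NaN
        return std::uint16_t((bits >> 16) | 0x0040u);
    const std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return std::uint16_t((bits + rounding) >> 16);
}

// Convert BFloat16 back to float by restoring the low 16 bits as zeros.
float BFloat16ToFloat32Sketch(std::uint16_t value)
{
    const std::uint32_t bits = std::uint32_t(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// ReLU over a BFloat16 buffer: widen to float, apply max(0, x), narrow again.
void Relu16bSketch(const std::uint16_t* src, std::size_t size, std::uint16_t* dst)
{
    assert(src && dst);
    for (std::size_t i = 0; i < size; ++i)
    {
        const float x = BFloat16ToFloat32Sketch(src[i]);
        dst[i] = Float32ToBFloat16Sketch(std::max(0.0f, x));
    }
}
```

Widening to float for the arithmetic and narrowing only on store keeps the sketch simple; an optimized kernel may well work differently.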
Simd v6.1.139
Algorithms
New features
- API of SynetInnerProduct16b framework.
- Base implementation of class SynetInnerProduct16bRef.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
- Error in AVX-512BF16 optimizations of class SynetConvolution16bNhwcDirect.
- Error in Base implementation of class SynetConvolution16bNhwcGemm.
- Error in SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of function Convert16bNhwcDirect.
- Error in SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of function Reorder16bNhwcDirect.
- Error in Base implementation of class SynetMergedConvolution16bCdc.
- Error in Base implementation of class SynetMergedConvolution16bDc.
- Error in Base implementation of class SynetMergedConvolution16bCd.
- Error in AMX-BF16 optimizations of class SynetMergedConvolution16bDc.
Test framework
New features
- Tests for verifying functionality of SynetInnerProduct16b framework.
Simd v6.1.138
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcDirect.
- SimdCpuInfoCurrentFrequency in SimdCpuInfoType enumeration.
- API of SynetMergedConvolution16b framework.
- Base implementation of class SynetMergedConvolution16b.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bDc.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bCd.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bCdc.
- Support of YUV420P format to Simd::Frame.
Improving
- AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
Bug fixing
- Errors in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
- Error in Base implementation of class SynetMergedConvolution8i.
Test framework
New features
- -wu command line option to set CPU warm up time in milliseconds.
- Tests for verifying functionality of SynetMergedConvolution16b framework.
Infrastructure
Bug fixing
- Errors in build_and_test_gcc section in Github actions script for CMake.
Simd v6.1.137
Algorithms
New features
- AMX-BF16 (AVX-512VBMI) optimizations of function DescrIntCosineDistance.
- AMX-BF16 (AVX-512VBMI, AMX-INT8) optimizations of function DescrIntCosineDistancesMxNa.
- AMX-BF16 (AVX-512VBMI, AMX-INT8) optimizations of function DescrIntCosineDistancesMxNp.
- API of SynetConvolution16b framework.
- Base implementation of class SynetConvolution16bGemm.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
Improving
- AVX-512VNNI optimizations of function DescrIntCosineDistance.
- AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNa.
- AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNp.
Test framework
New features
- Tests for verifying functionality of SynetConvolution16b framework.
Simd v6.1.136
Algorithms
New features
- AMX-BF16 (AVX-512VBMI) optimizations of function ChangeColors.
- AMX-BF16 (AVX-512VBMI) optimizations of function NormalizeHistogram.
Improving
- AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
Bug fixing
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
Test framework
New features
- Command line parameter to disable testing of some SIMD extensions.
Bug fixing
- Error in test of function Nv12SaveAsJpegToMemory.
Simd v6.1.135
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
- AMX-BF16 optimizations of function Float32ToBFloat16.
- Support of SimdSynetUnaryOperation32fCos in function SynetUnaryOperation32f.
- Support of SimdSynetUnaryOperation32fSin in function SynetUnaryOperation32f.
Bug fixing
- Error in function SimdCpuInfo (wrong AMX-BF16 detection).
- Error in AVX-512BF16 optimization of function Float32ToBFloat16.
- Error in AMX initialization in function AmxBf16::SupportedByOS.
- Crash in function AmxBf16::ConvolutionBf16NhwcConv_2.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
Removing
- AVX-512BF16 optimizations of function Float32ToBFloat16.
- AVX-512BF16 optimizations of class SynetConvolution32fBf16Nhwc.
- AVX-512BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
- AVX-512BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
- AVX-512BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
- Separate support of the AVX-512BF16 extension (it is now supported only together with AMX-BF16).
Test framework
Bug fixing
- Error in test of SynetMergedConvolution32f framework.
Infrastructure
Removing
- Avx512Bf16 project for MSVS-2022.
- Avx512Bf16 project for MSVS-2019.
- Avx512Bf16 project for MSVS-2015.
- Avx512Bf16 project for MSVS-2017.
- Avx512Bf16 project for CMake.
Simd v6.0.134
Algorithms
New features
- SSE4.1 optimizations of ResizerFloatBilinear class.
Improving
- Improve AVX2 optimizations of ResizerFloatBilinear class (AMD CPU).
- Improve AVX2 optimizations of ResizerShortBilinear class (AMD CPU).
Bug fixing
- MSVS compiler bug in file SimdAvx512bwYuvToBgraV2.
- Linux, GCC-13 - crash in function SimdSynetInnerProduct32fForward.
- MSVS compiler bug (CMake, Windows for ARM64) with functions Extract4Sums.
- GCC-9/10 - compiler error in AVX-512BW optimization of function YToGray.
- GCC-9/10 - compiler error in AVX-512BW optimization of function GrayToY.
Replacing
- Replace AVX optimizations to AVX2 for function CosineDistance32f.
- Replace AVX optimizations to AVX2 for function Fill32f.
- Replace AVX optimizations to AVX2 for ResizerFloatBilinear class.
- Replace AVX optimizations to AVX2 for function SquaredDifferenceSum32f.
- Replace AVX optimizations to AVX2 for function SquaredDifferenceKahanSum32f.
- Replace AVX optimizations to AVX2 for function HogLiteFilterFeatures.
- Replace AVX optimizations to AVX2 for function HogLiteResizeFeatures.
- Replace AVX optimizations to AVX2 for function HogLiteCompressFeatures.
- Replace AVX optimizations to AVX2 for function HogLiteFilterSeparable.
- Replace AVX optimizations to AVX2 for function NeuralPooling2x2Max2x2.
- Replace AVX optimizations to AVX2 for function NeuralProductSum.
- Replace AVX optimizations to AVX2 for function NeuralAddVectorMultipliedByValue.
- Replace AVX optimizations to AVX2 for function NeuralAddVector.
- Replace AVX optimizations to AVX2 for function NeuralAddValue.
- Replace AVX optimizations to AVX2 for function NeuralRoughSigmoid.
- Replace AVX optimizations to AVX2 for function NeuralRoughSigmoid2.
- Replace AVX optimizations to AVX2 for function NeuralRoughTanh.
- Replace AVX optimizations to AVX2 for function NeuralDerivativeRelu.
- Replace AVX optimizations to AVX2 for function NeuralDerivativeTanh.
- Replace AVX optimizations to AVX2 for function NeuralDerivativeSigmoid.
- Replace AVX optimizations to AVX2 for function NeuralUpdateWeights.
- Replace AVX optimizations to AVX2 for function NeuralAdaptiveGradientUpdate.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution2x2Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution3x3Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution4x4Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution5x5Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution2x2Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution3x3Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution4x4Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution5x5Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution2x2Sum.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution3x3Sum.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution4x4Sum.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution5x5Sum.
- Replace AVX optimizations to AVX2 for function NeuralConvolutionForward.
- Replace AVX optimizations to AVX2 for function SynetAddBias.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward0.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward1.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward2.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward3.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward4.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward8.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward9.
- Replace AVX optimizations to AVX2 for function SynetPoolingAverage.
- Replace AVX optimizations to AVX2 for function SynetShuffleLayerForward.
- Replace AVX optimizations to AVX2 for function SynetHardSigmoid32f.
- Replace AVX optimizations to AVX2 for function SynetHswish32f.
- Replace AVX optimizations to AVX2 for function SynetPreluLayerForward.
- Replace AVX optimizations to AVX2 for function SynetRelu32f.
- Replace AVX optimizations to AVX2 for function SynetRestrictRange32f.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x3Block1x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x3Block1x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x3Block1x4SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x5Block1x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x5Block1x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x5Block1x4SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block2x2SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block2x2SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block2x2SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block4x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block4x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block4x4SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block2x2SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block2x2SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block2x2SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block3x3SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block3x3SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block3x3SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block4x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block4x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block4x4SetOutput.
- Replace AVX optimizations to AVX2 for function GemmPackA.
- Replace AVX optimizations to AVX2 for function GemmPackB.
- Replace AVX optimizations to AVX2 for function GemmScaleC.
- Replace AVX optimizations to AVX2 for function SynetScaleLayerForward.
- Replace AVX optimizations to AVX2 for function SynetInnerProductLayerForward.
- Replace AVX optimizations to AVX2 for function SynetInnerProduct32fInit.
- Replace AVX optimizations to AVX2 for function SynetEltwiseLayerForward.
- Replace AVX optimizations to AVX2 for function SynetDeconvolution32fInit.
- Replace AVX optimizations to AVX2 for function SynetMergedConvolution32fInit.
- Replace AVX optimizations to AVX2 for SynetInnerProduct32fGemm class.
- Replace AVX optimizations to AVX2 for SynetInnerProduct32fProd class.
- Replace AVX optimizations to AVX2 for SynetDeconvolution32fGemmNN class.
- Replace AVX optimizations to AVX2 for SynetDeconvolution32fNhwcDirect2x2 class.
- Replace AVX optimizations to AVX2 for SynetMergedConvolution32fCdc class.
- Replace AVX optimizations to AVX2 for SynetMergedConvolution32fCd class.
- Replace AVX optimizations to AVX2 for SynetMergedConvolution32fDc class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fDepthwiseDotProduct class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fDirectNchw class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fDirectNhwc class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fNhwcDirect class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fGemmNT class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fGemmNT class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fWinograd class.
- Replace AVX optimizations to AVX2 for function SynetConvolution32fInit.
Removing
- Base implementation, SSE4.1, AVX, AVX-512BW, NEON, VSX optimizations of function SvmSumLinear.
- Separate support of the AVX extension (it is now supported only together with AVX2).
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundGrowRangeSlow.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundGrowRangeFast.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundIncrementCount.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundAdjustRange.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundAdjustRangeMasked.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundShiftRange.
- Base implementatio...