Releases: ermig1979/Simd

Simd v4.9.111

03 Mar 12:58

Algorithms

New features
  • AVX2, AVX-512BW optimizations of ResizerByteBicubic class.
  • SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Base64Decode.
  • NEON optimizations of function SynetSwish32f.
  • Swish activation function added to NEON optimizations of SynetConvolution32f framework.
  • Swish activation function added to NEON optimizations of SynetDeconvolution32f framework.
  • Swish activation function added to NEON optimizations of SynetMergedConvolution32f framework.
  • Swish activation function added to NEON optimizations of SynetConvolution8i framework.
  • Swish activation function added to NEON optimizations of SynetMergedConvolution8i framework.
  • NEON optimizations of function Yuv444pToBgraV2.
  • SSE2, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToBgraV2.
Improving
  • SSE4.1 optimizations of ResizerByteBicubic class.
Bug fixing
  • Compiler error in NEON optimizations of function AlphaUnpremultiply.
  • MSVS compiler warnings in SSE4.1, AVX2, AVX-512BW optimizations of function TransformImage.
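
Functions such as Yuv420pToBgraV2 and Yuv444pToBgraV2 turn planar YUV into interleaved BGRA. The scalar sketch below is only an illustration, not the Simd API: it assumes BT.601 limited-range coefficients (one of several standards an enumeration such as SimdYuvType might select), rounds to nearest, and uses hypothetical function names. In 4:2:0 each U/V sample covers a 2x2 block of Y samples, which is why the chroma index is halved in both directions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round a float and clamp it into the 0..255 byte range.
static inline uint8_t ToByte(float v)
{
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, std::round(v))));
}

// Converts one pixel with BT.601 limited-range coefficients (an assumption)
// and stores it as B, G, R, A.
static inline void YuvToBgraPixel(uint8_t y, uint8_t u, uint8_t v, uint8_t alpha, uint8_t* bgra)
{
    float yf = 1.164f * (float(y) - 16.0f);
    float uf = float(u) - 128.0f;
    float vf = float(v) - 128.0f;
    bgra[0] = ToByte(yf + 2.018f * uf);               // blue
    bgra[1] = ToByte(yf - 0.813f * vf - 0.391f * uf); // green
    bgra[2] = ToByte(yf + 1.596f * vf);               // red
    bgra[3] = alpha;
}

// Hypothetical reference loop for planar YUV 4:2:0 -> BGRA.
void Yuv420pToBgraReference(const uint8_t* y, size_t yStride, const uint8_t* u, size_t uStride,
    const uint8_t* v, size_t vStride, size_t width, size_t height,
    uint8_t* bgra, size_t bgraStride, uint8_t alpha)
{
    assert(y != nullptr && u != nullptr && v != nullptr && bgra != nullptr);
    for (size_t row = 0; row < height; ++row)
    {
        for (size_t col = 0; col < width; ++col)
        {
            // One chroma sample per 2x2 block of luma samples.
            uint8_t cu = u[(row / 2) * uStride + col / 2];
            uint8_t cv = v[(row / 2) * vStride + col / 2];
            YuvToBgraPixel(y[row * yStride + col], cu, cv, alpha, bgra + row * bgraStride + col * 4);
        }
    }
}
```

With these coefficients, Y = 16 with neutral chroma (U = V = 128) maps to black and Y = 235 maps to white.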

Simd v4.9.110

03 Mar 12:53

Algorithms

New features
  • Base implementation, SSE4.1 optimizations of ResizerByteBicubic class.
  • Base implementation of function BgraToYuv444pV2.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Nv12SaveAsJpegToMemory.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuv420pSaveAsJpegToMemory.
  • Base implementation of function BgraToYuv420pV2.
Bug fixing
  • Error in SSE4.1, AVX2, AVX-512BW optimizations of function BgraToRgba.
  • Error in SSE4.1, AVX2 optimizations of function BgraToBgr.
  • Error in SSE4.1, AVX2 optimizations of function BgraToRgb.
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function AlphaUnpremultiply.

Test framework

New features
  • Tests for verifying functionality of function BgraToYuv444pV2.
  • Tests for verifying functionality of function Nv12SaveAsJpegToMemory.
  • Tests for verifying functionality of function Yuv420pSaveAsJpegToMemory.
  • Tests for verifying functionality of function BgraToYuv420pV2.

Simd v4.9.109

03 Jan 07:51

Algorithms

New features
  • Additional parameter to function Uyvy422ToBgr.
  • SSE4.1, AVX2 optimizations of function Uyvy422ToBgr.
  • Base implementation, SSE4.1, AVX2 optimizations of function Uyvy422ToYuv420p.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Base64Encode.
  • Base implementation of function Base64Decode.
Improving
  • AVX2 optimizations of class ResizerNearest for Bgr24, Uv16.
Renaming
  • Function UyvyToBgr to Uyvy422ToBgr.
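
For readers unfamiliar with what Base64Encode and Base64Decode compute, here is a plain scalar sketch assuming the standard RFC 4648 alphabet with '=' padding. The function names are hypothetical, and whether the library's functions share this alphabet, padding rule and error handling is an assumption. Every 3 input bytes become 4 characters of 6 bits each; a shorter tail is padded with '='.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

static const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes each 3-byte group as 4 characters; a 1- or 2-byte tail is padded with '='.
std::string Base64EncodeReference(const uint8_t* src, size_t size)
{
    std::string dst;
    dst.reserve((size + 2) / 3 * 4);
    size_t i = 0;
    for (; i + 3 <= size; i += 3)
    {
        uint32_t triple = (uint32_t(src[i]) << 16) | (uint32_t(src[i + 1]) << 8) | src[i + 2];
        dst += kBase64Alphabet[(triple >> 18) & 0x3F];
        dst += kBase64Alphabet[(triple >> 12) & 0x3F];
        dst += kBase64Alphabet[(triple >> 6) & 0x3F];
        dst += kBase64Alphabet[triple & 0x3F];
    }
    size_t rest = size - i;
    if (rest > 0)
    {
        uint32_t triple = uint32_t(src[i]) << 16;
        if (rest == 2)
            triple |= uint32_t(src[i + 1]) << 8;
        dst += kBase64Alphabet[(triple >> 18) & 0x3F];
        dst += kBase64Alphabet[(triple >> 12) & 0x3F];
        dst += rest == 2 ? kBase64Alphabet[(triple >> 6) & 0x3F] : '=';
        dst += '=';
    }
    return dst;
}

// Maps a Base64 character to its 6-bit value, or -1 if it is not in the alphabet.
static int Base64Value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes padded Base64 text; returns false on malformed input.
bool Base64DecodeReference(const std::string& src, std::string& dst)
{
    dst.clear();
    if (src.size() % 4 != 0)
        return false;
    for (size_t i = 0; i < src.size(); i += 4)
    {
        bool last = i + 4 == src.size();
        int pad = 0;
        uint32_t quad = 0;
        for (size_t j = 0; j < 4; ++j)
        {
            char c = src[i + j];
            if (c == '=' && last && j >= 2) // padding only in the last two positions of the last group
            {
                ++pad;
                quad <<= 6;
                continue;
            }
            int v = Base64Value(c);
            if (v < 0 || pad > 0) // invalid character, or data after padding
                return false;
            quad = (quad << 6) | uint32_t(v);
        }
        dst += char((quad >> 16) & 0xFF);
        if (pad < 2) dst += char((quad >> 8) & 0xFF);
        if (pad < 1) dst += char(quad & 0xFF);
    }
    return true;
}
```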

Test framework

New features
  • Tests for verifying functionality of function Uyvy422ToYuv420p.
  • Tests for verifying functionality of function Base64Encode.
  • Tests for verifying functionality of function Base64Decode.

Documentation

Changes
  • Update developers list.

Simd v4.9.108

01 Dec 11:28

Algorithms

New features
  • SSE4.1, AVX2, AVX-512F, AVX-512BW optimizations of class ResizerNearest.
  • Add SimdResizeMethodNearestPytorch to SimdResizeMethodType enumeration.
  • Add parameter BackgroundStatUpdateTime to Motion Detector.
  • MotionDetector performance optimization (case of falling star).
  • 16-bit UYVY image format in View.
  • Base implementation of function UyvyToBgr.
  • Base implementation, SSE2, AVX2, AVX-512F optimizations of function SynetSwish32f.
  • SimdConvolutionActivationSwish item of SimdConvolutionActivationType enumeration.
  • Swish activation function added to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetConvolution32f framework.
  • Swish activation function added to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetDeconvolution32f framework.
  • Swish activation function added to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetMergedConvolution32f framework.
  • Swish activation function added to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8i framework.
  • Swish activation function added to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
  • SimdYuvType enumeration.
  • Base implementation, SSE2, AVX2, AVX-512BW optimizations of function Yuv444pToBgraV2.
  • Function Simd::Resize supports images with 16-bit channel size.
  • Base implementation of function Yuv420pToBgraV2.
Improving
  • Refactoring of SimdResizeMethodType enumeration.
Bug fixing
  • Stack corruption in function Simd::Avx2::JpegWriteBlockSubs.
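
SynetSwish32f and the SimdConvolutionActivationSwish activation compute the Swish function. A minimal scalar reference, assuming the common parameterized form swish(x) = x · sigmoid(β·x); how the library passes β is not stated in these notes, and the names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Swish(x) = x * sigmoid(beta * x) = x / (1 + exp(-beta * x)).
// For very negative x, exp() overflows to infinity and the result correctly tends to 0.
inline float SwishReference(float x, float beta)
{
    return x / (1.0f + std::exp(-beta * x));
}

// Hypothetical array version in the spirit of a 32-bit float activation kernel.
void SwishArrayReference(const float* src, size_t size, float beta, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < size; ++i)
        dst[i] = SwishReference(src[i], beta);
}
```

With β = 1 this is the same function often called SiLU; larger β makes it approach ReLU.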

Test framework

New features
  • Tests for verifying functionality of function UyvyToBgr.
  • Tests for verifying functionality of function SynetSwish32f.
  • Tests for verifying functionality of function Yuv444pToBgraV2.
  • Tests for verifying functionality of function Yuv420pToBgraV2.

Infrastructure

Bug fixing
  • Correction of wrong compiler options in CMake.

Simd v4.9.107

01 Nov 08:33

Algorithms

New features
  • Internal class Holder to replace std::unique_ptr for older compilers without C++11 support.
  • SimdBayerLayoutType enumeration.
  • Base implementation of class ResizerNearest.
Bug fixing
  • Compiler error when macro SIMD_SSE2_DISABLE is defined.
  • Compiler error when macro SIMD_NEON_DISABLE is defined.
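
The notes mention an internal class Holder that replaces std::unique_ptr on compilers without C++11. Its real interface is not described here, so the following is only a hypothetical minimal sketch of such a scoped owner in C++98 style: it deletes the owned object in its destructor and forbids copying by declaring, but not defining, the copy operations (the pre-C++11 substitute for `= delete`).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++98-compatible scoped owner (not the actual Simd Holder interface).
template <class T> class ScopedHolder
{
public:
    explicit ScopedHolder(T* ptr = NULL) : _ptr(ptr) {}
    ~ScopedHolder() { delete _ptr; }

    T* Get() const { return _ptr; }
    T& operator*() const { assert(_ptr != NULL); return *_ptr; }
    T* operator->() const { return _ptr; }

    // Replaces the owned object, deleting the previous one.
    void Reset(T* ptr = NULL)
    {
        if (ptr != _ptr)
        {
            delete _ptr;
            _ptr = ptr;
        }
    }

    // Gives up ownership without deleting the object.
    T* Release()
    {
        T* ptr = _ptr;
        _ptr = NULL;
        return ptr;
    }

private:
    // Declared but never defined: copying is forbidden, as with std::unique_ptr.
    ScopedHolder(const ScopedHolder&);
    ScopedHolder& operator=(const ScopedHolder&);

    T* _ptr;
};
```

Unlike std::unique_ptr, this sketch cannot be moved, since move semantics also require C++11.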

Infrastructure

New features
  • SIMD_ROOT CMake parameter.

Simd v4.9.106

01 Oct 09:51

Algorithms

New features
  • Base implementation, SSE2, AVX, AVX-512F, NEON optimizations of function SynetHardSigmoid32f.
  • SimdConvolutionActivationHardSigmoid item of SimdConvolutionActivationType enumeration.
  • HardSigmoid activation function added to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetConvolution32f framework.
  • HardSigmoid activation function added to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetDeconvolution32f framework.
  • HardSigmoid activation function added to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetMergedConvolution32f framework.
  • NEON optimizations of SynetMergedConvolution32fDc class.
  • NEON optimizations of SynetMergedConvolution32fCd class.
  • NEON optimizations of SynetInnerProduct32fGemm class.
  • NEON optimizations of SynetInnerProduct32fProd class.
  • HardSigmoid activation function added to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, NEON optimizations of SynetConvolution8i framework.
  • HardSigmoid activation function added to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
Bug fixing
  • Compiler error in file SimdInit.h (Clang, Windows).
Removing
  • Inclusion of SimdConfig.h in SimdLib.h.
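
HardSigmoid is a piecewise-linear approximation of the sigmoid. A scalar sketch, assuming the usual form clamp(scale·x + shift, 0, 1) (ONNX's HardSigmoid operator, for instance, defaults to alpha = 0.2 and beta = 0.5 for these two coefficients); the parameter names and functions are illustrative, not the actual SynetHardSigmoid32f signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// HardSigmoid(x) = clamp(scale * x + shift, 0, 1).
inline float HardSigmoidReference(float x, float scale, float shift)
{
    return std::max(0.0f, std::min(1.0f, x * scale + shift));
}

// Hypothetical array version applying the activation element by element.
void HardSigmoidArrayReference(const float* src, size_t size, float scale, float shift, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < size; ++i)
        dst[i] = HardSigmoidReference(src[i], scale, shift);
}
```

Because it needs only a multiply-add and two clamps, this activation maps onto SIMD min/max instructions without any exponential.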

Test framework

New features
  • Tests for verifying functionality of function SynetHardSigmoid32f.
  • '-pi' test parameter (to print internal performance statistics of Simd Library to console).

Simd v4.9.105

13 Sep 12:37

Algorithms

New features
  • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24 for Rotate180, TransposeRotate90).
  • Method Frame::Clone with region parameter.
  • Method View::Clone with region parameter.
  • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • AVX-512BW optimizations of function TransformImage (case of Gray8, Uv16, Bgra32 for Rotate180, TransposeRotate90).
  • AVX-512BW optimizations of function TransformImage (case of Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • AVX-512BW optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • AVX-512BW optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function AlphaBlendingUniform.
  • AVX-512BW optimizations of function TransformImage (case of Bgr24 for Rotate180, TransposeRotate90, Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • Resize function (with size parameter).
  • Move constructor of View structure.
  • Move operator of View structure.
  • Clear method of Frame structure.
  • Swap method of Frame structure.
  • Move constructor of Frame structure.
  • Move operator of Frame structure.
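
AlphaBlendingUniform presumably blends two images with a single alpha value instead of a per-pixel alpha channel (an inference from the name). A scalar sketch, assuming the common 8-bit formula dst = (src·α + dst·(255 − α)) / 255 with rounding to nearest; the library's exact rounding and parameter order may differ, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Blends src into dst in place with one alpha value for the whole image:
// dst = (src * alpha + dst * (255 - alpha)) / 255, rounded to nearest.
void AlphaBlendingUniformReference(const uint8_t* src, size_t srcStride, size_t width, size_t height,
    size_t channelCount, uint8_t alpha, uint8_t* dst, size_t dstStride)
{
    assert(src != nullptr && dst != nullptr);
    const size_t rowSize = width * channelCount;
    for (size_t row = 0; row < height; ++row)
    {
        const uint8_t* s = src + row * srcStride;
        uint8_t* d = dst + row * dstStride;
        for (size_t i = 0; i < rowSize; ++i)
        {
            // At most 255 * 255 + 127, so the sum fits easily in 32 bits.
            unsigned sum = unsigned(s[i]) * alpha + unsigned(d[i]) * (255u - alpha);
            d[i] = uint8_t((sum + 127u) / 255u);
        }
    }
}
```

With α = 255 the destination becomes a copy of the source, and with α = 0 it is left unchanged.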

Tests

New features
  • Tests for verifying functionality of function AlphaBlendingUniform.

Simd v4.9.104

03 Aug 09:06

Algorithms

New features
  • Rgba32 format in Frame structure.
  • Rgba32 format in Convert function (for frames).
  • SSE4.1 optimizations of function Float32ToFloat16.
  • SSE4.1 optimizations of function Float16ToFloat32.
  • AVX2 optimizations of function TransformImage (case of Bgra32 for Rotate180, TransposeRotate90).
Improving
  • SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetConvolution32fNhwcDirect (case of fixed kernels).
  • Reduction of compilation time and binary size of class SynetConvolution32f.
  • Reduction of compilation time and binary size of class SynetDeconvolution32f.
  • Reduction of compilation time and binary size of class SynetMergedConvolution32f.
  • Reduction of compilation time and binary size of class SynetConvolution8i.
  • Reduction of compilation time and binary size of class SynetMergedConvolution8i.
  • SSE4.1 optimizations of function TransformImage (case of Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate180).
  • SSE4.1 optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • SSE4.1 optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
Bug fixing
  • Compiler error in file SimdAvx512bwResizer.cpp (GCC 5.4.0).
  • Compiler error in file SimdAvx512bwBgraToBgr.cpp (MSVS-2017).
  • Compiler error in file SimdInit.h (Clang, Windows).
  • Error in AVX2 and AVX-512BW optimizations of functions CosineDistancesMxNa16f and CosineDistancesMxNp16f (functions may return small negative values).
  • Error in function Base::DetectionLoadA (it throws an exception instead of returning NULL).
  • Error in SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetDeconvolution32fNhwcDirect2x2.
Replacing
  • Replace SSE3 optimizations with SSE4.1 for function Gemm32fNT.
  • Replace SSE3 optimizations with SSE4.1 for function SynetConvolution32fInit.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution2x2Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution3x3Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution4x4Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution5x5Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralConvolutionForward.
  • Replace SSE4.2 optimizations with SSE4.1 for function Crc32c.
  • Replace SSSE3 optimizations with SSE4.1 for function AlphaBlending.
  • Replace SSSE3 optimizations with SSE4.1 for function AlphaFilling.
  • Replace SSSE3 optimizations with SSE4.1 for function AlphaPremultiply.
  • Replace SSSE3 optimizations with SSE4.1 for function BayerToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToBayer.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToRgba.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuv420p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuv422p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuva420p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToBayer.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function RgbToBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToGray.
  • Replace SSSE3 optimizations with SSE4.1 for function RgbToGray.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function TransformImage.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv420p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv422p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv444p.
  • Replace SSSE3 optimizations with SSE4.1 for function DeinterleaveBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function DeinterleaveBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function GaussianBlur3x3.
  • Replace SSSE3 optimizations with SSE4.1 for function GrayToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function InterleaveBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function InterleaveBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv420pToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv422pToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv444pToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv420pToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv422pToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv444pToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function Laplace.
  • Replace SSSE3 optimizations with SSE4.1 for function LaplaceAbs.
  • Replace SSSE3 optimizations with SSE4.1 for function LaplaceAbsSum.
  • Replace SSSE3 optimizations with SSE4.1 for function MeanFilter3x3.
  • Replace SSSE3 optimizations with SSE4.1 for function ReduceColor2x2.
  • Replace SSSE3 optimizations with SSE4.1 for function ReduceGray2x2.
  • Replace SSSE3 optimizations with SSE4.1 for function ReduceGray4x4.
  • Replace SSSE3 optimizations with SSE4.1 for function Reorder16bit.
  • Replace SSSE3 optimizations with SSE4.1 for function Reorder32bit.
  • Replace SSSE3 optimizations with SSE4.1 for function Reorder64bit.
  • Replace SSSE3 optimizations with SSE4.1 for function ResizeBilinear.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDx.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDxAbs.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDxAbsSum.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDy.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDyAbs.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDyAbsSum.
  • Replace SSSE3 optimizations with SSE4.1 for function ContourMetrics.
  • Replace SSSE3 optimizations with SSE4.1 for function ContourMetricsMasked.
  • Replace SSSE3 optimizations with SSE4.1 for function SquaredDifferenceSum.
  • Replace SSSE3 optimizations with SSE4.1 for function SquaredDifferenceSumMasked.
  • Replace SSSE3 optimizations with SSE4.1 for function TextureBoostedSaturatedGradient.
  • Replace SSSE3 optimizations with SSE4.1 for class ResizerByteBilinear.
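
The CosineDistancesMxNa16f / CosineDistancesMxNp16f fix addresses distances that came out slightly negative. Mathematically 1 − cos θ lies in [0, 2], but accumulated rounding can push the computed cosine just above 1. A common guard is to clamp the result, as in this float32 sketch; whether the library's fix works exactly this way is not stated in the notes, and both the function name and the zero-vector convention are hypothetical choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Cosine distance 1 - a.b / (|a| |b|), clamped into the mathematically valid range [0, 2]
// so that rounding error can never produce a negative distance.
float CosineDistanceReference(const float* a, const float* b, size_t size)
{
    assert(a != nullptr && b != nullptr && size > 0);
    float dot = 0.0f, aa = 0.0f, bb = 0.0f;
    for (size_t i = 0; i < size; ++i)
    {
        dot += a[i] * b[i];
        aa += a[i] * a[i];
        bb += b[i] * b[i];
    }
    if (aa == 0.0f || bb == 0.0f)
        return 1.0f; // hypothetical convention: a zero vector is treated as orthogonal
    float distance = 1.0f - dot / std::sqrt(aa * bb);
    return std::min(2.0f, std::max(0.0f, distance));
}
```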

Tests

New features
  • Colorized annotation in console logging.
Improving
  • Performance report generation to text file.
  • Thread ID annotation in console logging.

Infrastructure

New features
  • SIMD_INT8_DEBUG CMake option.
Removing
  • Separate support of SSE3 extension (it has been moved into SSE4.1).
  • Separate support of SSE4.2 extension (it has been moved into SSE4.1).
  • Separate support of SSSE3 extension (it has been moved into SSE4.1).

Simd v4.8.103

01 Jul 14:28

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of class ResizerShortBilinear.
  • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNa16f.
  • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNp16f.
  • Parameter of ROI mask in Motion::Model.
  • SSE2, AVX-512BW and NEON optimizations of function AbsDifference.
  • NEON optimizations of function AlphaUnpremultiply.
  • NEON optimizations of function AlphaPremultiply.
  • NEON optimizations of function ValueSquareSums.
Improving
  • Performance of SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
Bug fixing
  • Linker warning in file SimdImageLoad.h (MSVS).
Replacing
  • Replace SSE optimizations with SSE2 for function SvmSumLinear.
  • Replace SSE optimizations with SSE2 for function Fill32f.
  • Replace SSE optimizations with SSE2 for function CosineDistance32f.
  • Replace SSE optimizations with SSE2 for function DifferenceSum32f.
  • Replace SSE optimizations with SSE2 for function SquaredDifferenceKahanSum32f.
  • Replace SSE optimizations with SSE2 for function HogDeinterleave.
  • Replace SSE optimizations with SSE2 for function HogFilterSeparable.
  • Replace SSE optimizations with SSE2 for class ResizerFloatBilinear.
  • Replace SSE optimizations with SSE2 for function NeuralAddVectorMultipliedByValue.
  • Replace SSE optimizations with SSE2 for function NeuralAddVector.
  • Replace SSE optimizations with SSE2 for function NeuralAdaptiveGradientUpdate.
  • Replace SSE optimizations with SSE2 for function NeuralDerivativeRelu.
  • Replace SSE optimizations with SSE2 for function NeuralDerivativeSigmoid.
  • Replace SSE optimizations with SSE2 for function NeuralDerivativeTanh.
  • Replace SSE optimizations with SSE2 for function NeuralRoughSigmoid.
  • Replace SSE optimizations with SSE2 for function NeuralRoughSigmoid2.
  • Replace SSE optimizations with SSE2 for function NeuralRoughTanh.
  • Replace SSE optimizations with SSE2 for function NeuralUpdateWeights.
  • Replace SSE optimizations with SSE2 for function NeuralPooling1x1Max3x3.
  • Replace SSE optimizations with SSE2 for function NeuralPooling2x2Max2x2.
  • Replace SSE optimizations with SSE2 for function NeuralPooling2x2Max3x3.
  • Replace SSE optimizations with SSE2 for function SynetPoolingForwardAverage.
  • Replace SSE optimizations with SSE2 for function SynetPoolingForwardMax32f.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Sum.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Sum.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Sum.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Sum.
  • Replace SSE optimizations with SSE2 for function Gemm32fNN.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward0.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward1.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward2.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward3.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward4.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward8.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward9.
  • Replace SSE optimizations with SSE2 for function SynetReorderImage.
  • Replace SSE optimizations with SSE2 for function SynetReorderFilter.
  • Replace SSE optimizations with SSE2 for function SynetAddBias.
  • Replace SSE optimizations with SSE2 for function SynetEltwiseLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetInnerProductLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetShuffleLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetHswish32f.
  • Replace SSE optimizations with SSE2 for function SynetPreluLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetRelu32f.
  • Replace SSE optimizations with SSE2 for function SynetRestrictRange32f.
  • Replace SSE optimizations with SSE2 for function SynetScaleLayerForward.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetOutput.
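
AlphaPremultiply and AlphaUnpremultiply convert between straight and premultiplied alpha. A per-pixel scalar sketch for 8-bit BGRA, assuming rounding to nearest and that fully transparent pixels unpremultiply to black; the library's exact formulas may differ, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Premultiplies one BGRA pixel in place: each color channel becomes c * a / 255 (rounded).
inline void PremultiplyPixel(uint8_t* bgra)
{
    assert(bgra != nullptr);
    unsigned a = bgra[3];
    for (int c = 0; c < 3; ++c)
        bgra[c] = uint8_t((bgra[c] * a + 127u) / 255u);
}

// Reverses premultiplication: c = min(255, c * 255 / a) (rounded);
// fully transparent pixels (a == 0) get color 0 because the original color is lost.
inline void UnpremultiplyPixel(uint8_t* bgra)
{
    assert(bgra != nullptr);
    unsigned a = bgra[3];
    for (int c = 0; c < 3; ++c)
        bgra[c] = a == 0 ? uint8_t(0) : uint8_t(std::min(255u, (bgra[c] * 255u + a / 2) / a));
}
```

The round trip is lossy at low alpha: for example, with α = 128 the color (200, 100, 0) premultiplies to (100, 50, 0) and comes back as (199, 100, 0).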

Tests

New features
  • Tests for verifying functionality of function VectorNormNa16f.
  • Tests for verifying functionality of function VectorNormNp16f.

Infrastructure

Removing
  • Support of SSE extension.

Simd v4.7.102

02 Jun 11:02

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function ValueSquareSums.
Improving
  • Performance of AVX2, AVX-512F and NEON optimizations of SynetConvolution32fGemmNN class.
  • Performance of Neural::FullyConnectedLayer::Forward method.
Bug fixing
  • Error in class SynetMergedConvolution32fDc (large weights case).
  • Compiler error in file SimdAvx2SynetConversion.cpp (MSVS-2015, Win32).
  • Error in SSSE3 optimizations of function TransformImage.
  • Compiler error in file SimdImageSaveJpeg.h (Clang, Mac mini).
  • Compiler warnings (Clang).
  • Error in function ImagePngLoader::ReadTransparency (test tbbn0g04.png).
  • Error in Base implementation, SSE4.1 optimization of class ImagePngLoader (test basn0g16.png).
  • Error in SSE4.1 optimization of class ImagePngLoader (test s02i3p01.png).

Tests

New features
  • Tests for verifying functionality of function ValueSquareSums.
Improving
  • Header of performance report table.
Bug fixing
  • Compiler error in file TestFile.h (Clang, Mac mini).