[MNN:Sync] Sync Internal 2.8.2 #2774

Merged 1 commit on Feb 29, 2024
5 changes: 4 additions & 1 deletion CMakeLists.txt
@@ -44,6 +44,7 @@ option(MNN_DEBUG_MEMORY "MNN Debug Memory Access" OFF)
option(MNN_DEBUG_TENSOR_SIZE "Enable Tensor Size" OFF)
option(MNN_GPU_TRACE "Enable MNN Gpu Debug" OFF)
option(MNN_SUPPORT_RENDER "Enable MNN Render Ops" OFF)
option(MNN_SUPPORT_TRANSFORMER_FUSE "Enable MNN transformer Fuse Ops" OFF)
option(MNN_PORTABLE_BUILD "Link the static version of third party libraries where possible to improve the portability of built executables" OFF)
option(MNN_SEP_BUILD "Build MNN Backends and expression separately. Only works with MNN_BUILD_SHARED_LIBS=ON" ON)
option(NATIVE_LIBRARY_OUTPUT "Native Library Path" OFF)
@@ -166,7 +167,9 @@ endif()
if(MNN_SUPPORT_RENDER)
add_definitions(-DMNN_SUPPORT_RENDER)
endif()

if(MNN_SUPPORT_TRANSFORMER_FUSE)
add_definitions(-DMNN_SUPPORT_TRANSFORMER_FUSE)
endif()
# debug options
if(MNN_DEBUG_MEMORY)
add_definitions(-DMNN_DEBUG_MEMORY)
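A note on the new option: at configure time `MNN_SUPPORT_TRANSFORMER_FUSE` only adds the `MNN_SUPPORT_TRANSFORMER_FUSE` compile definition (the `add_definitions` branch above), so sources can gate the fused transformer path on the preprocessor symbol. A minimal sketch of that gating, using a hypothetical probe file that is not part of this diff:

```cpp
// probe_transformer_fuse.cpp -- hypothetical, not part of this PR.
// When configured with -DMNN_SUPPORT_TRANSFORMER_FUSE=ON, CMake passes the
// MNN_SUPPORT_TRANSFORMER_FUSE definition to the compiler and the first branch is built.
#include <cstdio>

int main() {
#ifdef MNN_SUPPORT_TRANSFORMER_FUSE
    std::printf("transformer fuse ops: compiled in\n");
#else
    std::printf("transformer fuse ops: disabled\n");
#endif
    return 0;
}
```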
6 changes: 6 additions & 0 deletions docker_release.sh
@@ -0,0 +1,6 @@
# run the release inside the mnn_release docker container
docker start mnn_release
docker exec -i -e TEST_ID=$(pwd | awk -F "/" '{print $(NF-1)}') mnn_release bash <<'EOF'
cd ~/yanxing_zhaode/cise/space/$TEST_ID/source && ./release.sh pymnn
exit
EOF
1 change: 1 addition & 0 deletions docs/compile/cmake.md
@@ -81,4 +81,5 @@ MNN uses CMake to build the project; the CMake macro definitions are listed below:
| MNN_VULKAN_IMAGE | Use the Image memory mode when building MNN's Vulkan backend, to support FP16 and acceleration on some mobile GPUs; defaults to `ON` |
| MNN_LOW_MEMORY | Whether to support the low-memory mode; when supported, a weight-quantized model with `low_memory` set will dequantize at compute time; defaults to `OFF` |
| MNN_SUPPORT_RENDER | Whether to support the rendering-related operator implementations; defaults to `OFF` |
| MNN_SUPPORT_TRANSFORMER_FUSE | Whether to support the fused Transformer-related op implementations; defaults to `OFF` |
| MNN_BUILD_LLM | Whether to build the MNN-based llm library and demos; defaults to `OFF` |
4 changes: 2 additions & 2 deletions docs/faq.md
@@ -246,7 +246,7 @@ The time spent in the GPU backend's copy call consists of two parts
- On x86 / x64 architectures without the vnni instruction, quantized computation must first convert int8 to int16 and multiply-accumulate into int32, which is inherently slower than floating point multiply-accumulating directly into fp32.
- On x64 with vnni, quantized computation has the sdot instruction and is clearly faster than FP32; enable MNN_AVX512 when compiling MNN to support this instruction, typically about 30% faster than AVX512 floating point.
- ARM v7a / ARMv8: quantized computation multiply-accumulates int8 into int16 and then pairwise-adds into int32, slightly faster than floating point (typically around a 30% improvement).
- ARMv8.2+: quantized computation has the sdot instruction, but the FP32 issue rate also doubled relative to earlier architectures; compile MNN with MNN_ARM82 to enable sdot and quantized computation is faster, otherwise FP32 is faster; ideally more than 2x as fast as FP32 and 20% faster than FP16.
- The ARMv8.2 architecture has the sdot instruction, but the FP32 issue rate also doubled relative to earlier architectures, and it also supports FP16 vector instructions twice as fast as FP32; MNN checks the device architecture to enable sdot / smmla; ideally quantized computation is more than 2x as fast as FP32 and 20% faster than FP16.

## Other questions
### How to encrypt an MNN model
@@ -256,4 +256,4 @@ The time spent in the GPU backend's copy call consists of two parts
2. Run `schema/generate.sh` to regenerate the `flatbuffers` headers;
3. Rebuild the `MNN` library and all tools such as `Convert`;
4. Reconvert the model with the new tools;
5. Deploy on device with the new model and the new `MNN` library;
5. Deploy on device with the new model and the new `MNN` library;
7 changes: 5 additions & 2 deletions docs/inference/module.md
@@ -45,9 +45,10 @@ rtmgr->setCache(".cachefile");
// Load from the model file and create a new Module
const std::string model_file = "/tmp/mymodule.mnn"; // model file with path

// Input names: may be empty; when empty, MNN automatically searches the model's inputs; with multiple inputs the order is not guaranteed and must be checked via the getInfo interface
// Input names: with multiple inputs, fill them in order; the order must match the input array later passed to onForward
const std::vector<std::string> input_names{"input_1", "input_2", "input_3"};
// Output names: may be empty; when empty, MNN automatically searches the model's outputs; with multiple outputs the order is not guaranteed and must be checked via the getInfo interface

// Output names: with multiple outputs, fill them in order; this order determines the order of onForward's output array
const std::vector<std::string> output_names{"output_1"};

Module::Config mdconfig; // default module config
Expand All @@ -56,6 +57,8 @@ std::unique_ptr<Module> module; // module
module.reset(Module::load(input_names, output_names, model_filename.c_str(), rtMgr, &mdconfig));
```

The input and output names may be empty; in that case MNN searches the model's inputs/outputs and fills them in, and with multiple inputs/outputs the order is not guaranteed and must be checked via the getInfo interface.

### Module::Config
When creating a `Module`, a `Module::Config` can be passed in; its structure is as follows:

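To make the doc change above concrete, here is a rough C++ sketch of loading a `Module` with empty name lists and reading the resolved order back through `getInfo`. The model path is a placeholder, and the `inputNames` / `outputNames` fields of `Module::Info` are assumed from the surrounding Module documentation rather than from this diff:

```cpp
// Hypothetical sketch; not part of this PR.
#include <MNN/MNNDefine.h>
#include <MNN/expr/Module.hpp>
#include <memory>
#include <string>
#include <vector>

using namespace MNN::Express;

int main() {
    // Empty name lists: MNN resolves the inputs/outputs from the model itself.
    const std::vector<std::string> inputNames;
    const std::vector<std::string> outputNames;

    Module::Config mdconfig; // default module config
    std::shared_ptr<Module> net(
        Module::load(inputNames, outputNames, "/tmp/mymodule.mnn", &mdconfig));
    if (nullptr == net) {
        return -1;
    }
    // With multiple inputs/outputs the order is not guaranteed, so inspect it here.
    // inputNames / outputNames on Info are assumed from the Module docs.
    auto info = net->getInfo();
    for (auto& name : info->inputNames) {
        MNN_PRINT("input: %s\n", name.c_str());
    }
    for (auto& name : info->outputNames) {
        MNN_PRINT("output: %s\n", name.c_str());
    }
    return 0;
}
```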
56 changes: 56 additions & 0 deletions docs/pymnn/expr.md
@@ -190,6 +190,62 @@ array([0., 1., 2., 3.], dtype=float32)
```python
>>> expr.set_global_executor_config(2, 2, 1)
```

---
### `sync()`
Synchronize the MNN VARP; after the call, that VARP's computation is guaranteed to be complete

Returns: `None`

Return type: `None`

Example:

```python
>>> mnn_var = expr.placeholder([2,2])
>>> mnn_var.sync()
```

---
### `set_device_ptr(device_ptr, memory_type)`
Set the GPU memory address of an MNN VARP, and specify the memory type (CUDA/OPENCL/OPENGL, etc.) that the given address belongs to; only usable when the MNN VARP has GPU memory:

Parameters:
- `device_ptr:uint64_t` integer memory pointer address
- `memory_type:int` e.g. 2->CUDA, 3->OpenCL, etc.; see MNNForwardType in include/MNN/MNNForwardType.h

Returns: `None`

Return type: `None`

Example:

```python
>>> torch_tensor = torch.empty([1, 1000], dtype=torch.float16).cuda()
>>> mnn_var = expr.placeholder([2,2])
>>> mnn_var.set_device_ptr(torch_tensor.data_ptr(), 2)
```

---
### `copy_to_device_ptr(device_ptr, memory_type)`
Copy MNN VARP GPU memory to the specified memory address, and specify the memory type (CUDA/OPENCL/OPENGL, etc.) that the given address belongs to:

Parameters:
- `device_ptr:uint64_t` integer memory pointer address
- `memory_type:int` e.g. 2->CUDA, 3->OpenCL, etc.; see MNNForwardType in include/MNN/MNNForwardType.h

Returns: `None`

Return type: `None`

Example:

```python
>>> torch_tensor = torch.empty([1, 1000], dtype=torch.float16).cuda()
>>> mnn_var = expr.placeholder([2,2])
>>> mnn_var.copy_to_device_ptr(torch_tensor.data_ptr(), 2)
```

---
### `sign(x)`
Return the sign of the input value: 1 for positive numbers, -1 for negative numbers
9 changes: 8 additions & 1 deletion express/Executor.cpp
@@ -58,6 +58,10 @@ void Executor::setGlobalExecutorConfig(MNNForwardType type, const BackendConfig&
mAttr->firstType = std::make_pair(type, numberThread);
info.mode = Backend::Info::DIRECT;
info.numThread = numberThread;
if (MNN_FORWARD_METAL == type) {
// Close metal's defer encoder
info.numThread |= MNN_GPU_RECORD_OP;
}
info.user = (BackendConfig*)&config;
std::shared_ptr<Runtime> bn(creator->onCreate(info));
mRuntimes[mAttr->firstType] = bn;
@@ -257,6 +261,9 @@ void Executor::RuntimeManager::setHint(Interpreter::HintMode mode, int value) {
case Interpreter::STRICT_CHECK_MODEL:
mInside->checkNetBuffer = value > 0;
break;
case Interpreter::MEM_ALLOCATOR_TYPE:
mInside->modes.memoryAllocatorType = value;
break;
default:
break;
}
@@ -538,7 +545,7 @@ void Executor::_makeCache(const std::vector<EXPRP>& expr, bool forceCPU) {
quant->scale = TensorUtils::getDescribe(srcTensor)->quantAttr.get()->scale;
quant->zero = TensorUtils::getDescribe(srcTensor)->quantAttr.get()->zero;
}

TensorUtils::getDescribe(tensor.get())->index = (int)scheduleInfo.allTensors.size();
scheduleInfo.allTensors.emplace_back(tensor);
}
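For context on the new `MEM_ALLOCATOR_TYPE` case: the allocator choice can now be forwarded through the existing `RuntimeManager::setHint` path before a `Module` is created (see the matching `setAllocatorType` calls in `PipelineModule.cpp` below). A minimal sketch, assuming `MEM_ALLOCATOR_TYPE` is exposed in `Interpreter::HintMode`; the value `0` is only a placeholder, since the accepted allocator values are not spelled out in this diff:

```cpp
// Hypothetical usage sketch; not part of this PR.
#include <MNN/Interpreter.hpp>
#include <MNN/expr/Executor.hpp>
#include <memory>

using namespace MNN;
using namespace MNN::Express;

int main() {
    ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;
    config.numThread = 4;

    std::shared_ptr<Executor::RuntimeManager> rtMgr(
        Executor::RuntimeManager::createRuntimeManager(config));
    // New in 2.8.2: the allocator-type hint is recorded on the RuntimeManager and
    // applied to the runtimes created for Modules (see PipelineModule::load below).
    rtMgr->setHint(Interpreter::MEM_ALLOCATOR_TYPE, 0);
    return 0;
}
```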
42 changes: 42 additions & 0 deletions express/Expr.cpp
@@ -545,6 +545,48 @@ void Variable::setName(const std::string& name) {
mFrom->setName(name);
}
}

bool Variable::setDevicePtr(const void* devicePtr, int memoryType) {
if (nullptr != mFrom->get()) {
MNN_ERROR("Can't setDevicePtr to no-input op\n");
return false;
}
informDirty();
MNN_ASSERT(TensorUtils::getDescribe(mFrom->inside()->mOutputTensors[0])->quantAttr == nullptr || TensorUtils::getDescribe(mFrom->inside()->mOutputTensors[0])->type == DataType_DT_FLOAT);
mFrom->mInside->mContentDirty = false;
// Clear host address, Don't malloc hostPtr afterwards
Utils::releaseMemoryForHostTensor(mFrom->inside()->mOutputTensors[0]);
return mFrom->inside()->mOutputTensors[0]->setDevicePtr(devicePtr, memoryType);
}

bool Variable::copyToDevicePtr(void* devicePtr, int memoryType) {
if (nullptr != mFrom->get()) {
MNN_ERROR("Can't copyToDevicePtr to no-input op\n");
return false;
}

auto inside = mFrom->inside();
auto originTensor = inside->mOutputTensors[mFromIndex];

auto bn = TensorUtils::getDescribe(originTensor)->getBackend();
if(bn == nullptr) {
MNN_ERROR("Error: Varp copyToDevicePtr can't find backend\n");
return false;
}
if (bn->type() != memoryType) {
MNN_ERROR("Error: VARP backend type ( %d ), is not same as assigned memory type ( %d )\n", bn->type(), memoryType);
return false;
}

MNN::Tensor tempTensor(originTensor->dimensions(), originTensor->getDimensionType());
tempTensor.buffer().device = (uint64_t)devicePtr;

TensorUtils::getDescribe(originTensor)->getBackend()->onCopyBuffer(originTensor, &tempTensor);
// Sync the result
tempTensor.wait(Tensor::MAP_TENSOR_READ, true);
return true;
}

const std::string& Variable::name() const {
return mFrom->outputName(mFromIndex);
}
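To complement the pymnn examples earlier, a rough C++ sketch of the new `Variable::setDevicePtr` / `copyToDevicePtr` entry points. The CUDA buffers, the placeholder model path, and the memory-type value `2` (CUDA, per the pymnn docs above) are illustrative assumptions; for the copies to succeed, the module must actually run on a matching GPU backend, which this sketch only gestures at via the CUDA runtime manager:

```cpp
// Hypothetical sketch; not part of this PR.
#include <MNN/Interpreter.hpp>
#include <MNN/expr/Executor.hpp>
#include <MNN/expr/Expr.hpp>
#include <MNN/expr/Module.hpp>
#include <MNN/expr/NeuralNetWorkOp.hpp>
#include <cuda_runtime.h>
#include <memory>

using namespace MNN;
using namespace MNN::Express;

int main() {
    // Caller-owned CUDA buffers standing in for tensors produced/consumed elsewhere.
    void* inDev = nullptr;
    void* outDev = nullptr;
    cudaMalloc(&inDev, 1 * 1000 * sizeof(float));
    cudaMalloc(&outDev, 1 * 1000 * sizeof(float));

    // Hand an external GPU buffer to an input VARP; 2 -> CUDA memory type.
    VARP input = _Input({1, 1000}, NCHW);
    input->setDevicePtr(inDev, 2);

    // Placeholder model executed with a CUDA runtime so its outputs stay on the GPU.
    ScheduleConfig sConfig;
    sConfig.type = MNN_FORWARD_CUDA;
    std::shared_ptr<Executor::RuntimeManager> rtMgr(
        Executor::RuntimeManager::createRuntimeManager(sConfig));
    Module::Config mdConfig;
    std::shared_ptr<Module> net(
        Module::load({}, {}, "/tmp/mymodule.mnn", rtMgr, &mdConfig));
    if (net != nullptr) {
        auto outputs = net->onForward({input});
        // Copy the result into the caller's CUDA buffer instead of mapping it to host.
        outputs[0]->copyToDevicePtr(outDev, 2);
    }

    cudaFree(inDev);
    cudaFree(outDev);
    return 0;
}
```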
5 changes: 1 addition & 4 deletions express/Utils.cpp
@@ -116,10 +116,7 @@ bool Utils::allocMemoryForHostTensor(Tensor* dest) {
if (TensorUtils::getDescribe(dest)->memoryType != Tensor::InsideDescribe::MEMORY_HOST) {
return false;
}
auto size = dest->size();
if (0 >= size) {
return false;
}
auto size = dest->usize();
dest->buffer().host = (uint8_t*)MNNMemoryAllocAlign(size, MNN_MEMORY_ALIGN_DEFAULT);
return dest->buffer().host != nullptr;
}
5 changes: 4 additions & 1 deletion express/module/PipelineModule.cpp
@@ -335,7 +335,7 @@ static std::vector<int> _collectNeededOps(const MNN::Net* net, const std::set<in
// 0: use, 1: no use
std::vector<int> opMask(net->oplists()->size());
::memset(opMask.data(), 0, opMask.size() * sizeof(int));

// Set Initial Status
for (auto v : outputIndexes) {
tensorMask[v] = 1;
@@ -638,6 +638,9 @@ Module* PipelineModule::load(const std::vector<std::string>& inputs, const std::
modRuntime.rt.first.begin()->second->setExternalFile(rtMgr->getInside()->mExternalFile);
modRuntime.rt.second->setExternalFile(rtMgr->getInside()->mExternalFile);
}
// set allocator type
modRuntime.rt.first.begin()->second->setAllocatorType(rtMgr->getInside()->modes.memoryAllocatorType);
modRuntime.rt.second->setAllocatorType(rtMgr->getInside()->modes.memoryAllocatorType);
}
auto& rt = modRuntime.rt;
auto firstRt = rt.first[modRuntime.compute.type];
20 changes: 10 additions & 10 deletions include/MNN/ImageProcess.hpp
@@ -44,6 +44,7 @@ enum Wrap { CLAMP_TO_EDGE = 0, ZERO = 1, REPEAT = 2 };
* 2: Sample line and do format convert
* 3: Turn RGBA to float tensor, and do sub and normalize
*/

class MNN_PUBLIC ImageProcess {
public:
struct Inside;
@@ -62,7 +63,6 @@ class MNN_PUBLIC ImageProcess {
/** edge wrapper */
Wrap wrap = CLAMP_TO_EDGE;
};

public:
/**
* @brief create image process with given config for given tensor.
@@ -86,10 +86,10 @@
static ImageProcess* create(const ImageFormat sourceFormat = RGBA, const ImageFormat destFormat = RGBA,
const float* means = nullptr, const int meanCount = 0, const float* normals = nullptr,
const int normalCount = 0, const Tensor* dstTensor = nullptr);

~ImageProcess();
static void destroy(ImageProcess* imageProcess);

/**
* @brief get affine transform matrix.
* @return affine transform matrix.
@@ -98,7 +98,7 @@
return mTransform;
}
void setMatrix(const Matrix& matrix);

/**
* @brief convert source data to given tensor.
* @param source source data.
@@ -109,7 +109,7 @@
* @return result code.
*/
ErrorCode convert(const uint8_t* source, int iw, int ih, int stride, Tensor* dest);

/**
* @brief convert source data to given tensor.
* @param source source data.
@@ -126,7 +126,7 @@
*/
ErrorCode convert(const uint8_t* source, int iw, int ih, int stride, void* dest, int ow, int oh, int outputBpp = 0,
int outputStride = 0, halide_type_t type = halide_type_of<float>());

/**
* @brief create tensor with given data.
* @param w image width.
@@ -140,7 +140,7 @@
return createImageTensor(halide_type_of<T>(), w, h, bpp, p);
}
static Tensor* createImageTensor(halide_type_t type, int w, int h, int bpp, void* p = nullptr);

/**
* @brief set padding value when wrap=ZERO.
* @param value padding value.
@@ -149,14 +149,14 @@
void setPadding(uint8_t value) {
mPaddingValue = value;
}

/**
* @brief set to draw mode.
* @param void
* @return void.
*/
void setDraw();

/**
* @brief draw color to regions of img.
* @param img the image to draw.
@@ -179,4 +179,4 @@
} // namespace CV
} // namespace MNN

#endif /* ImageProcess_hpp */
#endif /* MNN_ImageProcess_hpp */
2 changes: 1 addition & 1 deletion include/MNN/MNNDefine.h
@@ -69,6 +69,6 @@ MNN_ERROR("Check failed: %s ==> %s\n", #success, #log); \
#define STR(x) STR_IMP(x)
#define MNN_VERSION_MAJOR 2
#define MNN_VERSION_MINOR 8
#define MNN_VERSION_PATCH 1
#define MNN_VERSION_PATCH 2
#define MNN_VERSION STR(MNN_VERSION_MAJOR) "." STR(MNN_VERSION_MINOR) "." STR(MNN_VERSION_PATCH)
#endif /* MNNDefine_h */
4 changes: 4 additions & 0 deletions include/MNN/Tensor.hpp
@@ -304,6 +304,10 @@ class MNN_PUBLIC Tensor {
* @param finish wait for command flush or finish
*/
int wait(MapType mtype, bool finish);
/**
* @brief set GPU tensor device ptr, and inform memory type
*/
bool setDevicePtr(const void* devicePtr, int memoryType);
private:
halide_buffer_t mBuffer;
struct InsideDescribe* mDescribe;
5 changes: 4 additions & 1 deletion include/MNN/expr/Expr.hpp
@@ -108,11 +108,14 @@ class MNN_PUBLIC Variable {
Dimensionformat order = NHWC;
INTS dim;
halide_type_t type;
int size;
size_t size;
void syncSize();
};
const std::string& name() const;
void setName(const std::string& name);
bool setDevicePtr(const void* devicePtr, int memoryType);
bool copyToDevicePtr(void* devicePtr, int memoryType);

std::pair<EXPRP, int> expr() const {
return std::make_pair(mFrom, mFromIndex);
}
2 changes: 1 addition & 1 deletion project/android/gradle/wrapper/gradle-wrapper.properties
@@ -3,4 +3,4 @@ distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-4.6-all.zip
distributionUrl=http\://mtl-gradle-mirror.oss-cn-hangzhou.aliyuncs.com/gradle-4.6-all.zip