Compiling and Using MNN on the macOS M1 Platform

Build MNN

Build macro overview

The CMake options below enable the Metal backend (MNN_METAL), ARMv8.2 fp16 kernels (MNN_ARM82) and BF16 support (MNN_SUPPORT_BF16), and build the model converter (including the Torch frontend), the command-line tools and the quantization tools.

git clone git@github.com:alibaba/MNN.git
cd MNN
./schema/generate.sh
mkdir build && cd build
cmake -D MNN_METAL=ON -D MNN_ARM82=ON -D MNN_SUPPORT_BF16=ON -D MNN_BUILD_CONVERTER=ON -D MNN_BUILD_TORCH=ON -D MNN_BUILD_TOOLS=ON -D MNN_BUILD_QUANTOOLS=ON ..
-- 3.19.0.0
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Darwin
-- Processor: arm64
-- Version: 2.4.1
-- Metal: ON
-- OpenCL: OFF
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: ON
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: ON
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /Users/luohanjie/Softwares/MNN/build_mac
-- CUDA PROFILE: OFF
-- WIN_USE_ASM:
-- Enabling AArch64 Assemblies
-- Enable INT8 SDOT
-- Onnx:
-- LibTorch Path is : /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake
CMake Warning at /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
tools/converter/source/torch/CMakeLists.txt:35 (find_package)
tools/converter/CMakeLists.txt:33 (include)


-- Found Torch: /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/lib/libtorch.dylib
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luohanjie/Softwares/MNN/build_mac
make -j20

If the converter build fails with "#error C++17 or later compatible compiler is required to use ATen.", it can be fixed by adding target_compile_options(MNNConverterTorch PRIVATE "-std=c++17") to tools/converter/source/torch/CMakeLists.txt.

Model Conversion

Usage:
MNNConvert [OPTION...]

-h, --help Convert Other Model Format To MNN Model

-v, --version show the current converter version

-f, --framework arg type of the model to convert, ex: [TF,CAFFE,ONNX,TFLITE,MNN,TORCH,JSON]

--modelFile arg model file to convert, ex: *.pb,*caffemodel

--prototxt arg Caffe network description file, ex: *.prototxt

--MNNModel arg file name of the converted MNN model, ex: *.mnn

--fp16 store the float32 parameters of conv/matmul/LSTM as float16;
the model shrinks to about half its size with essentially no accuracy loss

--benchmarkModel strip the parameters of conv/matmul/BN etc. from the model; only for benchmark testing

--bizCode arg MNN model flag, ex: MNN

--debug use debug mode to print more conversion information

--forTraining keep training-related operators such as BN/Dropout, default: false

--weightQuantBits arg arg=2~8; quantize only the float32 weights of conv/matmul/LSTM.
This only reduces model size; the weights are decoded back to float32 when the model is loaded,
so runtime speed equals the float32 model. With 8 bits the accuracy is essentially unchanged
and the model is about 4x smaller. default: 0, i.e. no weight quantization

--compressionParamsFile arg
model compression parameter file generated by the MNN model compression toolkit

--saveStaticModel fix the input shapes and save a static model, default: false

--inputConfigFile arg config file required for saving a static model, ex: ~/config.txt. Format:
input_names = input0,input1
input_dims = 1x3x224x224,1x3x64x64
--JsonFile arg with -f MNN and a JsonFile given, convert the MNN model to a JSON file
--info with -f MNN, print basic model information (input names, input shapes, output names, model version, etc.)
--testdir arg after converting to MNN, test whether MNN inference matches the original model.
arg is the folder of test data; see the "Correctness Verification" section for how to generate it
--thredhold arg with --testdir, the error tolerance of the correctness check
(defaults to 0.01 if unset)
--saveExternalData store weights, constants and other data in a separate file, default: false

TorchScript to MNN

import torch
# ...
# model is exported model
model.eval()
# trace
model_trace = torch.jit.trace(model, torch.rand(1, 3, 1200, 1200))
model_trace.save('model_trace.pt')
# script
model_script = torch.jit.script(model)
model_script.save('model_script.pt')
./build/MNNConvert -f TORCH --modelFile XXX.pt --MNNModel XXX.mnn --bizCode biz

ONNX to MNN

import torch
import torchvision

dummy_input = torch.randn(10, 3, 224, 224, device="cpu")
model = torchvision.models.alexnet(pretrained=True).cpu()

# Providing input and output names sets the display names for values
# within the model's graph. Setting these does not change the semantics
# of the graph; it is only for readability.
#
# The inputs to the network consist of the flat list of inputs (i.e.
# the values you would pass to the forward() method) followed by the
# flat list of parameters. You can partially specify names, i.e. provide
# a list here shorter than the number of inputs to the model, and we will
# only set that subset of names, starting from the beginning.
input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ]
output_names = [ "output1" ]

torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names)
./MNNConvert -f ONNX --modelFile XXX.onnx --MNNModel XXX.mnn --bizCode biz

Correctness Verification

Take an ONNX network as an example:

conda install onnxruntime
python ./../tools/script/testMNNFromOnnx.py SRC.onnx

When the output shows TEST_SUCCESS, the model conversion and inference are correct. The test data this script generates is also what the --testdir option of MNNConvert expects (see the option list above).

C++ (CMake)

Convert the dpt_swin2_tiny_256.pt network to dpt_swin2_tiny_256.mnn, then use it to generate a depth map:

#include <MNN/Interpreter.hpp>
#include <MNN/Matrix.h>
#include <MNN/ImageProcess.hpp>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn/dnn.hpp> // for cv::dnn::blobFromImage
#include <sys/time.h>

cv::Mat ShowMat(const cv::Mat& src) {
    double min;
    double max;
    cv::minMaxIdx(src, &min, &max);
    cv::Mat adjMap;

    float scale = 255 / (max - min);
    src.convertTo(adjMap, CV_8UC1, scale, -min * scale);

    cv::Mat falseColorsMap;
    cv::applyColorMap(adjMap, falseColorsMap, cv::COLORMAP_PINK);

    return falseColorsMap;
}

int main(int argc, char* argv[]) {
    std::string img_file = "/Users/luohanjie/Workspace/Vision/depth_estimation/MiDaS/input/squirrel_iphone_sample3.png";
    std::string model_file = "/Users/luohanjie/Workspace/Vision/my_slam/data/models/dpt_swin2_tiny_256/dpt_swin2_tiny_256.mnn";

    cv::Mat img = cv::imread(img_file);
    if (img.empty()) {
        std::cout << "Can not load image: " << img_file << std::endl;
        return 1;
    }

    int width_ori = img.cols;
    int height_ori = img.rows;

    // The Interpreter owns the model data; a Session is created from an Interpreter and owns the inference data.
    // Several inferences can share one model, i.e. several Sessions can share one Interpreter.
    // Once all Sessions have been created and no new Sessions (or training-data updates) are needed,
    // the Interpreter can release the model data via releaseModel() to save memory.
    std::shared_ptr<MNN::Interpreter> net(MNN::Interpreter::createFromFile(model_file.c_str()), MNN::Interpreter::destroy);
    if (net == nullptr) {
        std::cout << "Can not load model: " << model_file << std::endl;
        return 1;
    }

    // The returned Session is managed by the Interpreter and is released when the Interpreter is destroyed,
    // so it normally needs no extra handling; Interpreter::releaseSession can release it earlier to reduce memory usage.
    // Creating a Session is relatively expensive, while a Session can be reused across many inferences,
    // so create it once and use it many times.
    MNN::ScheduleConfig session_config;
    session_config.type = MNN_FORWARD_AUTO;

    // memory, power and precision are the memory, power and precision preferences.
    // Backends that support these options adjust their behaviour accordingly; backends that do not simply ignore them.
    // Example: with the OpenCL backend, precision Low uses fp16 for storage and computation (small deviation from CPU results, best latency);
    // precision Normal stores fp16 but computes in fp32 (results close to CPU, still fast);
    // precision High stores and computes in fp32 (slower, but identical to CPU results).
    // With the CPU backend, precision Low enables FP16 computation where the device supports it,
    // and Low_BF16 enables BF16 computation where supported.
    // BackendConfig bnconfig;
    // bnconfig.precision = BackendConfig::Precision_Low;
    // config.backendConfig = &bnconfig;
    MNN::Session* session = net->createSession(session_config);

    // Get the input / output tensors
    MNN::Tensor* input = net->getSessionInput(session, "input.1");
    MNN::Tensor* output = net->getSessionOutput(session, "3335");
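    // The tensor names "input.1" and "3335" are specific to this exported model; passing nullptr to
    // getSessionInput/getSessionOutput returns the default (first) tensor, and
    // getSessionInputAll / getSessionOutputAll return all of them by name.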

    // NCHW
    std::vector<int> input_dims = input->shape();
    int input_n = input_dims[0];
    int input_c = input_dims[1];
    int input_h = input_dims[2];
    int input_w = input_dims[3];
    std::cout << "Model input_n: " << input_n << ", input_c: " << input_c << ", input_h: " << input_h << ", input_w: " << input_w << std::endl;

    // CHW
    std::vector<int> output_dims = output->shape();
    int output_c = output_dims[0];
    int output_h = output_dims[1];
    int output_w = output_dims[2];
    std::cout << "Model output_c: " << output_c << ", output_h: " << output_h << ", output_w: " << output_w << std::endl;

    // Normalization: x = (x / 255 - mean) / stddev
    // cv::dnn::blobFromImage computes (x - mean_sub) * scale, so
    // scale = 1 / (255 * stddev) and mean_sub = 255 * mean
    float mean = 0.5f;
    float stddev = 0.5f;
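    // With mean = stddev = 0.5: scale = 1 / 127.5 ≈ 0.00784 and mean_sub = 127.5,
    // so input pixels in [0, 255] are mapped to roughly [-1, 1].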

    // N is the batch size, C the number of channels, H the height and W the width.
    // In NCHW memory order the W dimension varies fastest: elements run along W first, then H,
    // then move to the next channel, and finally to the next image in the batch.
    // In NHWC memory order the C dimension varies fastest: all channels of one pixel are stored together,
    // then the next pixel along W, then along H, and finally the next image.
    // Under GPU acceleration NCHW is usually preferred, because the pixels of a single channel are then
    // contiguous in memory, which suits CNNs that convolve channel by channel; so inference preprocessing
    // typically converts RGB/BGR images to NCHW, whereas images read with OpenCV are NHWC.
    // https://blog.csdn.net/u010368556/article/details/105423260
    // Caffe: NCHW; PyTorch: NCHW; MXNet: NCHW; HiSilicon BGR: NCHW; NCNN: CHW;
    // TensorFlow: NHWC; OpenCV: NHWC; Rockchip RKNN: NHWC; scipy.misc: NHW
    // https://www.cnblogs.com/yongy1030/p/11728103.html
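    // For reference, the flat offset of element (n, c, h, w) in each layout is:
    //   NCHW: ((n * C + c) * H + h) * W + w
    //   NHWC: ((n * H + h) * W + w) * C + c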
    // Convert NHWC to NCHW
    cv::Mat img_nchw;
    cv::dnn::blobFromImage(img, img_nchw, 1.0 / (255.0 * stddev), cv::Size(input_w, input_h), cv::Scalar::all(255.0 * mean), true); // HWC -> NCHW blob, BGR -> RGB

    MNN::Tensor* tensor_nchw = new MNN::Tensor(input, MNN::Tensor::CAFFE);
    MNN::Tensor* tensor_depth = new MNN::Tensor(output, MNN::Tensor::CAFFE);
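    // MNN::Tensor::CAFFE requests NCHW layout for these host-side tensors, matching img_nchw.
    // copyFromHostTensor / copyToHostTensor below convert between the host tensors and the
    // session's tensors, which may use a different layout or live on a GPU backend.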

    memcpy(tensor_nchw->host<float>(), img_nchw.data, tensor_nchw->size());

    input->copyFromHostTensor(tensor_nchw);

    net->runSession(session);

    output->copyToHostTensor(tensor_depth);

    cv::Mat img_depth(output_h, output_w, CV_32FC1); // output depth image
    memcpy(img_depth.data, tensor_depth->host<float>(), tensor_depth->size()); // copy the result into img_depth

    cv::resize(img_depth, img_depth, cv::Size(width_ori, height_ori));

    cv::Mat img_show = ShowMat(img_depth);
    cv::imshow("img_depth", img_show);
    cv::waitKey(0);

    delete tensor_nchw;
    delete tensor_depth;

    return 0;
}
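The demo above uses the default backend settings. As a minimal sketch (assuming the same MNN API as in the demo; the helper name is made up for illustration), the commented-out BackendConfig lines could be filled in like this to request low precision and a fixed CPU thread count:

#include <MNN/Interpreter.hpp>

// Sketch: create a session that prefers low precision (fp16 where the backend supports it)
// and 4 CPU threads; backends that do not support an option simply ignore it.
MNN::Session* CreateLowPrecisionSession(MNN::Interpreter* net) {
    MNN::ScheduleConfig config;
    config.type      = MNN_FORWARD_AUTO;  // let MNN pick the backend
    config.numThread = 4;                 // thread count used by the CPU backend

    MNN::BackendConfig backend_config;
    backend_config.precision = MNN::BackendConfig::Precision_Low;
    backend_config.memory    = MNN::BackendConfig::Memory_Normal;
    backend_config.power     = MNN::BackendConfig::Power_Normal;
    config.backendConfig = &backend_config;  // read during createSession

    return net->createSession(config);
}

The demo is built with the following CMakeLists.txt: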
project(TEST_MNN)

cmake_minimum_required(VERSION 3.24)

message(STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
message(STATUS "Detected processor: ${CMAKE_SYSTEM_PROCESSOR}")

set(EXECUTABLE_OUTPUT_PATH ${PROJECT_BINARY_DIR}/bin)
set(LIBRARY_OUTPUT_PATH ${PROJECT_BINARY_DIR}/lib)

if(NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif()

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -std=c++17 -Wall")
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm.*|ARM.*|aarch64.*|AARCH64.*)")
    if(APPLE)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D__ARM_NEON__ -DENABLE_NEON -Wno-unused-result -mcpu=apple-m1 -mtune=native")
    else()
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D__ARM_NEON__ -DENABLE_NEON -Wno-unused-result -march=armv8-a+fp+simd+crypto")
    endif()
else()
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native -fopenmp")
endif()

#=======================================================================================

set(MNN_SRC /Users/luohanjie/Softwares/MNN)
set(MNN_LIBS ${MNN_SRC}/build_mac/libMNN.dylib)
set(MNN_INCLUDE_DIRS ${MNN_SRC}/include)


#=======================================================================================

find_package(OpenCV REQUIRED)

include_directories(${MNN_INCLUDE_DIRS} ${OpenCV_INCLUDE_DIRS})

link_directories(
    ${OpenCV_LIBRARY_DIRS}
)

add_executable(test_mnn test_mnn.cpp)
target_link_libraries(test_mnn ${MNN_LIBS} ${OpenCV_LIBS})

Cross-Compiling for Android (NDK)

cd MNN
./schema/generate.sh
mkdir build_android && cd build_android
export ANDROID_NDK=/Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393
cmake -D CMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-D CMAKE_BUILD_TYPE=Release \
-D ANDROID_ABI="arm64-v8a" \
-D ANDROID_STL=c++_shared \
-D MNN_USE_LOGCAT=OFF \
-D MNN_BUILD_BENCHMARK=OFF \
-D MNN_USE_SSE=OFF \
-D MNN_VULKAN=ON \
-D MNN_OPENCL=ON \
-D MNN_OPENGL=ON \
-D MNN_ARM82=ON \
-D MNN_SUPPORT_BF16=OFF \
-D MNN_BUILD_TEST=OFF \
-D ANDROID_NATIVE_API_LEVEL=android-29 \
-D MNN_BUILD_FOR_ANDROID_COMMAND=OFF \
-D NATIVE_LIBRARY_OUTPUT=. -DNATIVE_INCLUDE_OUTPUT=. $1 $2 $3 ..
-- The C compiler identification is Clang 14.0.6
-- The CXX compiler identification is Clang 14.0.6
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PythonInterp: /opt/homebrew/Caskroom/miniforge/base/envs/tf/bin/python (found version "3.10.9")
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Android
-- Processor: aarch64
-- Version: 2.4.1
-- Metal: OFF
-- OpenCL: ON
-- OpenGL: ON
-- Vulkan: ON
-- ARM82: ON
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /Users/luohanjie/Softwares/MNN/build_android
-- CUDA PROFILE: OFF
-- Enabling AArch64 Assemblies
-- Enable INT8 SDOT
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luohanjie/Softwares/MNN/build_android
make -j20