Compiling and Using MNN on the macOS M1 Platform

Build MNN

Build macro overview

The CMake options below enable the Metal backend (MNN_METAL), ARMv8.2 fp16 kernels (MNN_ARM82) and BF16 support (MNN_SUPPORT_BF16), and build the model converter (including the Torch frontend), the command-line tools and the quantization tools.

git clone git@github.com:alibaba/MNN.git
cd MNN
./schema/generate.sh
mkdir build && cd build
cmake -D MNN_METAL=ON -D MNN_ARM82=ON -D MNN_SUPPORT_BF16=ON -D MNN_BUILD_CONVERTER=ON -D MNN_BUILD_TORCH=ON -D MNN_BUILD_TOOLS=ON -D MNN_BUILD_QUANTOOLS=ON ..
-- 3.19.0.0
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Darwin
-- Processor: arm64
-- Version: 2.4.1
-- Metal: ON
-- OpenCL: OFF
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: ON
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: ON
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /Users/luohanjie/Softwares/MNN/build_mac
-- CUDA PROFILE: OFF
-- WIN_USE_ASM:
-- Enabling AArch64 Assemblies
-- Enable INT8 SDOT
-- Onnx:
-- LibTorch Path is : /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake
CMake Warning at /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
tools/converter/source/torch/CMakeLists.txt:35 (find_package)
tools/converter/CMakeLists.txt:33 (include)


-- Found Torch: /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/lib/libtorch.dylib
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luohanjie/Softwares/MNN/build_mac
make -j20

If the converter build fails with "#error C++17 or later compatible compiler is required to use ATen.", it can be fixed by adding target_compile_options(MNNConverterTorch PRIVATE "-std=c++17") to tools/converter/source/torch/CMakeLists.txt.

Model Conversion

Usage:
MNNConvert [OPTION...]

-h, --help Convert Other Model Format To MNN Model

-v, --version show the current converter version

-f, --framework arg type of the model to convert, ex: [TF,CAFFE,ONNX,TFLITE,MNN,TORCH,JSON]

--modelFile arg model file to convert, ex: *.pb,*caffemodel

--prototxt arg Caffe network description file, ex: *.prototxt

--MNNModel arg file name of the converted MNN model, ex: *.mnn

--fp16 store the float32 parameters of conv/matmul/LSTM as float16;
the model shrinks to about half its size with essentially no accuracy loss

--benchmarkModel strip the parameters of conv/matmul/BN etc. from the model; only for benchmark testing

--bizCode arg MNN model flag, ex: MNN

--debug use debug mode to print more conversion information

--forTraining keep training-related operators such as BN/Dropout, default: false

--weightQuantBits arg arg=2~8; quantize only the float32 weights of conv/matmul/LSTM.
This only reduces model size; the weights are decoded back to float32 when the model is loaded,
so runtime speed equals the float32 model. With 8 bits the accuracy is essentially unchanged
and the model is about 4x smaller. default: 0, i.e. no weight quantization

--compressionParamsFile arg
model compression parameter file generated by the MNN model compression toolkit

--saveStaticModel fix the input shapes and save a static model, default: false

--inputConfigFile arg config file required for saving a static model, ex: ~/config.txt. Format:
input_names = input0,input1
input_dims = 1x3x224x224,1x3x64x64
--JsonFile arg with -f MNN and a JsonFile given, convert the MNN model to a JSON file
--info with -f MNN, print basic model information (input names, input shapes, output names, model version, etc.)
--testdir arg after converting to MNN, test whether MNN inference matches the original model.
arg is the folder of test data; see the "Correctness Verification" section for how to generate it
--thredhold arg with --testdir, the error tolerance of the correctness check
(defaults to 0.01 if unset)
--saveExternalData store weights, constants and other data in a separate file, default: false

TorchScript to MNN

import torch
# ...
# model is exported model
model.eval()
# trace
model_trace = torch.jit.trace(model, torch.rand(1, 3, 1200, 1200))
model_trace.save('model_trace.pt')
# script
model_script = torch.jit.script(model)
model_script.save('model_script.pt')
./build/MNNConvert -f TORCH --modelFile XXX.pt --MNNModel XXX.mnn --bizCode biz

ONNX to MNN

import torch
import torchvision

dummy_input = torch.randn(10, 3, 224, 224, device="cpu")
model = torchvision.models.alexnet(pretrained=True).cpu()

# Providing input and output names sets the display names for values
# within the model's graph. Setting these does not change the semantics
# of the graph; it is only for readability.
#
# The inputs to the network consist of the flat list of inputs (i.e.
# the values you would pass to the forward() method) followed by the
# flat list of parameters. You can partially specify names, i.e. provide
# a list here shorter than the number of inputs to the model, and we will
# only set that subset of names, starting from the beginning.
input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ]
output_names = [ "output1" ]

torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names)
./MNNConvert -f ONNX --modelFile XXX.onnx --MNNModel XXX.mnn --bizCode biz

Correctness Verification

Take an ONNX network as an example:

conda install onnxruntime
python ./../tools/script/testMNNFromOnnx.py SRC.onnx

When the output shows TEST_SUCCESS, the model conversion and inference are correct. The test data this script generates is also what the --testdir option of MNNConvert expects (see the option list above).

C++ (CMake)

Convert the dpt_swin2_tiny_256.pt network to dpt_swin2_tiny_256.mnn, then use it to generate a depth map:

#include <MNN/Interpreter.hpp>
#include <MNN/Matrix.h>
#include <MNN/ImageProcess.hpp>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn/dnn.hpp> // for cv::dnn::blobFromImage
#include <sys/time.h>

cv::Mat ShowMat(const cv::Mat& src) {
    double min;
    double max;
    cv::minMaxIdx(src, &min, &max);
    cv::Mat adjMap;

    float scale = 255 / (max - min);
    src.convertTo(adjMap, CV_8UC1, scale, -min * scale);

    cv::Mat falseColorsMap;
    cv::applyColorMap(adjMap, falseColorsMap, cv::COLORMAP_PINK);

    return falseColorsMap;
}

int main(int argc, char* argv[]) {
    std::string img_file = "/Users/luohanjie/Workspace/Vision/depth_estimation/MiDaS/input/squirrel_iphone_sample3.png";
    std::string model_file = "/Users/luohanjie/Workspace/Vision/my_slam/data/models/dpt_swin2_tiny_256/dpt_swin2_tiny_256.mnn";

    cv::Mat img = cv::imread(img_file);
    if (img.empty()) {
        std::cout << "Can not load image: " << img_file << std::endl;
        return 1;
    }

    int width_ori = img.cols;
    int height_ori = img.rows;

    // The Interpreter owns the model data; a Session is created from an Interpreter and owns the inference data.
    // Several inferences can share one model, i.e. several Sessions can share one Interpreter.
    // Once all Sessions have been created and no new Sessions (or training-data updates) are needed,
    // the Interpreter can release the model data via releaseModel() to save memory.
    std::shared_ptr<MNN::Interpreter> net(MNN::Interpreter::createFromFile(model_file.c_str()), MNN::Interpreter::destroy);
    if (net == nullptr) {
        std::cout << "Can not load model: " << model_file << std::endl;
        return 1;
    }

    // The returned Session is managed by the Interpreter and is released when the Interpreter is destroyed,
    // so it normally needs no extra handling; Interpreter::releaseSession can release it earlier to reduce memory usage.
    // Creating a Session is relatively expensive, while a Session can be reused across many inferences,
    // so create it once and use it many times.
    MNN::ScheduleConfig session_config;
    session_config.type = MNN_FORWARD_AUTO;

    // memory, power and precision are the memory, power and precision preferences.
    // Backends that support these options adjust their behaviour accordingly; backends that do not simply ignore them.
    // Example: with the OpenCL backend, precision Low uses fp16 for storage and computation (small deviation from CPU results, best latency);
    // precision Normal stores fp16 but computes in fp32 (results close to CPU, still fast);
    // precision High stores and computes in fp32 (slower, but identical to CPU results).
    // With the CPU backend, precision Low enables FP16 computation where the device supports it,
    // and Low_BF16 enables BF16 computation where supported.
    // BackendConfig bnconfig;
    // bnconfig.precision = BackendConfig::Precision_Low;
    // config.backendConfig = &bnconfig;
    MNN::Session* session = net->createSession(session_config);

    // Get the input / output tensors
    MNN::Tensor* input = net->getSessionInput(session, "input.1");
    MNN::Tensor* output = net->getSessionOutput(session, "3335");
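    // The tensor names "input.1" and "3335" are specific to this exported model; passing nullptr to
    // getSessionInput/getSessionOutput returns the default (first) tensor, and
    // getSessionInputAll / getSessionOutputAll return all of them by name.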

    // NCHW
    std::vector<int> input_dims = input->shape();
    int input_n = input_dims[0];
    int input_c = input_dims[1];
    int input_h = input_dims[2];
    int input_w = input_dims[3];
    std::cout << "Model input_n: " << input_n << ", input_c: " << input_c << ", input_h: " << input_h << ", input_w: " << input_w << std::endl;

    // CHW
    std::vector<int> output_dims = output->shape();
    int output_c = output_dims[0];
    int output_h = output_dims[1];
    int output_w = output_dims[2];
    std::cout << "Model output_c: " << output_c << ", output_h: " << output_h << ", output_w: " << output_w << std::endl;

    // Normalization: x = (x / 255 - mean) / stddev
    // cv::dnn::blobFromImage computes (x - mean_sub) * scale, so
    // scale = 1 / (255 * stddev) and mean_sub = 255 * mean
    float mean = 0.5f;
    float stddev = 0.5f;
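    // With mean = stddev = 0.5: scale = 1 / 127.5 ≈ 0.00784 and mean_sub = 127.5,
    // so input pixels in [0, 255] are mapped to roughly [-1, 1].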

    // N is the batch size, C the number of channels, H the height and W the width.
    // In NCHW memory order the W dimension varies fastest: elements run along W first, then H,
    // then move to the next channel, and finally to the next image in the batch.
    // In NHWC memory order the C dimension varies fastest: all channels of one pixel are stored together,
    // then the next pixel along W, then along H, and finally the next image.
    // Under GPU acceleration NCHW is usually preferred, because the pixels of a single channel are then
    // contiguous in memory, which suits CNNs that convolve channel by channel; so inference preprocessing
    // typically converts RGB/BGR images to NCHW, whereas images read with OpenCV are NHWC.
    // https://blog.csdn.net/u010368556/article/details/105423260
    // Caffe: NCHW; PyTorch: NCHW; MXNet: NCHW; HiSilicon BGR: NCHW; NCNN: CHW;
    // TensorFlow: NHWC; OpenCV: NHWC; Rockchip RKNN: NHWC; scipy.misc: NHW
    // https://www.cnblogs.com/yongy1030/p/11728103.html
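    // For reference, the flat offset of element (n, c, h, w) in each layout is:
    //   NCHW: ((n * C + c) * H + h) * W + w
    //   NHWC: ((n * H + h) * W + w) * C + c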
    // Convert NHWC to NCHW
    cv::Mat img_nchw;
    cv::dnn::blobFromImage(img, img_nchw, 1.0 / (255.0 * stddev), cv::Size(input_w, input_h), cv::Scalar::all(255.0 * mean), true); // HWC -> NCHW blob, BGR -> RGB

    MNN::Tensor* tensor_nchw = new MNN::Tensor(input, MNN::Tensor::CAFFE);
    MNN::Tensor* tensor_depth = new MNN::Tensor(output, MNN::Tensor::CAFFE);
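    // MNN::Tensor::CAFFE requests NCHW layout for these host-side tensors, matching img_nchw.
    // copyFromHostTensor / copyToHostTensor below convert between the host tensors and the
    // session's tensors, which may use a different layout or live on a GPU backend.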

    memcpy(tensor_nchw->host<float>(), img_nchw.data, tensor_nchw->size());

    input->copyFromHostTensor(tensor_nchw);

    net->runSession(session);

    output->copyToHostTensor(tensor_depth);

    cv::Mat img_depth(output_h, output_w, CV_32FC1); // output depth image
    memcpy(img_depth.data, tensor_depth->host<float>(), tensor_depth->size()); // copy the result into img_depth

    cv::resize(img_depth, img_depth, cv::Size(width_ori, height_ori));

    cv::Mat img_show = ShowMat(img_depth);
    cv::imshow("img_depth", img_show);
    cv::waitKey(0);

    delete tensor_nchw;
    delete tensor_depth;

    return 0;
}
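The demo above uses the default backend settings. As a minimal sketch (assuming the same MNN API as in the demo; the helper name is made up for illustration), the commented-out BackendConfig lines could be filled in like this to request low precision and a fixed CPU thread count:

#include <MNN/Interpreter.hpp>

// Sketch: create a session that prefers low precision (fp16 where the backend supports it)
// and 4 CPU threads; backends that do not support an option simply ignore it.
MNN::Session* CreateLowPrecisionSession(MNN::Interpreter* net) {
    MNN::ScheduleConfig config;
    config.type      = MNN_FORWARD_AUTO;  // let MNN pick the backend
    config.numThread = 4;                 // thread count used by the CPU backend

    MNN::BackendConfig backend_config;
    backend_config.precision = MNN::BackendConfig::Precision_Low;
    backend_config.memory    = MNN::BackendConfig::Memory_Normal;
    backend_config.power     = MNN::BackendConfig::Power_Normal;
    config.backendConfig = &backend_config;  // read during createSession

    return net->createSession(config);
}

The demo is built with the following CMakeLists.txt: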
project(TEST_MNN)

cmake_minimum_required(VERSION 3.24)

message(STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
message(STATUS "Detected processor: ${CMAKE_SYSTEM_PROCESSOR}")

set(EXECUTABLE_OUTPUT_PATH ${PROJECT_BINARY_DIR}/bin)
set(LIBRARY_OUTPUT_PATH ${PROJECT_BINARY_DIR}/lib)

if(NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif()

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -std=c++17 -Wall")
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm.*|ARM.*|aarch64.*|AARCH64.*)")
    if(APPLE)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D__ARM_NEON__ -DENABLE_NEON -Wno-unused-result -mcpu=apple-m1 -mtune=native")
    else()
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D__ARM_NEON__ -DENABLE_NEON -Wno-unused-result -march=armv8-a+fp+simd+crypto")
    endif()
else()
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native -fopenmp")
endif()

#=======================================================================================

set(MNN_SRC /Users/luohanjie/Softwares/MNN)
set(MNN_LIBS ${MNN_SRC}/build_mac/libMNN.dylib)
set(MNN_INCLUDE_DIRS ${MNN_SRC}/include)


#=======================================================================================

find_package(OpenCV REQUIRED)

include_directories(${MNN_INCLUDE_DIRS} ${OpenCV_INCLUDE_DIRS})

link_directories(
    ${OpenCV_LIBRARY_DIRS}
)

add_executable(test_mnn test_mnn.cpp)
target_link_libraries(test_mnn ${MNN_LIBS} ${OpenCV_LIBS})

Cross-Compiling for Android (NDK)

cd MNN
./schema/generate.sh
mkdir build_android && cd build_android
export ANDROID_NDK=/Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393
cmake -D CMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-D CMAKE_BUILD_TYPE=Release \
-D ANDROID_ABI="arm64-v8a" \
-D ANDROID_STL=c++_shared \
-D MNN_USE_LOGCAT=OFF \
-D MNN_BUILD_BENCHMARK=OFF \
-D MNN_USE_SSE=OFF \
-D MNN_VULKAN=ON \
-D MNN_OPENCL=ON \
-D MNN_OPENGL=ON \
-D MNN_ARM82=ON \
-D MNN_SUPPORT_BF16=OFF \
-D MNN_BUILD_TEST=OFF \
-D ANDROID_NATIVE_API_LEVEL=android-29 \
-D MNN_BUILD_FOR_ANDROID_COMMAND=OFF \
-D NATIVE_LIBRARY_OUTPUT=. -DNATIVE_INCLUDE_OUTPUT=. $1 $2 $3 ..
-- The C compiler identification is Clang 14.0.6
-- The CXX compiler identification is Clang 14.0.6
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PythonInterp: /opt/homebrew/Caskroom/miniforge/base/envs/tf/bin/python (found version "3.10.9")
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Android
-- Processor: aarch64
-- Version: 2.4.1
-- Metal: OFF
-- OpenCL: ON
-- OpenGL: ON
-- Vulkan: ON
-- ARM82: ON
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /Users/luohanjie/Softwares/MNN/build_android
-- CUDA PROFILE: OFF
-- Enabling AArch64 Assemblies
-- Enable INT8 SDOT
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luohanjie/Softwares/MNN/build_android
make -j20