Hanjie's Blog

An alpaca with dreams

brew install abseil google-benchmark
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
git checkout v2.9.3

Newer versions may fail to compile, or run into problems when invoking the GPU.

In tensorflow/lite/c/CMakeLists.txt, change common.c to common.cc.¹

mkdir build_mac
cd build_mac
cmake ../tensorflow/lite/c -D TFLITE_KERNEL_TEST=ON -D TFLITE_ENABLE_GPU=ON -D ABSL_PROPAGATE_CXX_STD=ON -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -D LIBRARY_OUTPUT_PATH=/Users/luohanjie/Softwares/tensorflow_src/build_mac/lib
cmake --build . -j

Build the benchmark_model test program and benchmark the model model_opt.tflite on the CPU:

cmake --build . -j -t benchmark_model
./tensorflow-lite/tools/benchmark/benchmark_model --graph=/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite --verbose=true --num_threads=4 --use_gpu=false

STARTING!
Log parameter values verbosely: [1]
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Number of prorated runs per second: [-1]
Num threads: [4]
Use caching: [0]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Run w/o invoking kernels: [0]
Report the peak memory footprint: [0]
Memory footprint check interval (ms): [50]
Graph: [/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input value files: []
Allow fp16: [0]
Require full delegation: [0]
Enable op profiling: [0]
Max initial profiling buffer entries: [1024]
Allow dynamic increase on profiling buffer entries: [0]
CSV File to export profiling data to: []
Print pre-invoke interpreter state: [0]
Print post-invoke interpreter state: [0]
Release dynamic tensor memory: [0]
Use dynamic tensor for large tensors: [0]
print out all supported flags: [0]
#threads used for CPU inference: [4]
Max number of delegated partitions: [0]
Min nodes per partition: [0]
Directory for delegate serialization: []
Model-specific token/key for delegate serialization.: []
Use xnnpack: [0]
External delegate path: []
External delegate options: []
Use gpu: [0]
Allow lower precision in gpu: [1]
Enable running quant models in gpu: [1]
Prefer maximizing the throughput in gpu: [0]
GPU backend: []
Loaded model /Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 66.3383
Initialized session in 41.498ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=13 first=43827 curr=38759 min=38662 max=45293 avg=39973.3 std=1998

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=39240 curr=38747 min=38470 max=40766 avg=39654.3 std=635

Inference timings in us: Init: 41498, First inference: 43827, Warmup (avg): 39973.3, Inference (avg): 39654.3
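The timing lines above are in microseconds. As a quick sanity check (a small sketch, not part of the benchmark tool), the `count=... avg=... std=...` line can be parsed into average latency and an approximate back-to-back throughput:

```python
# Parse one benchmark_model timing line (all values are microseconds).
line = "count=50 first=39240 curr=38747 min=38470 max=40766 avg=39654.3 std=635"
stats = {k: float(v) for k, v in (pair.split("=") for pair in line.split())}

avg_ms = stats["avg"] / 1000.0   # average latency in milliseconds
fps = 1e6 / stats["avg"]         # inferences per second if run back-to-back
print(f"avg = {avg_ms:.2f} ms, ~{fps:.1f} inferences/s")
```

For this CPU run that works out to roughly 39.65 ms per inference, i.e. about 25 inferences per second on 4 threads.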

Using the GPU:

./tensorflow-lite/tools/benchmark/benchmark_model --graph=/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite --verbose=true --num_threads=4 --use_gpu=true

STARTING!
Log parameter values verbosely: [1]
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Number of prorated runs per second: [-1]
Num threads: [4]
Use caching: [0]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Run w/o invoking kernels: [0]
Report the peak memory footprint: [0]
Memory footprint check interval (ms): [50]
Graph: [/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input value files: []
Allow fp16: [0]
Require full delegation: [0]
Enable op profiling: [0]
Max initial profiling buffer entries: [1024]
Allow dynamic increase on profiling buffer entries: [0]
CSV File to export profiling data to: []
Print pre-invoke interpreter state: [0]
Print post-invoke interpreter state: [0]
Release dynamic tensor memory: [0]
Use dynamic tensor for large tensors: [0]
print out all supported flags: [0]
#threads used for CPU inference: [4]
Max number of delegated partitions: [0]
Min nodes per partition: [0]
Directory for delegate serialization: []
Model-specific token/key for delegate serialization.: []
Use xnnpack: [0]
External delegate path: []
External delegate options: []
Use gpu: [1]
Allow lower precision in gpu: [1]
Enable running quant models in gpu: [1]
Prefer maximizing the throughput in gpu: [0]
GPU backend: []
Loaded model /Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate created.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 66.3383
Initialized session in 129.521ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=40 first=40053 curr=11752 min=11744 max=40053 avg=12579.9 std=4400

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=85 first=11880 curr=11836 min=11567 max=12276 avg=11839.5 std=93

Inference timings in us: Init: 129521, First inference: 40053, Warmup (avg): 12579.9, Inference (avg): 11839.5
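Comparing the two runs above (a quick back-of-the-envelope calculation, using the average inference times reported by benchmark_model):

```python
cpu_avg_us = 39654.3  # CPU run, 4 threads
gpu_avg_us = 11839.5  # GPU run, OpenCL delegate

print(f"GPU speedup: {cpu_avg_us / gpu_avg_us:.2f}x")
```

The OpenCL GPU delegate is about 3.35x faster per inference on this model, at the cost of a longer session init (129.5 ms vs 41.5 ms).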

  1. https://github.com/tensorflow/tensorflow/pull/54566

Installation

Download the Vulkan SDK, then double-click the package and install it.

To uninstall: sudo path_to_vulkan_sdk/uninstall.sh

brew install ncnn

CMake test program

# OpenMP flags for macOS (Homebrew libomp)
if (APPLE)
    if (CMAKE_C_COMPILER_ID MATCHES "Clang")
        set(OpenMP_C "${CMAKE_C_COMPILER}")
        set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I/opt/homebrew/opt/libomp/include")
        set(OpenMP_C_LIB_NAMES "libomp")
        set(OpenMP_libomp_LIBRARY "/opt/homebrew/opt/libomp/lib/libomp.dylib")
    endif ()
    if (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
        set(OpenMP_CXX "${CMAKE_CXX_COMPILER}")
        set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I/opt/homebrew/opt/libomp/include")
        set(OpenMP_CXX_LIB_NAMES "libomp")
        set(OpenMP_libomp_LIBRARY "/opt/homebrew/opt/libomp/lib/libomp.dylib")
    endif ()
endif ()

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")

find_package(ncnn REQUIRED)
find_package(OpenCV REQUIRED)

include_directories(${ncnn_INCLUDE} ${OpenCV_INCLUDE_DIRS})

add_executable(test_ncnn test_ncnn.cpp)
target_link_libraries(test_ncnn ncnn ${OpenCV_LIBS})

The test network is midas_v21_small-int8:

#include "net.h"
#include "mat.h"
#include "cpu.h"
#include <opencv2/opencv.hpp>
#include <sys/time.h>

int main(int argc, char* argv[]) {
    std::string img_file = "/Users/luohanjie/Workspace/Vision/depth_estimation/MiDaS/input/squirrel_iphone_sample3.png";
    std::string param_file = "/Users/luohanjie/Workspace/Vision/my_slam/data/models/midas_v21_small-int8.param";
    std::string model_file = "/Users/luohanjie/Workspace/Vision/my_slam/data/models/midas_v21_small-int8.bin";
    int target_size = 256;
    float scale = 0.33333f;

    cv::Mat img = cv::imread(img_file);
    cv::resize(img, img, cv::Size(), scale, scale);

    int img_width = img.cols;
    int img_height = img.rows;

    ncnn::Net net;
    ncnn::set_cpu_powersave(0); // 0 = all cores enabled (default)
    ncnn::set_omp_num_threads(ncnn::get_cpu_count());
    net.opt = ncnn::Option();
    net.opt.use_vulkan_compute = false;
    net.opt.num_threads = ncnn::get_cpu_count();

    net.load_param(param_file.c_str());
    net.load_model(model_file.c_str());

    // https://github.com/Tencent/ncnn/blob/master/docs/how-to-use-and-FAQ/use-ncnn-with-opencv.md
    // cv::Mat CV_8UC3 -> ncnn::Mat 3 channel + swap RGB/BGR
    ncnn::Mat img_in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR2RGB, img_width, img_height, target_size, target_size);

    // substract_mean_normalize(mean_vals, norm_vals): subtract channel-wise mean
    // values, then multiply by the normalize values; pass 0 to skip either step.
    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {0.01712475383f, 0.0175070028f, 0.01742919389f};
    img_in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Extractor ex = net.create_extractor();
    ex.set_light_mode(true);

    ncnn::Mat img_out;

    ex.input("input.1", img_in);
    ex.extract("649", img_out);
    ncnn::resize_bilinear(img_out, img_out, img_width, img_height);

    cv::Mat cv_out(img_out.h, img_out.w, CV_8UC1);
    img_out.to_pixels(cv_out.data, ncnn::Mat::PIXEL_GRAY);

    cv::imshow("cv_out", cv_out);
    cv::waitKey(0);
    return 0;
}
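The `mean_vals`/`norm_vals` constants above are (as far as I can tell) just the standard ImageNet channel mean and std, rescaled from the 0-1 range to 0-255 pixel values; a quick check:

```python
# ImageNet normalization statistics in the 0-1 range.
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

# ncnn works on 0-255 pixels: subtract mean*255, multiply by 1/(std*255).
mean_vals = [m * 255.0 for m in imagenet_mean]
norm_vals = [1.0 / (s * 255.0) for s in imagenet_std]

print(mean_vals)  # matches {123.675f, 116.28f, 103.53f}
print(norm_vals)  # matches {0.01712475383f, 0.0175070028f, 0.01742919389f}
```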

Log in as root and open the cron configuration file:

crontab -e

Add to the end of the file:

Schedule                    Entry
Reboot daily at 02:30       30 2 * * * /sbin/reboot
Reboot every 3 minutes      */3 * * * * /usr/sbin/reboot
Reboot hourly               0 * * * * /usr/sbin/reboot
Reboot daily                0 0 * * * /usr/sbin/reboot
Reboot weekly               0 0 * * 0 /usr/sbin/reboot
Reboot monthly              0 0 1 * * /usr/sbin/reboot
Reboot yearly               0 0 1 1 * /usr/sbin/reboot

Note that the time fields are, in order: minute, hour, day of month, month, day of week, which is how the 30 2 * * * entry above maps to 02:30 every day.
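A tiny sketch (illustration only, not a crontab parser) of how the five fields split out:

```python
def cron_fields(expr: str) -> dict:
    """Split a crontab schedule into its five named time fields."""
    minute, hour, day, month, weekday = expr.split()
    return {"minute": minute, "hour": hour, "day": day,
            "month": month, "weekday": weekday}

# The "reboot daily at 02:30" entry from the table above.
print(cron_fields("30 2 * * *"))
```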

Restart the service:

service cron restart