Hanjie's Blog

An alpaca with ideals

Background

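We want to check whether an input image is sharp. The traditional approach applies the Laplacian operator to the input image to obtain its second derivative, which captures the edge information. The variance of this second-derivative image then indicates how blurry the image is [1]: the smaller the variance, the blurrier the image.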
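The Laplacian-variance idea described above can be sketched in a few lines. This is a minimal illustration, not the implementation from [1]: it assumes a grayscale image stored as a list of rows, uses the 4-neighbour Laplacian kernel, and skips the border pixels. The function name `laplacian_variance` is hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over the interior pixels."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


# A checkerboard has strong edges everywhere; a constant image has none.
checkerboard = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
constant = [[128] * 8 for _ in range(8)]

print(laplacian_variance(checkerboard) > laplacian_variance(constant))
```

The score only becomes a decision once it is compared with a threshold, and that threshold is the part that needs tuning.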

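The problem with this method is that it is hard to pick a single threshold that separates sharp images from blurry ones. The appropriate threshold changes from scene to scene.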

The downside is that the Laplacian method required significant manual tuning to define the “threshold” at which an image was considered blurry or not. If you could control your lighting conditions, environment, and image capturing process, it worked quite well - but if not, you would obtain mixed results, to say the least. [2]

[Figures: detecting_blur_result_006, detecting_blur_result_004]

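Following the methods proposed in [3] and [4], we implemented an image blur detection algorithm based on the fast Fourier transform (FFT). The algorithm first converts the input image into its frequency spectrum. In the shifted spectrum, the centre corresponds to the low frequencies, and the frequency increases outwards in every direction. Masking out the central region of the spectrum applies a high-pass filter to the image and keeps high-frequency content such as edges. The mean of the remaining high-frequency magnitude is then used to decide whether the image is blurry: the smaller the mean magnitude, the blurrier the image.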
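The steps described above can be tried on tiny images with a pure-Python sketch. It is an illustration under stated assumptions rather than the code from [3] or [4]: the naive DFT is only practical for very small inputs, the low frequencies are masked by their wrapped distance from the DC coefficient (equivalent to a circle around the centre of the shifted spectrum for even sizes), and `eps` is an assumed guard against `log(0)`. The names `dft_1d`, `dft_2d` and `blur_value` are hypothetical.

```python
import cmath
import math


def dft_1d(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        result.append(acc / n if inverse else acc)
    return result


def dft_2d(image, inverse=False):
    """2-D DFT computed as 1-D transforms over rows, then over columns."""
    rows = [dft_1d(row, inverse) for row in image]
    cols = [dft_1d(col, inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def blur_value(gray, radius, eps=1e-9):
    """Mean of 20 * log(magnitude) after removing frequencies within `radius` of DC."""
    height, width = len(gray), len(gray[0])
    spectrum = dft_2d(gray)
    for v in range(height):
        for u in range(width):
            # Wrapped distance from the DC coefficient at (0, 0).
            fu, fv = min(u, width - u), min(v, height - v)
            if fu * fu + fv * fv <= radius * radius:
                spectrum[v][u] = 0
    filtered = dft_2d(spectrum, inverse=True)
    logs = [math.log(abs(p) + eps) for row in filtered for p in row]
    return 20 * sum(logs) / len(logs)


# Sharp test pattern: checkerboard. Smooth test pattern: one slow cosine wave.
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
smooth = [[128 + 100 * math.cos(2 * math.pi * x / 8) for x in range(8)]
          for _ in range(8)]

print(blur_value(sharp, radius=2) > blur_value(smooth, radius=2))
```

A real implementation would use an FFT library; the point here is only that the smooth image loses almost all of its energy once the low frequencies are masked, so its score is much lower.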

Code

  • For an all-black image, we set the Blur Value to 100 and treat the image as blurry.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

// ToStr is a helper from the author's code base that is not shown in the post.
// The version below is a minimal stand-in (an assumption) so that the listing compiles.
template <typename T>
std::string ToStr(const T &value) {
    std::ostringstream oss;
    oss << value;
    return oss.str();
}

// Convert a single-channel float matrix to a colour-mapped 8-bit image, optionally with a colour bar.
// If min_val > max_val (the default), the value range is taken from the data.
cv::Mat ColorMat(const cv::Mat &mat_float, bool draw_colorbar = false, const bool is_white_background = false, double min_val = 1, double max_val = 0, const cv::Mat &user_color = cv::Mat(), int colorbar_width = 50, int colorbar_gap = 5) {
    if (min_val > max_val) {
        cv::minMaxLoc(mat_float, &min_val, &max_val);
    }

    // Map [min_val, max_val] linearly to [0, 255].
    cv::Mat mat;
    mat_float.convertTo(mat, CV_8UC1, 255 / (max_val - min_val), -255 * min_val / (max_val - min_val));

    cv::Mat mat_show;

    if (user_color.empty()) {
        cv::applyColorMap(mat, mat_show, cv::COLORMAP_JET);
    } else {
        cv::applyColorMap(mat, mat_show, user_color);
    }

    if (is_white_background) {
        // Paint pixels whose 8-bit value is 0 in white.
        cv::Mat mask;
        cv::threshold(mat, mask, 0, 255, cv::THRESH_BINARY_INV);
        cv::Mat img_white(mat.size(), CV_8UC3, cv::Scalar(255, 255, 255));
        img_white.copyTo(mat_show, mask);
    }

    if (draw_colorbar) {
        cv::Mat color_bar_value(cv::Size(colorbar_width, mat_show.rows), CV_8UC1);
        cv::Mat color_bar;

        // Vertical gradient: 255 at the top, 0 at the bottom.
        for (int i = 0; i < mat_show.rows; i++) {
            uchar value = 255 - 255 * float(i) / float(mat_show.rows);
            for (int j = 0; j < colorbar_width; j++) {
                color_bar_value.at<uchar>(i, j) = value;
            }
        }

        if (user_color.empty()) {
            cv::applyColorMap(color_bar_value, color_bar, cv::COLORMAP_JET);
        } else {
            cv::applyColorMap(color_bar_value, color_bar, user_color);
        }

        cv::Mat mat_colorbar_show(cv::Size(mat_show.cols + colorbar_width + colorbar_gap, mat_show.rows), CV_8UC3, cv::Scalar(255, 255, 255));

        mat_show.copyTo(mat_colorbar_show(cv::Rect(0, 0, mat_show.cols, mat_show.rows)));
        color_bar.copyTo(mat_colorbar_show(cv::Rect(mat_show.cols + colorbar_gap, 0, color_bar.cols, color_bar.rows)));

        // Label the top and bottom of the colour bar with the value range.
        cv::putText(mat_colorbar_show, ToStr(max_val), cv::Point(mat_show.cols + colorbar_gap, 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 255, 255), 1);
        cv::putText(mat_colorbar_show, ToStr(min_val), cv::Point(mat_show.cols + colorbar_gap, mat_show.rows - 10), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 255, 255), 1);

        mat_show = mat_colorbar_show;
    }

    return mat_show;
}

cv::Mat ShowColorMat(const std::string &name, const cv::Mat &mat_float, bool draw_colorbar = false, const float scale = 1, const bool is_white_background = false, double min_val = 1, double max_val = 0, const cv::Mat &user_color = cv::Mat(), int colorbar_width = 50, int colorbar_gap = 5) {
    cv::Mat mat_resize;
    cv::resize(mat_float, mat_resize, cv::Size(), scale, scale, cv::INTER_NEAREST);
    cv::Mat img = ColorMat(mat_resize, draw_colorbar, is_white_background, min_val, max_val, user_color, colorbar_width, colorbar_gap);
    cv::imshow(name, img);
    return img;
}

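// Build a two-channel (complex) high-pass mask: 0 inside a circle of the given radius around
// the centre of the shifted spectrum, 1 elsewhere.
// Note: both the real and the imaginary plane are set to 1, so cv::mulSpectrums multiplies the
// kept coefficients by (1 + i). This scales every magnitude by sqrt(2), which adds a constant
// offset of 20 * ln(sqrt(2)) ≈ 6.9 to the blur value; thresholds tuned with this code already
// include that offset.
cv::Mat FilterMask(const float radius, const cv::Size &mask_size) {
    cv::Mat mask = cv::Mat::ones(mask_size, CV_8UC1);
    cv::circle(mask, cv::Point(mask_size.width / 2, mask_size.height / 2), static_cast<int>(radius), cv::Scalar(0), -1);

    // https://datahacker.rs/opencv-discrete-fourier-transform-part2/
    cv::Mat mask_float, filter_mask;
    mask.convertTo(mask_float, CV_32F);
    std::vector<cv::Mat> mask_merge = {mask_float, mask_float};
    cv::merge(mask_merge, filter_mask);
    return filter_mask;
}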

// Returns true if the image is considered blurry (blur_value < thresh).
// The image is resized to the size of filter_mask before the FFT.
bool BlurDetection(const cv::Mat &img,
                   const cv::Mat &filter_mask,
                   float &blur_value,
                   const float thresh = 10,
                   const bool debug_show = false) {
    // https://pyimagesearch.com/2020/06/15/opencv-fast-fourier-transform-fft-for-blur-detection-in-images-and-video-streams/
    // https://github.com/Qengineering/Blur-detection-with-FFT-in-C/blob/master/main.cpp
    // https://docs.opencv.org/3.4/d8/d01/tutorial_discrete_fourier_transform.html

    cv::Mat img_scale, img_gray, img_fft;
    cv::resize(img, img_scale, filter_mask.size());

    if (img_scale.channels() == 3) {
        cv::cvtColor(img_scale, img_gray, cv::COLOR_BGR2GRAY);
    } else {
        img_gray = img_scale;
    }
    img_gray.convertTo(img_fft, CV_32F);

    // If DFT_SCALE is set, the scaling is done after the transformation.
    // When DFT_COMPLEX_OUTPUT is set, the output is a complex matrix of the same size as input.
    cv::dft(img_fft, img_fft, cv::DFT_COMPLEX_OUTPUT);  // cv::DFT_SCALE |

    // Overall procedure: shift the spectrum so that the DC component is at the centre,
    // zero out the centre (i.e. remove the low frequencies), apply the inverse shift so that
    // the DC component is at the top-left again, and then apply the inverse FFT.

    // Rearrange the quadrants of the result, so that the origin (zero, zero) corresponds with the image center.
    int cx = img_gray.cols / 2;
    int cy = img_gray.rows / 2;

    // Center low frequencies in the middle by shuffling the quadrants.
    cv::Mat q0(img_fft, cv::Rect(0, 0, cx, cy));    // Top-Left - Create a ROI per quadrant
    cv::Mat q1(img_fft, cv::Rect(cx, 0, cx, cy));   // Top-Right
    cv::Mat q2(img_fft, cv::Rect(0, cy, cx, cy));   // Bottom-Left
    cv::Mat q3(img_fft, cv::Rect(cx, cy, cx, cy));  // Bottom-Right

    cv::Mat tmp;  // swap quadrants (Top-Left with Bottom-Right)
    q0.copyTo(tmp);
    q3.copyTo(q0);
    tmp.copyTo(q3);

    q1.copyTo(tmp);  // swap quadrants (Top-Right with Bottom-Left)
    q2.copyTo(q1);
    tmp.copyTo(q2);

    if (debug_show) {
        std::vector<cv::Mat> planes;
        cv::Mat fft_mag;
        cv::split(img_fft, planes);  // planes[0] = Re(DFT(I)), planes[1] = Im(DFT(I))
        cv::magnitude(planes[0], planes[1], fft_mag);
        fft_mag += cv::Scalar::all(1);  // switch to logarithmic scale
        cv::log(fft_mag, fft_mag);
        ShowColorMat("fft", fft_mag, true, 1, false, 0, 20);
    }

    // Block the low frequencies (per-element complex multiplication with the mask).
    cv::mulSpectrums(img_fft, filter_mask, img_fft, 0);

    if (debug_show) {
        std::vector<cv::Mat> planes;
        cv::Mat fft_mag;
        cv::split(img_fft, planes);  // planes[0] = Re(DFT(I)), planes[1] = Im(DFT(I))
        cv::magnitude(planes[0], planes[1], fft_mag);
        fft_mag += cv::Scalar::all(1);  // switch to logarithmic scale
        cv::log(fft_mag, fft_mag);
        ShowColorMat("fft block low frequencies", fft_mag, true, 1, false, 0, 20);
    }

    // Shuffle the quadrants back to their original position.
    cv::Mat p0(img_fft, cv::Rect(0, 0, cx, cy));    // Top-Left - Create a ROI per quadrant
    cv::Mat p1(img_fft, cv::Rect(cx, 0, cx, cy));   // Top-Right
    cv::Mat p2(img_fft, cv::Rect(0, cy, cx, cy));   // Bottom-Left
    cv::Mat p3(img_fft, cv::Rect(cx, cy, cx, cy));  // Bottom-Right

    p0.copyTo(tmp);  // swap quadrants (Top-Left with Bottom-Right)
    p3.copyTo(p0);
    tmp.copyTo(p3);

    p1.copyTo(tmp);  // swap quadrants (Top-Right with Bottom-Left)
    p2.copyTo(p1);
    tmp.copyTo(p2);

    cv::dft(img_fft, img_fft, cv::DFT_SCALE | cv::DFT_INVERSE);

    std::vector<cv::Mat> complex_number;
    cv::Mat img_blur;
    cv::split(img_fft, complex_number);  // real and imaginary parts of the high-pass filtered image
    cv::magnitude(complex_number[0], complex_number[1], img_blur);  // abs of complex number

    double min_val, max_val;
    cv::minMaxLoc(img_blur, &min_val, &max_val);

    if (max_val <= 0.f) {
        // Black image: report a fixed value and treat it as blurry.
        blur_value = 100;
        return true;
    }

    cv::log(img_blur, img_blur);
    blur_value = cv::mean(img_blur)[0] * 20.f;

    return blur_value < thresh;
}

int main(int argc, char *argv[]) {
    float blur_thresh = 14;
    float filter_mask_radius = 50;
    cv::Size mask_size = cv::Size(640, 360);

    // All-black test image; BlurDetection reports blur_value = 100 for it.
    cv::Mat img = cv::Mat::zeros(cv::Size(1280, 720), CV_8UC1);
    cv::Mat filter_mask = FilterMask(filter_mask_radius, mask_size);
    float blur_value;
    BlurDetection(img, filter_mask, blur_value, blur_thresh, true);
    cv::imshow("input", img);
    std::cout << blur_value << std::endl;
    cv::waitKey(0);

    return 0;
}

Validation

radius of filter_mask = 50, size of filter_mask = (640, 360), blur_thresh = 16

| Input | FFT | FFT without low freq. | Blur Value | Blur? |
|---|---|---|---|---|
| mesuem_18_1034 | fft_mesuem_18_1034 | fft_block_mesuem_18_1034 | 16.24 | False |
| mesuem_18_1012 | fft_mesuem_18_1012 | fft_block_mesuem_18_1012 | 11.67 | True |
| mesuem_18_18711 | fft_mesuem_18_18711 | fft_block_mesuem_18_18711 | 16.86 | False |
| mesuem_18_18727 | fft_mesuem_18_18727 | fft_block_mesuem_18_18727 | 12.91 | True |
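The decision in the last column follows directly from comparing each Blur Value with blur_thresh. The short sketch below reproduces it with the four values from the table; the helper name `is_blurry` is hypothetical, and the comparison mirrors the `blur_value < thresh` test in the C++ listing.

```python
def is_blurry(blur_value, blur_thresh=16.0):
    """An image counts as blurry when its blur value is below the threshold."""
    return blur_value < blur_thresh


# Blur values taken from the validation table above.
samples = {
    "mesuem_18_1034": 16.24,
    "mesuem_18_1012": 11.67,
    "mesuem_18_18711": 16.86,
    "mesuem_18_18727": 12.91,
}

for name, value in samples.items():
    print(name, value, is_blurry(value))
```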

Research notes on the 3D object tracking algorithm SRT3D, covering the following papers:

Papers
Cremers, D., Rousson, M., Deriche, R., 2007. A Review of Statistical Approaches to Level Set Segmentation: Integrating Color, Texture, Motion and Shape. Int J Comput Vision 72, 195–215. https://doi.org/10.1007/s11263-006-8711-1
Bibby, C., Reid, I., 2008. Robust Real-Time Visual Tracking Using Pixel-Wise Posteriors, in: Forsyth, D., Zisserman, A. (Eds.), Computer Vision – ECCV 2008, Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 831–844. https://doi.org/10.1007/978-3-540-88688-4_61
Hexner, J., Hagege, R.R., 2016. 2D-3D Pose Estimation of Heterogeneous Objects Using a Region Based Approach. Int J Comput Vis 118, 95–112. https://doi.org/10.1007/s11263-015-0873-2
Tjaden, H., Schwanecke, U., Schömer, E., 2017. Real-Time Monocular Pose Estimation of 3D Objects Using Temporally Consistent Local Color Histograms, in: 2017 IEEE International Conference on Computer Vision (ICCV). Presented at the 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, Venice, pp. 124–132. https://doi.org/10.1109/ICCV.2017.23
Kehl, W., Tombari, F., Ilic, S., Navab, N., 2017. Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, pp. 465–473. https://doi.org/10.1109/CVPR.2017.57
Tjaden, H., Schwanecke, U., Schömer, E., Cremers, D., 2019. A Region-based Gauss-Newton Approach to Real-Time Monocular Multiple Object Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 41, 1797–1812. https://doi.org/10.1109/TPAMI.2018.2884990
Huang, H., Zhong, F., Qin, X., 2021. Pixel-Wise Weighted Region-Based 3D Object Tracking using Contour Constraints. IEEE Trans. Visual. Comput. Graphics 1–1. https://doi.org/10.1109/TVCG.2021.3085197
Stoiber, M., Pfanne, M., Strobl, K.H., Triebel, R., Albu-Schäffer, A., 2020. A Sparse Gaussian Approach to Region-Based 6DoF Object Tracking, in: Proceedings of the Asian Conference on Computer Vision.

Research notes: SRT3D 3D Object Tracking Algorithm Research Notes

Build MNN [1][2]

Introduction to the build macros

git clone git@github.com:alibaba/MNN.git
cd MNN
./schema/generate.sh
mkdir build && cd build
cmake -D MNN_METAL=ON -D MNN_ARM82=ON -D MNN_SUPPORT_BF16=ON -D MNN_BUILD_CONVERTER=ON -D MNN_BUILD_TORCH=ON -D MNN_BUILD_TOOL=ON ..
-- 3.19.0.0
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Darwin
-- Processor: arm64
-- Version: 2.4.1
-- Metal: ON
-- OpenCL: OFF
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: ON
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: ON
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /Users/luohanjie/Softwares/MNN/build_mac
-- CUDA PROFILE: OFF
-- WIN_USE_ASM:
-- Enabling AArch64 Assemblies
-- Enable INT8 SDOT
-- Onnx:
-- LibTorch Path is : /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake
CMake Warning at /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
tools/converter/source/torch/CMakeLists.txt:35 (find_package)
tools/converter/CMakeLists.txt:33 (include)


-- Found Torch: /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/torch/lib/libtorch.dylib
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luohanjie/Softwares/MNN/build_mac
make -j20

Model conversion [3]

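Usage:
  MNNConvert [OPTION...]

  -h, --help                    Convert Other Model Format To MNN Model

  -v, --version                 Show the current converter version

  -f, --framework arg           Type of the model to convert, ex: [TF,CAFFE,ONNX,TFLITE,MNN,TORCH,JSON]

      --modelFile arg           File name of the model to convert, ex: *.pb,*caffemodel

      --prototxt arg            Caffe model structure description file, ex: *.prototxt

      --MNNModel arg            File name of the converted MNN model, ex: *.mnn

      --fp16                    Store the float32 parameters of conv/matmul/LSTM as float16;
                                the model size is halved with almost no loss of accuracy

      --benchmarkModel          Do not store the parameters of conv/matmul/BN and similar layers; for benchmark tests only

      --bizCode arg             MNN model flag, ex: MNN

      --debug                   Use debug mode to show more conversion information

      --forTraining             Keep training-related operators such as BN/Dropout, default: false

      --weightQuantBits arg     arg=2~8. Quantizes only the float32 weights of conv/matmul/LSTM.
                                This only reduces the model size; the weights are decoded back to float32 after loading.
                                The bit width can be 2~8 and the running speed is the same as the float32 model.
                                At 8 bits the accuracy is almost lossless and the model is 4 times smaller.
                                default: 0, i.e. no weight quantization

      --compressionParamsFile arg
                                Model compression information file generated by the MNN model compression toolbox

      --saveStaticModel         Fix the input shapes and save a static model, default: false

      --inputConfigFile arg     Configuration file required for saving a static model, ex: ~/config.txt. File format:
                                input_names = input0,input1
                                input_dims = 1x3x224x224,1x3x64x64
      --JsonFile arg            With -f MNN and JsonFile specified, converts the MNN model to a JSON file
      --info                    With -f MNN, prints basic model information (input names, input shapes, output names, model version, etc.)
      --testdir arg             Tests whether the MNN inference results match the original model after conversion.
                                arg is the folder with the test data; see the "Correctness check" section for how to generate it
      --thredhold arg           When --testdir is enabled, sets the error tolerance of the correctness check;
                                defaults to 0.01 if not set
      --saveExternalData        Store weights, constants and other data in an external file, default: `false`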
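When conversions are scripted, the options listed above can be assembled into an argument list. The sketch below is a hypothetical helper, not part of MNN: `build_convert_command` and its parameter names are made up for illustration, the converter path `./MNNConvert` is an assumption, and only flags that appear in the usage text are emitted.

```python
import shlex


def build_convert_command(framework, model_file, mnn_model, biz_code="MNN",
                          fp16=False, weight_quant_bits=0,
                          converter="./MNNConvert"):
    """Assemble an MNNConvert argument list from a few of the documented options."""
    cmd = [converter, "-f", framework,
           "--modelFile", model_file,
           "--MNNModel", mnn_model,
           "--bizCode", biz_code]
    if fp16:
        cmd.append("--fp16")
    if weight_quant_bits:
        # The usage text documents a range of 2~8; 0 means no weight quantization.
        if not 2 <= weight_quant_bits <= 8:
            raise ValueError("weight_quant_bits must be 0 or between 2 and 8")
        cmd += ["--weightQuantBits", str(weight_quant_bits)]
    return cmd


cmd = build_convert_command("ONNX", "model.onnx", "model.mnn", fp16=True)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with file names that contain spaces.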

TorchScript to MNN

import torch
# ...
# model is exported model
model.eval()
# trace
model_trace = torch.jit.trace(model, torch.rand(1, 3, 1200, 1200))
model_trace.save('model_trace.pt')
# script
model_script = torch.jit.script(model)
model_script.save('model_script.pt')
./build/MNNConvert -f TORCH --modelFile XXX.pt --MNNModel XXX.mnn --bizCode biz

ONNX to MNN

import torch
import torchvision

dummy_input = torch.randn(10, 3, 224, 224, device="cpu")
model = torchvision.models.alexnet(pretrained=True).cpu()

# Providing input and output names sets the display names for values
# within the model's graph. Setting these does not change the semantics
# of the graph; it is only for readability.
#
# The inputs to the network consist of the flat list of inputs (i.e.
# the values you would pass to the forward() method) followed by the
# flat list of parameters. You can partially specify names, i.e. provide
# a list here shorter than the number of inputs to the model, and we will
# only set that subset of names, starting from the beginning.
input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ]
output_names = [ "output1" ]

torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names)
./MNNConvert -f ONNX --modelFile XXX.onnx --MNNModel XXX.mnn --bizCode biz

Correctness check

Taking an ONNX network as an example:

conda install onnxruntime
python ./../tools/script/testMNNFromOnnx.py SRC.onnx

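When the output shows TEST_SUCCESS, the model conversion and inference completed without errors.

C++ with CMake

The dpt_swin2_tiny_256.pt network is converted to dpt_swin2_tiny_256.mnn. The following program uses this network to generate a depth map: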

#include <MNN/Interpreter.hpp>
#include <MNN/Matrix.h>
#include <MNN/ImageProcess.hpp>
#include <cstring>  // for memcpy
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn/dnn.hpp>  // for cv::dnn::blobFromImage

// Scale a single-channel matrix to 8 bits and apply a false-colour map for display.
cv::Mat ShowMat(const cv::Mat& src) {
    double min;
    double max;
    cv::minMaxIdx(src, &min, &max);
    cv::Mat adjMap;

    float scale = 255 / (max - min);
    src.convertTo(adjMap, CV_8UC1, scale, -min * scale);

    cv::Mat falseColorsMap;
    cv::applyColorMap(adjMap, falseColorsMap, cv::COLORMAP_PINK);

    return falseColorsMap;
}

int main(int argc, char* argv[]) {
    std::string img_file = "/Users/luohanjie/Workspace/Vision/depth_estimation/MiDaS/input/squirrel_iphone_sample3.png";
    std::string model_file = "/Users/luohanjie/Workspace/Vision/my_slam/data/models/dpt_swin2_tiny_256/dpt_swin2_tiny_256.mnn";

    cv::Mat img = cv::imread(img_file);
    if (img.empty()) {
        std::cout << "Can not load image: " << img_file << std::endl;
        return 1;
    }

    int width_ori = img.cols;
    int height_ori = img.rows;

    // The Interpreter owns the model data; a Session is created through the Interpreter and owns the inference data.
    // Several inferences can share one model, i.e. several Sessions can share one Interpreter.
    // Once all Sessions have been created and the model data no longer needs to be updated,
    // the Interpreter can free the model data with releaseModel to save memory.
    std::shared_ptr<MNN::Interpreter> net(MNN::Interpreter::createFromFile(model_file.c_str()), MNN::Interpreter::destroy);
    if (net == nullptr) {
        std::cout << "Can not load model: " << model_file << std::endl;
        return 1;
    }

    // The returned Session is managed by the Interpreter and is released when the Interpreter is destroyed,
    // so it usually needs no attention. It can also be released with Interpreter::releaseSession
    // when it is no longer needed, to reduce memory usage.
    // Creating a Session is usually slow, but a Session can be reused across inferences,
    // so create it once and use it many times.
    MNN::ScheduleConfig session_config;
    session_config.type = MNN_FORWARD_AUTO;

    // memory, power and precision are the memory, power and precision preferences. Backends that support
    // these options adjust their behaviour accordingly; backends that do not support them ignore them.
    // Example: with the OpenCL backend, precision Low stores and computes in fp16; the results differ slightly
    // from the CPU results and the speed is the best. Precision Normal stores in fp16 and converts to fp32 for
    // computation; the results are close to the CPU results and the speed is still good. Precision High stores
    // and computes in fp32; the speed drops, but the results match the CPU results.
    // With the CPU backend, precision Low enables FP16 computation depending on the device, and
    // precision Low_BF16 enables BF16 computation depending on the device.
    // MNN::BackendConfig backend_config;
    // backend_config.precision = MNN::BackendConfig::Precision_Low;
    // session_config.backendConfig = &backend_config;
    MNN::Session* session = net->createSession(session_config);

    // Get the input/output tensors
    MNN::Tensor* input = net->getSessionInput(session, "input.1");
    MNN::Tensor* output = net->getSessionOutput(session, "3335");

    // NCHW
    std::vector<int> input_dims = input->shape();
    int input_n = input_dims[0];
    int input_c = input_dims[1];
    int input_h = input_dims[2];
    int input_w = input_dims[3];
    std::cout << "Model input_n: " << input_n << ", input_c: " << input_c << ", input_h: " << input_h << ", input_w: " << input_w << std::endl;

    // CHW
    std::vector<int> output_dims = output->shape();
    int output_c = output_dims[0];
    int output_h = output_dims[1];
    int output_w = output_dims[2];
    std::cout << "Model output_c: " << output_c << ", output_h: " << output_h << ", output_w: " << output_w << std::endl;

    // Target normalization: x = (x / 255 - mean) / std
    // cv::dnn::blobFromImage computes scalefactor * (x - mean_value), so
    //   scalefactor = 1 / (255 * std), mean_value = 255 * mean
    // gives (x - 255 * mean) / (255 * std) = (x / 255 - mean) / std.
    float mean = 0.5f;
    float stddev = 0.5f;

    // N is the batch size, C the number of channels, H the height and W the width.
    // NCHW is effectively stored as [W H C N]. The first element is 000, the second one moves along W, i.e. 001,
    // then 002, 003; next comes the H direction, i.e. 004 005 006 007 ... up to 019; then the C direction starts
    // at 020, followed by 021 022 ... up to 319; finally the N direction.
    // NHWC is effectively stored as [C W H N]. The first element is 000, the next ones move along C, i.e. 020, 040,
    // 060 ... up to 300; then along W, 001 021 041 061 ... 301 ... up to 303; then along H, i.e. 004 024 ... 304;
    // after 319 comes the N direction, 320, 340 ...
    // The preferred layout depends on the hardware acceleration. With Intel GPU acceleration, for example, the GPU
    // does a lot of image processing and wants the pixels of one channel to be contiguous, so NCHW is usually used
    // for storage; memory accesses during CNN computation are then contiguous, which is convenient.
    // For deep learning inference, preprocessing therefore usually converts the RGB or BGR image to NCHW. Images read
    // with OpenCV are in NHWC order and their channels have to be separated; because the network convolves the image
    // channel by channel to extract features, NCHW suits CNNs better.
    // https://blog.csdn.net/u010368556/article/details/105423260
    // caffe: NCHW;
    // pytorch: NCHW;
    // mxnet: NCHW;
    // HiSilicon bgr: NCHW;
    // NCNN: CHW
    // tensorflow: NHWC
    // opencv: NHWC
    // Rockchip rknn: NHWC
    // scipy.misc: NHW
    // https://www.cnblogs.com/yongy1030/p/11728103.html
    // Convert NHWC to NCHW (swapRB = true also converts BGR to RGB).
    cv::Mat img_nchw;
    cv::dnn::blobFromImage(img, img_nchw, 1.0 / (255.0 * stddev), cv::Size(input_w, input_h), cv::Scalar::all(255.0 * mean), true);  // convert HWC to NCHW

    MNN::Tensor* tensor_nchw = new MNN::Tensor(input, MNN::Tensor::CAFFE);
    MNN::Tensor* tensor_depth = new MNN::Tensor(output, MNN::Tensor::CAFFE);

    memcpy(tensor_nchw->host<float>(), img_nchw.data, tensor_nchw->size());

    input->copyFromHostTensor(tensor_nchw);

    net->runSession(session);

    output->copyToHostTensor(tensor_depth);

    cv::Mat img_depth(output_h, output_w, CV_32FC1);  // define the OpenCV output image
    memcpy(img_depth.data, tensor_depth->host<float>(), tensor_depth->size());  // copy to the output image

    cv::resize(img_depth, img_depth, cv::Size(width_ori, height_ori));

    cv::Mat img_show = ShowMat(img_depth);
    cv::imshow("img_depth", img_show);
    cv::waitKey(0);

    delete tensor_nchw;
    delete tensor_depth;

    return 0;
}
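The layout comments in the listing above can be checked with a small index calculation. The sketch below is illustrative only: the function names are hypothetical, the buffers are flat Python lists, and the sizes W=4, H=5, C=16 are inferred from the element numbers quoted in the comments (000 to 003 along W, up to 019 along H, up to 319 along C). It also checks that the normalization (x / 255 - mean) / std maps 0 and 255 to -1 and 1 for mean = std = 0.5.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in an NCHW buffer (W varies fastest)."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in an NHWC buffer (C varies fastest)."""
    return ((n * H + h) * W + w) * C + c


def hwc_to_chw(pixels, H, W, C):
    """Reorder a flat HWC buffer (interleaved channels) into a flat CHW buffer."""
    out = [0] * (H * W * C)
    for h in range(H):
        for w in range(W):
            for c in range(C):
                src = nhwc_offset(0, c, h, w, C, H, W)
                dst = nchw_offset(0, c, h, w, C, H, W)
                out[dst] = pixels[src]
    return out


def normalize(x, mean=0.5, std=0.5):
    """Per-pixel normalization used in the preprocessing step."""
    return (x / 255 - mean) / std


# Moving one step along C jumps 20 elements in NCHW but only 1 element in NHWC.
print(nchw_offset(0, 1, 0, 0, C=16, H=5, W=4), nhwc_offset(0, 1, 0, 0, C=16, H=5, W=4))

# A 1x2 image with three interleaved channels: [b0, g0, r0, b1, g1, r1].
print(hwc_to_chw([1, 2, 3, 4, 5, 6], H=1, W=2, C=3))
print(normalize(0), normalize(255))
```

In the C++ program this reordering and scaling is done by cv::dnn::blobFromImage; the sketch only makes the index arithmetic explicit.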
# CMakeLists.txt
# cmake_minimum_required must come before project().
cmake_minimum_required(VERSION 3.24)

project(TEST_MNN)

message(STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
message(STATUS "Detected processor: ${CMAKE_SYSTEM_PROCESSOR}")

set(EXECUTABLE_OUTPUT_PATH ${PROJECT_BINARY_DIR}/bin)
set(LIBRARY_OUTPUT_PATH ${PROJECT_BINARY_DIR}/lib)

if(NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif()

# CMAKE_CXX_STANDARD already selects C++17, so -std=c++17 is not repeated in the flags.
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -Wall")
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm.*|ARM.*|aarch64.*|AARCH64.*)")
    if(APPLE)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D__ARM_NEON__ -DENABLE_NEON -Wno-unused-result -mcpu=apple-m1 -mtune=native")
    else()
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D__ARM_NEON__ -DENABLE_NEON -Wno-unused-result -march=armv8-a+fp+simd+crypto")
    endif()
else()
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native -fopenmp")
endif()

#=======================================================================================

set(MNN_SRC /Users/luohanjie/Softwares/MNN)
set(MNN_LIBS ${MNN_SRC}/build_mac/libMNN.dylib)
set(MNN_INCLUDE_DIRS ${MNN_SRC}/include)

#=======================================================================================

find_package(OpenCV REQUIRED)

include_directories(${MNN_INCLUDE_DIRS} ${OpenCV_INCLUDE_DIRS})

link_directories(
    ${OpenCV_LIBRARY_DIRS}
)

add_executable(test_mnn test_mnn.cpp)
target_link_libraries(test_mnn ${MNN_LIBS} ${OpenCV_LIBS})

Cross Compiling for Android NDK

cd MNN
./schema/generate.sh
mkdir build_android && cd build_android
export ANDROID_NDK=/Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393
cmake -D CMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-D CMAKE_BUILD_TYPE=Release \
-D ANDROID_ABI="arm64-v8a" \
-D ANDROID_STL=c++_shared \
-D MNN_USE_LOGCAT=OFF \
-D MNN_BUILD_BENCHMARK=OFF \
-D MNN_USE_SSE=OFF \
-D MNN_VULKAN=ON \
-D MNN_OPENCL=ON \
-D MNN_OPENGL=ON \
-D MNN_ARM82=ON \
-D MNN_SUPPORT_BF16=OFF \
-D MNN_BUILD_TEST=OFF \
-D ANDROID_NATIVE_API_LEVEL=android-29 \
-D MNN_BUILD_FOR_ANDROID_COMMAND=OFF \
-D NATIVE_LIBRARY_OUTPUT=. -DNATIVE_INCLUDE_OUTPUT=. $1 $2 $3 ..
-- The C compiler identification is Clang 14.0.6
-- The CXX compiler identification is Clang 14.0.6
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Users/luohanjie/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PythonInterp: /opt/homebrew/Caskroom/miniforge/base/envs/tf/bin/python (found version "3.10.9")
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Android
-- Processor: aarch64
-- Version: 2.4.1
-- Metal: OFF
-- OpenCL: ON
-- OpenGL: ON
-- Vulkan: ON
-- ARM82: ON
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /Users/luohanjie/Softwares/MNN/build_android
-- CUDA PROFILE: OFF
-- Enabling AArch64 Assemblies
-- Enable INT8 SDOT
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luohanjie/Softwares/MNN/build_android
make -j20