Hanjie's Blog

An alpaca with dreams

brew install abseil google-benchmark
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
git checkout v2.9.3

Newer versions may fail to compile, or run into problems when invoking the GPU.

In tensorflow/lite/c/CMakeLists.txt, change common.c to common.cc.¹

mkdir build_mac
cd build_mac
cmake ../tensorflow/lite/c -D TFLITE_KERNEL_TEST=ON -D TFLITE_ENABLE_GPU=ON -D ABSL_PROPAGATE_CXX_STD=ON -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -D LIBRARY_OUTPUT_PATH=/Users/luohanjie/Softwares/tensorflow_src/build_mac/lib
cmake --build . -j

Build the benchmark_model test program and benchmark the model model_opt.tflite on the CPU:

cmake --build . -j -t benchmark_model
./tensorflow-lite/tools/benchmark/benchmark_model --graph=/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite --verbose=true --num_threads=4 --use_gpu=false

STARTING!
Log parameter values verbosely: [1]
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Number of prorated runs per second: [-1]
Num threads: [4]
Use caching: [0]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Run w/o invoking kernels: [0]
Report the peak memory footprint: [0]
Memory footprint check interval (ms): [50]
Graph: [/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input value files: []
Allow fp16: [0]
Require full delegation: [0]
Enable op profiling: [0]
Max initial profiling buffer entries: [1024]
Allow dynamic increase on profiling buffer entries: [0]
CSV File to export profiling data to: []
Print pre-invoke interpreter state: [0]
Print post-invoke interpreter state: [0]
Release dynamic tensor memory: [0]
Use dynamic tensor for large tensors: [0]
print out all supported flags: [0]
#threads used for CPU inference: [4]
Max number of delegated partitions: [0]
Min nodes per partition: [0]
Directory for delegate serialization: []
Model-specific token/key for delegate serialization.: []
Use xnnpack: [0]
External delegate path: []
External delegate options: []
Use gpu: [0]
Allow lower precision in gpu: [1]
Enable running quant models in gpu: [1]
Prefer maximizing the throughput in gpu: [0]
GPU backend: []
Loaded model /Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 66.3383
Initialized session in 41.498ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=13 first=43827 curr=38759 min=38662 max=45293 avg=39973.3 std=1998

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=39240 curr=38747 min=38470 max=40766 avg=39654.3 std=635

Inference timings in us: Init: 41498, First inference: 43827, Warmup (avg): 39973.3, Inference (avg): 39654.3
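The timing lines above are in microseconds. As a quick sanity check (a small sketch, not part of the benchmark tool), the `count=... avg=... std=...` line can be parsed into average latency and an approximate back-to-back throughput:

```python
# Parse one benchmark_model timing line (all values are microseconds).
line = "count=50 first=39240 curr=38747 min=38470 max=40766 avg=39654.3 std=635"
stats = {k: float(v) for k, v in (pair.split("=") for pair in line.split())}

avg_ms = stats["avg"] / 1000.0   # average latency in milliseconds
fps = 1e6 / stats["avg"]         # inferences per second if run back-to-back
print(f"avg = {avg_ms:.2f} ms, ~{fps:.1f} inferences/s")
```

For this CPU run that works out to roughly 39.65 ms per inference, i.e. about 25 inferences per second on 4 threads.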

Using the GPU:

./tensorflow-lite/tools/benchmark/benchmark_model --graph=/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite --verbose=true --num_threads=4 --use_gpu=true

STARTING!
Log parameter values verbosely: [1]
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Number of prorated runs per second: [-1]
Num threads: [4]
Use caching: [0]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Run w/o invoking kernels: [0]
Report the peak memory footprint: [0]
Memory footprint check interval (ms): [50]
Graph: [/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input value files: []
Allow fp16: [0]
Require full delegation: [0]
Enable op profiling: [0]
Max initial profiling buffer entries: [1024]
Allow dynamic increase on profiling buffer entries: [0]
CSV File to export profiling data to: []
Print pre-invoke interpreter state: [0]
Print post-invoke interpreter state: [0]
Release dynamic tensor memory: [0]
Use dynamic tensor for large tensors: [0]
print out all supported flags: [0]
#threads used for CPU inference: [4]
Max number of delegated partitions: [0]
Min nodes per partition: [0]
Directory for delegate serialization: []
Model-specific token/key for delegate serialization.: []
Use xnnpack: [0]
External delegate path: []
External delegate options: []
Use gpu: [1]
Allow lower precision in gpu: [1]
Enable running quant models in gpu: [1]
Prefer maximizing the throughput in gpu: [0]
GPU backend: []
Loaded model /Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate created.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 66.3383
Initialized session in 129.521ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=40 first=40053 curr=11752 min=11744 max=40053 avg=12579.9 std=4400

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=85 first=11880 curr=11836 min=11567 max=12276 avg=11839.5 std=93

Inference timings in us: Init: 129521, First inference: 40053, Warmup (avg): 12579.9, Inference (avg): 11839.5
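Comparing the two runs above (a quick back-of-the-envelope calculation, using the average inference times reported by benchmark_model):

```python
cpu_avg_us = 39654.3  # CPU run, 4 threads
gpu_avg_us = 11839.5  # GPU run, OpenCL delegate

print(f"GPU speedup: {cpu_avg_us / gpu_avg_us:.2f}x")
```

The OpenCL GPU delegate is about 3.35x faster per inference on this model, at the cost of a longer session init (129.5 ms vs 41.5 ms).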

  1. https://github.com/tensorflow/tensorflow/pull/54566

Installation

Download the Vulkan SDK, then double-click the package and install it.

To uninstall: sudo path_to_vulkan_sdk/uninstall.sh

brew install ncnn

CMake test program

# OpenMP flags for macOS (Homebrew libomp)
if (APPLE)
    if (CMAKE_C_COMPILER_ID MATCHES "Clang")
        set(OpenMP_C "${CMAKE_C_COMPILER}")
        set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I/opt/homebrew/opt/libomp/include")
        set(OpenMP_C_LIB_NAMES "libomp")
        set(OpenMP_libomp_LIBRARY "/opt/homebrew/opt/libomp/lib/libomp.dylib")
    endif ()
    if (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
        set(OpenMP_CXX "${CMAKE_CXX_COMPILER}")
        set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I/opt/homebrew/opt/libomp/include")
        set(OpenMP_CXX_LIB_NAMES "libomp")
        set(OpenMP_libomp_LIBRARY "/opt/homebrew/opt/libomp/lib/libomp.dylib")
    endif ()
endif ()

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")

find_package(ncnn REQUIRED)
find_package(OpenCV REQUIRED)

include_directories(${ncnn_INCLUDE} ${OpenCV_INCLUDE_DIRS})

add_executable(test_ncnn test_ncnn.cpp)
target_link_libraries(test_ncnn ncnn ${OpenCV_LIBS})

The test network is midas_v21_small-int8:

#include "net.h"
#include "mat.h"
#include "cpu.h"
#include <opencv2/opencv.hpp>
#include <sys/time.h>

int main(int argc, char* argv[]) {
    std::string img_file = "/Users/luohanjie/Workspace/Vision/depth_estimation/MiDaS/input/squirrel_iphone_sample3.png";
    std::string param_file = "/Users/luohanjie/Workspace/Vision/my_slam/data/models/midas_v21_small-int8.param";
    std::string model_file = "/Users/luohanjie/Workspace/Vision/my_slam/data/models/midas_v21_small-int8.bin";
    int target_size = 256;
    float scale = 0.33333f;

    cv::Mat img = cv::imread(img_file);
    cv::resize(img, img, cv::Size(), scale, scale);

    int img_width = img.cols;
    int img_height = img.rows;

    ncnn::Net net;
    ncnn::set_cpu_powersave(0); // 0 = all cores enabled (default)
    ncnn::set_omp_num_threads(ncnn::get_cpu_count());
    net.opt = ncnn::Option();
    net.opt.use_vulkan_compute = false;
    net.opt.num_threads = ncnn::get_cpu_count();

    net.load_param(param_file.c_str());
    net.load_model(model_file.c_str());

    // https://github.com/Tencent/ncnn/blob/master/docs/how-to-use-and-FAQ/use-ncnn-with-opencv.md
    // cv::Mat CV_8UC3 -> ncnn::Mat 3 channel + swap RGB/BGR
    ncnn::Mat img_in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR2RGB, img_width, img_height, target_size, target_size);

    // substract_mean_normalize(mean_vals, norm_vals): subtract channel-wise mean
    // values, then multiply by the normalize values; pass 0 to skip either step.
    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {0.01712475383f, 0.0175070028f, 0.01742919389f};
    img_in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Extractor ex = net.create_extractor();
    ex.set_light_mode(true);

    ncnn::Mat img_out;

    ex.input("input.1", img_in);
    ex.extract("649", img_out);
    ncnn::resize_bilinear(img_out, img_out, img_width, img_height);

    cv::Mat cv_out(img_out.h, img_out.w, CV_8UC1);
    img_out.to_pixels(cv_out.data, ncnn::Mat::PIXEL_GRAY);

    cv::imshow("cv_out", cv_out);
    cv::waitKey(0);
    return 0;
}
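The `mean_vals`/`norm_vals` constants above are (as far as I can tell) just the standard ImageNet channel mean and std, rescaled from the 0-1 range to 0-255 pixel values; a quick check:

```python
# ImageNet normalization statistics in the 0-1 range.
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

# ncnn works on 0-255 pixels: subtract mean*255, multiply by 1/(std*255).
mean_vals = [m * 255.0 for m in imagenet_mean]
norm_vals = [1.0 / (s * 255.0) for s in imagenet_std]

print(mean_vals)  # matches {123.675f, 116.28f, 103.53f}
print(norm_vals)  # matches {0.01712475383f, 0.0175070028f, 0.01742919389f}
```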

Log in as root and open the cron configuration file:

crontab -e

Add to the end of the file:

Schedule                    Entry
Reboot daily at 02:30       30 2 * * * /sbin/reboot
Reboot every 3 minutes      */3 * * * * /usr/sbin/reboot
Reboot hourly               0 * * * * /usr/sbin/reboot
Reboot daily                0 0 * * * /usr/sbin/reboot
Reboot weekly               0 0 * * 0 /usr/sbin/reboot
Reboot monthly              0 0 1 * * /usr/sbin/reboot
Reboot yearly               0 0 1 1 * /usr/sbin/reboot

Note that the time fields are, in order: minute, hour, day of month, month, day of week, which is how the 30 2 * * * entry above maps to 02:30 every day.
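A tiny sketch (illustration only, not a crontab parser) of how the five fields split out:

```python
def cron_fields(expr: str) -> dict:
    """Split a crontab schedule into its five named time fields."""
    minute, hour, day, month, weekday = expr.split()
    return {"minute": minute, "hour": hour, "day": day,
            "month": month, "weekday": weekday}

# The "reboot daily at 02:30" entry from the table above.
print(cron_fields("30 2 * * *"))
```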

Restart the service:

service cron restart