M1平台下使用Cmake对TensorFlow Lite进行编译

1
brew install abseil google-benchmark
1
2
3
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
git checkout v2.9.3

更高级的版本可能会出现编译错误,或者调用gpu时出现问题。

修改tensorflow/lite/c/CMakeLists.txt中的common.c,改为common.cc1

1
2
3
4
mkdir build_mac
cd build_mac
cmake ../tensorflow/lite/c -D TFLITE_KERNEL_TEST=ON -D TFLITE_ENABLE_GPU=ON -D ABSL_PROPAGATE_CXX_STD=ON -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -D LIBRARY_OUTPUT_PATH=/Users/luohanjie/Softwares/tensorflow_src/build_mac/lib
cmake --build . -j

编译测试程序benchmark_model,测试模型model_opt.tflite,使用cpu:

1
cmake --build . -j -t benchmark_model
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
./tensorflow-lite/tools/benchmark/benchmark_model --graph=/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite --verbose=true --num_threads=4 -use_gpu=false

STARTING!
Log parameter values verbosely: [1]
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Number of prorated runs per second: [-1]
Num threads: [4]
Use caching: [0]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Run w/o invoking kernels: [0]
Report the peak memory footprint: [0]
Memory footprint check interval (ms): [50]
Graph: [/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input value files: []
Allow fp16: [0]
Require full delegation: [0]
Enable op profiling: [0]
Max initial profiling buffer entries: [1024]
Allow dynamic increase on profiling buffer entries: [0]
CSV File to export profiling data to: []
Print pre-invoke interpreter state: [0]
Print post-invoke interpreter state: [0]
Release dynamic tensor memory: [0]
Use dynamic tensor for large tensors: [0]
print out all supported flags: [0]
#threads used for CPU inference: [4]
Max number of delegated partitions: [0]
Min nodes per partition: [0]
Directory for delegate serialization: []
Model-specific token/key for delegate serialization.: []
Use xnnpack: [0]
External delegate path: []
External delegate options: []
Use gpu: [0]
Allow lower precision in gpu: [1]
Enable running quant models in gpu: [1]
Prefer maximizing the throughput in gpu: [0]
GPU backend: []
Loaded model /Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 66.3383
Initialized session in 41.498ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=13 first=43827 curr=38759 min=38662 max=45293 avg=39973.3 std=1998

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=39240 curr=38747 min=38470 max=40766 avg=39654.3 std=635

Inference timings in us: Init: 41498, First inference: 43827, Warmup (avg): 39973.3, Inference (avg): 39654.3

使用gpu:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
./tensorflow-lite/tools/benchmark/benchmark_model --graph=/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite --verbose=true --num_threads=4 --use_gpu=true

STARTING!
Log parameter values verbosely: [1]
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Number of prorated runs per second: [-1]
Num threads: [4]
Use caching: [0]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Run w/o invoking kernels: [0]
Report the peak memory footprint: [0]
Memory footprint check interval (ms): [50]
Graph: [/Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input value files: []
Allow fp16: [0]
Require full delegation: [0]
Enable op profiling: [0]
Max initial profiling buffer entries: [1024]
Allow dynamic increase on profiling buffer entries: [0]
CSV File to export profiling data to: []
Print pre-invoke interpreter state: [0]
Print post-invoke interpreter state: [0]
Release dynamic tensor memory: [0]
Use dynamic tensor for large tensors: [0]
print out all supported flags: [0]
#threads used for CPU inference: [4]
Max number of delegated partitions: [0]
Min nodes per partition: [0]
Directory for delegate serialization: []
Model-specific token/key for delegate serialization.: []
Use xnnpack: [0]
External delegate path: []
External delegate options: []
Use gpu: [1]
Allow lower precision in gpu: [1]
Enable running quant models in gpu: [1]
Prefer maximizing the throughput in gpu: [0]
GPU backend: []
Loaded model /Users/luohanjie/Workspace/Vision/my_slam/data/models/model_opt.tflite
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate created.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 66.3383
Initialized session in 129.521ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=40 first=40053 curr=11752 min=11744 max=40053 avg=12579.9 std=4400

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=85 first=11880 curr=11836 min=11567 max=12276 avg=11839.5 std=93

Inference timings in us: Init: 129521, First inference: 40053, Warmup (avg): 12579.9, Inference (avg): 11839.5