Hanjie's Blog

An alpaca with ideals

Environment

  • OS: Ubuntu 16.04.4 LTS
  • Kernel: 4.15.0-50-generic
  • CUDA: 9.0.176
  • GPU: 940MX
  • GPU driver: 384.13
  • GCC: 5.4.0
  • Python: 3.5.2

protobuf

sudo apt-get install autoconf automake libtool curl make g++ unzip

git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf
git checkout v3.6.0
git submodule update --init --recursive
./autogen.sh

./configure
make -j4
make check -j4
sudo make install -j4
sudo ldconfig # refresh shared library cache.
protoc --version
libprotoc 3.6.0
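
The version check above can also be scripted. The following is a minimal sketch, assuming the output format `libprotoc X.Y.Z` shown above; the helper name `parse_protoc_version` is hypothetical and not part of protobuf.

```python
import re
import shutil
import subprocess


def parse_protoc_version(output):
    """Extract (major, minor, patch) from text such as 'libprotoc 3.6.0'."""
    match = re.search(r"libprotoc\s+(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognized protoc output: %r" % output)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


if __name__ == "__main__":
    # Only query the real binary when protoc is on PATH.
    if shutil.which("protoc"):
        out = subprocess.check_output(["protoc", "--version"]).decode()
        print(parse_protoc_version(out))
```

Comparing the returned tuple with `(3, 6, 0)` would confirm that the freshly built binary, and not an older system copy, is the one found on PATH.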
sudo pip3 uninstall protobuf
cd python  # run from the root of the protobuf checkout
python3 setup.py build
python3 setup.py test
sudo python3 setup.py install

Bazel

Download bazel-0.18.1-installer-linux-x86_64.sh

chmod +x bazel-0.18.1-installer-linux-x86_64.sh
./bazel-0.18.1-installer-linux-x86_64.sh --user

The --user flag installs Bazel to the $HOME/bin directory on your system and sets the .bazelrc path to $HOME/.bazelrc. Use the --help command to see additional installation options.

Check CuDNN Version

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

or

cat /usr/include/x86_64-linux-gnu/cudnn.h | grep CUDNN_MAJOR -A 2
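
The same check can be done programmatically. This is a minimal sketch, assuming cudnn.h declares the version with `#define CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` lines, which is what the grep above relies on; `parse_cudnn_version` is a hypothetical helper name and the sample text is illustrative.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of cudnn.h."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("%s not found in header" % name)
        fields[name] = int(match.group(1))
    return fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"]


if __name__ == "__main__":
    # Illustrative excerpt; on a real system, read the header file instead.
    sample = (
        "#define CUDNN_MAJOR 7\n"
        "#define CUDNN_MINOR 4\n"
        "#define CUDNN_PATCHLEVEL 2\n"
    )
    print(parse_cudnn_version(sample))
```

The resulting tuple is the version to type at the cuDNN prompt of TensorFlow's ./configure script.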

TensorFlow

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout v1.12.2
./configure


WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/home/luohanjie/.cache/bazel/_bazel_luohanjie/install/cdf71f2489ca9ccb60f7831c47fd37f1/_embedded_binaries/A-server.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.18.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3


Found possible Python library paths:
/home/luohanjie/Documents/software/caffe/python
/usr/lib/python3.5/dist-packages
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/home/luohanjie/Documents/software/caffe/python]
/usr/lib/python3/dist-packages
Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: n
No Apache Ignite support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]:


Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.4.2


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/include/x86_64-linux-gnu


Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/x86_64-linux-gnu]:/usr/src/tensorrt


Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 5.0


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
Configuration finished

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=nonccl //tensorflow:libtensorflow_cc.so

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=nonccl //tensorflow:libtensorflow_framework.so

sudo mkdir -p /usr/local/include/tf/tensorflow
sudo ln -s abs_path_to_tensorflow/bazel-genfiles/ /usr/local/include/tf
sudo ln -s abs_path_to_tensorflow/tensorflow/cc /usr/local/include/tf/tensorflow
sudo ln -s abs_path_to_tensorflow/tensorflow/core /usr/local/include/tf/tensorflow
sudo ln -s abs_path_to_tensorflow/third_party /usr/local/include/tf
sudo ln -s abs_path_to_tensorflow/bazel-bin/tensorflow/libtensorflow_cc.so /usr/local/lib
sudo ln -s abs_path_to_tensorflow/bazel-bin/tensorflow/libtensorflow_framework.so /usr/local/lib
sudo ln -s abs_path_to_tensorflow/tensorflow/contrib/makefile/downloads/absl/absl /usr/local/include/tf/third_party
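
Symlinks created with ln -s dangle when the source path is wrong or relative, so it can help to check the sources first. The sketch below mirrors the ln -s commands above as (source, destination directory) pairs; `plan_links` and `missing_sources` are hypothetical helper names, and `/opt/tensorflow` is only a placeholder for abs_path_to_tensorflow.

```python
import os


def plan_links(tf_root, prefix="/usr/local"):
    """Return (source, destination_dir) pairs mirroring the ln -s commands."""
    inc = os.path.join(prefix, "include", "tf")
    lib = os.path.join(prefix, "lib")
    tf_dir = os.path.join(tf_root, "tensorflow")
    bin_dir = os.path.join(tf_root, "bazel-bin", "tensorflow")
    absl = os.path.join(tf_dir, "contrib", "makefile", "downloads", "absl", "absl")
    return [
        (os.path.join(tf_root, "bazel-genfiles"), inc),
        (os.path.join(tf_dir, "cc"), os.path.join(inc, "tensorflow")),
        (os.path.join(tf_dir, "core"), os.path.join(inc, "tensorflow")),
        (os.path.join(tf_root, "third_party"), inc),
        (os.path.join(bin_dir, "libtensorflow_cc.so"), lib),
        (os.path.join(bin_dir, "libtensorflow_framework.so"), lib),
        (absl, os.path.join(inc, "third_party")),
    ]


def missing_sources(links):
    """List sources that do not exist, i.e. links that would dangle."""
    return [src for src, _ in links if not os.path.exists(src)]


if __name__ == "__main__":
    # Placeholder root; substitute the real absolute checkout path.
    for src in missing_sources(plan_links("/opt/tensorflow")):
        print("missing:", src)
```

Running the check before creating the links shows which build outputs or downloaded sources are still absent.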

To uninstall, run the following commands [1]:

sudo rm -r /usr/local/include/tf
sudo rm /usr/local/lib/libtensorflow_*.so

TensorFlow C++ Demo

demo.cpp

#include <tensorflow/core/platform/env.h>
#include <tensorflow/core/public/session.h>
#include <iostream>

using namespace std;
using namespace tensorflow;

int main()
{
    // Create a session; NewSession reports failure through the returned Status.
    Session* session = nullptr;
    Status status = NewSession(SessionOptions(), &session);
    if (!status.ok()) {
        cout << status.ToString() << "\n";
        return 1;
    }
    cout << "Session successfully created.\n";

    // Release the session before exiting.
    status = session->Close();
    delete session;
    return 0;
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.5)
project(demo)

set(CMAKE_CXX_STANDARD 11)

find_package(OpenCV REQUIRED)
# libeigen3-dev registers the package under the name Eigen3.
find_package(Eigen3 REQUIRED)

add_definitions(${EIGEN3_DEFINITIONS})

# Header locations provided by the symlinks under /usr/local/include/tf.
set(TENSORFLOW_INCLUDES
    /usr/local/include/tf/
    /usr/local/include/tf/bazel-genfiles
    /usr/local/include/tf/tensorflow/
    /usr/local/include/tf/third_party
)

set(TENSORFLOW_LIBS
    /usr/local/lib/libtensorflow_cc.so
    /usr/local/lib/libtensorflow_framework.so
)

include_directories(
    ${TENSORFLOW_INCLUDES}
    ${OpenCV_INCLUDE_DIRS}
    ${EIGEN3_INCLUDE_DIR}
)

add_executable(demo demo.cpp)
target_link_libraries(demo ${TENSORFLOW_LIBS} ${OpenCV_LIBS})

  1. http://www.liuxiao.org/2018/08/ubuntu-tensorflow-c-%E4%BB%8E%E8%AE%AD%E7%BB%83%E5%88%B0%E9%A2%84%E6%B5%8B1%EF%BC%9A%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA/

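This post tests the DFANet semantic segmentation network, based on the paper DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation. Its main feature is real-time performance.

The tests use the Cityscapes dataset, which can be downloaded here.

Server and Data Preparation

Assume a remote Docker server is already available at root@0.0.0.0 -p 9999.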

Dependencies

pytorch==1.0.0
python==3.6
numpy
torchvision
matplotlib
opencv-python
tensorflow
tensorboardX
apt install -y libsm6 libxext6
pip3 install opencv-python
pip3 install pyyaml
pip3 install tensorboardX

Check the PyTorch version:

import torch
print(torch.__version__)
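
To compare the installed version against the pytorch==1.0.0 requirement instead of reading it by eye, a small check could look like the sketch below. `version_tuple` is a hypothetical helper; treating 1.0.0 as a minimum rather than an exact pin is an assumption.

```python
def version_tuple(version):
    """Turn '1.0.0' or '1.0.1.post2' into a tuple of its leading integers."""
    parts = []
    # Drop a local build suffix such as '+cu90' before splitting.
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        ok = version_tuple(torch.__version__) >= (1, 0, 0)
        print(torch.__version__, "OK" if ok else "older than 1.0.0")
        print("CUDA available:", torch.cuda.is_available())
```

Printing `torch.cuda.is_available()` at the same time catches the case where a CPU-only wheel was installed by mistake.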
# Linux, pip, Python 3.6, CUDA 9
pip3 install --upgrade pip
pip3 install --upgrade torch torchvision

Use scp to upload the local code and dataset to the server:

scp -P 9999 local_file root@0.0.0.0:remote_directory
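
When several files or directories have to be uploaded, the command can be assembled in a script. This is a minimal sketch under the assumptions of this post (host root@0.0.0.0, port 9999); `build_scp_command` is a hypothetical helper, and nothing is executed unless the list is passed to subprocess.

```python
import shlex


def build_scp_command(local_path, remote_dir, host="root@0.0.0.0",
                      port=9999, recursive=False):
    """Assemble the scp argument list; -P (capital) selects the SSH port."""
    cmd = ["scp", "-P", str(port)]
    if recursive:
        cmd.append("-r")  # needed when uploading a directory
    cmd += [local_path, "%s:%s" % (host, remote_dir)]
    return cmd


if __name__ == "__main__":
    cmd = build_scp_command("local_file", "remote_directory")
    print(" ".join(shlex.quote(part) for part in cmd))
    # To run the upload: subprocess.check_call(cmd)
```

Building the command as a list avoids shell quoting problems with paths that contain spaces.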

Unzip the archive:

apt-get update
apt-get install zip -y
unzip local_file

Download DFANet:

git clone https://github.com/huaifeng1993/DFANet.git
cd DFANet

Pretrained model

Open utils/preprocess_data.py and change the dataset paths:

cityscapes_data_path = "/home/luohanjie/Documents/SLAM/data/cityscapes"
cityscapes_meta_path = "/home/luohanjie/Documents/SLAM/data/cityscapes/gtFine"
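
Before running the preprocessing script it can be worth confirming that the dataset directory has the expected layout. The sketch below assumes the standard Cityscapes folders `leftImg8bit` and `gtFine` directly under the data path; `missing_subdirs` is a hypothetical helper and not part of the DFANet repository.

```python
import os


def missing_subdirs(data_root, required=("leftImg8bit", "gtFine")):
    """Return the required sub-directories that are absent under data_root."""
    return [name for name in required
            if not os.path.isdir(os.path.join(data_root, name))]


if __name__ == "__main__":
    root = "/home/luohanjie/Documents/SLAM/data/cityscapes"
    missing = missing_subdirs(root)
    if missing:
        print("missing under %s: %s" % (root, ", ".join(missing)))
    else:
        print("dataset layout looks complete")
```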

Run the script to generate the labels:

python3 utils/preprocess_data.py

main.py

Open main.py and change the dataset paths:

train_dataset = DatasetTrain(cityscapes_data_path="/home/luohanjie/Documents/SLAM/data/cityscapes",
                             cityscapes_meta_path="/home/luohanjie/Documents/SLAM/data/cityscapes/gtFine/")

val_dataset = DatasetVal(cityscapes_data_path="/home/luohanjie/Documents/SLAM/data/cityscapes",
                         cityscapes_meta_path="/home/luohanjie/Documents/SLAM/data/cityscapes/gtFine/")

2019.4.24: A function has been written to load the pretrained model, which was trained on ImageNet-1k. The project for training the backbone can be downloaded from https://github.com/huaifeng1993/ILSVRC2012. Limited by my computing resources (only one RTX 2080), I trained the backbone on ILSVRC2012 for only 22 epochs, but it has a great impact on the results.

Since we do not have the ILSVRC2012 pretrained model, the flag has to be turned off:

net = dfanet(pretrained=False, num_classes=20)

ERROR: TypeError: __init__() got an unexpected keyword argument 'log_dir'

Open train.py and change the line to:

writer = SummaryWriter(logdir=self.log_dir)
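
The TypeError appears to come from the keyword being named differently across SummaryWriter implementations (`log_dir` in some, `logdir` in others). Rather than hard-coding one spelling, the constructor signature can be inspected at runtime. This is a minimal sketch; `writer_kwargs` is a hypothetical helper and not part of tensorboardX.

```python
import inspect


def writer_kwargs(writer_cls, path):
    """Pick whichever log-directory keyword the given writer class accepts."""
    params = inspect.signature(writer_cls.__init__).parameters
    for name in ("logdir", "log_dir"):
        if name in params:
            return {name: path}
    raise TypeError("no log directory parameter on %s" % writer_cls.__name__)


if __name__ == "__main__":
    # Stand-in class for illustration; pass the real SummaryWriter instead.
    class FakeWriter:
        def __init__(self, logdir=None):
            self.logdir = logdir

    print(writer_kwargs(FakeWriter, "runs/demo"))
```

In train.py this would be used as `writer = SummaryWriter(**writer_kwargs(SummaryWriter, self.log_dir))`, which keeps the code working with either spelling.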

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)

This error occurs when the training code runs inside Docker on the server and the batch size is set too large, so shared memory runs out (Docker limits shm). The fix is to set the DataLoader's num_workers to 0 [1].

Open main.py and change:

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=10, shuffle=True,
                          num_workers=0)
val_loader = DataLoader(dataset=val_dataset,
                        batch_size=10, shuffle=False,
                        num_workers=0)
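
Instead of always hard-coding num_workers=0, the value could be chosen from the shared memory that is actually free. The sketch below assumes /dev/shm is where worker processes exchange batches; the 1 GiB threshold and the default of 4 workers are arbitrary illustrative values, and `pick_num_workers` is a hypothetical helper.

```python
import shutil


def pick_num_workers(shm_free_bytes, wanted=4, min_bytes=1 << 30):
    """Fall back to 0 workers when free shared memory is below min_bytes."""
    return wanted if shm_free_bytes >= min_bytes else 0


if __name__ == "__main__":
    try:
        free = shutil.disk_usage("/dev/shm").free
    except OSError:
        free = 0  # no /dev/shm available, stay conservative
    print("num_workers =", pick_num_workers(free))
```

A Docker container started without the --shm-size option gets only 64 MB of /dev/shm by default, so this check would select 0 workers there and leave a better provisioned machine free to use more.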

Train

python3 main.py

  1. https://blog.csdn.net/hyk_1996/article/details/80824747

Download opencv_contrib:

git clone https://github.com/opencv/opencv_contrib

cd opencv_contrib

git checkout 3.4.5

Download the sources from the official website.

sudo apt-get install build-essential cmake git yasm libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev python-dev python-numpy python-tk libtbb2 libtbb-dev libjpeg-dev libpng12-dev libtiff5-dev libjasper-dev libdc1394-22-dev libswscale-dev libopenexr-dev libeigen2-dev libeigen3-dev libfaac-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev libx264-dev libqt4-dev libqt4-opengl-dev sphinx-common texlive-latex-extra libv4l-dev

mkdir build
cd build

cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_EXAMPLES=OFF -D BUILD_DOCS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_TESTS=OFF -D WITH_GTK_2_X=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D WITH_VTK=ON -D WITH_CUDA=ON -D CMAKE_CXX_FLAGS="-std=c++11" -D CUDA_NVCC_FLAGS="-std=c++11 --expt-relaxed-constexpr" -D OPENCV_EXTRA_MODULES_PATH=/home/luohanjie/Documents/software/opencv-3.4.11/opencv_contrib/modules -D VTK_DIR=/home/luohanjie/Documents/software/VTK-7.1.1/build ..


make -j4
sudo make install

sudo /bin/bash -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig