  • OS: Ubuntu 16.04.4 LTS
  • Kernel: 4.13.0-36-generic
  • CUDA: 9.0.176
  • GPU: GeForce 940MX
  • GPU driver: 384.13
  • GCC: 5.4.0
  • Python: 2.7.12


sudo apt-get install python-pip python-dev python3-pip python3-dev cuda-command-line-tools

sudo pip install testresources enum34 mock
sudo gedit ~/.bashrc


export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64


source ~/.bashrc
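
After reloading the shell, it can be worth confirming that the CUPTI directory really ended up on LD_LIBRARY_PATH. The following is only a minimal sketch; the helper name has_dir is hypothetical and not part of any CUDA tooling.

```python
import os

# Path taken from the export line added to ~/.bashrc
CUPTI_DIR = "/usr/local/cuda/extras/CUPTI/lib64"


def has_dir(path_value, directory):
    """Return True if `directory` is one of the colon-separated entries in `path_value`."""
    wanted = os.path.normpath(directory)
    entries = [p for p in path_value.split(":") if p]  # skip empty entries
    return any(os.path.normpath(p) == wanted for p in entries)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("CUPTI on LD_LIBRARY_PATH: %s" % has_dir(current, CUPTI_DIR))
```

The same helper could be pointed at /usr/local/cuda/lib64 once the CUDA toolkit itself is installed.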

Install CUDA 9.0

Download the CUDA package cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb from the official NVIDIA website.


sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda


sudo gedit ~/.bashrc


export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH


source ~/.bashrc


nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
dpkg-query -W | grep cuda-cublas

cuda-cublas-9-0 9.0.176-1
cuda-cublas-dev-9-0 9.0.176-1
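
If the toolkit version needs to be checked from a script rather than by eye, the nvcc -V text can be parsed. This is a rough sketch that assumes the output keeps the "release X.Y, VX.Y.Z" shape shown above; nvcc_release is a hypothetical helper name.

```python
import re


def nvcc_release(output):
    """Extract (release, full_version) from `nvcc -V` text, or None if not found."""
    match = re.search(r"release\s+(\d+\.\d+),\s+V(\d+\.\d+\.\d+)", output)
    return (match.group(1), match.group(2)) if match else None


# Sample text copied from the nvcc -V output in this post
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""

print(nvcc_release(sample))
```

On a real machine the text could come from subprocess.check_output(["nvcc", "-V"]) instead of the pasted sample.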
Building Samples (optional)

cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 940MX"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2003 Mbytes (2100232192 bytes)
( 3) Multiprocessors, (128) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 1242 MHz (1.24 GHz)
Memory Clock rate: 1001 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
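
The two lines that matter most in the deviceQuery output are the compute capability and the final result. The sketch below pulls both out of the text. The helper names are hypothetical, and the minimum capability of (3, 5) is only an illustrative assumption that should be confirmed against the TensorFlow install guide [2].

```python
import re


def summarize_device_query(text):
    """Pull the compute capability and the final PASS/FAIL result out of deviceQuery output."""
    cap = re.search(r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)", text)
    result = re.search(r"Result\s*=\s*(\w+)", text)
    return {
        "capability": (int(cap.group(1)), int(cap.group(2))) if cap else None,
        "passed": bool(result) and result.group(1) == "PASS",
    }


def meets_minimum(capability, minimum):
    """Compare (major, minor) tuples; `minimum` is supplied by the caller."""
    return capability is not None and capability >= minimum


# Lines copied from the deviceQuery output in this post
sample = """Device 0: "GeForce 940MX"
CUDA Capability Major/Minor version number: 5.0
( 3) Multiprocessors, (128) CUDA Cores/MP: 384 CUDA Cores
Result = PASS"""

info = summarize_device_query(sample)
# ASSUMPTION: (3, 5) is an illustrative threshold, not a value taken from this post.
print(info, meets_minimum(info["capability"], (3, 5)))
```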


Download cuDNN v7.0.5 (Dec 5, 2017), for CUDA 9.0:

  • cuDNN v7.0.5 Runtime Library for Ubuntu16.04 (Deb)
  • cuDNN v7.0.5 Developer Library for Ubuntu16.04 (Deb)
  • cuDNN v7.0.5 Code Samples and User Guide for Ubuntu16.04 (Deb)

Navigate to the directory containing the cuDNN Deb files:

sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.0.5.15-1+cuda9.0_amd64.deb
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
cudnnGetVersion() : 7005 , CUDNN_VERSION from cudnn.h : 7005 (7.0.5)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 3 Capabilities 5.0, SmClock 1241.5 Mhz, MemSize (Mb) 2002, MemClock 1001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0


Test passed!
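
The integer 7005 printed by the sample is the packed form of 7.0.5. In cuDNN 7, cudnn.h defines CUDNN_VERSION as major * 1000 + minor * 100 + patch level, so it can be unpacked as sketched below. The function name is hypothetical, and the encoding should not be assumed to hold for other cuDNN major versions.

```python
def decode_cudnn_version(value):
    """Split a cuDNN 7 style CUDNN_VERSION integer into (major, minor, patch)."""
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7005))  # (7, 0, 5)
```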

NVIDIA TensorRT 3.0.4 (optional) [1]


sudo dpkg -i nv-tensorrt-repo-ubuntu1604-ga-cuda9.0-trt3.0.4-20180208_1-1_amd64.deb
sudo apt-get update
sudo apt-get install tensorrt

sudo apt-get install python-libnvinfer-doc python-libnvinfer python-libnvinfer-dev swig3.0 # python 2.7

sudo apt-get install python3-libnvinfer-doc # python 3.5
dpkg -l | grep TensorRT

ii libnvinfer-dev 4.0.4-1+cuda9.0 amd64 TensorRT development libraries and headers
ii libnvinfer-samples 4.0.4-1+cuda9.0 amd64 TensorRT samples and documentation
ii libnvinfer4 4.0.4-1+cuda9.0 amd64 TensorRT runtime libraries
ii python-libnvinfer 4.0.4-1+cuda9.0 amd64 Python bindings for TensorRT
ii python-libnvinfer-dev 4.0.4-1+cuda9.0 amd64 Python development package for TensorRT
ii python-libnvinfer-doc 4.0.4-1+cuda9.0 amd64 Documention and samples of python bindings for TensorRT
ii python3-libnvinfer 4.0.4-1+cuda9.0 amd64 Python 3 bindings for TensorRT
ii python3-libnvinfer-dev 4.0.4-1+cuda9.0 amd64 Python 3 development package for TensorRT
ii python3-libnvinfer-doc 4.0.4-1+cuda9.0 amd64 Documention and samples of python bindings for TensorRT
ii tensor
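
To check that all the TensorRT packages were installed at the same version, the dpkg -l listing can be reduced to a name-to-version map. This is a small sketch with a hypothetical helper name; it only looks at lines in the "ii" (installed) state and skips lines that are too short to contain a version.

```python
def installed_versions(dpkg_output):
    """Map package name to version for every fully installed ("ii") line."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


# A few lines copied from the dpkg listing in this post
sample = """ii libnvinfer-dev 4.0.4-1+cuda9.0 amd64 TensorRT development libraries and headers
ii libnvinfer4 4.0.4-1+cuda9.0 amd64 TensorRT runtime libraries
ii python-libnvinfer 4.0.4-1+cuda9.0 amd64 Python bindings for TensorRT"""

found = installed_versions(sample)
print(found)
print("single version:", len(set(found.values())) == 1)
```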

Install TensorFlow 1.8 [2]

For Python 2.7

sudo apt-get install python-pip python-dev

pip install pip==9.0 # Don't use pip > 9.0 !!!!
sudo pip install --upgrade https://download.tensorflow.google.cn/linux/gpu/tensorflow_gpu-1.8.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade pip

For Python 3.5

sudo apt-get install python3-pip python3-dev

sudo pip3 install --upgrade https://download.tensorflow.google.cn/linux/gpu/tensorflow_gpu-1.8.0-cp35-cp35m-linux_x86_64.whl

To uninstall: sudo pip uninstall tensorflow-gpu or sudo pip3 uninstall tensorflow-gpu
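
The two install commands differ only in the wheel tag (cp27 versus cp35), which has to match the interpreter that runs pip. As a rough illustration, the choice can be made from sys.version_info. The WHEELS table and wheel_for are hypothetical names; the URLs are the two used in this post.

```python
import sys

# URLs copied from the install commands in this post
WHEELS = {
    (2, 7): "https://download.tensorflow.google.cn/linux/gpu/tensorflow_gpu-1.8.0-cp27-none-linux_x86_64.whl",
    (3, 5): "https://download.tensorflow.google.cn/linux/gpu/tensorflow_gpu-1.8.0-cp35-cp35m-linux_x86_64.whl",
}


def wheel_for(version_info=None):
    """Return the wheel URL matching a (major, minor, ...) version, or None if there is none."""
    if version_info is None:
        version_info = sys.version_info
    return WHEELS.get((version_info[0], version_info[1]))


print(wheel_for((2, 7, 12)))
```

A None result means neither wheel listed here fits the running interpreter.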



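# Python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # set before importing TensorFlow to hide INFO and WARNING logs

import tensorflow as tf

# Cap this process at 70% of GPU memory and allocate it on demand
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7, allow_growth=True)
config = tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_options)
sess = tf.Session(config=config)

hello = tf.constant('Hello, TensorFlow!')
print(sess.run(hello).decode())  # decode() so Python 3 prints text rather than bytes
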

Hello, TensorFlow!
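
For a sense of what per_process_gpu_memory_fraction=0.7 means on this card, the cap can be worked out from the total memory that deviceQuery reported (2100232192 bytes). This is simple arithmetic under the assumption that the fraction applies to the card's total memory; memory_cap_mib is a hypothetical helper name.

```python
def memory_cap_mib(total_bytes, fraction):
    """Return the memory cap in MiB for a given fraction of total GPU memory."""
    return total_bytes * fraction / (1024.0 * 1024.0)


total = 2100232192  # bytes, from the deviceQuery output in this post
print("cap: about %.0f MiB" % memory_cap_mib(total, 0.7))
```

That works out to roughly 1402 MiB of the 2003 MiB on the 940MX.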

  1. https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html

  2. https://www.tensorflow.org/install/install_linux?hl=zh-cn#InstallingNativePip