Lately, my interest in machine learning and artificial intelligence has revived. When I was at university, I followed some courses and specialisations in this field, but during my career I hardly ever used any of it. Back in those years, complex neural nets and genetic algorithms took days to build, mainly because we didn’t have the computing power for them. Nowadays things have changed, and such models can be built relatively quickly using a commodity graphics card.

To give my old workstation-cum-gaming PC a new meaning in life, why not try to employ its NVIDIA GT218 for some experiments? Sure, it isn’t a high-end card by today’s standards, but at least it might be fun to try. The only problem is that you can’t write arbitrary code and just run it on a graphics card. Luckily, NVIDIA distributes the CUDA Toolkit, which lets you do that.

If you’re up for a journey, continue reading… Otherwise, skip to the end.


Before even thinking of installing something, I had to make sure my machine was running a supported operating system. Since I have had good experiences with Ubuntu, and the machine had an old version of Ubuntu installed, I upgraded that to the latest Long Term Support (LTS) release: 16.04.2 at the time of writing. I chose Ubuntu 16.04 since it is an LTS release, which means it will still receive security patches, and since it is officially supported by NVIDIA.
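
Checking the release by hand is easy enough, but it can also be scripted. The following is a minimal sketch, assuming the system ships an os-release file (Ubuntu 16.04 does, at /etc/os-release). The function name release_in_list and the version list in the example are my own illustrations, not something taken from the NVIDIA guide.

```shell
#!/usr/bin/env bash
# Sketch: report whether an os-release file describes a release that is
# in a caller-supplied list. The function name is hypothetical.

# Usage: release_in_list FILE VERSION...
release_in_list() {
    local file="$1"; shift
    local version_id wanted
    # os-release consists of shell-style KEY=value assignments, so it can
    # be sourced inside a subshell without polluting the current shell.
    version_id=$( . "$file" && printf '%s' "$VERSION_ID" )
    for wanted in "$@"; do
        [ "$version_id" = "$wanted" ] && return 0
    done
    return 1
}

# Example (assumed list; check the installation guide for the real one):
# release_in_list /etc/os-release 16.04 14.04 && echo "supported"
```

The list of accepted versions is passed in by the caller on purpose, so the helper does not pretend to know which releases NVIDIA supports.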


After that, I headed to the CUDA Toolkit download page and made the following choices:

Operating System: Linux
Architecture: x86_64
Distribution: Ubuntu
Version: 16.04
Installer Type: deb (network)

I chose the deb (network) installer since it is the smallest to download and it will configure APT repositories for you. In case NVIDIA decides to release updates to the toolkit, I hope this approach will make it easier to get them. The deb (local) installer downloads everything up front, after which you still have to install a separate patch. It would probably work just fine, but I prefer the network installer.


Installing the toolkit is pretty straightforward, and it is listed on the download page as well:

  1. sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  2. sudo apt-get update
  3. sudo apt-get install cuda

Disabling Nouveau

If you want to use CUDA, you cannot use the open-source Nouveau drivers for NVIDIA graphics cards. To blacklist them (meaning the Linux kernel will never load them), I created a file at /etc/modprobe.d/blacklist-nouveau.conf and put the following in it:

blacklist nouveau
options nouveau modeset=0

After that, you need to make sure the initial ramdisk image (initramfs) is updated as well: sudo update-initramfs -u. Finally, I did a reboot, just to be sure, but I don’t think it is really necessary.
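
To see whether the blacklist actually took effect, the list of loaded kernel modules can be inspected. This is a minimal sketch that reads a /proc/modules-style listing; the function name module_loaded is hypothetical, and the optional file argument exists only so the check can be tried on a saved listing.

```shell
#!/usr/bin/env bash
# Sketch: check whether a kernel module appears in a /proc/modules-style
# listing. The function name is hypothetical.

# Usage: module_loaded NAME [FILE]   (FILE defaults to /proc/modules)
module_loaded() {
    local name="$1" file="${2:-/proc/modules}"
    # Every line starts with the module name followed by a space, so
    # compare the first field exactly instead of grepping for a substring.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Example, after the reboot:
# module_loaded nouveau && echo "nouveau is still loaded" || echo "nouveau is gone"
```

Comparing the first field exactly avoids a false positive from other modules that merely list nouveau as a dependent.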

Trying out (the hard part)

Now comes the hardest part: trying to get it all to work.

See whether nvcc works properly

The CUDA Toolkit comes with the NVIDIA CUDA Compiler, or nvcc for short. Let’s see if it works. Running nvcc -V told me it wasn’t installed, but that I could get it by installing the nvidia-cuda-toolkit package.
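
Before installing another package, it may be worth checking whether nvcc is already on disk but simply not on the PATH. The sketch below looks on the PATH first and then in a list of candidate directories. The function name find_nvcc is hypothetical, and the directories in the example are assumptions based on the /usr/local/cuda-8.0/bin/ location that shows up later in this post.

```shell
#!/usr/bin/env bash
# Sketch: look for nvcc on the PATH first, then in candidate directories.
# The function name and the candidate list are hypothetical.

# Usage: find_nvcc [DIR...]; prints the first nvcc found.
find_nvcc() {
    local dir
    if command -v nvcc >/dev/null 2>&1; then
        command -v nvcc
        return 0
    fi
    for dir in "$@"; do
        if [ -x "$dir/nvcc" ]; then
            printf '%s\n' "$dir/nvcc"
            return 0
        fi
    done
    return 1
}

# Example:
# find_nvcc /usr/local/cuda-8.0/bin /usr/local/cuda/bin || echo "nvcc not found"
```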

See if the compiler can actually compile code

I don’t feel like writing C code for the graphics card myself, but that isn’t necessary either. The CUDA Toolkit comes with some sample code, which can be copied to a directory of your choice by running the script found in /usr/local/cuda-8.0/bin/. You need to give it a target directory to copy the samples to; I chose . for my home directory. I cd’ed into that folder and issued make. After a long wait (skipped here for brevity), I got:

nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/bin/ld: cannot find -lnvcuvid
collect2: error: ld returned 1 exit status
Makefile:381: recipe for target 'cudaDecodeGL' failed
make[1]: *** [cudaDecodeGL] Error 1
make[1]: Leaving directory '/home/maarten/NVIDIA_CUDA-8.0_Samples/3_Imaging/cudaDecodeGL'
Makefile:52: recipe for target '3_Imaging/cudaDecodeGL/Makefile.ph_build' failed
make: *** [3_Imaging/cudaDecodeGL/Makefile.ph_build] Error 2

Too bad: still no luck!

Specify where to find libnvcuvid

I had no clue where the libraries would be installed, so I called my old friend find to the rescue. Issuing find /usr/lib/ -name "*nvcuvid*" revealed that a couple of matching files lived in /usr/lib/nvidia-375/. Now that is something I could use: LIBRARY_PATH=/usr/lib/nvidia-375/ make. Another long wait, and then finally:

make[1]: Leaving directory '/home/maarten/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/simpleCUFFT'
Finished building CUDA samples


But it’s easy to forget to specify this, so I updated my ~/.bashrc and, following the documentation, added this line:

export LD_LIBRARY_PATH=/usr/lib/nvidia-375
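
One thing to be aware of: LIBRARY_PATH is read by GCC when linking, while LD_LIBRARY_PATH is read by the dynamic loader when a program starts, so the two are not interchangeable. A plain export also replaces whatever value the variable had before. The following is a minimal sketch of a helper that prepends a directory instead and skips it when it is already listed; prepend_path is a hypothetical name of my own.

```shell
#!/usr/bin/env bash
# Sketch: prepend a directory to a colon-separated path variable without
# creating duplicates. The function name is hypothetical.

# Usage: prepend_path VARNAME DIR
prepend_path() {
    local var="$1" dir="$2" current
    # Read the variable whose name is stored in $var (empty if unset).
    eval "current=\${$var-}"
    case ":$current:" in
        *":$dir:"*) ;;   # already present, nothing to do
        *) export "$var=$dir${current:+:$current}" ;;
    esac
}

# Example for ~/.bashrc:
# prepend_path LIBRARY_PATH /usr/lib/nvidia-375      # link time (gcc, ld)
# prepend_path LD_LIBRARY_PATH /usr/lib/nvidia-375   # run time (dynamic loader)
```

Because the helper is idempotent, sourcing ~/.bashrc several times does not keep growing the variable.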

Does the sample code work?

The above make command will produce quite a few binaries in ./bin/x86_64/linux/release (relative to the working directory). An interesting one is deviceQuery, which gave the following output:

./bin/x86_64/linux/release/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
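
Since deviceQuery gets run again after every attempted fix, it can help to boil its output down to one line. This is a minimal sketch that reads the output from standard input and relies only on two lines that deviceQuery prints: the "CUDA Capability" line and the final "Result =" line. The function name device_query_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarise deviceQuery output read from standard input.
# The function name is hypothetical.

device_query_summary() {
    awk '
        /CUDA Capability Major\/Minor version number:/ { capability = $NF }
        /^Result = /                                   { result = $NF }
        END {
            printf "capability=%s result=%s\n", capability, result
            # Succeed only when deviceQuery itself reported PASS.
            exit (result == "PASS" ? 0 : 1)
        }
    '
}

# Example:
# ./bin/x86_64/linux/release/deviceQuery | device_query_summary
```

The exit status mirrors the PASS or FAIL result, so the helper can be used directly in an if statement.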

Fix device files

According to the documentation, the device files that enable communication between the CUDA driver and the kernel-mode portion of the NVIDIA driver can sometimes not be created because the system prevents setuid binaries. The guide also provides a fix for that: a custom script that should be run after boot. Maybe, if I find time somewhere, I’ll create a nice init script for it, but for now, this should do:

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  # Create one device file per controller (major number 195).
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  # Create the control device (minor number 255).
  mknod -m 666 /dev/nvidiactl c 195 255
else
  # The driver module could not be loaded; give up.
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  exit 1
fi

At first, the script didn’t work: the very first step (/sbin/modprobe nvidia) failed with modprobe: ERROR: could not insert 'nvidia_375': No such device. Strange, since I am pretty sure the NVIDIA card is there! On the NVIDIA DevTalk forum I found a post that helped out: it turned out there are a few other kernel modules that might interfere with the NVIDIA driver, in particular bbswitch. According to the package manager, it’s an “Interface for toggling the power on NVIDIA Optimus video cards”. Since my card isn’t an Optimus card, I figured I could safely remove the package using sudo aptitude remove bbswitch-dkms.

Unfortunately, a reboot didn’t solve that either.
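
Whether the device files from the script above exist at all can be checked without running deviceQuery. A minimal sketch, assuming the device names used in that script; the function name missing_device_files is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print every given path that is not a character device and
# return non-zero if at least one is missing. The function name is
# hypothetical.

missing_device_files() {
    local path status=0
    for path in "$@"; do
        if [ ! -c "$path" ]; then
            printf '%s\n' "$path"
            status=1
        fi
    done
    return "$status"
}

# Example, using the device names from the script above:
# missing_device_files /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
```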

Look at kernel module loading

Searching for the error messages I had found so far (cudaGetDeviceCount returned 30 and could not insert 'nvidia_375': No such device) led me to think something might be wrong with the kernel drivers. So I tried to troubleshoot why the kernel modules couldn’t be loaded. Running sudo modprobe --force-modversion nvidia-375-uvm gave an interesting message:

could not insert 'nvidia_375_uvm': Exec format error

Running it again without --force-modversion gave:

could not insert 'nvidia_375_uvm': Unknown symbol in module, or unknown parameter (see dmesg)

Now that’s interesting; I checked dmesg to see what I could find there. It displayed tons of identical messages:

[  680.572990] NVRM: The NVIDIA GeForce 210 GPU installed in this system is
               NVRM:  supported through the NVIDIA 340.xx Legacy drivers. Please
               NVRM:  visit for more
               NVRM:  information.  The 375.66 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...

The link points to the NVIDIA Unix Driver Archive; is this a hint that my graphics card is indeed getting old? Anyway, I decided to give this a try and installed the 340.xx drivers using sudo aptitude install nvidia-340. Now that’s a disappointment: aptitude suggests removing cuda-8-0. So the driver that supports my graphics card is too old for CUDA 8?
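
With tons of identical messages in dmesg, the one useful fact (which legacy driver branch the card needs) is easy to overlook. The following minimal sketch extracts it, assuming the message wording shown above; the function name legacy_driver_branch is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the legacy driver branch (for example "340") from NVRM
# kernel messages read from standard input. The function name is
# hypothetical and the pattern assumes the wording quoted in this post.

legacy_driver_branch() {
    # Match "NVIDIA 340.xx Legacy" and keep only the digits;
    # sort -u collapses the repeated messages into a single line.
    sed -n 's/.*NVIDIA \([0-9][0-9]*\)\.xx Legacy.*/\1/p' | sort -u
}

# Example:
# dmesg | legacy_driver_branch
```

On my machine the result corresponds to the nvidia-340 package mentioned above.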

Downgrade NVIDIA drivers, CUDA Toolkit and GCC

On a non-NVIDIA page, I found a compatibility matrix of sorts, which told me that the 340.xx driver should match CUDA 6.5. So, as a final attempt, I downloaded an older version (6.5) of the CUDA Toolkit from the appropriate page. I uninstalled all previously installed CUDA packages using aptitude, up to the point where I could issue sudo aptitude install nvidia-340 without being greeted by a lot of conflicts. Next, I installed CUDA 6.5 using sudo aptitude install cuda-samples-6-5, which happily installed some tooling as well. I then copied the samples to my home directory using the script in /usr/local/cuda-6.5/bin/. According to the 6.5 documentation, I would need GCC 4.6 or 4.8. On Ubuntu 16.04, it is still possible to install GCC 4.8 with sudo aptitude install gcc-4.8 g++-4.8. Once it is installed, make needs to be told to use it: GCC=g++-4.8 make. Another long wait followed before all samples were compiled. Running the deviceQuery sample then yields:

./bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 210"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    1.2
  Total amount of global memory:                 511 MBytes (536150016 bytes)
  ( 2) Multiprocessors, (  8) CUDA Cores/MP:     16 CUDA Cores
  GPU Clock rate:                                1402 MHz (1.40 GHz)
  Memory Clock rate:                             400 Mhz
  Memory Bus Width:                              64-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z): (512, 512, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce 210
Result = PASS

Hurray! It finally works! I don’t even seem to need the script to fix device files anymore.
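
Looking back, every build of the samples took a long time, so it may pay off to check the compiler version before starting make. This is a minimal sketch, assuming the GCC 4.6/4.8 requirement quoted above for CUDA 6.5; the function names major_minor and version_accepted are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reduce a compiler version string to MAJOR.MINOR and compare it
# against a list of accepted versions. Function names are hypothetical.

# Usage: major_minor 4.8.5   -> prints 4.8
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Usage: version_accepted VERSION ACCEPTED...
version_accepted() {
    local have wanted
    have=$(major_minor "$1"); shift
    for wanted in "$@"; do
        [ "$have" = "$wanted" ] && return 0
    done
    return 1
}

# Example, with the versions named in the text:
# version_accepted "$(g++-4.8 -dumpversion)" 4.6 4.8 && GCC=g++-4.8 make
```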


Time for a wrap-up. What did I learn? Two major things:

  1. Read the docs! A lot of mistakes and experiments could have been skipped by first reading the installation guide and other docs.
  2. Check support! Make sure to use a version of the CUDA Toolkit that matches the NVIDIA driver and the graphics card that you have.


If you have a somewhat older card, first check the legacy NVIDIA driver listing to see which driver version is the latest to support your graphics card. Then check the Rogue Wave TotalView documentation for an unofficial compatibility matrix to find out which version of the CUDA Toolkit your driver supports. Finally, follow the installation instructions for that version of the CUDA Toolkit.