PCs come with an amazingly powerful device: a graphics processing unit (GPU). It is mostly underutilized, often doing little more than rendering a desktop to the user. But computing on the GPU is refreshingly fast compared to conventional CPU processing whenever significant portions of your program can be run in parallel. The applications are seemingly endless, including matrix computations, signal transformations, random number generation, molecular modeling, and password recovery. Why are GPUs so effective? They have hundreds, in some cases thousands, of cores available for parallel processing. Compare this to the typical one to four CPU cores on today's PCs. (For a more technical treatment see: graphics.stanford.edu/~mhouston/public_talks/cs448-gpgpu.pdf)
Here I present a way to use the power of NVidia's Cuda-enabled GPUs for computing using Java with an Eclipse-based IDE. My platform is Debian Wheezy (64 and 32 bit), but I have also reproduced the process on Linux Mint 13, and it can be done on many other Linux distributions. The approach can be adapted to a Windows install, a process that is well documented elsewhere.
This is a September 2013 update of the original article. Since writing this article, there have been many new developments, particularly in regard to the process for installing the NVidia development driver on Linux. As distros evolve, it has become increasingly difficult to disable the Nouveau driver, a requirement for installing the NVidia driver. Also, occasionally the compiler (gcc) that ships with the distro differs from the compiler used to compile the OS's kernel itself. Finally, Linux systems using the NVidia Optimus technology require additional gymnastics to configure the driver.
Easily accessing the power of the GPU for general purpose computing requires a GPU programming utility that exposes a set of high-level methods and does all of the granular, hardware-level work for us. The popular choices are OpenCL and Cuda. Cuda works only with NVidia GPUs. I prefer NVidia devices and this article presents a Cuda solution.
Eclipse is my favorite IDE for programming in Java, C++, and PHP. NVidia provides an Eclipse-based IDE called Nsight, which is pre-configured for Cuda C++ development. Other features, like Java, PHP, etc., can be added to your Nsight installation from compatible Eclipse software repositories (e.g. Nsight 5.5 is compatible with the Eclipse Juno repository).
Direct programming with Cuda requires using unmanaged C++ code. I prefer programming with managed code. To do this I use a method for wrapping the C++ functionality of Cuda in bindings that are accessible to Java. In the past, on a Windows 7 platform, I wrote my own wrappers for use with C#.net code (see my CodeProject article). With Java, this is not necessary because open source wrappers are available. I use JCuda.
There are four basic elements presented here:
Sometimes tutorials present steps that the writer followed on an existing production machine that already had certain prerequisite configurations in place. Consequently, when a reader follows the steps, the procedure may fail. To avoid this, I tested the process described below from fresh installs of Linux Mint 13 64-bit, Linux Mint 13 32-bit, Debian Wheezy x32, and Debian Wheezy x64. For Mint, I chose the Mate flavor in both cases. Here are the details of my demonstration machines:
Stable, Long Term Service releases for distributions were explicitly chosen for this project. Interim releases frequently change certain basic hardware configurations and filesystem arrangements. After reviewing and contributing to several hundred Linux forum posts, I am certain that you will experience fewer headaches if you do the same.
On Linux systems there are configuration complications with systems that use the NVidia Optimus technology. Simply stated, GPU tasks that do not require the high performance of the NVidia GPU are delegated to a lower-performance, lower-power-consumption GPU, typically an Intel device. This process is currently not well implemented on Linux machines. But it can be made to work! If you are lucky, your machine has a BIOS setting for disabling Optimus integration, but many PC manufacturers do not bother to provide this option. Enter Bumblebee, a program that allows you to specify the GPU to use for a given application. Because I have not constructed a test on an Optimus system, details for Optimus-enabled GPUs are not provided here and you will have to research the Bumblebee gymnastics independently. Later, when you configure Eclipse for JCuda, my understanding is that Eclipse (and Nsight) can be run with optirun eclipse and the proper GPU will be used for debugging your programs. Here are some promising resources: http://forums.linuxmint.com/viewtopic.php?f=47&t=144049 (post #7) and http://bumblebee-project.org/install.html
Computationally intensive applications, e.g. Fourier transforms, whether they are done on the CPU or the GPU, will give your system a stress test. Start small and monitor system temperatures when you have high computational overhead.
NVidia has an exhaustive list of Cuda-compatible GPUs on their Developer Zone web site: http://developer.NVidia.com/Cuda-gpus. Check to see if yours is listed. Also, determine whether your machine uses the NVidia Optimus technology and, if it does, see the note above.
There are some prerequisites. From a terminal, run the following commands to get them:
Download the latest Cuda release from: https://developer.NVidia.com/Cuda-downloads. (Note: The NVidia site only shows Ubuntu releases for Debian forks like Mint. The Cuda releases for Ubuntu work well with Mint LTS 13 and Debian Wheezy.) Select the proper 32/64 choice and prefer the .run file over the .deb file. My most recent download was cuda_5.5.22_linux_32.run (or cuda_5.5.22_linux_64.run).
Split the installer into its three component installer scripts: toolkit, driver, and samples. This fine-grained control is a great benefit if/when troubles occur. Here is the syntax for splitting the installer:
sh cuda_5.5.22_linux_32.run -extract=<theCompletePathToYourDestination>

or

sh cuda_5.5.22_linux_64.run -extract=<theCompletePathToYourDestination>
The following three files are created:
We start by installing the NVidia developer driver. This step creates the most trouble for Linux users because it varies substantially from distro to distro. Before you do anything, print this page, save your work, and be sure you are backed up.
You cannot have an X server running when you install the developer drivers. Do a preliminary test to make sure you can drop to a console and stop your X server. Simultaneously press [ctrl][alt][f2]. If you are lucky your desktop shows a console prompting you to login. If so, login and stop the display manager:
You should now see the console. If you see a blank screen, do [ctrl]+[alt]+[f2] again. Now you can either run sudo reboot or startx to return to your desktop. If this test fails, then you should install your package manager's NVidia non-free driver, then try it again... even though in a subsequent step we will be removing it.
Debian and its siblings use a default driver called nouveau, a wonderful, open-source solution for NVidia GPUs that is totally incompatible with NVidia Cuda development. It must be disabled at boot time. One way is to modify grub:
gksu gedit /etc/default/grub
Find the line that reads: “GRUB_CMDLINE_LINUX_DEFAULT=...” and make it read:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nouveau.modeset=0"
Save the file, close gedit, and run:
sudo update-grub
sudo reboot
Another, more conservative way is to interrupt the grub bootloader and manually insert the nouveau.modeset=0 phrase as a one-time boot option. To do this, your grub configuration must have a timeout that enables you to view the grub menu. At the grub menu, highlight your default boot option and press e to get the grub command line. Find the line that reads "linux ..." and add nouveau.modeset=0 to the end of the line. Press [ctrl][x] to boot. If you use this method, you will need to repeat it if you reboot before the driver is installed and nouveau is removed. Here's a reference that presents the basic idea on a Mint distro: http://community.linuxmint.com/tutorial/view/842

Next, edit your blacklist configuration file (gksu gedit /etc/modprobe.d/blacklist.conf) and add these lines to the end:

blacklist nouveau
options nouveau modeset=0
Then, remove everything NVidia from the system with:

sudo apt-get remove --purge nvidia*
Drop to a console ([ctrl][alt][f2]), exit the X server (e.g. sudo service mdm stop), and run the installer:
sudo sh NVIDIA-Linux-x86-319.37.run (or sudo sh NVIDIA-Linux-x64-319.37.run)
Your installer may fail. The most common errors are that a display manager is in use or that there is a conflict (with nouveau). Retracing the steps above will remedy these problems. But sometimes an error will occur if the distro's kernel was compiled with an earlier version of gcc. (You'll see something like: The compiler used to compile the kernel (gcc 4.6) does not exactly match the current compiler (gcc 4.7).) Occasionally choosing to ignore this warning will work, but again, don't count on it. You need to install the gcc version used to compile the kernel (e.g. 4.6 in the example above). Do this using your preferred package manager. Next, because your machine now has two gcc versions, we need to create alternatives. Using the example of gcc 4.6 and gcc 4.7, we run:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 10

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 20
Now, when you run:

sudo update-alternatives --config gcc

you can pick gcc 4.6 as the active version. Later, after the install, you can switch it back.
Whew! Now it gets easier. Next, we install the toolkit with:
sudo sh cuda-linux-rel-5.5.22-16488124.run (or sudo sh cuda-linux64-rel-5.5.22-16488124.run)
(If you see a gcc version error, see "Your installer may fail" under Install the Developer Driver above.)
Your toolkit install console will present the following text when it is complete:
* Please make sure your PATH includes /usr/local/cuda-5.5/bin
* Please make sure your LD_LIBRARY_PATH
*   for 32-bit Linux distributions includes /usr/local/cuda-5.5/lib
*   for 64-bit Linux distributions includes /usr/local/cuda-5.5/lib64:/usr/local/cuda-5.5/lib
* OR
*   for 32-bit Linux distributions add /usr/local/cuda-5.5/lib
*   for 64-bit Linux distributions add /usr/local/cuda-5.5/lib64 and /usr/local/cuda-5.5/lib
* to /etc/ld.so.conf and run ldconfig as root
Set your additional paths persistently by editing (creating if necessary) the .profile file in your home directory. Add PATH=$PATH:/usr/local/cuda-5.5/bin to the end of the file, save, then logout and login.
Use a persistent, modular approach for managing your LD_LIBRARY_PATH. I never edit the /etc/ld.so.conf file. Rather, my ld.so.conf file contains the line: include /etc/ld.so.conf.d/*.conf. I create a new file in the /etc/ld.so.conf.d folder named cuda.conf that has the following line(s):

/usr/local/cuda-5.5/lib

(On a 64-bit installation, also add /usr/local/cuda-5.5/lib64.)
Then run sudo ldconfig.
Install the samples by running your third, split-out installer script:
sudo sh cuda-samples-linux-5.5.22-16488124.run
Now let's run a test. From a terminal, change to the folder where the deviceQuery sample is located (default is /usr/local/cuda-5.5/samples/1_Utilities/deviceQuery). Make the sample with the system compiler:
sudo make
(If you see a gcc version error when you run sudo make, see "Your installer may fail" under Install the Developer Driver above.)
Then, run the sample with:
./deviceQuery
I see the following on my 64 bit test system:
/usr/local/cuda-5.5/samples/1_Utilities/deviceQuery $ ./deviceQuery

./deviceQuery Starting...

Cuda Device Query (Runtime API) version (CudaRT static linking)

Detected 1 Cuda Capable device(s)

Device 0: "GeForce GTX 560 Ti"

etc., etc., ...

Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce GTX 560 Ti
Nsight is a fork of Eclipse that is pre-configured for C++ and Cuda. It is included in your toolkit install (you already have it). For now, run it from a terminal: /usr/local/cuda-5.5/libnsight/nsight. (Do not double-click the file from your file manager.) Later you can make a desktop launcher. Go ahead and choose the default folder for projects that it recommends.
Let's test it.
My output in the console window is:
[Cuda Bandwidth Test] - Starting...
Running on... Device 0:
GeForce GTX 560 Ti.
etc., ...
Nsight can be expanded through Help > Install New Software. To add Java development, you need to add http://download.eclipse.org/releases/juno to your Available Software Sites. (Note: the Kepler repository does not work as of Nsight 5.5.) Then, install Eclipse Java Development Tools.
Follow the install dialog and restart Nsight.
Download the zip for your platform from http://www.jCuda.org/downloads/downloads.html. Extract it to a folder in your home directory. Then start Nsight. Create a new Java Project (File > New > Java Project) and name it JCudaHello. Right-click the JCudaHello project in the project explorer and select Properties. Go to the Java Build Path tree item and select the Libraries tab. Click Add External Jars, navigate to the extracted folder you created, and pick jCuda-0.5.5.jar. With the Libraries tab still open, expand the tree for the jCuda-0.5.5.jar you added and click on Native library location (none). Then click the Edit button. You will be asked for a location. Click External Folder and again navigate to the extracted folder. Click OK.
Now, right-click your src folder in the JCudaHello project from the Project Explorer and select New > Class. Name the class cudaTest and select the public static void main method stub:
Click Finish. Delete the code that is pre-generated in cudaTest.java from the editor pane and paste this in:
import jcuda.Pointer;
import jcuda.runtime.JCuda;

public class cudaTest {
    public static void main(String[] args) {
        Pointer pointer = new Pointer();
        JCuda.cudaMalloc(pointer, 4);
        System.out.println("Pointer: " + pointer);
        JCuda.cudaFree(pointer);
    }
}
When you run it, you should see something like this:
Pointer: Pointer[nativePointer=0x800100000,byteOffset=0]
The project code is a zipped Eclipse workspace that does not include any hidden meta-data folders or information files. When you unzip it to your location of choice, you will see two sub-directories: JCudaFftDemo and Notes.
First, we need to create an Nsight Java project from the existing sources in the JCudaFftDemo folder. Start Nsight and choose your extracted directory (parent directory for JCudaFftDemo) when it asks you to select a workspace. Create a new Java Project from the File menu and give it the exact name: JCudaFftDemo. Then, click Finish. If you expand the trees for the project in the Project Explorer you should see:
Next, you need to add the JCuda binaries to the Java Build Path. Right-click the JCudaFftDemo project in the Project Explorer and select Properties. Go to the Java Build Path tree item and select the Libraries tab. Click Add External Jars, navigate to the JCuda binaries you downloaded in Setup – Step 7, and pick jCuda-0.5.5.jar, jcublas-0.5.5.jar, and jcufft-0.5.5.jar. With the Libraries tab still open, one at a time, expand the trees for the jars you added and click on Native library location (none). Click the Edit button and set the location to match your JCuda binaries directory. (We are repeating Step 7 in the above Setup section, this time for the new project.)
Then, run it as a Java application. Here is the output console from my Linux Mint 13, 32 bit laptop:
Creating sin wave input data: Frequency = 11.0, N = 1048576, dt = 5.0E-5 ...
L2 Norm of original signal: 724.10583
Performing a 1D C2C FFT on GPU with JCufft...
GPU FFT time: 0.121 seconds
Performing a 1D C2C FFT on CPU...
CPU time: 3.698 seconds
GPU FFT L2 Norm: 741484.3
CPU FFT L2 Norm: 741484.4
Index at maximum in GPU power spectrum = 572, frequency = 10.910034
Index at maximum in CPU power spectrum = 572, frequency = 10.910034
Performing 1D C2C IFFT(FFT) on GPU with JCufft...
GPU time: 0.231 seconds
Performing 1D C2C IFFT(FFT) on CPU...
CPU time: 3.992 seconds
GPU FFT L2 Norm: 724.1056
CPU FFT L2 Norm: 724.10583
First, a word about complex data arrays: CUDA and JCuda can work with data arrays that contain complex vectors of type float or double, provided you construct the array as an interleaved complex number sequence. This is best demonstrated with an example. Let's say we have a complex vector of length 2: (1 + 2i, 3 + 4i). The corresponding interleaved data array has a length of 4 and has the form: (1, 2, 3, 4). In the project code I use this format for all complex vectors that are submitted to JCuda methods.
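As a concrete illustration, here is a minimal sketch of building the interleaved form from separate real and imaginary parts (the interleave helper and class name are my own, not part of the project code):

```java
import java.util.Arrays;

public class InterleavedDemo {

    // Pack real/imaginary pairs into a single interleaved array:
    // (1 + 2i, 3 + 4i) -> {1, 2, 3, 4}
    static float[] interleave(float[] re, float[] im) {
        float[] out = new float[2 * re.length];
        for (int k = 0; k < re.length; k++) {
            out[2 * k]     = re[k]; // real part at even index
            out[2 * k + 1] = im[k]; // imaginary part at odd index
        }
        return out;
    }

    public static void main(String[] args) {
        float[] data = interleave(new float[] {1f, 3f}, new float[] {2f, 4f});
        System.out.println(Arrays.toString(data)); // [1.0, 2.0, 3.0, 4.0]
    }
}
```

An array built this way can be handed directly to the JCuda FFT methods, which expect the interleaved layout.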
In contrast, for CPU coding simplicity, I use a ComplexFloat class to represent complex numbers. When using this class to form a complex vector, the vector x = (1 + 2i, 3 + 4i) has the form ComplexFloat[2] = (x[0].Real = 1, x[0].Imaginary = 2, x[1].Real = 3, x[1].Imaginary = 4). The array, and the vector it represents, both have the same length: 2.
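A minimal sketch of such a class is shown below. This is my own illustration of the idea, not the project's actual ComplexFloat.java; the times method name is mine, though the Real/Imaginary field names match the usage above:

```java
public class ComplexFloat {
    public float Real;
    public float Imaginary;

    public ComplexFloat(float real, float imaginary) {
        this.Real = real;
        this.Imaginary = imaginary;
    }

    // Complex multiplication: (a+bi)(c+di) = (ac-bd) + (ad+bc)i
    public ComplexFloat times(ComplexFloat o) {
        return new ComplexFloat(
                Real * o.Real - Imaginary * o.Imaginary,
                Real * o.Imaginary + Imaginary * o.Real);
    }

    public static void main(String[] args) {
        ComplexFloat p = new ComplexFloat(1f, 2f).times(new ComplexFloat(3f, 4f));
        System.out.println(p.Real + " + " + p.Imaginary + "i"); // -5.0 + 10.0i
    }
}
```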
Main.java is the entry point for the application. It creates a sample signal and performs the demo. The signal produced is: sin(2*pi*FREQ*t) sampled N times in increments of dT. The demo computes forward and inverse Fourier transforms of the test signal — both on the GPU and the CPU — and provides execution times and signal characteristics for the results.
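A sketch of how that test signal could be generated in interleaved form follows. This is my own illustration, not the project's Main.java; the constant values match the demo output above (FREQ = 11.0, dT = 5.0E-5), but I use a tiny N here instead of the demo's 1048576:

```java
public class SignalDemo {
    static final double FREQ = 11.0;   // signal frequency, as in the demo output
    static final double DT = 5.0e-5;   // sample increment, as in the demo output
    static final int N = 8;            // demo uses 1048576; small here for illustration

    // Build the interleaved complex test signal: sin(2*pi*FREQ*t), imaginary part 0
    static float[] makeSignal() {
        float[] signal = new float[2 * N];
        for (int i = 0; i < N; i++) {
            double t = i * DT;
            signal[2 * i]     = (float) Math.sin(2.0 * Math.PI * FREQ * t);
            signal[2 * i + 1] = 0f; // purely real input
        }
        return signal;
    }

    public static void main(String[] args) {
        float[] s = makeSignal();
        System.out.println("First sample: " + s[0] + " + " + s[1] + "i");
    }
}
```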
The CPU FFT part of the code (FftCpuFloat.java) purposely implements the Cooley–Tukey algorithm in an awkward way that depends on instances of the ComplexFloat.java class. Little attention is paid to memory allocation and access. Also, although I have multi-core CPUs, my CPU thread executes on only one core. Doing this makes the radix-2 procedure intuitive and simple, but there is an overhead cost that will overstate the advantage of using the GPU.
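For readers new to the algorithm, here is a minimal, unoptimized sketch of the recursive radix-2 Cooley–Tukey idea operating directly on interleaved float arrays. This is my own illustration, not the project's FftCpuFloat.java (which, as noted, works with ComplexFloat instances):

```java
public class Radix2Fft {

    // Input: interleaved complex array of length 2*n, where n is a power of two.
    // Returns a new interleaved array holding the DFT.
    static float[] fft(float[] x) {
        int n = x.length / 2;
        if (n == 1) return new float[] { x[0], x[1] };

        // Split into even- and odd-indexed complex samples
        float[] even = new float[n], odd = new float[n];
        for (int k = 0; k < n / 2; k++) {
            even[2 * k] = x[4 * k];     even[2 * k + 1] = x[4 * k + 1];
            odd[2 * k]  = x[4 * k + 2]; odd[2 * k + 1]  = x[4 * k + 3];
        }
        float[] e = fft(even), o = fft(odd);

        // Combine with twiddle factors w = exp(-2*pi*i*k/n)
        float[] out = new float[2 * n];
        for (int k = 0; k < n / 2; k++) {
            double a = -2.0 * Math.PI * k / n;
            double wr = Math.cos(a), wi = Math.sin(a);
            double tr = wr * o[2 * k] - wi * o[2 * k + 1]; // w * odd[k], real
            double ti = wr * o[2 * k + 1] + wi * o[2 * k]; // w * odd[k], imaginary
            out[2 * k]               = (float) (e[2 * k] + tr);
            out[2 * k + 1]           = (float) (e[2 * k + 1] + ti);
            out[2 * (k + n / 2)]     = (float) (e[2 * k] - tr);
            out[2 * (k + n / 2) + 1] = (float) (e[2 * k + 1] - ti);
        }
        return out;
    }

    public static void main(String[] args) {
        // FFT of a unit impulse: every bin should be 1 + 0i
        float[] spectrum = fft(new float[] {1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f});
        System.out.println(java.util.Arrays.toString(spectrum));
    }
}
```

Like the project's CPU code, this allocates fresh arrays at every recursion level and runs on a single thread, which is exactly the kind of overhead that inflates the apparent GPU advantage.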
You can adjust the constants (FREQ, N, and dT) for creating the test signal from the Main.java class. Using a Linux 32 bit installation on an older Dell laptop I found that, by varying the length of the test signal (N), the CPU FFT outperformed the JCuda FFT with signals that had fewer than 4096 complex elements. Thereafter, the JCuda FFT speeds overwhelmed my CPU FFT. At N = 4194304, JCuda was 250 times faster than the CPU FFT (CPU = 23 seconds, GPU = 0.9 seconds). Beyond that, the laptop fans blaze during the CPU computation loop (system temp: 90 C) and fear of thermal overload prompted me to curtail testing. (My Linux 64 bit desktop has a 6 core AMD Phenom II on a Sabretooth mobo, 16 GiB of memory, a GeForce GTX 560 Ti graphics card, and some great fans. It can process FFTs (CPU or GPU) all night provided I manage memory effectively.)
A fair amount of the speed advantage I observe is due to the inefficiency of my poorly optimized CPU implementation. More rigorous CPU/GPU evaluations using optimized CPU code suggest that gains are roughly 10X. I'll take 10X over 1X, but the practical reality is: the power of CUDA's underlying implementation efficiency, together with the intrinsic GPU gain (whatever it really is), collectively gives me an average 50X boost.
The Notes folder in the project download includes some tips on how to run a deployed, runnable jar. Basically, you need to use the -Djava.library.path switch to point to your JCuda binaries folder.
Getting set up and becoming acquainted with CUDA, JCuda, and Nsight takes a fair amount of work. But it's worth it. General-purpose computing on graphics processing units (GPGPU) is a very important tool to have in your coding toolbox. I hope this article helps make the process more accessible to other GPGPU novices like me. I wish you success as a cutting-edge JCuda coder!