In GPU Gems 3, we continue to showcase work that uses graphics hardware for nongraphics computation. As each new generation provides significantly greater computing power and programmability, GPUs are increasingly attractive targets for general-purpose computation, or what is commonly called GPGPU or GPU Computing. As a result, researchers and developers in academia and industry continue to develop new GPU algorithms for tasks such as sorting, database operations, image processing, and linear algebra. In many cases, the principal motivation for using the GPU is the prospect of high performance at a relatively low cost.
GPU programming tools have evolved dramatically over the past few years. Recently, NVIDIA launched a new set of tools for GPU Computing with the introduction of its CUDA technology. CUDA provides a flexible programming model and C-like language for implementing data-parallel algorithms on the GPU. What's more, NVIDIA's CUDA-compatible GPUs have additional hardware features specifically designed to boost performance and give users more control over how algorithms are mapped to the GPU. In many ways, CUDA is an important step forward in widening the domain of algorithms that can benefit from GPU performance. This part of the book contains a mix of new applications using CUDA, in addition to graphics-based GPGPU using languages like Cg.
We begin this section with a look at the role of GPUs in network security. For network virus detection systems, there is a tradeoff between fast, expensive solutions using specialized processors and low-cost alternatives based on commodity CPUs. In Chapter 35, "Fast Virus Signature Matching on the GPU," Elizabeth Seamans of Juniper Networks and Thomas Alexander of Polytime present a high-performance, GPU-based virus scanning library. The system uses the GPU as a fast filter that identifies possible virus signatures in thousands of data objects in parallel. The performance of their library suggests that the GPU is now a viable platform for cost-effective, high-performance network security processing.
In Chapter 36, "AES Encryption and Decryption on the GPU," Takeshi Yamanouchi of SEGA Corporation describes his work on implementing encryption algorithms on the GPU. AES (Advanced Encryption Standard) is the current standard for block cipher encryption, and, like many encryption algorithms, it relies heavily on integer operations. The author describes how to use the integer-processing capabilities of NVIDIA's GeForce 8800 GPUs to accelerate AES encryption and decryption.
Many software systems, including particle physics simulators and stochastic ray tracers, rely on Monte Carlo methods to efficiently solve problems involving complex, multidimensional functions. Fast and accurate random number generation is a critical component of all Monte Carlo simulations. In Chapter 37, "Efficient Random Number Generation and Application Using CUDA," Lee Howes and David Thomas of Imperial College London present methods for generating random numbers using CUDA to exploit the massive parallelism and arithmetic performance of the GPU. They describe the relative advantages of two fast algorithms for generating Gaussian random numbers, techniques that are particularly useful in financial simulations for pricing stock options.
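As a point of reference for what "generating Gaussian random numbers" involves, the sketch below shows the classic Box-Muller transform, which maps two uniform samples to two independent standard normal samples. This is a plain CPU illustration in Python, chosen here as a well-known example; this overview does not name the two algorithms the chapter compares, and the function names `box_muller` and `gaussian_samples` are hypothetical.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniform samples to two standard normal samples.

    u1 must lie in (0, 1] so that log(u1) is defined; u2 may lie in [0, 1).
    """
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def gaussian_samples(count, seed=0):
    """Draw `count` standard normal samples (illustrative helper, not from the book)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        # random() returns values in [0, 1); 1 - random() shifts this to (0, 1].
        z0, z1 = box_muller(1.0 - rng.random(), rng.random())
        samples.extend((z0, z1))
    return samples[:count]
```

Each pair of outputs depends only on its own pair of uniform inputs, which is the kind of independence that lets many samples be produced in parallel.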
Companies in the oil and gas industry depend on accurate seismic surveys of the Earth to identify subsurface oil reservoirs. The challenge is that most seismic data sets are many terabytes in size and it takes enormous amounts of computing power to convert the raw data into useful survey images. In Chapter 38, "Imaging Earth's Subsurface Using CUDA," Bernard Deschizeaux and Jean-Yves Blanc of CGGVeritas describe a CUDA implementation of several time-critical algorithms within their industrial seismic processing pipeline. Their CUDA implementation achieves significant performance improvements over the latest generation of CPUs, and the authors discuss the possibility of building clusters of GPUs to accelerate large seismic processing problems.
A number of commonly used algorithms in computer science involve a simple operation called all-prefix-sums, or scan. For each value in an array of data, the scan operation computes the sum of all preceding values. In Chapter 39, "Parallel Prefix Sum (Scan) with CUDA," Mark Harris of NVIDIA and Shubhabrata Sengupta and John D. Owens of the University of California, Davis, describe an efficient CUDA implementation of a parallel scan algorithm and provide results for applications such as stream compaction and radix sort. This chapter is also a good reference for developers to learn CUDA programming and optimization strategies.
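To make the definition concrete, here is a minimal Python sketch of the scan operation as described above, in which each output is the sum of all preceding inputs (an exclusive scan). The second function simulates, one step at a time on the CPU, the up-sweep/down-sweep structure commonly used for parallel scans; it is an illustration under the assumption of a power-of-two input length, not the chapter's CUDA code, and the function names are hypothetical.

```python
def exclusive_scan(values):
    """Sequential reference: out[i] is the sum of values[0] .. values[i-1]."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def tree_scan(values):
    """Exclusive scan via up-sweep and down-sweep over an implicit binary tree.

    The iterations of each inner loop touch disjoint elements, so on parallel
    hardware they could run concurrently. Length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    data = list(values)

    # Up-sweep: accumulate partial sums toward the last element.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data
```

For the input `[3, 1, 7, 0, 4, 1, 6, 3]`, both functions return `[0, 3, 4, 11, 11, 15, 16, 22]`.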
The Gaussian function is one of the most widely used filter kernels in image and signal processing. The exponential term makes the Gaussian expensive to evaluate dynamically, so in practice it is common to precompute a table of coefficients. In Chapter 40, "Incremental Computation of the Gaussian," Ken Turkowski of Adobe Systems presents a method to quickly evaluate the Gaussian on the fly using a technique similar to polynomial forward differencing. By replacing differences with quotients, this algorithm incrementally computes Gaussian coefficients. For a GPU implementation, this approach eliminates a texture lookup in the pixel shader, which can result in faster filtering performance.
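The idea of replacing differences with quotients can be sketched as follows. For unit steps in x, the ratio G(x+1)/G(x) equals exp(-(2x+1)/(2σ²)), and the ratio of successive ratios is the constant exp(-1/σ²), so after three initial values each new coefficient costs only two multiplications. The Python code below is a minimal CPU illustration of that recurrence under these assumptions (unit sample spacing, samples starting at x = 0); it is not the chapter's shader code, and the function name is hypothetical.

```python
import math


def incremental_gaussian(sigma, count):
    """Return G(0), G(1), ..., G(count-1) for a normalized Gaussian.

    G(x) = exp(-x^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)), evaluated with
    one exp call up front and two multiplications per sample afterward.
    """
    g = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)   # G(0)
    q = math.exp(-0.5 / (sigma * sigma))           # G(1) / G(0)
    q_step = q * q                                 # constant second-order quotient
    coefficients = []
    for _ in range(count):
        coefficients.append(g)
        g *= q        # advance G(x) to G(x + 1)
        q *= q_step   # advance the first-order quotient
    return coefficients
```

Because the recurrence multiplies repeatedly, rounding error accumulates slowly with distance from the center; for the short kernels typical of image filtering this is usually negligible, but it is worth checking against a direct evaluation for long runs.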
Chapter 41, "Using the Geometry Shader for Compact and Variable-Length GPU Feedback," completes this section by describing how to use a new hardware feature in DirectX 10-compliant GPUs to implement algorithms that cannot be implemented efficiently using pixel or vertex shaders. The geometry shader is an extra stage in the GPU rendering pipeline that is capable of executing algorithms with variable, data-dependent input and output. This capability is particularly useful for computer vision applications that analyze images to identify geometric shapes. In this chapter, Franck Diard of NVIDIA presents geometry shader implementations of several algorithms, including histogram building and corner detection.
This section provides a small sampling of recent work on GPGPU techniques. Even with rapidly evolving architectures and programming tools like NVIDIA's CUDA, GPUs remain fairly specialized for data-parallel computation. However, it is clear that many important algorithms in scientific computing and other fields have enough parallelism to benefit from GPU performance, and it's likely that new algorithms will emerge as GPUs become more general and easier to program. As the chapters in this section demonstrate, the price/performance ratio of graphics processors is a potentially disruptive force in high-performance computing and in other computing industries.
Nolan Goodnight, NVIDIA Corporation