<ul class="Download">
<li><a href="http://www.codeproject.com/KB/graphics/GPUNN/GPUNN_demo.zip">Download demo (release build requiring CUDA and 120 dpi) - 584.61 KB</a></li>
<li><a href="http://www.codeproject.com/KB/graphics/GPUNN/GPUNN_GUI.zip">Download GUI source code - 509.68 KB</a> </li>
<li><a href="http://www.codeproject.com/KB/graphics/GPUNN/GPUNN_kernel.zip">Download kernel (the Neural Network core) - 2.78 KB</a></li>
</ul>
<h2>Introduction</h2>
<p>An Artificial Neural Network is an information processing method
inspired by the way biological nervous systems, such as the brain,
process information. It is composed of a large number of highly
interconnected processing elements (neurons) working in unison to
solve specific problems. Neural Networks have been widely used in
"analogous" signal classification, including handwriting, voice, and
image recognition. Neural Networks can also be used in computer games,
giving a game the ability to adaptively learn from player behavior.
This technique has been used in racing games, so that
computer-controlled opponent cars can learn how to drive from human
players.</p>
<p>Since a Neural Network requires a considerable number of vector and
matrix operations to produce results, it is very well suited to a
parallel programming model running on Graphics Processing Units
(GPUs). Our goal is to utilize and unleash the power of GPUs to boost
the performance of a Neural Network solving handwriting recognition
problems.</p>
<p>This project began as our graphics architecture course
project. We ran on the GPU the same Neural Network described by Mike
O'Neill in his brilliant article "<a href="http://www.codeproject.com/KB/library/NeuralNetRecognition.aspx">Neural Network for Recognition of Handwritten Digits</a>".</p>
<h2>About the Neural Network </h2>
<p>A Neural Network consists of two basic kinds of elements: neurons
and connections. Neurons connect with each other through connections to
form a network. This is a simplified theoretical model of the human brain.</p>
<p>A Neural Network often has multiple layers; neurons of one
layer connect to neurons of the next layer in some fashion. Every
connection between them is assigned a weight value. At the beginning,
input data are fed into the neurons of the first layer; by computing the
weighted sum of all connected first-layer neurons, we obtain the
value of a second-layer neuron, and so on. Finally, we reach
the last layer, which is the output. All the computations involved in
operating a Neural Network are essentially dot products.</p>
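<p>To make this concrete, here is a minimal C++ sketch (illustrative only, not taken from the project's code) of how one neuron of the next layer can be computed as a weighted sum of the previous layer's neurons:</p>
<pre lang="C++">#include <cmath>
#include <vector>

// Illustrative only: one neuron of the next layer is the weighted sum
// (a dot product) of its connected previous-layer neurons plus a bias,
// passed through a squashing activation function.
float computeNeuron(const std::vector<float> &prevNeurons,
                    const std::vector<float> &weights)   // weights[0] = bias
{
    float sum = weights[0];                       // bias term
    for (size_t i = 0; i < prevNeurons.size(); ++i)
        sum += prevNeurons[i] * weights[i + 1];   // dot product
    return std::tanh(sum);                        // squash the result
}</pre>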
<p>The secret of a Neural Network lies in its weight values: with the
right weights, the network produces the right answers. At the beginning,
however, we don't know those values, so we need to train our network
with sample inputs and compare the outcomes with our desired answers. A
training algorithm (such as backpropagation) takes the errors as inputs
and adjusts the network weights accordingly. With enough patience,
the Neural Network can be trained to achieve high accuracy.</p>
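<p>The weight adjustment itself is conceptually simple; a minimal gradient-descent sketch (the actual training procedure follows Mike O'Neill's CPU implementation, not this code) looks like this:</p>
<pre lang="C++">// Illustrative gradient-descent step: nudge every weight against its
// error gradient, scaled by a small learning rate.
void updateWeights(float *weights, const float *errorGradients,
                   int count, float learningRate)
{
    for (int i = 0; i < count; ++i)
        weights[i] -= learningRate * errorGradients[i];
}</pre>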
<img alt="IllustrationNeuralNet.gif" src="GPUNN.aspx_files/IllustrationNeuralNet.gif" width="599" border="0" height="300" hspace="0">
<p>Our Neural Network is a convolutional neural network. This kind of
network has proven suitable for recognizing handwritten digits. For
more theoretical details, please check out Mike's article and the
references he lists.</p>
<p>The first three layers of our Neural Network consist of several
feature maps, each of which is shrunken from the previous layer. Our
input is a 29*29 image of a digit, so we have 29*29 = 841 neurons
in the first layer. The second layer is a convolutional layer with 6
feature maps. Each feature map is a 13*13 image sampled from
the first layer, and each pixel/neuron in a feature map is computed by
applying a 5*5 convolutional kernel to the input layer. So, there are
13*13*6 = 1014 nodes/neurons in this layer, (5*5+1 (bias node))*6 = 156
weights, and 1014*(5*5+1) = 26364 connections linking to the first layer.</p>
<p>Layer 3 is also a convolutional layer, but with 50 smaller feature
maps. Each feature map is 5*5 in size, and each pixel in these feature
maps is a 5*5 convolutional kernel of the corresponding areas of all 6
feature maps of the previous layer. There are thus 5*5*50 = 1250
neurons in this layer, (5*5+1)*6*50 = 7800 weights, and 1250*26 = 32500
connections.</p>
<p>The fourth layer is a fully-connected layer with 100 neurons. Since
it is fully connected, each of the 100 neurons in this layer is
connected to all 1250 neurons in the previous layer. There are
therefore 100 neurons in it, 100*(1250+1) = 125100 weights, and 100*1251
= 125100 connections.</p>
<p>Layer 5 is the final output layer. It is also a fully-connected
layer, with 10 neurons, each of which is connected to all 100 neurons
of the previous layer. There are 10 neurons in Layer 5, 10*(100+1) =
1010 weights, and 10*101 = 1010 connections.</p>
<p>As you can see, although structurally simple, this Neural Network is a huge data structure.</p>
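<p>The dimensions above can be summarized as constants (an illustrative sketch; the names are ours, not the project's actual declarations):</p>
<pre lang="C++">// Network dimensions from the description above (illustrative names).
const int kLayer1Neurons = 29 * 29;              // 841 input pixels
const int kLayer2Neurons = 13 * 13 * 6;          // 1014
const int kLayer2Weights = (5 * 5 + 1) * 6;      // 156 (shared per feature map)
const int kLayer3Neurons = 5 * 5 * 50;           // 1250
const int kLayer3Weights = (5 * 5 + 1) * 6 * 50; // 7800
const int kLayer4Neurons = 100;
const int kLayer4Weights = 100 * (1250 + 1);     // 125100
const int kLayer5Neurons = 10;
const int kLayer5Weights = 10 * (100 + 1);       // 1010</pre>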
<h2>Previous GPU Implementation </h2>
<p>The <a href="http://leenissen.dk/fann/html_latest/files2/gpu-txt.html">Fast Artificial Neural Network Library</a>
(FANN) has a very simple GPU implementation of a Neural Network using
GLSL. Each neuron is represented by a single color channel of a texture
pixel. This network is very specific: neuron values range from 0 to 1
and have a precision of only 8 bits. The implementation takes
advantage of the hardware-accelerated dot product function to calculate
neurons. Both neurons and weights are carried in texture maps.</p>
<p>This implementation is straightforward and easy, but limited.
First, our Neural Network requires 32-bit floating-point precision for
each neuron. Since our network has five layers, precision lost at the
first level accumulates and can alter the final results. And because a
handwriting recognition system must be sensitive enough to detect
slight differences between inputs, using only 8 bits to represent a
neuron is unacceptable. Secondly, normal Neural Networks map neuron
values to the range from 0 to 1; however, the Neural Network in our
program, which is specifically designed for handwriting recognition,
has a special activation function mapping each neuron value to the
range from -1 to 1. Therefore, if each neuron were represented by a
single color value as in the FANN library, our neurons would lose
further accuracy. Finally, the FANN method uses a dot product to
compute neurons, which is suitable for fully-connected Neural Networks,
whereas our Neural Network is only partially connected. Computations
performed on our Neural Network involve dot products of large vectors.</p>
<h2>Our Implementation </h2>
<p>Because of all the GLSL inconveniences mentioned above, we finally
chose CUDA. The reason a Neural Network is suitable for the GPU is
that the training and execution of a Neural Network are two separate
processes. Once properly trained, no write access is required while
using the network, so there are no synchronization issues
that need to be addressed. Moreover, neurons on the same network level
are completely isolated from one another, so neuron value computations
can be parallelized to a high degree.</p>
<p>In our code, the weights for the first layer are stored as an array,
and those inputs are copied to the device. For each network level, there
is a CUDA function handling the computation of the neuron values of that
level, since parallelism can only be achieved within one level and the
connections differ between levels. The connections of the Neural
Network are implicitly defined in the CUDA functions by the equations
that compute the next level's neurons; no explicit connection data
structure exists in our code. This is one main difference between our
code and the CPU version by Mike.</p>
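<p>Uploading the input image and the first-layer weights might look like the following (a hedged sketch; the host-side variable names such as <code>inputImage</code> and <code>layer1Weights</code> are assumptions, not the project's exact code):</p>
<pre lang="C++">#include <cuda_runtime.h>

// Illustrative host-side setup: allocate device buffers and copy the
// input neurons and first-layer weights over before launching kernels.
float *Layer1_Neurons_GPU, *Layer1_Weights_GPU, *Layer2_Neurons_GPU;
cudaMalloc((void **)&Layer1_Neurons_GPU, 29 * 29 * sizeof(float));
cudaMalloc((void **)&Layer1_Weights_GPU, 156 * sizeof(float));
cudaMalloc((void **)&Layer2_Neurons_GPU, 13 * 13 * 6 * sizeof(float));
cudaMemcpy(Layer1_Neurons_GPU, inputImage, 29 * 29 * sizeof(float),
           cudaMemcpyHostToDevice);
cudaMemcpy(Layer1_Weights_GPU, layer1Weights, 156 * sizeof(float),
           cudaMemcpyHostToDevice);</pre>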
<img alt="cuda.PNG" src="GPUNN.aspx_files/cuda.PNG" width="456" border="0" height="499" hspace="0">
<p>For example, each neuron value of the second level is a weighted sum
of 25 neurons of the first level plus one bias. The second neuron level
is composed of 6 feature maps, each 13*13 in size. We assign a <code>blockID</code> to each feature map and a <code>threadID</code> to each neuron on a feature map: every feature map is handled by one block, and each pixel on it is handled by one thread.</p>
<p>This is the CUDA function that computes the second network layer:</p>
<div class="SmallText" id="premain0" style="width: 100%; cursor: pointer;"><img preid="0" src="GPUNN.aspx_files/minus.gif" id="preimg0" width="9" height="9"><span preid="0" style="margin-bottom: 0pt;" id="precollapse0"> Collapse</span></div><pre style="margin-top: 0pt;" id="pre0" lang="C++">__global__ <span class="code-keyword">void</span> executeFirstLayer
(<span class="code-keyword">float</span> *Layer1_Neurons_GPU,<span class="code-keyword">float</span> *Layer1_Weights_GPU,<span class="code-keyword">float</span> *Layer2_Neurons_GPU)
{
<span class="code-keyword">int</span> blockID=blockIdx.x;
<span class="code-keyword">int</span> pixelX=threadIdx.x;
<span class="code-keyword">int</span> pixelY=threadIdx.y;
<span class="code-keyword">int</span> kernelTemplate[<span class="code-digit">25</span>] = {
<span class="code-digit">0</span>, <span class="code-digit">1</span>, <span class="code-digit">2</span>, <span class="code-digit">3</span>, <span class="code-digit">4</span>,
<span class="code-digit">29</span>, <span class="code-digit">30</span>, <span class="code-digit">31</span>, <span class="code-digit">32</span>, <span class="code-digit">33</span>,
<span class="code-digit">58</span>, <span class="code-digit">59</span>, <span class="code-digit">60</span>, <span class="code-digit">61</span>, <span class="code-digit">62</span>,
<span class="code-digit">87</span>, <span class="code-digit">88</span>, <span class="code-digit">89</span>, <span class="code-digit">90</span>, <span class="code-digit">91</span>,
<span class="code-digit">116</span>,<span class="code-digit">117</span>,<span class="code-digit">118</span>,<span class="code-digit">119</span>,<span class="code-digit">120</span> };
<span class="code-keyword">int</span> weightBegin=blockID*<span class="code-digit">26</span>;
<span class="code-keyword">int</span> windowX=pixelX*<span class="code-digit">2</span>;
<span class="code-keyword">int</span> windowY=pixelY*<span class="code-digit">2</span>;
<span class="code-keyword">float</span> result=0;
result+=Layer1_Weights_GPU[weightBegin];
++weightBegin;
<span class="code-keyword">for</span>(<span class="code-keyword">int</span> i=0;i<span class="code-keyword"><</span><span class="code-digit">25</span>;++i)
{
result+=Layer1_Neurons_GPU
[windowY*29+windowX+kernelTemplate[i]]*Layer1_Weights_GPU[weightBegin+i];
}
result=(<span class="code-digit">1</span>.<span class="code-digit">7159</span>*tanhf(<span class="code-digit">0</span>.<span class="code-digit">66666667</span>*result));
Layer2_Neurons_GPU[<span class="code-digit">13</span>*<span class="code-digit">13</span>*blockID+pixelY*13+pixelX]=result;
} </pre>
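<p>Given this mapping, the kernel would be launched with one block per feature map and a 13*13 thread block per map, along these lines (the launch configuration is inferred from the kernel's indexing, not copied from the project's host code):</p>
<pre lang="C++">// One block per feature map (6), one thread per feature-map pixel (13*13).
dim3 grid(6);
dim3 block(13, 13);
executeFirstLayer<<<grid, block>>>(Layer1_Neurons_GPU,
                                   Layer1_Weights_GPU,
                                   Layer2_Neurons_GPU);</pre>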
<p>All other levels are computed the same way; the only difference is the equation used to calculate the neurons.</p>
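<p>For instance, a fully-connected level such as the output layer reduces to one plain dot product per neuron. A hedged sketch (the kernel name and memory layout are assumptions consistent with the layer sizes above, not the project's exact code):</p>
<pre lang="C++">// Illustrative kernel for the fully-connected output level: each of the
// 10 output neurons is a weighted sum of all 100 previous-layer neurons
// plus a bias, squashed by the same activation function.
__global__ void executeOutputLayer(float *Layer4_Neurons_GPU,
                                   float *Layer4_Weights_GPU,
                                   float *Layer5_Neurons_GPU)
{
    int neuron = threadIdx.x;            // one thread per output neuron
    int weightBegin = neuron * 101;      // 1 bias + 100 weights per neuron

    float result = Layer4_Weights_GPU[weightBegin];  // bias term
    for (int i = 0; i < 100; ++i)
        result += Layer4_Neurons_GPU[i] *
                  Layer4_Weights_GPU[weightBegin + 1 + i];

    Layer5_Neurons_GPU[neuron] = 1.7159f * tanhf(0.66666667f * result);
}</pre>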
<img alt="program.PNG" src="GPUNN.aspx_files/program.PNG" width="548" border="0" height="199" hspace="0">
<p>The main program first transfers all the input data to the GPU, then
calls each CUDA function in order, and finally retrieves the answer.</p>
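<p>In outline, the host side could look like this (a sketch; the kernel names for the middle levels and their launch configurations are assumptions based on the layer sizes):</p>
<pre lang="C++">// Illustrative pipeline: one kernel per layer, launched in order,
// then the 10 output neuron values are copied back to the host.
executeFirstLayer <<<dim3(6),  dim3(13, 13)>>>(Layer1_Neurons_GPU,
                       Layer1_Weights_GPU, Layer2_Neurons_GPU);
executeSecondLayer<<<dim3(50), dim3(5, 5)>>>(Layer2_Neurons_GPU,
                       Layer2_Weights_GPU, Layer3_Neurons_GPU);
executeThirdLayer <<<dim3(1),  dim3(100)>>>(Layer3_Neurons_GPU,
                       Layer3_Weights_GPU, Layer4_Neurons_GPU);
executeOutputLayer<<<dim3(1),  dim3(10)>>>(Layer4_Neurons_GPU,
                       Layer4_Weights_GPU, Layer5_Neurons_GPU);
cudaMemcpy(results, Layer5_Neurons_GPU, 10 * sizeof(float),
           cudaMemcpyDeviceToHost);</pre>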
<a href="http://www.codeproject.com/KB/graphics/GPUNN/recod.jpg"><img alt="recod.jpg" src="GPUNN.aspx_files/recod_small.jpg" width="640" border="0" height="259" hspace="0"> </a>
<p>The user interface is a separate program written in C#. The user
draws a digit with the mouse on the input pad; the program then
generates a 29*29 image and calls the kernel Neural Network program.
The kernel, as described above, reads the input image and feeds it into
our Neural Network. The results are returned through files and read
back by the user interface.</p>
<p>Here is a screenshot. After drawing a digit, we can get all 10
neuron values of the last network layer. The index of the maximum
value is the network's answer, i.e., the recognized digit.</p>