<td> </td>
<td>-1</td>
<td>-1</td>
<td>(-1) + (-1) - 1 = -3 < 0 don't fire,
output -1</td>
</tr>
<tr>
<td> </td>
<td>-1</td>
<td>1</td>
<td>(-1) + (1) - 1 = -1 < 0 don't fire,
output -1</td>
</tr>
<tr>
<td> </td>
<td>1</td>
<td>-1</td>
<td>(1) + (-1) - 1 = -1 < 0 don't fire,
output -1</td>
</tr>
<tr>
<td> </td>
<td>1</td>
<td>1</td>
<td>(1) + (1) - 1 = 1 > 0 fire, output 1</td>
</tr>
</table>
</BLOCKQUOTE>
<P>As you can see, the neural network with the proper weights and bias solves the problem perfectly. Moreover, there is a whole family of weights that will do just as well (sliding the decision boundary in a direction perpendicular to itself). However, there is an important point here: without the bias or threshold, only lines through the origin would be possible, since the <I>X</I><SUB>2</SUB> intercept would have to be 0. This is the whole basis for using a bias or threshold, so this example has proven an important one since it flushed that fact out. So, are we any closer to seeing how to find weights algorithmically? Yes, we now have a geometrical analogy, and that is the beginning of an algorithm.
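<P>To make the table above concrete, here is a minimal C sketch (mine, not one of the article's listings) of a single neurode with the hand-derived weights <I>w</I><SUB>1</SUB>=1, <I>w</I><SUB>2</SUB>=1 and bias <I>b</I>=-1 stepping through the four bipolar <I>AND</I> inputs:
<PRE>
/* Minimal sketch: one neurode with hard-coded weights w1 = w2 = 1 and
   bias b = -1, evaluated on the four bipolar AND inputs.
   Illustrative only -- not code from the article's listings. */
#include &lt;stdio.h&gt;

int main(void)
{
    int inputs[4][2] = { {-1,-1}, {-1,1}, {1,-1}, {1,1} };
    int w1 = 1, w2 = 1, b = -1;
    int i;

    for (i = 0; i < 4; i++)
    {
        int activation = w1 * inputs[i][0] + w2 * inputs[i][1] + b;
        int output     = (activation >= 0) ? 1 : -1;  /* step, threshold 0.0 */

        printf("(%2d, %2d) -> sum = %2d, output = %2d\n",
               inputs[i][0], inputs[i][1], activation, output);
    }

    return 0;
}
</PRE>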
<H1>The Ebb of Hebbian</H1>
<P>Now we are ready to see the first learning algorithm and its application to a neural net. One of the simplest learning algorithms was invented by <I>Donald Hebb</I>, and it is based on using the input vectors to modify the weights so that the weights create the best possible linear separation of the inputs and outputs. Alas, the algorithm works just OK: for inputs that are orthogonal it is perfect, but for non-orthogonal inputs the algorithm falls apart. Even though the algorithm doesn't find correct weights for all inputs, it is the basis of most learning algorithms, so we will start here.
<P>Before we see the algorithm, remember that it is for a single-neurode, single-layer neural net. You can, of course, place a number of neurodes in the layer, but they will all work in parallel and can be taught in parallel. Are you starting to see the massive parallelization that neural nets exhibit? Instead of using a single weight vector, a multi-neurode net uses a weight matrix. Anyway, the algorithm is simple; it goes something like this:
<BLOCKQUOTE>
<FONT COLOR=RED>
<P><I>Given:</I>
<UL>
<LI>Input vectors are in bipolar form <I>I</I> = (-1, 1, ..., -1, 1) and contain <I>k</I> elements each.
<LI>There are <I>n</I> input vectors and we will refer to the set as <I>I</I> and the <I>j</I>th element as <I>I</I><SUB>j</SUB><I>.</I>
<LI>Outputs will be referred to as <I>y</I><SUB>j</SUB> and there are <I>n</I> of them, one for each input <I>I</I><SUB>j</SUB>.
<LI>The weights <I>w</I><SUB>1</SUB><I>-w</I><SUB>k</SUB> are contained in a single vector <I>w</I> = (<I>w</I><SUB>1</SUB>, <I>w</I><SUB>2</SUB>, ... <I>w</I><SUB>k</SUB>).
</UL>
</FONT>
</BLOCKQUOTE>
<P><I>Step 1.</I> Initialize all your weights to 0, and let them be contained in a vector <I>w</I> that has <I>k</I> entries. Also initialize the bias <I>b</I> to 0.
<P><I>Step 2.</I> For <I>j</I> = 1 to <I>n</I> do
<BLOCKQUOTE>
<FONT COLOR=RED>
<P><I>b</I> = <I>b</I> + <I>y</I><SUB><I>j</I></SUB> (where <I>y</I> is the desired output)
<P><I>w</I> = <I>w</I> + <I>I</I><SUB><I>j</I></SUB> * <I>y</I><SUB><I>j</I></SUB> (remember this is a vector operation)
</FONT>
</BLOCKQUOTE>
<P>end do
<P>The algorithm is nothing more than an <I>"accumulator"</I> of sorts, shifting the decision boundary based on the changes in the input and output. The only problem is that it sometimes can't move the boundary fast enough (or at all) and <I>"learning"</I> doesn't take place.
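<P>To see the accumulator in code, here is a rough C sketch of the rule (mine, not the simulator from Listing 2.0); the names Hebb_Train, NUM_ELEMENTS, and NUM_VECTORS are my own, purely for illustration:
<PRE>
/* Hebbian learning for a single neurode: accumulate each input vector,
   scaled by its desired bipolar output, into the weight vector, and
   accumulate the outputs into the bias.
   Illustrative sketch only -- not the code from Listing 2.0. */
#define NUM_ELEMENTS 2   /* k: elements per input vector  */
#define NUM_VECTORS  4   /* n: number of training vectors */

void Hebb_Train(int inputs[NUM_VECTORS][NUM_ELEMENTS],
                int outputs[NUM_VECTORS],
                int weights[NUM_ELEMENTS],
                int *bias)
{
    int i, j;

    /* Step 1: initialize all weights and the bias to 0 */
    for (i = 0; i < NUM_ELEMENTS; i++)
        weights[i] = 0;
    *bias = 0;

    /* Step 2: for each training pair, b = b + y_j and w = w + I_j * y_j */
    for (j = 0; j < NUM_VECTORS; j++)
    {
        *bias += outputs[j];

        for (i = 0; i < NUM_ELEMENTS; i++)
            weights[i] += inputs[j][i] * outputs[j];
    }
}
</PRE>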
<P>So how do we use <I>Hebbian</I> learning? The answer is, the same as the previous network, except that now we have an algorithmic method to teach the net with; thus we refer to the net as a <I>Hebb</I> or <I>Hebbian Net</I>. As an example, let's take our trusty logical <I>AND</I> function and see if the algorithm can find the proper weights and bias to solve the problem. The following summation is equivalent to running the algorithm:
<BLOCKQUOTE>
<FONT COLOR=RED>
<P><I>w</I> = [<I>I</I><SUB>1</SUB>*<I>y</I><SUB>1</SUB>] + [<I>I</I><SUB>2</SUB>*<I>y</I><SUB>2</SUB>] + [<I>I</I><SUB>3</SUB>*<I>y</I><SUB>3</SUB>] + [<I>I</I><SUB>4</SUB>*<I>y</I><SUB>4</SUB>] = [(-1,-1)*(-1)] + [(-1,1)*(-1)] + [(1,-1)*(-1)] + [(1,1)*(1)] = (2,2)
<P><I>b</I> = <I>y</I><SUB>1</SUB> + <I>y</I><SUB>2</SUB> + <I>y</I><SUB>3</SUB> + <I>y</I><SUB>4</SUB> = (-1) + (-1) + (-1) + (1) = -2
</FONT>
</BLOCKQUOTE>
<P>Therefore, <I>w</I><SUB>1</SUB>=2, <I>w</I><SUB>2</SUB>=2, and <I>b</I>=-2. These are simply scaled versions of the values <I>w</I><SUB>1</SUB>=1, <I>w</I><SUB>2</SUB>=1, <I>b</I>=-1 that we derived geometrically in the previous section. Killer, huh? With this simple learning algorithm we can train a neural net (consisting of a single neurode) to respond to a set of inputs and classify each input as true or false, 1 or -1. Now, if we were to array these neurodes together to create a network of neurodes, then instead of simply classifying the inputs as on or off, we could associate patterns with the inputs. This is one of the foundations for the next neural net structure: the <I>Hopfield</I> net. One more thing: the activation function used for a Hebb Net is a step with a threshold of 0.0 and bipolar outputs 1 and -1.
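<P>Feeding the <I>AND</I> training set through the Hebb_Train sketch above should reproduce the <I>w</I> = (2, 2) and <I>b</I> = -2 just derived (again, my own illustration rather than the article's listing):
<PRE>
/* Reuses Hebb_Train, NUM_ELEMENTS and NUM_VECTORS from the sketch above.
   Expected output: w = (2, 2), b = -2. */
#include &lt;stdio.h&gt;

int main(void)
{
    int inputs[NUM_VECTORS][NUM_ELEMENTS] = { {-1,-1}, {-1,1}, {1,-1}, {1,1} };
    int outputs[NUM_VECTORS]              = {     -1,     -1,     -1,      1 };
    int weights[NUM_ELEMENTS];
    int bias;

    Hebb_Train(inputs, outputs, weights, &bias);
    printf("w = (%d, %d), b = %d\n", weights[0], weights[1], bias);

    return 0;
}
</PRE>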
<P>To get a feel for Hebbian learning and how to implement an actual Hebb Net, Listing 2.0 contains a complete Hebbian Neural Net Simulator. You can create networks with up to 16 inputs and 16 neurodes (outputs). The program is self-explanatory, but there are a couple of interesting properties: you can select one of three activation functions, and you can input any kind of data you wish. Normally, we would stick to the step activation function, and inputs/outputs would be binary or bipolar. However, in the light of discovery, maybe you will find something interesting with these added degrees of freedom. Still, I suggest that you begin with the step function and all bipolar inputs and outputs.
<BLOCKQUOTE>
<SPAN CLASS="maintext-2"><FONT COLOR="#000088"><I>Listing 2.0 - A Hebb Net Simulator (in neuralnet.zip).</I></FONT></SPAN>
</BLOCKQUOTE>
<H1>Playing the Hopfield</H1>
<BLOCKQUOTE>
<SPAN CLASS="maintext-2"><FONT COLOR="#000088"><I>Figure 10.0 - A 4 Node Hopfield Autoassociative Neural Net.</I></FONT></SPAN>
<P ALIGN=CENTER><IMG SRC="xneuralnet/Image23.jpg" tppabs="http://www.gamedev.net/reference/articles/xneuralnet/Image23.jpg" width="532" height="376">
</BLOCKQUOTE>
<P>John Hopfield is a physicist who likes to play with neural nets (which is good for us). He came up with a simple (in structure at least) but effective neural network called the <I>Hopfield Net</I>. It is used for autoassociation: you input a vector <I>x</I> and you get <I>x</I> back (hopefully). A Hopfield net is shown in Figure 10.0. It is a single-layer network with a number of neurodes equal to the number of inputs <I>X</I><SUB>i</SUB>. The network is fully connected, meaning that every neurode is connected to every other neurode and the inputs are also the outputs. This should strike you as weird since there is <I>feedback</I>. Feedback is one of the key features of the Hopfield net, and this feedback is the basis for the convergence to the correct result.
<P>The Hopfield network is an <I>iterative autoassociative memory</I>. This means that it may take one or more cycles to return the correct result (if at all). Let me clarify: the Hopfield network takes an input and then feeds it back, and the resulting output may or may not be the desired input. This feedback cycle may occur a number of times before the input vector is returned. Hence, a Hopfield network's functional sequence is: first we determine the weights based on the input vectors that we want to autoassociate, then we input a vector and see what comes out of the activations. If the result is the same as our original input, then we are done; if not, we take the result vector and feed it back through the network. Now let's take a look at the weight matrix and learning algorithm used for Hopfield nets.
<P>The learning algorithm for Hopfield nets is based on the Hebbian rule and is simply a summation of products. However, since the Hopfield network has a number of input neurons, the weights are no longer a single array or vector, but a collection of vectors that are most compactly contained in a single matrix. Thus, the weight matrix <I>W</I> for a Hopfield net is created based on this equation:
<BLOCKQUOTE>
<FONT COLOR=RED>
<P><I>Given:</I>
<UL>
<LI>Input vectors are in bipolar form <I>I</I> = (-1, 1, ..., -1, 1) and contain <I>k</I> elements.
<LI>There are <I>n</I> input vectors and we will refer to the set as <I>I</I> and the <I>j</I>th element as <I>I</I><SUB>j</SUB>.
<LI>Outputs will be referred to as <I>y</I><SUB>j</SUB> and there are <I>n</I> of them, one for each input <I>I</I><SUB>j</SUB>.
<LI>The weight matrix <I>W</I> is square and has dimension <I>k</I>x<I>k</I> since there are <I>k</I> inputs.
</UL>
<P><SPAN CLASS="maintext-2"><FONT COLOR="#000088"><I>Eq. 8.0</I></FONT></SPAN>
<P><I>W</I><SUB>(kxk)</SUB> = <font face="Symbol">S</font><SUB><I>i</I>=1</SUB><SUP><I>n</I></SUP> <I>I</I><SUB>i</SUB><SUP>t</SUP> x <I>I</I><SUB>i</SUB>
</FONT>
</BLOCKQUOTE>
<P>Note: each outer product will have dimension <I>k</I> x <I>k</I>, since we are multiplying a column vector and a row vector.
<P>and, <I>W</I><SUB>ii</SUB> = 0, for all<I> i</I>.
<P>Notice that there are no bias terms and the main diagonal of <I>W</I> must be all zeros. The weight matrix is simply the sum of the matrices generated by multiplying the transpose <I>I</I><SUB>i</SUB><SUP>t</SUP> x <I>I</I><SUB>i</SUB> for all <I>i</I> from 1 to <I>n</I>. This is almost identical to the Hebbian algorithm for a single neurode, except that instead of multiplying the input by the output, the input is multiplied by itself, which is equivalent to the output in the case of autoassociation. Finally, the activation function <I>f</I><SUB>h</SUB><I>(x)</I> is shown below:
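<P>As a rough C sketch of Eq. 8.0 (mine, with the made-up names K and N standing in for <I>k</I> and <I>n</I>), the weight matrix could be built like this:
<PRE>
/* Build a Hopfield weight matrix: sum the outer products of the bipolar
   input vectors with themselves, then zero the main diagonal (W_ii = 0).
   Illustrative sketch only. */
#define K 4   /* k: elements per input vector (= number of neurodes) */
#define N 3   /* n: number of input vectors to store                 */

void Hopfield_Build_Weights(int inputs[N][K], int W[K][K])
{
    int row, col, i;

    for (row = 0; row < K; row++)
        for (col = 0; col < K; col++)
        {
            W[row][col] = 0;

            /* sum of outer products: W = W + I_i^t x I_i */
            for (i = 0; i < N; i++)
                W[row][col] += inputs[i][row] * inputs[i][col];
        }

    /* W_ii = 0 for all i */
    for (i = 0; i < K; i++)
        W[i][i] = 0;
}
</PRE>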
<BLOCKQUOTE>
<SPAN CLASS="maintext-2"><FONT COLOR="#000088"><I>Eq. 9.0</I></FONT></SPAN>
<FONT COLOR=RED>
<P><I>f</I><SUB>h</SUB><I>(x)</I> = 1, if <I>x</I> >= 0<br>
<I>f</I><SUB>h</SUB><I>(x)</I> = 0, if <I>x</I> < 0
</FONT>
</BLOCKQUOTE>
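<P>Putting Eq. 9.0 to work, a single recall pass might look like the rough sketch below (again my own, reusing the K from the weight-matrix sketch above); the caller feeds the output back in as the next input until it stops changing:
<PRE>
/* One recall pass of a Hopfield net: multiply the binary input vector by
   the weight matrix and apply the step activation f_h of Eq. 9.0.
   Repeat the pass until the output no longer changes. Illustrative only. */
void Hopfield_Recall(int W[K][K], const int input[K], int output[K])
{
    int i, j;

    for (i = 0; i < K; i++)
    {
        int sum = 0;

        for (j = 0; j < K; j++)
            sum += W[i][j] * input[j];

        output[i] = (sum >= 0) ? 1 : 0;   /* f_h(x): 1 if x >= 0, else 0 */
    }
}
</PRE>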
<P><I>f</I><SUB>h</SUB><I>(x)</I> is a step function with a binary output. This means that the inputs must be binary, but didn't we already say that the inputs are bipolar? Well, they are, and they aren't. When the weight matrix is generated we convert all input vectors to bipolar, but for normal operation we use the binary version of the inputs, and the output of the Hopfield net will also be binary. This convention is not necessary, but it makes the network discussion a little simpler. Anyway, let's move on to an example. Say we want to create a four-node Hopfield net and we want it to recall these vectors:
<BLOCKQUOTE>
<FONT COLOR=RED>
<P><I>I</I><SUB>1</SUB>=(0,0,1,0), <I>I</I><SUB>2</SUB>=(1,0,0,0),
<I>I</I><SUB>3</SUB>=(0,1,0,1) Note: they are all orthogonal.
</FONT>
</BLOCKQUOTE>
<P>Converting to bipolar form (marked with a *), we have:
<BLOCKQUOTE>
<FONT COLOR=RED>
<P><I>I</I><SUB>1</SUB><SUP>*</SUP> = (-1,-1,1,-1), <I>I</I><SUB>2</SUB><SUP>*</SUP> = (1,-1,-1,-1), <I>I</I><SUB>3</SUB><SUP>*</SUP> = (-1,1,-1,1)
</FONT>
</BLOCKQUOTE>
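<P>The conversion itself is just 2<I>x</I> - 1 applied to each element; a tiny helper (my own, reusing K from the earlier sketch) could look like this:
<PRE>
/* Convert a binary vector (0/1) into bipolar form (-1/1): out = 2*in - 1.
   Illustrative helper only. */
void Binary_To_Bipolar(const int in[K], int out[K])
{
    int i;

    for (i = 0; i < K; i++)
        out[i] = 2 * in[i] - 1;
}
</PRE>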
<P>Now we need to compute <I>W</I><SUB>1</SUB>, <I>W</I><SUB>2</SUB>, <I>W</I><SUB>3</SUB>, where <I>W</I><SUB>i</SUB> is the product of the transpose of each input with itself.
<BLOCKQUOTE>
<FONT COLOR=RED>
<P><I>W</I><SUB>1</SUB>= [ <I>I</I><SUB>1</SUB><SUP>*t</SUP> x <I>I</I><SUB>1</SUB><SUP>*</SUP> ] = (-1,-1,1,-1)<SUP>t</SUP> x (-1,-1,1,-1) =
<table border="0" cellpadding="7" cellspacing="0" width="96">
<tr>
<td valign="top" width="25%">1</td>
<td valign="top" width="25%">1</td>
<td valign="top" width="25%">-1</td>
<td valign="top" width="25%">1</td>
</tr>
<tr>
<td valign="top" width="25%">1</td>
<td valign="top" width="25%">1</td>
<td valign="top" width="25%">-1</td>
<td valign="top" width="25%">1</td>
</tr>
<tr>
<td valign="top" width="25%">-1</td>
<td valign="top" width="25%">-1</td>
<td valign="top" width="25%">1</td>
<td valign="top" width="25%">-1</td>
</tr>
<tr>
<td valign="top" width="25%">1</td>
<td valign="top" width="25%">1</td>
<td valign="top" width="25%">-1</td>
<td valign="top" width="25%">1</td>
</tr>
</table>