<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"><html xmlns:mwsh="http://www.mathworks.com/namespace/mcode/v1/syntaxhighlight.dtd"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <!--This HTML is auto-generated from an M-file.To make changes, update the M-file and republish this document. --> <title>Modeling Lung Cancer Diagnosis Using Bayesian Network Inference</title> <meta name="generator" content="MATLAB 7.6"> <meta name="date" content="2007-12-03"> <meta name="m-file" content="lungbayesdemo"><style>body { background-color: white; margin:10px;}h1 { color: #990000; font-size: x-large;}h2 { color: #990000; font-size: medium;}/* Make the text shrink to fit narrow windows, but not stretch too far in wide windows. */ p,h1,h2,div.content div { max-width: 600px; /* Hack for IE6 */ width: auto !important; width: 600px;}pre.codeinput { background: #EEEEEE; padding: 10px;}@media print { pre.codeinput {word-wrap:break-word; width:100%;}} span.keyword {color: #0000FF}span.comment {color: #228B22}span.string {color: #A020F0}span.untermstring {color: #B20000}span.syscmd {color: #B28C00}pre.codeoutput { color: #666666; padding: 10px;}pre.error { color: red;}p.footer { text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;} </style></head> <body> <div class="content"> <h1>Modeling Lung Cancer Diagnosis Using Bayesian Network Inference</h1> <introduction> <p>This demo illustrates a simple Bayesian Network example for exact probabilistic inference using Pearl's message-passing algorithm.</p> </introduction> <h2>Contents</h2> <div> <ul> <li><a href="#1">Introduction</a></li> <li><a href="#3">Creating the Bayesian Network</a></li> <li><a href="#5">Visualizing the Bayesian Network as a Graph</a></li> <li><a href="#6">Initializing the Bayesian Network</a></li> <li><a href="#16">Expanding the Network</a></li> <li><a href="#19">Drawing the Expanded Network</a></li> <li><a href="#20">Performing Exact Inference 
on Clustered Trees</a></li> </ul> </div> <h2>Introduction<a name="1"></a></h2> <p>Bayesian networks (or belief networks) are probabilistic graphical models that represent a set of variables and their conditional dependencies. Their graphical structure, together with their ability to describe uncertainty in complex relationships compactly, makes them a flexible tool for modelling many kinds of data. </p> <p>Consider the following example, a simplified model to help diagnose patients arriving at a respiratory clinic. A history of smoking has a direct influence both on whether a patient has bronchitis and on whether a patient has lung cancer. In turn, the presence or absence of lung cancer has a direct influence on the result of a chest x-ray test. We are interested in probabilistic inference involving variables that are not directly related, for which the conditional probability cannot be obtained from a single, direct application of Bayes' theorem. </p> <h2>Creating the Bayesian Network<a name="3"></a></h2> <p>A Bayesian network consists of a directed acyclic graph (DAG) in which every node represents a variable and every edge represents a direct dependency between two variables. We construct this graph by specifying an adjacency matrix whose element in row <i>i</i> and column <i>j</i> contains the number of edges directed from node <i>i</i> to node <i>j</i>. The variables of the model correspond to the graph's nodes: <tt>S</tt> (smoking history), <tt>B</tt> (bronchitis), <tt>L</tt> (lung cancer) and <tt>X</tt> (chest x-ray). The variables are discrete and take only two values: true (<tt>t</tt>) or false (<tt>f</tt>). 
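</p>
<p>Every edge points from a parent to a child, so the adjacency matrix also fixes how the joint distribution factorizes: the joint probability of a Bayesian network is the product of each variable's probability given its parents. The short sketch below is not part of the original demo. It reads the parents of each node from the columns of <tt>adj</tt> and prints the factorization that this graph implies. It assumes a MATLAB release that provides <tt>strjoin</tt>, and the names <tt>parentIdx</tt>, <tt>factors</tt>, <tt>factorization</tt> and <tt>numParams</tt> are illustrative only.</p><pre class="codeinput">% Hypothetical sketch: read each node's parents from the adjacency matrix and
% build the chain-rule factorization of the joint distribution it implies.
adj = [0 1 1 0; 0 0 0 0; 0 0 0 1; 0 0 0 0];   % adj(i,j) = 1: edge from node i to node j
nodeNames = {'S', 'B', 'L', 'X'};
n = numel(nodeNames);
factors = cell(1, n);
for j = 1:n
    parentIdx = find(adj(:, j))';              % parents of node j: nonzero rows of column j
    if isempty(parentIdx)
        factors{j} = sprintf('P(%s)', nodeNames{j});   % root node: prior only
    else
        factors{j} = sprintf('P(%s|%s)', nodeNames{j}, strjoin(nodeNames(parentIdx), ','));
    end
end
factorization = strjoin(factors, ' ');
disp(['P(S,B,L,X) = ' factorization])

% Binary variables: one free parameter per table row, i.e. 2^(number of parents) per node.
numParams = sum(2 .^ sum(adj, 1));
</pre><p>For this graph the sketch displays <tt>P(S,B,L,X) = P(S) P(B|S) P(L|S) P(X|L)</tt>. Because every variable is binary, these four tables need only <tt>numParams</tt> = 1 + 2 + 2 + 2 = 7 free parameters, compared with 2^4 - 1 = 15 for an unstructured table over all four variables. 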
</p><pre class="codeinput"><span class="comment">%=== setup</span>
adj = [0 1 1 0; 0 0 0 0; 0 0 0 1; 0 0 0 0]; <span class="comment">% adjacency matrix</span>
nodeNames = {<span class="string">'S'</span>, <span class="string">'B'</span>, <span class="string">'L'</span>, <span class="string">'X'</span>}; <span class="comment">% nodes</span>
S = 1; B = 2; L = 3; X = 4; <span class="comment">% node identifiers</span>
n = numel(nodeNames); <span class="comment">% number of nodes</span>
t = 1; f = 2; <span class="comment">% true and false</span>
values = cell(1,n); <span class="comment">% values assumed by variables</span>
<span class="keyword">for</span> i = 1:n
    values{i} = [t f];
<span class="keyword">end</span>
</pre><p>In addition to the graph structure, we need to specify the parameters of the model, namely the conditional probability distribution of each node given its parents. For discrete variables, this distribution can be represented as a table (Conditional Probability Table, <tt>CPT</tt>) that lists the probability of each value of the node for every combination of values of its parents. In the code below, row <i>i</i> of a table corresponds to the parent taking value <i>i</i> and column <i>j</i> to the node taking value <i>j</i>; for example, <tt>CPT{B}(t,t) = 0.25</tt> is <tt>P(B=t|S=t)</tt>. The root node <tt>S</tt> has no parents, so its table is simply the prior <tt>[P(S=t) P(S=f)]</tt>. </p><pre class="codeinput"><span class="comment">%=== Conditional Probability Table</span>
CPT{S} = [.2 .8]; <span class="comment">% P(S=t), P(S=f)</span>
CPT{B}(:,t) = [.25 .05]; CPT{B}(:,f) = 1 - CPT{B}(:,t); <span class="comment">% P(B=t|S=t), P(B=t|S=f)</span>
<span class="comment">% CPT{L}(:,t) = [.03 .0005]; CPT{L}(:,f) = 1 - CPT{L}(:,t);</span>
CPT{L}(:,t) = [.3 .005]; CPT{L}(:,f) = 1 - CPT{L}(:,t); <span class="comment">% P(L=t|S=t), P(L=t|S=f)</span>
CPT{X}(:,t) = [.6 .02]; CPT{X}(:,f) = 1 - CPT{X}(:,t); <span class="comment">% P(X=t|L=t), P(X=t|L=f)</span>
</pre><h2>Visualizing the Bayesian Network as a Graph<a name="5"></a></h2> <p>We can visualize the network structure using the <tt>biograph</tt> object from the Bioinformatics Toolbox. The properties of nodes and edges can be changed as desired. 
</p><pre class="codeinput"><span class="comment">%=== draw the network</span>
nodeLabels = {<span class="string">'Smoking'</span>, <span class="string">'Bronchitis'</span>, <span class="string">'Lung Cancer'</span>, <span class="string">'Abnormal Xrays'</span>};
bg = biograph(adj, nodeLabels, <span class="string">'arrowsize'</span>, 4);
set(bg.Nodes, <span class="string">'shape'</span>, <span class="string">'ellipse'</span>);
bgInViewer = view(bg);

<span class="comment">%=== save as figure</span>
bgFig = figure;
copyobj(bgInViewer.hgAxes,bgFig)

<span class="comment">%=== annotate using the CPT</span>
[xp, xn] = find(adj); <span class="comment">% xp = parent id, xn = node id</span>
pa(xn) = xp; <span class="comment">% parents (each non-root node has exactly one parent here)</span>
pa(1) = 1; <span class="comment">% root is parent of itself</span>
s1 = cell(1,n); s2 = cell(1,n); pos = zeros(n,2);
<span class="keyword">for</span> i = 2:n
    pos(i,:) = bgInViewer.Nodes(i).Position;
    s1{i} = sprintf(<span class="string">'P(%s|%s=t) = %f'</span>, nodeNames{i}, nodeNames{pa(i)}, CPT{i}(1,t));
    s2{i} = sprintf(<span class="string">'P(%s|%s=f) = %f'</span>, nodeNames{i}, nodeNames{pa(i)}, CPT{i}(2,t));
<span class="keyword">end</span>
pos(1,:) = bgInViewer.Nodes(1).Position; <span class="comment">% root</span>
s1{1} = sprintf(<span class="string">'P(%s=t) = %f'</span>, nodeNames{1}, CPT{1}(1));
s2{1} = <span class="string">' '</span>;
text(pos(:,1)+2, pos(:,2)-10, s1)
text(pos(:,1)+2, pos(:,2)-15, s2)
</pre><img vspace="5" hspace="5" src="lungbayesdemo_01.png"> <img vspace="5" hspace="5" src="lungbayesdemo_02.png"> <h2>Initializing the Bayesian Network<a name="6"></a></h2> <p>The process of computing the probability distribution of variables given specific evidence is called probabilistic inference. By exploiting local independencies among nodes, Pearl [1] developed a message-passing algorithm for exact inference in singly connected networks (polytrees). 
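</p>
<p>For a connected network like this one, being <i>singly connected</i> means that the undirected version of the graph (its skeleton) is a tree, with exactly one path between any two nodes. This holds exactly when the skeleton is connected and has <i>n</i>-1 edges. The sketch below is an illustrative check written for this page and is not one of the demo's functions. The anonymous functions <tt>skeleton</tt>, <tt>nEdges</tt>, <tt>connected</tt> and <tt>isPolytree</tt> are hypothetical names. Connectivity is tested with the matrix power (<i>I</i> + <i>U</i>)<sup><i>n</i>-1</sup>, where <i>U</i> is the skeleton's adjacency matrix. Entry (<i>i</i>,<i>j</i>) of this power is nonzero exactly when node <i>j</i> can be reached from node <i>i</i> in at most <i>n</i>-1 steps. A hypothetical extra edge from <tt>B</tt> to <tt>X</tt> would create the undirected loop S-B-X-L-S, and the network would then no longer be singly connected.</p><pre class="codeinput">% Illustrative check (hypothetical helper names): is the DAG a connected polytree?
skeleton   = @(A) double((A + A') ~= 0);              % undirected adjacency U
nEdges     = @(A) nnz(skeleton(A)) / 2;               % each undirected edge appears twice in U
connected  = @(A) all(all((eye(size(A, 1)) + skeleton(A))^(size(A, 1) - 1) ~= 0));
isPolytree = @(A) all([nEdges(A) == size(A, 1) - 1, connected(A)]);

adj = [0 1 1 0; 0 0 0 0; 0 0 0 1; 0 0 0 0];          % edges: S to B, S to L, L to X
adjLoop = adj;
adjLoop(2, 4) = 1;                                    % hypothetical extra edge B to X
disp(isPolytree(adj))       % displays 1: the skeleton S-B, S-L, L-X is a tree
disp(isPolytree(adjLoop))   % displays 0: S-B-X-L-S is an undirected loop
</pre><p>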
The algorithm computes the conditional probability of any variable given any set of evidence by propagating beliefs between neighboring nodes. For more information about the message-passing algorithm, see [2]. We can create and initialize a Bayesian network for the example under consideration as follows: </p><pre class="codeinput">root = find(sum(adj,1)==0); <span class="comment">% root is any node with no parent</span>
[nodes, edges] = bnMsgPassCreate(adj, values, CPT);
[nodes, edges] = bnMsgPassInitiate(nodes, edges, root)
</pre><pre class="codeoutput">nodes = 

4x1 struct array with fields:
    id
    values
    parents
    children
    peye
    lambda
    CPT
    P

edges = 

4x4 struct array with fields:
    peyeX
    lambdaX
</pre><p>The algorithm parameters, including the conditional probability of each node given the evidence, are stored in the fields of the MATLAB structures <tt>nodes</tt> and <tt>edges</tt>. Using the function <tt>customnodedraw</tt>, we can display each node's probability distribution, given an empty set of evidence, as a pie chart, as shown below. 
</p><pre class="codeinput"><span class="comment">%=== P(node = t) given the empty evidence set []</span>
<span class="keyword">for</span> i = 1:n
    disp([<span class="string">'P('</span> nodeNames{i}, <span class="string">'|[]) = '</span> num2str(nodes(i).P(1))]);
<span class="keyword">end</span>

<span class="comment">%=== assign relevant info to each node handle</span>
nodeHandles = bgInViewer.Nodes;
<span class="keyword">for</span> i = 1:n
    nodeHandles(i).UserData.Distribution = [nodes(i).P];
<span class="keyword">end</span>

<span class="comment">%=== draw customized nodes</span>
bgInViewer.ShowTextInNodes = <span class="string">'none'</span>;
set(nodeHandles, <span class="string">'shape'</span>,<span class="string">'circle'</span>)
bgInViewer.CustomNodeDrawFcn = @(node) customnodedraw(node);
<span class="comment">%bgInViewer.Scale = .7</span>
bgInViewer.dolayout
</pre><pre class="codeoutput">P(S|[]) = 0.2
P(B|[]) = 0.09
P(L|[]) = 0.064
P(X|[]) = 0.05712
</pre><img vspace="5" hspace="5" src="lungbayesdemo_03.png"> <p>Suppose we are interested in evaluating the probability that a patient with bronchitis has lung cancer. 
We instantiate <tt>B=t</tt> (true) and update the network as follows: </p><pre class="codeinput"><span class="comment">%=== inference with B = t</span>
evNode = B;
evValue = t;
[n1, e1, A1, a1] = bnMsgPassUpdate(nodes, edges, [], [], evNode, evValue);
<span class="keyword">for</span> i = 1:n
    disp([<span class="string">'P('</span> nodeNames{i}, <span class="string">'|B=t) = '</span> num2str(n1(i).P(1))]);
<span class="keyword">end</span>

<span class="comment">%== plot and compare</span>
figure(); subplot(2,1,1);
x = cat(1,nodes.P);
bar(x, <span class="string">'stacked'</span>); set(gca, <span class="string">'xticklabel'</span>, nodeNames);
ylabel(<span class="string">'Probability'</span>);
title(<span class="string">'Initialized network with empty evidence set'</span>)
legend({<span class="string">'true'</span>, <span class="string">'false'</span>}, <span class="string">'location'</span>, <span class="string">'SouthEastOutside'</span>)
hold <span class="string">on</span>; subplot(2,1,2);
x1 = cat(1,n1.P);
bar(x1, <span class="string">'stacked'</span>); set(gca, <span class="string">'xticklabel'</span>, nodeNames);
ylabel(<span class="string">'Probability'</span>);
title(<span class="string">'Updated network with evidence B=true'</span>)
legend({<span class="string">'true'</span>, <span class="string">'false'</span>}, <span class="string">'location'</span>, <span class="string">'SouthEastOutside'</span>)
</pre><pre class="codeoutput">P(S|B=t) = 0.55556
P(B|B=t) = 1
P(L|B=t) = 0.16889
P(X|B=t) = 0.11796
</pre><img vspace="5" hspace="5" src="lungbayesdemo_04.png"> <p>With the observation that the patient has bronchitis (<tt>B = t</tt>), the probability that each of the other variables is true increases. In particular, the probability of a smoking history rises from 0.2 to about 0.56. In this model smokers are five times more likely to have bronchitis than non-smokers (0.25 versus 0.05), which reflects the fact that smoking is a leading cause of chronic bronchitis. In turn, because smoking also raises the probability of lung cancer (0.3 versus 0.005), <tt>P(L=t)</tt> increases from 0.064 to about 0.169. The probability of an abnormal chest x-ray rises accordingly, from about 0.057 to about 0.118. 
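</p>
<p>The numbers above can be reproduced by hand, which shows the direction in which the evidence propagates. The following sketch is not the demo's <tt>bnMsgPassUpdate</tt>. It is a hypothetical version of Pearl's updates, specialized by hand to this particular four-node tree and written in base MATLAB. The matrices <tt>pS</tt>, <tt>pBS</tt>, <tt>pLS</tt> and <tt>pXL</tt> hold the same values as <tt>CPT</tt> under illustrative names. The evidence <tt>B = t</tt> first reaches <tt>S</tt> as a likelihood (lambda) message. The updated belief about <tt>S</tt> is then passed down to <tt>L</tt> and on to <tt>X</tt> as predictive (pi) messages. The final part cross-checks the result by summing the joint distribution over every state consistent with the evidence, which is practical only because the network is tiny.</p><pre class="codeinput">% Hand-worked sketch (assumption: specialized to this 4-node tree, not the
% demo's general bnMsgPassUpdate). Rows index the parent's value (t, f);
% columns index the child's value (t, f), matching the CPT layout above.
pS  = [.2 .8];               % P(S)
pBS = [.25 .75; .05 .95];    % P(B|S)
pLS = [.3 .7; .005 .995];    % P(L|S)
pXL = [.6 .4; .02 .98];      % P(X|L)

% 1) Evidence B = t: B sends S the likelihood message lambda_B(S) = P(B=t|S).
lambdaB = pBS(:, 1)';        % [0.25 0.05]

% 2) Belief at S: prior times incoming likelihood, normalized.
belS = pS .* lambdaB;
belS = belS / sum(belS);     % P(S|B=t)

% 3) S sends L its causal message pi_L(S). Nothing below L is observed, so
%    L's own lambda message to S is all ones and pi_L(S) equals belS.
belL = belS * pLS;           % P(L|B=t) = sum over s of P(s|B=t) P(L|s)

% 4) L passes its belief down to X in the same way.
belX = belL * pXL;           % P(X|B=t)

fprintf('P(S|B=t) = %.5f\n', belS(1));
fprintf('P(L|B=t) = %.5f\n', belL(1));
fprintf('P(X|B=t) = %.5f\n', belX(1));

% Cross-check by brute-force enumeration of
% P(s,b,l,x) = P(s) P(b|s) P(l|s) P(x|l), keeping only states with b = t.
num = zeros(1, 3);           % joint mass with S=t, L=t, X=t (each together with B=t)
Z = 0;                       % total mass with B=t, i.e. P(B=t)
for si = 1:2
    for li = 1:2
        for xi = 1:2
            p = pS(si) * pBS(si, 1) * pLS(si, li) * pXL(li, xi);
            Z = Z + p;
            num = num + p * [si == 1, li == 1, xi == 1];
        end
    end
end
posterior = num / Z;         % should equal [belS(1) belL(1) belX(1)]
</pre><p>Run on its own, the sketch prints <tt>P(S|B=t) = 0.55556</tt>, <tt>P(L|B=t) = 0.16889</tt> and <tt>P(X|B=t) = 0.11796</tt>, matching the output of <tt>bnMsgPassUpdate</tt> above, and <tt>posterior</tt> agrees with these values up to rounding. Enumeration visits every joint state, so its cost grows exponentially with the number of variables; avoiding that cost is the point of message passing on larger networks. 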
</p> <p>Suppose instead that the patient has not been evaluated for bronchitis, but the chest x-ray shows some abnormalities. We instantiate <tt>X = t</tt> and update the initialized network with this new evidence. Because the update starts again from <tt>nodes</tt> and <tt>edges</tt>, the earlier evidence <tt>B = t</tt> is not included. </p><pre class="codeinput"><span class="comment">%=== inference with X = t</span>
evNode = X;
evValue = t;
[n2, e2, A2, a2] = bnMsgPassUpdate(nodes, edges, [], [], evNode, evValue);
<span class="keyword">for</span> i = 1:n
    disp([<span class="string">'P('</span> nodeNames{i}, <span class="string">'|X=t) = '</span> num2str(n2(i).P(1))]);
<span class="keyword">end</span>
</pre><pre class="codeoutput">P(S|X=t) = 0.67927