Data Warehousing and Data Mining: MATLAB implementations of selected data mining algorithms, c4_5.htm
discrete_dim(dims), Uc);<BR> in =
indices(find(features(dim, indices)
> tree.split_loc));<BR> targets =
targets + use_tree(features(dims, :), in, tree.child(2),
discrete_dim(dims), Uc);<BR>else<BR> %Discrete
feature<BR> Uf = unique(features(dim,:));<BR>for i
= 1:length(Uf),<BR> in =
indices(find(features(dim, indices) ==
Uf(i)));<BR> targets =
targets + use_tree(features(dims, :), in, tree.child(i),
discrete_dim(dims), Uc);<BR>
end<BR>end<BR> <BR>%END use_tree
<BR><BR>function tree = make_tree(features, targets, inc_node,
discrete_dim, maxNbin, base)<BR>%Build a tree
recursively<BR><BR>[Ni, L] =
size(features);<BR>Uc
= unique(targets);<BR>tree.dim = 0;<BR>%tree.child(1:maxNbin)
= zeros(1,maxNbin);<BR>tree.split_loc = inf;<BR><BR>if
isempty(features),<BR> return<BR>end<BR><BR>%When
to stop: If the dimension is one or the number of examples is
small<BR>if ((inc_node > L) || (L == 1) || (length(Uc) ==
1)),<BR> H = hist(targets,
length(Uc));<BR> [m, largest] =
max(H);<BR> tree.child =
Uc(largest);<BR> return<BR>end<BR><BR>%Compute the
node's impurity I<BR>for i =
1:length(Uc),<BR> Pnode(i) =
length(find(targets == Uc(i))) / L;<BR>end<BR>Inode =
-sum(Pnode.*log(Pnode)/log(2));<BR><BR>%For each dimension,
compute the gain ratio impurity<BR>%This is done separately
for discrete and continuous
features<BR>delta_Ib = zeros(1,
Ni);<BR>split_loc = ones(1, Ni)*inf;<BR><BR>for i =
1:Ni,<BR> data = features(i,:);<BR>
Ud = unique(data);<BR> Nbins = length(Ud);<BR> if
(discrete_dim(i)),<BR> %This
is a discrete feature<BR> P = zeros(length(Uc),
Nbins);<BR> for j =
1:length(Uc),<BR>
for k =
1:Nbins,<BR> %Match the k-th observed value rather than the bin index k<BR> indices
= find((targets == Uc(j)) & (data ==
Ud(k)));<BR> P(j,k)
=
length(indices);<BR>
end<BR> end<BR> Pk =
sum(P);<BR> P
=
P/L;<BR> Pk =
Pk/sum(Pk);<BR> info =
sum(-P.*log(eps+P)/log(2));<BR> delta_Ib(i)
=
(Inode-sum(Pk.*info))/-sum(Pk.*log(eps+Pk)/log(2));<BR>
else<BR> %This is a
continuous feature<BR> P =
zeros(length(Uc),
2);<BR> <BR> %Sort
the
features<BR> [sorted_data,
indices] =
sort(data);<BR> sorted_targets
=
targets(indices);<BR> <BR> %Calculate
the information for each possible
split<BR> I = zeros(1,
L-1);<BR> for j =
1:L-1,<BR> for
k
=1:length(Uc),<BR> P(k,1)
= length(find(sorted_targets(1:j) ==
Uc(k)));<BR> P(k,2)
= length(find(sorted_targets(j+1:end) ==
Uc(k)));<BR>
end<BR> Ps =
sum(P)/L;<BR>
P = P/L;<BR>
info =
sum(-P.*log(eps+P)/log(2));<BR>
I(j) = Inode - sum(info.*Ps);
<BR> end<BR> [delta_Ib(i),
s] = max(I);<BR>split_loc(i) =
sorted_data(s); <BR>
end<BR>end<BR><BR>%Find the dimension maximizing delta_Ib
<BR>[m, dim] = max(delta_Ib);<BR>dims = 1:Ni;<BR>tree.dim =
dim;<BR><BR>%Split along the 'dim' dimension<BR>Nf =
unique(features(dim,:));<BR>Nbins = length(Nf);<BR>if
(discrete_dim(dim)),<BR> %Discrete
feature<BR> for i =
1:Nbins,<BR> indices
= find(features(dim, :) ==
Nf(i));<BR> tree.child(i) =
make_tree(features(dims, indices), targets(indices), inc_node,
discrete_dim(dims), maxNbin, base);<BR>
end<BR>else<BR> %Continuous
feature<BR> tree.split_loc =
split_loc(dim);<BR> indices1 =
find(features(dim,:) <= split_loc(dim));<BR>
indices2 = find(features(dim,:) >
split_loc(dim));<BR> tree.child(1) =
make_tree(features(dims, indices1), targets(indices1),
inc_node, discrete_dim(dims), maxNbin, base);<BR>
tree.child(2) = make_tree(features(dims, indices2),
targets(indices2), inc_node, discrete_dim(dims),
maxNbin, base);<BR>end</TD></TR></TBODY></TABLE><BR>
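<P>The comments in the listing mention the node impurity and a "gain ratio impurity" without writing the formulas out. As a reference sketch, the textbook C4.5 quantities are given below. The notation is introduced here for illustration only and does not correspond to variable names in the code: L is the number of examples at the node, n_c the number in class c, n_b the number sent to branch b, and n_{c,b} the number in class c within branch b.</P>

```latex
\begin{align}
I_{\text{node}} &= -\sum_{c} \frac{n_c}{L}\,\log_2 \frac{n_c}{L}, \\
H_b &= -\sum_{c} \frac{n_{c,b}}{n_b}\,\log_2 \frac{n_{c,b}}{n_b}, \\
\text{Gain} &= I_{\text{node}} - \sum_{b} \frac{n_b}{L}\,H_b, \\
\text{SplitInfo} &= -\sum_{b} \frac{n_b}{L}\,\log_2 \frac{n_b}{L}, \\
\text{GainRatio} &= \frac{\text{Gain}}{\text{SplitInfo}}.
\end{align}
```

<P>Two points where the listing differs from this sketch: it divides the per-branch class counts by the node total L rather than by the branch size n_b, so its info term is not the conditional entropy H_b; and for continuous features it picks the threshold by the gain alone, without dividing by the split information.</P>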
</TD></TR></TBODY></TABLE><BR><BR>
<STYLE type=text/css>A.categorylink:link {
COLOR: #999999
}
A.categorylink:visited {
COLOR: #999999
}
A.categorylink:active {
COLOR: #999999
}
A.categorylink:hover {
COLOR: #ff9900
}
</STYLE>
<TABLE style="TABLE-LAYOUT: fixed; WORD-BREAK: break-all" cellSpacing=1
cellPadding=3 width="98%" bgColor=#cccccc border=0>
<TBODY>
<TR bgColor=#f8f8f8>
<TD>
<P><FONT size=4><STRONG>Re: MATLAB implementations of selected data mining algorithms: C4_5<A
name=47724></A></STRONG></FONT><BR><A class=categorylink
href="http://blogger.org.cn/blog/list.asp?classid=46"
target=_blank>Online resources</A>, <A class=categorylink
href="http://blogger.org.cn/blog/list.asp?classid=4"
target=_blank>Notes</A></P>
<P>111 (guest) commented on 2007-3-25 14:55:53</P></TD></TR>
<TR bgColor=#ffffff>
<TD height=0>
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>Yes, what does high_histogram do? Could you post it as well? I need it urgently, thanks.</TD></TR></TBODY></TABLE><BR>
</TD></TR></TBODY></TABLE><BR><BR>
<TABLE style="TABLE-LAYOUT: fixed; WORD-BREAK: break-all" cellSpacing=1
cellPadding=3 width="98%" bgColor=#cccccc border=0>
<TBODY>
<TR bgColor=#f8f8f8>
<TD>
<P><FONT size=4><STRONG>Re: MATLAB implementations of selected data mining algorithms: C4_5<A
name=23715></A></STRONG></FONT><BR><A class=categorylink
href="http://blogger.org.cn/blog/list.asp?classid=46"
target=_blank>Online resources</A>, <A class=categorylink
href="http://blogger.org.cn/blog/list.asp?classid=4"
target=_blank>Notes</A></P>
<P>tt (guest) commented on 2006-5-12 11:50:37</P></TD></TR>
<TR bgColor=#ffffff>
<TD height=0>
<TABLE cellSpacing=0 cellPadding=0 width="100%" border=0>
<TBODY>
<TR>
<TD>
<P>I have PCA, but I don't know how to use it in practice. For example, how should inc_node and region be set in C4_5?</P>
<P><A
href="mailto:dyj_115@sina.com">dyj_115@sina.com</A></P></TD></TR></TBODY></TABLE><BR>
</TD></TR></TBODY></TABLE><BR><BR>