function [Fhat,Ehat,Et] = bnBestFit(X,Y,w,k,F,bi,Xt,Yt,wt)
% [Fhat,Ehat,Et] = bnBestFit(X,Y,w,k,F,bi,Xt,Yt,wt) - Best-Fit inference
%
% bnBestFit performs a regulatory network inference for a set of variables
% (genes) under the Boolean network model. The function returns the
% Best-Fit function and the corresponding (non-normalized) error-size for
% all input variable combinations in X and for all the target variables in
% Y. That is, rows in X correspond to predictor variables, and rows in Y
% correspond to target variables. The i:th variable at the j:th sample (or
% time point), Y(i,j), is predicted based on the value of the predictor
% variables at the same sample X(:,j). Note that if unity weights (defined
% in w) are used for all the samples then the found functions are equal to
% the ones which minimize the resubstitution error on the sample data (i.e.,
% the histogram rule for discrete data). In such a case, the corresponding
% error-sizes in Ehat are the (non-normalized) minimum resubstitution
% errors, i.e., the number of errors/misclassifications. In case of a tie, a
% randomly selected function with minimum error-size is returned. If
% additional test data sets (Xt, Yt, and wt) are provided then the
% error-size of the found Best-Fit functions on the test data is computed
% as well. (This can be useful in the case of different cross-validation
% and bootstrap experiments.)
%
% INPUT:
% X - Binary input matrix. X(i,:) corresponds to the (binary) values of
% the i:th predictor variable. Correspondingly, X(:,j) represents
% the values of the predictor variables for the j:th sample (or the
% j:th time point).
% Y - Binary output matrix. Y(i,:) corresponds to the (binary) values
% of i:th target variable. Correspondingly, Y(:,j) represents the
% values of the target variables for the j:th sample. In other
% words, Y(i,j) is the value of the i:th target variable in the
% j:th sample, i.e., the bit that is to be predicted based on
% X(:,j) (this holds for all i and j).
% w - Row vector of positive weights for the measurements in X and Y.
% For now, the implementation allows only a single weight per
% column of X (and Y). In particular, the weight w(i) applies to
% the i:th input vector (and the corresponding output).
% k - The (maximum) number of variables in the predictor functions,
% i.e., indegree.
% F - The set of Boolean predictors to be used in the inference. If F
% is the empty matrix, then the function class is considered to
% contain all k-variable Boolean functions. F is either a
% (2^k)-by-nf binary matrix or 1-by-nf row vector of integers (see
% also the description of the next argument bi), where k is the
% number of variables in each function and nf is the number of
% functions.
% 1. The case of (2^k)-by-nf binary matrix: Let f = F(:,j) be
% the j:th column of F (i.e., the j:th truth table in F).
% Then, f(1) defines the output value for the input vector
% 00...00, f(2) for the input 00...01, f(3) for the input
% 00...10, ..., and f(2^k) for the input 11...11. Input
% vectors are interpreted such that the leftmost bit defines
% the value of the first input variable, the second bit from
% the left defines the value of the second input variable,
% ..., and the rightmost bit defines the value of the last
% (k:th) input variable.
% 2. The case of 1-by-nf row vector of integers: Let f = F(j) be
% the j:th element of F. Then, the first bit (as obtained by
% the bitget command bitget(f,1)) defines the output value
% for the input vector 00...00, the second bit bitget(f,2)
% defines the output for the input 00...01, ..., and the
% 2^k:th bit bitget(f,2^k) defines the output for the input
% 11...11. Thus, the regular truth table presentation of the
% j:th function can be obtained by f = bitget(F(j),[1:2^k])'.
% Note that only the cases k<=5 can be handled by this
% convention.
% bi - A bit (0/1) indicating whether the Boolean functions in F are
% represented as standard binary truth tables (0) or as (encoded)
% integers (1). (This can be used to distinguish between constant
% functions and the integer representations.)
% Xt - [Optional] Input data for a separate test data. Format is the
% same as for the matrix X (see above).
% Yt - [Optional] Output data for a separate test data. Format is the
% same as for the matrix Y (see above).
% wt - [Optional] Weights for a separate test data. Format is the same
% as for the vector w (see above).
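%
% NOTE ON THE ENCODING OF F (an illustrative sketch, not part of the
% original documentation; the XOR function is chosen only as an example):
% The two-variable XOR function has the truth table [0;1;1;0] for the
% inputs 00, 01, 10 and 11 (case 1 above). Under the integer convention
% (case 2 above) the same function is the integer 6, because
%   t = bitget(6,1:4)';    % truth table [0;1;1;0] from the integer
%   f = t'*2.^(0:3)';      % back to the integer 6 from the truth table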
%
% OUTPUT:
% Fhat - A 3-D binary matrix of the Best-Fit functions for each input
% variable combination and for all target variables. Fhat has size
% (2^k)-by-nchoosek(n,k)-by-ni, where n is the size of the first
% dimension of matrix X (i.e., the number of predictor variables)
% and ni is the number of target variables. Fhat(:,:,i) defines the
% Best-Fit functions for the i:th node. In particular, Fhat(:,j,i)
% defines the Best-Fit function for the i:th variable and for the
% j:th variable combination (the j:th variable combination
% corresponds to the variables on the j:th row of the matrix
% nchoosek([1:n],k)). Each column in Fhat(:,:,i) is interpreted like
% the columns of the binary matrix F (see case 1 above).
% Ehat - The error-size of the Best-Fit function for all input variable
% combinations and for all the target variables. Ehat has size
% nchoosek(n,k)-by-ni. Thus, Ehat(i,j) is the error-size of the
% Best-Fit function for the i:th input variable combination and for the
% j:th target node.
% Et - This variable is returned only if Xt, Yt and wt are present in
% the input. The error-size of the Best-Fit function for all input
% variable combinations and for all the target variables on the
% separate test data.
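%
% EXAMPLE (a minimal sketch with made-up data, not from the original
% author; it exercises only the unconstrained case F = []):
%   X = [0 0 1 1; 0 1 0 1];            % two predictors, four samples
%   Y = double(xor(X(1,:),X(2,:)));    % one target: XOR of the predictors
%   w = ones(1,4);                     % unity weights (row vector)
%   [Fhat,Ehat] = bnBestFit(X,Y,w,2,[],0);
% With k = 2 there is a single variable combination, [1 2]. Every input
% pattern occurs exactly once, so there are no ties, and one would expect
% Fhat(:,1,1) to be the XOR truth table [0;1;1;0] with Ehat equal to 0.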
% 03.04.2003 by Harri Lähdesmäki, modified from bnBestFit.
% Modified: May 14, 2003 by HL.
% 25/08/2005 by HL.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% Define and initialize some variables.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[n,m] = size(X); % The number of predictor genes and the number of measurements.
ni = size(Y,1); % The number of (target) genes.
b = 2.^[k-1:-1:0]'; % Powers of two (used in binary-to-decimal conversions).
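% Illustration (values chosen only as an example): for k = 3, b = [4;2;1],
% so the input column [1;0;1] gives [1 0 1]*b = 5, and the corresponding
% (one-based) truth table row is 5 + 1 = 6. The leftmost bit is thus the
% most significant one, in line with the truth table convention of F.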
W = ones(ni,1)*w; % Weights in a matrix form (assume w is a row vector).
kk = 2^k; % Two to the power of k (needed often).
combnum = nchoosek(n,k); % The number of different variable combinations.
% Generate all variable combinations in advance. This will work only for
% moderately small data sets. If one wants to use larger data sets, then
% the input variable combinations can be generated using the function
% nextnchoosek.m (e.g., given an input variable combination, I =
% nextnchoosek(I,n); generates the next variable combination in
% lexicographical order).
if combnum>20000 % Limit the number of possible combinations.
error('Too many variable combinations. Modify the code a little bit...')
end % if combnum>20000
IAll = nchoosek([1:n],k);
% Modify the variables below if only a subset of all combinations are to be
% checked (e.g. in the case of parallelizing the code...)
starti = 1;
stopi = combnum;
% Initialize the output matrix/matrices.
Fhat = zeros(kk,combnum,ni);
Ehat = zeros(combnum,ni);
% Check whether additional test data is available.
TestBit = 0;
if nargin==9
Et = zeros(combnum,ni);
Wt = ones(ni,1)*wt; % Wt = repmat(wt,ni,1); test weights in matrix form.
TestBit = 1;
end % if nargin==9
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% The main loop separately for unconstrained and constrained case.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% If F is an empty matrix, then the function class is considered to contain
% all k-variable Boolean functions (unconstrained case).
if isempty(F)
% Two times two to the power of k (needed often).
kkk = 2*kk;
% This matrix (C01) has the role of c^(0) and c^(1) for all interesting
% genes. Further, C01 = [c^(0),c^(1)];
C01 = zeros(ni,kkk);
%sC01 = size(C01);
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% Run through all variable combinations.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
for i=starti:stopi
% The current variable combinations (in lexicographical ordering).
I = IAll(i,:);
% Initialize again.
C01 = zeros(ni,kkk);
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% Run through all measurements.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% This loop also takes into account possible multiplicities in the
% measurements, i.e., it accumulates the weights of those measurements
% that appear several times in T (and/or F).
for j=1:m
% The current input as a decimal number to be used to index the
% matrix C01.
dn = X(I,j)'*b + 1;
%dn = sum(bitset(0,t1(logical(D(I,IO(1,j)))))) + 1;
%dn = binarr2dec(D(I,IO(1,j))',b) + 1;
% Update C01 (c^(0) and c^(1)). First update the left half (C0, samples
% with output 0) and then the right half (C1, samples with output 1).
C01(logical(1-Y(:,j)),dn) = C01(logical(1-Y(:,j)),dn) + w(j);
C01(logical(Y(:,j)),kk+dn) = C01(logical(Y(:,j)),kk+dn) + w(j);
end % for j=1:m
% Find the Best-Fit function for all the nodes.
[OptErr,OptF] = min(cat(3,C01(:,1:kk),C01(:,kk+1:end)),[],3);
OptF = 2 - OptF';
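% Illustration (made-up counts for one target gene and k = 1): if
% C01 = [3 1 0 2], i.e., c^(0) = [3 1] and c^(1) = [0 2], then min returns
% OptErr = [0 1] with indices [2 1], so OptF = 2 - [2;1] = [0;1]. The
% index 1 means that c^(0) is the smaller error, which is the error made
% when the function outputs 1; hence the mapping 2 - index.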
% All output bits having tie are set uniformly randomly. This also
% takes care of the undefined bits due to the initialization of
% matrix C01.
Ties = (C01(:,1:kk)==C01(:,kk+1:end))';
OptF(Ties) = (rand(1,sum(Ties(:))))>0.5;
% Store the Best-Fit functions.
Fhat(:,i,:) = OptF;
% Store the corresponding (weighted) error-size.
Ehat(i,:) = sum(OptErr,2)';
if TestBit % If the test data is provided
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% Apply the Best-Fit functions to the test data.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% All the inputs on the test data as decimal number.
dn = Xt(I,:)'*b + 1;
% Output values of the current functions for all the inputs.