suggest that you browse through the menu version of SPSS to learn the details. A
simple example will illustrate the parallels. Imagine that we had carried out a
study of voting and wished to know how best to predict whether people had voted
Conservative or Labour. The commands would be:</P><PRE>LOGISTIC REGRESSION /VARIABLES voting WITH age sex class
 att1 att2 att3 att4 extro psycho neuro
 /METHOD FSTEP(LR)
 /CLASSPLOT.</PRE>
<P>The dependent variable is separated from the independent variables by the
term WITH. The METHOD subcommand uses the keyword FSTEP to specify a
<B>forward</B> <B>stepwise</B> procedure; we could also use BSTEP, which does a
<B>backward stepwise</B> procedure, i.e. it starts by entering all the variables
and then takes them out one at a time; or ENTER if we were engaged in hypothesis
testing rather than exploratory analysis. If no METHOD subcommand is given, ENTER
will be assumed. The (LR) term after FSTEP specifies that likelihood ratio
considerations will be used in selecting variables to add to or delete from the
model; this is preferable but can slow computation, so it may be necessary to
omit it. The /CLASSPLOT line is not strictly necessary but aids interpretation -
<A href="http://www.ex.ac.uk/~SEGLea/multvar2/disclogi.html#classplot">see
below</A>. </P>
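<P>For instance, if we were testing a specific hypothesis with the same data, we
could force all the predictors in at once; a minimal sketch, reusing the variable
names from the example above:</P><PRE>LOGISTIC REGRESSION /VARIABLES voting WITH age sex class
 att1 att2 att3 att4 extro psycho neuro
 /METHOD ENTER
 /CLASSPLOT.</PRE>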
<P>A useful property of the LOGISTIC REGRESSION command is that it can cope
automatically with categorical independent variables; we don't have to write a
loop as we do for linear regression. All we have to do is declare any
categorical variables on a /CATEGORICAL subcommand <I>as well as</I> on the
/VARIABLES subcommand. The /CONTRAST subcommand should be used to control which
category is dropped out when the dummy variables are formed; if the control or
modal category of, say, a variable DIAGNOST was its third value, we would use
the subcommand /CONTRAST(DIAGNOST)=INDICATOR(3) to tell LOGISTIC REGRESSION
to drop level 3 of the variable in forming dummy variables. Although this is an
improvement over what we have to do when using SPSS to carry out linear
regression, there is a snag. /CONTRAST likes its category levels specified in
rather an odd way; in the example, 3 might not be the value used to code the
modal category in DIAGNOST: for example, if psychotic, neurotic and normal
people were coded 0, 1 and 2, the correct entry in /CONTRAST would be 3, not 2.
Look, I didn't write this idiot system, I'm just trying to tell you about it.
</P>
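<P>Putting those subcommands together, a run with DIAGNOST as a categorical
predictor might look like this (a sketch; the dependent variable RECOVER and the
predictor AGE are invented for illustration):</P><PRE>LOGISTIC REGRESSION /VARIABLES recover WITH age diagnost
 /CATEGORICAL diagnost
 /CONTRAST(diagnost)=INDICATOR(3)
 /METHOD ENTER.</PRE>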
<P>As in linear regression, there is no need to declare dichotomous independent
variables as categorical. </P>
<P>We can also use SPSS to carry out discriminant analysis. For the example just
considered, the commands would be: </P><PRE>DISCRIMINANT GROUPS=voting(0,1)
 /VARIABLES = age sex class att1 to att4 extro psycho neuro
 /METHOD=MINRESID
 /STATISTICS=TABLE.</PRE>
<P>Note that we have to specify the two possible levels of the dependent
variable (voting). We can use the /METHOD subcommand to request a variety of
stepwise methods (RAO is another you might like to try), or to ENTER all or a
subset of variables. The subcommand /STATISTICS=TABLE is needed to get the
classification table used for assessing goodness of fit (see below).
</P>
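<P>For example, to try Rao's V as the stepwise criterion instead, only the
/METHOD subcommand changes; a sketch, otherwise identical to the run
above:</P><PRE>DISCRIMINANT GROUPS=voting(0,1)
 /VARIABLES = age sex class att1 to att4 extro psycho neuro
 /METHOD=RAO
 /STATISTICS=TABLE.</PRE>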
<P><I>back to <A href="http://www.ex.ac.uk/~SEGLea/multvar2/disclogi.html#top">top</A></I></P>
<H3><A name=report></A>Interpreting and reporting logistic regression results</H3>
<UL>
<LI><B>Log likelihoods</B>
<P>A key concept for understanding the tests used in logistic regression (and
many other procedures using maximum likelihood methods) is that of <B>log
likelihood</B>. Likelihood just means probability, though it tends to be used
by statisticians of a <B>Bayesian</B> orientation. It always means probability
<I>under a specified hypothesis</I>. In thinking about logistic regression,
two hypotheses are likely to be of interest: the null hypothesis, which is
that all the coefficients in the regression equation take the value zero, and
the hypothesis that the model currently under consideration is accurate. We
then work out the likelihood of observing the exact data we actually did
observe under each of these hypotheses. The result is nearly always a
frighteningly small number, and to make it easier to handle, we take its
natural logarithm (i.e. its log base <I>e</I>), giving us a log likelihood.
Probabilities are always less than one, so log likelihoods are always
negative; often, we work with <B>negative log likelihoods</B> for convenience.
</P>
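<P>To make this concrete (the numbers here are invented for illustration): if
the probability of the observed data under some model were 3.7 x 10<SUP>-21</SUP>,
the log likelihood would be</P><PRE>LL = ln(3.7 x 10^-21) = -47.0,  so  -2LL = 94.1</PRE>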
<LI><B>Goodness of fit</B>
<P>Logistic regression does not give rise to an
<I>R</I><SUP>2</SUP><SUB>adj</SUB> statistic. Darlington (1990, page 449)
recommends the following statistic as a measure of goodness of fit: </P>
<CENTER><PRE>         exp[(LL<SUB>model</SUB> - LL<SUB>0</SUB>)/N] - 1
LRFC<SUB>1</SUB> = ------------------------------
             exp(-LL<SUB>0</SUB>/N) - 1
</PRE></CENTER>
<P>where exp refers to the exponential function (the inverse of the log
function), <I>N</I> as usual is sample size, and <I>LL</I><SUB>model</SUB> and
<I>LL</I><SUB>0</SUB> are the log likelihoods of the data under the model and
the null hypothesis respectively. (Note that I have changed Darlington's
notation a little to make it fit in with that used in the rest of these
notes.) Darlington's statistic is useful because it takes values between 0 and
1 (or 0% and 100%) which have much the same interpretation as values of
<I>R</I><SUP>2</SUP> or <I>R</I><SUP>2</SUP><SUB>adj</SUB> in a
linear regression, although unfortunately it looks from the formula that, of
the two, it is more closely analogous to <I>R</I><SUP>2</SUP>. SPSS does not,
however, report this statistic. It does report <I>negative</I>
log likelihoods, multiplied by 2, so with a little adjustment these can be
inserted in the equation for <I>LRFC</I><SUB>1</SUB>. </P>
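<P>A worked example with invented figures: suppose SPSS reports -2<I>LL</I> =
138.6 for the null model and 110.4 for our model, with <I>N</I> = 100. Then
<I>LL</I><SUB>0</SUB> = -69.3 and <I>LL</I><SUB>model</SUB> = -55.2, and</P><PRE>LRFC<SUB>1</SUB> = [exp((-55.2 - (-69.3))/100) - 1] / [exp(69.3/100) - 1]
      = [exp(0.141) - 1] / [exp(0.693) - 1]
      = 0.151 / 1.000
      = 0.15 approximately</PRE>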
<P>Rather than using a goodness of fit statistic, though, we often want to
look at the proportion of cases we have managed to classify correctly. For
this we need to look at the <B>classification table</B> printed out by SPSS,
which tells us how many of the cases where the observed value of the dependent
variable was 1 have been predicted with a value 1, and so on. An advantage of
the classification table is that we can get one out of either logistic
regression or discriminant analysis, so we can use it to compare the two
approaches. Statisticians claim that logistic regression tends to classify a
higher proportion of cases correctly. </P>
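<P>An illustrative classification table (the counts are invented, for a sample
of 100 cases) might look like this:</P><PRE>                    Predicted
                    0      1    Percent correct
Observed   0       42      8        84.0%
           1       11     39        78.0%
                        Overall     81.0%</PRE>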
<P><A name=classplot></A>Another very useful piece of information for
assessing goodness of fit can be gained by using the /CLASSPLOT subcommand.
This causes SPSS to print distributions of predicted logit values,
distinguishing the observed category values. The resulting plot is very useful
for spotting possible outliers. It will also tell you whether it might be
better to separate the two predicted categories by some rule other than the
simple one SPSS uses, which is to predict value 1 if logit(<I>p</I>) is
greater than 0 (i.e. if <I>p</I> is greater than 0.5). A better separation of
categories might result from using a different criterion. We might also want
to use a different criterion if the <I>a priori</I> probabilities of the two
categories were very different (one might be a rare disease, for example), or
if the costs of mistakenly predicting someone into the two categories differ
(suppose the categories were "found guilty of murder" and "not guilty", for
example). The following is an example of such a CLASSPLOT:</P><PRE>
      32 +                                                           f+
         |                                                           f|
         |                                                           f|
F        |                                                           f|
R     24 +                                                           f+
E        |                                                           f|
Q        |                                                           f|
U        |                                                           f|
E     16 +                                                           f+
N        |                                                           f|
C        |                                                           f|
Y        |                                                           f|
       8 +                                                           f+
         |                                                           f|
         |                                   f     f   f   f    ffffff|
         |n fnn nnnnnf nnfnn nnn n  fn   nnffnff  f  ff        nfnffff|
Predicted --------------+--------------+--------------+---------------
  Prob:   0           .25             .5            .75              1
  Group:  nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnffffffffffffffffffffffffffffff

          Predicted Probability is of Membership for found guilty
          Symbols: n - not guilty
                   f - found guilty
          Each Symbol Represents 2 Cases.</PRE>
<P>If we were called as expert witnesses to advise the court about the
probability that the person accused had committed murder, using the variables
in this particular logistic regression model, we might want to set a predicted
probability criterion of .9 rather than .5.</P>
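<P>A sketch of how such a criterion could be set in the syntax, assuming your
release of SPSS supports the CUT keyword on the /CRITERIA subcommand (the
variable names here are invented):</P><PRE>LOGISTIC REGRESSION /VARIABLES guilty WITH motive oppty forensic
 /METHOD ENTER
 /CRITERIA CUT(0.9)
 /CLASSPLOT.</PRE>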
<LI><B>Overall significance</B>
<P>SPSS will offer you a variety of statistical tests. Usually, though,
overall significance is tested using what SPSS calls the <I>Model
Chi</I>-<I>square</I>, which is derived from the likelihood of observing the
actual data under the assumption that the model that has been fitted is
accurate. It is convenient to use -2 times the log (base <I>e</I>) of this
likelihood; we call this -2<I>LL</I>. The difference between -2<I>LL</I> for
the best-fitting model and -2<I>LL</I> for the null hypothesis model (in which
all the <I>b</I> values are set to zero) is distributed like chi-squared, with
degrees of freedom equal to the number of predictors; this difference is the
<I>Model chi</I>-<I>square</I> that SPSS refers to. Very conveniently, the
difference between -2<I>LL</I> values for models with successive terms added
also has a chi-squared distribution, so when we use a stepwise procedure, we
can use chi-squared tests to find out if adding one or more extra predictors
significantly improves the fit of our model. <A name=coeffs></A></P>
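<P>Continuing the invented figures used above: with -2<I>LL</I> = 138.6 for the
null model and 110.4 for a model with 9 predictors, the Model chi-square would
be</P><PRE>Model chi-square = 138.6 - 110.4 = 28.2,  df = 9,  p &lt; .001</PRE>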
<LI><B>The interpretation of coefficients</B>
<P>How can we <I>describe</I> the effect of a single regressor in logistic
regression? The fundamental equation for logistic regression tells us that
with all other variables held constant, there is a constant increase of
<I>b</I><SUB>1</SUB> in logit(<I>p</I>) for every 1-unit increase in
<I>x</I><SUB>1</SUB>, and so on. But what does a constant increase in
logit(<I>p</I>) mean? Because the logit transformation is non-linear, it does
not mean a constant increase in <I>p</I>; so the increase in <I>p</I>
associated with a 1-unit increase in <I>x</I><SUB>1</SUB> changes with the
value of <I>x</I><SUB>1</SUB> you begin with. </P>
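<P>A quick numerical illustration of this non-linearity, using
<I>p</I> = exp(logit)/(1 + exp(logit)):</P><PRE>logit(p) rises 0 -> 1:  p rises .50 -> .73  (an increase of .23)
logit(p) rises 2 -> 3:  p rises .88 -> .95  (an increase of .07)</PRE>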
<P>It turns out that a constant increase in logit(<I>p</I>) does have a
reasonably straightforward interpretation. It corresponds to a constant
<I>multiplication</I> (by exp(<I>b</I>)) of the <B>odds</B> that the dependent
variable takes the value 1 rather than 0. So, suppose <I>b</I><SUB>1</SUB>
takes the value 2.30 - we choose this value as an example because exp(2.30)
equals 10, so the arithmetic will be easy. Then if <I>x</I><SUB>1</SUB>
increases by 1, the odds that the dependent variable takes the value 1
increase tenfold. So, with this value of <I>b</I><SUB>1</SUB>, let us suppose
that with all other variables at their mean values, and <I>x</I><SUB>1</SUB>
taking the value 0, we predict a logit(<I>p</I>) of 0; this means that there
is an even chance of the dependent variable taking the value 1. Now suppose
<I>x</I><SUB>1</SUB> increases to 1. The odds that the dependent variable
takes the value 1 are multiplied by 10, going from even (1:1) to 10:1, so
<I>p</I> rises from 0.5 to 10/11, or about 0.91.</P>