minus 1.96*ASE, where ASE is the asymptotic standard error of logistic b. "Asymptotic" means the smallest possible value for the standard error when the data fit the model; it thus represents the highest possible precision. The real (enlarged) standard error is typically slightly larger than ASE. One uses the real SE if one hypothesizes that the noise in the data is systematic, and ASE if one hypothesizes that the noise is random. As the latter is typical, ASE is used here.
<P><A name=mle></A></P>
<LI><B>Maximum likelihood estimation, MLE</B>, is the method used to calculate the logit coefficients. This contrasts with the use of ordinary least squares (OLS) estimation of coefficients in regression. OLS seeks to minimize the sum of squared distances of the data points to the regression line. MLE seeks to maximize the log likelihood, LL, which reflects how likely it is (the odds) that the observed values of the dependent may be predicted from the observed values of the independents.
<P>MLE is an iterative algorithm which starts with an initial, arbitrary "guesstimate" of what the logit coefficients should be. The algorithm then determines the direction and size of change in the logit coefficients which will increase LL. After this initial function is estimated, the residuals are tested and a re-estimate is made with an improved function, and the process is repeated (usually about a half-dozen times) until <I>convergence</I> is reached (that is, until LL does not change significantly). There are several alternative convergence criteria.
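<P>As an illustration only (not SPSS's internal routine), the following Python sketch implements this iteration as Newton-Raphson; the function name fit_logit_newton, the simulated data, and the tolerance are all hypothetical: <PRE>import numpy as np

def fit_logit_newton(X, y, tol=1e-8, max_iter=25):
    """Newton-Raphson MLE for a binary logit.
    X: (n, k) design matrix with a constant column; y: (n,) 0/1 outcomes."""
    b = np.zeros(X.shape[1])                 # arbitrary starting "guesstimate"
    ll_old = -np.inf
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ b))         # predicted probabilities
        p = np.clip(p, 1e-12, 1 - 1e-12)     # guard against log(0)
        ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))   # current LL
        if abs(ll - ll_old) < tol:           # convergence: LL change negligible
            break
        ll_old = ll
        grad = X.T @ (y - p)                          # direction of change
        hess = -(X * (p * (1 - p))[:, None]).T @ X    # curvature sets step size
        b -= np.linalg.solve(hess, grad)              # Newton step raising LL
    return b, ll

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.random(500) < 1 / (1 + np.exp(-(0.5 + 1.2 * X[:, 1])))).astype(float)
b, ll = fit_logit_newton(X, y)    # b approximates the true (0.5, 1.2)
</PRE>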
<P><A name=Wald></A></P>
<LI><B>Wald statistic: </B>The Wald statistic is commonly used to test the null hypothesis in logistic regression that a particular logit (effect) coefficient is zero. It is the squared ratio of the unstandardized logit coefficient to its standard error, and it tests the significance of the logit coefficient associated with a given independent. The Wald statistic is part of SPSS output in the section "Variables in the Equation." Of course, one looks at the corresponding significance level rather than the Wald statistic itself. This corresponds to significance testing of b coefficients in OLS regression. The researcher may well want to drop independents from the model when their effect is not significant by the Wald statistic.
<P>Menard (p. 39) warns that for large logit coefficients, the standard error is inflated, lowering the Wald statistic and leading to Type II errors (false negatives: thinking the effect is not significant when it is). That is, there is a flaw in the Wald statistic such that very large effects may lead to large standard errors and small Wald chi-square values. For models with large logit coefficients, or when dummy variables are involved, it is better to test the difference in model chi-squares for the model with the independent and the model without that independent, or to consult the <A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#lltests" target=new>Log-Likelihood test</A> discussed below. Also note that the Wald statistic is sensitive to violations of the large-sample assumption of logistic regression.
<P>Computationally, the Wald chi-square statistic = b<SUP>2</SUP> / ASE<SUB>b</SUB><SUP>2</SUP>, where ASE<SUB>b</SUB> is the asymptotic standard error of the logistic regression coefficient b.
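<P>In code, for a single coefficient (the values of b and its ASE below are purely illustrative): <PRE>from scipy import stats

b, ase = 1.2, 0.3                     # illustrative coefficient and its ASE
wald = b**2 / ase**2                  # Wald chi-square = b^2 / ASE_b^2
p_value = stats.chi2.sf(wald, df=1)   # significance of H0: coefficient = 0
</PRE>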
<P><A name=scoeff></A></P>
<LI><B>Standardized logit coefficients</B>, also called <I>standardized effect coefficients</I> or <I>beta weights</I>, correspond to beta (standardized regression) coefficients and, like them, may be used to compare the relative strength of the independents. SPSS does not output standardized logit coefficients, but note that if one standardizes one's input data first, then the logit coefficients will be standardized logit coefficients. Alternatively, one may multiply the unstandardized logit coefficients by the standard deviations of the corresponding variables, giving a result which is <U>not</U> the standardized logit coefficient but can be used to rank the relative importance of the independent variables. Note: Menard (p. 48) warned that as of 1995, SAS's "standardized estimate" coefficients were really only partially standardized.
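<P>A sketch of both approaches in Python with statsmodels (the data frame and the variables x1 and x2 are hypothetical): <PRE>import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(50, 10, 200),    # hypothetical independents
                   "x2": rng.normal(0, 2, 200)})
true_logit = -2 + 0.04 * df["x1"] + 0.5 * df["x2"]
df["y"] = (rng.random(200) < 1 / (1 + np.exp(-true_logit))).astype(int)

# Approach 1: standardize the inputs first; the fitted coefficients
# are then standardized logit coefficients.
Xz = (df[["x1", "x2"]] - df[["x1", "x2"]].mean()) / df[["x1", "x2"]].std()
std_fit = sm.Logit(df["y"], sm.add_constant(Xz)).fit(disp=0)

# Approach 2: multiply the unstandardized b's by each predictor's SD --
# not the standardized coefficient, but usable for ranking importance.
raw_fit = sm.Logit(df["y"], sm.add_constant(df[["x1", "x2"]])).fit(disp=0)
ranking = raw_fit.params[["x1", "x2"]] * df[["x1", "x2"]].std()
</PRE>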
<P><A name=partial></A></P>
<LI><B>Partial contribution, R</B>. Partial R is an alternative method of assessing the relative importance of the independent variables, similar to standardized partial regression coefficients (beta weights) in OLS regression. R is a function of the Wald statistic, D<SUB>0</SUB> (discussed below), and the number of degrees of freedom for the variable. SPSS prints R in the "Variables in the Equation" section. Note, however, that there is a flaw in the Wald statistic such that very large effects may lead to large standard errors, small Wald chi-square values, and small or zero partial R's. For this reason it is better to use standardized logit coefficients for comparing the importance of independent variables.
<P><A name=loglr></A></P>
<LI><B>Log-likelihood ratio, Log LR</B>. Log LR chi-square is a better criterion than the Wald statistic when considering which variables to drop from the logistic regression model. It is an option in SPSS output, printed in the section "Model if Term Removed." There are both forward selection and backward stepwise procedures, but in each case the log-likelihood is tested for the model with a given variable dropped from the equation. The usual method, in the syntax window, is METHOD=BSTEP(LR), for backward stepwise analysis, with the stopping criterion set by CRITERIA=POUT(1). When Significance(Log LR) > .05, the variable is a candidate for removal from the model. (Note: Log-likelihood is discussed below. Because it has to do with the significance of the <U>unexplained</U> variance in the dependent, if a variable is to be dropped from the model, dropping it should test as <U>not</U> significant by Log LR.)
<P><A name=lltests></A></P>
<LI><B>Log-Likelihood tests</B>, also called "likelihood ratio tests" or "chi-square difference tests", are an alternative to the Wald statistic. Log-likelihood tests appear as Significance(Log LR) in SPSS output when you fit any logistic model. If the log-likelihood test statistic shows a small p value for a model with a large effect size, ignore the Wald statistic (which is biased toward Type II errors in such instances). Log-likelihood tests are also useful when the model dummy-codes categorical variables. Models are run with and without the block of dummy variables, for instance, and the difference in -2 log likelihood between the two models is assessed against a chi-square distribution with degrees of freedom = k - 1, where k is the number of categories of the categorical variable.
<P>Model chi-square assesses the overall logistic model but does not tell us if particular independents are more important than others. This can be done, however, by comparing the difference in -2LL for the overall model with a nested model which drops one of the independents. After running logistic regression for the overall and nested models, subtract the deviance (-2LL) of one model from the other and let df = the difference in the number of terms in the two models. Look in a table of the chi-square distribution and see if dropping the variable significantly reduced model fit. Chi-square difference can be used to help decide which variables to drop from or add to the model. This can be done in an automated way, as in stepwise logistic regression, but this is not recommended. Instead the researcher should use theory to determine which variables to add or drop.
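<P>A sketch of this chi-square difference test in Python, continuing the hypothetical data frame df from the sketch above: <PRE>from scipy import stats
import statsmodels.api as sm

full = sm.Logit(df["y"], sm.add_constant(df[["x1", "x2"]])).fit(disp=0)
nested = sm.Logit(df["y"], sm.add_constant(df[["x1"]])).fit(disp=0)

lr_chi2 = 2 * (full.llf - nested.llf)       # difference in deviance (-2LL)
df_diff = full.df_model - nested.df_model   # difference in number of terms
p = stats.chi2.sf(lr_chi2, df_diff)
# A small p means dropping x2 significantly worsens fit, so x2 should stay.
</PRE>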
<P><A name=repeated></A></P>
<LI><B>Repeated contrasts</B> is an SPSS option (called <I>profile contrasts</I> in SAS) which computes the logit coefficient for each category of the independent (except the "reference" category, which is the last one by default). Contrasts are used when one has a categorical independent variable and wants to understand the effects of various levels of that variable. Specifically, a "contrast" is a set of coefficients that sum to 0 over the levels of the independent categorical variable. SPSS automatically creates K-1 internal dummy variables when a covariate is declared to be categorical with K values (by default, SPSS leaves out the last category, making it the reference category). The user can choose various ways of assigning values to these internal variables, including <I>indicator contrasts</I>, <I>deviation contrasts</I>, or <I>simple contrasts</I>. In SPSS, indicator contrasts are now the default (old versions used deviation contrasts as the default).
<UL>
<P>
<LI><I>Indicator contrasts</I> produce estimates comparing each other group to the reference group. David Nichols, senior statistician at SPSS, gives this example of indicator coding output: <PRE>Parameter codings for indicator contrasts
------------------------------------------------
                             Parameter
           Value    Freq      Coding
                            (1)     (2)
GROUP
               1     106    1.000    .000
               2     116     .000   1.000
               3     107     .000    .000
------------------------------------------------
</PRE>This example shows a three-level categorical independent (labeled GROUP), with category values of 1, 2, and 3. The predictor here is called simply GROUP. It takes on the values 1-3, with frequencies listed in the "Freq" column. The two "Coding" columns are the internal values (parameter codings) assigned by SPSS under indicator coding. There are two columns of codings because two dummy variables are created for the three-level variable GROUP. For the first variable, which is Coding (1), cases with a value of 1 for GROUP get a 1, while all other cases get a 0. For the second, cases with a 2 for GROUP get a 1, with all other cases getting a 0. (This coding scheme is reproduced in the sketch following this list.)
<P></P>
<LI><I>Simple contrasts</I> compare each group to a reference category (like indicator contrasts). The contrasts estimated for simple contrasts are the same as for indicator contrasts, but the intercept for simple contrasts is an unweighted average of all levels rather than the value for the reference group. That is, with one categorical independent in the model, simple contrast coding means that the intercept is the log odds of a response for an unweighted average over the categories.
<P></P>
<LI><I>Deviation contrasts</I> compare each group other than the excluded group to the unweighted average of all groups. The value for the omitted group is then equal to the negative of the sum of the parameter estimates.
<P></P>
<LI><I>Contrasts and ordinality: </I>For nominal variables, the pattern of contrast coefficients for a given independent should be random and nonsystematic, indicating the nonlinear, nonmonotonic pattern characteristic of a true nominal variable. Contrasts can thus be used as a method of empirically differentiating categorical independents into nominal and ordinal classes.
<P></P></LI></UL>
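<P>Outside SPSS, the indicator (dummy) coding shown above can be reproduced directly; a minimal Python sketch with pandas, assuming a hypothetical three-level GROUP variable: <PRE>import pandas as pd

group = pd.Series([1, 2, 3, 1, 2, 3], name="GROUP")    # hypothetical data
# Drop the last category (3) so it becomes the reference, as SPSS does by
# default; GROUP_1 and GROUP_2 then match Coding (1) and (2) above.
dummies = pd.get_dummies(group, prefix="GROUP").iloc[:, :-1]
</PRE>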
<P><A name=tables></A></P>
<LI><B>Classification tables</B> are the 2 x 2 tables in the logistic regression output for dichotomous dependents, or the n x n tables for ordinal and polytomous logistic regression, which tally correct and incorrect estimates. The columns are the predicted values of the dependent, while the rows are the observed (actual) values of the dependent. In a perfect model, all cases will be on the diagonal and the overall percent correct will be 100%. If the logistic model has homoscedasticity (not a logistic regression assumption), the percent correct will be approximately the same for both rows. Since this takes the form of a crosstabulation, measures of association (SPSS uses lambda-p and tau-p) may be used in addition to percent correct as a way of summarizing the strength of the table:
<P>
<OL><A name=lambdap></A>
<LI><B>Lambda-p</B> is a PRE (proportional reduction in error) measure, which is the ratio of (errors without the model - errors with the model) to errors without the model. If lambda-p is .80, then using the logistic model reduces the errors of prediction by 80%.
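<P>A sketch of a classification table and lambda-p in Python, continuing the hypothetical full model fitted in the chi-square difference sketch above (the .5 classification cutoff is an assumption): <PRE>import pandas as pd

pred = (full.predict() >= 0.5).astype(int)   # predicted class at the .5 cutoff
table = pd.crosstab(df["y"], pred,
                    rownames=["observed"], colnames=["predicted"])

# Errors without the model: predict every case as the modal category.
errors_without = len(df) - df["y"].value_counts().max()
errors_with = int((df["y"] != pred).sum())
lambda_p = (errors_without - errors_with) / errors_without
</PRE>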