minus 1.96*ASE, where ASE is the asymptotic standard error of logistic b. "Asymptotic" means the smallest possible value for the standard error when the data fit the model; it thus represents the highest possible precision. The real (enlarged) standard error is typically slightly larger than ASE. One uses the real SE if one hypothesizes that the noise in the data is systematic, and ASE if one hypothesizes that the noise is random. As the latter is typical, ASE is used here.
<P><A name=mle></A></P>
<LI><B>Maximum likelihood estimation, MLE</B>, is the method used to calculate the logit coefficients. This contrasts with the use of ordinary least squares (OLS) estimation of coefficients in regression. OLS seeks to minimize the sum of squared distances of the data points to the regression line. MLE seeks to maximize the log likelihood, LL, which reflects how likely it is (the odds) that the observed values of the dependent may be predicted from the observed values of the independents.
<P>MLE is an iterative algorithm which starts with an initial, arbitrary "guesstimate" of what the logit coefficients should be. The algorithm then determines the direction and size of change in the logit coefficients which will increase LL. After this initial function is estimated, the residuals are tested and a re-estimate is made with an improved function, and the process is repeated (usually about a half-dozen times) until <I>convergence</I> is reached (that is, until LL does not change significantly). There are several alternative convergence criteria.
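<P>As an illustration only (not SPSS's internal routine), the following Python sketch implements this iteration as Newton-Raphson; the function name fit_logit_newton, the simulated data, and the tolerance are all hypothetical: <PRE>import numpy as np

def fit_logit_newton(X, y, tol=1e-8, max_iter=25):
    """Newton-Raphson MLE for a binary logit.
    X: (n, k) design matrix with a constant column; y: (n,) 0/1 outcomes."""
    b = np.zeros(X.shape[1])                 # arbitrary starting "guesstimate"
    ll_old = -np.inf
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ b))         # predicted probabilities
        p = np.clip(p, 1e-12, 1 - 1e-12)     # guard against log(0)
        ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))   # current LL
        if abs(ll - ll_old) < tol:           # convergence: LL change negligible
            break
        ll_old = ll
        grad = X.T @ (y - p)                          # direction of change
        hess = -(X * (p * (1 - p))[:, None]).T @ X    # curvature sets step size
        b -= np.linalg.solve(hess, grad)              # Newton step raising LL
    return b, ll

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.random(500) < 1 / (1 + np.exp(-(0.5 + 1.2 * X[:, 1])))).astype(float)
b, ll = fit_logit_newton(X, y)    # b approximates the true (0.5, 1.2)
</PRE>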
<P><A name=Wald></A></P>
<LI><B>Wald statistic: </B>The Wald statistic is commonly used to test the null hypothesis in logistic regression that a particular logit (effect) coefficient is zero. It is the squared ratio of the unstandardized logit coefficient to its standard error, and it tests the significance of the logit coefficient associated with a given independent. The Wald statistic is part of SPSS output in the section "Variables in the Equation." Of course, one looks at the corresponding significance level rather than the Wald statistic itself. This corresponds to significance testing of b coefficients in OLS regression. The researcher may well want to drop independents from the model when their effect is not significant by the Wald statistic.
<P>Menard (p. 39) warns that for large logit coefficients, the standard error is inflated, lowering the Wald statistic and leading to Type II errors (false negatives: thinking the effect is not significant when it is). That is, there is a flaw in the Wald statistic such that very large effects may lead to large standard errors and small Wald chi-square values. For models with large logit coefficients, or when dummy variables are involved, it is better to test the difference in model chi-squares for the model with the independent and the model without that independent, or to consult the <A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#lltests" target=new>Log-Likelihood test</A> discussed below. Also note that the Wald statistic is sensitive to violations of the large-sample assumption of logistic regression.
<P>Computationally, the Wald chi-square statistic = b<SUP>2</SUP> / ASE<SUB>b</SUB><SUP>2</SUP>, where ASE<SUB>b</SUB> is the asymptotic standard error of the logistic regression coefficient b.
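<P>In code, for a single coefficient (the values of b and its ASE below are purely illustrative): <PRE>from scipy import stats

b, ase = 1.2, 0.3                     # illustrative coefficient and its ASE
wald = b**2 / ase**2                  # Wald chi-square = b^2 / ASE_b^2
p_value = stats.chi2.sf(wald, df=1)   # significance of H0: coefficient = 0
</PRE>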
<P><A name=scoeff></A></P>
<LI><B>Standardized logit coefficients</B>, also called <I>standardized effect coefficients</I> or <I>beta weights</I>, correspond to beta (standardized regression) coefficients and, like them, may be used to compare the relative strength of the independents. SPSS does not output standardized logit coefficients, but note that if one standardizes one's input data first, then the logit coefficients will be standardized logit coefficients. Alternatively, one may multiply the unstandardized logit coefficients by the standard deviations of the corresponding variables, giving a result which is <U>not</U> the standardized logit coefficient but can be used to rank the relative importance of the independent variables. Note: Menard (p. 48) warned that as of 1995, SAS's "standardized estimate" coefficients were really only partially standardized.
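<P>A sketch of both approaches in Python with statsmodels (the data frame and the variables x1 and x2 are hypothetical): <PRE>import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(50, 10, 200),    # hypothetical independents
                   "x2": rng.normal(0, 2, 200)})
true_logit = -2 + 0.04 * df["x1"] + 0.5 * df["x2"]
df["y"] = (rng.random(200) < 1 / (1 + np.exp(-true_logit))).astype(int)

# Approach 1: standardize the inputs first; the fitted coefficients
# are then standardized logit coefficients.
Xz = (df[["x1", "x2"]] - df[["x1", "x2"]].mean()) / df[["x1", "x2"]].std()
std_fit = sm.Logit(df["y"], sm.add_constant(Xz)).fit(disp=0)

# Approach 2: multiply the unstandardized b's by each predictor's SD --
# not the standardized coefficient, but usable for ranking importance.
raw_fit = sm.Logit(df["y"], sm.add_constant(df[["x1", "x2"]])).fit(disp=0)
ranking = raw_fit.params[["x1", "x2"]] * df[["x1", "x2"]].std()
</PRE>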
<P><A name=partial></A></P>
<LI><B>Partial contribution, R</B>. Partial R is an alternative method of assessing the relative importance of the independent variables, similar to standardized partial regression coefficients (beta weights) in OLS regression. R is a function of the Wald statistic, D<SUB>0</SUB> (discussed below), and the number of degrees of freedom for the variable. SPSS prints R in the "Variables in the Equation" section. Note, however, that there is a flaw in the Wald statistic such that very large effects may lead to large standard errors, small Wald chi-square values, and small or zero partial R's. For this reason it is better to use standardized logit coefficients for comparing the importance of independent variables.
<P><A name=loglr></A></P>
<LI><B>Log-likelihood ratio, Log LR</B>. Log LR chi-square is a better criterion than the Wald statistic when considering which variables to drop from the logistic regression model. It is an option in SPSS output, printed in the section "Model if Term Removed." There are both forward selection and backward stepwise procedures, but in each case the log-likelihood is tested for the model with a given variable dropped from the equation. The usual method, in the syntax window, is METHOD=BSTEP(LR), for backward stepwise analysis, with the stopping criterion set by CRITERIA=POUT(1). When Significance(Log LR) > .05, the variable is a candidate for removal from the model. (Note: Log-likelihood is discussed below. Because it has to do with the significance of the <U>unexplained</U> variance in the dependent, if a variable is to be dropped from the model, dropping it should test as <U>not</U> significant by Log LR.)
<P><A name=lltests></A></P>
<LI><B>Log-Likelihood tests</B>, also called "likelihood ratio tests" or "chi-square difference tests", are an alternative to the Wald statistic. Log-likelihood tests appear as Significance(Log LR) in SPSS output when you fit any logistic model. If the log-likelihood test statistic shows a small p value for a model with a large effect size, ignore the Wald statistic (which is biased toward Type II errors in such instances). Log-likelihood tests are also useful when the model dummy-codes categorical variables. Models are run with and without the block of dummy variables, for instance, and the difference in -2 log likelihood between the two models is assessed against a chi-square distribution with degrees of freedom = k - 1, where k is the number of categories of the categorical variable.
<P>Model chi-square assesses the overall logistic model but does not tell us if particular independents are more important than others. This can be done, however, by comparing the difference in -2LL for the overall model with a nested model which drops one of the independents. After running logistic regression for the overall and nested models, subtract the deviance (-2LL) of one model from the other and let df = the difference in the number of terms in the two models. Look in a table of the chi-square distribution and see if dropping the variable significantly reduced model fit. Chi-square difference can be used to help decide which variables to drop from or add to the model. This can be done in an automated way, as in stepwise logistic regression, but this is not recommended. Instead the researcher should use theory to determine which variables to add or drop.
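<P>A sketch of this chi-square difference test in Python, continuing the hypothetical data frame df from the sketch above: <PRE>from scipy import stats
import statsmodels.api as sm

full = sm.Logit(df["y"], sm.add_constant(df[["x1", "x2"]])).fit(disp=0)
nested = sm.Logit(df["y"], sm.add_constant(df[["x1"]])).fit(disp=0)

lr_chi2 = 2 * (full.llf - nested.llf)       # difference in deviance (-2LL)
df_diff = full.df_model - nested.df_model   # difference in number of terms
p = stats.chi2.sf(lr_chi2, df_diff)
# A small p means dropping x2 significantly worsens fit, so x2 should stay.
</PRE>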
<P><A name=repeated></A></P>
<LI><B>Repeated contrasts</B> is an SPSS option (called <I>profile contrasts</I> in SAS) which computes the logit coefficient for each category of the independent (except the "reference" category, which is the last one by default). Contrasts are used when one has a categorical independent variable and wants to understand the effects of various levels of that variable. Specifically, a "contrast" is a set of coefficients that sum to 0 over the levels of the independent categorical variable. SPSS automatically creates K-1 internal dummy variables when a covariate is declared to be categorical with K values (by default, SPSS leaves out the last category, making it the reference category). The user can choose various ways of assigning values to these internal variables, including <I>indicator contrasts</I>, <I>deviation contrasts</I>, or <I>simple contrasts</I>. In SPSS, indicator contrasts are now the default (old versions used deviation contrasts as the default).
<UL>
<P>
<LI><I>Indicator contrasts</I> produce estimates comparing each other group to the reference group. David Nichols, senior statistician at SPSS, gives this example of indicator coding output: <PRE>Parameter codings for indicator contrasts
------------------------------------------------
                             Parameter
           Value    Freq      Coding
                            (1)     (2)
GROUP
               1     106    1.000    .000
               2     116     .000   1.000
               3     107     .000    .000
------------------------------------------------
</PRE>This example shows a three-level categorical independent (labeled GROUP), with category values of 1, 2, and 3. The predictor here is called simply GROUP. It takes on the values 1-3, with frequencies listed in the "Freq" column. The two "Coding" columns are the internal values (parameter codings) assigned by SPSS under indicator coding. There are two columns of codings because two dummy variables are created for the three-level variable GROUP. For the first variable, which is Coding (1), cases with a value of 1 for GROUP get a 1, while all other cases get a 0. For the second, cases with a 2 for GROUP get a 1, with all other cases getting a 0. (This coding scheme is reproduced in the sketch following this list.)
<P></P>
<LI><I>Simple contrasts</I> compare each group to a reference category (like indicator contrasts). The contrasts estimated for simple contrasts are the same as for indicator contrasts, but the intercept for simple contrasts is an unweighted average of all levels rather than the value for the reference group. That is, with one categorical independent in the model, simple contrast coding means that the intercept is the log odds of a response for an unweighted average over the categories.
<P></P>
<LI><I>Deviation contrasts</I> compare each group other than the excluded group to the unweighted average of all groups. The value for the omitted group is then equal to the negative of the sum of the parameter estimates.
<P></P>
<LI><I>Contrasts and ordinality: </I>For nominal variables, the pattern of contrast coefficients for a given independent should be random and nonsystematic, indicating the nonlinear, nonmonotonic pattern characteristic of a true nominal variable. Contrasts can thus be used as a method of empirically differentiating categorical independents into nominal and ordinal classes.
<P></P></LI></UL>
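<P>Outside SPSS, the indicator (dummy) coding shown above can be reproduced directly; a minimal Python sketch with pandas, assuming a hypothetical three-level GROUP variable: <PRE>import pandas as pd

group = pd.Series([1, 2, 3, 1, 2, 3], name="GROUP")    # hypothetical data
# Drop the last category (3) so it becomes the reference, as SPSS does by
# default; GROUP_1 and GROUP_2 then match Coding (1) and (2) above.
dummies = pd.get_dummies(group, prefix="GROUP").iloc[:, :-1]
</PRE>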
<P><A name=tables></A></P>
<LI><B>Classification tables</B> are the 2 x 2 tables in the logistic regression output for dichotomous dependents, or the n x n tables for ordinal and polytomous logistic regression, which tally correct and incorrect estimates. The columns are the predicted values of the dependent, while the rows are the observed (actual) values of the dependent. In a perfect model, all cases will be on the diagonal and the overall percent correct will be 100%. If the logistic model has homoscedasticity (not a logistic regression assumption), the percent correct will be approximately the same for both rows. Since this takes the form of a crosstabulation, measures of association (SPSS uses lambda-p and tau-p) may be used in addition to percent correct as a way of summarizing the strength of the table:
<P>
<OL><A name=lambdap></A>
<LI><B>Lambda-p</B> is a PRE (proportional reduction in error) measure, which is the ratio of (errors without the model - errors with the model) to errors without the model. If lambda-p is .80, then using the logistic model reduces the errors of prediction by 80%.
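<P>A sketch of a classification table and lambda-p in Python, continuing the hypothetical full model fitted in the chi-square difference sketch above (the .5 classification cutoff is an assumption): <PRE>import pandas as pd

pred = (full.predict() >= 0.5).astype(int)   # predicted class at the .5 cutoff
table = pd.crosstab(df["y"], pred,
                    rownames=["observed"], colnames=["predicted"])

# Errors without the model: predict every case as the modal category.
errors_without = len(df) - df["y"].value_counts().max()
errors_with = int((df["y"] != pred).sum())
lambda_p = (errors_without - errors_with) / errors_without
</PRE>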