Application of SGT Family Distributions in Quasi Maximum Likelihood Estimation
Samuel Dodini
Brigham Young University, Utah, USA
Cover Page Footnote
A special thanks to Dr. James McDonald for helpful comments and MATLAB programming direction.
This article is available in Undergraduate Economic Review: http://digitalcommons.iwu.edu/uer/vol10/iss1/5
Standard ordinary least squares (OLS) estimators in a linear regression
framework minimize the sum of squared errors. These estimators will be the Best
Linear Unbiased Estimators (BLUE) if the Gauss-Markov assumptions hold and
will have the minimum variance of all unbiased estimators if the errors are
normally distributed. In practice, many of these assumptions are violated.
Heteroskedasticity is common in many cross-sectional data sets, and some form of
autocorrelation often appears in time series data. While there are several methods of
addressing the violation of these Gauss-Markov assumptions, such as generalized
least squares, there are fewer rules of thumb to address non-normality in the
residuals, which impacts the efficiency of OLS estimators. This can be especially
important in areas of public policy in which billions of dollars depend on the
choice of estimator. In essence, I ask the question, “What if there is a better
estimator?” I compare the efficiency of OLS estimators to maximum likelihood
estimators assuming the following error distributions:
1) Student's t Distribution (t)
2) Generalized Error Distribution (GED)
3) Inverse Hyperbolic Sine (IHS)
4) Generalized t (GT)
5) Skewed Generalized t (SGT)
These estimators are often called quasi maximum likelihood or partially
adaptive estimators because the regression parameters are estimated jointly with
those of the approximating error distribution. These distributions are related
through the SGT distribution tree shown in Appendix A.
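The idea behind these partially adaptive estimators can be illustrated with a minimal sketch: maximize the likelihood of the regression coefficients jointly with the parameters of an assumed error distribution. The sketch below uses the Student's t member of the family on simulated data; the sample size, true coefficients, starting values, and degrees of freedom are illustrative assumptions, not values from the paper.

```python
# Sketch: quasi maximum likelihood (partially adaptive) estimation of a linear
# regression under Student's t errors, compared against OLS.
# All data and tuning values here are illustrative assumptions.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
# Heavy-tailed t(3) errors violate the normality assumption behind OLS efficiency
y = X @ beta_true + rng.standard_t(df=3, size=n)

# OLS benchmark
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def negloglik(params):
    # Regression coefficients and distributional parameters estimated jointly;
    # log transforms keep scale and degrees of freedom positive.
    beta, log_sigma, log_df = params[:2], params[2], params[3]
    resid = y - X @ beta
    return -stats.t.logpdf(resid, df=np.exp(log_df),
                           scale=np.exp(log_sigma)).sum()

res = optimize.minimize(negloglik, x0=np.array([0.0, 0.0, 0.0, 1.0]),
                        method="Nelder-Mead",
                        options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
beta_qmle = res.x[:2]
```

The same scheme extends to the GED, IHS, GT, and SGT members of the tree by swapping in the corresponding log-density and parameter vector.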
Data
To demonstrate the difference between these several error distributions and
the comparative accuracy of their outcomes using quasi maximum likelihood
estimation, I examine six separate data sets from the Wooldridge data set
collection. These were chosen for their variety, reliable formatting, and
workability and provide a diverse framework of possibilities for real world data
examination. Summary statistics are provided in Appendix B. Each data set is
homoskedastic with no autocorrelation, which isolates the error distribution as the
varying factor. As an initial point of reference, I first perform an OLS regression
for each data set, with the first variable regressed on all the others.
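This baseline step can be sketched in a few lines: regress the first column of a data matrix on the remaining columns plus an intercept, and retain the residuals for later distributional analysis. The data matrix below is a simulated stand-in, not one of the Wooldridge data sets.

```python
# Sketch of the baseline OLS step: first column regressed on all the others.
# The data here are simulated placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 4))           # column 0 plays the dependent variable
y, regressors = data[:, 0], data[:, 1:]
X = np.column_stack([np.ones(len(y)), regressors])  # add an intercept

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
# By construction, OLS residuals are orthogonal to every regressor
```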
Poorly Matching Residuals
Below are the reported OLS residual graphs for each such regression.
These consist of a smoothed histogram of the OLS residuals with an overlaid
fitted normal distribution for reference. Notice the discrepancy between these
assumed errors and the actual data residuals.
[Figures: smoothed histograms of the OLS residuals with overlaid fitted normal distributions, by data set — Beauty, CEO Salary, Crime Rates]
The normal distribution does not approximate the actual residuals well because
of its rigidity in skewness and kurtosis, of which kurtosis appears to be the more
serious mismatch. Across these six data sets, the snapshots show no obvious
pattern to the kurtosis discrepancies.
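The visual mismatch in the figures can be quantified by comparing the sample skewness and kurtosis of the residuals with the normal benchmarks of 0 and 3. The residuals below are simulated heavy-tailed stand-ins (a t distribution with 5 degrees of freedom), chosen only to mimic the leptokurtosis seen in the plots.

```python
# Sketch: measure departure from normality in residuals via sample skewness
# and kurtosis. A normal distribution has skewness 0 and kurtosis 3.
# The residuals are simulated heavy-tailed stand-ins, not the paper's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
residuals = rng.standard_t(df=5, size=5000)   # leptokurtic, like the plotted residuals

skew = stats.skew(residuals)
kurt = stats.kurtosis(residuals, fisher=False)  # Pearson kurtosis; normal gives 3
print(skew, kurt)
```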
The Current Literature
The essential theme of standard OLS regression theory suggests that, by
the Central Limit Theorem, errors should be asymptotically normal, which may
not be accurate in some specifications. Efromovich (2005) suggests a theoretical
justification for the common fallback of treating residuals as proxies for the
underlying regression errors. However, increased efficiency in computing and
econometrics merits delving further into the true errors. Perhaps one of the first
papers to examine non-normality in the errors of linear regressions was Zeckhauser
and Thompson (1970), which examined maximum likelihood estimates using the
three-parameter power distribution made popular by Box and Tiao (1964). They
argue that the “Supposition [of normality] is often unwarranted and... significant
gains in likelihood may be achieved when the regression technique allows for the
more general class of error distributions” (Zeckhauser and Thompson, 1970).
They attribute the inapplicability of the Central Limit Theorem to small sample
size, non-normally distributed independent variables, and the presence of the
nonrandom effects of human behavior. They also argue that using variance as a
measure of efficiency loses its explanatory power when underlying errors diverge
from normality, and is particularly important when facing error distributions with
thicker tails. All these m (...truncated)