Abstract:
Mixture models provide a natural framework for modeling
unobserved heterogeneity in a population.
They are widely applied in astronomy, biology,
engineering, finance, genetics, medicine, social sciences,
and other areas.
An important first step for using mixture models is the test
of homogeneity. Before one tries to fit a mixture model,
it might be of value to know whether the data arise from a
homogeneous or heterogeneous population. If the data are
homogeneous, it is not even necessary to go into mixture modeling.
The rejection of the homogeneous model may also have scientific implications.
For example, in classical statistical genetics,
it is often suspected that only a subgroup of patients have a
disease gene which is linked to the marker. Detecting
the existence of this subgroup amounts to the rejection of
a homogeneous null model in favour of a two-component
mixture model. This problem has attracted intensive
research recently. This thesis makes substantial contributions
in this area of research.
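As a point of reference, the homogeneity testing problem described above can be written down for a two-component mixture as follows. The notation (mixing proportion \alpha, component parameters \theta_1 and \theta_2, component density f) is an assumption made here for illustration and is not fixed by this abstract.

```latex
% Illustrative notation only; symbols are assumptions, not taken from the thesis.
\begin{align*}
f(x;\alpha,\theta_1,\theta_2)
  &= (1-\alpha)\, f(x;\theta_1) + \alpha\, f(x;\theta_2),
  \qquad 0 \le \alpha \le 1, \\
H_0 &:\ \alpha(1-\alpha)(\theta_1-\theta_2) = 0
  \quad \text{versus} \quad
H_1 :\ \alpha(1-\alpha)(\theta_1-\theta_2) \neq 0 .
\end{align*}
```

In this form the null hypothesis holds either when one component has zero weight or when the two component parameters coincide, so the homogeneous model does not correspond to a single point in the parameter space.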
Due to partial loss of identifiability, classic inference methods
such as the likelihood ratio test (LRT) lose their usual elegant
statistical properties. The limiting distribution of the LRT
often involves complex Gaussian processes,
which can be hard to implement in data analysis.
The modified likelihood ratio test (MLRT) is found to be a nice
alternative to the LRT. It restores identifiability by introducing
a penalty to the log-likelihood function.
Under some mild conditions,
the limiting distribution of the MLRT is
0.5\chi^2_0+0.5\chi^2_1,
where \chi^2_{0} is a point mass at 0.
This limiting distribution is convenient to use in real data analysis.
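To illustrate why this limiting distribution is convenient, the following minimal sketch computes a p-value under 0.5\chi^2_0+0.5\chi^2_1 using only the Python standard library. The function names are hypothetical, and the observed statistic is assumed to have been computed elsewhere; this is not the thesis's own implementation.

```python
import math


def chi2_1_sf(t: float) -> float:
    """P(X > t) for X ~ chi-square with 1 degree of freedom.

    Uses the identity P(Z^2 > t) = erfc(sqrt(t / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(t / 2.0))


def half_half_pvalue(stat: float) -> float:
    """P-value of an observed statistic under 0.5*chi2_0 + 0.5*chi2_1.

    The point mass at 0 contributes nothing to P(T > stat) when stat > 0,
    so the p-value is half the chi-square(1) tail probability.
    A statistic of 0 (or below, from rounding) gives a p-value of 1.
    """
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_1_sf(stat)


if __name__ == "__main__":
    # Hypothetical observed value of the test statistic.
    observed = 3.2
    print(f"p-value = {half_half_pvalue(observed):.4f}")
```

Under this mixture the 5% critical value is the 90th percentile of \chi^2_1 (about 2.71), which is smaller than the 3.84 that a plain \chi^2_1 reference distribution would give.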
The choice of the penalty functions in the MLRT is very flexible.
A good choice of the penalty enhances the power of the MLRT.
In this thesis, we first introduce a new class of penalty functions,
with which the MLRT enjoys a significantly improved power for testing
homogeneity.
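For orientation, the penalized log-likelihood behind the MLRT can be sketched as below. The notation is an assumption, and the specific penalty shown is one form that appears in the MLRT literature; it is given only as an example and is not the new class of penalty functions introduced in this thesis.

```latex
% Illustrative sketch; symbols and the example penalty are assumptions.
\begin{align*}
pl_n(\alpha,\theta_1,\theta_2)
  &= \ell_n(\alpha,\theta_1,\theta_2) + p(\alpha), \\
\ell_n(\alpha,\theta_1,\theta_2)
  &= \sum_{i=1}^{n} \log\bigl\{(1-\alpha)\, f(x_i;\theta_1)
     + \alpha\, f(x_i;\theta_2)\bigr\}, \\
p(\alpha) &= C \log\{4\alpha(1-\alpha)\}, \qquad C > 0 .
\end{align*}
```

A penalty of this kind is maximized at \alpha = 0.5 and tends to -\infty as \alpha approaches 0 or 1, which keeps the fitted mixing proportion away from the boundary.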
The main contribution of this thesis is to propose a new class of
methods for testing homogeneity. Most existing methods in the
literature for testing homogeneity, explicitly or implicitly, are
derived under the condition of finite Fisher information and a
compactness assumption on the space of the mixing parameters. The
finite Fisher information condition can prevent their application to many
important mixture models, such as the mixture of geometric
distributions, the mixture of exponential distributions and, more
generally, mixture models in scale distribution families. The
compactness assumption often forces practitioners to set artificial
bounds for the parameters of interest and makes the resulting
limiting distribution dependent on these bounds. Consequently,
developing a method without such restrictions has long been a goal of many
researchers. As will be seen, the EM-test proposed in this thesis
is free of these shortcomings.
The EM-test combines the merits of the classic LRT and score test.
The properties of the EM-test are particularly easy to investigate
under single parameter mixture models.
It has a simple limiting distribution
0.5\chi^2_0+0.5\chi^2_1, the same as the MLRT.
This result is applicable to mixture models without requiring
the restrictive regularity conditions described earlier.
The normal mixture model is a very popular model in applications.
However, it does not satisfy the strong identifiability condition,
which poses substantial technical difficulties in the study of the
asymptotic properties. Most existing methods do not directly apply
to normal mixture models, so the asymptotic properties have to
be developed separately. We investigate the application of the EM-test to
normal mixture models and derive its limiting distributions.
For the homogeneity test in the presence of the structural
parameter, the limiting distribution is a simple function of the
0.5\chi^2_0+0.5\chi^2_1 and \chi^2_1 distributions. The test
with this limiting distribution is still very convenient to
implement. For normal mixtures in both mean and variance parameters,
the limiting distribution of the EM-test is found to be \chi^2_2.
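As a brief worked example of how a \chi^2_2 limit is used in practice, the tail probability has a closed form, so critical values follow directly. The symbols t and c_{0.05} are introduced here for illustration only.

```latex
% Worked example; t and c_{0.05} are illustrative symbols.
\begin{align*}
P(\chi^2_2 > t) &= e^{-t/2}, \qquad t \ge 0, \\
c_{0.05} &= -2 \log(0.05) \approx 5.99 .
\end{align*}
```

A test at the 5% level based on this limit would therefore reject homogeneity when the observed statistic exceeds about 5.99.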
Mixture models are also widely used in the analysis of
directional data. The von Mises distribution is often regarded as
the circular normal model. Interestingly, it satisfies the strong
identifiability condition and the parameter space of the mean
direction is compact. However, the theoretical results for
single-parameter mixture models cannot be applied directly to von Mises
mixture models. Because of this, we also study the application of
the EM-test to von Mises mixture models in the presence of a
structural parameter. The limiting distribution of the EM-test is
also found to be 0.5\chi^2_0+0.5\chi^2_1.
Extensive simulation results are obtained to examine the precision
of the approximation of the limiting distributions to the finite
sample distributions of the EM-test. The type I errors with the
critical values determined by the limiting distributions are found
to be close to nominal values. We also propose
several precision-enhancing methods, which are found to work well.
Real data examples are used to illustrate the use of the EM-test.