2.1. Model and Hypothesis
Suppose independent pairs of observations
are available from
a units, where
is a fixed scalar and
is a
p-dimensional random vector. Assume the nonparametric model
holds for each unit
, where for each
x the mean
is an unknown vector valued function and the covariance
is a
unknown positive definite matrix valued function. The error vector
is assumed to be independently and identically distributed across units with mean
and covariance matrix
.
The aim of this paper is to develop tests for no association in the nonparametric heteroscedastic regression model (
1). More precisely, we consider the null hypothesis
for any
x, where
is an unknown vector of constants. We do not assume any functional forms for
and
. Furthermore, the distribution of the error
is unspecified.
2.2. Moving Window One-Way Layout
Our approach is in essence similar to a lack-of-fit test in regression models. Lack-of-fit tests ideally require multiple replicates of the response variable for each value of the predictor variable. In observational studies, or for predictors (covariates) measured prior to randomization, replicates at each predictor value do not typically arise. In this situation, the nearest-neighbor idea from nonparametric smoothing can be employed to construct artificial replicates. For
, let the window
be the set of indices defined by,
where
is an odd number,
, and
is the indicator function. The units in the
i-th window
will constitute the replications in the
i-th group, for a total of
a groups. For the development of the theory in this paper, we assume that the group sizes are all equal to a fixed number
n. Strictly speaking, the groups at the extreme ends will have sizes smaller than
n. However, the effect of this imbalance is negligible in our asymptotic framework,
.
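The window construction can be illustrated with a minimal sketch. It assumes a rank-based symmetric neighborhood: the window of unit i contains the units whose covariate ranks lie within (n - 1)/2 of the rank of unit i. This is an assumed reading of the index-set definition, chosen to be consistent with n being odd and with the end windows being smaller; the function name `moving_windows` is hypothetical.

```python
def moving_windows(x, n):
    """Return, for each unit i, the indices of the units in its window.

    Assumption (not taken verbatim from the paper): the window of unit i
    holds the units whose covariate ranks are within (n - 1) / 2 of the
    rank of unit i, so interior windows have exactly n members and the
    windows at the two ends are smaller.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("window size n must be a positive odd number")
    a = len(x)
    half = (n - 1) // 2
    # Indices of the units sorted by covariate value.
    order = sorted(range(a), key=lambda i: x[i])
    # Rank (0-based position in the sorted order) of each unit.
    rank = {idx: r for r, idx in enumerate(order)}
    windows = []
    for i in range(a):
        lo = max(0, rank[i] - half)
        hi = min(a - 1, rank[i] + half)
        windows.append([order[k] for k in range(lo, hi + 1)])
    return windows


if __name__ == "__main__":
    x = [0.1, 0.2, 0.3, 0.4, 0.5]
    for i, w in enumerate(moving_windows(x, 3)):
        print(i, w)
```

Neighboring windows share units, which is why the resulting groups are not mutually independent.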
Roughly speaking, the test of association proposed in this paper examines if mean vectors of the
a groups are significantly different. In this setup, the large sample size in the original sample corresponds to the large number of groups in the moving window one-way layout. Asymptotic tests for MANOVA when the number of groups is large have been previously studied in parametric and nonparametric contexts [
16,
17,
18,
19]. However, these tests assume that the groups are independent, so their results do not apply to the moving window one-way layout, where the groups are not mutually independent.
Note that under the null hypothesis (
2), within each window (group)
i the response vectors, i.e.,
, have the same mean but different covariance matrices. Furthermore, under the assumptions
and
stated below, the within group covariances will be nearly constant from unit to unit, especially so when
n is relatively small compared to
a. Furthermore, when
is not true, the within group mean vectors would be nearly constant if
is a smooth function. Therefore, test statistics developed for MANOVA with unequal group covariances, e.g., [
19] could potentially be sensitive for detecting departure from the null hypothesis (
2). These ideas of a moving window in a one-way layout were previously used for lack-of-fit tests [
10,
11] and tests of homogeneity of variance [
12,
20] in the univariate setting.
Our theoretical results require some regularity conditions which are listed below.
- A1:
are fixed design values on where is the th quantile of some Lipschitz continuous positive density on .
- A2:
The covariance is a Lipschitz continuous function.
- A3:
for some .
The sequence
which satisfies assumption A1 is known as a regular sequence [
12,
21]. Assumption A1 stipulates that the design points
satisfy
for
. For example,
;
; is a regular sequence with respect to the uniform distribution. The Lipschitz continuity in A2 is in the sense of the Frobenius norm,
for matrix
. Together, assumptions A1 and A2 imply
for
and universal constants
and
. Specifically,
, if
for any
i. Therefore, assumptions A1 and A2 guarantee that the heterogeneity of the covariances within a window is controlled. These assumptions motivate applying ideas from high-dimensional (large number of groups) MANOVA to the moving window one-way MANOVA and also permit convenient expressions for the asymptotic results.
2.3. Test Statistics
For testing the association hypothesis in (
2), we follow two approaches. The first one uses omnibus (global) tests for heteroscedastic MANOVA proposed in the context of a large number of factor levels, e.g., [
16,
19]. The second approach is based on the idea of simultaneous inference where multiple univariate tests are combined to construct a composite multivariate test. A somewhat related idea to the latter was implemented in Zambom and Kim [
20] to develop a lack-of-fit test in univariate multiple regression.
Let the
data matrix for the augmented (moving window) one-way layout be denoted by
, where
is the matrix of data on the response vector for the
ith group. Further, define the group sample mean vectors and covariance matrices as
and the overall mean as
.
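As a rough illustration of these summaries, the sketch below computes the group mean vectors, the group sample covariance matrices, and the overall mean from the augmented layout. It assumes the usual divisor (group size minus one) for the sample covariance and takes the overall mean over all rows of the augmented data; the function names and the `windows` argument (a list of index lists, one per group) are hypothetical.

```python
def mean_vector(rows):
    """Componentwise mean of a list of p-dimensional observations."""
    m, p = len(rows), len(rows[0])
    return [sum(r[k] for r in rows) / m for k in range(p)]


def sample_covariance(rows):
    """Sample covariance matrix with divisor m - 1 (assumed convention)."""
    m, p = len(rows), len(rows[0])
    if m < 2:
        raise ValueError("at least two observations are needed per group")
    mu = mean_vector(rows)
    return [
        [sum((r[k] - mu[k]) * (r[l] - mu[l]) for r in rows) / (m - 1)
         for l in range(p)]
        for k in range(p)
    ]


def group_summaries(y, windows):
    """Group means, group covariances, and overall mean of the augmented data.

    y       : list of p-dimensional response vectors, one per unit
    windows : list of index lists; windows[i] holds the units in group i
    """
    groups = [[y[j] for j in w] for w in windows]
    means = [mean_vector(g) for g in groups]
    covs = [sample_covariance(g) for g in groups]
    # Overall mean over every row of the augmented layout (units that fall
    # in several windows are counted once per window).
    overall = mean_vector([row for g in groups for row in g])
    return means, covs, overall


if __name__ == "__main__":
    y = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
    windows = [[0, 1], [0, 1, 2], [1, 2]]
    means, covs, overall = group_summaries(y, windows)
    print(means, covs[0], overall)
```

When all groups have the same size, the overall mean computed this way equals the average of the group means.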
2.3.1. Omnibus Tests
Classical MANOVA tests assume that the number of treatments is fixed and that observations in different treatment groups are independent. These tests have been extended to a large number of treatment groups under general conditions in the parametric [
16,
18,
22] and nonparametric [
17,
19,
23] settings. In the univariate case, the usual
F statistic for one-way ANOVA coincides with the regression lack-of-fit test when there are multiple replications for each observed value of the predictor variable [
9]. In view of this, the large-number-of-treatments asymptotic framework is the ideal setup for large sample inference for the lack-of-fit problem with the moving window one-way layout. The multivariate extension of this testing problem from the MANOVA global testing point of view is considered in this section.
For testing the Hypothesis (
2) under the model (
1), consider the test statistic
Here
is the treatment mean squares and cross products matrix and
is the error mean squares and cross products matrix for the augmented (moving window) one-way layout data described in
Section 2.2. These matrices are defined by
The introduction of the matrix
into the test statistic in (
4) serves multiple purposes. It allows the test statistic to use the information in the off-diagonal elements (correlation information) of
. In addition, with the appropriate choice of
, one can make the test affine invariant in the sense that the test is invariant to the transformation
for any fixed
nonsingular matrix
and any vector
. There are many reasonable choices for the matrix
. In the simulation study, we will consider two of them that correspond to Lawley-Hotelling’s [
24] and Dempster’s [
25] trace statistics, which are popular tests in multivariate analysis for low- and high-dimensional situations, respectively. From a theoretical standpoint, the Cramér–Wold device affords us a limiting matrix-variate normal distribution for
if we establish asymptotic normality of
for any fixed matrix
. To this end, Theorem 1 gives the asymptotic distribution of
for any fixed
.
Theorem 1. Under assumptions A1–A3 and the null hypothesis , for any fixed matrix , where and n is fixed.
As detailed in the proof of Theorem 1, the asymptotic variance
can be expressed as
where
. A consistent estimator of
may be constructed following the ideas of Dette and Munk [
26]; see also [
10,
27]. Specifically, if
is Lipschitz continuous,
is consistent for
. Therefore, a reasonable estimator
of the asymptotic variance
can be created by replacing
in (
6) with
in (
7). The Lipschitz continuity requirement on
allows one to control the finite sample bias in the estimation of
. For a valid asymptotic test, one would reject
if
where
is the upper
th quantile of the standard normal distribution.
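The decision rule can be sketched as follows. The sketch assumes that the statistic passed in has already been scaled so that it is asymptotically normal with mean zero and the estimated variance under the null hypothesis, and that large values are evidence against the null; the function name `reject_null` and its arguments are hypothetical.

```python
from statistics import NormalDist


def reject_null(t_stat, var_hat, alpha=0.05):
    """One-sided asymptotic z-test.

    t_stat  : test statistic, assumed to be scaled so that it is
              asymptotically N(0, variance) under the null hypothesis
    var_hat : estimate of that asymptotic variance (must be positive)
    alpha   : nominal significance level

    Returns (reject, z, critical_value).
    """
    if var_hat <= 0:
        raise ValueError("variance estimate must be positive")
    z = t_stat / var_hat ** 0.5
    # Upper alpha-th quantile of the standard normal distribution.
    critical_value = NormalDist().inv_cdf(1 - alpha)
    return z > critical_value, z, critical_value


if __name__ == "__main__":
    print(reject_null(2.0, 1.0))
    print(reject_null(1.0, 1.0))
```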
2.3.2. Composite Tests
The Hypothesis (
2) can be equivalently formulated as the intersection of
p marginal hypotheses as
where
, and
and
are the
kth components of
and
, respectively.
Let
and
be the
kth diagonal entries of
and
, respectively. Wang et al. [
10] studied the test statistic
which is suitable for the marginal hypothesis
. Theorem 2 establishes the asymptotic joint distribution of
.
Theorem 2. Under the assumptions A1–A3 and the null hypothesis , as , where n is fixed and is a positive definite matrix whose entries are defined by and is the th entry of .
An estimator
of the asymptotic covariance can be assembled by taking the corresponding entries from
. The result of Theorem 2 can be used to construct a multitude of test statistics. In the simulation study, we investigate
for its performance in finite samples. The critical value for the test statistic
can be obtained from
. Equivalently,
must satisfy
The test based on
falls under the class of multiple contrast test procedures (MCTP), e.g., [
28]. In particular,
makes it possible to identify which of the response variables are not associated with the predictor. We propose using the numerical algorithm of Genz and Bretz [
29] to determine
based on the asymptotic joint distribution in Theorem 2.
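To make the critical-value computation concrete, the sketch below approximates the equicoordinate quantile of a multivariate normal distribution by plain Monte Carlo simulation. This is a simple stand-in, not the Genz and Bretz algorithm, and it assumes a one-sided max-type statistic formed from the standardized marginal statistics; the function names and the default number of draws are hypothetical choices.

```python
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    p = len(matrix)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return low


def max_critical_value(sigma, alpha=0.05, draws=20000, seed=0):
    """Monte Carlo estimate of c such that P(max_k Z_k <= c) = 1 - alpha.

    Z is multivariate normal with mean zero and the correlation matrix
    implied by sigma (assumption: each marginal statistic is divided by
    its estimated standard deviation before taking the maximum).
    """
    p = len(sigma)
    sd = [sigma[k][k] ** 0.5 for k in range(p)]
    corr = [[sigma[k][l] / (sd[k] * sd[l]) for l in range(p)] for k in range(p)]
    low = cholesky(corr)
    rng = random.Random(seed)
    maxima = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Correlated normal vector: lower-triangular factor times z.
        x = [sum(low[k][l] * z[l] for l in range(k + 1)) for k in range(p)]
        maxima.append(max(x))
    maxima.sort()
    return maxima[min(draws - 1, int((1 - alpha) * draws))]


if __name__ == "__main__":
    print(max_critical_value([[1.0, 0.5], [0.5, 1.0]]))
```

With a single response variable the value should be close to the usual standard normal quantile, and it grows with the number of response variables. In practice, a numerical integration routine for multivariate normal probabilities would typically be preferred over simulation for accuracy.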