Nonparametric Instrumental Variable Estimators of Structural Quantile Effects

June 6, 2017 | Author: Olivier Scaillet | Category: Marketing, Econometrics, Economic Theory, Tikhonov Regularization, Instrumental Variable



NONPARAMETRIC INSTRUMENTAL VARIABLE ESTIMATION OF QUANTILE STRUCTURAL EFFECTS

V. Chernozhukov∗, P. Gagliardini† and O. Scaillet‡ §

This version: April 2008 (First version: December 2006)

∗ MIT.

† University of Lugano and Swiss Finance Institute.

‡ HEC University of Geneva and Swiss Finance Institute.

§ The last two authors received support from the Swiss National Science Foundation through the National Center of Competence in Research: Financial Valuation and Risk Management (NCCR FINRISK). We would like to thank Oliver Linton, Roger Koenker, and seminar participants at Athens University, Zurich University, Bern University, Boston University, and Queen Mary for helpful comments.

Abstract

We study Tikhonov Regularized estimation of quantile structural effects implied by a nonseparable model. The nonparametric instrumental variable estimator is based on a minimum distance principle. We show that the minimum distance problem without regularization is locally ill-posed, and consider penalization by the norms of the parameter and its derivative. We derive the asymptotic Mean Integrated Square Error, the rate of convergence and the pointwise asymptotic normality under a regularization parameter depending on sample size. We illustrate our theoretical findings and the small sample properties with simulation results in two numerical examples. We also discuss a data driven selection procedure of the regularization parameter via a spectral representation of the MISE. Finally, we provide an empirical application to estimation of Engel curves.

Keywords and phrases: Quantile Regression, Nonparametric Estimation, Instrumental Variable, Ill-Posed Inverse Problems, Tikhonov Regularization, Engel Curve.

JEL classification: C13, C14, D12.

AMS 2000 classification: 62G08, 62G20, 62P20.

1 Introduction

This paper deals with nonparametric estimation of quantile functions that measure the structural impact of an endogenous explanatory variable X on the quantiles of the dependent variable Y. In the underlying model Y is linked to X by a structural quantile function that is strictly monotonic increasing in a nonseparable scalar disturbance, and the disturbance is independent of the instrument Z (Chernozhukov and Hansen (2005), Chernozhukov, Imbens and Newey (2007)). The concept of endogenous quantile regression extends the exogenous quantile regression (QR) introduced in the seminal work of Koenker and Bassett (1978). QR is nowadays part of the standard toolbox of the econometrician (Koenker (2005)), and finds numerous applications in economics and finance (see e.g. the introduction by Fitzenberger, Koenker and Machado (2001) to a special issue on economic applications of QR).

This paper builds on a series of fundamental works on the econometrics of ill-posed regression settings (Ai and Chen (2003), Darolles, Florens, and Renault (2003), Newey and Powell (2003), Hall and Horowitz (2005), Blundell, Chen, and Kristensen (2007)). More recently Chernozhukov, Imbens and Newey (2007), and Horowitz and Lee (2007), have also considered Nonparametric Instrumental Variables (NIV) estimation of quantile structural effects. Chernozhukov, Imbens and Newey (2007) discuss identification¹, and consider consistent estimation via a constrained minimum distance criterion as in Newey and Powell (2003) (see Ai and Chen (2003) for semiparametric cases). Horowitz and Lee (2007) give optimal consistency rates for an estimator derived in the same spirit as the NIVR estimator of Hall and Horowitz (2005) (see Darolles, Florens and Renault (2003) for a related estimator, and the review paper by Carrasco, Florens, and Renault (2005)). Both papers extend the abundant literature on NIVR to the nonseparable case. Finally, Chen and Pouzo (2008) study semiparametric sieve estimation of conditional moment models based on possibly nonsmooth generalized residual functions. They extend the consistency and rate results of Blundell, Chen and Kristensen (2007) to cover partially linear quantile IV regression as a particular example. They also obtain important results on $\sqrt{n}$-consistency and asymptotic normality of the parametric components of the partially linear structural function. In sharp contrast to the above, this paper focuses on the asymptotic distributional characteristics of structural quantile functional estimators.

¹ See Chesher (2003) for a control function approach, and Chesher (2007) for comparison between control function and single equation IV approaches.

In this paper we follow the route of Gagliardini and Scaillet (GS, 2006), and study a Tikhonov Regularized (TiR) estimator where regularization is achieved via a penalty term incorporating the functional parameter and its derivatives (Groetsch (1984); see also Koenker and Mizera (2004) for related total variation penalization in exogenous quantile regression settings). First, we show the local ill-posedness of the estimation problem when based solely on a minimum distance criterion (Section 2), and explain how to regularize it through a Sobolev norm penalization (Section 3). Second, we derive the asymptotic properties of the Q-TiR estimator: we establish consistency (Section 4), compute the asymptotic Mean Integrated Square Error (MISE) and rates of convergence, and prove pointwise asymptotic normality under a regularization parameter depending on sample size (Section 5). Horowitz and Lee (2007) consider $L^2$ norm penalization. They emphasize the difficulty of analyzing the asymptotic properties, such as getting sharp MISE rates and asymptotic normality, in the quantile setting because of the nonlinearity of the operator underlying the estimation problem. We highlight this issue in Section 5.2. We show how to control the errors induced by linearization of the problem under suitable smoothness assumptions in Section 5.3. This makes it possible to derive the explicit expression of the asymptotic MISE instead of the upper bounds found by Horowitz and Lee (2007), and to show asymptotic normality of the functional estimator. We check validity of the assumptions in a Gaussian example in Section 5.4. We illustrate our theoretical findings and the small sample properties with simulation results in a separable model and a nonseparable model (Section 6). We also discuss a data driven selection procedure of the regularization parameter via a spectral representation of the MISE, and provide an empirical application to estimation of Engel curves (Section 7). The technical assumptions and proofs of the theoretical results are gathered in the appendices. All omitted proofs of technical Lemmas and detailed computations in the Gaussian example are collected in a Technical Report, which is available online at our web pages.

2 Ill-posedness in nonseparable models

Let us consider the nonseparable model $Y = g(X, U)$ of Chernozhukov, Imbens and Newey (2007), where the error $U$ is independent of the instrument $Z$, and has a uniform distribution $U \sim \mathcal{U}(0, 1)$. The function $g(x, u)$ is strictly monotonic increasing w.r.t. $u \in [0, 1]$. The variable $X$ has compact support $\mathcal{X} = [0, 1]$ and is potentially endogenous. The variable $Y$ also has compact support $\mathcal{Y}$ in $[0, 1]$.² The variable $Z$ has support $\mathcal{Z} \subset \mathbb{R}^{d_Z}$. The parameter of interest is the quantile structural effect $\varphi_0(x) = g(x, \tau)$ on $\mathcal{X}$ for a given $\tau \in (0, 1)$. The functional parameter $\varphi_0$ belongs to a subset $\Theta$ of the Sobolev space $H^2[0, 1]$, i.e., the completion of the linear space $\{\varphi \in C^1[0, 1] \mid \nabla\varphi \in L^2[0, 1]\}$ with respect to the scalar product $\langle\varphi, \psi\rangle_H := \langle\varphi, \psi\rangle + \langle\nabla\varphi, \nabla\psi\rangle$, where $\langle\varphi, \psi\rangle = \int_{\mathcal{X}} \varphi(x)\psi(x)\,dx$. The Sobolev space $H^2[0, 1]$ is a Hilbert space w.r.t. the scalar product $\langle\varphi, \psi\rangle_H$, and the corresponding Sobolev norm is denoted by $\|\varphi\|_H = \langle\varphi, \varphi\rangle_H^{1/2}$. We use the $L^2$ norm $\|\varphi\| = \langle\varphi, \varphi\rangle^{1/2}$ as consistency norm, and we assume that $\Theta$ is bounded w.r.t. $\|.\|$. The function $\varphi_0$ satisfies the conditional quantile restriction

$$P[Y \le \varphi_0(X) \mid Z] = P[g(X, U) \le g(X, \tau) \mid Z] = P[U \le \tau \mid Z] = \tau, \tag{1}$$

which yields the quantile regression representation used in Horowitz and Lee (2007):

$$Y = \varphi_0(X) + V, \qquad P[V \le 0 \mid Z] = \tau. \tag{2}$$

² In the nonparametric nonseparable setting, a compact support for both X and Y can be achieved by transformation of the model w.l.o.g. We assume this to simplify the proofs.

From (1), the quantile structural effect is the solution to a nonlinear functional equation

$$\mathcal{A}(\varphi_0) = \tau, \tag{3}$$

where the operator $\mathcal{A}$ is defined by

$$\mathcal{A}(\varphi)(z) = \int_{\mathcal{X}} F_{Y|X,Z}(\varphi(x) \mid x, z)\, f_{X|Z}(x \mid z)\,dx, \qquad z \in \mathcal{Z},$$

and $F_{Y|X,Z}$ and $f_{X|Z}$ denote the c.d.f. of Y given X, Z, and the p.d.f. of X given Z, respectively. Alternatively, we can rewrite the operator $\mathcal{A}$ in terms of the conditional c.d.f. $F_{U|X,Z}$ of U given X, Z as

$$\mathcal{A}(\varphi)(z) = \int_{\mathcal{X}} F_{U|X,Z}\big(g^{-1}(x, \varphi(x)) \mid x, z\big)\, f_{X|Z}(x \mid z)\,dx,$$

where $g^{-1}(x, .)$ denotes the generalized inverse of function $g(x, .)$ w.r.t. its second argument.³

³ Since Y has a compact support, the strictly monotonic increasing function $u \to g(x, u)$ has a bounded range $[\underline{\gamma}, \bar{\gamma}]$, where $\underline{\gamma}$ and $\bar{\gamma}$ may depend on x, for any $x \in [0, 1]$. Thus, the inverse of $g(x, .)$ is not defined at $y \notin [\underline{\gamma}, \bar{\gamma}]$. The generalized inverse $g^{-1}(x, .)$ is such that $g^{-1}(x, y) = 1$ for $y > \bar{\gamma}$ and $g^{-1}(x, y) = 0$ for $y < \underline{\gamma}$. Consequently, $g^{-1}$ is such that $g(x, u) \le y$ if and only if $u \le g^{-1}(x, y)$, for any $x \in \mathcal{X}$, $u \in [0, 1]$ and $y \in \mathbb{R}$.

We assume identification.

Assumption 1: The function $\varphi_0$ is identified.

Sufficient conditions ensuring Assumption 1 locally around $\varphi_0$ are given in Chernozhukov, Imbens and Newey (2007). Equation (3) implies the conditional moment restriction

$$m(\varphi_0, z) := E[1\{Y - \varphi_0(X) \le 0\} - \tau \mid Z = z] = \mathcal{A}(\varphi_0)(z) - \tau = 0, \qquad z \in \mathcal{Z}.$$

We consider a minimum distance approach for the estimation of the quantile structural effect $\varphi_0$. The limit criterion is

$$Q_\infty(\varphi) := \frac{1}{\tau(1-\tau)} E\big[m(\varphi, Z)^2\big] = \frac{1}{\tau(1-\tau)} E\big[(\mathcal{A}(\varphi)(Z) - \tau)^2\big] =: \|\mathcal{A}(\varphi) - \tau\|^2_{L^2(F_Z, \tau)},$$

where $F_Z$ denotes the marginal distribution of Z and $L^2(F_Z, \tau)$ denotes the $L^2$ space w.r.t. measure $F_Z / (\tau(1-\tau))$. The constant weighting factor $1/(\tau(1-\tau))$ is the inverse of the conditional variance $V[1\{Y - \varphi_0(X) \le 0\} - \tau \mid Z]$ of the moment function.⁴ By the identification assumption, $\varphi_0$ is the unique minimizer of $Q_\infty$ on $\Theta$.

⁴ The weighting factor $1/(\tau(1-\tau))$ is irrelevant for minimization of $Q_\infty$. However, it matters for the normalization of the regularization parameter in the penalized criterion (see below).

The minimum distance problem is locally ill-posed if, for any $r > 0$ small enough, there exist $\varepsilon \in (0, r)$ and a sequence $(\varphi_n) \subset B_r(\varphi_0) := \{\varphi \in L^2[0, 1] : \|\varphi - \varphi_0\| < r\}$ such that $\|\varphi_n - \varphi_0\| \ge \varepsilon$ and $Q_\infty(\varphi_n) \to 0$ (see e.g. Definition 1.1 in Hofmann and Scherzer (1998)). Under a stronger condition than Assumption 1, namely local injectivity of $\mathcal{A}$, this definition of local ill-posedness is equivalent to $\mathcal{A}^{-1}$ being discontinuous in a neighborhood of $\mathcal{A}(\varphi_0)$ (see Engl, Hanke and Neubauer (2000), Chapter 10). This follows from $Q_\infty(\varphi) = \|\mathcal{A}(\varphi) - \mathcal{A}(\varphi_0)\|^2_{L^2(F_Z, \tau)}$.

Proposition 1: Under Assumptions 1 and A.3 (i)-(iii), the problem is locally ill-posed.

Proposition 1 establishes the local ill-posedness of the minimum distance estimation problem under suitable boundedness and smoothness assumptions on $g(x, \tau)$, $f_{X|Z}$, and $F_{U|X,Z}$. Contrary to what we might expect, there is no general characterization of the ill-posedness of a nonlinear problem through conditions on its linearization, i.e., on the Frechet derivative of the operator (Engl, Kunisch and Neubauer (1989)). Several counter-examples are available in the literature (Schock (2002)). The proof of Proposition 1 in Appendix 2 relies on a constructive approach and gives explicit sequences $(\varphi_n)$ generating ill-posedness.
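The mechanism behind ill-posedness can be seen numerically in a linear caricature. The sketch below is not the construction used in the proof of Proposition 1. It assumes a hypothetical Gaussian smoothing kernel as a stand-in for an integral operator of the kind studied here, and shows that the perturbations $\psi_n(x) = \sqrt{2}\sin(n\pi x)$, which all have unit $L^2$ norm, are mapped to images whose norm shrinks quickly as $n$ grows. Perturbations that stay far from zero can therefore leave a minimum distance criterion almost unchanged.

```python
import numpy as np

def smoothing_operator(n_grid=400, width=0.1):
    """Midpoint-rule discretization of (A psi)(z) = int_0^1 k(z, x) psi(x) dx.

    The Gaussian kernel k and its width are illustrative assumptions, not
    quantities taken from the paper.
    """
    x = (np.arange(n_grid) + 0.5) / n_grid
    dx = 1.0 / n_grid
    kernel = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    kernel /= width * np.sqrt(2.0 * np.pi)
    return x, dx, kernel * dx

def l2_norm(values, dx):
    """Discrete L2[0, 1] norm on the midpoint grid."""
    return np.sqrt(np.sum(values ** 2) * dx)

x, dx, A = smoothing_operator()
image_norms = {}
for n in (1, 4, 16):
    psi = np.sqrt(2.0) * np.sin(n * np.pi * x)   # unit L2 norm for every n
    image_norms[n] = l2_norm(A @ psi, dx)
    print(f"n={n:2d}  ||psi||={l2_norm(psi, dx):.3f}  ||A psi||={image_norms[n]:.4f}")
```

The images do not vanish exactly for large $n$ because the kernel is truncated at the boundary of $[0, 1]$, but the ordering of the three norms is what matters for the illustration.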

3 The Q-TiR estimator

We address ill-posedness by Tikhonov regularization (Tikhonov (1963a,b); see also Kress (1999), Chapter 16). We consider a penalized criterion $Q_T(\varphi) + \lambda_T \|\varphi\|_H^2$, where $Q_T(\varphi)$ is an empirical counterpart of $Q_\infty(\varphi)$ defined by

$$Q_T(\varphi) := \frac{1}{T\,\tau(1-\tau)} \sum_{t=1}^{T} \hat{m}(\varphi, Z_t)^2\, I_t, \tag{4}$$

and $\lambda_T$ is a sequence of strictly positive regularization parameters vanishing as sample size grows. In (4) we estimate the conditional moment nonparametrically with

$$\hat{m}(\varphi, z) := \int_{\mathcal{X}} \hat{f}_{X|Z}(x \mid z)\, \hat{F}_{Y|X,Z}(\varphi(x) \mid x, z)\,dx - \tau =: \hat{\mathcal{A}}(\varphi)(z) - \tau, \qquad z \in \mathcal{Z}, \tag{5}$$

where $\hat{f}_{X|Z}$ and $\hat{F}_{Y|X,Z}$ denote kernel estimators of $f_{X|Z}$ and $F_{Y|X,Z}$ with kernel $K$ and bandwidth $h_T$. Indicator $I_t = 1\{Z_t \in \mathcal{Z}_T,\ \hat{f}_Z(Z_t) \ge (\log T)^{-1}\}$ is a trimming factor based on the sequence of sets $\mathcal{Z}_T \subset \mathcal{Z}$ (see Assumption A.7) which controls for small values of kernel estimator $\hat{f}_Z$ of $f_Z$.

Definition 1: The Q-TiR estimator is defined by

$$\hat{\varphi} := \arg\inf_{\varphi \in \Theta} L_T(\varphi), \qquad L_T(\varphi) := Q_T(\varphi) + \lambda_T \|\varphi\|_H^2, \tag{6}$$

where $Q_T(\varphi)$ is as in (4), and $\lambda_T$ is a stochastic sequence with $\lambda_T > 0$ and $\lambda_T \to 0$, P-a.s.

We prove in Appendix 3.1 that the estimator exists. Term $\lambda_T \|\varphi\|_H^2$ in (6) penalizes highly oscillating components of the estimated function induced by ill-posedness, and restores its consistency.
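A minimal sketch of the penalized criterion may help fix ideas. It is written under several assumptions that are not part of Definition 1: the conditional probability $P[Y \le \varphi(X) \mid Z = z_t]$ is estimated by a kernel-weighted average of a smoothed indicator rather than by the integral in (5), trimming is omitted, the Sobolev penalty is computed by finite differences on a grid, and the simulated data are hypothetical. All function names are illustrative.

```python
import math
import numpy as np

# Standard normal c.d.f., vectorized with the standard library only.
norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))

def q_criterion(phi, x, y, z, tau, h_y, h_z):
    """Empirical minimum distance criterion in the spirit of (4), no trimming."""
    weights = np.exp(-0.5 * ((z[None, :] - z[:, None]) / h_z) ** 2)  # rows t, cols l
    smooth_ind = norm_cdf((phi(x) - y) / h_y)   # smooth version of 1{y_l <= phi(x_l)}
    p_hat = weights @ smooth_ind / weights.sum(axis=1)
    return np.mean((p_hat - tau) ** 2) / (tau * (1.0 - tau))

def sobolev_penalty(phi, n_grid=2001):
    """||phi||_H^2 = int phi^2 + int (phi')^2 on [0, 1], trapezoid rule."""
    grid = np.linspace(0.0, 1.0, n_grid)
    values = phi(grid)
    slope = np.gradient(values, grid)
    integrand = values ** 2 + slope ** 2
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))

def penalized_criterion(phi, x, y, z, tau, lam, h_y, h_z):
    """L_T(phi) = Q_T(phi) + lam * ||phi||_H^2."""
    return q_criterion(phi, x, y, z, tau, h_y, h_z) + lam * sobolev_penalty(phi)

# Hypothetical data: endogenous X, instrument Z, separable median model.
rng = np.random.default_rng(0)
T = 400
z = rng.standard_normal(T)
u1 = rng.standard_normal(T)
u2 = 0.5 * u1 + math.sqrt(0.75) * rng.standard_normal(T)
x = norm_cdf(z + u2)
y = np.sin(np.pi * x) + u1
h_y = 1.06 * y.std() * T ** (-0.2)
h_z = 1.06 * z.std() * T ** (-0.2)

phi_true = lambda s: np.sin(np.pi * s)        # median structural effect
phi_off = lambda s: np.sin(np.pi * s) + 1.0   # violates the quantile restriction
q_true = q_criterion(phi_true, x, y, z, 0.5, h_y, h_z)
q_off = q_criterion(phi_off, x, y, z, 0.5, h_y, h_z)
print(q_true, q_off)
```

The unpenalized criterion is much smaller at the true function than at the shifted one, while the penalty term depends only on the candidate function and not on the data.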

4 Consistency of the Q-TiR estimator

The next result establishes consistency of the Q-TiR estimator with stochastic regularization parameter.

Proposition 2: Suppose $\lambda_T$ is such that $\lambda_T > 0$, $\lambda_T \to 0$, P-a.s., and

$$\frac{(\log T)^2}{\lambda_T} \left( \frac{\log T}{T h_T^{d_Z+1}} + h_T^{2m} \right) = O_p(1),$$

where $m \ge 2$ is the order of differentiability of the joint density of $(X, Y, Z)$. Then, under Assumptions 1, A.1-A.3, A.7 and A.10, the Q-TiR estimator $\hat{\varphi}$ is consistent, namely $\|\hat{\varphi} - \varphi_0\| \xrightarrow{p} 0$.

The proof of Proposition 2 in Appendix 3.2 relies on two results. First, the Sobolev penalty implies that the sequence of estimates $\hat{\varphi}$ for $T \in \mathbb{N}$ is tight in $(L^2[0, 1], \|.\|)$. Thus there exists a compact set that contains $\hat{\varphi}$ for any large T with probability $1 - \delta$, for any arbitrarily small $\delta > 0$. Second, we show a suitable uniform convergence result for $Q_T$ on $\Theta$. We obtain such a result with an infinite-dimensional and possibly non-totally bounded parameter set by exploiting the specific expression of $\hat{m}(\varphi, z)$ given in (5). We are able to reduce the sup over $\Theta$ to a sup over a bounded subset of a finite-dimensional space. Combining the tightness and uniform convergence results allows us to establish consistency by an argument similar to the one for finite-dimensional or well-posed settings.

5 Asymptotic distribution of the Q-TiR estimator

In the rest of the paper we assume a deterministic regularization parameter λT .

5.1 First-order condition

The asymptotic expansion of the Q-TiR estimator is derived by following the same steps as in the usual finite-dimensional setting. To cope with the functional nature of $\varphi_0$, we exploit an appropriate notion of differentiation to get the first-order condition. More precisely, we introduce the following operators from $L^2[0, 1]$ to $L^2(F_Z, \tau)$:

$$A\psi(z) = \int f_{X,Y|Z}(x, \varphi_0(x) \mid z)\, \psi(x)\,dx, \tag{7}$$

and

$$\hat{A}\psi(z) = \int \hat{f}_{X,Y|Z}(x, \hat{\varphi}(x) \mid z)\, \psi(x)\,dx, \tag{8}$$

where $z \in \mathcal{Z}$, $\psi \in L^2[0, 1]$. These operators correspond to the Frechet derivative $A := D\mathcal{A}(\varphi_0)$ of operator $\mathcal{A}$ at $\varphi_0$, and to the Frechet derivative $\hat{A} := D\hat{\mathcal{A}}(\hat{\varphi})$ of operator $\hat{\mathcal{A}}$ at $\hat{\varphi}$, respectively (see Appendix 4.1). Under Assumption A.6, operator $A$ is compact. In Appendix 4.2 we show that the Q-TiR estimator satisfies w.p.a. 1 the first-order condition

$$0 = \frac{d}{d\varepsilon} L_T(\hat{\varphi} + \varepsilon\psi) \Big|_{\varepsilon=0} = 2 \left\langle \hat{A}^* \big( \hat{\mathcal{A}}(\hat{\varphi}) - \tau \big) + \lambda_T \hat{\varphi},\ \psi \right\rangle_H, \tag{9}$$

for any $\psi \in H^2[0, 1]$, where $\hat{A}^* = D^{-1} \tilde{\hat{A}}$. Operator $\tilde{\hat{A}}$ is defined by

$$\tilde{\hat{A}}\psi(x) = \frac{1}{T\,\tau(1-\tau)} \sum_{t=1}^{T} I_t\, \hat{f}_{X,Y|Z}(x, \hat{\varphi}(x) \mid Z_t)\, \psi(Z_t),$$

and $D^{-1}$ denotes the inverse of operator $D : H_0^2[0, 1] \to L^2[0, 1]$ with $D := 1 - \nabla^2$ and $H_0^2[0, 1] := \{\varphi \in H^2[0, 1] : \nabla\varphi(0) = \nabla\varphi(1) = 0\}$. Operators $\hat{A}^*$ and $\tilde{\hat{A}}$ are the empirical counterparts of $A^*$ and $\tilde{A}$, which are the adjoint operators of $A$ w.r.t. the Sobolev and $L^2$ scalar products on $H^2[0, 1]$, respectively, and are linked by $A^* = D^{-1}\tilde{A}$ (see GS). From (9), $\hat{\varphi}$ satisfies the nonlinear integro-differential equation

$$\hat{A}^* \big( \hat{\mathcal{A}}(\hat{\varphi}) - \tau \big) + \lambda_T \hat{\varphi} = 0. \tag{10}$$

5.2 Highlighting the nonlinearity issue

We can rewrite Equation (10) by using the second-order expansion $\hat{\mathcal{A}}(\hat{\varphi}) = \hat{\mathcal{A}}(\varphi_0) + \hat{A}_0 \Delta\hat{\varphi} + \hat{R}$, where $\Delta\hat{\varphi} := \hat{\varphi} - \varphi_0$, $\hat{A}_0\psi(z) = \int \hat{f}_{X,Y|Z}(x, \varphi_0(x) \mid z)\psi(x)\,dx$ and $\hat{R} := \hat{R}(\hat{\varphi}, \varphi_0)$ is the second order residual term (see Lemma A.4). Then, after rearranging, we get

$$\Delta\hat{\varphi} = \Delta\hat{\psi} + \hat{K}_T(\Delta\hat{\varphi}), \tag{11}$$

where $\Delta\hat{\psi} = \hat{\psi} - \varphi_0$, with

$$\hat{\psi} := \big( \lambda_T + \hat{A}_0^* \hat{A}_0 \big)^{-1} \hat{A}_0^* \hat{r} - \big( \lambda_T + \hat{A}_0^* \hat{A}_0 \big)^{-1} \big( \hat{A}^* - \hat{A}_0^* \big) \big( \hat{\mathcal{A}}(\hat{\varphi}) - \tau \big) =: \hat{\psi}_1 + \hat{\psi}_2,$$

$\hat{r} = \hat{A}_0 \varphi_0 + \tau - \hat{\mathcal{A}}(\varphi_0)$, $\hat{K}_T(\Delta\hat{\varphi}) := -\big( \lambda_T + \hat{A}_0^* \hat{A}_0 \big)^{-1} \hat{A}_0^* \hat{R}$, and $\hat{A}_0^*$ is defined as $\hat{A}^*$, but with $\varphi_0$ substituted for $\hat{\varphi}$.

This representation is instrumental in distinguishing the different contributing terms:

(i) The interpretation of $\hat{\psi}_1$ is as a linearized solution obtained from replacing the nonlinear equation $\hat{\mathcal{A}}(\varphi) = \tau$ with the linear integral type I equation $\hat{A}_0 \varphi \simeq \hat{r}$, and from applying Tikhonov regularization to the linear proxy.

(ii) Impact of nonlinearity is two-fold. On the one hand, we face the second order term $\hat{R}$ in $\hat{K}_T(\Delta\hat{\varphi})$. It is induced by the expansion of the nonlinear operator $\hat{\mathcal{A}}$. On the other hand, we face $\hat{A}^* - \hat{A}_0^*$ in $\hat{\psi}_2$. It is induced by the use of the estimate $\hat{\varphi}$ in the Frechet derivative $\hat{A}$ in (8).

The key difference between our ill-posed setting and standard finite-dimensional parametric estimation problems, or well-posed functional estimation problems, concerns the behaviour (and the complex control) of the nonlinearity terms $\hat{K}_T(\Delta\hat{\varphi})$ and $\hat{\psi}_2$. We prove in Appendix 4.3 (i) that $\hat{K}_T(\Delta\hat{\varphi})$ satisfies a quadratic bound

$$\big\| \hat{K}_T(\Delta\hat{\varphi}) \big\| \le \frac{C}{\sqrt{\lambda_T}} \|\Delta\hat{\varphi}\|^2, \tag{12}$$

w.p.a. 1, with a suitable constant $C$. From the RHS of (12) we see that the coefficient of the quadratic bound is not fixed but rather is proportional to $1/\sqrt{\lambda_T}$. It diverges as the sample size increases. Hence, the usual argument that the quadratic nonlinearity term is negligible w.r.t. the first-order term, no matter the convergence rate of the latter, does not apply. Still, we can derive an asymptotic expansion for the MISE of the estimator $\hat{\varphi}$ in terms of $\Delta\hat{\psi}$ (see Appendix 4.3 (ii)-(iii)) via Equation (11):

$$E\big[ \|\Delta\hat{\varphi}\|^2 \big] = E\big[ \|\Delta\hat{\psi}\|^2 \big] + O\left( \frac{1}{\sqrt{\lambda_T}} E\big[ \|\Delta\hat{\psi}\|^3 \big] \right).$$

Then, we need a suitable condition on the choice of the regularization parameter $\lambda_T$ to ensure that $E\big[ \|\Delta\hat{\psi}\|^2 \big]$ is the dominant term in such an expansion, and that the nonlinearity term $\hat{\psi}_2$ in $\Delta\hat{\psi}$ is negligible. This is made precise in the next section.
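The rearrangement that leads from (10) to (11) can be spelled out. The steps below are a reader's verification that uses only the definitions stated in this section, with $\hat{\mathcal{A}}$ the estimated nonlinear operator, $\hat{A}$ and $\hat{A}_0$ its Frechet derivatives at $\hat{\varphi}$ and $\varphi_0$, and $\Delta\hat{\varphi} = \hat{\varphi} - \varphi_0$; they are not a reproduction of the argument in the appendix.

```latex
% Step 1: split \hat A^* = \hat A_0^* + (\hat A^* - \hat A_0^*) in (10) and expand
% \hat{\mathcal A}(\hat\varphi) = \hat{\mathcal A}(\varphi_0) + \hat A_0 \Delta\hat\varphi + \hat R.
\begin{align*}
0 &= \hat A^*\bigl(\hat{\mathcal A}(\hat\varphi)-\tau\bigr) + \lambda_T\hat\varphi \\
  &= \hat A_0^*\bigl(\hat{\mathcal A}(\varphi_0)-\tau + \hat A_0\Delta\hat\varphi + \hat R\bigr)
     + \bigl(\hat A^*-\hat A_0^*\bigr)\bigl(\hat{\mathcal A}(\hat\varphi)-\tau\bigr)
     + \lambda_T\bigl(\varphi_0+\Delta\hat\varphi\bigr).
\end{align*}
% Step 2: collect the terms that are linear in \Delta\hat\varphi.
\begin{align*}
\bigl(\lambda_T+\hat A_0^*\hat A_0\bigr)\Delta\hat\varphi
  = -\hat A_0^*\bigl(\hat{\mathcal A}(\varphi_0)-\tau\bigr) - \lambda_T\varphi_0
    - \bigl(\hat A^*-\hat A_0^*\bigr)\bigl(\hat{\mathcal A}(\hat\varphi)-\tau\bigr)
    - \hat A_0^*\hat R .
\end{align*}
% Step 3: with \hat r = \hat A_0\varphi_0 + \tau - \hat{\mathcal A}(\varphi_0),
\begin{align*}
\hat A_0^*\hat r - \bigl(\lambda_T+\hat A_0^*\hat A_0\bigr)\varphi_0
  = -\hat A_0^*\bigl(\hat{\mathcal A}(\varphi_0)-\tau\bigr) - \lambda_T\varphi_0 ,
\end{align*}
% so the first two terms on the right-hand side of Step 2 equal
% (\lambda_T + \hat A_0^*\hat A_0)(\hat\psi_1 - \varphi_0).
% Step 4: apply (\lambda_T+\hat A_0^*\hat A_0)^{-1} to both sides.
\begin{align*}
\Delta\hat\varphi
  = \underbrace{\hat\psi_1 - \varphi_0 + \hat\psi_2}_{\Delta\hat\psi}
    \;\underbrace{-\,\bigl(\lambda_T+\hat A_0^*\hat A_0\bigr)^{-1}\hat A_0^*\hat R}_{\hat K_T(\Delta\hat\varphi)} .
\end{align*}
```

The derivation makes visible why $\hat{\psi}_1$ is exactly the Tikhonov solution of the linearized equation $\hat{A}_0\varphi \simeq \hat{r}$, while both remaining terms exist only because the operator is nonlinear.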

5.3 Mean Integrated Square Error

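Proposition 3: Suppose that Assumptions 1 and A hold. Let $h_T \to 0$ such that

$$\frac{(\log T)^2}{T h_T^{2(d_Z+1)}} = O(1). \tag{13}$$

Further, let $\lambda_T \to 0$ such that for some $\varepsilon > 0$

$$\frac{(\log T)^2}{T h_T^{d_Z+1}} + h_T^m = o\big( \lambda_T b(\lambda_T) \big), \qquad \frac{1}{T h_T^{\max\{d_Z, 2\}}} + h_T^{2m} = O\big( \lambda_T^{2+\varepsilon} \big), \tag{14}$$

and

$$\frac{1}{T} \sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j + \lambda_T)^2} \|\phi_j\|^2 + b(\lambda_T)^2 = o(\lambda_T), \tag{15}$$

where $b(\lambda_T) = \|(\lambda_T + A^*A)^{-1} A^*A \varphi_0 - \varphi_0\|$, and $\nu_j$ and $\phi_j$ are the eigenvalues and eigenfunctions of $A^*A$, with $\|\phi_j\|_H = 1$. Then, up to negligible terms,

$$E\big[ \|\Delta\hat{\varphi}\|^2 \big] = \frac{1}{T} \sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j + \lambda_T)^2} \|\phi_j\|^2 + b(\lambda_T)^2 =: V_T(\lambda_T) + b(\lambda_T)^2 =: M_T(\lambda_T). \tag{16}$$

Proof: Appendix 4.4.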

The sufficient conditions (13)-(15) on $h_T$ and $\lambda_T$ ensure that the MISE of the estimator $\hat{\varphi}$ is asymptotically equivalent to the MISE of the linearization, and asymptotically equal to $M_T(\lambda_T)$ given in (16). The possibility to find bandwidth and regularization parameter sequences satisfying these restrictions is determined by an interplay between the smoothness properties of the parameter $\varphi_0$, through the bias function $b(\lambda)$, and the severity of ill-posedness, through the decay behaviour of $\nu_j$ and $\|\phi_j\|^2$ (see Proposition 4 below).

The expression of the asymptotic MISE in (16) is similar to the formula derived in GS for the MISE of a TiR estimator for NIVR, with operator $A$ in (7) replacing the conditional expectation operator of X given Z. In $M_T(\lambda_T)$ we distinguish a regularization bias component $b(\lambda_T)^2$, and a variance component $V_T(\lambda_T)$. The bias $b(\lambda_T)$ converges to zero as $\lambda_T \to 0$. The formula of $V_T(\lambda_T)$ is reminiscent of the usual asymptotic variance of the quantile regression estimator: it involves the factor $f_{V|X,Z}(0 \mid x, z)$ (see (2)) through the Frechet derivative $A$, and the factor $\tau(1-\tau)$ through the adjoint $A^*$, hidden in the spectrum of $A^*A$.

The optimal regularization sequence $\lambda_T^*$ is defined by minimizing the asymptotic MISE $E\big[ \|\Delta\hat{\varphi}\|^2 \big]$ w.r.t. $\lambda_T$. The optimal MISE is denoted by $M_T^*$. The optimal sequence $\lambda_T^*$ also corresponds to the minimizer of $M_T(\lambda_T)$ in (16), whenever the latter satisfies Conditions (13)-(15) for the validity of the asymptotic expansion. In the next section we illustrate the interplay between the smoothness of $\varphi_0$ and the severity of ill-posedness in a Gaussian example of nonparametric IV median regression. The assumptions in Horowitz and Lee (2007) do not cover the Gaussian case, which is known to yield a severely ill-posed estimation problem in NIVR.

5.4 A Gaussian example

Let us assume that variables $X$, $U^*$, $Z$ admit a jointly normal distribution

$$\begin{pmatrix} X \\ U^* \\ Z \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho & [\ldots] \\ \rho & 1 & 0 \\ [\ldots] & 0 & 1 \end{pmatrix} \right),$$

with $\rho^2 + [\ldots]^2 < 1$, and define $U = \Phi(U^*) \sim \mathcal{U}(0, 1)$, where the function $\Phi$ denotes the c.d.f. of a standard Gaussian variable.⁵ We consider a separable specification $Y = \bar{\varphi}_0(X) + U^*$ and the case of a median regression, $\tau = 1/2$. From (7), the Frechet derivative operator is given by

$$A\psi(z) = \phi\big( \Phi^{-1}(\tau) \big) \int_{\mathbb{R}} f_{X|Z,U}(x \mid z, \tau)\, \psi(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \frac{1}{\sqrt{1 - \rho^2 - [\ldots]^2}}\, \phi\left( \frac{x - [\ldots]\, z}{\sqrt{1 - \rho^2 - [\ldots]^2}} \right) \psi(x)\,dx, \tag{17}$$

where $\phi$ denotes the p.d.f. of a standard Gaussian variable. It is a conditional expectation operator for the distribution of X given Z and $U = \tau$ (up to a multiplicative constant). The spectral decomposition of $A^*A$ admits simple expressions when the norms $\|.\|$ and $\|.\|_H$ on $[0, 1]$ correspond to norms $L^2(F_{X|U=\tau})$ and $H^2(F_{X|U=\tau})$ on $\mathbb{R}$. We derive it by adapting the standard result for the spectral decomposition of the conditional expectation operator of normal variables (see, e.g., Carrasco, Florens, and Renault (2005)) to the case where the adjoint $A^*$ is defined w.r.t. the Sobolev norm. We get:

$$\phi_j(x) = \sqrt{\frac{c^2}{c^2 + j}}\; H_j\!\left( \frac{x}{c} \right), \qquad \nu_j = \frac{2}{\pi}\, \frac{c^2}{c^2 + j}\, \xi^{2j}, \qquad j = 0, 1, \cdots \tag{18}$$

where $c = [\ldots]$ and $\xi \in (0, 1)$ is such that $[\ldots]$, and $H_j(.)$ for $j = 0, 1, \cdots$ denote the Hermite polynomials. Further, $\|\phi_j\|^2 = \dfrac{c^2}{c^2 + j}$. Thus, the eigenvalues feature geometric decay $\nu_j \asymp \xi^{2j}$, and the eigenfunctions norms feature hyperbolic decay $\|\phi_j\|^2 \asymp j^{-1}$. The bias is given by

$$b(\lambda)^2 = \lambda^2 \sum_{j=1}^{\infty} \frac{d_j^2}{(\lambda + \nu_j)^2}, \qquad d_j = \int_{\mathbb{R}} \bar{\varphi}_0(cx)\, H_j(x)\, \phi(x)\,dx. \tag{19}$$

If function $\bar{\varphi}_0$ is a polynomial (of any degree), then $b(\lambda) \asymp \lambda$ as $\lambda \to 0$.⁶

⁵ In this section, the variable X is defined on $\mathcal{X} = \mathbb{R}$. We can map it to $[0, 1]$ using a transformation with $\Phi$. Other functions on $\mathcal{X}$, norms on $\mathcal{X}$ and operators are transformed accordingly. We do not make this transformation explicit to simplify the notation.

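⁶ The notation $a(\lambda) \asymp b(\lambda)$ means that functions $a(\lambda)$ and $b(\lambda)$ are equivalent as $\lambda \to 0$ up to logarithmic terms, i.e. $c_1 [\log(1/\lambda)]^{c_3} \le a(\lambda)/b(\lambda) \le c_2 [\log(1/\lambda)]^{c_4}$ for some constants $0 \le c_1 \le c_2$ and $c_3 \le c_4$.

We particularize the result in Proposition 3 for a spectrum of operator $A^*A$ featuring decays as in (18), and a bias function behaving like a power of $\lambda$. For expository purpose, we consider $d_Z = 1$.

Proposition 4: Suppose that Assumptions 1 and A hold. Further, suppose that $\nu_j \asymp e^{-\alpha j}$, $\|\phi_j\|^2 \asymp j^{-\beta}$, as $j \to \infty$, and $b(\lambda) \asymp \lambda^{\delta}$ as $\lambda \to 0$, for some $\alpha, \beta > 0$ and $\delta \in (0, 1]$.

(i) If

$$1/2 < \delta \le 1, \tag{20}$$

$h_T \asymp T^{-\eta}$ with $\eta = \dfrac{1 + \delta}{2(1 + \delta + m)}$, and

$$\lambda_T \asymp T^{-\gamma} \ \text{ with } \ \gamma < \frac{m}{2(1 + \delta + m)}, \tag{21}$$

then

$$M_T(\lambda_T) \asymp \frac{1}{T \lambda_T [\log(1/\lambda_T)]^{\beta}} + \lambda_T^{2\delta}. \tag{22}$$

(ii) If

$$\frac{m + 2}{2(m + 1)} \le \delta \le 1, \tag{23}$$

and $h_T$ is as above, then the optimal regularization parameter $\lambda_T^*$ is such that

$$\lambda_T^* \asymp T^{-\gamma} \ \text{ with } \ \gamma = \frac{1}{1 + 2\delta}, \tag{24}$$

and the optimal MISE is

$$M_T^* \asymp T^{-\frac{2\delta}{1 + 2\delta}}. \tag{25}$$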

Condition (20) guarantees that there exists a suitable window of regularization parameters (given in (21)), for which the asymptotic expansion of the MISE in (22) is valid. Condition (21) is an upper bound on the rate of convergence of $\lambda_T$. The bandwidth $h_T$ is chosen to maximize the window of admissible regularization parameters in (21). Under the stronger Condition (23), we can derive the sequence of optimal regularization parameters and the optimal MISE, given in (24) and (25), respectively. Note that the minimal admissible regularity $\delta = \dfrac{m + 2}{2(m + 1)}$ depends on the order of differentiability of the joint density $f_{X,Y,Z}$, such that it approaches the limit 1/2 as $m$ increases.

Condition (23) is a joint constraint on the severity of ill-posedness and on the smoothness of function $\bar{\varphi}_0$. Given the geometric decay of the eigenvalues in (18), $\nu_j \asymp e^{-\alpha j}$ with $\alpha = 2\log(1/\xi)$, a necessary condition is that the coefficients $d_j$ of function $\bar{\varphi}_0(cx)$ in the Hermite basis share a geometric decay.⁷ If $d_j^2 \asymp e^{-\mu j}$ for some $\mu > 0$, we can show that:

$$b(\lambda) \asymp \lambda^{\delta}, \qquad \delta = \min\left\{ \frac{\mu}{2\alpha},\ 1 \right\}. \tag{26}$$

Condition (23) becomes

$$\mu \ge \alpha\, \frac{m + 2}{m + 1}. \tag{27}$$

If the function $\bar{\varphi}_0$ is such that $\|\nabla^j \bar{\varphi}_0\| \le \kappa^j$, $j \in \mathbb{N}$, for a given $\kappa > 0$, we can prove that $d_j^2 \le e^{-\mu j}$ for any $\mu > 0$ and large $j$, and (27) is satisfied.

The analytical study of the spectral decomposition of operator $A^*A$ with general norms $\|.\|$ and $\|.\|_H$ and across probability levels $\tau \in (0, 1)$ is more complex. However, from the expression of the Frechet derivative in (17), we can make a couple of general remarks for the separable Gaussian case:

(i) If function $\bar{\varphi}_0$ is such that $|\bar{\varphi}_0(-x)| = |\bar{\varphi}_0(x)|$, $x \in \mathbb{R}$, the asymptotic MISE is symmetric w.r.t. $\tau \to 1 - \tau$.

(ii) Two functions $\bar{\varphi}_0$ and $\tilde{\varphi}_0$ that differ by a constant have the same regularity, in the sense that they yield regularization bias functions with the same behaviour as $\lambda \to 0$.

(iii) By changing $\tau \in (0, 1)$, the regularity of $\varphi_0(x) = \bar{\varphi}_0(x) + \Phi^{-1}(\tau)$ is invariant and the impact on the MISE is through the density $f_{X|Z,U}(x \mid z, \tau)$, and thus through the spectrum of $A^*A$, only.
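The bias-variance trade-off behind Proposition 4 can be explored numerically. The sketch below evaluates the asymptotic MISE of (16), with the bias taken from (19), for an assumed spectrum $\nu_j = e^{-\alpha j}$, $\|\phi_j\|^2 = j^{-\beta}$ and $d_j^2 = e^{-\mu j}$, and minimizes it over a grid of regularization parameters. The exact equalities (instead of equivalences up to constants and logarithms), the parameter values, the truncation of the series and the grid are all assumptions made for illustration.

```python
import numpy as np

def asymptotic_mise(lam, T, alpha=1.0, beta=1.0, mu=3.0, n_terms=200):
    """Variance plus squared bias for an assumed geometric spectrum."""
    j = np.arange(1, n_terms + 1, dtype=float)
    nu = np.exp(-alpha * j)            # eigenvalues nu_j
    phi_norm2 = j ** (-beta)           # squared L2 norms of the eigenfunctions
    d2 = np.exp(-mu * j)               # squared coefficients d_j^2
    variance = np.sum(nu / (nu + lam) ** 2 * phi_norm2) / T
    bias2 = lam ** 2 * np.sum(d2 / (lam + nu) ** 2)
    return variance + bias2

def best_lambda(T, grid=None):
    """Grid minimizer of the asymptotic MISE and the attained value."""
    if grid is None:
        grid = np.logspace(-4, 0, 200)
    values = np.array([asymptotic_mise(lam, T) for lam in grid])
    k = int(np.argmin(values))
    return grid[k], values[k]

for T in (1_000, 100_000):
    lam_star, mise_star = best_lambda(T)
    print(f"T={T:7d}  lambda*={lam_star:.4f}  MISE*={mise_star:.6f}")
```

With these choices $\mu / (2\alpha) > 1$, so the bias behaves like $\lambda$; the minimizing regularization parameter and the minimal MISE both shrink as the sample size grows.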

⁷ To see this, suppose that $d_j^2 \ge c j^{-\mu}$ for some $c, \mu > 0$. Using $\nu_j \le e^{-\alpha j}$, $\alpha = 2\log(1/\xi)$, and defining $j(\lambda) \in \mathbb{N}$ such that $e^{-\alpha j(\lambda)} \asymp \lambda$ as $\lambda \to 0$, we have from (19) that $b(\lambda)^2 \ge C j(\lambda)^{-\mu} \asymp [\log(1/\lambda)]^{-\mu}$.

5.5 Asymptotic normality

In the next proposition we establish pointwise asymptotic normality of the Q-TiR estimator.

Proposition 5: Suppose that Assumptions 1 and A hold,

$$\frac{(\log T)^2}{T h_T^{d_Z+1}} + h_T^m = O\big( \lambda_T^{1+\varepsilon/2}\, b(\lambda_T) \big), \qquad \frac{(\log T)^2}{T h_T^{2(1+d_Z)}} = O(1), \qquad \frac{1}{T h_T^{\max\{d_Z, 2\}}} + h_T^{2m} = O\big( \lambda_T^{2+\varepsilon} \big),$$

$$T \lambda_T^3 = O(1), \qquad M_T(\lambda_T) = O\big( \lambda_T^{1+\varepsilon} \big),$$

$$\frac{M_T(\lambda_T)}{\sigma_T^2(x)/T} = o\big( \lambda_T^{-\varepsilon} \big), \tag{28}$$

for a $\varepsilon > 0$, where $\sigma_T^2(x) := \sum_{j=1}^{\infty} \dfrac{\nu_j}{(\lambda_T + \nu_j)^2}\, \phi_j^2(x)$. Further, suppose that for a $\bar{\varepsilon} > 0$ we have

$$\frac{1}{T^{1/3} \sigma_T^2(x)} \sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T + \nu_j)^2}\, \phi_j^2(x)\, \|g_j\|_3^2\, j^{1+\bar{\varepsilon}} = o(1), \tag{29}$$

where $\|g_j\|_3 := E\big[ g_j(X, Y, Z)^3 \big]^{1/3}$, $g_j(x, y, z) := \dfrac{1}{\sqrt{\tau(1-\tau)}}\, (A\phi_j)(z)\, \big( 1\{y \le \varphi_0(x)\} - \tau \big) / \sqrt{\nu_j}$, and $h_T\, \dfrac{1}{\sigma_T^2(x)} \sum_{j=1}^{\infty} \dfrac{\nu_j}{(\lambda_T + \nu_j)^2} = o(1)$. Then the Q-TiR estimator is asymptotically normal:

$$\sqrt{T / \sigma_T^2(x)}\; \big( \hat{\varphi}(x) - \varphi_0(x) - B_T(x) \big) \xrightarrow{d} N(0, 1),$$

where $B_T(x) = (\lambda_T + A^*A)^{-1} A^*A \varphi_0(x) - \varphi_0(x)$.

Proof: See Appendix 5.

Condition (28) requires that the rate of convergence of the variance $\sigma_T^2(x)/T$ at $x \in \mathcal{X}$ is not too large compared to the global rate of convergence of the MISE. Condition (29) is used to apply a Lyapunov CLT. When $\|g_j\|_3^2\, j^{1+\bar{\varepsilon}}$ diverges with $j$, Condition (29) is an upper bound on the rate of convergence of $\lambda_T$. Under an assumption of geometric spectrum for the eigenvalues $\nu_j$, and hyperbolic behavior for the eigenfunction values $\phi_j^2(x)$ and for $\|g_j\|_3$, the arguments in the proof of Proposition 4 imply that (29) is satisfied whenever $\lambda_T \ge c T^{-\gamma}$ for some $c, \gamma > 0$. Proposition 5 shows that a pointwise nondegenerate limit distribution exists, a prerequisite for applying bootstrap.

6 A Monte-Carlo study

6.1 Data generating process

Following Newey and Powell (2003), the errors $U_1^*$, $U_2^*$ and the instrument $Z$ are jointly normally distributed, with zero means, unit variances and a correlation coefficient of 0.5 between $U_1^*$ and $U_2^*$. We take $X^* = Z + U_2^*$, and build $X = \Phi(X^*)$ and $U = \Phi(U_1^*)$.⁸

First we examine a separable design. Case 1 is Y = sin (πX) + U1∗ . The variable X is endogenous. The quantile condition is P [Y − ϕ0 (X) ≤ 0 | Z] = τ , where the functional parameter is ϕ0 (x) = sin (πx)+Φ−1 (τ ), x ∈ [0, 1], for a given τ ∈ (0, 1). The chosen function resembles a concave Engel curve. Case 2 is Y = Φ(3(X +U −1)), and the quantile structural effect is ϕ0 (x) = Φ(3(x + τ − 1)), x ∈ [0, 1], for a given τ ∈ (0, 1). This is a nonseparable specification. We get a monotone increasing convex function in x for small τ , an S-shape for moderate τ , and a monotone increasing concave function for large τ . We climb towards the upper part of the normal c.d.f. by increasing τ , and gradually change the curvature of the function.
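The two designs can be simulated in a few lines. The sketch below assumes that the instrument is drawn independently of the two errors, which is how we read the description above, and uses hypothetical function names. It also checks the quantile restriction on simulated data: the share of observations with $Y \le \varphi_0(X)$ stays close to $\tau$ within subsamples defined by $Z$, but not within subsamples defined by the endogenous regressor $X$.

```python
import math
from statistics import NormalDist
import numpy as np

# Standard normal c.d.f., vectorized with the standard library only.
norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))

def simulate(T, case, seed=0):
    """Draw (X, Y, Z) for Case 1 (separable) or Case 2 (nonseparable)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)                # instrument, independent of errors
    u1 = rng.standard_normal(T)
    u2 = 0.5 * u1 + math.sqrt(0.75) * rng.standard_normal(T)  # corr(u1, u2) = 0.5
    x = norm_cdf(z + u2)                      # X = Phi(X*), X* = Z + U2*
    u = norm_cdf(u1)                          # U = Phi(U1*) is uniform on (0, 1)
    if case == 1:
        y = np.sin(np.pi * x) + u1
    elif case == 2:
        y = norm_cdf(3.0 * (x + u - 1.0))
    else:
        raise ValueError("case must be 1 or 2")
    return x, y, z

def phi0(x, tau, case):
    """Quantile structural effect of each design at probability level tau."""
    if case == 1:
        return np.sin(np.pi * x) + NormalDist().inv_cdf(tau)
    return norm_cdf(3.0 * (x + tau - 1.0))

x, y, z = simulate(20_000, case=1, seed=1)
below = y <= phi0(x, 0.5, case=1)
print("share given Z > 0  :", below[z > 0].mean())     # close to tau = 0.5
print("share given X > 0.5:", below[x > 0.5].mean())   # away from 0.5: endogeneity
```

The second share differs from $\tau$ because conditioning on $X$ carries information about the error, which is exactly why an instrument is needed.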

⁸ This data generating process is a Gaussian model of the type considered in Section 5.4, but with the variable X explicitly transformed to live in $\mathcal{X} = [0, 1]$.

6.2 Estimation procedure

To compute the estimate $\hat{\varphi}$, defined on a subset of the function space $H^2[0, 1]$, we use a numerical series approximation. We rely on standardized shifted Chebyshev polynomials of the first kind (see Section 22 of Abramowitz and Stegun (1970) for their mathematical properties). We take orders 0 to 5, which yields six coefficients ($k = 6$) to be estimated in the approximation $\varphi(x) \simeq \sum_{j=0}^{k-1} \theta_j P_j(x) =: \theta' P(x)$, where $P_0(x) = T_0^*(x)/\sqrt{\pi}$, $P_j(x) = T_j^*(x)/\sqrt{\pi/2}$, $j \ne 0$. The shifted Chebyshev polynomials of the first kind are $T_0^*(x) = 1$, $T_1^*(x) = -1 + 2x$, $T_2^*(x) = 1 - 8x + 8x^2$, $T_3^*(x) = -1 + 18x - 48x^2 + 32x^3$, $T_4^*(x) = 1 - 32x + 160x^2 - 256x^3 + 128x^4$, $T_5^*(x) = -1 + 50x - 400x^2 + 1120x^3 - 1280x^4 + 512x^5$. The squared Sobolev norm is approximated by

$$\|\varphi\|_H^2 = \int_0^1 \varphi^2 + \int_0^1 (\nabla\varphi)^2 \simeq \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \theta_i \theta_j \int_0^1 \big( P_i P_j + \nabla P_i \nabla P_j \big) =: \theta' D \theta.$$

The coefficients in this quadratic form are explicitly computed with a symbolic calculus package. The squared $L^2$ norm $\|\varphi\|^2$ is approximated similarly by $\theta' B \theta$, say. Such simple and exact forms ease implementation⁹, and improve on computational speed. The convexity in $\theta$ (quadratic penalty) helps the numerical stability of the estimation procedure. Estimated conditional probabilities $\hat{P}[Y \le \varphi(X) \mid Z = z_t]$ used in the criterion are based

on a Gaussian kernel smoother, and are approximated by ¶ µ ¶ µ T X zl − zt yl − θ0 P (xl ) K IK − hY hZ l=1 , ¶ µ T X zl − zt K hZ l=1 Z x K (u) du corresponds to the integrated kernel of K. This smoothing where IK (x) = −∞

is asymptotically equivalent to the one described in Equation (5). We use it because of its numerical tractability: we avoid numerical integration without sacrificing differentiability (in a classical sense) of the approximated empirical criterion QT (θ0 P ) with respect to θ. Individual bandwidths are selected via the standard rule of thumb (Silverman (1986)). Analytical expressions for derivatives are computed and are implemented in a user-supplied 9

The Gauss programs developed for this section and the empirical application are available on request from the authors.

20

analytical gradient and Hessian optimization procedure. In the separable case we take the NIVR estimates of GS on the same regularization parameter grid as starting values. In the nonseparable case we start from the true coefficients of the projection of $\varphi_0$ on the polynomial basis. If we use NIVR estimates instead we need to extend convergence time by a factor of at least five to achieve the same accuracy when $\tau \neq 1/2$.

The $k \times k$ matrix corresponding to operator $\hat{A}^* \hat{A}$ on the subspace spanned by the finite-dimensional basis of functions $\{P_j : j = 0, ..., k-1\}$ in $H^2[0,1]$ is given by

$$\langle P_i, \hat{A}^* \hat{A} P_j \rangle_H = \frac{1}{T \tau (1-\tau)} \sum_{t=1}^{T} \left(\hat{A} P_i\right)(Z_t) \left(\hat{A} P_j\right)(Z_t) = \frac{1}{T} \left(\hat{P}' \hat{P}\right)_{i+1, j+1}, \quad i, j = 0, ..., k-1,$$

where $\hat{P}$ is the $T \times k$ matrix with rows

$$\hat{P}(Z_t)' = (\tau(1-\tau))^{-1/2} \int P(x)' \hat{f}_{X,Y|Z}(x, \hat{\varphi}(x) | Z_t)\,dx, \quad t = 1, ..., T.$$

Hence we can use the suggestion of GS, which consists in estimating the asymptotic spectral representation (16) to select the regularization parameter. We need a first-step Q-TiR estimator $\bar{\theta}' P$ of $\varphi_0$ based on a pilot regularization parameter $\bar{\lambda}$. Then we can perform the spectral decomposition of the matrix $D^{-1} \hat{P}' \hat{P} / T$ to get the eigenvalues $\hat{\nu}_j$ and the eigenvectors $\hat{w}_j$, normalized to $\hat{w}_j' D \hat{w}_j = 1$, $j = 1, ..., k$. Finally we can minimize an estimate of the MISE

$$\bar{M}(\lambda) = \frac{1}{T} \sum_{j=1}^{k} \frac{\hat{\nu}_j}{(\lambda + \hat{\nu}_j)^2} \hat{w}_j' B \hat{w}_j + \bar{\theta}' \left[ \frac{1}{T} \hat{P}' \hat{P} \left( \lambda D + \frac{1}{T} \hat{P}' \hat{P} \right)^{-1} - I \right] B \left[ \left( \lambda D + \frac{1}{T} \hat{P}' \hat{P} \right)^{-1} \frac{1}{T} \hat{P}' \hat{P} - I \right] \bar{\theta},$$

w.r.t. $\lambda$ to get the optimal regularization parameter $\hat{\lambda}$, and compute the second-step Q-TiR estimator with $\hat{\theta}$ using the regularization parameter $\hat{\lambda}$.
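As an illustration of the selection step just described, the following Python sketch evaluates the estimated MISE on a grid of regularization parameters and picks the minimizer. It is not the authors' Gauss code: the function names `mise_estimate` and `select_lambda`, the grid search, and the use of a Cholesky factor of $D$ to solve the eigenproblem are assumptions made for this example, and the inputs `P_hat`, `D`, `B` and `theta_bar` are taken as already computed.

```python
import numpy as np

def mise_estimate(lam, P_hat, D, B, theta_bar):
    """Estimated MISE for one value lam of the regularization parameter.

    P_hat: (T, k) matrix with rows P_hat(Z_t)'; D, B: symmetric (k, k)
    matrices of the Sobolev and L2 norms (D positive definite);
    theta_bar: first-step coefficients. All names are hypothetical.
    """
    T, k = P_hat.shape
    S = P_hat.T @ P_hat / T                         # (1/T) P'P
    # Eigen-decomposition of D^{-1} S through the Cholesky factor D = L L'
    L_inv = np.linalg.inv(np.linalg.cholesky(D))
    nu, V = np.linalg.eigh(L_inv @ S @ L_inv.T)
    W = L_inv.T @ V                                 # columns w_j with w_j' D w_j = 1
    w_B_w = np.einsum("ji,jk,ki->i", W, B, W)       # w_j' B w_j for each j
    variance = np.sum(nu / (lam + nu) ** 2 * w_B_w) / T
    R = np.linalg.solve(lam * D + S, S) - np.eye(k) # (lam D + S)^{-1} S - I
    bias_sq = theta_bar @ R.T @ B @ R @ theta_bar
    return variance + bias_sq

def select_lambda(grid, P_hat, D, B, theta_bar):
    """Return the grid point minimizing the estimated MISE, and all values."""
    values = np.array([mise_estimate(lam, P_hat, D, B, theta_bar) for lam in grid])
    return grid[int(np.argmin(values))], values
```

A grid search is used here only for transparency; any one-dimensional minimizer over $\lambda > 0$ could replace it.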


6.3 Simulation results

We consider a sample size of $T = 1000$. In Table 1 we compare the optimal asymptotic MISE and the optimal finite sample MISE of the Q-TiR estimator over 1000 replications. The asymptotic features are computed by Monte-Carlo integration using (16) and (17). The minima of the finite sample MISE and of the asymptotic MISE are close for all probability levels. For $\tau = 1/2$ they are equal to .0133 and .0147, and correspond to a regularization parameter equal to .0006 and .0009. For other probability levels $\tau \in \{.10, .25, .75, .90\}$ we have an increase of the optimal finite sample and asymptotic MISE because of (equally) increasing bias and variance. This is a consequence of a vanishing density function, as we move into the tails, together with the invariant regularity of function $\varphi_0$ (Remark (iii) in Section 5.4). Symmetry with respect to $\tau = 1/2$ is explained by the symmetry of the Frechet derivative (Remark (i)). Optimal regularization parameters are (symmetrically) slightly decreasing as we depart from $\tau = 1/2$.

Table 1 also gathers results for estimation with the data driven procedure. We use $\bar{\lambda} = .0001$ as pilot regularization parameter. Other values such as .00005 or .0002 leave the results qualitatively unchanged. We report the average and quartiles of the selected lambda over 1000 simulations, and the average ISE when we use the optimal data driven regularization parameter at each simulation. The selection procedure tends to slightly overpenalize on average, and the selected lambdas are positively skewed. However, the impact on the MISE of the data driven Q-TiR estimator is small, since the average ISE are of the same magnitude as the optimal finite sample MISE. This is true across probability levels $\tau$.


When we move to the nonseparable case, Table 2 shows that we need to penalize more. The order of magnitude of the optimal regularization parameter is $10^{-2}$ or $10^{-3}$ instead of $10^{-4}$. We also observe a larger difference between the asymptotic and finite sample values of the MISE in relative terms, probably explained by the increased complexity of the model. However, the differences in absolute terms are comparable in Case 1 and Case 2. The performance of the data driven procedure is also comparable, and the selection rule continues to deliver good results.

7 An empirical example

This section presents an empirical example with data extracted from the sample of Blundell, Chen and Kristensen (2007); see also Chen and Pouzo (2008) for a companion analysis of these data based on a partially linear quantile IV model. We estimate quantile structural effects for Engel curves based on the conditional quantile condition $P[Y \le \varphi_0(X) \mid Z] = \tau$, with $X = \Phi(X^*)$ and $Z = \Phi(Z^*)$. Variable $Y$ denotes the expenditure share of a broad category of non-durables and services, $X^*$ denotes the standardized logarithm of total expenditures, and $Z^*$ denotes the standardized logarithm of annual income from wages and salaries. We consider the quartile structural effects, i.e., $\tau \in \{.25, .5, .75\}$, and the categories food-in, food-out, fuel and leisure, used in the graphical illustrations of Blundell, Chen and Kristensen (2007). We look at couples with children, for which we have 1027 observations from the 1995 British FES. We do not consider couples without children since the three authors recommend caution given the limited sample available for that group. The estimation


procedure is as in the Monte-Carlo study and uses data-driven regularization parameters for each quantile structural effect and category. We present our empirical results with eight polynomials ($k = 8$). We have checked that estimation results remain virtually unchanged when gradually increasing the number of polynomials up to sixteen. Setting $k$ large in advance may raise numerical convergence issues when optimizing a nonlinear criterion. We have observed a stabilization of the value of the optimized objective function, of the loadings in the numerical series approximation, and of the data-driven regularization parameter. We have also observed that higher order polynomials receive loadings which are closer and closer to zero. This suggests that we can limit ourselves to a small number of polynomials in this application. We use a pilot regularization parameter $\bar{\lambda} = .0001$ to get a first step estimator of $\varphi_0$, and start the optimization algorithm with the data-driven NIVR estimates. To build pointwise confidence bands we use a nonparametric bootstrap procedure following Blundell, Chen and Kristensen (2007).

Figure 1 plots the estimated median structural effect and bootstrap pointwise confidence bands at 95% with 1000 replications for the four categories. Figure 2 is a picture "à la boxplot" where we represent the estimated quantile structural effects at $\tau \in \{.25, .5, .75\}$ and the estimated mean structural effect (NIVR estimate) for the four categories. The box-plot interpretation is as follows. For any given value $z$ of the instrument, the conditional probability of the shaded area is asymptotically $P[g(X, 0.25) \le Y \le g(X, 0.75) \mid Z = z] = .5$. Both figures show that the estimated structural effects are nonlinear, and their patterns differ across categories. Regularization parameters range from $10^{-3}$ to $10^{-1}$. This exemplifies the


need to use a data-driven procedure, and not a fixed value for all categories and probability levels $\tau$. In Figure 3 we report the difference between the estimated quantile structural effect at $\tau = .75$ and at $\tau = .25$, i.e., the estimated interquartile range $\hat{g}(x, .75) - \hat{g}(x, .25)$ of the structural effect, together with bootstrap pointwise confidence bands at 95% with 1000 replications. If the model were separable as in the Gaussian example (Section 5.4) and Case 1 (Section 6.1) we would have a straight line. Departure from separability looks moderate for intermediate values of $x$ in the first three categories, but strong for leisure. A formal test of separability is left for future research. The shaded areas in Figure 2 correspond to $\int_{\mathcal{X}} (\hat{g}(x, .75) - \hat{g}(x, .25))\,dx$. This can be viewed as a rough estimate of the average interquartile range $E[g(X, .75) - g(X, .25)]$. Indeed, since $X^*$ is approximately normally distributed in the sample, $X$ is approximately uniformly distributed over $[0, 1]$. The average dispersion is the smallest for fuel, and the largest for leisure. Transforming $X^*$ by the inverse of its empirical c.d.f. to get $X$ does not affect qualitatively the results here.
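The link between the shaded area and the average interquartile range can be made concrete with a small Python sketch. It only assumes that two estimated quartile curves are available as functions on $[0, 1]$; the names `to_unit_interval` and `average_interquartile_range`, and the trapezoidal rule, are choices made for this illustration and do not come from the paper's programs.

```python
import math
import numpy as np

def to_unit_interval(x_star):
    """Map a standardized variable X* into [0, 1] via X = Phi(X*)."""
    values = np.atleast_1d(np.asarray(x_star, dtype=float))
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in values])

def average_interquartile_range(g25, g75, n_grid=1001):
    """Trapezoidal approximation of the integral of g75(x) - g25(x) over [0, 1].

    When X is (approximately) uniform on [0, 1], this integral approximates
    E[g(X, .75) - g(X, .25)].
    """
    x = np.linspace(0.0, 1.0, n_grid)
    diff = g75(x) - g25(x)
    dx = x[1] - x[0]
    return dx * (diff.sum() - 0.5 * (diff[0] + diff[-1]))
```

If $X$ is far from uniform, the integral and the expectation differ, which is why the text calls the shaded area only a rough estimate.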


References

Abramowitz, M. and I. Stegun (1970): Handbook of Mathematical Functions, Dover Publications, New York.

Adams, R. (1975): Sobolev Spaces, Academic Press, Boston.

Ai, C. and X. Chen (2003): "Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions", Econometrica, 71, 1795-1843.

Alt, H. (1992): Lineare Funktional Analysis, Springer Verlag, Berlin.

Andrews, D. (1994): "Empirical Process Methods in Econometrics", in the Handbook of Econometrics, Vol. 4, Engle, R. and D. McFadden, Eds., North Holland, 2247-2294.

Blundell, R., Chen, X. and D. Kristensen (2007): "Semi-Nonparametric IV Estimation of Shape Invariant Engel Curves", Econometrica, 75, 1613-1669.

Carrasco, M., Florens, J.-P. and E. Renault (2005): "Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization", Handbook of Econometrics, forthcoming.

Chen, X. and D. Pouzo (2008): "Efficient Estimation of Semiparametric Conditional Moment Models with Possibly Nonsmooth Moments", Cowles Foundation Discussion Paper 1640.

Chernozhukov, V. and C. Hansen (2005): "An IV Model of Quantile Treatment Effect", Econometrica, 73, 245-261.

Chernozhukov, V., Imbens, G. and W. Newey (2007): "Instrumental Variable Estimation of Nonseparable Models", Journal of Econometrics, 139, 4-14.

Chesher, A. (2003): "Identification in Nonseparable Models", Econometrica, 71, 1405-1441.

Chesher, A. (2007): "Identification of Nonadditive Structural Functions", in Advances in Economics and Econometrics, Theory and Applications: 9th World Congress of the Econometric Society, Blundell, R., Persson, T. and W. Newey, Eds., Volume 3, Cambridge University Press, Cambridge.

Darolles, S., Florens, J.-P. and E. Renault (2003): "Nonparametric Instrumental Regression", Working Paper.

Engl, H., Hanke, M. and A. Neubauer (2000): Regularization of Inverse Problems, Kluwer Academic Publishers, Dordrecht.

Engl, H., Kunisch, K. and A. Neubauer (1989): "Convergence Rates for Tikhonov Regularization of Non-linear Ill-posed Problems", Inverse Problems, 5, 523-540.

Fitzenberger, B., Koenker, R. and A. Machado (2001): "Introduction", Empirical Economics, 26, 1-5.

Gagliardini, P. and O. Scaillet (2006): "Tikhonov Regularization for Nonparametric Instrumental Variable Estimators", Working Paper.

Groetsch, C. (1984): The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman Advanced Publishing Program, Boston.

Hall, P. and J. Horowitz (2005): "Nonparametric Methods for Inference in the Presence of Instrumental Variables", Annals of Statistics, 33, 2904-2929.

Hansen, B. (2007): "Uniform Convergence Rates for Kernel Estimation with Dependent Data", forthcoming in Econometric Theory.

Hofmann, B. and O. Scherzer (1998): "Local Ill-posedness and Source Conditions of Operator Equations in Hilbert Spaces", Inverse Problems, 14, 1189-1206.

Horowitz, J. and S. Lee (2007): "Nonparametric Instrumental Variables Estimation of a Quantile Regression Model", Econometrica, 75, 1191-1208.

Koenker, R. (2005): Quantile Regression, Econometric Society Monographs, 38, Cambridge University Press, Cambridge.

Koenker, R. and G. Bassett (1978): "Regression Quantiles", Econometrica, 46, 33-50.

Koenker, R. and I. Mizera (2004): "Penalized Triograms: Total Variation Regularization for Bivariate Smoothing", Journal of the Royal Statistical Society, Series B, 66, 145-163.

Kress, R. (1999): Linear Integral Equations, Springer, New York.

Newey, W. and J. Powell (2003): "Instrumental Variable Estimation of Nonparametric Models", Econometrica, 71, 1565-1578.

Reed, M. and B. Simon (1980): Functional Analysis, Academic Press, San Diego.

Schock, E. (2002): "Non-linear Ill-posed Equations: Counter-examples", Inverse Problems, 18, 715-717.

Silverman, B. (1986): Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.

Tikhonov, A. N. (1963a): "On the Solution of Incorrectly Formulated Problems and the Regularization Method", Soviet Math. Doklady, 4, 1035-1038 (English Translation).

Tikhonov, A. N. (1963b): "Regularization of Incorrectly Posed Problems", Soviet Math. Doklady, 4, 1624-1627 (English Translation).

White, H. and J. Wooldridge (1991): "Some Results on Sieve Estimation with Dependent Observations", in Nonparametric and Semiparametric Methods in Econometrics and Statistics, Proceedings of the Fifth International Symposium in Economic Theory and Econometrics, Cambridge University Press.


Appendices

In Appendix 1, we list the regularity conditions and provide their detailed discussion. In Appendix 2, we prove Proposition 1 on local ill-posedness of the nonparametric estimation of the endogenous quantile problem. In Appendix 3, we establish Proposition 2 on consistency of the Q-TiR estimator, and provide the auxiliary result of its existence. In Appendix 4, we prove Proposition 3 on the explicit expression of the asymptotic MISE, and provide the auxiliary steps leading to it. In Appendix 5 we show Proposition 5 on asymptotic normality of the Q-TiR estimator.

Appendix 1: List of regularity conditions

A.1: $\{(X_t, Y_t, Z_t) : t = 1, ..., T\}$ is an i.i.d. sample from a distribution admitting a density $f_{X,Y,Z}$ with convex support $S = \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \subset \mathbb{R}^d$, $\mathcal{X} = [0, 1]$, $\mathcal{Y} = [0, 1]$, $d = 2 + d_Z$.

A.2: The density $f_{X,Y,Z}$ of $(X, Y, Z)$ is in class $C^m(\mathbb{R}^d)$, with $m \ge 2$, and $\nabla^{\alpha} f_{X,Y,Z}$ is uniformly continuous and bounded, for any $\alpha \in \mathbb{N}^d$ with $|\alpha| := \sum_{i=1}^{d} \alpha_i = m$.

A.3: (i) Function $\tau \mapsto g(x, \tau)$ is strictly monotonic increasing and continuous, for almost any $x \in (0, 1)$, and $\sup_{x,\tau} |g(x, \tau)| < \infty$, $\sup_{x,\tau} |\nabla_x g(x, \tau)| < \infty$; (ii) $\sup_{x,z} f_{X|Z}(x|z) < \infty$, $\sup_{x,z} |\nabla_x f_{X|Z}(x|z)| < \infty$ and $\sup_{x,z} |\nabla_z f_{X|Z}(x|z)| < \infty$; (iii) $\sup_{u,x,z} |\nabla_x F_{U|X,Z}(u|x, z)| < \infty$ and $\sup_{u,x,z} |\nabla_z F_{U|X,Z}(u|x, z)| < \infty$; (iv) $\sup_{u,x,z} f_{U|X,Z}(u|x, z) < \infty$.

A.4: There exists $h > 0$ such that function $q(s) := \sup_{v \in B_h(s)} |\nabla f_{X,Y,Z}(v)|$, $s \in S$, is integrable w.r.t. Lebesgue measure on $S$, where $B_h(s)$ denotes the ball in $\mathbb{R}^d$ of radius $h$ around $s$.

A.5: There exists $h > 0$ such that function $q_{\alpha}(s) := \sup_{v \in B_h(s)} |\nabla^{\alpha} f_{X,Y,Z}(v)|$, $s \in S$, satisfies $\int_S \frac{q_{\alpha}(s)^2}{f_{X,Y,Z}(s)}\,ds < \infty$, for any $\alpha \in \mathbb{N}^d$ with $|\alpha| = m$.

A.6: (i) $\sup_{x,z} f_{X,Y|Z}(x, \varphi_0(x)|z) < \infty$; (ii) $\sup_{x,y,z} |\nabla_y f_{X,Y|Z}(x, y|z)| < \infty$.

A.7: Set $\mathcal{Z}_T \subset \mathcal{Z}$ is such that $\sup_{z \in \mathcal{Z}_T} |z| = O(T^{b})$ for $b < \infty$, $P(Z \in \mathcal{Z}_T^c) = O(T^{-\bar{b}})$ for any $\bar{b} > 0$ and $\inf_{z \in \mathcal{Z}_T} f_Z(z) \ge 2 (\log T)^{-1}$.

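A.8: We have:

$$\frac{20}{\tau(1-\tau)} \sup_{x \in \mathcal{X}, y \in \mathcal{Y}, z \in \mathcal{Z}} \left|\nabla_y f_{X,Y|Z}(x, y|z)\right|^2 < \frac{1}{\|\varphi_0\|_H^2} \inf_{\psi : \|\psi\|_H \le 2 \|\varphi_0\|_H} \frac{\langle \psi, A^* A \psi \rangle_H}{\|\psi\|^2}.$$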

A.9: Function $\varphi_0$ is an interior point of $\Theta$ w.r.t. $\|.\|$.

A.10: The kernel $K$ on $\mathbb{R}^d$ is such that (i) $\int K(u)\,du = 1$ and $K$ is bounded; (ii) $K$ has compact support; (iii) $K$ is differentiable, with bounded derivatives; (iv) $\int u^{\alpha} K(u)\,du = 0$ for any $\alpha \in \mathbb{N}^d$ with $|\alpha| < m$, where $m$ is as in Assumption A.2.

A.11: The h., .iH -orthonormal basis of eigenfunctions ­ ®2 ∞ ∞ X X ° ° φj , φl °φj ° < ∞; (ii) fies (i) 2, where ω(z) := fZ (z) j∈N

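Assumption A.4.

A.13: Functions $\psi_j$ are in class $C^m(\mathbb{R}^{d_Z})$ such that (i) $\sup_{j \in \mathbb{N}} E\left[|\nabla \psi_j(Z)|^2\right] < \infty$ and $\sup_{j \in \mathbb{N}} E\left[\omega(Z) |\nabla \psi_j(Z)|^2\right] < \infty$; (ii) $\sup_{j \in \mathbb{N}} E\left[|\nabla^{\alpha} \psi_j(Z)|^2\right] < \infty$ and $\sup_{j \in \mathbb{N}} E\left[|\nabla^{\alpha} \psi_j(Z - \zeta) - \nabla^{\alpha} \psi_j(Z)|^2\right] \to 0$ as $|\zeta| \to 0$, for any $\alpha \in \mathbb{N}^{d_Z}$ with $|\alpha| = m$.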

In Assumption A.1, the compact support of $X$ and $Y$ is used for technical reasons. Mapping in $[0, 1]$ can be achieved by simple linear or nonlinear monotone transformations. Assuming univariate $X$ simplifies the exposition. Assumptions A.2 and A.10 are classical conditions in kernel density estimation concerning smoothness of the density and of the kernel. In particular, when $m > 2$, $K$ is a higher order kernel. Moreover, we assume a compact support for the kernel $K$ to simplify the set of regularity conditions. It is possible to reformulate the list of technical assumptions in terms of high-level conditions concerning the nonparametric estimation of $f_{X,Y,Z}$. This facilitates the extension to other types of smoothing methods, but requires listing some technical conditions to get the results on the MISE and asymptotic normality, which are difficult to interpret at the generic level. This also complicates the separation of the regularity conditions on the nonparametric smoothing parameter and the regularization parameter.

Assumptions A.3 (i)-(iii) are used to prove local ill-posedness (Proposition 1). Specifically, Assumption A.3 (i) is a boundedness and smoothness condition on function $g(x, \tau)$ w.r.t. both its arguments. In particular, it implies that the structural quantile effect $\varphi_0 \in H^2[0, 1]$, for any $\tau \in (0, 1)$. Assumptions A.3 (ii) and (iii) concern boundedness and smoothness of the densities of $X$ given $Z$, and of $U$ given $X, Z$, respectively. Assumption A.3 (iv) also concerns the density of $U$ given $X, Z$ and is used in the proof of consistency (Proposition 2). The set $\mathcal{Z}_T$ in Assumption A.7 is used to introduce a trimming for small values of $f_Z(z)$. Assumptions A.1-A.3, A.7 and A.10 imply the consistency of the Q-TiR estimator.

The remaining assumptions are used to derive the asymptotic expansion of the MISE and prove asymptotic normality. Specifically, Assumptions A.4 and A.5 impose integrability conditions on suitable measures of local variation of density $f_{X,Y,Z}$. These assumptions are used in the proof of Lemmas A.11 and A.12 to bound higher order terms in the asymptotic expansion of the MISE coming from kernel estimation bias. Assumption A.6 is used to show that $\mathcal{A}$ is Frechet differentiable, with compact Frechet derivative. This assumption can be rewritten in terms of densities $f_{U|X,Z}$, $f_{X|Z}$ and function $g$. The formulation as in A.6 is closer to the use in the proofs, and simplifies the exposition. In Assumption A.8, $\sup_{x \in \mathcal{X}, y \in \mathcal{Y}, z \in \mathcal{Z}} |\nabla_y f_{X,Y|Z}(x, y|z)|$ is involved in the bound of the quadratic term in the expansion of $\mathcal{A}$ (see Lemma A.3 in Appendix 4), and controls for the amount of nonlinearity in the conditional moment restrictions. Thus, Assumption A.8 is a joint restriction on the amount of nonlinearity, on the severity of ill-posedness, and on the Sobolev norm of the functional parameter. In particular, for given joint density of $X, Y, Z$, Assumption A.8 is satisfied if $\|\varphi_0\|_H$ is small enough. Assumption A.8 is used to bound some residual terms in the asymptotic expansion of the estimator, involving probabilities of large deviations (see Lemma A.10 in Appendix 4). Assumption A.9 is used to derive the first-order condition satisfied by estimator $\hat{\varphi}$. Finally, the last three Assumptions A.11-A.13 concern the spectrum of operator $A^* A$. Assumption A.11 (i) is used to simplify the proof of Lemmas A.14, A.16 and A.17 in


order to get asymptotic normality. Assumption A.11 (ii) requires that the eigenfunctions of operator $A^* A$, which are orthogonal w.r.t. $\langle ., . \rangle_H$, satisfy a summability condition w.r.t. $\langle ., . \rangle$. Under this Assumption, the asymptotic expansion of the MISE in Proposition 3 involves a single sum, and not a double sum, over the spectrum. Assumptions A.12 and A.13 ask for the existence of a uniform bound for moments of derivatives of functions $\psi_j(z) = \frac{1}{\sqrt{\nu_j}} \left(A \phi_j\right)(z)$, $j \in \mathbb{N}$, both under density $f_Z$ and under the density defined by function $q$ in Assumption A.4. Functions $\psi_j$ satisfy $E\left[\psi_j(Z)^2\right] = 1$, $j \in \mathbb{N}$. Assumptions A.12 and A.13 are met whenever the functions $\psi_j$ and their derivatives do not exhibit too heavy tails. These assumptions are used to control terms of the type $\int \psi_j(z) \left[1\{y \le \varphi_0(x)\} - \tau\right] \hat{f}_{X,Y,Z}(r)\,dr$, uniformly in $j \in \mathbb{N}$. This step is necessary to bound higher order terms in the asymptotic expansion of the MISE in Lemma A.11, and in the proof of Lemma A.14.
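To make the moment condition in Assumption A.10 (iv) concrete, the following Python sketch checks it numerically for one compactly supported polynomial kernel in dimension one with $m = 4$. The kernel $K(u) = \frac{15}{32}(7u^4 - 10u^2 + 3)$ on $[-1, 1]$ is a standard textbook example chosen for this illustration, not the kernel used in the paper (which is Gaussian in the numerical section); it has a bounded derivative but a kink at the boundary of its support, so it is meant to illustrate conditions (i), (ii) and (iv) only. The function names are hypothetical.

```python
import numpy as np

def fourth_order_kernel(u):
    """Polynomial kernel on [-1, 1] whose moments of order 1, 2, 3 vanish."""
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) <= 1.0
    return np.where(inside, (15.0 / 32.0) * (7.0 * u**4 - 10.0 * u**2 + 3.0), 0.0)

def kernel_moment(kernel, order, n_grid=200001):
    """Trapezoidal approximation of the integral of u^order * K(u) over [-1, 1]."""
    u = np.linspace(-1.0, 1.0, n_grid)
    vals = u**order * kernel(u)
    du = u[1] - u[0]
    return du * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
```

The kernel integrates to one, its moments of order one to three are zero, and its fourth moment is nonzero, which is what makes it a higher order kernel; note that such a kernel necessarily takes negative values.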

Appendix 2: Proof of Proposition 1

Here we show local ill-posedness of the nonseparable setting. The following Lemma A.1 is a local version of Proposition 10.1 in Engl, Hanke and Neubauer (2000).

Lemma A.1: Suppose the following conditions are satisfied: (a) Operator $\mathcal{A}$ is compact. (b) For any $r > 0$ small enough, there exists a sequence $(\varphi_n) \subset B_r(\varphi_0)$ s.t. $\varphi_n \nrightarrow \varphi_0$ and $\mathcal{A}(\varphi_n) \overset{w}{\to} \mathcal{A}(\varphi_0)$, where $\overset{w}{\to}$ denotes weak convergence. Then the minimum distance problem is locally ill-posed.


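Proof of Proposition 1: To prove the proposition, we establish that the conditions (a) and (b) in Lemma A.1 are satisfied.

(a) We have to show that $\mathcal{A}$ maps bounded sets into relatively compact sets. Let $S \subset L^2[0, 1]$ be bounded. We have to prove that the closure of $\mathcal{A}(S) \subset L^2(F_Z, \tau)$ is compact. We can equivalently use $\|.\|_{L^2(F_Z, \tau)}$ or $\|.\|_{L^2(F_Z)}$. Proposition 2.24 in Alt (1992) states that $\mathcal{A}(S)$ is relatively compact if and only if:

$$\sup_{\varphi \in S} \|\mathcal{A}(\varphi)\|_{L^2(F_Z)} < \infty, \qquad (30)$$

$$\sup_{\varphi \in S} \|\mathcal{A}(\varphi)(. + h) - \mathcal{A}(\varphi)\|_{L^2(F_Z)} \to 0, \text{ as } |h| \to 0, \qquad (31)$$

and

$$\sup_{\varphi \in S} \left\|\mathcal{A}(\varphi) \cdot \chi_{\mathbb{R}^{d_Z} \setminus B_R(0)}\right\|_{L^2(F_Z)} \to 0, \text{ as } R \nearrow \infty, \qquad (32)$$

where $\chi_{\mathbb{R}^{d_Z} \setminus B_R(0)}(z) := 1\{z \in \mathbb{R}^{d_Z} \setminus B_R(0)\}$, and $B_R(0)$ is a ball in $\mathbb{R}^{d_Z}$ of radius $R$ around 0. To prove (30), notice that for any $z$

$$|\mathcal{A}(\varphi)(z)| = \int_{\mathcal{X}} f_{X|Z}(x|z) F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) | x, z\right) dx \le \int_{\mathcal{X}} f_{X|Z}(x|z)\,dx = 1. \qquad (33)$$

Thus $\|\mathcal{A}(\varphi)\|_{L^2(F_Z)} \le 1$, for any $\varphi \in L^2[0, 1]$, and (30) follows. To prove (31) we use

$$|\mathcal{A}(\varphi)(z + h) - \mathcal{A}(\varphi)(z)| \le \int_{\mathcal{X}} \left|f_{X|Z}(x|z + h) - f_{X|Z}(x|z)\right| F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) | x, z + h\right) dx$$
$$+ \int_{\mathcal{X}} f_{X|Z}(x|z) \left|F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) | x, z + h\right) - F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) | x, z\right)\right| dx \le C |h|,$$

where $C := \sup_{u,x,z} \left|\nabla_z F_{U|X,Z}(u|x, z)\right| f_{X|Z}(x|z) + \sup_{x,z} \left|\nabla_z f_{X|Z}(x|z)\right| < \infty$ from Assumptions A.3 (ii) and (iii). Thus we get

$$\|\mathcal{A}(\varphi)(. + h) - \mathcal{A}(\varphi)\|_{L^2(F_Z)} \le C |h| \to 0 \text{ as } h \to 0, \qquad (34)$$

uniformly in $\varphi \in L^2[0, 1]$. Thus, (31) is proved. Finally, from (33) we get that for $\varphi \in L^2[0, 1]$

$$\left\|\mathcal{A}(\varphi) \cdot \chi_{\mathbb{R}^{d_Z} \setminus B_R(0)}\right\|^2_{L^2(F_Z)} \le \int_{\mathbb{R}^{d_Z} \setminus B_R(0)} f_Z(z)\,dz \to 0 \text{ as } R \nearrow \infty.$$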

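This implies (32) and that $\mathcal{A}$ is compact.

(b) Define $\psi(x) = \sin(2 \pi x)$ and $\psi_n(x) := \varepsilon \psi(n x)$, $x \in \mathcal{X}$, where $0 < \varepsilon < \min\{\tau, 1 - \tau\}$. Further, let $\varphi_n(x) := g(x, \tau + \psi_n(x))$, $x \in \mathcal{X}$. Then, we deduce that $\|\varphi_n - \varphi_0\|^2 = \int_{\mathcal{X}} \left[g(x, \tau + \varepsilon \psi(n x)) - g(x, \tau)\right]^2 dx$. Split the integral w.r.t. $x$ over the partition $((k-1)/n, k/n)$ with $k = 1, ..., n$. It follows that

$$\|\varphi_n - \varphi_0\|^2 = \sum_{k=1}^{n} \int_{(k-1)/n}^{k/n} \left[g(x, \tau + \varepsilon \psi(n x)) - g(x, \tau)\right]^2 dx$$
$$= \sum_{k=1}^{n} \frac{1}{n} \int_0^1 \left[g\left(\frac{k-1}{n} + \frac{y}{n}, \tau + \varepsilon \psi(y)\right) - g\left(\frac{k-1}{n} + \frac{y}{n}, \tau\right)\right]^2 dy,$$

using the periodicity of $\psi$. Using Assumption A.3 (i),

$$\|\varphi_n - \varphi_0\|^2 = \frac{1}{n} \sum_{k=1}^{n} \int_0^1 \left[g\left(\frac{k-1}{n}, \tau + \varepsilon \psi(y)\right) - g\left(\frac{k-1}{n}, \tau\right)\right]^2 dy + O(1/n).$$

The first term is a converging Riemann sum, and

$$\|\varphi_n - \varphi_0\|^2 \to \int_{\mathcal{X}} \int_0^1 \left[g(x, \tau + \varepsilon \psi(y)) - g(x, \tau)\right]^2 dy\,dx.$$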

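The RHS is strictly larger than zero, and converges to zero as $\varepsilon \to 0$ by the dominated convergence Theorem. Thus, for $\varepsilon > 0$ sufficiently small, we have $(\varphi_n) \subset B_r(\varphi_0)$ and $\varphi_n \nrightarrow \varphi_0$. Moreover, for $\bar{q} \in L^2(F_Z, \tau)$ we have

$$\langle \bar{q}, \mathcal{A}(\varphi_n) \rangle_{L^2(F_Z, \tau)} = \frac{1}{\tau(1-\tau)} \int \bar{q}(z) f_Z(z) \int_{\mathcal{X}} f_{X|Z}(x|z) F_{U|X,Z}(\tau + \psi_n(x) | x, z)\,dx\,dz.$$

Thus, we have to show

$$J_n := \int \bar{q}(z) f_Z(z) \int_{\mathcal{X}} f_{X|Z}(x|z) F_{U|X,Z}(\tau + \psi_n(x) | x, z)\,dx\,dz \to \tau \int \bar{q}(z) f_Z(z)\,dz. \qquad (35)$$

To this end, split the integral w.r.t. $x$ over the partition $((k-1)/n, k/n)$ with $k = 1, ..., n$:

$$J_n = \sum_{k=1}^{n} \int_{(k-1)/n}^{k/n} \int \bar{q}(z) f_Z(z) f_{X|Z}(x|z) F_{U|X,Z}(\tau + \varepsilon \psi(n x) | x, z)\,dz\,dx$$
$$= \sum_{k=1}^{n} \frac{1}{n} \int_0^1 \int \bar{q}(z) f_Z(z) f_{X|Z}\left(\frac{k-1}{n} + \frac{1}{n} y \Big| z\right) F_{U|X,Z}\left(\tau + \varepsilon \psi(y) \Big| \frac{k-1}{n} + \frac{1}{n} y, z\right) dz\,dy$$

after a change of variable and using periodicity of function $\psi$. Then we have

$$J_n = \frac{1}{n} \sum_{k=1}^{n} \int \bar{q}(z) f_Z(z) f_{X|Z}\left(\frac{k-1}{n} \Big| z\right) \int_0^1 F_{U|X,Z}\left(\tau + \varepsilon \psi(y) \Big| \frac{k-1}{n}, z\right) dy\,dz + I_{1,n}, \qquad (36)$$

where

$$|I_{1,n}| \le \frac{1}{n^2} \sum_{k=1}^{n} \int_0^1 \int \bar{q}(z) f_Z(z) \sup_{u,x,z} |\nabla_x H(u, x|z)|\, y\,dz\,dy = O(1/n),$$

and $H(u, x|z) := F_{U|X,Z}(u|x, z) f_{X|Z}(x|z)$, $\sup_{u,x,z} |\nabla_x H(u, x|z)| < \infty$ from Assumptions A.3 (ii)-(iii). Since the Riemann sum in (36) converges to the corresponding integral, we get

$$J_n \to \int_{\mathcal{X}} \int \bar{q}(z) f_Z(z) f_{X|Z}(x|z) \int_0^1 F_{U|X,Z}(\tau + \varepsilon \psi(y) | x, z)\,dy\,dz\,dx =: J.$$

Using that $\int_{\mathcal{X}} f_{X|Z}(x|z) F_{U|X,Z}(u|x, z)\,dx = P[U \le u | z] = u$ by the independence of $U$ and $Z$, and the uniform distribution of $U$, we get

$$J = \tau \int \bar{q}(z) f_Z(z)\,dz + \varepsilon \int \bar{q}(z) f_Z(z)\,dz \int_0^1 \psi(y)\,dy = \tau \int \bar{q}(z) f_Z(z)\,dz,$$

and (35) follows. ∎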
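The mechanism behind part (b) can be illustrated numerically with a Python sketch. It uses a linear smoothing operator with a Gaussian-shaped kernel as a simplified stand-in for the conditional-probability operator of the paper; this kernel, the grids and the function names are assumptions made for the illustration only. The oscillating perturbation $\psi_n(x) = \varepsilon \sin(2 \pi n x)$ keeps a constant $L^2$ norm, while its image under the smoothing operator shrinks as $n$ grows.

```python
import numpy as np

def trapezoid(values, dx, axis=-1):
    """Trapezoidal rule on an equally spaced grid."""
    first = np.take(values, 0, axis=axis)
    last = np.take(values, -1, axis=axis)
    return dx * (values.sum(axis=axis) - 0.5 * (first + last))

def perturbation_sizes(n, eps=0.1, n_grid=20001):
    """Return (L2 norm of psi_n, sup over z of |(K psi_n)(z)|) on [0, 1]."""
    x = np.linspace(0.0, 1.0, n_grid)
    z = np.linspace(0.0, 1.0, 11)
    dx = x[1] - x[0]
    psi_n = eps * np.sin(2.0 * np.pi * n * x)
    norm = np.sqrt(trapezoid(psi_n**2, dx))
    # (K psi)(z) = int_0^1 k(x, z) psi(x) dx with k(x, z) = exp(-(x - z)^2),
    # a hypothetical smooth kernel standing in for the conditional density
    kernel = np.exp(-(x[None, :] - z[:, None]) ** 2)
    image = trapezoid(kernel * psi_n[None, :], dx, axis=1)
    return norm, np.max(np.abs(image))
```

Comparing `perturbation_sizes(1)`, `perturbation_sizes(10)` and `perturbation_sizes(50)` shows perturbations of equal size in the parameter space that become nearly indistinguishable after smoothing, which is the source of ill-posedness.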

Appendix 3: Proof of Proposition 2

We establish existence of the Q-TiR estimator in A.3.1 before showing its consistency in A.3.2.

A.3.1 Existence

Since $Q_T(\varphi) = \frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} \hat{m}(\varphi, Z_t)^2 I_t$ is positive, a function $\hat{\varphi} \in \Theta$ minimizes $Q_T(\varphi) + \lambda_T \|\varphi\|_H^2$ if and only if

$$\hat{\varphi} = \arg\inf_{\varphi \in \Theta} Q_T(\varphi) + \lambda_T \|\varphi\|_H^2, \quad \text{s.t. } \lambda_T \|\varphi\|_H^2 \le L_T(\varphi_0). \qquad (37)$$

The solution $\hat{\varphi}$ in (37) exists $P$-a.s. since (i) mapping $\varphi \to \|\varphi\|_H^2$ is lower semicontinuous on $H^2[0, 1]$ w.r.t. the norm $\|.\|$ (see Reed and Simon (1980), p. 358) and mapping $\varphi \to Q_T(\varphi)$ is continuous on $\Theta$ w.r.t. the norm $\|.\|$, $P$-a.s., for any $T$; (ii) set $\{\varphi \in \Theta : \|\varphi\|_H^2 \le \bar{L}\}$ is compact w.r.t. the norm $\|.\|$, for any constant $0 < \bar{L} < \infty$ (compact embedding theorem; see Adams (1975)). The continuity of $Q_T(\varphi)$, $P$-a.s., follows from the mapping $\varphi \to \hat{m}(\varphi, z)$ being continuous for almost any $z \in \mathcal{Z}$, $P$-a.s. The latter holds since for any $\varphi_1, \varphi_2 \in \Theta$,

$$\left|\hat{f}_{X|Z}(x|z)\right| \left|\hat{F}_{Y|X,Z}(\varphi_1(x)|x, z) - \hat{F}_{Y|X,Z}(\varphi_2(x)|x, z)\right| = \left|\int_{\varphi_2(x)}^{\varphi_1(x)} \hat{f}_{X,Y|Z}(x, y|z)\,dy\right| \le \left(\sup_{x \in [0,1], y \in \mathbb{R}} \left|\hat{f}_{X,Y|Z}(x, y|z)\right|\right) |\varphi_1(x) - \varphi_2(x)|,$$

and thus, by the Cauchy-Schwarz inequality,

$$|\hat{m}(\varphi_1, z) - \hat{m}(\varphi_2, z)| \le \int_{\mathcal{X}} \left|\hat{f}_{X|Z}(x|z)\right| \left|\hat{F}_{Y|X,Z}(\varphi_1(x)|x, z) - \hat{F}_{Y|X,Z}(\varphi_2(x)|x, z)\right| dx \le \bar{C}_T \|\varphi_1 - \varphi_2\|,$$

for almost any $z \in \mathcal{Z}$, $P$-a.s., where $\bar{C}_T := \sup_{x \in [0,1], y \in \mathbb{R}} \left|\hat{f}_{X,Y|Z}(x, y|z)\right| < \infty$ for almost any $z \in \mathcal{Z}$, $P$-a.s.
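For readers who want to experiment with the penalized criterion whose minimizer is studied here, the following Python sketch evaluates a finite-dimensional version of $Q_T(\varphi) + \lambda_T \|\varphi\|_H^2$ for $\varphi = \theta' P$, using the integrated Gaussian kernel smoother of the numerical section. It is a simplified sketch under stated assumptions: a scalar instrument, no trimming indicator $I_t$, a user-supplied basis and penalty matrix `D`, and hypothetical function names; it is not the authors' implementation.

```python
import math
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)

# Integrated Gaussian kernel IK(x), i.e. the standard normal c.d.f.
integrated_kernel = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))

def smoothed_cond_prob(theta, basis, x, y, z, h_y, h_z):
    """Smoothed estimate of P[Y <= theta'P(X) | Z = z_t], for every t."""
    fitted = basis(x) @ theta                              # theta'P(x_l)
    ik = integrated_kernel(-(y - fitted) / h_y)            # smooth 1{y_l <= fitted_l}
    w = gaussian_kernel((z[None, :] - z[:, None]) / h_z)   # w[t, l] = K((z_l - z_t)/h_Z)
    return (w @ ik) / w.sum(axis=1)

def penalized_criterion(theta, basis, x, y, z, tau, lam, D, h_y, h_z):
    """Minimum distance criterion plus quadratic penalty lam * theta' D theta."""
    m_hat = smoothed_cond_prob(theta, basis, x, y, z, h_y, h_z) - tau
    q_t = np.mean(m_hat**2) / (tau * (1.0 - tau))
    return q_t + lam * theta @ D @ theta
```

Because the smoothed indicator is differentiable in `theta`, the criterion can be passed to a gradient-based optimizer, which is the tractability argument made in the numerical section.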

A.3.2 Consistency

The next Lemma A.2 is used below in the proof of Proposition 2 (consistency). Lemma A.2 (i) is proved in the Technical Report by extending an argument in Hansen (2007). Lemma A.2 (ii) and (iii) establish (uniform) convergence of the minimum distance criterion $Q_T(\varphi)$.

Lemma A.2: Under Assumptions A.1, A.2, A.3 (ii)-(iv), A.7, and A.10:

(i) $\sup_{x \in [0,1], y \in \mathbb{R}, z \in \mathcal{Z}_T} \left|\hat{f}_{X|Z}(x|z) \hat{F}_{Y|X,Z}(y|x, z) - f_{X|Z}(x|z) F_{Y|X,Z}(y|x, z)\right|^2 = O_p(a_T)$, where $a_T := (\log T)^2 \left(\frac{\log T}{T h_T^{d_Z + 1}} + h_T^{2m}\right)$;

(ii) $Q_T(\varphi_0) - Q_{\infty}(\varphi_0) = O_p(a_T)$;

(iii) $\sup_{\varphi \in \Theta} |Q_T(\varphi) - Q_{\infty}(\varphi)| = O_p\left(\sqrt{a_T} + \frac{1}{\sqrt{T}}\right) = o_p(1)$.

Proof of Lemma A.2: (ii) We have $Q_T(\varphi_0) - Q_{\infty}(\varphi_0) = \frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} \Delta\hat{m}(\varphi_0, Z_t)^2 I_t$, where $\Delta\hat{m}(\varphi, .) := \hat{m}(\varphi, .) - m(\varphi, .)$. Furthermore,

$$|\Delta\hat{m}(\varphi, .)| \le \int_{\mathcal{X}} \left|\hat{f}_{X|Z}(x|.) \hat{F}_{Y|X,Z}(\varphi(x)|x, .) - f_{X|Z}(x|.) F_{Y|X,Z}(\varphi(x)|x, .)\right| dx$$
$$\le \sup_{x \in [0,1], y \in \mathbb{R}} \left|\hat{f}_{X|Z}(x|.) \hat{F}_{Y|X,Z}(y|x, .) - f_{X|Z}(x|.) F_{Y|X,Z}(y|x, .)\right|, \qquad (38)$$

uniformly in $\varphi \in \Theta$. Then, (ii) follows from (i).

(iii) Using $\hat{m}(\varphi, .) = \Delta\hat{m}(\varphi, .) + m(\varphi, .)$, we have

$$Q_T(\varphi) - Q_{\infty}(\varphi) = \frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} \Delta\hat{m}(\varphi, Z_t)^2 I_t + \left\{\frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} m(\varphi, Z_t)^2 I_t - Q_{\infty}(\varphi)\right\} + 2 \frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} \Delta\hat{m}(\varphi, Z_t) m(\varphi, Z_t) I_t.$$


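From (38) and (i), the first term in the RHS is $O_p(a_T)$, uniformly in $\varphi \in \Theta$. By Cauchy-Schwarz inequality, the third term in the RHS is $O_p\left(\sqrt{a_T} \left(\frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} m(\varphi, Z_t)^2 I_t\right)^{1/2}\right)$, uniformly in $\varphi \in \Theta$. Thus, the conclusion follows if we show that

$$\sup_{\varphi \in \Theta} \left|\frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} m(\varphi, Z_t)^2 I_t - Q_{\infty}(\varphi)\right| = O_p\left(\frac{1}{\sqrt{T}}\right).$$

We have:

$$\left|\frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} m(\varphi, Z_t)^2 I_t - Q_{\infty}(\varphi)\right| \le \frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} m(\varphi, Z_t)^2 (1 - I_t) + \left|\frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} m(\varphi, Z_t)^2 - Q_{\infty}(\varphi)\right| =: I_{1,T}(\varphi) + I_{2,T}(\varphi).$$

Since $|m(\varphi, .)| \le 2$, the $I_{1,T}(\varphi)$ term is bounded by $\frac{4}{T \tau(1-\tau)} \sum_{t=1}^{T} (1 - I_t)$, uniformly in $\varphi \in \Theta$. Now,

$$E\left[\frac{1}{T} \sum_{t=1}^{T} (1 - I_t)\right] \le P[Z \in \mathcal{Z}_T^c] + P\left[\inf_{z \in \mathcal{Z}_T} \hat{f}(z) \le (\log T)^{-1}\right] = O\left(T^{-\bar{b}}\right),$$

for any $\bar{b} > 0$, from Assumption A.7 and a large deviation bound argument. We get $\sup_{\varphi \in \Theta} I_{1,T}(\varphi) = O_p\left(T^{-\bar{b}}\right)$, for any $\bar{b} > 0$. To bound the $I_{2,T}(\varphi)$ term, we use $m(\varphi, z) = \int_{\mathcal{X}} \left[F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) | x, z\right) - \tau\right] f_{X|Z}(x|z)\,dx$. Then

$$\frac{1}{T} \sum_{t=1}^{T} m(\varphi, Z_t)^2 - E\left[m(\varphi, Z)^2\right] = \int_{\mathcal{X}} \int_{\mathcal{X}} \frac{1}{T} \sum_{t=1}^{T} \Big\{ f_{X|Z}(x|Z_t) f_{X|Z}(\xi|Z_t) \left[F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) | x, Z_t\right) - \tau\right] \left[F_{U|X,Z}\left(g^{-1}(\xi, \varphi(\xi)) | \xi, Z_t\right) - \tau\right]$$
$$- E\left[f_{X|Z}(x|Z) f_{X|Z}(\xi|Z) \left[F_{U|X,Z}\left(g^{-1}(x, \varphi(x)) | x, Z\right) - \tau\right] \left[F_{U|X,Z}\left(g^{-1}(\xi, \varphi(\xi)) | \xi, Z\right) - \tau\right]\right] \Big\}\,dx\,d\xi.$$

We get

$$\sup_{\varphi \in \Theta} I_{2,T}(\varphi) \le \frac{1}{\tau(1-\tau)} \frac{1}{\sqrt{T}} \sup_{\vartheta \in [0,1]^4} \left|\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left(a(Z_t, \vartheta) - E[a(Z, \vartheta)]\right)\right|,$$

where $a(z, \vartheta) := f_{X|Z}(x|z) f_{X|Z}(\xi|z) \left[F_{U|X,Z}(u | x, z) - \tau\right] \left[F_{U|X,Z}(v | \xi, z) - \tau\right]$, $\vartheta := (x, \xi, u, v) \in [0, 1]^4$. Using Assumptions A.3 (ii)-(iv), function $a$ is bounded and Lipschitz w.r.t. $\vartheta$: $|a(., \vartheta_1) - a(., \vartheta_2)| \le C |\vartheta_1 - \vartheta_2|$, for a constant $C$. By Andrews (1994), Theorem 2, the family $\mathcal{F} := \{a(., \vartheta) : \vartheta \in [0, 1]^4\}$ satisfies the Pollard entropy condition. By Andrews (1994), Theorem 1, the empirical process $\nu_T(\vartheta) := \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left(a(Z_t, \vartheta) - E[a(Z, \vartheta)]\right)$, $\vartheta \in [0, 1]^4$, is stochastically equicontinuous. Since we can apply a CLT for any $\vartheta \in [0, 1]^4$, by the fundamental convergence result for empirical processes (e.g. Andrews (1994), p. 2251), $\nu_T(.)$ converges weakly. By the Continuous Mapping Theorem, $\sup_{\vartheta \in [0,1]^4} |\nu_T(\vartheta)| = O_p(1)$, and thus $\sup_{\varphi \in \Theta} I_{2,T}(\varphi) = O_p\left(1/\sqrt{T}\right)$. Hence, the conclusion follows. ∎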

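Proof of Proposition 2: By Lemma A.2 (ii) and the condition on $\lambda_T$, we have

$$0 \le Q_T(\hat{\varphi}) + \lambda_T \|\hat{\varphi}\|_H^2 \le Q_T(\varphi_0) + \lambda_T \|\varphi_0\|_H^2 = O_p(a_T + \lambda_T) = O_p(\lambda_T).$$

By $Q_T \ge 0$, this implies that $\lambda_T \|\hat{\varphi}\|_H^2 = O_p(\lambda_T)$, that is, $\|\hat{\varphi}\|_H^2 = O_p(1)$. Thus, by the compact embedding theorem, the sequence of minimizers $\hat{\varphi}$ is tight in $(L^2[0, 1], \|\cdot\|)$. Namely, for any $\delta > 0$, there exists a compact subset $K_{\delta}$ of $(L^2[0, 1] \cap \Theta, \|\cdot\|)$, such that $\hat{\varphi} \in K_{\delta}$ wp $\searrow 1 - \delta$, where notation wp $\searrow 1 - \delta$ means "with probability at least $1 - \delta$ for all sufficiently large sample sizes".

Next we have that for any $\varepsilon > 0$ and $\delta > 0$, and any $T$ sufficiently large,

$$P[\hat{\varphi} \notin B_{\varepsilon}(\varphi_0)] \le P[\{\hat{\varphi} \notin B_{\varepsilon}(\varphi_0)\} \cap \{\hat{\varphi} \in K_{\delta}\}] + P[\hat{\varphi} \notin K_{\delta}]$$
$$\le P[\{\hat{\varphi} \notin B_{\varepsilon}(\varphi_0)\} \cap \{\hat{\varphi} \in K_{\delta}\}] + \delta$$
$$\le P\left[\inf_{\varphi \in K_{\delta} \cap \Theta \setminus B_{\varepsilon}(\varphi_0)} Q_T(\varphi) + \lambda_T \|\varphi\|_H^2 \le Q_T(\hat{\varphi}) + \lambda_T \|\hat{\varphi}\|_H^2\right] + \delta$$
$$\le P\left[\inf_{\varphi \in K_{\delta} \cap \Theta \setminus B_{\varepsilon}(\varphi_0)} Q_T(\varphi) + \lambda_T \|\varphi\|_H^2 \le Q_T(\varphi_0) + \lambda_T \|\varphi_0\|_H^2\right] + \delta.$$

Using Lemma A.2 (iii), we get:

$$P[\hat{\varphi} \notin B_{\varepsilon}(\varphi_0)] \le P\left[\inf_{\varphi \in K_{\delta} \cap \Theta \setminus B_{\varepsilon}(\varphi_0)} Q_{\infty}(\varphi) + o_p(1) + \lambda_T \|\varphi\|_H^2 \le O_p(\lambda_T)\right] + \delta$$
$$\le P\left[\inf_{\varphi \in K_{\delta} \cap \Theta \setminus B_{\varepsilon}(\varphi_0)} Q_{\infty}(\varphi) + o_p(1) \le O_p(\lambda_T)\right] + \delta$$
$$\le P\left[\inf_{\varphi \in K_{\delta} \cap \Theta \setminus B_{\varepsilon}(\varphi_0)} Q_{\infty}(\varphi) \le o_p(1)\right] + \delta.$$

Now let $\kappa_{\delta,\varepsilon} := \inf_{\varphi \in K_{\delta} \cap \Theta \setminus B_{\varepsilon}(\varphi_0)} Q_{\infty}(\varphi)$. By compactness of $K_{\delta}$, continuity of $Q_{\infty}$ and identification, we have $\kappa_{\delta,\varepsilon} = Q_{\infty}(\varphi^*_{\delta,\varepsilon}) > 0$ for some $\varphi^*_{\delta,\varepsilon} \in K_{\delta} \cap \Theta \setminus B_{\varepsilon}(\varphi_0)$. Thus $P[\hat{\varphi} \notin B_{\varepsilon}(\varphi_0)] \le P[\kappa_{\delta,\varepsilon} \le o_p(1)] + \delta \to \delta$, as $T \to \infty$. Since $\delta$ can be made arbitrarily small, we conclude that $P[\hat{\varphi} \notin B_{\varepsilon}(\varphi_0)] \to 0$. Since $\varepsilon > 0$ is arbitrary, consistency follows. ∎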

Appendix 4: Proof of Proposition 3

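This appendix concerns the derivation of the asymptotic MISE of the Q-TiR estimator. The steps are as follows: computing the Frechet derivatives in A.4.1, getting the first-order condition in A.4.2, making the asymptotic expansion of the MISE in A.4.3, deriving the final expression in A.4.4.

A.4.1 Frechet derivatives

Lemma A.3: Under Assumption A.6, the Frechet derivative of $\mathcal{A}$ at $\varphi_0$ is the linear operator $A := D\mathcal{A}(\varphi_0)$ defined by $A\psi(z) = \int f_{X,Y|Z}(x, \varphi_0(x)|z) \psi(x)\,dx$, $z \in \mathcal{Z}$, for $\psi \in L^2[0, 1]$. Moreover we have $\mathcal{A}(\varphi) = \mathcal{A}(\varphi_0) + A \Delta\varphi + R(\varphi, \varphi_0)$, where the residual $R(\varphi, \varphi_0)$ is such that $\|R(\varphi, \varphi_0)\|_{L^2(F_Z)} \le \frac{1}{2} c \|\Delta\varphi\|^2$, and $c := \sup_{x \in \mathcal{X}, y \in \mathcal{Y}, z \in \mathcal{Z}} |\nabla_y f_{X,Y|Z}(x, y|z)|$.

Define $\langle \varphi, \psi \rangle_{L^2(\hat{F}_Z, \tau)} := \frac{1}{T \tau(1-\tau)} \sum_{t=1}^{T} I_t \varphi(Z_t) \psi(Z_t)$ and $\|\psi\|^2_{L^2(\hat{F}_Z, \tau)} := \langle \psi, \psi \rangle_{L^2(\hat{F}_Z, \tau)}$. Then, $\langle \varphi, \hat{A} \psi \rangle_{L^2(\hat{F}_Z, \tau)} = \langle \hat{A}^* \varphi, \psi \rangle_H$.

Lemma A.4: Under Assumption A.10, the Frechet derivative of $\hat{\mathcal{A}}$ at $\bar{\varphi}$ is the linear operator $\bar{A} := D\hat{\mathcal{A}}(\bar{\varphi})$ defined by $\bar{A}\psi(z) = \int \hat{f}_{X,Y|Z}(x, \bar{\varphi}(x)|z) \psi(x)\,dx$, $z \in \mathcal{Z}$, for $\psi \in L^2[0, 1]$. Moreover we have $\hat{\mathcal{A}}(\varphi) = \hat{\mathcal{A}}(\bar{\varphi}) + \bar{A}(\varphi - \bar{\varphi}) + \hat{R}(\varphi, \bar{\varphi})$, where $\hat{R}(\varphi, \bar{\varphi})$ is such that $P$-a.s., $\|\hat{R}(\varphi, \bar{\varphi})\|_{L^2(\hat{F}_Z, \tau)} \le \frac{1}{2\sqrt{\tau(1-\tau)}} \hat{c} \|\varphi - \bar{\varphi}\|^2$, and $\hat{c} := \sup_{x \in \mathcal{X}, y \in \mathbb{R}, z \in \mathcal{Z}_T} |\nabla_y \hat{f}_{X,Y|Z}(x, y|z)|$.

We will denote the Frechet derivative of $\hat{\mathcal{A}}$ at $\varphi_0$ by $\hat{A}_0 := D\hat{\mathcal{A}}(\varphi_0)$, and denote $\hat{A} := D\hat{\mathcal{A}}(\hat{\varphi})$.

A.4.2 First-order condition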

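Proof of Equation (9): By Assumption A.9, let $r > 0$ be such that $B_r(\varphi_0) \cap H^2[0, 1]$ is contained in $\Theta$. The estimator $\hat{\varphi}$ is such that, when $\|\Delta\hat{\varphi}\| < r$ we have:

$$\forall \psi \in H^2[0, 1], \ \exists \rho = \rho(\psi) > 0 : \hat{\varphi} + \varepsilon \psi \in \Theta \text{ for any } \varepsilon \text{ s.t. } |\varepsilon| < \rho.$$

Thus, when $\|\Delta\hat{\varphi}\| < r$ the estimator $\hat{\varphi}$ satisfies the first order condition

$$\frac{d}{d\varepsilon} L_T(\hat{\varphi} + \varepsilon \psi)\Big|_{\varepsilon = 0} = 0, \quad \forall \psi \in H^2[0, 1].$$

Writing $L_T(\varphi) = \left\|\hat{\mathcal{A}}(\varphi) - \tau\right\|^2_{L^2(\hat{F}_Z, \tau)} + \lambda_T \|\varphi\|_H^2$, we have

$$\frac{d}{d\varepsilon} L_T(\hat{\varphi} + \varepsilon \psi)\Big|_{\varepsilon = 0} = 2 \left\langle \hat{\mathcal{A}}(\hat{\varphi}) - \tau, D\hat{\mathcal{A}}(\hat{\varphi}) \psi \right\rangle_{L^2(\hat{F}_Z, \tau)} + 2 \lambda_T \langle \hat{\varphi}, \psi \rangle_H$$
$$= 2 \left\langle \hat{\mathcal{A}}(\hat{\varphi}) - \tau, \hat{A} \psi \right\rangle_{L^2(\hat{F}_Z, \tau)} + 2 \lambda_T \langle \hat{\varphi}, \psi \rangle_H$$
$$= 2 \left\langle \hat{A}^* \left(\hat{\mathcal{A}}(\hat{\varphi}) - \tau\right) + \lambda_T \hat{\varphi}, \psi \right\rangle_H.$$

By the consistency of $\hat{\varphi}$ (Proposition 2), $P[\|\Delta\hat{\varphi}\| < r] \to 1$. We show below that $P[\|\Delta\hat{\varphi}\| \ge r] = O\left(T^{-\bar{b}}\right)$, for any $\bar{b} > 0$.

A.4.3 Asymptotic expansion

In this section we provide the asymptotic expansion of the MISE of estimator $\hat{\varphi}$. Our strategy consists of three steps. In Step (i) we show that the nonlinearity term $\hat{K}_T(\Delta\hat{\varphi})$ in Equation (11) satisfies a quadratic bound w.p.a. 1 (Lemma A.5). In Step (ii) we exploit Equation (11) and the quadratic nature of $\hat{K}_T(\Delta\hat{\varphi})$ to get a bound on the difference between $E\left[\|\Delta\hat{\varphi}\|^2\right]$ and $E\left[\|\Delta\hat{\psi}\|^2\right]$ (Lemmas A.6 and A.7). The bound involves probabilities of large deviations for $\|\Delta\hat{\varphi}\|$ and $\|\Delta\hat{\psi}\|$. In Step (iii) we bound these probabilities by a large deviation result for penalized minimum distance estimators (Lemmas A.8 and A.9). Combining the three steps, we get the asymptotic expansion of the MISE $E\left[\|\Delta\hat{\varphi}\|^2\right]$ in terms of the expectations of powers of $\|\Delta\hat{\psi}\|$ (Lemma A.10).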

(i) Quadratic bound for the nonlinearity term

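Lemma A.5: Under Assumptions A.1-A.3, A.6, A.7, A.10, for any $\bar b > 0$ and any constant $C > \dfrac{1}{2\sqrt{\tau(1-\tau)}} \sup_{x\in\mathcal X,\, y\in\mathcal Y,\, z\in\mathcal Z} |\nabla_y f_{X,Y|Z}(x,y|z)|$:
\[
P\left[ \|\hat K_T(\Delta\hat\varphi)\| > \frac{C}{\sqrt{\lambda_T}}\, \|\Delta\hat\varphi\|^2 \right] = O\big(T^{-\bar b}\big).
\]

(ii) Control of the nonlinearity term

First we consider the nonstochastic analogue of Equation (11).

Lemma A.6: Let function $\varphi$ satisfy $\varphi = \psi + \varepsilon K(\varphi)$, where $\psi$ is a known function, $K$ a nonlinear operator such that $\|K(\varphi)\| \le \|\varphi\|^2$, and $\varepsilon > 0$. If $\varepsilon\|\psi\| < 1/8$, then either
\[
\big|\, \|\varphi\|^2 - \|\psi\|^2 \big| \le 32\,\varepsilon\,\|\psi\|^3, \qquad\text{or}\qquad \|\varphi\|^2 \ge \frac{3}{8\varepsilon^2}.
\]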
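The dichotomy in Lemma A.6 can be illustrated in the simplest possible case. The sketch below takes scalars instead of functions and the hypothetical choice $K(\varphi) = \varphi^2$, which satisfies $|K(\varphi)| \le |\varphi|^2$ with equality. The equation $\varphi = \psi + \varepsilon\varphi^2$ then has two real roots, one close to $\psi$ and one of order $1/\varepsilon$. The grid of values for `psi` and `eps` is an assumption made for illustration only.

```python
import math


def solutions(psi, eps):
    """Both real solutions of phi = psi + eps * phi**2 (needs 4*eps*psi < 1)."""
    disc = math.sqrt(1.0 - 4.0 * eps * psi)
    small = 2.0 * psi / (1.0 + disc)      # stable form of (1 - disc) / (2*eps)
    large = (1.0 + disc) / (2.0 * eps)
    return small, large


def dichotomy(psi, eps):
    """For each solution, report (close_to_psi, far_from_zero) as in Lemma A.6."""
    assert eps * abs(psi) < 1.0 / 8.0
    flags = []
    for phi in solutions(psi, eps):
        close = abs(phi ** 2 - psi ** 2) <= 32.0 * eps * abs(psi) ** 3
        far = phi ** 2 >= 3.0 / (8.0 * eps ** 2)
        flags.append((close, far))
    return flags


# Hypothetical grid satisfying eps * psi < 1/8 throughout.
grid = [(psi, eps) for psi in (0.01, 0.1, 1.0) for eps in (0.001, 0.01, 0.1)]
report = {pair: dichotomy(*pair) for pair in grid}
for pair, flags in report.items():
    print(pair, flags)
```

In this scalar case the small root always falls under the first alternative and the large root under the second, which is the separation the lemma exploits.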

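We can use Lemma A.6 with $\varepsilon = \varepsilon_T = \dfrac{C}{\sqrt{\lambda_T}}$ to bound the difference $\|\Delta\hat\varphi\|^2 - \|\Delta\hat\psi\|^2$ on the set
\[
\left\{ \varepsilon_T\|\Delta\hat\psi\| < 1/8 \ \wedge\ \|\Delta\hat\varphi\|^2 < \frac{3}{8\varepsilon_T^2} \ \wedge\ \|\hat K_T(\Delta\hat\varphi)\| \le \varepsilon_T\|\Delta\hat\varphi\|^2 \right\},
\]
and derive the following result.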

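Lemma A.7: Under Assumptions A.1-A.3, A.6, A.7 and A.10, we have for any $\bar b > 0$, with $C > \dfrac{1}{2\sqrt{\tau(1-\tau)}} \sup_{x\in\mathcal X,\, y\in\mathcal Y,\, z\in\mathcal Z} |\nabla_y f_{X,Y|Z}(x,y|z)|$,
\begin{align*}
E\big[\|\Delta\hat\varphi\|^2\big] - E\big[\|\Delta\hat\psi\|^2\big]
= O\bigg( & \frac{1}{\sqrt{\lambda_T}}\, E\big[\|\Delta\hat\psi\|^3\big] + P\Big[\|\Delta\hat\psi\|^2 \ge \frac{\lambda_T}{64C^2}\Big] \\
& + P\Big[\|\Delta\hat\varphi\|^2 \ge \frac{3\lambda_T}{8C^2}\Big] \bigg) + O\big(T^{-\bar b}\big). \tag{39}
\end{align*}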
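The two thresholds inside the probabilities in (39) are the conditions of Lemma A.6 rewritten in terms of $\lambda_T$. A short check, using only the choice $\varepsilon_T = C/\sqrt{\lambda_T}$ stated in the text:

```latex
\varepsilon_T \|\Delta\hat\psi\| \ge \frac{1}{8}
\iff \|\Delta\hat\psi\|^2 \ge \frac{1}{64\,\varepsilon_T^2} = \frac{\lambda_T}{64\,C^2},
\qquad
\frac{3}{8\,\varepsilon_T^2} = \frac{3\,\lambda_T}{8\,C^2}.
```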

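Note that for large $T$, probability $P\big[\|\Delta\hat\varphi\|^2 \ge \tfrac{3\lambda_T}{8C^2}\big]$ on the RHS of (39) controls for both the event $\|\Delta\hat\varphi\|^2 \ge \dfrac{3}{8\varepsilon_T^2}$ and the event $\|\Delta\hat\varphi\| \ge r$, in which the first-order condition (11) does not hold.

(iii) A large deviation bound for penalized minimum distance estimators

Lemma A.8: We have $P[\|\hat\varphi-\varphi_0\| \ge \varepsilon_T] \le k_1(T, C(\varepsilon_T,\lambda_T)) + k_2(T, C(\varepsilon_T,\lambda_T))$, where $\varepsilon_T > 0$,
\[
C(\varepsilon,\lambda) := \inf_{\varphi\in\Theta:\ \|\varphi-\varphi_0\|\ge\varepsilon} Q_\infty(\varphi) + \lambda\|\varphi\|_H^2 - \lambda\|\varphi_0\|_H^2, \tag{40}
\]
and
\[
k_1(T,\eta) := P\left[ \sup_{\varphi\in\Theta}\ \sup_{z\in\mathcal Z_T} \frac{|\Delta\hat m(\varphi,z)|}{\sqrt{\tau(1-\tau)}} \ge \frac{\sqrt{\lambda_T\|\varphi_0\|_H^2 + 2\eta} - \sqrt{\lambda_T\|\varphi_0\|_H^2}}{4} \right], \tag{41}
\]
\[
k_2(T,\eta) := P\left[ \sup_{\varphi\in\Theta} \left| \frac{1}{T\tau(1-\tau)}\sum_{t=1}^{T} m(\varphi,Z_t)^2 I_t - Q_\infty(\varphi) \right| \ge \eta/2 \right]. \tag{42}
\]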

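In an ill-posed setting, the usual "identifiable uniqueness" condition (White and Wooldridge (1991))
\[
\inf_{\varphi\in\Theta:\ \|\varphi-\varphi_0\|\ge\varepsilon} Q_\infty(\varphi) > Q_\infty(\varphi_0)
\]
does not hold (see GS). It is replaced by the inequality $C(\varepsilon,\lambda) > 0$ for the penalized criterion, and the behaviour of $C(\varepsilon,\lambda)$ as $\lambda,\varepsilon\to 0$ matters for the rate of convergence of $\hat\varphi$. A lower bound for the function $C(\varepsilon,\lambda)$ as $\lambda\to 0$ and $\varepsilon = O(\sqrt{\lambda})$ is given in the next result.

Lemma A.9: Suppose Assumption A.6 holds. Let $d > 0$ be a constant such that
\[
d^2 > \frac{\|\varphi_0\|_H^2}{\displaystyle \inf_{\psi:\ \|\psi\|_H \le 2\|\varphi_0\|_H} \frac{\langle\psi, A^*A\psi\rangle_H}{\|\psi\|^2}}. \tag{43}
\]
Then, for any $M < \infty$ and for $\lambda$ close enough to 0:
\[
\inf_{\varphi\in\Theta:\ \|\varphi-\varphi_0\|\ge d\sqrt{\lambda}} Q_\infty(\varphi) + \lambda\|\varphi\|_H^2 - \lambda\|\varphi_0\|_H^2 \ \ge\ M\,\frac{\lambda}{\log(1/\lambda)}. \tag{44}
\]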

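In the RHS of (44), $\dfrac{\lambda}{\log(1/\lambda)}$ can be replaced by $\lambda g(\lambda)$, where $g(\lambda)$ is any function of $\lambda$ such that $g(\lambda)\to 0$ as $\lambda\to 0$ (see the proof of Lemma A.9). From Lemmas A.8 and A.9 we deduce that for $d > 0$ as in (43), and some constant $b > 0$:
\begin{align*}
P\big[\|\hat\varphi-\varphi_0\|^2 \ge d^2\lambda_T\big]
\le\ & P\left[ \sup_{\varphi\in\Theta}\ \sup_{z\in\mathcal Z_T} |\Delta\hat m(\varphi,z)|^2 \ge b\,\frac{\lambda_T}{\log(1/\lambda_T)} \right] \\
& + P\left[ \sup_{\varphi\in\Theta} \left| \frac{1}{T\tau(1-\tau)}\sum_{t=1}^{T} m(\varphi,Z_t)^2 I_t - Q_\infty(\varphi) \right| \ge b\,\frac{\lambda_T}{\log(1/\lambda_T)} \right]. \tag{45}
\end{align*}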

From the proof of Lemma A.2 (Appendix 3.1), the terms $\sup_{\varphi\in\Theta}\sup_{z\in\mathcal Z_T} |\Delta\hat m(\varphi,z)|^2$ and $\sup_{\varphi\in\Theta} \left| \dfrac{1}{T\tau(1-\tau)}\sum_{t=1}^{T} m(\varphi,Z_t)^2 I_t - Q_\infty(\varphi) \right|$ can be bounded by suprema of suitable empirical processes over compact finite-dimensional sets, which are $O_p\!\left( (\log T)^2 \left( \dfrac{\log T}{T h_T^{1+d_Z}} + h_T^{2m} \right) \right)$ and $O_p\!\left( \dfrac{1}{\sqrt T} \right)$, respectively. When
\[
\frac{1}{\lambda_T}\left( \frac{1}{T h_T^{1+d_Z}} + h_T^{2m} + \frac{1}{\sqrt T} \right) = O\big(T^{-\bar\varepsilon}\big), \quad \text{for } \bar\varepsilon > 0,
\]
these orders are negligible w.r.t. $\dfrac{\lambda_T}{\log(1/\lambda_T)}$, and by standard large deviation results the two probabilities in the RHS of (45) converge to zero as $T\to\infty$ at a geometric rate. Thus, they are negligible compared to the other terms in the RHS of (39), which converge as a negative power of $T$. The next result is proved by combining Inequality (45) and Lemma A.7, and yields the asymptotic expansion of the MISE $E[\|\Delta\hat\varphi\|^2]$.

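Lemma A.10: Suppose that Assumptions A.1-A.3, A.6, A.7, A.10 hold, and
\[
\frac{20}{\tau(1-\tau)} \sup_{x\in\mathcal X,\, y\in\mathcal Y,\, z\in\mathcal Z} |\nabla_y f_{X,Y|Z}(x,y|z)|^2 \ <\ \frac{1}{\|\varphi_0\|_H^2}\ \inf_{\psi:\ \|\psi\|_H \le 2\|\varphi_0\|_H} \frac{\langle\psi, A^*A\psi\rangle_H}{\|\psi\|^2}. \tag{46}
\]
Let $\lambda_T\to 0$ such that for $\bar\varepsilon > 0$:
\[
\frac{1}{\lambda_T}\left( \frac{1}{T h_T^{1+d_Z}} + h_T^{2m} + \frac{1}{\sqrt T} \right) = O\big(T^{-\bar\varepsilon}\big). \tag{47}
\]
Then
\[
E\big[\|\Delta\hat\varphi\|^2\big] = E\big[\|\Delta\hat\psi\|^2\big] + O\left( \frac{1}{\sqrt{\lambda_T}}\, E\big[\|\Delta\hat\psi\|^3\big] \right) + O\big(T^{-\bar b}\big), \quad \text{for any } \bar b > 0.
\]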

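Condition (46) is used to show that Condition (43) is satisfied, when bounding the probabilities in the RHS of (39) by means of the result in (45).

A.4.4 Proof of Proposition 3 (MISE)

From Conditions (13) and (14) we have
\[
\frac{1}{\lambda_T^2}\,\frac{1}{T^2 h_T^{2+2d_Z}} = O\left( \frac{1}{\lambda_T^2}\,\frac{1}{T} \right) = o(\lambda_T^{\varepsilon}), \qquad \frac{1}{\lambda_T}\, h_T^{2m} = O\big(\lambda_T^{1+\varepsilon}\big), \qquad \frac{1}{T\lambda_T^2} = o\big(h_T\lambda_T^{\varepsilon}\big),
\]
and thus Condition (47) is satisfied as long as $h_T \asymp T^{-\eta}$ and $\lambda_T \asymp T^{-\gamma}$, $\eta,\gamma > 0$. From Assumption A.8 and Lemma A.10, the conclusion follows if:

(i) $E\big[\|\Delta\hat\psi\|^2\big] = \dfrac{1}{T}\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j+\lambda_T)^2}\,\|\phi_j\|^2 + b(\lambda_T)^2$, up to negligible terms, and,

(ii) $\dfrac{1}{\sqrt{\lambda_T}}\, E\big[\|\Delta\hat\psi\|^3\big] = o\left( \dfrac{1}{T}\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j+\lambda_T)^2}\,\|\phi_j\|^2 + b(\lambda_T)^2 \right)$.

To show these statements, we write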

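\begin{align*}
\hat A(\varphi_0)(z) - \tau
=\ & \int\!\!\int \big(1\{y\le\varphi_0(x)\} - \tau\big)\, \frac{\Delta\hat f_{X,Y,Z}(x,y,z)}{f_Z(z)}\, dy\,dx \\
& + \int\!\!\int \big(1\{y\le\varphi_0(x)\} - \tau\big) \left[ \hat f_{X,Y|Z}(x,y|z) - \frac{\hat f_{X,Y,Z}(x,y,z)}{f_Z(z)} \right] dy\,dx \\
=:\ & -\hat\zeta(z) + \hat q(z).
\end{align*}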

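Then we decompose $\Delta\hat\psi$ as $\Delta\hat\psi =: V_T + B_T + R_T$ such that
\[
\Delta\hat\psi = (\lambda_T + A^*A)^{-1}A^*\hat\zeta + \big[(\lambda_T + A^*A)^{-1}A^*A - 1\big]\varphi_0 + R_T,
\]
where $V_T$ is the variance term, $B_T$ is the regularization bias term, and the remainder term $R_T$ is given by
\begin{align*}
R_T =\ & \Big[ \big(\lambda_T + \hat A_0^*\hat A_0\big)^{-1} - (\lambda_T + A^*A)^{-1} \Big] A^*\hat\zeta \\
& + \Big[ \big(\lambda_T + \hat A_0^*\hat A_0\big)^{-1}\hat A_0^*\hat A_0 - (\lambda_T + A^*A)^{-1}A^*A \Big] \varphi_0 \\
& + \big(\lambda_T + \hat A_0^*\hat A_0\big)^{-1} \Big( \hat A_0^*\big(\hat\zeta - \hat q\big) - A^*\hat\zeta \Big) \\
& - \big(\lambda_T + \hat A_0^*\hat A_0\big)^{-1} \big(\hat A^* - \hat A_0^*\big) \big(\hat A(\hat\varphi) - \tau\big). \tag{48}
\end{align*}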

We now give a series of inequalities and bounds to show that the remainder term $R_T$ can be neglected. First, from the Cauchy-Schwarz inequality,
\[
E\big[\|\Delta\hat\psi\|^2\big] = E\big[\|V_T+B_T\|^2\big] + E\big[\|R_T\|^2\big] + O\Big( E\big[\|V_T+B_T\|^2\big]^{1/2} E\big[\|R_T\|^2\big]^{1/2} \Big), \tag{49}
\]
and
\begin{align*}
E\big[\|\Delta\hat\psi\|^3\big] &\le E\big[\|\Delta\hat\psi\|^4\big]^{1/2} E\big[\|\Delta\hat\psi\|^2\big]^{1/2} \\
&\le C \Big( E\big[\|V_T+B_T\|^4\big] + E\big[\|R_T\|^4\big] \Big)^{1/2} E\big[\|\Delta\hat\psi\|^2\big]^{1/2}, \tag{50}
\end{align*}

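for a constant $C$. Second, we can isolate the estimation bias by writing
\[
V_T + B_T = (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) + (\lambda_T + A^*A)^{-1}A^*E\hat\zeta + \big[(\lambda_T + A^*A)^{-1}A^*A - 1\big]\varphi_0 .
\]
Thus,
\begin{align*}
E\big[\|V_T+B_T\|^2\big] =\ & E\Big[ \big\| (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) \big\|^2 \Big] \\
& + \Big\| (\lambda_T + A^*A)^{-1}A^*E\hat\zeta + \big[(\lambda_T + A^*A)^{-1}A^*A - 1\big]\varphi_0 \Big\|^2, \tag{51}
\end{align*}
and for a constant $C$:
\begin{align*}
\|V_T+B_T\|^4 \le C \bigg( & \big\| (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) \big\|^4 \\
& + \Big\| (\lambda_T + A^*A)^{-1}A^*E\hat\zeta + \big[(\lambda_T + A^*A)^{-1}A^*A - 1\big]\varphi_0 \Big\|^4 \bigg). \tag{52}
\end{align*}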

In Lemma A.11 we give the asymptotic behavior of $E\Big[ \big\| (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) \big\|^2 \Big]$ and $E\Big[ \big\| (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) \big\|^4 \Big]$. In Lemma A.12 we prove that estimation bias is negligible compared to regularization bias, and in Lemma A.13 we give bounds on the remainder term. Combining Lemmas A.11 (i), A.12, A.13 (i) with Equations (49), (51) yields Statement (i) above. Then, combining Lemmas A.11 (ii), A.12, A.13 (ii) with Inequalities (50), (52) yields
\[
E\big[\|\Delta\hat\psi\|^3\big] = O\left( \left( \frac{1}{T}\sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j+\lambda_T)^2}\,\|\phi_j\|^2 + b(\lambda_T)^2 \right)^{3/2} \right).
\]
The latter in turn implies Statement (ii) by using Condition (15).

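Lemma A.11: Under Assumptions A.1, A.2, A.4, A.10, A.11 (ii), A.12, A.13 (i):

(i) up to negligible terms, $E\Big[ \big\| (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) \big\|^2 \Big] = \dfrac{1}{T}\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j+\lambda_T)^2}\,\|\phi_j\|^2$,

(ii) $E\Big[ \big\| (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) \big\|^4 \Big] = O\left( \left( \dfrac{1}{T}\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j+\lambda_T)^2}\,\|\phi_j\|^2 \right)^{2} \right)$.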
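To get a feel for the variance term $\frac{1}{T}\sum_j \frac{\nu_j}{(\nu_j+\lambda_T)^2}\|\phi_j\|^2$ in Lemma A.11 (i), the following sketch evaluates a truncated version of the sum. The eigenvalue decay `nu`, the squared norms `phi_norm_sq`, the truncation level `J` and the sample size `T` are all hypothetical choices made for illustration; they are not taken from the paper. The sketch also checks the elementary bound $\nu/(\nu+\lambda)^2 \le 1/(4\lambda)$, which follows from $(\nu+\lambda)^2 \ge 4\nu\lambda$.

```python
def variance_term(nu, phi_norm_sq, lam, T):
    """(1/T) * sum_j nu_j / (nu_j + lam)^2 * ||phi_j||^2 over a truncated spectrum."""
    return sum(n / (n + lam) ** 2 * p for n, p in zip(nu, phi_norm_sq)) / T


J = 200                                               # hypothetical truncation level
nu = [j ** -4.0 for j in range(1, J + 1)]             # hypothetical eigenvalues of A*A
phi_norm_sq = [j ** -2.0 for j in range(1, J + 1)]    # hypothetical values of ||phi_j||^2
T = 1000                                              # hypothetical sample size

values = {lam: variance_term(nu, phi_norm_sq, lam, T) for lam in (1e-4, 1e-3, 1e-2)}
for lam, v in values.items():
    # Each summand is at most ||phi_j||^2 / (4 * lam), so the sum is bounded too.
    bound = sum(phi_norm_sq) / (4.0 * lam * T)
    print(lam, v, bound)
```

Each summand is decreasing in $\lambda$, so the variance term shrinks as the regularization parameter grows, while the regularization bias $b(\lambda_T)$ moves in the opposite direction; this is the trade-off behind the MISE expansion.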

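Lemma A.12: Suppose that Assumptions A.1, A.2, A.3 (ii), A.5, A.10 hold, and $h_T^m = o(\lambda_T b(\lambda_T))$. Then, up to negligible terms:
\[
\Big\| (\lambda_T + A^*A)^{-1}A^*E\hat\zeta + \big[(\lambda_T + A^*A)^{-1}A^*A - 1\big]\varphi_0 \Big\| = b(\lambda_T).
\]

Lemma A.13: Suppose that Assumptions A.1-A.7, A.10, A.11 (ii), A.12, A.13 hold, and
\[
\frac{(\log T)^2}{T h_T^{2(1+d_Z)}} = O(1), \qquad \frac{(\log T)^2}{T h_T^{d_Z+1}} + h_T^m = o(\lambda_T b(\lambda_T)), \qquad \frac{1}{T h_T^{\max\{d_Z,2\}}} + h_T^{2m} = O\big(\lambda_T^{2+\varepsilon}\big), \quad \varepsilon > 0.
\]
Then:

(i) $E\big[\|R_T\|^2\big] = o\left( \dfrac{1}{T}\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j+\lambda_T)^2}\,\|\phi_j\|^2 + b(\lambda_T)^2 \right)$, and

(ii) $E\big[\|R_T\|^4\big] = o\left( \left( \dfrac{1}{T}\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\nu_j+\lambda_T)^2}\,\|\phi_j\|^2 + b(\lambda_T)^2 \right)^{2} \right)$.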

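Appendix 5: Proof of Proposition 5

Let us show the asymptotic normality of the Q-TiR estimator. From Equation (11) and the decomposition in A.4.4, we have
\begin{align*}
\sqrt{T/\sigma_T^2(x)}\,\big(\hat\varphi(x) - \varphi_0(x)\big)
=\ & \sqrt{T/\sigma_T^2(x)}\,(\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big)(x) \\
& + \sqrt{T/\sigma_T^2(x)}\,B_T(x) + \sqrt{T/\sigma_T^2(x)}\,(\lambda_T + A^*A)^{-1}A^*E\hat\zeta(x) \\
& + \sqrt{T/\sigma_T^2(x)}\,R_T(x) + \sqrt{T/\sigma_T^2(x)}\,\hat K_T(\Delta\hat\varphi)(x) \\
=:\ & \text{(I)} + \text{(II)} + \text{(III)} + \text{(IV)} + \text{(V)},
\end{align*}
where $R_T(x)$ is defined in (48). We now show that the term (I) is asymptotically $N(0,1)$ distributed in A.5.1, and the terms (III), (IV), and (V) are $o_p(1)$ in A.5.2, which implies Proposition 5.

A.5.1 Asymptotic normality of (I)

Since $\{\phi_j : j\in\mathbb N\}$ is an orthonormal basis w.r.t. $\langle\cdot,\cdot\rangle_H$, we can write:
\begin{align*}
(\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big)(x)
&= \sum_{j=1}^{\infty} \Big\langle \phi_j,\ (\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big) \Big\rangle_H \phi_j(x) \\
&= \sum_{j=1}^{\infty} \frac{1}{\lambda_T + \nu_j} \Big\langle \phi_j,\ A^*\big(\hat\zeta - E\hat\zeta\big) \Big\rangle_H \phi_j(x),
\end{align*}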

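for almost any $x\in[0,1]$. Then, we get
\[
\sqrt{T/\sigma_T^2(x)}\,(\lambda_T + A^*A)^{-1}A^*\big(\hat\zeta - E\hat\zeta\big)(x) = \sum_{j=1}^{\infty} w_{j,T}(x)\, Z_{j,T}, \tag{53}
\]
where
\[
Z_{j,T} := \frac{1}{\sqrt{\nu_j}} \Big\langle \phi_j,\ \sqrt T A^*\big(\hat\zeta - E\hat\zeta\big) \Big\rangle_H, \quad j = 1, 2, \cdots,
\]
and
\[
w_{j,T}(x) := \frac{\sqrt{\nu_j}}{\lambda_T + \nu_j}\,\phi_j(x) \Bigg/ \left( \sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T + \nu_j)^2}\,\phi_j(x)^2 \right)^{1/2}, \quad j = 1, 2, \cdots.
\]
Note that $\sum_{j=1}^{\infty} w_{j,T}(x)^2 = 1$. Equation (53) can be rewritten (see the proof of Lemma A.11) using
\[
\sum_{j=1}^{\infty} w_{j,T}(x)\, Z_{j,T} = -\sqrt T \int G_T(r) \Big[ \hat f_{X,Y,Z}(r) - E\hat f_{X,Y,Z}(r) \Big] dr, \tag{54}
\]
where $r = (w,z)$, $G_T(r) := G_T(x,r) := \sum_{j=1}^{\infty} w_{j,T}(x)\, g_j(r)$, $g_j(r) = \dfrac{1}{\tau(1-\tau)}\,\dfrac{(A\phi_j)(z)}{\sqrt{\nu_j}}\, 1_{\varphi_0}(w)$, and $1_{\varphi_0}(w) = 1\{y\le\varphi_0(x)\} - \tau$.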
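The normalization $\sum_j w_{j,T}(x)^2 = 1$ follows directly from the definition of the weights, since the denominator is the square root of the sum of the squared numerators. The sketch below checks this numerically on a truncated sequence. The eigenvalues `nu`, the basis values `phi_x`, the evaluation point `x` and the regularization parameter `lam` are hypothetical and serve only to illustrate the computation.

```python
import math


def weights(nu, phi_x, lam):
    """w_j = sqrt(nu_j) / (lam + nu_j) * phi_j(x), scaled to unit sum of squares."""
    raw = [math.sqrt(n) / (lam + n) * p for n, p in zip(nu, phi_x)]
    scale = math.sqrt(sum(v * v for v in raw))   # normalizing constant in w_{j,T}(x)
    return [v / scale for v in raw]


J = 50                                                   # hypothetical truncation level
x = 0.3                                                  # hypothetical evaluation point
nu = [j ** -4.0 for j in range(1, J + 1)]                # hypothetical eigenvalues
phi_x = [math.cos(math.pi * j * x) / j for j in range(1, J + 1)]  # hypothetical phi_j(x)

w = weights(nu, phi_x, lam=1e-3)
print(sum(v * v for v in w))  # equals 1 up to rounding error
```

Because the weights have unit sum of squares and the $g_j$ are uncorrelated with unit variance, the variable $Y_{tT}$ introduced below has unit variance by construction.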

∞ X

Lemma A.14: Suppose that Assumptions A.1, A.10, A.11 (i), A.13 (ii) hold, hm T = o (λT ) ∞ X νj h i (λT + ν j )2 p √ Z j=1 and hT ∞ T GT (r) fˆX,Y,Z (r) − E fˆX,Y,Z (r) dr = o (1) . Then: X νj 2 2 φj (x) (λT + ν j ) j=1 T ∞ X 1 X 0 YtT + op (1), where YtT := GT (Rt ) = wj,T (x)gj (Rt ), Rt = (Xt , Yt , Zt ) . =√ T t=1 j=1

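From Lemma A.14 it is sufficient to prove that $T^{-1/2}\sum_{t=1}^{T} Y_{tT}$ is asymptotically $N(0,1)$ distributed. Note that
\[
E[g_j(R)] = \frac{1}{\tau(1-\tau)}\,\frac{1}{\sqrt{\nu_j}}\, E\Big[ (A\phi_j)(Z)\, E\big[ 1_{\varphi_0}(W) \,|\, Z \big] \Big] = 0,
\]
and
\begin{align*}
\mathrm{Cov}[g_j(R), g_l(R)]
&= \frac{1}{\tau^2(1-\tau)^2}\,\frac{1}{\sqrt{\nu_j}\sqrt{\nu_l}}\, E\Big[ (A\phi_j)(Z)\,(A\phi_l)(Z)\, E\big[ 1_{\varphi_0}(W)^2 \,|\, Z \big] \Big] \\
&= \frac{1}{\tau(1-\tau)}\,\frac{1}{\sqrt{\nu_j}\sqrt{\nu_l}}\, E\big[ (A\phi_j)(Z)\,(A\phi_l)(Z) \big] \\
&= \frac{1}{\sqrt{\nu_j}\sqrt{\nu_l}}\, \langle \phi_j, A^*A\phi_l \rangle_H = \delta_{j,l}.
\end{align*}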
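The simplification of the conditional second moment in the covariance computation above can be spelled out in one line. It uses only the conditional quantile restriction $P[Y\le\varphi_0(X)\,|\,Z] = \tau$, which is the same restriction that gives $E[1_{\varphi_0}(W)\,|\,Z] = 0$: conditionally on $Z$, the indicator $1\{Y\le\varphi_0(X)\}$ is Bernoulli with parameter $\tau$, so

```latex
E\left[ 1_{\varphi_0}(W)^2 \,\middle|\, Z \right]
  = \tau\,(1-\tau)^2 + (1-\tau)\,\tau^2
  = \tau(1-\tau),
\qquad\text{hence}\qquad
\frac{1}{\tau^2(1-\tau)^2}\, E\left[ 1_{\varphi_0}(W)^2 \,\middle|\, Z \right]
  = \frac{1}{\tau(1-\tau)} .
```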

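Thus $E[Y_{tT}] = 0$ and
\[
V[Y_{tT}] = \sum_{j,l=1}^{\infty} w_{j,T}(x)\, w_{l,T}(x)\, \mathrm{Cov}[g_j(R), g_l(R)] = \sum_{j=1}^{\infty} w_{j,T}(x)^2 = 1.
\]
From application of a Lyapunov CLT, it is sufficient to show that
\[
\frac{1}{T^{1/2}}\, E\big[|Y_{tT}|^3\big] \to 0, \quad T\to\infty. \tag{55}
\]
To this goal, using $|Y_{tT}| \le \sum_{j=1}^{\infty} |w_{j,T}(x)|\,|g_j(R_t)|$ and the triangle inequality, we get
\begin{align*}
\frac{1}{T^{1/2}}\, E\big[|Y_{tT}|^3\big]
&\le \frac{1}{T^{1/2}}\, E\left[ \left( \sum_{j=1}^{\infty} |w_{j,T}(x)|\,|g_j(R)| \right)^{3} \right]
= \frac{1}{T^{1/2}} \left\| \sum_{j=1}^{\infty} |w_{j,T}(x)|\,|g_j| \right\|_3^3 \\
&\le \frac{1}{T^{1/2}} \left( \sum_{j=1}^{\infty} |w_{j,T}(x)|\,\|g_j\|_3 \right)^{3}
= \frac{1}{T^{1/2}}\, \frac{\left( \displaystyle\sum_{j=1}^{\infty} \frac{\sqrt{\nu_j}}{\lambda_T+\nu_j}\,|\phi_j(x)|\,\|g_j\|_3 \right)^{3}}{\left( \displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T+\nu_j)^2}\,\phi_j(x)^2 \right)^{3/2}} .
\end{align*}
Moreover, from the Cauchy-Schwarz inequality we have
\[
\sum_{j=1}^{\infty} \frac{\sqrt{\nu_j}}{\lambda_T+\nu_j}\,|\phi_j(x)|\,\|g_j\|_3
\le \left( \sum_{j=1}^{\infty} \frac{1}{j^{1+\bar\varepsilon}} \right)^{1/2} \left( \sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T+\nu_j)^2}\,\phi_j(x)^2\,\|g_j\|_3^2\, j^{1+\bar\varepsilon} \right)^{1/2},
\]
and $\sum_{j=1}^{\infty} \dfrac{1}{j^{1+\bar\varepsilon}} < \infty$, for any $\bar\varepsilon > 0$. Thus, we get
\[
\frac{1}{T^{1/2}}\, E\big[|Y_{tT}|^3\big]
\le \left( \sum_{j=1}^{\infty} \frac{1}{j^{1+\bar\varepsilon}} \right)^{3/2}
\left( \frac{1}{T^{1/3}}\, \frac{\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T+\nu_j)^2}\,\phi_j(x)^2\,\|g_j\|_3^2\, j^{1+\bar\varepsilon}}{\displaystyle\sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T+\nu_j)^2}\,\phi_j(x)^2} \right)^{3/2},
\]
and Condition (55) is implied by Condition (29).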
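The Cauchy-Schwarz step above splits each summand $a_j$ as $j^{-(1+\bar\varepsilon)/2}\cdot a_j\, j^{(1+\bar\varepsilon)/2}$, so that one factor is summable for any $\bar\varepsilon > 0$. The following sketch checks the resulting inequality numerically for a nonnegative sequence. The sequence `a`, built from hypothetical eigenvalues `nu` and a hypothetical decay $1/j$ standing in for $|\phi_j(x)|\,\|g_j\|_3$, is an assumption made purely for illustration.

```python
import math


def cauchy_schwarz_sides(a, eps_bar):
    """Return (sum_j a_j, sqrt(sum_j j^-(1+e)) * sqrt(sum_j a_j^2 * j^(1+e)))."""
    p = 1.0 + eps_bar
    lhs = sum(a)
    s1 = sum(j ** -p for j in range(1, len(a) + 1))
    s2 = sum(aj * aj * j ** p for j, aj in enumerate(a, start=1))
    return lhs, math.sqrt(s1 * s2)


J, lam = 100, 1e-3                                   # hypothetical truncation and lambda
nu = [j ** -4.0 for j in range(1, J + 1)]            # hypothetical eigenvalues
# Hypothetical nonnegative terms a_j = sqrt(nu_j) / (lam + nu_j) * (1 / j).
a = [math.sqrt(n) / (lam + n) / j for j, n in enumerate(nu, start=1)]

lhs, rhs = cauchy_schwarz_sides(a, eps_bar=0.5)
print(lhs, rhs)
```

The weight $j^{1+\bar\varepsilon}$ is what links this bound to a summability requirement on $\phi_j(x)^2\|g_j\|_3^2\, j^{1+\bar\varepsilon}$ relative to the variance sum.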

A.5.2 Terms (III), (IV), and (V) are $o(1)$, $o_p(1)$, $o_p(1)$

Lemma A.15: Suppose that Assumptions A.5, A.10 hold, and $h_T^m = O\big(\lambda_T^{1+\varepsilon/2}\, b(\lambda_T)\big)$, $\dfrac{M_T(\lambda_T)}{\sigma_T^2(x)/T} = o\big(\lambda_T^{-\varepsilon}\big)$, for $\varepsilon > 0$. Then: $\sqrt{T/\sigma_T^2(x)}\,(\lambda_T + A^*A)^{-1}A^*E\hat\zeta(x) = o(1)$.

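Lemma A.16: Suppose that Assumptions A hold, and
\[
\frac{(\log T)^2}{T h_T^{d_Z+1}} + h_T^m = O\big(\lambda_T^{1+\varepsilon/2}\, b(\lambda_T)\big), \qquad \frac{1}{T h_T^{2}} + h_T^{2m} = O\big(\lambda_T^{2+\varepsilon}\big), \qquad \frac{(\log T)^2}{T h_T^{2(1+d_Z)}} = O(1),
\]
\[
\frac{1}{T h_T^{d_Z}} + h_T^{2m} = O\big(\lambda_T^{2+\varepsilon}\big), \qquad \frac{M_T(\lambda_T)}{\sigma_T^2(x)/T} = o\big(\lambda_T^{-\varepsilon}\big),
\]
for $\varepsilon > 0$. Then: $\sqrt{T/\sigma_T^2(x)}\,R_T(x) = o_p(1)$.

Lemma A.17: Suppose that Assumptions A hold, and
\[
\frac{1}{T h_T^{\max\{d_Z,2\}}} + h_T^{2m} = O\big(\lambda_T^{2+\varepsilon}\big), \qquad \frac{(\log T)^2}{T h_T^{d_Z+1}} + h_T^m = O\big(\lambda_T b(\lambda_T)\big), \qquad \frac{(\log T)^2}{T h_T^{2(1+d_Z)}} = O(1),
\]
\[
\frac{M_T(\lambda_T)}{\sigma_T^2(x)/T} = o\big(\lambda_T^{-\varepsilon}\big), \qquad T\lambda_T^3 = O(1), \qquad M_T(\lambda_T) = O\big(\lambda_T^{1+\varepsilon}\big),
\]
for $\varepsilon > 0$. Then: $\sqrt{T/\sigma_T^2(x)}\,\hat K_T(\Delta\hat\varphi)(x) = o_p(1)$.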

TABLE 1: Case 1: Separable model: $\varphi_0(x) = \sin(\pi x) + \Phi^{-1}(\tau)$. Asymptotic and finite sample optimal regularization parameter and MISE. Average ISE, average and quartiles of selected regularization parameters with data driven procedure.

τ                         .10      .25      .50      .75      .90

Asymptotic values
  λ                     .0006    .0008    .0009    .0008    .0006
  MISE                  .0302    .0180    .0147    .0180    .0302

Finite sample values
  λ                     .0004    .0005    .0006    .0005    .0003
  MISE                  .0353    .0189    .0133    .0173    .0401

Data driven values
  Ave λ                 .0009    .0006    .0006    .0006    .0009
  1st Qu λ              .0003    .0003    .0003    .0003    .0003
  Med λ                 .0004    .0004    .0004    .0004    .0004
  2nd Qu λ              .0007    .0005    .0006    .0006    .0012
  Ave ISE               .0423    .0225    .0178    .0237    .0466

TABLE 2: Case 2: Nonseparable model: $\varphi_0(x) = \Phi(3(x + \tau - 1))$. Asymptotic and finite sample optimal regularization parameter and MISE. Average ISE, average and quartiles of selected regularization parameters with data driven procedure.

τ                         .10      .25      .50      .75      .90

Asymptotic values
  λ                     .013     .013     .027     .016     .016
  MISE                  .0010    .0013    .0010    .0012    .0009

Finite sample values
  λ                     .001     .004     .012     .003     .002
  MISE                  .0024    .0019    .0023    .0021    .0024

Data driven values
  Ave λ                 .007     .007     .011     .009     .008
  1st Qu λ              .003     .002     .002     .002     .002
  Med λ                 .006     .005     .006     .005     .005
  2nd Qu λ              .008     .010     .015     .013     .008
  Ave ISE               .0043    .0029    .0038    .0033    .0036

[Figure 1 (plot not reproduced): four panels titled Food-in, Food-out, Fuel, and Leisure, each plotted against x on the horizontal axis.]

Figure 1: Estimated median structural effect (solid line) and bootstrap pointwise confidence bands at 95% (dotted lines) for the four categories.

[Figure 2 (plot not reproduced): four panels titled Food-in, Food-out, Fuel, and Leisure, each plotted against x on the horizontal axis.]

Figure 2: Estimated quartile structural effects (solid lines) and mean structural effect (dashed line) for the four categories.

[Figure 3 (plot not reproduced): four panels titled Food-in, Food-out, Fuel, and Leisure, each plotted against x on the horizontal axis.]

Figure 3: Estimated interquartile range of structural effect (solid line) and bootstrap pointwise confidence bands at 95% (dotted lines) for the four categories.
