Linear regression models for functional explanatory variables
Christophe Crambes
To cite this version:

Christophe Crambes. Modèles de régression linéaire pour variables explicatives fonctionnelles. Mathématiques [math]. Université Paul Sabatier - Toulouse III, 2006. Français. <tel-00134003>
HAL Id: tel-00134003

https://tel.archives-ouvertes.fr/tel-00134003
Submitted on 28 Feb 2007
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
THESIS
submitted for the degree of
DOCTORATE OF THE UNIVERSITÉ PAUL SABATIER TOULOUSE III
Discipline: Mathematics
Specialty: Statistics
by
Christophe Crambes

Thesis advisors: Hervé Cardot and Pascal Sarda
Defended on 23 November 2006 before a jury composed of:

Benoît Cadre       Université Montpellier II         Referee
Hervé Cardot       CESAER - ENESAD INRA Dijon        Advisor
Antonio Cuevas     Universidad Autónoma de Madrid    Referee
Frédéric Ferraty   Université Paul Sabatier          Examiner
Alois Kneip        Universität Bonn                  Examiner
Pascal Sarda       Université Paul Sabatier          Advisor

Laboratoire de Statistique et Probabilités
UMR CNRS 5583, Université Paul Sabatier, Toulouse III

Doctoral thesis
Christophe Crambes
ACKNOWLEDGEMENTS

I would first like to thank Pascal Sarda and Hervé Cardot for agreeing to supervise my thesis. I am grateful for the trust they have placed in me since my DESS year, when they supervised my end-of-year internship and encouraged me to continue with a DEA and then a thesis. They have always been extremely available, and I now realize how enriching it has been to work with them.

I then wish to thank Benoît Cadre and Antonio Cuevas for agreeing to act as referees for this thesis. I am very flattered by the interest they have taken in this work. Their careful reading of the manuscript and their pertinent remarks helped improve the final version of this document.

I am very glad that Alois Kneip agreed to be a member of my jury. The two stays during which he hosted me, in Mainz and then in Bonn, brought me a great deal both mathematically and personally, and the work we have in progress will allow us to continue our collaboration beyond this thesis.

I also wish to thank Frédéric Ferraty for being a member of my jury. He has always given me good advice about my work, and I have always found his office door open whenever I needed it.

I would now like to thank the professors of the Laboratoire de Statistique et Probabilités whom I have worked alongside during these three years of thesis work, in particular the members of the STAPH working group: Philippe Vieu, Yves Romain, Alain Boudou, Sylvie Viguier, and Luboš, whom I am always pleased to see each time he returns to Toulouse. I also thank Fabrice Gamboa for the trust he placed in me during my DEA year; I have very fond memories of his teaching. I also wish to thank Anne Ruiz-Gazen, whom I have come to know better since the 1ère Rencontre des Jeunes Statisticiens in Aussois. She was very available to answer my questions, and talking with her led me to consider new lines of research.

I also particularly want to thank Françoise Michel for her availability, her good humor, and her efficiency in handling the administrative problems that come up day to day.

These three years also allowed me to meet doctoral students with whom I have spent very good times. The students who arrived last year, Maxime, Laurent, Florent and Amélie, brought their good humor to the midday quiz break. My thanks also go to the doctoral students who started the same year as I did or the year after, and who will leave me with very good memories: Delphine (with whom it has been a pleasure to share an office these past two years), Marielle, Agnès, Solenn, Myriam and Diana. I cannot forget the doctoral students who welcomed me when I started my thesis, and all the good times we had: Renaud, Clément, Cécile, Yan, Élie, Abdelaâti, Nicolas and Jean-Pierre. I also wish to thank Sébastien, whom I asked countless questions about LaTeX and R, and whose availability and good humor I remember above all. Finally, I have known Lionel since the DEA and we have shared the same office since the beginning of our theses; I want to tell him what a pleasure it has been to work on my thesis at the same time as he did, for all the good times spent during these years.

Finally, I would like to thank my family, and especially my parents, who have always supported me in my studies and allowed me to pursue them in the best possible conditions. I also thank my sister Magali, as well as Marc, Julie and Anthony, for all the time we spend together each time I come back to Perpignan. Last, for everything she brings me, I thank Marine.

TABLE OF CONTENTS

Acknowledgements
Introduction

Part I. Spline estimation of conditional quantiles for a functional explanatory variable
  I.1. Presentation of the estimator
  I.2. Quantile regression when the covariates are functions
    I.4.1. Introduction
    I.4.2. Construction of the estimator
    I.4.3. Convergence result
    I.4.4. Some comments
    I.4.5. Proof of the convergence result
  I.3. Comments and perspectives

Part II. Smoothing spline estimator in the functional linear model
  II.1. Construction of the estimator
  II.2. Convergence result
  II.3. Comments and perspectives

Part III. Functional linear model when the explanatory variable is noisy
  III.1. Orthogonal least squares - Multivariate case
  III.2. Orthogonal least squares - Functional case
    III.2.1. Construction of the estimator (regression splines)
    III.2.2. Convergence result
    III.2.3. Comments
    III.2.4. Smoothing spline estimator
    III.2.5. Perspectives
  III.3. Functional linear regression with errors-in-variables
    III.3.1. Introduction
    III.3.2. Estimation of α in the non-noisy case
    III.3.3. Total Least Squares method for functional covariates
    III.3.4. Some comments
    III.3.5. A simulation study
    III.3.6. Proof of the results
  III.4. Principal components regression
    III.4.1. Estimation procedure
    III.4.2. Integral of the squared regression function
    III.4.3. Asymptotic results
    III.4.4. Perspectives

Part IV. Application to the forecasting of pollution peaks
  IV.1. Forecasting with conditional quantiles
    IV.1.1. Estimation algorithm
    IV.1.2. Choice of parameters
    IV.1.3. Model with several explanatory variables
  IV.2. Forecasting with the conditional mean
    IV.2.1. Estimation with regression splines
    IV.2.2. Estimation with smoothing splines
  IV.3. Pollution data
  IV.4. Ozone pollution forecasting
    IV.4.1. Introduction
    IV.4.2. A brief analysis of the data
    IV.4.3. Functional linear model
    IV.4.4. Conditional quantiles estimation
    IV.4.5. Application to Ozone prediction

Part V. Appendix
  V.1. Noisy explanatory variable - Proofs
  V.2. Integral of the squared regression function - Proofs
  V.3. Principal components regression - Proofs

Bibliography

INTRODUCTION

Functional statistics has developed considerably in recent years. This branch of statistics studies data which, because of their structure and because they are collected on very fine grids, can be regarded as curves or surfaces, for instance functions of time or space. The need to consider this type of data, now commonly referred to in the literature as functional data, is above all a practical one. Given the current capabilities of measuring devices and computer storage, situations that produce such data are numerous and arise in many fields: examples include growth curves, temperature curves, and satellite images. An exhaustive list of the situations in which such data arise is out of reach, but specific examples of functional data are discussed in this thesis.

Beyond this practical aspect, however, a theoretical framework is needed for the study of these data. Although functional statistics has the same objectives as the other branches of statistics (data analysis, inference, and so on), the data have the particular feature of taking their values in function spaces, and the usual methods of multivariate statistics break down. For example, suppose we observe n curves at p discretization points, these curves being used as predictors of another variable. If the data, denoted $x_{ij}$ (for i from 1 to n and j from 1 to p), are arranged in a matrix of size n × p,

$$X = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix},$$

ordinary least squares, a very common method in multivariate statistics, can give very poor results in this situation, because it requires inverting the matrix $X^{\tau} X$, which may be difficult or even impossible for two reasons. The first is that p is generally large (one may even have p > n, in which case $X^{\tau} X$ is not invertible). The second is that the p predictors are very likely to be strongly collinear, since they are measurements of the same function. Several solutions have been proposed to get around this problem, the most common being

• ridge regression (originally introduced by Hoerl and Kennard, 1980), which adds a penalty term to the least squares criterion. This amounts to inverting the matrix $X^{\tau} X + \lambda I_p$ (with λ a strictly positive real number and $I_p$ the identity matrix of size p) instead of $X^{\tau} X$,
• principal components regression, which reduces the dimension p by using the first k principal components from the principal components analysis of the table X (with k a suitably chosen nonzero integer),
• partial least squares regression (see Helland, 1990), an algorithmic method in which each step performs an ordinary least squares regression on the residuals of the previous step.

These methods are studied and compared in an article by Frank and Friedman (1993), with a view to their application in chemometrics. As Hastie and Mallows (1993) point out in their discussion of that article, viewing a curve only through a vector of measurement points is reductive, if only because the measurement points must then be the same for every observed curve, which is not necessarily the case in practice. This approach also loses the curve structure, since only the values of the curve at certain points are used. It therefore seems preferable to process the data in a way that takes their functional nature into account.
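
The difficulty described above, and two of the remedies listed, can be illustrated numerically. The following is a minimal sketch, not a method taken from the thesis: the simulated curves, the grid sizes `n` and `p`, the penalty `lam` and the number of components `k` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n curves observed on a grid of p points, with p > n.
n, p = 20, 50
t = np.linspace(0.0, 1.0, p)

# Smooth random curves built from five sine functions, so that the p columns
# of X (values at neighbouring grid points) are strongly collinear.
freqs = np.arange(1, 6)
coefs = rng.normal(size=(n, freqs.size))
X = coefs @ np.sin(np.pi * np.outer(freqs, t))            # shape (n, p)
y = X @ np.sin(2 * np.pi * t) / p + 0.01 * rng.normal(size=n)

# The p x p matrix X^T X has rank at most n < p, hence it is singular.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)
print(f"rank of X^T X: {rank} (p = {p})")

# Ridge regression: X^T X + lam * I_p is invertible for any lam > 0.
lam = 1e-2
beta_ridge = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)

# Principal components regression: regress y on the first k component scores.
k = 3
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T                                     # shape (n, k)
gamma = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
beta_pcr = Vt[:k].T @ gamma                                # back to dimension p
```

Both estimates are vectors of length p; neither uses the fact that the columns of X are ordered values of a curve, which is the limitation the functional approach addresses.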

The very first works in which this idea of functional data appears are in fact relatively old. Rao (1958) and Tucker (1958) consider principal components analysis and factor analysis for functional data, and even explicitly treat functional data as a particular type of data. Later, Ramsay (1982) brings out the notion of functional data and raises the question of adapting the methods of multivariate statistics to this functional framework. From then on, work exploring functional statistics began to multiply, leading today to reference works in the field such as the monographs of Ramsay and Silverman (2002 and 2005).

This work now forms a very dense literature, both theoretical and applied. From a theoretical point of view, the notion of a functional random variable (that is, a random variable taking values in a function space) appeared, which made it necessary to define basic notions for such a variable, for example the expectation or the covariance operator. Techniques that already exist in multivariate statistics (such as principal components analysis) can then be developed in this functional framework, using in particular the theory of operators in Hilbert spaces (see for example Dunford and Schwartz, 1963, Gohberg and Krein, 1971). Among the starting points of this generalization of multivariate methods to the functional framework, Deville (1974) introduces a principal components analysis of curves, while the thesis of Dauxois and Pousse (1976) goes beyond principal components analysis in a Hilbert space and treats a number of methods grouped under the name of factor analyses, of which principal components analysis is one. The article by Dauxois, Pousse and Romain (1982) focuses more on the asymptotic aspects of the problem, providing in particular convergence results for the empirical covariance operator.

This thesis aims to contribute to the study of functional data in the setting where the functional variable is used to explain a phenomenon represented by another variable. The problem of interest is regression when the explanatory variable is functional, a subject on which the literature is very extensive. In full generality, this functional regression model can be written

$$Y_i = r(X_i) + \epsilon_i, \quad i = 1, \dots, n, \tag{1}$$

where, for all i = 1, ..., n,

• $X_i$ belongs to a Hilbert space H whose inner product and associated semi-norm are denoted $\langle \cdot, \cdot \rangle$ and $\| \cdot \|_H$ respectively. Note that $X_i$ may or may not be random, depending on whether a random-design or a fixed-design model is considered,
• $Y_i$ is a real random variable (the variable of interest),
• $\epsilon_i$ is a random error variable,
• all the random variables considered are defined on the same probability space $(\Omega, \mathcal{A}, P)$.

The goal is then to estimate the unknown operator $r : H \to \mathbb{R}$ from the data $(X_i, Y_i)_{i=1,\dots,n}$.

This thesis focuses on a model slightly more specific than model (1) above, namely the functional linear model, which is written

$$Y_i = \langle \alpha, X_i \rangle + \epsilon_i, \quad i = 1, \dots, n, \tag{2}$$

where the goal is to estimate the unknown $\alpha \in H$ from the data $(X_i, Y_i)_{i=1,\dots,n}$. The assumptions made on $\epsilon_1, \dots, \epsilon_n$ will be detailed in due course, since they differ according to the situation considered.
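
To make model (2) concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that H is the space of square-integrable functions on [0, 1], that curves are observed on a regular grid, and that the inner product is approximated by the trapezoidal rule; the coefficient function `alpha`, the way the curves are generated and the noise level are hypothetical choices, not those of the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical discretization: n curves observed at p equispaced points of [0, 1].
n, p = 200, 101
t = np.linspace(0.0, 1.0, p)
dt = t[1] - t[0]

def inner_l2(f, g):
    """Trapezoidal approximation of the L2([0, 1]) inner product on the grid t."""
    h = f * g
    return dt * (h[..., 1:-1].sum(axis=-1) + 0.5 * (h[..., 0] + h[..., -1]))

# Assumed functional coefficient alpha (illustration only).
alpha = np.sin(2 * np.pi * t)

# Random curves X_i: random combinations of three orthonormal basis functions.
basis = np.vstack([
    np.ones(p),
    np.sqrt(2) * np.sin(2 * np.pi * t),
    np.sqrt(2) * np.cos(2 * np.pi * t),
])
xi = rng.normal(size=(n, 3))
X = xi @ basis                                  # shape (n, p)

# Responses generated according to model (2): Y_i = <alpha, X_i> + eps_i.
eps = 0.1 * rng.normal(size=n)
Y = inner_l2(alpha, X) + eps
```

With these choices, `alpha` is orthogonal to the constant and cosine functions, so the noise-free part of each response reduces to the sine coefficient of the curve divided by the square root of 2, which gives a simple check of the discretized inner product.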

First described by Ramsay and Dalzell (1991), this model is still the subject of recent work, as shown for example by the articles of Cardot, Ferraty and Sarda (1999, 2003), which study the case of a real response variable and give a method for estimating α using what they introduce as functional principal components regression, or using spline functions. They also obtain rates of convergence for the estimators they construct. It is this model (2) that is studied in this thesis. It is a very popular model in functional data analysis; note, however, that other related models (for example extensions of the functional linear model (2)) are also the subject of recent studies. Here again, it seems impossible to list all the existing models related to (1) and (2). The following overview covers the models most frequently encountered.
• The variable of interest may itself be functional, like the explanatory variable. Cuevas, Febrero and Fraiman (2002) and Chiou, Müller and Wang (2004) studied this functional linear model with a functional response, that is, when $Y_i$ (for i = 1, ..., n) also belongs to a Hilbert space. Cuevas, Febrero and Fraiman (2002) work in a fixed-design model (that is, $X_1, \dots, X_n$ are non-random), whereas Chiou, Müller and Wang (2004) consider random $X_1, \dots, X_n$ and base their estimation method on the Karhunen-Loève expansions of the $X_i$ and $Y_i$, i = 1, ..., n. The more bibliographic article of Müller (2005) reviews various estimation methods for functional linear models with a real or functional variable of interest and a multidimensional or functional explanatory variable. It also extends these methods, considering in particular explanatory variables whose measurement points may be irregularly spaced and few in number, a situation studied in the articles of Yao, Müller and Wang (2005a, 2005b).
• Another possible extension of the functional linear model is the generalized functional linear model, the functional version of the generalized linear model introduced by Wedderburn (1974) and later treated in a book by McCullagh and Nelder (1989). This generalized functional linear model has been studied in particular by Cardot and Sarda (2005) and by Müller and Stadtmüller (2005). In this model, the conditional distribution of $Y_i$ given $X_i = x$ is assumed to belong to the exponential family. This makes it possible, for example, to treat the important special case of functional binomial regression, where $Y_i \in \{0, 1\}$ for i = 1, ..., n (see Müller and Stadtmüller, 2005).
• Another model that has recently become very popular for practical reasons (and which is in fact a special case of the generalized functional linear model mentioned above) is the functional version of classification, developed in the work of Berlinet, Biau and Rouvière (2005), among others. In this classification model, the response variable $Y_i$ is a label associated with the curve $X_i$. The goal is to build a classification rule that can assign a label to a new observation. Berlinet, Biau and Rouvière (2005) propose a classification rule based on an expansion of $X_1, \dots, X_n$ in a wavelet basis. They also establish a form of optimality for this rule, proving that asymptotically it predicts as well as the best possible rule, the Bayes rule (see Devroye, Györfi and Lugosi, 1996, on this subject). Müller and Stadtmüller (2005) also addressed this problem by interpreting the (two-label) classification problem with a functional explanatory variable as a special case of the generalized functional linear model with a binary response variable.
• Finally, a last important approach is to return to model (1) and estimate the operator r directly, in a nonparametric way. This approach was developed by Ferraty and Vieu (2002, 2003), who give a kernel estimator of the operator r and, here too, obtain rates of convergence for this estimator. This approach is revisited further on. These nonparametric techniques for functional variables are recent, and the main contributions can be found in the monograph of Ferraty and Vieu (2006).
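
The idea of a kernel estimator of r in model (1) can be sketched as a weighted average of the responses, with weights that decrease with the distance between curves. The code below is an illustrative assumption, not the estimator of Ferraty and Vieu as they define it: it uses a Nadaraya-Watson form, a Gaussian kernel, a Riemann-sum approximation of the L2 distance, and a hypothetical bandwidth `h` and simulated data set.

```python
import numpy as np

def l2_distance(x_new, X, dt):
    """Riemann-sum approximation of the L2 distance from x_new to each row of X."""
    return np.sqrt(dt * ((X - x_new) ** 2).sum(axis=1))

def kernel_estimate(x_new, X, Y, h, dt):
    """Nadaraya-Watson-type estimate of r(x_new) with a Gaussian kernel (assumed)."""
    d = l2_distance(x_new, X, dt)
    w = np.exp(-0.5 * (d / h) ** 2)        # weights decrease with distance
    return float(np.sum(w * Y) / np.sum(w))

# Hypothetical data: curves a_i * sin(pi t), with nonlinear response a_i^2 + noise.
rng = np.random.default_rng(2)
n, p = 300, 101
t = np.linspace(0.0, 1.0, p)
dt = t[1] - t[0]
a = rng.uniform(0.0, 1.0, size=n)
X = a[:, None] * np.sin(np.pi * t)
Y = a ** 2 + 0.05 * rng.normal(size=n)

# Prediction for a new curve with a = 0.5, for which r(x_new) = 0.25.
x_new = 0.5 * np.sin(np.pi * t)
r_hat = kernel_estimate(x_new, X, Y, h=0.05, dt=dt)
print(f"estimate of r(x_new): {r_hat:.3f}")
```

The Gaussian kernel is chosen here only because its weights never vanish, which keeps the denominator positive; the bandwidth plays the usual smoothing role and would have to be chosen from the data in practice.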

As already noted, the study of these various models is motivated in the first place by practical problems. The variety of fields in which the tools of functional statistics are used is considerable. The monograph of Ramsay and Silverman (2002) is by itself a rich source of concrete functional data situations and of different methods, ranging for example from the study of the shape of bones unearthed by archaeologists to the study of recordings of brain activity when a subject is asked to pronounce a syllable (by electromyography, that is, by recording with electrodes placed on the surface of the skin the electrical activity caused by lip movement, and by measuring the acceleration of the lips). It is important to note that, in all these concrete cases, the question of preprocessing the data often arises. Using growth curve data, Ramsay and Silverman (2002) show in particular that these curves exhibit two types of variability: amplitude (variations in size of particular features such as the pubertal growth spurt) and phase (variations in the timing of particular features). Preprocessing should aim to remove phase variation so that the study can focus on amplitude variation. This is the subject of works such as the articles of Ramsay and Li (1998) and of Kneip, Li, Mac Gibbon and Ramsay (2000). Without attempting an exhaustive list, a few fields in which functional data arise are mentioned below, to give an idea of the type of problems that functional statistics can solve.
• In biology, the first work to mention is the pioneering study of growth curves by Rao (1958). A more recent example is the study of variations of the knee angle during walking (see Ramsay and Silverman, 2002). In animal biology, several authors have studied the egg laying of Mediterranean fruit flies (Chiou, Müller, Wang and Carey, 2003, Chiou, Müller and Wang, 2003, Cardot, 2006). The data consist of curves giving, for each fly, the quantity of eggs laid as a function of time.
• Chemometrics is also a field well suited to the methods of functional statistics. Existing work on the subject includes Frank and Friedman (1993), already mentioned at the beginning of this introduction, and Hastie and Mallows (1993), who commented on the article of Frank and Friedman (1993) with an example of curves measuring the log-intensity of a refracted laser beam as a function of the angle of refraction. More recently, Ferraty and Vieu (2002) studied the fat content of pieces of meat (variable of interest) given the infrared wavelength absorption curves of these pieces of meat (explanatory variable).
• Applications related to the environment have been studied in particular by Aneiros-Perez, Cardot, Estevez-Perez and Vieu (2004), who worked on a pollution forecasting problem. These data consist of daily measurements of ozone pollution peaks (variable of interest), given pollutant curves and meteorological curves from the previous day (explanatory variables). These data are also used in the applied part of this thesis (Part IV), where they are described in detail.
• Climatology is a field where functional data arise naturally. A study of the El Niño phenomenon (a warm current in the Pacific Ocean) was carried out by Besse, Cardot and Stephenson (2000). In that work, the data consist of measurements of the temperature of this current as a function of time, and prediction is performed using a functional autoregressive model (see Bosq, 2000, on this subject).
• In linguistics, work has also been done, in particular on speech recognition. Examples include Hastie, Buja and Tibshirani (1995), Berlinet, Biau and Rouvière (2005), and Ferraty and Vieu (2003). This work is closely related to classification methods when the explanatory variable is a curve. Briefly, the data are curves corresponding to recordings of phonemes pronounced by different individuals. A label is associated with each phoneme (variable of interest), and the goal is to classify these curves using the recorded curve as the explanatory variable.
• The techniques of functional statistics have also found an application in graphology. Work on this problem includes Hastie, Buja and Tibshirani (1995) and Ramsay (2000). The latter models, for example, the position of the pen (abscissa and ordinate as functions of time) by means of differential equations.
• Applications to economics are also relatively numerous. Work has been carried out in particular by Kneip and Utikal (2001), and recently by Benko, Härdle and Kneip (2005), based on a functional principal components analysis. This estimation method is analyzed where it is used (see Part III), although it can already be noted that the basic idea, when estimating the covariance operator, is to estimate inner products between the observed curves rather than the curves themselves.
This brief overview gives an idea of the diversity of approaches available for
studying models (1) and (2). This thesis aims to contribute to that study. The
approaches considered, all related to model (2), aim to estimate the functional
parameter α. Among the possible methods, one consists in making regularity
assumptions on α (these assumptions will be analysed in detail later for each
approach). It is then common to estimate α by projection onto a space of regular
functions for which a basis is known. In this thesis, a space of spline
functions is used for each approach, although several other bases (for example
Fourier bases or wavelet bases) could also be considered. Since their
introduction (see in particular de Boor, 1978, Schumaker, 1981, and somewhat
more recently Dierckx, 1993), splines have been very popular, notably because
they are relatively simple to implement in practice. To set the context briefly,
a (univariate) polynomial spline on the interval [0, 1] (this interval is chosen
for simplicity) is a piecewise polynomial function s of degree q (with q ∈ N)
defined by means of k − 1 points $x_1, \dots, x_{k-1}$ (with k ∈ N, k ≥ 2),
called knots, forming a subdivision of the interval [0, 1],
$$0 < x_1 < \dots < x_{k-1} < 1,$$
and s satisfying the following properties:
• s is a polynomial of degree q on each subinterval $[0, x_1]$, $[x_1, x_2]$,
$\dots$, $[x_{k-2}, x_{k-1}]$, $[x_{k-1}, 1]$,
• s is of class $C^{q-1}$ on the interval [0, 1] (by convention, s is a step
function when q = 0).
More precisely, given a degree q ∈ N, the set $S_q(x_1, \dots, x_{k-1})$ of
spline functions of degree q with knots $x_1, \dots, x_{k-1}$ is the set of
functions s that can be written
$$s(t) = \sum_{j=0}^{q} \theta_j t^j + \sum_{j=1}^{k-1} \delta_j (t - x_j)_+^{q},$$
where $\theta_0, \dots, \theta_q, \delta_1, \dots, \delta_{k-1} \in \mathbb{R}$
and, for any integer j ≥ 1,
$$u_+^j = \begin{cases} u^j & \text{if } u \ge 0, \\ 0 & \text{if } u < 0. \end{cases}$$
It can then be shown that $S_q(x_1, \dots, x_{k-1})$ is a vector space of
dimension k + q.
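As a hedged illustration of the truncated power representation above, the following sketch evaluates s(t) from its coefficients and counts the free parameters. The function names `truncated_power`, `spline_value` and `spline_dimension` are hypothetical, and the exponent of the truncated terms is taken equal to the degree q, an assumption that is consistent with the dimension k + q.

```python
def truncated_power(u, j):
    """Return u_+^j: u**j if u >= 0, and 0 otherwise."""
    return u ** j if u >= 0 else 0.0


def spline_value(t, theta, delta, knots):
    """Evaluate s(t) = sum_j theta[j] t^j + sum_j delta[j] (t - knots[j])_+^q.

    theta has length q + 1 (polynomial part); delta has one entry per knot.
    """
    q = len(theta) - 1
    poly = sum(theta[j] * t ** j for j in range(q + 1))
    trunc = sum(d * truncated_power(t - x, q) for d, x in zip(delta, knots))
    return poly + trunc


def spline_dimension(k, q):
    """Free coefficients: (q + 1) polynomial terms plus (k - 1) knot terms."""
    return (q + 1) + (k - 1)


# Degree 1 with a single knot at 0.5: s(t) = t - (t - 0.5)_+ grows linearly
# up to the knot and stays constant afterwards.
s_left = spline_value(0.25, theta=[0.0, 1.0], delta=[-1.0], knots=[0.5])
s_right = spline_value(0.75, theta=[0.0, 1.0], delta=[-1.0], knots=[0.5])
```

Counting coefficients in this way gives (q + 1) + (k − 1) = k + q, which matches the dimension stated above.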
When the knots are taken at the measurement points of the observations, one
speaks of smoothing splines. These particular splines were studied notably by
Eubank (1988). When the knots are placed elsewhere, one speaks of regression
splines. Both types of spline functions are used in this thesis. We now give
some notation and properties for each of them.
• For regression splines, we fix an integer k ≥ 2 and a polynomial degree
q ∈ N, and consider the space of spline functions of degree q with k − 1
interior knots on [0, 1]. These knots are taken equispaced for simplicity. This
space of spline functions is a vector space of dimension k + q. A basis of this
space is the set of normalized B-spline functions (see for example de Boor,
1978), denoted throughout by
$$\mathbf{B}_{k,q} = (B_1, \dots, B_{k+q})^{\tau}.$$
A spline function is thus a linear combination of these basis functions, of the
form $\mathbf{B}_{k,q}^{\tau}\theta$ with $\theta \in \mathbb{R}^{k+q}$.
• For smoothing splines, the knots are the points $t_1, \dots, t_p$ at which
the curves $X_1, \dots, X_n$ are measured, and the polynomial degree is
q = 2m − 1 with m a strictly positive integer. We further assume that these
splines are polynomials of degree m − 1 on the intervals $[0, t_1]$ and
$[t_p, 1]$. It can then be shown (see Eubank, 1988) that this set of spline
functions (called the space of natural splines) is a vector space of dimension
p, a basis of which is denoted throughout by
$$\mathbf{b}(t) = (b_1(t), \dots, b_p(t))^{\tau}.$$
An important property of natural splines is that there is a one-to-one
correspondence between this space and $\mathbb{R}^p$, as follows. For any vector
$\mathbf{w} = (w_1, \dots, w_p)^{\tau} \in \mathbb{R}^p$, there exists a unique
natural spline, called the interpolating spline associated with $\mathbf{w}$
and denoted $s_{\mathbf{w}}$, such that, for all j = 1, ..., p,
$$s_{\mathbf{w}}(t_j) = w_j.$$
Denoting by $\mathbf{B}$ the p × p matrix with entries $b_i(t_j)$ for i and j
ranging from 1 to p, it can be shown that $s_{\mathbf{w}}$ is given by
$$s_{\mathbf{w}}(t) = \mathbf{b}(t)^{\tau} (\mathbf{B}^{\tau}\mathbf{B})^{-1} \mathbf{B}^{\tau} \mathbf{w}. \tag{3}$$
Moreover, an important property of such an interpolating spline is that it
satisfies
$$\int_0^1 s_{\mathbf{w}}^{(m)}(t)^2 \, dt \le \int_0^1 f^{(m)}(t)^2 \, dt, \tag{4}$$
for any other function f with $f^{(m)} \in L^2([0, 1])$ and $f(t_j) = w_j$ for
all j = 1, ..., p. Finally, note another important property of natural splines.
Given observations $(x_1, y_1), \dots, (x_n, y_n)$, it can be shown (see again
Eubank, 1988) that the minimization problem
$$\min_{f \in L^2([0,1])} \left\{ \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \rho \int_0^1 f^{(m)}(t)^2 \, dt \right\},$$
with ρ > 0, has a unique solution, which is a natural spline of degree 2m − 1
with interior knots $x_1, \dots, x_n$ (smoothing spline). For the estimation of
the conditional mean, this kind of penalized problem is advocated for example
by Eubank (1988), Wahba (1990) and Green and Silverman (1994). It makes it
possible to look for a function f whose smoothness is controlled through the
penalty on the $L^2$ norm of its derivative of order m.
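To make the regression spline basis of the first item more concrete, here is a minimal sketch of the Cox-de Boor recursion for normalized B-splines with k − 1 equispaced interior knots. The function name `bspline_basis` is hypothetical, and repeating each boundary knot q + 1 times is an assumption (a common convention, not stated in the text).

```python
def bspline_basis(t, k, q):
    """Values at t in [0, 1) of the k + q normalized B-splines of degree q.

    Assumes k - 1 equispaced interior knots and boundary knots 0 and 1
    repeated q + 1 times each (hypothetical convention).
    """
    interior = [j / k for j in range(1, k)]
    knots = [0.0] * (q + 1) + interior + [1.0] * (q + 1)
    n0 = len(knots) - 1
    # Degree 0: indicator functions of the half-open knot intervals.
    values = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(n0)]
    # Raise the degree one step at a time (Cox-de Boor recursion).
    for d in range(1, q + 1):
        new_values = []
        for i in range(n0 - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = ((t - knots[i]) / left_den * values[i]) if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - t) / right_den * values[i + 1]) if right_den > 0 else 0.0
            new_values.append(left + right)
        values = new_values
    return values


def spline_from_coefficients(t, theta, k, q):
    """Evaluate the spline B_{k,q}(t)^T theta for a coefficient list theta."""
    return sum(b * th for b, th in zip(bspline_basis(t, k, q), theta))


basis_values = bspline_basis(0.3, k=4, q=3)
```

With these conventions the basis has k + q elements that are nonnegative and sum to one at every point of [0, 1), so choosing all coefficients equal to a constant reproduces that constant.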
This first part of the introduction presented the general framework of this
thesis. The remainder of the introduction presents the contributions made to
the study of model (2).
Part I: Estimation of conditional quantiles
The first contribution to the study of model (2) is the subject of Part I. We
consider model (2) from the point of view of quantile regression, as an
alternative to the usual regression on the mean. In multivariate statistics,
quantile regression appeared in the 1970s. Although mean regression has
properties that make it a very popular model (easy explicit computations,
optimality under Gaussian errors), Mosteller and Tukey (1977) point out that it
can only give an incomplete picture of the data. They put forward the idea of
quantile regression, subsequently taken up by Koenker and Bassett (1978). The
monograph by Koenker (2005) gives an overview of nearly 30 years of work on
quantile regression with a multivariate explanatory variable. Besides giving a
better idea of the distribution of the data (since computing a quantile of a
given order for a probability distribution ultimately amounts to inverting its
distribution function), this alternative to mean regression offers other
advantages, such as allowing the construction of prediction intervals and
providing a certain form of robustness. On this last point, as emphasized by
Koenker (2005), the conditional mean has optimality properties when the errors
are Gaussian. When this is not the case (in particular in the presence of
outliers), the median, for example, may perform better than the mean: the
median is robust to outliers to some extent. From an applied point of view,
quantiles are used in fields as varied as agronomy (to estimate yield
thresholds), medicine (see for example the article by Lejeune and Sarda, 1988,
on growth curves) and reliability (again for the estimation of thresholds).
Recently, some work has addressed the estimation of (unconditional) quantiles
for random variables that are multivariate or take values in a Banach space,
introducing in particular the notion of median ball. Examples are the articles
by Averous and Meste (1997) and then Cadre (2001). Regarding the estimation of
the conditional median for a multivariate response, Berlinet, Cadre and Gannoun
(2001) proposed a nonparametric estimation method based on a kernel estimate of
the (multivariate) conditional distribution function. This situation (a
response that is multivariate or takes values in a Banach space) will not be
considered here: we only consider a real-valued response, and only the
explanatory variable may take values in a function space.
For a univariate or multivariate explanatory variable, beyond the initial work
of Koenker and Bassett (1978), the estimation of conditional quantiles has
given rise to an abundant literature. Still denoting by $Y_1, \dots, Y_n$ the
(real) observations of the response and by $X_1, \dots, X_n$ the (multivariate)
observations of the explanatory variable, given a real number α ∈ ]0, 1[, we
denote by $g_\alpha(x)$ the conditional quantile given $X_i = x$ (with
$x \in \mathbb{R}^p$), defined by
$$P(Y_i \le g_\alpha(X_i) \mid X_i = x) = \alpha,$$
where $P(\,\cdot \mid X_i = x)$ denotes the conditional distribution of $Y_i$
given $X_i = x$. The function $g_\alpha$, defined from $\mathbb{R}^p$ to
$\mathbb{R}$, is called the conditional quantile function of order α. The survey
article by Poiraud-Casanova and Thomas-Agnan (1998) gives a broad overview of
the methods for estimating this conditional quantile function. These methods
can be divided into two classes.
• The first class uses the fact, already pointed out, that computing a
quantile of a given order for a probability distribution amounts to inverting
the (conditional) distribution function of that distribution. The point is then
to provide an estimation method for this distribution function. This idea is
used for example by Bhattacharya and Gangopadhyay (1990), who propose kernel
and nearest-neighbour estimates of the distribution function. The usual
nonparametric rates are also obtained for the resulting estimators. Following
the same idea, Ducharme, Gannoun, Guertin and Jéquier (1995) give a kernel
estimator of the conditional distribution function and then, by inverting it,
obtain the asymptotic normality of the conditional quantile estimator.
• The second class gathers more direct methods, in which conditional quantiles
are sought as solutions of a minimization problem. Bassett and Koenker (1978)
thus study an estimator of the conditional median obtained by minimizing a
least absolute deviations criterion, and prove its consistency and asymptotic
normality. From a numerical point of view, algorithms for solving such
minimization problems (whose solution is not explicit) can be found in Koenker
(2005), for example the simplex method. More generally, this extends to any
quantile. One looks for an estimator of $g_\alpha$ within a certain class of
functions $r_\alpha$ minimizing a quantity of the form
$$\frac{1}{n} \sum_{i=1}^{n} l_\alpha(Y_i - r_\alpha(X_i)), \tag{5}$$
where the loss function $l_\alpha$ (which replaces the square function of a
classical least squares problem), called the "check function" by Koenker
(2005), is defined by
$$l_\alpha(u) = |u| + (2\alpha - 1)u.$$
For α = 1/2, we recover the estimation of the median by minimizing a least
absolute deviations criterion. Several types of estimators can be considered,
all of which minimize the quantity (5). For example, He and Shi (1994) propose
an estimator of $g_\alpha$ based on regression splines and obtain the usual
rates of convergence of nonparametric statistics. We will come back to this
estimator in Part I of the thesis. Other estimators are found in the
literature, such as the estimator of Koenker, Ng and Portnoy (1994) based on
smoothing splines, or the estimator proposed by Lejeune and Sarda (1988) using
local polynomial regression. Finally, Fan, Hu and Truong (1994) give a kernel
estimator of $g_\alpha$.
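The role of the check function in criterion (5) can be illustrated in the simplest case, where the class of functions reduces to constants. The sketch below is only an illustration under that assumption, with hypothetical names `check_loss`, `empirical_risk` and `best_constant`; it searches the minimizer among the observed values, which is enough for a piecewise linear convex criterion.

```python
def check_loss(u, alpha):
    """Check function l_alpha(u) = |u| + (2 * alpha - 1) * u."""
    return abs(u) + (2.0 * alpha - 1.0) * u


def empirical_risk(c, ys, alpha):
    """Criterion (5) when r_alpha is restricted to the constant function c."""
    return sum(check_loss(y - c, alpha) for y in ys) / len(ys)


def best_constant(ys, alpha):
    """Minimize the criterion over the observed values.

    The criterion is convex and piecewise linear in c, so a minimizer can
    always be found among the data points.
    """
    return min(ys, key=lambda c: empirical_risk(c, ys, alpha))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_estimate = best_constant(sample, 0.5)
upper_estimate = best_constant(sample, 0.7)
```

On this small sample the minimizer for α = 1/2 is the sample median, which is unaffected by the outlying value 100, while the sample mean (22) is pulled far away from the bulk of the data. Note also that $l_\alpha(u)$ equals $2\alpha u$ for u ≥ 0 and $2(\alpha - 1)u$ for u < 0, that is, twice the loss $u(\alpha - \mathbf{1}_{\{u<0\}})$ often used in the quantile regression literature, so both losses have the same minimizers.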
Part I of this thesis therefore generalizes quantile regression to the case of
a functional explanatory variable. We consider observations $X_1, \dots, X_n$
belonging to the Hilbert space $L^2([0, 1])$ of square-integrable functions
from [0, 1] to $\mathbb{R}$, equipped with its usual inner product defined by
$$\langle f, g \rangle = \int_0^1 f(t) g(t) \, dt,$$
for all functions f and g in $L^2([0, 1])$, and with the associated norm
$\|\cdot\|_{L^2}$. In practice, observed curves are square integrable in the
vast majority of cases. Restricting to functions defined on [0, 1] entails no
loss of generality, since functions initially defined on an interval [a, b] can
always be brought back to this case by the affine transformation
$$x \longmapsto \frac{x - a}{b - a}.$$
In this context, we look for $r_\alpha$, within a certain class of operators,
minimizing
$$\frac{1}{n} \sum_{i=1}^{n} l_\alpha(Y_i - r_\alpha(X_i)). \tag{6}$$
We then adopt a linear model, that is, we assume that $r_\alpha(X_i)$ can be
written $\langle \Psi_\alpha, X_i \rangle$ for all i = 1, ..., n with
$\Psi_\alpha \in L^2([0, 1])$. The aim of this part is to propose an estimator
of $\Psi_\alpha$ based on regression splines, solution of a penalized version
of the minimization problem (6). This approach is thus inspired, in the
real-valued case, by the work of He and Shi (1994), and by that of Koenker, Ng
and Portnoy (1994) as regards the introduction of a penalty. We will see that,
in the functional setting, the penalty is important to guarantee the existence
of an estimator solving the minimization problem (see also Cardot, Ferraty and
Sarda, 2003, on the introduction of such a penalty for the estimation of the
conditional mean). The asymptotic behaviour of this estimator is then studied,
and an upper bound is given for the rate of convergence with respect to a
particular norm, the norm induced by the covariance operator of $X_i$.
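In practice the inner product $\langle \Psi_\alpha, X_i \rangle$ of the linear model has to be computed from discretized curves. The sketch below approximates it with the trapezoidal rule; the function name `l2_inner_product` and the choice of an equispaced grid containing both endpoints 0 and 1 are assumptions made for the illustration only.

```python
def l2_inner_product(f_values, g_values):
    """Trapezoidal approximation of the integral of f(t) g(t) over [0, 1].

    Both curves are assumed to be sampled on the same equispaced grid that
    includes the endpoints 0 and 1 (hypothetical sampling scheme).
    """
    p = len(f_values)
    step = 1.0 / (p - 1)
    products = [f * g for f, g in zip(f_values, g_values)]
    return step * (sum(products) - 0.5 * (products[0] + products[-1]))


# Psi(t) = 1 and X(t) = t on a five-point grid: the exact integral is 1/2.
grid = [j / 4 for j in range(5)]
psi_values = [1.0] * 5
curve_values = grid
linear_predictor = l2_inner_product(psi_values, curve_values)
```

The trapezoidal rule is exact for this linear integrand; for smoother or rougher curves the approximation error depends on the number of grid points.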
Part II: Smoothing spline estimator in the functional linear model
In the previous part, the estimation of conditional quantiles was presented as
a possible alternative to the estimation of the conditional mean. The latter
nevertheless remains the most popular approach for studying models (1) and (2).
In this part we propose a new method for estimating the conditional mean. As
already noted, models (1) and (2) have been the subject of much work. The
estimation procedure proposed in this part, based on smoothing splines,
complements the existing techniques.
• For example, Ferraty and Vieu (2002, 2006) consider model (1) from a
nonparametric point of view, estimating the operator r directly. They give a
kernel estimation method (adapting the Nadaraya-Watson estimator to this
functional setting) and provide convergence results for this estimator.
Adopting a nonparametric point of view, they only make assumptions on r of the
type "r continuous" or "r Hölder continuous", and obtain their results by
introducing the fractal dimension of the probability distribution of
$X_1, \dots, X_n$, in other words by controlling probabilities of the type
$P(X_i \in B(x, \delta))$ as δ tends to zero, where $B(x, \delta)$ denotes the
ball of centre x and radius δ for the semi-norm $\|\cdot\|_H$ of H. This
nonparametric approach will not be considered in this thesis. We consider
model (2) directly and propose a method for estimating α based on solving a
penalized least squares minimization problem.
• This type of approach to estimating α is found for example in Goutis (1998),
who gives an estimation method for α that uses, however, the inner product
defined as the integral of the product of the second derivatives of the
functions instead of the usual inner product of $L^2$. Cardot, Ferraty and
Sarda (1999, 2003) give two estimation methods for α, their implementation, and
convergence results for the resulting estimators. The first method introduces
functional principal component regression, based on the diagonalization of the
covariance operator of $X_i$, thus generalizing principal component regression
from the multivariate case. The second method uses regression splines (with
k − 1 equispaced knots in the interval [0, 1]). This estimator is obtained as
the solution of a penalized least squares minimization problem. One looks for
a spline estimator $\widehat{\alpha}$ of α, written
$\widehat{\alpha} = \mathbf{B}_{k,q}^{\tau} \widehat{\theta}$ with
$\widehat{\theta} \in \mathbb{R}^{k+q}$ solving the minimization problem
$$\min_{\theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \langle \mathbf{B}_{k,q}^{\tau} \theta, X_i \rangle \right)^2 + \rho \left\| \left( \mathbf{B}_{k,q}^{\tau} \theta \right)^{(m)} \right\|_{L^2}^2 \right\}. \tag{7}$$
This minimization problem is made of two terms: the first is a classical least
squares residual term, and the second is a regularization term involving a
smoothing parameter (ρ) and the squared norm of the derivative of order m of
the spline function sought. As emphasized by Cardot, Ferraty and Sarda (2003),
this regularization term guarantees the existence and consistency of the spline
estimator. In practice it also controls the smoothness of the estimator. It is
important to note that the minimization problem (7) has an explicit solution,
which makes practical implementation relatively simple.
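The explicit solution of problem (7) can be sketched as follows. The notation below is partly an assumption: $\mathbf{A}$ denotes the n × (k + q) matrix with entries $\langle X_i, B_j \rangle$, $\mathbf{Y} = (Y_1, \dots, Y_n)^{\tau}$, and $\mathbf{G}$ is taken to be the (k + q) × (k + q) matrix with entries $\langle B_j^{(m)}, B_l^{(m)} \rangle$, which is the natural matrix form of the penalty but is not defined in the text above.

```latex
% Matrix form of criterion (7), under the notation assumed in the lead-in.
\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \langle \mathbf{B}_{k,q}^{\tau}\theta, X_i \rangle \right)^2
  + \rho \left\| (\mathbf{B}_{k,q}^{\tau}\theta)^{(m)} \right\|_{L^2}^2
  = \frac{1}{n} \left\| \mathbf{Y} - \mathbf{A}\theta \right\|^2
  + \rho \, \theta^{\tau} \mathbf{G} \theta .

% Setting the gradient with respect to theta to zero:
-\frac{2}{n} \mathbf{A}^{\tau} (\mathbf{Y} - \mathbf{A}\theta) + 2 \rho \, \mathbf{G} \theta = 0
\quad \Longleftrightarrow \quad
\left( \frac{1}{n} \mathbf{A}^{\tau}\mathbf{A} + \rho \, \mathbf{G} \right) \theta
  = \frac{1}{n} \mathbf{A}^{\tau} \mathbf{Y} .

% Hence, whenever the matrix on the left-hand side is invertible,
\widehat{\theta}
  = \left( \frac{1}{n} \mathbf{A}^{\tau}\mathbf{A} + \rho \, \mathbf{G} \right)^{-1}
    \frac{1}{n} \mathbf{A}^{\tau} \mathbf{Y} .
```

This is a ridge-type normal equation: the criterion is quadratic in θ, so the minimizer is obtained by solving a single linear system of size k + q.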
The approach presented in Part II is closest to the work of Cardot, Ferraty and
Sarda (2003). In our work, we use smoothing splines (that is, with knots placed
at the measurement points of the observed curves $X_1, \dots, X_n$). To use
these smoothing splines, we introduce the measurement points of the curves. We
assume that $X_1, \dots, X_n$ take values in $L^2([0, 1])$, the space of
square-integrable functions from [0, 1] to $\mathbb{R}$, equipped with its
usual inner product $\langle \cdot, \cdot \rangle$ and norm
$\|\cdot\|_{L^2}$. For simplicity, we assume that all the curves are observed
at discretization points $t_1 < \dots < t_p$, the same for all curves and
equispaced, that is, $t_j - t_{j-1} = 1/p$ for all j = 2, ..., p. Model (2) is
then approximated by a discrete functional linear model. We associate with this
model a penalized least squares minimization problem whose solution (the
estimate of α) can be written explicitly as a spline function. A convergence
result for the estimation of α is established in Part II. The rates obtained
can even be improved under stronger assumptions on the regularity of the curves
$X_1, \dots, X_n$. The analysis of these rates is detailed in Part II.
Part III: Functional linear model when the explanatory variable is noisy
So far, given the way models (1) and (2) are written, we have implicitly
assumed that the curves $X_1, \dots, X_n$ are observed without error. This
assumption can be rather unrealistic in practice, since many errors
(measurement errors among others) may prevent the curves $X_1, \dots, X_n$ from
being known exactly. It then seems more realistic to consider that the
explanatory variable actually available is a variable $W_i$ (for i = 1, ..., n)
such that, at each measurement point $t_j$ (for j = 1, ..., p),
$$W_i(t_j) = X_i(t_j) + \delta_{ij}, \tag{8}$$
where $(\delta_{ij})_{i=1,\dots,n,\, j=1,\dots,p}$ is a sequence of independent
and identically distributed random variables representing the errors
(measurement errors, etc.) made at each point $t_1, \dots, t_p$, and such that
$E(\delta_{ij}) = 0$ and $E(\delta_{ij}^2) = \sigma_\delta^2$ for all
i = 1, ..., n and all j = 1, ..., p.
This model with errors in the explanatory variables has been studied
extensively in the multivariate setting (that is, when $X_1, \dots, X_n$ are
elements of $\mathbb{R}^p$). For example, Fuller (1987) gives a maximum
likelihood method for this noisy model. Asymptotic results are also given by
Gleser (1981). An important numerical method, known as orthogonal least squares
(or total least squares), was notably presented by Golub and Van Loan (1980),
then taken up and developed in a book by Van Huffel and Vandewalle (1991). The
starting idea of this method is to add to the least squares minimization
problem the quantity
$$\frac{1}{n} \sum_{i=1}^{n} \left\| \mathbf{W}_i - \mathbf{X}_i \right\|^2,$$
where $\mathbf{W}_i$ and $\mathbf{X}_i$ (for i = 1, ..., n) denote the vectors
of size p with general terms $W_i(t_j)$ and $X_i(t_j)$ respectively, for
j = 1, ..., p, and $\|\cdot\|$ denotes the usual Euclidean vector norm (here in
$\mathbb{R}^p$). The solution of the minimization problem is then determined
(see Golub and Van Loan, 1980), yielding a consistent estimator of the
conditional mean.
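The effect of adding this quantity can be seen in the simplest possible case, a single scalar explanatory variable and a line through the origin, y = βx (an assumption made only for this illustration). Minimizing $(y_i - \beta x_i)^2 + (w_i - x_i)^2$ over the unobserved $x_i$ gives $(y_i - \beta w_i)^2 / (1 + \beta^2)$, the squared orthogonal distance from $(w_i, y_i)$ to the line, and the resulting criterion has a closed-form minimizer. The function names below are hypothetical.

```python
import math


def ols_slope(w, y):
    """Ordinary least squares slope through the origin, ignoring the noise in w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * wi for wi in w)


def orthogonal_criterion(beta, w, y):
    """Sum of squared orthogonal distances from the points to the line y = beta x."""
    return sum((yi - beta * wi) ** 2 for wi, yi in zip(w, y)) / (1.0 + beta ** 2)


def orthogonal_slope(w, y):
    """Closed-form minimizer of the orthogonal criterion (scalar case).

    It is the root of s_wy b^2 + (s_ww - s_yy) b - s_wy = 0 that has the
    same sign as s_wy.
    """
    s_ww = sum(wi * wi for wi in w)
    s_yy = sum(yi * yi for yi in y)
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    if s_wy == 0:
        raise ValueError("the slope is not identified when s_wy is zero")
    diff = s_yy - s_ww
    return (diff + math.sqrt(diff ** 2 + 4.0 * s_wy ** 2)) / (2.0 * s_wy)


# Points lying exactly on y = 2 x: both slopes coincide with 2.
exact_slope = orthogonal_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Perturbed points: the two criteria generally give different slopes.
w_obs = [1.0, 2.0, 3.0, 4.0]
y_obs = [2.3, 3.8, 6.4, 7.7]
beta_orth = orthogonal_slope(w_obs, y_obs)
beta_ols = ols_slope(w_obs, y_obs)
```

The multivariate and functional versions discussed in the text rely on matrix computations rather than on this scalar formula; the sketch only shows how accounting for errors in the explanatory variable changes the criterion being minimized.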
In our functional setting, the case of noisy explanatory variables has already
been considered. The methods are generally based on denoising each curve with
smoothing techniques (see for example Chiou, Muller and Wang, 2003, Cardot,
2006). In this part, we generalize the orthogonal least squares method to the
functional setting, thereby providing a "global" denoising method rather than a
curve-by-curve one. After explaining how the orthogonal least squares method
works and is solved in the multivariate case, Part III gives its generalization
to a functional explanatory variable and studies the asymptotic behaviour of
the resulting estimator. The method is considered both for smoothing splines
(that is, in the same context as Part II) and for regression splines (that is,
in the same context as Cardot, Ferraty and Sarda, 2003).
Regarding this problem of a noisy explanatory variable, another direction began
to be explored during this thesis. Returning to curve-by-curve denoising, it
consists in smoothing each noisy curve (for example with a kernel smoother) and
then performing functional principal component regression using the technique
of Kneip and Utikal (2001) and Benko, Härdle and Kneip (2005). As already
mentioned, in these articles, when estimating the covariance operator, one
estimates inner products between the observed curves rather than the curves
themselves. This estimation method has begun to give encouraging results both
in practice, on simulations, and in theory, with the search for rates of
convergence for the estimator of α. These first results are also presented in
Part III.
Part IV: Simulations and application to the forecasting of pollution peaks
In this last part, the different estimators studied are applied to data sets.
This applied part first presents a study on simulated data (which allows the
quality of the estimation techniques to be assessed). For the spline estimation
of conditional quantiles presented in Part I, since the minimization problem to
be solved has no explicit solution, we use an iteratively reweighted least
squares algorithm (see for example Ruppert and Carroll, 1988, and Lejeune and
Sarda, 1988), which provides a numerical method for constructing the estimator.
We also present, for the estimation of both the conditional mean and
conditional quantiles, a way of handling several explanatory variables by means
of an additive model and a backfitting-type algorithm (see Hastie and
Tibshirani, 1990), whose principle is given in Part IV. We then study real data
in order to address a problem of forecasting pollution peaks in the Toulouse
area (France). This research topic, very important from the point of view of
environmental protection, is the subject of many studies. For example, Ghattas
(1999) proposes a method for forecasting ozone pollution peaks using regression
trees (see Breiman, Friedman, Olshen and Stone, 1984), with a data set measured
in the Marseille area (France). Damon and Guillas (2002) take a more functional
point of view (observation of ozone curves) and base their method for
forecasting pollution peaks on a Hilbertian autoregressive model (see Bosq,
2000).
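The idea behind iteratively reweighted least squares for a check-function criterion can be sketched on a toy problem: a single scalar covariate, a slope through the origin and no penalty. This is not the algorithm of Part IV, only a generic illustration under these assumptions; the function name `irls_quantile_slope`, the smoothing constant `eps` and the fixed number of iterations are hypothetical choices. Each step writes $l_\alpha(r) = |r| (1 + (2\alpha - 1)\,\mathrm{sign}(r))$ as a weight times $r^2$ and solves the resulting weighted least squares problem.

```python
import math


def irls_quantile_slope(x, y, alpha, n_iter=200, eps=1e-8):
    """Approximate minimizer of sum_i l_alpha(y_i - beta x_i) by reweighting.

    At each iteration, l_alpha(r) is replaced by weight * r**2 with
    weight = (1 + (2 alpha - 1) sign(r)) / sqrt(r**2 + eps**2),
    where eps avoids a division by zero at r = 0.
    """
    # Start from the ordinary least squares slope.
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    for _ in range(n_iter):
        numerator = 0.0
        denominator = 0.0
        for xi, yi in zip(x, y):
            residual = yi - beta * xi
            sign = 1.0 if residual >= 0 else -1.0
            weight = (1.0 + (2.0 * alpha - 1.0) * sign) / math.sqrt(residual ** 2 + eps ** 2)
            numerator += weight * xi * yi
            denominator += weight * xi * xi
        # Weighted least squares update of the slope.
        beta = numerator / denominator
    return beta


# With a constant covariate, the slope is a location parameter: the iterations
# move from the sample mean towards the sample quantile of order alpha.
ones = [1.0] * 5
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = irls_quantile_slope(ones, sample, alpha=0.5)
upper_fit = irls_quantile_slope(ones, sample, alpha=0.7)
```

On this sample the iterations approach the median for α = 1/2 and the value 4 for α = 0.7. In the spline setting of Part I the same reweighting idea applies, with each step solving a penalized weighted least squares problem instead of a scalar one.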
In this part, the aim is to use the methods introduced in Parts I, II and III
of this thesis on a real data set in order to
• build a forecast of the next day's pollution peak using the mean and the
median,
• build prediction intervals for the next day's pollution peak,
• take into account possible measurement errors in the explanatory variable or
variables.
These data were provided by ORAMIP (Observatoire Régional de l'Air en
Midi-Pyrénées). They have already been studied with a nonparametric kernel
estimation method by Aneiros-Perez, Cardot, Estevez-Perez and Vieu (2004). The
estimator used is the nonparametric estimator previously introduced by Ferraty
and Vieu (2002). Aneiros-Perez, Cardot, Estevez-Perez and Vieu (2004) also use
the same type of backfitting algorithm as the one presented in this thesis to
handle several explanatory variables.
Without going into details for now (the data are presented in Part IV), these
data consist of hourly measurements of pollutants (such as ozone, the pollutant
of interest here, or nitrogen monoxide) and of meteorological variables (for
example wind speed). The first part of the work is a descriptive study of these
data (which required preliminary processing because of missing values). We then
present methods for forecasting pollution peaks (by the conditional mean, the
conditional median, and prediction intervals) using the spline estimators
considered in this thesis (the spline estimator of conditional quantiles
presented in Part I, the smoothing spline estimator of the conditional mean
presented in Part II, the estimator of Part III that accounts for measurement
errors, and the regression spline estimator of the conditional mean of Cardot,
Ferraty and Sarda, 2003).
PART I
SPLINE ESTIMATION OF CONDITIONAL QUANTILES
FOR A FUNCTIONAL EXPLANATORY VARIABLE
I.1. PRESENTATION OF THE ESTIMATOR
This introductory chapter gives the principle of the construction of the
conditional quantile estimator for functional explanatory variables, together
with asymptotic properties of this estimator. This work was the subject of a
note in the Comptes Rendus de l'Académie des Sciences (see Cardot, Crambes and
Sarda, 2004a) and of an article published in the Journal of Nonparametric
Statistics (see Cardot, Crambes and Sarda, 2005). This article is given in
Chapter I.2.
We keep the notation introduced above and work in this part in a random design
setting. Thus $X_1, \dots, X_n$ are independent and identically distributed
random variables (with the same distribution as a random variable X), taking
values in $L^2([0, 1])$ equipped with its usual inner product and associated
norm. For simplicity, but without loss of generality, we assume that the
variables $X_1, \dots, X_n$ are centred, that is, E(X) = 0. The random
variables $Y_1, \dots, Y_n$ take values in $\mathbb{R}$ and have the same
distribution as a random variable Y. We also assume that the pairs
$(X_1, Y_1), \dots, (X_n, Y_n)$ are independent. As pointed out in the
introduction of this thesis, from the quantile regression point of view, given
a real number α ∈ ]0, 1[, we look for $r_\alpha$ minimizing the criterion (5).
Under a linear model assumption, we write
$r_\alpha(X_i) = \langle \Psi_\alpha, X_i \rangle$ for all i = 1, ..., n with
$\Psi_\alpha \in L^2([0, 1])$. Under regularity assumptions on $\Psi_\alpha$
detailed below, we wish to propose a spline estimator of $\Psi_\alpha$ using
regression splines. In the same way as Cardot, Ferraty and Sarda (2003) for the
estimation of the conditional mean, $\Psi_\alpha$ is estimated as a linear
combination of the B-spline basis functions. To determine the vector
$\widehat{\theta}$ of coefficients of this linear combination, it is natural to
go back to the criterion (5) to be minimized. However, because the eigenvalues
of the covariance operator associated with X decrease to zero (see Dauxois,
Pousse and Romain, 1982), we consider a penalized minimization problem. As seen
in the introduction, this penalization approach is advocated notably by Wahba
(1990) and Green and Silverman (1994) in the univariate setting for the
estimation of the conditional mean by smoothing splines. Still in the
univariate setting, Koenker, Ng and Portnoy (1994) also use a penalization
approach for the estimation of conditional quantiles by smoothing splines,
except that their penalty bears on the $L^1$ norm of the derivative of order m
of the function sought (with m ∈ N). For ease of practical implementation, we
consider here a penalty on the $L^2$ norm of the derivative of order m of the
function sought. This approach is used by Cardot, Ferraty and Sarda (2003) for
the spline estimation of the conditional mean in the functional linear model.
The penalty controls the degree of smoothness of the solution through a
smoothing parameter ρ. It provides a trade-off between fit to the data and
smoothness of the estimator. Note that the minimization problem considered has
a solution $\widehat{\Psi}_\alpha$ that cannot be written explicitly, unlike
the minimization problem (7) for the estimation of the conditional mean. This
is because the objective function $l_\alpha$ is not differentiable at zero.
Nevertheless, an algorithm for solving this problem is presented in Part IV of
this thesis.
We now present the convergence result for this spline estimator. From now on,
the number of knots k and the smoothing parameter ρ depend on n ($k = k_n$
tends to infinity and $\rho = \rho_n$ tends to zero as n tends to infinity). As
in Cardot, Ferraty and Sarda (2003), the existence and asymptotic behaviour of
this estimator are linked to the inversion of the matrix
$$\widehat{\mathbf{C}}_{\rho_n} = \frac{1}{n} \mathbf{A}^{\tau} \mathbf{A} + \rho_n \mathbf{G}_{k_n},$$
where $\mathbf{A}$ is the n × (k + q) matrix with general term
$\langle X_i, B_j \rangle$ for i = 1, ..., n and j = 1, ..., k + q. To be able
to invert this matrix $\widehat{\mathbf{C}}_{\rho_n}$, the behaviour of its
smallest eigenvalues must be controlled. More precisely, since the smallest
eigenvalue of $\widehat{\mathbf{C}}_{\rho_n}$, denoted
$\lambda_{\min}(\widehat{\mathbf{C}}_{\rho_n})$, tends to zero as n tends to
infinity, the rate of convergence of the estimator depends on the rate at which
$\lambda_{\min}(\widehat{\mathbf{C}}_{\rho_n})$ tends to zero. We therefore
introduce a sequence $(\eta_n)_{n \in \mathbb{N}}$ such that the set
$\Omega_n$ defined by
$$\Omega_n = \left\{ \omega \in \Omega \,/\, \lambda_{\min}(\widehat{\mathbf{C}}_{\rho_n}) > c \eta_n \right\} \tag{I.1}$$
has probability tending to 1 as n tends to infinity (with c a constant).
Cardot, Ferraty and Sarda (2003) show that such a sequence
$(\eta_n)_{n \in \mathbb{N}}$ exists and that, moreover,
$$\lambda_{\min}(\widehat{\mathbf{C}}_{\rho_n}) \ge c \eta_n + o_P\left( (k_n^2 n^{1-\delta})^{-1/2} \right), \tag{I.2}$$
with δ ∈ ]0, 1[ and
$$\eta_n = \frac{\rho_n}{k_n}.$$
The hypotheses used to establish the convergence result for our estimator
$\widehat{\Psi}_\alpha$ are classical in functional statistics (see in particular
Cardot, Ferraty and Sarda, 2003, as well as Bosq, 2000, for similar hypotheses).
We assume that the variable X is almost surely bounded in $L^2$ and that the
derivative of order $p_0$ of the function $\Psi_\alpha$ is ν-Hölder (and we set
$p = p_0 + \nu$). This regularity hypothesis on $\Psi_\alpha$ essentially serves
to provide a spline approximation of $\Psi_\alpha$, through a result due to
de Boor (1978). We also assume that the covariance operator associated with X
has strictly positive eigenvalues. Finally, a last technical hypothesis concerns
the conditional density of $\varepsilon = Y - \langle \Psi_\alpha, X \rangle$
given $X = x$: this density is assumed to be continuous and bounded below at
zero by a strictly positive constant, uniformly with respect to
$x \in L^2([0,1])$. This last hypothesis ensures in particular the uniqueness
of the conditional quantile of order α.
Under these hypotheses, we give a bound for the rate of convergence of
$\widehat{\Psi}_\alpha$ with respect to the semi-norm induced by the covariance
operator associated with X (see theorem I.1 of Cardot, Crambes and Sarda, 2005,
given in chapter I.2). As will be seen there, this rate is
$$O_P\left( \frac{1}{k_n^{2p}} + \frac{1}{n\eta_n} + \frac{\rho_n^2}{k_n\eta_n} + \rho_n k_n^{2(m-p)} \right).$$
An immediate corollary of this result is obtained by taking in particular
$\eta_n = \rho_n / k_n$, as in Cardot, Ferraty and Sarda (2003). Then, optimizing
this rate through the choice of $k_n$ and $\rho_n$, we obtain a rate of order
$O_P\left( n^{-2p/(4p+1)} \right)$. We thus recover the rate obtained by Cardot,
Ferraty and Sarda (2003) for the spline estimation of the conditional mean when
the explanatory variable is functional.
I.2. QUANTILE REGRESSION WHEN
THE COVARIATES ARE FUNCTIONS
I.4.1. Introduction
Because of the increasing performance of measurement devices and computers,
many data are collected and saved on finer and finer time scales
or spatial grids (temperature curves, spectrometric curves, satellite images,
...). We are thus led to process data comparable to curves or, more generally,
to functions of continuous variables (time, space). These data are called
functional data in the literature (see Ramsay and Silverman, 2002). There is
therefore a need to develop statistical procedures as well as theory for this
kind of data, and many recent works indeed study models taking into account
the functional nature of the data.
The earliest works in this direction, mainly formal, aimed to give
a mathematical framework based on the theory of linear operators in Hilbert
spaces (see Deville, 1974, Dauxois and Pousse, 1976). Later, and in another
direction, practical aspects of extensions of descriptive statistical methods,
such as Principal Component Analysis, were considered (see Besse
and Ramsay, 1986). The monographs by Ramsay and Silverman (1997, 2002)
are important contributions in this area.
As pointed out by Ramsay and Silverman (1997), “the goals of functional
data analysis are essentially the same as those of other branches of Statistics”:
one of these goals is the explanation of variations of a dependent variable Y
(response) by using information from an independent functional variable X
(explanatory variable). In many applications, the response is a scalar: see
Frank and Friedman (1993), Ramsay and Silverman (1997), ... Traditionally,
one deals, for such a problem, with estimating the regression on the mean, i.e.
the minimizer among some class of functionals r of
$$E\left[ (Y - r(X))^2 \right].$$
As when X is a vector of real numbers, the two main approaches are linear
(see Ramsay and Dalzell, 1991, for the functional linear model) or purely
nonparametric (see Ferraty and Vieu, 2002, who adapt kernel estimation to
the functional setting). It is also known that estimating the regression on the
median, or more generally on quantiles, is of interest. The problem is then
to estimate the minimizer among $g_\alpha$ of
$$E\left[ l_\alpha(Y - g_\alpha(X)) \right], \tag{I.3}$$
where $l_\alpha(u) = |u| + (2\alpha - 1)u$. The value α = 1/2 corresponds to the
conditional median, whereas values $\alpha \in\, ]0,1[$ correspond to conditional
quantiles of order α. The advantage of estimating conditional quantiles may be
found in many applications such as agronomy (estimation of yield thresholds),
medicine or reliability. Besides the robustness of the median, it may also help
to derive some kind of confidence prediction intervals based on quantiles.
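The link between the loss $l_\alpha$ and quantiles can be illustrated numerically in the simplest case, with no covariate. The following is a minimal sketch, not taken from the thesis: the simulated Gaussian sample, the seed and the helper names (`check_loss`, `empirical_risk`, `argmin_over_sample`) are illustrative assumptions. It relies on the fact that the empirical risk is convex and piecewise linear in the constant c, so that a minimizer can be found among the sample points.

```python
import random

def check_loss(u, alpha):
    """Loss l_alpha(u) = |u| + (2*alpha - 1)*u, as defined in the text."""
    return abs(u) + (2.0 * alpha - 1.0) * u

def empirical_risk(c, ys, alpha):
    """Empirical version of E[l_alpha(Y - c)] for a constant predictor c."""
    return sum(check_loss(y - c, alpha) for y in ys) / len(ys)

def argmin_over_sample(ys, alpha):
    """Search a minimizer of the empirical risk among the sample points."""
    return min(sorted(ys), key=lambda c: empirical_risk(c, ys, alpha))

random.seed(0)  # hypothetical simulated data, for illustration only
ys = [random.gauss(0.0, 1.0) for _ in range(501)]
median_hat = argmin_over_sample(ys, 0.5)  # empirical median
q90_hat = argmin_over_sample(ys, 0.9)     # empirical quantile of order 0.9
```

With 501 observations, the minimizer for α = 1/2 is the middle order statistic, and for α = 0.9 it is the order statistic of rank ⌈0.9 × 501⌉ = 451.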
In our work, we assume that the conditional quantile of order α can be
written as
$$g_\alpha(X) = \langle \Psi_\alpha, X \rangle, \tag{I.4}$$
where $\langle \cdot, \cdot \rangle$ is a functional inner product and the
parameter of the model $\Psi_\alpha$ is a function to be estimated. This is the
equivalent of the linear model for regression quantiles studied by Koenker and
Bassett (1978), where the inner product is the Euclidean one and the parameter
is a vector of scalars. We choose to estimate the function $\Psi_\alpha$ by a
“direct” method: writing our estimator as a linear combination of B-splines, it
minimizes the empirical version of expectation (I.3) with the addition of a
penalty term proportional to the squared norm of a derivative of given order of
the spline. The penalization term allows, on the one hand, to control the
regularity of the estimator and, on the other hand, to obtain consistency.
Unlike for the square function, minimization of the function $l_\alpha$ does not
lead to an explicit expression of the estimator. While the estimator can be
computed with traditional algorithms (for instance based on Iteratively
Reweighted Least Squares), the convexity of $l_\alpha$ allows theoretical developments.
In section 2, we define more precisely the framework of our study and the
spline estimator of the functional parameter Ψα . Section 3 is devoted to the
asymptotic behaviour of our estimator : we study L2 convergence and derive
an upper bound for the rate of convergence. Comments on the model and on
the optimality of the rate of convergence are given in section 4. Finally, the
proofs are gathered in section 5.
I.4.2. Construction of the estimator
In this work, the data consist of an i.i.d. sample of pairs (Xi , Yi )i=1,...,n drawn
from a population distribution (X, Y ). We consider explanatory variables Xi
which are square integrable (random) functions defined on [0, 1], i.e. are ele-
ments of the space L2 ([0, 1]) so that Xi = (Xi (t), t ∈ [0, 1]). The response Yi is
a scalar belonging to R. Assume that H, the range of X, is a closed subspace of
L2 ([0, 1]). For Y having a finite expectation, E(|Y |) < +∞, and for α ∈]0, 1[,
the conditional α-quantile functional gα of Y given X is a functional defined
on H minimizing (I.3).
Our aim is to generalize the linear model introduced by Koenker and Bassett
(1978). In our setting, it consists in assuming that $g_\alpha$ is a linear and
continuous functional defined on H, and it then follows that $g_\alpha(X)$ can be
written as in (I.4). Taking the usual inner product in $L^2([0,1])$, we can write
$$g_\alpha(X) = \langle \Psi_\alpha, X \rangle = \int_0^1 \Psi_\alpha(t) X(t) \, dt,$$
where $\Psi_\alpha$ is the functional coefficient in H to be estimated, the order
α being fixed. From now on we consider, for simplicity, that the random
variables $X_i$ are centered, that is to say $E(X_i(t)) = 0$ for almost every t.
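In practice the curves are only known on a grid, so the inner product above has to be approximated. The following is a minimal sketch, assuming an equispaced grid of 201 points and the trapezoidal rule; the function name `trapezoid_inner_product` and the two test curves are hypothetical choices made for illustration, not the quadrature used by the author.

```python
import math

def trapezoid_inner_product(f_vals, g_vals, grid):
    """Approximate <f, g> = integral of f(t) g(t) dt over the grid (trapezoidal rule)."""
    total = 0.0
    for j in range(len(grid) - 1):
        h = grid[j + 1] - grid[j]
        total += 0.5 * h * (f_vals[j] * g_vals[j] + f_vals[j + 1] * g_vals[j + 1])
    return total

# Hypothetical discretized curves on [0, 1]: Psi(t) = X(t) = sin(2*pi*t).
grid = [j / 200 for j in range(201)]
psi_vals = [math.sin(2.0 * math.pi * t) for t in grid]
x_vals = [math.sin(2.0 * math.pi * t) for t in grid]

# Approximates g_alpha(X) = <Psi, X>; the exact integral of sin^2 over a period is 1/2.
g_alpha_x = trapezoid_inner_product(psi_vals, x_vals, grid)
```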
When X is multivariate, Bassett and Koenker (1978) study the least absolute
error (LAE) estimator for the conditional median, which can be extended to
any quantile by replacing the absolute value by the convex function $l_\alpha$
in the criterion to be minimized (see Koenker and Bassett, 1978). In our case,
where we have to estimate a function belonging to an infinite dimensional space,
we look for an estimator in the form of an expansion in some basis of
B-spline functions, and then minimize a similar criterion with, however, the
addition of a penalty term.
Before describing the estimation procedure in detail, let us note that
estimation of conditional quantiles has received special attention in the
multivariate case. As said before, linear modelling has been mainly investigated
by Bassett and Koenker (1978). For nonparametric models, we may distinguish
two different approaches: “indirect” estimators, which are based on a
preliminary estimation of the conditional cumulative distribution function
(cdf), and “direct” estimators, which are based on minimizing the empirical
version of criterion (I.3). In the class of “indirect” estimators,
Bhattacharya and Gangopadhyay (1990) study a kernel estimator of the
conditional cdf, and estimation of the quantile is achieved by inverting this
estimated cdf. In the class of “direct” estimators, kernel estimators based on
local fit have been proposed (see Tsybakov, 1986, Lejeune and Sarda, 1988 or
Fan, Hu and Truong, 1994); in a similar approach, He and Shi (1994) and Koenker,
Ng and Portnoy (1994) propose a spline estimator. Although our setting is quite
different, we adapt in our proofs below some arguments of the work by He and
Shi (1994).
In nonparametric estimation, it is usual to assume that the function to be
estimated is sufficiently smooth so that it can be expanded in some basis: the
degree of smoothness is quantified by the number of derivatives and a Hölder
condition for the derivative of greatest order (see condition (A.2) below). It
is also quite usual to approximate such functions by means of regression
splines (see de Boor, 1978, for a guide to splines). For this, we have to select
a degree q in $\mathbb{N}$ and a subdivision of [0, 1] defining the position of
the knots. Although it is not necessary, we take equispaced knots so that only
the number of knots has to be selected: for k in $\mathbb{N}^*$, we consider
k − 1 knots that define a subdivision of the interval [0, 1] into k
sub-intervals. For asymptotic theory, the degree q is fixed but the number of
sub-intervals k depends on the sample size n, $k = k_n$. It is well known that a
spline function is a piecewise polynomial: we consider here piecewise
polynomials of degree q on each sub-interval, and (q − 1) times differentiable
on [0, 1]. This space of spline functions is a vector space of dimension k + q.
A basis of this vector space is the set of the so-called normalized B-spline
functions, which we denote by $\mathbf{B}_{k,q} = (B_1, \ldots, B_{k+q})^\tau$.
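The construction of this basis can be sketched with the Cox-de Boor recursion. The code below is an illustration under stated assumptions rather than the author's implementation: it assumes a clamped knot vector (each boundary knot repeated q + 1 times), evaluation points t in [0, 1), and hypothetical helper names `clamped_knots` and `bspline_basis`. It returns the k + q basis values at a point, which are non-negative and sum to one (the "normalized" property).

```python
def clamped_knots(k, q):
    """k - 1 equispaced interior knots on [0, 1], boundary knots repeated q + 1 times."""
    interior = [j / k for j in range(1, k)]
    return [0.0] * (q + 1) + interior + [1.0] * (q + 1)

def bspline_basis(t, k, q):
    """Values (B_1(t), ..., B_{k+q}(t)) of the degree-q B-splines, for t in [0, 1)."""
    knots = clamped_knots(k, q)
    # Degree 0: indicator functions of the knot intervals.
    vals = [1.0 if knots[j] <= t < knots[j + 1] else 0.0
            for j in range(len(knots) - 1)]
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, q + 1):
        new_vals = []
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            left = (t - knots[j]) / left_den * vals[j] if left_den > 0 else 0.0
            right = ((knots[j + d + 1] - t) / right_den * vals[j + 1]
                     if right_den > 0 else 0.0)
            new_vals.append(left + right)
        vals = new_vals
    return vals

# Example: k = 5 sub-intervals and cubic splines (q = 3) give k + q = 8 functions.
basis_at_t = bspline_basis(0.37, k=5, q=3)
```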
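Then, we estimate $\Psi_\alpha$ by a linear combination of the functions $B_l$.
This leads us to find a vector
$\widehat{\theta} = (\widehat{\theta}_1, \ldots, \widehat{\theta}_{k+q})^\tau$ in
$\mathbb{R}^{k+q}$ such that
$$\widehat{\Psi}_\alpha = \sum_{l=1}^{k+q} \widehat{\theta}_l B_l = \mathbf{B}_{k,q}^\tau \widehat{\theta}. \tag{I.5}$$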
It is then natural to look for $\widehat{\Psi}_\alpha$ as the minimizer of the
empirical version of (I.3) among functionals $g_\alpha$ of the form (I.4) with
functions $\Psi_\alpha$ belonging to the space of spline functions defined above.
We will however consider a penalized criterion, as we will see now. In our
setting, the pseudo-design matrix A is the matrix of dimension $n \times (k+q)$
with elements $\langle X_i, B_j \rangle$ for $i = 1, \ldots, n$ and
$j = 1, \ldots, k+q$. Even if we do not have an explicit expression for a
solution to the minimization problem, it is known that the solution would depend
on the properties of the inverse of the matrix $\frac{1}{n} A^\tau A$, which is
the $(k+q) \times (k+q)$ matrix with general term
$\langle \Gamma_{X,n}(B_j), B_l \rangle$, where $\Gamma_{X,n}$ is the empirical
version of the covariance operator $\Gamma_X$ of X, defined for all u in
$L^2([0,1])$ by
$$\Gamma_X u = E\left( \langle X, u \rangle X \right). \tag{I.6}$$
We know that $\Gamma_X$ is a nuclear operator (see Dauxois, Pousse and Romain,
1982); consequently, no bounded inverse exists for this operator. Moreover, as a
consequence of the first monotonicity principle (see theorem 7.1 in Weinberger,
1974), the restriction of this operator to the space of spline functions has
smaller eigenvalues than $\Gamma_X$. Finally, it appears to be impossible to
control the speed of convergence to zero of the smallest eigenvalue of
$\frac{1}{n} A^\tau A$ (when n tends to infinity): in that sense, we are faced
with an inversion problem that can be qualified as ill-conditioned. A way to
circumvent this problem is to introduce a penalization term in the minimization
criterion (see Ramsay and Silverman, 1997, or Cardot, Ferraty and Sarda, 2003,
for a similar approach in the functional linear model). Thus, the main role of
the penalization is to control the inversion of the matrix linked to the
solution of the problem, and it consists in restricting the space of solutions.
The penalization introduced below has another effect, since we also want to
control the smoothness of our estimator. For this reason, and following several
authors (see references above), we choose a penalization which allows us to
control the norm of the derivative of order m > 0 of any linear combination of
B-spline functions, so that it can be expressed in matrix form. Denoting by
$(\mathbf{B}_{k,q}^\tau \theta)^{(m)}$ the m-th derivative of the spline function
$\mathbf{B}_{k,q}^\tau \theta$, we have
$$\left\| (\mathbf{B}_{k,q}^\tau \theta)^{(m)} \right\|_{L^2}^2 = \theta^\tau G_k \theta, \quad \forall \theta \in \mathbb{R}^{k+q},$$
where $G_k$ is the $(k+q) \times (k+q)$ matrix with general term
$\langle B_j^{(m)}, B_l^{(m)} \rangle$.
Then, the vector $\widehat{\theta}$ in (I.5) is chosen as the solution of the
following minimization problem
$$\min_{\theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} l_\alpha\left( Y_i - \langle \mathbf{B}_{k,q}^\tau \theta, X_i \rangle \right) + \rho \left\| (\mathbf{B}_{k,q}^\tau \theta)^{(m)} \right\|_{L^2}^2 \right\}, \tag{I.7}$$
where ρ is the penalization parameter. In the next section, we present a
convergence result for the solution of (I.7). Note that the role of the
penalization also clearly appears in this result.
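Once the matrices A and $G_k$ are available, criterion (I.7) is a plain function of θ. The sketch below shows one way to evaluate it, using $\langle \mathbf{B}_{k,q}^\tau \theta, X_i \rangle = (A\theta)_i$ and the matrix form $\theta^\tau G_k \theta$ of the penalty. The function name `penalized_criterion` and the tiny matrices in the example are hypothetical placeholders, not quantities from the thesis.

```python
def check_loss(u, alpha):
    """Loss l_alpha(u) = |u| + (2*alpha - 1)*u."""
    return abs(u) + (2.0 * alpha - 1.0) * u

def penalized_criterion(theta, A, G, y, alpha, rho):
    """Value of (1/n) sum_i l_alpha(y_i - (A theta)_i) + rho * theta' G theta."""
    n = len(y)
    fit = 0.0
    for a_row, y_i in zip(A, y):
        prediction = sum(a * t for a, t in zip(a_row, theta))  # (A theta)_i
        fit += check_loss(y_i - prediction, alpha)
    dim = len(theta)
    penalty = sum(theta[j] * G[j][l] * theta[l]
                  for j in range(dim) for l in range(dim))
    return fit / n + rho * penalty

# Toy inputs (assumed for illustration): n = 3 observations, 2 basis functions.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = [[1.0, -1.0], [-1.0, 1.0]]
y = [1.0, 2.0, 3.0]
value_at_fit = penalized_criterion([1.0, 2.0], A, G, y, alpha=0.5, rho=0.1)
value_at_zero = penalized_criterion([0.0, 0.0], A, G, y, alpha=0.5, rho=0.1)
```

In this toy case θ = (1, 2) fits the data exactly, so only the penalty 0.1 × 1 remains, whereas θ = 0 has no penalty but a fit term equal to the mean absolute response.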
I.4.3. Convergence result
We present in this section the main result on the convergence of our estimator
when n goes to infinity ($k = k_n \to +\infty$, $\rho = \rho_n \to 0$). The
behaviour of our estimator is linked to a penalized version of the matrix
$\widehat{C} = \frac{1}{n} A^\tau A$. More precisely, adopting the same notations
as in Cardot, Ferraty and Sarda (2003), the existence and convergence of our
estimator depend on the inverse of the matrix
$\widehat{C}_{\rho_n} = \widehat{C} + \rho_n G_{k_n}$. Under the hypotheses of
theorem I.1 below, the smallest eigenvalue of $\widehat{C}_{\rho_n}$, denoted
$\lambda_{\min}(\widehat{C}_{\rho_n})$, tends to zero as the sample size n tends
to infinity. As the rate of convergence of $\widehat{\Psi}_\alpha$ depends on the
speed of convergence of $\lambda_{\min}(\widehat{C}_{\rho_n})$ to zero, we
introduce a sequence $(\eta_n)_{n \in \mathbb{N}}$ such that the set $\Omega_n$
defined by
$$\Omega_n = \left\{ \omega \,/\, \lambda_{\min}(\widehat{C}_{\rho_n}) > c\,\eta_n \right\} \tag{I.8}$$
has probability which goes to 1 when n goes to infinity. Cardot, Ferraty and
Sarda (2003) have shown that such a sequence exists, in the sense that, under
the hypotheses of theorem I.1, there exists a strictly positive sequence
$(\eta_n)_{n \in \mathbb{N}}$ tending to zero as n tends to infinity and such that
$$\lambda_{\min}(\widehat{C}_{\rho_n}) \ge c\,\eta_n + o_P\left( (k_n^2 n^{1-\delta})^{-1/2} \right), \tag{I.9}$$
with $\delta \in\, ]0,1[$.
To prove the convergence result of the estimator $\widehat{\Psi}_\alpha$, we
assume that the following hypotheses are satisfied.
(A.1) $\|X\|_{L^2} \le C_0 < +\infty$, a.s.
(A.2) The function $\Psi_\alpha$ is supposed to have a $p_0$-th derivative
$\Psi_\alpha^{(p_0)}$ such that
$$\left| \Psi_\alpha^{(p_0)}(t) - \Psi_\alpha^{(p_0)}(s) \right| \le C_1 |t - s|^\nu, \quad s, t \in [0,1],$$
where $C_1 > 0$ and $\nu \in [0,1]$. In what follows, we set $p = p_0 + \nu$ and
we suppose that $q \ge p \ge m$.
(A.3) The eigenvalues of $\Gamma_X$ (defined in (I.6)) are strictly positive.
(A.4) For $x \in H$, the random variable ε defined by
$\varepsilon = Y - \langle \Psi_\alpha, X \rangle$ has conditional density
function $f_x$ given $X = x$, continuous and bounded below by a strictly
positive constant at 0, uniformly for $x \in H$.
We derive in theorem I.1 below an upper bound for the rate of convergence
with respect to some kind of $L^2$-norm. Indeed, the operator $\Gamma_X$ is
non-negative, so we can associate with it a semi-norm denoted
$\|\cdot\|_{\Gamma_X}$ and defined by
$\|u\|_{\Gamma_X}^2 = \langle \Gamma_X u, u \rangle$. Then, we have the following
result.
Theorem I.1. Under hypotheses (A.1) − (A.4), if we also suppose that there
exist β, γ in ]0, 1[ such that $k_n \sim n^{\beta}$, $\rho_n \sim n^{-\gamma}$ and
$\eta_n \sim n^{-\beta - (1-\delta)/2}$ (where δ is defined in relation (I.9)), then
(i) $\widehat{\Psi}_\alpha$ exists and is unique except on a set whose
probability goes to zero as n goes to infinity,
(ii) $$\left\| \widehat{\Psi}_\alpha - \Psi_\alpha \right\|_{\Gamma_X}^2 = O_P\left( \frac{1}{k_n^{2p}} + \frac{1}{n\eta_n} + \frac{\rho_n^2}{k_n\eta_n} + \rho_n k_n^{2(m-p)} \right).$$
I.4.4. Some comments
(i) Hypotheses (A.1) and (A.3) are quite usual in the functional setting: see for
instance Bosq (2000) or Cardot, Ferraty and Sarda (2003). Hypothesis (A.4)
implies uniqueness of the conditional quantile of order α.
(ii) Some arguments in the proof of theorem I.1 are inspired by the proof of
He and Shi (1994) within the framework of real covariates. Moreover,
some results from Cardot, Ferraty and Sarda (2003) are also useful, mainly to
deal with the penalization term as pointed out above. Note that it is assumed
in the model of He and Shi (1994) that the error term is independent of X:
condition (A.4) allows us to deal with a more general setting, as in Koenker
and Bassett (1978).
(iii) It is possible to choose particular values for β and γ to optimize the
upper bound for the rate of convergence in theorem I.1. In particular, we
note the importance of controlling the speed of convergence to zero of the
smallest eigenvalue of $\widehat{C}_{\rho_n}$ by $\eta_n$. For example, Cardot,
Ferraty and Sarda (2003) have shown that, under the hypotheses of theorem I.1,
relation (I.9) is true with $\eta_n = \rho_n / k_n$. This gives us
$$\left\| \widehat{\Psi}_\alpha - \Psi_\alpha \right\|_{\Gamma_X}^2 = O_P\left( \frac{1}{k_n^{2p}} + \frac{k_n}{n\rho_n} + \rho_n + \rho_n k_n^{2(m-p)} \right).$$
A corollary is obtained if we take $k_n \sim n^{1/(4p+1)}$ and
$\rho_n \sim n^{-2p/(4p+1)}$; then we get
$$\left\| \widehat{\Psi}_\alpha - \Psi_\alpha \right\|_{\Gamma_X}^2 = O_P\left( n^{-2p/(4p+1)} \right).$$
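The exponent arithmetic behind this corollary can be checked with exact fractions. The sketch below is only a verification aid: the function name `rate_exponents` and the numerical values p = 2 and m = 1 are assumptions chosen for the example. It computes, for $k_n \sim n^{\beta}$, $\rho_n \sim n^{-\gamma}$ and $\eta_n = \rho_n / k_n$, the power of n carried by each of the four terms of the bound.

```python
from fractions import Fraction

def rate_exponents(p, m, beta, gamma):
    """Powers of n of the four terms of the bound when eta_n = rho_n / k_n,
    k_n ~ n^beta and rho_n ~ n^(-gamma)."""
    return {
        "1/k^(2p)": -2 * p * beta,
        "k/(n rho)": beta - 1 + gamma,
        "rho": -gamma,
        "rho k^(2(m-p))": -gamma + 2 * (m - p) * beta,
    }

p, m = Fraction(2), Fraction(1)     # hypothetical smoothness and penalty orders
beta = 1 / (4 * p + 1)              # k_n ~ n^(1/(4p+1))
gamma = 2 * p / (4 * p + 1)         # rho_n ~ n^(-2p/(4p+1))
exponents = rate_exponents(p, m, beta, gamma)
overall = max(exponents.values())   # the slowest-decaying term drives the rate
```

With these choices the first three terms share the exponent −2p/(4p+1), and the last term decays at least as fast because m ≤ p.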
We can imagine that, with stronger hypotheses on the random function X, we
can find a sequence $\eta_n$ greater than $\rho_n / k_n$, which would improve the
convergence rate of the estimator. As a matter of fact, the rate derived in
theorem I.1 does not attain the rate obtained by Stone (1982), that is to say a
rate of order $n^{-2p/(2p+1)}$. Indeed, suppose that $1/k_n^{2p}$, $1/(n\eta_n)$
and $\rho_n^2/(k_n\eta_n)$ are all of order $n^{-2p/(2p+1)}$. This would imply
that $k_n \sim n^{1/(2p+1)}$ and $\eta_n \sim n^{-1/(2p+1)}$, which contradicts
the condition $\eta_n \sim n^{-\beta-(1-\delta)/2}$. Nevertheless, it is possible
to obtain a rate of order $n^{-2p/(2p+1)+\kappa}$. This leads to
$k_n \sim n^{1/(2p+1)-\kappa/(2p)}$ and $\eta_n \sim n^{-1/(2p+1)-\kappa}$. Then,
the condition $\eta_n \sim n^{-\beta-(1-\delta)/2}$ implies
$\kappa = p(1-\delta)/(2p+1)$. So finally, we get
$k_n \sim n^{(1+\delta)/(2(2p+1))}$, $\rho_n \sim n^{(-4p-1+\delta)/(4(2p+1))}$
and $\eta_n \sim n^{(-p-1+p\delta)/(2p+1)}$. The convergence result would then be
$$\left\| \widehat{\Psi}_\alpha - \Psi_\alpha \right\|_{\Gamma_X}^2 = O_P\left( n^{-p(1+\delta)/(2p+1)} \right).$$
A final remark is that the last term $\rho_n k_n^{2(m-p)}$ of the rate in
theorem I.1 is not always negligible compared to the other terms. However, it
will be the case if we suppose that $m \le p/(1+\delta) + (1-\delta)/(4(1+\delta))$.
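The chain of exponents in this remark can be verified in the same way with exact arithmetic. The values p = 2 and δ = 1/2 below are hypothetical and serve only as a worked instance; the variable names are illustrative. Each quantity is the power of n of the corresponding sequence.

```python
from fractions import Fraction

p, delta = Fraction(2), Fraction(1, 2)           # assumed values for the example

kappa = p * (1 - delta) / (2 * p + 1)            # loss with respect to Stone's rate
beta = 1 / (2 * p + 1) - kappa / (2 * p)         # k_n ~ n^beta
eta_exp = -1 / (2 * p + 1) - kappa               # eta_n ~ n^eta_exp
rate_exp = -2 * p / (2 * p + 1) + kappa          # rate ~ n^rate_exp
# rho_n^2 / (k_n eta_n) must have the same order as the rate:
rho_exp = (rate_exp + beta + eta_exp) / 2        # rho_n ~ n^rho_exp

# Largest m for which the term rho_n k_n^(2(m-p)) does not dominate.
m_max = p / (1 + delta) + (1 - delta) / (4 * (1 + delta))
last_term_exp = rho_exp + 2 * (m_max - p) * beta
```

For these values, κ = 1/5, $k_n \sim n^{3/20}$, $\eta_n \sim n^{-2/5}$, $\rho_n \sim n^{-17/40}$ and the rate is $n^{-3/5}$; at m = 17/12 the last term has exactly the same order as the rate.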
(iv) This quantile estimator is quite useful in practice, especially for
forecasting purposes (by conditional median or inter-quantile intervals). From
a computational point of view, several algorithms may be used: we have
implemented in the R language an algorithm based on Iteratively Reweighted
Least Squares (IRLS). Note that, even though the curves of real data sets are
only observed at some discretization points, the regression spline estimator is
easy to implement by approximating inner products with quadrature rules. The
IRLS algorithm (see Ruppert and Carroll, 1988, Lejeune and Sarda, 1988) allows
us to build conditional quantile spline estimators and gives satisfactory
forecast results. This algorithm has been used in particular on the “ORAMIP”
(“Observatoire Régional de l’Air en Midi-Pyrénées”) data to forecast pollution
in the city of Toulouse (France): the results of this practical study are
described in Cardot, Crambes and Sarda (2004b). We are interested in predicting
the ozone concentration one day ahead, knowing the ozone curve (concentration
along time) of the day before. In that special case, conditional quantiles were
also useful to predict an ozone threshold such that the probability of exceeding
this threshold is a given risk 1 − α. In other words, this amounts to estimating
the α-quantile of the maximum ozone level knowing the ozone curve of the day
before.
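The idea of iteratively reweighted least squares can be sketched in the simplest setting, the estimation of an unconditional quantile. This is an illustration under stated assumptions, not the R implementation mentioned above: the absolute value in $l_\alpha(u) = |u| + (2\alpha - 1)u$ is replaced by the smooth approximation $\sqrt{u^2 + \epsilon^2}$, and each iteration minimizes a weighted quadratic surrogate of this smoothed criterion. The names `smoothed_risk` and `irls_quantile`, the smoothing constant and the toy data are hypothetical.

```python
import math

def smoothed_risk(c, ys, alpha, eps):
    """Smoothed version of the empirical risk of l_alpha at the constant c."""
    return sum(math.sqrt((y - c) ** 2 + eps ** 2) + (2.0 * alpha - 1.0) * (y - c)
               for y in ys) / len(ys)

def irls_quantile(ys, alpha, eps=1e-3, n_iter=200):
    """Iteratively reweighted least squares for a quantile of order alpha."""
    n = len(ys)
    c = sum(ys) / n  # start from the sample mean
    for _ in range(n_iter):
        # Weights of the quadratic surrogate of sqrt(r^2 + eps^2) at the current residuals.
        w = [1.0 / math.sqrt((y - c) ** 2 + eps ** 2) for y in ys]
        # Closed-form minimizer of sum_i [w_i (y_i - c)^2 / 2 + (2 alpha - 1)(y_i - c)].
        c = (sum(wi * yi for wi, yi in zip(w, ys)) + (2.0 * alpha - 1.0) * n) / sum(w)
    return c

ys = [float(i) for i in range(101)]   # toy data 0, 1, ..., 100
start = sum(ys) / len(ys)
c_hat = irls_quantile(ys, alpha=0.9)  # should settle near the 0.9 sample quantile
```

Because each surrogate lies above the smoothed criterion and touches it at the current iterate, the smoothed risk cannot increase from one iteration to the next. In the functional setting the same reweighting would apply to the vector θ with the penalty added to the quadratic problem, which is left out of this sketch.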
I.4.5. Proof of the convergence result
The proof of the result is based on the same kind of decomposition of
$\widehat{\Psi}_\alpha - \Psi_\alpha$ as the one used by He and Shi (1994). The
main difference comes from the fact that our design matrix is ill-conditioned,
which led us to add the penalization term, treated using some arguments from
Cardot, Ferraty and Sarda (2003).
Hypothesis (A.2) implies (see de Boor, 1978) that there exists a spline function
$\Psi_\alpha^{*} = \mathbf{B}_{k_n,q}^\tau \theta^{*}$, called the spline
approximation of $\Psi_\alpha$, such that
$$\sup_{t \in [0,1]} \left| \Psi_\alpha^{*}(t) - \Psi_\alpha(t) \right| \le \frac{C_2}{k_n^{p}}. \tag{I.10}$$
In what follows, we set $R_i = \langle \Psi_\alpha^{*} - \Psi_\alpha, X_i \rangle$.
We deduce from (I.10) and from hypothesis (A.1) that there exists a positive
constant $C_3$ such that
$$\max_{i=1,\ldots,n} |R_i| \le \frac{C_3}{k_n^{p}}, \quad \text{a.s.} \tag{I.11}$$
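The operator $\Gamma_{X,n}$ allows us to define the empirical version of the
$L^2$ norm by $\|u\|_{\Gamma_{X,n}}^2 = \langle \Gamma_{X,n} u, u \rangle$. First,
we show result (ii) of theorem I.1 for the penalized empirical $L^2$ norm.
Writing $\widehat{\Psi}_\alpha - \Psi_\alpha = (\widehat{\Psi}_\alpha - \Psi_\alpha^{*}) + (\Psi_\alpha^{*} - \Psi_\alpha)$, we get
$$\left\| \widehat{\Psi}_\alpha - \Psi_\alpha \right\|_{\Gamma_{X,n}}^2 + \rho_n \left\| (\widehat{\Psi}_\alpha - \Psi_\alpha)^{(m)} \right\|_{L^2}^2 \le \frac{2}{n} \sum_{i=1}^{n} \langle \widehat{\Psi}_\alpha - \Psi_\alpha^{*}, X_i \rangle^2 + \frac{2}{n} \sum_{i=1}^{n} \langle \Psi_\alpha^{*} - \Psi_\alpha, X_i \rangle^2 + 2\rho_n \left\| (\widehat{\Psi}_\alpha - \Psi_\alpha^{*})^{(m)} \right\|_{L^2}^2 + 2\rho_n \left\| (\Psi_\alpha^{*} - \Psi_\alpha)^{(m)} \right\|_{L^2}^2.$$
Now, using again hypothesis (A.1), we get almost surely and for all
$i = 1, \ldots, n$ the inequality
$\langle \Psi_\alpha^{*} - \Psi_\alpha, X_i \rangle^2 \le C_0^2 C_2^2 / k_n^{2p}$.
Moreover, lemma 8 of Stone (1985) gives us the existence of a positive constant
$C_4$ that satisfies
$\left\| (\Psi_\alpha - \Psi_\alpha^{*})^{(m)} \right\|_{L^2}^2 \le C_4 k_n^{2(m-p)}$.
So we deduce
$$\left\| \widehat{\Psi}_\alpha - \Psi_\alpha \right\|_{\Gamma_{X,n}}^2 + \rho_n \left\| (\widehat{\Psi}_\alpha - \Psi_\alpha)^{(m)} \right\|_{L^2}^2 \le \frac{2}{n} \sum_{i=1}^{n} \langle \widehat{\Psi}_\alpha - \Psi_\alpha^{*}, X_i \rangle^2 + 2\rho_n \left\| (\widehat{\Psi}_\alpha - \Psi_\alpha^{*})^{(m)} \right\|_{L^2}^2 + \frac{2 C_0^2 C_2^2}{k_n^{2p}} + 2 C_4 \rho_n k_n^{2(m-p)}, \quad \text{a.s.} \tag{I.12}$$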
Our goal is now to compare our estimator $\widehat{\Psi}_\alpha$ with the spline
approximation $\Psi_\alpha^{*}$. For that, we adopt the transformation
$\theta = \widehat{C}_{\rho_n}^{-1/2} \beta + \theta^{*}$. Then, we define on the
set $\Omega_n$
$$f_i(\beta) = l_\alpha\left( Y_i - \left\langle \mathbf{B}_{k_n,q}^\tau \left( \widehat{C}_{\rho_n}^{-1/2} \beta + \theta^{*} \right), X_i \right\rangle \right) + \rho_n \left\| \left[ \mathbf{B}_{k_n,q}^\tau \left( \widehat{C}_{\rho_n}^{-1/2} \beta + \theta^{*} \right) \right]^{(m)} \right\|_{L^2}^2.$$
We notice that minimizing $\sum_{i=1}^{n} f_i(\beta)$ amounts to minimizing the
criterion (I.7). We are interested in the behaviour of the function $f_i$ around
zero: $f_i(0)$ is the value of our loss criterion when $\theta = \theta^{*}$. Let
us also notice that the inverse of the matrix $\widehat{C}_{\rho_n}$ appears in
the definition of $f_i$. This inverse exists on the set $\Omega_n$ defined by
(I.8), whose probability goes to 1 as n goes to infinity. Lemma I.1 below allows
us to get the results (i) and (ii) of theorem I.1 for the penalized empirical
$L^2$ norm.
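Lemma I.1. Under the hypotheses of theorem I.1, for all $\varepsilon > 0$, there
exist $L = L_\varepsilon$ (sufficiently large) and $(\delta_n)_{n \in \mathbb{N}}$
with $\delta_n = \sqrt{1/(n\eta_n) + \rho_n^2/(k_n\eta_n)}$ such that, for n large
enough,
$$P\left[ \inf_{\|\beta\| = L\delta_n} \sum_{i=1}^{n} f_i(\beta) > \sum_{i=1}^{n} f_i(0) \right] > 1 - \varepsilon.$$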
We use convexity arguments to prove the result (i). The existence of the
solution of the minimization problem (I.7) is guaranteed since the function to
be minimized is convex, if we keep in mind that
$$\rho\, \theta^\tau G_k \theta = \rho \left\| (\mathbf{B}_{k,q}^\tau \theta)^{(m)} \right\|_{L^2}^2 \ge 0.$$
Using the convexity of $f_i$, the result of lemma I.1 means that, for all
$\varepsilon > 0$, there exists $L_\varepsilon$ such that, for n large enough (as
$L_\varepsilon \delta_n$ goes to zero), we cannot find more than one minimum for
the function $\sum_{i=1}^{n} f_i$ with probability $1 - \varepsilon$.
As we use the one-to-one transformation
$\theta = \widehat{C}_{\rho_n}^{-1/2} \beta + \theta^{*}$ on the set $\Omega_n$, we
deduce the existence and the uniqueness of the solution of (I.7) on a subset of
$\Omega_n$ whose probability goes to one as n goes to infinity, which proves
point (i) of theorem I.1.
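Now, let ε be strictly positive; using lemma I.1 and the convexity of the
function $f_i$, there exists $L = L_\varepsilon$ such that, for n large enough,
$$P\left[ \inf_{\|\beta\| \ge L\delta_n} \sum_{i=1}^{n} f_i(\beta) > \sum_{i=1}^{n} f_i(0) \right] > 1 - \varepsilon. \tag{I.13}$$
On the other hand, using the definition of $f_i$ and the minimization criterion
(I.7), we have
$$\frac{1}{n} \sum_{i=1}^{n} f_i\left( \widehat{C}_{\rho_n}^{1/2} \widehat{\theta} - \widehat{C}_{\rho_n}^{1/2} \theta^{*} \right) = \inf_{\theta \in \mathbb{R}^{k_n+q}} \left[ \frac{1}{n} \sum_{i=1}^{n} l_\alpha\left( Y_i - \langle \mathbf{B}_{k_n,q}^\tau \theta, X_i \rangle \right) + \rho_n \left\| (\mathbf{B}_{k_n,q}^\tau \theta)^{(m)} \right\|_{L^2}^2 \right],$$
so we finally get
$$\frac{1}{n} \sum_{i=1}^{n} f_i\left( \widehat{C}_{\rho_n}^{1/2} \widehat{\theta} - \widehat{C}_{\rho_n}^{1/2} \theta^{*} \right) \le \frac{1}{n} \sum_{i=1}^{n} f_i(0).$$
Then, combining this with equation (I.13), we obtain
$$P\left[ \inf_{\|\beta\| \ge L\delta_n} \sum_{i=1}^{n} f_i(\beta) > \sum_{i=1}^{n} f_i\left( \widehat{C}_{\rho_n}^{1/2} \widehat{\theta} - \widehat{C}_{\rho_n}^{1/2} \theta^{*} \right) \right] > 1 - \varepsilon. \tag{I.14}$$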
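Now, using the definition of $\widehat{C}_{\rho_n}$, we have
$$P\left[ \frac{1}{n} \sum_{i=1}^{n} \langle \widehat{\Psi}_\alpha - \Psi_\alpha^{*}, X_i \rangle^2 + \rho_n \left\| (\widehat{\Psi}_\alpha - \Psi_\alpha^{*})^{(m)} \right\|_{L^2}^2 \le L^2 \delta_n^2 \right] = 1 - P\left[ \left\| \widehat{C}_{\rho_n}^{1/2} (\widehat{\theta} - \theta^{*}) \right\| > L\delta_n \right] \ge P\left[ \inf_{\|\beta\| \ge L\delta_n} \sum_{i=1}^{n} f_i(\beta) > \sum_{i=1}^{n} f_i\left( \widehat{C}_{\rho_n}^{1/2} \widehat{\theta} - \widehat{C}_{\rho_n}^{1/2} \theta^{*} \right) \right].$$
With relation (I.14), this last probability is greater than $1 - \varepsilon$, so
we obtain
$$\frac{1}{n} \sum_{i=1}^{n} \langle \widehat{\Psi}_\alpha - \Psi_\alpha^{*}, X_i \rangle^2 + \rho_n \left\| (\widehat{\Psi}_\alpha - \Psi_\alpha^{*})^{(m)} \right\|_{L^2}^2 = O_P\left( \delta_n^2 \right) = O_P\left( \frac{1}{n\eta_n} + \frac{\rho_n^2}{k_n\eta_n} \right).$$
This last result, combined with inequality (I.12), finally gives us the
equivalent of result (ii) for the penalized empirical $L^2$ norm. Point (ii)
(with the norm $\|\cdot\|_{\Gamma_X}$) then follows from lemma I.2 below, which
completes the proof of theorem I.1 (ii).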
Lemma I.2. Let f and g be two functions supposed to be m times differentiable
and such that
$$\|f - g\|_{\Gamma_{X,n}}^2 + \rho_n \left\| (f - g)^{(m)} \right\|_{L^2}^2 = O_P(u_n),$$
with $u_n$ going to zero when n goes to infinity. Under hypotheses (A.1) and
(A.3), and if moreover $\|g\|_{L^2}$ and $\|g^{(m)}\|_{L^2}$ are supposed to be
bounded, we have
$$\|f - g\|_{\Gamma_X}^2 = O_P(u_n).$$
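Proof of lemma I.1. This proof is based on three preliminary lemmas. We
denote by $T_n$ the set of the random variables $(X_1, \ldots, X_n)$. Under the
hypotheses of theorem I.1, we have the following results.
Lemma I.3. There exists a constant $C_5$ such that, on the set $\Omega_n$
defined by (I.8), we have
$$\max_{i=1,\ldots,n} \left| \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle \right| \le \frac{C_5 \|\beta\|}{\sqrt{k_n\eta_n}}, \quad \text{a.s.}$$
Lemma I.4. For all $\varepsilon > 0$ and for any sequence $(L_n)$ such that
$L_n \le \sqrt{n k_n \eta_n \delta_n^2}$, we have
$$\lim_{n \to +\infty} P\left[ \sup_{\|\beta\| = 1} \left| \sum_{i=1}^{n} \left( f_i(L_n\delta_n\beta) - f_i(0) - E\left[ f_i(L_n\delta_n\beta) - f_i(0) \,|\, T_n \right] \right) \right| > \varepsilon \delta_n^2 n \right] = 0.$$
Lemma I.5. For all $\varepsilon > 0$, there exists $L = L_\varepsilon$
(sufficiently large) such that
$$P\left[ \inf_{\|\beta\| = 1} \sum_{i=1}^{n} E\left[ f_i(L\delta_n\beta) - f_i(0) \,|\, T_n \right] > \delta_n^2 n \right] > 1 - \varepsilon.$$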
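These three lemmas allow us to prove lemma I.1. Indeed, let L be a strictly
positive real number; we denote
$$A_n = \sum_{i=1}^{n} \left( f_i(L\delta_n\beta) - f_i(0) \right),$$
and
$$B_n = \sum_{i=1}^{n} E\left[ f_i(L\delta_n\beta) - f_i(0) \,|\, T_n \right].$$
Using lemmas I.4 and I.5, given $\varepsilon > 0$, we can find
$L = L_\varepsilon$ such that, for n large enough,
$P\left[ \inf_{\|\beta\|=1} B_n > \delta_n^2 n \right] > 1 - \varepsilon$ and
$\sup_{\|\beta\|=1} |A_n - B_n| = o_P(\delta_n^2 n)$. Then, we deduce
$$P\left[ \inf_{\|\beta\| = 1} \left( \sum_{i=1}^{n} f_i(L\delta_n\beta) - \sum_{i=1}^{n} f_i(0) \right) > 0 \right] > 1 - \varepsilon,$$
which completes the proof of lemma I.1.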
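Proof of lemma I.3. Using lemma 6.2 of Cardot, Ferraty and Sarda (2003), we have
$$\lambda_{\min}(\widehat{C}_{\rho_n}) \ge C_5' \eta_n + o_P\left( (k_n^2 n^{1-\delta})^{-1/2} \right).$$
Noticing that
$\left| \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle \right|^2 \le \langle \mathbf{B}_{k_n,q}^\tau, X_i \rangle \left\| \widehat{C}_{\rho_n}^{-1} \right\| \langle \mathbf{B}_{k_n,q}, X_i \rangle \|\beta\|^2$,
we deduce that
$$\left| \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle \right|^2 \le \frac{1}{C_5' \eta_n} \langle \mathbf{B}_{k_n,q}^\tau, X_i \rangle \langle \mathbf{B}_{k_n,q}, X_i \rangle \|\beta\|^2 + o_P\left( (k_n^2 n^{1-\delta})^{-1/2} \right),$$
using the fact that
$\left\| \widehat{C}_{\rho_n}^{-1} \right\| = \lambda_{\max}(\widehat{C}_{\rho_n}^{-1}) = 1/\lambda_{\min}(\widehat{C}_{\rho_n})$.
Then, noticing that
$$\langle \mathbf{B}_{k_n,q}^\tau, X_i \rangle \langle \mathbf{B}_{k_n,q}, X_i \rangle = \sum_{j=1}^{k_n+q} \langle B_j, X_i \rangle^2 = O\left( \frac{1}{k_n} \right),$$
this gives us
$\left| \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle \right|^2 \le C_5'' \|\beta\|^2 / (k_n\eta_n) + o_P\left( n^{(\delta-1)/2} \right)$
almost surely, and completes the proof of lemma I.3.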
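Proof of lemma I.4. Considering the definition of the functions $f_i$ and
$l_\alpha$, we have
$$\sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left( f_i(L\delta_n\beta) - f_i(0) - E\left[ f_i(L\delta_n\beta) - f_i(0) \,|\, T_n \right] \right) \right| = \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left( \left| \varepsilon_i - L\delta_n \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle - R_i \right| - |\varepsilon_i - R_i| - E\left[ \left| \varepsilon_i - L\delta_n \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle - R_i \right| - |\varepsilon_i - R_i| \,\Big|\, T_n \right] \right) \right|,$$
where $\varepsilon_1, \ldots, \varepsilon_n$ are n independent and identically
distributed real random variables defined by
$\varepsilon_i = Y_i - \langle \Psi_\alpha, X_i \rangle$ for all
$i = 1, \ldots, n$. Let us also denote
$\Delta_i(\beta) = \left| \varepsilon_i - L\delta_n \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle - R_i \right| - |\varepsilon_i - R_i|$.
To prove lemma I.4, it suffices to show that, for all $\varepsilon > 0$, there
exists $L = L_n$ such that
$$\lim_{n \to +\infty} P\left( \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] \right| > \varepsilon \delta_n^2 n \right) = 0.$$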
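Let ε be a strictly positive real number and $\mathcal{C}$ the subset of
$\mathbb{R}^{k_n+q}$ defined by
$\mathcal{C} = \left\{ \beta \in \mathbb{R}^{k_n+q} \,/\, \|\beta\| \le 1 \right\}$.
As $\mathcal{C}$ is a compact set, we can cover it with open balls, that is to
say $\mathcal{C} = \bigcup_{j=1}^{K_n} \mathcal{C}_j$, with $K_n$ chosen such
that, for all j from 1 to $K_n$,
$$\mathrm{diam}(\mathcal{C}_j) \le \frac{\varepsilon \delta_n \sqrt{k_n\eta_n}}{8 C_5 L}. \tag{I.15}$$
Hence
$$K_n \le \left( \frac{8 C_5 L}{\varepsilon \delta_n \sqrt{k_n\eta_n}} \right)^{k_n+q}. \tag{I.16}$$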
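Now, for $1 \le j \le K_n$, let $\beta_j$ be in $\mathcal{C}_j$; using the
definition of $\Delta_i(\beta)$ and the triangle inequality, we have
$$\min_{j=1,\ldots,K_n} \left| \sum_{i=1}^{n} \left( \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] - \left[ \Delta_i(\beta_j) - E(\Delta_i(\beta_j) \,|\, T_n) \right] \right) \right| \le 2L\delta_n \min_{j=1,\ldots,K_n} \sum_{i=1}^{n} \left| \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} (\beta - \beta_j), X_i \rangle \right|.$$
Then, using lemma I.3, we get
$$\min_{j=1,\ldots,K_n} \left| \sum_{i=1}^{n} \left( \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] - \left[ \Delta_i(\beta_j) - E(\Delta_i(\beta_j) \,|\, T_n) \right] \right) \right| \le 2L\delta_n \frac{C_5 n}{\sqrt{k_n\eta_n}} \min_{j=1,\ldots,K_n} \|\beta - \beta_j\|,$$
this last inequality being true only on the set $\Omega_n$ defined by (I.8).
Moreover, there exists a unique $j_0 \in \{1, \ldots, K_n\}$ such that
$\beta \in \mathcal{C}_{j_0}$, which gives us with relation (I.15)
$$\min_{j=1,\ldots,K_n} \left| \sum_{i=1}^{n} \left( \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] - \left[ \Delta_i(\beta_j) - E(\Delta_i(\beta_j) \,|\, T_n) \right] \right) \right| \le \frac{\varepsilon}{4} \delta_n^2 n. \tag{I.17}$$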
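On the other hand, we have
$$\sup_{\beta \in \mathcal{C}} |\Delta_i(\beta)| \le L\delta_n \sup_{\beta \in \mathcal{C}} \left| \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle \right|,$$
and using lemma I.3 again, we get, on $\Omega_n$,
$$\sup_{\beta \in \mathcal{C}} |\Delta_i(\beta)| \le \frac{C_5 L \delta_n}{\sqrt{k_n\eta_n}}. \tag{I.18}$$
Besides, for β fixed in $\mathcal{C}$, with the same arguments as before, if we
denote by $T^{*}$ the set of the random variables $(X_1, \ldots, X_n, \ldots)$,
we have
$$\sum_{i=1}^{n} V\left( \Delta_i(\beta) \,|\, T^{*} \right) \le \sum_{i=1}^{n} L^2 \delta_n^2\, V\left( \left| \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle \right|^2 \,\Big|\, T^{*} \right).$$
Then, using the definition of $\widehat{C}_{\rho_n}$, we remark that
$$\sum_{i=1}^{n} \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle^2 = n \|\beta\|^2 - n\rho_n\, \beta^\tau \widehat{C}_{\rho_n}^{-1/2} G_{k_n} \widehat{C}_{\rho_n}^{-1/2} \beta, \tag{I.19}$$
which gives us
$$\sum_{i=1}^{n} V\left( \Delta_i(\beta) \,|\, T^{*} \right) \le n L^2 \delta_n^2. \tag{I.20}$$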
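We are now able to prove lemma I.4. Using first relation (I.17), we have
$$P\left[ \left( \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] \right| > \varepsilon \delta_n^2 n \right) \cap \Omega_n \,\Big|\, T^{*} \right] \le P\left[ \left( \max_{j=1,\ldots,K_n} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta_j) - E(\Delta_i(\beta_j) \,|\, T_n) \right] \right| > \frac{\varepsilon}{2} \delta_n^2 n \right) \cap \Omega_n \,\Big|\, T^{*} \right],$$
and then
$$P\left[ \left( \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] \right| > \varepsilon \delta_n^2 n \right) \cap \Omega_n \,\Big|\, T^{*} \right] \le K_n \max_{j=1,\ldots,K_n} P\left[ \left( \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta_j) - E(\Delta_i(\beta_j) \,|\, T_n) \right] \right| > \frac{\varepsilon}{2} \delta_n^2 n \right) \cap \Omega_n \,\Big|\, T^{*} \right].$$
By inequalities (I.18) and (I.20), we apply the Bernstein inequality (see
Uspensky, 1937) and inequality (I.16) to obtain
$$P\left[ \left( \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] \right| > \varepsilon \delta_n^2 n \right) \cap \Omega_n \,\Big|\, T^{*} \right] \le 2 \exp\left\{ \ln\left( \frac{8 C_5 L_n}{\varepsilon \delta_n \sqrt{k_n\eta_n}} \right)^{k_n+q} - \frac{\varepsilon^2 \delta_n^4 n^2 / 4}{2 n L^2 \delta_n^2 + 2 C_5 L \delta_n \times \varepsilon \delta_n^2 n / (2\sqrt{k_n\eta_n})} \right\}.$$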
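This bound does not depend on the sample $T^{*} = (X_1, \ldots, X_n, \ldots)$;
hence, if we take the expectation on both sides of the inequality above, we
deduce
$$P\left[ \left( \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] \right| > \varepsilon \delta_n^2 n \right) \cap \Omega_n \right] \le 2 \exp\left\{ - \frac{\varepsilon^2 \delta_n^2 \sqrt{k_n\eta_n}\, n}{8 L^2 \sqrt{k_n\eta_n} + 4 C_5 L \varepsilon \delta_n} \times \left[ 1 - \frac{(k_n + q)\left( 8 L^2 \sqrt{k_n\eta_n} + 4 C_5 L \varepsilon \delta_n \right)}{\varepsilon^2 \delta_n^2 \sqrt{k_n\eta_n}\, n} \ln\left( \frac{8 C_5 L_n}{\varepsilon \delta_n \sqrt{k_n\eta_n}} \right) \right] \right\}.$$
If $L = L_n \le \sqrt{n k_n \eta_n \delta_n^2}$, we have
$$\frac{\delta_n^2 \sqrt{k_n\eta_n}\, n}{L^2 \sqrt{k_n\eta_n}} \ge \frac{1}{k_n\eta_n} \xrightarrow[n \to +\infty]{} +\infty,$$
$$\frac{\delta_n^2 \sqrt{k_n\eta_n}\, n}{L \delta_n} \ge \sqrt{n} \xrightarrow[n \to +\infty]{} +\infty,$$
$$\frac{k_n L^2 \sqrt{k_n\eta_n}}{\delta_n^2 \sqrt{k_n\eta_n}\, n} \le k_n^2 \eta_n \xrightarrow[n \to +\infty]{} 0,$$
$$\frac{k_n L \delta_n}{\delta_n^2 \sqrt{k_n\eta_n}\, n} \le \frac{k_n}{\sqrt{n}} \xrightarrow[n \to +\infty]{} 0.$$
This leads to
$$\lim_{n \to +\infty} P\left[ \left( \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] \right| > \varepsilon \delta_n^2 n \right) \cap \Omega_n \right] = 0,$$
and with the fact that $\Omega_n$ has probability tending to 1 when n goes to
infinity, we finally obtain
$$\lim_{n \to +\infty} P\left[ \sup_{\|\beta\| \le 1} \left| \sum_{i=1}^{n} \left[ \Delta_i(\beta) - E(\Delta_i(\beta) \,|\, T_n) \right] \right| > \varepsilon \delta_n^2 n \right] = 0.$$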
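Proof of lemma I.5. Let a and b be two real numbers. We denote by
$F_{\varepsilon_i}$ the random distribution function of $\varepsilon_i$ given
$T_n$ and by $f_{\varepsilon_i}$ the random density function of $\varepsilon_i$
given $T_n$. As
$E\left( l_\alpha(\varepsilon_i + b) \,|\, T_n \right) = \int_{\mathbb{R}} l_\alpha(s + b) \, dF_{\varepsilon_i}(s)$,
we obtain, using the definition of $l_\alpha$,
$$\begin{aligned} E\left( l_\alpha(\varepsilon_i + a + b) - l_\alpha(\varepsilon_i + b) \,|\, T_n \right) ={}& 2\alpha \int_{-a-b}^{+\infty} s \, dF_{\varepsilon_i}(s) - 2(1-\alpha) \int_{-\infty}^{-a-b} s \, dF_{\varepsilon_i}(s) \\ & - 2\alpha \int_{-b}^{+\infty} s \, dF_{\varepsilon_i}(s) + 2(1-\alpha) \int_{-\infty}^{-b} s \, dF_{\varepsilon_i}(s) \\ & + 2\alpha(a+b) \int_{-a-b}^{+\infty} dF_{\varepsilon_i}(s) - 2(1-\alpha)(a+b) \int_{-\infty}^{-a-b} dF_{\varepsilon_i}(s) \\ & - 2\alpha b \int_{-b}^{+\infty} dF_{\varepsilon_i}(s) + 2(1-\alpha) b \int_{-\infty}^{-b} dF_{\varepsilon_i}(s), \end{aligned}$$
which gives us
$$E\left( l_\alpha(\varepsilon_i + a + b) - l_\alpha(\varepsilon_i + b) \,|\, T_n \right) = 2 \int_{-a-b}^{-b} s \, dF_{\varepsilon_i}(s) + 2\alpha a + 2b \int_{-a-b}^{-b} dF_{\varepsilon_i}(s) - 2a \int_{-\infty}^{-a-b} dF_{\varepsilon_i}(s).$$
Then, noticing that $dF_{\varepsilon_i}(s) = f_{\varepsilon_i}(s)\,ds$ and using a
first-order Taylor expansion around 0 (we write
$f_{\varepsilon_i}(s) = f_{\varepsilon_i}(0) + o(1)$ and
$F_{\varepsilon_i}(-a-b) = F_{\varepsilon_i}(0) - (a+b) f_{\varepsilon_i}(0) + o(a+b)$),
we finally obtain (with $F_{\varepsilon_i}(0) = \alpha$)
$$E\left( l_\alpha(\varepsilon_i + a + b) - l_\alpha(\varepsilon_i + b) \,|\, T_n \right) = f_{\varepsilon_i}(0)\, a^2 + 2 f_{\varepsilon_i}(0)\, ab + \left( \frac{a^2}{2} + ab \right) r_{iab},$$
with $r_{iab} \to 0$ when $a, b \to 0$. If we set $L' = \sqrt{2} L$ and
$R_i' = \sqrt{2} R_i$, this relation gives us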
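$$\begin{aligned} \sum_{i=1}^{n} E\left[ l_\alpha\left( \varepsilon_i - L\delta_n \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle - R_i \right) - l_\alpha(\varepsilon_i - R_i) \,\Big|\, T_n \right] ={}& 2 \sum_{i=1}^{n} f_{\varepsilon_i}(0) \left[ L'^2 \delta_n^2 \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle^2 + L' \delta_n \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle R_i' \right] \\ & + \sum_{i=1}^{n} \left[ L'^2 \delta_n^2 \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle^2 + L' \delta_n \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle R_i' \right] r_{i\beta}, \end{aligned} \tag{I.21}$$
with $r_{i\beta} \to 0$. Considering β such that $\|\beta\| = 1$, we have, using
relation (I.11),
$$L'^2 \delta_n^2 \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle^2 + L' \delta_n \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle R_i' \ge \frac{1}{2} L'^2 \delta_n^2 \langle \mathbf{B}_{k_n,q}^\tau \widehat{C}_{\rho_n}^{-1/2} \beta, X_i \rangle^2 - \frac{C_3^2}{k_n^{2p}}, \quad \text{a.s.} \tag{I.22}$$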
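Moreover, we set $V_n = \sup_{\|\boldsymbol{\beta}\|=1}\max_{i=1,\ldots,n}|r_{i\beta}|$. Using lemma I.3 and relation (I.11), we have
\[
\left|L\delta_n\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle\right| + |R_i| \le C_5\frac{L\delta_n\|\boldsymbol{\beta}\|}{\sqrt{k_n\eta_n}} + \frac{C_3}{k_n^{p}}.
\]
We deduce from this that
\[
\sup_{\|\boldsymbol{\beta}\|=1}\max_{i=1,\ldots,n}\left(\left|L\delta_n\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle\right| + |R_i|\right) \longrightarrow 0,
\]
and $r_{i\beta} \to 0$ when $\left|L\delta_n\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle\right| + |R_i| \to 0$; hence we can conclude that $\sup_{\|\boldsymbol{\beta}\|=1}\max_{i=1,\ldots,n}|r_{i\beta}| \to 0$. Then with condition (A.4), we have $\mathbf{1}_{\{V_n < \min_i f_i(0)/4\}} = \mathbf{1}_{\mathbb{R}}$ for $n$ large enough, and
\[
\begin{aligned}
&\left|\left[L'^2\delta_n^2\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle^2 + L'\delta_n\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle R_i'\right] r_{i\beta}\right|\\
&\quad \le \frac{1}{4}\min_{i=1,\ldots,n} f_i(0)\left|L'^2\delta_n^2\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle^2 + L'\delta_n\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle R_i'\right|\\
&\quad \le 2\min_{i=1,\ldots,n} f_i(0)\left[\frac{3}{16}L'^2\delta_n^2\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle^2 + \frac{C_3^2}{8k_n^{2p}}\right].
\end{aligned}
\tag{I.23}
\]
Using inequalities (I.22) and (I.23), relation (I.21) then becomes
\[
\begin{aligned}
&\sum_{i=1}^{n} E\left[l_\alpha\left(\epsilon_i - L\delta_n\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle - R_i\right) - l_\alpha(\epsilon_i - R_i)\,\Big|\,T_n\right]\\
&\quad \ge 2\min_{i=1,\ldots,n} f_i(0)\left[\frac{5}{16}L'^2\delta_n^2\sum_{i=1}^{n}\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle^2 - \frac{9}{8}\frac{C_3^2 n}{k_n^{2p}}\right].
\end{aligned}
\]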
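Now, we come back to the definition of the function $f_i$ to obtain
\[
\begin{aligned}
\frac{1}{\delta_n^2 n}\inf_{\|\boldsymbol{\beta}\|=1}\sum_{i=1}^{n} E\left[f_i(L\delta_n\boldsymbol{\beta}) - f_i(0)\,|\,T_n\right]
&\ge 2\min_{i=1,\ldots,n} f_i(0)\left[\frac{5L'^2}{16n}\sum_{i=1}^{n}\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle^2 - \frac{9C_3^2}{8k_n^{2p}\delta_n^2}\right]\\
&\quad + \rho_n L^2\left\|\left(\mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}\right)^{(m)}\right\|_{L^2}^2
+ 2\frac{L\rho_n}{\delta_n}\left\langle\left(\mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}\right)^{(m)}, \left(\mathbf{B}_{k_n,q}^{\tau}\boldsymbol{\theta}^{\star}\right)^{(m)}\right\rangle.
\end{aligned}
\]
Recalling that $L'^2 = 2L^2$ and taking $\xi = \min\left(\frac{5}{4}\min_{i=1,\ldots,n} f_i(0), 1\right)$, we have $\xi > 0$ by hypothesis (A.4) and then
\[
\begin{aligned}
\frac{1}{\delta_n^2 n}\inf_{\|\boldsymbol{\beta}\|=1}\sum_{i=1}^{n} E\left[f_i(L\delta_n\boldsymbol{\beta}) - f_i(0)\,|\,T_n\right]
&\ge \xi L^2\inf_{\|\boldsymbol{\beta}\|=1}\left[\frac{1}{n}\sum_{i=1}^{n}\langle \mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}, X_i\rangle^2 + \rho_n\left\|\left(\mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}\right)^{(m)}\right\|_{L^2}^2\right]\\
&\quad - \frac{9}{4}\min_{i=1,\ldots,n} f_i(0)\frac{C_3^2}{k_n^{2p}\delta_n^2}
+ \frac{2L\rho_n}{\delta_n}\left\langle\left(\mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}\right)^{(m)}, \left(\mathbf{B}_{k_n,q}^{\tau}\boldsymbol{\theta}^{\star}\right)^{(m)}\right\rangle.
\end{aligned}
\]
Using relation (I.19), we get
\[
\frac{1}{\delta_n^2 n}\inf_{\|\boldsymbol{\beta}\|=1}\sum_{i=1}^{n} E\left[f_i(L\delta_n\boldsymbol{\beta}) - f_i(0)\,|\,T_n\right]
\ge \xi L^2 - \frac{9}{4}\min_{i=1,\ldots,n} f_i(0)\frac{C_3^2}{k_n^{2p}\delta_n^2}
+ \frac{2L\rho_n}{\delta_n}\left\langle\left(\mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}\right)^{(m)}, \left(\mathbf{B}_{k_n,q}^{\tau}\boldsymbol{\theta}^{\star}\right)^{(m)}\right\rangle.
\]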

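Moreover, for $\|\boldsymbol{\beta}\| = 1$, the infimum of $\left\langle\left(\mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}\right)^{(m)}, \left(\mathbf{B}_{k_n,q}^{\tau}\boldsymbol{\theta}^{\star}\right)^{(m)}\right\rangle$ is obtained for $\boldsymbol{\beta} = -\widehat{\mathbf{C}}_{\rho_n}^{1/2}\boldsymbol{\theta}^{\star}/\|\widehat{\mathbf{C}}_{\rho_n}^{1/2}\boldsymbol{\theta}^{\star}\|$. Using the fact that the spline approximation has a bounded $m$-th derivative, we deduce the existence of a constant $C_9 > 0$ such that
\[
\inf_{\|\boldsymbol{\beta}\|=1}\left\langle\left(\mathbf{B}_{k_n,q}^{\tau}\widehat{\mathbf{C}}_{\rho_n}^{-1/2}\boldsymbol{\beta}\right)^{(m)}, \left(\mathbf{B}_{k_n,q}^{\tau}\boldsymbol{\theta}^{\star}\right)^{(m)}\right\rangle \ge -\frac{C_9}{\sqrt{\eta_n}},
\]
hence we obtain
\[
\frac{1}{\delta_n^2 n}\inf_{\|\boldsymbol{\beta}\|=1}\sum_{i=1}^{n} E\left[f_i(L\delta_n\boldsymbol{\beta}) - f_i(0)\,|\,T_n\right]
\ge \xi L^2 - \frac{9}{4}\min_{i=1,\ldots,n} f_i(0)\frac{C_3^2}{k_n^{2p}\delta_n^2} - 2C_9\frac{L\rho_n}{\delta_n\sqrt{\eta_n}},
\]
that is to say
\[
\frac{1}{\delta_n^2 n}\inf_{\|\boldsymbol{\beta}\|=1}\sum_{i=1}^{n} E\left[f_i(L\delta_n\boldsymbol{\beta}) - f_i(0)\,|\,T_n\right]
\ge \xi L^2\left[1 - \frac{9\min_{i=1,\ldots,n} f_i(0)\,C_3^2}{4\xi L^2 k_n^{2p}\delta_n^2} - \frac{2C_9\rho_n}{\xi L\delta_n\sqrt{\eta_n}}\right].
\]
Noticing that
\[
\text{for } \delta_n^2 \sim \frac{1}{n\eta_n}, \text{ we have } \frac{1}{k_n^{2p}\delta_n^2} \sim \frac{n\eta_n}{k_n^{2p}} \xrightarrow[n\to+\infty]{} 0,
\]
\[
\text{for } \delta_n^2 \sim \frac{\rho_n}{k_n\eta_n}, \text{ we have } \frac{\rho_n}{\delta_n\sqrt{\eta_n}} \sim \sqrt{\rho_n k_n} \xrightarrow[n\to+\infty]{} 0,
\]
the last quantity in the inequality above can be made arbitrarily large as $n$ goes to infinity by choosing $L = L_n$ sufficiently large. This leads to
\[
\lim_{n\to+\infty} P\left(\frac{1}{\delta_n^2 n}\inf_{\|\boldsymbol{\beta}\|=1}\sum_{i=1}^{n} E\left[f_i(L\delta_n\boldsymbol{\beta}) - f_i(0)\,|\,T_n\right] > 1\right) = 1.
\]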

Proof of lemma I.2. Writing $\Gamma_X = (\Gamma_X - \Gamma_{X,n}) + \Gamma_{X,n}$, we make the following decomposition
\[
\|f-g\|_{\Gamma_X}^2 \le 2\|\Gamma_X - \Gamma_{X,n}\|\left(\|f\|_{L^2}^2 + \|g\|_{L^2}^2\right) + \|f-g\|_{\Gamma_{X,n}}^2.
\tag{I.24}
\]
Now, let us decompose $f$ as follows: $f = P + R$ with $P(t) = \sum_{l=0}^{m-1}\frac{t^l}{l!}f^{(l)}(0)$ and $R(t) = \int_0^t \frac{(t-u)^{m-1}}{(m-1)!}f^{(m)}(u)\,du$. $P$ belongs to the space $\mathcal{P}_{m-1}$ of polynomials of degree at most $m-1$, whose dimension is finite and equal to $m$. Using hypothesis (A.3), there exists a constant $C_6 > 0$ such that $\|P\|_{L^2}^2 \le C_6\|P\|_{\Gamma_{X,n}}^2$. Then, we can deduce
\[
\begin{aligned}
\|f\|_{L^2}^2 &\le 2\|P\|_{L^2}^2 + 2\|R\|_{L^2}^2\\
&\le 2C_6\|P\|_{\Gamma_{X,n}}^2 + 2\|R\|_{L^2}^2\\
&\le 4C_6\|f\|_{\Gamma_{X,n}}^2 + 4C_6\|\Gamma_{X,n}\|\,\|R\|_{L^2}^2 + 2\|R\|_{L^2}^2.
\end{aligned}
\tag{I.25}
\]
As $\Gamma_{X,n}$ is a bounded operator (by hypothesis (A.1)), there exists a constant $C_7 > 0$ such that $\|\Gamma_{X,n}\| \le C_7$. Moreover, by the Cauchy-Schwarz inequality, there exists a constant $C_8 > 0$ such that $\|R\|_{L^2}^2 \le C_8\|f^{(m)}\|_{L^2}^2$. Relation (I.25) gives $\|f\|_{L^2}^2 \le 4C_6\|f\|_{\Gamma_{X,n}}^2 + (4C_6C_7 + 2)C_8\|f^{(m)}\|_{L^2}^2$. Then, if we write $f = (f-g) + g$, we finally deduce
\[
\begin{aligned}
\|f\|_{L^2}^2 &\le 8C_6\|f-g\|_{\Gamma_{X,n}}^2 + (8C_6C_7 + 4)C_8\|(f-g)^{(m)}\|_{L^2}^2\\
&\quad + 8C_6\|\Gamma_{X,n}\|\,\|g\|_{L^2}^2 + (8C_6C_7 + 4)C_8\|g^{(m)}\|_{L^2}^2.
\end{aligned}
\tag{I.26}
\]
We have supposed that $\|g\|_{L^2}$ and $\|g^{(m)}\|_{L^2}$ are bounded, so
\[
8C_6\|\Gamma_{X,n}\|\,\|g\|_{L^2}^2 + (8C_6C_7 + 4)C_8\|g^{(m)}\|_{L^2}^2 = O(1),
\]
and the hypothesis $\|f-g\|_{\Gamma_{X,n}}^2 + \rho_n\|(f-g)^{(m)}\|_{L^2}^2 = O_P(u_n)$ gives us the bounds $\|f-g\|_{\Gamma_{X,n}}^2 = O_P(u_n)$ and $\|(f-g)^{(m)}\|_{L^2}^2 = O_P(u_n/\rho_n)$. Then, relation (I.26) becomes
\[
\|f\|_{L^2}^2 = O_P\left(1 + \frac{u_n}{\rho_n}\right).
\tag{I.27}
\]
Finally, we have $\|\Gamma_X - \Gamma_{X,n}\| = o_P(n^{(\delta-1)/2}) = o_P(\rho_n)$ from lemma 5.3 of Cardot, Ferraty and Sarda (1999). This equality, combined with equations (I.24) and (I.27), gives us $\|f-g\|_{\Gamma_X}^2 = O_P(u_n)$, which is the announced result.
I.3. COMMENTS AND PERSPECTIVES
The rate obtained for the estimation of conditional quantiles in this functional framework is worse than the usual one-dimensional nonparametric rate of Stone (1982), which is $n^{-2p/(2p+1)}$. Note however that this convergence result is obtained under classical and relatively weak hypotheses. One can imagine that this rate could be reached by putting stronger hypotheses on $X$, with the aim for instance of finding an $\eta_n$ larger than $\rho_n/k_n$. Work on improving these rates is currently in progress in collaboration with Hervé Cardot, Alois Kneip and Pascal Sarda. This work first concerns the estimation of the conditional mean (see part II of this thesis), but one can reasonably hope to obtain results for the estimation of conditional quantiles as well.

Beyond this improvement of the convergence rates, this work on the estimation of conditional quantiles for functional covariates opens several perspectives. For instance, one can imagine other estimation methods within the category of direct quantile estimation methods (that is, by minimization of a criterion of type (5)). One can thus consider an estimator using a basis other than splines (Fourier, wavelets, ...). One can also consider a kernel estimation method, extending the work of Fan, Hu and Truong (1994) to the case of a functional covariate, thereby adopting a nonparametric point of view.

On another level, concerning the independence hypothesis on the $X_i$, it seems feasible to relax it and thus obtain convergence rates for dependent data, for instance under mixing hypotheses (see for example Rio, 2000). Ferraty, Rabhi and Vieu (2005) address this problem of estimating conditional quantiles for dependent functional variables, but they take a nonparametric point of view and invert the conditional distribution function to estimate the conditional quantiles.

Finally, another perspective that seems interesting is the estimation of conditional quantiles when not only the covariate but also the response variable is functional. As a first step, one could even consider a response that is only multivariate. The works of Averous and Meste (1997) and Cadre (2001), already cited in the introduction, seem very useful for such a study.
PART II
SMOOTHING SPLINES ESTIMATOR
IN THE FUNCTIONAL LINEAR MODEL
II.1. CONSTRUCTION OF THE ESTIMATOR
The purpose of this part is to build an estimator of the conditional mean in the framework of model (2), and to study its asymptotic behavior. This study is the subject of an article to appear in Computational Statistics and Data Analysis (see Cardot, Crambes, Kneip and Sarda, 2006). For practical reasons, since that article also treats the case where the covariate is noisy, it is given in part III of the thesis (chapter III.3), which deals with the functional linear model when the covariate is noisy. In this first chapter, we give the principle of construction of the estimator. Keeping the notations of the introduction, we consider a fixed or random design model. The first case corresponds to situations where $X_1, \ldots, X_n$ are fixed, nonrandom functions. We assume that these curves belong to $L^2([0,1])$. Examples of such situations are found in chemometrics, where $X_1, \ldots, X_n$ are functional responses obtained under predetermined experimental conditions (see for instance Cuevas, Febrero and Fraiman, 2002). Otherwise, in the random design model, $X_1, \ldots, X_n$ are random variables taking values in $L^2([0,1])$, with the same distribution as a random variable $X$. Without loss of generality, we assume that $X_1, \ldots, X_n$ are centered (that is, $E(X) = 0$ in the random design model and $\frac{1}{n}\sum_{i=1}^{n} X_i = 0$ in the fixed design model). The random variables $Y_1, \ldots, Y_n$ take values in $\mathbb{R}$, with the same distribution as a random variable $Y$. We also assume that the pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ are independent. Thus, the random variables $\epsilon_1, \ldots, \epsilon_n$ defined in (2) are independent and identically distributed, with the same distribution as $\epsilon$, such that $E(\epsilon) = 0$. In the random design model, we further assume that $E(\|X\|^2) < +\infty$, which ensures the existence of the covariance operator $\Gamma_X$, and that $E(\epsilon X) = 0$. To lighten the proofs, this last hypothesis will be replaced by the stronger hypothesis $E(\epsilon\,|\,X) = 0$.

Working in the space $L^2([0,1])$ equipped with its usual inner product, model (2) thus reads, for all $i = 1, \ldots, n$,
\[
Y_i = \int_0^1 \alpha(t)X_i(t)\,dt + \epsilon_i,
\tag{II.1}
\]
the aim being to estimate $\alpha$ on the basis of the observations $(X_1, Y_1), \ldots, (X_n, Y_n)$.
In practice, since the curves are not entirely available, we denote by $t_1 < \ldots < t_p$ the measurement points of these curves, which we assume to be the same for all curves and equispaced between 0 and 1. In this discretized situation, we denote by $\langle\cdot,\cdot\rangle_p$ the discretized version of the inner product of $L^2([0,1])$, defined by
\[
\langle f, g\rangle_p = \frac{1}{p}\sum_{j=1}^{p} f(t_j)g(t_j),
\]
for all functions $f$ and $g$ of $L^2([0,1])$. This discretized version of the usual inner product of $L^2([0,1])$ gives an approximation whose quality depends both on the size of $p$ and on the regularity of the functions $f$ and $g$. Taking into account the error due to this approximation is the subject of ongoing work, and will not be presented here. In this part of the thesis, we therefore consider model (II.1) in the form
\[
Y_i = \frac{1}{p}\sum_{j=1}^{p}\alpha(t_j)X_i(t_j) + \epsilon_i.
\tag{II.2}
\]
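As an illustration of the discretized inner product $\langle f, g\rangle_p$, here is a minimal NumPy sketch. The grid $t_j = j/p$ and the two test functions are assumptions made for the example only; the text only requires equispaced points between 0 and 1.

```python
import numpy as np


def inner_product_p(f_vals, g_vals):
    """Discretized inner product <f, g>_p = (1/p) * sum_j f(t_j) * g(t_j)."""
    f_vals = np.asarray(f_vals, dtype=float)
    g_vals = np.asarray(g_vals, dtype=float)
    return float(np.mean(f_vals * g_vals))


p = 200
t = np.arange(1, p + 1) / p  # assumed grid t_j = j/p
# Compare with the exact L2 inner product: the integral of t*sin(pi*t) over [0, 1] equals 1/pi.
approx = inner_product_p(np.sin(np.pi * t), t)
print(approx, 1.0 / np.pi)
```

The gap between the two printed values shrinks as $p$ grows, and for a fixed $p$ it is larger for rougher functions, which is the dependence on $p$ and on regularity mentioned above.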
In the sequel, we adopt the following matrix notations. We write $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\tau}$, $\boldsymbol{\alpha} = (\alpha(t_1), \ldots, \alpha(t_p))^{\tau}$, $\mathbf{X}_i = (X_i(t_1), \ldots, X_i(t_p))^{\tau}$ for all $i = 1, \ldots, n$ and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^{\tau}$. We also denote by $\mathbf{X}$ the $n \times p$ matrix with general term $X_i(t_j)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, p$. With these matrix notations, model (II.2) reads
\[
\mathbf{Y} = \frac{1}{p}\mathbf{X}\boldsymbol{\alpha} + \boldsymbol{\epsilon}.
\tag{II.3}
\]
We now wish to estimate $\alpha$ using smoothing splines. More precisely, we estimate the vector $\boldsymbol{\alpha} \in \mathbb{R}^p$ as the values at the measurement points $t_1, \ldots, t_p$ of a smooth function. We therefore assume that $\alpha$ is $m$ times differentiable, with $m \in \mathbb{N}$. With the notations on smoothing splines given in the introduction of the thesis, the idea of estimating $\alpha$ with smoothing splines leads us to look for $\widehat{\boldsymbol{\alpha}}^{*}$ solution of the minimization problem
\[
\min_{\mathbf{a}\in\mathbb{R}^p}\left\{\frac{1}{n}\left\|\mathbf{Y} - \frac{1}{p}\mathbf{X}\mathbf{a}\right\|^2 + \rho\int_I s_{\mathbf{a}}^{(m)}(t)^2\,dt\right\},
\tag{II.4}
\]
where $\|\cdot\|$ denotes here the usual Euclidean norm of $\mathbb{R}^n$, $s_{\mathbf{a}}$ is the interpolating spline associated with the vector $\mathbf{a}$, and $\rho$ is a smoothing parameter which again controls the trade-off between the fit to the data and the smoothness of the estimator. Using relation (3), we can write
\[
\int_I s_{\mathbf{a}}^{(m)}(t)^2\,dt = \mathbf{a}^{\tau}\mathbf{A}_m^{*}\mathbf{a},
\]
with
\[
\mathbf{A}_m^{*} = \mathbf{B}(\mathbf{B}^{\tau}\mathbf{B})^{-1}\left[\int_I \mathbf{b}^{(m)}(t)\mathbf{b}^{(m)}(t)^{\tau}\,dt\right](\mathbf{B}^{\tau}\mathbf{B})^{-1}\mathbf{B}^{\tau}.
\]
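The solution of the minimization problem (II.4) can then be written explicitly as
\[
\widehat{\boldsymbol{\alpha}}^{*} = \frac{1}{np}\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \rho\mathbf{A}_m^{*}\right)^{-1}\mathbf{X}^{\tau}\mathbf{Y} = \frac{1}{n}\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho p\mathbf{A}_m^{*}\right)^{-1}\mathbf{X}^{\tau}\mathbf{Y}.
\tag{II.5}
\]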
The study of this estimator $\widehat{\boldsymbol{\alpha}}^{*}$ then depends on the behavior of the matrix $\mathbf{A}_m^{*}$, more precisely on the eigenvalues of the matrix $p\mathbf{A}_m^{*}$. For example, Utreras (1983) shows that this matrix $p\mathbf{A}_m^{*}$ has $m$ zero eigenvalues $\mu_{1,p} = \ldots = \mu_{m,p} = 0$, while, as $p$ tends to infinity,
\[
\sum_{j=m+1}^{p}\frac{1}{\mu_{j,p}} \longrightarrow \sum_{j=m+1}^{\infty}(\pi j)^{-2m},
\tag{II.6}
\]
where $0 < \mu_{m+1,p} < \ldots < \mu_{p,p}$ denote the $p - m$ nonzero eigenvalues of $p\mathbf{A}_m^{*}$. As the series in (II.6) converges only if $m \neq 0$, we assume that this holds in the sequel.
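The fact that this matrix $p\mathbf{A}_m^{*}$ has $m$ zero eigenvalues is a problem for the inversion of the matrix $\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho p\mathbf{A}_m^{*}$, hence for the existence of the estimator. To circumvent this problem, we slightly modify the estimator $\widehat{\boldsymbol{\alpha}}^{*}$ as follows. We denote by $E_m$ the eigenspace of dimension $m$ associated with the $m$ zero eigenvalues of $p\mathbf{A}_m^{*}$, and by $\mathbf{P}_m$ the projection matrix onto this subspace. We then define
\[
\mathbf{A}_m = \mathbf{P}_m + p\mathbf{A}_m^{*}.
\]
The estimator of $\boldsymbol{\alpha}$ is then defined by
\[
\widehat{\boldsymbol{\alpha}}_{FLS,X} = \frac{1}{np}\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\mathbf{X}^{\tau}\mathbf{Y} = \frac{1}{n}\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho\mathbf{A}_m\right)^{-1}\mathbf{X}^{\tau}\mathbf{Y},
\tag{II.7}
\]
solution of the minimization problem
\[
\min_{\mathbf{a}\in\mathbb{R}^p}\left\{\frac{1}{n}\left\|\mathbf{Y} - \frac{1}{p}\mathbf{X}\mathbf{a}\right\|^2 + \frac{\rho}{p}\mathbf{a}^{\tau}\mathbf{A}_m\mathbf{a}\right\},
\tag{II.8}
\]
$FLS$ standing for "Functional Least Squares". Thus, the problem of the zero eigenvalues of $p\mathbf{A}_m^{*}$ disappears: the matrix $\mathbf{A}_m$ has $m$ eigenvalues equal to 1, and its $p - m$ remaining eigenvalues coincide with the eigenvalues $\mu_{m+1,p} < \ldots < \mu_{p,p}$ of $p\mathbf{A}_m^{*}$. Finally, the estimate of the functional parameter $\alpha$ is defined by
\[
\widehat{\alpha}_{FLS,X} = s_{\widehat{\boldsymbol{\alpha}}_{FLS,X}},
\]
the interpolating spline associated with the vector $\widehat{\boldsymbol{\alpha}}_{FLS,X}$.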
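The closed form (II.7) can be sketched numerically as follows. This is an illustration under a loud assumption: the exact matrix $p\mathbf{A}_m^{*}$ built from interpolating splines is replaced by a scaled matrix of $m$-th finite differences, which shares the key property used in the text (an $m$-dimensional null space, here sampled polynomials of degree below $m$). The helper names `build_A_m`, `fls_estimator` and `fls_criterion` are hypothetical.

```python
import numpy as np


def build_A_m(p, m=2):
    """Stand-in for A_m = P_m + p*A_m^* on an equispaced grid of p points (assumption, see lead-in)."""
    D = np.diff(np.eye(p), n=m, axis=0)        # (p - m) x p matrix of m-th finite differences
    pA_star = float(p) ** (2 * m) * (D.T @ D)  # assumed approximation of p * A_m^*
    t = np.arange(1, p + 1) / p
    V = np.vander(t, m, increasing=True)       # sampled polynomials of degree < m span the null space
    Q, _ = np.linalg.qr(V)
    P_m = Q @ Q.T                              # orthogonal projection onto that null space
    return P_m + pA_star


def fls_estimator(X, Y, rho, A_m):
    """Second expression in (II.7): (1/n) * (X'X/(n p) + rho*A_m)^{-1} X'Y."""
    n, p = X.shape
    lhs = X.T @ X / (n * p) + rho * A_m
    return np.linalg.solve(lhs, X.T @ Y) / n


def fls_criterion(a, X, Y, rho, A_m):
    """Criterion (II.8): (1/n)*||Y - X a / p||^2 + (rho/p) * a' A_m a."""
    n, p = X.shape
    resid = Y - X @ a / p
    return float(resid @ resid / n + rho / p * (a @ A_m @ a))
```

A typical use is `A_m = build_A_m(p)` followed by `fls_estimator(X, Y, rho, A_m)`; evaluating `fls_criterion` at the result and at nearby vectors gives a quick check that the closed form is the minimizer of (II.8).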
II.2. CONVERGENCE RESULT
In this chapter, we study the asymptotic behavior of our estimator when $n$ and $p$ tend to infinity. The results presented here (and their proofs) are also part of the article of Cardot, Crambes, Kneip and Sarda (2006) given in chapter III.3 of the thesis. This study is notably based on the behavior of the inverse of the matrix $\mathbf{A}_m$. Using relation (II.6), we have $\mathrm{Tr}(\mathbf{A}_m^{-1}) \longrightarrow \sum_{j=m+1}^{\infty}(\pi j)^{-2m} + m =: D_0$ when $p \to \infty$. Thus, for any constant $D_1 > D_0$, there exists $p_0 \in \mathbb{N}$ such that, for all $p \ge p_0$,
\[
\mathrm{Tr}\left(\mathbf{A}_m^{-1}\right) \le D_1.
\tag{II.9}
\]
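The asymptotic behavior of the estimator $\widehat{\boldsymbol{\alpha}}_{FLS,X}$ of $\boldsymbol{\alpha}$ is studied with respect to the semi-norm induced by the discretized empirical covariance operator, defined by
\[
\|\mathbf{u}\|_{\Gamma_{X,n,p}}^2 = \frac{1}{p}\mathbf{u}^{\tau}\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X}\right)\mathbf{u}.
\tag{II.10}
\]
The regularity hypothesis on the function $\alpha$ is the following. We assume that $\alpha$ is $m$ times differentiable and that $\alpha^{(m)} \in L^2([0,1])$. Thus, we can define $D_2 = \int_0^1 \alpha^{(m)}(t)^2\,dt$ and $D_3^{*} = \int_0^1 \alpha(t)^2\,dt$. We deduce that $\frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{P}_m\boldsymbol{\alpha} \le \frac{1}{p}\boldsymbol{\alpha}^{\tau}\boldsymbol{\alpha} \longrightarrow D_3^{*}$ as $p \to \infty$. Then, for any constant $D_3 > D_3^{*}$, there exists $p_1 \in \mathbb{N}$ such that $p_1 \ge p_0$ and $\frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{P}_m\boldsymbol{\alpha} \le D_3$ for all $p \ge p_1$.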
As noted before, $X_1, \ldots, X_n$ may be random or not. In all cases, the expectations in the theorem below are taken with respect to the probability distribution induced by the random variable $\epsilon$. When $X_1, \ldots, X_n$ are random, these expectations must be understood as conditional on $X_1, \ldots, X_n$. We now give the asymptotic result concerning $\widehat{\boldsymbol{\alpha}}_{FLS,X}$. Under the regularity hypothesis on the function $\alpha$ made above ($\alpha^{(m)} \in L^2([0,1])$), with the definitions of $D_1$, $D_2$, $D_3$, $p_1$, we obtain for all $n \in \mathbb{N}$ and all $p \ge p_1$
\[
\left\|E(\widehat{\boldsymbol{\alpha}}_{FLS,X}) - \boldsymbol{\alpha}\right\|_{\Gamma_{X,n,p}}^2 \le \rho\left(\frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{P}_m\boldsymbol{\alpha} + D_2\right) \le \rho\,(D_3 + D_2),
\]
and
\[
\frac{1}{p}E\left(\left\|\widehat{\boldsymbol{\alpha}}_{FLS,X} - E(\widehat{\boldsymbol{\alpha}}_{FLS,X})\right\|^2\right) \le \frac{\sigma_\epsilon^2}{n\rho}D_1.
\]
This result (bias and variance) has the following immediate consequence. If we assume moreover that, for all $n, p$ (this inequality holding almost surely when $X_1, \ldots, X_n$ are random),
\[
\sup_{i=1,\ldots,n}\ \sup_{j=1,\ldots,p}|X_i(t_j)| \le D_4 < +\infty,
\]
then, taking $\rho \sim n^{-1/2}$ with $n \to \infty$, we have
\[
\left\|\widehat{\boldsymbol{\alpha}}_{FLS,X} - \boldsymbol{\alpha}\right\|_{\Gamma_{X,n,p}}^2 = O_P\left(n^{-1/2}\right).
\]
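The step from the bias and variance bounds to the $n^{-1/2}$ rate can be spelled out as follows. This is a sketch of one possible argument, not necessarily the one used in the cited article. For any $\mathbf{u} \in \mathbb{R}^p$, definition (II.10), the Cauchy-Schwarz inequality and the bound $D_4$ give
\[
\|\mathbf{u}\|_{\Gamma_{X,n,p}}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{p}\mathbf{X}_i^{\tau}\mathbf{u}\right)^2 \le \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{p}\sum_{j=1}^{p}X_i(t_j)^2\right)\left(\frac{1}{p}\sum_{j=1}^{p}u_j^2\right) \le D_4^2\,\frac{1}{p}\|\mathbf{u}\|^2.
\]
Applying this to $\mathbf{u} = \widehat{\boldsymbol{\alpha}}_{FLS,X} - E(\widehat{\boldsymbol{\alpha}}_{FLS,X})$ and combining with the two bounds above,
\[
E\left\|\widehat{\boldsymbol{\alpha}}_{FLS,X} - \boldsymbol{\alpha}\right\|_{\Gamma_{X,n,p}}^2 \le 2\rho\,(D_3 + D_2) + 2D_4^2\,\frac{\sigma_\epsilon^2 D_1}{n\rho}.
\]
With $\rho \sim n^{-1/2}$ both terms are of order $n^{-1/2}$, and the $O_P$ statement follows from Markov's inequality.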
II.3. COMMENTS AND PERSPECTIVES
The previous result gives a rate that is worse than the usual one-dimensional nonparametric rate of Stone (1982). However, work in collaboration with Hervé Cardot, Alois Kneip and Pascal Sarda is currently in progress with the aim of improving this rate, and it is giving encouraging results. This work also takes into account the passage from the discretized case considered here to the "functional" case. More precisely, we also seek to establish convergence results (if possible with rates as good as in the discretized case) for $\widehat{\alpha}_{FLS,X} = s_{\widehat{\boldsymbol{\alpha}}_{FLS,X}}$ with respect to the semi-norms $\|\cdot\|_{\Gamma_{X,n}}$ and $\|\cdot\|_{\Gamma_X}$. This ongoing work also takes into account the approximation of the inner product, which is not done in this thesis. Indeed, we consider here the model $Y_i = \frac{1}{p}\sum_{j=1}^{p}X_i(t_j)\alpha(t_j) + \epsilon_i$ for $i = 1, \ldots, n$ and not $Y_i = \langle X_i, \alpha\rangle + \epsilon_i$ for $i = 1, \ldots, n$.

Furthermore, as mentioned in the perspectives on the estimation of conditional quantiles for functional covariates, one can also consider building a wavelet estimator for the estimation of the conditional mean (see for instance the books of Daubechies, 1992 and Cohen, 2003). Wavelets have attracted growing interest in recent years. They have advantages over Fourier bases, for example, such as allowing a signal to be represented in both time and scale.
PART III
FUNCTIONAL LINEAR MODEL
WHEN THE COVARIATE IS NOISY
III.1. ORTHOGONAL LEAST SQUARES - MULTIVARIATE CASE
The purpose of this first chapter of part III is to describe the orthogonal least squares method ("Total Least Squares", abbreviated as $TLS$) in the case where the covariate is multivariate, that is, an element of $\mathbb{R}^p$, and to give an algorithmic method for solving this problem (see for instance Golub and Van Loan, 1980). Although known in this multivariate framework, these results will be very useful in our functional framework, which is why we recall them here. We thus consider that the model reads, for $i$ from 1 to $n$,
\[
\begin{cases}
Y_i = \mathbf{X}_i^{\tau}\boldsymbol{\alpha} + \epsilon_i,\\
\mathbf{W}_i = \mathbf{X}_i + \boldsymbol{\delta}_i,
\end{cases}
\tag{III.1}
\]
with $\mathbf{X}_i = (X_{i1}, \ldots, X_{ip})^{\tau}$, $\mathbf{W}_i = (W_{i1}, \ldots, W_{ip})^{\tau}$ and $\boldsymbol{\delta}_i = (\delta_{i1}, \ldots, \delta_{ip})^{\tau}$ vectors of $\mathbb{R}^p$. We must then estimate $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p)^{\tau} \in \mathbb{R}^p$ on the basis of the available observations $(\mathbf{W}_1, Y_1), \ldots, (\mathbf{W}_n, Y_n)$. Throughout, we adopt the following matrix notations: we denote by $\mathbf{X}$, $\mathbf{W}$ and $\boldsymbol{\delta}$ the $n \times p$ matrices with respective general terms $X_{ij}$, $W_{ij}$ and $\delta_{ij}$ for $i$ from 1 to $n$ and $j$ from 1 to $p$, and we write $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\tau}$ and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^{\tau}$. As pointed out in the introduction, the idea of orthogonal least squares is to add an error term on the covariate in the minimization problem, namely
\[
\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{W}_i - \mathbf{X}_i\|^2,
\]
so as to determine simultaneously an estimate of $\boldsymbol{\alpha}$ and of $\mathbf{X}_1, \ldots, \mathbf{X}_n$ by solving the minimization problem
\[
\min_{\mathbf{a}\in\mathbb{R}^p,\ \mathbf{X}_i\in\mathbb{R}^p}\left\{\frac{1}{n}\sum_{i=1}^{n}\left[\left(Y_i - \mathbf{X}_i^{\tau}\mathbf{a}\right)^2 + (\mathbf{X}_i - \mathbf{W}_i)^{\tau}(\mathbf{X}_i - \mathbf{W}_i)\right]\right\}.
\tag{III.2}
\]
The graphical representation in the univariate case ($p = 1$) justifies the name orthogonal least squares. Indeed, when solving an ordinary least squares problem, we look for the line that minimizes the sum of the squared "vertical" distances from the points of the scatter plot to the line. In comparison, when solving the minimization problem (III.2), we are in fact looking for the line that minimizes the sum of the squared "orthogonal" distances from the points of the scatter plot to the line (see the figures below).
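[Figure: two panels. Left, ordinary least squares ("MC Ord"): the error $\epsilon_i$ is the vertical gap between $Y_i$ and the line $X_i\alpha$ at abscissa $X_i$. Right, orthogonal least squares ("MC Orth"): the gap between the observed point $(W_i, Y_i)$ and the line has components $\delta_i$ (horizontal, between $X_i$ and $W_i$) and $\epsilon_i$ (vertical).]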
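The geometric statement above can be illustrated for $p = 1$ with a line through the origin. The sketch below is an illustration only; it uses the classical fact that the squared orthogonal distance from $(W_i, Y_i)$ to the line $y = ax$ is $(Y_i - aW_i)^2/(1 + a^2)$, and the function names are hypothetical.

```python
import numpy as np


def ols_slope(w, y):
    """Ordinary least squares slope (no intercept): minimizes the vertical squared distances."""
    return float(w @ y / (w @ w))


def tls_slope(w, y):
    """Orthogonal least squares slope for p = 1, via the smallest eigenvalue of (w, y)'(w, y)."""
    A = np.column_stack([w, y])
    sigma2_min = np.linalg.eigvalsh(A.T @ A)[0]  # eigenvalues are returned in ascending order
    return float(w @ y / (w @ w - sigma2_min))


def orthogonal_sse(a, w, y):
    """Sum of squared orthogonal distances from the points (w_i, y_i) to the line y = a*x."""
    return float(np.sum((y - a * w) ** 2) / (1.0 + a * a))


rng = np.random.default_rng(0)
x = rng.normal(size=200)
w = x + 0.5 * rng.normal(size=200)        # noisy covariate
y = 2.0 * x + 0.5 * rng.normal(size=200)  # response generated with slope 2
print(ols_slope(w, y), tls_slope(w, y))
```

On such simulated data the ordinary least squares slope is pulled toward zero by the noise on the covariate, while the orthogonal slope is larger in absolute value and minimizes `orthogonal_sse`.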
The resolution of the minimization problem (III.2) is based on matrix algebra, in particular the singular value decomposition of a rectangular matrix (a reference book on matrix algebra is for instance Golub and Van Loan, 1996). Initially proposed by Golub and Van Loan (1980), this orthogonal least squares method was later taken up in a book by Van Huffel and Vandewalle (1991). The result is the following.

Proposition III.1. The solution in $\mathbf{a} \in \mathbb{R}^p$ of the minimization problem (III.2), denoted $\widehat{\boldsymbol{\alpha}}_{TLS}$, is given by
\[
\widehat{\boldsymbol{\alpha}}_{TLS} = \left(\mathbf{W}^{\tau}\mathbf{W} - \sigma_{min}^2\mathbf{I}_p\right)^{-1}\mathbf{W}^{\tau}\mathbf{Y},
\tag{III.3}
\]
where $\mathbf{I}_p$ denotes the identity matrix of size $p$ and $\sigma_{min}^2$ is the smallest nonzero eigenvalue of the matrix $(\mathbf{W}, \mathbf{Y})^{\tau}(\mathbf{W}, \mathbf{Y})$, where $(\mathbf{W}, \mathbf{Y})$ denotes the matrix obtained by concatenating the matrices $\mathbf{W}$ and $\mathbf{Y}$.
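Proposition III.1 can be checked numerically against the singular value decomposition route mentioned above. The following NumPy sketch is illustrative: it assumes the generic case where the smallest eigenvalue of $(\mathbf{W}, \mathbf{Y})^{\tau}(\mathbf{W}, \mathbf{Y})$ is nonzero and the last component of the associated eigenvector is nonzero, and the function names are hypothetical.

```python
import numpy as np


def tls_closed_form(W, Y):
    """Formula (III.3): (W'W - sigma_min^2 I_p)^{-1} W'Y."""
    p = W.shape[1]
    A = np.column_stack([W, Y])
    sigma2_min = np.linalg.eigvalsh(A.T @ A)[0]  # smallest eigenvalue (ascending order)
    return np.linalg.solve(W.T @ W - sigma2_min * np.eye(p), W.T @ Y)


def tls_svd(W, Y):
    """Same estimator from the right singular vector of (W, Y) with the smallest singular value."""
    A = np.column_stack([W, Y])
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v_min = Vt[-1]                  # singular values are sorted in decreasing order
    return -v_min[:-1] / v_min[-1]  # rescale so that the last component equals -1


rng = np.random.default_rng(1)
W = rng.normal(size=(100, 3))
Y = W @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(tls_closed_form(W, Y))
print(tls_svd(W, Y))
```

The two functions should return the same vector up to rounding. The SVD route avoids forming $\mathbf{W}^{\tau}\mathbf{W}$ explicitly, which is usually preferable when the matrix is ill-conditioned.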
Note that this expression shows the solution of the minimization problem as a corrected version of the ordinary least squares estimator. Indeed, a term $-\sigma_{min}^2\mathbf{I}_p$ appears, which can be seen as a "deregularization" term. It can be interpreted as a correction term that aims to reduce the bias induced by the presence of the matrix $\mathbf{W}^{\tau}\mathbf{W}$ in the expression of the estimator, instead of the matrix $\mathbf{X}^{\tau}\mathbf{X}$ (not available), since we observe $\mathbf{W}$ and not $\mathbf{X}$. We give the proof of this result, as the idea is then reused in our framework of a functional covariate.
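Proof: We introduce the Frobenius norm of a matrix $\mathbf{A}$ with general term $a_{ij}$ ($i = 1, \ldots, n$ and $j = 1, \ldots, p$), denoted $\|\cdot\|_F$ and defined by
\[
\|\mathbf{A}\|_F^2 = \sum_{i=1}^{n}\sum_{j=1}^{p} a_{ij}^2 = \mathrm{Tr}\left(\mathbf{A}^{\tau}\mathbf{A}\right).
\]
The minimization problem (III.2) then reads
\[
\min_{\left((\mathbf{W},\mathbf{Y}) - (\boldsymbol{\delta},\boldsymbol{\epsilon})\right)\binom{\boldsymbol{\alpha}}{-1} = 0}\|(\boldsymbol{\delta}, \boldsymbol{\epsilon})\|_F^2,
\]
which amounts to solving, with $\mathbf{A} = (\mathbf{W}, \mathbf{Y})$, $\mathbf{E} = (\boldsymbol{\delta}, \boldsymbol{\epsilon})$ and $\mathbf{x} = (\boldsymbol{\alpha}^{\tau}, -1)^{\tau}$,
\[
\min_{\mathbf{A}\mathbf{x} = \mathbf{E}\mathbf{x}}\|\mathbf{E}\|_F^2.
\]
Denoting by $\|\cdot\|$ the usual Euclidean matrix norm, we thus have, for every $\mathbf{x} \neq 0$ such that $\mathbf{A}\mathbf{x} = \mathbf{E}\mathbf{x}$,
\[
\|\mathbf{E}\|_F^2 \ge \|\mathbf{E}\|^2 = \sup_{\mathbf{z}\neq 0}\frac{\mathbf{z}^{\tau}\mathbf{E}^{\tau}\mathbf{E}\mathbf{z}}{\mathbf{z}^{\tau}\mathbf{z}} \ge \frac{\mathbf{x}^{\tau}\mathbf{E}^{\tau}\mathbf{E}\mathbf{x}}{\mathbf{x}^{\tau}\mathbf{x}} = \frac{\mathbf{x}^{\tau}\mathbf{A}^{\tau}\mathbf{A}\mathbf{x}}{\mathbf{x}^{\tau}\mathbf{x}}.
\]
Now, note that if we take
\[
\mathbf{E} = \frac{\mathbf{A}\mathbf{x}\mathbf{x}^{\tau}}{\mathbf{x}^{\tau}\mathbf{x}},
\]
we do have $\mathbf{A}\mathbf{x} = \mathbf{E}\mathbf{x}$ and
\[
\|\mathbf{E}\|_F^2 = \mathrm{Tr}\left(\mathbf{E}^{\tau}\mathbf{E}\right) = \frac{\mathrm{Tr}\left(\mathbf{x}\mathbf{x}^{\tau}\mathbf{A}^{\tau}\mathbf{A}\mathbf{x}\mathbf{x}^{\tau}\right)}{(\mathbf{x}^{\tau}\mathbf{x})^2} = \frac{\mathbf{x}^{\tau}\mathbf{A}^{\tau}\mathbf{A}\mathbf{x}}{\mathbf{x}^{\tau}\mathbf{x}}.
\]
It then only remains to minimize $\frac{\mathbf{x}^{\tau}\mathbf{A}^{\tau}\mathbf{A}\mathbf{x}}{\mathbf{x}^{\tau}\mathbf{x}}$ in $\mathbf{x}$. We therefore consider the diagonalization of $\mathbf{A}^{\tau}\mathbf{A}$ and denote by $\sigma_{min}^2$ its smallest nonzero eigenvalue, associated with the eigenvector denoted $\mathbf{v}_{min}$. Thus, the solution of the minimization problem is obtained for $\mathbf{x} = k\mathbf{v}_{min}$. The last component (the $(p+1)$-th) gives the value $k = -1/v_{min,p+1}$. The solution of the minimization problem (III.2) is therefore given by
\[
\widehat{\boldsymbol{\alpha}}_{TLS} = -\frac{1}{v_{min,p+1}}\begin{pmatrix} v_{min,1}\\ \vdots\\ v_{min,p}\end{pmatrix}
\quad\text{and}\quad
(\widehat{\boldsymbol{\delta}}, \widehat{\boldsymbol{\epsilon}}) = \mathbf{A}\mathbf{v}_{min}\mathbf{v}_{min}^{\tau}.
\]
The expression of $\widehat{\boldsymbol{\alpha}}_{TLS}$ then follows immediately. As $\binom{\widehat{\boldsymbol{\alpha}}_{TLS}}{-1}$ is an eigenvector of the matrix $\mathbf{A}^{\tau}\mathbf{A}$ associated with the eigenvalue $\sigma_{min}^2$, we have
\[
\begin{pmatrix}\mathbf{W}^{\tau}\\ \mathbf{Y}^{\tau}\end{pmatrix}(\mathbf{W}, \mathbf{Y})\begin{pmatrix}\widehat{\boldsymbol{\alpha}}_{TLS}\\ -1\end{pmatrix} = \sigma_{min}^2\begin{pmatrix}\widehat{\boldsymbol{\alpha}}_{TLS}\\ -1\end{pmatrix},
\]
which gives (considering the first $p$ components of this vector)
\[
\mathbf{W}^{\tau}\mathbf{W}\widehat{\boldsymbol{\alpha}}_{TLS} - \mathbf{W}^{\tau}\mathbf{Y} = \sigma_{min}^2\widehat{\boldsymbol{\alpha}}_{TLS},
\]
and this completes the proof of proposition III.1.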
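The key step of the proof, namely that the rank-one matrix $\mathbf{E} = \mathbf{A}\mathbf{x}\mathbf{x}^{\tau}/(\mathbf{x}^{\tau}\mathbf{x})$ satisfies the constraint and attains the lower bound, can be verified on random inputs. This is a small illustrative sketch; `minimal_correction` is a hypothetical helper name.

```python
import numpy as np


def minimal_correction(A, x):
    """Rank-one matrix E = A x x' / (x'x) used in the proof; it satisfies E x = A x."""
    x = np.asarray(x, dtype=float)
    return np.outer(A @ x, x) / (x @ x)


rng = np.random.default_rng(0)
A = rng.normal(size=(20, 4))
x = rng.normal(size=4)
E = minimal_correction(A, x)

constraint_gap = np.max(np.abs(E @ x - A @ x))        # should be zero up to rounding
frobenius_sq = np.linalg.norm(E, "fro") ** 2          # ||E||_F^2
rayleigh = float(x @ A.T @ A @ x / (x @ x))           # x'A'Ax / x'x
print(constraint_gap, frobenius_sq, rayleigh)
```

The squared Frobenius norm and the Rayleigh quotient should coincide, which is why minimizing $\|\mathbf{E}\|_F^2$ under the constraint reduces to an eigenvalue problem for $\mathbf{A}^{\tau}\mathbf{A}$.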
To deal with possible conditioning problems (due to the fact that the eigenvalues of the matrix $\mathbf{W}^{\tau}\mathbf{W}$ can decrease rapidly to zero), it is possible to consider a regularized version of the minimization problem (III.2). This situation was considered in particular in an article by Golub, Hansen and O'Leary (1999), who consider the minimization problem
\[
\min_{\mathbf{a}\in\mathbb{R}^p,\ \mathbf{X}_i\in\mathbb{R}^p}\left\{\frac{1}{n}\sum_{i=1}^{n}\left[\left(Y_i - \mathbf{X}_i^{\tau}\mathbf{a}\right)^2 + (\mathbf{X}_i - \mathbf{W}_i)^{\tau}(\mathbf{X}_i - \mathbf{W}_i)\right] + \rho\,\mathbf{a}^{\tau}\mathbf{L}^{\tau}\mathbf{L}\mathbf{a}\right\},
\tag{III.4}
\]
where $\mathbf{L}$ is a fixed matrix of size $p \times p$ and $\rho$ is a regularization parameter which controls the ill-conditioning of the matrix $\mathbf{W}$. Golub, Hansen and O'Leary (1999) then show the following result, whose proof follows that of proposition III.1, simply incorporating the regularization (this proof is therefore not given here).

Proposition III.2. The solution in $\mathbf{a} \in \mathbb{R}^p$ of the minimization problem (III.4), denoted $\widehat{\boldsymbol{\alpha}}_{TLS,pen}$, is given by
\[
\widehat{\boldsymbol{\alpha}}_{TLS,pen} = \left(\mathbf{W}^{\tau}\mathbf{W} + \rho\mathbf{L}^{\tau}\mathbf{L} - \sigma_{min,pen}^2\mathbf{I}_p\right)^{-1}\mathbf{W}^{\tau}\mathbf{Y},
\tag{III.5}
\]
where $\sigma_{min,pen}^2$ is the smallest nonzero eigenvalue of the matrix
\[
(\mathbf{W}, \mathbf{Y})^{\tau}(\mathbf{W}, \mathbf{Y}) + \begin{pmatrix}\mathbf{L}^{\tau}\mathbf{L} & 0\\ 0 & 0\end{pmatrix}.
\]
This case involving a penalization is important in our functional context where, as has been pointed out, the contribution of a penalization proves fundamental. This last result will thus allow the generalization of the orthogonal least squares method to the case of a functional covariate.
III.2. ORTHOGONAL LEAST SQUARES - FUNCTIONAL CASE
In this chapter, the aim is to generalize the orthogonal least squares method to our functional framework. We wish to propose two spline estimators, the first being obtained by modifying the regression splines estimator introduced by Cardot, Ferraty and Sarda (1999, 2003), and the second being based on the smoothing splines estimator introduced in part II of the thesis in the case where the covariate was not noisy.

We first give the method of construction of the regression splines estimator, which was studied first chronologically (see Crambes, 2005, for a first work on this subject). The introduction briefly recalled how the estimator of Cardot, Ferraty and Sarda (1999, 2003) is built. We look for an estimator $\widehat{\alpha} = \mathbf{B}_{k,q}^{\tau}\widehat{\boldsymbol{\theta}}$ with $\widehat{\boldsymbol{\theta}} \in \mathbb{R}^{k+q}$ solution of the minimization problem (7). As pointed out previously, this minimization problem admits an explicit solution, given by
\[
\widehat{\boldsymbol{\theta}}_{FLS,X} = \frac{1}{n}\left(\frac{1}{n}\mathbf{D}_X^{\tau}\mathbf{D}_X + \rho\mathbf{G}_k\right)^{-1}\mathbf{D}_X^{\tau}\mathbf{Y},
\tag{III.6}
\]
with
\[
\mathbf{D}_X = \begin{pmatrix}\langle B_1, X_1\rangle & \ldots & \langle B_{k+q}, X_1\rangle\\ \vdots & & \vdots\\ \langle B_1, X_n\rangle & \ldots & \langle B_{k+q}, X_n\rangle\end{pmatrix},
\]
and
\[
\mathbf{G}_k = \begin{pmatrix}\langle B_1^{(m)}, B_1^{(m)}\rangle & \ldots & \langle B_1^{(m)}, B_{k+q}^{(m)}\rangle\\ \vdots & & \vdots\\ \langle B_{k+q}^{(m)}, B_1^{(m)}\rangle & \ldots & \langle B_{k+q}^{(m)}, B_{k+q}^{(m)}\rangle\end{pmatrix}.
\]
III.2.1. Construction of the estimator (regression splines)
Here, the curves $X_1, \ldots, X_n$ are not known; the curves actually observed, $W_1, \ldots, W_n$, are defined by (8). To extend the orthogonal least squares method to this context, we therefore consider the minimization problem
\[
\min_{\boldsymbol{\theta}\in\mathbb{R}^{k+q},\ X_i\in L^2(I)}\left\{\frac{1}{n}\sum_{i=1}^{n}\left[\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta}, X_i\rangle\right)^2 + \left\|\widetilde{X}_i - \widetilde{W}_i\right\|^2\right] + \rho\left\|\left(\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta}\right)^{(m)}\right\|_{L^2}^2\right\},
\tag{III.7}
\]
where $\widetilde{X}_i$ and $\widetilde{W}_i$ are the spline versions of $X_i$ and $W_i$. More precisely, denoting by $\mathbf{X}$ the $n \times p$ matrix with general term $X_i(t_j)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, p$, and by $\boldsymbol{\beta}$ the $p \times (k+q)$ matrix with general term $B_r(t_j)$ for $j = 1, \ldots, p$ and $r = 1, \ldots, k+q$, $\widetilde{\mathbf{X}}$ is the $n \times (k+q)$ matrix defined by
\[
\widetilde{\mathbf{X}} = \mathbf{X}\boldsymbol{\beta},
\]
and $\widetilde{X}_i$ is the $i$-th row of $\widetilde{\mathbf{X}}$. Now, using a technique analogous to the one presented in the multivariate case (in the chapter on orthogonal least squares), we show the following result.
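Proposition III.3. The solution in $\boldsymbol{\theta} \in \mathbb{R}^{k+q}$ of the minimization problem (III.7), denoted $\widehat{\boldsymbol{\theta}}_{FTLS}$ ($FTLS$ for "Functional Total Least Squares"), is given by
\[
\widehat{\boldsymbol{\theta}}_{FTLS} = \frac{1}{n}\left(\frac{1}{n}\mathbf{D}_W^{\tau}\mathbf{D}_W + \rho\mathbf{G}_k - \sigma_{min}^2\mathbf{B}_k\right)^{-1}\mathbf{D}_W^{\tau}\mathbf{Y},
\tag{III.8}
\]
where the matrices $\mathbf{D}_W$ and $\mathbf{B}_k$ are defined by
\[
\mathbf{D}_W = \begin{pmatrix}\langle B_1, W_1\rangle & \ldots & \langle B_{k+q}, W_1\rangle\\ \vdots & & \vdots\\ \langle B_1, W_n\rangle & \ldots & \langle B_{k+q}, W_n\rangle\end{pmatrix},
\qquad
\mathbf{B}_k = \begin{pmatrix}\langle B_1, B_1\rangle & \ldots & \langle B_{k+q}, B_1\rangle\\ \vdots & & \vdots\\ \langle B_1, B_{k+q}\rangle & \ldots & \langle B_{k+q}, B_{k+q}\rangle\end{pmatrix},
\]
and $\sigma_{min}^2$ is the smallest eigenvalue of the matrix
\[
\frac{1}{n}\left(\frac{\mathbf{D}_W}{\sqrt{p}}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{D}_W}{\sqrt{p}}, \mathbf{Y}\right) + \boldsymbol{\gamma}\left(\boldsymbol{\gamma}^{\tau}\boldsymbol{\gamma}\right)^{-1}(\rho\mathbf{K}_k)\left(\boldsymbol{\gamma}^{\tau}\boldsymbol{\gamma}\right)^{-1}\boldsymbol{\gamma}^{\tau},
\]
with $\boldsymbol{\gamma}$ the $(p+1) \times (k+q+1)$ matrix given by
\[
\boldsymbol{\gamma} = \begin{pmatrix}\dfrac{\boldsymbol{\beta}}{\sqrt{p}} & 0\\ 0 & 0\end{pmatrix},
\]
and $\mathbf{K}_k$ the $(k+q+1) \times (k+q+1)$ matrix given by
\[
\mathbf{K}_k = \begin{pmatrix}\mathbf{G}_k & 0\\ 0 & 0\end{pmatrix}.
\]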
The proof of this result is given in the appendix (see part V.1.). This result is a direct analogue of proposition III.2, the identity matrix appearing in the "deregularization" being replaced by the matrix $\mathbf{B}_k$. The additional problem here is that the eigenvalues of the matrix $\frac{1}{n}\left(\frac{\mathbf{D}_W}{\sqrt{p}}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{D}_W}{\sqrt{p}}, \mathbf{Y}\right)$ decrease to zero, which causes numerical problems for the computation of $\sigma_{min}^2$. We therefore circumvent this problem through a result which gives the link between $\mathbf{D}_X^{\tau}\mathbf{D}_X$ (not accessible) and $\mathbf{D}_W^{\tau}\mathbf{D}_W$, and which allows us to modify the "deregularization". The proof of this result is also given in the appendix (see part V.1.).

Proposition III.4. We make the following hypothesis.
(B.0) The variables $X_i$ satisfy (a.s.)
\[
\sup_{i=1,\ldots,n}\ \sup_{t\in[0,1]}|X_i(t)| \le c_0,
\]
where $c_0$ does not depend on $n$. We then have
\[
\frac{1}{n}\mathbf{D}_W^{\tau}\mathbf{D}_W = \frac{1}{n}\mathbf{D}_X^{\tau}\mathbf{D}_X + \frac{\sigma_\delta^2}{p}\mathbf{B}_k + \mathbf{R}_1,
\tag{III.9}
\]
where $\mathbf{R}_1$ is a matrix satisfying
\[
\|\mathbf{R}_1\| = O_P\left(\frac{1}{n^{1/2}p^{1/2}k^{1/2}}\right).
\]
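The decomposition (III.9) can be illustrated by simulation. The sketch below makes several assumptions that are not in the text: inner products are replaced by their discretized versions $\langle\cdot,\cdot\rangle_p$, the B-spline basis is replaced by a small monomial basis, the curves are simple bounded random combinations of two fixed functions, and the noise is Gaussian. The function name `decomposition_terms` is hypothetical.

```python
import numpy as np


def decomposition_terms(n=20000, p=20, k=4, sigma=1.0, seed=0):
    """Return (||R_1||_F, ||(sigma^2/p) B_k||_F) for a simulated instance of (III.9)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, p + 1) / p
    beta = np.vander(t, k, increasing=True)  # p x k stand-in basis matrix (monomials, not B-splines)
    # Bounded random curves X_i(t) = u_i*cos(pi t) + v_i*t with u_i, v_i uniform on [-1, 1]
    u = rng.uniform(-1.0, 1.0, size=(n, 1))
    v = rng.uniform(-1.0, 1.0, size=(n, 1))
    X = u * np.cos(np.pi * t) + v * t
    W = X + sigma * rng.normal(size=(n, p))  # noisy discretized curves
    D_X = X @ beta / p                       # discretized <B_r, X_i>_p
    D_W = W @ beta / p                       # discretized <B_r, W_i>_p
    B_k = beta.T @ beta / p                  # discretized <B_r, B_s>_p
    correction = sigma ** 2 / p * B_k
    R_1 = D_W.T @ D_W / n - D_X.T @ D_X / n - correction
    return float(np.linalg.norm(R_1)), float(np.linalg.norm(correction))


remainder_norm, correction_norm = decomposition_terms()
print(remainder_norm, correction_norm)
```

In this setting the remainder should be noticeably smaller than the correction term $\frac{\sigma_\delta^2}{p}\mathbf{B}_k$, which is what motivates subtracting that term in the estimator.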
Finally, since $\sigma_\delta^2$ is not always known, it can be estimated. We choose here to estimate it nonparametrically, using the work of Gasser, Sroka and Jennen-Steinmetz (1986). As the measurement points are equispaced, this estimator is given by
\[
\widehat{\sigma}_\delta^2 = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{6(p-2)}\sum_{j=2}^{p-1}\left[W_i(t_{j-1}) - W_i(t_j) + W_i(t_{j+1}) - W_i(t_j)\right]^2.
\tag{III.10}
\]
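A direct NumPy transcription of (III.10) is sketched below. The data layout (an $n \times p$ array with one curve per row) is an assumption of the example, and `noise_variance_estimate` is a hypothetical name.

```python
import numpy as np


def noise_variance_estimate(W):
    """Estimator (III.10) for curves stored row-wise in an n x p array on an equispaced grid."""
    W = np.asarray(W, dtype=float)
    n, p = W.shape
    # W_i(t_{j-1}) - W_i(t_j) + W_i(t_{j+1}) - W_i(t_j) for j = 2, ..., p-1
    pseudo_residuals = W[:, :-2] - 2.0 * W[:, 1:-1] + W[:, 2:]
    return float(np.sum(pseudo_residuals ** 2) / (6.0 * (p - 2) * n))


t = np.arange(1, 101) / 100
rng = np.random.default_rng(0)
W_noisy = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(200, 100))  # true noise variance 0.09
print(noise_variance_estimate(W_noisy))
```

The factor 6 comes from the variance of the combination $\delta_{i,j-1} - 2\delta_{ij} + \delta_{i,j+1}$ of independent errors, which equals $(1 + 4 + 1)\sigma_\delta^2$; the smooth part of the curve nearly cancels in the second difference when the grid is fine.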
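Finally, the estimator of $\boldsymbol{\theta}$ is given by
\[
\widehat{\boldsymbol{\theta}}_{FTLS} = \frac{1}{n}\left(\frac{1}{n}\mathbf{D}_W^{\tau}\mathbf{D}_W + \rho\mathbf{G}_k - \frac{\widehat{\sigma}_\delta^2}{p}\mathbf{B}_k\right)^{-1}\mathbf{D}_W^{\tau}\mathbf{Y},
\tag{III.11}
\]
and the estimator of $\alpha$ is given by
\[
\widehat{\alpha}_{FTLS,k} = \mathbf{B}_{k,q}^{\tau}\widehat{\boldsymbol{\theta}}_{FTLS}.
\]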
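Once the matrices $\mathbf{D}_W$, $\mathbf{G}_k$ and $\mathbf{B}_k$ are available, formula (III.11) is a single linear solve. The sketch below takes these matrices as inputs and does not build the B-spline basis itself; the function name and argument names are hypothetical.

```python
import numpy as np


def ftls_theta(D_W, Y, G_k, B_k, rho, sigma2_delta, p):
    """Formula (III.11): (1/n) * (D_W'D_W/n + rho*G_k - (sigma2_delta/p)*B_k)^{-1} D_W'Y."""
    n = D_W.shape[0]
    M = D_W.T @ D_W / n + rho * G_k - (sigma2_delta / p) * B_k
    return np.linalg.solve(M, D_W.T @ Y) / n


rng = np.random.default_rng(0)
D_W = rng.normal(size=(60, 5))  # placeholder inputs, for illustration only
Y = rng.normal(size=60)
G_k = np.eye(5)
B_k = np.eye(5)
print(ftls_theta(D_W, Y, G_k, B_k, rho=0.1, sigma2_delta=0.2, p=50))
```

Setting `sigma2_delta=0` recovers the penalized least squares estimator computed from the noisy curves, and additionally setting `rho=0` recovers ordinary least squares on $\mathbf{D}_W$. Since the correction is subtracted, the matrix being inverted can become ill-conditioned if the variance estimate is too large relative to $\rho$.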
III.2.2. Convergence result
The convergence result below gives an upper bound for the rate of convergence of $\widehat{\alpha}_{FTLS}$ to $\alpha$ in the sense of the semi-norm induced by the covariance operator $\Gamma_X$. We assume that $k = k_n \to +\infty$ and $\rho = \rho_n \to 0$ when $n \to +\infty$. We also let $p$ tend to infinity. The proof of this result is based on a decomposition of the form $\widehat{\alpha}_{FTLS} - \alpha = \widehat{\alpha}_{FTLS} - \widehat{\alpha} + \widehat{\alpha} - \alpha$, where $\widehat{\alpha}$ is the regression splines estimator of $\alpha$ introduced by Cardot, Ferraty and Sarda (1999, 2003), which assumes that the curves $X_1, \ldots, X_n$ are directly accessible and not noisy. As these authors have already established a convergence result for $\widehat{\alpha}$, it remains to evaluate the gap between this estimator and the orthogonal least squares one. Thus, to establish our convergence result, we need the hypotheses made by Cardot, Ferraty and Sarda (2003) which ensure the convergence of their estimator. These hypotheses are the following.
(B.1) The variable $X$ satisfies
\[
\|X\| \le c_1 < +\infty, \quad \text{a.s.}
\]
(B.2) The function $\alpha$ admits a derivative of order $p'$ and $\alpha^{(p')}$ satisfies
\[
\left|\alpha^{(p')}(t) - \alpha^{(p')}(s)\right| \le c_2|t - s|^{\nu},
\]
for $s, t \in [0,1]$, where $c_2 > 0$ and $\nu \in [0,1]$. In what follows, we set $d = p' + \nu$ and we assume that $q \ge d \ge m$.
(B.3) The eigenvalues of $\Gamma_X$ are strictly positive.
We also make the following hypotheses, which allow us to control the rate of the gap between $\widehat{\alpha}_{FTLS}$ and $\widehat{\alpha}$.
(B.0) (hypothesis of proposition III.4) The variables $X_i$ satisfy (a.s.)
\[
\sup_{i=1,\ldots,n}\ \sup_{t\in[0,1]}|X_i(t)| \le c_0,
\]
where $c_0$ does not depend on $n$.
(B.4) The variables $\delta_{ij}$ satisfy
\[
\sup_{i=1,\ldots,n}\ \sup_{j=1,\ldots,p} E\left(\delta_{ij}^4\right) \le c_3,
\]
where $c_3$ does not depend on $n$ or $p$.
(B.5) The variables $Y_i$ and $\delta_{ij}$ are independent for all $i = 1, \ldots, n$ and $j = 1, \ldots, p$, and there exists a constant $c_4 > 0$ independent of $n$ such that $\sup_{i=1,\ldots,n} E(Y_i^2) \le c_4$.
We then have the following result.
Proposition III.5. — Sous les hypothèses qui précèdent, en supposant de

plus que 1/p = o(ρn /kn ), qu’il existe une constante c5 > 0 indépendante de n
1/2
telle que kn (hΓX,n α, Bj i)j=1,...,kn +q ≥ c5 et qu’il existe β, γ ∈]0, 1[ tels que
kn ∼ nβ , ρn ∼ n−(1−γ)/2 , on a

1 kn kn
kb
αF T LS,kn − αk2ΓX = OP 2d
+ + ρn + .
kn nρn npρn
The proof is not given here. It simply compares $\hat{\alpha}_{FTLS,k_n}$ with $\hat{\alpha}$, and $\hat{\alpha}$ with $\alpha$. A convergence result for $\| \hat{\alpha} - \alpha \|_{\Gamma_X}$ has already been obtained by Cardot, Ferraty and Sarda (2003). More precisely, under assumptions (B.1) - (B.3), as soon as $1/p = o(\rho_n / k_n)$, we have
\[ \| \hat{\alpha} - \alpha \|_{\Gamma_X}^2 = O_P\left( \frac{1}{k_n^{2d}} + \frac{k_n}{n \rho_n} + \rho_n \right). \]
To prove Proposition III.5, one finally shows that
\[ \| \hat{\alpha}_{FTLS,k_n} - \hat{\alpha} \|_{\Gamma_X}^2 = O_P\left( \frac{k_n}{n p \rho_n} \right), \]
this last result being proved in the same way as Theorem III.2 (see the next chapter).
III.2.3. Comments
Rate of convergence. The term $k_n / (n p \rho_n)$ is negligible compared with $k_n / (n \rho_n)$ when $p$ is large enough. In other words, once the number of measurement points is sufficiently large, the effect of the noise is negligible. Under this assumption, an optimal rate can be obtained by choosing particular $\rho_n$ and $k_n$ (see Cardot, Ferraty and Sarda, 2003). Taking $\rho_n \sim n^{-2d/(4d+1)}$ and $k_n \sim n^{1/(4d+1)}$, we obtain
\[ \| \hat{\alpha}_{FTLS,k_n} - \alpha \|_{\Gamma_X}^2 = O_P\left( n^{-2d/(4d+1)} \right). \]
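The choice of $\rho_n$ and $k_n$ above balances the three leading terms of the bound. The following short sketch checks this arithmetic with exact fractions; the function name `rate_exponents` is introduced here for illustration only and is not part of the original work.

```python
from fractions import Fraction


def rate_exponents(d):
    """Exponents of n for the three leading terms of the bound,
    with k_n ~ n^{1/(4d+1)} and rho_n ~ n^{-2d/(4d+1)}."""
    d = Fraction(d)
    k_exp = 1 / (4 * d + 1)          # exponent of n in k_n
    rho_exp = -2 * d / (4 * d + 1)   # exponent of n in rho_n
    return {
        "1/k^(2d)": -2 * d * k_exp,           # approximation term
        "k/(n rho)": k_exp - 1 - rho_exp,     # variance term
        "rho": rho_exp,                       # penalty term
    }


exponents = rate_exponents(2)
for name, value in exponents.items():
    print(name, value)
```

All three exponents coincide with $-2d/(4d+1)$, which is why no term dominates the others for this choice.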
Effect of the deregularization. Let us look at what happens if $\theta$ is estimated without any deregularization, but using the available curves $W_1, \ldots, W_n$. More precisely, we have $\hat{\alpha}_W = \mathbf{B}_{k,q}^{\tau} \hat{\theta}_W$ with
\[ \hat{\theta}_W = \frac{1}{n} \left( \frac{1}{n} \mathbf{D}_W^{\tau} \mathbf{D}_W + \rho \mathbf{G}_k \right)^{-1} \mathbf{D}_W^{\tau} \mathbf{Y}. \]
Then, with arguments comparable to those used in the proof of the previous result, we obtain, if $p$ is large enough, the same rate as with the total least squares estimator. In this case, the correction induced by the total least squares method does not seem to have a fundamental impact on the rate of convergence. However, simulations show that the total least squares estimator performs better (see the next chapter).
III.2.4. Smoothing spline estimator
So far, we have considered the regression spline estimator. We now consider a smoothing spline estimator. It is built as a correction of the smoothing spline estimator introduced in Part II, directly inspired by the total least squares method. The construction is very close to the one used for regression splines. The deregularization differs: it no longer involves the matrix $\mathbf{B}_k$ but the identity matrix. This estimator is presented and studied in the article by Cardot, Crambes, Kneip and Sarda (2006), given in the next chapter.
III.2.5. Perspectives
Models with noisy variables will certainly develop further in the future, since they seem well suited to reality (as soon as variables are measured, they are necessarily contaminated by errors). The prospects for future work on the model defined by (2) and (8) are therefore numerous.
The most natural idea for removing the noise from the explanatory curve is to smooth it (for example with a kernel smoother). This method has started to be considered and preliminary work is in progress. The beginning of this study is presented in Part III.4 of this thesis. The first results seem encouraging.
Still concerning a noisy explanatory variable, it may be possible to consider the estimation of conditional quantiles. Transposing the total least squares method to this context is a priori not obvious, mainly because the minimization problem associated with quantile estimation has no explicit solution.
Finally, another perspective in this context of a noisy explanatory variable involves longer-term work. The idea would be to consider a continuous-time noise. A problem arises immediately: there is no white noise in continuous time. One can then consider a continuous noise $\delta(t)$ satisfying, for example, a mixing assumption.
III.3. FUNCTIONAL LINEAR REGRESSION WITH ERRORS-IN-VARIABLES
III.3.1. Introduction
A very common problem in statistics is to explain the effect of a covariate on a response (the variable of interest). While the covariate is usually considered as a vector of scalars, in many applications (for instance in climatology, remote sensing, linguistics, ...) the data now come from the observation of a continuous phenomenon over time or space: see Ramsay and Silverman (2002) or Ferraty and Vieu (2006) for examples. The increasing performance of measurement instruments makes it possible to collect these data on dense grids, and they cannot be considered anymore as variables taking values in $\mathbb{R}^p$. This has made it necessary to develop ad hoc techniques for this kind of data, which have been popularized under the name of functional data analysis and have been studied in depth in recent years (for a theoretical and practical overview of functional data analysis, we refer to the books of Bosq, 2000, Ramsay and Silverman, 1997, Ramsay and Silverman, 2002 and Ferraty and Vieu, 2006).
Our study takes place in this framework of functional data analysis, in the context of the regression estimation mentioned above. Thus, we consider here the case of a functional covariate while the response is scalar. To be more precise, we first consider observations $(X_i, Y_i)_{i=1,\ldots,n}$, where the $X_i$'s are real functions defined on an interval $I$ of $\mathbb{R}$, each assumed to be square integrable over $I$. As usually assumed in the literature, we then work on the separable real Hilbert space $L^2(I)$ of functions $f$ defined on $I$ such that $\int_I f(t)^2 \, dt$ is finite. This space is endowed with its usual inner product $\langle \cdot, \cdot \rangle$ defined by $\langle f, g \rangle = \int_I f(t) g(t) \, dt$ for $f, g \in L^2(I)$, and the associated norm is denoted by $\| \cdot \|_{L^2}$. Now, the model we consider to summarize the link between the covariates $X_i$ and the responses $Y_i$ is a linear model introduced in Ramsay and Dalzell (1991) and defined by
\[ Y_i = \int_I \alpha(t) X_i(t) \, dt + \epsilon_i, \qquad i = 1, \ldots, n, \tag{III.12} \]
where $\alpha \in L^2(I)$ is an unknown functional parameter and $\epsilon_i$, $i = 1, \ldots, n$, are i.i.d. real random variables satisfying $E(\epsilon_i) = 0$ and $E(\epsilon_i^2) = \sigma_\epsilon^2$. The functional parameter $\alpha$ has been estimated in various ways in the literature: see Ramsay and Silverman (1997), Marx and Eilers (1999), Cardot, Ferraty and Sarda (1999) and Cardot, Ferraty and Sarda (2003). Here, our final goal is to deal with the problem of estimating $\alpha$ in the case where $X_i(t)$ is corrupted by some unobservable error.
Before going further, let us note that there can be different ways to generate the curves $X_i$. One possibility is a fixed design, that is, $X_1, \ldots, X_n$ are fixed, non-random functions. Examples are experiments in chemical or engineering applications, where $X_i$ corresponds to functional responses obtained under various predetermined experimental conditions (see for instance Cuevas, Febrero and Fraiman, 2002). In other applications one may assume a random design, where $X_1, \ldots, X_n$ are an i.i.d. sample. In any case, $Y_1, \ldots, Y_n$ are independent and the expectations always refer to the probability distribution induced by the random variables $\epsilon_1, \ldots, \epsilon_n$ only. In the case of a random design, they thus formally have to be interpreted as conditional expectations given $X_1, \ldots, X_n$. This implies for instance that $E(\epsilon_i | X_i) = 0$ and $E(\epsilon_i^2 | X_i) = \sigma_\epsilon^2$.
In what precedes it is implicitly assumed that the curves $X_i$ are observed without error (in model (III.12) all the errors are confined to the variable $Y_i$ through $\epsilon_i$). Unfortunately, this assumption does not seem very realistic in practice, and many errors (instrument errors, human errors, ...) prevent us from knowing $X_1, \ldots, X_n$ exactly. Furthermore, the whole curves are not available in practice, so we suppose in the following that the curves are observed at $p$ discretization points $t_1 < \ldots < t_p$ belonging to $I$, which we take equispaced for simplicity. Taking from now on $I = [0, 1]$ to simplify the notation, we thus have $t_j - t_{j-1} = 1/p$ for all $j = 2, \ldots, p$. Thus, we observe discrete noisy trajectories
\[ W_i(t_j) = X_i(t_j) + \delta_{ij}, \qquad i = 1, \ldots, n, \; j = 1, \ldots, p, \tag{III.13} \]
where $(\delta_{ij})_{i=1,\ldots,n, \, j=1,\ldots,p}$ is a sequence of independent real random variables such that, for all $i = 1, \ldots, n$ and $j = 1, \ldots, p$,
\[ E(\delta_{ij}) = 0 \quad \text{and} \quad E(\delta_{ij}^2) = \sigma_\delta^2. \]
The noise components $δ_{ij}$ are not discrete realizations of a continuous-time "random noise" stochastic process and must be interpreted as random measurement errors at the finite discretization points (see e.g. Cardot, 2000 and Chiou, Müller and Wang, 2003 for similar points of view).
The problem of the Errors-in-Variables linear model has already been studied in many ways in the case where the covariate takes values in $\mathbb{R}$ or $\mathbb{R}^p$, that is to say when it is univariate or multivariate. For instance, the maximum likelihood method has been applied to this context (see Fuller, 1987), and asymptotic results have been obtained (see for example Gleser, 1981). Because this problem is strongly linked to the problem of solving linear systems
\[ \mathbf{A} \mathbf{x} \approx \mathbf{b}, \]
where $\mathbf{x} \in \mathbb{R}^p$ is unknown, $\mathbf{b} \in \mathbb{R}^n$ and $\mathbf{A}$ is a matrix of size $n \times p$, some numerical approaches have also been proposed. One of the most famous is the Total Least Squares (TLS) method (see for example Golub and Van Loan, 1980 or Van Huffel and Vandewalle, 1991).
Now, coming back to model (III.12), very little work has been done in the Errors-in-Variables case: in a recent work, Chiou, Müller and Wang (2003) propose a two-step approach which consists in first smoothing the noisy trajectories in order to get denoised curves, and then building functional estimators. The point of view adopted here is quite different and deals with the extension of the TLS approach to the functional linear model.
Let us describe our formal framework for Errors-in-Variables, which is inspired by what is done in the literature. We introduce a discretized version of the inner product $\langle \cdot, \cdot \rangle$, denoted by $\langle \cdot, \cdot \rangle_p$ and defined for $f, g \in L^2(I)$ by
\[ \langle f, g \rangle_p = \frac{1}{p} \sum_{j=1}^{p} f(t_j) g(t_j). \]
This approximation of $\langle \cdot, \cdot \rangle$ by $\langle \cdot, \cdot \rangle_p$ is valid only if $p$ is large enough, so we assume this from now on. In this context of discretized curves, relation (III.12) then writes
\[ Y_i = \frac{1}{p} \sum_{j=1}^{p} \alpha(t_j) X_i(t_j) + \epsilon_i, \qquad i = 1, \ldots, n. \tag{III.14} \]
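As a small illustration of the discretized inner product $\langle \cdot, \cdot \rangle_p$, the sketch below evaluates it on an equispaced grid and compares it with a known integral. The grid $t_j = j/p$ and the helper name `inner_p` are assumptions made for this example; the text only requires equispaced points with spacing $1/p$.

```python
import numpy as np


def inner_p(f_vals, g_vals):
    """Discretized inner product <f, g>_p = (1/p) * sum_j f(t_j) g(t_j)."""
    f_vals = np.asarray(f_vals, dtype=float)
    g_vals = np.asarray(g_vals, dtype=float)
    return float(np.mean(f_vals * g_vals))


p = 200
t = np.arange(1, p + 1) / p          # assumed grid t_j = j/p on [0, 1]
f = np.sin(2 * np.pi * t)
approx = inner_p(f, f)               # approximates the integral of sin^2(2*pi*t) over [0, 1]
exact = 0.5
print(approx, exact)
```

For smooth integrands the two values agree closely when $p$ is large, which is the sense in which (III.14) replaces (III.12).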
Finally, the problem is to estimate $\alpha$ using the data $(W_i(t_j), Y_i)_{i=1,\ldots,n, \, j=1,\ldots,p}$, where $W_1(t_j), \ldots, W_n(t_j)$ are noisy observations of $X_1(t_j), \ldots, X_n(t_j)$ for $j = 1, \ldots, p$. We present a generalization of the TLS method to the case where $X_i$ is a functional random variable. As in the multivariate case, the TLS method consists in modifying a (penalized) least squares estimator of $\alpha$ built for non-noisy observations: see Marx and Eilers (1999) and Cardot, Ferraty and Sarda (2003) for estimators of this kind based on B-splines with two different penalties. Here, we introduce another estimator based on smoothing splines which, as far as we know, has not been studied previously in the literature. Some convergence results are also given in this first section (in the non-noisy case); they serve as a basis for the convergence results on the TLS estimator given in the following section. A more detailed study of the asymptotic behavior of the smoothing splines estimator will be the subject of a forthcoming work. Then, the convergence results for the TLS estimator are commented on, and we present some numerical simulations evaluating our estimation procedure. Finally, we give the proofs of our results.
III.3.2. Estimation of α in the non-noisy case
We adopt the following matrix notations: $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\tau}$, $\mathbf{X}_i = (X_i(t_1), \ldots, X_i(t_p))^{\tau}$ for all $i = 1, \ldots, n$, $\boldsymbol{\alpha} = (\alpha(t_1), \ldots, \alpha(t_p))^{\tau}$ and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^{\tau}$. Moreover, we denote by $\mathbf{X}$ the $n \times p$ matrix with general term $X_i(t_j)$ for all $i = 1, \ldots, n$ and for all $j = 1, \ldots, p$. Using these notations, the model (III.14) then writes
\[ \mathbf{Y} = \frac{1}{p} \mathbf{X} \boldsymbol{\alpha} + \boldsymbol{\epsilon}. \tag{III.15} \]
In this section, we first assume that $\mathbf{X}$ is observable without errors, and our estimation procedure for $\boldsymbol{\alpha}$ is motivated by the popular smoothing splines approach. We want to estimate $\alpha$ as a smooth function, i.e. we assume that $\alpha$ is $m$ times differentiable for some fixed $m \in \mathbb{N}^*$.
At first we briefly come back to the smoothing splines procedure in the usual univariate case. For some noisy observations $z_i$ of a smooth function $f(t_i)$ at design points $t_1, \ldots, t_p$, an estimate $\hat{f}$ is obtained by minimizing
\[ \frac{1}{p} \sum_i (z_i - v(t_i))^2 + \rho \int_I v^{(m)}(t)^2 \, dt \]
for some smoothing parameter $\rho > 0$. Minimization takes place over all functions $v$ in an $m$-th order Sobolev space, that is $D^m v \in L^2(I)$. It can be shown (for an overview of results in spline theory, consider de Boor, 1978, and Eubank, 1988) that the solution $\hat{f}$ is in the space $NS^m(t_1, \ldots, t_p)$ of natural splines of order $2m$ with knots at $t_1, \ldots, t_p$. This is a $p$-dimensional linear function space with $D^m v \in L^2(I)$ for any $v \in NS^m(t_1, \ldots, t_p)$, and there exist basis functions $b_1, \ldots, b_p$ such that
\[ NS^m(t_1, \ldots, t_p) = \Big\{ \sum_j \theta_j b_j \; \Big| \; \theta_1, \ldots, \theta_p \in \mathbb{R} \Big\}. \]
Different possible basis functions proposed by various authors are discussed in Eubank (1988). An important property of natural splines is that for any vector $\mathbf{w} = (w_1, \ldots, w_p)^{\tau} \in \mathbb{R}^p$, there exists a unique natural spline interpolant $s_{\mathbf{w}}$ with $s_{\mathbf{w}}(t_j) = w_j$, $j = 1, \ldots, p$. With $\mathbf{b}(t) = (b_1(t), \ldots, b_p(t))^{\tau}$ and $\mathbf{B}$ denoting the $p \times p$ matrix with elements $b_i(t_j)$, $s_{\mathbf{w}}$ is given by
\[ s_{\mathbf{w}}(t) = \mathbf{b}(t)^{\tau} (\mathbf{B}^{\tau} \mathbf{B})^{-1} \mathbf{B}^{\tau} \mathbf{w}. \tag{III.16} \]
Moreover, such a spline interpolant satisfies the following property:
\[ \int_I s_{\mathbf{w}}^{(m)}(t)^2 \, dt \le \int_I f^{(m)}(t)^2 \, dt \quad \text{for any other function } f \text{ with } f^{(m)} \in L^2(I) \text{ and } f(t_j) = w_j, \; j = 1, \ldots, p. \tag{III.17} \]
The inequality (III.17) implies that the solution $\hat{f}$ is given by $\hat{f} = s_{\hat{\mathbf{w}}}$, where $\hat{\mathbf{w}}$ is obtained by minimizing $\frac{1}{p} \sum_i (z_i - w_i)^2 + \rho \int_I s_{\mathbf{w}}^{(m)}(t)^2 \, dt$ over all vectors $\mathbf{w} \in \mathbb{R}^p$.
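These ideas readily generalize to the problem of estimating the vector $\boldsymbol{\alpha}$ in (III.15), and then the function $\alpha$. An estimator $\hat{\boldsymbol{\alpha}}^*_{FLS,X}$ may be obtained by solving the minimization problem
\[ \min_{\mathbf{a} \in \mathbb{R}^p} \left\{ \frac{1}{n} \left\| \mathbf{Y} - \frac{1}{p} \mathbf{X} \mathbf{a} \right\|^2 + \rho \int_I s_{\mathbf{a}}^{(m)}(t)^2 \, dt \right\}, \tag{III.18} \]
where $\| \cdot \|$ stands for the usual Euclidean norm, and $\rho > 0$ is a smoothing parameter allowing a trade-off between the goodness-of-fit to the data and the smoothness of the fit. By (III.16), we have $\int_I s_{\mathbf{a}}^{(m)}(t)^2 \, dt = \mathbf{a}^{\tau} \mathbf{A}^*_m \mathbf{a}$, where
\[ \mathbf{A}^*_m = \mathbf{B} (\mathbf{B}^{\tau} \mathbf{B})^{-1} \left[ \int_I \mathbf{b}^{(m)}(t) \mathbf{b}^{(m)}(t)^{\tau} \, dt \right] (\mathbf{B}^{\tau} \mathbf{B})^{-1} \mathbf{B}^{\tau} \]
is a $p \times p$ matrix. Therefore, (III.18) can be reformulated in the form
\[ \min_{\mathbf{a} \in \mathbb{R}^p} \left\{ \frac{1}{n} \left\| \mathbf{Y} - \frac{1}{p} \mathbf{X} \mathbf{a} \right\|^2 + \rho \, \mathbf{a}^{\tau} \mathbf{A}^*_m \mathbf{a} \right\}, \tag{III.19} \]
leading to the solution
\[ \hat{\boldsymbol{\alpha}}^*_{FLS,X} = \frac{1}{np} \left( \frac{1}{np^2} \mathbf{X}^{\tau} \mathbf{X} + \rho \mathbf{A}^*_m \right)^{-1} \mathbf{X}^{\tau} \mathbf{Y} = \frac{1}{n} \left( \frac{1}{np} \mathbf{X}^{\tau} \mathbf{X} + \rho p \mathbf{A}^*_m \right)^{-1} \mathbf{X}^{\tau} \mathbf{Y}. \]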
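The closed form above follows from setting the gradient of (III.19) to zero. The sketch below checks numerically that the two expressions of the solution coincide and that the gradient vanishes there. The penalty matrix `A` is a randomly generated symmetric positive definite stand-in for $\mathbf{A}^*_m$ (an assumption made to keep the example self-contained; it does not compute the natural spline penalty), and the function names are hypothetical.

```python
import numpy as np


def fls_star(X, Y, A, rho):
    """First form: (1/(np)) * (X'X/(np^2) + rho*A)^{-1} X'Y."""
    n, p = X.shape
    lhs = X.T @ X / (n * p**2) + rho * A
    return np.linalg.solve(lhs, X.T @ Y) / (n * p)


def fls_star_alt(X, Y, A, rho):
    """Second form: (1/n) * (X'X/(np) + rho*p*A)^{-1} X'Y."""
    n, p = X.shape
    lhs = X.T @ X / (n * p) + rho * p * A
    return np.linalg.solve(lhs, X.T @ Y) / n


rng = np.random.default_rng(0)
n, p, rho = 30, 8, 0.1
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)
M = rng.normal(size=(p, p))
A = M.T @ M                      # hypothetical stand-in for the penalty matrix

a1 = fls_star(X, Y, A, rho)
a2 = fls_star_alt(X, Y, A, rho)
# Gradient of (1/n)||Y - Xa/p||^2 + rho * a'Aa evaluated at a1.
grad = -2.0 / (n * p) * X.T @ (Y - X @ a1 / p) + 2.0 * rho * A @ a1
print(np.max(np.abs(a1 - a2)), np.max(np.abs(grad)))
```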
However, there is a problem with this estimator which is due to the structure of the eigenvalues of $p\mathbf{A}^*_m$. These eigenvalues have been studied by many authors and a discussion of general results is given by Eubank (1988). The most precise results in our context are presented in Utreras (1983). It is shown that this matrix has exactly $m$ zero eigenvalues $\mu_{1,p} = \ldots = \mu_{m,p} = 0$, while as $p \to \infty$,
\[ \sum_{j=m+1}^{p} \frac{1}{\mu_{j,p}} \longrightarrow \sum_{j=m+1}^{\infty} (\pi j)^{-2m}, \tag{III.20} \]
where $0 < \mu_{m+1,p} < \ldots < \mu_{p,p}$ denote the $p - m$ non-zero eigenvalues of $p\mathbf{A}^*_m$. The series given in (III.20) converges for $m \neq 0$, so we assume this in the following.
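Due to the $m$ zero eigenvalues, existence of $\hat{\boldsymbol{\alpha}}^*_{FLS,X}$ can only be guaranteed by introducing constraints on the structure of $\mathbf{X}$. This can, however, be avoided by introducing a minor modification of this estimator. The $m$-dimensional eigenspace corresponding to $\mu_{1,p} = \ldots = \mu_{m,p} = 0$ is the linear vector space generated by all (discretized) polynomials of degree $m - 1$, that is, $E_m$ consists of all vectors $\mathbf{w} \in \mathbb{R}^p$ with $w_i = \theta_1 + \sum_{j=1}^{m-1} \theta_{j+1} t_i^j$, $i = 1, \ldots, p$, for some coefficients $\theta_1, \ldots, \theta_m$. Let $\mathbf{P}_m$ denote the $p \times p$ projection matrix projecting onto the space $E_m$, and set $\mathbf{A}_m = \mathbf{P}_m + p\mathbf{A}^*_m$. Our final estimator $\hat{\boldsymbol{\alpha}}_{FLS,X}$ is then defined by
\[ \hat{\boldsymbol{\alpha}}_{FLS,X} = \frac{1}{np} \left( \frac{1}{np^2} \mathbf{X}^{\tau} \mathbf{X} + \frac{\rho}{p} \mathbf{A}_m \right)^{-1} \mathbf{X}^{\tau} \mathbf{Y} = \frac{1}{n} \left( \frac{1}{np} \mathbf{X}^{\tau} \mathbf{X} + \rho \mathbf{A}_m \right)^{-1} \mathbf{X}^{\tau} \mathbf{Y}, \tag{III.21} \]
and a corresponding estimator of $\alpha$ is provided by $\hat{\alpha}_{FLS,X} = s_{\hat{\boldsymbol{\alpha}}_{FLS,X}}$. It is immediately verified that $\hat{\boldsymbol{\alpha}}_{FLS,X}$ is the solution of the modified minimization problem
\[ \min_{\mathbf{a} \in \mathbb{R}^p} \left\{ \frac{1}{n} \left\| \mathbf{Y} - \frac{1}{p} \mathbf{X} \mathbf{a} \right\|^2 + \frac{\rho}{p} \mathbf{a}^{\tau} \mathbf{A}_m \mathbf{a} \right\}. \]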
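The projection matrix $\mathbf{P}_m$ onto the space $E_m$ of discretized polynomials of degree $m - 1$ can be sketched as follows. This is one possible construction (an orthonormal basis obtained by a QR factorization of a Vandermonde matrix); the grid $t_j = j/p$ and the function name `polynomial_projection` are assumptions of the example, not taken from the original work.

```python
import numpy as np


def polynomial_projection(t, m):
    """Orthogonal projector onto span{1, t, ..., t^(m-1)} evaluated at the grid t."""
    t = np.asarray(t, dtype=float)
    V = np.vander(t, N=m, increasing=True)   # columns 1, t, ..., t^(m-1)
    Q, _ = np.linalg.qr(V)                   # orthonormal basis of E_m (p x m)
    return Q @ Q.T


p, m = 50, 2
t = np.arange(1, p + 1) / p                  # assumed equispaced grid
P = polynomial_projection(t, m)
line = 3.0 + 2.0 * t                         # a discretized polynomial of degree 1
print(np.allclose(P @ line, line), round(float(np.trace(P))))
```

The projector is symmetric and idempotent with trace $m$, and it leaves any discretized polynomial of degree $m - 1$ unchanged, which is why adding $\mathbf{P}_m$ to $p\mathbf{A}^*_m$ only affects the null space of the penalty.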
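By definition, the matrix $\mathbf{A}_m$ possesses $m$ eigenvalues equal to 1, while the remaining $p - m$ eigenvalues coincide with the eigenvalues $\mu_{m+1,p} < \ldots < \mu_{p,p}$ of $p\mathbf{A}^*_m$. Thus, by (III.20), we obtain
\[ \mathrm{Tr}\left( \mathbf{A}_m^{-1} \right) \longrightarrow \sum_{j=m+1}^{\infty} (\pi j)^{-2m} + m =: D_0 \]
as $p \to \infty$. It follows that for any constant $D_1 > D_0$ there exists a $p_0 \in \mathbb{N}$ such that
\[ \mathrm{Tr}\left( \mathbf{A}_m^{-1} \right) \le D_1 \tag{III.22} \]
for all $p \ge p_0$.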
We will now study the behavior of our estimator for large values of $n$ and $p$. It will be evaluated with respect to the semi-norm
\[ \| \mathbf{u} \|^2_{\Gamma_{X,n,p}} = \frac{1}{p} \mathbf{u}^{\tau} \left( \frac{1}{np} \mathbf{X}^{\tau} \mathbf{X} \right) \mathbf{u}. \]
It is well-known that functional linear regression belongs to the class of ill-posed problems. The semi-norm $\| \cdot \|_{\Gamma_{X,n,p}}$ may be seen as a discretized version of the $L^2$ semi-norms which are usually applied in this context. It is not possible to derive any bound for the bias by using the Euclidean norm. Suppose, for example, that all functions $X_i$ lie in a low dimensional linear function space $\mathcal{X}$. Then any structure of $\alpha$ which is orthogonal to $\mathcal{X}$ cannot be identified from the data.
We make the following regularity assumption on $\alpha$.

(H.1) For some $m \in \mathbb{N}^*$, $\alpha$ is $m$ times differentiable and $\alpha^{(m)} \in L^2(I)$.
Then, let $D_2 = \int_I \alpha^{(m)}(t)^2 \, dt$ and $D_3^* = \int_I \alpha(t)^2 \, dt$. By construction of $\mathbf{P}_m$, $\mathbf{P}_m \boldsymbol{\alpha}$ provides the best approximation (in a least squares sense) of $\boldsymbol{\alpha}$ by (discretized) polynomials of degree $m - 1$, and $\frac{1}{p} \boldsymbol{\alpha}^{\tau} \mathbf{P}_m \boldsymbol{\alpha} \le \frac{1}{p} \boldsymbol{\alpha}^{\tau} \boldsymbol{\alpha} \longrightarrow D_3^*$ as $p \to \infty$. Let $D_3$ denote an arbitrary constant with $D_3^* < D_3 < \infty$. There then exists a $p_1 \in \mathbb{N}$ with $p_1 \ge p_0$ such that $\frac{1}{p} \boldsymbol{\alpha}^{\tau} \mathbf{P}_m \boldsymbol{\alpha} \le D_3$ for all $p \ge p_1$.
As noticed before, $X_1, \ldots, X_n$ can be either fixed, non-random functions or an i.i.d. sample of random functions. In any case, expected values and variances of $\hat{\boldsymbol{\alpha}}_{FLS,X}$ as stated in the theorem refer to the probability distribution induced by the random variable $\boldsymbol{\epsilon}$. In the case of a random design, they stand for conditional expectations given $X_1, \ldots, X_n$.
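Theorem III.1. Under assumption (H.1) and the definitions of $D_1$, $D_2$, $D_3$, $p_1$, we obtain for all $n \in \mathbb{N}$, all $p \ge p_1$ and every matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$
\[ \| E(\hat{\boldsymbol{\alpha}}_{FLS,X}) - \boldsymbol{\alpha} \|^2_{\Gamma_{X,n,p}} \le \rho \left( \frac{1}{p} \boldsymbol{\alpha}^{\tau} \mathbf{P}_m \boldsymbol{\alpha} + D_2 \right) \le \rho \left( D_3 + D_2 \right), \tag{III.23} \]
as well as
\[ \frac{1}{p} E\left( \| \hat{\boldsymbol{\alpha}}_{FLS,X} - E(\hat{\boldsymbol{\alpha}}_{FLS,X}) \|^2 \right) \le \frac{\sigma_\epsilon^2}{n \rho} D_1. \tag{III.24} \]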
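Remark. When adding a constraint like
\[ \text{(H.2)} \qquad \sup_{i=1,\ldots,n} \; \sup_{t \in [0,1]} |X_i(t)| \le D_4 < +\infty, \]
with $D_4$ independent of $n$, or when (H.2) is almost surely satisfied in the case of a random design, we have
\[ E\left( \| \hat{\boldsymbol{\alpha}}_{FLS,X} - E(\hat{\boldsymbol{\alpha}}_{FLS,X}) \|^2_{\Gamma_{X,n,p}} \right) \le \frac{D_4^2}{p} E\left( \| \hat{\boldsymbol{\alpha}}_{FLS,X} - E(\hat{\boldsymbol{\alpha}}_{FLS,X}) \|^2 \right), \]
and the theorem implies that
\[ \| \hat{\boldsymbol{\alpha}}_{FLS,X} - \boldsymbol{\alpha} \|^2_{\Gamma_{X,n,p}} = O_P\left( n^{-1/2} \right) \]
if $\rho = \rho_n \sim n^{-1/2}$ as $n \to \infty$. This rate compares favorably to existing rates in the literature.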
III.3.3. Total Least Squares method for functional covariates
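We now address the estimation of $\alpha$ from noisy covariates. At first, let us describe how the TLS method works in the case of a covariate belonging to $\mathbb{R}^p$. In that case, we have
\[ Y_i = \mathbf{X}_i^{\tau} \boldsymbol{\alpha} + \epsilon_i, \qquad i = 1, \ldots, n, \]
and
\[ \mathbf{W}_i = \mathbf{X}_i + \boldsymbol{\delta}_i, \qquad i = 1, \ldots, n, \]
where $\boldsymbol{\alpha}$, $\mathbf{X}_i$, $\mathbf{W}_i$ and $\boldsymbol{\delta}_i$ are vectors of $\mathbb{R}^p$. The TLS approach relies on the simultaneous estimation of $\boldsymbol{\alpha}$ and $\mathbf{X}_i$ by considering the minimization problem (see for example Van Huffel and Vandewalle, 1991)
\[ \min_{\mathbf{a} \in \mathbb{R}^p, \, \mathbf{X}_i \in \mathbb{R}^p} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ (Y_i - \mathbf{X}_i^{\tau} \mathbf{a})^2 + (\mathbf{X}_i - \mathbf{W}_i)^{\tau} (\mathbf{X}_i - \mathbf{W}_i) \right] \right\}. \tag{III.25} \]
The TLS algorithm solving (III.25) is given in Van Huffel and Vandewalle (1991). In some cases, the singular values of the matrix $\mathbf{W}$ can quickly decrease to zero, and the minimization problem (III.25) is then ill-conditioned. A possible way to circumvent this problem is to introduce a regularization in (III.25), and the minimization problem we consider is then
\[ \min_{\mathbf{a} \in \mathbb{R}^p, \, \mathbf{X}_i \in \mathbb{R}^p} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ (Y_i - \mathbf{X}_i^{\tau} \mathbf{a})^2 + (\mathbf{X}_i - \mathbf{W}_i)^{\tau} (\mathbf{X}_i - \mathbf{W}_i) \right] + \rho \, \mathbf{a}^{\tau} \mathbf{L}^{\tau} \mathbf{L} \mathbf{a} \right\}, \tag{III.26} \]
where $\mathbf{L}$ is a $p \times p$ matrix and $\rho$ is a regularization parameter which makes it possible to deal with the ill-conditioning of the design matrix $\mathbf{W}^{\tau} \mathbf{W}$ (see Golub, Hansen and O'Leary, 1999). Indeed, the TLS solution in $\mathbf{a} \in \mathbb{R}^p$ to the minimization problem (III.26) is given by
\[ \hat{\boldsymbol{\alpha}}_{TLS,pen} = \left( \mathbf{W}^{\tau} \mathbf{W} + \rho \mathbf{L}^{\tau} \mathbf{L} - \sigma_k^2 \mathbf{I}_p \right)^{-1} \mathbf{W}^{\tau} \mathbf{Y}, \tag{III.27} \]
where $\sigma_k$ is the smallest non-zero singular value of the matrix $(\mathbf{W}, \mathbf{Y})$ and $\mathbf{I}_p$ is the $p \times p$ identity matrix.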
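For the unpenalized case $\rho = 0$, the closed form (III.27) can be compared with the classical SVD-based TLS solution, which reads the estimate off the right singular vector of $(\mathbf{W}, \mathbf{Y})$ associated with its smallest singular value. The sketch below is a minimal numerical check of that identity on simulated multivariate data; the function names and the simulated data are assumptions of the example, and the penalized case is deliberately left out.

```python
import numpy as np


def tls_svd(W, Y):
    """Classical TLS: use the last right singular vector of the augmented matrix [W, Y]."""
    p = W.shape[1]
    _, _, Vt = np.linalg.svd(np.column_stack([W, Y]), full_matrices=False)
    v = Vt[-1]                       # singular vector of the smallest singular value
    return -v[:p] / v[p]


def tls_closed_form(W, Y):
    """Closed form (W'W - sigma_k^2 I)^{-1} W'Y, i.e. (III.27) with rho = 0."""
    p = W.shape[1]
    s = np.linalg.svd(np.column_stack([W, Y]), compute_uv=False)
    sigma_k = s[-1]                  # smallest singular value of [W, Y]
    return np.linalg.solve(W.T @ W - sigma_k**2 * np.eye(p), W.T @ Y)


rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))                       # unobserved covariates
alpha = np.array([1.0, -2.0, 0.5, 3.0])           # illustrative coefficient vector
Y = X @ alpha + rng.normal(scale=0.1, size=n)
W = X + rng.normal(scale=0.1, size=(n, p))        # noisy covariates

a_svd = tls_svd(W, Y)
a_closed = tls_closed_form(W, Y)
print(a_svd, a_closed)
```

The subtraction of $\sigma_k^2 \mathbf{I}_p$ is what distinguishes TLS from ordinary least squares: it removes the inflation of $\mathbf{W}^{\tau} \mathbf{W}$ caused by the noise in the covariates.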
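In our functional situation, we consider model (III.15) and, using the same matrix notations as in the previous section, we write
\[ \mathbf{W} = \mathbf{X} + \boldsymbol{\delta}, \]
where $\mathbf{W}$ and $\boldsymbol{\delta}$ are the $n \times p$ matrices with respective general terms $W_i(t_j)$ and $\delta_{ij}$. So, the minimization problem we consider now is the following one: we are looking for an estimation $\hat{\boldsymbol{\alpha}}^*_{FTLS}$ of $\boldsymbol{\alpha}$, solution of the minimization problem
\[ \min_{\mathbf{a} \in \mathbb{R}^p, \, \mathbf{X}_i \in \mathbb{R}^p} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ \left( Y_i - \frac{1}{p} \mathbf{X}_i^{\tau} \mathbf{a} \right)^2 + \frac{1}{p} \| \mathbf{X}_i - \mathbf{W}_i \|^2 \right] + \frac{\rho}{p} \mathbf{a}^{\tau} \mathbf{A}_m \mathbf{a} \right\}, \tag{III.28} \]
where the matrix $\mathbf{A}_m$ is the one introduced previously. Now, with these notations, we have the following result.
Proposition III.6. The solution in $\mathbf{a} \in \mathbb{R}^p$ of the minimization problem (III.28) is given by
\[ \hat{\boldsymbol{\alpha}}^*_{FTLS} = \frac{1}{np} \left( \frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} + \frac{\rho}{p} \mathbf{A}_m - \sigma_k^2 \mathbf{I}_p \right)^{-1} \mathbf{W}^{\tau} \mathbf{Y}, \tag{III.29} \]
where $\sigma_k^2$ is the smallest non-zero eigenvalue of the matrix
\[ \frac{1}{n} \left( \frac{\mathbf{W}}{p}, \mathbf{Y} \right)^{\tau} \left( \frac{\mathbf{W}}{p}, \mathbf{Y} \right) + \frac{\rho}{p} \begin{pmatrix} \mathbf{A}_m & 0 \\ 0 & 0 \end{pmatrix}. \]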
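In equation (III.29), computational problems can appear due to the value of $\sigma_k^2$, which may be close to zero. Indeed, the eigenvalues of $\frac{1}{n} \left( \frac{\mathbf{W}}{p}, \mathbf{Y} \right)^{\tau} \left( \frac{\mathbf{W}}{p}, \mathbf{Y} \right)$ are known to decrease rapidly to zero, and this can of course cause numerical problems with the computation of $\sigma_k^2$. Nevertheless, we can circumvent this problem using the following result.
Proposition III.7. Suppose that for some constant $D_5 > 0$ independent of $n$ and $p$,
\[ \text{(H.3)} \qquad E\left( \delta_{ir}^4 \right) \le D_5. \]
Then, if moreover (H.2) holds, we have
\[ \frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} = \frac{1}{np^2} \mathbf{X}^{\tau} \mathbf{X} + \frac{\sigma_\delta^2}{p^2} \mathbf{I}_p + \mathbf{R}, \tag{III.30} \]
where $\mathbf{R}$ is a matrix such that $\| \mathbf{R} \| = O_P\left( \frac{1}{n^{1/2} p} \right)$, $\| \cdot \|$ being the usual norm of a matrix.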
The last problem is that $\sigma_\delta^2$ is not always known. There are several ways to estimate it. We choose to use the estimator presented in Gasser, Sroka and Jennen-Steinmetz (1986) and given by (as we are in the case of equispaced measurement points)
\[ \hat{\sigma}_\delta^2 = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{6(p-2)} \sum_{j=2}^{p-1} \left[ W_i(t_{j-1}) - W_i(t_j) + W_i(t_{j+1}) - W_i(t_j) \right]^2. \tag{III.31} \]
This leads us to change the former estimator of $\boldsymbol{\alpha}$ given by (III.29) and to take instead
\[ \hat{\boldsymbol{\alpha}}_{FTLS} = \frac{1}{np} \left( \frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} + \frac{\rho}{p} \mathbf{A}_m - \frac{\hat{\sigma}_\delta^2}{p^2} \mathbf{I}_p \right)^{-1} \mathbf{W}^{\tau} \mathbf{Y}, \tag{III.32} \]
and again a corresponding estimator of $\alpha$ is provided by $\hat{\alpha}_{FTLS}$.
The asymptotic behavior of $\hat{\boldsymbol{\alpha}}_{FTLS}$ is given in the following theorem.
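Formulas (III.31) and (III.32) translate directly into a few lines of linear algebra. The following sketch assumes that the penalty matrix $\mathbf{A}_m$ is supplied by the caller; in the demonstration it is replaced by the identity matrix, which is only a placeholder and not the natural spline penalty of the text. The smooth test curves, the coefficient function and the function names are likewise hypothetical.

```python
import numpy as np


def gsjs_noise_variance(W):
    """Estimate sigma_delta^2 from second differences of each noisy curve, as in (III.31)."""
    W = np.asarray(W, dtype=float)
    n, p = W.shape
    second_diff = W[:, :-2] - 2.0 * W[:, 1:-1] + W[:, 2:]
    return float(np.sum(second_diff**2) / (6.0 * (p - 2) * n))


def ftls_estimator(W, Y, A_m, rho, sigma2_hat):
    """Deregularized estimator (III.32) for a given penalty matrix A_m."""
    n, p = W.shape
    lhs = W.T @ W / (n * p**2) + (rho / p) * A_m - (sigma2_hat / p**2) * np.eye(p)
    return np.linalg.solve(lhs, W.T @ Y) / (n * p)


rng = np.random.default_rng(2)
n, p, sigma_delta = 100, 200, 0.5
t = np.arange(1, p + 1) / p
# Hypothetical smooth curves: random combinations of sin(2*pi*t) and t.
coef = rng.normal(size=(n, 2))
X = coef[:, [0]] * np.sin(2 * np.pi * t) + coef[:, [1]] * t
W = X + rng.normal(scale=sigma_delta, size=(n, p))
Y = X @ np.sin(2 * np.pi * t) / p + rng.normal(scale=0.2, size=n)

sigma2_hat = gsjs_noise_variance(W)           # should be close to 0.25
A_placeholder = np.eye(p)                     # placeholder penalty, NOT the spline penalty
alpha_hat = ftls_estimator(W, Y, A_placeholder, rho=1e-2, sigma2_hat=sigma2_hat)
print(sigma2_hat, alpha_hat.shape)
```

The variance estimator works because the second difference of a smooth curve is of order $1/p^2$, so that the squared second differences of $W_i$ are dominated by the noise, whose second difference has variance $6\sigma_\delta^2$.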
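Theorem III.2. Under assumptions (H.1) - (H.3), if we assume moreover that $Y_i$ and $\delta_{ij}$ are independent for all $i = 1, \ldots, n$, $j = 1, \ldots, p$ and that there exists a constant $0 < D_6 < +\infty$ such that, for all $n$ and $p$ sufficiently large, $p^{1/2} \left\| \frac{1}{np^2} \mathbf{X}^{\tau} \mathbf{X} \boldsymbol{\alpha} \right\| \ge D_6$, then we have
\[ \| \hat{\boldsymbol{\alpha}}_{FTLS} - \hat{\boldsymbol{\alpha}}_{FLS,X} \|_{\Gamma_{X,n,p}} = O_P\left( \frac{1}{n^{1/2} p^{1/2} \rho^{1/2}} + \frac{1}{n^{1/2}} \right). \tag{III.33} \]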
III.3.4. Some comments
(i) In the expression (III.32) of the estimator of $\boldsymbol{\alpha}$, the term $-\frac{\hat{\sigma}_\delta^2}{p^2} \mathbf{I}_p$ acts as a deregularization term. It allows us to deal with the bias introduced by the fact that we only know the matrix $\mathbf{W}$ instead of the "true" one $\mathbf{X}$.

(ii) In Theorem III.2, the hypothesis $p^{1/2} \left\| \frac{1}{np^2} \mathbf{X}^{\tau} \mathbf{X} \boldsymbol{\alpha} \right\| \ge D_6$ means (in the case of the random design) that $\alpha$ does not belong to the kernel of the covariance operator $\Gamma_X$.
(iii) An immediate corollary of Theorems III.1 and III.2 is
\[ \| \hat{\boldsymbol{\alpha}}_{FTLS} - \boldsymbol{\alpha} \|^2_{\Gamma_{X,n,p}} = O_P\left( \frac{1}{n\rho} + \rho + \frac{1}{np\rho} + \frac{1}{n} \right). \]
If we compare these terms, we can see that, for $p$ large enough, it remains
\[ \| \hat{\boldsymbol{\alpha}}_{FTLS} - \boldsymbol{\alpha} \|^2_{\Gamma_{X,n,p}} = O_P\left( \frac{1}{n\rho} + \rho \right), \]
and then, for $\rho = \rho_n \sim n^{-1/2}$,
\[ \| \hat{\boldsymbol{\alpha}}_{FTLS} - \boldsymbol{\alpha} \|^2_{\Gamma_{X,n,p}} = O_P\left( n^{-1/2} \right). \]
This means that we obtain the same upper bound for the convergence speed of the FTLS estimator as for the FLS estimator using the true curves $X_1, \ldots, X_n$. This result is in accordance with intuition: the estimation improves for a high number $p$ of discretization points.
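(iv) Let us see what happens for the FLS estimator using the noisy curves $W_1, \ldots, W_n$. The estimator of $\boldsymbol{\alpha}$ is then given by
\[ \hat{\boldsymbol{\alpha}}_{FLS,W} = \frac{1}{np} \left( \frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} + \frac{\rho}{p} \mathbf{A}_m \right)^{-1} \mathbf{W}^{\tau} \mathbf{Y}. \tag{III.34} \]
A calculation analogous to the one used in the proof of Theorem III.2 leads us to
\[ \| \hat{\boldsymbol{\alpha}}_{FLS,W} - \hat{\boldsymbol{\alpha}}_{FLS,X} \|_{\Gamma_{X,n,p}} = O_P\left( \frac{1}{n^{1/2} p^{1/2} \rho^{1/2}} + \frac{1}{n^{1/2}} \right), \]
that is to say we have the same upper bound on the convergence speed for $\hat{\boldsymbol{\alpha}}_{FLS,W}$ and $\hat{\boldsymbol{\alpha}}_{FTLS}$. Nevertheless, the results obtained in the simulations suggest that we improve the estimation (see the last remark) by using the FTLS estimator instead of the FLS estimator with the noisy curves $W_1, \ldots, W_n$.
(v) Using some heuristic arguments to expand the mean quadratic error of estimation of $\boldsymbol{\alpha}$ (similarly to what is done in Cardot, 2000), we can see that it is generally better to consider the FTLS estimator rather than the FLS one with the variable $W$. More precisely, using the same notations as before, let us denote
\[ \hat{\boldsymbol{\alpha}}(\lambda) = \frac{1}{np} \left( \frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} + \frac{\rho}{p} \mathbf{A}_m - \lambda \mathbf{I}_p \right)^{-1} \mathbf{W}^{\tau} \mathbf{Y}, \]
where $\lambda$ is a positive real number such that the matrix $\frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} + \frac{\rho}{p} \mathbf{A}_m - \lambda \mathbf{I}_p$ is positive definite. Then we have the following result, whose proof is given in the last section.
Proposition III.8. Let $MISE(\lambda) = E\left[ (\hat{\boldsymbol{\alpha}}(\lambda) - \boldsymbol{\alpha})^{\tau} (\hat{\boldsymbol{\alpha}}(\lambda) - \boldsymbol{\alpha}) \right]$. If we assume that $(\mathbf{W}^{\tau} \mathbf{W})^{-1}$ exists and if $\rho \| \mathbf{A}_m \|$ is negligible compared to $\left\| \frac{1}{np} \mathbf{W}^{\tau} \mathbf{W} \right\|$, then we have
\[ \frac{\partial}{\partial \lambda} MISE(\lambda) \Big|_{\lambda=0} < 0. \]
In other words, this result means that it is advantageous to introduce a deregularization term $-\lambda \mathbf{I}_p$ (with a small positive $\lambda$) in order to improve the quality of the estimation with respect to the MISE criterion.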
III.3.5. A simulation study
Presentation of the simulation. The aim of this simulation is to evaluate the performance of our estimator $\hat{\boldsymbol{\alpha}}_{FTLS}$, and to compare it with $\hat{\boldsymbol{\alpha}}_{FLS,W}$. We also compare $\hat{\boldsymbol{\alpha}}_{FTLS}$ to $\hat{\boldsymbol{\alpha}}_{FLS,\widetilde{W}}$, which is given by the same formula (III.34) where the curve $W$ is now replaced by a smoothed version $\widetilde{W}$. We can think that this smoothing step has a correcting effect on the noisy curve $W$, so that the smoothed curve $\widetilde{W}$ can be expected to be closer than $W$ to the unknown "true" curve $X$. This suggests that the estimator $\hat{\boldsymbol{\alpha}}_{FLS,\widetilde{W}}$ should be better than $\hat{\boldsymbol{\alpha}}_{FLS,W}$. To obtain a smoothed version $\widetilde{W}$ of $W$, we choose to use the Nadaraya-Watson kernel estimator (see for example Härdle, 1991 or Sarda and Vieu, 2000). In the simulations, the kernel is the standard normal kernel. For the bandwidth we first tried a value chosen by cross-validation for each curve (see Härdle, 1991). We also tried several other bandwidth values, obtained by applying a decreasing or increasing factor to this cross-validated bandwidth. In order to summarize the results, we only give the simulation results when $X$ is non-random (when $X$ is random, the simulations we have done lead to the same kind of conclusions). We have simulated $N = 100$ samples, each being composed of $n = 200$ observations $(W_i, Y_i)_{i=1,\ldots,n}$ from the model given by (III.12) and (III.13), where the fixed design curves $X_1, \ldots, X_n$ are defined on $I = [0, 1]$ by
\[ X_i(t) = \begin{cases} 10 \sin(2\pi i t) & \text{if } i \text{ is even}, \\ 10 \cos(2\pi i t) & \text{if } i \text{ is odd}, \end{cases} \]
similarly to what is used for the simulation in Cuevas, Febrero and Fraiman (2002). Each sample is randomly split into a learning sample of length $n_l = 100$ (this sample is used to build the estimator) and a test sample of length $n_t = 100$ (this sample is used to assess the quality of the estimator through the computation of error terms). We made simulations for different numbers of discretization points, $p = 50$, $p = 100$ and $p = 200$. Two functions $\alpha$ were considered, either $\alpha(t) = 10 \sin(2\pi t)$ or $\alpha(t) = 10 \sin^3(2\pi t^3)$. Finally, the error terms were chosen as follows: $\epsilon \sim N(0, \sigma_\epsilon^2)$ with $\sigma_\epsilon = 0.2$ and $\delta_{ij} \sim N(0, \sigma_\delta^2)$ for all $i = 1, \ldots, n$, $j = 1, \ldots, p$ with either $\sigma_\delta = 0.1$, $\sigma_\delta = 0.2$ or $\sigma_\delta = 0.5$.

Concerning the parameters of the spline functions, the order of differentiation in the penalization is fixed to the value $m = 2$. The most important parameter to choose is the smoothing parameter value $\rho$ (see Marx and Eilers, 1999). We present in the next subsection a criterion for selecting reasonable values and we check the effectiveness of this criterion in the simulations.
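The simulation design described above can be sketched as follows for one sample and the first coefficient function. Two points are assumptions of this sketch rather than statements from the text: the grid is taken as $t_j = j/p$, and the responses are generated from the discretized relation (III.14) instead of the integral in (III.12). The function name and its parameters are hypothetical.

```python
import numpy as np


def simulate_sample(n=200, p=100, sigma_eps=0.2, sigma_delta=0.5, seed=0):
    """Generate one sample (W, Y) with fixed design curves and a random learning/test split."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, p + 1) / p                       # assumed grid t_j = j/p
    i = np.arange(1, n + 1)[:, None]                  # curve index as a column
    X = np.where(i % 2 == 0,
                 10 * np.sin(2 * np.pi * i * t),      # even i
                 10 * np.cos(2 * np.pi * i * t))      # odd i
    alpha = 10 * np.sin(2 * np.pi * t)                # first coefficient function
    Y = X @ alpha / p + rng.normal(0.0, sigma_eps, n)     # discretized model (III.14)
    W = X + rng.normal(0.0, sigma_delta, (n, p))          # noisy curves (III.13)
    perm = rng.permutation(n)
    learn_idx, test_idx = perm[: n // 2], perm[n // 2:]
    return t, X, W, Y, alpha, learn_idx, test_idx


t, X, W, Y, alpha, learn_idx, test_idx = simulate_sample()
print(X.shape, W.shape, Y.shape, len(learn_idx), len(test_idx))
```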
Generalized Cross Validation criteria. In the setting of the estimation of a function $f$ by smoothing splines described in section 2, the most popular method for the selection of $\rho$ is generalized cross-validation (see Wahba, 1990). The first idea is to use cross-validation, that is, to choose the $\rho$ that yields the best prediction (in a mean squares sense) when each value is predicted from the remaining observations. A computational simplification of the cross-validation criterion has then been proposed in the literature, which leads to generalized cross-validation: see Wahba (1990). In our Functional Least Squares estimation, we can easily adapt this generalized cross-validation (GCV) in the following way. The GCV criterion is defined by
\[ GCV_{FLS,W}(\rho) = \frac{\dfrac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\left( 1 - \dfrac{1}{n} \mathrm{Tr}(\mathbf{H}_{FLS,W}(\rho)) \right)^2}, \tag{III.35} \]
where $\mathbf{H}_{FLS,W}(\rho)$ is the "hat matrix" given by
\[ \mathbf{H}_{FLS,W}(\rho) = \frac{1}{np} \mathbf{W} \left( \frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} + \frac{\rho}{p} \mathbf{A}_m \right)^{-1} \mathbf{W}^{\tau}, \]
and $\hat{\mathbf{Y}} = \mathbf{H}_{FLS,W}(\rho) \mathbf{Y}$. Then, we select the optimal parameter $\rho_{GCV}$ as the one that minimizes the GCV criterion (III.35). The criterion (III.35) is a direct adaptation of the one introduced in Wahba (1990), except that the "hat matrix" has been changed for our setting.
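The selection of $\rho$ by GCV amounts to evaluating (III.35) over a grid and keeping the minimizer. The sketch below makes two assumptions that should be checked against the definitions above: the hat matrix is scaled so that $\mathbf{H}\mathbf{Y}$ equals the model prediction $\frac{1}{p}\mathbf{W}\hat{\boldsymbol{\alpha}}_{FLS,W}$ of (III.14), and the penalty matrix is replaced by the identity as a placeholder for $\mathbf{A}_m$. The function names and the simulated data are hypothetical.

```python
import numpy as np


def gcv_fls(W, Y, A_m, rho):
    """GCV score for the FLS estimator built from the curves W."""
    n, p = W.shape
    lhs = W.T @ W / (n * p**2) + (rho / p) * A_m
    # Assumed scaling: H @ Y equals (1/p) * W @ alpha_hat, the model prediction.
    H = W @ np.linalg.solve(lhs, W.T) / (n * p**2)
    resid = Y - H @ Y
    return float(np.mean(resid**2) / (1.0 - np.trace(H) / n) ** 2)


def select_rho(W, Y, A_m, grid):
    """Return the value of the grid minimizing the GCV score."""
    scores = [gcv_fls(W, Y, A_m, rho) for rho in grid]
    return grid[int(np.argmin(scores))]


rng = np.random.default_rng(3)
n, p = 40, 10
W = rng.normal(size=(n, p))
Y = W @ rng.normal(size=p) / p + rng.normal(scale=0.2, size=n)
A_placeholder = np.eye(p)                       # placeholder, not the spline penalty
grid = [10.0 ** (-k) for k in range(2, 9)]      # 1e-2, ..., 1e-8 as in the simulations
rho_gcv = select_rho(W, Y, A_placeholder, grid)
print(rho_gcv)
```

Replacing `lhs` by the deregularized matrix of (III.32) and the fitted values by the TLS predictions gives the analogous criterion for the FTLS estimator.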
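Concerning the Functional Total Least Squares estimation, although cross-validation has already been studied in Sima and Van Huffel (2004), what we want to propose here is a generalization of the GCV criterion (III.35), in the following way. The prediction of $Y_i$ for $i = 1, \ldots, n$ is slightly different in the context of TLS. The estimation of the unknown $\mathbf{X}_i$, denoted by $\hat{\mathbf{X}}_i$, is given by
\[ \hat{\mathbf{X}}_i = \mathbf{W}_i + \frac{Y_i - \frac{1}{p} \hat{\boldsymbol{\alpha}}^{\tau} \mathbf{W}_i}{1 + \frac{1}{p} \| \hat{\boldsymbol{\alpha}} \|^2} \, \hat{\boldsymbol{\alpha}}, \tag{III.36} \]
obtained as in Fuller (1987) by differentiating equation (III.28) with respect to $\mathbf{X}_i$. Then, we take $\hat{Y}_i = \langle \hat{\boldsymbol{\alpha}}, \hat{\mathbf{X}}_i \rangle_p$ as the prediction of $Y_i$. The proposed GCV criterion is then given by
\[ GCV_{FTLS}(\rho) = \frac{\dfrac{1}{n} \sum_{i=1}^{n} \left( Y_i - \langle \hat{\boldsymbol{\alpha}}_{FTLS}, \hat{\mathbf{X}}_i \rangle_p \right)^2}{\left( 1 - \dfrac{1}{n} \mathrm{Tr}(\mathbf{H}_{FTLS}(\rho)) \right)^2}, \tag{III.37} \]
where $\mathbf{H}_{FTLS}(\rho)$ is the "hat matrix" given by
\[ \mathbf{H}_{FTLS}(\rho) = \frac{1}{np} \mathbf{W} \left( \frac{1}{np^2} \mathbf{W}^{\tau} \mathbf{W} + \frac{\rho}{p} \mathbf{A}_m - \frac{\hat{\sigma}_\delta^2}{p^2} \mathbf{I}_p \right)^{-1} \mathbf{W}^{\tau}. \]
Then, the optimal parameter $\rho_{GCV}$ is obtained by minimizing the GCV criterion (III.37). In our simulations, these GCV criteria have been computed for $\rho$ over a grid taking its values among $10^{-2}, 10^{-3}, \ldots, 10^{-8}$.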
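The corrected curves $\hat{\mathbf{X}}_i$ of (III.36) and the associated predictions can be sketched as follows. The check at the end verifies the first-order condition obtained by differentiating (III.28) with respect to $\mathbf{X}_i$, namely $\hat{\mathbf{X}}_i - \mathbf{W}_i = (Y_i - \frac{1}{p} \hat{\mathbf{X}}_i^{\tau} \hat{\boldsymbol{\alpha}}) \hat{\boldsymbol{\alpha}}$. The function names and the random inputs are assumptions of the example; any estimate of $\boldsymbol{\alpha}$ can be plugged in.

```python
import numpy as np


def tls_corrected_curves(W, Y, alpha_hat):
    """Rows are the corrected curves X_hat_i of (III.36)."""
    n, p = W.shape
    resid = Y - W @ alpha_hat / p                 # Y_i - (1/p) alpha_hat' W_i
    scale = 1.0 + alpha_hat @ alpha_hat / p       # 1 + (1/p) ||alpha_hat||^2
    return W + np.outer(resid / scale, alpha_hat)


def tls_predictions(W, Y, alpha_hat):
    """Predictions Y_hat_i = <alpha_hat, X_hat_i>_p."""
    X_hat = tls_corrected_curves(W, Y, alpha_hat)
    return X_hat @ alpha_hat / W.shape[1]


rng = np.random.default_rng(4)
n, p = 20, 15
W = rng.normal(size=(n, p))
Y = rng.normal(size=n)
alpha_hat = rng.normal(size=p)                    # stands in for any estimate of alpha

X_hat = tls_corrected_curves(W, Y, alpha_hat)
Y_hat = tls_predictions(W, Y, alpha_hat)
# First-order condition of (III.28) in X_i, written for all i at once.
foc = X_hat - W - np.outer(Y - Y_hat, alpha_hat)
print(np.max(np.abs(foc)))
```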
Results of the simulation. We use two error criteria to assess the quality of the prediction. The first one is the relative mean square error of the estimator of $\alpha$, given by
\[ E_1 = \frac{\sum_{j=1}^{p} \left[ \hat{\alpha}(t_j) - \alpha(t_j) \right]^2}{\sum_{j=1}^{p} \alpha(t_j)^2}, \tag{III.38} \]
and the second one is the mean square error of the prediction of $\mathbf{Y}$, given by
\[ E_2 = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2. \tag{III.39} \]
These errors, evaluated on N = 100 simulated test samples, are given in tables 1 and 3 for the different values of p and the different functions α. We have computed the FLS estimator of α using the unknown true curves X (in order to have a reference), the observed curves W and the smoothed version $\widetilde{W}$ of the observed curves W. Adopting the strategy described in 5.1, we have selected the bandwidth values for $\widetilde{W}$ leading to the best results in terms of estimation of α. Indeed, for the first example, it appeared to us that undersmoothing the curves W had some benefit for the estimation of α, whereas the cross-validated bandwidth was selected for the second example. Although it is beyond the scope of this paper, we note that the choice of the bandwidth in the “smoothing” procedure has to be investigated more deeply, since it is a crucial problem, as pointed out for instance in Chiou, Müller and Wang (2003).
We can see that the FTLS estimator always improves the prediction compared to FLS,W, and the improvement is substantial when p is small with a relatively high noise level σδ. We can also see that the estimators FTLS and FLS,$\widetilde{W}$ are quite close. FLS,$\widetilde{W}$ seems to be better when the noise level σδ is small, whereas FTLS seems to be better when this noise level becomes high. Nevertheless, it is important to note that the FTLS estimator is faster to compute than the FLS,$\widetilde{W}$ one. Moreover, the choice of the parameter h is not obvious (choosing it by cross validation is not always the best option, and it implies additional long computation times, particularly when n is large).
Moreover, it has to be noticed that the prediction is also improved when the number of discretization points increases. We can also see that the error increases between table 1 and table 3, mainly because of the shape of the second function α, which is less smooth than the first one.
Table 2 gives the estimated values of σδ using the estimator defined by (III.31) and given in Gasser, Sroka and Jennen-Steinmetz (1986). We can see that we get good estimations of σδ, with an accuracy that increases with the number of discretization points. It also seems that the quality of the estimation is not much related to the value of σδ. Finally, we have plotted in figure 1 an example of the estimation of α in the case where p = 100 and σδ = 0.5, for the function $\alpha(t) = 10\sin^3(2\pi t^3)$. In order not to have too many curves on the same graph, we choose to plot only the estimators FTLS, FLS,X and FLS,W. This graph tends to confirm the values given in tables 1 and 3.
                          E1                                E2
Estimator  p      σδ = 0.1  σδ = 0.2  σδ = 0.5    σδ = 0.1  σδ = 0.2  σδ = 0.5
FLS,X      50     0.00015   0.00014   0.00013     0.0031    0.0032    0.0032
           100    0.00009   0.00010   0.00009     0.0027    0.0026    0.0027
           200    0.00005   0.00006   0.00004     0.0024    0.0026    0.0025
FTLS       50     0.00018   0.00061   0.00232     0.0044    0.0067    0.0180
           100    0.00013   0.00065   0.00219     0.0040    0.0063    0.0139
           200    0.00009   0.00057   0.00204     0.0035    0.0056    0.0091
FLS,W̃      50     0.00017   0.00080   0.00245     0.0040    0.0065    0.0209
           100    0.00011   0.00063   0.00226     0.0036    0.0062    0.0154
           200    0.00006   0.00056   0.00210     0.0029    0.0056    0.0112
FLS,W      50     0.00020   0.00098   0.00366     0.0050    0.0081    0.0305
           100    0.00015   0.00079   0.00344     0.0045    0.0072    0.0245
           200    0.00011   0.00063   0.00329     0.0039    0.0067    0.0124
Table 1. Error E1 on α given by α(t) = 10 sin(2πt) and error E2 of prediction.
p      σδ = 0.1  σδ = 0.2  σδ = 0.5
50     0.1141    0.2075    0.5034
100    0.1011    0.2005    0.5005
200    0.0999    0.1999    0.4999
Table 2. Estimated values of σδ according to the different values of σδ and the different values of p.
III.3.6. Proof of the results
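Proof of theorem III.1. — First consider relation (III.23), and note that
\[
E\left(\hat{\boldsymbol{\alpha}}_{FLS,X}\right) = \frac{1}{np^2}\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\mathbf{X}^{\tau}\mathbf{X}\boldsymbol{\alpha}.
\]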
                          E1                                E2
Estimator  p      σδ = 0.1  σδ = 0.2  σδ = 0.5    σδ = 0.1  σδ = 0.2  σδ = 0.5
FLS,X      50     0.0508    0.0509    0.0510      0.0427    0.0426    0.0426
           100    0.0504    0.0504    0.0503      0.0422    0.0423    0.0424
           200    0.0503    0.0502    0.0502      0.0414    0.0414    0.0416
FTLS       50     0.0513    0.0526    0.0630      0.0439    0.0491    0.0830
           100    0.0509    0.0522    0.0618      0.0434    0.0476    0.0762
           200    0.0506    0.0517    0.0607      0.0429    0.0460    0.0735
FLS,W̃      50     0.0510    0.0525    0.0645      0.0435    0.0490    0.0851
           100    0.0507    0.0520    0.0627      0.0429    0.0475    0.0790
           200    0.0504    0.0516    0.0614      0.0422    0.0458    0.0763
FLS,W      50     0.0516    0.0530    0.0850      0.0447    0.0504    0.0960
           100    0.0512    0.0527    0.0822      0.0442    0.0496    0.0889
           200    0.0508    0.0521    0.0799      0.0438    0.0488    0.0834
Table 3. Error E1 on α given by α(t) = 10 sin³(2πt³) and error E2 of prediction.
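It follows that $E(\hat{\boldsymbol{\alpha}}_{FLS,X})$ is a solution of the minimization problem
\[
\min_{\mathbf{a}\in\mathbb{R}^p}\left\{\frac{1}{n}\left\|\frac{1}{p}\mathbf{X}\boldsymbol{\alpha} - \frac{1}{p}\mathbf{X}\mathbf{a}\right\|^2 + \frac{\rho}{p}\,\mathbf{a}^{\tau}\mathbf{A}_m\mathbf{a}\right\}.
\]
This implies, comparing the criterion above for $\mathbf{a} = E(\hat{\boldsymbol{\alpha}}_{FLS,X})$ and $\mathbf{a} = \boldsymbol{\alpha}$,
\[
\frac{1}{n}\left\|\frac{1}{p}\mathbf{X}\boldsymbol{\alpha} - \frac{1}{p}\mathbf{X}E(\hat{\boldsymbol{\alpha}}_{FLS,X})\right\|^2 + \frac{\rho}{p}\,E(\hat{\boldsymbol{\alpha}}_{FLS,X})^{\tau}\mathbf{A}_m E(\hat{\boldsymbol{\alpha}}_{FLS,X}) \le \frac{\rho}{p}\,\boldsymbol{\alpha}^{\tau}\mathbf{A}_m\boldsymbol{\alpha}.
\]
But the definition of $\mathbf{A}_m$ together with (III.17) leads to
\[
\frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{A}_m\boldsymbol{\alpha} = \frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{P}_m\boldsymbol{\alpha} + \int_I s_\alpha^{(m)}(t)^2\,dt \le \frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{P}_m\boldsymbol{\alpha} + \int_I \alpha^{(m)}(t)^2\,dt,
\]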
and (III.23) is an immediate consequence, noticing that
\[
\left\|E(\hat{\boldsymbol{\alpha}}_{FLS,X}) - \boldsymbol{\alpha}\right\|^2_{\Gamma_{X,n,p}} = \frac{1}{np^2}\left\|\mathbf{X}\boldsymbol{\alpha} - \mathbf{X}E(\hat{\boldsymbol{\alpha}}_{FLS,X})\right\|^2.
\]
[Figure 1: plot titled "estimation of α(t) = 10 sin³(2πt³)" showing the true curve α and its estimations by FLS,X, FTLS and FLS,W; horizontal axis from 0.0 to 1.0, vertical axis from −10 to 10.]
Figure 1. Estimation of α (solid line) with functional least squares using X (dashed line), functional least squares using W (dashed and dotted line) and functional total least squares (dotted line) in cases α(t) = 10 sin(2πt) and α(t) = 10 sin³(2πt³).
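Relation (III.24) follows from
\[
\begin{aligned}
\frac{1}{p}E\left\{\left[\hat{\boldsymbol{\alpha}}_{FLS,X} - E(\hat{\boldsymbol{\alpha}}_{FLS,X})\right]^{\tau}\left[\hat{\boldsymbol{\alpha}}_{FLS,X} - E(\hat{\boldsymbol{\alpha}}_{FLS,X})\right]\right\}
&= \frac{1}{p}E\left(\frac{1}{n^2p^2}\,\boldsymbol{\epsilon}^{\tau}\mathbf{X}\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-2}\mathbf{X}^{\tau}\boldsymbol{\epsilon}\right)\\
&= \frac{\sigma^2}{n}\mathrm{Tr}\left[\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho\mathbf{A}_m\right)^{-2}\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X}\right]\\
&\le \frac{\sigma^2}{n}\mathrm{Tr}\left[\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho\mathbf{A}_m\right)^{-1}\right] \le \frac{\sigma^2}{n}\mathrm{Tr}\left[(\rho\mathbf{A}_m)^{-1}\right] \le D_1\frac{\sigma^2}{n\rho}.
\end{aligned}
\]
This completes the proof of theorem III.1.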
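Proof of proposition III.6. — We have
\[
\left[\left(\frac{\mathbf{W}}{p}, \mathbf{Y}\right) - \left(\frac{\boldsymbol{\delta}}{p}, \boldsymbol{\epsilon}\right)\right]\begin{pmatrix}\boldsymbol{\alpha}\\ -1\end{pmatrix} = 0, \tag{III.40}
\]
which allows us now to write the minimization problem (III.28) as follows
\[
\min_{\left[\left(\frac{\mathbf{W}}{p}, \mathbf{Y}\right) - \left(\frac{\boldsymbol{\delta}}{p}, \boldsymbol{\epsilon}\right)\right]\binom{\mathbf{a}}{-1} = 0}\left\{\frac{1}{n}\left\|\left(\frac{\boldsymbol{\delta}}{\sqrt{p}}, \boldsymbol{\epsilon}\right)\right\|_F^2 + \frac{\rho}{p}\,\mathbf{a}^{\tau}\mathbf{A}_m\mathbf{a}\right\},
\]
where the notation $\|\cdot\|_F$ stands for the usual Frobenius norm, more precisely $\|\mathbf{A}\|_F^2 = \mathrm{Tr}(\mathbf{A}^{\tau}\mathbf{A})$ for every matrix $\mathbf{A}$. Then, we are led to consider the minimization problem
\[
\min_{\mathbf{C}\mathbf{x} = \mathbf{E}\mathbf{x}}\left\{\frac{1}{n}\left\|\left(\frac{\boldsymbol{\delta}}{\sqrt{p}}, \boldsymbol{\epsilon}\right)\right\|_F^2 + \frac{\rho}{p}\,\mathbf{x}^{\tau}\mathbf{B}_m\mathbf{x}\right\}, \tag{III.41}
\]
with $\mathbf{C} = \left(\frac{\mathbf{W}}{p}, \mathbf{Y}\right)$, $\mathbf{E} = \left(\frac{\boldsymbol{\delta}}{p}, \boldsymbol{\epsilon}\right)$, $\mathbf{x} = \binom{\mathbf{a}}{-1}$, and $\mathbf{B}_m = \begin{pmatrix}\mathbf{A}_m & 0\\ 0 & 0\end{pmatrix}$. If we denote by $\boldsymbol{\gamma}$ the $(p+1)\times(p+1)$ matrix defined by
\[
\boldsymbol{\gamma} = \begin{pmatrix}\mathrm{diag}(1/\sqrt{p}, \ldots, 1/\sqrt{p}) & 0\\ 0 & 0\end{pmatrix},
\]
we have
\[
\begin{aligned}
\frac{1}{n}\mathbf{x}^{\tau}\boldsymbol{\gamma}^{\tau}\left(\frac{\boldsymbol{\delta}}{\sqrt{p}}, \boldsymbol{\epsilon}\right)^{\tau}\left(\frac{\boldsymbol{\delta}}{\sqrt{p}}, \boldsymbol{\epsilon}\right)\boldsymbol{\gamma}\mathbf{x}
&= \frac{1}{n}\mathbf{x}^{\tau}\mathbf{E}^{\tau}\mathbf{E}\mathbf{x} = \frac{1}{n}\mathbf{x}^{\tau}\mathbf{C}^{\tau}\mathbf{C}\mathbf{x}\\
&= \frac{1}{n}\mathbf{x}^{\tau}\boldsymbol{\gamma}^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)\boldsymbol{\gamma}\mathbf{x},
\end{aligned}
\]
and then we can see that the quantity
\[
\begin{aligned}
&\frac{1}{n}\mathbf{x}^{\tau}\boldsymbol{\gamma}^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)\boldsymbol{\gamma}\mathbf{x} + \frac{\rho}{p}\,\mathbf{x}^{\tau}\mathbf{B}_m\mathbf{x}\\
&\quad= \frac{1}{n}\mathbf{x}^{\tau}\boldsymbol{\gamma}^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)\boldsymbol{\gamma}\mathbf{x} + \mathbf{x}^{\tau}\boldsymbol{\gamma}^{\tau}(\rho\mathbf{B}_m)\boldsymbol{\gamma}\mathbf{x}
\end{aligned}
\]
is minimized for $\mathbf{x}$ eigenvector of the matrix
\[
\frac{1}{n}\boldsymbol{\gamma}^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y}\right)\boldsymbol{\gamma} + \boldsymbol{\gamma}^{\tau}(\rho\mathbf{B}_m)\boldsymbol{\gamma}
= \frac{1}{n}\left(\frac{\mathbf{W}}{p}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{W}}{p}, \mathbf{Y}\right) + \frac{\rho}{p}\mathbf{B}_m,
\]
corresponding to the smallest non-zero eigenvalue, which is denoted $\sigma_k^2$. Using the definition of this eigenvalue, we deduce that
\[
\left[\frac{1}{n}\left(\frac{\mathbf{W}}{p}, \mathbf{Y}\right)^{\tau}\left(\frac{\mathbf{W}}{p}, \mathbf{Y}\right) + \frac{\rho}{p}\mathbf{B}_m\right]\hat{\mathbf{x}} = \sigma_k^2\,\hat{\mathbf{x}}.
\]
This gives, keeping the first p rows,
\[
\hat{\boldsymbol{\alpha}} = \frac{1}{np}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m - \sigma_k^2\mathbf{I}_p\right)^{-1}\mathbf{W}^{\tau}\mathbf{Y},
\]
and the proof of proposition III.6 is now complete.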
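Proof of proposition III.7. — We can write
\[
\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} = \frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \left(\frac{1}{np^2}\sum_{i=1}^{n}M_{irs}\right)_{r,s=1,\ldots,p},
\]
where $M_{irs} = X_i(t_r)\delta_{is} + \delta_{ir}X_i(t_s) + \delta_{ir}\delta_{is}$. If we define $\mathbf{R}$ as the matrix such that
\[
\left(\frac{1}{np^2}\sum_{i=1}^{n}M_{irs}\right)_{r,s=1,\ldots,p} = \frac{\sigma_\delta^2}{p^2}\mathbf{I}_p + \mathbf{R},
\]
then for every unit vector $\mathbf{u} = (u_1, \ldots, u_p)^{\tau} \in \mathbb{R}^p$, we get $E(\mathbf{R}\mathbf{u}) = 0$. Moreover, with hypotheses (H.2) and (H.3), we have
\[
\begin{aligned}
E\left(\|\mathbf{R}\mathbf{u}\|^2\right) = \mathbf{u}^{\tau}E(\mathbf{R}^{\tau}\mathbf{R})\mathbf{u}
&= \frac{1}{n^2p^4}\sum_{r,s=1}^{p}\left(\sum_{i=1}^{n}\sum_{j=1}^{p}E(M_{irj}M_{ijs})\right)u_ru_s\\
&= \frac{1}{n^2p^4}\sum_{r=1}^{p}\left(\sum_{i=1}^{n}\sum_{j=1}^{p}E(M_{irj}M_{ijr})\right)u_r^2
= O\left(\frac{1}{np^2}\right),
\end{aligned}
\]
hence we deduce that
\[
\|\mathbf{R}\| = O_P\left(\frac{1}{n^{1/2}p}\right).
\]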
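Proof of theorem III.2. — We can write
\[
\hat{\boldsymbol{\alpha}}_{FTLS} - \hat{\boldsymbol{\alpha}} = \left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\mathbf{V} + \mathbf{S}\,\frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y},
\]
with
\[
\mathbf{V} := \frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y} - \frac{1}{np}\mathbf{X}^{\tau}\mathbf{Y} = \frac{1}{np}\boldsymbol{\delta}^{\tau}\mathbf{Y},
\]
\[
\mathbf{S} := \left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m + \mathbf{T}\right)^{-1} - \left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1},
\]
and
\[
\mathbf{T} := \mathbf{R} - \frac{\hat{\sigma}_\delta^2 - \sigma_\delta^2}{p^2}\mathbf{I}_p.
\]
First noticing that
\[
E\left[\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\mathbf{V}\right] = 0,
\]
we also have, with hypotheses (H.2) and (H.3),
\[
\begin{aligned}
&E\left(\left\|\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\mathbf{V}\right\|^2_{\Gamma_{X,n,p}}\right)\\
&\quad= E\left[\frac{1}{n^2p}\,\mathbf{Y}^{\tau}\boldsymbol{\delta}\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho\mathbf{A}_m\right)^{-1}\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X}\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho\mathbf{A}_m\right)^{-1}\boldsymbol{\delta}^{\tau}\mathbf{Y}\right]\\
&\quad\le \frac{D_8}{np}\mathrm{Tr}\left[\left(\frac{1}{np}\mathbf{X}^{\tau}\mathbf{X} + \rho\mathbf{A}_m\right)^{-1}\right],
\end{aligned}
\]
where $D_8 < +\infty$ does not depend on n and p. This allows us to get
\[
\left\|\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\mathbf{V}\right\|_{\Gamma_{X,n,p}} = O_P\left(\frac{1}{(np\rho)^{1/2}}\right). \tag{III.42}
\]
The convergence result given in Gasser, Sroka and Jennen-Steinmetz (1986) implies that for the estimator $\hat{\sigma}_\delta^2$ of $\sigma_\delta^2$ defined by (III.31) we have
\[
\hat{\sigma}_\delta^2 = \sigma_\delta^2 + O_P\left(\frac{1}{n^{1/2}p}\right). \tag{III.43}
\]
Then, using this and the result (III.30) of proposition III.7, we can write
\[
\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m - \frac{\hat{\sigma}_\delta^2}{p^2}\mathbf{I}_p\right)^{-1} = \left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m + \mathbf{R} - \frac{\hat{\sigma}_\delta^2 - \sigma_\delta^2}{p^2}\mathbf{I}_p\right)^{-1}.
\]
Using the result (III.43) and the fact that the norm of $\mathbf{I}_p$ is 1, we deduce
\[
\left\|\frac{\hat{\sigma}_\delta^2 - \sigma_\delta^2}{p^2}\mathbf{I}_p\right\| = O_P\left(\frac{1}{n^{1/2}p^3}\right).
\]
If we recall the order of $\|\mathbf{R}\|$ given in proposition III.7, we finally obtain
\[
\|\mathbf{T}\| = O_P\left(\frac{1}{n^{1/2}p}\right).
\]
Now, if we use the first inequality in Demmel (1992), we can write
\[
\begin{aligned}
\frac{1}{p^{1/2}}\left\|\mathbf{S}\,\frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y}\right\|
&= \frac{1}{p^{1/2}}\left\|\left[\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m + \mathbf{T}\right)^{-1} - \left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\right]\times\left(\frac{1}{n^2p^2}\mathbf{W}^{\tau}\mathbf{Y}\mathbf{Y}^{\tau}\mathbf{W}\right)^{1/2}\right\|_F\\
&\le \frac{D_9}{p^{1/2}}\left\|\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y}\right\|^2\|\mathbf{T}\|\left\|\frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y}\right\|^{-1},
\end{aligned}
\]
where $D_9 < +\infty$ does not depend on n and p. We notice that
\[
\left\|\left(\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y}\right\| = O_P\left(p^{1/2}\right),
\]
and $\|\mathbf{T}\| = O_P\left(1/(n^{1/2}p)\right)$. Moreover, with the hypothesis on $p^{1/2}\left\|\frac{1}{np^2}\mathbf{X}^{\tau}\mathbf{X}\boldsymbol{\alpha}\right\|$, we also have
\[
\left\|\frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y}\right\|^{-1} = O_P\left(p^{1/2}\right),
\]
so we get
\[
\left\|\mathbf{S}\,\frac{1}{np}\mathbf{W}^{\tau}\mathbf{Y}\right\|_{\Gamma_{X,n,p}} = O_P\left(\frac{1}{n^{1/2}}\right). \tag{III.44}
\]
Finally, we combine relations (III.42) and (III.44) to get the result of theorem III.2.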
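Proof of proposition III.8. — Let us expand MISE(λ),
\[
MISE(\lambda) = E\left(\hat{\boldsymbol{\alpha}}(\lambda)^{\tau}\hat{\boldsymbol{\alpha}}(\lambda)\right) - 2\boldsymbol{\alpha}^{\tau}E\left(\hat{\boldsymbol{\alpha}}(\lambda)\right) + \boldsymbol{\alpha}^{\tau}\boldsymbol{\alpha},
\]
to deduce, using the matrix expression of $\hat{\boldsymbol{\alpha}}(\lambda)$,
\[
\frac{\partial}{\partial\lambda}MISE(\lambda)\Big|_{\lambda=0} = 2E\left[\frac{1}{n^2p^2}\mathbf{Y}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-3}\mathbf{W}^{\tau}\mathbf{Y} - \frac{1}{np}\boldsymbol{\alpha}^{\tau}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-2}\mathbf{W}^{\tau}\mathbf{Y}\right]. \tag{III.45}
\]
Now, using the fact that $\mathbf{Y} = \frac{1}{p}\mathbf{W}\boldsymbol{\alpha} - \frac{1}{p}\boldsymbol{\delta}\boldsymbol{\alpha} + \boldsymbol{\epsilon}$, we have
\[
\begin{aligned}
&\frac{1}{np}\mathbf{Y}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1} - \boldsymbol{\alpha}^{\tau}\\
&\quad= \frac{1}{np}\mathbf{Y}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1} - \frac{1}{np^2}\boldsymbol{\alpha}^{\tau}\mathbf{W}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}\\
&\quad= \frac{1}{np}\left[\frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{W}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1} - \frac{1}{p}\boldsymbol{\alpha}^{\tau}\boldsymbol{\delta}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1} + \boldsymbol{\epsilon}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1}\right]\\
&\qquad- \frac{1}{np}\left[\frac{1}{p}\boldsymbol{\alpha}^{\tau}\mathbf{W}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}\right].
\end{aligned}
\]
Considering the quantity $\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1} - \left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}$, if we make a first-order approximation, we get
\[
\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-1} - \left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1} \approx -\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}\frac{\rho}{p}\mathbf{A}_m\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1},
\]
which gives us, coming back to relation (III.45),
\[
\begin{aligned}
\frac{\partial}{\partial\lambda}MISE(\lambda)\Big|_{\lambda=0}
&\approx 2E\left[-\frac{1}{n^2p^3}\boldsymbol{\alpha}^{\tau}\mathbf{W}^{\tau}\mathbf{W}\left(\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}\frac{\rho}{p}\mathbf{A}_m\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}\right)\times\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-2}\mathbf{W}^{\tau}\mathbf{Y}\right]\\
&\quad+ 2E\left[-\frac{1}{n^2p^3}\boldsymbol{\alpha}^{\tau}\boldsymbol{\delta}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-3}\mathbf{W}^{\tau}\mathbf{Y}\right]\\
&\quad+ 2E\left[\frac{1}{n^2p^2}\boldsymbol{\epsilon}^{\tau}\mathbf{W}\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-3}\mathbf{W}^{\tau}\mathbf{Y}\right].
\end{aligned} \tag{III.46}
\]
Using the fact that $\boldsymbol{\delta}$ and $\boldsymbol{\epsilon}$ are both independent from $\mathbf{W}$ and $\mathbf{Y}$, the last two terms in relation (III.46) are zero, and we finally obtain
\[
\frac{\partial}{\partial\lambda}MISE(\lambda)\Big|_{\lambda=0} \approx 2E\left[-\frac{1}{n^2p^4}\boldsymbol{\alpha}^{\tau}\mathbf{W}^{\tau}\mathbf{W}\left(\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}\frac{\rho}{p}\mathbf{A}_m\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W}\right)^{-1}\right)\times\left(\frac{1}{np^2}\mathbf{W}^{\tau}\mathbf{W} + \frac{\rho}{p}\mathbf{A}_m\right)^{-2}\mathbf{W}^{\tau}\mathbf{W}\boldsymbol{\alpha}\right].
\]
This last quantity is negative, which completes the proof of proposition III.8.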
III.4. PRINCIPAL COMPONENT REGRESSION
In this setting of a noisy covariate, the preceding material aimed at generalizing the total least squares method (orthogonal least squares) to the case of a functional covariate. In this part, we present another approach, which is work in progress. This approach is, at the outset, more direct than total least squares (which treats the noisy curves globally). The starting idea is to smooth each noisy curve (for instance with a kernel smoother), and then to produce an estimate of the functional parameter, for example by means of a principal component regression.
Recall that in this part the model considered is given by (2) and (8), and that the goal is to provide a method for estimating α from the observations $(W_1, Y_1), \ldots, (W_n, Y_n)$.
III.4.1. Estimation procedure
The idea of this smoothing procedure is the following. When noisy curves $W_1, \ldots, W_n$ have to be handled, they are smoothed in order to build an estimate of the “true” curves $X_1, \ldots, X_n$. These new smoothed curves $\widetilde{W}_1, \ldots, \widetilde{W}_n$ can then be used to produce an estimator of α by means of a functional principal component regression (see the work of Cardot, Ferraty and Sarda, 1999, 2003). This functional principal component regression will nevertheless be adapted, following an idea used by Kneip and Utikal (2001) as well as Benko, Härdle and Kneip (2005). The estimation procedure thus comprises two steps, which are detailed below.
Step 1: smoothing of the noisy curves. — This first step consists in smoothing the noisy observations $W_1, \ldots, W_n$ with a Nadaraya-Watson kernel estimator, originally introduced simultaneously by Nadaraya (1964) and Watson (1964). For more details on this kernel estimator, from both a theoretical and an applied point of view, we refer to Härdle (1991) or Sarda and Vieu (2000). More precisely, we define, for $i = 1, \ldots, n$ and for $t \in [0, 1]$,
\[
\widetilde{W}_i(t) = \frac{\sum_{j=1}^{p}W_i(t_j)\,K\left(\frac{t - t_j}{h_i}\right)}{\sum_{j=1}^{p}K\left(\frac{t - t_j}{h_i}\right)}. \tag{III.47}
\]
In this expression (III.47), the function K, even and with integral equal to 1, is called the kernel, and the real number $h_i > 0$ is called the bandwidth. This parameter controls the smoothness of the estimated curve $\widetilde{W}_i$, whereas the choice of the kernel is less fundamental, except that the resulting estimator inherits the regularity properties of the chosen kernel (continuity, differentiability, ...). Since the choice of the bandwidth is very important, many works have been devoted to determining this parameter in practice, for instance by cross validation (see Härdle, 1991).
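The smoother (III.47) can be sketched in a few lines of Python. The Epanechnikov kernel below is only one possible choice of an even kernel integrating to 1 and is an assumption of this example, as are the function names and the representation of a curve by two lists (design points and observed values).

```python
def epanechnikov(u):
    """Even kernel integrating to 1, supported on [-1, 1] (one possible choice of K)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(t, grid, values, h, kernel=epanechnikov):
    """Smoothed value at t as in (III.47): kernel-weighted average of values[j] observed at grid[j]."""
    weights = [kernel((t - tj) / h) for tj in grid]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no design point within the bandwidth of t")
    return sum(w * v for w, v in zip(weights, values)) / total
```

Calling `nadaraya_watson(t, grid, values, h)` for each t of an evaluation grid gives the smoothed curve; a smaller `h` follows the noisy observations more closely, a larger `h` gives a smoother curve.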
Step 2: functional principal component regression. — This step consists in building an estimate of α by means of a functional principal component regression (see Cardot, Ferraty and Sarda, 1999, 2003), using as covariate the smoothed version $\widetilde{W}_i$ of the noisy curve $W_i$. This method is based on the diagonalization of the empirical covariance operator $\Gamma_{X,n}$ associated with X. We denote by $(\lambda_r)_{r\ge 1}$ the sequence of eigenvalues of $\Gamma_{X,n}$ (in decreasing order) and by $(g_r)_{r\ge 1}$ the sequence of associated eigenfunctions, such that for all $r \ge 1$,
\[
\Gamma_{X,n}\,g_r = \lambda_r g_r.
\]
Introducing the coefficients
\[
\eta_{ir} = \langle X_i, g_r\rangle, \tag{III.48}
\]
for all $i = 1, \ldots, n$ and for all $r \ge 1$, we then have
\[
\sum_{i=1}^{n}\eta_{ir} = 0,
\]
for all $r \ge 1$, and
\[
\sum_{i=1}^{n}\eta_{ir}\eta_{is} = \lambda_r\,\mathbb{1}_{[r=s]},
\]
for $r, s \ge 1$, with $\mathbb{1}_{[r=s]} = 1$ if $r = s$ and 0 otherwise. We then use the following idea from Kneip and Utikal (2001), also taken up in Benko, Härdle and Kneip (2005). To determine the eigenfunctions $g_r$, $r \ge 1$, it is not necessary to go through the operator $\Gamma_{X,n}$. Indeed, one can consider instead the $n \times n$ matrix $\mathbf{M}$ defined by
\[
M_{i_1i_2} = \frac{1}{n}\langle X_{i_1}, X_{i_2}\rangle, \tag{III.49}
\]
for $i_1, i_2 = 1, \ldots, n$. The advantage of using this matrix is that one estimates inner products between curves, hence real numbers. One thus obtains rates of convergence with a bias of order $h_i^2$ and a variance of order $1/n$. The study of these rates is detailed in the next section. Using results from algebra on the diagonalization of matrices (see for instance Good, 1969), one shows that the non-zero eigenvalues of $\Gamma_{X,n}$ and the eigenvalues of $\mathbf{M}$ are the same. Moreover, if $\mathbf{p}_r = (p_{1r}, \ldots, p_{nr})^{\tau}$ denotes the eigenvector of $\mathbf{M}$ associated with the eigenvalue $\lambda_r$, we have the relation
\[
\eta_{ir} = \sqrt{\lambda_r}\,p_{ir}, \tag{III.50}
\]
for all $i = 1, \ldots, n$ and for all $r \ge 1$ such that $\lambda_r > 0$. The eigenfunctions $g_r$, for $r \ge 1$, are then obtained by
\[
g_r = \frac{1}{\sqrt{\lambda_r}}\sum_{i=1}^{n}p_{ir}X_i = \frac{\sum_{i=1}^{n}\eta_{ir}X_i}{\sum_{i=1}^{n}\eta_{ir}^2}. \tag{III.51}
\]
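The passage from the n × n matrix M to an eigenfunction of the covariance operator can be illustrated as follows. This is only a sketch under stated assumptions: curves are lists of values on a regular grid, the inner product is approximated by a Riemann sum, the covariance operator is taken to be Γ g = (1/n) Σ_i ⟨X_i, g⟩ X_i, and the leading eigenpair is obtained by plain power iteration (a standard technique chosen here for self-containedness, not one prescribed by the text). All function names are illustrative.

```python
import math


def inner(u, v):
    """Riemann-sum approximation of the L2([0, 1]) inner product on a regular grid."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def dual_matrix(curves):
    """Matrix M of (III.49): M[i][k] = <X_i, X_k> / n."""
    n = len(curves)
    return [[inner(xi, xk) / n for xk in curves] for xi in curves]


def leading_eigenpair(m, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix by power iteration."""
    n = len(m)
    vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        new = [sum(m[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    value = sum(vec[i] * m[i][k] * vec[k] for i in range(n) for k in range(n))
    return value, vec


def eigenfunction(curves, value, vec):
    """First expression of (III.51): g = lambda^(-1/2) * sum_i p_i X_i, evaluated on the grid."""
    p = len(curves[0])
    return [sum(pi * x[j] for pi, x in zip(vec, curves)) / math.sqrt(value) for j in range(p)]
```

The function returned by `eigenfunction` satisfies Γ g = λ g for the operator assumed above, which is the property that makes the n × n eigenproblem a substitute for the diagonalization of the operator. Only the eigen-equation is illustrated here; the normalization of g is not examined.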
Thus, using what was done in the first step, we build an estimate of the matrix $\mathbf{M}$ defined by (III.49), using the estimates $\widetilde{W}_i$ of $X_i$, for $i = 1, \ldots, n$. The most natural estimator $\hat{\mathbf{M}}$ of $\mathbf{M}$ seems to be the $n \times n$ matrix with general term $\hat{M}_{i_1i_2} = \frac{1}{n}\langle\widetilde{W}_{i_1}, \widetilde{W}_{i_2}\rangle$, for $i_1, i_2 = 1, \ldots, n$. However, as was pointed out, what we want to estimate here are inner products between curves and not the curves themselves. If we look at the estimator $\langle\widetilde{W}_{i_1}, \widetilde{W}_{i_2}\rangle$ when $i_1 = i_2 = i$, we notice that this term can be written
\[
\begin{aligned}
\langle\widetilde{W}_i, \widetilde{W}_i\rangle
&= \sum_{j_1=1}^{p}\sum_{j_2=1}^{p}W_i(t_{j_1})W_i(t_{j_2})\int_I\frac{K\left(\frac{t-t_{j_1}}{h_i}\right)K\left(\frac{t-t_{j_2}}{h_i}\right)}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_i}\right)\right]^2}\,dt\\
&= \sum_{j_1=1}^{p}\sum_{\substack{j_2=1\\ j_2\ne j_1}}^{p}W_i(t_{j_1})W_i(t_{j_2})\int_I\frac{K\left(\frac{t-t_{j_1}}{h_i}\right)K\left(\frac{t-t_{j_2}}{h_i}\right)}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_i}\right)\right]^2}\,dt
+ \sum_{j_1=1}^{p}W_i(t_{j_1})^2\int_I\frac{K\left(\frac{t-t_{j_1}}{h_i}\right)^2}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_i}\right)\right]^2}\,dt.
\end{aligned}
\]
Thus, the term
\[
\sum_{j_1=1}^{p}W_i(t_{j_1})^2\int_I\frac{K\left(\frac{t-t_{j_1}}{h_i}\right)^2}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_i}\right)\right]^2}\,dt
\]
produces a bias in the estimation of $M_{ii}$. The idea is therefore to remove it from the estimator. This idea was used by Kneip and Utikal (2001) in the framework of density estimation, and then by Benko, Härdle and Kneip (2005) for functional principal component analysis. It turns out, however, that this idea had already been proposed earlier by Hall and Marron (1987) and Jones and Sheather (1991) in the framework of the integration of densities. Concerning the estimation of the integral of the square of regression functions (which is of particular interest here), one can cite the work of Benhenni and Cambanis (1992), Ruppert, Sheather and Wand (1993), as well as Huang and Fan (1999). This yields a gain in terms of the bias of the estimation of the matrix $\mathbf{M}$. In the sequel, we therefore consider the estimator, for $i_1, i_2 = 1, \ldots, n$,
\[
\hat{M}_{i_1i_2} =
\begin{cases}
\dfrac{1}{n}\displaystyle\sum_{j_1=1}^{p}\sum_{j_2=1}^{p}W_{i_1}(t_{j_1})W_{i_2}(t_{j_2})\int_I\frac{K\left(\frac{t-t_{j_1}}{h_{i_1}}\right)K\left(\frac{t-t_{j_2}}{h_{i_2}}\right)}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_{i_1}}\right)\right]\left[\sum_{j_4=1}^{p}K\left(\frac{t-t_{j_4}}{h_{i_2}}\right)\right]}\,dt, & \text{if } i_1 \ne i_2,\\[3ex]
\dfrac{1}{n}\displaystyle\sum_{j_1=1}^{p}\sum_{\substack{j_2=1\\ j_2\ne j_1}}^{p}W_{i_1}(t_{j_1})W_{i_2}(t_{j_2})\int_I\frac{K\left(\frac{t-t_{j_1}}{h_{i_1}}\right)K\left(\frac{t-t_{j_2}}{h_{i_2}}\right)}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_{i_1}}\right)\right]^2}\,dt, & \text{if } i_1 = i_2.
\end{cases}
\]
Moreover, keeping in mind that we estimate inner products rather than curves, it seems more appropriate to choose a single bandwidth $h_{i_1i_2}$ for the estimation of the inner product between the pair of curves $\{X_{i_1}, X_{i_2}\}$, for $i_1, i_2 = 1, \ldots, n$. A slight modification of the estimator above then gives
\[
\hat{M}_{i_1i_2} =
\begin{cases}
\dfrac{1}{n}\displaystyle\sum_{j_1=1}^{p}\sum_{j_2=1}^{p}W_{i_1}(t_{j_1})W_{i_2}(t_{j_2})\int_I\frac{K\left(\frac{t-t_{j_1}}{h_{i_1i_2}}\right)K\left(\frac{t-t_{j_2}}{h_{i_1i_2}}\right)}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_{i_1i_2}}\right)\right]^2}\,dt, & \text{if } i_1 \ne i_2,\\[3ex]
\dfrac{1}{n}\displaystyle\sum_{j_1=1}^{p}\sum_{\substack{j_2=1\\ j_2\ne j_1}}^{p}W_{i_1}(t_{j_1})W_{i_2}(t_{j_2})\int_I\frac{K\left(\frac{t-t_{j_1}}{h_{i_1i_2}}\right)K\left(\frac{t-t_{j_2}}{h_{i_1i_2}}\right)}{\left[\sum_{j_3=1}^{p}K\left(\frac{t-t_{j_3}}{h_{i_1i_2}}\right)\right]^2}\,dt, & \text{if } i_1 = i_2.
\end{cases} \tag{III.52}
\]
With this estimate $\hat{\mathbf{M}}$ of $\mathbf{M}$, we compute the eigenvalues $\hat{\lambda}_r$ and the corresponding eigenvectors $\hat{\mathbf{p}}_r$, for $r = 1, \ldots, n$. We deduce the estimates $\hat{\eta}_{ir}$ and $\hat{g}_r$ of $\eta_{ir}$ and $g_r$ using relations (III.50) and (III.51). Finally, we build the estimator of α given by the approximation of order $L \ge 1$, that is, using the first L principal components (see Cardot, Ferraty and Sarda, 1999, 2003). Our estimator is thus given by
\[
\hat{\alpha}_L = \frac{1}{n}\sum_{r=1}^{L}\sum_{i=1}^{n}\frac{Y_i}{\hat{\lambda}_r}\langle\widetilde{W}_i, \hat{g}_r\rangle\,\hat{g}_r. \tag{III.53}
\]
III.4.2. Integral of the square of the regression
This part is relatively independent of what precedes. Its aim is to establish results on the bias and the variance of a kernel estimator of the integral of the square of a regression function. This result will then be used to derive the bias and the variance of the estimation of the matrix $\mathbf{M}$ by (III.52). The case of interest is that of a fixed design model ($t_1, \ldots, t_p$ are fixed and form a regular subdivision of [0, 1]). We therefore consider here the model
\[
Y_i = r(x_i) + \epsilon_i,
\]
for all $i = 1, \ldots, n$, with $E(\epsilon_i) = 0$, $E(\epsilon_i^2) = \sigma^2$, and $r \in L^2([0, 1])$. We assume that $x_1, \ldots, x_n$ form a regular subdivision of [0, 1]. The (Nadaraya-Watson) kernel estimator of r is written
\[
\hat{r}(x) = \frac{\sum_{i=1}^{n}Y_i\,K\left(\frac{x_i - x}{h}\right)}{\sum_{i=1}^{n}K\left(\frac{x_i - x}{h}\right)}.
\]
The final goal is to estimate the integral of $r^2$ over [0, 1] (see case 3 below). We nevertheless take advantage of this study to give other results that may be useful (cases 1 and 2). The proofs of these results are relatively technical and are given in the appendix (see part V.2.).
Case 1. — In this first case we wish to estimate the quantity
\[
\theta = \int_0^1 r(x)\alpha(x)\,dx,
\]
where α is a given function of $L^2([0, 1])$. For this we consider the estimator
\[
\hat{\theta} = \sum_{i=1}^{n}Y_i\int_0^1\frac{K\left(\frac{x_i - x}{h}\right)\alpha(x)}{\sum_{k=1}^{n}K\left(\frac{x_k - x}{h}\right)}\,dx.
\]
We assume that K is a kernel of order 2 supported on [−1, 1], and we write $\mu_2(K) = \int_{-1}^{1}s^2K(s)\,ds$ and $R(K) = \int_{-1}^{1}K(s)^2\,ds$. We further assume that n and h are such that $nh \to +\infty$. The bias and the variance of $\hat{\theta}$ are given in propositions III.9 and III.10.
Proposition III.9. — We have
\[
E(\hat{\theta}) - \theta = \frac{\mu_2(K)}{2}\left(\int_0^1 r''(x)\alpha(x)\,dx\right)h^2 + o(h^2). \tag{III.54}
\]
Proposition III.10. — We have
\[
V(\hat{\theta}) = 2\sigma^2\left(\int_0^1\alpha(x)^2\,dx\right)\left(\int_0^2\Psi(z)\,dz\right)\frac{1}{n} + o\left(\frac{1}{n}\right), \tag{III.55}
\]
with
\[
\Psi(z) = \int_{-1+z}^{1}K(s)K(s - z)\,ds.
\]
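Case 2. — We now assume that other observations $Z_1, \ldots, Z_n$ are available such that, for all $i = 1, \ldots, n$,
\[
Z_i = s(x_i) + \delta_i,
\]
with $E(\delta_i) = 0$, $E(\delta_i^2) = \tau^2$, $Y_i$ independent of $Z_i$, $\epsilon_i$ independent of $\delta_i$, and $s \in L^2([0, 1])$. This time we wish to estimate the quantity
\[
\theta = \int_0^1 r(x)s(x)\,dx.
\]
For this we consider the estimator
\[
\hat{\theta} = \sum_{i=1}^{n}\sum_{j=1}^{n}Y_iZ_j\int_0^1\frac{K\left(\frac{x_i - x}{h}\right)K\left(\frac{x_j - x}{h}\right)}{\sum_{k=1}^{n}\sum_{l=1}^{n}K\left(\frac{x_k - x}{h}\right)K\left(\frac{x_l - x}{h}\right)}\,dx.
\]
The assumptions and notations on K are the same as in the previous case. The bias and the variance of $\hat{\theta}$ are given in propositions III.11 and III.12.
Proposition III.11. — We have
\[
E(\hat{\theta}) - \theta = \frac{\mu_2(K)}{2}\left(\int_0^1\left[r(x)s''(x) + r''(x)s(x)\right]dx\right)h^2 + o(h^2). \tag{III.56}
\]
Proposition III.12. — We have
\[
V(\hat{\theta}) = 2\left(\tau^2\int_0^1 r(x)^2\,dx + \sigma^2\int_0^1 s(x)^2\,dx\right)\left(\int_0^2\Psi(z)\,dz\right)\frac{1}{n} + o\left(\frac{1}{n}\right). \tag{III.57}
\]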
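Case 3. — With the same notations as in case 1, we consider the estimation of
\[
\theta = \int_0^1 r(x)^2\,dx.
\]
For this we introduce the estimator
\[
\hat{\theta} = \int_0^1\frac{\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}Y_iY_j\,K\left(\frac{x_i - x}{h}\right)K\left(\frac{x_j - x}{h}\right)}{\left[\sum_{k=1}^{n}K\left(\frac{x_k - x}{h}\right)\right]^2}\,dx
= \sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}Y_iY_j\int_0^1\frac{K\left(\frac{x_i - x}{h}\right)K\left(\frac{x_j - x}{h}\right)}{\left[\sum_{k=1}^{n}K\left(\frac{x_k - x}{h}\right)\right]^2}\,dx.
\]
The bias and the variance of $\hat{\theta}$ are given in propositions III.13 and III.14.
Proposition III.13. — We have
\[
E(\hat{\theta}) - \theta = \mu_2(K)\left(\int_0^1 r(x)r''(x)\,dx\right)h^2 - R(K)\left(\int_0^1 r(x)^2\,dx\right)\frac{1}{nh} + o\left(h^2 + \frac{1}{nh}\right). \tag{III.58}
\]
Proposition III.14. — We have
\[
V(\hat{\theta}) = 8\sigma^2\left(\int_0^1 r(x)^2\,dx\right)\left(\int_0^2\Psi(z)\,dz\right)\frac{1}{n} + o\left(\frac{1}{n}\right). \tag{III.59}
\]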
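The estimator of case 3 can be sketched numerically as follows. The code uses the identity Σ_i Σ_{j≠i} a_i a_j = (Σ_i a_i)² − Σ_i a_i² with a_i = Y_i K((x_i − x)/h), and approximates the integral over [0, 1] by a midpoint rule. The Epanechnikov kernel, the number of integration points and the function names are assumptions made for this illustration.

```python
def epanechnikov(u):
    """Kernel of order 2 supported on [-1, 1] (one possible choice of K)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def integrated_square(x, y, h, kernel=epanechnikov, n_grid=400, drop_diagonal=True):
    """Midpoint-rule approximation of a kernel estimator of the integral of r(x)^2 over [0, 1].

    With drop_diagonal=True the terms i == j are removed, as in the estimator of case 3;
    with drop_diagonal=False this is the integral of the squared Nadaraya-Watson fit.
    """
    total = 0.0
    for g in range(n_grid):
        u = (g + 0.5) / n_grid
        k = [kernel((xi - u) / h) for xi in x]
        denom = sum(k) ** 2
        numer = sum(ki * yi for ki, yi in zip(k, y)) ** 2
        if drop_diagonal:
            # Remove the i == j terms from the double sum.
            numer -= sum((ki * yi) ** 2 for ki, yi in zip(k, y))
        total += numer / denom
    return total / n_grid
```

The bandwidth must be large enough for every integration point to have at least one design point within distance h, otherwise the denominator vanishes.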
III.4.3. Asymptotic results
The results established here are directly inspired by the work of Kneip and Utikal (2001) and Benko, Härdle and Kneip (2005). The proofs of the results below are also directly inspired by these works; they are given in the appendix of the thesis (see part V.3.).
Proposition III.15. — For all $i_1, i_2 = 1, \ldots, n$, if $h_{i_1i_2}$ is taken of the form $p^{-\zeta}$ with $\zeta \in [1/4, 1/2)$, we have
\[
\hat{M}_{i_1i_2} - M_{i_1i_2} = O_P\left(\frac{1}{np^{1/2}}\right).
\]
Proposition III.16. — If $\|\cdot\|$ denotes the usual Euclidean matrix norm, we have
\[
\left\|\hat{\mathbf{M}} - \mathbf{M}\right\|^2 = O_P\left(\frac{1}{p}\right).
\]
The next two propositions give the behaviour of the eigenvalues and eigenvectors of $\hat{\mathbf{M}}$ with respect to those of $\mathbf{M}$.
Proposition III.17. — Assume that the eigenvalues of $\mathbf{M}$ are such that, for all $r = 1, \ldots, L$, there exist constants $0 < C_{1r} < +\infty$ and $0 < C_{2r} \le C_{3r} < +\infty$ satisfying
\[
\min_{s=1,\ldots,n,\ s\ne r}|\lambda_r - \lambda_s| \ge C_{1r},
\]
and
\[
C_{2r} \le \lambda_r \le C_{3r}.
\]
Then we have, for all $r = 1, \ldots, L$,
\[
\hat{\lambda}_r - \lambda_r = O_P\left(\frac{1}{n^{1/2}p^{1/2}} + \frac{1}{p}\right).
\]
Proposition III.18. — Under the same assumptions as in proposition III.17, we have
\[
\left\|\hat{\mathbf{p}}_r - \mathbf{p}_r\right\| = O_P\left(\frac{1}{p^{1/2}}\right).
\]
III.4.4. Perspectives
This preliminary work gives fairly encouraging results. From a theoretical point of view, a convergence result must now be proved for the estimator $\hat{\alpha}_L$ of α defined by (III.53), which should be reasonably achievable in view of the preceding convergence results on the eigenvalues and eigenvectors of the estimator of the matrix $\mathbf{M}$. From a more applied point of view, work remains to be done on the choice of the number L of principal components. A cross validation method can a priori be considered, while Kneip and Utikal (2001) propose a testing procedure in their density estimation context. Finally, it seems quite interesting to compare this estimation procedure with the total least squares one.
PART IV
APPLICATION TO THE FORECASTING OF POLLUTION PEAKS
IV.1. FORECASTING WITH CONDITIONAL QUANTILES
In this first chapter, we are interested in the spline estimation of conditional quantiles presented in part I of the thesis. As was pointed out, the minimization problem (I.7) has no explicit solution. We adopt here a strategy already used by Lejeune and Sarda (1988). It is based on an iterative algorithm (called the iteratively reweighted least squares algorithm) which consists, at each step, in solving a weighted least squares problem (see Ruppert and Carroll, 1988). This practical study was the subject of an e-book chapter whose aim was the forecasting of pollution peaks. This e-book chapter, together with a presentation of the pollution data studied, is given in chapters IV.3 and IV.4 of the thesis.
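IV.1.1. Estimation algorithm
Consider first the quantity
\[
\omega_i(\alpha) = 2\alpha\,\mathbb{1}_{\{Y_i - \langle\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta}, X_i\rangle \ge 0\}} + 2(1 - \alpha)\,\mathbb{1}_{\{Y_i - \langle\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta}, X_i\rangle < 0\}},
\]
where $\mathbb{1}$ denotes the indicator function of a set. Using this notation, the minimization problem (I.7) can be written as
\[
\min_{\boldsymbol{\theta}\in\mathbb{R}^{k+q}}\left\{\frac{1}{n}\sum_{i=1}^{n}\omega_i(\alpha)\left|Y_i - \langle\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta}, X_i\rangle\right| + \rho\left\|(\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta})^{(m)}\right\|^2_{L^2}\right\}. \tag{IV.1}
\]
The principle of the iteratively reweighted least squares algorithm is then to replace the absolute value by a weighted quadratic term. One thus obtains, at each step of the algorithm, an explicit expression for the solution of the minimization problem. The algorithm is described below.
• Initialization: we determine $\hat{\boldsymbol{\theta}}^{(1)}$, solution of the minimization problem
\[
\min_{\boldsymbol{\theta}\in\mathbb{R}^{k+q}}\left\{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta}, X_i\rangle\right)^2 + \rho\left\|(\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta})^{(m)}\right\|^2_{L^2}\right\},
\]
whose explicit expression is given by
\[
\hat{\boldsymbol{\theta}}^{(1)} = \frac{1}{n}\left(\frac{1}{n}\mathbf{D}_X^{\tau}\mathbf{D}_X + \rho\,\mathbf{G}_k\right)^{-1}\mathbf{D}_X^{\tau}\mathbf{Y},
\]
where $\mathbf{D}_X$ is the $n \times (k+q)$ matrix with general term $\langle B_j, X_i\rangle$ for $i = 1, \ldots, n$ and $j = 1, \ldots, (k+q)$, $\mathbf{G}_k$ is the $(k+q) \times (k+q)$ matrix with general term $\langle B_j^{(m)}, B_l^{(m)}\rangle$ for $j, l = 1, \ldots, (k+q)$, and $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\tau}$.
• Step r + 1: knowing $\hat{\boldsymbol{\theta}}^{(r)}$, we determine $\hat{\boldsymbol{\theta}}^{(r+1)}$, solution of the minimization problem
\[
\min_{\boldsymbol{\theta}\in\mathbb{R}^{k+q}}\left\{\frac{1}{n}\sum_{i=1}^{n}\frac{\omega_i^{(r)}(\alpha)\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta}, X_i\rangle\right)^2}{\left[\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r)}, X_i\rangle\right)^2 + \eta^2\right]^{1/2}} + \rho\left\|(\mathbf{B}_{k,q}^{\tau}\boldsymbol{\theta})^{(m)}\right\|^2_{L^2}\right\},
\]
where $\eta^2$ is a strictly positive constant fixed in advance to avoid a zero denominator, and $\omega_i^{(r)}(\alpha)$ is $\omega_i(\alpha)$ at step r of the algorithm, namely
\[
\omega_i^{(r)}(\alpha) = 2\alpha\,\mathbb{1}_{\{Y_i - \langle\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r)}, X_i\rangle \ge 0\}} + 2(1 - \alpha)\,\mathbb{1}_{\{Y_i - \langle\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r)}, X_i\rangle < 0\}}.
\]
Defining the $n \times n$ diagonal matrix $\boldsymbol{\Omega}^{(r)}$ whose diagonal elements are given, for all $i = 1, \ldots, n$, by
\[
\frac{\omega_i^{(r)}(\alpha)}{\left[\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r)}, X_i\rangle\right)^2 + \eta^2\right]^{1/2}},
\]
we obtain the solution of the minimization problem of step r + 1, given by
\[
\hat{\boldsymbol{\theta}}^{(r+1)} = \frac{1}{n}\left(\frac{1}{n}\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\mathbf{D}_X + \rho\,\mathbf{G}_k\right)^{-1}\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\mathbf{Y}.
\]
• Stopping criterion: the algorithm is stopped when
\[
\left|C^{(r+1)} - C^{(r)}\right| < err,
\]
where the quantity err is fixed and where $C^{(r)}$ is defined by
\[
C^{(r)} = \frac{1}{n}\sum_{i=1}^{n}l_\alpha\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r)}, X_i\rangle\right) + \rho\left\|(\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r)})^{(m)}\right\|^2_{L^2}.
\]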
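To make the three stages of the algorithm concrete, the sketch below applies them to a deliberately reduced situation, which is an assumption of this example and not the functional setting of the text: a single constant regressor (so that $\mathbf{D}_X$ is a column of ones) and no penalty (ρ = 0). The weighted least squares update then reduces to a weighted mean, and the limit is a sample quantile of order α. The loss is taken as l_α(u) = |u| + (2α − 1)u, which equals ω_i(α)|u| with the weights defined above; the function names and default tolerances are illustrative.

```python
def check_loss(u, alpha):
    """Quantile loss l_alpha(u) = |u| + (2 * alpha - 1) * u (assumed form)."""
    return abs(u) + (2.0 * alpha - 1.0) * u


def irls_quantile(y, alpha, eta=1e-6, err=1e-12, max_iter=1000):
    """Iteratively reweighted least squares for an intercept-only quantile fit."""
    n = len(y)
    theta = sum(y) / n  # initialization: ordinary least squares
    cost = sum(check_loss(yi - theta, alpha) for yi in y) / n
    for _ in range(max_iter):
        # Diagonal of Omega^(r): omega_i^(r) / sqrt(residual^2 + eta^2).
        weights = []
        for yi in y:
            res = yi - theta
            omega = 2.0 * alpha if res >= 0 else 2.0 * (1.0 - alpha)
            weights.append(omega / (res * res + eta * eta) ** 0.5)
        # Weighted least squares step (a weighted mean in this reduced setting).
        theta = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
        new_cost = sum(check_loss(yi - theta, alpha) for yi in y) / n
        if abs(new_cost - cost) < err:  # stopping criterion on the criterion value
            break
        cost = new_cost
    return theta
```

In the functional setting, the weighted mean is replaced by the matrix expression of $\hat{\boldsymbol{\theta}}^{(r+1)}$ given above, which requires a linear solver and the matrices $\mathbf{D}_X$ and $\mathbf{G}_k$.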
IV.1.2. Choice of the parameters
The estimation procedure just presented depends on many parameters: the number of knots k, the degree q of the spline functions, the order of derivation m in the penalty, and the smoothing parameter ρ. Beyond the number of knots k, the choice of the parameter ρ is crucial to obtain a good estimate of θ (see Marx and Eilers, 1999). We therefore fix m = 2, q = 3 (cubic splines) and k = 8. The parameter ρ is chosen by generalized cross validation (see Wahba, 1990) at each step of the iteratively reweighted least squares algorithm. Before giving this criterion, we come back to ordinary cross validation. At step r of the iteratively reweighted least squares algorithm, the cross validation criterion can be defined by
\[
CV^{(r)}(\rho) = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r),[-i]}, X_i\rangle\right)^2,
\]
where $\hat{\boldsymbol{\theta}}^{(r),[-i]}$ denotes the estimate of θ at step r of the algorithm, using all the data except the i-th pair $(X_i, Y_i)$. The value of ρ is then chosen as the one minimizing this criterion CV(ρ). One can then show (see Wahba, 1990) that this criterion can be written as
\[
CV^{(r)}(\rho) = \frac{1}{n}\sum_{i=1}^{n}\frac{\left(Y_i - \langle\mathbf{B}_{k,q}^{\tau}\hat{\boldsymbol{\theta}}^{(r)}, X_i\rangle\right)^2}{\left(1 - h_{ii}^{(r)}\right)^2},
\]
where $h_{ii}^{(r)}$ is the i-th diagonal element of the “hat matrix” $\mathbf{H}^{(r)}$ (at step r of the algorithm) defined by
\[
\mathbf{H}^{(r)} = \mathbf{D}_X\left(\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\mathbf{D}_X + \rho\,\mathbf{G}_k\right)^{-1}\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}.
\]
The generalized cross validation criterion (at step r of the algorithm) is then obtained by replacing, in the cross validation criterion, $h_{ii}^{(r)}$ by $\frac{1}{n}\sum_{i=1}^{n}h_{ii}^{(r)} = \frac{1}{n}\mathrm{Tr}\left(\mathbf{H}^{(r)}\right)$. Writing $\hat{\mathbf{Y}}^{(r)} = \mathbf{H}^{(r)}\mathbf{Y}$, this criterion reads
\[
GCV^{(r)}(\rho) = \frac{\frac{1}{n}\left(\mathbf{Y} - \hat{\mathbf{Y}}^{(r)}\right)^{\tau}\boldsymbol{\Omega}^{(r)}\left(\mathbf{Y} - \hat{\mathbf{Y}}^{(r)}\right)}{\left(1 - \frac{1}{n}\mathrm{Tr}\left(\mathbf{H}^{(r)}\right)\right)^2},
\]
and the value of ρ is then chosen as the one minimizing this criterion GCV(ρ). Note that the numerical computation of this criterion is faster than that of the ordinary cross validation criterion, and gives good results in practice. Using the properties of the trace, one can also save computation on the sizes of the matrices by writing
\[
\mathrm{Tr}\left(\mathbf{H}^{(r)}\right) = \mathrm{Tr}\left(\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\mathbf{D}_X\left(\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\mathbf{D}_X + \rho\,\mathbf{G}_k\right)^{-1}\right).
\]
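The saving comes from the cyclic property of the trace, Tr(ABC) = Tr(CAB). As a short worked step, with the shorthand $\mathbf{Q}$ introduced here only for readability:

```latex
\mathrm{Tr}\bigl(\mathbf{H}^{(r)}\bigr)
  = \mathrm{Tr}\Bigl(\mathbf{D}_X\,\mathbf{Q}^{-1}\,\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\Bigr)
  = \mathrm{Tr}\Bigl(\mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\mathbf{D}_X\,\mathbf{Q}^{-1}\Bigr),
\qquad
\mathbf{Q} = \mathbf{D}_X^{\tau}\boldsymbol{\Omega}^{(r)}\mathbf{D}_X + \rho\,\mathbf{G}_k .
```

The left-hand product is an $n \times n$ matrix, whereas the right-hand one is only $(k+q) \times (k+q)$, that is $11 \times 11$ with the values k = 8 and q = 3 fixed above.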
IV.1.3. Model with several covariates
We wish to extend here model (2) to the case where several covariates are available. Denote in the sequel these v (centered) variables by $X_i^1, \ldots, X_i^v$, belonging to $L^2([0, 1])$ for all $i = 1, \ldots, n$. We restrict ourselves to an additive model written, for all $i = 1, \ldots, n$,
\[
Y_i = \int_0^1\Psi_\alpha^1(t)X_i^1(t)\,dt + \ldots + \int_0^1\Psi_\alpha^v(t)X_i^v(t)\,dt + \epsilon_i,
\]
with
\[
P\left(\epsilon_i \le 0 \,/\, X_i^1 = x^1, \ldots, X_i^v = x^v\right) = \alpha.
\]
To determine (regression spline) estimators of $\Psi_\alpha^1, \ldots, \Psi_\alpha^v$, we use the backfitting algorithm of Hastie and Tibshirani (1990), which is described here.
• Initialization: we set $\hat{\Psi}_\alpha^{1,(1)} = \ldots = \hat{\Psi}_\alpha^{v,(1)} = 0$.
• Step r + 1: we consider, for all $l = 1, \ldots, v$, the variable
\[
Y_i^{l,r+1} = Y_i - \sum_{s=1}^{l-1}\int_0^1\hat{\Psi}_\alpha^{s,(r+1)}(t)X_i^s(t)\,dt - \sum_{s=l+1}^{v}\int_0^1\hat{\Psi}_\alpha^{s,(r)}(t)X_i^s(t)\,dt,
\]
and we consider the model with one covariate
\[
Y_i^{l,r+1} = \int_0^1\Psi_\alpha^l(t)X_i^l(t)\,dt + \epsilon_i.
\]
Using the iteratively reweighted least squares algorithm presented above, we thus obtain $\hat{\Psi}_\alpha^{l,(r+1)}$ for all $l = 1, \ldots, v$.
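The structure of the backfitting loop can be sketched on a much simpler problem. In the Python example below, each functional covariate is replaced by a scalar one and the one-covariate quantile fit is replaced by an ordinary least squares slope; both simplifications are assumptions made so that the example stays self-contained, and only the cycling over partial residuals corresponds to the algorithm described above.

```python
def fit_slope(x, r):
    """Least squares slope of r on x without intercept (stand-in for the one-covariate fit)."""
    return sum(a * b for a, b in zip(x, r)) / sum(a * a for a in x)


def backfit(covariates, y, sweeps=50):
    """Backfitting for the additive model y_i = sum_l beta_l * x_i^l + noise."""
    v = len(covariates)
    beta = [0.0] * v  # initialization: all components set to zero
    for _ in range(sweeps):
        for l in range(v):
            # Partial residual: remove every component except the l-th one, using
            # already updated values for s < l and values of the previous sweep for s > l.
            partial = [
                yi - sum(beta[s] * covariates[s][i] for s in range(v) if s != l)
                for i, yi in enumerate(y)
            ]
            beta[l] = fit_slope(covariates[l], partial)
    return beta
```

Because `beta` is updated in place, the components with index below l already carry their step r + 1 values when component l is refitted, as in the definition of $Y_i^{l,r+1}$.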
IV.2. FORECASTING WITH THE CONDITIONAL MEAN
The previous chapter dealt with the practical estimation of conditional quantiles for a functional covariate. In this chapter, we briefly present the work done for the estimation of the conditional mean for a functional covariate.
IV.2.1. Estimation by regression splines
Concerning regression splines, Cardot, Ferraty and Sarda (1999, 2003) have here too evaluated the performance of their estimator by means of simulations. An extension of this estimator to the case of several covariates is possible (analogously to what was done for conditional quantiles in the previous part, with an additive model and a “backfitting” algorithm). This is proposed by Cardot, Crambes and Sarda (2006), in the e-book chapter mentioned earlier (see chapters IV.3 and IV.4 for the details of this study of pollution peak forecasting).
IV.2.2. Estimation by smoothing splines
Concerning the estimators presented in parts II and III of the thesis (estimator of the conditional mean based on smoothing splines, whether or not the covariate is noisy), the explicit expression of the estimator (based on a simple matrix formulation) allows immediate programming. Its performance can then be evaluated by means of simulations. These can be found in the article by Cardot, Crambes, Kneip and Sarda (2006) (given in chapter III.3. of the thesis), which gives an idea of the performance of the estimators built, in particular by the total least squares method.
IV.3. POLLUTION DATA
In this part, we give a brief description of the pollution data provided by ORAMIP, together with a brief study of them. The variables are measured every hour, over the period from 15 May to 15 September, for the years 1997, 1998, 1999 and 2000. The measured variables are pollutants or meteorological variables:
• nitrogen monoxide NO (in µg/m3): NO,
• nitrogen dioxide NO2 (in µg/m3): N2,
• particulate matter PM10 (in µg/m3): PS,
• ozone O3 (in µg/m3): O3,
• wind speed (in m/s): VV,
• wind direction (in degrees): DV,
• temperature (in degrees Celsius): TE,
• relative humidity (in percent): HR.
These variables are measured at different stations of the Toulouse urban area:
• the Jacquier station: JAC,
• the Léo Lagrange station: LEO,
• the Mazades station: MAZ,
• the Berthelot station: BRT,
• the Colomiers station: COL,
• the Balma station: BAL.
Many missing data occur, mainly because of breakdowns or the absence of measuring devices, some variables never being observed. We remove the variables that have too many missing data and, for the others, we keep their mean over all the stations (since the variables are very similar from one station to another). We thus have 5 variables: NO, N2, O3, DV and VV. A first idea of the behaviour of the variables (for example the variable O3, which will be of most interest in what follows) can be obtained by looking at the daily curves. Figure 2 shows some of the daily curves of this variable O3, together with the mean curve drawn in a lighter line. Note that the daily curves run from 6 pm to 5 pm the next day. We observe a decrease of ozone during the night, whereas the ozone peak occurs in the middle of the afternoon.
Figure 2. Daily ozone curves (vertical axis: ozone; horizontal axis: hours).
A descriptive study of these data was carried out (principal component analyses). A first study is given by Cardot, Crambes and Sarda (2004b). A more detailed study can be found in Cardot, Crambes and Sarda (2006). That work gathers a complete study of these data using the Xplore software. The estimation methods seen previously (conditional mean, conditional quantiles) were used, and the results obtained are given in the chapter of this e-book (see the next chapter).
The study of these pollution data will be extended by work in progress, in collaboration with Hervé Cardot, Alois Kneip and Pascal Sarda, within the study of the smoothing spline estimator of the conditional mean (the estimator presented in part II of the thesis). Assuming normality of the errors in model (2), we consider the estimation of a new observation $Y_{n+1}$ given a new functional explanatory variable $X_{n+1}$, the aim being to give a prediction interval for $Y_{n+1}$.
IV.4. OZONE POLLUTION FORECASTING
IV.4.1. Introduction
Prediction of Ozone pollution is currently an important field of research, mainly for prevention purposes. Many statistical methods have already been used to study pollution data. For example, Ghattas (1999) used a regression tree approach, while a functional approach has been proposed by Damon and Guillas (2002) and by Aneiros-Perez, Cardot, Estevez-Perez and Vieu (2004). Pollution data now often consist of hourly measurements of pollutants and meteorological data. These variables can then be regarded as curves known at some discretization points, usually called functional data in the literature (see Ramsay and Silverman, 1997). Many examples of such data have already been studied in various fields (see Frank and Friedman, 1993, Ramsay and Silverman, 2002, Ferraty and Vieu, 2002). It then seems natural to propose models that take into account the fact that the variables are functions of time.
The data we study here were provided by the ORAMIP(1), which is an air observatory located in the city of Toulouse (France). We are interested in a pollutant such as Ozone. We consider the prediction of the maximum of pollution for a day (maximum of Ozone) knowing the temporal evolution of Ozone the day before. To do this, we consider two models. The first one is the functional linear model introduced by Ramsay and Dalzell (1991). It is based on the prediction of the conditional mean. The second one is a generalization, to the case where the covariates are curves, of the linear model for quantile regression introduced by Koenker and Bassett (1978). It consists in forecasting the conditional median. More generally, we introduce this model for the α-conditional quantile, with α ∈ ]0, 1[. This allows us to give prediction intervals. For both models, a spline estimator of the functional coefficient is introduced, in a way similar to Cardot, Ferraty and Sarda (2003).
(1) “Observatoire Régional de l’Air en Midi-Pyrénées”
This work is divided into four parts. First, we give a brief statistical descrip-
tion and analysis of the data, in particular by the use of principal components
analysis (PCA), to study the general behaviour of the variables. Secondly, we
present the functional linear model and we propose a spline estimator of the
functional coefficient. Similarly, we propose in the third part a spline estima-
tor of the functional coefficient for the α-conditional quantile. In both models,
we describe the algorithms that have been implemented to obtain the spline
estimator. We also extend these algorithms to the case where there are seve-
ral functional predictors by the use of a backfitting algorithm. Finally, these
approaches are illustrated using the real pollution data provided by the ORA-
MIP.
IV.4.2. A brief analysis of the data
Description of the data. — The data provided by ORAMIP consist of hourly measurements, over the period from 15 May to 15 September of the years 1997, 1998, 1999 and 2000, of the following variables:
– Nitrogen Monoxide (noted NO),

– Nitrogen Dioxide (noted N2),
– Ozone (noted O3),
– Wind Direction (noted WD),
– Wind Speed (noted WS).
These variables were observed in six different stations in Toulouse. There

are some missing data, mainly because of breakdowns. There were also other
variables (such as the temperature) for which the missing data were too nume-
rous and we could not use them, so, in the following, we just consider the five
variables mentioned above. We first noticed that these variables take values
which are very similar from one station to another. Thus, for each variable, we
consider the mean of the measurements in the different stations. This approach
is one way to deal with missing values.
A descriptive analysis of the variables can show simple links between them.
For example, we can see that the mean daily curves of the first three variables
NO, N2 and O3 (cf. figure 3) have a similar evolution for NO and N2 (at least
in the first part of the day). On the contrary, the curves for NO and O3 have
opposite variations. These observations are also confirmed by the correlation
matrix of the variables NO, N2 and O3.
Figure 3. Daily mean curves for the variables NO (blue curve), N2 (green curve) and O3 (red curve), each plotted against hours in its own panel (mean of NO, mean of N2, mean of O3).
Principal Component Analysis. — A first PCA was carried out on the matrix whose columns are the different daily mean variables. As these variables have different units, we also consider the reduced matrix (variables scaled to unit variance). The first two components explain more than 80% of the variance. To visualize the results of this PCA, we have represented the mean hours (figure 4) and the variables (figure 5) in the plane formed by the first two principal axes. We notice in figure 4 that the first axis separates the morning evolution from the afternoon evolution, while the second axis separates the day from the night. Concerning figure 5, the first axis separates Nitrogen Monoxide and Nitrogen Dioxide from Ozone. We can also remark that, when figure 4 is superimposed on figure 5, the maximum of Ozone occurs in the afternoon and the quantity of Ozone is low in the morning. The contrary holds for Nitrogen Monoxide and Nitrogen Dioxide.
Figure 4. Representation of the mean hours 1, . . . , 24 in the plane generated by the first two principal components (horizontal axis: first principal component; vertical axis: second principal component).
Figure 5. Representation of the variables NO, N2, O3, WD and WS in the plane generated by the first two principal components (horizontal axis: first principal component; vertical axis: second principal component).
Functional Principal Component Analysis. — We also performed a functional PCA (see Ramsay and Silverman, 1997) of the different variables. We come back here to the functional setting, where each variable is considered as a curve discretized at some points. We can look at the variations of each variable around its mean by representing the functions µ, µ + Cξ and µ − Cξ, where µ is the mean curve of the variable, C is a constant and ξ is a principal component. For example, for Ozone, this representation is given in figure 6 for the first principal component (which accounts for nearly 80% of the information). The constant C has been arbitrarily set to 10 in this example, to obtain an easily interpretable figure. We can see that the first principal component highlights variations around the mean at 3:00 pm, which is the time of the maximum of Ozone in the middle of the afternoon.
Figure 6. Variations of O3 around the mean (O3 against hours). The blue solid curve represents the mean curve µ of Ozone, the red dotted curve represents µ + 10ξ where ξ is the first principal component, and the green dashed curve represents µ − 10ξ.
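The construction of the curves µ, µ + Cξ and µ − Cξ can be sketched as follows. This is only an illustration under stated assumptions: it performs a plain PCA of the discretized curves through a singular value decomposition (no smoothing step, unlike some functional PCA variants), and the function names `first_principal_curve` and `variation_band` are hypothetical, not taken from the Xplore study.

```python
import numpy as np

def first_principal_curve(curves):
    """curves: (n_days, p) array, one discretized daily curve per row.

    Returns the mean curve mu, the first principal component xi (unit norm)
    and the share of total variance carried by that component.
    """
    mu = curves.mean(axis=0)
    centered = curves - mu
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    share = s[0] ** 2 / np.sum(s ** 2)
    return mu, vt[0], share

def variation_band(mu, xi, C=10.0):
    """Return the two curves mu + C*xi and mu - C*xi."""
    return mu + C * xi, mu - C * xi
```

The sign of ξ returned by a singular value decomposition is arbitrary, so the roles of µ + Cξ and µ − Cξ may be swapped from one run to another without changing the interpretation.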
IV.4.3. Functional linear model
We now describe the functional linear model presented for example by Ramsay and Silverman (1997). Let us consider a sample $(X_i, Y_i)_{i=1,\dots,n}$ of pairs of random variables, independent and identically distributed, with the same distribution as $(X, Y)$, with $X$ belonging to the functional space $L^2(D)$ of square integrable functions defined on a bounded interval $D$ of $\mathbb{R}$, and $Y$ belonging to $\mathbb{R}$. We center each function $X_i$ by introducing $\widetilde{X}_i = X_i - E(X_i)$. The functional linear model is then defined by
\[
Y_i = \mu + \int_D \alpha(t)\, \widetilde{X}_i(t)\, dt + \epsilon_i , \tag{IV.2}
\]
with $E(\epsilon_i \mid X_i) = 0$. We have $E(Y_i) = \mu$ and $E(Y_i \mid X_i) = \mu + \int_D \alpha(t)\, \widetilde{X}_i(t)\, dt$.
In practice, each function $X_i$ is known at $p = 24$ equispaced discretization points $t_1, \dots, t_p \in D$ (with $t_1 \le \dots \le t_p$). So, the integral above is approximated by
\[
\int_D \alpha(t)\, \widetilde{X}_i(t)\, dt \simeq \frac{\lambda(D)}{p} \sum_{j=1}^{p-1} \alpha(t_j)\, \widetilde{X}_i(t_j) ,
\]
where $\lambda(D)$ stands for the length of the interval $D$. More generally, when the discretization points are not equispaced, the integral can easily be approximated by
\[
\int_D \alpha(t)\, \widetilde{X}_i(t)\, dt \simeq \sum_{j=1}^{p-1} (t_{j+1} - t_j)\, \alpha(t_j)\, \widetilde{X}_i(t_j) .
\]
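The two quadrature formulas above translate directly into code. The following is a minimal sketch that follows the formulas as written (including the factor $\lambda(D)/p$ and the sum stopping at $p-1$); the function names are hypothetical.

```python
def integral_equispaced(alpha_vals, x_vals, length):
    """(length / p) * sum_{j=1}^{p-1} alpha(t_j) * x(t_j), for equispaced points."""
    p = len(alpha_vals)
    return length / p * sum(a * x for a, x in zip(alpha_vals[:p - 1], x_vals[:p - 1]))

def integral_general(t, alpha_vals, x_vals):
    """sum_{j=1}^{p-1} (t_{j+1} - t_j) * alpha(t_j) * x(t_j), for arbitrary points."""
    return sum((t[j + 1] - t[j]) * alpha_vals[j] * x_vals[j] for j in range(len(t) - 1))
```

For instance, with `t = [0.0, 0.25, 0.5, 0.75, 1.0]`, a constant coefficient equal to 1 and `x_vals = t`, the general formula sums `0.25 * (0 + 0.25 + 0.5 + 0.75)`.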
Spline estimation of α. — We choose to estimate the functional regression coefficient $\alpha : D \to \mathbb{R}$ by a spline function (see de Boor, 1978, for details). Let us consider $k \in \mathbb{N}^\star$ and $q \in \mathbb{N}$. We split $D$ into $k$ intervals of the same length. A spline function is a piecewise polynomial function of degree $q$ on each sub-interval, $(q-1)$ times differentiable on $D$. The endpoints of the sub-intervals are called knots. It is known that the space of such spline functions is a vector space of dimension $k + q$. We consider the basis of this space called the B-spline basis, which we write $\mathbf{B}_{k,q} = {}^t(B_1, \dots, B_{k+q})$.
We estimate α by a linear combination of the functions $B_l$, $l = 1, \dots, k+q$, which leads us to find $\widehat{\mu} \in \mathbb{R}$ and a vector $\widehat{\theta} \in \mathbb{R}^{k+q}$ such that
\[
\widehat{\alpha} = \sum_{l=1}^{k+q} \widehat{\theta}_l B_l = {}^t\mathbf{B}_{k,q}\, \widehat{\theta} ,
\]
with $\widehat{\mu}$ and $\widehat{\theta}$ solutions of the following minimization problem
\[
\min_{\mu \in \mathbb{R},\, \theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \mu - \langle {}^t\mathbf{B}_{k,q}\theta, \widetilde{X}_i \rangle \right)^2 + \rho \left\| ({}^t\mathbf{B}_{k,q}\theta)^{(m)} \right\|_{L^2}^2 \right\} , \tag{IV.3}
\]
where $({}^t\mathbf{B}_{k,q}\theta)^{(m)}$ is the $m$th derivative of ${}^t\mathbf{B}_{k,q}\theta$ and ρ is a penalization parameter that controls the smoothness of the estimator (see Cardot, Ferraty and Sarda, 2003). The notation $\langle \cdot, \cdot \rangle$ refers to the usual inner product of $L^2(D)$ and $\| \cdot \|_{L^2}$ is the norm induced by this inner product.

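If we set $\beta = \begin{pmatrix} \mu \\ \theta \end{pmatrix} \in \mathbb{R}^{k+q+1}$, then the solution of the minimization problem (IV.3) above is given by
\[
\widehat{\beta} = \frac{1}{n} \left( \frac{1}{n}\, {}^t\mathbf{D}_X \mathbf{D}_X + \rho \mathbf{K}_k \right)^{-1} {}^t\mathbf{D}_X \mathbf{Y} ,
\]
with
\[
\mathbf{D}_X = \begin{pmatrix} 1 & \langle B_1, X_1 \rangle & \dots & \langle B_{k+q}, X_1 \rangle \\ \vdots & \vdots & & \vdots \\ 1 & \langle B_1, X_n \rangle & \dots & \langle B_{k+q}, X_n \rangle \end{pmatrix}
\quad \text{and} \quad
\mathbf{K}_k = \begin{pmatrix} 0 & 0 \\ 0 & \mathbf{G}_k \end{pmatrix} ,
\]
where $\mathbf{G}_k$ is the $(k+q) \times (k+q)$ matrix with elements $\langle B_j^{(m)}, B_l^{(m)} \rangle$. It also satisfies
\[
{}^t\theta\, \mathbf{G}_k\, \theta = \left\| ({}^t\mathbf{B}_{k,q}\theta)^{(m)} \right\|_{L^2}^2 .
\]
The computation of the matrices $\mathbf{D}_X$ and $\mathbf{G}_k$ is performed with the Xplore functions bspline and bsplineini.
Let us notice that a convergence result for this spline estimator is given by Cardot, Ferraty and Sarda (2003).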
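Once the matrices $\mathbf{D}_X$ and $\mathbf{K}_k$ are available, the closed-form solution is a single linear solve. The sketch below assumes both matrices have already been computed (for example by a B-spline routine, which is not reproduced here); the function name `penalized_ls` is hypothetical and this is not the Xplore implementation.

```python
import numpy as np

def penalized_ls(D, K, Y, rho):
    """Return beta_hat = (1/n) * (D'D / n + rho * K)^{-1} D'Y.

    D   : (n, k+q+1) design matrix, first column of ones
    K   : (k+q+1, k+q+1) penalty matrix, zero row and column for the intercept
    Y   : (n,) response vector
    rho : penalization parameter (rho >= 0)
    """
    n = D.shape[0]
    A = D.T @ D / n + rho * K
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(A, D.T @ Y) / n
```

With `rho = 0` and a full-rank design, the result coincides with ordinary least squares, which gives a simple sanity check.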
Selection of the parameters. — The estimator defined by (IV.3) depends

on a large number of parameters : the number of knots k, the degree q of
splines, the order m of derivation in the penalization term, and the smoothing
parameter ρ. It seems (see Marx and Eilers, 1999, Besse, Cardot and Ferraty,
1997) that only the penalization parameter ρ is really important provided that
the number of knots is large enough.
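The parameter ρ is chosen by the generalized cross validation criterion (see Wahba, 1990), which is described below.
Consider the “hat matrix” $\mathbf{H}(\rho) = \frac{1}{n} \mathbf{D}_X \left( \frac{1}{n}\, {}^t\mathbf{D}_X \mathbf{D}_X + \rho \mathbf{K}_k \right)^{-1} {}^t\mathbf{D}_X$. It satisfies $\widehat{\mathbf{Y}} = \mathbf{H}(\rho) \mathbf{Y}$. The generalized cross validation criterion is then given by
\[
GCV(\rho) = \frac{\dfrac{1}{n} \displaystyle\sum_{l=1}^{n} (Y_l - \widehat{Y}_l)^2}{\left( 1 - \dfrac{1}{n} Tr(\mathbf{H}(\rho)) \right)^2} . \tag{IV.4}
\]
We select the optimal parameter $\rho_{GCV}$ as the one that minimizes the GCV criterion (IV.4). Let us notice that we do not have to compute the matrix $\mathbf{H}(\rho)$ (whose size is $n \times n$) since we have $Tr(\mathbf{H}(\rho)) = Tr\left( \frac{1}{n}\, {}^t\mathbf{D}_X \mathbf{D}_X \left( \frac{1}{n}\, {}^t\mathbf{D}_X \mathbf{D}_X + \rho \mathbf{K}_k \right)^{-1} \right)$.
The Xplore function sflmgcv uses this GCV criterion and gives the estimates of µ, θ and α.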
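The GCV criterion and the trace shortcut can be sketched as follows. The code assumes the matrices $\mathbf{D}_X$ and $\mathbf{K}_k$ are given and that the candidate values of ρ are supplied as a grid; `gcv_score` and `select_rho` are hypothetical names, and this is an illustration rather than the Xplore function sflmgcv.

```python
import numpy as np

def gcv_score(D, K, Y, rho):
    """GCV(rho) for the penalized least squares fit, without forming the n x n hat matrix."""
    n = D.shape[0]
    S = D.T @ D / n
    A = S + rho * K
    beta = np.linalg.solve(A, D.T @ Y) / n
    residuals = Y - D @ beta
    # Tr(H) = Tr(S A^{-1}) = Tr(A^{-1} S): only a small square system is solved.
    trace_h = np.trace(np.linalg.solve(A, S))
    return np.mean(residuals ** 2) / (1.0 - trace_h / n) ** 2

def select_rho(D, K, Y, grid):
    """Return the value of rho in `grid` that minimizes the GCV criterion."""
    scores = [gcv_score(D, K, Y, rho) for rho in grid]
    return grid[int(np.argmin(scores))]
```

A grid on a logarithmic scale, for example `[10.0 ** (-e) for e in range(0, 11)]`, is a common choice, since the criterion is usually examined as a function of −log(ρ).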
Multiple functional linear model. — We now want to generalize the model (IV.2) to the case where there are several (centered) functional covariates $\widetilde{X}^1, \dots, \widetilde{X}^v$. We consider the following additive model
\[
Y_i = \mu + \int_D \alpha_1(t)\, \widetilde{X}_i^1(t)\, dt + \dots + \int_D \alpha_v(t)\, \widetilde{X}_i^v(t)\, dt + \epsilon_i . \tag{IV.5}
\]
To get the estimates of $\mu, \alpha_1, \dots, \alpha_v$, we used the backfitting algorithm (see Hastie and Tibshirani, 1990), whose principle is described below. It allows us to avoid inverting large matrices and leads to a faster estimation procedure. The Xplore function giving the estimates of $\mu, \alpha_1, \dots, \alpha_v$ using the backfitting algorithm for $v$ covariates is sflmgcvmult.
– Step 1
We initialize $\widehat{\alpha}_1^{(1)}, \dots, \widehat{\alpha}_{v-1}^{(1)}$ to 0 and $\widehat{\mu}$ to $\frac{1}{n} \sum_{i=1}^{n} Y_i$. Then, we determine $\widehat{\mu}^{(1)}$ and $\widehat{\alpha}_v^{(1)}$ by using the spline estimation procedure for the functional linear model with one covariate.
– Step 2
For $r = 1, \dots, v$, we consider
\[
Y_i^{r,2} = Y_i - \sum_{l=1}^{r-1} \int_D \widehat{\alpha}_l^{(2)}(t)\, \widetilde{X}_i^l(t)\, dt - \sum_{l=r+1}^{v} \int_D \widehat{\alpha}_l^{(1)}(t)\, \widetilde{X}_i^l(t)\, dt ,
\]
and we make a simple functional regression
\[
Y_i^{r,2} = \mu + \int_D \alpha_r^{(2)}(t)\, \widetilde{X}_i^r(t)\, dt + \epsilon_i .
\]
Then, we obtain $\widehat{\mu}^{(2)}$ and $\widehat{\alpha}_r^{(2)}$, for $r = 1, \dots, v$. The optimal penalization parameter is determined for each estimator with generalized cross validation.
– Step j + 1
While $\max_{r=1,\dots,v} \left( \| \widehat{\alpha}_r^{(j)} - \widehat{\alpha}_r^{(j-1)} \| \right) > \xi$ (where ξ is an error constant to be fixed), we consider
\[
Y_i^{r,j+1} = Y_i - \sum_{l=1}^{r-1} \int_D \widehat{\alpha}_l^{(j+1)}(t)\, \widetilde{X}_i^l(t)\, dt - \sum_{l=r+1}^{v} \int_D \widehat{\alpha}_l^{(j)}(t)\, \widetilde{X}_i^l(t)\, dt ,
\]
and we make a simple functional regression
\[
Y_i^{r,j+1} = \mu + \int_D \alpha_r^{(j+1)}(t)\, \widetilde{X}_i^r(t)\, dt + \epsilon_i ,
\]
by using the estimator defined for the functional linear model with one covariate. We then deduce $\widehat{\mu}^{(j+1)}$ and $\widehat{\alpha}_r^{(j+1)}$, for $r = 1, \dots, v$. The optimal penalization parameter is determined for each estimator with generalized cross validation.
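The backfitting steps above can be sketched as follows. This is a simplified illustration, not the Xplore function sflmgcvmult: it assumes each covariate has already been reduced to a matrix `Zs[r]` of inner products $\langle B_l, \widetilde{X}_i^r \rangle$, it starts from all coefficients at zero, and it keeps a single fixed ρ instead of re-selecting it by generalized cross validation at each step. The names `fit_one` and `backfit` are hypothetical.

```python
import numpy as np

def fit_one(Z, G, y, rho):
    """Penalized least squares fit of y on [1, Z]; returns (mu, theta)."""
    n, d = Z.shape
    D = np.column_stack([np.ones(n), Z])
    K = np.zeros((d + 1, d + 1))
    K[1:, 1:] = G  # the intercept is not penalized
    beta = np.linalg.solve(D.T @ D / n + rho * K, D.T @ y / n)
    return beta[0], beta[1:]

def backfit(Zs, Gs, y, rho, tol=1e-8, max_iter=500):
    """Cycle over the covariates until the coefficients stop changing."""
    v = len(Zs)
    thetas = [np.zeros(Z.shape[1]) for Z in Zs]
    mu = y.mean()
    for _ in range(max_iter):
        old = [th.copy() for th in thetas]
        for r in range(v):
            # Partial residual: covariates l < r already carry their new
            # estimate, covariates l > r still carry the previous one.
            partial = y - sum(Zs[l] @ thetas[l] for l in range(v) if l != r)
            mu, thetas[r] = fit_one(Zs[r], Gs[r], partial, rho)
        change = max(np.linalg.norm(thetas[r] - old[r]) for r in range(v))
        if change <= tol:
            break
    return mu, thetas
```

Each inner fit only involves a matrix of the size of one spline basis, which is the reason given in the text for preferring backfitting to a single joint inversion.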
IV.4.4. Conditional quantiles estimation
Our goal is now to find the Ozone threshold value that is not exceeded with a given conditional probability $\alpha \in ]0, 1[$. More precisely, if $Y$ is a real random variable, we define its α-quantile as the real number $q_\alpha$ such that
\[
P(Y \le q_\alpha) = \alpha .
\]
Koenker and Bassett (1978) use the following property to define quantile estimators (which can be naturally generalized to conditional quantiles):
\[
q_\alpha = \arg\min_{a \in \mathbb{R}} E\left( l_\alpha(Y - a) \right) ,
\]
with
\[
l_\alpha(u) = |u| + (2\alpha - 1) u .
\]
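The characterization of the quantile as a minimizer can be checked numerically on a small sample. The sketch below replaces the expectation by an empirical mean and searches only among the sample values, which is an assumption made for simplicity; the function names are hypothetical.

```python
def check_loss(u, alpha):
    """l_alpha(u) = |u| + (2*alpha - 1)*u."""
    return abs(u) + (2.0 * alpha - 1.0) * u

def empirical_quantile_by_loss(sample, alpha):
    """Sample value a minimizing the empirical mean of l_alpha(y - a)."""
    def total_loss(a):
        return sum(check_loss(y - a, alpha) for y in sample)
    return min(sorted(sample), key=total_loss)
```

For α = 0.5 the loss reduces to the absolute value, so the minimizer is a median and is not affected by a single extreme observation.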
Let us now come back to our functional case. We still consider the sample $(X_i, Y_i)_{i=1,\dots,n}$ of pairs of random variables, independent and identically distributed, with the same distribution as $(X, Y)$, with $X$ belonging to the functional space $L^2(D)$, and $Y$ belonging to $\mathbb{R}$. Without loss of generality, we suppose that $X$ is a centered variable, that is to say $E(X) = 0$. Let α be a real number in $]0, 1[$ and $x$ a function in $L^2(D)$. We suppose that the conditional α-quantile of $Y$ given $[X = x]$ is the unique scalar $g_\alpha(x)$ such that
\[
P(Y \le g_\alpha(x) \mid X = x) = \alpha , \tag{IV.6}
\]
where $P(\,\cdot \mid X = x)$ is the conditional probability given $[X = x]$.
Let us remark that $g_\alpha(x)$ can be defined in an equivalent way as the solution of the minimization problem
\[
\min_{a \in \mathbb{R}} E\left( l_\alpha(Y - a) \mid X = x \right) . \tag{IV.7}
\]
We assume now that there exists a unique function $\Psi_\alpha \in L^2(D)$ such that $g_\alpha$ can be written in the following way
\[
g_\alpha(X) = c + \langle \Psi_\alpha, X \rangle = c + \int_D \Psi_\alpha(t)\, X(t)\, dt . \tag{IV.8}
\]
This condition can be seen as a direct generalization of the model introduced by Koenker and Bassett (1978), the difference being that here the covariates are functions.
Spline estimator of $\Psi_\alpha$. — Our goal is now to give a nonparametric estimator of the function $\Psi_\alpha$. In the case where the covariate $X$ is real, many nonparametric estimators have already been proposed (see for example Bhattacharya and Gangopadhyay, 1990, Fan, Hu and Truong, 1994, Lejeune and Sarda, 1988 or He and Shi, 1994).
As for the spline estimator of the conditional mean, we consider the vector space of spline functions with $k - 1$ interior knots and of degree $q$, and its B-spline basis $\mathbf{B}_{k,q} = {}^t(B_1, \dots, B_{k+q})$. We estimate $\Psi_\alpha$ by a linear combination of the functions $B_l$, for $l$ going from 1 to $k + q$. This leads us to find a vector $\widehat{\theta} = {}^t(\widehat{\theta}_1, \dots, \widehat{\theta}_{k+q})$ in $\mathbb{R}^{k+q}$ such that
\[
\widehat{\Psi}_\alpha = \sum_{l=1}^{k+q} \widehat{\theta}_l B_l = {}^t\mathbf{B}_{k,q}\, \widehat{\theta} . \tag{IV.9}
\]
The vector $\widehat{\theta}$ will be the solution of the following minimization problem, which is the penalized empirical version of (IV.7),
\[
\min_{c \in \mathbb{R},\, \theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} l_\alpha\left( Y_i - c - \langle {}^t\mathbf{B}_{k,q}\theta, X_i \rangle \right) + \rho \left\| ({}^t\mathbf{B}_{k,q}\theta)^{(m)} \right\|_{L^2}^2 \right\} , \tag{IV.10}
\]
where $({}^t\mathbf{B}_{k,q}\theta)^{(m)}$ is the $m$-th derivative of the spline function ${}^t\mathbf{B}_{k,q}\theta$ and ρ is a penalization parameter whose role is to control the smoothness of the estimator, as for the minimization problem (IV.3). This criterion is similar to (IV.3), the quadratic function being replaced here by the loss function $l_\alpha$. In this case, we have to deal with an optimization problem that does not have an explicit solution, contrary to the estimation of the conditional mean. That is why we adopted the strategy proposed by Lejeune and Sarda (1988). It is based on an algorithm that consists in performing iterative weighted least squares (see Ruppert and Carroll, 1988). Let us consider the function $\delta_i$ defined by
\[
\delta_i(\alpha) = 2\alpha\, \mathbb{1}_{\{ Y_i - c - \langle {}^t\mathbf{B}_{k,q}\theta, X_i \rangle \ge 0 \}} + 2(1 - \alpha)\, \mathbb{1}_{\{ Y_i - c - \langle {}^t\mathbf{B}_{k,q}\theta, X_i \rangle < 0 \}} .
\]
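The minimization problem (IV.10) is then equivalent to
\[
\min_{c \in \mathbb{R},\, \theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \delta_i(\alpha) \left| Y_i - c - \langle {}^t\mathbf{B}_{k,q}\theta, X_i \rangle \right| + \rho \left\| ({}^t\mathbf{B}_{k,q}\theta)^{(m)} \right\|_{L^2}^2 \right\} . \tag{IV.11}
\]
Then, we can approximate this criterion by replacing the absolute value by a weighted quadratic term, hence we can obtain a sequence of explicit solutions. The principle of this Iterative Reweighted Least Squares algorithm is described below.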
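– Initialization
We determine $\beta^1 = {}^t(c^1, \theta^1)$, solution of the minimization problem
\[
\min_{c \in \mathbb{R},\, \theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - c - \langle {}^t\mathbf{B}_{k,q}\theta, X_i \rangle \right)^2 + \rho \left\| ({}^t\mathbf{B}_{k,q}\theta)^{(m)} \right\|_{L^2}^2 \right\} ,
\]
whose solution is given by $\beta^1 = \frac{1}{n} \left( \frac{1}{n}\, {}^t\mathbf{D}_X \mathbf{D}_X + \rho \mathbf{K}_k \right)^{-1} {}^t\mathbf{D}_X \mathbf{Y}$, with $\mathbf{D}_X$ and $\mathbf{K}_k$ defined previously.
– Step j+1
Knowing $\beta^j = {}^t(c^j, \theta^j)$, we determine $\beta^{j+1} = {}^t(c^{j+1}, \theta^{j+1})$, solution of the minimization problem
\[
\min_{c \in \mathbb{R},\, \theta \in \mathbb{R}^{k+q}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i^j(\alpha) \left( Y_i - c - \langle {}^t\mathbf{B}_{k,q}\theta, X_i \rangle \right)^2}{\left[ \left( Y_i - c^j - \langle {}^t\mathbf{B}_{k,q}\theta^j, X_i \rangle \right)^2 + \eta^2 \right]^{1/2}} + \rho \left\| ({}^t\mathbf{B}_{k,q}\theta)^{(m)} \right\|_{L^2}^2 \right\} ,
\]
where $\delta_i^j(\alpha)$ is $\delta_i(\alpha)$ at step $j$ of the algorithm, the denominator is evaluated at the current iterate $\beta^j$, and η is a strictly positive constant that prevents the denominator from being equal to zero. Let us define the $n \times n$ diagonal matrix $\mathbf{W}_j$ with diagonal elements given by
\[
[\mathbf{W}_j]_{ll} = \frac{\delta_l^j(\alpha)}{n \left[ \left( Y_l - c^j - \langle {}^t\mathbf{B}_{k,q}\theta^j, X_l \rangle \right)^2 + \eta^2 \right]^{1/2}} .
\]
Then, $\beta^{j+1} = \left( {}^t\mathbf{D}_X \mathbf{W}_j \mathbf{D}_X + \rho \mathbf{K}_k \right)^{-1} {}^t\mathbf{D}_X \mathbf{W}_j \mathbf{Y}$.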
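The Iterative Reweighted Least Squares scheme above can be sketched as follows. The sketch assumes that $\mathbf{D}_X$ and $\mathbf{K}_k$ are already available, keeps ρ fixed across iterations and runs a fixed number of iterations instead of a stopping rule; these are simplifications, and `irls_quantile` is a hypothetical name rather than the Xplore function squantgcv.

```python
import numpy as np

def irls_quantile(D, K, Y, alpha, rho, eta=1e-4, n_iter=200):
    """Approximate the penalized quantile regression solution by IRLS."""
    n = D.shape[0]
    # Initialization: penalized least squares solution (conditional mean fit).
    beta = np.linalg.solve(D.T @ D / n + rho * K, D.T @ Y / n)
    for _ in range(n_iter):
        resid = Y - D @ beta
        # delta_i = 2*alpha for nonnegative residuals, 2*(1 - alpha) otherwise.
        delta = np.where(resid >= 0, 2.0 * alpha, 2.0 * (1.0 - alpha))
        # Diagonal of W_j; eta keeps the denominator away from zero.
        w = delta / (n * np.sqrt(resid ** 2 + eta ** 2))
        DW = D.T * w  # same as D.T @ diag(w)
        beta = np.linalg.solve(DW @ D + rho * K, DW @ Y)
    return beta
```

With an intercept-only design and `rho = 0`, the iterations move from the sample mean towards the sample α-quantile, which gives a quick way to check the weights.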
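Remark: Since our algorithm relies on weighted least squares, we can derive a generalized cross validation criterion to choose the value of the penalization parameter ρ at each step of the algorithm. Indeed, the “hat matrix” defined by $\mathbf{H}(\rho) = \mathbf{D}_X \left( {}^t\mathbf{D}_X \mathbf{W} \mathbf{D}_X + \rho \mathbf{K}_k \right)^{-1} {}^t\mathbf{D}_X \mathbf{W}$ satisfies $\widehat{\mathbf{Y}} = \mathbf{H}(\rho) \mathbf{Y}$, where $\mathbf{W}$ is the weight matrix obtained at the previous step of the algorithm. The generalized cross validation criterion is then given by
\[
GCV(\rho) = \frac{\dfrac{1}{n}\, {}^t(\mathbf{Y} - \widehat{\mathbf{Y}})\, \mathbf{W}\, (\mathbf{Y} - \widehat{\mathbf{Y}})}{\left( 1 - \dfrac{1}{n} Tr(\mathbf{H}(\rho)) \right)^2} , \tag{IV.12}
\]
where $Tr(\mathbf{H}(\rho)) = Tr\left( {}^t\mathbf{D}_X \mathbf{W} \mathbf{D}_X \left( {}^t\mathbf{D}_X \mathbf{W} \mathbf{D}_X + \rho \mathbf{K}_k \right)^{-1} \right)$.
We select the optimal parameter $\rho_{GCV}$ as the one that minimizes the GCV criterion (IV.12). The Xplore function squantgcv uses this GCV criterion and gives the estimates of $c$, θ and $\Psi_\alpha$.
A convergence result for the estimator $\widehat{\Psi}_\alpha$ is also available in Cardot, Crambes and Sarda (2005).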
Multiple conditional quantiles. — Assuming we now have $v$ functional covariates $X^1, \dots, X^v$, this estimation procedure can be easily extended. We consider the following model
\[
P\left( Y_i \le g_\alpha^1(X_i^1) + \dots + g_\alpha^v(X_i^v) \mid X_i^1 = x_i^1, \dots, X_i^v = x_i^v \right) = \alpha . \tag{IV.13}
\]
As before, we assume that $g_\alpha^1(X_i^1) + \dots + g_\alpha^v(X_i^v) = c + \langle \Psi_\alpha^1, X_i^1 \rangle + \dots + \langle \Psi_\alpha^v, X_i^v \rangle$ with $\Psi_\alpha^1, \dots, \Psi_\alpha^v$ in $L^2(D)$. The estimation of each function $\Psi_\alpha^r$ is obtained using the iterative backfitting algorithm combined with the Iterative Reweighted Least Squares algorithm. The Xplore function giving the estimates of $c, \Psi_\alpha^1, \dots, \Psi_\alpha^v$ is squantgcvmult.
IV.4.5. Application to Ozone prediction
We want to predict the maximum of Ozone on day $i$, denoted $Y_i$, using the functional covariates observed the day before until 5:00 pm. We consider covariates with a length of 24 hours. We can assume that beyond 24 hours, the effects of the covariate are negligible knowing the last 24 hours, so each curve $X_i$ begins at 6:00 pm on day $i - 2$.
We randomly split the initial sample $(X_i, Y_i)_{i=1,\dots,n}$ into two sub-samples:
– a learning sample $(X_{a_i}, Y_{a_i})_{i=1,\dots,n_l}$ whose size is $n_l = 332$, used to compute the estimators $\widehat{\mu}$ and $\widehat{\alpha}$ for the functional linear model and the estimators $\widehat{c}$ and $\widehat{\Psi}_\alpha$ for the model with quantiles,
– a test sample $(X_{t_i}, Y_{t_i})_{i=1,\dots,n_t}$ whose size is $n_t = 142$, used to evaluate the quality of the models and to compare them.
We have also chosen $k = 8$ for the number of knots, $q = 3$ for the degree of the spline functions and $m = 2$ for the order of the derivative in the penalization.
To predict the value of $Y_i$, we use the conditional mean and the conditional median (i.e. α = 0.5). To judge the quality of the models, we give a prediction of the maximum of Ozone for each element of the test sample,
\[
\widehat{Y}_{t_i} = \widehat{\mu} + \int_D \widehat{\alpha}(t)\, X_{t_i}(t)\, dt ,
\]
for the prediction of the conditional mean, and
\[
\widehat{Y}_{t_i} = \widehat{c} + \int_D \widehat{\Psi}_\alpha(t)\, X_{t_i}(t)\, dt
\]
for the prediction of the conditional median.
Then, we consider three criteria given by
\[
C_1 = \frac{\dfrac{1}{n_t} \displaystyle\sum_{i=1}^{n_t} (Y_{t_i} - \widehat{Y}_{t_i})^2}{\dfrac{1}{n_t} \displaystyle\sum_{i=1}^{n_t} (Y_{t_i} - \overline{Y}_l)^2} ,
\]
\[
C_2 = \frac{1}{n_t} \sum_{i=1}^{n_t} \left| Y_{t_i} - \widehat{Y}_{t_i} \right| ,
\]
\[
C_3 = \frac{\dfrac{1}{n_t} \displaystyle\sum_{i=1}^{n_t} l_\alpha(Y_{t_i} - \widehat{Y}_{t_i})}{\dfrac{1}{n_t} \displaystyle\sum_{i=1}^{n_t} l_\alpha(Y_{t_i} - q_\alpha(Y_l))} ,
\]
where $\overline{Y}_l$ is the empirical mean of the learning sample $(Y_{a_i})_{i=1,\dots,n_l}$ and $q_\alpha(Y_l)$ is the empirical α-quantile of the learning sample $(Y_{a_i})_{i=1,\dots,n_l}$. This last criterion $C_3$ is similar to the one proposed by Koenker and Machado (1999). The lower the values of these criteria (the closer to zero), the better the prediction. These three criteria are all computed on the test sample.
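The three criteria can be computed with a few lines of code. The sketch below takes the test responses, the predictions and the learning-sample reference values (mean or α-quantile) as plain lists and numbers; the function names are hypothetical.

```python
def check_loss(u, alpha):
    """l_alpha(u) = |u| + (2*alpha - 1)*u."""
    return abs(u) + (2.0 * alpha - 1.0) * u

def criterion_c1(y_test, y_pred, learn_mean):
    """Squared prediction error relative to predicting the learning-sample mean."""
    num = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    den = sum((y - learn_mean) ** 2 for y in y_test)
    return num / den  # the common factor 1/n_t cancels

def criterion_c2(y_test, y_pred):
    """Mean absolute prediction error."""
    return sum(abs(y - p) for y, p in zip(y_test, y_pred)) / len(y_test)

def criterion_c3(y_test, y_pred, learn_quantile, alpha):
    """Check-loss error relative to predicting the learning-sample alpha-quantile."""
    num = sum(check_loss(y - p, alpha) for y, p in zip(y_test, y_pred))
    den = sum(check_loss(y - learn_quantile, alpha) for y in y_test)
    return num / den  # the common factor 1/n_t cancels
```

Both $C_1$ and $C_3$ are ratios, so a value of 1 corresponds to a model that does no better than the constant reference prediction, while $C_2$ is expressed in the units of the response.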
Prediction of the conditional mean. — The values of the criteria C1 and C2 are given in table 4. It appears that the best model with one covariate to predict the maximum of Ozone is the one that uses the Ozone curve of the day before. We have also built multiple functional linear models, in order to improve the prediction. The errors for these models are also given in table 4. It appears that the best model is the one that uses the four covariates Ozone, Nitrogen Monoxide, Wind Direction and Wind Speed. So, adding other covariates improves the prediction, even if the gain is small.
Models                      Variables              C1      C2
models with 1 covariate     NO                     0.828   16.998
                            N2                     0.761   16.153
                            O3                     0.416   12.621
                            WD                     0.910   18.414
                            WS                     0.796   16.756
models with 2 covariates    O3, NO                 0.409   12.338
                            O3, N2                 0.410   12.373
                            O3, WD                 0.405   12.318
                            O3, WS                 0.400   12.267
models with 3 covariates    O3, NO, N2             0.408   12.305
                            O3, NO, WD             0.394   11.956
                            O3, NO, WS             0.397   12.121
                            O3, N2, WD             0.397   12.003
                            O3, N2, WS             0.404   12.156
                            O3, WD, WS             0.397   12.101
models with 4 covariates    O3, NO, WD, WS         0.391   11.870
                            O3, NO, N2, WD         0.395   11.875
                            O3, NO, N2, WS         0.398   12.069
                            O3, N2, WD, WS         0.394   11.962
model with 5 covariates     O3, NO, N2, WD, WS     0.392   11.877
Table 4. Prediction error criteria C1 and C2 for the different functional linear models.
Prediction of the conditional median. — Table 5 gathers the prediction errors of the different models. As for the functional linear model, the best prediction using one covariate is the one obtained by using the Ozone curve of the day before. Moreover, the prediction is slightly improved by adding other covariates. The best prediction for the criterion C3 is obtained for the model using the covariates Ozone, Nitrogen Monoxide, Nitrogen Dioxide and Wind Speed. For this model with these four covariates, we have represented in figure 7 the GCV criterion versus −log(ρ) for the different values of ρ from $10^{-5}$ to $10^{-10}$. The minimum value of the GCV criterion is reached for ρ = $10^{-8}$. Figure 8 represents the predicted maximum of Ozone (with this model with 4 covariates) versus the measured maximum of Ozone for the test sample. We see on this graph that the points are quite close to the straight line of equation y = x.
Models                      Variables              C1      C2       C3
models with 1 covariate     NO                     0.826   16.996   0.911
                            N2                     0.805   16.800   0.876
                            O3                     0.425   12.332   0.661
                            WD                     0.798   18.836   0.902
                            WS                     0.885   18.222   0.976
models with 2 covariates    O3, NO                 0.412   12.007   0.643
                            O3, N2                 0.405   11.936   0.640
                            O3, WD                 0.406   12.109   0.649
                            O3, WS                 0.406   11.823   0.633
models with 3 covariates    O3, NO, N2             0.404   11.935   0.639
                            O3, NO, WD             0.404   12.024   0.644
                            O3, NO, WS             0.407   11.832   0.638
                            O3, N2, WD             0.402   11.994   0.642
                            O3, N2, WS             0.403   12.108   0.641
                            O3, WD, WS             0.403   12.123   0.640
models with 4 covariates    O3, NO, WD, WS         0.399   11.954   0.641
                            O3, NO, N2, WD         0.397   11.921   0.639
                            O3, NO, N2, WS         0.397   11.712   0.634
                            O3, N2, WD, WS         0.398   11.952   0.640
model with 5 covariates     O3, NO, N2, WD, WS     0.397   11.864   0.638
Table 5. Prediction error criteria C1, C2 and C3 for the different functional quantile regression models.
Another interest of the conditional quantiles is that we can build prediction intervals for the maximum of Ozone, which can be quite useful in the context of prevention of Ozone pollution. Coming back to the initial sample (that is to say, when the days are chronologically ordered), we have plotted in figure 9 the measured maximum of Ozone during the first 40 days of our sample, that is to say from 17 May 1997 to 25 June 1997 (blue solid curve). The red dotted curve above represents the values of the 90% quantile and the green dashed curve below represents the values of the 10% quantile predicted for these measures. The prediction model used is again the quantile regression model with the 4 covariates O3, NO, N2 and WS.
Figure 7. Generalized Cross Validation criterion for different values of ρ in the quantile regression model using the covariates O3, NO, N2, WS (GCV against −log(ρ)).
Analysis of the results. — Both models, the functional linear model and the model with conditional quantiles for functional covariates, give satisfactory results for the prediction of the maximum of Ozone. Concerning figure 8, it seems that a few values are not well predicted. This highlights a common problem for statistical models, which run into trouble when predicting extreme values (outliers). The prediction interval given by the 90% and 10% conditional quantiles can be an interesting answer to that problem, as seen in figure 9.
In spite of the lack of some important variables in the model, such as temperature, we can produce good estimates of the maximum of pollution knowing the data of the day before. The most efficient variable to estimate the maximum of Ozone is the Ozone curve of the day before; however, we noticed that prediction accuracy can be improved by adding other variables to the model. We can expect that these results can be improved further when other covariates, such as temperature curves, become available from ORAMIP.
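An interval bounded by the 10% and 90% conditional quantiles has a nominal coverage of 80%, since $P(q_{0.1} < Y \le q_{0.9}) = 0.9 - 0.1$. One simple way to examine such an interval on a test sample is to count how often the observed maximum falls inside it. The sketch below is an illustration only; the function name `interval_coverage` is hypothetical and no coverage figure is reported in the study itself.

```python
def interval_coverage(y_obs, lower, upper):
    """Fraction of observations lying between their predicted lower and upper quantiles."""
    inside = sum(1 for y, lo, hi in zip(y_obs, lower, upper) if lo <= y <= hi)
    return inside / len(y_obs)
```

An empirical coverage far below the nominal level would suggest that the estimated quantile curves are too narrow for the data at hand.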
Figure 8. Predicted Ozone versus measured Ozone for the test sample, using the quantile regression prediction model with the covariates O3, NO, N2, WS.
Figure 9. Prediction interval for the measured maximum of Ozone over the period from 17 May 1997 to 25 June 1997 (blue solid curve; Ozone against days). The red dotted curve and the green dashed curve represent respectively the values of the 90% and 10% quantiles predicted for these measures.
PART V
APPENDIX
V.1. NOISY EXPLANATORY VARIABLE - PROOFS
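Proof of Proposition III.3. — The solution of the minimization problem (III.7) is given by
\[
\widehat{\theta}_{FTLS} = \frac{1}{n} \left( \frac{1}{n} \mathbf{D}_W^\tau \mathbf{D}_W + \rho \mathbf{G}_k - \sigma_{\min}^2 \mathbf{B}_k \right)^{-1} \mathbf{D}_W^\tau \mathbf{Y} .
\]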
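Proof: Using the matrix β, we write α = βθ with $\theta \in \mathbb{R}^{k+q}$. From the expression of the model, we then obtain
\[
\left[ \left( \frac{\mathbf{W}\beta}{p}, \mathbf{Y} \right) - \left( \frac{\delta\beta}{p}, \epsilon \right) \right] \begin{pmatrix} \theta \\ -1 \end{pmatrix} = 0 , \tag{V.1}
\]
which allows us to write the minimization problem (III.7) in the form
\[
\min_{(V.1)} \left\{ \frac{1}{n} \left\| \left( \frac{\delta}{\sqrt{p}}, \epsilon \right) \right\|_F^2 + \rho\, \theta^\tau \mathbf{G}_k \theta \right\} ,
\]
where the notation $\| \cdot \|_F$ again denotes the matrix Frobenius norm. We then consider the minimization problem
\[
\min_{\mathbf{A}x = \mathbf{E}x} \left\{ \frac{1}{n} \left\| \left( \frac{\delta}{\sqrt{p}}, \epsilon \right) \right\|_F^2 + \rho\, x^\tau \mathbf{K}_k x \right\} , \tag{V.2}
\]
with $\mathbf{A} = \left( \frac{\mathbf{W}\beta}{p}, \mathbf{Y} \right)$, $\mathbf{E} = \left( \frac{\delta\beta}{p}, \epsilon \right)$ and $x = \begin{pmatrix} \theta \\ -1 \end{pmatrix}$. Noting that
\begin{align*}
\frac{1}{n}\, x^\tau \gamma^\tau \left( \frac{\delta}{\sqrt{p}}, \epsilon \right)^\tau \left( \frac{\delta}{\sqrt{p}}, \epsilon \right) \gamma x
&= \frac{1}{n}\, x^\tau \mathbf{E}^\tau \mathbf{E} x \\
&= \frac{1}{n}\, x^\tau \mathbf{A}^\tau \mathbf{A} x \\
&= \frac{1}{n}\, x^\tau \gamma^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right)^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right) \gamma x ,
\end{align*}
we see that the quantity
\begin{align*}
& \frac{1}{n}\, x^\tau \gamma^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right)^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right) \gamma x + \rho\, x^\tau \mathbf{K}_k x \\
&\quad = \frac{1}{n}\, x^\tau \gamma^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right)^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right) \gamma x + x^\tau \gamma^\tau \gamma (\gamma^\tau \gamma)^{-1} (\rho \mathbf{K}_k) (\gamma^\tau \gamma)^{-1} \gamma^\tau \gamma x
\end{align*}
is minimized for $\gamma x$ an eigenvector of the matrix
\[
\frac{1}{n} \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right)^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right) + \gamma (\gamma^\tau \gamma)^{-1} (\rho \mathbf{K}_k) (\gamma^\tau \gamma)^{-1} \gamma^\tau ,
\]
corresponding to the smallest nonzero eigenvalue, denoted $\sigma_{\min}^2$. Using the definition of this eigenvalue, we deduce that
\[
\left[ \frac{1}{n} \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right)^\tau \left( \frac{\mathbf{W}}{\sqrt{p}}, \mathbf{Y} \right) + \gamma (\gamma^\tau \gamma)^{-1} (\rho \mathbf{K}_k) (\gamma^\tau \gamma)^{-1} \gamma^\tau \right] \gamma \widehat{x} = \sigma_{\min}^2\, \gamma \widehat{x} ,
\]
which gives, premultiplying by $\gamma^\tau$,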
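\[
\left[ \frac{1}{n} \left( \frac{W\beta}{p}, Y \right)^{\tau} \left( \frac{W\beta}{p}, Y \right) + \rho K_k \right] \begin{pmatrix} \hat{\theta} \\ -1 \end{pmatrix}
= \sigma_{\min}^2 \begin{pmatrix} \frac{\beta^{\tau}\beta}{p}\, \hat{\theta} \\ -1 \end{pmatrix}.
\]
Finally, keeping the first $k + q$ rows, we obtain
\[
\hat{\theta} = \frac{1}{n} \left( \frac{1}{n} D_W^{\tau} D_W + \rho G_k - \sigma_{\min}^2 B_k \right)^{-1} D_W^{\tau} Y,
\]
which completes the proof of Proposition III.3.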
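The closed form above can be evaluated numerically once its ingredients are available. The following is only a minimal sketch under stated assumptions: the function name `ftls_estimate` is hypothetical, the matrices `G_k` and `B_k` are replaced by identity placeholders, the data are simulated, and $\sigma_{\min}^2$ is passed in as a number rather than computed from the eigenvalue problem of the proof.

```python
import numpy as np


def ftls_estimate(D_W, Y, G_k, B_k, rho, sigma2_min):
    """Evaluate (1/n) ((1/n) D_W' D_W + rho G_k - sigma2_min B_k)^{-1} D_W' Y."""
    n = D_W.shape[0]
    system = D_W.T @ D_W / n + rho * G_k - sigma2_min * B_k
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(system, D_W.T @ Y) / n


# Hypothetical toy data: 200 curves summarized by 4 basis coefficients.
rng = np.random.default_rng(0)
D_W = rng.normal(size=(200, 4))
Y = D_W @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=200)
G_k = np.eye(4)  # placeholder penalty matrix (assumption)
B_k = np.eye(4)  # placeholder Gram matrix (assumption)

# With rho = 0 and sigma2_min = 0 the formula reduces to ordinary least squares.
theta_plain = ftls_estimate(D_W, Y, G_k, B_k, rho=0.0, sigma2_min=0.0)
theta_ftls = ftls_estimate(D_W, Y, G_k, B_k, rho=0.01, sigma2_min=0.005)
```

Setting both `rho` and `sigma2_min` to zero gives a convenient sanity check, since the expression then coincides with the least squares solution.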
Proof of Proposition III.4. We have
\[
\frac{1}{n} D_W^{\tau} D_W = \frac{1}{n} D_X^{\tau} D_X + \frac{\sigma_{\delta}^2}{p} B_k + R_1,
\]
with
\[
\|R_1\| = O_P\left( \frac{1}{n^{1/2} p^{1/2} k^{1/2}} \right).
\]
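Proof: Using the fact that $W_i(t_j) = X_i(t_j) + \delta_{ij}$ for $i = 1, \dots, n$ and $j = 1, \dots, p$, we can write
\[
\frac{1}{n} D_W^{\tau} D_W = \frac{1}{n} D_X^{\tau} D_X + \left( \frac{1}{n} \sum_{i=1}^{n} M_{irs} \right)_{r,s = 1, \dots, k+q},
\]
with $M_{irs} = \langle B_r, X_i \rangle \langle B_s, \delta_i \rangle + \langle B_r, \delta_i \rangle \langle B_s, X_i \rangle + \langle B_r, \delta_i \rangle \langle B_s, \delta_i \rangle$. Let us now study this random variable $M_{irs}$. First, using the independence between $X_i$ and $\delta_i$ (and the fact that the noise $\delta_i$ is centered), we can write
\[
\begin{aligned}
E(M_{irs}) &= E\left( \langle B_r, \delta_i \rangle \langle B_s, \delta_i \rangle \right) \\
&= \frac{1}{p^2} \sum_{j_1=1}^{p} \sum_{j_2=1}^{p} B_r(t_{j_1}) B_s(t_{j_2})\, E\left( \delta_i(t_{j_1}) \delta_i(t_{j_2}) \right) \\
&= \frac{\sigma_{\delta}^2}{p^2} \sum_{j=1}^{p} B_r(t_j) B_s(t_j) \\
&= \frac{\sigma_{\delta}^2}{p} \langle B_r, B_s \rangle.
\end{aligned}
\tag{V.3}
\]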
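On the other hand, we have
\[
\begin{aligned}
E\left( M_{irs}^2 \right) ={}& E\left( \langle B_r, X_i \rangle^2 \right) E\left( \langle B_s, \delta_i \rangle^2 \right) \\
&+ E\left( \langle B_r, \delta_i \rangle^2 \right) E\left( \langle B_s, X_i \rangle^2 \right) \\
&+ E\left( \langle B_r, \delta_i \rangle^2 \langle B_s, \delta_i \rangle^2 \right) \\
&+ 2\, E\left( \langle B_r, X_i \rangle \langle B_s, X_i \rangle \right) E\left( \langle B_r, \delta_i \rangle \langle B_s, \delta_i \rangle \right).
\end{aligned}
\tag{V.4}
\]
Using technical results on B-splines (see Cardot, 2000), we note that
\[
\left| E\left( \langle B_r, \delta_i \rangle \langle B_s, \delta_i \rangle \right) \right| = \frac{\sigma_{\delta}^2}{p^2} \sum_{j=1}^{p} B_r(t_j) B_s(t_j) = O\left( \frac{1}{pk} \right), \tag{V.5}
\]
and, under hypothesis (B.0),
\[
\left| E\left( \langle B_r, X_i \rangle \langle B_s, X_i \rangle \right) \right| = O\left( \frac{1}{k^2} \right). \tag{V.6}
\]
With the same kind of computations, we also have
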
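\[
\begin{aligned}
E\left( \langle B_r, \delta_i \rangle^2 \langle B_s, \delta_i \rangle^2 \right)
={}& \frac{1}{p^4} \sum_{j_1=1}^{p} \sum_{j_2=1}^{p} \sum_{j_3=1}^{p} \sum_{j_4=1}^{p} B_r(t_{j_1}) B_r(t_{j_2}) B_s(t_{j_3}) B_s(t_{j_4})\, E\left( \delta_{ij_1} \delta_{ij_2} \delta_{ij_3} \delta_{ij_4} \right) \\
={}& \frac{1}{p^4} \sum_{j=1}^{p} B_r(t_j)^2 B_s(t_j)^2\, E\left( \delta_{ij}^4 \right) \\
&+ \frac{1}{p^4} \sum_{j_1=1}^{p} \sum_{j_2=1}^{p} B_r(t_{j_1}) B_r(t_{j_2}) B_s(t_{j_1}) B_s(t_{j_2})\, E\left( \delta_{ij_1}^2 \right) E\left( \delta_{ij_2}^2 \right),
\end{aligned}
\]
hence
\[
E\left( \langle B_r, \delta_i \rangle^2 \langle B_s, \delta_i \rangle^2 \right) = O\left( \frac{1}{p^4} \left[ \sum_{j=1}^{p} B_r(t_j) B_s(t_j) \right]^2 \right),
\]
which finally gives
\[
E\left( \langle B_r, \delta_i \rangle^2 \langle B_s, \delta_i \rangle^2 \right) = O\left( \frac{1}{p^2 k^2} \right). \tag{V.7}
\]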
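Now, with (V.5), (V.6) and (V.7), relation (V.4) becomes
\[
E\left( M_{irs}^2 \right) = O\left( \frac{1}{p k^3} \right) + O\left( \frac{1}{p^2 k^2} \right),
\]
that is, taking $p > k$,
\[
E\left( M_{irs}^2 \right) = O\left( \frac{1}{p k^3} \right). \tag{V.8}
\]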
We can now conclude the proof of Proposition III.4. Using (V.3) and (V.8), we have
\[
\frac{1}{n} \sum_{i=1}^{n} M_{irs} = \frac{\sigma_{\delta}^2}{p} \langle B_r, B_s \rangle + O_P\left( \frac{1}{n^{1/2} p^{1/2} k^{3/2}} \right).
\]
This is where the matrices $B_k$ and $R_1$ appear: there exists a matrix $R_1$ such that
\[
\left( \frac{1}{n} \sum_{i=1}^{n} M_{irs} \right)_{r,s = 1, \dots, k+q} = \frac{\sigma_{\delta}^2}{p} B_k + R_1,
\]
with, for $r, s = 1, \dots, k+q$, $R_{1rs} = O_P\left( \frac{1}{n^{1/2} p^{1/2} k^{3/2}} \right)$, that is, by Theorem 1.19 of Chatelin (1983),
\[
\|R_1\| = O_P\left( \frac{1}{n^{1/2} p^{1/2} k^{1/2}} \right),
\]
which completes the proof of Proposition III.4.
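Relation (V.3) can be illustrated by simulation. The sketch below is only indicative and rests on several assumptions that are not in the text: triangular bumps stand in for the B-spline basis functions, the noise is Gaussian and independent across discretization points, and the inner product is the discretized one, $\langle f, g \rangle = \frac{1}{p} \sum_j f(t_j) g(t_j)$.

```python
import random


def hat(t, center, width):
    """Triangular bump, a crude stand-in for a B-spline basis function."""
    return max(0.0, 1.0 - abs(t - center) / width)


def inner(f_vals, g_vals):
    """Discretized inner product (1/p) * sum_j f(t_j) g(t_j)."""
    return sum(f * g for f, g in zip(f_vals, g_vals)) / len(f_vals)


def simulate_noise_moment(p=50, sigma=1.0, n_rep=20000, seed=0):
    """Compare a Monte Carlo estimate of E(<B_r, delta><B_s, delta>) with (V.3)."""
    rng = random.Random(seed)
    grid = [(j + 1) / p for j in range(p)]
    b_r = [hat(t, 0.4, 0.2) for t in grid]
    b_s = [hat(t, 0.5, 0.2) for t in grid]
    total = 0.0
    for _ in range(n_rep):
        delta = [rng.gauss(0.0, sigma) for _ in range(p)]  # white noise curve
        total += inner(b_r, delta) * inner(b_s, delta)
    empirical = total / n_rep
    theoretical = sigma ** 2 / p * inner(b_r, b_s)  # right-hand side of (V.3)
    return empirical, theoretical


empirical, theoretical = simulate_noise_moment()
```

The two returned values should agree up to Monte Carlo error, which shrinks as `n_rep` grows.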

V.2. INTEGRAL OF THE SQUARED REGRESSION - PROOFS
\[
E(\hat{\theta}) - \theta = \frac{\mu_2(K)}{2} \int_0^1 r''(x) \alpha(x)\, dx\; h^2 + o(h^2).
\]
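Proof:
\[
\begin{aligned}
E(\hat{\theta}) - \theta
&= \sum_{i=1}^{n} E(Y_i) \int_0^1 \frac{K(\tfrac{x_i - x}{h})\, \alpha(x)}{\sum_{l=1}^{n} K(\tfrac{x_l - x}{h})}\, dx - \int_0^1 r(x) \alpha(x)\, dx \\
&= \int_0^1 \frac{\sum_{i=1}^{n} r(x_i) K(\tfrac{x_i - x}{h}) - r(x) \sum_{i=1}^{n} K(\tfrac{x_i - x}{h})}{\sum_{i=1}^{n} K(\tfrac{x_i - x}{h})}\, \alpha(x)\, dx.
\end{aligned}
\tag{V.9}
\]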
Let us first compute the sum
\[
\sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right).
\]
Approximating the sum by an integral and using the change of variable $s = \frac{u - x}{h}$ gives
\[
\begin{aligned}
\sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right)
&= n \int_{x-h}^{x+h} K\left( \frac{u - x}{h} \right) du + o(nh) \\
&= nh \int_{-1}^{1} K(s)\, ds + o(nh) \\
&= nh + o(nh).
\end{aligned}
\tag{V.10}
\]
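Approximation (V.10) is easy to check numerically. The sketch below assumes an equispaced design $x_i = i/n$ on $[0, 1]$ and the Epanechnikov kernel; neither choice is specified in the text, and the values of `n`, `h` and `x` are arbitrary.

```python
def epanechnikov(s):
    """Second-order kernel supported on [-1, 1], integrating to 1."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def kernel_sum(n, h, x):
    """Sum_i K((x_i - x)/h) over the equispaced design x_i = i/n."""
    return sum(epanechnikov((i / n - x) / h) for i in range(1, n + 1))


n, h, x = 10000, 0.05, 0.5
ratio = kernel_sum(n, h, x) / (n * h)  # should be close to 1 by (V.10)
```

For an interior point `x` the ratio approaches 1 as `n * h` grows; near the boundary of $[0, 1]$ part of the kernel window falls outside the design and the sum is smaller.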
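Let us now compute
\[
\sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) - r(x) \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right).
\]
By a Taylor expansion, we have
\[
\begin{aligned}
& \sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) - r(x) \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) \\
&\quad = r'(x) \sum_{i=1}^{n} (x_i - x) K\left( \frac{x_i - x}{h} \right) + \frac{r''(x)}{2} \sum_{i=1}^{n} (x_i - x)^2 K\left( \frac{x_i - x}{h} \right) \\
&\qquad + o\left( \sum_{i=1}^{n} (x_i - x)^2 K\left( \frac{x_i - x}{h} \right) \right),
\end{aligned}
\]
which gives, using the fact that $K$ is a kernel of order 2 and an integral approximation of $\sum_{i=1}^{n} (x_i - x)^2 K\left( \frac{x_i - x}{h} \right)$,
\[
\sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) - r(x) \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) = \frac{\mu_2(K)}{2}\, r''(x)\, n h^3 + o\left( n h^3 \right). \tag{V.11}
\]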
Returning to relation (V.9) and using results (V.10) and (V.11), we obtain
\[
E(\hat{\theta}) - \theta = \int_0^1 \frac{\frac{\mu_2(K)}{2}\, r''(x)\, n h^3 + o(n h^3)}{nh + o(nh)}\, \alpha(x)\, dx,
\]
which gives the bias.
\[
V(\hat{\theta}) = 2 \sigma^2 \int_0^1 \alpha(x)^2\, dx \int_0^2 \Psi(z)\, dz\; \frac{1}{n} + o\left( \frac{1}{n} \right).
\]
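Proof: Write $S(x) = \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right)$. By definition of $\hat{\theta}$, we have
\[
\hat{\theta}^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} Y_i Y_j \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h})\, \alpha(x)}{S(x)}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_j - y}{h})\, \alpha(y)}{S(y)}\, dy \right),
\]
which gives, since $E(Y_i^2) = r(x_i)^2 + \sigma^2$,
\[
\begin{aligned}
V(\hat{\theta}) &= E(\hat{\theta}^2) - \left[ E(\hat{\theta}) \right]^2 \\
&= \sum_{i=1}^{n} \left[ E(Y_i^2) - r(x_i)^2 \right] \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h})\, \alpha(x)}{S(x)}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h})\, \alpha(y)}{S(y)}\, dy \right) \\
&= \sigma^2 \sum_{i=1}^{n} \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h})\, \alpha(x)}{S(x)}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h})\, \alpha(y)}{S(y)}\, dy \right) \\
&= \sigma^2 \int_0^1 \int_0^1 \frac{1}{S(x) S(y)} \left[ \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) K\left( \frac{x_i - y}{h} \right) \right] \alpha(x) \alpha(y)\, dx\, dy.
\end{aligned}
\tag{V.12}
\]
If we compute the sum using an integral approximation, we have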
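\[
\sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) K\left( \frac{x_i - y}{h} \right) = n \int_{[x-h, x+h] \cup [y-h, y+h]} K\left( \frac{u - x}{h} \right) K\left( \frac{u - y}{h} \right) du + o(nh).
\]
Consider the case $x \leq y$ (the case $x \geq y$ is analogous). Setting $s = \frac{u - x}{h}$, this gives
\[
\sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) K\left( \frac{x_i - y}{h} \right) = nh \int_{[-1, 1] \cup [-1 + \frac{y - x}{h},\, 1 + \frac{y - x}{h}]} K(s)\, K\left( s - \frac{y - x}{h} \right) ds + o(nh).
\]
Note that $K\left( \frac{u - x}{h} \right)$ and $K\left( \frac{u - y}{h} \right)$ have non-disjoint supports only when $y$ lies between $x - 2h$ and $x + 2h$, that is, when $\frac{x - y}{h}$ lies between $-2$ and $2$. We then introduce the function $\Psi$ defined by
\[
\Psi(a) = \int_{-1+a}^{1} K(s) K(s - a)\, ds,
\]
for all $a \in [0, 2]$, and we deduce
\[
\sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) K\left( \frac{x_i - y}{h} \right) = nh\, \Psi\left( \frac{y - x}{h} \right) + o(nh).
\]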
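The function $\Psi$ is straightforward to evaluate by quadrature. The sketch below assumes the Epanechnikov kernel (the text does not fix a kernel) and uses a hand-written trapezoidal rule; the helper names are hypothetical. For any symmetric kernel supported on $[-1, 1]$ and integrating to one, $\Psi(0) = \int K^2$ and $\int_0^2 \Psi(z)\, dz = 1/2$, because $\Psi$ is the restriction to $[0, 2]$ of the autocorrelation of $K$.

```python
def epanechnikov(s):
    """Second-order kernel supported on [-1, 1], integrating to 1."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def trapezoid(f, a, b, m=400):
    """Composite trapezoidal rule with m subintervals."""
    step = (b - a) / m
    total = 0.5 * (f(a) + f(b))
    for i in range(1, m):
        total += f(a + i * step)
    return total * step


def psi(a, kernel=epanechnikov):
    """Psi(a) = int_{-1+a}^{1} K(s) K(s - a) ds for a in [0, 2]."""
    return trapezoid(lambda s: kernel(s) * kernel(s - a), -1.0 + a, 1.0)


psi_at_zero = psi(0.0)                  # equals int K^2, 0.6 for this kernel
psi_integral = trapezoid(psi, 0.0, 2.0)  # close to 1/2
```

Under this identity the leading variance constant $2 \int_0^2 \Psi(z)\, dz$ equals 1 for such kernels.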
Returning to (V.12) and setting $z = \frac{y - x}{h}$, we get (also using (V.10))
\[
V(\hat{\theta}) = 2 \sigma^2 \int_0^1 \frac{n h^2}{[nh + o(nh)]^2} \left[ \int_0^2 \Psi(z)\, \alpha(x + hz)\, dz \right] \alpha(x)\, dx,
\]
which gives the variance.
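Since $\hat{\theta}$ is linear in the $Y_i$, the third line of (V.12) reads $V(\hat{\theta}) = \sigma^2 \sum_i w_i^2$ with $w_i = \int_0^1 K\left( \frac{x_i - x}{h} \right) \alpha(x) / S(x)\, dx$, which can be computed directly. The sketch below is a rough numerical illustration under assumptions that are not in the text: an equispaced design, the Epanechnikov kernel, and a weight function $\alpha(x) = \sin^2(\pi x)$ chosen to vanish at the boundary. With $\int_0^2 \Psi = 1/2$ for such a kernel, the leading term predicts $n \sum_i w_i^2 \approx \int_0^1 \alpha(x)^2\, dx = 3/8$.

```python
import math


def epanechnikov(s):
    """Second-order kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def alpha_example(x):
    """Hypothetical weight function, chosen to vanish at the boundary."""
    return math.sin(math.pi * x) ** 2


def scaled_variance(n=200, h=0.05, grid_size=1000, alpha=alpha_example):
    """Return n * sum_i w_i^2, i.e. n * V(theta_hat) / sigma^2."""
    design = [(i + 0.5) / n for i in range(n)]  # assumed equispaced design
    step = 1.0 / grid_size
    weights = [0.0] * n
    for g in range(grid_size + 1):
        x = g * step
        trap = 0.5 if g in (0, grid_size) else 1.0  # trapezoidal end weights
        k_vals = [epanechnikov((x_i - x) / h) for x_i in design]
        s_x = sum(k_vals)  # S(x)
        for i, k_val in enumerate(k_vals):
            if k_val > 0.0:
                weights[i] += trap * step * k_val * alpha(x) / s_x
    return n * sum(w * w for w in weights)


value = scaled_variance()
target = 3.0 / 8.0  # int_0^1 sin^4(pi x) dx
```

The agreement is only approximate for finite `n` and `h`, since the smoothing itself slightly shrinks the weights.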
\[
E(\hat{\theta}) - \theta = \frac{\mu_2(K)}{2} \int_0^1 \left[ r(x) s''(x) + r''(x) s(x) \right] dx\; h^2 + o(h^2).
\]
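Proof: With the notation introduced previously, we have
\[
\begin{aligned}
E(\hat{\theta}) - \theta
={}& \sum_{i=1}^{n} \sum_{j=1}^{n} E(Y_i) E(Z_j) \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx - \int_0^1 r(x) s(x)\, dx \\
={}& \int_0^1 \frac{\left[ \sum_{i=1}^{n} r(x_i) K(\tfrac{x_i - x}{h}) \right] \left[ \sum_{j=1}^{n} s(x_j) K(\tfrac{x_j - x}{h}) - s(x) \sum_{j=1}^{n} K(\tfrac{x_j - x}{h}) \right]}{S(x)^2}\, dx \\
&+ \int_0^1 \frac{\left[ \sum_{i=1}^{n} r(x_i) K(\tfrac{x_i - x}{h}) - r(x) \sum_{i=1}^{n} K(\tfrac{x_i - x}{h}) \right] \left[ s(x) \sum_{j=1}^{n} K(\tfrac{x_j - x}{h}) \right]}{S(x)^2}\, dx.
\end{aligned}
\]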
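From what was done previously to establish relations (V.10) and (V.11), we also have
\[
\begin{aligned}
\sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) &= r(x)\, nh + o(nh), \\
\sum_{j=1}^{n} s(x_j) K\left( \frac{x_j - x}{h} \right) - s(x) \sum_{j=1}^{n} K\left( \frac{x_j - x}{h} \right) &= \frac{\mu_2(K)}{2}\, s''(x)\, n h^3 + o\left( n h^3 \right), \\
\sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) - r(x) \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) &= \frac{\mu_2(K)}{2}\, r''(x)\, n h^3 + o\left( n h^3 \right), \\
s(x) \sum_{j=1}^{n} K\left( \frac{x_j - x}{h} \right) &= s(x)\, nh + o(nh),
\end{aligned}
\]
which gives, returning to the initial relation,
\[
E(\hat{\theta}) - \theta = \int_0^1 \frac{\frac{\mu_2(K)}{2}\, r(x) s''(x)\, n^2 h^4 + \frac{\mu_2(K)}{2}\, r''(x) s(x)\, n^2 h^4 + o\left( n^2 h^4 \right)}{[nh + o(nh)]^2}\, dx,
\]
which completes the proof of the bias for this case 2.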
\[
V(\hat{\theta}) = 2 \left[ \tau^2 \int_0^1 r(x)^2\, dx + \sigma^2 \int_0^1 s(x)^2\, dx \right] \int_0^2 \Psi(z)\, dz\; \frac{1}{n} + o\left( \frac{1}{n} \right).
\]
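Proof: The computation is analogous to that of the variance in case 1. We have
\[
\hat{\theta}^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} Y_i Z_j Y_k Z_l \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_k - y}{h}) K(\tfrac{x_l - y}{h})}{S(y)^2}\, dy \right).
\]
The possible configurations in the quadruple sum are (a) $i = k$, $j = l$, (b) $i = k$, $j \neq l$, (c) $i \neq k$, $j = l$ and (d) $i \neq k$, $j \neq l$. Using $E(Y_i^2) = r(x_i)^2 + \sigma^2$ and $E(Z_j^2) = s(x_j)^2 + \tau^2$, we get
\[
\begin{aligned}
V(\hat{\theta}) ={}& E(\hat{\theta}^2) - \left[ E(\hat{\theta}) \right]^2 \\
={}& \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ \sigma^2 s(x_j)^2 + \tau^2 r(x_i)^2 + \sigma^2 \tau^2 \right] \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_j - y}{h})}{S(y)^2}\, dy \right) \\
&+ \tau^2 \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k \neq i} r(x_i) r(x_k) \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_k - y}{h}) K(\tfrac{x_j - y}{h})}{S(y)^2}\, dy \right) \\
&+ \sigma^2 \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{l \neq j} s(x_j) s(x_l) \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_l - y}{h})}{S(y)^2}\, dy \right) \\
={}& \sigma^2 \int_0^1 \int_0^1 \frac{\left[ \sum_{i} K(\tfrac{x_i - x}{h}) K(\tfrac{x_i - y}{h}) \right] \left[ \sum_{j} s(x_j)^2 K(\tfrac{x_j - x}{h}) K(\tfrac{x_j - y}{h}) \right]}{S(x)^2 S(y)^2}\, dx\, dy \\
&+ \tau^2 \int_0^1 \int_0^1 \frac{\left[ \sum_{i} r(x_i)^2 K(\tfrac{x_i - x}{h}) K(\tfrac{x_i - y}{h}) \right] \left[ \sum_{j} K(\tfrac{x_j - x}{h}) K(\tfrac{x_j - y}{h}) \right]}{S(x)^2 S(y)^2}\, dx\, dy \\
&+ \sigma^2 \tau^2 \int_0^1 \int_0^1 \frac{\left[ \sum_{i} K(\tfrac{x_i - x}{h}) K(\tfrac{x_i - y}{h}) \right] \left[ \sum_{j} K(\tfrac{x_j - x}{h}) K(\tfrac{x_j - y}{h}) \right]}{S(x)^2 S(y)^2}\, dx\, dy \\
&+ \tau^2 \int_0^1 \int_0^1 \frac{\left[ \sum_{i} \sum_{k \neq i} r(x_i) r(x_k) K(\tfrac{x_i - x}{h}) K(\tfrac{x_k - y}{h}) \right] \left[ \sum_{j} K(\tfrac{x_j - x}{h}) K(\tfrac{x_j - y}{h}) \right]}{S(x)^2 S(y)^2}\, dx\, dy \\
&+ \sigma^2 \int_0^1 \int_0^1 \frac{\left[ \sum_{i} K(\tfrac{x_i - x}{h}) K(\tfrac{x_i - y}{h}) \right] \left[ \sum_{j} \sum_{l \neq j} s(x_j) s(x_l) K(\tfrac{x_j - x}{h}) K(\tfrac{x_l - y}{h}) \right]}{S(x)^2 S(y)^2}\, dx\, dy.
\end{aligned}
\]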
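Now, we saw in the computation of the variance in case 1 that
\[
\sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) K\left( \frac{x_i - y}{h} \right) = nh\, \Psi\left( \frac{y - x}{h} \right) + o(nh).
\]
Similar computations would also show that
\[
\sum_{j=1}^{n} s(x_j)^2 K\left( \frac{x_j - x}{h} \right) K\left( \frac{x_j - y}{h} \right) = nh\, s(x)^2\, \Psi\left( \frac{y - x}{h} \right) + o(nh),
\]
and we also note that
\[
\sum_{i=1}^{n} \sum_{k \neq i} r(x_i) r(x_k) K\left( \frac{x_i - x}{h} \right) K\left( \frac{x_k - y}{h} \right) = n^2 h^2\, r(x) r(y) + o\left( n^2 h^2 \right).
\]
Thus, returning to our variance computation, we finally obtain (after the change of variable $z = \frac{y - x}{h}$)
\[
\begin{aligned}
V(\hat{\theta}) ={}& \sigma^2 \times O\left( \frac{1}{n^2 h} \right) + \tau^2 \times O\left( \frac{1}{n^2 h} \right) + \sigma^2 \tau^2 \times O\left( \frac{1}{n^2 h} \right) \\
&+ 2 \left[ \tau^2 \int_0^1 r(x)^2\, dx + \sigma^2 \int_0^1 s(x)^2\, dx \right] \int_0^2 \Psi(z)\, dz\; \frac{1}{n} + o\left( \frac{1}{n} \right),
\end{aligned}
\]
which gives the stated variance, taking $n$ and $h$ such that $nh \to +\infty$.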

\[
E(\hat{\theta}) - \theta = \mu_2(K) \int_0^1 r(x) r''(x)\, dx\; h^2 - R(K) \int_0^1 r(x)^2\, dx\; \frac{1}{nh} + o\left( h^2 + \frac{1}{nh} \right).
\]
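Proof: With $S(x) = \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right)$ as before,
\[
\begin{aligned}
E(\hat{\theta}) - \theta
&= \sum_{i=1}^{n} \sum_{j \neq i} r(x_i) r(x_j) \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx - \int_0^1 r(x)^2\, dx \\
&= \int_0^1 \left( \sum_{i=1}^{n} \sum_{j \neq i} r(x_i) r(x_j) \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2} - r(x)^2 \right) dx \\
&= \int_0^1 \frac{\left[ \sum_{i=1}^{n} r(x_i) K(\tfrac{x_i - x}{h}) \right]^2 - \sum_{i=1}^{n} r(x_i)^2 K(\tfrac{x_i - x}{h})^2 - \left[ \sum_{i=1}^{n} r(x) K(\tfrac{x_i - x}{h}) \right]^2}{S(x)^2}\, dx.
\end{aligned}
\]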
Let us first compute the sum
\[
\sum_{i=1}^{n} r(x_i)^2 K\left( \frac{x_i - x}{h} \right)^2.
\]
Proceeding as in the previous computations (approximation of a sum by an integral, same change of variable, and writing in addition $r(x + hs) = r(x) + o(1)$), we obtain
\[
\sum_{i=1}^{n} r(x_i)^2 K\left( \frac{x_i - x}{h} \right)^2 = R(K)\, r(x)^2\, nh + o(nh). \tag{V.13}
\]
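Relation (V.13) can be checked numerically in the same way as (V.10). The sketch below makes assumptions that are not in the text: an equispaced design $x_i = i/n$, the Epanechnikov kernel (for which $R(K) = \int K^2 = 3/5$), and a hypothetical regression function $r(x) = 1 + x^2$.

```python
def epanechnikov(s):
    """Second-order kernel supported on [-1, 1], integrating to 1."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def r(x):
    """Hypothetical smooth regression function."""
    return 1.0 + x * x


n, h, x = 20000, 0.05, 0.5
lhs = sum(r(i / n) ** 2 * epanechnikov((i / n - x) / h) ** 2
          for i in range(1, n + 1))
roughness = 0.6  # R(K) = int K^2 for the Epanechnikov kernel
rhs = roughness * r(x) ** 2 * n * h  # leading term of (V.13)
ratio = lhs / rhs
```

The ratio differs from 1 by a term of order $h^2$, which is absorbed in the $o(nh)$ remainder.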
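Let us now compute
\[
\left[ \sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) \right]^2 - \left[ \sum_{i=1}^{n} r(x) K\left( \frac{x_i - x}{h} \right) \right]^2.
\]
As before, we have
\[
\begin{aligned}
& \left[ \sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) \right]^2 - \left[ \sum_{i=1}^{n} r(x) K\left( \frac{x_i - x}{h} \right) \right]^2 \\
&\quad = \left[ \sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) - \sum_{i=1}^{n} r(x) K\left( \frac{x_i - x}{h} \right) \right]
 \times \left[ \sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) + \sum_{i=1}^{n} r(x) K\left( \frac{x_i - x}{h} \right) \right] \\
&\quad = \left[ r'(x) \sum_{i=1}^{n} (x_i - x) K\left( \frac{x_i - x}{h} \right) + \frac{r''(x)}{2} \sum_{i=1}^{n} (x_i - x)^2 K\left( \frac{x_i - x}{h} \right) + o\left( \sum_{i=1}^{n} (x_i - x)^2 K\left( \frac{x_i - x}{h} \right) \right) \right] \\
&\qquad \times \left[ 2 r(x) \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) + o\left( \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right) \right) \right].
\end{aligned}
\]
We then use (V.10), the fact that $K$ is a kernel of order 2 and an integral approximation of $\sum_{i=1}^{n} (x_i - x)^2 K\left( \frac{x_i - x}{h} \right)$, and we obtain
\[
\begin{aligned}
\left[ \sum_{i=1}^{n} r(x_i) K\left( \frac{x_i - x}{h} \right) \right]^2 - \left[ \sum_{i=1}^{n} r(x) K\left( \frac{x_i - x}{h} \right) \right]^2
&= \left[ \frac{\mu_2(K)}{2}\, r''(x)\, n h^3 + o(n h^3) \right] \times \left[ 2 r(x)\, nh + o(nh) \right] \\
&= \mu_2(K)\, r(x) r''(x)\, n^2 h^4 + o(n^2 h^4).
\end{aligned}
\tag{V.14}
\]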
Returning to the computation of the bias and using results (V.10), (V.13) and (V.14), we obtain
\[
E(\hat{\theta}) - \theta = \int_0^1 \frac{\mu_2(K)\, r(x) r''(x)\, n^2 h^4 + o(n^2 h^4) - R(K)\, r(x)^2\, nh + o(nh)}{n^2 h^2 + o(n^2 h^2)}\, dx,
\]
which gives the stated bias.
\[
V(\hat{\theta}) = 8 \sigma^2 \int_0^1 r(x)^2\, dx \int_0^2 \Psi(z)\, dz\; \frac{1}{n} + o\left( \frac{1}{n} \right).
\]
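Proof: By definition of $\hat{\theta}$, and with $S(x) = \sum_{i=1}^{n} K\left( \frac{x_i - x}{h} \right)$ as before, we have
\[
\hat{\theta}^2 = \sum_{i=1}^{n} \sum_{j \neq i} \sum_{k=1}^{n} \sum_{l \neq k} Y_i Y_j Y_k Y_l \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_k - y}{h}) K(\tfrac{x_l - y}{h})}{S(y)^2}\, dy \right).
\]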
We first look for the possible configurations in the quadruple sum above. They are listed below, together with the number of terms of each type.
• $k = i$
    - $l = j$ → $Y_i^2 Y_j^2$ → $n(n-1)$
    - $l \neq j$ → $Y_i^2 Y_j Y_l$ → $n(n-1)(n-2)$
• $k \neq i$
    - $k = j$
        - $l = i$ → $Y_i^2 Y_j^2$ → $n(n-1)$
        - $l \neq i$ → $Y_i Y_j^2 Y_l$ → $n(n-1)(n-2)$
    - $k \neq j$
        - $l = i$ → $Y_i^2 Y_j Y_k$ → $n(n-1)(n-2)$
        - $l = j$ → $Y_i Y_j^2 Y_k$ → $n(n-1)(n-2)$
        - $l \neq i, j$ → $Y_i Y_j Y_k Y_l$ → $n(n-1)(n-2)(n-3)$
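The counts in this case analysis can be confirmed by brute-force enumeration for a small $n$. The sketch below is purely illustrative; the labels and the function names `classify` and `count_cases` are hypothetical.

```python
from collections import Counter
from itertools import product


def classify(i, j, k, l):
    """Assign an index quadruple (with j != i and l != k) to its configuration."""
    if k == i:
        return "k=i, l=j" if l == j else "k=i, l!=j"
    if k == j:
        return "k=j, l=i" if l == i else "k=j, l!=i"
    if l == i:
        return "k!=i,j, l=i"
    if l == j:
        return "k!=i,j, l=j"
    return "all distinct"


def count_cases(n):
    """Count the quadruples of each configuration among indices 0..n-1."""
    counts = Counter()
    for i, j, k, l in product(range(n), repeat=4):
        if j != i and l != k:
            counts[classify(i, j, k, l)] += 1
    return counts


counts = count_cases(6)
```

For `n = 6` the seven counts add up to $[n(n-1)]^2 = 900$, the total number of admissible quadruples.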
Since $Y_i^2 = r(x_i)^2 + 2 r(x_i) \epsilon_i + \epsilon_i^2$, we have $E(Y_i^2) = r(x_i)^2 + \sigma^2$, and we therefore obtain, in the following configurations,
\[
E\left( Y_i^2 Y_j^2 \right) = r(x_i)^2 r(x_j)^2 + \sigma^2 \left[ r(x_i)^2 + r(x_j)^2 \right] + \sigma^4,
\]
for $j \neq i$,
\[
E\left( Y_i^2 Y_j Y_k \right) = r(x_i)^2 r(x_j) r(x_k) + \sigma^2 r(x_j) r(x_k),
\]
for $j \neq i$ and $k \neq i, j$,
\[
E\left( Y_i Y_j Y_k Y_l \right) = r(x_i) r(x_j) r(x_k) r(x_l),
\]
for $j \neq i$, $k \neq i, j$ and $l \neq i, j, k$. We deduce from this that
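\[
\begin{aligned}
V(\hat{\theta}) ={}& E(\hat{\theta}^2) - \left[ E(\hat{\theta}) \right]^2 \\
={}& \sum_{i=1}^{n} \sum_{j \neq i} \sum_{k=1}^{n} \sum_{l \neq k} \left[ E(Y_i Y_j Y_k Y_l) - r(x_i) r(x_j) r(x_k) r(x_l) \right] \\
& \times \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_k - y}{h}) K(\tfrac{x_l - y}{h})}{S(y)^2}\, dy \right) \\
={}& 2 \sum_{i=1}^{n} \sum_{j \neq i} \left\{ \sigma^2 \left[ r(x_i)^2 + r(x_j)^2 \right] + \sigma^4 \right\} \\
& \times \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_j - y}{h})}{S(y)^2}\, dy \right) \\
&+ 4 \sum_{i=1}^{n} \sum_{j \neq i} \sum_{k \neq i, j} \sigma^2 r(x_j) r(x_k) \\
& \times \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_k - y}{h})}{S(y)^2}\, dy \right).
\end{aligned}
\tag{V.15}
\]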
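The first term of the double sum in (V.15) can be written
\[
\begin{aligned}
& \sum_{i=1}^{n} \sum_{j \neq i} r(x_i)^2 \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_j - y}{h})}{S(y)^2}\, dy \right) \\
&\quad = \int_0^1 \int_0^1 \frac{1}{S(x)^2 S(y)^2} \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_j - y}{h} \right) \right] dx\, dy \\
&\qquad - \int_0^1 \int_0^1 \frac{1}{S(x)^2 S(y)^2} \left[ \sum_{i=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right)^2 K\left( \tfrac{x_i - y}{h} \right)^2 \right] dx\, dy.
\end{aligned}
\tag{V.16}
\]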
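Now, using once more the approximation of a sum by an integral (a double one this time), we have
\[
\begin{aligned}
& \sum_{i=1}^{n} \sum_{j=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_j - y}{h} \right) \\
&\quad = n^2 \iint_{([x-h, x+h] \cup [y-h, y+h])^2} r(u)^2 K\left( \tfrac{u - x}{h} \right) K\left( \tfrac{v - x}{h} \right) K\left( \tfrac{u - y}{h} \right) K\left( \tfrac{v - y}{h} \right) du\, dv + o(n^2 h^2) \\
&\quad = n^2 \int r(u)^2 K\left( \tfrac{u - x}{h} \right) K\left( \tfrac{u - y}{h} \right) du \times \int K\left( \tfrac{v - x}{h} \right) K\left( \tfrac{v - y}{h} \right) dv + o(n^2 h^2).
\end{aligned}
\]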
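Consider the case $x \leq y$ (the case $x \geq y$ is analogous). Setting $s = \frac{u - x}{h}$ and $t = \frac{v - x}{h}$, this gives
\[
\begin{aligned}
& \sum_{i=1}^{n} \sum_{j=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_j - y}{h} \right) \\
&\quad = n^2 h^2 \int_{[-1, 1] \cup [-1 + \frac{y - x}{h},\, 1 + \frac{y - x}{h}]} r(x + hs)^2 K(s) K\left( s - \tfrac{y - x}{h} \right) ds \\
&\qquad \times \int_{[-1, 1] \cup [-1 + \frac{y - x}{h},\, 1 + \frac{y - x}{h}]} K(t) K\left( t - \tfrac{y - x}{h} \right) dt + o(n^2 h^2) \\
&\quad = n^2 h^2 \int_{-1 + \frac{y - x}{h}}^{1} r(x + hs)^2 K(s) K\left( s - \tfrac{y - x}{h} \right) ds
 \times \int_{-1 + \frac{y - x}{h}}^{1} K(t) K\left( t - \tfrac{y - x}{h} \right) dt + o(n^2 h^2).
\end{aligned}
\]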
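Note that, for instance in $\int r(u)^2 K\left( \frac{u - x}{h} \right) K\left( \frac{u - y}{h} \right) du$, the quantities $K\left( \frac{u - x}{h} \right)$ and $K\left( \frac{u - y}{h} \right)$ have non-disjoint supports only when $y$ lies between $x - 2h$ and $x + 2h$, that is, when $\frac{x - y}{h}$ lies between $-2$ and $2$. Using the function $\Psi$ introduced in case 1, we then deduce
\[
\sum_{i=1}^{n} \sum_{j=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_j - y}{h} \right)
= n^2 h^2 \left[ r(x) + o(1) \right]^2 \Psi\left( \frac{y - x}{h} \right)^2 + o(n^2 h^2).
\]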
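Using this result, we return to the computation of (V.16), which then gives
\[
\begin{aligned}
& \int_0^1 \int_0^1 \frac{1}{S(x)^2 S(y)^2} \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_j - y}{h} \right) \right] dx\, dy \\
&\quad = \int_0^1 \int_{x - 2h}^{x + 2h} \frac{n^2 h^2 \left[ r(x) + o(1) \right]^2}{[nh + o(nh)]^4}\, \Psi\left( \frac{y - x}{h} \right)^2 dy\, dx + o\left( \frac{n^2 h^3}{[nh + o(nh)]^4} \right),
\end{aligned}
\]
hence, setting $z = \frac{y - x}{h}$ and assuming that $nh \to +\infty$,
\[
\begin{aligned}
& \int_0^1 \int_0^1 \frac{1}{S(x)^2 S(y)^2} \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_j - y}{h} \right) \right] dx\, dy \\
&\quad = 2 \int_0^1 r(x)^2\, dx \int_0^2 \Psi(z)^2\, dz\; \frac{1}{n^2 h} + o\left( \frac{1}{n^2 h} \right) = o\left( \frac{1}{n} \right).
\end{aligned}
\]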
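By a very similar computation, we also have
\[
\int_0^1 \int_0^1 \frac{1}{S(x)^2 S(y)^2} \left[ \sum_{i=1}^{n} r(x_i)^2 K\left( \tfrac{x_i - x}{h} \right)^2 K\left( \tfrac{x_i - y}{h} \right)^2 \right] dx\, dy = o\left( \frac{1}{n} \right).
\]
Thus, returning to relation (V.16), we finally have
\[
\sum_{i=1}^{n} \sum_{j \neq i} r(x_i)^2 \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_j - y}{h})}{S(y)^2}\, dy \right) = o\left( \frac{1}{n} \right). \tag{V.17}
\]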
Let us now turn to the computation of the triple sum in (V.15). We have
\[
\begin{aligned}
& \sum_{i=1}^{n} \sum_{j \neq i} \sum_{k \neq i, j} r(x_j) r(x_k) \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_k - y}{h})}{S(y)^2}\, dy \right) \\
&\quad = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} [\,\cdot\,]
 - \underbrace{\sum_{i=1}^{n} \sum_{j \neq i} [\,\cdot\,]}_{\text{case } k = i}
 - \underbrace{\sum_{i=1}^{n} \sum_{j \neq i} [\,\cdot\,]}_{\text{case } k = j}
 - \underbrace{\sum_{i=1}^{n} \sum_{k \neq i} [\,\cdot\,]}_{\text{case } j = i}
 - \underbrace{\sum_{i=1}^{n} [\,\cdot\,]}_{\text{case } k = j = i}.
\end{aligned}
\tag{V.18}
\]
In this expression, by computations analogous to the previous ones, the last four sums are $o\left( \frac{1}{n} \right)$. It remains to compute the first sum; here again, a similar computation (approximating the triple sum by a triple integral) leads to
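\[
\begin{aligned}
& \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} r(x_j) r(x_k) K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_k - y}{h} \right) \\
&\quad = n^3 \int K\left( \tfrac{u - x}{h} \right) K\left( \tfrac{u - y}{h} \right) du
 \times \int_{x-h}^{x+h} r(v) K\left( \tfrac{v - x}{h} \right) dv \int_{y-h}^{y+h} r(w) K\left( \tfrac{w - y}{h} \right) dw + o(n^3 h^3).
\end{aligned}
\]
For $x \leq y$, and setting $s = \frac{u - x}{h}$, $t = \frac{v - x}{h}$, this gives
\[
\begin{aligned}
& \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} r(x_j) r(x_k) K\left( \tfrac{x_i - x}{h} \right) K\left( \tfrac{x_j - x}{h} \right) K\left( \tfrac{x_i - y}{h} \right) K\left( \tfrac{x_k - y}{h} \right) \\
&\quad = n^3 h^3\, \Psi\left( \frac{y - x}{h} \right) \int_{-1}^{1} r(x + ht) K(t)\, dt \int_{-1}^{1} r(y + hz) K(z)\, dz + o(n^3 h^3) \\
&\quad = n^3 h^3\, \Psi\left( \frac{y - x}{h} \right) \left[ r(x) + o(1) \right] \left[ r(y) + o(1) \right] + o(n^3 h^3).
\end{aligned}
\]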
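Setting $z = \frac{y - x}{h}$ and returning to (V.18), we finally obtain
\[
\begin{aligned}
& \sum_{i=1}^{n} \sum_{j \neq i} \sum_{k \neq i, j} r(x_j) r(x_k) \left( \int_0^1 \frac{K(\tfrac{x_i - x}{h}) K(\tfrac{x_j - x}{h})}{S(x)^2}\, dx \right) \left( \int_0^1 \frac{K(\tfrac{x_i - y}{h}) K(\tfrac{x_k - y}{h})}{S(y)^2}\, dy \right) \\
&\quad = 2 \int_0^1 \frac{n^3 h^4 \left[ r(x) + o(1) \right]^2}{[nh + o(nh)]^4} \int_0^2 \Psi(z)\, dz\, dx + o\left( \frac{n^3 h^4}{[nh + o(nh)]^4} \right) \\
&\quad = 2 \int_0^1 r(x)^2\, dx \int_0^2 \Psi(z)\, dz\; \frac{1}{n} + o\left( \frac{1}{n} \right).
\end{aligned}
\tag{V.19}
\]
Finally, returning to expression (V.15) of the variance and using relations (V.17) and (V.19), we obtain the result.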
V.3. PRINCIPAL COMPONENTS REGRESSION - PROOFS
Proof of Proposition III.15. For all $i_1, i_2 = 1, \dots, n$, if $h_{i_1 i_2}$ is taken of the form $p^{-\zeta}$ with $\zeta \in [1/4, 1/2)$, we have
\[
\hat{M}_{i_1 i_2} - M_{i_1 i_2} = O_P\left( \frac{1}{n p^{1/2}} \right).
\]
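Proof: We use the results proved in Section III.4.2. When $i_1 \neq i_2$, Propositions III.11 and III.12 give
\[
E\left( \hat{M}_{i_1 i_2} - M_{i_1 i_2} \right) = \frac{1}{n} \left\{ \frac{\mu_2(K)}{2} \int_0^1 \left[ X_{i_1}(t) X_{i_2}''(t) + X_{i_1}''(t) X_{i_2}(t) \right] dt\; h_{i_1 i_2}^2 + o\left( h_{i_1 i_2}^2 \right) \right\},
\]
and
\[
V\left( \hat{M}_{i_1 i_2} - M_{i_1 i_2} \right) = \frac{1}{n^2} \left\{ 2 \sigma_{\delta}^2 \int_0^1 \left[ X_{i_1}(t)^2 + X_{i_2}(t)^2 \right] dt \int_0^2 \Psi(z)\, dz\; \frac{1}{p} + o\left( \frac{1}{p} \right) \right\}.
\]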
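Similarly, when $i_1 = i_2 = i$, Propositions III.13 and III.14 give
\[
E\left( \hat{M}_{ii} - M_{ii} \right) = \frac{1}{n} \left\{ \mu_2(K) \int_0^1 X_i(t) X_i''(t)\, dt\; h_{ii}^2 + o\left( h_{ii}^2 \right) \right\},
\]
and
\[
V\left( \hat{M}_{ii} - M_{ii} \right) = \frac{1}{n^2} \left\{ 4 \sigma_{\delta}^2 \int_0^1 X_i(t)^2\, dt \int_0^2 \Psi(z)\, dz\; \frac{1}{p} + o\left( \frac{1}{p} \right) \right\}.
\]
We deduce that, for all $i_1, i_2 = 1, \dots, n$,
\[
E\left( \hat{M}_{i_1 i_2} - M_{i_1 i_2} \right) = O\left( \frac{h_{i_1 i_2}^2}{n} \right),
\]
and
\[
V\left( \hat{M}_{i_1 i_2} \right) = O\left( \frac{1}{n^2 p} \right).
\]
Thus, taking $h_{i_1 i_2}$ of the form $p^{-\zeta}$ with $\zeta \in [1/4, 1/2)$, so that $h_{i_1 i_2}^2 \leq p^{-1/2}$, ensures that the bias is negligible, and the result of Proposition III.15 follows immediately.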
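\[
\left\| \hat{M} - M \right\|^2 = O_P\left( \frac{1}{p} \right).
\]
Proof: We have
\[
\left\| \hat{M} - M \right\|^2 \leq \mathrm{Tr}\left[ \left( \hat{M} - M \right)^{\tau} \left( \hat{M} - M \right) \right] = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \left( \hat{M}_{i_1 i_2} - M_{i_1 i_2} \right)^2,
\]
and the result then follows immediately from Proposition III.15.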
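The inequality used in this proof, between the squared norm of a matrix and the trace of its Gram matrix, can be illustrated numerically. The sketch below assumes that $\|\cdot\|$ denotes the spectral norm (the text does not say which matrix norm is meant) and uses an arbitrary random matrix in place of $\hat{M} - M$.

```python
import numpy as np

rng = np.random.default_rng(1)
diff = rng.normal(size=(6, 6))            # stand-in for M_hat - M

spectral_sq = np.linalg.norm(diff, 2) ** 2  # squared largest singular value
trace_form = np.trace(diff.T @ diff)        # Tr[(M_hat - M)' (M_hat - M)]
entry_sum = np.sum(diff ** 2)               # sum of squared entries
```

The trace equals the sum of all squared singular values, which is why it dominates the squared spectral norm and coincides with the sum of squared entries.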

Proof of Proposition III.17. Under the hypotheses of Proposition III.17, we have, for all $r = 1, \dots, L$,
\[
\hat{\lambda}_r - \lambda_r = O_P\left( \frac{1}{n^{1/2} p^{1/2}} + \frac{1}{p} \right).
\]
Proof: We begin with a result from Kneip and Utikal (2001). We have
\[
\hat{\lambda}_r - \lambda_r = \mathrm{Tr}\left[ P_{E_r} \left( \hat{M} - M \right) \right] + R_1,
\]
where $P_{E_r}$ denotes the projection matrix onto the eigenspace associated with the $r$th eigenvalue $\lambda_r$, and $R_1$ satisfies
\[
|R_1| \leq \frac{6 \left\| \hat{M} - M \right\|^2}{\min_{s = 1, \dots, n,\, s \neq r} |\lambda_r - \lambda_s|}.
\]
Using Proposition III.16 and the hypothesis of this proposition on the eigenvalues of $M$, we deduce that
\[
R_1 = O_P\left( \frac{1}{p} \right),
\]
which gives
\[
\hat{\lambda}_r - \lambda_r = p_r^{\tau} \left( \hat{M} - M \right) p_r + O_P\left( \frac{1}{p} \right).
\]
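On the other hand, we have
\[
\begin{aligned}
E\left[ \left( p_r^{\tau} \left( \hat{M} - M \right) p_r \right)^2 \right]
&= \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} \sum_{i_4=1}^{n} p_{i_1 r}\, p_{i_2 r}\, p_{i_3 r}\, p_{i_4 r}\, E\left[ \left( \hat{M}_{i_1 i_2} - M_{i_1 i_2} \right) \left( \hat{M}_{i_3 i_4} - M_{i_3 i_4} \right) \right] \\
&= \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} p_{i_1 r}^2\, p_{i_2 r}\, p_{i_3 r}\, E\left[ \left( \hat{M}_{i_1 i_2} - M_{i_1 i_2} \right) \left( \hat{M}_{i_1 i_3} - M_{i_1 i_3} \right) \right] \\
&\leq \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} p_{i_1 r}^2\, |p_{i_2 r}|\, |p_{i_3 r}| \times O\left( \frac{1}{n^2 p} \right),
\end{aligned}
\]
which finally gives, since $\sum_{i_1=1}^{n} p_{i_1 r}^2 = 1$ and $\sum_{i_2=1}^{n} |p_{i_2 r}| = O\left( n^{1/2} \right)$,
\[
p_r^{\tau} \left( \hat{M} - M \right) p_r = O_P\left( \frac{1}{n^{1/2} p^{1/2}} \right),
\]
and completes the proof of Proposition III.17.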
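The first-order expansion of the eigenvalues used in this proof can be illustrated on a small example. The sketch below is only a numerical illustration with hypothetical inputs: a diagonal matrix with simple eigenvalues plays the role of $M$, a small random symmetric matrix plays the role of $\hat{M} - M$, and the remainder is compared with the bound $6 \|\hat{M} - M\|^2 / \min_{s \neq r} |\lambda_r - \lambda_s|$ quoted above.

```python
import numpy as np


def eigenvalue_remainders(seed=0, eps=1e-3):
    """Return (|remainder|, bound) pairs for each eigenvalue of a toy matrix."""
    rng = np.random.default_rng(seed)
    M = np.diag([3.0, 2.0, 1.0])            # toy matrix, eigenvalue gaps equal to 1
    noise = rng.normal(size=(3, 3))
    E = (noise + noise.T) / 2               # symmetric perturbation
    E *= eps / np.linalg.norm(E, 2)         # rescale to spectral norm eps
    lam, P = np.linalg.eigh(M)              # eigenvalues in ascending order
    lam_hat = np.linalg.eigvalsh(M + E)     # perturbed eigenvalues, same order
    gap = 1.0
    pairs = []
    for r in range(3):
        p_r = P[:, r]
        remainder = lam_hat[r] - lam[r] - p_r @ E @ p_r
        pairs.append((abs(remainder), 6 * eps ** 2 / gap))
    return pairs


pairs = eigenvalue_remainders()
```

Each remainder is of second order in the size of the perturbation, which is what makes the linear term $p_r^{\tau} (\hat{M} - M) p_r$ the dominant one.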
Proof of Proposition III.18. Under the hypotheses of Proposition III.17, we have
\[
\left\| \hat{p}_r - p_r \right\| = O_P\left( \frac{1}{p^{1/2}} \right).
\]
Proof: Here again we begin by stating a result from Kneip and Utikal (2001). We have
\[
\hat{p}_r - p_r = -S_r \left( \hat{M} - M \right) p_r + R_2,
\]
where $S_r$ is the matrix defined by
\[
S_r = \sum_{s \neq r} \frac{1}{\lambda_s - \lambda_r} P_{E_s},
\]
and $R_2$ satisfies
\[
\|R_2\| \leq \frac{6 \left\| \hat{M} - M \right\|^2}{\min_{\lambda \neq \lambda_r} |\lambda - \lambda_r|^2}.
\]
Using Proposition III.16 and the hypothesis of this proposition on the eigenvalues of $M$, we deduce that
\[
\|R_2\| = O_P\left( \frac{1}{p} \right).
\]
Setting $q_r = \left( \hat{M} - M \right) p_r$, we therefore obtain
\[
\left\| \hat{p}_r - p_r + S_r q_r \right\| = O_P\left( \frac{1}{p} \right).
\]
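Let us now compute $\|S_r\|$. We have
\[
\begin{aligned}
\|S_r\| &= \sup_{\|v\| = 1} \left[ v^{\tau} \left( \sum_{s_1 \neq r} \frac{1}{\lambda_{s_1} - \lambda_r}\, p_{s_1} p_{s_1}^{\tau} \right) \left( \sum_{s_2 \neq r} \frac{1}{\lambda_{s_2} - \lambda_r}\, p_{s_2} p_{s_2}^{\tau} \right) v \right]^{1/2} \\
&\leq \frac{1}{\ell_r} \sup_{\|v\| = 1} \left[ v^{\tau} \left( \sum_{s_1 \neq r} p_{s_1} p_{s_1}^{\tau} \right) \left( \sum_{s_2 \neq r} p_{s_2} p_{s_2}^{\tau} \right) v \right]^{1/2},
\end{aligned}
\]
where $\ell_r = \min\left\{ |\lambda_{r-1} - \lambda_r|, |\lambda_{r+1} - \lambda_r| \right\}$. Since the supremum above is nothing but $\left\| \sum_{s \neq r} p_s p_s^{\tau} \right\|$, it is bounded by $\left\| \sum_{s} p_s p_s^{\tau} \right\| = 1$, hence
\[
\|S_r\| \leq \frac{1}{\ell_r}.
\]
From the hypotheses made on the $\lambda_r$, we then deduce that
\[
\|S_r\| = O(1).
\]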
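For all $v \in \mathbb{R}^n$, we then have
\[
\begin{aligned}
E\left[ \left( v^{\tau} S_r q_r \right)^2 \right] &= v^{\tau} S_r\, E\left( q_r q_r^{\tau} \right) S_r v \\
&\leq \|v\|^2 \|S_r\|^2\, E\left[ p_r^{\tau} \left( \hat{M} - M \right)^{\tau} \left( \hat{M} - M \right) p_r \right].
\end{aligned}
\]
From the above, we thus obtain
\[
E\left[ \left( v^{\tau} S_r q_r \right)^2 \right] = O\left( \frac{\|v\|^2}{np} \right).
\]
Finally, taking for $v$ the vector whose components are all zero except the $i$th, which equals 1, we get
\[
E\left[ \left( S_r q_r \right)_i^2 \right] = O\left( \frac{1}{np} \right),
\]
hence
\[
\left| \hat{p}_{ir} - p_{ir} \right| = O_P\left( \frac{1}{n^{1/2} p^{1/2}} \right),
\]
which proves the result of Proposition III.18.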
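The eigenvector expansion $\hat{p}_r - p_r = -S_r (\hat{M} - M) p_r + R_2$ can be illustrated in the same spirit as the eigenvalue expansion. The sketch below uses hypothetical inputs (a diagonal toy matrix for $M$ and a small random symmetric perturbation for $\hat{M} - M$) and fixes the sign of the perturbed eigenvector, which is otherwise arbitrary, so that it points in the same direction as $p_r$.

```python
import numpy as np


def eigenvector_remainder(seed=0, eps=1e-3, r=1):
    """Return (||p_hat - p + S_r E p||, bound) for a toy symmetric matrix."""
    rng = np.random.default_rng(seed)
    M = np.diag([1.0, 2.0, 3.0])             # toy matrix with simple eigenvalues
    noise = rng.normal(size=(3, 3))
    E = (noise + noise.T) / 2                # symmetric perturbation
    E *= eps / np.linalg.norm(E, 2)          # rescale to spectral norm eps
    lam, P = np.linalg.eigh(M)
    _, P_hat = np.linalg.eigh(M + E)
    p_r, p_hat = P[:, r], P_hat[:, r]
    if p_hat @ p_r < 0:                      # eigenvectors are defined up to sign
        p_hat = -p_hat
    # S_r = sum_{s != r} p_s p_s' / (lambda_s - lambda_r)
    S_r = sum(np.outer(P[:, s], P[:, s]) / (lam[s] - lam[r])
              for s in range(3) if s != r)
    remainder = np.linalg.norm(p_hat - p_r + S_r @ E @ p_r)
    gap = min(abs(lam[s] - lam[r]) for s in range(3) if s != r)
    return remainder, 6 * eps ** 2 / gap ** 2


remainder, bound = eigenvector_remainder()
```

The remainder is of second order in the perturbation, in line with the bound on $\|R_2\|$ stated in the proof.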

BIBLIOGRAPHY
[1] Aneiros-Perez, G., Cardot, H., Estevez-Perez, G. and Vieu, P. (2004).

Maximum ozone concentration forecasting by functional nonparametric
approaches. Environmetrics, 15, 675-685.
[2] Averous, J. and Meste, M. (1997). Median balls : an extension of the

interquartile intervals to multivariate distributions. Journal of Multiva-
riate Analysis, 63, 222-241.
[3] Bassett, G. and Koenker, R. (1978). Asymptotic theory of least absolute

error regression. Journal of the American Statistical Association, 73,
618-622.
[4] Benhenni, K. and Cambanis, S. (1992). Sampling designs for estimating

integrals of stochastic processes. Annals of Statistics, 20, 161-194.
[5] Benko, M., Härdle, W. and Kneip, A. (2005). Common functional prin-
cipal components. SFB 649 Economic Risk Discussion Paper, 2006-010.
[6] Berlinet, A., Biau, G. and Rouvière, L. (2005). Functional classification

with wavelets. Preprint.
[7] Berlinet, A., Cadre, B. and Gannoun, A. (2001). On the conditional

L1 -median and its estimation. Journal of Nonparametric Statistics, 13,
631-645.
[8] Besse, P., Cardot, H. and Ferraty, F. (1997). Simultaneous nonparametric

regression of unbalanced longitudinal data. Computational Statistics and
Data Analysis, 24, 255-270.
[9] Besse, P., Cardot, H. and Stephenson, D. (2000). Autoregressive fore-

casting of some functional climatic variations. Scandinavian Journal of
Statistics, 27, 673-687.
[10] Besse, P. and Ramsay, J.O. (1986). Principal components analysis of

sampled functions. Psychometrika, 51, 285-311.
[11] Bhattacharya, P.K. and Gangopadhyay, A.K. (1990). Kernel and nearest-
neighbor estimation of a conditional quantile. Annals of Statistics, 18,
1400-1415.
[12] Bosq, D. (2000). Linear processes in function spaces. Lecture Notes in

Statistics, 149, Springer.
[13] Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Clas-
sification and regression trees. Wadsworth Statistics and Probability Se-
ries, Wadsworth Advanced Books and Software, Belmont.
[14] Cadre, B. (2001). Convergent estimators for the L1 -median of a Banach

valued random variable. Statistics, 35, 509-521.
[15] Cardot, H. (2000). Nonparametric estimation of smoothed principal com-

ponents analysis of sampled noisy functions. Nonparametric Statistics,
12, 503-538.
[16] Cardot, H. (2006). Conditional functional principal components analysis.

Scandinavian Journal of Statistics, to appear.
[17] Cardot, H., Crambes, C., Kneip, A. and Sarda, P. (2006). Smoothing
splines estimators in functional linear regression with errors-in-variables.
Computational Statistics and Data Analysis, special issue on functional
data analysis, to appear.
[18] Cardot, H., Crambes, C. and Sarda, P. (2004a). Estimation spline

de quantiles conditionnels pour variables explicatives fonctionnelles.
Comptes Rendus de l’Académie des Sciences, 339, 141-144.
[19] Cardot, H., Crambes, C. and Sarda, P. (2004b). Conditional quantiles

with functional covariates : an application to ozone pollution forecasting.
In Compstat 2004 Proceedings, J. Antoch editor, Physica-Verlag, 769-776.
[20] Cardot, H., Crambes, C. and Sarda, P. (2005). Quantile regression when
the covariates are functions. Journal of Nonparametric Statistics, 17,
841-856.
[21] Cardot, H., Crambes, C. and Sarda, P. (2006). Conditional quantiles

with functional covariates : an application to ozone pollution forecasting.
In Applied Biostatistics : Case Studies and Interdisciplinary Methods,
Xplore e-book, to appear.
[22] Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model.
Statistics and Probability Letters, 45, 11-22.
[23] Cardot, H., Ferraty, F. and Sarda, P. (2003). Spline estimators for the
functional linear model. Statistica Sinica, 13, 571-591.
[24] Cardot, H. and Sarda, P. (2005). Estimation in generalized linear mo-

dels for functional data via penalized likelihood. Journal of Multivariate
Analysis, 92, 24-41.
[25] Chatelin, F. (1983). Spectral approximation of linear operators. Academic

Press, New-York.
[26] Chiou, J-M., Müller, H.-G., Wang, J-L and Carey, J.R. (2003). A func-
tional multiplicative effects model for longitudinal data, with application
to reproductive histories of female medflies. Statistica Sinica, 13, 1119-
1133.
[27] Chiou, J-M., Müller, H.-G. and Wang, J-L. (2003). Functional quasi-
likelihood regression models with smooth random effects. Journal of the
Royal Statistical Society, Series B, 65, 405-423.
[28] Chiou, J-M., Müller, H.-G. and Wang, J-L. (2004). Functional response
models. Statistica Sinica, 14, 675-693.
[29] Cohen, A. (2003). Numerical analysis of wavelets methods. Elsevier, Am-

sterdam.
[30] Crambes, C. (2005). Total least squares for functional data. Invited paper
in ASMDA 2005 Proceedings, 619-626.
[31] Cuevas, A., Febrero, M. and Fraiman, R. (2002). Linear functional re-
gression : the case of a fixed design and functional response. Canadian
Journal of Statistics, 30, 285-300.
[32] Damon, J. and Guillas, S. (2002). The inclusion of exogenous variables in

functional autoregressive ozone forecasting. Environmetrics, 13, 759-774.
[33] Daubechies, I. (1992). Ten lectures on wavelets. SIAM, Philadelphia.

[34] Dauxois, J. et Pousse, A. (1976). Les analyses factorielles en calcul des

probabilités et en statistique : essai d’étude synthétique. Thèse de docto-
rat, Université Paul Sabatier, Toulouse.
[35] Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for
the principal component analysis of a random vector function : some
applications to statistical inference. Journal of Multivariate Analysis,
12, 136-154.
[36] de Boor, C. (1978). A practical guide to splines. Applied Mathematical

Sciences, Springer, New York.
[37] Demmel, J. (1992). The componentwise distance to the nearest singular

matrix. SIAM, Journal of Matrix Analysis and Applications, 13, 10-19.
[38] Deville, J.-C. (1974). Méthodes statistiques et numériques de l’analyse

harmonique. Annales de l’I.N.S.E.E., 15, 3-101.
[39] Devroye, L., Györfi, L. and Lugosi, G. (1996). A probabilistic theory of

pattern recognition. Applications of Mathematics, Springer, New York.
[40] Dierckx, P. (1993). Curve and surface fitting with splines. Monographs
on Numerical Analysis, Oxford Science Publications, New York.
[41] Ducharme, G., Gannoun, A., Guertin, M.-C. and Jéquier, J.-C. (1995).
Reference values obtained by kernel-based estimation of quantile regres-
sion. Biometrics, 51, 1105-1116.
[42] Dunford, N. and Schwartz, J. (1963). Linear operators. Interscience, New York.
[43] Eubank, R.L. (1988). Spline smoothing and nonparametric regression.

Marcel Dekker.
[44] Fan, J., Hu, T.-C. and Truong, Y (1994). Robust nonparametric function
estimation. Scandinavian Journal of Statistics, 21, 433-446.
[45] Ferraty, F., Rahbi, A. and Vieu, P. (2005). Conditional quantiles for
functional dependent data with application to the climatic El Nino phe-
nomenon. Sankhya, 67, 378-399.
[46] Ferraty, F. and Vieu, P. (2002). The functional nonparametric model and
application to spectrometric data. Computational Statistics, 17, 545-564.
[47] Ferraty, F. and Vieu, P. (2003). Curves discrimination : a nonparametric

functional approach. Special issue in honour of Stan Azen : a birthday
celebration. Computational Statistics and Data Analysis, 44, 161-173.
[48] Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis :

theory and practice. Springer, New York.
[49] Frank, I.E. and Friedman, J.H. (1993). A statistical view of some che-
mometrics regression tools. Technometrics, 35, 109-135.
[50] Fuller, W.A. (1987). Measurement error models. Wiley, New York.
[51] Gasser, T., Sroka, L. and Jennen-Steinmetz, C. (1986). Residual variance

and residual pattern in nonlinear regression. Biometrika, 73, 625-633.
[52] Ghattas, B. (1999). Prévisions des pics d’ozone par arbres de régression,
simples et agrégés par bootstrap. Revue de Statistique Appliquée, XLVII,
61-80.
[53] Gleser, L.J. (1981). Estimation in a multivariate “errors-in-variables”

regression model : large sample results. Annals of Statistics, 9, 24-44.
[54] Gohberg, I.C. et Krein, M.G. (1971). Introduction à la théorie des

opérateurs linéaires non auto-adjoints dans un espace hilbertien. Dunod,
Paris.
[55] Golub, G.H., Hansen, P.C and O’Leary, D.P. (1999). Tikhonov regula-
rization and total least squares. SIAM, Journal of Matrix Analysis and
Applications, 21, 185-194.
[56] Golub, G.H. and Van Loan, C.F. (1980). An analysis of the total least
squares problem. SIAM, Journal of Numerical Analysis, 17, 883-893.
[57] Golub, G.H. and Van Loan, C.F. (1996). Matrix computations. Johns
Hopkins University Press, Baltimore.
[58] Good, I.J. (1969). Some applications of the singular value decomposition
of a matrix. Technometrics, 11, 823-831.
[59] Goutis, C. (1998). Second derivative functional regression with appli-

cations to near infrared spectroscopy. Journal of the Royal Statistical
Society, Series B, 60, 103-114.
[60] Green, P.J. and Silverman, B.W. (1994). Nonparametric regression and
generalized linear models : a roughness penalty approach. Monographs on
Statistics and Applied Probability, Chapman and Hall, London.
[61] Hall, P. and Marron, J.S. (1987). Estimation of integrated squared density
derivatives. Statistics and Probability Letters, 6, 109-115.
[62] Härdle, W. (1991). Smoothing techniques with implementation in S.

Springer, New-York.
[63] Hastie, T. and Tibshirani, R. (1990). Generalized additive models. Mono-

graphs on Statistics and Applied Probability, Chapman and Hall, Lon-
don.
[64] Hastie, T., Buja, A. and Tibshirani, R. (1995). Penalized discriminant

analysis. Annals of Statistics, 23, 73-102.
[65] Hastie, T. and Mallows, C. (1993). Discussion of “A statistical view of

some chemometrics regression tools.” by Frank, I.E. and Friedman, J.H.
Technometrics, 35, 140-143.
[66] He, X. and Shi, P. (1994). Convergence rate of B-spline estimators of

nonparametric conditional quantile functions. Nonparametric Statistics,
3, 299-308.
[67] Helland, I.S. (1990). Partial least squares regression and statistical mo-
dels. Scandinavian Journal of Statistics, 17, 97-114.
[68] Hoerl, A.E. and Kennard, R.W. (1980). Ridge regression : advances,
algorithms and applications. American Journal of Mathematical Mana-
gement Sciences, 1, 5-83.
[69] Huang, L.-S. and Fan, J. (1999). Nonparametric estimation of quadratic

regression functionals. Bernoulli, 5, 927-949.
[70] Jones, M.C. and Sheather, S.J. (1991). Using non-stochastic terms to advantage
in kernel-based estimation of integrated squared density derivatives.
Statistics and Probability Letters, 11, 511-514.
[71] Kneip, A., Li, X., Mac Gibbon, K.B. and Ramsay, J.O. (2000). Curve
registration by local regression. Canadian Journal of Statistics, 28, 19-
29.
[72] Kneip, A. and Utikal, K.J. (2001). Inference for density families using
functional principal component analysis. Journal of the American Sta-
tistical Association, 96, 519-542.
[73] Koenker, R. (2005). Quantile regression. Econometric Society Mono-

graphs, Cambridge.
[74] Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica,

46, 33-50.
[75] Koenker, R. and Machado, J. (1999). Goodness of fit and related infe-
rence processes for quantile regression. Journal of the American Statis-
tical Association, 94, 1296-1310.
[76] Koenker, R., Ng, P. and Portnoy, S. (1994). Quantile smoothing splines.
Biometrika, 81, 673-680.
[77] Lejeune, M. and Sarda, P. (1988). Quantile regression: a nonparametric approach. Computational Statistics and Data Analysis, 6, 229-239.
[78] Marx, B.D. and Eilers, P.H. (1999). Generalized linear regression on sampled signals and curves: a P-spline approach. Technometrics, 41, 1-13.
[79] McCullagh, P. and Nelder, J.A. (1989). Generalized linear models (Second Edition). Monographs on Statistics and Applied Probability, Chapman and Hall, London.
[80] Mosteller, F. and Tukey, J. (1977). Data analysis and regression: a second course in statistics. Addison-Wesley, Reading.
[81] Müller, H.-G. (2005). Functional modeling and classification of longitudinal data. Scandinavian Journal of Statistics, 32, 223-240.
[82] Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Annals of Statistics, 33, 774-805.
[83] Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and its Applications, 10, 186-190.
[84] Poiraud-Casanova, S. and Thomas-Agnan, C. (1998). Quantiles conditionnels. Journal de la Société Française de Statistique, 139, 31-44.
[85] Ramsay, J.O. (1982). When the data are functions. Psychometrika, 47, 379-396.
[86] Ramsay, J.O. (2000). Functional components of variation in handwriting. Journal of the American Statistical Association, 95, 9-15.
[87] Ramsay, J.O. and Dalzell, C.J. (1991). Some tools for functional data analysis. Journal of the Royal Statistical Society, Series B, 53, 539-572.
[88] Ramsay, J.O. and Li, X. (1998). Curve registration. Journal of the Royal Statistical Society, Series B, 60, 351-363.
[89] Ramsay, J.O. and Silverman, B.W. (1997). Functional data analysis (First Edition). Springer, New York.
[90] Ramsay, J.O. and Silverman, B.W. (2002). Applied functional data analysis. Springer, New York.
[91] Ramsay, J.O. and Silverman, B.W. (2005). Functional data analysis (Second Edition). Springer, New York.
[92] Rao, C.R. (1958). Some statistical methods for comparison of growth curves. Biometrics, 14, 1-17.
[93] Rio, E. (2000). Théorie asymptotique des processus aléatoires faiblement dépendants. Springer-Verlag, Berlin.
[94] Ruppert, D. and Carroll, J. (1988). Transformation and weighting in regression. Chapman and Hall, New York.
[95] Ruppert, D., Sheather, S.J. and Wand, M.P. (1993). An effective bandwidth selector for local least squares regression. Working paper, 93-017.
[96] Sarda, P. and Vieu, P. (2000). Kernel regression. In Smoothing and Regression: Approaches, Computation and Application, M.G. Schimek editor, Wiley Series in Probability and Statistics, 43-70.
[97] Schumaker, L.L. (1981). Spline functions: basic theory. Wiley, New York.
[98] Sima, D.M. and Van Huffel, S. (2004). Appropriate cross validation for regularized error-in-variables linear models. In Compstat 2004 Proceedings, J. Antoch editor, Physica-Verlag, 1815-1822.
[99] Stone, C.J. (1982). Optimal rates of convergence for nonparametric models. Annals of Statistics, 10, 1040-1053.
[100] Stone, C.J. (1985). Additive regression and other nonparametric models. Annals of Statistics, 13, 689-705.
[101] Tsybakov, A.B. (1986). Robust reconstruction of functions by the local-approximation method. Problems of Information Transmission, 22, 133-146.
[102] Tucker, L.R. (1958). Determination of parameters of a functional relation by factor analysis. Psychometrika, 23, 19-23.
[103] Uspensky, J.V. (1937). Introduction to mathematical probability. McGraw-Hill Book Company, New York.
[104] Utreras, F. (1983). Natural spline functions, their associated eigenvalue problem. Numerische Mathematik, 42, 107-117.
[105] Van Huffel, S. and Vandewalle, J. (1991). The total least squares problem: computational aspects and analysis. SIAM, Philadelphia.
[106] Wahba, G. (1990). Spline models for observational data. SIAM, Philadelphia.
[107] Watson, G.S. (1964). Smooth regression analysis. Sankhya, Series A, 26, 359-372.
[108] Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439-447.
[109] Weinberger, H.F. (1974). Variational methods for eigenvalue approximation. SIAM, Philadelphia.
[110] Yao, F., Müller, H.-G. and Wang, J.-L. (2005a). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100, 577-590.
[111] Yao, F., Müller, H.-G. and Wang, J.-L. (2005b). Functional linear regression analysis for longitudinal data. Annals of Statistics, 33, 2873-2903.
