XIII CLAPEM Abstracts Book

ISSN 2389-9069

XIII CLAPEM
Latin American Congress of Probability and Mathematical Statistics

September 22nd to 26th, 2014


Cartagena de Indias, Colombia
Hotel Caribe

ABSTRACTS BOOK
Editors

Viswanathan Arunachalam
(Departamento de Estadística, Universidad Nacional de Colombia)

Liliana Blanco Castañeda
(Departamento de Estadística, Universidad Nacional de Colombia)

Johanna Garzón
(Departamento de Matemáticas, Universidad Nacional de Colombia)

Sandra Vergara
(Departamento de Estadística, Universidad Nacional de Colombia)

First edition: September 2014

Universidad Nacional de Colombia
Facultad de Ciencias
Carrera 30 Calle 45, edificio 404, oficina 125
Telephone: 3165000 ext. 13074
Bogotá, D. C., Colombia

ISSN 2389-9069

Editorial graphic design, electronic layout, and printing:
Proceditor Ltda.
Calle 1C No. 27 A 01
Telephone: 757 9200. Fax: ext. 102
Bogotá, D. C., Colombia
[email protected]

Printed in Colombia

All rights reserved. No part of this publication may be reproduced, in whole
or in part, or stored in or transmitted by any information retrieval system,
in any form or by any means, whether mechanical, photochemical, electronic,
magnetic, electro-optical, photocopy, or otherwise, without the prior
written permission of the publisher.
Scientific Committee
Ricardo Fraiman (Chair), Universidad de la República, Uruguay
Leonardo Trujillo, Universidad Nacional de Colombia
Ramón Giraldo, Universidad Nacional de Colombia
Víctor Pérez Abreu, CIMAT, México
Karine Bertin, Universidad de Valparaíso, Chile
Badih Ghattas, Université de la Méditerranée, France
José León, Universidad Central de Venezuela
David Aldous, University of California, USA
Servet Martínez, Universidad de Chile
Antonio Galves, Universidade de São Paulo, Brazil
Graciela Boente, Universidad de Buenos Aires, Argentina
Pablo Ferrari, Universidad de Buenos Aires, Argentina
Paola Bermolén, Universidad de la República, Uruguay
Pascal Massart, Université de Paris, France
Errico Presutti, Gran Sasso Science Institute, Italy

Organizing Committee
Liliana Blanco Castañeda (Chair), Universidad Nacional de Colombia (Bogotá)
Viswanathan Arunachalam, Universidad Nacional de Colombia (Bogotá)
Johanna Garzón Merchán, Universidad Nacional de Colombia (Bogotá)
José Alfredo Jiménez, Universidad Nacional de Colombia (Bogotá)
Leonardo Trujillo, Universidad Nacional de Colombia (Bogotá)
Sandra Vergara Cardozo, Universidad Nacional de Colombia (Bogotá)
Jorge Mario Ramírez Osorio, Universidad Nacional de Colombia (Medellín)
Adolfo Quiroz, Universidad de los Andes
María Elsa Correal, Universidad de los Andes
Víctor Hugo Prieto, Universidad Antonio Nariño
Sandra Gutiérrez Meza, Universidad de Cartagena
César Serna, Universidad Central
Francisco Zuluaga, Universidad Eafit (Medellín)
Germán Moreno Arenas, Universidad Industrial de Santander
Álvaro Calvache Archila, Universidad Pedagógica y Tecnológica de Colombia
Carmen Helena Cepeda Araque, Universidad Pedagógica y Tecnológica de Colombia
Sandra Patricia Cárdenas Ojeda, Universidad Pedagógica y Tecnológica de Colombia
Martha Corrales, Universidad Sergio Arboleda
Contents

Introduction

Welcome

Short courses

1.1. Topics in quantitative risk management
1.2. Stochastic models of population genetics
1.3. Confidence distribution (CD): A new approach in distributional
     inference and its applications

Plenary lectures

2.1. Distributed statistical algorithms
2.2. Stochastic processes with random contexts: A characterization and
     adaptive estimators for the transition probabilities
2.3. Multifractal statistics
2.4. Asymptotic theory for the sample covariance matrix of a heavy-tailed
     multivariate time series
2.5. Asymptotic behaviour of first passage time distributions for Lévy
     processes

Thematic sessions

3.1. Causal inference
3.2. Data driven penalty calibration
3.3. Fragmentation processes
3.4. Functional data
3.5. Hypoelliptic diffusions
3.6. Mixed and joint modeling
3.7. Optimal designs
3.8. Particle systems
3.9. Probability and statistics in finance
3.10. Random graphs and detection problems
3.11. Random segmentation models
3.12. Random trees and applications
3.13. Robust statistics
3.14. Sampling methods
3.15. Special session dedicated to Victor Yohai
3.16. Stochastic analysis
3.17. Francisco Aranda Ordaz Award

Contributed talks

Contributed talks 1-16

Posters

Index of Authors


Introduction

Welcome

Thanks for attending the XIII CLAPEM (Congreso Latinoamericano de
Probabilidad y Estadística Matemática) in Cartagena de Indias, Colombia,
from September 22nd to 26th, 2014. We are sure that you are already enjoying
the city of Cartagena and that this event will be remembered for the rest of
our lives. This event is organized by Universidad Nacional de Colombia,
Universidad de los Andes, Universidad de Cartagena, Universidad Industrial
de Santander, Universidad Central, Universidad Antonio Nariño, Universidad
Sergio Arboleda, Universidad Eafit, Universidad de Antioquia, Universidad
Pedagógica y Tecnológica de Colombia, the Embassy of France, and the
Regional Cooperation for the Andean Countries of the Embassy of France in
Colombia. Our financial sponsors were Colciencias, the Bernoulli Society,
Academia Colombiana de Ciencias Exactas y Naturales, Sociedad Colombiana de
Matemáticas, the International Statistical Institute, the National Science
Foundation (USA), the Institute of Mathematical Statistics, Universidad del
Tolima, Fundación Universitaria Los Libertadores, SAS, and Avianca.

The objective of CLAPEM is to be the meeting point of researchers and
students in the area of probability and mathematical statistics. It offers
the Latin American community the opportunity to update, share and discuss
their knowledge in the field of probability and statistics, with the help of
short courses and plenary talks given by well-known international
specialists. There are also contributed talks, invited thematic sessions and
posters on the main topics of probability and mathematical statistics. The
congress will consolidate the national and regional statistics communities,
thanks to the exchange of knowledge and experience, both theoretical and
experimental, and the promotion of research interest groups in the field.

There is an interesting variety of topics for the thematic sessions in this
edition of the event: Causal inference, Data driven penalty calibration,
Fragmentation processes, Functional data, Hypoelliptic diffusions, Mixed and
joint modeling, Optimal designs, Particle systems, Probability and
statistics in finance, Random segmentation models, Random trees and
applications, Robust statistics, and Sampling methods. This edition of the
event counts on the participation of recognized leaders in the areas of
probability, stochastic processes and mathematical statistics. Some invited
speakers are Prof. Alison Etheridge (University of Oxford, U.K.), Prof.
Regina Liu (Rutgers University, U.S.A.), Prof. Paul Embrechts (ETH Zürich,
Switzerland), Prof. Carenne Ludeña (Instituto Venezolano de Investigaciones
Científicas, Venezuela), Prof. Gérard Biau (Université Pierre et Marie Curie
and Institut Universitaire de France, France), Prof. Roberto Imbuzeiro
Oliveira (IMPA, Brazil), Prof. Thomas Mikosch (University of Copenhagen,
Denmark) and Prof. Victor Rivero (CIMAT, Mexico), among many others.

There will be more than sixty (60) oral contributions and more than one
hundred (100) poster contributions. Participants are coming from all over
the world. We have confirmed the participation of people from Argentina,
Brazil, Chile, Colombia, Costa Rica, Cuba, Denmark, France, India, Peru,
Spain, Switzerland, the United Kingdom, the USA, Uruguay and Venezuela,
among others.

We wish you all a successful event!

Liliana Blanco Castañeda
Chair
Organizing Committee
XIII CLAPEM
Universidad Nacional de Colombia, Bogotá
September 2014

Short courses

1.1. Topics in quantitative risk management

Paul Embrechts
Department of Mathematics, ETH Zürich,
Switzerland.

Abstract
In this course, we discuss the main statistical tools for the quantitative analysis
of solvency (risk capital) for banks and insurance companies. Topics included
are:

1) The loss operator and risk measures
2) Multivariate statistical models
3) Dependence modeling: linear correlation and copulas
4) An introduction to extreme value theory
5) Examples from current regulatory practice.

The course is mainly based on the textbook:

McNeil, A. J., Frey, R. and Embrechts, P. (2005). Quantitative Risk
Management: Concepts, Techniques, Tools. Princeton University Press.

Further material is to be found on my website: www.math.ethz.ch/~embrechts
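
As a toy illustration of the first topic in the list above (risk measures;
this example is mine, not taken from the course), the following Python
snippet computes an empirical Value-at-Risk and expected shortfall for a
simulated heavy-tailed loss sample:

import numpy as np

rng = np.random.default_rng(0)
losses = rng.standard_t(df=4, size=100_000)  # heavy-tailed simulated losses
alpha = 0.99

var = np.quantile(losses, alpha)   # empirical Value-at-Risk at level alpha
es = losses[losses >= var].mean()  # expected shortfall: mean loss beyond VaR
print("VaR:", round(var, 3), "ES:", round(es, 3))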

1.2. Stochastic models of population genetics

Alison Etheridge
University of Oxford, U.K.

Abstract
We provide an introduction to some mathematical models that arise in
theoretical population genetics. These fall into two classes:
forwards-in-time models for the evolution of frequencies of different
genetic types in a population; and backwards-in-time (coalescent) models
that trace out the genealogical relationships between individuals in a
sample from the population. Some, like the Wright-Fisher model, date right
back to the origins of the subject. Others, like multiple merger coalescents
or (spatial) Lambda-Fleming-Viot processes, are much more recent. In these
lectures we can do no more than skim the surface, but we shall aim to give a
taste of the rich mathematical structures underpinning all these models.

1.3. Confidence distribution (CD): A new approach
in distributional inference and its applications

Regina Liu
Rutgers University, USA.

Abstract
A confidence distribution (CD) is a sample-dependent distribution function
that can be used as an estimate for an unknown parameter. It can be viewed
as a distribution estimator of the parameter. CDs have been shown to be
effective tools in statistical inference. Specifically, we discuss the
usefulness of CDs in: an exact meta-analysis approach for discrete data and
its application to 2 × 2 tables with rare events, combining heterogeneous
studies using only summary statistics, combining the test results from
independent studies, and providing efficient network meta-analysis.
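
The abstract does not state the formal definition; for orientation, the one
commonly used in the CD literature (e.g., by Xie and Singh) is the
following: a sample-dependent function $H_n(\theta) = H(\mathbf{X}_n,
\theta)$ is a confidence distribution if, for every sample, $H_n(\cdot)$ is
a distribution function on the parameter space and, at the true value
$\theta_0$, it is uniformly distributed,

\[ H_n(\theta_0) \sim \mathrm{Uniform}(0,1). \]

For instance, for a normal mean $\mu$ with known variance $\sigma^2$,
$H_n(\mu) = \Phi\bigl(\sqrt{n}\,(\mu - \bar{x}_n)/\sigma\bigr)$ is an exact
CD.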

Plenary lectures

2.1. Distributed statistical algorithms

Gérard Biau
Université Pierre et Marie Curie and Institut
Universitaire de France.

Abstract
Modern learning architectures must be flexible enough to accommodate the
ever-increasing size of the datasets involved in the Big Data regime.
Drawing inspiration from the theory of distributed computation models
developed in the context of gradient-type optimization algorithms, I will
present a consensus-based asynchronous distributed solution for
nonparametric online regression and analyze some of its asymptotic
properties. Companion software implemented in Go (an open source native
concurrent programming language developed at Google Inc.) is also provided.
Substantial numerical evidence involving up to 44 parallel processors is
provided on synthetic datasets to assess the excellent performance of the
method, both in terms of computation time and prediction gains.
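
No implementation details are given in the abstract; the sketch below is
mine (Python rather than Go, an arbitrary random-feature regression model,
and a synchronous gossip step standing in for the asynchronous protocol of
the talk). It conveys the consensus idea: each processor fits its own online
regression on a private data stream, and randomly chosen pairs of processors
repeatedly average their models.

import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)        # unknown regression function
P, T, eta, D = 8, 2000, 0.05, 15           # processors, rounds, step, features
omega = rng.normal(0, 8, D)                # random-feature frequencies
b = rng.uniform(0, 2 * np.pi, D)           # random-feature phases
phi = lambda x: np.cos(np.outer(np.atleast_1d(x), omega) + b)
theta = np.zeros((P, D))                   # one local model per processor

for t in range(T):
    for p in range(P):                     # local online gradient step
        x = rng.uniform(0, 1)
        y = f(x) + 0.1 * rng.normal()
        feat = phi(x)[0]
        theta[p] -= eta * (feat @ theta[p] - y) * feat
    i, j = rng.choice(P, 2, replace=False)           # gossip: a random pair
    theta[i] = theta[j] = (theta[i] + theta[j]) / 2  # averages its models

x_test = np.linspace(0, 1, 5)
print(phi(x_test) @ theta.mean(axis=0))    # consensus prediction
print(f(x_test))                           # target values for comparison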

2.2. Stochastic processes with random contexts:
A characterization and adaptive estimators
for the transition probabilities

Roberto Imbuzeiro Oliveira
IMPA, Brazil.

Abstract
This paper introduces the concept of random context representations for the
transition probabilities of a finite-alphabet stochastic process. Processes
with these representations generalize context tree processes (a.k.a.
variable length Markov chains), and are proven to coincide with processes
whose transition probabilities are almost surely continuous functions of the
(infinite) past. This is similar to a classical result by Kalikow about
continuous transition probabilities. Existence and uniqueness of a minimal
random context representation are proven, and an estimator of the transition
probabilities based on this representation is shown to have very good
pastwise adaptivity properties. In particular, it achieves minimax
performance, up to logarithmic factors, for binary renewal processes with
bounded $2+\varepsilon$ moments.

2.3. Multifractal statistics

Carenne Ludeña
Escuela de Matemáticas, Facultad de Ciencias, UCV.

Abstract
Multifractal models seem to appear everywhere. They have been successfully
used in applications ranging from natural phenomena such as turbulence,
rainfall or earthquakes to man-made data in finance or internet traffic, and
they are associated with anomalous scaling: that is, a nonlinear scaling law
for the moments of the increments of the process over finite time intervals.
However, multifractals are also characterized by a multiplicity of local
Hölder exponents within any finite time interval. In fact, both concepts are
intimately related by what has come to be known as the multifractal
formalism, where the scaling function and the spectrum of the Hölder
exponents, or multifractal spectrum, are obtained as the Legendre transform
of each other under certain conditions. In practice, although models tend to
be described by their multifractal spectrum, most estimation procedures for
multifractal models are based on the estimation of the scaling function.
However, this turns out to be a nontrivial problem, as estimation based on
the empirical moments is intrinsically biased. In this talk, we will run
through several key concepts, models and current developments for
multifractal processes from a statistical point of view.
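
For orientation, the two objects mentioned above can be written as follows
(standard notation from the multifractal literature, not spelled out in the
abstract). The scaling function $\zeta(q)$ is defined through the moment
scaling of the increments,

\[ \mathbb{E}\,\lvert X_{t+\tau} - X_t \rvert^{q} \approx C_q\,
\tau^{\zeta(q)} \quad (\tau \to 0), \]

and the multifractal formalism relates it to the multifractal spectrum
$D(h)$ (the Hausdorff dimension of the set of points with local Hölder
exponent $h$) via the Legendre transform

\[ D(h) = \inf_{q} \bigl( qh - \zeta(q) + 1 \bigr); \]

anomalous scaling means precisely that $\zeta$ is nonlinear in $q$.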


2.4. Asymptotic theory for the sample covariance
matrix of a heavy-tailed multivariate time series

Thomas Mikosch
University of Copenhagen.
(Richard A. Davis, Columbia University, NY, and
Oliver Pfaffel, Munich).

Abstract
We give an asymptotic theory for the eigenvalues of the sample covariance
matrix of a multivariate time series. The time series constitutes a linear
process across time and between components. The input noise of the linear
process has regularly varying tails with index $\alpha \in (0,4)$; in
particular, the time series has an infinite fourth moment. We derive the
limiting behavior of the largest eigenvalues of the sample covariance matrix
and show point process convergence of the normalized eigenvalues. The
limiting process has an explicit form involving points of a Poisson process
and eigenvalues of a non-negative definite matrix. Based on this
convergence, we derive limit theory for a host of other continuous
functionals of the eigenvalues, including the joint convergence of the
largest eigenvalues, the joint convergence of the largest eigenvalue and the
trace of the sample covariance matrix, and the ratio of the largest
eigenvalue to the sum of all eigenvalues.
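
A minimal simulation sketch of the kind of object studied (the sketch is
mine, and uses i.i.d. noise only, whereas the talk treats full linear
processes):

import numpy as np

# Entries with tail index alpha in (0, 4): fourth moments are infinite.
rng = np.random.default_rng(0)
n, p, alpha = 2000, 20, 3.0
signs = rng.choice([-1.0, 1.0], size=(n, p))
Z = rng.pareto(alpha, size=(n, p)) * signs    # regularly varying entries
S = Z.T @ Z                                   # unnormalized sample covariance
eig = np.sort(np.linalg.eigvalsh(S))[::-1]
print(eig[:3] / eig.sum())  # largest eigenvalues relative to the trace, one
                            # of the functionals whose limit law is derived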

2.5. Asymptotic behaviour of first passage
time distributions for Lévy processes

Víctor Rivero
Centro de Investigación en Matemáticas A.C.,
México.

Abstract
Let X be a real valued Lvy process that is in the domain of attraction of
a stable law. In the first part of this talk, we will consider the case where
X is non-monotone. As an analogue of the random walk results in [4] and

13
CLAPEM 2014 Universidad Nacional de Colombia

[1] we will describe the local behaviour of the distribution of the lifetime
under the characteristic measure n of excursions away from 0 of the process
X reflected in its past infimum, and of the first passage time of X below 0,
T0 = inf{t > 0 : Xt < 0}, under x (), for x > 0, in two different regimes for x,
viz. x = o(c()) and x > Dc(), for some D > 0. We sharpen our estimates by
distinguishing between two types of path behaviour, viz. continuous passage
at T0 and discontinuous passage. Some sharp local estimates for the entrance
law of the excursion process associated to X reflected in its past infimum will
be described. In the second part of the talk, we will describe the case where X
is non-incresing, i.e. is a subordinator.

Based in the papers [2,3] in collaboration with R. Doney.

Keywords: Lévy processes, first passage time distribution, local limit
theorems, fluctuation theory.

References
1. Doney, R. A. (2012). Local behaviour of first passage probabilities.
Probability Theory and Related Fields, 152(3-4), 559-588.

2. Doney, R. A. and Rivero, V. (2012). Asymptotic behaviour of first passage
time distributions for Lévy processes. Probability Theory and Related
Fields, 1-45. doi:10.1007/s00440-012-0448-x.

3. Doney, R. A. and Rivero, V. (2013). Asymptotic behaviour of first passage
time distributions for subordinators. Submitted, 2013.

4. Vatutin, V. A. and Wachtel, V. (2009). Local probabilities for random
walks conditioned to stay positive. Probability Theory and Related Fields,
143(1-2), 177-217.

Thematic sessions

3.1. Causal inference

Organized by Andrea Rotnitzky (Harvard School of Public Health, USA)

Inverse probability of censoring weighted
U-statistics for right-censored data with
an application to testing hypotheses

Somnath Datta
University of Louisville, USA.
(Bandyopadhyay and Satten).

Abstract

A right-censored version of a U-statistic with a kernel of degree $m \ge 1$
is introduced by the principle of a mean-preserving reweighting scheme,
which is also applicable when the dependence between failure times and the
censoring variable is explainable through observable covariates. Its
asymptotic normality and an expression for its standard error are obtained
through a martingale argument. We study the performance of our U-statistic
by simulation and compare it with theoretical results. A doubly robust
version of this reweighted U-statistic is also introduced to gain efficiency
under correct models while preserving consistency in the face of model
misspecifications. Using a Kendall's kernel, we obtain a test statistic for
testing homogeneity of failure times for multiple failure causes in a
multiple decrement model. The performance of the proposed test is studied
through simulations. Its usefulness is also illustrated by applying it to a
real data set on graft-versus-host disease.
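
Schematically (notation mine, not from the abstract): with censored
observations $(\tilde{T}_i, \delta_i)$, an estimate $\hat{K}$ of the
censoring survival function, and a kernel $h$ of degree $m$, a reweighted
U-statistic of this type takes the form

\[
U_n = \binom{n}{m}^{-1} \sum_{i_1 < \dots < i_m}
\frac{\delta_{i_1} \cdots \delta_{i_m}\,
h(\tilde{T}_{i_1}, \dots, \tilde{T}_{i_m})}
{\hat{K}(\tilde{T}_{i_1}) \cdots \hat{K}(\tilde{T}_{i_m})},
\]

so each fully observed $m$-tuple is weighted by the inverse of its estimated
probability of being uncensored, which preserves the mean of the
complete-data U-statistic.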


Unifying the counterfactual and graphical
approaches to causality via single
world intervention graphs (SWIGs)

Thomas Richardson
Professor and Chair, Department of Statistics,
University of Washington.
(James M. Robins, Harvard School of Public Health,
USA).

Abstract
Models based on potential outcomes, also known as counterfactuals, were
introduced by Neyman (1923) and later applied to observational contexts by
Rubin (1974). Such models are now used extensively within Biostatistics,
Statistics, Political Science, Economics, and Epidemiology for reasoning
about causation. Directed acyclic graphs (DAGs), introduced by Wright
(1921), are another formalism used to represent causal systems. Graphs are
extensively used in Computer Science, Bioinformatics, Sociology and
Epidemiology.

In this talk, I will present a simple approach to unifying these two
frameworks via a new graph, termed the Single-World Intervention Graph
(SWIG). The SWIG encodes the counterfactual independences associated with a
specific hypothetical intervention on a set of treatment variables. The
nodes of the SWIG are the corresponding counterfactual random variables. The
SWIG is derived from a causal DAG via a simple node-splitting
transformation. I will illustrate the theory with a number of examples.
Finally, we illustrate that SWIGs avoid a number of pitfalls present in an
alternative approach to unification, based on twin networks, that has been
advocated by Pearl (2000).

Links
Short paper: http://www.statslab.cam.ac.uk/~rje42/uai13/Richardson.pdf

Full paper: http://www.csss.washington.edu/Papers/wp128.pdf


Higher order influence functions and
minimax estimation of non-linear functionals

James Robins
Harvard School of Public Health, USA.

Abstract
I describe recent advances in the theory of estimation with higher order
influence functions. This is a theory of point and interval estimation for
non-linear functionals in parametric, semi-, and non-parametric models that
applies equally to both root-n and non-root-n problems. The theory
reproduces results previously obtained by the modern theory of
non-parametric inference, produces many new non-root-n results, and, most
importantly, opens up the ability to perform non-root-n inference in complex
high-dimensional models, such as models for the estimation of the causal
effect of time-varying treatments in the presence of time-varying
confounding and informative censoring. Higher order influence functions are
higher order U-statistics. The theory extends first order semiparametric
theory based on first order influence functions. I will describe recent
results on constructing tests of independence between two random variables
that are rate-optimal against certain natural omnibus alternatives.


Improving the performance of double-robust
instrumental variables estimators
under model misspecification

Stijn Vansteelandt
Ghent University, Belgium.
(Ghent University, Belgium and Vanessa Didelez,
University of Bristol, U.K.).

Two-stage least squares estimators and variants thereof are widely used in
econometrics and beyond to infer the effect of an exposure on an outcome
using data on instrumental variables. In biostatistics, a separate
literature on instrumental variable estimation has developed, which instead
uses double-robust G-estimators in so-called structural mean and
distribution models. These are consistent when either a working model for
the distribution of the instrumental variable (given covariates) or a
working model for the (counterfactual exposure-free) outcome mean (given
covariates) is correctly specified, but not necessarily both. We examine the
performance of locally efficient double-robust G-estimators in simulation
studies, and find it to be sometimes poor under model misspecification. We
therefore propose adaptive G-estimation procedures which improve efficiency
under misspecification of one working model, and reduce bias under
misspecification of both working models. Simulation studies demonstrate
drastic improvements relative to locally efficient G-estimators as well as
two-stage least squares estimators.


3.2. Data driven penalty calibration

Organized by Vincent Rivoirard (Université Paris-Dauphine, France)

Pointwise adaptive estimation of the marginal
density of a weakly dependent process

Karine Bertin
CIMFAV, Universidad de Valparaíso, Chile.
(Nicolas Klutchnikoff, ENSAI, Rennes, France).

Abstract
We study the estimation of the common marginal density function of weakly
dependent stationary processes. The accuracy of estimation is measured using
pointwise risks. We propose a data-driven procedure using kernel rules. The
bandwidth is selected using the approach of Goldenshluger and Lepski, and we
prove that the resulting estimator satisfies an oracle-type inequality. The
procedure is also proved to be adaptive (in a minimax framework) over a
scale of Hölder balls for several types of dependence: classical econometric
models such as GARCH, as well as dynamical systems and i.i.d. sequences, can
be considered using a single estimation procedure. Some simulations
illustrate the performance of the proposed method.
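
As a rough illustration of this kind of data-driven bandwidth rule, here is
a simplified Lepski-type selector for a kernel density estimate at a point
(not the exact Goldenshluger-Lepski procedure of the talk; the grid, the
kernel and the tuning constant are arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, 500)         # i.i.d. stand-in for the dependent sample
x0, kappa = 0.0, 1.2              # evaluation point and tuning constant

K = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
fhat = lambda h: K((X - x0) / h).mean() / h            # kernel estimate at x0
V = lambda h: kappa * np.sqrt(np.log(len(X)) / (len(X) * h))  # noise level

H = np.geomspace(0.05, 1.0, 20)   # bandwidth grid
# Keep h only if its estimate stays within the noise level of every
# estimate built with a smaller bandwidth, then optimize the trade-off.
A = [max(max(abs(fhat(hp) - fhat(h)) - V(hp), 0.0) for hp in H if hp <= h)
     for h in H]
h_star = H[int(np.argmin([a + V(h) for a, h in zip(A, H)]))]
print(h_star, fhat(h_star))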


Adaptive pointwise estimation
of a conditional density

Claire Lacour
Université Paris-Sud, Orsay.
(Karine Bertin, CIMFAV, Chile, and Vincent
Rivoirard, Université Paris-Dauphine, France).

Abstract
This talk is devoted to a calibrated method for estimating a conditional
density. We consider a sample of independent and identically distributed
observations $(X_i, Y_i)_{1 \le i \le n}$ and we are interested in the
conditional density of $Y_i$ given $X_i$, defined by

\[ f(x, y)\,dy = P(Y_i \in dy \mid X_i = x). \]

Motivated by Approximate Bayesian Computation, our aim is to give an
adaptive method to estimate $f(x_0, \cdot)$ at a fixed point $x_0$. We use
the recent method of Goldenshluger and Lepski to select an estimator among a
family of kernel estimators. We shall explain how to select an optimal
bandwidth adapted to the point $x_0$ using only the observations. We give
results for our estimator in terms of an oracle inequality and minimax rates
of convergence.
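
For concreteness, a kernel family of the type such procedures select over
(my notation; the family is not written out in the abstract) is

\[
\hat{f}_{h,b}(x_0, y) =
\frac{\sum_{i=1}^{n} K_h(X_i - x_0)\, K_b(Y_i - y)}
{\sum_{i=1}^{n} K_h(X_i - x_0)},
\qquad K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right),
\]

with the bandwidth pair $(h, b)$ then chosen from the data by the
Goldenshluger-Lepski comparison of estimators over a grid.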


3.3. Fragmentation processes

Organized by Joaquín Fontbona (CMM, Chile)

An optimal stopping problem for
fragmentation processes

Juan Carlos Pardo
Centro de Investigación en Matemáticas A.C.,
México.

Abstract
In this talk, we consider a toy example of an optimal stopping problem
driven by fragmentation processes. We show that one can work with the
concept of stopping lines to formulate the notion of an optimal stopping
problem and, moreover, to reduce it to a classical optimal stopping problem
for a generalized Ornstein-Uhlenbeck process associated with Bertoin's
tagged fragment. We go on to solve the latter using a classical verification
technique, thanks to the application of aspects of the modern theory of
integrated exponential Lévy processes. (Joint work with A. Kyprianou.)


Ray-Knight representation of Lévy-
driven logistic branching processes

María Clara Fittipaldi
Universidad de Buenos Aires, Argentina.
Email: [email protected]

Using the construction of continuum random trees associated with general
Lévy processes given by J.-F. Le Gall and Y. Le Jan [3] and a generalization
of the pruning procedure developed by R. Abraham and J.-F. Delmas [1, 2], we
obtain a Ray-Knight type representation for general continuous-state
branching processes with logistic growth (LB-processes), which gives us a
description of their continuous genealogy. This result extends the
Ray-Knight representation for the logistic Feller diffusion given by V. Le,
E. Pardoux and A. Wakolbinger [4]. This is a joint work with J. Berestycki
and J. Fontbona.

References
1. R. Abraham and J.-F. Delmas, A continuum-tree-valued Markov process,
Ann. Probab. 40 (2012), no. 3, 1167-1211.

2. R. Abraham, J.-F. Delmas, and G. Voisin, Pruning a Lévy continuum random
tree, Electron. J. Probab. 15 (2010), no. 46, 1429-1473.

3. J.-F. Le Gall and Y. Le Jan, Branching processes in Lévy processes: The
exploration process, Ann. Probab. 26 (1998), 213-252.

4. E. Pardoux and A. Wakolbinger, From exploration paths to mass excursions
- variations on a theme of Ray and Knight, 2011.


3.4. Functional data

Organized by Mariela Sued (Universidad de Buenos Aires, Argentina) and
Daniela Rodríguez (Universidad de Buenos Aires, Argentina)

Discretized nonparametric
regression for functional data

Pamela Llop
Facultad de Ingeniería Química (UNL) and
Instituto de Matemática Aplicada del Litoral (UNL -
CONICET), Argentina.
(Liliana Forzani, Facultad de Ingeniería Química
(UNL) and Instituto de Matemática Aplicada del
Litoral (UNL - CONICET), Argentina; and Ricardo
Fraiman, Universidad de la República, Uruguay).

Abstract
Technological progress in collecting and storing data has provided data sets
recorded at finite grids of points which, thanks to new technologies, become
increasingly dense over time. Although in practice data always come in the
form of finite-dimensional vectors, from the theoretical point of view the
classic multivariate techniques are not suitable to deal with this kind of
data. In this direction, the asymptotic theory can be analyzed either
assuming the existence of continuous underlying stochastic processes ideally
observed at every point, or transforming the (observed) discrete values into
functions via interpolation (in the error-free case), smoothing (if error is
present), splines or series approximations. When dealing with the regression
problem for discretized functional data, a natural question that emerges is
what the relationship is between the ideal nonparametric regression estimate
computed with the entire curve and the one computed with the discretized
sample. In this direction, we state conditions under which the consistency
of the estimator computed with the discretized trajectories can be derived
from the consistency of the one based on the whole curves. Also, we give
conditions on the size of the discretization grid in order to achieve the
same rates of convergence as in the infinite-dimensional setting. These
results are a consequence of two more general results which, besides
discretization, also include the case of smoothing via regularization, basis
representation or interpolation of the data.

Modeling repeated functional data, with
applications to trajectories of fertility

Hans-Georg Müller
University of California, Davis, USA.
(K. Chen, University of Pittsburgh, USA, and P.
Delicado, Universitat Politècnica de Catalunya,
Barcelona, Spain).

Abstract
Repeatedly observed functional data are encountered in various applications.
These include demographic trajectories observed for each calendar year. A
previous conditional double functional principal component approach to
represent such processes poses complex problems for both theory and
applications. A simpler and more interpretable approach can be based on a
marginal rather than conditional functional principal component
representation of the underlying function-valued processes. An additional
assumption of common principal components leads to the special case of a
simple tensor product representation. For samples of independent
realizations of the underlying function-valued stochastic process, this
approach leads to straightforward fitting methods for obtaining the
components of these models. The resulting estimates can be shown to satisfy
asymptotic consistency properties. The proposed methods are illustrated with
an application to trajectories of fertility that are repeatedly observed
over many calendar years for 17 countries.


Functional principal component
analysis revisited

Jane-Ling Wang
Department of Statistics, University of California,
Davis, USA.
(Xiaoke Zhang, University of California, Davis,
USA).

Abstract
Functional data analysis (FDA) deals with the analysis of a sample of
functions or curves. Traditional multivariate principal component analysis
(PCA) has been successfully extended to the functional setting, and the core
issue is the estimation of the mean function and covariance surface of the
functional data. The methodology and theory often vary and depend on the
sampling plan of the functional data. In this talk, we focus on a unified
approach and theory that can handle any sampling plan and different
weighting schemes in functional PCA approaches. The theory leads to
interesting types of asymptotic behavior depending on the sampling plan,
which also has an effect on the performance of different weighting schemes.
Two commonly adopted weighting schemes are compared.


3.5. Hypoelliptic diffusions

Organized by Patrick Cattiaux (University of Toulouse, France).

On some estimates for (strictly)
hypoelliptic diffusion processes

Stéphane Menozzi
Université d'Évry Val d'Essonne.

Abstract
In this talk, we will present various techniques for studying processes of the
form:
\[
\begin{aligned}
X_t^1 &= x_1 + \int_0^t F_1(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s,\\
X_t^2 &= x_2 + \int_0^t F_2(s, X_s)\,ds,\\
X_t^3 &= x_3 + \int_0^t F_3(s, X_s^2, \dots, X_s^n)\,ds,\\
&\;\;\vdots\\
X_t^n &= x_n + \int_0^t F_n(s, X_s^{n-1}, X_s^n)\,ds,
\end{aligned}
\]

where W is a Brownian motion. Systems of the above form appear in many
applied fields, from statistical mechanics to kinetics and finance. A
characteristic feature of this equation is that the noise acts only on the
first component. If some non-degeneracy is assumed on the diffusion
coefficient and the drift term (weak Hörmander condition), the noise can
then propagate through the system and yields a density for the underlying
process. We will discuss how this density can be investigated through the
usual perturbation techniques à la parametrix, and then introduce some more
functional approaches based on stochastic control (Fleming transform). These
approaches lead to pointwise multi-scale Aronson-like Gaussian bounds when
the coefficients are smooth. On the other hand, we will also investigate the
martingale problem for a continuous diffusion coefficient.


Estimation for hypoelliptic diffusions

Clémentine Prieur
Université Joseph Fourier - Grenoble I
José León
Universidad Central de Venezuela

Abstract
In this work, we are interested in harmonic oscillators perturbed by a
Gaussian white noise. More precisely, we consider
$(Z_t = (x_t, y_t) \in \mathbb{R}^{2d},\ t \ge 0)$ governed by the following
Itô stochastic differential equation:

\[
dx_t = y_t\,dt, \qquad
dy_t = \sigma\,dW_t - \bigl(c(x_t, y_t)\,y_t + \nabla V(x_t)\bigr)\,dt.
\]

We assume that the process is ergodic with a unique invariant probability
measure m, and that the convergence in the ergodic theorem is quick enough.
We also discuss sufficient conditions for this. For such oscillators, we aim
at studying inference issues such as the estimation of the density of the
invariant probability measure m, as well as the estimation of the drift and
of the variance term. One major issue in our study is that we work with
incomplete data, observing only the first coordinate x. Thus we approximate
the y component by finite differences. Even in the case where the potential
is the Duffing one, $V(x) = x^4/4 - x^2/2$ (Kramers oscillator), this
problem is not easy. We focus on non-parametric inference; see [1, 2, 3].
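
A minimal simulation sketch of the observation scheme (Euler discretization
and all tuning values are arbitrary choices of mine): only x is observed, on
a coarse grid, and the hidden velocity is recovered by finite differences.

import numpy as np

rng = np.random.default_rng(0)
dt, n, sigma, c = 1e-3, 50_000, 1.0, 0.5
gradV = lambda x: x**3 - x           # gradient of V(x) = x^4/4 - x^2/2
x, y = np.zeros(n), np.zeros(n)
for k in range(n - 1):               # Euler scheme for the Kramers oscillator
    x[k + 1] = x[k] + y[k] * dt
    y[k + 1] = (y[k] + sigma * np.sqrt(dt) * rng.normal()
                - (c * y[k] + gradV(x[k])) * dt)

m = 10                               # x is observed only every m steps
x_obs = x[::m]
y_hat = np.diff(x_obs) / (m * dt)    # finite-difference velocity proxy
print(np.mean((y_hat - y[::m][:-1]) ** 2))   # reconstruction error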

References
1. Cattiaux, P., León, J. R. and Prieur, C. (2014). Estimation for
stochastic damping Hamiltonian systems under partial observation. I.
Invariant density. Stochastic Processes and Their Applications, 124(3),
1236-1260.

2. Cattiaux, P., León, J. R. and Prieur, C. (2013). Estimation for
stochastic damping Hamiltonian systems under partial observation. II. Drift
term. Submitted. http://hal.archives-ouvertes.fr/hal-00877054.

3. Cattiaux, P., León, J. R. and Prieur, C. (2014). Estimation for
stochastic damping Hamiltonian systems under partial observation. III.
Diffusion term. Submitted. http://hal.archives-ouvertes.fr/hal-01044611.


3.6. Mixed and joint modeling

Organized by Carles Serrat (Universitat Politècnica de Catalunya, Spain)

Some new tools for mixed effects models

Marc Lavielle
Inria Saclay, Popix team, France

Keywords: population approach, mixed effects models, modeling, simulation,
Monolix

Abstract
Population models describe biological and physical phenomena observed in
each of a set of individuals, and also the variability between individuals. This
approach finds its place in domains like pharmacometrics when we need to
quantitatively describe interactions between diseases, drugs and patients. This
means developing models that take into account that different patients react
differently to the same disease and the same drug. The population approach can
be formulated in statistical terms using mixed effects models.

We will see how the framework allows us to represent models for many dif-
ferent data types including continuous, categorical, count and time-to-event
data. This opens the way for the use of quite generic methods for modeling and
simulating these diverse data types.

Mlxtran is a declarative language designed for encoding hierarchical models,
including complex mixed effects models. Mlxtran is also a particularly
powerful solution for encoding dynamical systems represented by a system of
ordinary differential equations. Mlxtran is used by several software tools,
including Monolix for modeling and Mlxplore for model exploration
(http://lixoft.com).

Mlxtran is also used by Simulx, an R and Matlab function for easily
computing predictions and simulating data from such complex mixed effects
models (https://team.inria.fr/popix/mlxtoolbox).


Joint models for the analysis of time-to-
event data with longitudinal information

Carles Serrat
Universitat Politècnica de Catalunya, Catalonia,
Spain.

Abstract
The aim of this presentation is to review joint modelling techniques for the
simultaneous analysis of time-to-event data and longitudinal time-varying
data. This is an area of increasing interest from both the methodological
and the applied point of view, and it allows the analysis and understanding
of complex systems.

Among others, three main advantages of this approach are: a) it corrects the
bias derived from a traditional separate analysis; b) the modelling allows
one to incorporate and model the between- and within-subject correlation
among observations; and c) true longitudinal profiles for endogenous
covariates can be included in the relative hazard survival submodel.

The relevant benefit of these models is being able to estimate the effect of
each subject-specific longitudinal profile on the hazard function for the
event of interest, in an adaptive manner. In particular, subject-specific
dynamic predictions, like conditional survival functions given the available
longitudinal information, can be derived.

The estimation procedure and its implementation in R (JM and JMbayes
packages) will be introduced, and some illustrations will be given.
Extensions to the case of multiple time-to-event variables with multiple
longitudinal covariates will also be considered.

Keywords: joint modeling, shared random effects models, relative risks models


3.7. Optimal designs

Organized by Timothy O'Brien (Loyola University Chicago, USA)

Efficient experimental design
strategies for multicategory logit
regression settings and bioassay

Timothy E. O'Brien
Department of Mathematics and Statistics, Loyola
University Chicago.

Abstract
The analysis of multi-category response data, in which a multinomial
dependent variable is linked to selected covariates, involves several rival
models. These models include the adjacent category (AC), the baseline
category logit (BCL), two variants of the continuation ratio (CR), and the
proportional odds (PO) models. For a given set of data, the fits and
predictions associated with these various models can vary quite
dramatically, as can the associated optimal designs (which are then used to
estimate the respective model parameters).

Keywords: Goodness-of-Fit, Multinomial Regression Models, Optimal Design,
Robustness, Synergy.


Optimal designs for estimation
and discrimination for nonlinear
mixed effects models

Víctor Ignacio López Ríos
Instituto de Matemáticas, Facultad de Ciencias
Exactas y Naturales, Universidad de Antioquia.
(María Eugenia Castañeda López, Escuela de
Estadística, Facultad de Ciencias, Universidad
Nacional de Colombia).

Abstract
The purpose of this talk is to present a procedure for constructing optimal
designs for simultaneous parameter estimation and model discrimination in
the context of nonlinear mixed effects models. A compound design criterion
is considered. This design criterion is formed by maximizing a weighted
average which depends on different Fisher information matrices. A numerical
example shows the properties of the procedure. The relationship with other
design procedures for parameter estimation and model discrimination is
discussed.
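
One common way to write such a compound criterion (notation mine; the talk's
exact form is not given in the abstract) is as a weighted average of
normalized log-determinants of the Fisher information matrices of two rival
models,

\[
\Phi_\lambda(\xi) = \lambda\,
\frac{\log \lvert M_1(\xi, \theta_1) \rvert}{p_1}
+ (1-\lambda)\, \frac{\log \lvert M_2(\xi, \theta_2) \rvert}{p_2},
\qquad 0 \le \lambda \le 1,
\]

where $p_1, p_2$ are the numbers of parameters; a design $\xi$ maximizing
$\Phi_\lambda$ then balances estimation efficiency under each model, and the
weight $\lambda$ tunes the trade-off with discrimination.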


3.8. Particle systems

Organized by Pablo Groisman (Universidad de Buenos Aires, Argentina)

Quantitative propagation of chaos for
generalized Kac particle systems

Joaquín Fontbona
Universidad de Chile, CMM.
(Roberto Cortez, Universidad de Chile).

Abstract
We study a class of one-dimensional mean field particle systems with binary
interactions, which includes Kac's simplified model of the Boltzmann
equation and some kinetic models for the evolution of wealth distribution.
We obtain explicit rates of convergence, as the total number of particles
goes to infinity, for the Wasserstein distance between the law of a particle
and its limiting law, which depend linearly on time. The proof is based on a
novel coupling between the particle system and a suitable system of
non-independent nonlinear processes, constructed with tools from optimal
mass transportation, and relies also on recently obtained sharp estimates
for empirical measures of i.i.d. or exchangeable random variables. The
obtained rates are compared with known convergence rates for the less
physical Nanbu particle approximations of the Kac equation, in which each
pair interaction has an effect on only one of the particles. Possible
extensions (including to Boltzmann's equation) are also discussed.
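
A minimal sketch of the kind of binary mean-field dynamics in question
(Kac's simplified, spatially homogeneous model in one dimension; the
parameter values are arbitrary): at each step a uniformly chosen pair of
particles exchanges energy through a random rotation that conserves the
total kinetic energy.

import numpy as np

rng = np.random.default_rng(0)
N, steps = 1000, 20_000
v = rng.normal(0, 2, N)                    # initial velocities
v *= np.sqrt(N / np.sum(v**2))             # normalize total energy to N

for _ in range(steps):
    i, j = rng.choice(N, 2, replace=False) # uniformly chosen interacting pair
    th = rng.uniform(0, 2 * np.pi)         # random collision angle
    v[i], v[j] = (np.cos(th) * v[i] + np.sin(th) * v[j],
                  -np.sin(th) * v[i] + np.cos(th) * v[j])

print(np.sum(v**2) / N)                    # conserved energy per particle
print(np.mean(np.abs(v) < 1.0))            # empirical law approaches a Gaussian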


Constrained information transmission
on Erdős-Rényi graphs

Christophe Gallesco
Unicamp, Brazil

Abstract
We model the transmission of a message on the Erdős-Rényi random graph with
parameters (n, p) and limited resources. The vertices of the graph represent
servers that may broadcast the message at random. Each server has a random
emission capital that decreases by one at each emission. We examine two
natural dynamics: in the first dynamics, an informed server performs its
attempts and then checks at each of them whether the corresponding edge is
open or not; in the second dynamics, the informed server knows a priori who
its neighbors are, and it performs all its attempts on its actual neighbors
in the graph. In each case, we obtain first and second order asymptotics
(law of large numbers and central limit theorem), as $n \to \infty$ with p
fixed, for the final proportion of informed servers.

Sampling quasistationary distributions:
Computational efficiency and
particle-based algorithms

Roberto Imbuzeiro Oliveira
IMPA, Brazil.

Abstract
We consider the problem of sampling from quasistationary distributions of
finite state Markov chains. Our perspective is computational and inspired
by the literature on Markov chain mixing times, where a small mixing time
implies (in an appropriate computational model) that one can approximate the
stationary distribution with moderate computational effort.


3.9. Probability and statistics in finance

Organized by Anton Thalmaier (University of Luxembourg, Luxembourg)

Differential equations driven by a fractional
Brownian motion and applications

Samy Tindel
Université de Lorraine, France.

Abstract

In this talk, we will first justify the use of fractional Brownian motion as
a driving noise for differential systems in several applied situations, with
a special emphasis on finance models. We will then introduce the main ideas
of the so-called rough path theory, which allows one to solve differential
equations driven by a general class of stochastic processes. Finally, we
will give an account of some recent density estimates concerning these
objects.

Keywords: stochastic differential equations, fractional Brownian motion,
rough paths theory, density of random variables.

Anticipating linear stochastic differential
equations driven by a Lévy process

David Márquez
Universidad de Barcelona, Spain.

Abstract
In this paper, we study the existence of a unique solution for linear
stochastic differential equations driven by a Lévy process, where the
initial condition and the coefficients are random and not necessarily
adapted to the underlying filtration. Towards this end, we extend the method
based on Girsanov transformations on Wiener space, developed by Buckdahn
[1], to the canonical Lévy space, which is introduced in [2].

References
1. Buckdahn, R. (1989). Transformations on the Wiener space and
Skorohod-type stochastic differential equations. Seminarberichte [Seminar
reports] 105. Humboldt Universität, Sektion Mathematik. MR-1033989.

2. Solé, J. L., Utzet, F. and Vives, J. (2007). Canonical Lévy processes and
Malliavin calculus. Stochastic Processes and Their Applications, 117,
165-187. MR-2290191.

3.10. Random graphs and detection problems

Organized by Gabor Lugosi (Pompeu Fabra University, Spain)

Estimation in high-dimensional
random geometric graphs

Sébastien Bubeck
Princeton University.
(Jian Ding, Ronen Eldan and Miklós Rácz).

Abstract
We consider a random graph model where connections depend on unknown
d-dimensional labels (or feature vectors) for the vertices. Upon observing a
realization from this model, we are interested in estimating the unknown
dimension d of the feature vectors. We propose a new statistic, based on
signed triangles, which can successfully estimate dimensions as large as
$n^2$ (where n is the number of vertices), while a simple count of triangles
would only work up to dimensions of order n. We also show that $n^2$ is
optimal, using a new bound on the total variation distance between Wishart
matrices and the Gaussian Orthogonal Ensemble.
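
A minimal sketch of the signed-triangle statistic described above (the
function name is mine), for an observed adjacency matrix A and edge density
p:

import numpy as np

def signed_triangles(A, p):
    # tau = sum over i < j < k of (A_ij - p)(A_jk - p)(A_ik - p)
    B = A - p
    np.fill_diagonal(B, 0.0)           # exclude self-loops
    return np.trace(B @ B @ B) / 6.0   # each triple is counted 3! = 6 times

# Usage on an Erdos-Renyi graph, where the statistic concentrates near zero:
rng = np.random.default_rng(0)
n, p = 300, 0.3
A = np.triu((rng.random((n, n)) < p).astype(float), 1)
A = A + A.T                            # symmetric adjacency, zero diagonal
print(signed_triangles(A, p))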


Random Kademlia networks

Luc Devroye
McGill University

Abstract
Kademlia is the de facto standard search algorithm for P2P networks on the
Internet, used by millions of users every day (especially those who like
free downloads). We explain this random graph model and analyze its
probabilistic performance.

This is joint work with Xing Shi Cai.

3.11. Random segmentation models

Organized by Luis Nieto Barajas (ITAM, Mexico)

A Bayesian nonparametric approach
for time series clustering

Alberto Contreras-Cristán
Department of Probability and Statistics, IIMAS-
UNAM, Mexico.

Abstract
Within a Bayesian nonparametric framework, we propose to use a
Poisson-Dirichlet process mixture model in order to produce a clustering of
a set of time series. In a first stage, the series are modeled using a
hierarchical linear regression that accommodates levels, trends, seasonal
and time-dependent components. Each of these features has an associated
parameter. Then, for prior specification, some of these parameters are
assumed to follow a Poisson-Dirichlet process. Since such semiparametric
prior distributions give realizations which are almost surely discrete, we
use this feature and cluster the time series following the clustering
structure of the posterior samples of the feature parameters described
above. A simulation study allows us to choose which of the parameters
related to levels, trends and seasonality are useful for clustering, thus
providing a flexible framework, since different sets of series can be
clustered using different characteristics.

Change point detection models derived
from Bayesian nonparametrics

Ramsés Mena Chávez
Department of Probability and Statistics, IIMAS-
UNAM, Mexico.

Abstract
Change point detection models aim to determine the most probable grouping
for a given sample indexed on an ordered set. For this purpose, we propose a
methodology based on exchangeable partition probability functions,
specifically on Pitman's sampling formula. Emphasis will be given to the
Markovian case, in particular to discretely observed Ornstein-Uhlenbeck
diffusion processes. Some properties of the resulting model are explained,
and posterior results are obtained via a novel MCMC algorithm.
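
For reference (standard notation, not restated in the abstract): Pitman's
two-parameter sampling formula assigns to a partition of $n$ items into $k$
blocks of sizes $n_1, \dots, n_k$ the probability

\[
p(n_1, \dots, n_k) =
\frac{\prod_{i=1}^{k-1} (\theta + i\alpha)}{(\theta + 1)_{n-1}}
\prod_{j=1}^{k} (1 - \alpha)_{n_j - 1},
\]

where $(x)_m = x(x+1)\cdots(x+m-1)$ denotes the rising factorial,
$0 \le \alpha < 1$ and $\theta > -\alpha$; in the change point setting the
blocks are constrained to consist of consecutive indices.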


3.12. Random trees and applications

Organized by Louigi Addario-Berry (McGill University, Canada)

The subleading order of two-
dimensional cover times

David Belius
McGill University / Centre de Recherches
Mathématiques.
(Nicola Kistler, City University of New York,
College of Staten Island).

Abstract
The epsilon-cover time of the two dimensional torus by Brownian motion is the
time it takes for the process to come within distance epsilon > 0 from any point.
Its leading order in the small epsilon-regime has been established by Dembo,
Peres, Rosen and Zeitouni [Ann. of Math., 160 (2004)]. In this talk I will pres-
ent a recent result identifying the second order correction. This correction term
arises in an interesting way from strong correlations in the field of occupation
times, and in particular from an approximate tree structure in this field. Our
method draws on ideas from the study of the extremes of branching Brownian
motion.


Connectivity and diameter of
inhomogeneous random graphs

Nicolas Fraiman
University of Pennsylvania, Philadelphia, USA.
(Luc Devroye and Dieter Mitsche).

Abstract
In this talk, we describe the connectivity threshold, the diameter, and
metric properties of inhomogeneous random graphs. In this model, edges are
present independently but with unequal probabilities. We generalize results
known for the Erdős-Rényi model G(n, p) for several ranges of p.

3.13. Robust statistics

Organized by Ana Bianco (Universidad de Buenos Aires, Argentina)

Essential model validation
for stochastic ordering

Eustasio del Barrio
Universidad de Valladolid, Spain.
(P. Álvarez, Juan Cuesta, Carlos Matrán).

Abstract
Stochastic ordering among distributions has been considered in a variety of
scenarios. Economic studies often involve research about the ordering of
investment strategies or social welfare. However, as noted in the
literature, stochastic orderings are often too strong an assumption, one
which is not supported by the data even in cases in which the researcher
tends to believe that a certain variable is somehow smaller than another.
Instead of considering this rigid model of stochastic order, we propose to
look at a more flexible version in which two distributions are said to
satisfy an approximate stochastic order relation if they are slightly
contaminated versions of distributions which do satisfy the stochastic
ordering. The minimal level of contamination that makes this approximate
model hold can be used as a measure of the deviation of the original
distributions from the exact stochastic order model. Our approach is based
on the use of trimmings of probability measures. We discuss the connection
between them and the approximate stochastic order model, and provide
theoretical support for its use in data analysis. We also provide simulation
results and a case study for illustration.

High finite-sample efficiency and
robustness based on distance-
constrained maximum likelihood

Víctor J. Yohai
Universidad de Buenos Aires, Argentina.
(Ricardo A. Maronna, Universidad Nacional de La
Plata, Argentina).

Abstract
Good robust estimators can be tuned to combine a high breakdown point and a
specified asymptotic efficiency at a central model. This happens in
regression with MM- and tau-estimators, among others. However, the
finite-sample efficiency of these estimators can be much lower than the
asymptotic one. To overcome this drawback, an approach is proposed for
parametric models which is based on a distance between parameters. Given a
robust estimator, the proposed one is obtained by maximizing the likelihood
under the constraint that the distance is less than a given threshold. For
the linear model with normal errors, using the MM estimator and the distance
induced by the Kullback-Leibler divergence, simulations show that the
proposed estimator attains a finite-sample efficiency close to one, while
its maximum mean squared error is smaller than that of the MM estimator. The
same approach also shows good results in the estimation of multivariate
location and scatter.
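
In symbols (my notation, restating the description above): given an initial
robust estimate $\tilde{\theta}_n$ and a threshold $\delta$, the proposed
estimator solves

\[
\hat{\theta}_n = \arg\max_{\theta} L_n(\theta)
\quad \text{subject to} \quad d(\theta, \tilde{\theta}_n) \le \delta,
\]

where $L_n$ is the likelihood and $d$ is, for instance, the distance induced
by the Kullback-Leibler divergence; the threshold $\delta$ tunes the
trade-off between efficiency at the central model and the robustness
inherited from $\tilde{\theta}_n$.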


3.14. Sampling methods

Organized by Denise Silva (Escola Nacional de Ciências Estatísticas)

Approximation of rejective sampling
inclusion probabilities

Hélène Boistard
Université Toulouse 1.
(Hendrik P. Lopuhaä, Delft University of
Technology, Netherlands; Anne Ruiz-Gazen,
Université Toulouse 1).

Abstract
For rejective sampling, an expansion of the joint inclusion probabilities of
any order is obtained in terms of the inclusion probabilities of order one,
extending previous results by Hájek and making the remainder term more
precise. The main result is applied to derive bounds on higher order
correlations, which are needed for the consistency and asymptotic normality
of several complex estimators.


The estimation of gross flows in complex
surveys with random nonresponse

Andrés Gutiérrez
Universidad Santo Tomás, Bogotá, Colombia.
(Leonardo Trujillo, Universidad Nacional de
Colombia, Bogotá, Colombia, and Pedro Luis
do Nascimento Silva, IBGE, Escola Nacional de
Ciências Estatísticas, Rio de Janeiro, Brazil).

Abstract
Rotating panel surveys are used to calculate estimates of gross flows
between two consecutive periods of measurement. This paper considers a
general procedure for the estimation of gross flows when the rotating panel
survey has been generated from a complex survey design with random
nonresponse. A pseudo maximum likelihood approach is considered through a
two-stage model of Markov chains for the allocation of individuals among the
categories in the survey and for modeling nonresponse.

Keywords: Design-based inference, rotating panel surveys, gross flows,
Markov chains.


Modelling compositional time series
from the Brazilian labour force survey

Denise Britz do Nascimento Silva
National School of Statistical Sciences,
Brazilian Institute of Geography and Statistics,
Rio de Janeiro, Brazil.
Eduardo Rosseti

Abstract
A compositional time series is a multivariate time series in which each of
the series has values bounded between zero and one and the sum of the series
equals one at each time point. This paper presents a state-space approach
for modelling compositional time series from the Brazilian Labour Force
Survey (BLFS), taking the sampling errors into account. The BLFS is a
rotating panel survey in which the rotation pattern applies to panels of
households. Within each rotation group, a panel of households stays in the
sample for four successive months, is rotated out for the following 8
months, and is sampled again for another four successive months. The survey
collects monthly information about employment according to the International
Labour Organization (ILO) definitions. The modelling procedure produces
estimates for the vector of employed, unemployed and not-in-the-labour-force
proportions, and also for the unemployment rate series, with corresponding
estimates for seasonals and trends. The model provides bounded predictions
and estimates satisfying the unity-sum constraint, while taking into account
the sampling errors and the correlation structure implied by the survey
rotation pattern.
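
One standard device behind bounded, unity-sum predictions of this kind (a
sketch of the usual additive logistic parameterization; the exact
state-space form used for the BLFS is not given in the abstract) is to model
unconstrained signals $\alpha_{it}$, carrying the trend and seasonal
components, and map them to the composition by

\[
y_{it} = \frac{\exp(\alpha_{it})}{\sum_{j=1}^{3} \exp(\alpha_{jt})},
\qquad i = 1, 2, 3,
\]

so that each component (employed, unemployed, not in the labour force) lies
in $(0,1)$ and the three sum to one at every time point $t$.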


3.15. Special session dedicated to Victor Yohai

Organized by Ricardo Maronna (Universidad Nacional de La Plata, Argentina)

Robustness in two-sample problems: Similarity

Carlos Matrán Bea
Departamento de Estadística e Investigación
Operativa, Universidad de Valladolid.

Abstract
Traditional statistical treatments for assessing model validity or goodness
of fit suffer from a serious drawback arising from the fact that hypothesis
testing theory is designed to provide evidence to reject the hypothesis.
This means that with their use we will not be able to confirm the model;
instead, at most, we would merely find a lack of statistical evidence to
reject it. In the pure robustness framework, the consideration of
contamination neighborhoods leads to an appropriate statement of similarity
between probabilities. This concept can be applied, in a fully
non-parametric setting, to two-sample problems involving homogeneity or
stochastic order between the parent distributions. Our approach resorts to
probability metrics and trimming techniques that allow both mathematical
treatment and feasibility of the involved procedures. By measuring the level
of similarity we can address the problem of model validation, looking for
the approximate validity of our goal as the alternative in a suitable test.


Robust regression for asymmetric
response models

Alfio Marazzi
Faculté de Biologie et Médecine, Université de
Lausanne.

Abstract
I will review a series of joint papers with Victor Yohai about robust parametric estimates of regression with positive, asymmetrically distributed, censored (or non-censored) responses. All estimates are based on a two-step paradigm. In the first step, a very robust (high breakdown point) initial estimate is computed. The initial estimate is used to identify the bulk of the data and the outliers. In the second step, outliers are downweighted or removed and an efficient weighted estimate is computed which maintains the degree of robustness of the initial one. We consider a class of asymmetric models for the log-response distribution which includes location-scale models (e.g., log-Weibull), location-scale-shape models (e.g., generalized log-gamma) as well as other models (e.g., negative binomial). Typical initial estimates are S estimates (that minimize M-scales of the residuals) and Q-tau estimates (that minimize tau-scales of the differences between empirical and model based quantiles). The class of final estimates includes weighted likelihood and truncated likelihood estimates which asymptotically approach the maximum likelihood estimates when the models are correct.
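
The two-step paradigm can be illustrated schematically as follows (a sketch under assumed choices, not the estimators of the papers reviewed: least absolute deviations stands in for the high breakdown initial estimate, and weighted least squares for the efficient weighted-likelihood second step).

```python
# Schematic two-step paradigm (illustration only): 1) robust initial fit,
# 2) hard-rejection weights for flagged outliers, then an efficient refit.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.gumbel(0, 0.3, n)   # asymmetric (log-Weibull-type) errors
y[:10] += 8.0                                # gross outliers

# Step 1: robust initial estimate (LAD used here for brevity).
lad = minimize(lambda b: np.abs(y - X @ b).sum(), x0=[0.0, 0.0],
               method="Nelder-Mead")
r = y - X @ lad.x
s = 1.4826 * np.median(np.abs(r - np.median(r)))   # MAD residual scale

# Step 2: downweight/remove outliers, then an efficient weighted fit.
w = (np.abs(r) <= 2.5 * s).astype(float)
beta2, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * X, np.sqrt(w) * y, rcond=None)
print("initial:", lad.x, " final:", beta2)
```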


3.16. Stochastic analysis

Organized by Soledad Torres (CIMFAV, Universidad de Valparaíso, Chile)

Semicircular limits on the free Poisson algebra

Solesne Bourguin
Department of Mathematical Sciences, Carnegie
Mellon University.
(Giovanni Peccati).

Abstract
This talk will focus on recent developments in the study of the free Poisson alge-
bra, namely a new multiplication formula, as well as non-commutative diagram
formulas. These results are key to studying semicircular limits for non-commu-
tative random variables on this space, on which a fourth moment theorem has
been shown to hold.

On the short-time behavior of the implied volatility for jump-diffusion models with stochastic volatility

Jorge Alberto León

Centro de Investigación y de Estudios Avanzados
del I.P.N. (CINVESTAV), México.

Abstract
In this talk, we use the techniques of Malliavin calculus to obtain an expression
for the short-time behavior of the at-the-money implied volatility skew for a
general jump-diffusion stochastic volatility model. Here we will consider the
following three cases:

1) The involved stochastic volatility process is adapted to the filtration generated by the Brownian motion driving the asset price.

2) The volatility process is correlated not only with the Brownian motion driving the asset price, but also with the asset price jumps.

3) The strike is adapted to the filtration generated by the Brownian motion driving the asset price.

Finite Potential Theory

Jaime San Martín

Universidad de Chile - CMM.

Abstract
We shall show an interesting connection between potential theory on discrete
spaces and the M-matrix problem from linear algebra. This relation allows us
to show some important results in matrix analysis as well as to give new insight
to potential theory. In the other direction, some results from linear algebra have
important implications in stochastic analysis. We will discuss some possible
applications.


3.17. Francisco Aranda Ordaz Award

Organized by Ramsés Mena Chávez (Department of Probability and Statistics, IIMAS-UNAM, Mexico)

Geometric Measure Techniques in Set Estimation

Alejandro Cholaquidis
Universidad de la República, Uruguay

Abstract
A domain S in R^d is said to fulfil the Poincaré cone property if any point in the boundary of S is the vertex of a (finite) cone which does not otherwise intersect the closure of S. For more than a century, this condition has played a relevant role in the theory of partial differential equations, as a shape assumption aimed to ensure the existence of a solution for the classical Dirichlet problem. In the talk, in a completely different setting, I will analyse some statistical applications of the Poincaré cone property (when defined in a slightly stronger version). I will show that this condition can be seen as a sort of generalized convexity: while it is considerably less restrictive than convexity, it still retains some ``convex flavour''. In particular, when imposed on a probability support S, this property allows the estimation of S from a random sample of points, using the ``hull principle'' much in the same way as a convex support is estimated using the convex hull of the sample points. The statistical properties of such a hull estimator (consistency, convergence rates) will be presented in detail. It will be shown that the class of sets fulfilling the Poincaré property is a P-Glivenko-Cantelli class for any absolutely continuous distribution P on R^d. Finally, an algorithm to approximate the cone-convex hull of a finite sample of points will be proposed and some practical illustrations will be given.


Variational description of Gibbs-non-Gibbs dynamical transitions for spin-flip systems

Julian Martinez
Universidad de Buenos Aires
(Roberto Fernández and Frank den Hollander).

Abstract
We discuss the concept of Gibbs / non-Gibbs measure on the lattice together with its extension to the mean field / local-mean field context, and the emergence of dynamical Gibbs-non-Gibbs transitions under independent spin-flip (infinite-temperature) dynamics. We show that these dynamical transitions are equivalent to bifurcations in the set of global minima of the large-deviation rate function describing optimal conditioned trajectories of the empirical density. Possible bifurcation scenarios are fully determined in the mean field case, yielding a full characterization of passages from Gibbs to non-Gibbs (and vice versa) with sharp transition times.

Contributed talks

Contributed talks 1

Using time series trimmed-clustering methods to detect stationary periods in the sea state

Pedro C. Álvarez Esteban

University of Valladolid, Spain.
(Joaquín Ortega, CIMAT, Mexico; Carolina Euán,
CIMAT, Mexico).

Abstract
Random sea waves are often modeled as stationary processes for short or moderately long periods of time and therefore the problem of detecting changes in the sea state is very important. In general, the sea state can be regarded as a sequence of stationary and transition (between stationary) periods of time. Segmentation and change-point methods have been widely used to classify or identify both types of periods. However, very often these methods fail when changes occur slowly over a period of time, as is usually the case for the sea state. We look at this problem from the spectral point of view, proposing a method that considers processes normalized to have unit variance and looks at changes in the energy distribution through the energy spectra by looking at their total variation distance. This distance measures the difference between two probability densities by determining how much they have in common, or equivalently, how much one of them has to be modified to coincide with the other, and the spectrum of a normalized process can be seen as the probability density of the energy distribution. The series of wave height measures is divided into intervals of 30 minutes and for each the spectral density is estimated. Then, the above distances are computed to obtain a matrix of distances. Different clustering methods over this dissimilarity matrix are explored, including data-driven trimmed clustering methods in order to take into account the heterogeneity introduced by the existence of the transition periods. We present simulation studies to validate the proposed method as well as examples of applications to real data.
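
The central computation, the total variation distance between two normalized spectra, can be sketched as follows (an assumed setup with synthetic records and an assumed sampling rate, not the authors' implementation).

```python
# Minimal sketch: estimate the spectra of two 30-minute wave records,
# normalize each to unit area so it can be read as a probability density of
# the energy distribution, and compute TV(f, g) = (1/2) * integral |f - g|.
import numpy as np
from scipy.signal import welch

fs = 2.0                                    # assumed sampling rate, Hz
rng = np.random.default_rng(0)
x1 = rng.standard_normal(int(30 * 60 * fs)) # placeholders for two wave records
x2 = rng.standard_normal(int(30 * 60 * fs))

freq, f1 = welch(x1, fs=fs, nperseg=512)
_,    f2 = welch(x2, fs=fs, nperseg=512)
f1 /= np.trapz(f1, freq)                    # normalize spectra to unit area
f2 /= np.trapz(f2, freq)
tv = 0.5 * np.trapz(np.abs(f1 - f2), freq)  # 0 = identical, 1 = disjoint
print(f"TV distance between the two spectra: {tv:.3f}")
```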


Identification and estimation of general ARMA models

Ignacio Lobato
Instituto Tecnológico Autónomo de México.
(Carlos Velasco, Universidad Carlos III de Madrid).

Abstract
This article introduces frequency domain procedures for performing inference
in general time series linear models. We allow for possibly noninvertible and/
or noncausal processes in the absence of information on these potential non-
fundamentalness properties. We use information from higher order moments
to achieve identification on the location of the roots of the AR and MA poly-
nomials for non-Gaussian time series. We propose a minimum distance esti-
mator that combines the information contained in second, third, and fourth
moments. Contrary to existing estimators, the proposed estimator is consistent
under general assumptions, and can be computed in one single step. For the
standard causal and invertible ARMA model with non-Gaussian innovations,
our estimator can be asymptotically more efficient than Gaussian-based pro-
cedures, such as the Whittle estimator. For cases where Gaussian-based pro-
cedures are inconsistent, such as noncausal or noninvertible ARMA models,
the proposed estimator is consistent under general assumptions. The proposed
procedures also overcome the need to use tests for causality or invertibility.


A note on the specification of conditional heteroscedasticity using an open-loop TAR model

Fabio Humberto Nieto

Universidad Nacional de Colombia.
(Edna Carolina Moreno, Universidad Santo Tomás
de Aquino).

Abstract
Clusters of large values are observed in sample paths of certain open-loop threshold autoregressive (TAR) stochastic processes. Three types of marginal conditional distributions of the underlying stochastic process are outlined in this paper in order to characterize the stochastic mechanism that generates this empirical TAR-model stylized fact. One of them makes it possible to find the conditional variance function that explains the aforementioned stylized fact. As a byproduct, a sufficient condition for asymptotic weak stationarity in an open-loop TAR stochastic process is derived.
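
A toy simulation (with illustrative, assumed parameter values) shows the stylized fact in question: an exogenous threshold process switches the regime and the noise scale, producing clusters of large values in the sample path.

```python
# Illustrative simulation (assumed parameters) of an open-loop TAR process:
# the regime is driven by an exogenous threshold series Z_t, and a larger
# noise scale in one regime produces clusters of large values of X_t.
import numpy as np

rng = np.random.default_rng(7)
T = 500
Z = np.zeros(T)
X = np.zeros(T)
for t in range(1, T):
    Z[t] = 0.9 * Z[t - 1] + rng.standard_normal()        # exogenous AR(1)
    if Z[t] <= 0:                                        # regime 1
        X[t] = 0.3 * X[t - 1] + 1.0 * rng.standard_normal()
    else:                                                # regime 2
        X[t] = -0.4 * X[t - 1] + 3.0 * rng.standard_normal()
# Clusters of large |X_t| appear while Z_t stays above the threshold.
```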


Extended modelling and improved estimation for INAR(1) processes

Klaus Leite Pinto Vasconcellos

Universidade Federal de Pernambuco.
(Marcelo Pereira, Universidade Federal de
Pernambuco).

Abstract

We introduce first order non-negative integer-valued autoregressive processes with power series innovations based on the binomial thinning operator.
new model contains as particular cases several models such as the Poisson
INAR(1) model, the geometric INAR(1) model among many others. We also
derive, for the Poisson INAR(1) model, the second order bias of the squared dif-
ference estimator for one of the parameters. The main properties of the power
series model are derived, such as mean, variance and the autocorrelation func-
tion. Yule-Walker, conditional least squares and conditional maximum likeli-
hood estimators of the model parameters are derived. For the Poisson INAR(1)
model, we arrive at a very simple bias formula for the squared difference esti-
mator, which allows us to define a very simple bias-adjusted estimator. An
extensive Monte Carlo experiment is conducted to evaluate the performances
of all estimators in finite samples for the power series models here studied. Spe-
cial sub-models are studied in some detail. Applications to real data sets are
given to show the flexibility and potentiality of the new model. For the Poisson
INAR(1) model, we investigate the performance of our bias corrected estimator.
The behavior of a modified conditional least squares estimator in terms of bias
is also studied. The numerical simulations and the practical examples that are
provided show that our general power series model can be a useful and recom-
mendable alternative in modeling time series with small integer values. The
Monte Carlo simulation studies also provide numerical evidence that the here
proposed bias-adjusted estimator outperforms the other estimators in small
samples for Poisson INAR(1) models. We therefore recommend the use of this
bias improved estimator in Poisson INAR(1) models. An application to a real
data set also illustrates the practical use of this bias correction.
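
For concreteness, a minimal sketch (assuming Poisson innovations, one of the sub-models mentioned above) of the binomial-thinning recursion and the conditional least squares estimates follows.

```python
# Minimal sketch of a binomial-thinning Poisson INAR(1) process and its
# conditional least squares (CLS) estimates (one of the estimator families
# discussed above); parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
alpha, lam, T = 0.6, 2.0, 2000
X = np.zeros(T, dtype=int)
for t in range(1, T):
    # X_t = alpha o X_{t-1} + eps_t: thinning plus a Poisson innovation
    X[t] = rng.binomial(X[t - 1], alpha) + rng.poisson(lam)

x0, x1 = X[:-1], X[1:]
x0c = x0 - x0.mean()
alpha_cls = (x0c * (x1 - x1.mean())).sum() / (x0c ** 2).sum()  # slope
lam_cls = np.mean(x1 - alpha_cls * x0)                         # innovation mean
print(f"alpha_hat = {alpha_cls:.3f}, lambda_hat = {lam_cls:.3f}")
```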


Contributed talks 2

High order exponential-based LL methods for random differential equations

Hugo de la Cruz
Escola de Matemática Aplicada-FGV.
(Felix Carbonell, Biospective / McGill University).

Abstract
Over the last few years, there has been growing and renewed interest in the numerical study of Random Differential Equations (RDEs). On one hand, this is motivated by the fact that RDEs have played an important role in the modeling of physical, biological, neurological and engineering phenomena and, on the other, by the usefulness of RDEs for the numerical analysis of stochastic differential equations (SDEs), via the extant conjugacy property between RDEs and SDEs, which allows one to study stronger pathwise properties of SDEs driven by kinds of noise other than the Brownian one. Given that in most common cases no explicit solution of the equations is known, the construction of computational methods for the treatment and simulation of RDEs has become an important need. In this vein, the Local Linearization (LL) approach is a successful technique that has been applied for defining numerical integrators for RDEs. However, a major drawback of the obtained methods is their relatively low order of convergence; in fact it is twice the order of the moduli of continuity of the driving stochastic process. The present work overcomes this limitation by introducing a new exponential-based high order numerical integrator for RDEs. For this, a suitable approximation of the stochastic processes present in the random equation, together with the local linearization technique and an adapted Padé method with a scaling and squaring strategy, are conveniently combined. In this way, a higher order of convergence can be achieved (independent of the moduli of continuity of the stochastic process) while retaining the dynamical and numerical stability properties of the low order LL method. Results on the convergence and stability of the suggested method and details on its efficient implementation are discussed. The performance of the introduced method is further illustrated through computer simulations.
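
The following sketch shows one step of the classical (low order) LL scheme that the talk builds on, for a path of an equation with the driving value frozen over the step; the high order method of the talk refines this with a Padé approximation and a finer approximation of the driving process, and is not reproduced here.

```python
# Sketch of a single classical Local Linearization (LL) step for a path of
# x' = f(x) (driving noise frozen over the step). The integral
# x_{n+1} = x_n + int_0^h e^{J s} ds f(x_n) is computed exactly via the
# exponential of an augmented block matrix.
import numpy as np
from scipy.linalg import expm

def ll_step(x, f, J, h):
    d = x.size
    A = np.zeros((d + 1, d + 1))
    A[:d, :d] = J(x)                 # Jacobian block
    A[:d, d] = f(x)                  # drift block
    E = expm(h * A)                  # top-right block = int_0^h e^{Js} ds f(x)
    return x + E[:d, d]

# Example: scalar mean-reverting equation x' = -2x + 1 (equilibrium 0.5).
f = lambda x: np.array([-2.0 * x[0] + 1.0])
J = lambda x: np.array([[-2.0]])
x = np.array([0.0])
for _ in range(100):
    x = ll_step(x, f, J, 0.05)
print(x)                             # approaches 0.5
```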


Stability for some linear stochastic fractional systems

Allan Fiel
Departamento de Control Automático, Cinvestav-IPN.
(Jorge A. León, Departamento de Control
Automático, Cinvestav-IPN; David Márquez-
Carreras, Universitat de Barcelona).

Abstract
We obtain a closed expression for the solution of a linear Volterra integral equation with an additive Hölder continuous noise and with a continuous function as initial condition. We then discuss the stability of the solution via the fractional calculus. As an application, we analyze the stability in the mean of some stochastic fractional integral equations.

Rate of convergence to equilibrium of fractional driven stochastic differential equations with some multiplicative noise

Joaquín Fontbona
CMM, University of Chile.
(Fabien Panloup, IMT-Toulouse, Université Paul
Sabatier).

Abstract
We investigate the problem of the rate of convergence to equilibrium for ergodic stochastic differential equations driven by fractional Brownian motion with Hurst parameter H > 1/2 and multiplicative noise component \sigma. When \sigma is constant and for every H \in (0,1), it was proven by M. Hairer that, under some mean-reverting assumptions, such a process converges to its equilibrium at a rate of order t^{-\alpha}, where \alpha \in (0,1) (depending on H). The aim of this paper is to extend these types of results to some multiplicative noise setting. More precisely, we show that we can recover such convergence rates when H > 1/2 and the inverse of the diffusion coefficient \sigma is a Jacobian matrix. The main novelty of this work is a kind of extension of Foster-Lyapunov-like techniques to this non-Markovian setting, which allows us to put in place an asymptotic coupling scheme such as Hairer's one, without resorting to deterministic contracting properties.

Remarks on the stochastic transport equation

Christian Olivera
IMECC-UNICAMP.
(Wladimir Neves, IM-UFRJ).

Abstract
In this talk, we discuss a number of results on stochastic transport equations. First, the main issue of uniqueness holds under more general assumptions than in the deterministic case; for instance, we obtain well-posedness for the Cauchy problem under the Ladyzhenskaya-Prodi-Serrin condition. The initial-boundary value problem, intrinsically more difficult than the Cauchy problem, is also addressed in this talk. We consider in detail the stochastic trace result.


Contributed talks 3

Reconstructing past climate from natural proxies and estimated climate forcings using short and long-memory models

Luis Barboza Chinchilla

Universidad de Costa Rica.
(Bo Li, University of Illinois at Urbana-Champaign;
Martin P. Tingley and Frederi Viens, Pennsylvania
State University and Purdue University).

Abstract
We have produced new reconstructions of Northern Hemisphere annually averaged temperature anomalies back to 1000 AD, based on a model that includes
external climate forcings and accounts for any long-memory features. Our
reconstruction is based on two linear models, with the first linking the latent
temperature series to three main external forcings (solar irradiance, green-
house gas concentration, and volcanism), and the second linking the observed
temperature proxy data (tree rings, sediment record, ice cores, etc.) to the unob-
served temperature series. Uncertainty is captured with additive noise, and a
rigorous statistical investigation of the correlation structure in the regression
errors motivates the use of long memory fractional Gaussian noise models for
the error terms. We use Bayesian estimation to fit the model parameters and to
perform separate reconstructions of land-only and combined land-and-marine
temperature anomalies. We quantify the effects of including the forcings and
long memory models on the quality of model fits, and find that long memory
models can result in more precise uncertainty quantification, while the exter-
nal climate forcings substantially reduce the squared bias and variance of the
reconstructions. Finally, we use posterior samples of model parameters to
arrive at an estimate of the transient climate response to greenhouse gas forcings of 2.56 °C (95% credible interval of [2.20, 2.95] °C), in line with previous, climate-model-based estimates.
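
The long-memory error model can be made concrete as follows (a minimal sketch of the standard fractional Gaussian noise autocovariance; the plug-in into the authors' Bayesian hierarchy is not reproduced here).

```python
# Minimal sketch of the long-memory error model: the autocovariance of
# fractional Gaussian noise (fGn) with Hurst index H, and the covariance
# matrix one would use for regression errors on an evenly spaced series.
import numpy as np

def fgn_autocov(k, H, sigma2=1.0):
    """gamma(k) = (sigma^2/2) (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * sigma2 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H)
                           + np.abs(k - 1) ** (2 * H))

n, H = 200, 0.8                        # H > 1/2: long memory, slow decay
gamma = fgn_autocov(np.arange(n), H)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = gamma[lags]                    # n x n error covariance matrix
```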


A stochastic disease transmission in an epidemic model considering a hyperbolic incidence rate

Alejandra Christen
Pontificia Universidad Católica de Valparaíso.
(M. Angélica Maulén-Yáñez, Pontificia Universidad
Católica de Valparaíso; Eduardo González-Olivares,
Pontificia Universidad Católica de Valparaíso).

Abstract

A stochastic SI epidemic model, based on the model proposed by Roberts and Saha (1999), is analyzed considering a hyperbolic-type nonlinear incidence rate. Assuming the proportion of population infected varies with time, a new model described by an ordinary differential equation is presented, which is analogous to an equation describing the double Allee effect. Then, the asymptotic behavior of a stochastic fluctuation due to the environmental variation in the coefficient of disease transmission is studied. Thus, a stochastic differential equation is obtained, which is analysed through the associated Fokker-Planck equation to obtain the probability density function (its invariant probability distribution) when the proportion of the infected population reaches steady state. An explicit expression for this probability density function is found, with a relationship between the moments of the proportion of population infected. To our knowledge, this incidence rate has not been previously used for these types of epidemic models.


Mixed beta regression with penalized splines for severity in plant diseases

Pedro A. Torres-Saavedra
Department of Mathematical Sciences, University of
Puerto Rico at Mayagüez.
(Raúl E. Macchiavelli, Department of Crops and
Agroenvironmental Sciences, University of Puerto
Rico at Mayagüez).

Abstract
Severity in plant diseases is quantified as the amount of plant material affected by the disease, and is usually expressed as a continuous variable on a 0-1 scale. Since plant diseases are monitored throughout the crop's lifecycle, the modeling of severity progress curves needs to incorporate the longitudinal structure
model this. However, when the average and the subject-specific curves do not
follow a parametric form, semi-parametric methods are required. We propose
a mixed beta regression with smooth average curves and subject-specific curves
to model severity progress curves. Parameters in the proposed model are esti-
mated via maximum likelihood. The roughness parameters in the penalized
splines are chosen using traditional model selection criteria (e.g., BIC or AIC).
The proposed semi-parametric method allows us to model flexible shapes for
disease progress curves, and can be used to compare treatments or conditions
while taking into account the longitudinal and design structures of the data.
We apply the proposed method to model the severity of Black Sigatoka in an
experimental banana plantation in Isabela, Puerto Rico, designed to compare
different control practices. The use of the proposed method yields very useful
results that allow plant pathologists and crop managers to understand, monitor
and control diseases.


Reproducing kernel Hilbert space approach to general functional linear regression for exponential families

Carlos Valencia
Universidad de los Andes.
(Ming Yuan, University of Wisconsin).

Abstract
Many statistical analyses require the processing and manipulation of data that take the form of random curves, usually the result of smoothed versions of longitudinal data measured over a grid of points, that can be modeled as functional data. Special attention has been paid to the modeling of a scalar response with functional predictors, functional linear regression being the most renowned case. However, in numerous applications there are a number of restrictions in terms of the characterization of the response variable, for instance when this response is categorical or when the usual zero-mean additive error assumption does not seem to be appropriate. A natural alternative is the use of a generalized linear model adapted for a functional predictor.

In this paper, we study a smoothness regularization estimation of an infinite dimensional parameter in an exponential family model with functional predictors. We focus on the reproducing kernel Hilbert space approach and show that, regardless of the generality of the method, minimax optimal convergence rates are achieved. A general framework for the asymptotic analysis of the first order approximation is developed by using a simultaneous diagonalization technique of two positive definite kernels. Asymptotic rates are presented for a family of spectral norms that allow a unified analysis of the slope estimation and the prediction problem.

In particular, we consider the functional generalized linear model where the response Y follows a probability distribution in the exponential family with density

f_{\theta_0}(y) = \exp\{\theta_0 y - \psi(\theta_0)\},    (4.1)

and canonical parameter \theta_0(X) = \alpha_0 + \int_{\mathcal{T}} \beta_0(t) X(t)\,dt in the real numbers, with X being a second order stochastic process on a compact domain \mathcal{T}. \beta_0 is the unknown slope function and \alpha_0 is the unknown scalar intercept. We assume one observes training data (x_1, y_1), \ldots, (x_n, y_n) consisting of the realization of n independent copies of (X, Y). Our purpose is to estimate the slope parameter \beta_0 and, based on it, present a point estimator for \theta_0(X). We assume that \beta_0 belongs to a reproducing kernel Hilbert space \mathcal{H} \subset L^2(\mathcal{T}).

The regularization method combines two non-negative functionals of the parameters (\alpha, \beta). The first one is a data fit functional \ell_n(\alpha, \beta) that measures how well the data are explained as a realization of a random sample with associated densities f_{\theta_i}(y_i) and \theta_i = \alpha + \int \beta x_i. We shall use the negative log-likelihood of the data as the data fit functional. The second functional is a penalty term J(\beta) that prevents the overfitting of the estimator by giving solutions that are not plausible less chance of being selected. We choose J(\beta) as a norm (or semi-norm) in the reproducing kernel Hilbert space \mathcal{H}. Therefore, the method of regularization estimates (\alpha_0, \beta_0) by finding the arguments that minimize the expression

\ell_n(c, \beta) + \lambda J(\beta),    (4.2)

for c in the real line and \beta \in \mathcal{H}. \lambda \geq 0 is the tuning parameter that balances the two criteria represented by \ell_n and J respectively. Despite this being an optimization problem over an infinite dimensional space, we show that by the representer theorem the problem can be solved with a finite number of parameters, and the many known smoothing splines estimation algorithms may be adapted to solve the numerical problem.

Many of the previous approaches for estimating the slope in the generalized functional linear model rely on Functional Principal Components Analysis (FPCA), which in general imposes strong restrictions on the spacing of the eigenvalues of the operator generated by the covariance kernel of the process X(\cdot). We relax these assumptions and obtain sharper minimax convergence rates. Our asymptotic analysis proves optimality of these rates under some regularity conditions.
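
A minimal numerical sketch of (4.1)-(4.2), under assumed choices (Bernoulli family with logit link, Gaussian kernel, Riemann quadrature), illustrates how the representer theorem reduces the infinite dimensional problem to n + 1 coefficients.

```python
# Sketch (assumed choices, not the authors' code): by the representer theorem,
# the minimizer of (4.2) has the form beta(t) = sum_j c_j int K(t,s) x_j(s) ds,
# so (4.2) collapses to a penalized logistic regression in (alpha, c).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m = 100, 50
t = np.linspace(0, 1, m); dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, m)), axis=1) * np.sqrt(dt)  # curves
beta0 = np.sin(2 * np.pi * t)
eta = -0.2 + X @ beta0 * dt
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

K = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.1)  # kernel on the domain
S = X @ K @ X.T * dt * dt                            # S_ij = <x_i, K x_j>
lam = 1e-3

def objective(par):
    a, c = par[0], par[1:]
    f = a + S @ c                                    # theta_i = alpha + (Sc)_i
    nll = np.sum(np.log1p(np.exp(f)) - y * f)        # Bernoulli neg. log-lik.
    return nll + lam * c @ S @ c                     # + lambda ||beta||_H^2

res = minimize(objective, np.zeros(n + 1), method="L-BFGS-B")
beta_hat = K @ (X.T * dt) @ res.x[1:]                # slope on the grid
```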


Contributed talks 4

Multivariate walks through dimensions: The turning bands equations case

Carlos Eduardo Alonso-Malaver

Universidad Nacional de Colombia.
(Emilio Porcu, Universidad Técnica Federico
Santa María, and Ramón Giraldo Henao,
Universidad Nacional de Colombia).

Abstract
We characterize the class of stationary-isotropic matrix-valued covariance functions on a Euclidean space as the scale mixture of a uniquely determined matrix-valued measure. Such a result is the analogue of the Schoenberg theorem for the class of univariate stationary-isotropic covariance functions. Based on previous results, we illustrate the existence of operators that map a radial function f, being positive definite on some Euclidean space R^d, into another function, say g, being radial and positive definite on a Euclidean space of lower or higher dimension. One of these classes of operators consists of the multivariate versions of the turning bands equations.


Order book microstructure visualization: The case of the Colombian high-frequency foreign exchange market

Andrea Marcela Cruz Moreno

Universidad Nacional de Colombia.
(Javier Sandoval, Universidad Nacional de Colombia
and Universidad Externado de Colombia).

Abstract
In financial markets, the order book is defined as the set of unexecuted buy/sell orders at which traders are willing to buy/sell a specific financial instrument. Recently, order book information has become available and it is believed to contain useful information for developing profitable high-frequency trading strategies. This work presents a comparative study between three proposed methods for visualization of high-frequency order book information: a dynamic heat map, a dynamic wavelet transform heat map and a dynamic Markov random field. A study case is provided using tick-by-tick real high-frequency data from the Set-Fx, the Colombian Forex exchange market. We evaluate each visualization's performance based on how supportive they are to the trading buy/sell decision making process, using measures such as the ratio between the mean and variance of the normalized histogram of the image gradient, or the sum of the absolute value of the differences between pixels horizontally and vertically. Finally, we present an analysis and discussion about possible enhancement methods. This paper is organized as follows: Section 1 presents the introduction. Section 2 depicts a brief review of the main concepts and definitions needed to understand order book dynamics and its connection to high frequency trading in Forex exchange markets. Section 3 introduces the different techniques and the results of the comparative study. Finally, Section 4 presents the conclusions and some suggestions for coupling the proposed visualization techniques to trading strategies.


Impact of the central bank interventions on the Colombian exchange rate

Luis Melo
Banco de la República, Colombia
(Juan José Echavarría, Federación de Cafeteros,
Colombia; Mauricio Villamizar, Georgetown
University).

Abstract
The adoption of a managed regime assumes that interventions are relatively successful. However, while some authors consider that foreign exchange interventions are not effective, arguing that domestic and foreign assets are close substitutes, others advocate their use and maintain that their effects can even last for months. There is also a lack of consensus on the related question of how to intervene. Are dirty interventions more powerful than pre-announced constant ones? This paper compares the effects of day-to-day interventions with discretionary interventions for the Colombian case by combining a Tobit-GARCH reaction function with an asymmetric power PGARCH(1,1) impact function. Our results show that the impact of the pre-announced and transparent US$ 20 million daily interventions, adopted by Colombia in 2008-2012, has been much larger than the impact of the dirty interventions adopted in 2004-2007. As a second exercise, we compare the effect of different types of interventions by the Colombian central bank using an event study approach, without imposing restrictive parametric assumptions and without the need to adopt a structural model. We find that all types of interventions have been successful according to the smoothing criterion, in that they were able to stem exchange rate volatility. In particular, volatility options seemed to have the strongest effect. We find that the results are robust when using different window sizes and counterfactuals.


Encouraging the development of global capital markets: The role of funded pension schemes

María Nela Seijas Giménez

Universidad ORT Uruguay.

Abstract
Personal individual capitalization systems have experienced significant growth
in recent decades, following the trend of aging populations and the defined
benefit pension crisis. This article investigates whether the implementation of
funded pension schemes has prompted the development of domestic capital
markets worldwide, over the 1990-2011 period. The methodological strategy
relies upon panel regressions, minimum spanning tree and hierarchical tree
classification techniques applied to depth and liquidity indicators of stock and
bond markets as well as representative pension fund performance information.
The analysis has revealed that individual capitalization pension funds have provided a stimulus to stock market depth. A negative causality with stock market
liquidity is also evidenced and linked to the long-term profile of pension port-
folio management. Both development ratios receive positive impacts of greater
magnitude from the cluster of advanced maturation systems. Results also sug-
gest that voluntary systems have mainly encouraged public debt depth but are
related to improvements in stock market development as well. Finally, evidence
reveals that clusters of low gradual and incipient maturation systems exert posi-
tive impacts on public debt depth. These findings are consistent with existing
literature and also with the investment portfolio that usually characterizes pen-
sion funds in their earlier stages of life.


Contributed talks 5

Stochastic model for opportunistic spectrum access

Viswanathan Arunachalam
Department of Statistics, Universidad Nacional de
Colombia.

Abstract
An important problem in opportunistic spectrum access is the maximization of the number of packets sent by the secondary users during the white space of the spectrum, while avoiding the infringement of the privileges of the primary user. The focus of the model is from the perspective of the secondary node only, and hence the alternating renewal process describing the primary user's activity has not been taken into account. It would be interesting to incorporate the availability of the spectrum for the secondary user, which is nothing but the unavailability function of the alternating renewal process. We set up this problem in terms of an optimal stopping problem. An explicit expression for the optimal number of packets that can be sent by the secondary nodes in a white space is obtained. An example is used to explain the model.

Performance analysis of LTE-UMTS networks

Selvamuthu Dharmaraja
Department of Mathematics
Indian Institute of Technology Delhi
New Delhi, India.

Abstract
Next generation networks require efficient radio resource management (RRM). The increasing demand for high-bit-rate services, together with the limited availability of radio resources, requires smart RRM strategies that also maintain quality of service. Current penetration of technologies such as the Universal Mobile Telecommunication System (UMTS) is quite high. Long Term Evolution (LTE) is an emerging technology, and operators will face long periods in which both Radio Access Technologies (RAT) will coexist. A smart use of radio resources in both RATs will be of high interest for operators. In this paper, we discuss how efficient RRM can be exploited to facilitate a smarter use of available radio resources for LTE/LTE-A and UMTS scenarios. Joint Call Admission Control (JCAC) is one of the algorithms for Joint Radio Resource Management (JRRM). In our work, we take into consideration two JCAC policies for LTE-UMTS networks: network load based and service based. These two approaches are applied independently and their effects are studied. It is shown that load based JCAC results in greater throughput for LTE-UMTS networks, while it degrades network performance in terms of packet delay for interactive users. Simulations are performed in the Qualnet 6.1 network simulator.

Convergence in distribution for the 1D contact process seen from its rightmost point

Pablo Groisman
IMAS-CONICET - U. de Buenos Aires.
(E. Andjel and F. Ezanno, IMAP - U. d'Aix-Marseille;
L. Rolla, IMAS-CONICET - U. de Buenos Aires).

Abstract
We consider a 1D contact process seen from its rightmost point on the space
of infinite configurations which are bounded above. Despite the fact that this
process has no invariant measures, we will prove that it converges in distribu-
tion to the quasi-stationary distribution of the same process but defined on the
space of finite configurations.


Optimal maintenance of a system subject to shocks and progressive deterioration

Mauricio Junca
Universidad de los Andes, Colombia
(Mauricio Sánchez-Silva, Universidad de los Andes,
Colombia).

Abstract
We present a model to define an optimal maintenance policy for systems that deteriorate as a result of shocks, modeled as a compound Poisson process, and a deterministic, state-dependent rate. The optimal maintenance strategy is based on an impulse control model. In the model, the optimal time and size of interventions are determined according to the system state, which is obtained from permanent monitoring.

Contributed talks 6

Minimum risk point estimation of the Gini index

Bhargab Chattopadhyay
University of Texas at Dallas.
(Shyamal Krishna De, Binghamton University).

Abstract
Economic inequality is usually measured when it comes to evaluating the effects of economic policies at the micro or macro level. In order to evaluate the economic policies adopted by a government, it is important to estimate the Gini index at any specific time period. If the income data for all households in the region of interest are not available, one has to draw a relatively small sample to estimate the Gini index for that region. A method of estimation should be developed such that the cost of sampling and the error in estimation are kept as low as possible. It is well known that the error in estimation decreases when the sample size increases, which in turn increases the cost of sampling. To minimize the cost of sampling, one has to reduce the sample size, which in turn may lead to a higher estimation error. Therefore, a procedure is required which can act as a trade-off between the estimation error and the sampling cost. To achieve this trade-off, the sample size should not be fixed in advance. This problem falls in the domain of sequential analysis, where it is known as the minimum risk point estimation problem. In this presentation, we propose a sequential procedure that yields an asymptotic minimum risk point estimator of the Gini index by minimizing the asymptotic risk function, comprising a cost function and the estimation error. Under a distribution-free scenario, we prove that the final sample size for our procedure approaches the optimal sample size that minimizes the risk function. A detailed and more rigorous use of reverse submartingale properties has been adopted to prove that, on average, the final sample size hovers around the optimal sample size and also that the ratio-regret is asymptotically 1.
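
A generic minimum risk stopping rule of the kind described can be sketched as follows (illustrative only, with an assumed risk function and a jackknife plug-in for the asymptotic variance; the talk's exact procedure is not reproduced).

```python
# Generic sketch: with risk(n) ~ A*Var(G_n) + c*n and Var(G_n) ~ xi^2/n, the
# optimal size is n* = sqrt(A/c)*xi, so stop at the first n >= m0 with
# n >= sqrt(A/c) * xi_hat_n (jackknife estimate). Illustrative only.
import numpy as np

def gini(x):
    x = np.sort(x)
    n = x.size
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

def jackknife_xi(x):
    n = x.size
    g = np.array([gini(np.delete(x, i)) for i in range(n)])
    var_jack = (n - 1) / n * np.sum((g - g.mean()) ** 2)   # Var(G_n) estimate
    return np.sqrt(n * var_jack)                           # xi_hat

rng = np.random.default_rng(5)
A, c, m0 = 100.0, 0.01, 20
x = list(rng.lognormal(0, 1, m0))                          # pilot income sample
while len(x) < np.sqrt(A / c) * jackknife_xi(np.array(x)):
    x.append(rng.lognormal(0, 1))                          # sample one more unit
print(f"final sample size N = {len(x)}, Gini = {gini(np.array(x)):.3f}")
```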

A Bayesian approach to errors-in-variables beta regression

Jorge Figueroa-Zúñiga
Universidad de Concepción, Chile.
(Reinaldo Arellano-Valle, Pontificia Universidad
Católica de Chile; Silvia L. P. Ferrari, Universidade de
São Paulo, Brazil).

Abstract
Beta regression models have been widely used for the analysis of limited-range
continuous variables. Here, we consider an extension of the beta regression
models that allows for explanatory variables to be measured with error. Thus,
we propose a Bayesian treatment for errors-in-variables beta regression mod-
els. The specification of prior distributions is discussed, computational imple-
mentation via Gibbs sampling is provided, and two real data applications are
presented. Additionally, Monte Carlo simulations are used to evaluate the per-
formance of the proposed approach.


A PAC-Bayesian approach to bipartite ranking

Sylvain Robbiano
CIMFAV, Universidad de Valparaíso, Chile.
(Benjamin Guedj, LSTA, Université Pierre et Marie
Curie, France).

Abstract
The bipartite ranking problem consists in learning from a sample D_n = (X_i, Y_i)_{i=1}^n to rank the observations X_i, while preserving the order of their associated labels Y_i \in \{-1, +1\}. We consider this problem in the high dimensional situation, where the observations X_i lie in a space of dimension d, possibly much larger than the sample size n. A standard approach in this context involves the introduction of a scoring function. We propose to estimate the optimal scoring function using the so-called Gibbs posterior distribution, which favors sparse additive estimators. This procedure appears valuable when it comes to assessing the effect of each covariate on the score of an observation. Using elements from the PAC-Bayesian theory, we provide theoretical guarantees for our method, along with an implementation through MCMC.

Bayesian regression models for continuous proportions with zeros and ones

Leandro Tavares Correia

University of São Paulo.
(Heleno Bolfarine, University of São Paulo; Cibele
Queiroz da-Silva, University of Brasília).

Abstract
Over the last few years, statistical modeling of continuous proportions has become the subject of many studies. Some examples of continuous proportions data include the unemployment rate, mortality in traffic accidents, the fraction of income contributed to a retirement fund, the fraction of exportation income of the industry sectors, etc. Usual linear and nonlinear regression models are not suitable for these types of data. Some different alternatives have been proposed for modeling continuous proportions data that are perceived to be related to exogenous variables, such as data transformation, a censored response variable, or assuming a distribution for the response variable that is restricted to a limited range. Tobit regression models and beta regression models are of particular interest. The beta distribution is very attractive for modeling limited range data due to its flexibility, since its density has different shapes depending on the values of the two parameters that index the distribution. However, proportions data may include a non-negligible number of zeros and/or ones. When this is the case, the beta distribution does not provide a satisfactory description of the data, since it does not allow a positive probability for any particular point in the interval [0,1]. To circumvent this problem, Ospina and Ferrari (2010) proposed a mixed continuous-discrete version of the model, using the beta law to define the continuous component of the distribution, while the discrete component is defined by a Bernoulli or a degenerate distribution at zero or at one. In this model, the beta distribution is applied to model the continuous component of the data and the Bernoulli distribution fits the discrete component. Ospina and Ferrari (2010) named it the BIZU distribution and its distribution function can be expressed as a convex combination of two cumulative distribution functions as

BIZU(y; \alpha, \gamma, \mu, \phi) = \alpha\, Ber(y; \gamma) + (1 - \alpha) F(y; \mu, \phi),

where Ber(y; \gamma) represents the cumulative distribution function of a Bernoulli random variable with parameter \gamma and F(y; \mu, \phi) represents the cumulative distribution function of a B(\mu, \phi) random variable. The mixture parameter \alpha, with 0 < \alpha < 1, allows the convex combination of the two distribution functions. An alternative approach to deal with the type of dataset described above is based on an extension of the tobit censored model on the interval [0,1]. For the tobit double censored model, inflation of zeros and ones may also be considered using an extension of the mixture model presented by Moulton and Halsey (1995). It is considered that part of the zeros and/or ones come from a Bernoulli-type model that links possible zero and/or one excess with a group of covariates that may have influence on the probability of their occurrence; moreover, the continuous responses are modeled using the normal or power-normal distribution (including an asymmetry parameter; see Pewsey et al., 2012) with a link function as in the usual generalized linear models. In this paper, we present the Bayesian version of the zero-and-one inflated beta regression models and also a Bayesian class of double censored tobit models with zero and one inflation. The advantages of Bayesian analysis are well known and include the elicitation of prior beliefs, avoidance of asymptotic approximations and practical estimation of functions of parameters via an MCMC scheme. We also discuss some Bayesian diagnostic techniques such as Bayesian residuals and influence measures based on the q-divergence.
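
For reference, a minimal sketch of simulation and evaluation under the zero-and-one inflated (BIZU-type) law above (parameter names are illustrative; the continuous part uses the usual mean-precision beta parameterization of beta regression).

```python
# Minimal sketch of the BIZU-type law: with probability alpha the response is
# Bernoulli(gamma) (a point mass at 0 or 1), otherwise Beta(mu*phi, (1-mu)*phi).
import numpy as np
from scipy import stats

def rbizu(size, alpha, gamma, mu, phi, rng=np.random.default_rng()):
    discrete = rng.random(size) < alpha
    y = rng.beta(mu * phi, (1 - mu) * phi, size)          # continuous part
    y[discrete] = rng.binomial(1, gamma, discrete.sum())  # 0/1 part
    return y

def bizu_cdf(y, alpha, gamma, mu, phi):
    ber = np.where(y < 0, 0.0, np.where(y < 1, 1 - gamma, 1.0))  # Bernoulli cdf
    return alpha * ber + (1 - alpha) * stats.beta.cdf(y, mu * phi, (1 - mu) * phi)

y = rbizu(10_000, alpha=0.15, gamma=0.4, mu=0.3, phi=10)
print((y == 0).mean(), (y == 1).mean())   # approx 0.15*0.6 and 0.15*0.4
```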


Contributed talks 7

Analysis of sequential tests by using simulation

Álvaro Calvache Archila

Universidad Pedagógica y Tecnológica de Colombia.
(Dairo Gil Gil, Universidad Pedagógica y
Tecnológica de Colombia).

Abstract
The theory of sequential tests was developed by Wald. In a sequential test, the sample size is not fixed in advance; instead the data are evaluated as they are collected and further sampling is stopped, according to a previously defined rule, as soon as significant results are observed. Wald also demonstrated that the probability that the sample size is infinite is zero. In addition, a method was established to calculate the expected value of the sample size. The higher order moments of the random sample size are not easy to calculate. However, it has been demonstrated that these moments are finite. Recent literature about sequential tests does not provide new theoretical elements for calculating these moments. The most recent articles on this subject show various applications in several areas of knowledge, but none of them calculate, for instance, the variability of the random sample size. In this lecture, we present a way to simulate the probability functions of the sample size, allowing us to decide, in a specific case, what would be most convenient for a study: whether to work with a sequential test or with a hypothesis test with a fixed sample size.
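
The simulation idea can be sketched as follows (an assumed normal-mean setting with Wald's SPRT boundaries log B < 0 < log A): replicating the test yields the empirical distribution of the random sample size N, and hence its moments.

```python
# Simulation sketch (assumed setting): Wald's SPRT for a normal mean,
# H0: mu = 0 vs H1: mu = 1. Repeating the test many times gives the
# empirical distribution of the random sample size N.
import numpy as np

rng = np.random.default_rng(11)
a, b = 0.05, 0.10                         # target type I / type II errors
logA, logB = np.log((1 - b) / a), np.log(b / (1 - a))
mu_true = 1.0
sizes = []
for _ in range(10_000):
    llr, n = 0.0, 0
    while logB < llr < logA:
        x = rng.normal(mu_true, 1.0)
        llr += x - 0.5                    # log LR increment: N(1,1) vs N(0,1)
        n += 1
    sizes.append(n)
sizes = np.array(sizes)
print(f"E[N] ~ {sizes.mean():.2f}, Var[N] ~ {sizes.var():.2f}")
```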


Distribution-free tests of inequality constraints on conditional moments

Miguel A. Delgado
Universidad Carlos III de Madrid.
(Juan C. Escanciano, Indiana University).

Abstract
We present a methodological approach for testing inequality constraints on
conditional moments. The null hypothesis of an inequality restriction is equiv-
alently expressed as an equality using the least concave majorant operator
applied to the integrated conditional moments. A suitable time transformation
of the basic process renders an asymptotic distribution-free test, with critical
values that can be easily tabulated. Monte Carlo experiments provide evidence
of the satisfactory finite sample performance of the proposed test.
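
The key operator, the least concave majorant over a grid, can be computed as an upper convex hull; the sketch below is an illustrative implementation applied to a synthetic process, not the authors' code.

```python
# Sketch: least concave majorant (LCM) of a process v over a grid t, computed
# as the upper convex hull of the points (t_i, v_i). The gap LCM(v) - v is
# zero everywhere iff v is already concave.
import numpy as np

def lcm(t, v):
    hull = []                                   # indices of upper-hull vertices
    for i in range(len(t)):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            # drop k if it lies on or below the chord from j to i
            if (v[k] - v[j]) * (t[i] - t[j]) <= (v[i] - v[j]) * (t[k] - t[j]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(t, t[hull], v[hull])       # piecewise-linear majorant

t = np.linspace(0, 1, 200)
v = np.cumsum(np.random.default_rng(2).standard_normal(200)) / 15
gap = lcm(t, v) - v                             # nonnegative by construction
```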

Compound Poisson distribution approximation via semi-nonparametric distributions

Norman Giraldo
Universidad Nacional de Colombia.

Abstract
The semi-nonparametric density estimator, or SNP, introduced by Gallant and Nychka (1987), is a density estimator based on a family of functions (f_n(x, d), n = 1, 2, \ldots), where each f_n(x, d) is a density function defined through a truncated Hermite polynomial expansion, with a coefficient vector (parameters) d = (d_0, \ldots, d_{p_n}), given by

f_n(x, d) = \Big( \sum_{i=0}^{p_n} d_i H_i(x) \Big)^2 e^{-x^2/2} + \epsilon_0\, n(x), \quad d \in \Theta_n,    (4.3)

where n(x) is the standard normal density function, (p_n, n \geq 1) is a non-decreasing sequence of positive integers, \epsilon_0 > 0 is an arbitrary, small, real number and \Theta_n = \{d : \int f_n(x, d)\,dx = 1\}. Given a sample of size m, \{X_1, \ldots, X_m\}, from a continuous random variable X with density f(x), its SNP estimator is defined as \hat{f} = f_n(x, \hat{d}), where \hat{d} is the argument which maximizes the quasi-maximum-likelihood (1/m) \sum_{t=1}^{m} \log( f_n((X_t - \hat{\mu})/\hat{\sigma}, d)/\hat{\sigma} ), n \geq 1; the values \hat{\mu}, \hat{\sigma}, being the mean and standard deviation respectively, are directly estimated from the sample. Properties and new applications of the SNP distributions, away from their use as density estimators, as used in this presentation, can be consulted in León, Mencía and Sentana (2005). This presentation develops another SNP application, this time to the problem of efficiently and accurately approximating the compound Poisson distribution. If \{X_1, X_2, \ldots\} is a sequence of iid positive, continuous random variables with common distribution F(x), with F(0+) = 0, and N \sim Poisson(\lambda) is a Poisson distributed random variable with parameter \lambda, assumed independent of the X_j's, then we define the random variable S = \sum_{j=1}^{N} X_j, with S = 0 if N = 0, named compound Poisson and denoted S \sim PC(\lambda, F(x)). Its distribution function is

F_S(x) = P(S \leq x) = \sum_{j=0}^{\infty} p_j\, P\Big( \sum_{k=1}^{j} X_k \leq x \Big), \quad x \geq 0,

with \sum_{k=1}^{0} X_k = 0 and p_j = P(N = j). If we write F^{*j}(x) := P( \sum_{k=1}^{j} X_k \leq x ), the j-th convolution of F(x) with itself, then F_S(x) = \sum_{j=0}^{\infty} p_j F^{*j}(x), with F^{*0}(x) \equiv 1. The search for approximate methods for F_S(x) has been a longstanding problem in actuarial science, with many proposed solutions, and it is still an active source of research. We now assume that the distribution F(x) depends on m parameters (\theta_1, \ldots, \theta_m); then S \sim PC(\lambda, F(x; \theta_1, \ldots, \theta_m)). The contribution of this presentation is an approximation method for F_S(x) using a semi-nonparametric SNP distribution with p_m parameters (d_1, \ldots, d_{p_m}), obtained by a procedure akin to the classical moment estimation method. If S \sim PC(\lambda, F(x; \theta_1, \ldots, \theta_m)) and Y \sim SNP(d_1, \ldots, d_{p_m}), then the moments E(S^j), E(Y^k), j = 1, 2, \ldots, m, k = 1, 2, \ldots, p_m, can be expressed by closed algebraic expressions. Assuming p_m = m + 1, we solve by numerical methods the non-linear system

E( ((S - \mu_S)/\sigma_S)^j ) = E(Y^j), \quad j = 1, 2, \ldots, m + 1,    (4.4)

with unknowns d_1, \ldots, d_{m+1}, using the optimization procedures supported by the R software system. We provide a comparison between the approximation resulting from the proposed method and other well-known approximation procedures, such as Bowers' (1966) gamma approximation, the Normal-Power (NP) method in Beard et al. (1984), and the recursive method of Panjer (Panjer and Willmot, 1996). These methods are conveniently implemented with the help of the R actuar library, Goulet (2005). The quality of the approximations is examined following the same strategies implemented in Gendron and Crépeau (1989) and Chaubey, Garrido and Trudeau (1998), where an inverse Gaussian distribution was assumed for the X variable. Then, the empirical cumulative distribution from simulation of large samples of the exact distribution of S is compared with the approximations provided by the methods using Cramér-von Mises statistics. We show significant improvements for the proposed method over the others. This presentation contains results previously obtained in Velásquez (2009).
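
A companion sketch of the validation strategy just described (assumptions: inverse Gaussian severities as above, and a simple two-moment gamma reference in place of the SNP fit, whose closed-form moments are not reproduced here).

```python
# Sketch: simulate the exact compound Poisson distribution of S and measure
# the fit of an approximating cdf with a Cramer-von Mises-type statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam, mu = 20.0, 1.0                      # Poisson rate; inverse Gaussian mean
N = rng.poisson(lam, 20_000)
S = np.array([stats.invgauss.rvs(mu, size=n, random_state=rng).sum() for n in N])

# Two-moment gamma reference (in the spirit of Bowers' approximation).
EX = stats.invgauss.moment(1, mu)        # E X
EX2 = stats.invgauss.moment(2, mu)       # E X^2
mS, vS = lam * EX, lam * EX2             # compound Poisson mean and variance
F = stats.gamma(mS**2 / vS, scale=vS / mS).cdf

s = np.sort(S)
n = s.size
u = F(s)                                 # increasing, since s is sorted
W2 = 1 / (12 * n) + np.sum(((2 * np.arange(1, n + 1) - 1) / (2 * n) - u) ** 2)
print(f"Cramer-von Mises distance to the gamma approximation: {W2:.2f}")
```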

An adaptive runs test to identify Markovian dependence

Myrian Elena Vergara Morales

Universidad de la Salle and Universidad Nacional de
Colombia.
(Jimmy Antonio Corzo Salamanca, Universidad
Nacional de Colombia).

Abstract
We propose an adaptive runs test to identify r-th order Markovian dependence in a Bernoulli sequence, constructed from two runs tests: one of them dependent on the number of ones and the other independent of it. We give explicit expressions for the distribution of the test statistics and for the power of the tests based on these statistics. We calculate the power of the two tests separately and of the adaptive test, and we note that the adaptive test is more powerful than the two tests used separately, especially when the success probability is around 0.5 and when the sequence contains too many or too few successes.
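
A Monte Carlo sketch (illustrative setting and persistence parameterization) of the power of a single runs test against first-order Markov dependence follows; the adaptive combination of two such tests is not reproduced here.

```python
# Sketch: generate a binary Markov chain with persistence rho, count runs R,
# and reject when the Wald-Wolfowitz z-score (conditional on the number of
# ones) is too small; fewer runs indicate positive dependence.
import numpy as np

rng = np.random.default_rng(9)

def z_runs(x):
    n1 = int(x.sum()); n2 = x.size - n1
    if n1 == 0 or n2 == 0:
        return 0.0
    n = n1 + n2
    R = 1 + int(np.sum(x[1:] != x[:-1]))                  # number of runs
    mu = 1 + 2 * n1 * n2 / n
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return (R - mu) / np.sqrt(var)

def markov_chain(T, p=0.5, rho=0.3):
    x = np.empty(T, dtype=int); x[0] = rng.random() < p
    for t in range(1, T):
        # stay prob = rho + (1-rho)*marginal prob (rho = 0 gives iid draws)
        stay = rho + (1 - rho) * (p if x[t - 1] else 1 - p)
        x[t] = x[t - 1] if rng.random() < stay else 1 - x[t - 1]
    return x

power = np.mean([z_runs(markov_chain(100)) < -1.645 for _ in range(5000)])
print(f"estimated power against positive dependence: {power:.2f}")
```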


Contributed talks 8

Continuous state branching processes in a Brownian motion

Sandra Palau Calderón

Centro de Investigación en Matemáticas A.C.
(Juan Carlos Pardo Millán, Centro de Investigación
en Matemáticas A.C.).

Abstract
In this talk, we introduce a continuous state branching process in a Brownian environment. The present model generalizes the recent paper by Boinghoff and Hutzenthaler, in which they studied the case when the continuous state branching process is the Feller diffusion. In particular, we study different aspects of this type of process, such as the probability of extinction and its version conditioned on survival. Special attention is given to the self-similar case.

References
1. Boinghoff, C. and Hutzenthaler, M. (2012). Branching diffusions in random environment. Markov Processes and Related Fields, 18, 269-310.

2. Fu, Z. and Li, Z. (2010). Stochastic equations of non-negative processes with jumps. Stochastic Processes and Their Applications, 120, 306-330.

3. Hutzenthaler, M. (2011). Supercritical branching diffusions in random environment. Electronic Communications in Probability, 16, 781-791.


A time reversal duality for branching processes and applications

Miraine Dávila Felipe

Université Pierre et Marie Curie & Collège de
France, Paris.
(Amaury Lambert, Université Pierre et Marie Curie
& Collège de France, Paris).

Abstract
We consider a random forest F, defined as a sequence of i.i.d. splitting trees, each started at time 0 from a single ancestor (with a specific distribution, different from that of its descendants), stopped at the first tree having survived up to a fixed time T. We denote by (\xi_t, 0 \leq t \leq T) the population size process associated to this forest, and we prove that if the splitting trees are supercritical, then the time-reversed process (\xi_{T-t}, 0 \leq t \leq T) has the same distribution as (\hat{\xi}_t, 0 \leq t \leq T), the corresponding width process of an equally defined forest \hat{F}, but where the underlying splitting trees are obtained by conditioning on ultimate extinction, and are then subcritical. The results are based on an identity in law between the contour processes of these random forests, truncated up to T, and the duality property of Lévy processes. This identity will have some useful applications in the context of epidemiology, since we will be able to characterize the population size process conditional on the coalescence times of individuals at T.

On the distribution of symbols in random weighted staircase tableaux

Pawel Hitczenko
Drexel University.
(Amanda Parshall, Drexel University).

Abstract
In this paper, we study staircase tableaux, a combinatorial object introduced
due to its connections with the partially asymmetric exclusion process (PASEP)
and Askey-Wilson polynomials. Due to their interesting connections, staircase


tableaux have been the object of study in several recent papers. In particular,
the distribution of various parameters in random staircase tableaux has been
studied. There have been interesting results on parameters along the main diag-
onal; however, no such results have appeared for other diagonals. It was conjec-
tured that the distribution of symbols along the kth diagonal is asymptotically
Poisson as k and the size of the tableau tend to infinity. We partially prove this
conjecture and, more specifically, we prove it for the second main diagonal.

Minimal clade size of beta-coalescent trees

Arno Siri-Jégousse

Universidad de Guanajuato.
(Fabian Freund, University of Hohenheim; Linglong
Yuan, Uppsala University).

Abstract
We aim to study the asymptotics of distributions of various functionals of the Beta(2 - \alpha, \alpha) n-coalescent with 1 < \alpha < 2 when n goes to infinity. The Beta n-coalescent is a Markov process taking values in the set of partitions of \{1, \ldots, n\}, evolving from the initial value \{\{1\}, \ldots, \{n\}\} by merging (coalescing) blocks together and finally reaching the absorbing state \{1, \ldots, n\}. The minimal clade of 1 is the block which contains 1 at the time of coalescence of the singleton \{1\}. The limit size of the minimal clade of 1 is provided. We express it as a function of the coalescence time of \{1\} and the sizes of blocks at that time. The case \alpha = 1 is treated apart, using a nice construction of the Bolthausen-Sznitman coalescent by means of random recursive trees and results on the Chinese Restaurant process.


Contributed talks 9

Simultaneous confidence bands for the estimation of the mean discounted warranty cost for coherent systems

Carlos Mario Lopera

Universidad Nacional de Colombia.
(Nelfi González Álvarez, Universidad Nacional de
Colombia).

Abstract
The selection of a warranty program for a new product on the market generates additional costs to the manufacturer other than those inherent to the manufacturing process. This makes it necessary to establish warranty costs for a given period of time; thus, the manufacturer can estimate the required level of reserves to deal with future warranty claims. In particular, we consider the so-called discounted warranty costs. The models developed for these kinds of costs incorporate the age of the product at the time of the warranty claim, and they can be studied through the stochastic process known as the General Lifetime Model. In practice, most products are systems consisting of several components. When the product or system is repairable and maintenance actions involving costs are made on the components, it is interesting to model the impact of such actions on the system warranty costs. One of the main appeals of the General Lifetime Model is that it can evaluate the evolution of the system under the so-called physical approach, which allows one to model the failure process of the system or product through time, given different levels of information; in particular, it allows one to model the failure rate process, which is the most important aspect of these models. Thus, the main difference between the classical reliability model (known as the statistical approach) and the physical approach is the level of information: while the latter observes the failure process at the level of the components, in the former only the system failure is observed. This differentiates the failure process from one approach to the other, due to the fact that the associated failure rate processes change: the failure rate in the statistical approach is a deterministic function, while the failure rate in the physical approach is a stochastic process.


Statistical test for a hidden Markov model for nucleotide distribution in bacterial DNA

Marcelo Sobottka
Universidade Federal de Santa Catarina.

Abstract
In this work, we present parameter estimators for a hidden-Markov based model
for the distributional structure of nucleotides in bacterial DNA sequences. Such
a model supposes that the gross structure of bacterial DNA sequences can be
derived from uniformly distributed mutations of some primitive DNA which
is constructed following a ten-parameter Markov process [1]. The proposed
estimators can be used to construct a statistical test which indicates whether a
given DNA sequence can be simulated by the model. This is a joint work under-
taken with A. G. Hart (Centro de Modelamiento Matemático, Universidad de
Chile) and M. Weber Mendonça (Universidade Federal de Santa Catarina). M.
Sobottka was supported by CNPq-Brazil grant 455399/2011-5 and by CAPES-
Brazil Fellowship.

References
1. Sobottka, M. and Hart, A. G. (2011). A model capturing novel strand sym-
metries in bacterial DNA. Biochemical and Biophysical Research Commu-
nications, 410(4), 823-828.


Weighted M-estimators in nonlinear regression for complete data and with missing responses at random

Paula M. Spano
Universidad de Buenos Aires and CONICET.
(Ana M. Bianco, Universidad de Buenos Aires and
CONICET).

Abstract
The main objective of this work is to develop simultaneous confidence bands
for the mean of the discounted warranty cost for coherent systems under
physical minimum repair, i.e., when the system is observed at the level of its
components, using computer-intensive methods based on resampling. In doing so,
based on the theoretical framework of martingale processes and the central
limit resampling theorem (CLRT), we prove that the conditions of the latter hold
for the discounted warranty cost processes. A Monte Carlo simulation study to
evaluate the finite sample performance of the proposed method is performed
through the achieved coverage probability. The results in the considered
scenarios show that the confidence bands based on resampling have coverage
probabilities close to the expected values, in particular those based on sample
sizes with more than 100 systems.


Kullback-Leibler procedure versus the ignoring mode of failure model for approximating a Weibull to an independent Weibull Competing Risks Model

Sergio Yáñez
Universidad Nacional de Colombia.
(Luis A. Escobar, Louisiana State University, Nelfi
González, Universidad Nacional de Colombia).

Abstract
The purpose of this study is to find a Weibull approximation to the distribution
of the minimum for a competing risks model with two independent Weibull
failure modes. The maximum likelihood Weibull fit ignoring the mode of fail-
ure information is called the ignoring mode of failure model (IG). We show
that for large samples and complete data, the ignoring mode of failure model is
equivalent to the best Kullback-Leibler Weibull approximation.
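
As an editorial illustration of this setup (not the authors' code), a minimal Python sketch: simulate the minimum of two independent Weibull failure modes and fit a single Weibull by maximum likelihood, ignoring the mode-of-failure labels; the shape and scale values are assumptions.

import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(42)
n = 5000
t1 = weibull_min.rvs(c=1.5, scale=100.0, size=n, random_state=rng)  # failure mode 1
t2 = weibull_min.rvs(c=2.5, scale=150.0, size=n, random_state=rng)  # failure mode 2
t = np.minimum(t1, t2)        # observed lifetime under competing risks

# ML Weibull fit to the pooled lifetimes (the "IG" fit), location fixed at 0
shape, loc, scale = weibull_min.fit(t, floc=0)
print(f"fitted shape = {shape:.3f}, scale = {scale:.3f}")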

Contributed talks 10

PCA and PRIM

Daniel Andrés Díaz Pachón


University of Miami.
(Sunil Rao, University of Miami, Jean-Eudes Dazard,
Case Western Reserve University).

Abstract
Principal Components Analysis (PCA) is a widely used technique that proves
useful for dimension reduction and characterization of variability in multivari-
ate populations. Our interest lies in studying when and why PCA can be used
to effectively model a response-predictor set relationship. Specifically, take Z
to be a continuous random variable such that its support traverses the origin
of a p-dimensional continuous space E. Let Y be a p-dimensional continuous
random vector in E such that the supports of each component of Y traverse the
origin of E, where Y also satisfies the property that its p components are pair-
wise orthogonal. Select uniformly in E any vector X of p continuous random
variables traversing the origin. We prove that Y explains Z better than X in
terms of the correlation. In particular, we prove that the principal components
explain a response variable better than the original input variables. This has
important consequences for modeling data in high dimensions. We illustrate
this result using PRIM, a bump-hunting algorithm used to identify and charac-
terize modal subgroups in populations. We study the empirical performance of
our findings via simulations that mimic high dimensional applications. This is
a joint work undertaken with J. Sunil Rao of the University of Miami and Jean-
Eudes Dazard of Case Western Reserve University.

Bootstrap-based uncertainty measures for empirical best predictors in generalized linear mixed models

Daniel Flores Agreda


Research Center for Statistics / Geneva School of
Economics and Management / Université de Genève.

Abstract
In this talk, we focus on the problem of uncertainty estimation in prediction for
random effects in mixed models. In a first stage, we review the evaluation and
estimation of the Mean Squared Error (MSE) of the Empirical Predictor based
on second-order correct approximations in the spirit of Prasad & Rao (1990),
Das et al. (2004) and Jiang (2003), among others. Resampling procedures,
especially the Empirical Bootstrap, provide an attractive way of estimating the MSE by
either computing it directly or by providing some bias correction in conjunc-
tion with the approximation-based approach. We explore bootstrap schemes in
mixed models for hierarchical data and propose a non-parametric algorithm
for estimating the MSE of the Empirical Best Predictors of the Random Effects,
based on the Generalized Bootstrap for Estimating Equations (Chatterjee &
Bose, 2005) adapted for Gaussian GLMMs by Field et al. (2010) and Samanta
& Welsh (2013). We apply this procedure in the context of Generalized Linear
Mixed Models and the Empirical Best Predictor (Jiang & Lahiri, 2006). Finally,
we illustrate the properties of our proposal with simulation studies. Joint work
with Eva Cantoni.

References
1. Chatterjee, S. and Bose, A. (2005). Generalized bootstrap for estimating
equations. The Annals of Statistics, 33(1), 414-436.

2. Das, K., Jiang, J., and Rao, J. N. K. (2004). Mean squared error of empirical
predictor. The Annals of Statistics, 32(2), 818-840.

3. Field, C. A., Pang, Z., and Welsh, A. H. (2010). Bootstrapping robust esti-
mates for clustered data. Journal of the American Statistical Association,
105(492), 1606-1616.

4. Jiang, J. (2003). Empirical best prediction for small-area inference based on generalized linear mixed models. Journal of Statistical Planning and Inference.

5. Jiang, J., and Lahiri, P. (2006). Mixed model prediction and small area esti-
mation. Test, 15(1), 1-96.

6. Prasad, N. G. N. and Rao, J. N. K. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409), 163-171.

7. Samanta, M. and Welsh, A. H. (2013). Bootstrapping for highly unbalanced clustered data. Computational Statistics and Data Analysis, 59(Mar.), 70-81.


Analysis of 2^(k-p) experiments with beta response using a doubly restricted model

Luis Fernando Grajales


Universidad Nacional de Colombia.
(Luis Alberto López, Universidad Nacional de
Colombia, Óscar Orlando Melo, Universidad
Nacional de Colombia).

Abstract
Some fractional factorial experiments include responses in (0,1); their analysis
requires considering linear restrictions on the parameters because the models
are supersaturated, i.e., we have more parameters than observations. In order to
solve this problem, a doubly restricted beta regression model is proposed, in
which both the mean and dispersion parameters of the distribution are modeled
and restricted simultaneously. A penalty function with Lagrange multipliers is
proposed in order to obtain the maximum likelihood estimates of the parameters.
The likelihood ratio and Wald tests are considered as alternatives for testing
hypotheses about the parameters; additionally, nested models are compared. We
consider a measure of goodness of fit. A simulated example and data from
2^(k-p) experiments are analyzed. Also, we compare the results obtained here
with Bayesian and frequentist unrestricted estimations reported in previous papers.

A general class of zero-or-one inflated logit-skew-normal models

Germn Moreno
Universidad Industrial de Santander.
(Guillermo Martínez-Flórez; Heleno Bolfarine).

Abstract
This paper proposes a general class of regression models for continuous
proportions when the data are inflated with zeros and/or ones. The proposed
models assume that the response variable has a mixed continuous-discrete
distribution, with covariates in both the discrete and continuous parts of the
model. As revealed by real data applications, the models investigated seem to be
a valid alternative for modeling proportions and rates inflated with zeros or ones.

Contributed talks 11

Jump-diffusion approximation of
density dependent Markov chains
in domains with boundaries

Enrico Bibbona
Department of Mathematics G. Peano, University
of Torino.
(Alessio Angius, Gianfranco Balbo, Marco Beccuti,
Andras Horvath, Department of Computer Science,
University of Torino, Roberta Sirovich, Laura
Sacerdote, Department of Mathematics G. Peano,
University of Torino).

Abstract
Density dependent Markov chains are widely used to model many different
phenomena in population dynamics, chemical reactions, and epidemics. It is
well known, mainly because of the work of Kurtz, that such processes can be
approximated by ordinary differential equations (ODEs) when their indexing
parameter grows large. Important phenomena that cannot be revealed with
such approaches include heavy tailed or bi-modal population distributions. A
better approximation, proposed again by Kurtz, is through diffusion processes.
However, such an approximation does not naturally encode the presence of
boundaries of the state space. We show how this problem can be relevant
in some concrete examples, and we propose a jump-diffusion approximation
that has the same law as the approximating diffusion as long as it remains in
the interior of the state space, but includes jumps at the boundary that mimic
the original Markov chain and allow the behavior at the boundary to be captured
as well. The same approach can also be applied to the simulation of hybrid
models with different time scales.
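
A minimal sketch of the boundary issue discussed above, for an assumed SIS-type density dependent chain (illustrative parameters, not the authors' examples): the plain Kurtz diffusion approximation can leave the state space [0, 1].

import numpy as np

rng = np.random.default_rng(0)
N, beta, gamma = 200, 1.5, 1.0          # indexing parameter and rates (assumptions)
dt, T, n_paths = 0.01, 50.0, 1000
steps = int(T / dt)

x = np.full(n_paths, 0.05)              # start near the boundary
exited = np.zeros(n_paths, dtype=bool)
for _ in range(steps):
    drift = beta * x * (1 - x) - gamma * x
    var = np.maximum((beta * x * (1 - x) + gamma * x) / N, 0.0)
    step = drift * dt + np.sqrt(var * dt) * rng.standard_normal(n_paths)
    x = np.where(exited, x, x + step)   # freeze a path once it has exited
    exited |= (x < 0.0) | (x > 1.0)
print("fraction of paths leaving [0, 1]:", exited.mean())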


Inter-temporal equilibrium with state dependent utilities and heterogeneous agents

Jaime A. Londoño
Universidad Nacional de Colombia.

Abstract
I extend results given in J. A. Londoño, A Sensitive Inter-temporal Equilibrium
for Relative Well-Being, working paper (2013), characterizing inter-temporal
equilibrium when incomplete markets and markets with arbitrage opportunities
are assumed, and when heterogeneous agents maximize a state dependent utility
functional, as proposed in J. A. Londoño, State Dependent Utilities and Incomplete
Markets, Mathematical Problems in Engineering, 2013: 1-8 (2013). The
maximization problem is an optimization problem defined on a class of portfolios
that are not arbitrage opportunities, although the market itself allows arbitrages.
Also, we prove that any equilibrium market arising from the previous
considerations satisfies a weak form of absence of arbitrage, and we provide tools
for the construction of equilibrium markets when the aggregate endowments and
dividends are exogenously given. The theoretical framework used is a
generalization of markets where the processes are Brownian flows on manifolds.

Analytical approximations for pricing financial options in jump diffusion models using Mellin transforms

John Freddy Moreno Trujillo


Universidad Externado de Colombia.

Abstract
The definition and properties of the Mellin transform, together with their
applications in determining the density of algebraic combinations of random
variables, naturally lead to its application to the valuation of exotic financial
options. As a particular case, we consider the application of this transform when
the underlying asset of an arithmetic Asian option follows a jump-diffusion
process of the form

S(t) = S(0) exp( (r − q − λk − σ²/2) t + σ W(t) ) ∏_{i=1}^{N(t)} Y_i.

The Black-Scholes partial differential equation is extended to this case, and an
approach to its solution by applying the Mellin transform is considered.
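
A minimal simulation sketch of the dynamics in the reconstructed formula above; the lognormal jump law and all parameter values are editorial assumptions, not taken from the talk.

import numpy as np

rng = np.random.default_rng(1)
S0, r, q, sigma = 100.0, 0.05, 0.0, 0.2
lam, mu_j, sig_j = 0.5, -0.1, 0.15           # jump intensity, lognormal log-jumps
k = np.exp(mu_j + 0.5 * sig_j**2) - 1        # k = E[Y] - 1
T, n = 1.0, 252
dt = T / n

logS = np.full(n + 1, np.log(S0))
for i in range(1, n + 1):
    dW = np.sqrt(dt) * rng.standard_normal()
    n_jumps = rng.poisson(lam * dt)
    jumps = rng.normal(mu_j, sig_j, size=n_jumps).sum()   # sum of log jump sizes
    logS[i] = logS[i - 1] + (r - q - lam * k - 0.5 * sigma**2) * dt + sigma * dW + jumps
print("terminal price:", np.exp(logS[-1]))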

Utility maximization in pure-jump models with nonlinear wealth dynamics driven by marked point processes

Rafael Serrano
Universidad del Rosario.

Abstract
We study the martingale approach to maximization of expected utility from
consumption and terminal wealth in a pure-jump model driven by marked
point processes and in the presence of margin requirements such as different
interest rates for borrowing and lending and risk premiums for short positions.
This is modeled by adding a margin payment function into the investor's
wealth equation, which is nonlinear with respect to the portfolio process. We
give sufficient conditions for the existence of optimal policies using martingale
and convex duality techniques. Closed-form solutions for the optimal value
function are found in the case of pure-jump models with Markov-modulated
jump-size distributions and agents with logarithmic utility.


Contributed talks 12

A new robust regression model for proportions

Cristian Bayes
Departamento de Ciencias, Pontificia Universidad
Católica del Perú.
(Jorge L. Bazán, Universidade de São Paulo,
Catalina García, Universidad de Granada).

Abstract
A new regression model for proportions is presented by considering the Beta
rectangular distribution proposed by Hahn (2008). This new model includes
the Beta regression model introduced by Ferrari and Cribari-Neto (2004) and
the variable dispersion Beta regression model introduced by Smithson and
Verkuilen (2006) as particular cases. Like Branscum, Johnson and Thurmond
(2007), a Bayesian inference approach is adopted using Markov Chain Monte
Carlo (MCMC) algorithms. Simulation studies on the influence of outliers by
considering contaminated data under four perturbation patterns to generate
outliers were carried out. These confirmed that the Beta rectangular regression
model seems to be a new robust alternative for modeling proportion data and
that the Beta regression model shows sensitivity in the estimation of regression
coefficients, in the posterior distribution of all parameters, and in the model
comparison criteria considered.

Furthermore, two applications are presented to illustrate the robustness of the Beta rectangular model.

Keywords: Proportions, Beta regression, Bayesian estimation, link function, MCMC.
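
A minimal sketch of the Beta rectangular density read as a beta-uniform mixture under the mean-precision parameterization; this simplified form is an editorial assumption, not necessarily the exact parameterization of Hahn (2008) used by the authors.

import numpy as np
from scipy.stats import beta

def beta_rectangular_pdf(y, mu, phi, theta):
    # theta = mixing weight on the uniform ("rectangular") component
    y = np.asarray(y, dtype=float)
    return theta + (1.0 - theta) * beta.pdf(y, mu * phi, (1.0 - mu) * phi)

print(beta_rectangular_pdf(np.linspace(0.01, 0.99, 5), mu=0.3, phi=10.0, theta=0.2))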

References
1. Branscum, A. J., Johnson, W. O. and Thurmond, M. C. (2007). Bayesian beta regression: application to household data and genetic distance between foot-and-mouth disease viruses. Australian & New Zealand Journal of Statistics, 49(3), 287-301.


2. Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31, 799-815.

3. Hahn, E. D. (2008). Mixture densities for project management activity times: A robust approach to PERT. European Journal of Operational Research, 188, 450-459.

4. Smithson, M. and Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54-71.

Does compressed sensing have applications in robust statistics?

Salvador Flores
CMM, Universidad de Chile.
(Luis Briceño, Universidad Federico Santa María).

Abstract
Over the last few years, there has been a lot of excitement around the advances
in the reconstruction of sparse signals by ℓ1-norm minimization and its
applications to compressed sensing. The core mathematical problem behind this
is that of finding the sparsest solution to underdetermined systems. The great
bulk of the existing results are probabilistic, and can be loosely summarized as:
for Gaussian random matrices A, with very high probability the minimal
ℓ1-norm solution to the underdetermined system Ax = b is also the sparsest
one, provided that the latter is sparse enough. We shall discuss an application
of this theory proposed for the following variant of the robust linear regression
problem. Let y ∈ ℝⁿ be a vector containing observations from the linear model
y = Xf + ε, where ε = z + e is an error term composed of two contributions:
a dense, presumably small, vector of noise z, and an arbitrary sparse vector e
modeling outliers. We suppose that the design matrix X is under control and
not subject to contamination. The problem is to find an estimator of f with
provable error bounds independent of the magnitude of the sparse vector e. We
show that the results obtained by the aforementioned theory can be improved
in many directions by extending some results related to the regression
breakdown point of the classical ℓ1 estimator. Our results are based on sharp
error bounds for the solutions of the ℓ1 minimization problem min_g ‖y − Xg‖₁
when errors consist of noise and sparse outliers.
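
An editorial sketch of the ℓ1 fit itself (not the authors' bounds): least absolute deviations min_g ‖y − Xg‖₁ computed by iteratively reweighted least squares, which stays close to f despite sparse gross outliers.

import numpy as np

def lad_irls(X, y, n_iter=50, eps=1e-8):
    g = np.linalg.lstsq(X, y, rcond=None)[0]          # least squares start
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ g), eps)  # weights 1/|residual|
        Xw = X * w[:, None]
        g = np.linalg.solve(X.T @ Xw, X.T @ (w * y))  # weighted LS step
    return g

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
f = np.array([1.0, -2.0, 0.5])
y = X @ f + 0.1 * rng.standard_normal(200)
y[:10] += 50.0                                        # sparse gross outliers e
print(lad_irls(X, y))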


Robust estimators in additive models with missing responses

Alejandra Martínez
Universidad de Buenos Aires and CONICET.
(Graciela Boente, Universidad de Buenos Aires and
CONICET, Matías Salibián-Barrera, University of
British Columbia).

Abstract
As is well known, kernel estimators of the regression function in nonparametric
multivariate regression models suffer from the so-called curse of dimensionality,
which occurs because the number of observations lying in a neighborhood of
fixed radius decreases exponentially with the dimension. Additive models are
widely used to avoid the difficulty of estimating regression functions of several
covariates without using a parametric model. They generalize linear models,
are easily interpretable, and are not affected by the curse of dimensionality.
Different estimation procedures for these models have been proposed in the
literature, and some of them have also been extended to the situation when the
data may contain missing responses. It is easy to see that most of these esti-
mators can be unduly affected by a small proportion of atypical observations,
since they are based on local averages or local polynomials. For that reason,
robust procedures to estimate the components of an additive model are needed.
We consider robust estimators for additive models based on local polynomials
that can also be used on data sets with missing responses. These estimators
simultaneously avoid the curse of dimensionality and the sensitivity to atypical
observations. Our proposal is based on the method of marginal integration,
and adapted to the missing responses situation.


Robust and consistent variable selection in generalized linear and additive models

Marco Avella Medina


University of Geneva.
(Elvezio Ronchetti, University of Geneva).

Abstract
Generalized Linear Models (GLM) and Generalized Additive Models (GAM)
are popular statistical methods for modeling continuous and discrete data both
parametrically and nonparametrically. In this general framework, we consider
the problem of variable selection through penalized methods by focusing on
resistance issues in the presence of outlying data and other deviations from the
stochastic assumptions. We propose robust penalized M-estimators and study
their asymptotic properties. In particular, we show that robust counterparts
of the adaptive lasso and the nonnegative garrote satisfy the oracle properties.
Our results extend the available theory from linear models to GLM and GAM,
and from L2-based estimation to robust estimation. Finally, we illustrate the
finite sample performance of the method by a simulation study in a Poisson
regression setting.


Contributed talks 13

Invasion, coexistence and extinction in spatio-temporally heterogeneous environments

Alexandru Hening
University of Oxford.
(Steven N. Evans, UC Berkeley, Sebastian J.
Schreiber, UC Davis).

Abstract
A fundamental problem in ecology is to understand when it is possible for one
species to invade the range of another established species. There is widespread
empirical evidence that invasions can occur when there is significant heteroge-
neity in space and time in the range of the resident species. We propose a model
for the invasion process with a view to understanding what factors make inva-
sion possible. The model reduces to studying a coupled system of two stochastic
differential equations. By introducing the concept of invasion rate, we are able
to fully characterize the conditions on the coefficients of the SDEs under which
invasion is possible or impossible.

A wavelet-based multifractal approach for the analysis of satellite images

Orietta Nicolis
Universidad de Valparaíso.

Abstract
The description of natural phenomena through the analysis of statistical scaling
laws is always a popular topic. Many studies aim to identify the fractal feature
by estimating the self-similarity parameter H, considered constant at different
scales of observation. However, most real world data exhibit a multifractal
structure, that is, the self-similarity parameter varies erratically with time. The
multifractal spectrum provides an efficient tool for characterizing the scaling
and singularity structures in signals and images, proving useful in numerous
applications such as fluid dynamics, Internet network traffic, finance, image
analysis, texture synthesis, meteorology, and geophysics. In recent years, the
multifractal formalism has been implemented with wavelets. The advantages
of using the wavelet-based multifractal spectrum are: the availability of fast
algorithms for the wavelet transform, the locality of wavelet representations in
both time and scale, and the intrinsic dyadic self-similarity of basis functions. In
this work, we propose a robust wavelet-based multifractal spectrum estimator
for the analysis of satellite images. Finally, a simulation study and examples are
considered to test the performance of the estimator.
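
A minimal monofractal warm-up (an editorial sketch, far simpler than the authors' multifractal spectrum estimator): estimate a single self-similarity parameter H from the log2 energies of wavelet detail coefficients, using that for fBm the detail variance scales like 2^{j(2H+1)} across dyadic scales j.

import numpy as np
import pywt

def wavelet_hurst(x, wavelet="db4", levels=8):
    coeffs = pywt.wavedec(x, wavelet, level=levels)
    details = coeffs[1:]                              # coarsest to finest band
    j = np.arange(levels, 0, -1)                      # scale index per band
    log_energy = [np.log2(np.mean(d**2)) for d in details]
    slope = np.polyfit(j, log_energy, 1)[0]           # slope = 2H + 1
    return (slope - 1.0) / 2.0

x = np.cumsum(np.random.default_rng(3).standard_normal(2**14))  # Brownian path, H = 0.5
print(wavelet_hurst(x))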

Goodness-of-fit test for noisy directional data

Thanh Mai Pham Ngoc


Université Paris Sud.
(Claire Lacour, Université Paris Sud).

Abstract
We consider spherical data Xi noised by a random rotation εi ∈ SO(3), so that
only the sample Zi = εi Xi is observed. We define a nonparametric test procedure
to distinguish H0: "the density f of Xi is the uniform density f0 on the sphere"
from H1: "‖f − f0‖ ≥ C ψ_N and f is in a Sobolev space with smoothness s". For
a noise density with smoothness index ν, we show that an adaptive procedure
(i.e., one where s is not assumed to be known) cannot have a faster rate of
separation than a_d(s) = (N/√(log log N))^(−2s/(2s+2ν+1)), and we provide a
procedure which reaches this rate. We also deal with the case of supersmooth
noise. We illustrate the theory by implementing our test procedure for various
kinds of noise on SO(3) and by comparing it to other procedures. Applications
to real data in astrophysics and paleomagnetism are provided.


On probabilistic-stochastic
visual communication

Moshe Porat
Technion, Israel Institute of Technology.

Abstract
Color information plays a major role in visual communication although most
algorithms and tools are developed mainly for monochromatic image trans-
mission. Usually, the representation and coding of visual information is per-
formed either in the Red-Green-Blue (RGB) color space or in another color
space chosen rather arbitrarily. Considering an image as a two-dimensional
stochastic field, it is well known that the color components of natural images,
such as RGB, are highly correlated. In this work we explore the inter-color cor-
relation characteristics of natural images, their relation to the main statisti-
cal properties of the image and their joint behavior in major image spaces or
planes. Presently, most color coding algorithms tend to de-correlate the color
components as part of the coding and the transmission process. A widely
used example of the de-correlation approach is the baseline JPEG algorithm
for image compression (and consequently, the MPEG algorithm for video cod-
ing). This is done by applying color transforms to reduce the statistical correla-
tion and thus enabling low bit-rate chrominance encoding. However, having
analyzed these separate components as three monochromatic stochastic fields,
considerable resemblance is noticeable, implying that substantial mutual infor-
mation has remained and has not been exploited to reduce the size of the stored
or transmitted data. To improve the encoding, new analysis tools for the image
statistics are introduced in this work, taking advantage of the high inter-color
correlation to perform efficient encoding of the information. One approach is
to optimally reduce the inter-color correlation. Another approach is to use a
correlation-enhancement transform to increase the inter-color correlation of
an image prior to the encoding to allow efficient approximation of two of the
color components using the third color component as a reference. Our work
shows that exploitation of mutual spectral information, as proposed, can
improve the coding of chrominance information in color image compression.
The measure of Signal-to-Noise Ratio (SNR) is used to assess both quantita-
tive and subjective visual fidelity. Experimental results show that the proposed
approaches outperform presently available methods in the sense of SNR vs. bit-
rate. Our conclusion is that the high correlation between primary RGB colors
could be helpful for image coding as well as for video transmission, and that
a new spatio-temporal approach to image compression is more efficient than
conventional de-correlation-based techniques.
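
A minimal numerical sketch of the inter-color correlation the talk starts from, on a synthetic stand-in image (an assumption; natural RGB images behave similarly).

import numpy as np

rng = np.random.default_rng(4)
base = rng.standard_normal((64, 64))                  # shared spatial structure
img = np.stack([base + 0.3 * rng.standard_normal((64, 64)) for _ in range(3)], axis=-1)

channels = img.reshape(-1, 3).T                       # rows: R, G, B
print(np.corrcoef(channels))                          # strong off-diagonal correlation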


Contributed talks 14

Wiener integrals with respect to the Hermite random field and applications to the wave equation

Jorge Clarke de la Cerda


Universidad del Bío-Bío.
(Ciprian A. Tudor, Université de Sciences et
Technologie de Lille).

Abstract
The Hermite random field has been introduced as a limit of some weighted
Hermite variations of the fractional Brownian sheet. In this work, we define it
as a multiple integral with respect to the standard Brownian sheet and intro-
duce Wiener integrals with respect to it. As an application we study the wave
equation driven by the Hermite sheet. We prove the existence of the solution
and we study the regularity of its sample paths, the existence of the density and
of its local times.

Characterization of the support in Hölder norm of a wave equation in dimension three

Francisco Javier Delgado-Vences


Universitat de Barcelona.
(Marta Sanz-Solé, Universitat de Barcelona).

Abstract
We consider a non-linear stochastic wave equation driven by a Gaussian noise,
white in time and with a spatially stationary covariance. Results of Dalang and
Sanz-Solé (2009) show that the sample paths of the random field solution are
Hölder continuous, jointly in time and in space. In this lecture, we will establish
a characterization of the topological support of the law of the solution to this
equation in Hölder norm. This will follow from an approximation theorem, in
the topology of convergence in probability, for a sequence of evolution equations
driven by a family of regularizations of the driving noise.

A central limit theorem for the Euler characteristic of a Gaussian excursion set

José Rafael León


Universidad Central de Venezuela.
(Anne Estrade, Université Paris Descartes).

Abstract
We study the Euler characteristic of an excursion set of a stationary Gaussian
random field. Let X : ℝ^d → ℝ be a stationary isotropic Gaussian field having
trajectories in C²(ℝ^d). Let us fix a level u and consider the excursion set above
u, {t ∈ ℝ^d : X(t) ≥ u}. We take the restriction to a compact domain, considering,
for any bounded rectangle T ⊂ ℝ^d, A(T, u) = {t ∈ T : X(t) ≥ u}.

The aim of this paper is to establish a central limit theorem for the Euler
characteristic of A(T, u) as T grows to ℝ^d, as conjectured by R. Adler more than
ten years ago. The required assumption on X is stronger than Geman's one in
dimension one, but weaker than having C³ trajectories. Our result extends to
higher dimensions what is known in dimension one, since in that case the Euler
characteristic of A(T, u) equals the number of up-crossings of X at level u.
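
A minimal numerical illustration of the dimension-one remark above (an editorial sketch; the stationary Gaussian path is approximated by smoothed white noise).

import numpy as np

rng = np.random.default_rng(5)
noise = rng.standard_normal(10_000)
kernel = np.exp(-0.5 * np.linspace(-3, 3, 61) ** 2)
X = np.convolve(noise, kernel / kernel.sum(), mode="valid")  # smooth stationary path
X = (X - X.mean()) / X.std()

u = 1.0
up_crossings = np.sum((X[:-1] < u) & (X[1:] >= u))
print("Euler characteristic of the excursion set above u:", up_crossings)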


On the regenerative nature of the extremal particles of supercritical one-dimensional contact processes

Achillefs Tzioufas
Universidad de Buenos Aires.

Abstract
We study the behaviour of symmetric supercritical one-dimensional contact
processes on survival. We show the existence of random regenerative space-time
points on the trajectory of their extremal particles; the key to this is a short
proof of a result of Mountford and Sweet (2000) by means of a new, elementary
approach.

Contributed talks 15

Assessment of respondent driven sampling data from Guam health communication survey

Grazyna Badowski
University of Guam.
(Lilnabeth Somera, University of Guam, Hye-Ryeon
Lee, University of Hawaii).

Abstract
Respondent driven sampling (RDS) is a relatively new network sampling
technique typically employed for hard-to-reach populations (Heckathorn 1997,
2002). It is similar to snowball sampling, where initial seed respondents
recruit additional respondents from their network. The RDS mathematical
model is based on Markov chain theory. It suggests that if peer recruitment
occurs through a sufficiently large number of recruitment waves, the sample
will stabilize and reach an equilibrium distribution. The RDS model uses
information about the social network obtained during the recruitment process to
weight the sample. Under certain assumptions, the method promises to produce
a sample independent of the biases that may have been introduced by the
non-random choice of seeds from which recruitment began. We conducted a
survey on health communication in the general population of Guam using the
RDS method. In this paper, we investigate the performance of RDS as a Markov
chain by assessing all the assumptions and comparing estimates from the
RDS survey on health communication with population data from both the 2010
Guam Census and the 2012 Behavioral Risk Factor Surveillance System (BRFSS).
This study included RDS data collected on Guam in 2013 (n = 511) and 2012
BRFSS Guam data (n = 2031). The estimates were calculated first using the
unweighted RDS sample and then using RDS inference methods, and were
compared with known population characteristics. The RDS sample was largely
representative of the total population by sex, ethnicity, socioeconomic status and
geographic location, but it overrepresented young adults aged 18-34 and those
with some post-high-school education. Respondent-driven sampling statistical
inference methods failed to reduce these biases. Further study is needed in
deriving proper RDS statistical inference.
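
One common RDS inference method of the kind referred to above is the Volz-Heckathorn (RDS-II) estimator, which weights each respondent by the inverse of the reported network degree; a minimal sketch on synthetic data (assumptions, not the Guam survey data).

import numpy as np

rng = np.random.default_rng(6)
degree = rng.integers(1, 30, size=511)           # reported personal network sizes
young = rng.random(511) < 0.35                   # trait indicator (e.g., aged 18-34)

w = 1.0 / degree                                 # inverse-degree weights
rds2 = np.sum(w * young) / np.sum(w)             # weighted trait proportion
print("naive:", young.mean(), " RDS-II:", rds2)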

Cycle detection for microarray time series data

Isabel Llatas Salvador


Centro de Estadística y Matemáticas Aplicadas.
Universidad Simón Bolívar, Venezuela.


Empirical processes methods for functional data analysis

Adolfo J. Quiroz
Universidad de los Andes.
(Joaquín Ortega, Centro de Investigación en
Matemáticas, Guanajuato, México).

Abstract
The application of empirical processes methods in the context of the analysis
of functional data is considered. In particular, quadratic forms of dot products
of certain estimated functions with the functional data are studied as statistics
for the two-sample problem on functional data. Asymptotic distribution results
are given for the proposed statistics, and application examples are described in
connection with principal component analysis for functional data.

Asymptotic statistical analysis of stationary ergodic time series

Daniil Ryabko
INRIA Lille, France, and INRIA Chile.

Abstract
A fully non-parametric approach to asymptotic statistical analysis of stationary
ergodic time series is presented. The considered problems include time-series
clustering, hypothesis testing, change-point estimation, and independence
testing. The presented approach is based on empirical estimates of the
distributional distance. Main results include algorithms that are asymptotically
consistent under the only assumption that the time series in question are
stationary ergodic. No independence or mixing-type assumptions are involved.
While some results are new, a detailed exposition of others can be found in [1, 2].

References
1. Ryabko, D. (2012). Testing composite hypotheses about discrete ergodic processes. Test, 21(2), 317-329.
2. Ryabko, D. and Ryabko, B. (2010). Nonparametric statistical inference for ergodic processes. IEEE Transactions on Information Theory, 56(3), 1430-1435.


Contributed talks 16

Robust and efficient estimation of high-dimensional scatter matrices

Ricardo Maronna
National University of La Plata, Argentina
(Víctor Yohai, University of Buenos Aires and
CONICET, Argentina).

Abstract
We deal with the estimation of robust substitutes of the covariance matrix for
p-dimensional data. It is important that they possess both a high efficiency for
normal data and a high resistance to outliers; that is, a low bias under contami-
nation. The most frequently employed estimators are not quite satisfactory in
this respect. The Minimum Volume Ellipsoid (MVE) and Minimum Covari-
ance Determinant (MCD) estimators are known to have a very low efficiency.
S-estimators (Davies 1987) with a monotonic weight function like the bisquare
behave satisfactorily for small p, say p ≤ 10. Rocke (1996) showed that their
efficiency tends to one with increasing p. Unfortunately, this advantage is paid
for with a serious loss of robustness for large p. There are three families of
estimators with controllable efficiencies: non-monotonic S-estimators (Rocke
1996), MM-estimators (Tatsuoka and Tyler 2000) and tau-estimators (Lopuhaä
1991), but their behavior for large p has not been explored to date. We compare
their behaviors employing different loss functions. A simulation study suggests
that the MM-estimators with an adequate loss function outperform the other types.

References
1. Davies, P. L. (1987). Asymptotic behavior of S-estimates of multivariate
location parameters and dispersion matrices. Ann. Statist., 15, 1269-1292.

2. Lopuhaä, H. P. (1991). Multivariate tau-estimators for location and scatter. Canad. J. Statist., 19, 307-321.

3. Rocke, D. (1996). Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Statist., 24, 1327-1345.

4. Tatsuoka, K. S. and Tyler, D. E. (2000). On the uniqueness of S-functionals and M-functionals under nonelliptical distributions. Ann. Statist., 28, 1219-1243.


Isotropy tests for textured images

Frédéric Richard
Aix-Marseille Université, France.

Abstract
Texture is an image aspect which is essential for processing images. In this
talk, we will deal specifically with irregular images, and consider a texture as
an effect of the irregularity on the image appearance. In this context, we will
focus on the issue of testing the texture isotropy. Isotropy is one of the main
texture features, and is useful for the diagnosis or prognosis of diseases in
medicine. We will address the test issue considering the image as a realization
of a generalized fractional Brownian field.

Posters

Estimation and Local Influence in Gaussian Models with Partially Linear Covariance Structures

Acosta Salazar Jonathan


Universidad Técnica Federico Santa María.
(Vallejos Ronny, Universidad Técnica Federico
Santa María; Osorio Felipe, Universidad Técnica
Federico Santa María)

Abstract
In the context of spatial statistics, it is common to use the multivariate normal
distribution to model some phenomenon of interest. In this area there are no
replicates, and the covariance structure is of the form σ²I + τ²R(φ). This is a
particular case of a more general structure, known as a partially linear
covariance structure. One of the most common estimation methods for the
parameters σ², τ² and φ is the restricted maximum likelihood (REML) method.
This is based on multiplying the observation vector by a matrix of error
contrasts so that the mean of the new vector is zero. Moreover, Anderson (1973)
estimated the parameters of the multivariate normal distribution considering a
linear covariance structure with a sample of size N. It is natural to extend the
ideas of Anderson (1973) to the case of a partially linear structure, and to view
the covariances of spatial parametric models as particular cases when N = 1. On
the other hand, in diagnostic analysis, Cook (1986) developed a technique known
as local influence; this approach introduces small perturbations in the sample,
measures the Gaussian curvature, and estimates the parameters under such
perturbations in order to later measure the discrepancy between the estimates.
In spatial statistics, Haining (1994) suggested a diagnostic analysis by case
deletion. Genton and Ruiz-Gazen (2010) studied an application with real data
using infinitesimal perturbations. In this talk, we extend the estimation method
proposed in Anderson (1973) for Gaussian models with linear covariance
structures to the case of covariances with partially linear structures. We also
apply the methodology of local influence through an appropriate model with
infinitesimal perturbations, using the likelihood displacement as the measure of
distance between the models (with and without perturbation). We provide a tool
that allows the detection of influential observations. The proposed methodology
is illustrated via an application with real data.


Construction of the Family of Marshall-Olkin Copulas from Poisson Distributed Random Variables

Joan Jesús Amaya


Universidad Distrital Francisco José de Caldas
(Stefan Alberto Gómez Guevara, Universidad
Distrital Francisco José de Caldas,
David Andrés Paloma, Universidad Distrital
Francisco José de Caldas).

Abstract
We propose to show the construction of the family of bivariate Marshall-Olkin
copulas using the marginal functions of the joint survival distribution of Poisson
distributed random variables. Moreover, we simulate some copulas of that family,
generating the values of the random variables using the inversion method.
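
For comparison with the Poisson-based construction described above, a minimal sketch of the textbook exponential-shock route to the bivariate Marshall-Olkin copula (the shock intensities are assumptions).

import numpy as np

rng = np.random.default_rng(7)
l1, l2, l12 = 1.0, 2.0, 1.5                      # individual and common shock rates
n = 10_000
z1 = rng.exponential(1 / l1, n)
z2 = rng.exponential(1 / l2, n)
z12 = rng.exponential(1 / l12, n)                # common shock

x, y = np.minimum(z1, z12), np.minimum(z2, z12)  # component lifetimes
u = np.exp(-(l1 + l12) * x)                      # survival-function transforms:
v = np.exp(-(l2 + l12) * y)                      # (u, v) follows the M-O copula
print("sample correlation of (u, v):", np.corrcoef(u, v)[0, 1])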

Parameter estimation in the Ornstein-Uhlenbeck model with a noise generated by a long memory process

Héctor Araya
Universidad de Valparaíso.
(Soledad Torres, Universidad de Valparaíso).

Abstract
The main purpose of this work is to estimate the parameter associated with the
Ornstein-Uhlenbeck (OU) model with noise generated by a long memory
process. The OU process is defined as the solution of the linear stochastic
differential equation dX_t = −θ X_t dt + dW_t, where W_t is a Wiener process
or Brownian motion. The main change with respect to the Brownian motion OU
process is that the noise is replaced by a long memory non-Gaussian process. We
estimate the parameter θ by the least squares (LS) method, and we prove
consistency. Finally, we present a simulation study.
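
A minimal sketch of the least squares estimator for the drift parameter from discrete observations, simulated here with ordinary Brownian noise for simplicity (the poster's setting replaces it with long memory noise).

import numpy as np

rng = np.random.default_rng(8)
theta, dt, n = 2.0, 0.01, 50_000
X = np.zeros(n)
for i in range(1, n):                            # Euler scheme for dX = -theta X dt + dW
    X[i] = X[i - 1] - theta * X[i - 1] * dt + np.sqrt(dt) * rng.standard_normal()

dX = np.diff(X)
theta_hat = -np.sum(X[:-1] * dX) / (np.sum(X[:-1] ** 2) * dt)  # LS estimator
print("theta_hat:", theta_hat)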


Determinants for Vertical Transmission of HIV in the State of Pará, Brazil

Adrilayne dos Reis Araújo


Universidade Federal do Pará
(Franciely Farias da Cunha, Universidade Federal do
Pará, Gelilza Salazar Costa, Universidade Federal do
Pará)

Abstract
With the increasing number of HIV-infected women of reproductive age,
children have been considered a growing risk group for HIV infection, with a
remarkable increase in the incidence of children born already infected (vertical
transmission). Transmission of HIV can occur during labor; other cases occur
in utero, especially in the last weeks of pregnancy. The objective of this study is
to evaluate possible factors influencing cases of live births to mothers infected
with HIV through binary logistic regression, and to characterize the social
profile of these pregnant women in the state of Pará, Brazil. From the exploratory
data analysis, it is noteworthy that most of the women received a laboratory
diagnosis of HIV infection during the prenatal period, and the majority of
infected pregnant women had an emergency cesarean delivery. It was found that,
among children born alive, antiretroviral prophylaxis was initiated within 24
hours of birth. By binary logistic regression, we observed that pregnant women
with HIV who underwent prenatal care are nearly 4 times more likely to have
a child born alive compared to pregnant women who did not have prenatal care.
The parameter estimate for the prenatal care variable was significant (p = 0.05).
Thus, it is important that HIV infection be diagnosed early in pregnancy to allow
control of maternal infection through prenatal care and other treatments.

Keywords: Binary Logistic Regression, Born Alive, Seropositive Pregnant.


A robust censored errors-in-variables model

Reinaldo B. Arellano-Valle
Pontificia Universidad Católica de Chile
(Gustavo H.M.A. Rocha, Universidade Federal
de Minas Gerais, Belo Horizonte - MG, Brazil.,
Rosangela H. Loschi, Universidade Federal de
Minas Gerais, Belo Horizonte - MG, Brazil.).

Abstract
In this paper, we consider a non-standard linear regression model, where the
dependent variable is censored and some explanatory variables are measured
with additive errors. In addition, we build our statistical model on the assump-
tion of non-normality for the underlying probabilistic process. Specifically,
we assume that the joint distribution of the error terms and latent covariates
behave as a multivariate t distribution. Thus, the proposed model will be robust
enough to protect our inferences from atypical or influential observations. For the
estimation of the model parameters, we use the classical method of maximum
likelihood through the EM algorithm, in which we include an estimation
procedure for the asymptotic variance of the maximum likelihood estimators.
The proposed methodology is flexible enough to be adapted to other elliptical
models belonging to the scale mixture class of the normal model. The newly
developed procedures are illustrated with an application and simulated data.

A new classification method based on the Kullback-Leibler divergence

William David Aristizábal Rodríguez


Federal University of Pernambuco
(Getúlio José Amorim do Amaral, Federal
University of Pernambuco, Abraão David Costa do
Nascimento, Federal University of Pernambuco).

Abstract

Classification methods have been indicated as important tools for solving
practical problems in areas such as biology, radar image processing and
reliability. Two classical parametric and nonparametric procedures are the
Linear Discriminant Analysis (LDA) and K Nearest Neighbors (KNN) methods,
respectively. Both LDA and KNN have presented good performances in terms of
classification error rates. In general, the first is chosen in contexts where
interpretability issues must be taken into account, while the KNN method is
often more flexible than LDA because it does not suppose the existence of
a linear separation among the populations defined in the training stage. In this
paper, we propose a new semiparametric classification method based on the
Kullback-Leibler divergence. Our proposal is compared to the LDA and KNN
methods by means of a Monte Carlo simulation study. To that end, we use the
classification error rate combined with its coefficient of variation as a
comparison criterion. Results provide evidence that the proposed method can
yield smaller error rates than the classical LDA and KNN procedures and
some variations of these methods.

Keywords: Classification methods, Linear Discriminant Analysis, K Nearest Neighbors, Kullback-Leibler divergence, classification error rate.

Bias reduction of maximum likelihood estimates for an alternative skew-normal model

Jaime R. Arrue
Universidad de Antofagasta
(Reinaldo B. Arellano-Valle, Pontificia Universidad
Católica de Chile, Héctor W. Gómez, Universidad de
Antofagasta)

Abstract
The skew-generalized-normal model with parameters λ1 and λ2 ≥ 0, denoted
by SGN(λ1, λ2), corresponds to the skew-normal (SN) model for λ2 = 0. Hence,
several peculiarities of the SN model are preserved by the SGN one. In
particular, the Fisher information matrix is singular at λ1 = λ2 = 0, and the MLE
of λ1 can diverge in finite samples. If the additional parameter λ2 is fixed at a
known value, e.g., λ2 = 1, the SGN model becomes a natural competitor of the
SN model, with the advantage that its Fisher information matrix is non-singular
at λ1 = 0. However, the divergence problem of the MLE of λ1 in finite samples
persists. In this work, we study the SGN(λ, 1) model, hereafter denoted as
MSN(λ), where the divergence of the MLE of the shape parameter λ occurs with
positive probability in finite samples. To avoid this problem, we apply a method
proposed by Firth (1993), which uses a modified score function to estimate the
shape parameter. As a first result, the modified MLE of λ is always finite. The
quasi-likelihood approach for confidence intervals is considered. When the
model has location and scale parameters, we combine our method with the
classical MLE of these parameters.

On the advantages of the Glmperm and Rfit R-packages for generalized linear models: how small must the samples be to obtain statistically significant improvements?

Rodrigo Assar
FCFM Universidad de Chile
(Barrientos Francisco, ICBM Universidad de Chile).

Abstract
A key question in the consolidation of generalized linear models is how to
decide whether a given variable is statistically significant. The common way to
answer this question is through t-tests on the model coefficients for continuous
variables, or using ANOVA for categorical variables. However, for small data
sets, Gaussianity of the errors and orthogonality of the covariables are very
strong assumptions, which lead to erroneous conclusions. Here we analyze how
to avoid two error sources, correlation between covariables and the presence of
outliers, by an appropriate use of two alternative methods implemented in the
R packages Glmperm and Rfit. Glmperm deals with correlation by considering
the orthogonal projection of the tested variable on the other covariables: it
replaces the variable by its projection and computes a permutation-based
p-value. On the other hand, Rfit avoids the effect of outliers by using an
alternative estimation method based on Jaeckel's dispersion. Through randomly
generated examples, we show the performance advantages of these methods over
the common parametric tests in small samples. Finally, starting from these
examples, we derive criteria to decide whether the sample size is small enough
to expect statistically significant differences between the common and the
alternative approaches, passing from erroneous to correct decisions.
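
A minimal sketch of the projection-plus-permutation idea described above, in a plain linear-model analogue (an editorial illustration, not the Glmperm R code).

import numpy as np

def perm_pvalue(y, x, Z, n_perm=2000, seed=9):
    rng = np.random.default_rng(seed)
    proj = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    x_res, y_res = proj(x), proj(y)              # orthogonalize both on covariables Z
    obs = abs(x_res @ y_res)
    perms = np.array([abs(rng.permutation(x_res) @ y_res) for _ in range(n_perm)])
    return (1 + np.sum(perms >= obs)) / (1 + n_perm)

rng = np.random.default_rng(9)
n = 60
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
x = 0.8 * Z[:, 1] + rng.standard_normal(n)       # tested variable, correlated with Z
y = 0.5 * x + Z @ np.array([1.0, -1.0]) + rng.standard_normal(n)
print("permutation p-value:", perm_pvalue(y, x, Z))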


Boosting the performance of functional data classifiers

Jairo Arturo Ayala Godoy


Centro de Investigación en Matemáticas, A.C. (CIMAT)

Abstract
Classification methods for functional data range from direct adaptations of
classical multivariate techniques to recent proposals based on depth measures
and nearest neighbours ideas. Necessarily, all of them are based on appropri-
ate definitions of distance among curves. Boosting is an established technique
in the classification literature to improve the performance of weak classifiers.
In this work, we rank the main classification algorithms in the FDA literature
according to their boosted performance and point to the features of classifiers
that are more likely to benefit from Boosting. Our conclusions are supported
by a simulation study and illustrated by benchmark datasets.

A constructive approach to optimization of dynamic systems with uncertainties: The extended Gaussian pseudospectral method

Vadim Azhmyakov
University of Antonio Nariño
(lber Álvarez Pinto, University of Antonio Nariño,
Ruthber Rodríguez Serrezuela, University of
Antonio Nariño).

Abstract
We deal with a class of stochastic optimal control problems involving some
models of engineering systems with uncertainties. Our contribution is mainly
devoted to a practically motivated application of the pseudospectral solution
method to some stochastic-type Hamiltonian boundary value problems. We
propose a numerical algorithm based on the celebrated Gauss pseudospectral
approach to optimal dynamic systems of stochastic nature. The latter makes it
possible to simplify the conventional Hamiltonian boundary value problem and
to consider some auxiliary algebraic systems. The implementable algorithm we
propose is numerically consistent and, moreover, yields a concrete implementable
scheme for a relatively small discretization grid associated with the given
stochastic dynamics. We also use the differential continuation approach in order
to correctly treat the strong requirement on the initial conditions. Finally, we
present a complete computational solution procedure and discuss some
practically motivated systems engineering examples.

ARCH model and fractional Brownian motion

Natalia Bahamonde
(Pontificia Universidad Católica de Valparaíso),
Soledad Torres (Universidad de Valparaíso),
Ciprian Tudor (Université de Lille 1)

Abstract
We study an extension of the ARCH model that includes the squared fractional
Brownian motion. We construct least squares estimators for the parameters of
the model and we study their asymptotic behavior. We illustrate our results by
numerical simulations.


A Hybrid Freeway Travel Time Forecasting Model Integrating Principal Component Analysis and Neural Networks

Prateek Bansal
The University of Texas at Austin
Prof. Chen Mu-Chen (Institute of Traffic and
Transportation, National Chiao Tung University,
Taiwan)

Abstract
As travelers make their choices based on the cost associated with travel time,
travel time information can help them choose appropriate routes and departure
times. To achieve this goal, travel time prediction models have been proposed in
the literature, but the identification of important predictors has not received
much attention. Therefore, this study aims to build a robust and accurate
freeway travel time prediction model by identifying important predictor
variables (feature selection). We propose a travel time prediction and feature
selection model by integrating principal component analysis (PCA) and back
propagation neural networks (BPNN). Although PCA is an extensively used
data mining technique, to the best of the authors' knowledge the literature does
not offer a methodology to retrace original variables from principal components
(PCs). Therefore, we propose a straightforward methodology to retrace original
variables from PCs. The developed methodology should motivate researchers to
use PCA more extensively in the future. The developed hybrid PCA-BPNN
model was validated by predicting travel time on a 36.1 km long segment of
Taiwan's National Freeway No. 1. The model predicts travel time on the chosen
freeway segment using only four predictor variables, with prediction accuracy
equivalent to a stand-alone BPNN prediction model developed with forty-three
predictors. We found that the speed and flow of heavy vehicles on the freeway
are important predictors of travel time, whereas rainfall was found to have
negligible predictive power. These findings facilitate a considerable reduction in
financial expenses and time during future data collection.
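
A minimal sketch of one plausible way to retrace original variables from PCs through loading magnitudes (an editorial reading with scikit-learn, not the authors' PCA-BPNN pipeline).

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(10)
X = rng.standard_normal((500, 43))               # 43 candidate predictors
X[:, 5] += 2.0 * X[:, 2]                         # induce shared structure

pca = PCA(n_components=4).fit(X)
loadings = np.abs(pca.components_)               # shape (4, 43)
importance = loadings.max(axis=0)                # strongest loading per variable
print("retraced predictor indices:", np.argsort(importance)[-4:][::-1])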


Cut-off phenomenon in diffusion Markov processes

Gerardo Barrera Vargas


Instituto de Matemática Pura e Aplicada.
(Milton Jara Valenzuela, Instituto de Matemática
Pura e Aplicada).

Abstract
We study the cut-off phenomenon for a family of stochastic small perturbations
of dynamical systems. We focus on the semi-flow of a deterministic differential
equation perturbed by a small Brownian motion. Under weak hypotheses on the
vector field, we prove that the family of perturbed stochastic differential
equations exhibits the cut-off phenomenon.

Efficiency comparisons for two pi-ps sampling survey designs

Oscar Ivan Barreto


Universidad Nacional de Colombia, Sede Bogotá
(Trujillo Leonardo, Universidad Nacional de
Colombia, Sede Bogotá)

Abstract
An alternative probability proportional to size (pi-ps) sampling design, named
the Alternative Poisson (AP) design, was recently proposed by Zaizai, Miaomiao
and Yalu (2013), and the calculation of its inclusion probabilities was studied
by Yan and Xue (2014). Pi-ps survey designs commonly use auxiliary variables
available for the whole population in order to build inclusion probabilities in
an efficient way. In this paper we compare a well-known method for pi-ps
sampling (Sunter, 1986) with the AP design in terms of variance. We prove that,
even though the AP design still has some problems with those big units in the
population that cannot be detected as forced-inclusion elements, the new design
is in practice more efficient than the one proposed by Sunter.

Keywords: design-based inference, pi-ps sampling designs, Sunter method, AP sampling design.
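
A minimal sketch of a generic Poisson pi-ps scheme, with inclusion probabilities proportional to an auxiliary size variable (the AP design's specific probabilities are not reproduced here).

import numpy as np

rng = np.random.default_rng(11)
x = rng.lognormal(mean=3.0, sigma=1.0, size=1000)   # auxiliary sizes, population
n = 100
pi = np.minimum(1.0, n * x / x.sum())               # pi-ps inclusion probabilities
sample = rng.random(x.size) < pi                    # independent Bernoulli draws
print("realized sample size:", sample.sum())        # random under Poisson sampling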


Bivariate distributions derived from copula functions in the presence of cure fraction

Emílio Augusto Coelho-Barros


Universidade Tecnológica Federal do Paraná
(Jorge Alberto Achcar, Universidade de São Paulo;
Josmar Mazucheli, Universidade Estadual de
Maringá).

Abstract
We introduce bivariate Weibull distributions derived from copula functions in
the presence of a cure fraction, censored data and covariates. Two copula
functions are explored: the FGM (Farlie-Gumbel-Morgenstern) copula and the
Gumbel copula. Inferences for the proposed models are obtained under the
Bayesian approach, using standard MCMC (Markov chain Monte Carlo) methods.
An illustration of the proposed methodology is given considering a medical
data set. The use of copula functions could be a good alternative to analyse
bivariate lifetime data in the presence of censored data, a cure fraction and
covariates. Observe that in many applications of lifetime modelling, we could
have the presence of a cure fraction for individuals that are long-term
survivors or cured individuals.
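As a hedged illustration of the FGM copula named in the abstract (and only of
the copula, not of the authors' Bayesian cure-fraction model), the sketch below
simulates pairs via the standard conditional-distribution method.

```python
# A minimal sketch: simulate from the FGM copula
# C(u, v) = u v [1 + theta (1 - u)(1 - v)], |theta| <= 1.
# Given U = u, the conditional CDF is v + a v (1 - v) with a = theta (1 - 2u),
# which inverts by solving a quadratic.
import numpy as np

def rfgm(n, theta, rng=np.random.default_rng(1)):
    u = rng.uniform(size=n)
    t = rng.uniform(size=n)
    a = theta * (1.0 - 2.0 * u)
    v = np.where(np.abs(a) < 1e-12, t,
                 ((1 + a) - np.sqrt((1 + a) ** 2 - 4 * a * t))
                 / (2 * np.where(a == 0, 1, a)))
    return u, v

u, v = rfgm(10000, theta=0.8)
print("corr(U, V), theoretically theta/3:", np.corrcoef(u, v)[0, 1])
```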

Time-varying autologistic model for dynamic network data

Brenda Betancourt
University of California, Santa Cruz
(Rodriguez Abel, University of California, Santa
Cruz; Naomi Boyd, West Virginia University).

Abstract
Modelling the temporal evolution of network data has become a relevant prob-
lem for different applications. However, the complexity of such models increases
rapidly with the number of nodes, making efficient short-term prediction of
future outcomes of the system a challenge for big network data. Here, we pro-
pose an autologistic model for directed binary networks with a fused lasso pen-
alty. This model favors sparse solutions for the coefficients and their differences
at consecutive time points, and it is suitable for complex dynamic data where
the number of parameters is considerably greater than the number of observa-
tions over time. The structure of our model allows us to treat the optimization
problem separately for each pair of nodes, increasing the efficiency of the
algorithm through parallel computing. The optimal fused lasso tuning parameters
are chosen using BIC. We show the performance of the model on a real trading
network from the NYMEX natural gas futures market observed weekly over a
period of four years.
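As a hedged illustration of the kind of objective involved (an assumed form,
not the authors' estimator), the sketch below minimizes a Bernoulli negative
log-likelihood with a fused lasso penalty on a time-varying coefficient path,
smoothing the absolute value so that a generic quasi-Newton optimizer applies.

```python
# A minimal sketch (assumed form, toy data): fused-lasso-penalized logistic
# objective over coefficients b_1, ..., b_T; |x| is replaced by
# sqrt(x^2 + eps) so scipy's BFGS can be used.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
T = 20
y = rng.integers(0, 2, size=T)            # binary link indicator over time
x = rng.normal(size=T)                    # one covariate per time point

def objective(b, lam1=0.5, lam2=2.0, eps=1e-8):
    sabs = lambda z: np.sqrt(z ** 2 + eps)             # smooth |z|
    eta = b * x
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))   # Bernoulli log-likelihood
    return -loglik + lam1 * sabs(b).sum() + lam2 * sabs(np.diff(b)).sum()

fit = minimize(objective, np.zeros(T), method="BFGS")
print("fitted path (nearly piecewise constant):", np.round(fit.x, 2))
```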

On branching processes with rare neutral mutations

Airam Aseret Blancas Benítez

Centro de Investigación en Matemáticas (CIMAT), México.

Abstract
We study the genealogical structure of a Galton-Watson process with neutral
mutations, where the initial population is large and the mutation rate is small.
Namely, we extend the results obtained in 2010 by Bertoin in two directions.
In the critical case, we construct the version of Bertoin's model conditioned
not to be extinct, and in the case with finite variance, we show convergence of
the allelic sub-populations towards a tree-indexed CSBP with immigration.
Besides this, we establish the version of the limit theorems in Bertoin's work
in the case where the reproduction law has infinite variance (and the above)
and is in the domain of attraction of an α-stable distribution. This work is
part of my PhD research, elaborated under the direction of Víctor Rivero.
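As a hedged illustration of the object under study (with assumed Poisson
offspring and independent mutation marks, not the abstract's exact setting),
the sketch below simulates the size of the allelic sub-population carrying the
ancestral type, which is itself a Galton-Watson process thinned by the mutation
probability.

```python
# A minimal sketch (assumed dynamics): non-mutant children of a Poisson(m)
# reproduction law with mutation probability eps form a Galton-Watson process
# with offspring mean m * (1 - eps); for m = 1 the mean clone size is 1/eps.
import numpy as np

def ancestral_clone_size(m=1.0, eps=0.01, max_gen=10**4,
                         rng=np.random.default_rng(3)):
    total, current = 1, 1
    for _ in range(max_gen):
        if current == 0:
            return total
        current = rng.poisson(m * (1 - eps) * current)  # non-mutant children
        total += current
    return total

sizes = [ancestral_clone_size() for _ in range(2000)]
print("mean clone size with eps = 0.01:", np.mean(sizes))  # close to 1/eps = 100
```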


Sample size for recurrent events analysis

Rafael E. Borges
Universidad de Los Andes, Mérida, Venezuela
(Maura Vásquez, Universidad Central de
Venezuela).

Abstract
Sample size is one of the most important issues to be considered in any study.
In many designs this topic is completely solved, but it remains a problem in
some non-standard designs. For recurrent event analysis the problem is not
completely solved; it is solved, however, in two related settings: survival
analysis for the occurrence of a single event (the standard survival analysis
context) and longitudinal data analysis. In this talk, we present a brief review
of the most important available methods of sample size calculation for survival
analysis and for longitudinal analysis, and propose two extensions for
determining the sample size of a recurrent event study. One of the extensions
is derived from the sample size for a unique event in survival analysis, and the
other is derived from the longitudinal data analysis context. We discuss the
pros and cons of both extensions, review some mathematical properties, and
apply the proposed methods to a study analyzing the recurrence of episodes of
malaria caused by Plasmodium vivax in an endemic area of Venezuela.
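As a hedged illustration of the single-event starting point from which such
extensions are derived (a standard formula, not the talk's proposal), the sketch
below computes Schoenfeld's required number of events for a two-arm log-rank
comparison.

```python
# A minimal sketch: Schoenfeld's number of events for a log-rank test,
#   d = (z_{1-alpha/2} + z_{1-beta})^2 / (p (1 - p) (log HR)^2),
# with allocation fraction p and hazard ratio HR.
import math
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.80, p=0.5):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 / (p * (1 - p) * math.log(hr) ** 2)

print(round(schoenfeld_events(hr=0.7)))   # about 247 events to detect HR = 0.7
```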

Asymptotic distribution for a heavy-tailed renewal reward dependent process
and applications

Débora Borges Ferreira

Universidade Federal do Rio Grande do Norte.

Abstract
In this note, we study the convergence of renewal processes with rewards when
the times between arrivals of claims depend strongly on the rewards them-
selves. Assuming that the distributions of the rewards and of the inter-arrival
times are heavy-tailed, we use the Mallows distance to obtain the desired
convergence. The result is applied in two situations. In the context of data
traffic in communication networks, we consider the model of an ON/OFF source
sending random loads of traffic to a network node, where a buffer with large
memory capacity stores the information until it is transmitted. We use the main
result to estimate the probability of buffer overflow in finite time. Secondly,
we consider the classical continuous reserve risk process in the case where the
claims are strongly dependent on the times between their arrivals. An important
risk measure is the ruin probability, which is obtained through the main result.

Reference analysis for the Student-t calibration linear model

Márcia D'Elia Branco

University of São Paulo, SP, Brazil.
(Reinaldo Arellano-Valle, PUC, Santiago, Chile).

Abstract
The linear calibration problem, also known as the inverse regression problem,
is motivated by the comparison of two or more measurement techniques or
instruments for a given characteristic of interest. Bayesian reference analysis
under normal linear calibration models was discussed in Ghosh et al. (1995),
Kubokawa and Robert (1994) and Chaibub Neto and Branco (2007). Extensions of
some of these results are developed here for the Student-t linear calibration
model. A reparametrization is proposed to obtain a friendly expression for
the Fisher information matrix. We also discuss some theoretical properties of
the reference prior and posterior distributions.


Cross-entropy for detecting anomalous behaviour in health-care service
provision

Sergio Armando Camelo Gómez

Universidad de los Andes, Colombia
(Álvaro José Riascos, Universidad de los Andes,
Colombia)

Abstract
In this article we use cross-entropy to identify anomalous data in health-care
reports for the Colombian health system in 2010. In Colombia the government
pays for most of the patients' health spending. Because the charges that
insurers make to the government are numerous, expenditure monitoring is
difficult and the opportunities for fraud are various. To automatize the search
for anomalous behavior, we first divide the population into different risk
groups. Each group is characterized by a unique combination of gender, age
group and medical diagnosis. Then, within each risk group, we estimate
parametrically the cross-entropy of the information provided by each insurer
with respect to the rest of the sample. To make the calculations we use
variables such as total spending, number of medical appointments, number of
first-time medical appointments and medication requests. The cross-entropy
calculations work as a measure of how anomalous the information provided by
each insurer is, so we look at the highest values. We find that this method is
able to identify strange data. Finally, we implement a method that shows only
anomalous data that are highly expensive to the government. The anomalous
reports found are very interesting.
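As a hedged illustration of the underlying quantity (an assumed simplification:
empirical histograms and the Kullback-Leibler divergence, rather than the
parametric estimate described above), consider the sketch below.

```python
# A minimal sketch (simulated data): how anomalous is one insurer's spending
# distribution relative to the rest of the sample, via relative entropy.
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(4)
spend_insurer = rng.lognormal(mean=1.4, sigma=0.6, size=500)   # suspect insurer
spend_others = rng.lognormal(mean=1.0, sigma=0.5, size=5000)   # rest of sample

bins = np.histogram_bin_edges(np.concatenate([spend_insurer, spend_others]), 30)
p, _ = np.histogram(spend_insurer, bins=bins, density=True)
q, _ = np.histogram(spend_others, bins=bins, density=True)
p, q = p + 1e-10, q + 1e-10                    # avoid empty cells

print("KL(insurer || rest):", entropy(p, q))   # large values flag anomalies
```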

Meteorological conditions indexes and extreme values

Casanova Del Ángel Francisco

Instituto Politécnico Nacional, México.

Abstract
A graphic analysis of meteorological data collected by the IPN's meteorologi-
cal station, located in Mexico City's northern area, from 2001 through 2013,
is shown. The statistical analysis begins with the calculation of statistical
parameters of every meteorological variable; meteorological correlation is
obtained from the usual Euclidean distance between meteorological variables
and the variance-covariance structure. Historical trends and their graphic
structures are analyzed, obtaining their average and trend values. Ellipses
showing the natural maximum and minimum limits within which the values of the
meteorological variable under study should historically lie were built for
annual and seasonal distributions. Probability distributions of extreme values
were determined.

Cases of homicide in Belém, Pará, Brazil

Débora Fernanda Castro Vianna Oliveira

Federal University of Pará
(Dos Santos De Almeida Silvia, Federal University
of Pará; Dos Reis Araújo Adrilayne, Federal
University of Pará).

Abstract
Homicide is characterized as a universal indicator of social violence, which
is distributed heterogeneously across regions and continents, and it is
primarily responsible for the high mortality of the population. It is socially
recognized as an act of extreme violence and a serious violation of the rights
to life and security. Due to the magnitude and transcendence of homicides,
this paper aims to determine factors associated with homicides in the city of
Belém, located in Pará State, Brazil, from January to December 2012. For that,
we used the statistical techniques of exploratory data analysis and correspon-
dence analysis. Among the main results, we observed that the majority of
homicide victims in the city of Belém are male (92.33%), most were victims of
homicide in October (11.48%), the act usually occurs in the streets (85.23%),
on Sunday (23.12%), during the night shift (49.13%), more specifically between
21:00 and 21:59 hours (11.17%), committed with a firearm (81.02%), and the
presumed cause is hatred or revenge (90.85%). The results show a significant
association between the levels of the variables gender versus shift and
employment versus place of occurrence, showing that homicide is a serious
public safety problem.


Test of hypotheses on random graphs with application in neuroscience

Andressa Cerqueira
Universidade de São Paulo
(with Florencia Leonardi, Universidade de São
Paulo)

Abstract
The theory of random graphs has been successfully applied in recent years to
model neural interactions in the brain. While the probabilistic properties of
random graphs have been extensively studied in the literature, the development
of statistical inference methods for this class of objects has received less
attention. In this work we propose a nonparametric test of hypotheses to decide
if two samples of random graphs originate from the same probability distri-
bution. We show how to compute the test statistic efficiently and we study the
performance of the test on simulated data. The main motivation of this work is
to apply the test to analyze neural networks constructed from electroencepha-
lographic data.
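As a hedged illustration of the two-sample idea (with an assumed, deliberately
simple statistic, mean edge density, rather than the authors' statistic), the
sketch below runs a permutation test on two samples of simulated graphs.

```python
# A minimal sketch (Erdos-Renyi toy graphs): permutation test for equality of
# the distributions of two samples of random graphs.
import numpy as np

rng = np.random.default_rng(6)
sample_a = [rng.random((20, 20)) < 0.30 for _ in range(25)]   # p = 0.30 graphs
sample_b = [rng.random((20, 20)) < 0.35 for _ in range(25)]   # p = 0.35 graphs

def stat(ga, gb):
    dens = lambda gs: np.mean([g.mean() for g in gs])
    return abs(dens(ga) - dens(gb))

observed = stat(sample_a, sample_b)
pool = sample_a + sample_b
perms = []
for _ in range(2000):
    idx = rng.permutation(len(pool))
    perms.append(stat([pool[i] for i in idx[:25]], [pool[i] for i in idx[25:]]))
print("permutation p-value:", np.mean([t >= observed for t in perms]))
```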

Comparing the performance of blind source separation techniques: ICA, JADE
and SOBI

Ignacio Correa
Universidad de Chile
Fredes Luis (Universidad de Chile), Perlroth Andrés
(Universidad de Chile).

Abstract
Separation of sources consists of recovering a set of signals of which only
instantaneous linear mixtures are observed. Often no a priori information on
the mixing matrix is available, i.e., the linear mixture must be processed
blindly. When the array manifold is unknown, blind identification of spatial
mixtures allows an array of sensors to implement such a source separation.
This technique has important applications in the Chilean mining industry, in
particular for copper extraction. To prevent tragedies, it is very important to
detect earth movements in time to act. Consequently, it is key to find the
signal sources and discriminate risky from harmless movements. In this work we
consider the three most important blind source separation techniques and their
implementations in the R statistical software. The ICA technique, based on
information theory, maximizes the entropy to estimate the source signals. The
JADE technique uses a family of fourth-order cumulant matrices. The SOBI
technique relies on stationary second-order statistics, based on a joint
diagonalization of a set of covariance matrices. Through sensitivity and
simulation analyses, we compare the three techniques in terms of stability,
computational time and numerical error. We conclude that, assuming stationary
source signals, the JADE technique exhibits significantly better performance
than FastICA and SOBI.
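As a hedged illustration of the blind source separation task (using
scikit-learn's FastICA on toy signals; the abstract's R implementations are
analogous but not identical), consider the sketch below.

```python
# A minimal sketch: unmix two instantaneous linear mixtures of two independent
# sources with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two independent sources
A = np.array([[1.0, 0.5], [0.5, 2.0]])             # unknown mixing matrix
X = S @ A.T                                        # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                       # estimated sources
print("recovered sources shape:", S_hat.shape)
```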

A jump process with Hoffman distribution

Nelson Alirio Cruz Gutiérrez

Universidad Nacional de Colombia.
(Fabián Camilo Becerra Ochica, Universidad
Nacional de Colombia; José Alfredo Jiménez
Moscoso, Universidad Nacional de Colombia).

Abstract
This work seeks to generalize discrete jump Markov processes by replacing the
underlying Poisson process with a new stochastic process that we call the
Hoffman process, which is characterized by a distribution from the Hoffman
family; in this family, the Poisson distribution is a particular case. We
present the basic properties of the Hoffman process, use it to establish the
jump Hoffman process, and also describe the differences between the proposed
jump Hoffman process and known jump process models.


Copula applied to multivariate linear regression models

Cruz Reyes Danna Lesley


Universidad El Bosque
(Masmela Caita Luis Alejandro, Universidad
Distrital Francisco José de Caldas).

Abstract
It is still common to use the linear regression procedure with the objective of
modelling a random variable, called the response variable, through a linear
function of a set of covariates, also called independent variables, plus a
random term. The model is expressed as $Y = X\beta + \varepsilon$, where
$Y_{n \times 1}$ is the vector containing the values of the response variable,
$X_{n \times (p+1)}$ is called the design matrix and contains the information
on the independent variables, the vector $\beta_{(p+1) \times 1}$ contains the
parameters that measure the influence of the covariates on the response
variable, and $\varepsilon_{n \times 1}$ is the vector of errors. But this
model considers a single response variable; suppose instead that we have $m$
response variables, each of which has $p$ components. This is a multivariate
linear model, which accommodates two or more response variables. The
multivariate linear regression model is essentially several univariate linear
regression models put together, with the errors being related with each other.
Here multivariate means the response variables are multivariate. The
assumptions in this model are $E(\varepsilon_{(i)}) = 0$ and
$\mathrm{Cov}(\varepsilon_{(i)}, \varepsilon_{(k)}) = \sigma_{ik} I$ with
$i, k = 1, 2, \ldots, m$, but observations from different trials are
uncorrelated. In this model the unknown parameters are $\beta$ and
$\sigma_{ik}$. The maximum likelihood estimators are obtained under the
assumption that $\varepsilon$ is normally distributed, and these coincide with
the OLS estimators. In this article we work under the above conditions and for
the particular case where $m = 2$ and $p = 1$, i.e., only two response
variables and one covariate are included. In this sense, the two variables
$Y^{(1)}$ and $Y^{(2)}$ are assumed normal, and a copula is used to obtain the
bivariate distribution. We present copula regression as an alternative to OLS
and maximum likelihood; the joint distribution is described by a copula, and
the major advantage of copula regression is that there are no restrictions on
the probability distributions that can be used. Ideal copulas have the
following properties: ease of simulation, closed form for the conditional
density, and different degrees of association available for different pairs of
variables. Good candidates are therefore the Gaussian copula and the t-copula.
The results are illustrated on a dataset from the finance area.


Multifractal detrended fluctuation analysis in Colombian market indices

Andy Rafael Domínguez

Politécnico Grancolombiano
(Benjamín Valderrama, Politécnico
Grancolombiano).

Abstract
The multifractal detrended fluctuation analysis (MF-DFA) is a well-established
method to detect correlations and multifractal properties in non-stationary
time series, and it has found applicability in a wide variety of fields. We
analyze multifractal features by using MF-DFA for various Colombian market
time series (the stock market global index and other stock market indices).
The results quantify the multifractal structure and long-range correlations of
the analyzed time series. We discuss the implications of the multifractal
nature of these results for developing better forecasting models, as has been
reported by other investigations [1-2].

References
1. Zunino, L. et al. (2009). Multifractal structure in Latin-American market
indices. Chaos, Solitons & Fractals, 41(5), 2331-2340.

2. Domínguez, A. (2013). Long-range correlations in the Colombian electricity
spot prices. ICAMI 2013, International Conference on Applied Mathematics
and Informatics. San Andrés, Colombia. Nov. 2013.
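As a hedged illustration of the MF-DFA procedure itself (the standard
algorithm, run on simulated white noise rather than the market indices), the
sketch below computes the generalized Hurst exponents h(q); a q-dependent h(q)
indicates multifractality.

```python
# A minimal sketch of MF-DFA: profile, polynomial detrending per segment,
# q-th order fluctuation function F_q(s), and h(q) as the log-log slope.
import numpy as np

def mfdfa(x, scales, qs, order=1):
    y = np.cumsum(x - np.mean(x))               # profile
    Fq = np.zeros((len(qs), len(scales)))
    for j, s in enumerate(scales):
        rms = []
        for v in range(len(y) // s):
            seg = y[v * s:(v + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, order), t)
            rms.append(np.mean((seg - trend) ** 2))
        rms = np.asarray(rms)
        for i, q in enumerate(qs):
            if q == 0:
                Fq[i, j] = np.exp(0.5 * np.mean(np.log(rms)))
            else:
                Fq[i, j] = np.mean(rms ** (q / 2)) ** (1 / q)
    # h(q) = slope of log F_q(s) versus log s
    return np.array([np.polyfit(np.log(scales), np.log(Fq[i]), 1)[0]
                     for i in range(len(qs))])

x = np.random.default_rng(8).normal(size=4096)  # white noise: h(q) ~ 0.5
print(mfdfa(x, scales=[16, 32, 64, 128, 256], qs=[-4, -2, 2, 4]))
```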


Split plot unfolding interaction in nonlinear regression

Alessandra dos Santos


Escola Superior de Agricultura Luiz de Queiroz
(Taciana Villela Savian, Escola Superior de
Agricultura Luiz de Queiroz; Simone Daniela
Sartorio, Universidade Federal de São Carlos).

Abstract
In experiments involving a qualitative and a quantitative factor, if a
significant interaction between the factors is detected in the analysis of
variance, a regression analysis should be carried out. However, the use of
linear regression models is not always the most appropriate way to evaluate
the effect of the quantitative factor. This paper presents a means to fit a
nonlinear regression model in a trial involving repeated measurements over
time. For that, the weight gain of male and female sheep of the Santa Inês
breed, in kilograms and at twelve different ages, was measured. Since the
experiment was conducted in a split-plot design and the time factor was not
randomized, the analysis of variance requires an adjustment of the degrees of
freedom, due to failure of the sphericity condition. The Greenhouse-Geisser
correction (G-G) was used for the interaction and time effects. The F test in
the analysis of variance showed a significant outcome for the interaction
between the factors and, when unfolding the interaction to evaluate the time
and gender effects at each level, the Gompertz adjustment and an adherence test
for the model were proposed as well. After fitting the model to the weight
data, a comparison study among the parameter curves for males and females was
also made. As the results show, the univariate model with a split-plot design
can be used in trials involving animal growth. However, its application is
subject to an examination of the sphericity condition. The incorporation of
the Gompertz model when splitting interactions is also a viable method and
enabled the evaluation of the real quality of the model adjustment applied to
the data. Also, the comparison among parameters from the adjusted curves
showed that males and females have statistically identical values for the
parameter related to the animals' birth weight. The expected female maximum
weight (40.7 kg) is statistically lower than that found for males (57.3 kg).
However, the females' growth rate (0.011 kg/day) is greater than the males'
(0.007 kg/day), i.e., females reached weight stabilization faster than males.
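As a hedged illustration of the Gompertz fit underlying the comparison above
(simulated weights seeded with the abstract's female values; the b parameter
and noise level are assumptions), consider the sketch below.

```python
# A minimal sketch: fit the Gompertz growth curve w(t) = A exp(-b exp(-k t)),
# where A is the asymptotic weight and k the growth-rate parameter.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, A, b, k):
    return A * np.exp(-b * np.exp(-k * t))

rng = np.random.default_rng(9)
t = np.linspace(0, 400, 12)                          # twelve ages (days)
w = gompertz(t, 40.7, 3.0, 0.011) + rng.normal(0, 0.8, size=12)

params, _ = curve_fit(gompertz, t, w, p0=[40, 3, 0.01])
print("A=%.1f kg, b=%.2f, k=%.4f per day" % tuple(params))
```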


Bootstrap techniques for estimating order statistics with spatially correlated
data: The case of PM10 in Bogotá, Colombia

Juan Carlos Espinosa Moreno


Universidad Nacional de Colombia
(Fabián Camilo Becerra Ochica, Universidad
Nacional de Colombia;
Ramón Giraldo Henao, Universidad Nacional de
Colombia)

Abstract
The bootstrap is a resampling method for statistical inference. It is commonly
used to estimate confidence intervals. The literature on the bootstrap is
extensive. Bootstrap methods are usually focused on samples of iid variables.
However, in many fields of statistics we must deal with observations that are
not iid. These include regression problems, temporally or spatially correlated
data, and hierarchical problems. In this work we use resampling methods for
spatially dependent data. In particular, our interest is to carry out the
estimation of some order statistics of the PM10 distribution based on data for
this variable recorded at several monitoring stations in Bogotá, Colombia. We
assume that the PM10 data are realizations of a Gaussian random field. Our
interest is in calculating confidence intervals for the parameters by using
resampling methods which take into account the spatial dependence between the
data. In a first step of the analysis, the behavior of the proposed
methodology is studied by using simulated data. We simulate several
realizations of a Gaussian random field. With the data of each simulation we
use geostatistical methods for estimating the spatial covariance structure,
which is involved in the bootstrap estimation procedure. At the end of the
work we show the results of applying this methodology to the real data set
considered.
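As a hedged illustration of the simulation building block (an assumed
exponential covariance and made-up station coordinates, not the estimated
Bogotá structure), the sketch below draws correlated bootstrap fields via a
Cholesky factor and summarizes an order statistic.

```python
# A minimal sketch: parametric spatial bootstrap from a Gaussian random field
# with exponential covariance, respecting the dependence between stations.
import numpy as np

rng = np.random.default_rng(10)
coords = rng.uniform(0, 30, size=(15, 2))      # 15 hypothetical stations (km)

def exp_cov(coords, sill=1.0, range_=10.0):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)

L = np.linalg.cholesky(exp_cov(coords) + 1e-10 * np.eye(15))
boot_fields = 50 + 10 * (L @ rng.normal(size=(15, 1000)))  # 1000 replicates
q90 = np.quantile(boot_fields, 0.90, axis=0)   # spatial 90th centile per replicate
print("bootstrap CI for the 90th centile:", np.percentile(q90, [2.5, 97.5]))
```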


Detecting stationary intervals for random waves using time series clustering

Carolina Euán
Centro de Investigación en Matemáticas, A.C.
(CIMAT)
(Ortega Joaquín, Centro de Investigación en
Matemáticas, A.C. (CIMAT); Álvarez Esteban
Pedro C., Universidad de Valladolid, Spain).

Abstract
The problem of detecting changes in the state of the sea is very important for
the analysis and determination of wave climate at a given location. Wave
measurements are frequently analyzed statistically as a time series, and
segmentation algorithms developed in this context are used to determine
change-points. However, most methods found in the literature consider the case
of instantaneous changes in the time series, which is not usually the case for
sea waves, where changes take a certain time interval to occur. We propose a
new segmentation method that allows for the presence of transition intervals
between successive stationary periods, and is based on the analysis of
distances of normalized spectra to detect clusters in the time series. The
series is divided into 30-minute intervals and the spectral density is
estimated for each one. The normalized spectra are compared using the total
variation distance and a hierarchical clustering method is applied to the
distance matrix. The information obtained from the clustering algorithm is
used to classify the intervals as belonging to a stationary or a transition
period. We present simulation studies to validate the method and examples of
applications to real data.
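As a hedged illustration of the pipeline (simulated interval series standing in
for 30-minute wave records), the sketch below estimates normalized spectra,
compares them with the total variation distance and clusters them
hierarchically.

```python
# A minimal sketch: normalized periodograms -> total variation distances ->
# average-linkage hierarchical clustering of the intervals.
import numpy as np
from scipy.signal import periodogram
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(11)
series = [np.convolve(rng.normal(size=600), [1, 0.8], "same") for _ in range(4)] \
       + [rng.normal(size=600) for _ in range(4)]       # two spectral regimes

spectra = []
for x in series:
    _, p = periodogram(x)
    spectra.append(p / p.sum())              # normalize to a probability vector

n = len(spectra)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = 0.5 * np.abs(spectra[i] - spectra[j]).sum()  # TV

labels = fcluster(linkage(squareform(D), method="average"),
                  t=2, criterion="maxclust")
print("cluster labels:", labels)             # intervals grouped by regime
```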


A piecewise deterministic Markov model applied to modeling patient poor
compliance in the multi-IV administration case

Lisandro Fermín
University of Valparaíso, Valparaíso, Chile.
(Jacques Lévy Véhel, Regularity Team, INRIA
Saclay-Île-de-France & MAS Laboratory, École
Centrale Paris, France).

Abstract
We propose a particular piecewise deterministic Markov process (PDMP) to
model the drug concentration in the case of multiple intravenous doses and a
partial compliance situation. In this context, we commonly find the problem of
variable time-dosing intervals. The model allows us to take into account
irregular drug intake times. This irregularity in the drug input times has to
be evaluated. We consider random drug input times and study the randomness of
the drug concentration generated by partial compliance to multiple intravenous
doses. We derive some probability results on the stochastic dynamics using the
PDMP theory, focusing on two aspects of practical relevance: the variability
of the concentration and the regularity of its probability distribution.
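As a hedged illustration of the deterministic flow between random dose times
(a one-compartment model with assumed parameters, not the paper's PDMP), the
sketch below simulates the concentration path under non-compliant dosing.

```python
# A minimal sketch: multi-IV concentration
#   C(t) = sum_i (D / V) exp(-k_e (t - t_i))  over past dose times t_i,
# with exponentially perturbed dose times standing in for poor compliance.
import numpy as np

rng = np.random.default_rng(12)
k_e, dose_over_V = 0.10, 5.0               # elimination rate (1/h), D/V (mg/L)
t_dose = np.cumsum(rng.exponential(scale=12, size=20))   # nominal 12 h spacing

t = np.linspace(0, t_dose[-1], 500)
conc = np.zeros_like(t)
for ti in t_dose:
    conc += dose_over_V * np.exp(-k_e * (t - ti)) * (t >= ti)

print("mean and max concentration: %.2f, %.2f mg/L" % (conc.mean(), conc.max()))
```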

Outlier detection based on functional data analysis

Valeria Fonseca Díaz

Universidad Nacional de Colombia
(with Juan Sebastian Hernandez Hernandez,
Universidad Nacional de Colombia; Álvaro
Mauricio Montenegro Díaz, Universidad Nacional
de Colombia).

Abstract
In this study a nonparametric statistical methodology is introduced to detect
outliers related to water quality variables measured in Colombia. The main
purpose of this work is to create confidence bands allowing outlier detection
inside potentially unusual curves. The method consists of three phases: first,
the search for the best curve fit through cubic splines; second, the
identification of unusual curves based on depth measures; and lastly, either
confidence band estimation by bootstrap or the computation of detection bands
from observed data quantiles, to detect outliers. Simulations have been run to
validate both procedures, and the complete method was applied to a water
quality dataset from IDEAM covering 2005 to 2012.

Study and implementation of a growth function adapted for the species
Schizolobium amazonicus (aka Paricá)

Rodrigo Cesar Freitas da Silva

Federal University of Pará, Brazil.
(Prof. Dr. rer. nat. Joao Marcelo Brazao Protazio,
Federal University of Pará, Brazil).

Abstract
Even today it is an established fact that the social, economic, medicinal and
productive potential of forest resources in the Amazon is still unknown. Given
this fact, it is necessary to invest in essentially local human capital in
order to train technicians with deep knowledge of the reality of the Amazon,
so as to maximize the sustainable use of its natural resources. It is therefore
essential to create technologies for the Amazon, mitigating the lack of
technical and theoretical tools geared to its real needs. Even though we are
aware of this reality, almost nothing has been done for decades to minimize
this problem. It is common to see technologies used in the Amazon, without any
adaptation, that were developed exclusively for other realities, thus sometimes
inducing results adverse to ours. Therefore, our main goal is to create an
individual-based model (IBM) capable of simulating the growth dynamics of
specific species from the native forests of northern Brazil, rather than
copying pre-existing models adapted only to other realities. We hope that the
implementation of this model will add knowledge to our region and also
encourage further discussion of our needs. In this work, we present a model
implemented specifically for the species Schizolobium parahyba var.
amazonicus (aka Paricá), which is a species of great economic importance to
the northern region of Brazil. In the state of Pará, in particular, it already
accounts for over 40.


Bayesian analysis of augmented simplex regression models for longitudinal
proportional data

Diana Milena Galvis Soto

UNICAMP-Brazil.
(Víctor Hugo Lachos Dávila, UNICAMP-Brazil;
Dipankar Bandyopadhyay, University of
Minnesota, USA).

Abstract
Proportional continuous data can be found in areas such as the biological
sciences, health, engineering, etc. This type of data ranges between zero and
one (0, 1), and for its analysis distributions such as the logistic-normal
(Aitchison and Shen, 1980), beta, beta-rectangular (Hahn, 2008) and simplex
(Barndorff-Nielsen and Jorgensen, 1997), among others, have typically been
used. However, in practical situations it is possible to observe proportions,
rates or percentages being zero, one or both values, and thus the previously
mentioned distributions cannot be used. In this work, motivated by the
flexibility of the simplex distribution, we propose a three-part mixture
distribution, with degenerate point masses at 0 and 1, obtaining a new
distribution with support on the interval [0, 1], which we call the zero-one
augmented simplex (ZOAS) model. For our analysis, we adopt a Bayesian framework
and develop a Markov chain Monte Carlo algorithm to carry out the posterior
analyses. The marginal likelihood is tractable, and is utilized to compute not
only some Bayesian model selection measures but also case-deletion influence
diagnostics based on the q-divergence measure (Csiszár, 1967). The newly
developed procedures are illustrated with a simulation study as well as an
application to a real dataset from a clinical periodontology study. The
empirical results show the gain in model fit and parameter estimation over
other alternatives, and provide quantitative insight into assessing the true
covariate effects on longitudinal proportion responses.


Methods of constructing copulas

Stefan Alberto Gómez Guevara

Universidad Distrital Francisco José de Caldas
(Joan Jesús Amaya, Universidad Distrital Francisco
José de Caldas; David Andrés Paloma, Universidad
Distrital Francisco José de Caldas).

Abstract
We present the main concepts and definitions of the theory of bivariate
copulas, and we present and illustrate several general methods of constructing
bivariate copulas. In the inversion method, we exploit Sklar's theorem to
produce copulas directly from joint distribution functions. Using geometric
methods, we construct singular copulas whose support lies in a specified set.

Topological combination of distributed and autonomous sources to identify
patterns in time series

Ana María Gómez Lamus

(Fundación Universitaria Los Libertadores)
Salas Fuentes Rodrigo (Universidad de Valparaíso).

Abstract
The increasing rate of data generation and storage from distributed and
autonomous sources is introducing a big scientific and technological challenge,
for example air pollution monitoring from several sources located within a
city. New analysis methodologies are needed to obtain relevant information by
implementing techniques with adequate results [1]. In this work we propose the
development of a technique to analyze air pollution data by using vector
quantization. This technique helps to identify clusters while preserving the
intrinsic topology of the data; these clusters are obtained by similarity or
correlation from distributed sources, avoiding the loss of relevant
information [3]. The proposal was implemented to analyze the daily registries
of PM10 contamination, with data acquired from monitoring stations located in
the Metropolitan area of Santiago: Las Condes, Pudahuel and Parque O'Higgins,
in the year 2009. The proposal, called ARFIMA-SOM, consists of four stages. In
the first stage we use the autoregressive fractionally integrated moving
average (ARFIMA) [2] model to identify the structural properties of
stationarity, tendency and periodicity, to analytically describe the local
behavior of the time series for each source. In the second stage, the
information given by the simple and partial autocorrelation functions is used
to infer the structure of the stochastic process. With this information the
characteristic vector is built and used as an input for the self-organizing
map (SOM) models. With the SOM, the most relevant topological patterns are
identified [4]; in other words, the similar daily pollution behaviors are
found. In the third stage, for each cluster, we model the time series found by
the SOM to be similar. Finally, in the fourth stage, the critical contamination
episodes are identified, where common spatial and temporal patterns of high
pollution are detected. Simulation results show how our ARFIMA-SOM proposal
allows us to detect patterns of high levels of PM10 pollution based on the
topological combination of time series patterns.

References
1. Caragea D. and Reinoso J. 2005. Statistics Gathering for Learning from
Distributed, Heterogeneous and Autonomous Data Sources. Artificial
Intelligence Research Laboratory, Iowa State University, Ames, IA 50011-1040.

2. Contreras-Reyes J. and Palma W. 2012. Statistical Analysis of Autoregressive
Fractionally Integrated Moving Average Models. arXiv:1208.1728v1.

3. Gómez A., Salas R. and Veloz A. 2012. Combining Self-Organizing Maps for
Distributed Vector Quantization. XII Latin American Congress of Probability
and Mathematical Statistics. Viña del Mar, Chile. March 26-30.

4. Salas R., Saavedra C., Allende H. and Moraga C. 2011. Machine Fusion to
Enhance the Topology Preservation for Vector Quantization Artificial Neural
Networks. Pattern Recognition Letters, Elsevier, Vol. 32, 962-972.


Comparing the Markov order estimators AIC, BIC and EDC

Cátia R. Gonçalves
(Universidade de Brasília, Brasil)
Dorea Chang C. Y. (Universidade de Brasília,
Brasil), De Resende Paulo A. A. (Universidade de
Brasília, Brasil).

Abstract
In the framework of nested hypothesis testing, several alternatives for
estimating the order of a Markov chain have been proposed. The AIC, Akaike's
entropy-based information criterion, constitutes the best-known tool for model
identification and has had a fundamental impact on statistical model selection.
In spite of the AIC's relevance, several authors have pointed out its
inconsistency, which may lead to overestimation of the true order. To overcome
this inconsistency, the Bayesian information criterion (BIC) was proposed,
introducing the sample size into the penalty term; it is a consistent estimator
for large samples. A more general approach is exhibited by the EDC, the
efficient determination criterion, which encompasses both the AIC and BIC
estimates. Under a proper setting the EDC, besides being a strongly consistent
estimate, is an optimal estimator. These approaches are briefly presented and
compared by numerical simulation. The presented results may support decisions
related to the choice of estimator.
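As a hedged illustration of the AIC/BIC side of the comparison (a binary toy
chain; the EDC penalty is not reproduced here), the sketch below computes
penalized log-likelihoods for candidate orders.

```python
# A minimal sketch: estimate a Markov chain's order by AIC (penalty 2*df) and
# BIC (penalty df*log N), with df = |S|^k (|S| - 1) for order k and |S| = 2.
import numpy as np
from collections import Counter

def loglik_order_k(x, k):
    counts = Counter(zip(*[tuple(x[i:len(x) - k + i]) for i in range(k + 1)])) \
             if k > 0 else Counter((xi,) for xi in x)
    ctx = Counter()
    for key, n in counts.items():
        ctx[key[:-1]] += n
    return sum(n * np.log(n / ctx[key[:-1]]) for key, n in counts.items())

rng = np.random.default_rng(13)
x = [0]                                  # order-1 chain: P(1|0)=0.9, P(1|1)=0.2
for _ in range(5000):
    x.append(int(rng.random() < (0.9 if x[-1] == 0 else 0.2)))

N = len(x)
for k in range(4):
    df = 2 ** k                          # |S|^k (|S| - 1) with |S| = 2
    ll = loglik_order_k(x, k)
    print("order %d: AIC=%.1f  BIC=%.1f"
          % (k, -2 * ll + 2 * df, -2 * ll + df * np.log(N)))
```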

Generalization of the Kaplan-Meier estimator for survival analysis with fuzzy
lifetimes

José Alejandro González Campos

Universidade Estadual de Campinas, Brasil.
(Víctor Hugo Lachos Dávila, Universidade Estadual
de Campinas, Brasil).

Abstract
The product-limit or Kaplan-Meier estimator is a nonparametric estimator of the
survival function, characterized by its ease of calculation and by its
asymptotic normality. In this paper, we propose a generalization of the
Kaplan-Meier estimator for the case where the failure times are considered
fuzzy, overcoming the problem of imprecise idealizations expressed by numerical
values. We present a simulation, which consists of giving a fuzzy structure to
a real data set, and then compare the results.

Phase transition in the ferromagnetic Ising model with a cell-board external
field

Manuel González Navarrete

University of São Paulo
(Pechersky Eugene, Institute for Information
Transmission Problems of the Russian Academy
of Sciences; Yambartsev Anatoli, University of São
Paulo)

Abstract
In this paper we show the presence of a first-order phase transition for a
ferromagnetic Ising model on $\mathbb{Z}^2$ with a periodical external magnetic
field (proposed by Maruani et al. [4]). The external field takes two values
$\pm h$, with $h > 0$, composing a cell-board configuration with rectangular
cells of $L_1 \times L_2$ sites, such that the total value of the external
field is zero. Formally, for each pair of integers $n, m$ we define

$$C(n, m) = \{(t_1, t_2) \in \mathbb{Z}^2 : nL_1 \le t_1 < (n+1)L_1,\;
mL_2 \le t_2 < (m+1)L_2\},$$

then

$$\mathbb{Z}_+ = \bigcup_{n,m:\, n+m \text{ is even}} C(n, m), \qquad
\mathbb{Z}_- = \mathbb{Z}^2 \setminus \mathbb{Z}_+.$$

Let $\Omega = \{-1, +1\}^{\mathbb{Z}^2}$ be the configuration space on
$\mathbb{Z}^2$. We study the model with a formal Hamiltonian defined for any
$\sigma \in \Omega$ as

$$H(\sigma) = -J \sum_{\langle t, s \rangle} \sigma(t)\sigma(s)
- \sum_{s} h_s \sigma(s),$$

where $J > 0$, the symbol $\langle t, s \rangle$ denotes nearest neighbours
$s, t \in \mathbb{Z}^2$, that is $|t - s| = 1$, and

$$h_s = \begin{cases} h, & \text{if } s \in \mathbb{Z}_+, \\
-h, & \text{if } s \in \mathbb{Z}_-. \end{cases}$$

The phase transition holds if $h < \frac{2J}{L_1} + \frac{2J}{L_2}$. Our result
can be applied to obtain the phase transition in the Ising antiferromagnet with
external field (see Dobrushin [2] and Dobrushin et al. [3]), and the phase
transition for the model studied by Nardi, Olivieri and Zahradník in [5]. In
that work the lattice $\mathbb{Z}^2$ was represented as a union of
one-dimensional sublattices (say, horizontal); the external field is constant
on every one-dimensional sublattice and has different signs on the neighboring
sublattices. We used an approach based on the technique of reflection
positivity ([1], [6]). In particular, we apply a certain key inequality which
is usually referred to as the chessboard estimate. This tool allows us to
construct a sort of Peierls argument to evaluate the contour probabilities.

References
[1] Biskup, M.: Reflection Positivity and Phase Transitions in Lattice Spin
Models. In: Methods of Contemporary Mathematical Statistical Physics, ed.
Roman Kotecký. Springer-Verlag, Berlin, 2009.

[2] Dobrushin, R.L.: The problem of uniqueness of a Gibbs random field and
the problem of phase transition. Func. Anal. and Appl., 2:302-312 (1968).

[3] Dobrushin, R.L., Kolafa, J., Shlosman, S.: Phase Diagram of the Two-
Dimensional Ising Antiferromagnet (Computer-assisted proof). Comm.
Math. Phys. 102(1):89-103 (1985).

[4] Maruani, A., Pechersky, E., Sigelle, M.: On Gibbs fields in image processing.
Markov Processes Relat. Fields, 1:419-442 (1995).

[5] Nardi, F.R., Olivieri, E., Zahradník, M.: On the Ising model with strongly
anisotropic external field. Journ. Stat. Phys., 97:87-144 (1999).

[6] Shlosman, S.B.: The method of reflection positivity in the mathematical
theory of first-order phase transitions. Russian Math. Surveys 41(3):83-134
(1986).


Application of violin plot for circular data

Julieth Verónica Guarín Escudero

(Juan Carlos Correa Morales, Marcela Ruiz
Guzmán).

Abstract
In many fields, data are collected through angular measurements. These data
provide orientations or angles in the plane (circular data) or in space
(spherical data). Circular data constitute the simplest case of the category of
data called directional data, where the measure is not scalar but angular or
directional. There is a variety of graphic representations for circular data,
among them the rose plot, the circular dot plot and the boxplot for circular
data. We propose a modification of the boxplot for circular data to visualize
the information more clearly. This will be illustrated with real data.

Modelling the Gini coefficient in Bogotá through Bayesian beta regression

Hugo Andrés Gutiérrez Rojas

Universidad Santo Tomás, Colombia
(José Andrés Flórez Gutiérrez, Universidad Santo
Tomás, Colombia; José Fernando Zea Castro,
Universidad Santo Tomás, Colombia)

Abstract
The Gini coefficient is a well-known inequality index. In this research we
estimate it by means of the Multipurpose Household Survey of Bogotá. The
resulting estimates are design-unbiased and are used as the response variable
(taking values between zero and one) in a beta regression model that
incorporates informative prior information in a Bayesian setup, where the unit
of observation is defined to be the localities of Bogotá.


Optimal dividend payment problem under time ruin constraints: Exponential case

Camilo Hernández
Universidad de los Andes
(Mauricio Junca, Universidad de los Andes).
Abstract
The idea of this work is to study a way to link the two standard problems in
optimal dividend payment theory: the maximization of profits and the
minimization of the probability of ruin. In this paper, we study the classical
Cramér-Lundberg model with exponential claim sizes subject to a constraint on
the time of ruin (P1). This type of constraint is a feature for which a
solution was unknown. In order to solve it, we use Lagrange multipliers to
obtain the Lagrange dual function, which leads to an auxiliary problem (P2).
For this problem, given a multiplier, we prove the uniqueness of the optimal
barrier strategy and we also obtain its optimal value function. Finally, we
present the main theorem of the paper, in which we prove that the optimal value
function of (P1) is obtained as the pointwise infimum over all optimal value
functions of the collection of problems (P2).

A model of Lake Valencia temperatures using statistical methods

José B. Hernández C.
Universidad Central de Venezuela
(José R. León, Universidad Central de Venezuela)

Abstract
Water temperature plays an important role in the ecological functioning and in
the control of biogeochemical processes of a body of water. The objective of
this work is to identify the surface water temperature and improve the
understanding of the spatial-temporal variations in Lake Valencia, as well as
to create a map of both surface and deep temperatures of the lake. In this
paper, data from two surface weather stations and four thermistors located
throughout the lake over a year, from November 2007 through October 2008, were
used. Descriptive statistics (mean, maximum and minimum) for the daily time
series, as well as day and night temperatures, were calculated. Wind speed and
solar radiation at the surface were also measured. A correlation analysis
between the day and night temperatures of the surface water, as well as wind
speed and solar radiation, will be carried out. We define, by using the heat
equation, a model of temperature propagation from the surface to the bottom.
For this we not only use the measurements; we also integrate data from
satellite images.

Multivariate intensity peaks over threshold models: Applications to risk
management

Rodrigo Herrera
Universidad de Talca
(Nikolaus Hautsch, University of Vienna; Valerie
Chávez-Demoulin, University of Lausanne).

Abstract
Financial risk management has become a ubiquitous task for banks, companies
and financial institutions, especially since the last subprime crisis. In a
world where globalization is constantly increasing, with a dramatic increase in
the available information on financial market data, the development of new
methodologies to describe the dynamics of these instruments becomes necessary.
Two of the most interesting fields which have emerged as reliable frameworks
for new methodologies are point process theory and extreme value theory (EVT).
EVT has shown its major influence in the modeling of extreme risk with measures
that attempt to describe the tail of a loss distribution, such as the Value at
Risk (VaR) and the Expected Shortfall (ES) (Embrechts et al., 2003;
Chávez-Demoulin et al., 2005; Herrera, 2013), while point process theory has
been applied in different areas of risk management such as portfolio credit
risk, high-frequency trading and jump-diffusion models (Russell, 1999; Hautsch,
2011). The contribution of this paper is twofold. First, we propose a framework
based on marked self-exciting point processes, which captures the dynamic
behavior in clusters of extreme events. In particular, we introduce the
autoregressive conditional intensity peaks-over-threshold (ACI-POT) model,
which in its most basic form corresponds to the combination of two known
models: the ACI model introduced by Russell (1999) and the POT model of Davison
and Smith (1990). The second proposed model is the multivariate extension of
the Hawkes-POT model, introduced for the univariate case by Chávez-Demoulin et
al. (2005) and recently revisited in different financial contexts
(Chávez-Demoulin and McGill, 2012; Herrera, 2013). One of the major advantages
of these new approaches is that they can directly model the impact of
clustering of extreme events, where the future evolution of the process is
influenced by its past history. In addition, this class of processes generates
a flexible and computationally tractable multivariate dependence structure,
properties that are empirically well documented (see Hautsch, 2011 and
references therein). The second contribution, from a purely empirical
perspective, is to discuss some stylized facts related to the cluster behavior
of extreme events in the context of financial modeling, both at a conceptual
level and at an empirical level. To this end we consider three
well-investigated international stock market indexes, the DAX, the S&P 500 and
the FTSE 100, which exhibit these cluster features. We show that, by means of
the ACI-POT and Hawkes-POT approaches, we can capture these stylized facts, and
we discuss some possible economic explanations for their presence in financial
time series.

References: Chávez-Demoulin, V., Davison, A., McNeil, A., 2005. A point process
approach to value-at-risk estimation. Quantitative Finance 5, 227-234.
Chávez-Demoulin, V., McGill, J., 2012. High-frequency financial data modeling
using Hawkes processes. Journal of Banking & Finance 36(12), 3415-3426.
Embrechts, P., Klüppelberg, C. and Mikosch, T., 2003. Modelling Extremal Events
for Insurance and Finance. Springer-Verlag. Hautsch, N., 2011. Econometrics of
Financial High-Frequency Data. Springer. Herrera, R., 2013. Energy risk
management through self-exciting marked point process. Energy Economics 38,
64-76. Russell, J., 1999. Econometric modeling of multivariate
irregularly-spaced high-frequency data. Manuscript, GSB, University of Chicago.
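As a hedged illustration of the self-exciting clustering mechanism underlying
Hawkes-POT-type models (a plain exponential-kernel Hawkes process with assumed
parameters, not the paper's marked model), consider the sketch below.

```python
# A minimal sketch: simulate a Hawkes process with intensity
#   lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta (t - t_i))
# by Ogata's thinning method; the intensity decreases between events, so the
# current intensity (plus a safety margin) dominates.
import numpy as np

def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, T=100.0,
                    rng=np.random.default_rng(14)):
    events, t = [], 0.0
    while t < T:
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events) + alpha
        t += rng.exponential(1.0 / lam_bar)             # candidate next point
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar and t < T:  # thinning acceptance
            events.append(t)
    return np.array(events)

ev = simulate_hawkes()
print("simulated %d clustered events; stationarity needs alpha/beta < 1"
      % len(ev))
```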

Quantile regression modelling of Brazilian anthropometric data for small
domain estimation

Luna Hidalgo Carneiro


(IBGE)
Silva Pedro Luis Do Nascimento (ENCE/IBGE).

Abstract
Knowledge about the anthropometry of the population is increasingly necessary
for guiding the design and monitoring of public health policies. In Brazil, the
main official source of basic anthropometric data (height and weight) is the
Consumer Expenditure Survey (POF) conducted by the Brazilian Institute of
Geography and Statistics (IBGE). IBGE releases only the results for the
population median weight and height by sex and age group, with federation units
as the lowest level of geographic disaggregation. However, with the increasing
demand for such information at lower levels of geographic disaggregation, the
application of methods which can provide reliable estimates for areas where the
sample sizes are small is also on the rise. IBGE used a direct design-based
estimation approach to compute the estimated medians of height and weight using
the data from POF. In this paper we consider the application of two alternative
methods of estimation for quantiles in small areas: linear quantile regression
and M-quantile regression. These methods were compared with the direct
estimation approach for estimating the 10th, 50th and 90th centiles and their
standard errors by age and sex groups for some Brazilian states, using data
from POF 2008-2009. The linear quantile regression method provided the best
estimates, in the sense of having smaller standard errors and yielding smoother
centile curves over the age groups.
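As a hedged illustration of the first method compared above (linear quantile
regression on simulated heights; the covariate, noise structure and sample are
assumptions), consider the sketch below.

```python
# A minimal sketch: fit the 10th, 50th and 90th centiles of height on age
# with statsmodels' QuantReg.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(15)
age = rng.uniform(5, 18, size=400)
height = 80 + 5.5 * age + rng.normal(0, 6 + 0.3 * age)   # heteroscedastic toy data

X = sm.add_constant(age)
for q in (0.10, 0.50, 0.90):
    fit = sm.QuantReg(height, X).fit(q=q)
    print("centile %d: intercept=%.1f slope=%.2f" % (int(q * 100), *fit.params))
```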

Mixture of phase-type distributions

Hinojosa Adrian
Departamento de Estatística, Universidade Federal
de Minas Gerais
(Demarqui Fábio, Departamento de Estatística,
Universidade Federal de Minas Gerais).

Abstract
The present work aims to use the EM algorithm to estimate the parameters of a
mixture of phase-type distributions. Phase-type distributions are distributions
on the positive real axis that correspond to the absorption time of a
continuous-time jump process. They have been considered since the works of
Erlang (1909) and Neuts (1959); for a review of the subject and applications
see Asmussen (2000). Two main techniques are used for parametric estimation:
the EM algorithm, first proposed by Asmussen et al. (1996), and Bayesian
estimation using a Markov chain Monte Carlo (MCMC) based approach, in Bladt et
al. (2003). The mixture model that we consider was proposed by Frydman (2005),
in the context of stochastic social mobility processes. This model uses the
same base generator to obtain a collection of independent processes whose
transitions are performed at different speeds. The chains are mixed at time
zero, upon choosing the initial state. We develop the EM algorithm for this
mixture as devised in the work of Asmussen et al. (1996) and implement it in R.

Slashed quasi-gamma distribution

Iriarte Salinas Yuri Antonio

Technological Institute University of Atacama,
Chile
(with Varela Héctor, Bolfarine Heleno, Gómez
Héctor W., Technological Institute University of
Atacama, Chile).

Abstract
This paper introduces an extension of a subfamily of the generalized gamma
distribution. The extension is denominated the slashed quasi-gamma
distribution and is defined as the quotient of a gamma random variable
(numerator) and a random variable with uniform distribution (denominator). The
extension is directed at making the generalized gamma distribution more
flexible in relation to its kurtosis. Maximum likelihood estimation is
implemented for parameter estimation. Results of a real data application
reveal good performance in applied scenarios.

Keywords: Gamma distribution, generalized gamma distribution, slash
distribution.
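As a hedged illustration of the slash construction mentioned above (the usual
recipe X = Z / U^(1/q); the talk's exact quasi-gamma parametrization may
differ), the sketch below simulates draws and a kurtosis proxy.

```python
# A minimal sketch (assumed construction): slashed gamma-type draws
# X = Z / U^(1/q) with Z ~ Gamma and U ~ Uniform(0, 1); smaller q thickens
# the right tail, and q > 4 keeps the fourth moment finite.
import numpy as np

rng = np.random.default_rng(16)

def slashed_gamma(n, shape=2.0, scale=1.0, q=6.0):
    z = rng.gamma(shape, scale, size=n)
    u = rng.uniform(size=n)
    return z / u ** (1.0 / q)

x = slashed_gamma(100000)
print("standardized 4th-moment (kurtosis) proxy:",
      np.mean(((x - x.mean()) / x.std()) ** 4))
```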


Risk factors relating to cases of death from AIDS

Edson Marcos Leal Soares Ramos

University of Pará, Brazil
(Franciely Farias da Cunha, University of Pará,
Brazil; Adrilayne dos Reis Araújo, University of
Pará, Brazil).

Abstract
AIDS, since its discovery, has constituted an illness that oversteps the bounds
of the biomedical dimension: characterized as an incurable clinical pathology
which leads to death, it also enters the psychological and social fields. This
means that the experience of the illness is loaded with prejudice,
discrimination, fear, violence, loneliness, uncertainty, unemployment, poverty,
prostitution, and gender inequalities. It is therefore an important public
health problem of major proportions. Thus, the aim of this study is to model
the cases of death from AIDS in the State of Pará, Brazil. For this, we used
the statistical techniques of exploratory data analysis and binary logistic
regression. From the exploratory data analysis, it is emphasized that the
majority of AIDS patients are women who have contracted the virus through a
sexual relationship with infected men; it was also noted that the majority of
patients did not complete elementary school. Through the binary logistic
regression, we found that patients with AIDS who also had tuberculosis are two
times more likely to evolve to death compared to patients without this
complication. Anemic patients are four times more likely to evolve to death
compared with non-anemic patients. Patients who already showed signs of
diarrhea have an 88% higher chance of evolving to death compared to patients
who did not present this complication. The parameter estimates of the variables
presence of tuberculosis, anemia and diarrhea were significant at the 5% level.
Thus, AIDS-related complications become risk factors for the death of the
patient.


Sensitivity analysis for variance parameters in Bayesian simplex mixed models
for proportional data

Freddy Omar López Quintero

Universidad Técnica Federico Santa María, Chile.

Abstract
We present Bayesian modeling for a response variable restricted to the interval
(0,1), such as proportions and rates, using the simplex distribution for the
case where the data have a longitudinal form, taking random effects into
account. We consider homogeneous and heterogeneous structures for the submodels
of the dispersion parameter and investigate, by means of a sensitivity
analysis, the effect of five different prior distributions for the variance
parameters of the random effects on the final estimation. The models are
illustrated with simulated and real data.

Modelling the homoscedastic growth curve for item response data

Tavares Madruga Maria Regina


Federal University of Pará
(Heliton Ribeiro Tavares, Federal University of Pará
/ Vunesp Foundation).

Abstract
It is very frequent in evaluation systems to find the situation where different
groups of examinees are evaluated and the parameters of their ability
distributions are estimated on the same metric scale. In this work we aim to
model the possible growth of the mean parameters of the ability distributions
of groups of examinees when the variances of these distributions are equal, but
unknown, and the items are already calibrated in a large-sample study. We
consider that $K$ different groups of individuals are appraised on a certain
area of knowledge, taking tests denominated Test 1, Test 2, ..., Test $K$,
respectively. A sample of $N_k$ subjects from population $k$ takes test $k$,
composed of $n_k$ items. The total number of items will be denoted by $n$,
satisfying $n = \sum_{k=1}^{K} n_k$. We will consider that the groups involved
in the analysis correspond to the series $t = (t_1, t_2, \ldots, t_K)$. If this
series is sequential, we can adopt $t_1 = 1, t_2 = 2, \ldots, t_K = K$. Let us
represent by $\mu_k$ the mean ability of the population taking test $k$,
$k = 1, \ldots, K$. In general, let

$$\mu_k = f(t_k \mid \beta)$$

be a twice-differentiable continuous function with $p$ parameters and
$\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ the vector of parameters of the
function. The mean vector for the $K$ levels can now be rewritten as

$$\mu_\beta = (f(t_1 \mid \beta), f(t_2 \mid \beta), \ldots, f(t_K \mid \beta)).$$

Several regression models can be proposed to represent the true relationship
between the population ability parameters and some explanatory variable, such
as time or grade, for instance. The estimating equations considering the
3-parameter logistic model (LM3) and some simulation results are presented for
the parameters of the polynomial, logistic and Gompertz models and for the
dispersion parameters. The average estimates are very close to the true values,
proving the effectiveness of the process for estimating the growth curve
parameters. An application to real data obtained from a study of the Ministry
of Education of Brazil is also presented.

Nonparametric regression with noise derived from a Poisson process and
Brownian motion

Lorena Mansilla
Universidad de Valparaíso
(Torres Soledad, CIMFAV-Universidad de
Valparaíso; Viens Frederi, Purdue University).

Abstract
This work deals with parameter estimation of the biomass of a biological
species in which organisms interact with others. In 2012, Kefi et al. defined a
deterministic mathematical model for the biomass in terms of trophic and
non-trophic species interactions. We generalize this work by developing a
stochastic model, given by a nonparametric regression, where the noise is
driven by a Poisson process and an independent Brownian motion. We use the
well-known Nadaraya-Watson estimator, for which we prove consistency in this
case. Finally, we illustrate the estimation on a real data set corresponding to
the biomass of three species from northern Chile: Concholepas concholepas
(Martyn, 1784) (loco), Fissurella latimarginata (black lapa) and Lessonia
nigrescens (huiro negro); the data were facilitated by the Subsecretaría de
Pesca y Acuicultura in Chile (SUBPESCA). Keywords: Brownian motion, Poisson
process, nonparametric regression, Nadaraya-Watson.
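As a hedged illustration of the estimator named above (a Gaussian-kernel
Nadaraya-Watson fit on simulated data whose noise mixes Gaussian and
Poisson-jump components as a toy analogue of the abstract's setting), consider
the sketch below.

```python
# A minimal sketch: Nadaraya-Watson regression estimator
#   m_hat(x) = sum_i K_h(x - x_i) y_i / sum_i K_h(x - x_i).
import numpy as np

rng = np.random.default_rng(17)
x = np.sort(rng.uniform(0, 10, 300))
noise = 0.3 * rng.normal(size=300) + 0.5 * (rng.poisson(0.1, size=300) - 0.1)
y = np.sin(x) + noise

def nadaraya_watson(x0, x, y, h=0.4):
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)  # Gaussian kernel
    return (w * y).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(0, 10, 5)
print(np.round(nadaraya_watson(grid, x, y), 2))   # compare with sin(grid)
```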

A new model for dependent financial data

Carolina Marchant
Universidade Federal de Pernambuco
(Víctor Leiva, Universidad de Valparaíso; Helton
Saulo, Universidade Federal do Rio Grande do Sul).

Abstract
The Birnbaum-Saunders distribution is receiving considerable attention due to
its good properties. One of its extensions is the family of scale-mixture
Birnbaum-Saunders (SBS) distributions, which shares these good properties but
also has further properties, such as robust estimation. The autoregressive
conditional duration model is the primary family used to analyze
high-frequency financial transaction data. We propose a methodology based on
new SBS autoregressive conditional duration models. This methodology includes
parameter estimation by the EM algorithm, inference for these parameters,
in-sample and out-of-sample forecast techniques and a residual analysis. We
prove the robustness of the estimation procedure and we carry out a Monte
Carlo study to evaluate its performance. In addition, we assess the practical
usefulness of this methodology by using real-world data of financial
transactions from the New York Stock Exchange.


Asymptotic properties of maximum likelihood estimators based on progressive
type-II censoring under weak conditions

Daniele S. Baratela Martins Neto

Universidade de Brasília, UnB.

Abstract
The aim of this work is to establish the consistency and asymptotic normality
of maximum likelihood estimators based on progressively Type-II censored
samples, along the same lines as LeCam (1957). In this study, weaker conditions
than those given by Lin and Balakrishnan (2011) are assumed. The proposed model
relaxes the assumption of the existence of the third derivative of the
logarithm of the density function.

Comparisons of estimation methods for the parameters of the Marshall-Olkin
extended exponential distribution

Josmar Mazucheli
Universidade Estadual de Maringá.
(Francisco Louzada Neto, Universidade de São
Paulo; Mohamed E. Ghitany, Kuwait University).

Abstract
The aim of this work is to compare, through Monte Carlo simulations, the finite-sample properties of the estimates of the parameters of the Marshall-Olkin extended exponential distribution obtained by ten estimation methods: maximum likelihood, modified moments, L-moments, maximum product of spacings, ordinary least-squares, weighted least-squares, percentile, Cramér-von Mises, Anderson-Darling and right-tail Anderson-Darling. The bias, root mean-squared error, and the absolute and maximum absolute differences between the true and estimated distribution functions are used as the criteria for comparison. The simulation study concludes that the L-moments and maximum product of spacings methods are highly competitive with the maximum likelihood method in small and large samples.
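As a hedged illustration of one of the ten methods compared, the sketch below implements maximum product of spacings estimation in Python, taking the Marshall-Olkin extended exponential cdf as $F(x) = (1 - e^{-\lambda x})/(1 - (1-\alpha)e^{-\lambda x})$; the optimizer choice and the placeholder sample are assumptions, not the authors' simulation design.

```python
# Sketch of maximum product of spacings (MPS): maximize the sum of the log
# spacings of the fitted cdf evaluated at the order statistics.
import numpy as np
from scipy.optimize import minimize

def moee_cdf(x, alpha, lam):
    e = np.exp(-lam * x)
    return (1.0 - e) / (1.0 - (1.0 - alpha) * e)

def neg_log_spacings(theta, x):
    alpha, lam = np.exp(theta)                      # log scale keeps parameters positive
    u = moee_cdf(np.sort(x), alpha, lam)
    d = np.diff(np.concatenate(([0.0], u, [1.0])))  # spacings D_1, ..., D_{n+1}
    return -np.sum(np.log(np.maximum(d, 1e-300)))

x = np.random.default_rng(1).exponential(1.0, 100)  # placeholder data
fit = minimize(neg_log_spacings, x0=np.log([1.0, 1.0]), args=(x,), method="Nelder-Mead")
alpha_hat, lam_hat = np.exp(fit.x)
```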

Prediction of mutual funds using distance-based beta regression

Óscar O. Melo
Universidad Nacional de Colombia
(Carlos E. Melo, Universidad Distrital Francisco José de Caldas; Sandra E. Melo, Universidad Nacional de Colombia).

Abstract
In the context of regression with a beta-type response variable, we propose a new method that links two methodologies: a distance-based model, and a beta regression with variable dispersion. The proposed model is useful for those situations where the response variable is a rate or a proportion related to a mixture of continuous and categorical explanatory variables. We present its main statistical properties and some measures for selecting the most predictive dimensions for the model. A main advantage of our proposal is that it is quite general, because we only need to choose a suitable distance for both the mean model and the variable dispersion model depending on the type of explanatory variables. Furthermore, the mean and precision predictions for a new individual and the problem of missing data are also developed. Rather than removing variables or observations with missing data, we use the distance-based method to work with all data without the need to fill in or impute missing values. Finally, an application to mutual funds is presented using the Gower distance for both the mean model and the variable dispersion model. This methodology is applicable to any problem where estimation of distance-based beta regression coefficients for correlated explanatory variables is of interest.
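Since the proposal only requires choosing a suitable distance, the following minimal Python sketch shows the Gower distance for mixed data, as used in the application; the variable types, ranges and example records are illustrative assumptions.

```python
# Sketch of the Gower distance: range-normalized absolute differences for
# numeric variables and simple matching for categorical ones, averaged.
def gower_distance(xi, xj, is_numeric, ranges):
    d = 0.0
    for k, numeric in enumerate(is_numeric):
        if numeric:
            d += abs(xi[k] - xj[k]) / ranges[k]   # range-normalized difference
        else:
            d += 0.0 if xi[k] == xj[k] else 1.0   # simple matching for categories
    return d / len(is_numeric)

xi = (1.2, "A", 30.0)                             # placeholder mixed records
xj = (0.7, "B", 45.0)
dist = gower_distance(xi, xj, is_numeric=(True, False, True), ranges=(2.0, None, 60.0))
```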


Option pricing using Gaussian distributions

Mauricio Molina
Universidad Nacional de Colombia
(José Alfredo Jiménez, Universidad Nacional de Colombia; Juan David Pulgarín, Universidad Nacional de Colombia).

Abstract
The Black-Scholes model is widely used in stock markets due to its simple implementation, but there is good empirical evidence that the underlying stock distribution is not lognormal. One of the reasons for this is that volatility is not constant; thus, it is necessary to consider a distribution with heavier tails in order to improve the pricing of European options. Bahra (1997) assumes that the underlying distribution is a mixture of lognormal distributions, with five parameters. In this case, it is easier to adjust these parameters to the implied risk-neutral distribution (RND), making the model more able to exhibit negative skewness and kurtosis, and easy to apply because the pricing formula results in a convex combination of two Black-Scholes prices. In this work, we propose a linear transformation of the underlying asset introducing two new parameters, location and scale, and obtain a new formula, similar to the classic mixed pricing formula: it is also a convex combination, but its components are not independent of one another, making it more suitable for stock data while keeping it easy to fit the parameters to the risk-neutral distribution. The Black-Scholes model can be recovered and adjusted even when the asset distribution is bimodal. There are differences between the classic mixed model and this model. We present numerical results showing the differences between three models: Black-Scholes, mixed Black-Scholes, and the proposed model.
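The convex-combination structure described above can be made concrete with a short Python sketch; all numerical inputs are placeholders, and the two lognormal mixture components are represented here simply through two Black-Scholes legs rather than through Bahra's five-parameter notation.

```python
# Sketch of mixed-lognormal call pricing as a convex combination of two
# Black-Scholes prices, as the abstract describes for Bahra's model.
import numpy as np
from scipy.stats import norm

def bs_call(S, K, r, sigma, T):
    """Standard Black-Scholes European call price."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def mixed_bs_call(theta, S1, sigma1, S2, sigma2, K, r, T):
    # convex combination of two Black-Scholes prices (weights theta, 1 - theta)
    return theta * bs_call(S1, K, r, sigma1, T) + (1 - theta) * bs_call(S2, K, r, sigma2, T)

price = mixed_bs_call(0.6, 100.0, 0.2, 95.0, 0.35, K=100.0, r=0.02, T=0.5)
```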


A model selection criterion for the segmentation of symbolic sequences

Bruno Monte de Castro
University of São Paulo.

Abstract
The sequence segmentation problem aims to partition a sequence or a set of sequences into a finite number of segments as homogeneous as possible. In this work we consider the problem of segmenting a set of random sequences with values in a finite alphabet $A$ into a finite number of independent blocks. We suppose that we have $m$ independent sequences of length $n$, constructed by the concatenation of $s$ segments of lengths $l_j^*$, where each block is obtained from a distribution $p_j$ over $A^{l_j^*}$, $j = 1, \dots, s$. Besides, we denote the true cut points by the vector $k^* = (k_1^*, \dots, k_{s-1}^*)$, with $k_i^* = \sum_{j=1}^{i} l_j^*$, $i = 1, \dots, s-1$; these points represent the changes of segment. We propose to use a penalized maximum likelihood criterion to infer simultaneously the number of cut points and the position of each one of those points. We also present an algorithm for sequence segmentation and some simulations to show how it works and its convergence speed. Our principal result is the proof of strong consistency of these estimators when $m$ grows to infinity.


A spatio-temporal study of maximum extreme rainfalls in Guanajuato State, Mexico

Moreno Leonardo
(Facultad de Ciencias Económicas, UDELAR, Montevideo, Uruguay)
(Ortega Sánchez Joaquín, Centro de Investigación en Matemáticas, CIMAT, Guanajuato, México)

Abstract
A topic of current interest is extreme climatic events. There is increasing concern about climate variability due to the large impact it has on the population and the economy. The aim is to establish a space-time model for the values of daily rainfall in the State of Guanajuato, Mexico, from meteorological stations located in the region. While the natural path for the spatial extension of extreme value theory is max-stable processes, inference for that family of processes is currently inflexible and has a large computational cost. Lack of spatial stationarity is explored. A finite-dimensional fit through extreme copulas, with its extension to higher dimensions provided by regular vines, is considered. This research shows the close relation between the finite-dimensional distributions of max-stable processes and extreme copulas. Predictions are compared using different max-stable processes and R-vines, where the latter provide a possible solution to the problem of non-stationarity. New conclusions about the behavior of maximum rainfalls in the region are stated. Global predictions show that severe flooding could affect the State.


A Wald test for the significance of inputs in an artificial neural network based on forecast combination

Johanna Marcela Orozco-Castañeda
Universidad Nacional de Colombia-Sede Medellín
(Juan David Velásquez Henao, Universidad Nacional de Colombia).

Abstract
In the context of combination of time series forecasts, it is useful to determine whether a certain forecast incorporates all the relevant information in competing forecasts; this tells us whether competing forecasts may be fruitfully combined. In the linear forecast combination approach, for example, we can use forecast encompassing hypothesis tests to check whether one forecast in the combination is encompassed by the competing forecasts (Newbold and Harvey, 2007). Nevertheless, for nonlinear combinations of time series forecasts there are no tests for forecast encompassing. In this work, we develop and apply a statistical test based on a Wald statistic in order to test the significance of individual input variables in a nonlinear specification. We use a Regression Neural Network (RNN) model with one hidden layer as a system for combining time series forecasts, where the input variables are the different forecasts generated by several methods, in order to obtain a combined forecast more accurate than the individual forecasts with respect to some criterion. The Wald statistic is proposed by the authors as an alternative way to test the significance of the parameters related to each input variable. This test can also be viewed as a model selection strategy based on statistical concepts; see Anders and Korn (1999). We show an application of the Wald test to check forecast encompassing in the nonlinear combination produced by the RNN. We consider a nonlinear regression RNN with one hidden layer having a functional form given by

$$Y_t = f(X_t; \theta) + \varepsilon_t, \tag{5.9}$$

where

$$f(X_t; \theta) = \beta_0 + X_t'\beta + \sum_{j=1}^{q} \lambda_j \, G(\gamma_{0j} + X_t'\gamma_j), \tag{5.10}$$

$\theta$ is the parameter vector containing all the $\beta$, $\lambda$ and $\gamma$, and $G$ is the logistic function, $G(z) = 1/(1 + e^{-z})$. Here $Y_t$ is the response variable, $f(X_t; \theta)$ is a real-valued nonlinear function, $X_t$ is a vector of $p$ explanatory variables, $X_t = (Y_{t-1}, \dots, Y_{t-p})$, and $\varepsilon_t$ is the error between the observed $Y_t$ and what the model predicts. In the neural network literature, this specification contains $p$ input units in the input layer (each corresponding to an input variable $Y_{t-i}$), $q$ units in the hidden layer, the $j$th hidden unit having activation $G(\gamma_{0j} + X_t'\gamma_j)$, and one output unit in the output layer with output value $o = \beta_0 + X_t'\beta + \sum_{j=1}^{q} \lambda_j G(\gamma_{0j} + X_t'\gamma_j)$. An ANN model in this form is called an augmented single hidden layer network; see Kuan and White (1994). The full vector of parameters, $\theta = (\beta_0, \beta', \lambda', \gamma', \gamma_0')$, contains $p + 1 + q(p+2)$ parameters, where

$\beta' = (\beta_1, \dots, \beta_p)$,

$\lambda' = (\lambda_1, \lambda_2, \dots, \lambda_q)$,

$\gamma' = (\gamma_1', \gamma_2', \dots, \gamma_q')$, with each $\gamma_j' = (\gamma_{j1}, \dots, \gamma_{jp})$, $j = 1, \dots, q$,

$\gamma_0' = (\gamma_{01}, \gamma_{02}, \dots, \gamma_{0q})$.

In order to know whether an input is relevant in model (5.10), it is important to achieve the best possible local approximation and to state the irrelevant-variables hypothesis as a restriction of the form $H_0: S_i\theta = 0$ against $H_1: S_i\theta \neq 0$, where $S_i$ is the $(q+1) \times (p + 1 + q(p+2))$ selection matrix that selects the elements $\beta_i$ and $\gamma_{ji}$, $j = 1, \dots, q$, associated with input $i$, $i = 1, \dots, p$.
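A minimal numerical sketch of such a Wald statistic for $H_0: S_i\theta = 0$ is given below; the estimates, covariance matrix and selection matrix are placeholders, since the abstract does not specify the fitted values.

```python
# Sketch of the Wald statistic W = (S theta)' (S V S')^{-1} (S theta),
# compared against a chi-squared distribution with rank(S) degrees of freedom.
import numpy as np
from scipy.stats import chi2

def wald_test(theta_hat, V_hat, S):
    r = S @ theta_hat
    W = r @ np.linalg.solve(S @ V_hat @ S.T, r)
    return W, chi2.sf(W, df=S.shape[0])           # statistic and asymptotic p-value

theta_hat = np.array([0.4, -0.1, 0.8, 0.05])      # placeholder parameter estimates
V_hat = 0.01 * np.eye(4)                          # placeholder covariance matrix
S = np.array([[0.0, 1.0, 0.0, 0.0]])              # selects the parameters of one input
W, pval = wald_test(theta_hat, V_hat, S)
```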

Finite mixture of extreme value distributions: identifiability and estimation

Guevara Otiniano Cira E.
University of Brasília (EST)
(Gonçalves Cátia R., University of Brasília (MAT); Dorea Chang C. Y., University of Brasília).

Abstract
The class of all finite mixtures of generalized extreme value distributions is proved to be identifiable. In addition, the estimates of the unknown parameters of the mixtures are obtained via the EM algorithm. The performance of the estimates is tested by Monte Carlo simulation.


Stochastic modeling for aggregation-breakage processes with Fourier basis

Daniel Paredes
Université Paul Sabatier Toulouse III, Institut de Mathématiques de Toulouse
(Gamboa Fabrice, Université Paul Sabatier Toulouse III, Institut de Mathématiques de Toulouse; Guérin Léa, Université de Toulouse; INPT, UPS, Laboratoire de Génie Chimique).

Abstract
Research in particulate systems often requires the solution of a population balance equation, which is written in terms of the number density function. The number density function is defined in terms of internal coordinates (e.g. particle size and particle morphology coordinates) and gives rise to integral and derivative terms. Different methods exist for solving the population balance equation numerically. These methods are often computationally expensive and they lose efficiency when applied to multivariate functions (often the number density function considers just one particle size coordinate, like length or volume). Our aim is to find a method for solving this kind of population balance equation for aggregation and breakage processes, considering multivariate number density functions in terms of particle size and morphological coordinates, using a Fourier basis. Keywords: Stochastic Modeling, Aggregation-Breakage Processes, Method of Moments, Fourier Basis.


On the distribution of explosion time of stochastic differential equations

Liliana Peralta Hernández
Departamento de Control Automático, Cinvestav-IPN
(Dr. Jorge A. León, Departamento de Control Automático, Cinvestav-IPN; Dr. José Villa-Morales, Departamento de Matemáticas y Física, Universidad Autónoma de Aguascalientes).

Abstract
In this talk, we will discuss the blow-up in finite time of stochastic differential equations driven by a Brownian motion. In particular, we talk about extensions of the Osgood criterion, which can be applied to some nonautonomous stochastic differential equations with additive Wiener integral noise.

Some approximations of fractional Brownian motion

Wilmer Pineda
Fundación Universitaria Los Libertadores
(Isaac Zainea, Universidad Central).

Abstract
Fractional Brownian motion has been used successfully to model a variety of natural phenomena. Despite the great impact generated by this process, fractional Brownian motion is never a semimartingale except when it is the classical Brownian motion corresponding to H = 1/2; moreover, fBm is not a Markov process. Hence, the approximation of fractional Brownian motion by other classes of stochastic processes is important. In this talk, we will discuss different ways to approximate fractional Brownian motion, including approximation by martingales, semimartingales, Poisson processes and random walks. We also discuss the kind of convergence involved and the advantages and disadvantages of these approaches.


Income distribution in developing countries - Colombia as an example

María Nubia Quevedo Cubillos
Universidad Militar Nueva Granada.

Abstract
I perform a statistical analysis of the income distribution in Colombian society and compare the results with those obtained in other societies. In particular, I show that Colombian society is characterized by the presence of a phase with a Boltzmann-like distribution, which includes most of the population, and a Pareto-like distribution, which involves the individuals with the highest incomes. In addition, I propose to interpret these results in the context of geometrothermodynamics to understand the phase transition structure of both distributions.
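As a hedged illustration of the two-regime structure described (not the author's analysis), the sketch below fits an exponential (Boltzmann-like) bulk by its mean and a Pareto-like tail by the Hill estimator; the synthetic income sample and the 97% cutoff are assumptions.

```python
# Sketch of a two-regime income fit: exponential bulk plus Pareto tail.
import numpy as np

rng = np.random.default_rng(3)
income = rng.exponential(1000.0, 5000)          # placeholder income sample
cutoff = np.quantile(income, 0.97)              # assumed bulk/tail split point

T = income[income <= cutoff].mean()             # exponential "temperature" parameter
tail = income[income > cutoff]
pareto_alpha = len(tail) / np.sum(np.log(tail / cutoff))  # Hill estimator of the tail index
```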

Estimation and comparison of energy prices using time series and stochastic differential equations

Carlos Alberto Ramírez Vanegas
Universidad Tecnológica de Pereira
(Poveda Yuri Alexander, Universidad Tecnológica de Pereira; Mora Ceballos Carlos Arturo, Universidad Tecnológica de Pereira).

Abstract
The price of electric energy in the Colombian wholesale market is a random variable with high volatility. For this reason, since this market entered into operation in 1994, several models have been proposed to represent and estimate its behavior. It is common to find different models to estimate the price of electricity in the scientific literature. These tools are based on different premises to model these prices. However, the characteristics of the Colombian electricity market (highly hydraulic, with export commitments, etc.) do not allow the direct application in the country of the techniques reported in other contexts. In this paper, two methodologies are proposed: stochastic differential equations (an Ornstein-Uhlenbeck model) and time series, using a data sample from the Colombian electric system.

Identifying hierarchical structures in network data using nonparametric mixtures

Pedro Regueiro
University of California, Santa Cruz
(Abel Rodríguez, University of California, Santa Cruz)

Abstract
The class of Bayesian stochastic blockmodels has become a popular approach to relational data. This is due, in part, to the fact that inference on structural properties of networks follows naturally in this framework. Here, we propose a Bayesian multiscale stochastic blockmodel to identify and study possible hierarchical organization in network data. The model utilizes a prior for the community structure closely related to the nested Chinese restaurant process. We use a latent variable augmentation scheme to develop a Markov chain Monte Carlo algorithm that allows us to fit this model. Illustrations are provided through both simulated and real datasets.


Approximate Bayesian inference for the Rosenblatt distribution

Laura Rifo
University of Campinas
(Andrade, P., University of São Paulo)

Abstract
The Rosenblatt distribution is a one-parameter family arising from a non-central limit theorem for long-range dependent random variables. This family includes the standard normal distribution, the standardized chi-squared distribution, and weighted sums of chi-squared variates. Its analytical form is not manageable, and its moments, cumulants and empirical distribution have only recently been studied numerically. We apply a Bayesian likelihood-free methodology to obtain inferences for this family, comparing the performance of several statistics.
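A generic rejection-sampling sketch of the likelihood-free (ABC) idea is shown below; the simulator, prior, summary statistic and tolerance are all stand-in assumptions, and a sampler for the Rosenblatt distribution itself is not shown.

```python
# Generic ABC rejection sketch: keep prior draws whose simulated summary
# statistic falls within a tolerance of the observed one.
import numpy as np

def abc_rejection(observed, simulate, prior_draw, summary, tol, n_draws, seed=0):
    rng = np.random.default_rng(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if abs(summary(simulate(theta, rng)) - s_obs) < tol:
            accepted.append(theta)
    return np.array(accepted)                    # approximate posterior sample

# Toy usage with a Gaussian stand-in for the (intractable) simulator:
obs = np.random.default_rng(5).normal(0.3, 1, 200)
post = abc_rejection(obs, simulate=lambda t, r: r.normal(t, 1, 200),
                     prior_draw=lambda r: r.uniform(-1, 1),
                     summary=np.mean, tol=0.05, n_draws=2000)
```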

Testing the presence of heteroscedasticity in unobserved component models

Alejandro Rodríguez
Universidad de Talca, Chile
(Guillermo Ferreira, Universidad de Concepción, Chile).

Abstract
In the context of time series analysis, conditional heteroscedasticity has an important effect on the coverage of prediction intervals. Moreover, when prediction intervals are constructed using unobserved component models (UCMs), the problem increases due to the possible existence of several components that may or may not be conditionally heteroscedastic; consequently, the true coverage depends on the correct identification of the source of the heteroscedasticity. Proposals for testing homoscedasticity have been applied to the auxiliary residuals of the UCM; however, in most cases, these procedures are unable, on average, to identify the heteroscedastic component correctly. The problem is associated with the transmission of heteroscedasticity between the auxiliary residuals, which may generate the incorrect identification of heteroscedasticity in a component with constant conditional variance. In this article, we make two main contributions. First, we propose a non-parametric statistic for testing homoscedasticity. We study the asymptotic validity of the statistic and consider bootstrap procedures for approximating its finite-sample distribution. Second, we focus on eliminating the transmission and then using the auxiliary residuals to identify the conditionally heteroscedastic components correctly. In this sense, the simulation results show an improvement in the power and the size of the homoscedasticity tests.

Jump process with memory of variable length

Douglas Rodrigues Pinto
Universidade de São Paulo (USP)

Abstract

Stochastic chains with memory of variable length were introduced by Rissanen (1983) and are currently an important subject of research due to their applications in various fields such as linguistics, genetics and neuroscience. Let $Y_t$ be a jump process with memory of variable length associated with $(\tau, p)$. Our main objective is to study the behavior of the process and to propose estimators for the process parameters, such as the rate $q: \tau \to (0, \infty)$ and the context tree $\tau$ associated with the process.


Living conditions, types of households and families - A study on the living conditions and welfare state of the elderly in agrarian reform settlements, SP, Brazil

Janice Rodrigues Placeres Borges
Universidade Federal de São Carlos (DTAiSER - CCA/UFSCar), Araras/SP - Brasil
(Sartorio Simone Daniela, Universidade Federal de São Carlos (DTAiSER - CCA/UFSCar), Araras/SP - Brasil).

Abstract
This study aimed to contribute to research on households, families and living conditions, touching on a topic still lacking studies: living conditions, family arrangements and characteristics of households in the rural population, taking as an empirical benchmark the elderly of rural settlements in the region of Ribeirão Preto, SP, Brazil, where a significant percentage of elderly people was found in two São Paulo agrarian reform settlements: Monte Alegre and Guarani. Through the analysis of questionnaires, univariate descriptive analysis alone fails to capture the different forms of associations and relationships among three or more variables; this situation can be resolved using multivariate techniques. The aim of this work was thus to study, by means of multivariate techniques, the living conditions, family arrangements and characteristics of households of the elderly rural population. Data were derived from a field survey with the application of a closed questionnaire, consisting of thematic blocks, in 355 households. Because the data collected are categorical responses, Multiple Correspondence Factor Analysis (MCFA) was applied to identify associations. This technique aims to group highly correlated variables, with a resulting reduction in the number of predictor variables in the model. Results and discussion: the axes generated by the MCFA showed a satisfactory contribution of the variables under study to identifying the most important relationships between them, and hence to identifying the groups. In Monte Alegre, 59% of households, and in the Guarani settlement, 65%, were headed by individuals 60 years or older, mostly male. Regarding marital status, the percentages of married individuals stood out in both areas. The collected data also revealed that the vast majority belonged to complete nuclear families. There were also notable percentages of illiteracy or incomplete primary schooling among the elderly. The great majority had lived in the settlement for more than 15 years, with an average income of one minimum wage; however, 35% of them reported no income. As to origin, the Guarani elders came mostly from an urban environment, while in Monte Alegre a greater percentage came from rural areas. The women were closer to the 40-to-60-year age group, only 1% of them were heads of household, and the vast majority had no income. In the settlements studied, it was found that, contrary to what occurs in urban areas, the vast majority of the elderly lived with their families. This demonstrates the social practice of gathering other relatives around the family, for shorter or longer periods, depending on need. Respondents made reference to a list of chronic diseases, resembling the general national statistics. An occupation-related health risk factor was found: exposure to pesticides and their side effects. Given this situation, the demographic transition in Brazil actually requires forming new strategies for the elderly in the São Paulo countryside, concerning their living conditions and welfare, to achieve improvement in all the items raised about the conditions of life and well-being of elderly residents in rural settlements.

A Robbins-Monro algorithm for nonparametric estimation of functional AR processes with Markov switching

Luis Ángel Rodríguez
CIMFAV, Facultad de Ingeniería, Universidad de Valparaíso, Chile and Dpto. de Matemáticas, FACYT, Universidad de Carabobo, Venezuela
(Lisandro Fermín, CIMFAV, Facultad de Ingeniería, Universidad de Valparaíso; Ricardo Ríos, Universidad Central de Venezuela)

Abstract
We consider nonparametric estimation for functional autoregressive processes with Markov switching. First, we study the case where the complete data are available, i.e. when we observe the Markov-switching regime; we then estimate the regression function in each regime using a Nadaraya-Watson type estimator. Second, we introduce a nonparametric recursive algorithm for the case of a hidden Markov-switching regime, which restores the missing data by means of a Monte Carlo step and estimates the regression functions by a Robbins-Monro step. Consistency and asymptotic normality of the estimators are proved.


Random graph with normal distribution

Leidy Paola Rodríguez Prieto
Universidad Distrital Francisco José de Caldas
(Leidy Johana Pulgarín Ovalle, Universidad Distrital Francisco José de Caldas; Julieth Katherine Molina, Universidad Distrital Francisco José de Caldas).

Abstract
We describe a basic notion that uses the characterization of a graph as a set of independencies and the notion of a minimal I-map. We then define the notion of a perfect map and show that not every distribution has a perfect map. We describe the concept of I-equivalence, which captures an equivalence relationship between two graphs where they specify precisely the same set of independencies. Finally, we define a partially directed graph that provides a compact representation for an entire I-equivalence class, and we provide an algorithm for constructing this graph. We show the properties of the Bayesian network representation and its semantics. These results are crucial to understanding the cases where we can construct a Bayesian network, and we present the multivariate normal distribution with a random graph.

Jackknife empirical likelihood for unequal probability sampling

Jessica María Rojas Mora
Universidad de Córdoba, Colombia
(Pacheco López Mario José, Universidade de São Paulo, Brasil).

Abstract
We show in this article that the jackknife empirical likelihood method proposed by Jing, Yuan & Zhou (2009) can be applied to construct design-based confidence intervals under unequal probability sampling without replacement. This method is extremely simple to use in practice. A simulation study is conducted to compare the Monte Carlo performance of the 95% jackknife empirical likelihood confidence interval with the standard confidence interval based on the central limit theorem. In terms of coverage probability, the jackknife empirical likelihood confidence interval generally outperforms the standard one. Key words: Confidence Interval, Empirical Likelihood, Jackknife, Unequal Probability Sampling.
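A minimal sketch of the jackknife empirical likelihood recipe (jackknife pseudo-values followed by standard empirical likelihood for a mean) is shown below under equal-probability sampling for simplicity; the estimator, data and bracketing of the Lagrange multiplier are illustrative assumptions, not the design-based construction of the paper.

```python
# Sketch of jackknife empirical likelihood (Jing, Yuan & Zhou, 2009).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def jackknife_pseudo_values(x, stat):
    n = len(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return n * stat(x) - (n - 1) * loo           # V_i = n*T - (n-1)*T_{-i}

def el_log_ratio(v, mu):
    # -2 log empirical likelihood ratio for the mean of v at mu (~ chi2(1));
    # assumes mu lies strictly inside the range of the pseudo-values.
    z = v - mu
    g = lambda lam: np.sum(z / (1 + lam * z))    # Lagrange-multiplier equation
    c = 1 / len(z) - 1
    lam = brentq(g, c / z.max() + 1e-10, c / z.min() - 1e-10)
    return 2 * np.sum(np.log(1 + lam * z))

x = np.random.default_rng(2).normal(0, 1, 50)    # placeholder sample
v = jackknife_pseudo_values(x, np.var)           # pseudo-values of the variance
p_value = chi2.sf(el_log_ratio(v, mu=1.0), df=1)
```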

Bayesian approach for generalized elliptical semiparametric models

Luz Marina Rondón Poveda
Instituto de Matemática e Estatística, Universidade de São Paulo, Brasil and Departamento de Estadística, Universidad Nacional de Colombia, Colombia
(Heleno Bolfarine, Instituto de Matemática e Estatística, Universidade de São Paulo, Brasil).

Abstract
Regression models under the assumption of independent and normally distributed errors with varying dispersion are a very flexible statistical tool for data analysis because they allow both location and dispersion parameters to depend on the explanatory variables, which means these models can be applied to a wide variety of practical situations. Statistical inference for this class of models was developed by Aitkin (1987) and Verbyla (1993) under the classical approach and by Cepeda and Gamerman (2001) under the Bayesian approach. Xu and Zhang (2013) extended the proposal of Cepeda and Gamerman (2001) by including a nonparametric additive effect (described by a B-spline; see, for instance, Boor (1978)) in the systematic component of the location parameter, i.e., assuming that the functional form of the dependence between the mean or median of the response variable distribution and a continuous explanatory variable is unknown. However, in practice, there are data sets in which the effect of a continuous explanatory variable on the dispersion parameter also has an unknown functional form. On the other hand, as is well known, inference for models under the assumption of normally distributed errors can be highly influenced by outlying observations of the response variable. Therefore, in this paper we study statistical inference and diagnostic methods based on the Bayesian approach for regression models under the assumption that the independent additive errors follow normal, Student-t, slash, contaminated normal, Laplace or symmetric hyperbolic (Barndorff-Nielsen (1977)) distributions, where both location and dispersion parameters of the response variable distribution include nonparametric additive components described by B-splines. Some of these distributions for the model error present heavier tails than the normal one, so the regression models based on them seem to be a reasonable choice for robust inference. It is noteworthy that the regression models studied in this paper, called here generalized elliptical semiparametric models (GESM), generalize the systematic component (since they consider linear parametric and nonparametric effects) and the random component (because they consider for the model error distributions obtained as scale mixtures of normal distributions) of the models studied by Aitkin (1987), Verbyla (1993), Cepeda and Gamerman (2001) and Xu and Zhang (2013). This class of models provides a rich set of symmetric distributions for the model error, some of them with heavier or lighter tails than the normal one, as well as with different levels of kurtosis. In order to draw samples from the posterior distribution of the parameters of interest, we describe the prior distributions and the methods used (based on the Gibbs sampler and the Metropolis-Hastings algorithm). The performance of this MCMC algorithm is evaluated through simulation experiments. We apply the proposed methodology to a real data set. Some diagnostic tools under the Bayesian approach, such as measures of influence based on case deletion and standardized residuals, are discussed in this paper and applied to analyze the fitted models.

Estimation of latent distribution parameters by advantage of hits

Thamara Rúbia Almeida de Medeiros
(Helen Indianara Seabra Gomes, Heliton Ribeiro Tavares).

Abstract
Item Response Theory (IRT) has reached a major role in the area of Educational Assessment, as well as in several other areas of knowledge. Basically, IRT proposes models for latent traits, that is, characteristics of the individual that cannot be observed directly. This type of variable must be inferred from the observation of related secondary variables. IRT provides methods to represent the relationship between the probability of an individual giving a correct answer to an item and his or her latent trait (ability or proficiency) in the area of knowledge assessed. In addition, a set of parameters describing the item also influences that probability. Depending on the case, the interest may lie in the estimation of item parameters (calibration), estimation of individual skills, and/or estimation of average skills. In many applications, we want to compare the average skills of several populations in a particular subject (such as mathematics or Portuguese language), with items already calibrated. In this case, we can estimate the population parameters via Marginal Maximum Likelihood, as proposed by Zimowski & Bock (1997), in specific software. However, usually we do not know the parameters of the items, and we need to estimate them at an early stage (e.g. via the EM algorithm; see Dempster et al., 1977) and then estimate the population parameters afterwards (Andrade, Tavares and Valle, 2000). In this study, we propose a method for estimating the average skills of a latent distribution in item response models. We consider the case where we have only two study populations, subjected to tests with common items. The proposal is based on a function of the difference of the proportions of correct answers of the two populations on the common items. We present some numerical results both when the item set is fixed across all replicates and when we vary the items. We also performed an analysis of residuals, and an exploration based on the sample size.

Modelling climate change effects on site productivity of Nothofagus dombeyi forests in southern Chile

Christian Salas
(Universidad de La Frontera)
(Gregoire Timothy G. (Yale University))

Abstract
Estimation of site productivity is crucial for both management and research purposes. Site index is the dominant height of a forest at a reference age, and it is the index most commonly used for site productivity estimation in forestry. However, the concept is based on the assumption that the climate of a given site is fixed through time. We used stem analysis data from more than 300 dominant trees of the native species Nothofagus dombeyi, spanning the geographical distribution of this species in south-central Chile. The solution of a differential equation with a power transformation was used as the growth model and was fitted using nonlinear mixed-effects models, adding random effects to one of its parameters. Later, we regress the random effects on site factors, habitat type, and climate variables. We assess the proposed model in a dynamic-system context in order to provide a better analysis of the tree-growth phenomenon than traditional prediction-based analysis. We discuss the results of the model not only when tree-level variables change but also when site factors, habitat type, and climate variables do.

Variance of an alternative
item count technique

Adriana Marcela Salazar
Universidad Nacional de Colombia
(Leonardo Trujillo, Universidad Nacional de Colombia; Luz Mery González, Universidad Nacional de Colombia)

Abstract
Some alternative methodologies to treat sensitive questions in surveys have been proposed in the literature (Warner, 1965; Devore, 1977; Miller, 1984; Droitcour et al., 1991; Kim and Warde, 2004; Chaudhuri, 2011; Imai, 2011; Hussain, Shaz and Shabbir, 2013, among many others). A review of these methods can be found in Trujillo and Gonzalez (2012). A particular type of these recent methods is known as Item Count Techniques (ICT). The aim is to obtain estimates of the prevalence or the total number of individuals in a population possessing a particular sensitive characteristic. The questions associated with these variables normally carry problems of nonresponse or bias. Also, most of the proposed ICT methods rely on the strong assumption of a sampling design corresponding to simple random sampling with replacement. In this work, we extend the estimators to a finite population under any (complex) survey sampling design, together with their corresponding variance. Some simulations confirm that the theoretical variance indeed coincides with the expression found for a finite population.


Fuzzy time series forecasting of air pollution

Ledys Llasmin Salazar Gomez
(University of Valparaíso)
(Salas Rodrigo, University of Valparaíso).

Abstract
Each realization of a stochastic process is affected by measurement errors, while the prediction of time series only takes into account the randomness related to the variability of the stochastic process through time. Therefore, the uncertainty of the data is not considered in conventional modeling, and it becomes necessary to design or implement techniques to manage it. In this work, we exhibit the modeling process of a time series based on fuzzy techniques. The implementation of if-then fuzzy rules to model the series can tackle the problem of uncertainty in the input data. The application of fuzzy techniques is performed using the Takagi-Sugeno-Kang (TSK) model to describe the time series and to make predictions robustly. The TSK model can address the uncertainty by recognizing the local behavior of the process and, moreover, we can interpret the model of the system. Therefore, if-then fuzzy models have an advantage over conventional nonlinear modeling because of the local representation of the process. This local representation allows the description of a nonlinear system using mathematical functions, addressing the overall complexity underlying the dynamic process. The antecedent of a fuzzy rule divides the input space into local fuzzy regions, while the consequent describes the dynamics of these regions. The antecedent of the rule is confined to certain regions of the input space and the consequent is usually an autoregressive model. Furthermore, the TSK model works as a local predictor because it is associated with a specific region of the input space. Inside these regions, the local predictions describe the dynamic behavior of part of a complete system captured by the antecedent part of the rule [1]. The model is applied to forecast the level of concentration of air pollution based on the time series, where the measurements are prone to several sources of noise. Simulation results show a competitive performance in terms of mean square error. [1] A. Veloz, R. Salas, H. Allende-Cid and H. Allende (2012). SIFAR: Self-Identification of Lags of an Autoregressive TSK-based Model. In Proceedings of the 42nd IEEE International Symposium on Multiple-Valued Logic, ISMVL 2012, Victoria, BC, Canada, May 14-16, 2012, pp. 226-231. IEEE Press.


Markov transition models: a focus on planned experiments with correlated binary data

Mauricio Santana Lordelo
State University of Feira de Santana (UEFS), Bahia, Brazil
(Borges Fernandes Gilenio, Federal University of Bahia (UFBA), Brazil; Leovigildo Fiaccone Rosemeire, Federal University of Bahia (UFBA), Brazil).

Abstract
Markov transition models are a very important tool in several areas of knowledge when studies are developed with repeated measures. They are characterized by modeling the response variable over time conditionally on the previous responses, which are known as the history. In addition, it is possible to include other covariates. In the case of binary responses, a matrix of transition probabilities from one state to another can be constructed. In this work, four different approaches to transition models were compared in order to assess which gives the best estimates of the causal effect of treatments in experimental studies where the outcome is a vector of binary responses measured over time. A simulation study was carried out considering balanced experiments with three categorical treatments. To assess the estimates, the standard error and bias, as well as the coverage percentage, were used. The results showed that marginalized transition models are more appropriate in situations where an experiment is conducted with a reduced number of repeated measurements.
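A minimal sketch of the basic building block, estimating a first-order 2x2 transition probability matrix from a binary panel by counting one-step transitions, is given below; the simulated panel is a placeholder and the marginalized-model machinery compared in the work is not shown.

```python
# Sketch: estimate P(Y_t = b | Y_{t-1} = a) by counting observed transitions.
import numpy as np

def transition_matrix(Y):
    counts = np.zeros((2, 2))
    for prev, curr in zip(Y[:, :-1].ravel(), Y[:, 1:].ravel()):
        counts[prev, curr] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # row-normalized probabilities

Y = np.random.default_rng(4).integers(0, 2, size=(30, 5))  # placeholder binary panel
P_hat = transition_matrix(Y)
```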


Nonlinear mixed model applied to the in situ ruminal degradability of sugar cane (in natura) in sheep

Simone Daniela Sartorio
Universidade Federal de São Carlos (DTAiSER - CCA/UFSCar), Araras/SP, Brasil
(Rosiana Rodrigues Alves, Empresa Brasileira de Pesquisa Agropecuária (Embrapa) - Pesca e Aquicultura, Palmas/TO, Brasil; Pedro Henrique Rezende de Alcântara, Empresa Brasileira de Pesquisa Agropecuária (Embrapa) - Pesca e Aquicultura, Palmas/TO, Brasil).

Abstract
To determine the proportion of nutrients consumed by ruminants, nonlinear models are widely used in studies that seek to estimate the parameters of ruminal degradation kinetics through classical methods of univariate analysis. However, as these studies involve longitudinal data, the use of mixed-model methodology may be more suitable to describe this phenomenon. The aim of this study was to use nonlinear mixed models (MNLM) in the parameter estimation of the in situ ruminal degradation kinetics of sugar cane, in sheep fed on diets with different roughage (R):concentrate (C) proportions. The data used were determined by the in situ technique using 4 adult male sheep, of no defined breed, cannulated in the rumen. The experiment was originally designed as a split plot, with plots in a Latin square with 4 animals, 4 periods and 4 treatments (consisting of diets with different roughage (R):concentrate (C) proportions: 100R:0C, 80R:20C, 60R:40C, 40R:60C), and, in the subplots, 13 incubation times (0, 12, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240 and 312 hours). We adopted the mixed logistic nonlinear model to explain the behavior of the degradability of indigestible neutral detergent fiber (iNDF) and indigestible acid detergent fiber (iADF) of sugar cane (in natura) as a function of the incubation times. Variance components were estimated by the maximum likelihood method. In the selection of the random and fixed parts of the model, we used the likelihood ratio test (LRT) and the AIC and BIC information criteria. The analyses were performed using the nlme package of the R software, version 3.0.1, considering a 5% significance level. The final model for iNDF included a random effect in the parameter I of the logistic model; only 3 curves are needed to describe its degradability over time, since the treatments with 60% and 40% roughage did not differ. As for iADF, the final model included random effects in 2 parameters of the logistic model (I and k); however, 4 curves were needed to describe the treatments, with the parameter k of the treatments with 80% and 40% roughage not differing. For both variables, the highest percentage of roughage provided the highest degradation rate. The correlation of the longitudinal data was properly estimated, adequately explaining the extra variability caused by the effects of factors associated with the experimental design through the inclusion of random effects in the model parameters. This fact, if not considered, can affect the estimates and the associated standard errors and, thus, alter the results significantly. Moreover, the mixed approach is quite attractive when the research also aims to understand the behavior of the degradability process over the incubation times.

Different methods for handling longitudinal binary outcomes subject to potentially random dropout

Ali Satty
University of KwaZulu-Natal
(Henry Mwambi, University of KwaZulu-Natal; Geert Molenberghs, Hasselt University).

Abstract
This paper compares the performance of weighted generalized estimating equations (WGEE), multiple imputation based on generalized estimating equations (MI-GEE) and generalized linear mixed models (GLMM) for analyzing incomplete longitudinal binary data when the underlying study is subject to dropout. The paper aims to explore the performance of the above methods in terms of handling dropouts that are missing at random (MAR). The methods are compared on simulated data. The longitudinal binary data were generated from a logistic regression model, and dropouts were generated under several different dropout rates and sample sizes. The methods were evaluated in terms of bias, accuracy and mean square error when data are subject to random dropout. In conclusion, the MI-GEE method performs better for both small and large sample sizes.


A principled over-penalization strategy: analysis and application to histogram selection in density estimation

Adrien Saumard
(Universidad de Valparaíso).

Abstract
Penalization is a general tool in nonparametric statistics that allows one to select an estimator (or, equivalently, a model) among many others. In many situations, it is possible to design accurate penalties that are asymptotically optimal and non-asymptotically nearly optimal ([5]). However, it is well known that the optimal penalties prescribed by theory usually benefit from a slight over-penalization in practice ([4], [1]), for sample sizes that are small to moderate.

In the case of the AIC criterion, a few non-asymptotic corrections have thus been proposed, such as the AICc criterion of Burnham and Anderson ([3], [4]) or the over-penalization proposed by Birgé and Rozenholc ([2]). However, these attempts are not sufficiently theoretically grounded to be suitably generalized to other situations.

We propose a general and principled over-penalization strategy in the context of M-estimation. Our penalty correction relies on some considerations on the deviations of the excess risks of the estimators at hand. We theoretically validate our strategy in the classical case of histogram selection in maximum likelihood density estimation. We provide non-asymptotic oracle inequalities for the Kullback-Leibler divergence under various assumptions. In particular, the case of unbounded log-densities is tackled. Finally, we show good performance in simulations, compared with the other previous corrections.

References
[1] S. Arlot. Choosing a penalty for model selection in heteroscedastic regression, June 2010. arXiv:0812.3141.

[2] L. Birgé and Y. Rozenholc. How many bins should be put in a regular histogram. ESAIM Probab. Stat., 10:24-45 (electronic), 2006.

[3] K. P. Burnham and D. R. Anderson. Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res., 33(2):261-304, 2004.

[4] G. Claeskens and N. L. Hjort. Model selection and model averaging. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2008.

[5] P. Massart. Concentration inequalities and model selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003. With a foreword by Jean Picard.

Testing with dependent functional data

Matthieu Saumard
(PUCV)

Abstract
Most testing procedures with functional data have been designed for the case of independent variables. The case of dependent variables is an important one to explore. Two tests similar to the ones we propose have already been established in the presence of dependent random variables; we refer to Aue et al. and Horváth et al. Nevertheless, they are designed to study stability in the functional linear model or the functional autoregressive process. Delsol et al. have considered structural tests in the presence of independent random variables. In this poster, we present a simple test in the context of functional time series: a test of no-effect in the functional linear model with dependent regressors.


A comparison of estimators of item response models: a simulation study

Helen Indianara Seabra Gomes
Universidade Federal do Pará
(Thamara Rúbia Almeida de Medeiros, Universidade Federal do Pará; Heliton Ribeiro Tavares, Universidade Federal do Pará).

Abstract
The concept of Item Response Theory (IRT) was founded around 1930, but it was axiomatized in the 1960s. IRT is one of the latent modeling theories that emerged in the 1930s, which posit that human behavior is a result of hypothetical processes called latent traits. In Brazil, many applications in the field of Education have adopted IRT as a standard methodology for the calibration of items (questions) and the consequent estimation of these latent traits (or skills), and it now serves in selection processes for university entrance, student funding and scholarships in federal government projects. It is essential to develop studies to identify which method, if any, is most appropriate for estimating these skills in order to generate the fairest possible results, given the purposes of these tests. According to Andrade, Tavares & Valle (2000), IRT is based on a set of statistical models that seek to mathematically represent the probability of an individual giving/selecting the right answer to an item as a function of the parameters of this item and the skills of that individual. The methodologies employed by this theory require computational implementations of the available methods. To estimate the parameters of the items, methods such as Marginal Maximum Likelihood, via the EM algorithm, have been adopted. The skills have generally been estimated by Maximum Likelihood (ML), EAP (Expected A Posteriori), MAP (Maximum A Posteriori), and the weighted and biweighted methods. Recently, some applications have been developed for using IRT, such as IRTPRO, BILOG-MG, PARSCALE, MULTILOG, TESTFACT, and LOGIST. In this study, we describe an implementation of the skill estimation process aiming to compare the performance of some key software packages.


Continuous process derived from the solution of the generalized Langevin equation: theoretical properties and some simulations

Josiane Stein
Mathematics Institute, Federal University of Rio Grande do Sul
(Sílvia R. C. Lopes, Mathematics Institute, Federal University of Rio Grande do Sul; Ary V. Medino, Mathematics Institute, Federal University of Rio Grande do Sul).

Abstract
This work presents a continuous-time process derived from the generalized Langevin equation (GLE). The main interest is to study the GLE when the noise process has infinite second moment. For this situation, we consider the noise to be a symmetric $\alpha$-stable Lévy process, which can also have infinite first moment. One goal is to study the dependence structure of the process, but the autocovariance function is not defined for processes with infinite second moment. We propose to use a dependence measure, the so-called codifference, and we also propose an estimator for this dependence measure. Another interest of this work is to estimate the process parameters. We consider the maximum likelihood estimator for a particular case of the general process, namely the one derived from the solution of the classical Langevin equation. The continuous process resulting from this equation is called the Ornstein-Uhlenbeck (OU) process (see Barndorff-Nielsen and Shephard, 2001, Jongbloed et al., 2005 and Zhang and Zhang, 2013). Since the $\alpha$-stable distribution has a closed formula in only three cases, that is, when $\alpha \in \{0.5, 1, 2\}$ (see Samorodnitsky and Taqqu, 1994), it is necessary to use numerical methods for the process generation and for the estimation by maximization of the likelihood function. Consider the GLE, given by

$$\dot V(t) = -\int_0^t \gamma(t-s)\,V(s)\,ds + \dot L(t), \qquad V(0) = V_0, \tag{5.1}$$

where $\{L(t)\}_{t\geq 0}$ is a symmetric $\alpha$-stable Lévy process and $V_0$ is a random variable independent of $L(t)$. Under a few conditions on $\gamma(\cdot)$ and for $1 < \alpha \leq 2$, the solution of this equation is given by

$$V(t) = V_0\,\rho(t) + \int_0^t \rho(t-s)\,dL(s), \tag{5.2}$$

where $V_0 \equiv V(0)$ and $\rho(\cdot)$ satisfies

$$\dot\rho(t) = -\int_0^t \gamma(t-s)\,\rho(s)\,ds, \qquad \rho(0) = 1. \tag{5.3}$$

One wants to define a dependence measure for any stationary process. If $\{X(t)\}_{t\geq 0}$ is any stationary process, then the codifference function is given by

$$\tau(t) = \tau(X(t), X(0)) = \log \mathbb{E}\left\{ e^{i(X(t)-X(0))} \right\} - \log \mathbb{E}\left\{ e^{iX(t)} \right\} - \log \mathbb{E}\left\{ e^{-iX(0)} \right\}. \tag{5.4}$$

For the stochastic process given by (5.2), the codifference function can be calculated as

$$\tau(t) = \log\!\left( \frac{\varphi_{V_0}(\rho(t)-1)}{\varphi_{V_0}(\rho(t))} \right) - \log\big(\varphi_{V_0}(-1)\big), \tag{5.5}$$

where $\varphi_{V_0}(\cdot)$ is the characteristic function of the random variable $V_0$ and $\rho(\cdot)$ is the function defined in (5.3). We want to estimate the parameters of the process given in (5.2). For each function $\gamma(\cdot)$ and each noise process $L(\cdot)$, we have different parameters to be estimated. When $\rho(t) = e^{-\lambda t}$, $\lambda > 0$ (the OU process), it is possible to calculate the log-likelihood function. Consider the discretization proposed by Zhang and Zhang (2013),

$$V_{kh} = e^{-\lambda h}\,V_{(k-1)h} + Z_{k,h}, \tag{5.6}$$

where

$$Z_{k,h} = \int_{(k-1)h}^{kh} e^{\lambda(s-kh)}\,dL(s) \overset{d}{=} \left( \frac{1-e^{-\lambda\alpha h}}{\lambda\alpha} \right)^{1/\alpha} S_k, \tag{5.7}$$

$\{S_k\}_k$ is an independent and identically distributed sequence of random variables with symmetric $\alpha$-stable distribution (with scale parameter $\sigma$) and $h$ is the step size. Then, it is possible to calculate the log-likelihood function as

$$\mathcal{L}(\Theta \mid Z_{0,h}, \dots, Z_{N,h}) = \sum_{k=0}^{N} \log\big( f(Z_{k,h} \mid \Theta) \big), \tag{5.8}$$

where $f(\cdot)$ is the density function of the $\alpha$-stable distribution, $\Theta = (\alpha, \lambda, \sigma)$ is the parameter vector and $N$ is the sample size. By numerical maximization of the function $\mathcal{L}(\cdot)$, we obtain the maximum likelihood estimator $\widehat\Theta$.
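As a hedged illustration, the following Python sketch simulates the discretization (5.6)-(5.7) using scipy's symmetric $\alpha$-stable generator; the parameter values are placeholders and this is not the authors' implementation.

```python
# Sketch: simulate the OU-type recursion (5.6) with alpha-stable innovations (5.7).
import numpy as np
from scipy.stats import levy_stable

def simulate_stable_ou(alpha, lam, sigma, h, N, v0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    # scale of Z_{k,h} implied by (5.7)
    scale = sigma * ((1 - np.exp(-alpha * lam * h)) / (alpha * lam)) ** (1 / alpha)
    S = levy_stable.rvs(alpha, 0.0, scale=scale, size=N, random_state=rng)  # beta=0: symmetric
    V = np.empty(N + 1)
    V[0] = v0
    for k in range(N):
        V[k + 1] = np.exp(-lam * h) * V[k] + S[k]   # recursion (5.6)
    return V

path = simulate_stable_ou(alpha=1.7, lam=0.5, sigma=1.0, h=0.1, N=1000)
```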

180
CLAPEM 2014

References
1. Barndorff-Nielsen, O. E. and Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society, Series B, 63(2), 167-241.

2. Jongbloed, G. et al. (2005). Nonparametric inference for Lévy-driven Ornstein-Uhlenbeck processes. Bernoulli, 11(5), 759-791.

3. Samorodnitsky, G. and Taqqu, M. S. (1994). Stable Non-Gaussian Random Processes. New York: Chapman & Hall.

4. Zhang, S. and Zhang, X. (2013). A least squares estimator for discretely observed Ornstein-Uhlenbeck processes driven by symmetric α-stable motions. Annals of the Institute of Statistical Mathematics, 65, 89-103.


Analysis of functional magnetic resonance imaging via item response theory

Heliton Tavares
Federal University of Pará
(Dalton Andrade, Federal University of Santa Catarina; Tânia Macedo, Unesp).

Abstract
Magnetic Resonance Imaging (MRI) is a diagnostic imaging method well established in medical practice and still growing in terms of its development. Given its high ability to differentiate tissues, its range of applications extends to all parts of the human body and explores anatomical and functional aspects. Functional Magnetic Resonance Imaging (fMRI) stands out as one of the MRI techniques that have allowed the exploration of brain functions such as memory, language and motor control. Several applications have emerged in the evaluation of cognitive processes, as well as in monitoring the growth of brain tumors, pre-surgical mapping, studies of mental chronometry, and also as a diagnostic method for Alzheimer's disease. Basically, fMRI analyzes blood flow to detect the brain areas activated by some stimulus or function; multiple images (slices) are obtained simultaneously, and this process is repeated over time, sometimes for several individuals. However, the brain performs other tasks in parallel, so the biggest challenge is to accurately identify the area activated by the stimulus under study. This article aims to present a proposal for the identification of the active region based on Item Response Theory, and to compare it with the current methodology. Each image is subdivided into small voxels (pixels), which are categorized as Active or Inactive. We assume that the probability of a voxel being active is given by a one-parameter logistic model (LM1), which represents the activity level of the voxel. The parameters are estimated by Marginal Maximum Likelihood and represent the area activated by the function under study.
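A minimal sketch of the one-parameter logistic (1PL) probability the abstract refers to is given below; the parameterization shown, a logistic function of the difference between a stimulus level and a voxel threshold, is an assumption for illustration.

```python
# Sketch of a one-parameter logistic (1PL) activation probability.
import numpy as np

def p_active(theta, b):
    """Probability of an Active voxel response as a logistic of (theta - b)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

theta = np.linspace(-3, 3, 7)   # placeholder stimulus levels
b_voxel = 0.5                   # placeholder voxel activity-level parameter
probs = p_active(theta, b_voxel)
```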


Bayesian analysis of gamma regression models: state of the art and extensions

Francisco Torres-Avilés
Universidad de Santiago de Chile.

Abstract
This paper presents a review of inference in gamma regression models from a Bayesian perspective, with emphasis on mixed models. The work begins by presenting the usual construction of this class of regression models, and later defines some extensions of them. We discuss the choice of the link function, the elicitation of prior distributions, the inclusion of random effects and model selection through a simulation study. Finally, the methodology is illustrated using real data from the public health area.

A directional multivariate VaR

Raúl Andrés Torres Díaz
Universidad Carlos III de Madrid
(Rosa Elvira Lillo Rodríguez, Universidad Carlos III de Madrid; Henry Laniado Rodas, Universidad Carlos III de Madrid).

Abstract
The traditional measure of risk in an asset portfolio is the VaR (Value at Risk), due to its good properties and easy interpretation. However, only a few references are devoted to the generalization of this concept to the multivariate context. In this work, we introduce the definition of a multivariate financial risk measure, the MRVaR, based on the directional extremality quantile notion recently introduced in the literature. The directions in the definition of the MRVaR can be chosen by the investor according to her/his risk preferences. We state the main properties of this MRVaR, its non-parametric estimation and a robustness analysis. We also show the advantages of using this MRVaR with respect to other multivariate VaRs introduced in the recent literature. Finally, we illustrate our definition with the Archimedean copula, for which it is possible to obtain the explicit expression of the MRVaR.

Is a Brownian motion skew?

Soledad Torres
CIMFAV, Facultad de Ingeniería, Universidad de Valparaíso
(Antoine Lejay, Université de Lorraine, IECN, UMR 7502, Vandœuvre-lès-Nancy, F-54500, France; CNRS, IECL, UMR 7502, Vandœuvre-lès-Nancy, F-54500, France; Inria, Villers-lès-Nancy, F-54600, France; Ernesto Mordecki, Centro de Matemática, Facultad de Ciencias, Universidad de la República).

Abstract
We study the asymptotic behavior of the maximum likelihood estimator corresponding to
the observation of a trajectory of a skew Brownian motion through a uniform time
discretization. We characterize the speed of convergence and the limiting distribution
when the step size goes to zero, which in this case are non-classical, under the null
hypothesis that the skew Brownian motion is a standard Brownian motion. This makes it
possible to design a test for the skewness parameter.
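
For intuition about the object being tested (an illustration, not part of the authors' estimation procedure), skew Brownian motion with parameter p can be approximated by the classical Harrison-Shepp (1981) random walk, which steps symmetrically except at 0, where it moves up with probability p:

    import numpy as np

    def skew_bm_path(p=0.7, n_steps=20_000, seed=0):
        # simple random walk biased only at the origin; rescaled by sqrt(n_steps),
        # it converges to skew Brownian motion with skewness parameter p
        rng = np.random.default_rng(seed)
        x = np.zeros(n_steps + 1)
        for t in range(n_steps):
            up = rng.uniform() < (p if x[t] == 0 else 0.5)
            x[t + 1] = x[t] + (1 if up else -1)
        return x / np.sqrt(n_steps)

Setting p = 1/2 recovers standard Brownian motion, which is exactly the null hypothesis of the test.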

Projective convergence

Liliana Trejo Valencia
Instituto de Física, Universidad Autónoma de San Luis Potosí
(Edgardo Ugalde Saldaña, Instituto de Física, Universidad Autónoma de San Luis Potosí).

Abstract
We define the notion of projective convergence of probability measures with complete
support. In connection with the topology induced by this type of convergence, we
define the projective distance ρ and establish some of its topological properties,
such as the non-separability and completeness of the space. Our definition is strictly
stronger than weak convergence, although we give examples which allow us to conclude
that it is not comparable with the d̄-distance introduced by Ornstein (1974). We prove
that under the projective convergence scheme the entropy is preserved, and we provide
conditions under which the projective limit of mixing measures is mixing.

Stochastic models for electricity price in Colombia

Estefanía Uribe Gaviria
Universidad Nacional de Colombia
(Alfredo Trespalacios Carrasquilla, Empresas Públicas de Medellín, Universidad EAFIT).

Abstract
The agents who take part in electricity markets face uncertainty about the future
behaviour of the market, which hinders decision-making in the short, medium and long
term. The spot price of energy determines the form in which commercial exchanges are
carried out. In this work we present several models that capture the dynamics of the
energy spot price in Colombia, together with the estimation of their parameters. The
models represent seasonality, reversion to a long-term average, and dependence on
fundamental variables of this market. The occurrence of the El Niño phenomenon
generates alterations in the price of energy, both in its expected value and in its
variance, as do the hydraulic generation of the system, the level of river flows and
the demand for energy. The electricity reform introduced by Laws 142 and 143 of 1994
in Colombia created a competitive wholesale market, in order to achieve efficiency in
the electricity service and free entry for the agents interested in providing it.
This market is known as the Wholesale Energy Market (MEM), and in it participate the
agents who carry out the activities of generation, transmission, distribution and
commercialization, as well as the large consumers of electricity (Pérez et al., 1999).
In 1995 a way was opened for free competition and private participation, whereby
competition in the generation of energy helped the spot price to reflect the real
price of energy and the variations it suffers due to different factors, such as the
occurrence of the El Niño phenomenon, the availability of water and the costs of
generation, among others (Botero et al., 2008). A suitable understanding of the
factors that influence the exchange (spot) price of energy allows the agents operating
in this market to define strategies that maximize their income and, at the same time,
to adequately manage the risk of variations in their cash flow by means of the
available financial derivatives (Trespalacios et al., 2012).

Several statistical models have previously been proposed for the fitting and
forecasting of the spot price: Botero et al. (2008) implement processes from a
statistical family based on ARMA and ARIMA models, while Gil et al. (2008) and Lucia
et al. (2002) present models that, besides taking into account the effect of the El
Niño phenomenon, also work with explanatory variables such as the level of river
flows, the demand for energy and the generation of energy. The models evaluated by
Pilipovic (1998) and Geman et al. (2003) explain seasonal variation, reversion to the
average and jumps. For the modeling, we first analyze the historical behavior of the
energy spot price in Colombia from January 2000 to December 2013: its stationarity,
the jumps or peaks it presents over time, and the factors to which these anomalies
are owed. We then propose the models considered suitable, departing from Lucia et al.
(2002), estimate their respective parameters, and predict the energy spot price for
the year 2014 with each of the models. Consistent with previous studies, we find that
the energy spot price in Colombia presents patterns of seasonal variation and
reversion to the average. A change in the structure of the variance is observed at the
end of 2013; it is explained by the reduction of the hydrological inflows in the
western zone of the country without a climatically important event, possibly deepened
by the limited availability of natural gas to supply the thermal plants in periods of
shortage.
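
As a schematic example of the mean-reverting class of models cited (in the spirit of Lucia et al., 2002, not the paper's actual specification), a log spot price with deterministic seasonality plus an Ornstein-Uhlenbeck deviation can be simulated as follows; all parameter values are placeholders:

    import numpy as np

    def simulate_log_spot(n_days=365, kappa=5.0, sigma=0.8, mu=4.5, amp=0.15, seed=0):
        # ln S_t = f(t) + X_t, with f(t) a yearly sinusoid around mu and
        # dX_t = -kappa * X_t dt + sigma dW_t, discretized by an Euler scheme
        rng = np.random.default_rng(seed)
        dt = 1.0 / 365.0
        t = np.arange(n_days) * dt
        f = mu + amp * np.sin(2 * np.pi * t)          # deterministic seasonality
        x = np.zeros(n_days)
        for i in range(1, n_days):
            x[i] = x[i - 1] - kappa * x[i - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        return np.exp(f + x)                          # spot price path

El Niño effects of the kind described would enter such a model as regime shifts in the level and variance parameters.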

A semiparametric approach for joint modeling of median and skewness

Luis Hernando Vanegas Penagos
Instituto de Matemática e Estatística, Universidade de São Paulo, Brasil, and Departamento de Estadística, Universidad Nacional de Colombia, Colombia.
(Gilberto A. Paula, Instituto de Matemática e Estatística, Universidade de São Paulo, Brasil).

Abstract
Nonlinear regression models are commonly applied in areas such as Biology, Chemistry,
Medicine, Economics and Engineering. When the variable of interest is continuous, the
analysis based on models under normal errors and constant variance is the most
popular, due to its desirable statistical properties and comprehensively developed
theory. Nevertheless, the application of such models may be inadequate in some
scenarios commonly found in practice. For instance, as shown in this paper, ignoring
the skewness of the response variable distribution may introduce biases in the
parameter estimates and/or in the estimation of the associated variability measures.
To deal with this problem, some proposals have been made in the literature to replace
the normality assumption by more flexible classes of distributions. For example, in
the context of asymmetric and heavy-tailed responses, Lin et al. (2009) derived
diagnostic methods in nonlinear skew-t-normal regression models; Cancho et al. (2010)
studied nonlinear skew-normal regression models using classical and Bayesian
approaches; Lachos et al. (2011) introduced heteroscedastic nonlinear regression
models based on scale mixtures of skew-normal distributions; and Labra et al. (2012)
derived diagnostic methods for the class of regression models previously introduced
by Lachos et al. (2011). Although the models studied in these papers are attractive,
they have some limitations, for instance, modeling the mean instead of the median and
assuming that the skewness parameter is constant across the observations. Accordingly,
this paper provides a unified theoretical framework for semiparametric regression
analysis based on log-normal, log-Student-t, Birnbaum-Saunders, Birnbaum-Saunders-t
and other skewed and strictly positive distributions, in which both the median and
the skewness of the response variable distribution are explicitly modeled. In this
setup, named here log-symmetric regression models, the median is described using a
parametric nonlinear function, whereas the skewness is modeled using a semiparametric
function whose nonparametric component is approximated by a natural cubic spline
(see, for instance, Green and Silverman (1994)). In the context of nonparametric and
semiparametric models, some of the most important contributions can be cited. For
instance, Hastie and Tibshirani (1990) introduced the class of generalized additive
models, and Rigby and Stasinopoulos (2005) introduced the generalized additive models
for location, scale and shape (GAMLSS), which deal with the semiparametric joint
modeling of all parameters in a general class of distributions. Rigby and
Stasinopoulos (2006, 2007) also illustrated the use of semiparametric models based on
the Box-Cox t distribution, and developed a very flexible implementation of GAMLSS in
the statistical package R (www.R-project.org). More recently, Ibacache-Pulgar et al.
(2013) derived diagnostic tools in symmetric homoscedastic semiparametric models.
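
In symbols, the setup just described can be compressed as follows (the notation is chosen here for illustration, not taken verbatim from the paper):

    Y_i = \mu_i \, \xi_i, \qquad \log \xi_i \sim S(0, \phi_i), \qquad
    \mu_i = \eta(x_i; \beta), \qquad \log \phi_i = z_i^{\top}\gamma + h(t_i),

where S(0, φ_i) denotes a distribution symmetric about zero with dispersion φ_i, so that Median(Y_i) = μ_i, the dispersion φ_i governs the skewness of Y_i on the original scale, η is a parametric nonlinear function, and h is the nonparametric component approximated by a natural cubic spline.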


A sinusoidal family of probability density functions to model some problems

Velasco Jairo
Universidad Nacional de Colombia

Abstract
We consider the form of a family of probability density functions that allows modeling
and studying cases where the data cluster preferentially around more than one apparent
average. The functional form is

    f(x, a, b, k) = \frac{2a\, e^{a(b-x)}}{\left[1 + e^{a(b-x)}\right]^{2}}
                    \sin^{2}\!\left(\frac{k\pi}{1 + e^{a(b-x)}}\right)
                    I_{(-\infty,\infty)}(x)                               (1)

with k ∈ ℤ⁺, a ∈ ℝ⁺, b ∈ ℝ. The functional form can vary the average through b
(E(x) = b), the dispersion through a, and the number of virtual averages through k.
Some statistical calculations, such as E(x), and estimates of these parameters
obtained by the method of maximum likelihood from a sample are presented.
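
Taking formula (1) at face value, a quick numerical sanity check (a sketch, with arbitrary parameter values) confirms that the density integrates to one and has mean b:

    import numpy as np
    from scipy.integrate import quad

    def f(x, a, b, k):
        # logistic kernel times a sin^2 term that produces k "virtual averages"
        u = 1.0 / (1.0 + np.exp(a * (b - x)))                            # logistic CDF
        g = a * np.exp(a * (b - x)) / (1.0 + np.exp(a * (b - x))) ** 2   # logistic pdf
        return 2.0 * g * np.sin(k * np.pi * u) ** 2

    total, _ = quad(f, -50, 50, args=(1.0, 0.0, 3))           # approx 1.0
    mean, _ = quad(lambda x: x * f(x, 1.0, 0.0, 3), -50, 50)  # approx 0.0 = b

The check works because substituting u for the logistic CDF turns the normalization integral into 2∫₀¹ sin²(kπu) du = 1 for any positive integer k.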

Estimation of resource selection functions using the bootstrap

Sandra Vergara Cardozo
Universidad Nacional de Colombia
(Bryan F. Manly, Western EcoSystems Technology, Inc.; Raydonal Ospina, Federal University of Pernambuco).

Abstract
Resource selection functions (RSFs) are used to quantify how selective animals are in
their use of habitat or food over the study period. A resource selection probability
function (RSPF) can be estimated if N, the total number of units in the population,
and n1, the total number of used units in the study period, are both known and small.
An approximation of the RSPF can then be estimated using any standard program for
logistic regression, but the variances of the parameter estimates are too small. Three
methods of bootstrap sampling, parametric, nonparametric and a modified parametric
method, are proposed for the estimation of the variances, with a discussion of the
limitations of logistic regression for estimating the RSPF. The method for estimating
the RSPF described here has potential applications in medicine, ecology and other
areas.

Keywords: resource selection functions (RSFs); resource selection probability function (RSPF); bootstrap; logistic regression.
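
A generic version of one of the three schemes mentioned, the nonparametric bootstrap of logistic-regression coefficients, might look as follows (an illustrative sketch, not the authors' implementation; X is a unit-by-covariate array and y the used/unused indicator):

    import numpy as np
    import statsmodels.api as sm

    def bootstrap_se(X, y, n_boot=500, seed=0):
        # resample (unit, used/unused) pairs with replacement and refit each time;
        # the spread of the refitted coefficients estimates their standard errors
        rng = np.random.default_rng(seed)
        n = len(y)
        Xc = sm.add_constant(X)
        coefs = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)
            fit = sm.Logit(y[idx], Xc[idx]).fit(disp=0)
            coefs.append(np.asarray(fit.params))
        return np.std(coefs, axis=0, ddof=1)   # bootstrap standard errors

The parametric and modified parametric variants would instead simulate new responses from the fitted model before refitting.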

Consumer demand system based on a discrete-continuous model and the double exponential distribution

Ignacio Vidal
Universidad de Talca
(Felipe Vásquez, Universidad de Concepción; Walter Gómez, Universidad de La Frontera).

Abstract
We consider the problem of describing consumer choice situations characterized by the
simultaneous demand for multiple alternatives that are imperfect substitutes for one
another. In this paper, we propose a modification of the main econometric technique
used to deal with this problem, the so-called Kuhn-Tucker multiple discrete-continuous
economic consumer demand model. Our proposed approach provides tractable forms for the
densities and is based on the use of a symmetric probability distribution for the
error. The utility function considered has a quadratic structure allowing non-additive
preferences. Our proposal thus covers perfect and imperfect substitutes among the
choice alternatives and gives an explicit functional form for the interaction of any
pair of alternatives. This functional form can be coded using modern symbolic
computation software. Maximum likelihood estimation leads to a constrained
optimization problem that takes into account the mathematical assumptions of the whole
approach. We illustrate our methodology with a real data set on the time use of the
inhabitants of Santiago de Chile.
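
For reference, the double exponential (Laplace) density named in the title, written here in its standard univariate form with location μ and scale σ (the paper's exact parameterization may differ), is

    f(\varepsilon) = \frac{1}{2\sigma} \exp\!\left(-\frac{|\varepsilon - \mu|}{\sigma}\right),
    \qquad \varepsilon \in \mathbb{R},

a symmetric error law with heavier-than-normal tails, consistent with the symmetric-error assumption described above.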


Prediction of ozone concentration in the region of Grande Vitória, Espírito Santo, Brazil, using the ARMAX-GARCH model

Edson Zambon Monte
Federal University of Espírito Santo
(Taciana Toledo de Almeida Albuquerque, Federal University of Minas Gerais; Valdério Anselmo Reisen, Federal University of Espírito Santo).

Abstract
The objective of this study was to estimate the hourly ozone concentration in the
region of Grande Vitória, Espírito Santo, Brazil, using an ARMAX-GARCH model, for the
period from 2011/01/01 to 2011/12/31. The data were provided by the State Institute of
Environment and Water Resources (IEMA). The models were estimated for three stations:
Laranjeiras, Enseada do Suá and Cariacica. Some parameters measured at the stations
were adopted as explanatory variables of the ozone concentration, namely temperature,
relative humidity, wind speed and nitrogen dioxide concentration. These variables were
significant and improved the fit of the estimated model. The hourly predictions for
2011/12/31 (a day reserved to verify the accuracy of the model) were close to the
observed values, and the estimates generally followed the path of the daily ozone
concentration. When compared with the ARMA and ARMAX models, the ARMAX-GARCH model
proved to be more effective in predicting episodes of ozone pollution (hourly
concentrations above 80 µg/m³), reduced the number of estimated false alarms, and
showed a lower rate of undetected episodes.
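
A common two-step approximation to ARMAX-GARCH estimation (shown for orientation only; the study's exact specification may differ, and the column names below are placeholders) fits the ARMAX mean first and then a GARCH(1,1) to its residuals:

    from statsmodels.tsa.statespace.sarimax import SARIMAX
    from arch import arch_model

    def fit_armax_garch(df):
        # df: pandas DataFrame of hourly data with assumed columns
        # 'o3', 'temp', 'rh', 'wind', 'no2'
        exog = df[['temp', 'rh', 'wind', 'no2']]
        armax = SARIMAX(df['o3'], exog=exog, order=(1, 0, 1)).fit(disp=False)
        garch = arch_model(armax.resid, vol='GARCH', p=1, q=1).fit(disp='off')
        return armax, garch

The GARCH layer is what lets the model widen its prediction intervals during volatile pollution episodes, which is the source of the improved episode detection reported above.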


Forecasting time series with integer values

Luz Milena Zea Fernández
Universidade Federal do Rio Grande do Norte
(Klaus Leite Pinto Vasconcellos, Universidade Federal de Pernambuco).

Abstract
The study of time series is one of the most important subjects in the statistical
literature, its main purpose being to provide methods for modeling data sets that
exhibit correlation over time and to allow predictions to be made. Integer-valued time
series have attracted attention because they occur in many contexts, often as counts
of events, objects or individuals in consecutive intervals or at consecutive points in
time: for example, the number of accidents in a manufacturing plant each month, or the
number of fish caught in a particular area of the sea each week. In the last three
decades, there has been an increasing interest in methodologies for studying
integer-valued time series, including how to obtain non-negative, integer-valued
predictors. We focus on studying and proposing new forecasting procedures for the
integer-valued first-order autoregressive process (INAR(1)) with Poisson marginal
distribution, based on the binomial thinning operator, and for the integer-valued
first-order autoregressive conditional heteroskedasticity process (INARCH(1)), which
takes overdispersion into account.
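
To make the binomial thinning mechanism concrete (an illustration, not the proposed procedures), a Poisson INAR(1) path X_t = α∘X_{t-1} + ε_t and its classical conditional-mean one-step forecast can be generated as follows:

    import numpy as np

    def simulate_inar1(alpha=0.5, lam=2.0, n=200, seed=0):
        # binomial thinning: alpha o X ~ Binomial(X, alpha); Poisson(lam) innovations
        rng = np.random.default_rng(seed)
        x = np.empty(n, dtype=np.int64)
        x[0] = rng.poisson(lam / (1 - alpha))   # stationary Poisson marginal mean
        for t in range(1, n):
            x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
        return x

    def one_step_forecast(x_last, alpha, lam):
        # conditional expectation E[X_{t+1} | X_t = x_last]
        return alpha * x_last + lam

The conditional-mean forecast αx + λ is generally not integer-valued, which is precisely the issue that motivates the alternative integer forecasts studied in this work.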

Special Session

Probability and mathematical statistics: Foundation for Data Science

Alexander Infanzon
SAS

Abstract
There is a worldwide shortage of data scientists, the professionals who transform the
masses of data collected by different companies or groups into useful information. In
order to win in the current market, create new products or generate new business, data
scientists need to follow the trends that Big Data and analytics dictate. Although
there are now a couple of dozen master's programs designed to meet the urgent need for
Big Data analytics talent, there is still a shortage of these professionals. This talk
focuses on creating awareness of the situation. It describes the skills required to be
a successful data scientist and how SAS is helping to address this gap by launching
the SAS Analytics U program.


Index of Authors

Achillefs Tzioufas 102 Carolina Euán 133
Adolfo J. Quiroz 104 Carolina Marchant 151
Adrián Hinojosa 146 Cátia R. Gonçalves 139
Adriana Marcela Salazar 171 Christian Olivera 60
Adrien Saumard 176 Christian Salas 170
Adrilayne dos Reis Araújo 113 Christophe Gallesco 35
Airam Aseret Blancas Benítez 122 Claire Lacour 22
Alberto Contreras-Cristán 38 Clémentine Prieur 29
Alejandra Christen 62 Cristian Bayes 93
Alejandra Martínez 95 Cruz Reyes Danna Lesley 129
Alejandro Cholaquidis 50
Alejandro Rodríguez 163 Daniel Andrés Díaz Pachón 86
Alessandra dos Santos 131 Daniel Flores Agreda 87
Alexander Infanzon 191 Daniel Paredes 159
Alexandru Hening 97 Daniele S. Baratela
Alfio Marazzi 47 Martins Neto 152
Ali Satty 175 Daniil Ryabko 105
Alison Etheridge 7 David Belius 40
Allan Fiel 59 David Márquez 36
Álvaro Calvache Archila 76 Débora Borges Ferreira 123
Ana María Gómez Lamus 137 Débora Fernanda Castro
Andrea Marcela Cruz Moreno 67 Vianna Oliveira 126
Andrés Gutiérrez 44 Denise Britz do
Andressa Cerqueira 127 Nascimento Silva 45
Andy Rafael Domínguez 130 Diana Milena Galvis Soto 136
Arno Siri-Jégousse 82 Douglas Rodrigues Pinto 164

Bhargab Chattopadhyay 72 Edson Marcos Leal
Brenda Betancourt 121 Soares Ramos 148
Bruno Monte de Castro 155 Edson Zambon Monte 190
Emílio Augusto Coelho-Barros 121
Camilo Hernández 143 Enrico Bibbona 90
Carenne Ludeña 12 Estefanía Uribe Gaviria 185
Carles Serrat 31 Eustasio del Barrio 41
Carlos Alberto
Ramírez Vanegas 161 Fabio Humberto Nieto 56
Carlos Eduardo Alonso-Malaver 66 Francisco Casanova Del Ángel 125
Carlos Mario Lopera 83 Francisco Javier
Carlos Matrán Bea 46 Delgado-Vences 100
Carlos Valencia 64 Francisco Torres-Avilés 183


Freddy Omar López Quintero 149 José B. Hernández C. 143
Frédéric Richard 107 José León 29
José Rafael León 101
Gerard Biau 11 Josiane Stein 179
Gerardo Barrera Vargas 120 Josmar Mazucheli 152
Germán Moreno 89 Juan Carlos Espinosa Moreno 132
Grazyna Badowski 102 Juan Carlos Pardo 23
Guevara Otiniano Cira E. 158 Julian Martinez 51
Julieth Verónica
Hans-Georg Müller 26 Guarín Escudero 142
Héctor Araya 112
Helen Indianara Seabra Gomes 178 Karine Bertin 21
Hélène Boistard 43 Klaus Leite Pinto Vasconcellos 57
Heliton Tavares 182
Hugo Andrés Gutiérrez Rojas 142 Laura Rifo 163
Hugo de la Cruz 58 Leandro Tavares Correia 74
Ledys Llasmin Salazar Gómez 172
Ignacio Correa 127 Leidy Paola Rodríguez Prieto 167
Ignacio Lobato 55 Leonardo Moreno 156
Ignacio Vidal 189 Liliana Peralta Hernández 160
Iriarte Salinas Yuri Antonio 147 Liliana Trejo Valencia 184
Isabel Llatas Salvador 103 Lisandro Fermín 134
Lorena Mansilla 150
Jaime A. Londoño 91 Luc Devroye 38
Jaime R. Arrue 115 Luis Angel Rodriguez 166
Jaime San Martín 49 Luis Barboza Chinchilla 61
Jairo Arturo Ayala Godoy 117 Luis Fernando Grajales 89
James Robins 19 Luis Hernando
Jane-Ling Wang 27 Vanegas Penagos 186
Janice Rodrigues Luis Melo 68
Placeres Borges 165 Luna Hidalgo Carneiro 145
Jessica María Rojas Mora 167 Luz Marina Rondón Poveda 168
Joan Jesús Amaya 112 Luz Milena Zea Fernández 191
Joaquín Fontbona 34
Joaquín Fontbona 59 Manuel González Navarrete 140
Johanna Marcela Marc Lavielle 30
Orozco-Castañeda 157 Marcelo Sobottka 84
John Freddy Moreno Trujillo 91 Márcia D'Elia Branco 124
Jonathan Acosta Salazar 111 Marco Avella Medina 96
Jorge Alberto León 48 María Clara Fittipaldi 24
Jorge Clarke de la Cerda 100 María Nela Seijas Giménez 69
Jorge Figueroa-Zúñiga 73 María Nubia Quevedo Cubillos 161
José Alejandro Maria Regina Tavares Madruga 149
González Campos 139 Matthieu Saumard 177


Mauricio Junca 72 Rodrigo Herrera 144
Mauricio Molina 154
Mauricio Santana Lordelo 173 Salvador Flores 94
Miguel A. Delgado 77 Samy Tindel 36
Miraine Dávila Felipe 81 Sandra Palau Calderón 80
Moshe Porat 99 Sandra Vergara Cardozo 188
Myrian Elena Vergara Morales 79 Sébastien Bubeck 37
Selvamuthu Dharmaraja 70
Natalia Bahamonde 118 Sergio Armando
Nelson Alirio Cruz Gutiérrez 128 Camelo Gómez 125
Nicolas Fraiman 41 Sergio Yáñez 86
Norman Giraldo 77 Simone Daniela Sartorio 174
Soledad Torres 184
Orietta Nicolis 97 Solesne Bourguin 48
Oscar Ivan Barreto 120 Somnath Datta 17
Óscar O. Melo 153 Stefan Alberto Gómez Guevara 137
Stijn Vansteelandt 20
Pablo Groisman 71 Stéphane Menozzi 28
Pamela Llop 25 Sylvain Robbiano 74
Paul Embrechts 7
Paula M. Spano 85 Thamara Rúbia
Pawel Hitczenko 81 Almeida de Medeiros 169
Pedro A. Torres-Saavedra 63 Thanh Mai Pham Ngoc 98
Pedro C. Álvarez Esteban 54 Thomas Mikosch 13
Pedro Regueiro 162 Thomas Richardson 18
Prateek Bansal 119 Timothy E. O'Brien 32

Rafael E. Borges 123 Vadim Azhmyakov 117
Rafael Serrano 92 Valeria Fonseca Díaz 134
Ramsés Mena Chávez 39 Velasco Jairo 188
Raúl Andrés Torres Díaz 183 Víctor Ignacio López 33
Regina Liu 8 Víctor J. Yohai 42
Reinaldo B. Arellano-Valle 114 Víctor Rivero 13
Ricardo Maronna 106 Viswanathan Arunachalam 70
Roberto Imbuzeiro Oliveira 11, 35
Rodrigo Assar 116 William David
Rodrigo Cesar Freitas Aristizábal Rodríguez 114
da Silva Federal 135 Wilmer Pineda 160

This publication was edited, printed and bound in September 2014 in Bogotá, D. C., Colombia. It was typeset in 10-point Minion Pro.
