XIII CLAPEM - Latin American Congress of
Probability and Mathematical Statistics
ABSTRACTS BOOK
ISSN 2389-9069
Editors
Viswanathan Arunachalam
(Departamento de Estadística, Universidad Nacional de Colombia)
Johanna Garzón
(Departamento de Matemáticas, Universidad Nacional de Colombia)
Sandra Vergara
(Departamento de Estadística, Universidad Nacional de Colombia)
Organizing Committee
Liliana Blanco Castañeda (Chair) Universidad Nacional de Colombia (Bogotá)
Viswanathan Arunachalam Universidad Nacional de Colombia (Bogotá)
Johanna Garzón Merchán Universidad Nacional de Colombia (Bogotá)
José Alfredo Jiménez Universidad Nacional de Colombia (Bogotá)
Leonardo Trujillo Universidad Nacional de Colombia (Bogotá)
Sandra Vergara Cardozo Universidad Nacional de Colombia (Bogotá)
Jorge Mario Ramírez Osorio Universidad Nacional de Colombia (Medellín)
Adolfo Quiroz Universidad de los Andes
María Elsa Correal Universidad de los Andes
Víctor Hugo Prieto Universidad Antonio Nariño
Sandra Gutiérrez Meza Universidad de Cartagena
César Serna Universidad Central
Francisco Zuluaga Universidad Eafit (Medellín)
Germán Moreno Arenas Universidad Industrial de Santander
Álvaro Calvache Archila Universidad Pedagógica y Tecnológica de Colombia
Carmen Helena Cepeda Araque Universidad Pedagógica y Tecnológica de Colombia
Sandra Patricia Cárdenas Ojeda Universidad Pedagógica y Tecnológica de Colombia
Martha Corrales Universidad Sergio Arboleda
Contents
Introduction 1
Welcome 3
Short courses 5
Plenary lectures 9
Thematic sessions 15
Contributed talks 53
Contributed talks 1 54
Contributed talks 2 58
Contributed talks 3 61
Contributed talks 4 66
Contributed talks 5 70
Contributed talks 6 72
Contributed talks 7 76
Contributed talks 8 80
Contributed talks 9 83
Contributed talks 10 86
Contributed talks 11 90
Contributed talks 12 93
Contributed talks 13 97
Contributed talks 14 100
Contributed talks 15 102
Contributed talks 16 106
Posters 109
Welcome
There is an interesting variety of topics for the thematic sessions in this edition
of the event: Causal Inference, Data-driven penalty calibration, Fragmentation
processes, Functional data, Hypoelliptic diffusions, Mixed and Joint Modeling,
Optimal designs, Particle systems, Probability and statistics in finance, Random
segmentation models, Random trees and applications, Robust statistics, and
Sampling methods. This edition of the event features a broad international
participation.
CLAPEM 2014 Universidad Nacional de Colombia
There will be more than sixty (60) oral contributions and more than one hundred
(100) poster contributions. Participants are coming from all over the world. We
have confirmed the participation of people from Argentina, Brazil, Chile, Colombia,
Costa Rica, Cuba, Denmark, France, India, Peru, Spain, Switzerland, United
Kingdom, USA, Uruguay and Venezuela, among others.
Short courses
Paul Embrechts
Department of Mathematics, ETH Zürich,
Switzerland.
Abstract
In this course, we discuss the main statistical tools for the quantitative analysis
of solvency (risk capital) for banks and insurance companies. Topics included
are:
Alison Etheridge
Oxford University, U.K.
Abstract
We provide an introduction to some mathematical models that arise in theo-
retical population genetics. These fall into two classes: forwards-in-time models
for the evolution of frequencies of different genetic types in a population; and,
backwards-in-time (coalescent) models that trace out the genealogical rela-
tionships between individuals in a sample from the population. Some, like the
Wright-Fisher model, date right back to the origins of the subject.
Regina Liu
Rutgers University, USA.
Abstract
A confidence distribution (CD) is a sample-dependent distribution function
that can be used as an estimate for an unknown parameter. It can be viewed
as a distribution estimator of the parameter. CDs have been shown to be effective
tools in statistical inference. Specifically, we discuss the usefulness of CDs
in: exact meta-analysis approach for discrete data and its application to 2 x 2
tables with rare events, combining heterogeneous studies using only summary
statistics, combining the test results from independent studies, and providing
efficient network meta-analysis.
Plenary lectures
Gérard Biau
Université Pierre et Marie Curie and Institut
Universitaire de France.
Abstract
Modern learning architectures must be flexible enough to accommodate the
ever increasing size of datasets involved in the Big Data regime. Drawing
inspiration from the theory of distributed computation models developed in
the context of gradient-type optimization algorithms, I will present a consen-
sus-based asynchronous distributed solution for nonparametric online regres-
sion and analyze some of its asymptotic properties. A companion software
implemented in Go (an open source native concurrent programming language
developed at Google Inc.) is also delivered. Substantial numerical evidence
involving up to 44 parallel processors is provided on synthetic datasets to assess
the excellent performance of the method, both in terms of computation time
and prediction gains.
Abstract
This paper introduces the concept of random context representations for the
transition probabilities of a finite-alphabet stochastic process. Processes with
these representations generalize context tree processes (a.k.a. variable length
Markov chains), and are proven to coincide with processes whose transition
probabilities are almost surely continuous functions of the (infinite) past. This
is similar to a classical result by Kalikow about continuous transition probabilities.
Existence and uniqueness of a minimal random context representation
are proven, and an estimator of the transition probabilities based on this
representation is discussed.
Carenne Ludeña
Escuela de Matemáticas, Facultad de Ciencias, UCV.
Abstract
Multifractal models seem to appear everywhere. They have been successfully
used in applications ranging from natural phenomena such as turbulence, rainfall
or earthquakes to man-made data in finance or internet traffic, and are associated
with anomalous scaling: that is, a nonlinear scaling law for the moments of
the process's increments over finite time intervals. Moreover, multifractals are
characterized by a multiplicity of local Hölder exponents within any finite time
interval. In fact, both concepts are intimately related by what has come to
be known as the multifractal formalism, whereby the scaling function and the
spectrum of the Hölder exponents, or multifractal spectrum, are obtained as
the Legendre transform of each other under certain conditions. In practice,
although models tend to be described by their multifractal spectrum, most estimation
procedures for multifractal models are based on the estimation of the
scaling function. However, this turns out to be a nontrivial problem, as estimation
based on the empirical moments is intrinsically biased. In this talk, we
will run through several key concepts, models and current developments for
multifractal processes from a statistical point of view.
Thomas Mikosch
University of Copenhagen.
(Richard A. Davis, Columbia NY and Oliver Pfaffel,
Munich).
Abstract
We give an asymptotic theory for the eigenvalues of the sample covariance
matrix of a multivariate time series. The time series constitutes a linear process
across time and between components. The input noise of the linear process has
regularly varying tails with index α ∈ (0, 4); in particular, the time series has an
infinite fourth moment. We derive the limiting behavior for the largest eigenvalues
of the sample covariance matrix and show point process convergence of
the normalized eigenvalues. The limiting process has an explicit form involving
points of a Poisson process and eigenvalues of a non-negative definite matrix.
Based on this convergence, we derive limit theory for a host of other continuous
functionals of the eigenvalues, including the joint convergence of the largest
eigenvalues, the joint convergence of the largest eigenvalue and the trace of the
sample covariance matrix, and the ratio of the largest eigenvalue to their sum.
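As a quick numerical illustration of the heavy-tail effect described above, the following sketch builds a sample covariance matrix from noise with infinite fourth moment and inspects the ratio of the largest eigenvalue to the trace. The i.i.d. entries (rather than a full linear process), the tail index, the dimensions and the Pareto-type noise are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, alpha = 50, 1000, 1.5  # tail index alpha in (0, 4): fourth moment is infinite

# Symmetric Pareto-type noise with P(|Z| > z) decaying like z**(-alpha)
Z = rng.pareto(alpha, size=(p, n)) * rng.choice([-1.0, 1.0], size=(p, n))

S = Z @ Z.T  # (unnormalized) sample covariance matrix
eig = np.sort(np.linalg.eigvalsh(S))[::-1]

# With such heavy tails, a few huge entries tend to dominate the spectrum,
# so the largest eigenvalue typically carries a non-negligible share of the trace.
ratio = eig[0] / eig.sum()
```

Repeating this with Gaussian noise instead shows the contrast: there the largest eigenvalue is of the same order as the bulk.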
Víctor Rivero
Centro de Investigación en Matemáticas A.C.,
México.
Abstract
Let X be a real-valued Lévy process that is in the domain of attraction of
a stable law. In the first part of this talk, we will consider the case where
X is non-monotone. As an analogue of the random walk results in [4] and
[1] we will describe the local behaviour of the distribution of the lifetime
under the characteristic measure n of excursions away from 0 of the process
X reflected in its past infimum, and of the first passage time of X below 0,
T0 = inf{t > 0 : Xt < 0}, under Px(·), for x > 0, in two different regimes for x,
viz. x = o(c(·)) and x > Dc(·), for some D > 0. We sharpen our estimates by
distinguishing between two types of path behaviour, viz. continuous passage
at T0 and discontinuous passage. Some sharp local estimates for the entrance
law of the excursion process associated to X reflected in its past infimum will
be described. In the second part of the talk, we will describe the case where X
is non-decreasing, i.e. is a subordinator.
Keywords: Lévy processes, first passage time distribution, local limit theorems,
fluctuation theory.
References
1. Doney, R. A. (2012). Local behaviour of first passage probabilities. Probability
Theory and Related Fields, 152(3-4), 559-588.
Thematic sessions
Somnath Datta
University of Louisville, USA.
(Bandyopadhyay and Satten).
Abstract
Thomas Richardson
Professor and Chair
Department of Statistics University of Washington.
(James M. Robins, Harvard School of Public Health,
USA).
Abstract
Models based on potential outcomes, also known as counterfactuals, were
introduced by Neyman (1923) and later applied to observational contexts by
Rubin (1974). Such models are now used extensively within Biostatistics, Statistics,
Political Science, Economics, and Epidemiology for reasoning about
causation. Directed acyclic graphs (DAGs), introduced by Wright (1921) are
another formalism used to represent causal systems. Graphs are extensively
used in Computer Science, Bioinformatics, Sociology and Epidemiology.
In this talk, I will present a simple approach to unifying these two frame-
works via a new graph, termed the Single-World Intervention Graph (SWIG).
The SWIG encodes the counterfactual independences associated with a specific
hypothetical intervention on a set of treatment variables. The nodes of
the SWIG are the corresponding counterfactual random variables. The SWIG
is derived from a causal DAG via a simple node splitting transformation. I
will illustrate the theory with a number of examples. Finally, we show that
SWIGs avoid a number of pitfalls that are present in an alternative approach to
unification, based on twin networks, which has been advocated by Pearl (2000).
Links
Short paper: http://www.statslab.cam.ac.uk/~rje42/uai13/Richardson.pdf
James Robins
Harvard School of Public Health, USA.
Abstract
I describe recent advances in the theory of estimation with higher order influ-
ence functions. The theory is a theory of point and interval estimation for non-
linear functionals in parametric, semi-, and non-parametric models that applies
equally to both root-n and non-root-n problems. The theory reproduces results
previously obtained by the modern theory of non-parametric inference, pro-
duces many new non-root-n results, and most importantly, opens up the ability
to perform non-root-n inference in complex high dimensional models, such as
models for the estimation of the causal effect of time varying treatments in the
presence of time varying confounding and informative censoring. Higher order
influence functions are higher order U-statistics. The theory extends first order
semiparametric theory based on first order influence functions. I will describe
recent results on constructing tests of independence between two random vari-
ables that are rate-optimal against certain natural omnibus alternatives.
Stijn Vansteelandt
Ghent University, Belgium.
(Ghent University, Belgium and Vanessa Didelez,
University of Bristol, U.K.).
Abstract
Two-stage least squares estimators and variants thereof are widely used in
econometrics and beyond to infer the effect of an exposure on an outcome
using data on instrumental variables. In biostatistics, a separate literature on
instrumental variable estimation has developed, which uses double-robust
G-estimators in so-called structural mean and distribution models instead.
These are consistent when either a working model for the distribution of the
instrumental variable (given covariates) or a working model for the (counter-
factual exposure-free) outcome mean (given covariates) is correctly specified,
but not necessarily both. We examine the performance of locally efficient dou-
ble-robust G-estimators in simulation studies, and find it to be sometimes poor
under model misspecification. We therefore propose adaptive G-estimation
procedures which improve efficiency under misspecification of one working
model, and reduce bias under misspecification of both working models. Sim-
ulation studies demonstrate drastic improvements relative to locally efficient
G-estimators as well as two-stage least squares estimators.
Karine Bertin
CIMFAV, Universidad de Valparaíso, Chile.
(Nicolas Klutchnikoff, ENSAI, Rennes, France.)
Abstract
We studied the estimation of the common marginal density function of weakly
dependent stationary processes. The accuracy of estimation is measured using
pointwise risks. We propose a data-driven procedure using kernel rules. The
bandwidth is selected using the approach of Goldenshluger and Lepski and we
prove that the resulting estimator satisfies an oracle type inequality. The pro-
cedure is also proved to be adaptive (in a minimax framework) over a scale
of Hölder balls for several types of dependence: classical econometric models
such as GARCH, as well as dynamical systems and i.i.d. sequences, can be
considered using a single estimation procedure. Some simulations illustrate the
performance of the proposed method.
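The Goldenshluger-Lepski comparison principle behind the bandwidth selection can be sketched as follows: for each candidate bandwidth, compare the kernel estimate against the estimates at smaller bandwidths and penalize by a variance-type term. This is a stylized pointwise version with a Gaussian kernel; the constant kappa, the penalty shape and the bandwidth grid are illustrative assumptions, not the calibrated choices of the paper.

```python
import numpy as np

def kernel_density(x_grid, data, h):
    """Gaussian kernel density estimate evaluated at the points of x_grid."""
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def select_bandwidth_gl(data, x0, bandwidths, kappa=1.2):
    """Goldenshluger-Lepski-style pointwise bandwidth selection at x0.

    For each h, the bias proxy is the worst deviation of f_h from the
    estimates at smaller bandwidths, corrected by their penalties; the
    selected h minimizes bias proxy + penalty."""
    n = len(data)
    grid = np.array([x0])
    est = {h: kernel_density(grid, data, h)[0] for h in bandwidths}
    best_h, best_crit = None, np.inf
    for h in bandwidths:
        A = max(
            (abs(est[h] - est[hp]) - kappa * np.sqrt(np.log(n) / (n * hp))
             for hp in bandwidths if hp <= h),
            default=0.0,
        )
        crit = max(A, 0.0) + kappa * np.sqrt(np.log(n) / (n * h))
        if crit < best_crit:
            best_h, best_crit = h, crit
    return best_h

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
h_star = select_bandwidth_gl(sample, 0.0, bandwidths=np.geomspace(0.05, 1.0, 15))
```

The oracle-type inequality of the paper guarantees that a selection rule of this flavor, with properly calibrated constants, performs nearly as well as the best bandwidth in the grid.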
Claire Lacour
Université Paris-Sud, Orsay.
(Karine Bertin, CIMFAV, Chile and Vincent
Rivoirard, Université Paris-Dauphine, France).
Abstract
This talk is devoted to a calibrated method for estimating a conditional den-
sity. We consider a sample of independent and identically distributed observations
(Xi, Yi), 1 ≤ i ≤ n, and we are interested in the conditional density of Yi given Xi,
defined by

f(x, y) dy = P(Yi ∈ dy | Xi = x).
Abstract
In this talk, we consider a toy example of an optimal stopping problem driven
by fragmentation processes. We show that one can work with the concept of
stopping lines to formulate the notion of an optimal stopping problem and
moreover, to reduce it to a classical optimal stopping problem for a general-
ized Ornstein-Uhlenbeck process associated with Bertoin's tagged fragment.
We go on to solve the latter using a classical verification technique, thanks to
the application of aspects of the modern theory of integrated exponential Lévy
processes. (Joint work with A. Kyprianou).
Discretized nonparametric
regression for functional data
Pamela Llop
Facultad de Ingeniería Química (UNL) and
Instituto de Matemática Aplicada del Litoral (UNL -
CONICET), Argentina.
(Liliana Forzani, Facultad de Ingeniería Química
(UNL) and Instituto de Matemática Aplicada del
Litoral (UNL - CONICET), Argentina; and Ricardo
Fraiman, Universidad de la República, Uruguay)
Abstract
Technological progress in collecting and storing data has provided data sets
recorded at finite grids of points which, thanks to the new technologies,
become increasingly denser over time. Although in practice
data always come in the form of finite dimensional vectors, from the theoreti-
cal point of view, the classic multivariate techniques are not suitable to deal
with this kind of data. In this direction, the asymptotic theory can be analyzed
either assuming the existence of continuous underlying stochastic processes
ideally observed at every point, or transforming the (observed) discrete values
into functions via interpolation (error-free case), smoothing (if error is present),
splines or series approximations. When dealing with the regression problem
for discretized functional data, a natural question that emerges is what the
relationship is between the ideal nonparametric regression estimate computed
with the entire curve and the one computed with the discretized sample. In
this direction, we state conditions under which the consistency of the estimator
computed with the discretized trajectories can be derived from the consistency
of the one based on the whole curves. Also, we give conditions on the grid size
discretization in order to achieve the same rates of convergence as in the infinite
dimensional setting. These results are a consequence of two more general results
which, besides discretization, also cover the case of smoothing via regularization,
basis representation or interpolation of the data.
Hans-Georg Müller
University of California, Davis, USA.
(K. Chen, University of Pittsburgh, USA and P.
Delicado, Universitat Politècnica de Catalunya,
Barcelona, Spain).
Abstract
Repeatedly observed functional data are encountered in various applications.
These include demographic trajectories observed for each calendar year. A
previous conditional double functional principal component approach to rep-
resent such processes poses complex problems for both theory and applica-
tions. A simpler and more interpretable approach can be based on a marginal
rather than conditional functional principal component representation of the
underlying function valued processes. An additional assumption of common
principal components leads to the special case of a simple tensor product rep-
resentation. For samples of independent realizations of the underlying func-
tion-valued stochastic process, this approach leads to straightforward fitting
methods for obtaining the components of these models. The resulting estimates
can be shown to satisfy asymptotic consistency properties. The proposed meth-
ods are illustrated with an application to trajectories of fertility that are repeat-
edly observed over many calendar years for 17 countries.
Jane-Ling Wang
Department of Statistics, University of California,
Davis, USA.
(Xiaoke Zhang, University of California, Davis
USA).
Abstract
Functional data analysis (FDA) deals with the analysis of a sample of func-
tions or curves. Traditional multivariate principal component analysis (PCA)
has been successfully extended to the functional setting and the core issue is the
estimation of the mean function and covariance surface of the functional data.
The methodology and theory often vary and depend on the sampling plan of
the functional data. In this talk, we focus on a unified approach and theory that
can handle any sampling plan and different weighting schemes in the functional
PCA approaches. The theory leads to interesting types of asymptotic behavior
depending on the sampling plan, which also has an effect on the performance
of different weighting schemes. Two commonly adopted weighting schemes are
compared.
Stéphane Menozzi
Université d'Évry Val d'Essonne.
Abstract
In this talk, we will present various techniques for studying processes of the
form:

X¹_t = x₁ + ∫₀ᵗ F₁(s, X_s) ds + ∫₀ᵗ σ(s, X_s) dW_s,
X²_t = x₂ + ∫₀ᵗ F₂(s, X_s) ds,
X³_t = x₃ + ∫₀ᵗ F₃(s, X²_s, …, Xⁿ_s) ds,
⋮
Xⁿ_t = x_n + ∫₀ᵗ F_n(s, Xⁿ⁻¹_s, Xⁿ_s) ds.
Clémentine Prieur
Université Joseph Fourier - Grenoble I
José León
Universidad Central de Venezuela
Abstract
In this work, we are interested in harmonic oscillators perturbed by a Gaussian
white noise. More precisely, we consider (Z_t = (x_t, y_t) ∈ ℝ^{2d}, t ≥ 0)
governed by the following:

dx_t = y_t dt,
dy_t = dW_t − (c(x_t, y_t) y_t + ∇V(x_t)) dt.

We assume that the process is ergodic with a unique invariant probability measure
m, and that the convergence in the ergodic theorem is quick enough; we
also discuss sufficient conditions for this. For such oscillators, we aim at studying
inference issues such as the estimation of the density of the invariant probability
measure m, as well as the estimation of the drift or the variance term.
One major issue in our study is that we work with incomplete data, observing
only the first coordinate x. Thus we approximate the y component by finite
differences. Even in the case where the potential is Duffing's, V(x) = x⁴/4 − x²/2
(the Kramers oscillator), this problem is not easy. We focus on non-parametric
inference, see [1, 2, 3].
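A minimal simulation sketch of this setup: integrate the Kramers oscillator with an Euler scheme, observe only the position x on a coarse grid, and recover the unobserved velocity y by finite differences. The damping constant, step sizes and scheme are illustrative assumptions, not the talk's.

```python
import numpy as np

def simulate_kramers(n_steps=50_000, dt=1e-3, c=0.5, seed=0):
    """Euler-Maruyama scheme for dx = y dt, dy = dW - (c*y + V'(x)) dt,
    with the Duffing potential V(x) = x**4/4 - x**2/2, so V'(x) = x**3 - x."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    y = np.empty(n_steps)
    x[0] = y[0] = 0.0
    for k in range(n_steps - 1):
        x[k + 1] = x[k] + y[k] * dt
        y[k + 1] = y[k] - (c * y[k] + x[k] ** 3 - x[k]) * dt \
            + rng.normal(0.0, np.sqrt(dt))
    return x, y

dt, m = 1e-3, 10
x, y = simulate_kramers(dt=dt)
x_obs = x[::m]                    # only the position is observed, on a coarse grid
y_fd = np.diff(x_obs) / (m * dt)  # finite-difference proxy for the velocity
err = np.sqrt(np.mean((y_fd - y[::m][:-1]) ** 2))
```

The finite-difference proxy is a local average of the true velocity over the observation window, so its error shrinks with the observation step but never vanishes: this is the price of the incomplete observation discussed above.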
Marc Lavielle
Inria Saclay, Popix team, France
Abstract
Population models describe biological and physical phenomena observed in
each of a set of individuals, and also the variability between individuals. This
approach finds its place in domains like pharmacometrics when we need to
quantitatively describe interactions between diseases, drugs and patients. This
means developing models that take into account that different patients react
differently to the same disease and the same drug. The population approach can
be formulated in statistical terms using mixed effects models.
We will see how the framework allows us to represent models for many dif-
ferent data types including continuous, categorical, count and time-to-event
data. This opens the way for the use of quite generic methods for modeling and
simulating these diverse data types.
Mlxtran is also used by Simulx, an R and Matlab function for easily computing
predictions and simulating data from such complex mixed effects models
(https://team.inria.fr/popix/mlxtoolbox).
Carles Serrat
Universitat Politècnica de Catalunya-Catalonia,
Spain
Abstract
The aim of this presentation is to review joint modelling techniques for the
simultaneous analysis of time-to-event data and longitudinal time-varying
data. This is an area of increasing interest from both the methodological and
the applied points of view, and it allows the analysis and understanding of
complex systems.
Among others, three main advantages of this approach are: a) it corrects the
bias derived from a traditional separate analysis; b) the model makes it possible
to incorporate and model the between- and within-subject correlation among
observations; and c) true longitudinal profiles for endogenous covariates can be
included in the relative hazard survival submodel.
The relevant benefit of these models is being able to estimate the effect of each
subject-specific longitudinal profile on the hazard function for the event of
interest, in an adaptive manner. In particular, subject-specific dynamic predictions,
like conditional survival functions given the available longitudinal
information, can be derived.
Keywords: joint modeling, shared random effects models, relative risks models
Timothy E. O'Brien
Department of Mathematics and Statistics, Loyola
University Chicago.
Abstract
Analysis of multi-category response data in which the multinomial dependent
variable is linked to selected covariates includes several rival models. These
models include the adjacent category (AC), baseline category logit (BCL), two
variants of the continuation ratio (CR), and the proportional odds (PO). For a
given set of data, the fits and predictions associated with these various models
can vary quite dramatically as can the associated optimal designs (which are
then used to estimate the respective model parameters).
Abstract
The purpose of this talk is to present a procedure for constructing optimal
designs for simultaneous parameter estimation and model discrimination
in the context of nonlinear mixed effects models. The compound design criterion
is considered. This design criterion is formed by maximizing a weighted
average which depends on different Fisher information matrices. A numeri-
cal example shows the properties of the procedure. The relationship with other
design procedures for parameter estimation and model discrimination is dis-
cussed.
Joaquín Fontbona
Universidad de Chile, CMM.
(Roberto Cortez, Universidad de Chile).
Abstract
We study a class of one-dimensional mean field particle systems with binary
interactions, which includes Kac's simplified model of the Boltzmann equation
and some kinetic models for the evolution of wealth distribution. We obtain
explicit rates of convergence, as the total number of particles goes to infinity,
for the Wasserstein distance between the law of a particle and its limiting law,
which depend linearly on time. The proof is based on a novel coupling between
the particle system and a suitable system of non-independent nonlinear processes,
constructed with tools from optimal mass transportation, and relies also on
recently obtained sharp estimates for empirical measures of i.i.d. or exchangeable
random variables. The obtained rates are compared with known convergence
rates for the less physical Nanbu particle approximations of the Kac
equation, in which each pair interaction has an effect on only one of the particles.
Possible extensions (including to Boltzmann's equation) are also discussed.
Christophe Gallesco
Unicamp, Brazil
Abstract
We model the transmission of a message on the Erdős–Rényi
random graph with parameters (n, p) and limited resources. The vertices of
the graph represent servers that may broadcast a message at random. Each
server has a random emission capital that decreases by one at each emission.
We examine two natural dynamics: in the first dynamics, an informed server
performs its attempts, then checks at each of them if the corresponding edge is
open or not; in the second dynamics the informed server knows a priori who
its neighbors are, and it performs all its attempts on its actual neighbors in the
graph. In each case, we obtain first and second order asymptotics (law of large
numbers and central limit theorem), when n and p is fixed, for the final propor-
tion of informed servers.
Abstract
We consider the problem of sampling from quasistationary distributions of
finite state Markov chains. Our perspective is computational and inspired
by the literature on Markov chain mixing times, where a small mixing time
implies (in an appropriate computational model) that one can approximate the
stationary distribution with moderate computational effort.
Samy Tindel
Université de Lorraine, France.
Abstract
In this talk, we will first justify the use of fractional Brownian motion as a driv-
ing noise for differential systems in several applied situations, with a special
emphasis on finance models. We will then introduce the main ideas of the
so-called rough path theory, which allows one to solve differential equations
driven by a general class of stochastic processes. Finally, we will give an account
of some
recent density estimates concerning these objects.
David Márquez
Universidad de Barcelona, Spain
Abstract
In this paper, we study the existence of a unique solution for linear stochastic
differential equations driven by a Lévy process, where the initial condition and
the coefficients are random and not necessarily adapted to the underlying
filtration. Towards this end, we extend the method based on Girsanov
transformations on Wiener space and developed by Buckdahn [1] to the canonical
Lévy space, which is introduced in [2].
References
1. Buckdahn, R. (1989). Transformations on the Wiener space and Skorohod-type
stochastic differential equations. Seminarberichte [Seminar reports]
105. Humboldt Universität, Sektion Mathematik. MR-1033989.
2. Solé, J. L., Utzet, F. and Vives, J. (2007). Canonical Lévy processes and
Malliavin calculus. Stochastic Processes and Their Applications, 117, 165-187.
MR-2290191.
Estimation in high-dimensional
random geometric graphs
Sébastien Bubeck
Princeton University.
(Jian Ding, Ronen Eldan and Miklós Rácz).
Abstract
We consider a random graph model where connections depend on unknown
d-dimensional labels (or feature vectors) for the vertices. Upon the observation
of a realization from this model, we are interested in estimating the unknown
dimension d of the feature vectors. We propose a new statistic, based on signed
triangles, which can successfully estimate dimensions as large as n2 (where n
is the number of vertices), while a simple count of triangles would only work
up to dimension of order n. We also show that n2 is optimal, using a new bound
on the total variation distance between Wish art matrices and the Gaussian
Orthogonal Ensemble.
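The signed triangle statistic itself is simple to compute. A small sketch (the edge density p is assumed known, and the normalization is our own convention) evaluates it on an Erdős–Rényi graph, where no geometry is present and the statistic concentrates around zero:

```python
import numpy as np

def signed_triangles(A, p):
    """Sum over vertex triples {i,j,k} of (A_ij - p)(A_jk - p)(A_ki - p).

    For a symmetric 0/1 adjacency matrix, zero the diagonal of the centered
    matrix and use trace(B^3)/6: each unordered triple with distinct vertices
    is visited by 3! = 6 ordered index sequences, and terms with a repeated
    index vanish because the diagonal is zero."""
    B = A - p
    np.fill_diagonal(B, 0.0)
    return np.trace(B @ B @ B) / 6.0

# Erdos-Renyi adjacency matrix: independent edges above the diagonal, symmetrized.
rng = np.random.default_rng(1)
n, p = 200, 0.5
upper = (rng.random((n, n)) < p).astype(float)
A = np.triu(upper, 1)
A = A + A.T

tau = signed_triangles(A, p)  # small relative to n^3 when there is no geometry
```

In a geometric graph the triple products pick up the positive correlations among edges sharing a vertex, which is what pushes the statistic away from zero and makes the dimension detectable.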
Luc Devroye
McGill University
Abstract
Kademlia is the de facto standard searching algorithm for P2P networks on the
Internet, which is used by millions of users every day (especially those who like
free downloads). We explain this random graph model, and analyze its proba-
bilistic performance.
Alberto Contreras-Cristán
Department of Probability and Statistics, IIMAS-
UNAM, Mexico.
Abstract
Within a Bayesian nonparametric framework, we propose to use a Poisson-
Dirichlet process mixture model in order to produce clustering on a set of time
series. In a first stage, the series are modeled using a hierarchical linear regres-
sion that accommodates levels, trends, seasonal and time-dependent components.
Each of these features has an associated parameter. Then, for prior
specification, some of these parameters are assumed to follow a Poisson-Dirichlet
process. Since such semiparametric prior distributions give realizations which
are almost surely discrete, we use this feature and cluster the time series follow-
ing the clustering structure of the posterior samples from the feature param-
eters described above. A simulation study allows us to choose which of the
parameters related to levels, trends and seasonality are useful for clustering,
thus providing a flexible framework since different sets of series can be clus-
tered using different characteristics.
Abstract
Change point detection models aim to determine the most probable grouping
for a given sample indexed on an ordered set. For this purpose, we propose
a methodology based on exchangeable partition probability functions, specifically
on Pitman's sampling formula. Emphasis will be given to the Markovian
case, in particular for discretely observed Ornstein-Uhlenbeck diffusion
processes. Some properties of the resulting model are explained and posterior
results are obtained via a novel MCMC algorithm.
David Belius
McGill University/Centre de Recherches
Mathématiques
(Nicola Kistler, City University of New York, College
of Staten Island)
Abstract
The epsilon-cover time of the two dimensional torus by Brownian motion is the
time it takes for the process to come within distance epsilon > 0 from any point.
Its leading order in the small epsilon-regime has been established by Dembo,
Peres, Rosen and Zeitouni [Ann. of Math., 160 (2004)]. In this talk I will pres-
ent a recent result identifying the second order correction. This correction term
arises in an interesting way from strong correlations in the field of occupation
times, and in particular from an approximate tree structure in this field. Our
method draws on ideas from the study of the extremes of branching Brownian
motion.
Nicolas Fraiman
University of Pennsylvania, Philadelphia, USA.
(Luc Devroye and Dieter Mitsche).
Abstract
In this talk we describe the connectivity threshold, the diameter, and metric
properties of inhomogeneous random graphs. In this model edges are present
independently but with unequal probabilities. We generalize results known for
the Erdős-Rényi model G(n, p) for several ranges of p.
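As a hedged illustration of the model (the edge-probability function, graph size and replicate count below are arbitrary choices, not the authors'), one can sample such a graph and probe connectivity empirically:

```python
import itertools
import math
import random

def sample_inhomogeneous_graph(n, p):
    """Graph on vertices 0..n-1 where edge {i, j} is present
    independently with probability p(i, j)."""
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if random.random() < p(i, j):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def is_connected(adj):
    """Depth-first search connectivity check from vertex 0."""
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

# Homogeneous special case p(i, j) = p recovers G(n, p), whose connectivity
# threshold sits at p = log(n)/n; well above it almost every replicate connects.
random.seed(1)
n = 200
p_above = lambda i, j: 3 * math.log(n) / n
connected = sum(is_connected(sample_inhomogeneous_graph(n, p_above))
                for _ in range(20))
print(connected, "of 20 replicates connected")
```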
Abstract
Stochastic ordering among distributions has been considered in a variety of sce-
narios. Economic studies often involve research about the ordering of invest-
ment strategies or social welfare. However, as noted in the literature, stochastic
orderings are often too strong an assumption, one not supported by the data
even in cases in which the researcher tends to believe that a certain variable
is somehow smaller than another. Instead of considering this rigid model of
Víctor J. Yohai
Universidad de Buenos Aires, Argentina
(Ricardo A. Maronna, Universidad Nacional de La
Plata, Argentina).
Abstract
Good robust estimators can be tuned to combine a high breakdown point and
a specified asymptotic efficiency at a central model. This happens in regres-
sion with MM- and tau estimators among others. However, the finite-sam-
ple efficiency of these estimators can be much lower than the asymptotic one.
To overcome this drawback, an approach is proposed for parametric models,
which is based on a distance between parameters. Given a robust estimator, the
proposed one is obtained by maximizing the likelihood under the constraint
that the distance is less than a given threshold. For the linear model with nor-
mal errors and using the MM estimator and the distance induced by the Kull-
back-Leibler divergence, simulations show that the proposed estimator attains
a finite-sample efficiency close to one, while its maximum mean squared error
is smaller than that of the MM estimator. The same approach also shows good
results in the estimation of multivariate location and scatter.
Hélène Boistard
Université Toulouse 1.
(Hendrik P. Lopuhaä, Delft University of
Technology, Netherlands; Anne Ruiz-Gazen,
Université Toulouse 1).
Abstract
For rejective sampling, an expansion of joint inclusion probabilities of any
order is obtained in terms of the inclusion probabilities of order one, extend-
ing previous results by Hájek and making the remainder term more precise.
The main result is applied to derive bounds on higher order correlations, which
are needed for the consistency and asymptotic normality of several complex
estimators.
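A small Monte Carlo sketch of rejective sampling (working probabilities and sample size chosen arbitrarily for illustration) shows how a second-order inclusion probability such as pi_01 can be approximated numerically:

```python
import random

def rejective_sample(p, n):
    """Rejective (conditional Poisson) sampling: draw independent Bernoulli
    inclusion indicators with working probabilities p and keep the first
    realization whose sample size is exactly n."""
    while True:
        s = {i for i, pi in enumerate(p) if random.random() < pi}
        if len(s) == n:
            return s

random.seed(3)
p = [0.2, 0.4, 0.6, 0.8, 0.5, 0.5]   # illustrative working probabilities
reps = 2000
# Monte Carlo estimate of the joint inclusion probability of units 0 and 1,
# one of the quantities whose expansion the abstract makes precise.
joint = sum({0, 1} <= rejective_sample(p, 3) for _ in range(reps)) / reps
print(joint)
```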
Andrés Gutiérrez
Universidad Santo Tomás, Bogotá, Colombia
(Leonardo Trujillo, Universidad Nacional de
Colombia, Bogotá, Colombia and Pedro Luis
do Nascimento Silva, IBGE, Escola Nacional de
Ciências Estatísticas, Rio de Janeiro, Brazil).
Abstract
Rotating panel surveys are used to calculate estimates of gross flows between
two consecutive periods of measurement. This paper considers a general proce-
dure for the estimation of gross flows when the rotating panel survey has been
generated from a complex survey design with random nonresponse. A pseudo
maximum likelihood approach is considered through a two-stage model of
Markov chains for the allocation of individuals among the categories in the
survey and for modeling nonresponse.
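A toy version of the basic flow-estimation step (with made-up states and unit weights; the paper's pseudo maximum likelihood treatment of the complex design and nonresponse is not reproduced here) can be sketched as:

```python
import numpy as np

# Hypothetical panel data: labour-force state of each respondent at two
# consecutive periods, coded 0 = employed, 1 = unemployed, 2 = inactive.
state_t  = np.array([0, 0, 1, 2, 0, 1, 2, 2, 0, 1])
state_t1 = np.array([0, 1, 1, 2, 0, 0, 2, 0, 0, 1])
weights  = np.ones(len(state_t))    # stand-in for survey weights

K = 3
flows = np.zeros((K, K))
for s, s1, w in zip(state_t, state_t1, weights):
    flows[s, s1] += w               # weighted gross-flow counts

# Row-normalizing the flow table gives the estimated Markov transition matrix
transition = flows / flows.sum(axis=1, keepdims=True)
print(transition)
```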
Abstract
A compositional time series is a multivariate time series in which each of the
series has values bounded between zero and one and the sum of the series
equals one at each time point. This paper presents the state-space approach for
modelling compositional time series from the Brazilian Labour Force Survey
(BLFS) taking into account the sampling errors. The BLFS is a rotating panel
survey in which the rotation pattern applies to panels of households. Within
each rotation group, a panel of households stays in the sample for four succes-
sive months, is rotated out for the following 8 months and is sampled again for
another four successive months. The survey collects monthly information about
employment according to the International Labour Organization (ILO) defini-
tions. The modelling procedure produces estimates for the vector of employed,
unemployed and not in the labour force and also for the unemployment rate
series with corresponding estimates for seasonals and trends. The model pro-
vides bounded predictions and estimates satisfying the unity-sum constraint
while taking into account the sampling errors and the correlation structure
implied by the survey rotation pattern.
Abstract
Traditional statistical treatments for assessing model validation or goodness
of fit suffer from a serious drawback arising from the fact that hypothesis
testing theory is designed to provide evidence to reject the hypothesis. This means
that with their use we will not be able to confirm the model; instead, at most,
we would get just lack of statistical evidence to reject it. In the pure robust-
ness framework, the consideration of contamination neighborhoods leads to
an appropriate statement of similarity between probabilities. This concept can
be applied, in a fully non-parametric setting, to two-sample problems involv-
ing homogeneity or stochastic order between the parent distributions. Our
approach resorts to probability metrics and trimming techniques that allow
both mathematical treatment and feasibility of the involved procedures. By
measuring the level of similarity we can address the problem of model vali-
dation looking for the approximate validity of our goal as the alternative in a
suitable test.
Alfio Marazzi
Faculté de Biologie et Médecine, Université de
Lausanne.
Abstract
I will review a series of joint papers with Victor Yohai about robust paramet-
ric estimates of regression with positive asymmetrically distributed censored
(or non-censored) responses. All estimates are based on a two-step paradigm.
In the first step, a very robust (high breakdown point) initial estimate is com-
puted. The initial estimate is used to identify the bulk of the data and the outli-
ers. In the second step, outliers are downweighted or removed and an efficient
weighted estimate is computed which maintains the degree of robustness of
the initial one. We consider a class of asymmetric models for the log-response
distribution which includes location-scale models (e.g., log-Weibull), location-
scale-shape models (e.g., generalized log-gamma) as well as other models (e.g.,
negative binomial). Typical initial estimates are S estimates (that minimize
M-scales of the residuals) and Q-tau estimates (that minimize tau-scales of the
differences between empirical and model based quantiles). The class of final
estimates includes weighted likelihood and truncated likelihood estimates
which asymptotically approach the maximum likelihood estimates when the
models are correct.
Solesne Bourguin
Department of Mathematical Sciences, Carnegie
Mellon University.
(Giovanni Peccati).
Abstract
This talk will focus on recent developments in the study of the free Poisson alge-
bra, namely a new multiplication formula, as well as non-commutative diagram
formulas. These results are key to studying semicircular limits for non-commu-
tative random variables on this space, on which a fourth moment theorem has
been shown to hold.
Abstract
In this talk, we use the techniques of Malliavin calculus to obtain an expression
for the short-time behavior of the at-the-money implied volatility skew for a
general jump-diffusion stochastic volatility model. Here we will consider the
following three cases:
2) The volatility process is correlated not only with the Brownian motion dri-
ving the asset price, but also with the asset price jumps.
3) The strike is adapted to the filtration generated by the Brownian motion
driving the asset price.
Abstract
We shall show an interesting connection between potential theory on discrete
spaces and the M-matrix problem from linear algebra. This relation allows us
to show some important results in matrix analysis as well as to give new insight
into potential theory. In the other direction, some results from linear algebra have
important implications in stochastic analysis. We will discuss some possible
applications.
Alejandro Cholaquidis
Universidad de la República, Uruguay
Abstract
A domain S in Rd is said to fulfil the Poincaré cone property if any point in the
boundary of S is the vertex of a (finite) cone which does not otherwise intersect
the closure of S. For more than a century, this condition has played a relevant
role in the theory of partial differential equations, as a shape assumption aimed
to ensure the existence of a solution for the classical Dirichlet problem. In the
talk, in a completely different setting, I will analyse some statistical applications
of the Poincaré cone property (when defined in a slightly stronger version). I
will show that this condition can be seen as a sort of generalized convexity:
while it is considerably less restrictive than convexity, it still retains some "con-
vex flavour". In particular, when imposed on a probability support S, this prop-
erty allows the estimation of S from a random sample of points, using the "hull
principle" much in the same way as a convex support is estimated using the
convex hull of the sample points. The statistical properties of such a hull estima-
tor (consistency, convergence rates) will be presented in detail. It will be shown
that the class of sets fulfilling the Poincaré property is a P-Glivenko-Cantelli
class for any absolutely continuous distribution P on Rd. Finally, an algorithm
to approximate the cone-convex hull of a finite sample of points will be pro-
posed and some practical illustrations will be given.
Julian Martinez
Universidad de Buenos Aires
(Roberto Fernández and Frank den Hollander).
Abstract
We discuss the concept of Gibbs/non-Gibbs measures on the lattice together
with its extension to the mean field / local-mean field context, and the emer-
gence of dynamical Gibbs-non-Gibbs transitions under independent spin-flip
(infinite-temperature) dynamics. We show that these dynamical transitions
are equivalent to bifurcations in the set of global minima of the large-deviation
rate function describing optimal conditioned trajectories of the empirical den-
sity. Possible bifurcation scenarios are fully determined in the mean field case,
yielding a full characterization of passages from Gibbs to non-Gibbs, and vice
versa, with sharp transition times.
Contributed talks
Contributed talks 1
Abstract
Random sea waves are often modeled as stationary processes for short or mod-
erately long periods of time and therefore the problem of detecting changes in
the sea state is very important. In general, the sea state can be regarded as a
sequence of stationary and transition (between stationary) periods of time. Seg-
mentation and change-point methods have been widely used to classify or identify
both types of periods. However, very often these methods fail when changes
occur slowly over a period of time, as is the case in most cases related to the
sea state. We look at this problem from the spectral point of view, proposing a
method that considers processes normalized to have unit variance and detects
changes in the energy distribution through the total variation distance between
energy spectra. This distance measures the difference between
two probability densities by determining how much they have in common, or
equivalently, how much one of them has to be modified to coincide with the
other, and the spectrum of a normalized process can be seen as the probability
density of the energy distribution. The series of wave height measurements is
divided into intervals of 30 minutes and for each the spectral density is estimated. Then,
the above distances are computed to obtain a matrix of distances. Different
clustering methods over this dissimilarity matrix are explored, including data
driven trimmed clustering methods in order to take into account the heteroge-
neity introduced by the existence of the transition periods. We present simula-
tion studies to validate the proposed method as well as examples of applications
to real data.
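A minimal sketch of the spectral distance computation (synthetic segments, series length and noise levels are illustrative assumptions, not the authors' data):

```python
import numpy as np

def normalized_spectrum(x):
    """Periodogram of a series normalized to unit variance, rescaled to sum to
    one, so it can be read as a probability mass function over frequency."""
    x = (x - x.mean()) / x.std()
    spec = np.abs(np.fft.rfft(x)) ** 2
    spec = spec[1:]                 # drop the zero-frequency term
    return spec / spec.sum()

def tv_distance(p, q):
    """Total variation distance between discrete densities: half the L1
    distance, i.e. how much mass must move for one to coincide with the other."""
    return 0.5 * np.abs(p - q).sum()

rng = np.random.default_rng(0)
t = np.arange(512)
low = np.sin(2 * np.pi * 0.05 * t) + 0.3 * rng.standard_normal(512)
high = np.sin(2 * np.pi * 0.25 * t) + 0.3 * rng.standard_normal(512)
segments = [low, low + 0.3 * rng.standard_normal(512), high]  # three "windows"

spectra = [normalized_spectrum(s) for s in segments]
D = np.array([[tv_distance(p, q) for q in spectra] for p in spectra])
print(D.round(2))   # segments 0 and 1 share a sea state; segment 2 differs
```

The resulting dissimilarity matrix D is what a clustering method would then be run on.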
Ignacio Lobato
Instituto Tecnológico Autónomo de México.
(Carlos Velasco, Universidad Carlos III de Madrid).
Abstract
This article introduces frequency domain procedures for performing inference
in general time series linear models. We allow for possibly noninvertible and/
or noncausal processes in the absence of information on these potential non-
fundamentalness properties. We use information from higher order moments
to achieve identification of the location of the roots of the AR and MA poly-
nomials for non-Gaussian time series. We propose a minimum distance esti-
mator that combines the information contained in second, third, and fourth
moments. Contrary to existing estimators, the proposed estimator is consistent
under general assumptions, and can be computed in one single step. For the
standard causal and invertible ARMA model with non-Gaussian innovations,
our estimator can be asymptotically more efficient than Gaussian-based pro-
cedures, such as the Whittle estimator. For cases where Gaussian-based pro-
cedures are inconsistent, such as noncausal or noninvertible ARMA models,
the proposed estimator is consistent under general assumptions. The proposed
procedures also overcome the need to use tests for causality or invertibility.
Abstract
Clusters of large values are observed in sample paths of certain open-loop
threshold autoregressive (TAR) stochastic processes. Three types of marginal
conditional distributions of the underlying stochastic process are outlined in
this paper in order to characterize the stochastic mechanism that generates this
empirical TAR-model stylized fact. One of them allows us to find the conditional
variance function that explains the aforementioned stylized fact. As a byprod-
uct, a sufficient condition for having asymptotic weak stationarity in an open-
loop TAR stochastic process is derived.
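A hedged toy simulation of an open-loop TAR process (all coefficients invented for illustration) reproduces the stylized fact that large values cluster in one regime:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
z = np.zeros(n)
x = np.zeros(n)

# Exogenous threshold process Z: "open loop" means the regime is driven by
# Z alone, never by past values of X itself.
for t in range(1, n):
    z[t] = 0.9 * z[t - 1] + rng.standard_normal()

# Two regimes with the same autoregressive coefficient but very different
# innovation scales (illustrative numbers only).
for t in range(1, n):
    scale = 0.5 if z[t] <= 0.0 else 2.5
    x[t] = 0.3 * x[t - 1] + scale * rng.standard_normal()

var_quiet = x[z <= 0.0].var()
var_loud = x[z > 0.0].var()
print(var_quiet, var_loud)   # large values of X cluster where Z > 0
```

Because Z is persistent, the high-variance periods form visible clusters in the sample path, which is the empirical fact the conditional variance function explains.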
Contributed talks 2
Hugo de la Cruz
Escola de Matemática Aplicada-FGV.
(Felix Carbonell, Biospective / McGill University).
Abstract
Over the last few years, there has been growing and renewed interest in the
numerical study of Random Differential Equations (RDEs). On one hand,
this is motivated by the fact that RDEs have played an important role in the
modeling of physical, biological, neurological and engineering phenomena and,
on the other, motivated by the usefulness of RDEs for the numerical analysis
of stochastic differential equations (SDEs), via the extant conjugacy property
between RDEs and SDEs, which allows the study of stronger pathwise proper-
ties of SDEs driven by different kinds of noise other than the Brownian. Given
that in most common cases no explicit solution of the equations is known, the
construction of computational methods for the treatment and simulation of
RDEs has become an important need. In this vein, the Local Linearization (LL)
approach is a successful technique that has been applied for defining numerical
integrators for RDEs. However, a major drawback of the obtained methods is
their relatively low order of convergence; in fact it is twice the order of the moduli
of continuity of the driving stochastic process. The present work overcomes this
limitation by introducing a new exponential-based high order numerical inte-
grator for RDEs. For this, a suitable approximation of the stochastic processes
present in the random equation, together with the local linearization technique
and an adapted Padé method with scaling and squaring strategy are conve-
niently combined. In this way, a higher order of convergence can be achieved
(independent of the moduli of continuity of the stochastic process) while
retaining the dynamical and numerical stability properties of the low order LL
method. Results on the convergence and stability of the suggested method and
details on its efficient implementation are discussed. The performance of the
introduced method is further illustrated through computer simulations.
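For orientation, here is a sketch of the classical first-order Local Linearization step for a scalar RDE; this is the low-order baseline the abstract improves upon, and the drift, noise path and step size below are illustrative assumptions:

```python
import math

def ll_step(x, t, h, f, dfdx, noise):
    """One Local Linearization step for the scalar RDE x'(t) = f(x, xi(t)):
    freeze the noise over [t, t+h], linearize f around x, and solve the
    resulting linear ODE exactly via the exponential."""
    xi = noise(t)
    a = dfdx(x, xi)
    z = a * h
    # phi(z) = (exp(z) - 1) / z, with the z -> 0 limit handled explicitly
    phi = math.expm1(z) / z if abs(z) > 1e-12 else 1.0
    return x + h * phi * f(x, xi)

# Toy equation x' = -x + sin(xi(t)); the smooth path below merely stands in
# for one sampled trajectory of the driving stochastic process.
f = lambda x, xi: -x + math.sin(xi)
dfdx = lambda x, xi: -1.0
noise = lambda t: math.cos(3.0 * t)

h, x, t = 0.01, 1.0, 0.0
for _ in range(1000):
    x = ll_step(x, t, h, f, dfdx, noise)
    t += h
print(x)   # stays bounded: the contracting drift -x stabilizes the scheme
```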
Allan Fiel
Departamento de Control Automático Cinvestav-
IPN.
(Jorge A. León, Departamento de Control
Automático Cinvestav-IPN, David Márquez-
Carreras, Universitat de Barcelona).
Abstract
We obtain a closed expression for the solution of a linear Volterra integral equa-
tion with an additive Hölder continuous noise and with a continuous function
as initial condition. We then discuss the stability of the solution via the frac-
tional calculus. As an application, we analyze the stability in the mean of some
stochastic fractional integral equations.
Joaquín Fontbona
CMM, University of Chile.
(Fabien Panloup, IMT-Toulouse, Université Paul
Sabatier).
Abstract
We investigate the problem of the rate of convergence to equilibrium for ergo-
dic stochastic differential equations driven by fractional Brownian motion with
Hurst parameter H > 1/2 and multiplicative noise component σ. When σ is
constant and for every H ∈ (0,1), it was proven by M. Hairer that, under some
mean-reverting assumptions, such a process converges to its equilibrium at a
rate of order t^(-α), where α ∈ (0,1) (depending on H). The aim of this paper
is to extend these types of results to some multiplicative noise setting. More
precisely, we show that we can recover such convergence rates when H > 1/2
and the inverse of the diffusion coefficient σ is a Jacobian matrix. The main
novelty of this work is a kind of extension of Foster-Lyapunov-like techniques
to this non-Markovian setting, which allows us to put in place an asymptotic
coupling scheme as in Hairer's work, without resorting to deterministic con-
tracting properties.
Christian Olivera
IMECC-UNICAMP.
(Wladimir Neves, IM-UFRJ).
Abstract
In this talk, we discuss a number of results on stochastic transport equations.
First, the main issue of uniqueness follows under more general assumptions
than in the deterministic case; for instance, well-posedness holds for the
Cauchy problem under the Ladyzhenskaya-Prodi-Serrin condition. The initial-
boundary value problem, intrinsically more difficult than the Cauchy problem,
is also addressed in this talk. We consider in detail the stochastic trace result.
Contributed talks 3
Abstract
We have produced new reconstructions of Northern Hemisphere annually aver-
aged temperature anomalies back to 1000 AD, based on a model that includes
external climate forcings and accounts for any long-memory features. Our
reconstruction is based on two linear models, with the first linking the latent
temperature series to three main external forcings (solar irradiance, green-
house gas concentration, and volcanism), and the second linking the observed
temperature proxy data (tree rings, sediment record, ice cores, etc.) to the unob-
served temperature series. Uncertainty is captured with additive noise, and a
rigorous statistical investigation of the correlation structure in the regression
errors motivates the use of long memory fractional Gaussian noise models for
the error terms. We use Bayesian estimation to fit the model parameters and to
perform separate reconstructions of land-only and combined land-and-marine
temperature anomalies. We quantify the effects of including the forcings and
long memory models on the quality of model fits, and find that long memory
models can result in more precise uncertainty quantification, while the exter-
nal climate forcings substantially reduce the squared bias and variance of the
reconstructions. Finally, we use posterior samples of model parameters to
arrive at an estimate of the transient climate response to greenhouse gas forc-
ings of 2.56 °C (95% credible interval [2.20, 2.95] °C), in line with previous,
climate-model-based estimates.
Alejandra Christen
Pontificia Universidad Católica de Valparaíso.
(M. Angélica Maulén-Yáñez, Pontificia Universidad
Católica de Valparaíso, Eduardo González-Olivares,
Pontificia Universidad Católica de Valparaíso).
Abstract
Pedro A. Torres-Saavedra
Department of Mathematical Sciences, University of
Puerto Rico at Mayagüez.
(Raúl E. Macchiavelli, Department of Crops and
Agroenvironmental Sciences, University of Puerto
Rico at Mayagüez).
Abstract
Severity in plant diseases is quantified as the amount of plant material affected
by the disease, and is usually expressed as a continuous variable on a 0-1 scale.
Since plant diseases are monitored throughout the crop's life cycle, the model-
ing of severity progress curves needs to incorporate the longitudinal structure
of the data. Mixed beta regression has emerged as an appealing alternative to
model this. However, when the average and the subject-specific curves do not
follow a parametric form, semi-parametric methods are required. We propose
a mixed beta regression with smooth average curves and subject-specific curves
to model severity progress curves. Parameters in the proposed model are esti-
mated via maximum likelihood. The roughness parameters in the penalized
splines are chosen using traditional model selection criteria (e.g., BIC or AIC).
The proposed semi-parametric method allows us to model flexible shapes for
disease progress curves, and can be used to compare treatments or conditions
while taking into account the longitudinal and design structures of the data.
We apply the proposed method to model the severity of Black Sigatoka in an
experimental banana plantation in Isabela, Puerto Rico, designed to compare
different control practices. The use of the proposed method yields very useful
results that allow plant pathologists and crop managers to understand, monitor
and control diseases.
Carlos Valencia
Universidad de los Andes.
(Ming Yuan, University of Wisconsin).
Abstract
Many statistical analyses require the processing and manipulation of data that
take the form of random curves that are usually the result of smoothed ver-
sions of longitudinal data measured over a grid of points that can be modeled
as functional data. Special attention has been paid to the modeling of a scalar
response with functional predictors, the functional linear regression being the
most renowned case. However, in numerous applications there are a number
of restrictions in terms of the characterization of the response variable, for
instance when this response is categorical or when the usual zero mean addi-
tive error assumption does not seem to be appropriate. A natural alternative is
the use of a generalized linear model adapted for a functional predictor.
ℓ_n(c, β) + λ J(β),    (4.2)
for c in the real line and β ∈ H. Here λ > 0 is the tuning parameter that balances
the two criteria represented by ℓ_n and J, respectively. Although this is an
optimization problem over an infinite-dimensional space, we show that by the
representer theorem the problem can be solved in a finite number of parameters
and the many known smoothing splines estimation algorithms may be adapted
to solve the numerical problem.
Many of the previous approaches for estimating the slope in the generalized
functional linear model rely on Functional Principal Components Analysis
(FPCA), which in general imposes strong restrictions on the spacing of the eigen-
values of the operator generated by the covariance kernel of the process X(·).
We relax these assumptions and obtain sharper minimax convergence rates.
Our asymptotic analysis proves optimality of these rates under some regularity
conditions.
Contributed talks 4
Abstract
We characterize the classes of stationary-isotropic matrix-valued covariance
functions on a Euclidean space as the scale mixture of a uniquely determined
matrix-valued measure. Such a result is the analogue of the Schoenberg theo-
rem for the class of univariate stationary-isotropic covariance functions. Based
on previous results, we illustrate the existence of operators that map a radial
function f, positive definite on some Euclidean space Rd, into another
function, say g, radial and positive definite on a Euclidean space of lower
or higher dimension. One of the classes of these operators is the multivariate
version of the turning bands equations.
Abstract
In financial markets, the order book is defined as the set of unexecuted buy/sell
orders at which traders are willing to buy/sell a specific financial instrument.
Recently, order book information has become available and it is believed to con-
tain useful information to develop profitable high-frequency trading strategies.
This work presents a comparative study between three proposed methods for
visualization of high-frequency order book information: a dynamic heat map,
a dynamic wavelet transform heat map and a dynamic Markov random field.
A case study is provided using tick-by-tick real high-frequency data from the
Set-Fx, the Colombian Forex exchange market. We will evaluate each visual-
ization's performance based on how supportive it is to the trading buy/
sell decision-making process, using measures such as the ratio between mean
and variance of the normalized histogram of the image gradient or the sum of
the absolute value of the differences between pixels horizontally and vertically.
Finally, we present an analysis and discussion about possible enhancement
methods. This paper is organized as follows: Section 1 presents the introduc-
tion. Section 2 depicts a brief review of main concepts and definitions to under-
stand order book dynamics and its connection to High Frequency Trading in
Forex Exchange markets. Section 3 introduces different techniques and the
results of the comparative study. Finally, Section 4 presents the conclusions and
some suggestions for coupling the proposed visualization techniques to trading
strategies.
Luis Melo
Banco de la República, Colombia
(Juan José Echavarría, Federación de Cafeteros,
Colombia; Mauricio Villamizar, Georgetown
University).
Abstract
The adoption of a managed regime assumes that interventions are relatively
successful. However, while some authors consider that foreign exchange inter-
ventions are not effective, arguing that domestic and foreign assets are close
substitutes, others advocate their use and maintain that their effects can even
last for months. There is also a lack of consensus on the related question of
how to intervene. Are dirty interventions more powerful than pre-announced
constant ones? This paper compares the effects of day-to-day interventions
with discretionary interventions for the Colombian case by combining a Tobit-
GARCH reaction function with an asymmetric power PGARCH(1,1) impact
function. Our results show that the impact of pre-announced and transparent
US$20 million daily interventions, adopted by Colombia in 2008-2012, has
been much larger than the impact of dirty interventions adopted in 2004-2007.
As a second exercise, we compare the effect of different types of interventions by
the Colombian central bank using an event study approach, without imposing
restrictive parametric assumptions or the need to adopt a structural
model. We find that all types of interventions have been successful according to
the smoothing criterion, in that they were able to stem exchange rate volatility. In
particular, volatility options seemed to have the strongest effect. We find that
results are robust when using different window sizes and counterfactuals.
Abstract
Personal individual capitalization systems have experienced significant growth
in recent decades, following the trend of aging populations and the defined
benefit pension crisis. This article investigates whether the implementation of
funded pension schemes has prompted the development of domestic capital
markets worldwide, over the 1990-2011 period. The methodological strategy
relies upon panel regressions, minimum spanning tree and hierarchical tree
classification techniques applied to depth and liquidity indicators of stock and
bond markets as well as representative pension fund performance information.
The analysis reveals that individual capitalization pension funds have
stimulated stock market depth. A negative causality with stock market
liquidity is also evidenced and linked to the long-term profile of pension port-
folio management. Both development ratios receive positive impacts of greater
magnitude from the cluster of advanced maturation systems. Results also sug-
gest that voluntary systems have mainly encouraged public debt depth but are
related to improvements in stock market development as well. Finally, evidence
reveals that clusters of low gradual and incipient maturation systems exert posi-
tive impacts on public debt depth. These findings are consistent with existing
literature and also with the investment portfolio that usually characterizes pen-
sion funds in their earlier stages of life.
Contributed talks 5
Viswanathan Arunachalam
Department of Statistics, Universidad Nacional de
Colombia.
Abstract
An important problem in opportunistic spectrum access is the maximization
of the number of packets sent by the secondary users during the white space
of the spectrum, while avoiding the infringement of the privileges of the pri-
mary user. The focus of the model is from the perspective of the secondary node
only and hence the alternating renewal process describing the primary user's
activity has not been taken into account. It would be interesting to incorporate
the availability of the spectrum for the secondary user which is nothing but
the unavailability function of the alternating renewal process. We set up this
problem in terms of an optimal stopping problem. An explicit expression for the
optimal number of packets that can be sent by the secondary nodes in a white
space is obtained. An example is used to explain the model.
Selvamuthu Dharmaraja
Department of Mathematics
Indian Institute of Technology Delhi
New Delhi India.
Abstract
Next generation networks require efficient radio resource management (RRM).
The increasing demand for high-bit-rate services, together with the limited
availability of radio resources, requires smart RRM strategies that also maintain
quality of service. Current penetration of technologies such as Universal Mobile Tele-
Pablo Groisman
IMAS-CONICET - U. de Buenos Aires.
(E. Andjel - F. Ezanno, IMAP - U. d'Aix-Marseille,
L. Rolla, IMAS-CONICET - U. de Buenos Aires).
Abstract
We consider a 1D contact process seen from its rightmost point on the space
of infinite configurations which are bounded above. Despite the fact that this
process has no invariant measures, we will prove that it converges in distribu-
tion to the quasi-stationary distribution of the same process but defined on the
space of finite configurations.
Mauricio Junca
Universidad de los Andes, Colombia
(Mauricio Sánchez-Silva, Universidad de los Andes,
Colombia).
Abstract
We present a model to define an optimal maintenance policy for systems that
deteriorate as a result of shocks, modeled as a compound Poisson process, and
of a deterministic, state-dependent rate. The optimal maintenance strategy is
based on an impulse control model, in which the optimal times and sizes of
interventions are determined according to the system state, obtained from
permanent monitoring.
Contributed talks 6
Bhargab Chattopadhyay
University of Texas at Dallas.
(Shyamal Krishna De, Binghamton University).
Abstract
Economic inequality is usually measured when it comes to evaluating the
effects of economic policies at the micro or macro level. In order to evaluate the
economic policies adopted by a government, it is important to estimate the Gini
index at any specific time period. If the income data for all households in the
region of interest is not available, one has to draw a relatively small sample
to estimate the Gini index for that region. A method of estimation should be
developed such that the cost of sampling and the error in estimation are kept
as low as possible. It is well known that error in estimation decreases when the
sample size increases which in turn increases the cost of sampling. To minimize
the cost of sampling, one has to reduce the sample size which in turn may lead
to a higher estimation error. Therefore, a procedure is required which can act
as a trade-off between the estimation error and the sampling cost. To achieve
this trade-off, the sample size should not be fixed in advance. This problem falls
in the domain of sequential analysis where it is known as minimum risk point
estimation problem. In this presentation, we propose a sequential procedure
that yields an asymptotic minimum risk point estimator of the Gini index by
minimizing the asymptotic risk function, comprising a cost function and the
estimation error. Under a distribution-free scenario, we prove that the final
sample size for our procedure approaches the optimal sample size that minimizes
the risk function. A detailed and more rigorous use of reverse submartingale
properties is adopted to prove that, on average, the final sample size hovers
around the optimal sample size and that the ratio-regret is asymptotically 1.
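As a rough illustration of such a trade-off (a sketch only, not the authors' procedure; the risk constants and the jackknife variance plug-in are assumptions made here for illustration):

```python
import math
import random


def gini(xs):
    """Plug-in Gini index: mean absolute difference over twice the mean."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    return mad / (2.0 * mean)


def jackknife_var(xs):
    """Leave-one-out (jackknife) estimate of Var(gini(xs))."""
    n = len(xs)
    loo = [gini(xs[:i] + xs[i + 1:]) for i in range(n)]
    m = sum(loo) / n
    return (n - 1) / n * sum((g - m) ** 2 for g in loo)


def sequential_gini(stream, a_cost=25.0, c_cost=0.01, pilot=10):
    """Stop at the first n >= pilot with n >= sqrt(a_cost * sigma2_hat / c_cost),
    the sample analogue of the fixed size minimizing the risk
    a_cost * Var(G_hat_n) + c_cost * n."""
    xs = []
    for x in stream:
        xs.append(x)
        n = len(xs)
        if n >= pilot:
            sigma2_hat = n * jackknife_var(xs)  # plug-in asymptotic variance
            if n >= math.sqrt(a_cost * sigma2_hat / c_cost):
                break
    return len(xs), gini(xs)


rng = random.Random(1)
n_stop, g_hat = sequential_gini(rng.expovariate(1.0) for _ in range(400))
```

The stopping time is random: it grows with the estimated variance of the Gini estimator, mimicking the trade-off between sampling cost and estimation error described above.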
Jorge Figueroa-Zúñiga
Universidad de Concepción - Chile.
(Reinaldo Arellano-Valle, Pontificia Universidad
Católica de Chile, Silvia L.P. Ferrari, Universidade de
São Paulo, Brazil).
Abstract
Beta regression models have been widely used for the analysis of limited-range
continuous variables. Here, we consider an extension of the beta regression
models that allows for explanatory variables to be measured with error. Thus,
we propose a Bayesian treatment for errors-in-variables beta regression mod-
els. The specification of prior distributions is discussed, computational imple-
mentation via Gibbs sampling is provided, and two real data applications are
presented. Additionally, Monte Carlo simulations are used to evaluate the per-
formance of the proposed approach.
Sylvain Robbiano
CIMFAV, Universidad de Valparaíso, Chile.
(Benjamín Guedj, LSTA, Université Pierre et Marie
Curie, France).
Abstract
The bipartite ranking problem consists in learning, from a sample
D_n = (X_i, Y_i)_{i=1}^n, to rank the observations X_i while preserving the
order of their associated labels Y_i ∈ {−1, +1}. We consider this problem in the
high-dimensional situation, where the observations X_i lie in a space of dimension
d, possibly much larger than the sample size n. A standard approach in this
context involves the introduction of a scoring function. We propose to estimate
the optimal scoring function using the so-called Gibbs posterior distribution,
which favors sparse additive estimators. This procedure appears valuable when
it comes to assessing the effect of each covariate on the score of an observation.
Using elements from
the PAC-Bayesian theory, we provide theoretical guarantees about our method,
along with an implementation through MCMC.
Abstract
Over the last few years, statistical modeling of continuous proportion has
become the issue of many studies. Some examples of continuous proportions
data include unemployment rate, mortality in traffic accidents, the fraction of
income contributed to a retirement fund, the fraction of exportation income of
the industry sectors, etc. Usual linear and nonlinear regression models are not
suitable for these types of data. Some different alternatives have been proposed
BIZU(y; α, γ, μ, φ) = α Ber(y; γ) + (1 − α) F(y; μ, φ),
Contributed talks 7
Abstract
The theory of sequential tests was developed by Wald. It states that the sample
size need not be fixed in advance; instead, the data should be evaluated as they
are collected, and further sampling should be stopped, according to a previously
defined rule, as soon as significant results are observed. It was also demonstrated
that the probability that the sample size is infinite is zero. In addition, a method
was established to calculate the expected value
zero. In addition, a method was established to calculate the expected value
of the sample size. The higher order moments of the random sample size are
not easy to calculate. However, it was demonstrated that these moments are
finite. Recent literature about sequential tests does not provide new theoreti-
cal elements about calculating these moments. The most recent articles on
this subject show various applications in several areas of knowledge, but none
of them calculate, for instance, the variability of the random sample size. In
this lecture, we present a way to simulate the probability functions of the
sample sizes, allowing us to decide, for a specific study, whether it is more
convenient to use a sequential test or a hypothesis test with a fixed sample size.
Miguel A. Delgado
Universidad Carlos III de Madrid.
(Juan C. Escanciano, Indiana University).
Abstract
We present a methodological approach for testing inequality constraints on
conditional moments. The null hypothesis of an inequality restriction is equiv-
alently expressed as an equality using the least concave majorant operator
applied to the integrated conditional moments. A suitable time transformation
of the basic process renders an asymptotic distribution-free test, with critical
values that can be easily tabulated. Monte Carlo experiments provide evidence
of the satisfactory finite sample performance of the proposed test.
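For intuition, the least concave majorant operator invoked above can be illustrated on a function sampled on a grid: it is the upper convex hull of the points, interpolated linearly. A minimal Python sketch (not the authors' implementation):

```python
def least_concave_majorant(xs, ys):
    """Least concave majorant of the points (xs[i], ys[i]), evaluated back on
    the grid xs (assumed strictly increasing): take the upper convex hull via a
    monotone-chain scan, then interpolate linearly between hull vertices."""
    hull = [0]
    for i in range(1, len(xs)):
        # Pop the last vertex while it lies on or below the chord joining its
        # predecessor to the new point (i.e. the turn is not concave).
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            cross = ((xs[i2] - xs[i1]) * (ys[i] - ys[i1])
                     - (ys[i2] - ys[i1]) * (xs[i] - xs[i1]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    out = list(ys)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            t = (xs[i] - xs[a]) / (xs[b] - xs[a])
            out[i] = ys[a] + t * (ys[b] - ys[a])
    return out


grid = [0.0, 1.0, 2.0, 3.0, 4.0]
vals = [0.0, 0.0, 1.0, 0.5, 0.0]
lcm = least_concave_majorant(grid, vals)   # [0.0, 0.5, 1.0, 0.5, 0.0]
```

By construction the output dominates the input pointwise and is piecewise linear with decreasing slopes, which is what makes the equality reformulation of an inequality restriction possible.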
Norman Giraldo
Universidad Nacional de Colombia.
Abstract
The semi-nonparametric (SNP) density estimator, introduced by Gallant
and Nychka (1987), is based on a family of functions (f_n(x, d), n = 1, 2, …),
where each f_n(x, d) is a density function defined through a truncated Hermite
polynomial expansion, with coefficient vector (parameters) d = (d_0, …, d_{p_n}),
given by

f_n(x, d) = ( Σ_{i=0}^{p_n} d_i H_i(x) )² e^{−x²/2} / √(2π) + ε_0 φ(x),  d ∈ ℝ^{p_n + 1},  (4.3)
and standard deviation respectively, directly estimated from the sample.
Properties and new applications of the SNP distributions, beyond their use as
density estimators, as used in this presentation, can be consulted in León, Mencía
and Santana (2005). This presentation develops another SNP application, this
time to the problem of efficiently and accurately approximating the compound
Poisson distribution. If {X_1, X_2, …} is a sequence of iid positive, continuous
random variables with common distribution F(x), where F(0+) = 0, and
N ~ Poisson(λ) is a Poisson random variable with parameter λ, assumed
independent of the X_j's, then we define the random variable S = Σ_{j=1}^{N} X_j,
with S = 0 when N = 0, called compound Poisson and denoted S ~ PC(λ, F).
The search for approximate methods for F_S(x) has been a longstanding problem
in actuarial science, with many proposed solutions, and it remains an active
source of research. We now assume that the distribution F(x) depends on m
parameters (θ_1, …, θ_m); then S ~ PC(λ, F(x; θ_1, …, θ_m)). The contribution
of this presentation is an approximation method for F_S(x) using a
semi-nonparametric SNP distribution with p_m parameters (d_1, …, d_{p_m}),
obtained by a procedure akin to the classical method of moments. If
S ~ PC(λ, F(x; θ_1, …, θ_m)) and Y ~ SNP(d_1, …, d_{p_m}), then the moments
E(S^j), E(Y^k), j = 1, 2, …, m, k = 1, 2, …, p_m, can be expressed in closed
algebraic form. Assuming p_m = m + 1, we solve by numerical methods the
non-linear system of moment equations E(S^j) = E(Y^j), j = 1, …, m + 1,
with unknowns d_1, …, d_{m+1}, using the optimization procedures supported
by the R software system. We compare the approximations obtained by the
proposed method with other well-known procedures such as Bowers' (1966)
Gamma approximation, the Normal-Power (NP) method in Beard et al. (1984),
and the recursive method of Panjer (Panjer and Willmot, 1996). These methods
are conveniently implemented in the R package actuar (Goulet, 2005). The
quality of the approximations is examined following the same strategies
implemented in Gendron and Crepeau (1989) and Chaubey, Garrido and
Trudeau (1998), where an Inverse Gaussian distribution was assumed for the X
variable. Then, the empirical cumulative distribution from simulation of large
samples of the exact distribution of S is compared with the approximations
provided by the methods, using Cramér-von Mises statistics. We show
significant improvements of the proposed method over the others. This
presentation contains results previously obtained in Velásquez (2009).
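A minimal illustration of the target quantity (not the SNP method itself): simulate the compound Poisson distribution and compare its CDF with a simple moment-matched normal benchmark. The Exponential(1) claim-size distribution and all parameter values are arbitrary choices for the sketch.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


# S ~ PC(lam, Exponential(1)): E S = lam * E X = lam, Var S = lam * E X^2 = 2 lam
lam = 5.0
rng = random.Random(123)
sims = [sum(rng.expovariate(1.0) for _ in range(poisson_draw(lam, rng)))
        for _ in range(20000)]

x = 8.0
ecdf = sum(s <= x for s in sims) / len(sims)       # empirical F_S(x)
approx = normal_cdf(x, lam * 1.0, lam * 2.0)       # moment-matched normal
```

The gap between `ecdf` and `approx` is exactly the kind of approximation error that refined methods (Gamma, NP, Panjer recursion, or the SNP proposal above) aim to reduce.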
Abstract
We propose an adaptive runs test to identify rth order Markovian dependence
in a Bernoulli sequence, constructed from two runs tests: one of them depen-
dent on the number of ones and the other independent of it. We give explicit
expressions for the distribution of the test statistics and for the power of the test
based on these statistics. We calculate the power of the two tests separately and
of the adaptive test, and we note that the adaptive test is more powerful than the
two tests used separately, especially when the success probability is around 0.5
and when the sequence contains too many or too few successes.
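For context, the classical (non-adaptive) runs test that such procedures build on can be sketched as follows; this is the standard Wald-Wolfowitz baseline, not the proposed adaptive test:

```python
import math


def runs_test(bits):
    """Wald-Wolfowitz runs test for independence of a 0/1 sequence.

    Returns (number_of_runs, z_statistic); under H0 (independence, conditioning
    on the number of ones) z is approximately standard normal."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    runs = 1 + sum(bits[i] != bits[i - 1] for i in range(1, n))
    mu = 1.0 + 2.0 * n0 * n1 / n
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n ** 2 * (n - 1))
    z = (runs - mu) / math.sqrt(var)
    return runs, z


# A strictly alternating sequence has far too many runs to be independent
bits = [0, 1] * 20
n_runs, z = runs_test(bits)
```

An adaptive test as described above would combine such a statistic with a second runs statistic that is independent of the number of ones, choosing between them according to the observed sequence.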
Contributed talks 8
Abstract
In this talk, we introduce a continuous state branching process in a Brownian
environment. The present model generalizes the recent paper by Boinghoff and
Hutzenthaler, in which they studied the case when the continuous state branch-
ing process is the Feller diffusion. In particular, we study different aspects of
this type of process, such as the probability of extinction and its version
conditioned on survival. Special attention is given to the self-similar case.
References
1. Boinghoff, C. and Hutzenthaler, M. (2012). Branching diffusions in random
environment. Markov Processes and Related Fields, 18, 269-310.
Abstract
We consider a random forest F, defined as a sequence of i.i.d. splitting trees,
each started at time 0 from a single ancestor (with a specific distribution,
different from that of its descendants), stopped at the first tree having survived
up to a fixed time T. We denote by (ξ_t, 0 ≤ t ≤ T) the population size process
associated to this forest, and we prove that if the splitting trees are supercritical,
then the time-reversed process (ξ_{T−t}, 0 ≤ t ≤ T) has the same distribution as
(ξ̂_t, 0 ≤ t ≤ T), the corresponding width process of an equally defined forest F̂,
but where the underlying splitting trees are obtained by conditioning on ultimate
extinction, and are then subcritical. The results are based on an identity in law
between the contour processes of these random forests, truncated up to T, and
on the duality property of Lévy processes. This identity has some useful
applications in the context of epidemiology, since we are able to characterize the
population size process conditional on the coalescence times of individuals at T.
Pawel Hitczenko
Drexel University.
(Amanda Parshall, Drexel University).
Abstract
In this paper, we study staircase tableaux, a combinatorial object introduced
for its connections with the partially asymmetric exclusion process (PASEP)
and Askey-Wilson polynomials. Due to these interesting connections, staircase
tableaux have been the object of study in several recent papers.
the distribution of various parameters in random staircase tableaux has been
studied. There have been interesting results on parameters along the main diag-
onal; however, no such results have appeared for other diagonals. It was conjec-
tured that the distribution of symbols along the kth diagonal is asymptotically
Poisson as k and the size of the tableau tend to infinity. We partially prove this
conjecture and, more specifically, we prove it for the second main diagonal.
Abstract
We aim to study the asymptotics of distributions of various functionals of the
Beta(2 − α, α) n-coalescent with 1 < α < 2, as n goes to infinity. The Beta
n-coalescent is a Markov process taking values in the set of partitions of
{1, …, n}, evolving from the initial value {{1}, …, {n}} by merging (coalescing)
blocks together and finally reaching the absorbing state {1, …, n}. The minimal
clade of 1 is the block which contains 1 at the time of coalescence of the singleton
{1}. The limit size of the minimal clade of 1 is provided. We express it as a
function of the coalescence time of {1} and the sizes of blocks at that time. The
case α = 1 is treated separately using a nice construction of the
Bolthausen-Sznitman coalescent by means of random recursive trees and results
on the Chinese Restaurant process.
Contributed talks 9
Abstract
The selection of a warranty program for a new product on the market gen-
erates additional costs to the manufacturer other than those inherent to the
manufacturing process. This makes it necessary to establish warranty costs for
a given period of time, thus, the manufacturer can estimate the required level of
reserves to deal with the future warranty claims. Particularly, we consider the
so-called discounted warranty costs. The models developed for these kinds of
costs incorporate the age of the product at the time of the warranty claim, and
they can be studied through the stochastic process known as the General Lifetime
Model. In practice, most of the products are systems consisting of several com-
ponents. When the product or system is repairable and maintenance actions in
the components involving costs are made, it is interesting to model the impact
of such actions on the system warranty costs. One of the main appeals of the
General Lifetime Model is that it can evaluate the evolution of the system under
the so-called physical approach, which allows to model the failure process of
the system or product through time and given different levels of information, in
particular, it allows to model the failure rate process, which is the most impor-
tant aspect of these models. Thus, the main difference between the classical
reliability model (known as the statistical approach) and the physical approach
is the level of information: while the latter shows the failure process at the level
of the components, in the former only the system failure is observed. This
differentiates the failure process from one approach to the other, due to the fact
that the associated failure rate processes change: the failure rate in the statistical
approach is a deterministic function, while the failure rate in the physical
approach is a stochastic process.
Marcelo Sobottka
Universidade Federal de Santa Catarina.
Abstract
In this work, we present parameter estimators for a hidden-Markov based model
for the distributional structure of nucleotides in bacterial DNA sequences. Such
a model supposes that the gross structure of bacterial DNA sequences can be
derived from uniformly distributed mutations of some primitive DNA which
is constructed following a ten-parameter Markov process [1]. The proposed
estimators can be used to construct a statistical test which indicates whether a
given DNA sequence can be simulated by the model. This is a joint work under-
taken with A. G. Hart (Centro de Modelamiento Matemático, Universidad de
Chile) and M. Weber Mendonça (Universidade Federal de Santa Catarina). M.
Sobottka was supported by CNPq-Brazil grant 455399/2011-5 and by CAPES-
Brazil Fellowship.
References
1. Sobottka, M. and Hart, A. G. (2011). A model capturing novel strand sym-
metries in bacterial DNA. Biochemical and Biophysical Research Commu-
nications, 410(4), 823-828.
Paula M. Spano
Universidad de Buenos Aires and CONICET.
(Ana M. Bianco, Universidad de Buenos Aires and
CONICET).
Abstract
The main objective of this work is to develop simultaneous confidence bands
for the mean of the discounted warranty cost for coherent systems under physi-
cal minimum repair, i.e., when the system is observed at the level of its com-
ponents, using computer intensive methods based on resampling. In doing so,
based on the theoretical framework of martingale processes and the central
limit resampling theorem (CLRT), we prove that the conditions of the latter hold
for the discounted warranty cost processes. A Monte Carlo simulation study is
performed to evaluate the finite sample performance of the proposed method
through the achieved coverage probability. The results in the considered
scenarios show that the confidence bands based on resampling have coverage
probabilities close to the nominal values, in particular those based on sample
sizes of more than 100 systems.
Sergio Yáñez
Universidad Nacional de Colombia.
(Luis A. Escobar, Louisiana State University, Nelfi
González, Universidad Nacional de Colombia).
Abstract
The purpose of this study is to find a Weibull approximation to the distribution
of the minimum for a competing risks model with two independent Weibull
failure modes. The maximum likelihood Weibull fit ignoring the mode of fail-
ure information is called the ignoring mode of failure (IG) model. We show
that for large samples and complete data, the ignoring mode of failure model is
equivalent to the best Kullback-Leibler Weibull approximation.
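A small illustration of the competing risks minimum. Here both failure modes are assumed to share the same Weibull shape, the special case in which the minimum is exactly Weibull; the unequal-shape case, where only an approximation exists, is the subject of the talk. Parameter values are arbitrary.

```python
import math
import random


def weibull_draw(shape, scale, rng):
    # Inverse-CDF sampling from F(t) = 1 - exp(-(t/scale)^shape)
    return scale * (-math.log(1.0 - rng.random())) ** (1.0 / shape)


def min_scale(shape, s1, s2):
    # With a common shape, min(T1, T2) is Weibull with this scale:
    # exp(-(t/s1)^b - (t/s2)^b) = exp(-(t/eta)^b), eta = (s1^-b + s2^-b)^(-1/b)
    return (s1 ** (-shape) + s2 ** (-shape)) ** (-1.0 / shape)


rng = random.Random(7)
beta, eta1, eta2 = 2.0, 1.0, 2.0
eta = min_scale(beta, eta1, eta2)
sims = [min(weibull_draw(beta, eta1, rng), weibull_draw(beta, eta2, rng))
        for _ in range(20000)]
# At t = eta the exact CDF of the minimum equals 1 - exp(-1) ~ 0.632
p_hat = sum(t <= eta for t in sims) / len(sims)
```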
Contributed talks 10
Abstract
Principal Components Analysis (PCA) is a widely used technique that proves
useful for dimension reduction and characterization of variability in multivari-
ate populations. Our interest lies in studying when and why PCA can be used
to effectively model a response-predictor set relationship. Specifically, take Z
to be a continuous random variable such that its support traverses the origin
of a p-dimensional continuous space E. Let Y be a p-dimensional continuous
random vector in E such that the supports of each component of Y traverse the
origin of E, where Y also satisfies the property that its p components are pair-
wise orthogonal. Select uniformly in E any vector X of p continuous random
variables traversing the origin. We prove that Y explains Z better than X in
terms of the correlation. In particular, we prove that the principal components
explain better a response variable than the original input variables. This has
important consequences for modeling data in high dimensions. We illustrate
this result using PRIM, a bump-hunting algorithm used to identify and charac-
terize modal subgroups in populations. We study the empirical performance of
our findings via simulations that mimic high dimensional applications. This is
joint work undertaken with J. Sunil Rao of the University of Miami and Jean-
Eudes Dazard of Case Western Reserve University.
Abstract
In this talk, we focus on the problem of uncertainty estimation in prediction for
random effects in mixed models. In a first stage, we review the evaluation and
estimation of the Mean Squared Error (MSE) of the Empirical Predictor based
on second-order correct approximations in the spirit of Prasad & Rao (1990),
Das et al. (2004) and Jiang (2003), among others. Resampling procedures,
especially the Empirical Bootstrap, provide an attractive way of estimating MSE by
either computing it directly or by providing some bias correction in conjunc-
tion with the approximation-based approach. We explore bootstrap schemes in
mixed models for hierarchical data and propose a non-parametric algorithm
for estimating the MSE of the Empirical Best Predictors of the Random Effects,
based on the Generalized Bootstrap for Estimating Equations (Chatterjee &
Bose, 2005) adapted for Gaussian GLMMs by Field et al. (2010) and Samanta
& Welsh (2013). We apply this procedure in the context of Generalized Linear
Mixed Models and the Empirical Best Predictor (Jiang & Lahiri, 2006). Finally,
we illustrate the properties of our proposal with simulation studies. This is joint
work with Eva Cantoni.
References
1. Chatterjee, S. and Bose, A. (2005). Generalized bootstrap for estimating
equations. The Annals of Statistics, 33(1), 414-436.
2. Das, K., Jiang, J., and Rao, J. N. K. (2004). Mean squared error of empirical
predictor. The Annals of Statistics, 32(2), 818-840.
3. Field, C. A., Pang, Z., and Welsh, A. H. (2010). Bootstrapping robust esti-
mates for clustered data. Journal of the American Statistical Association,
105(492), 1606-1616.
4. Jiang, J. and Lahiri, P. (2006). Mixed model prediction and small area esti-
mation. Test, 15(1), 1-96.
Abstract
Some fractional factorial experiments include response in (0,1); their analysis
requires considering linear restrictions on the parameters because models are
supersaturated, i.e., we have more parameters than observations. In order to
solve this problem, a doubly restricted beta regression model is proposed, in
which both the mean and dispersion parameters of the distribution are modeled
and restricted simultaneously. A penalty function with Lagrange multipliers is
proposed in order to obtain the maximum likelihood estimates of the parameters.
The likelihood ratio and Wald tests are considered as alternatives for testing
hypotheses about the parameters; additionally, nested models are compared. We
consider a measure of goodness of fit. A simulated example and data from 2^(k−p)
experiments are analyzed. We also compare the results obtained here with
Bayesian and frequentist unrestricted estimations reported in other papers.
Germán Moreno
Universidad Industrial de Santander.
(Guillermo Martínez-Flórez; Heleno Bolfarine).
Abstract
This paper proposes a general class of regression models for continuous
proportions when the data are inflated with zeros and/or ones. The proposed
models assume that the response variable has a mixed continuous-discrete
distribution, with covariates in both the discrete and continuous parts of the
model.
Contributed talks 11
Jump-diffusion approximation of
density dependent Markov chains
in domains with boundaries
Enrico Bibbona
Department of Mathematics G. Peano, University
of Torino.
(Alessio Angius, Gianfranco Balbo, Marco Beccuti,
Andras Horvath, Department of Computer Science,
University of Torino, Roberta Sirovich, Laura
Sacerdote, Department of Mathematics G. Peano,
University of Torino).
Abstract
Density dependent Markov Chains are widely used to model many differ-
ent phenomena in population dynamics, chemical reactions, epidemics. It is
well known, mainly because of the work of Kurtz, that such processes can be
approximated by ordinary differential equations (ODEs) when their indexing
parameter grows large. Important phenomena that cannot be revealed with
such approaches include heavy tailed or bi-modal population distributions. A
better approximation proposed again by Kurtz is through diffusion processes.
However, such an approximation does not naturally encode the presence of
boundaries of the state space. We show how such a problem can be relevant
in some concrete examples, and we propose a jump-diffusion approximation
that has the same law as the approximating diffusion as long as it remains in the
interior of the state space, but includes jumps at the boundary that mimic the
original Markov chain and allow one to capture the behavior there as well.
The same approach can also be applied to the simulation of hybrid models with
different time scales.
Jaime A. Londoño
Universidad Nacional de Colombia.
Abstract
I extend results given in J. A. Londoño, A Sensitive Inter-temporal Equilibrium
for Relative Well-Being, working paper (2013), characterizing inter-temporal
equilibrium when incomplete markets and markets with arbitrage opportunities
are assumed, and when heterogeneous agents maximize a state-dependent utility
functional, as proposed in J. A. Londoño, State Dependent Utilities and
Incomplete Markets, Mathematical Problems in Engineering, 2013: 1-8 (2013).
The maximization problem is an optimization problem defined on a class of
portfolios that are not arbitrage opportunities, although the market itself allows
arbitrages. We also prove that any equilibrium market arising from the previous
considerations satisfies a weak form of lack of arbitrage, and we provide tools for
the construction of equilibrium markets when the aggregate endowments and
dividends are exogenously given. The theoretical framework used is a
generalization of markets in which the processes are Brownian flows on
manifolds.
Abstract
The definition and properties of the Mellin transform, together with their appli-
cations in determining the density of algebraic combinations of random vari-
ables naturally led to its application to the valuation of exotic financial options.
As a particular case, we consider the application of this transform in the case
where the underlying an arithmetic Asian option following a jump-diffusion
91
CLAPEM 2014 Universidad Nacional de Colombia
1 N (t )
S(t ) = S(0)exp (r q k 2 )t + W (t ) Yi
2 i=1
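A Monte Carlo sketch of this dynamic (illustrative only: lognormal jumps ln Y_i ~ N(jmu, jsig²) and all parameter values are assumptions, and the talk's Mellin-transform valuation is not reproduced here):

```python
import math
import random


def poisson_draw(lam, rng):
    # Knuth's method; fine for the small per-step intensities used here
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def asian_call_mc(s0, strike, r, q, sigma, lam, jmu, jsig, T,
                  n_steps=50, n_paths=2000, seed=11):
    """Discounted Monte Carlo payoff of an arithmetic-average Asian call
    under the jump-diffusion above, with lognormal jumps."""
    rng = random.Random(seed)
    k = math.exp(jmu + 0.5 * jsig ** 2) - 1.0        # k = E[Y_i] - 1
    dt = T / n_steps
    drift = (r - q - lam * k - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s, path_sum = s0, 0.0
        for _ in range(n_steps):
            jump_log = sum(rng.gauss(jmu, jsig)
                           for _ in range(poisson_draw(lam * dt, rng)))
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0) + jump_log)
            path_sum += s
        total += max(path_sum / n_steps - strike, 0.0)
    return math.exp(-r * T) * total / n_paths


price = asian_call_mc(100.0, 100.0, 0.05, 0.0, 0.2, 0.3, -0.05, 0.1, 1.0)
```

Such a simulator is a natural benchmark against which a Mellin-transform valuation of the arithmetic Asian option can be checked.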
Rafael Serrano
Universidad del Rosario.
Abstract
We study the martingale approach to maximization of expected utility from
consumption and terminal wealth in a pure-jump model driven by marked
point processes and in the presence of margin requirements such as different
interest rates for borrowing and lending and risk premiums for short positions.
This is modeled by adding a margin payment function into the investor's
wealth equation, which is nonlinear with respect to the portfolio process. We
give sufficient conditions for the existence of optimal policies using martingale
and convex duality techniques. Closed-form solutions for the optimal value
function are found in the case of pure-jump models with Markov-modulated
jump-size distributions and agents with logarithmic utility.
Contributed talks 12
Cristian Bayes
Departamento de Ciencias, Pontificia Universidad
Católica del Perú.
(Jorge L. Bazán, Universidade de São Paulo,
Catalina García, Universidad de Granada).
Abstract
A new regression model for proportions is presented by considering the Beta
rectangular distribution proposed by Hahn (2008). This new model includes
the Beta regression model introduced by Ferrari and Cribari-Neto (2004) and
the variable dispersion Beta regression model introduced by Smithson and
Verkuilen (2006) as particular cases. Following Branscum, Johnson and
Thurmond (2007), a Bayesian inference approach is adopted using Markov chain
Monte Carlo (MCMC) algorithms. Simulation studies of the influence of outliers,
considering data contaminated under four perturbation patterns, were carried
out. These confirmed that the Beta rectangular regression
model seems to be a new robust alternative for modeling proportion data and
that the Beta regression model shows sensitivity to the estimation of regression
coefficients, to the posterior distribution of all parameters and to the model
comparison criteria considered.
References
1. Branscum, A. J., Johnson, W. O. and Thurmond, M. C. (2007). Bayesian
beta regression: application to household data and genetic distance between
foot-and-mouth disease viruses. Australian & New Zealand Journal of
Statistics, 49(3), 287-301.
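A sketch of the Beta rectangular density as a uniform-Beta mixture (the mean-precision parameterization below is an assumption for illustration; Hahn's (2008) exact parameterization may differ):

```python
import math


def beta_pdf(y, mu, phi):
    """Beta density in the mean-precision parameterization:
    shape1 = mu * phi, shape2 = (1 - mu) * phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(logc + (a - 1.0) * math.log(y)
                    + (b - 1.0) * math.log(1.0 - y))


def beta_rectangular_pdf(y, theta, mu, phi):
    """Mixture of a uniform 'rectangle' on (0, 1), with weight theta, and a
    Beta component with weight 1 - theta."""
    return theta + (1.0 - theta) * beta_pdf(y, mu, phi)


# Midpoint-rule check that the mixture density integrates to one
m = 4000
total = sum(beta_rectangular_pdf((i + 0.5) / m, 0.3, 0.6, 5.0)
            for i in range(m)) / m
```

The uniform component fattens the tails of the Beta density, which is what gives the regression model its robustness to outlying proportions.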
Salvador Flores
CMM, Universidad de Chile.
(Luis Briceño, Universidad Federico Santa María).
Abstract
Over the last few years, there has been a lot of excitement around the advances
in the reconstruction of sparse signals by ℓ1-norm minimization and its
applications to compressed sensing. The core mathematical problem behind this
is finding the sparsest solution to an underdetermined system. The great bulk of
the existing results are probabilistic, and can be loosely summarized as: for
Gaussian random matrices A, with very high probability the minimal ℓ1-norm
solution to the underdetermined system Ax = b is also the sparsest one, provided
the latter is sparse enough. We discuss an application of this theory to the
following variant of the robust linear regression problem. Let y ∈ ℝ^n be a
vector containing observations from the linear model y = Xf + ε, where
ε = z + e is an error term composed of two contributions: a dense, presumably
small, vector of noise z, and an arbitrary sparse vector e modeling outliers. We
suppose that the design matrix X is under control and not subject to
contamination. The problem is to find an estimator f̂ of f with provable error
bounds independent of the magnitude of the sparse vector e. We show that the
results obtained by the aforementioned theory can be improved in many
directions by extending some results related to the regression breakdown point
of the classical ℓ1 estimator. Our results are based on sharp error bounds for the
solutions of the ℓ1 minimization problem min_g ‖y − Xg‖₁ when errors consist
of noise and sparse outliers.
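The role of sparse outliers can already be seen in the simplest location version of the model y = Xf + z + e (X the all-ones column), where the ℓ1 fit is the sample median; a toy illustration, not the paper's estimator:

```python
import random
import statistics

rng = random.Random(42)
f_true = 3.0
n = 101
# y = f + z + e: dense small noise z, plus a sparse vector e of gross outliers
y = [f_true + rng.gauss(0.0, 0.1) for _ in range(n)]
for i in range(10):           # corrupt 10 of the 101 observations arbitrarily
    y[i] += 1.0e6

l2_fit = statistics.fmean(y)    # least-squares fit: ruined by the outliers
l1_fit = statistics.median(y)   # l1 fit: argmin_g sum_i |y_i - g|
```

The ℓ1 estimate stays near the true location regardless of the outlier magnitude, which is the breakdown-point behavior the error bounds above quantify for general design matrices X.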
Alejandra Martínez
Universidad de Buenos Aires and CONICET.
(Graciela Boente, Universidad de Buenos Aires and
CONICET, Matías Salibian-Barrera, University of
British Columbia).
Abstract
As is well known, kernel estimators of the regression function in nonparametric
multivariate regression models suffer from the so-called curse of dimensionality,
which occurs because the number of observations lying in a neighborhood of
fixed radius decreases exponentially with the dimension. Additive models are
widely used to avoid the difficulty of estimating regression functions of several
covariates without using a parametric model. They generalize linear models,
are easily interpretable, and are not affected by the curse of dimensionality.
Different estimation procedures for these models have been proposed in the
literature, and some of them have also been extended to the situation when the
data may contain missing responses. It is easy to see that most of these esti-
mators can be unduly affected by a small proportion of atypical observations,
since they are based on local averages or local polynomials. For that reason,
robust procedures to estimate the components of an additive model are needed.
We consider robust estimators for additive models based on local polynomials
that can also be used on data sets with missing responses. These estimators
simultaneously avoid the curse of dimensionality and the sensitivity to atypical
observations. Our proposal is based on the method of marginal integration,
and adapted to the missing responses situation.
Abstract
Generalized Linear Models (GLM) and Generalized Additive Models (GAM)
are popular statistical methods for modeling continuous and discrete data both
parametrically and nonparametrically. In this general framework, we consider
the problem of variable selection through penalized methods by focusing on
resistance issues in the presence of outlying data and other deviations from the
stochastic assumptions. We propose robust penalized M-estimators and study
their asymptotic properties. In particular, we show that robust counterparts
of the adaptive lasso and the nonnegative garrote satisfy the oracle properties.
Our results extend the available theory from linear models to GLM and GAM,
and from L2-based estimation to robust estimation. Finally, we illustrate the
finite-sample performance of the method by a simulation study in a Poisson
regression setting.
Contributed talks 13
Alexandru Hening
University of Oxford.
(Steven N. Evans, UC Berkeley, Sebastian J.
Schreiber, UC Davis).
Abstract
A fundamental problem in ecology is to understand when it is possible for one
species to invade the range of another established species. There is widespread
empirical evidence that invasions can occur when there is significant heteroge-
neity in space and time in the range of the resident species. We propose a model
for the invasion process with a view to understanding what factors make inva-
sion possible. The model reduces to studying a coupled system of two stochastic
differential equations. By introducing the concept of invasion rate, we are able
to fully characterize the conditions on the coefficients of the SDEs under which
invasion is possible or impossible.
Orietta Nicolis
Universidad de Valparaíso.
Abstract
The description of natural phenomena through an analysis of statistical scaling
laws has long been a popular topic. Many studies aim to identify the fractal fea-
ture by estimating the self-similarity parameter H, considered constant at differ-
ent scales of observation. However, most real world data exhibit a multifractal
structure, that is, the self-similarity parameter varies erratically with time. The
Abstract
We consider spherical data Xi perturbed by a random rotation εi ∈ SO(3), so
that only the sample Zi = εi Xi is observed. We define a nonparametric test proce-
dure to distinguish H0: the density f of Xi is the uniform density f0 on the
sphere, from H1: ||f − f0|| ≥ C ψN and f is in a Sobolev space with smoothness s.
For a noise density with smoothness index ν, we show that an adaptive procedure
(i.e., s is not assumed to be known) cannot have a faster rate of separation than
ρad(s) = (N/√(log log N))^(−2s/(2s+2ν+1)), and we provide a procedure which reaches
this rate. We also deal with the case of super smooth noise. We illustrate the
theory by implementing our test procedure for various kinds of noise on SO(3)
and by comparing it to other procedures. Applications to real data in astrophys-
ics and paleomagnetism are provided.
On probabilistic-stochastic
visual communication
Moshe Porat
Technion, Israel Institute of Technology.
Abstract
Color information plays a major role in visual communication although most
algorithms and tools are developed mainly for monochromatic image trans-
mission. Usually, the representation and coding of visual information is per-
formed either in the Red-Green-Blue (RGB) color space or in another color
space chosen rather arbitrarily. Considering an image as a two-dimensional
stochastic field, it is well known that the color components of natural images,
such as RGB, are highly correlated. In this work we explore the inter-color cor-
relation characteristics of natural images, their relation to the main statisti-
cal properties of the image and their joint behavior in major image spaces or
planes. Presently, most color coding algorithms tend to de-correlate the color
components as part of the coding and the transmission process. A widely
used example of the de-correlation approach is the baseline JPEG algorithm
for image compression (and consequently, the MPEG algorithm for video cod-
ing). This is done by applying color transforms to reduce the statistical correla-
tion and thus enabling low bit-rate chrominance encoding. However, having
analyzed these separate components as three monochromatic stochastic fields,
considerable resemblance is noticeable, implying that substantial mutual infor-
mation has remained and has not been exploited to reduce the size of the stored
or transmitted data. To improve the encoding, new analysis tools for the image
statistics are introduced in this work, taking advantage of the high inter-color
correlation to perform efficient encoding of the information. One approach is
to optimally reduce the inter-color correlation. Another approach is to use a
correlation-enhancement transform to increase the inter-color correlation of
an image prior to the encoding to allow efficient approximation of two of the
color components using the third color component as a reference. Our work
shows that exploitation of mutual spectral information, as proposed, can
improve the coding of chrominance information in color image compression.
The measure of Signal-to-Noise Ratio (SNR) is used to assess both quantita-
tive and subjective visual fidelity. Experimental results show that the proposed
approaches outperform presently available methods in the sense of SNR vs. bit-
rate. Our conclusion is that the high correlation between primary RGB colors
could be helpful for image coding as well as for video transmission, and that
a new spatio-temporal approach to image compression is more efficient than
conventional de-correlation-based techniques.
Contributed talks 14
Abstract
The Hermite random field has been introduced as a limit of some weighted
Hermite variations of the fractional Brownian sheet. In this work, we define it
as a multiple integral with respect to the standard Brownian sheet and intro-
duce Wiener integrals with respect to it. As an application we study the wave
equation driven by the Hermite sheet. We prove the existence of the solution
and we study the regularity of its sample paths, the existence of the density and
of its local times.
Abstract
We consider a non-linear stochastic wave equation driven by a Gaussian noise
white in time and with a spatial stationary covariance. Results found by Dalang
and Sanz-Solé (2009) show that the sample paths of the random field solution
are Hölder continuous, jointly in time and in space. In this lecture, we will
Abstract
We study the Euler characteristic of an excursion set of a stationary Gaussian
random field. Let X : ℝ^d → ℝ be a stationary isotropic Gaussian field hav-
ing trajectories in C²(ℝ^d). Let us fix a level u and consider the excursion
set above u, {t ∈ ℝ^d : X(t) ≥ u}. We take the restriction to a compact domain,
considering for any bounded rectangle T ⊂ ℝ^d, A(T, u) = {t ∈ T : X(t) ≥ u}.
The aim of this paper is to establish a central limit theorem for the Euler char-
acteristic of A(T, u) as T grows to ℝ^d, as conjectured by R. Adler more than
ten years ago. The required assumption on X is stronger than Geman's one in
dimension one but weaker than having C³ trajectories. Our result extends to
higher dimensions what is known in dimension one, since in that case the Euler
characteristic of A(T, u) equals the number of up-crossings of X at level u.
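The dimension-one identity mentioned at the end (Euler characteristic of the excursion set = number of up-crossings, up to a boundary term) can be checked numerically on a discretized path. A minimal sketch, assuming a smooth toy process built from random cosine waves rather than the general stationary Gaussian fields of the talk:

```python
import math
import random

def up_crossings(path, u):
    """Count up-crossings of level u: steps where the path moves from below u to at-or-above u."""
    return sum(1 for a, b in zip(path, path[1:]) if a < u <= b)

def euler_characteristic_1d(path, u):
    """Euler characteristic of the excursion set {t : X(t) >= u} on a grid.
    In dimension one it is the number of connected components (intervals)."""
    above = [x >= u for x in path]
    return (1 if above[0] else 0) + sum(
        1 for a, b in zip(above, above[1:]) if (not a) and b)

# toy smooth stationary path: a normalized sum of random cosine waves
random.seed(1)
terms = [(random.gauss(0, 1), random.gauss(0, 1), random.uniform(1, 20)) for _ in range(25)]
path = [sum(a * math.cos(w * t) + b * math.sin(w * t) for a, b, w in terms) / 5.0
        for t in (i / 1000.0 for i in range(5001))]

u = 1.0
# The two counts agree exactly, up to whether the path starts above the level.
print(up_crossings(path, u), euler_characteristic_1d(path, u))
```

The only discrepancy between the two counts is the component that starts at the left endpoint of the interval, which is not preceded by an up-crossing.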
Achillefs Tzioufas
Universidad de Buenos Aires.
Abstract
We study the behaviour of symmetric supercritical one-dimensional contact
processes on survival. We show the existence of random regenerative space-
time points on the trajectory of their extremal particles; the key to this is a short
proof of a result of Mountford and Sweet (2000) by means of a new, elementary
approach.
Contributed talks 15
Grazyna Badowski
University of Guam.
(Lilnabeth Somera, University of Guam, Hye-Ryeon
Lee, University of Hawaii).
Abstract
Respondent driven sampling (RDS) is a relatively new network sampling tech-
nique typically employed for hard-to-reach populations (Heckathorn 1997,
2002). It is similar to snowball sampling where initial seed respondents
recruit additional respondents from their network. The RDS mathematical
model is based on Markov chain theory. It suggests that if peer recruitment
occurs through a sufficiently large number of recruitment waves, the sample
will stabilize and reach equilibrium distribution. The RDS model uses infor-
mation about the social network obtained during the recruitment process to
weight the sample. Under certain assumptions the method promises to produce
the sample independent from the biases that may have been introduced by the
non-random choice of seeds from which recruitment began. We conducted a
survey on health communication in general population on Guam using the RDS
method. In this paper, we will investigate the performance of RDS as a Mar-
kov chain by assessing all the assumptions and comparing estimates from the
RDS survey on health communication with population data from both the 2010
Guam Census and the 2012 Behavioral Risk Factor Surveillance System (BRFSS). This
study included RDS data collected on Guam in 2013 (n = 511) and 2012 BRFSS
Guam Data (n = 2031). The estimates were calculated first using unweighted
RDS sample, second using RDS inference methods and compared with known
population characteristics. The RDS sample was largely representative of the total
population by sex, ethnicity, socioeconomic status and geographic location, but
the sample overrepresented young adults aged 18-34 and those with some post
high school education. Respondent-driven sampling statistical inference methods
failed to reduce these biases. Further study is needed to derive proper RDS statistical
inference.
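The degree-based weighting described above is commonly implemented with the Volz-Heckathorn (RDS-II) estimator; the abstract does not specify which estimator the authors used, so the following is only an illustrative sketch with hypothetical data:

```python
def rds_ii_estimate(degrees, trait):
    """Volz-Heckathorn (RDS-II) estimator of a population proportion:
    each respondent is weighted by the inverse of their reported network degree."""
    inv_weights = [1.0 / d for d in degrees]
    total = sum(inv_weights)
    return sum(w for w, y in zip(inv_weights, trait) if y) / total

# hypothetical sample: reported degrees and a 0/1 trait indicator
degrees = [2, 10, 5, 4, 20, 8]
trait = [1, 0, 1, 1, 0, 0]
naive = sum(trait) / len(trait)             # unweighted sample proportion
adjusted = rds_ii_estimate(degrees, trait)  # degree-adjusted proportion
print(round(naive, 3), round(adjusted, 3))
```

Because the trait carriers in this toy sample have low degrees, the adjusted estimate exceeds the naive one, reflecting that low-degree respondents are under-sampled by chain referral.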
Abstract
A new regression model for proportions is presented by considering the Beta
rectangular distribution proposed by Hahn (2008). This new model includes
the Beta regression model introduced by Ferrari and Cribari-Neto (2004) and
the variable dispersion Beta regression model introduced by Smithson and
Verkuilen (2006) as particular cases. Like Branscum, Johnson and Thurmond
(2007), a Bayesian inference approach is adopted using Markov Chain Monte
Carlo (MCMC) algorithms. Simulation studies on the influence of outliers by
considering contaminated data under four perturbation patterns to generate
outliers were carried out and confirm that the Beta rectangular regression
model seems to be a new robust alternative for modeling proportion data and
that the Beta regression model shows sensitivity to the estimation of regression
References
1. Branscum, A. J., Johnson, W. O. and Thurmond, M. C. (2007). Bayesian
beta regression: application to household data and genetic distance bet-
ween foot-and-mouth disease viruses. Australian & New Zealand Journal
of Statistics, 49(3), 287-301.
Adolfo J. Quiroz
Universidad de los Andes.
(Joaquín Ortega, Centro de Investigación en
Matemáticas, Guanajuato, México).
Abstract
The application of empirical processes methods in the context of the analysis
of functional data is considered. In particular, quadratic forms of dot products
of certain estimated functions with the functional data are studied as statistics
for the two-sample problem on functional data. Asymptotic distribution results
are given for the proposed statistics and application examples are described in
connection with principal component analysis for functional data.
Daniil Ryabko
INRIA Lille, France, and INRIA Chile.
Abstract
A fully non-parametric approach to asymptotic statistical analysis of stationary
ergodic time series is presented. The considered problems include time-series
clustering, hypothesis testing, change-point estimation, and independence
testing. The presented approach is based on empirical estimates of the dis-
tributional distance. Main results include algorithms that are asymptotically
consistent under the only assumption that the time series in question are sta-
tionary ergodic. No independence or mixing-type assumptions are involved.
While some results are new, a detailed exposition of others can be found in: [1] D.
Ryabko, Testing composite hypotheses about discrete ergodic processes, Test,
vol. 21, no. 2, pp. 317-329, 2012; [2] D. Ryabko and B. Ryabko, Nonparametric
statistical inference for ergodic processes, IEEE Transactions on Information
Theory, vol. 56, no. 3, pp. 1430-1435, 2010.
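The empirical estimates of the distributional distance mentioned above can be illustrated concretely. This sketch uses a 2^-k-weighted sum, over word lengths k, of total variation between empirical word frequencies — one standard form of the distributional distance — on made-up binary sequences:

```python
import random
from collections import Counter

def word_freqs(seq, k):
    """Empirical frequencies of all length-k words occurring in the sequence."""
    n = len(seq) - k + 1
    counts = Counter(tuple(seq[i:i + k]) for i in range(n))
    return {w: c / n for w, c in counts.items()}

def empirical_distributional_distance(x, y, max_k=3):
    """Plug-in estimate of a distributional distance between the processes
    generating x and y: a 2**-k weighted sum over word lengths k of the
    total variation between empirical word frequencies."""
    d = 0.0
    for k in range(1, max_k + 1):
        fx, fy = word_freqs(x, k), word_freqs(y, k)
        d += 2.0 ** -k * sum(abs(fx.get(w, 0.0) - fy.get(w, 0.0))
                             for w in set(fx) | set(fy))
    return d

random.seed(0)
a = [random.random() < 0.5 for _ in range(2000)]  # fair coin
b = [random.random() < 0.5 for _ in range(2000)]  # another fair coin
c = [random.random() < 0.9 for _ in range(2000)]  # biased coin
# sequences from the same process are much closer than sequences from different ones
print(empirical_distributional_distance(a, b) < empirical_distributional_distance(a, c))
```

Consistency of such estimates for stationary ergodic sources is exactly what makes the clustering and testing procedures above possible without mixing assumptions.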
Contributed talks 16
Ricardo Maronna
National University of La Plata, Argentina
(Víctor Yohai, University of Buenos Aires and
CONICET, Argentina).
Abstract
We deal with the estimation of robust substitutes of the covariance matrix for
p-dimensional data. It is important that they possess both a high efficiency for
normal data and a high resistance to outliers; that is, a low bias under contami-
nation. The most frequently employed estimators are not quite satisfactory in
this respect. The Minimum Volume Ellipsoid (MVE) and Minimum Covari-
ance Determinant (MCD) estimators are known to have a very low efficiency.
S-Estimators (Davies 1987) with a monotonic weight function like the bisquare
behave satisfactorily for small p, say p ≤ 10. Rocke (1996) showed that their
efficiency tends to one with increasing p. Unfortunately, this advantage comes
at the price of a serious loss of robustness for large p. There are three families of estima-
tors with controllable efficiencies: non-monotonic S-estimators (Rocke 1996),
MM-estimators (Tatsuoka and Tyler 2000) and τ-estimators (Lopuhaä 1991),
but their behavior for large p has not been explored to date. We compare their
behaviors employing different loss functions. A simulation study suggests that
the MM-estimators with an adequate loss function outperform the other types.
References
1. Davies, P. L. (1987). Asymptotic behavior of S-estimates of multivariate
location parameters and dispersion matrices. Ann. Statist., 15, 1269-1292.
Frédéric Richard
Aix-Marseille Université, France.
Abstract
Texture is an aspect of images which is essential for image processing. In this
talk, we will deal specifically with irregular images, and consider a texture as
an effect of the irregularity on the image appearance. In this context, we will
focus on the issue of testing the texture isotropy. Isotropy is one of the main
texture features, and is useful for the diagnosis or prognosis of diseases in
medicine. We will address the test issue considering the image as a realization
of generalized fractional Brownian fields.
Posters
CLAPEM 2014
Abstract
In the context of spatial statistics, it is common to use the multivariate normal
distribution to model some phenomenon of interest. In this area there are no
replicates, and the scale structure is of the form σ²I + τ²R(φ). This is a
particular case of a more general structure, known as partially linear scales.
One of the most common estimation methods for the parameters σ², τ², and
φ is the restricted maximum likelihood (REML) method. This is based on
multiplying the observation vector by a matrix of error contrasts so that the
mean of the new vector is zero. Moreover, Anderson (1973) estimated the
parameters of the multivariate normal distribution considering a linear scale
structure with a sample of size N. It is natural to extend the ideas of Anderson
(1973) to the case of a partially linear structure, and to view the covariances of
spatial parametric models as particular cases when N = 1. On the other hand,
in diagnostic analysis, Cook (1986) developed a technique known as local influ-
ence; this approach measures the Gaussian curvature under small perturbations
of the sample, estimating the parameters of such perturbations to later mea-
sure the discrepancy between the estimates. In spatial statistics, Haining (1994)
suggested a diagnostic analysis by case-elimination. Genton and Ruiz-Gazen
(2010) studied an application with real data using infinitesimal perturbations.
In this talk, we extend the estimation method proposed in Anderson (1973) for
Gaussian models with linear covariance structures to the case of covariances
with partially linear structures. We also apply the methodology of local influ-
ence through an appropriate model with infinitesimal perturbations, using as a
measure of distance between models (with and without perturbation) the likeli-
hood displacement. We provide a tool that allows the detection of influential
observations. The proposed methodology is illustrated via an application with
real data.
Abstract
We propose to show the construction of the family of bivariate Marshall-Olkin
copulas, using the marginal survival functions of the joint distribution of Poisson
distributed random variables. Moreover, we simulate some copulas of that family,
generating the values of the random variables using the inversion method.
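The common-shock construction gives one concrete way to simulate from a Marshall-Olkin copula. The sketch below uses exponential shocks and survival-function transforms of the margins, rather than the Poisson marginals of the talk, so it is illustrative only:

```python
import math
import random

def marshall_olkin_sample(lam1, lam2, lam12, n, seed=42):
    """Draw n pairs (u, v) from the bivariate Marshall-Olkin survival copula
    via the common-shock construction: X = min(E1, E12), Y = min(E2, E12)
    with independent exponential shocks, then each margin is mapped to (0, 1)
    by its own survival function."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        e1 = rng.expovariate(lam1)
        e2 = rng.expovariate(lam2)
        e12 = rng.expovariate(lam12)       # the common shock inducing dependence
        x, y = min(e1, e12), min(e2, e12)
        u = math.exp(-(lam1 + lam12) * x)  # survival transform of X
        v = math.exp(-(lam2 + lam12) * y)  # survival transform of Y
        pairs.append((u, v))
    return pairs

pairs = marshall_olkin_sample(1.0, 2.0, 0.5, 1000)
mean_u = sum(u for u, _ in pairs) / len(pairs)
print(round(mean_u, 2))  # each margin is uniform on (0, 1) by construction
```

The common shock E12 produces the singular component along the diagonal that is characteristic of Marshall-Olkin copulas.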
Héctor Araya
Universidad de Valparaíso.
(Soledad Torres, Universidad de Valparaíso).
Abstract
The main purpose of this work is to estimate the parameter associated to the
Ornstein-Uhlenbeck (OU) model, with noise generated by a long memory
process. The OU process is defined as the solution of the linear stochastic dif-
ferential equation dXt = −θXt dt + dWt, where Wt is a Wiener process or Brown-
ian motion. The main change with respect to the Brownian OU process is that the noise is
replaced by a long memory non-Gaussian process. We estimate the parameter
θ by the least squares (LS) method, and we prove consistency. Finally, we pres-
ent a simulation study.
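A least squares drift estimator of this kind can be sketched on a discretized path. For illustration the noise below is ordinary Brownian motion, not the long-memory non-Gaussian process considered in the talk:

```python
import math
import random

def simulate_ou(theta, n, dt, seed=7):
    """Euler scheme for dX_t = -theta * X_t dt + dW_t, driven by Brownian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(x[-1] - theta * x[-1] * dt + rng.gauss(0.0, math.sqrt(dt)))
    return x

def ls_theta(x, dt):
    """Least squares estimator of theta: minimizes
    sum_i (X_{i+1} - X_i + theta * X_i * dt)^2 over theta."""
    num = sum(a * (b - a) for a, b in zip(x, x[1:]))
    den = dt * sum(a * a for a in x[:-1])
    return -num / den

path = simulate_ou(theta=1.0, n=20000, dt=0.01)  # time horizon T = 200
print(round(ls_theta(path, 0.01), 2))  # should be near the true value 1.0
```

Consistency as the horizon grows is exactly the property proved in the talk, there for the long-memory driving noise.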
Abstract
With the increasing number of HIV-infected women of reproductive age,
children have been considered a growing risk group for HIV infection, with a
remarkable increase in the incidence of children born already infected through verti-
cal transmission. Transmission of HIV can occur during labor; other cases occur
in utero, especially in the last weeks of pregnancy. The objective of this study is
to evaluate possible factors influencing cases of live births to mothers infected
by HIV through binary logistic regression, and to characterize the social profile of
these pregnant women in the state of Pará, Brazil. From the exploratory data
analysis, it is noteworthy that most of the women had a laboratory diagnosis of
HIV infection during the prenatal period, and the majority of infected pregnant
women had an emergency cesarean delivery. It was found that, among children
born alive, antiretroviral prophylaxis was initiated within 24 hours of birth.
By binary logistic regression, we observed that pregnant women with
HIV who underwent prenatal care are nearly 4 times more likely to have
a child born alive compared to pregnant women who did not have prenatal care.
The parameter estimates of the prenatal care variable were significant (p = 0.05).
It is therefore important that the diagnosis of HIV infection is made early in
pregnancy to allow control of the maternal infection, prenatal care and other
treatments for HIV.
Reinaldo B. Arellano-Valle
Pontificia Universidad Católica de Chile
(Gustavo H.M.A. Rocha, Universidade Federal
de Minas Gerais, Belo Horizonte, MG, Brazil,
Rosangela H. Loschi, Universidade Federal de
Minas Gerais, Belo Horizonte, MG, Brazil).
Abstract
In this paper, we consider a non-standard linear regression model, where the
dependent variable is censored and some explanatory variables are measured
with additive errors. In addition, we build our statistical model on the assump-
tion of non-normality for the underlying probabilistic process. Specifically,
we assume that the joint distribution of the error terms and latent covariates
behaves as a multivariate t distribution. Thus, the proposed model will be robust
enough to protect our inferences from atypical or influential observations. For the
estimation of the model parameters, we use the classical method of maximum
likelihood through the EM algorithm, in which we included an estimation
procedure for the asymptotic variance of the maximum likelihood estimators.
The proposed methodology is flexible enough to be adapted to other elliptical
models belonging to the scale mixture class of the normal model. The newly
developed procedures are illustrated with an application and simulated data.
Jaime R. Arrue
Universidad de Antofagasta
(Reinaldo B. Arellano-Valle, Pontificia Universidad
Católica de Chile, Héctor W. Gómez, Universidad de
Antofagasta)
Abstract
The skew-generalized-normal model with parameters λ1 and λ2 ≥ 0,
denoted by SGN(λ1, λ2), corresponds to the skew-normal (SN) model for λ2 = 0.
Hence, several peculiarities of the SN model are preserved by the SGN one. In
particular, the Fisher information matrix is singular at λ1 = λ2 = 0, and the MLE
of λ1 can diverge in finite samples. If the additional parameter λ2 is fixed at a
known value, e.g., λ2 = 1, the SGN model becomes a natural competitor of the
SN model, with the advantage that its Fisher information matrix is nonsingular
at λ1 = 0. However, the divergence problem of the MLE of λ1 in finite samples
persists. In this work, we study the SGN(λ, 1) model, hereafter denoted as MSN(λ),
where the divergence of the MLE of the shape parameter λ occurs with
positive probability in finite samples. To avoid this problem, we apply a method
proposed by Firth (1993), which uses a modified score function to estimate the
shape parameter. As a first result, the modified MLE of λ is always finite. The
quasi-likelihood approach for confidence intervals is considered. When the
model has location and scale parameters, we combine our method with the
classical MLE of these parameters.
Rodrigo Assar
FCFM Universidad de Chile
(Francisco Barrientos, ICBM, Universidad de Chile).
Abstract
A key question in the consolidation of Generalized Linear Models is how
to decide if a given variable is statistically significant. The common way to
answer this question is through t-tests on the model coefficients, for continu-
ous variables, or using ANOVA for categorical variables. However, for small-
sized data sets, Gaussianity of the errors and orthogonality of covariables are
very strong assumptions which can lead to erroneous conclusions. Here we analyze
how to avoid two error sources, correlation between covariables and presence
of outliers, by an appropriate use of two alternative methods implemented in
the R packages glmperm and Rfit. Glmperm addresses the correlation by consider-
ing the orthogonal projection of the tested variable on the other covariables.
Thus, it replaces the variable by its projection and computes a permutation-based
p-value. On the other hand, Rfit avoids the effect of outliers by replacing least
squares with a rank-based fit founded on Jaeckel's dispersion function. Through
randomly generated examples, we show the performance advantages of these
methods over the common parametric tests in small samples. Finally, starting
from these examples, we derive criteria to decide if the sample size is small
enough to expect statistically significant differences between the common and
the alternative approaches, passing from erroneous to correct decisions.
Abstract
Classification methods for functional data range from direct adaptations of
classical multivariate techniques to recent proposals based on depth measures
and nearest neighbours ideas. Necessarily, all of them are based on appropri-
ate definitions of distance among curves. Boosting is an established technique
in the classification literature to improve the performance of weak classifiers.
In this work, we rank the main classification algorithms in the FDA literature
according to their boosted performance and point to the features of classifiers
that are most likely to benefit from boosting. Our conclusions are sup-
ported by a simulation study and illustrated by benchmark datasets.
Vadim Azhmyakov
Universidad Antonio Nariño
(Élber Álvarez Pinto, Universidad Antonio Nariño,
Ruthber Rodríguez Serrezuela, Universidad
Antonio Nariño).
Abstract
We deal with a class of stochastic optimal control problems involving some
models of engineering systems with uncertainties. Our contribution is mainly
devoted to a practically motivated application of the pseudospectral solution
method for some stochastic-type Hamiltonian boundary value problems. We
propose a numerical algorithm based on the celebrated Gauss pseudospec-
tral approach applied to optimal dynamic systems of a stochastic nature. The latter
Natalia Bahamonde
(Pontificia Universidad Católica de Valparaíso),
Soledad Torres (Universidad de Valparaíso),
Ciprian Tudor (Université de Lille 1)
Abstract
We study an extension of the ARCH model that includes the squared fractional
Brownian motion. We construct least squares estimators for the parameters of
the model and we study their asymptotic behavior. We illustrate our results by
numerical simulations.
Prateek Bansal
The University of Texas at Austin
Prof. Chen Mu-Chen (Institute of Traffic and
Transportation, National Chiao Tung University,
Taiwan)
Abstract
Since travelers make their choices based on the cost associated with travel time,
travel time information can help them choose appropriate routes and depar-
ture times. To this end, travel time prediction models have been pro-
posed in the literature, but the identification of important predictors has not received
much attention. Therefore, this study aims to build a robust and accurate free-
way travel time prediction model by identifying important predictor variables
(feature selection). We propose a travel time prediction and feature selection
model by integrating principal component analysis (PCA) and back-propaga-
tion neural networks (BPNN). Although PCA is an extensively used data mining
technique, to the best of the authors' knowledge the literature offers no method-
ology to retrace original variables from principal components (PCs). Therefore,
we propose a straightforward methodology to retrace original variables from
PCs. The developed methodology should motivate researchers to use PCA more
extensively in the future. The developed hybrid PCA-BPNN model was validated by
predicting travel time on a 36.1 km long segment of Taiwan's National Freeway
No. 1. The model predicts travel time on the chosen freeway segment using only
four predictor variables, with prediction accuracy equivalent to a stand-alone
BPNN prediction model developed with forty-three predictors. We found that
the speed and flow of heavy vehicles on the freeway are important predictors of travel
time, whereas rainfall was found to have negligible predictive power. These findings
facilitate considerable reductions in financial expense and time during future
data collection.
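One simple way to retrace original variables from a principal component is to rank them by the magnitude of their loadings. The abstract does not detail the authors' methodology, so the following is only an illustrative sketch (first PC obtained by power iteration on the correlation matrix, hypothetical data):

```python
import random

def first_pc_loadings(data):
    """First principal component of standardized data, via power iteration on
    the correlation matrix; returns one loading per original variable."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in data) / n) ** 0.5 for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    corr = [[sum(zi[a] * zi[b] for zi in z) / n for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(200):  # power iteration converges to the top eigenvector
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

rng = random.Random(3)
# hypothetical data: variables 0 and 1 share a common factor, variable 2 is pure noise
rows = []
for _ in range(500):
    f = rng.gauss(0, 1)
    rows.append([f + 0.1 * rng.gauss(0, 1), f + 0.1 * rng.gauss(0, 1), rng.gauss(0, 1)])

loadings = first_pc_loadings(rows)
ranked = sorted(range(3), key=lambda j: -abs(loadings[j]))
print(ranked)  # original variables ordered by their weight on the first PC
```

Here the noise variable receives a near-zero loading on the first PC, so ranking by absolute loading correctly identifies the two informative variables.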
Abstract
We study the cut-off phenomenon for a family of stochastic small perturbations
of dynamical systems. We will focus on the semi-flow of a deterministic differen-
tial equation which is perturbed by small perturbations of a Brownian motion.
Under weak hypotheses on the vector field, we will prove that the family of
perturbed stochastic differential equations exhibits the cut-off phenomenon.
Key words: design-based inference, πps sampling designs, Sunter method, AP
sampling design.
Abstract
We introduce bivariate Weibull distributions derived from copula functions in
the presence of a cure fraction, censored data and covariates. Two copula functions
are explored: the FGM (Farlie-Gumbel-Morgenstern) copula and the Gum-
bel copula. Inferences for the proposed models are obtained under the Bayes-
ian approach, using standard MCMC (Markov Chain Monte Carlo) methods.
An illustration of the proposed methodology is given considering a medical
data set. The use of copula functions could be a good alternative for analysing
bivariate lifetime data in the presence of censored data, a cure fraction and covari-
ates. Observe that in many applications of lifetime modelling, we could have
a cure fraction for individuals that are long-term survivors or
cured individuals.
Brenda Betancourt
University of California, Santa Cruz
(Abel Rodriguez, University of California, Santa
Cruz; Naomi Boyd, West Virginia University).
Abstract
Modelling the temporal evolution of network data has become a relevant prob-
lem for different applications. However, the complexity of the models increases
rapidly with the number of nodes, making efficient short-term prediction of
future outcomes of the system a challenge for big network data. Here, we pro-
pose an autologistic model for directed binary networks with a fused lasso pen-
alty. This model favors sparse solutions of the coefficients and their differences
in consecutive time points, and it is suitable for complex dynamic data where
the number of parameters is considerably greater than the number of observa-
tions over time. The structure of our model allows us to treat the optimization
problem separately for each pair of nodes, increasing the efficiency of the algorithm
through parallel computing. The optimal fused lasso tuning parameters are
chosen using BIC. We show the performance of the model on a real trading
network from the NYMEX natural gas futures market observed weekly over a
period of four years.
Abstract
We study the genealogical structure of a Galton-Watson process with neutral
mutations, where the initial population is large and the mutation rate is small.
Namely, we extend the results obtained in 2010 by Bertoin in two directions.
In the critical case, we construct the version of Bertoin's model conditioned not
to be extinct, and in the case with finite variance, we show convergence of the
allelic sub-populations towards a tree-indexed CSBP with immigration. Besides
this, we establish the version of the limit theorems in Bertoin's work in the case
where the reproduction law has infinite variance and is in the domain of
attraction of an α-stable distribution. This work is part of my PhD
research elaborated under the direction of Víctor Rivero.
Rafael E. Borges
Universidad de Los Andes, Mérida, Venezuela
(Maura Vásquez, Universidad Central de
Venezuela).
Abstract
Sample size is one of the most important issues that should be considered in
any study. In many settings this topic is completely solved, but it remains a
problem in some non-standard designs. For recurrent event analysis, the
problem is not completely solved, although it is solved for two settings related
to this type of model: survival analysis for the occurrence of a single event
(the standard survival analysis context) and longitudinal data analysis. In this
talk, we present a brief review of the most important methods for sample size
calculation for survival analysis and for longitudinal analysis, and propose two
extensions for determining the sample size of a recurrent event study. One of
the extensions is derived from the sample size for a unique event in survival
analysis, and the other is derived from the longitudinal data analysis context.
We discuss the pros and cons of both extensions, review some mathematical
properties, and apply the proposed methods to a study analyzing the recurrence
of episodes of malaria caused by Plasmodium vivax in an endemic area of
Venezuela.
Abstract
In this note, we study the convergence of renewal processes with rewards
when the rewards depend strongly on the times between arrivals of claims. Assum-
ing the distributions of the rewards and the inter-arrival times are heavy-tailed, we will use
the Mallows distance to obtain the desired convergence. The result will be applied
in two situations. In the context of data traffic in communication networks, we
consider the model of an ON/OFF source sending random loads of traffic to a
network node, where there is a buffer with large-capacity memory that stores
the information until it is transmitted. We will use the main result to estimate
the probability of buffer overflow in finite time. Secondly, we consider the
classical continuous-time reserve risk process in the case where the claims are
strongly dependent on the times between their arrivals. An important risk measure
is the ruin probability, which will be obtained through the main result.
Abstract
The linear calibration problem, also known as the inverse regression problem, is motivated by the comparison of two or more measurement techniques/instruments for a given characteristic of interest. Bayesian reference analysis under normal linear calibration models was discussed in Ghosh et al. (1995), Kubokawa and Robert (1994) and Chaibub Neto and Branco (2007). Extensions of some of these results are developed here for the Student-t linear calibration model. A reparametrization is proposed to obtain a friendly expression for the Fisher information matrix. We also discuss some theoretical properties of these reference prior and posterior distributions.
Abstract
In this article we use cross-entropy to identify anomalous data in health-care reports for the Colombian health system in 2010. In Colombia the government pays for most of the patients' health spending. Because the charges that insurers make to the government are numerous, expenditure monitoring is difficult and the opportunities for fraud are various. To automate the search for anomalous behavior, we first divide the population into different risk groups; each group is characterized by a unique combination of gender, age group and medical diagnosis. Then, within each risk group, we estimate parametrically the cross-entropy of the information provided by each insurer with respect to the rest of the sample. For the calculations we use variables such as total spending, number of medical appointments, number of first-time medical appointments and medication requests. The cross-entropy calculations work as a measure of how anomalous the information provided by each insurer is, so we look at the highest values. We find that this method is able to identify strange data. Finally, we implement a method that shows only anomalous data that is highly costly to the government. The anomalous reports found are very interesting.
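The screening step described above can be sketched as follows; this is a hypothetical, simplified illustration (univariate normal models on synthetic log-spending, with invented insurer names), not the authors' implementation:

```python
import numpy as np

# Cross-entropy H(p, q) = E_p[-log q] for two univariate normal models:
# p is fitted on one insurer's data, q on everyone else's.
def gauss_cross_entropy(mu_p, var_p, mu_q, var_q):
    return 0.5 * (np.log(2 * np.pi * var_q) + (var_p + (mu_p - mu_q) ** 2) / var_q)

def screen(groups):
    """groups: dict insurer -> array of log-spending values within one risk group."""
    scores = {}
    for name, x in groups.items():
        rest = np.concatenate([v for k, v in groups.items() if k != name])
        scores[name] = gauss_cross_entropy(x.mean(), x.var(), rest.mean(), rest.var())
    return scores

rng = np.random.default_rng(0)
groups = {f"insurer_{i}": rng.normal(0.0, 1.0, 200) for i in range(5)}
groups["insurer_odd"] = rng.normal(3.0, 1.0, 200)   # anomalous reporter
scores = screen(groups)
```

The insurer whose fitted model is farthest, in cross-entropy, from the model of the rest of the sample gets the highest score and is flagged for inspection.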
Meteorological conditions
indexes and extreme values
Abstract
A graphic analysis of meteorological data collected by the IPN's meteorological station located in Mexico City's northern area, from 2001 through 2013,
Abstract
Homicide is a universal indicator of social violence; it is distributed heterogeneously across regions and continents and is primarily responsible for high mortality in the population. It is socially recognized as an act of extreme violence and a serious violation of the rights to life and security. Due to the magnitude and significance of homicides, this paper aims to determine factors associated with homicides in the city of Belém, located in Pará State, Brazil, from January to December 2012. For that, we used the statistical techniques of Exploratory Data Analysis and Correspondence Analysis. Among the main results, we observed that the majority of homicide victims in the city of Belém are male (92.33%), that most victims were killed in October (11.48%), that the crime usually occurs in the streets (85.23%), on Sunday (23.12%), during the night shift (49.13%), more specifically between 21:00 and 21:59 (11.17%), committed with a firearm (81.02%), and that the presumed cause is hatred or revenge (90.85%). The results show a significant association between the levels of the variables gender versus shift and occupation versus place of occurrence, showing that the crime of homicide is a serious public safety problem.
Andressa Cerqueira
Universidade de São Paulo
(with Florencia Leonardi, Universidade de São
Paulo)
Abstract
The theory of random graphs has been successfully applied in recent years to model neural interactions in the brain. While the probabilistic properties of random graphs have been extensively studied in the literature, the development of statistical inference methods for this class of objects has received less attention. In this work we propose a nonparametric hypothesis test to decide whether two samples of random graphs originate from the same probability distribution. We show how to compute the test statistic efficiently and we study the performance of the test on simulated data. The main motivation of this work is to apply this test to analyze neural networks constructed from electroencephalographic data.
Ignacio Correa
Universidad de Chile
Fredes Luis (Universidad de Chile), Perlroth Andrés
(Universidad de Chile).
Abstract
Separation of sources consists of recovering a set of signals of which only instantaneous linear mixtures are observed. Often no a priori information on the mixing matrix is available, i.e., the linear mixture must be processed blindly. When the array manifold is unknown, blind identification of spatial mixtures allows an array of sensors to implement such a source separation. This technique has important applications in the Chilean mining industry, in particular for copper
Abstract
This work seeks to generalize discrete jump Markov processes through the modification of the underlying Poisson processes by a new stochastic process that we call the Hoffman process, which is characterized by a distribution of the Hoffman family; in this family, the Poisson distribution is a particular case. We present the basic properties of the Hoffman process, establish the jump Hoffman process, and also discuss differences between the proposed jump Hoffman process and known jump process models.
Abstract
Linear regression is a standard procedure for modeling a random variable, called the response variable, through a linear function of a set of covariates (also called independent variables) plus a random term. The model is expressed as $Y = X\beta + \varepsilon$, where $Y$ ($n \times 1$) is the vector containing the values of the response variable, $X$ ($n \times (p+1)$) is called the design matrix and contains the information of the independent variables, $\beta$ ($(p+1) \times 1$) contains the parameters that measure the influence of the covariates on the response variable, and $\varepsilon$ ($n \times 1$) is the vector of errors. This model, however, considers a single response variable. Suppose that we have $m$ response variables, each with $p$ covariates; this is a multivariate linear model, which accommodates two or more response variables. The multivariate linear regression model is essentially several univariate linear regression models put together, with the errors being related to each other; here multivariate means that the response is multivariate. The assumptions in this model are $E(\varepsilon^{(i)}) = 0$ and $\mathrm{Cov}(\varepsilon^{(i)}, \varepsilon^{(k)}) = \sigma_{ik} I$, with $i, k = 1, 2, \ldots, m$, while observations from different trials are uncorrelated. In this model the unknown parameters are $\beta$ and $\sigma_{ik}$. The maximum likelihood estimators are obtained under the assumption that $\varepsilon$ is normally distributed, and these coincide with the OLS estimators. In this article, under the above conditions, we consider the particular case where $m = 2$ and $p = 1$, i.e., only two response variables and one covariate. The two variables $Y^{(1)}$ and $Y^{(2)}$ are assumed normal, and a copula is used to obtain the bivariate distribution. We present copula regression as an alternative to OLS and maximum likelihood: the joint distribution is described by a copula, and the major advantage of copula regression is that there are no restrictions on the marginal probability distributions that can be used. Ideal copulas have the following properties: ease of simulation, closed form for the conditional density, and different degrees of association available for different pairs of variables. Good candidates are thus the Gaussian copula or the t-copula. The results are illustrated on a dataset from the finance area.
Abstract
The multifractal detrended fluctuation analysis (MF-DFA) is a well-established method to detect correlations and multifractal properties in non-stationary time series, and it has found applicability in a wide variety of fields. We analyze multifractal features using MF-DFA for various Colombian market time series (the stock market global index and other stock market indices). The results quantify the multifractal structure and long-range correlations of the analyzed time series. We discuss the implications of the multifractal nature of these results for developing better forecasting models, as has been reported by other investigations [1-2].
References
1. Zunino, L. et al. (2009). Multifractal structure in Latin-American market
indices. Chaos, Solitons & Fractals, 41(5), 2331-2340.
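The MF-DFA procedure named above can be sketched in a few lines (a minimal generic implementation with assumed parameters, not the authors' code): build the profile, detrend segments at each scale with a polynomial fit, form the q-th order fluctuation function, and read the generalized Hurst exponents h(q) off log-log slopes.

```python
import numpy as np

def mfdfa(x, scales, qs, order=1):
    y = np.cumsum(x - np.mean(x))                      # profile
    Fq = np.zeros((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n_seg = len(y) // s
        t = np.arange(s)
        f2 = np.empty(n_seg)
        for v in range(n_seg):
            seg = y[v * s:(v + 1) * s]
            coef = np.polyfit(t, seg, order)           # local polynomial trend
            f2[v] = np.mean((seg - np.polyval(coef, t)) ** 2)
        for i, q in enumerate(qs):
            if abs(q) < 1e-9:                          # q = 0 limit (log average)
                Fq[i, j] = np.exp(0.5 * np.mean(np.log(f2)))
            else:
                Fq[i, j] = np.mean(f2 ** (q / 2)) ** (1 / q)
    # generalized Hurst exponents: slopes of log F_q(s) versus log s
    return np.array([np.polyfit(np.log(scales), np.log(Fq[i]), 1)[0]
                     for i in range(len(qs))])
```

For uncorrelated noise h(q) is close to 0.5 for every q; a pronounced spread of h(q) across q is the signature of multifractality that the abstract refers to.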
Abstract
In experiments involving a qualitative and a quantitative factor, if a significant interaction between the factors is detected in the analysis of variance, a regression analysis should be performed. However, linear regression models are not always the most appropriate tool to evaluate the effect of the quantitative factor. This paper presents a way to fit a nonlinear regression model in a trial involving repeated measurements over time. For that, the weight gain of male and female sheep of the Santa Inês breed, in kilograms and at twelve different ages, was measured. Since the trial was conducted in a split-plot design in which the time factor was not randomized, the analysis of variance requires an adjustment of the degrees of freedom because the sphericity condition does not hold. The Greenhouse-Geisser (G-G) correction was used for the interaction and time effects. The F test in the analysis of variance showed a significant interaction between the factors and, when unfolding the interaction to evaluate the time and gender effects at each level, a Gompertz fit and a goodness-of-fit test for the model were also proposed. After fitting the model to the weight data, a comparison study among the parameter curves for males and females was made. As the results show, the univariate model with a split-plot design can be used in trials involving animal growth; however, its application is subject to an examination of the sphericity condition. The incorporation of the Gompertz model when splitting interactions is also viable and enabled the evaluation of the real quality of the model fit to the data. The comparison among parameters of the adjusted curves showed that males and females have statistically identical values for the parameter related to the animals' birth weight. The expected female maximum weight (40.7 kg) is statistically lower than that found for males (57.3 kg). However, the female growth rate (0.011 kg/day) is greater than that of males (0.007 kg/day), i.e., females reach weight stabilization faster than males.
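The Gompertz fit described above can be sketched on synthetic data; the parameterization W(t) = A·exp(-b·exp(-k·t)) is one common choice (not necessarily the authors'), and all numbers below are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, A, b, k):
    # A: asymptotic weight, b: shape, k: growth-rate parameter
    return A * np.exp(-b * np.exp(-k * t))

rng = np.random.default_rng(0)
ages = np.linspace(0, 400, 12)                 # twelve measurement ages (days)
true_A, true_b, true_k = 57.3, 3.0, 0.01       # hypothetical "male" curve
weights = gompertz(ages, true_A, true_b, true_k) + rng.normal(0, 0.5, ages.size)
popt, _ = curve_fit(gompertz, ages, weights, p0=(50.0, 2.0, 0.02))
```

Comparing the fitted `popt` across groups (males versus females) is exactly the kind of parameter-curve comparison the abstract reports.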
Abstract
The bootstrap is a resampling method for statistical inference, commonly used to estimate confidence intervals. The literature on the bootstrap is extensive. Bootstrap methods usually focus on samples of iid variables; however, in many fields of statistics we must deal with observations that are not iid, including regression problems, temporally or spatially correlated data, and hierarchical problems. In this work we use resampling methods for spatially dependent data. In particular, our interest is to estimate some order statistics of the PM10 distribution based on data recorded at several monitoring stations in Bogotá, Colombia. We assume that the PM10 data are realizations of a Gaussian random field. Our interest is in calculating confidence intervals for the parameters by using resampling methods which take into account the spatial dependence between data. In a first step of the analysis, the behavior of the proposed methodology is studied by using simulated data: we simulate several realizations of a Gaussian random field, and with the data of each simulation we use geostatistical methods to estimate the spatial covariance structure involved in the bootstrap estimation procedure. At the end of the work we show the results of applying this methodology to the real data set considered.
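The parametric spatial bootstrap idea can be sketched as follows (an exponential covariance model is assumed purely for illustration; in practice the covariance would come from the geostatistical fit mentioned above): simulate Gaussian-random-field replicates at the station coordinates and build percentile confidence intervals for a statistic of interest.

```python
import numpy as np

def exp_cov(coords, sill=1.0, range_par=0.3):
    # exponential covariance between all pairs of sites
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-d / range_par)

def spatial_bootstrap_ci(coords, stat, sill=1.0, range_par=0.3,
                         B=300, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    C = exp_cov(coords, sill, range_par)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(coords)))
    # each replicate is a correlated Gaussian field evaluated at the sites
    reps = [stat(L @ rng.standard_normal(len(coords))) for _ in range(B)]
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])
```

With `stat` set, for instance, to an upper quantile of the simulated field, this yields the kind of order-statistic confidence interval the abstract targets.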
Carolina Euán
Centro de Investigación en Matemáticas, A.C.
(CIMAT)
(Ortega Joaquín, Centro de Investigación en
Matemáticas, A.C. (CIMAT); Álvarez Esteban
Pedro C., Universidad de Valladolid, Spain).
Abstract
The problem of detecting changes in the state of the sea is very important for
the analysis and determination of wave climate in a given location. Wave mea-
surements are frequently statistically analyzed as a time series, and segmenta-
tion algorithms developed in this context are used to determine change-points.
However, most methods found in the literature consider the case of instanta-
neous changes in the time series, which is not usually the case for sea waves,
where changes take a certain time interval to occur. We propose a new segmentation method that allows for the presence of transition intervals between successive stationary periods and is based on the analysis of distances between normalized spectra to detect clusters in the time series. The series is divided into 30-minute intervals and the spectral density is estimated for each one. The normalized spectra are compared using the total variation distance, and a hierarchical clustering method is applied to the distance matrix. The information obtained from the clustering algorithm is used to classify the intervals as belonging to a stationary or a transition period. We present simulation studies to validate the method and examples of applications to real data.
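The clustering step can be sketched on synthetic "normalized spectra" (the Dirichlet draws below are stand-ins for estimated spectral densities, not sea-wave data): pairwise total variation distances feed a hierarchical clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def tv_distance(p, q):
    # total variation distance between two probability vectors
    return 0.5 * np.sum(np.abs(p - q))

rng = np.random.default_rng(0)
spectra = np.vstack([rng.dirichlet(np.ones(64) * 100, 10),             # regime 1
                     rng.dirichlet(np.linspace(1, 5, 64) * 100, 10)])  # regime 2
n = len(spectra)
# condensed pairwise-distance vector, as scipy's linkage expects
condensed = np.array([tv_distance(spectra[i], spectra[j])
                      for i in range(n) for j in range(i + 1, n)])
labels = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
```

Intervals falling cleanly into one cluster would be classified as stationary; intervals with ambiguous assignments suggest transition periods.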
Lisandro Fermín
University of Valparaíso, Valparaíso, Chile.
(Jacques Lévy Véhel, Regularity Team, INRIA
Saclay-Île-de-France & MAS Laboratory, École
Centrale Paris, France).
Abstract
We propose a particular piecewise deterministic Markov process (PDMP) to model the drug concentration in the case of multiple intravenous doses and partial compliance. In this context, we commonly find the problem of variable time-dosing intervals. The model allows us to take into account irregular drug intake times; this irregularity in drug input times has to be evaluated. We will consider random drug input times and study the randomness of the drug concentration generated by partial compliance to multiple intravenous doses.
We derive some probability results on the stochastic dynamics using the PDMP
theory, focusing on two aspects of practical relevance: the variability of the con-
centration and the regularity of its probability distribution.
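The PDMP dynamics can be illustrated with a toy one-compartment simulation (all parameters hypothetical): the concentration decays exponentially between doses and jumps by the dose size at each random intake time.

```python
import numpy as np

def simulate_concentration(elim_rate=0.1, dose=1.0, mean_gap=8.0,
                           horizon=240.0, seed=0):
    rng = np.random.default_rng(seed)
    t, c = 0.0, 0.0
    times, concs = [0.0], [0.0]
    while True:
        gap = rng.exponential(mean_gap)          # random inter-dose interval
        if t + gap > horizon:
            break
        t += gap
        c = c * np.exp(-elim_rate * gap) + dose  # deterministic decay + jump
        times.append(t)
        concs.append(c)
    return np.array(times), np.array(concs)
```

Repeating the simulation over many random dosing sequences gives an empirical view of the concentration variability that the abstract studies analytically.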
Abstract
In this study a non-parametric statistical methodology is introduced to detect outliers related to water quality variables measured in Colombia. The main
Abstract
Even today, the social, economic, medicinal and productive potential of forest resources in the Amazon is still largely unknown. Given this fact, it is necessary to invest in essentially local human capital in order to train technicians with vast knowledge of the reality of the Amazon, so as to maximize the sustainable use of its natural resources. It is therefore essential to create technologies for the Amazon that mitigate the lack of technical and theoretical tools geared to its real needs. Even though we are aware of this reality, almost nothing has been done for decades to minimize this problem. It is common to see technologies that were developed exclusively for other realities used in the Amazon without adaptation, sometimes inducing results adverse to our reality. Therefore, our main goal is to create an individual-based model (IBM) capable of simulating the growth dynamics of specific species from native forests of northern Brazil, rather than copying pre-existing models adapted to other realities. We hope that the implementation of this model will add knowledge to our region and also foster further discussion of our needs. In this work, we present a model implemented specifically for the species Schizolobium parahyba var. amazonicum (known as Paricá), which is of great economic importance to the northern region of Brazil; in the state of Pará, in particular, it already accounts for over 40.
Abstract
Proportional continuous data can be found in areas such as the biological sciences, health, engineering, etc. This type of data ranges between zero and one, (0, 1), and for its analysis distributions such as the logistic-normal (Aitchison and Shen, 1980), beta, beta-rectangular (Hahn, 2008), and simplex (Barndorff-Nielsen and Jørgensen, 1997), among others, have typically been used. However, in practical situations it is possible to observe proportions, rates or percentages equal to zero, one or both, and thus the previously mentioned distributions cannot be used. In this work, motivated by the flexibility of the simplex distribution, we propose a three-part mixture distribution, with degenerate point masses at 0 and 1, obtaining a new distribution with support on the interval [0, 1], which we call the zero-one augmented simplex (ZOAS) model. For our analysis, we adopt a Bayesian framework and develop a Markov chain Monte Carlo algorithm to carry out the posterior analyses. The marginal likelihood is tractable and is used to compute not only Bayesian model selection measures but also case-deletion influence diagnostics based on the q-divergence measure (Csiszár, 1967). The newly developed procedures are illustrated with a simulation study as well as an application to a real dataset from a clinical periodontology study. The empirical results show the gain in model fit and parameter estimation over other alternatives, and provide quantitative insight into assessing the true covariate effects on longitudinal proportion responses.
Abstract
We present the main concepts and definitions of the theory of bivariate copulas, and we present and illustrate several general methods of constructing bivariate copulas. In the inversion method, we exploit Sklar's theorem to produce copulas directly from joint distribution functions. Using geometric methods, we construct singular copulas whose support lies in a specified set.
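The inversion method can be illustrated for the Gaussian copula: by Sklar's theorem, C(u, v) = H(F1⁻¹(u), F2⁻¹(v)), so sampling Z from a bivariate normal and applying the normal CDF to each margin yields a sample from the copula (a generic illustration; the correlation value is arbitrary).

```python
import numpy as np
from scipy import stats

rho = 0.7
rng = np.random.default_rng(0)
# bivariate normal sample with correlation rho
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=10_000)
# probability-integral transform of each margin gives the copula sample
u = stats.norm.cdf(z[:, 0])     # uniform margin 1
v = stats.norm.cdf(z[:, 1])     # uniform margin 2
```

Each margin of (u, v) is uniform on (0, 1), while the dependence of the original joint distribution is preserved, which is exactly what a copula isolates.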
Abstract
The increasing rate of data generation and storage from distributed and autonomous sources is introducing a big scientific and technological challenge; an example is air pollution monitoring from several sources located within a city. New analysis methodologies are needed to obtain relevant information by implementing techniques with adequate results [1]. In this work we propose the development of a technique to analyze air pollution data by using vector quantization. This technique helps to identify clusters by preserving the intrinsic topology of the data; these clusters are obtained by similarity or correlation from distributed sources, avoiding the loss of relevant information [3]. The proposal was implemented to analyze the daily registries of PM10 contamination, with data acquired from monitoring stations located in the Metropolitan area of Santiago: Las Condes, Pudahuel and Parque O'Higgins, in the year 2009. The proposal
was called ARFIMA-SOM and consists of four stages. In the first stage we use the autoregressive fractionally integrated moving average (ARFIMA) model [2] to identify the structural properties of stationarity, tendency, and periodicity, in order to analytically describe the local behavior of the time series for each source. In the second stage, the information given by the simple and partial autocorrelation functions is used to infer the structure of the stochastic process; with this information the characteristic vector is built and used as an input for the self-organizing map (SOM) models. With the SOM, the most relevant topological patterns are identified [4]; in other words, the similar daily pollution behaviors are found. In the third stage, for each cluster, we model the time series found to be similar by the SOM. Finally, in the fourth stage, the critical contamination episodes are identified, where common spatial and temporal patterns of high pollution are detected. Simulation results show how our ARFIMA-SOM proposal allows us to detect patterns of high PM10 pollution levels based on the topological combination of time series patterns.
References
1. Caragea, D. and Reinoso, J. (2005). Statistics Gathering for Learning from Distributed, Heterogeneous and Autonomous Data Sources. Artificial Intelligence Research Laboratory, Iowa State University, Ames, IA 50011-1040.
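The SOM stage above can be sketched with a minimal numpy implementation (hypothetical settings: a 1-D grid of four prototypes trained on synthetic feature vectors, not the authors' code): similar series end up mapped to nearby units.

```python
import numpy as np

def train_som(X, n_units=4, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_units, X.shape[1])) * 0.1   # prototype vectors
    grid = np.arange(n_units)
    for e in range(epochs):
        lr = lr0 * (1.0 - e / epochs)                 # decaying learning rate
        sigma = sigma0 * (1.0 - e / epochs) + 0.1     # shrinking neighborhood
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))   # best matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)            # pull neighborhood toward x
    return W

def assign(X, W):
    return np.array([int(np.argmin(np.linalg.norm(W - x, axis=1))) for x in X])
```

In the ARFIMA-SOM pipeline the rows of `X` would be the ACF/PACF-based characteristic vectors of each daily series; the unit assignments define the clusters modeled in the third stage.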
Cátia R. Gonçalves
(Universidade de Brasília, Brasil)
Dorea Chang C. Y. (Universidade de Brasília,
Brasil), De Resende Paulo A. A. (Universidade de
Brasília, Brasil).
Abstract
In the framework of nested hypothesis testing, several alternatives for estimating the order of a Markov chain have been proposed. The AIC, Akaike's entropy-based information criterion, constitutes the best known tool for model identification and has had a fundamental impact on statistical model selection. In spite of the AIC's relevance, several authors have pointed out its inconsistency, which may lead to overestimation of the true order. To overcome this inconsistency, the Bayesian information criterion (BIC) was proposed, introducing the sample size into the penalty term; it is a consistent estimator for large samples. A more general approach is exhibited by the EDC, the efficient determination criterion, which encompasses both the AIC and BIC estimates. Under a proper setting the EDC, besides being a strongly consistent estimate, is an optimal estimator. These approaches are briefly presented and compared by numerical simulation. The presented results may support decisions related to the choice of estimator.
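Order selection by an information criterion can be sketched for a binary chain (a small BIC illustration, not the EDC machinery of the talk): maximize the conditional log-likelihood at each candidate order and penalize by the number of free parameters times log n.

```python
import numpy as np

def log_lik(seq, order, k=2):
    # conditional log-likelihood of an order-`order` Markov chain fit
    counts = {}
    for i in range(order, len(seq)):
        ctx = tuple(seq[i - order:i])
        counts.setdefault(ctx, np.zeros(k))[seq[i]] += 1
    ll = 0.0
    for c in counts.values():
        p = c / c.sum()
        ll += np.sum(c[c > 0] * np.log(p[c > 0]))
    return ll

def bic_order(seq, max_order=3, k=2):
    n = len(seq)
    # k^o contexts, each with k-1 free transition probabilities
    scores = [-2 * log_lik(seq, o, k) + (k ** o) * (k - 1) * np.log(n)
              for o in range(max_order + 1)]
    return int(np.argmin(scores))
```

Replacing the `log(n)` penalty by 2 gives the AIC, whose known tendency to overestimate the order is the inconsistency discussed in the abstract.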
Abstract
The product-limit or Kaplan-Meier estimator is a nonparametric estimator of the survival function, characterized by its ease of calculation and by its asymptotic
Abstract
In this paper we show the presence of a first-order phase transition for a ferromagnetic Ising model on $\mathbb{Z}^2$ with a periodic external magnetic field (proposed by Maruani et al. [4]). The external field takes two values $\pm h$, with $h > 0$, composing a cell-board configuration with rectangular cells of $L_1 \times L_2$ sites, such that the total value of the external field is zero. Formally, for integers $n, m$ we define
$$\mathbb{Z}_+ = \bigcup_{n,m \,:\, n+m \text{ is even}} C(n,m), \qquad \mathbb{Z}_- = \mathbb{Z}^2 \setminus \mathbb{Z}_+.$$
Let $\sigma \in \Omega = \{-1,+1\}^{\mathbb{Z}^2}$ be a configuration on $\mathbb{Z}^2$. We study the model with a formal Hamiltonian defined for any $\sigma \in \Omega$ as
$$H(\sigma) = -J \sum_{\langle t,s \rangle} \sigma(t)\,\sigma(s) - \sum_{s} h_s \sigma(s), \qquad
h_s = \begin{cases} h, & \text{if } s \in \mathbb{Z}_+,\\ -h, & \text{if } s \in \mathbb{Z}_-.\end{cases}$$
The phase transition holds if $h < \frac{2J}{L_1} + \frac{2J}{L_2}$. Our result can be applied to obtain the phase transition in the Ising antiferromagnet with external field (see Dobrushin [2] and Dobrushin et al. [3]), and the phase transition for the model studied by Nardi, Olivieri and Zahradník in [5]. In that work the lattice $\mathbb{Z}^2$ was represented as a union of one-dimensional sublattices (say, horizontal); the external field is constant on every one-dimensional sublattice and has different signs on the neighboring sublattices. We used an approach based on the technique of reflection positivity ([1], [6]). In particular, we apply a certain key inequality usually referred to as the chessboard estimate. This tool allows us to construct a sort of Peierls argument to evaluate the contour probabilities.
References
[1] Biskup, M.: Reflection Positivity and Phase Transitions in Lattice Spin Models. In: Methods of Contemporary Mathematical Statistical Physics, ed. Roman Kotecký. Springer-Verlag, Berlin, 2009.
[2] Dobrushin, R.L.: The problem of uniqueness of a Gibbs random field and the problem of phase transition. Func. Anal. Appl., 2:302-312 (1968).
[3] Dobrushin, R.L., Kolafa, J., Shlosman, S.: Phase Diagram of the Two-Dimensional Ising Antiferromagnet (Computer-assisted proof). Comm. Math. Phys. 102(1):89-103 (1985).
[4] Maruani, A., Pechersky, E., Sigelle, M.: On Gibbs fields in image processing. Markov Processes Relat. Fields, 1:419-442 (1995).
[5] Nardi, F.R., Olivieri, E., Zahradník, M.: On the Ising model with strongly anisotropic external field. Journ. Stat. Phys., 97:87-144 (1999).
Abstract
In many fields, data are collected by angular measurements. These data provide orientations or angles in the plane (circular data) or in space (spherical data). Circular data constitute the simplest case of this category of data, called directional data, where the measure is not scalar but angular or directional. There is a variety of graphical representations for circular data; among them we have the rose plot, the circular dot plot and the boxplot for circular data. We propose a modification of the boxplot for circular data to visualize the information more clearly. This will be illustrated with real data.
Abstract
The Gini coefficient is a well-known inequality index. In this research we estimate it by means of the Multipurpose Household Survey of Bogotá. The resulting estimates are design-unbiased and are used as the response variable (taking values between zero and one) in a beta regression model that incorporates informative prior information in a Bayesian setup, where the unit of observation is defined to be the localities of Bogotá.
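The index itself has a compact sample form (a design-naive sketch; the survey version would carry sampling weights, omitted here):

```python
import numpy as np

def gini(x):
    # G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with x sorted
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1.0) / n
```

A perfectly equal distribution gives 0, while concentrating all income on one unit gives (n-1)/n, approaching 1.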
Camilo Hernández
Universidad de los Andes
(Mauricio Junca, Universidad de los Andes).
Abstract
The idea of this work is to link the two standard problems in optimal dividend payment theory: the maximization of profits and the minimization of the probability of ruin. We study the classical Cramér-Lundberg model with exponential claim sizes subject to a constraint on the time of ruin (P1), a type of constraint for which a solution was previously unknown. To handle it, we use Lagrange multipliers to obtain the Lagrange dual function, which leads to an auxiliary problem (P2). For this problem, given a multiplier, we prove the uniqueness of the optimal barrier strategy and also obtain its optimal value function. Finally, we present the main theorem of the paper, in which we prove that the optimal value function of (P1) is obtained as the pointwise infimum over all optimal value functions of the collection of problems (P2).
José B. Hernández C.
Universidad Central de Venezuela
(José R. León, Universidad Central de Venezuela)
Abstract
Water temperature plays an important role in the ecological functioning and in the control of biogeochemical processes of a body of water. The objective of this work is to identify the surface water temperature and to improve understanding of the spatial-temporal variations in Lake Valencia, as well as to create a map of both surface and deep temperatures of the lake. In this paper, data from two surface weather stations and four thermistors located throughout the lake over a year, from November 2007 through October 2008, were used. Descriptive statistics (mean, maximum and minimum) for the daily time series, as well as day and night temperatures, were calculated. Wind speed and solar radiation on the surface were also measured. A correlation between the day and night temperatures of the surface water, as well as wind speeds and solar radiation, will be computed. Using the heat equation, we define a model for temperature propagation from the surface to the bottom. For this we do not only use the measurements; we also integrate data from satellite images.
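The surface-to-bottom propagation idea can be sketched with explicit finite differences for the 1-D heat equation, using the measured surface series as the top boundary and a fixed deep-water temperature at the bottom (all parameters below are hypothetical; the stability condition r = α·Δt/Δz² ≤ 0.5 must hold for this explicit scheme).

```python
import numpy as np

def propagate(surface_series, n_depth=30, r=0.25, deep_temp=20.0):
    T = np.full(n_depth, deep_temp)           # initial uniform profile
    for ts in surface_series:
        T[0] = ts                             # surface boundary (measured)
        # explicit update of the interior: T_t = alpha * T_zz
        T[1:-1] += r * (T[2:] - 2.0 * T[1:-1] + T[:-2])
        T[-1] = deep_temp                     # bottom boundary (fixed)
    return T
```

Driving this with the thermistor or satellite-derived surface series would produce the depth profiles needed for the temperature map.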
Rodrigo Herrera
Universidad de Talca
(Nikolaus Hautsch, University of Vienna; Valérie
Chavez-Demoulin, University of Lausanne).
Abstract
Financial risk management has become a ubiquitous task for banks, companies and financial institutions, especially since the last subprime crisis. In a world where globalization is constantly increasing, with a dramatic increase in the available information on financial market data, the development of new methodologies to describe the dynamics of these instruments becomes necessary. Two of the most interesting fields which have emerged as reliable frameworks for new methodologies are point process theory and extreme value theory (EVT). EVT has shown its major influence in the modeling of extreme risk, with measures that attempt to describe the tail of a loss distribution, such as the Value at Risk (VaR) and the Expected Shortfall (ES) (Embrechts et al., 2003; Chavez-Demoulin et al., 2005; Herrera, 2013), while point process theory has been applied in different areas of risk management, for example portfolio credit risk, high-frequency trading, and jump-diffusion models (Russell, 1999; Hautsch, 2011). The contribution of this paper is twofold. First, we propose a framework based on marked self-exciting point processes, which captures the dynamic behavior of clusters of extreme events. In particular, we introduce the autoregressive conditional intensity peaks-over-threshold (ACI-POT) model, which in its most basic form corresponds to the combination of two known models: the ACI model introduced by Russell (1999) and the POT model of Davison and Smith (1990). The second proposed model is the multivariate extension of Hawkes-POT
Abstract
Knowledge about the anthropometry of the population is increasingly necessary for guiding the design and monitoring of public health policies. In Brazil, the main official source of basic anthropometric data (height and weight) is the
Hinojosa Adrian
Departamento de Estatística, Universidade Federal
de Minas Gerais
(Demarqui Fábio, Departamento de Estatística,
Universidade Federal de Minas Gerais).
Abstract
The present work aims to use the EM algorithm to estimate the parameters of a mixture of phase-type distributions. Phase-type distributions are distributions over the positive real axis and correspond to the absorption time of a continuous-time jump process. They have been considered since the works of Erlang (1909) and Neuts (1959); for a review of the subject and applications see Asmussen (2000). Two main techniques are used for parameter estimation: the EM algorithm, first proposed by Asmussen et al. (1996), and Bayesian estimation using a Markov chain Monte Carlo (MCMC) based approach (Bladt et al., 2003). The mixture model that we consider was proposed by Frydman (2005) in the context of stochastic social mobility processes. This model uses the same base generator to obtain a collection of independent processes whose transitions are performed at different speeds. The chains are mixed at time zero, upon choosing the initial state. We develop the EM algorithm for this mixture, as devised in the work of Asmussen et al. (1996), and implement it in R.
Abstract
This paper introduces an extension of a subfamily of the generalized gamma distribution. The extension is denominated the slashed quasi-gamma distribution and is defined as the quotient of a gamma random variable (numerator) and a uniformly distributed random variable (denominator). The extension is aimed at making the generalized gamma distribution more flexible with respect to its kurtosis. Maximum likelihood estimation is implemented for parameter estimation. Results of a real data application reveal good performance in applied scenarios.
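The slashed construction is easy to simulate; the sketch below uses a hypothetical slash exponent q (X = G / U^(1/q), G gamma, U uniform), a common device in slash families that the abstract's exact parameterization may differ from.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 3.0                                   # illustrative slash exponent
g = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # gamma numerator
u = rng.uniform(size=100_000)                       # uniform denominator
x = g / u ** (1.0 / q)                    # slashed draw: heavier-tailed
```

Dividing by a power of a uniform inflates large values, which is precisely how the construction adds kurtosis flexibility relative to the plain gamma.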
Abstract
AIDS, since its discovery, has constituted an illness that oversteps the bounds of the biomedical dimension: characterized as an incurable clinical pathology that leads to death, it also enters the psychological and social fields. This means that the experience of the illness is loaded with prejudice, discrimination, fear, violence, loneliness, uncertainty, unemployment, poverty, prostitution, and gender inequalities. It is therefore an important public health problem of major proportions. The aim of this study is to model the cases of death from AIDS in the State of Pará, Brazil. For this, we used exploratory data analysis and binary logistic regression. The exploratory analysis shows that the majority of AIDS patients are women who contracted the virus through a sexual relationship with infected men; it was also noted that the majority of patients did not complete elementary school. Through the binary logistic regression, we found that patients with AIDS who also had tuberculosis were twice as likely to progress to death as patients without this complication. Anemic patients were four times as likely to progress to death as non-anemic patients. Patients who presented diarrhea had 88% higher odds of progressing to death than patients without this complication. The parameter estimates for the variables presence of tuberculosis, anemia and diarrhea were significant at the 5% level. Thus, AIDS-related complications are risk factors for the death of the patient.
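In binary logistic regression, odds multipliers like those quoted above come from exponentiating the fitted coefficients. A sketch with hypothetical coefficient values chosen only to reproduce the reported effect sizes (not the authors' fitted model):

```python
import math

# Hypothetical coefficients implying the odds ratios quoted in the abstract.
coefficients = {
    "tuberculosis": math.log(2.0),   # odds ratio 2
    "anemia": math.log(4.0),         # odds ratio 4
    "diarrhea": math.log(1.88),      # 88% higher odds
}

def odds_ratio(beta):
    """exp(beta) is the multiplicative change in the odds of death
    associated with the presence of the complication."""
    return math.exp(beta)

percent_increase = {k: (odds_ratio(b) - 1.0) * 100.0
                    for k, b in coefficients.items()}
```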
Abstract
We present Bayesian modeling for a response variable restricted to the interval (0, 1), such as proportions and rates, using the simplex distribution for the case in which the data have a longitudinal form, taking random effects into account. We consider homogeneous and heterogeneous structures for the submodels of the dispersion parameter and investigate, through a sensitivity analysis, the effect on the final estimates of five different prior distributions for the variance parameters of the random effects. The models are illustrated with simulated and real data.
Abstract
A very frequent situation in assessment systems is that different groups of examinees are evaluated and the parameters of their ability distributions are estimated on the same metric scale. In this work we aim to model the possible growth of the mean parameters of the ability distributions of groups of examinees when the variances of these distributions are equal but unknown, and the items have already been calibrated in a large-sample study. We consider that K different groups of individuals are assessed on a certain area of knowledge, taking tests denominated Test 1, Test 2, ..., Test K, respectively. A sample of N_k subjects from population k takes Test k, composed of n_k items.
Lorena Mansilla
Universidad de Valparaíso
(Torres Soledad, CIMFAV-Universidad de
Valparaíso; Viens Frederi, Purdue University).
Abstract
This work deals with parameter estimation for the biomass of a biological species whose organisms interact with others. In 2012, Kéfi et al. defined a deterministic mathematical model for the biomass in terms of trophic and non-trophic species interactions. We generalize this work by developing a stochastic model.
Carolina Marchant
Universidade Federal de Pernambuco
(Víctor Leiva, Universidad de Valparaíso; Helton
Saulo, Universidade Federal do Rio Grande do Sul).
Abstract
The Birnbaum-Saunders distribution is receiving considerable attention due to its good properties. One of its extensions is the family of scale-mixture Birnbaum-Saunders (SBS) distributions, which shares these good properties but also has further ones, such as robust estimation. The autoregressive conditional duration model is the primary family used to analyze high-frequency financial transaction data. We propose a methodology based on new SBS autoregressive conditional duration models. This methodology includes parameter estimation by the EM algorithm, inference for these parameters, in-sample and out-of-sample forecasting techniques, and a residual analysis. We show the robustness of the estimation procedure and carry out a Monte Carlo study to evaluate its performance. In addition, we assess the practical usefulness of the methodology using real-world data on financial transactions from the New York Stock Exchange.
Abstract
The aim of this work is to establish the consistency and asymptotic normality of maximum likelihood estimators based on progressively Type-II censored samples, along the same lines as Le Cam (1957). In this study, weaker conditions than those given by Lin and Balakrishnan (2011) are assumed. The proposed model relaxes the assumption of the existence of the third derivative of the logarithm of the density function.
Josmar Mazucheli
Universidade Estadual de Maringá
(Francisco Louzada Neto, Universidade de São
Paulo; Mohamed E. Ghitany, Kuwait University).
Abstract
The aim of this work is to compare, through Monte Carlo simulations, the finite-sample properties of the estimates of the parameters of the Marshall-Olkin extended exponential distribution obtained by ten estimation methods: maximum likelihood, modified moments, L-moments, maximum product of spacings, ordinary least squares, weighted least squares, percentile, Cramér-von Mises, Anderson-Darling and right-tail Anderson-Darling. The bias, the root mean-squared error, and the absolute and maximum absolute differences between the true and estimated distribution functions are used as criteria for comparison. The simulation study concludes that the L-moments and maximum product of spacings methods are highly competitive with maximum likelihood in small and large samples.
Óscar O. Melo
Universidad Nacional de Colombia.
(Carlos E. Melo, Universidad Distrital Francisco
José de Caldas, Sandra E. Melo, Universidad
Nacional de Colombia).
Abstract
In the context of regression with a beta-type response variable, we propose a
new method that links two methodologies: a distance-based model, and a beta
regression with variable dispersion. The proposed model is useful for those sit-
uations where the response variable is a rate or a proportion, which is related
with a mixture between continuous and categorical explanatory variables. We
present its main statistical properties and some measures for selection of the
most predictive dimensions for the model. A main advantage of our proposal
is that it is quite general because we only need to choose a suitable distance for
both the mean model and the variable dispersion model depending on the type
of explanatory variables. Furthermore, the mean and precision predictions for
a new individual and the problem of missing data are also developed. Rather
than removing variables or observations with missing data, we use the distance-
based method to work with all data without the need to fill in or impute missing
values. Finally, an application to mutual funds is presented, using the Gower distance for both the mean model and the variable dispersion model. This
methodology is applicable to any problem where estimation of distance-based
beta regression coefficients for correlated explanatory variables is of interest.
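For reference, the Gower coefficient handles exactly this mix of variable types: categorical variables contribute match/mismatch similarities and continuous ones range-scaled differences. A minimal sketch (one common metric version; variants exist):

```python
def gower_distance(x, y, ranges):
    """Gower-based distance for mixed data. ranges[k] is the range of
    variable k if continuous, or None if categorical. Returns
    sqrt(1 - similarity), one common metric version (1 - similarity
    is also used in practice)."""
    sims = []
    for xi, yi, r in zip(x, y, ranges):
        if r is None:
            sims.append(1.0 if xi == yi else 0.0)   # match / mismatch
        else:
            sims.append(1.0 - abs(xi - yi) / r)     # range-scaled difference
    return (1.0 - sum(sims) / len(sims)) ** 0.5
```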
Mauricio Molina
Universidad Nacional de Colombia
(José Alfredo Jiménez, Universidad Nacional de
Colombia; Juan David Pulgarín, Universidad
Nacional de Colombia).
Abstract
The Black-Scholes model is widely used in stock markets due to its simple implementation, but there is good empirical evidence that the underlying stock distribution is not lognormal. One of the reasons for this is that the volatility is not constant; thus, it is necessary to consider a distribution with heavier tails in order to improve the pricing of European options. Bahra (1997) assumes that the underlying distribution is a mixture of two lognormal distributions, with five parameters. In this case it is easier to adjust these parameters to the implied risk-neutral distribution (RND), making the model better able to exhibit negative skewness and kurtosis, and easy to apply, because the pricing formula results in a convex combination of two Black-Scholes prices. In this work, we propose a linear transformation of the underlying asset that introduces two new parameters, location and scale, and we obtain a new formula, similar to the classic mixed pricing formula, which is also a convex combination, but whose components are not independent of one another, making it better suited to the stock data, with parameters that are easy to fit to the risk-neutral distribution. The Black-Scholes model can be recovered, and the model can be adjusted when the asset distribution is bimodal. There are differences between the classic mixed model and this model. We present numerical results showing the differences among three models: Black-Scholes, mixed Black-Scholes, and the proposed model.
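Bahra's benchmark formula, against which the proposal is compared, is a convex combination of two Black-Scholes prices, one per lognormal component of the risk-neutral mixture. A sketch with hypothetical parameter values (the location-scale extension itself is not reproduced here):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, r, sigma, t):
    """Standard Black-Scholes European call price."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def mixed_bs_call(w, s1, sigma1, s2, sigma2, k, r, t):
    """Bahra-style mixture price: convex combination of two Black-Scholes
    prices, one per lognormal component of the risk-neutral density."""
    return (w * bs_call(s1, k, r, sigma1, t)
            + (1.0 - w) * bs_call(s2, k, r, sigma2, t))
```

With weight w = 1 the mixture collapses to a single Black-Scholes price; unequal component volatilities let the implied density show fatter tails than a single lognormal.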
Abstract
The sequence segmentation problem aims to partition a sequence, or a set of sequences, into a finite number of segments that are as homogeneous as possible. In this work we consider the problem of segmenting a set of random sequences with values in a finite alphabet A into a finite number of independent blocks. We suppose that we have m independent sequences of length n, constructed by the concatenation of s segments of lengths l_1*, ..., l_s*, where the j-th block is drawn from a distribution p_j over A^{l_j*}, j = 1, ..., s. We denote the true cut points by the vector k* = (k_1*, ..., k_{s-1}*), with k_i* = l_1* + ... + l_i*, i = 1, ..., s − 1.
A spatio-temporal study of
extreme maximum rainfall in
Guanajuato State, Mexico
Moreno Leonardo
(Facultad de Ciencias Económicas, UDELAR,
Montevideo, Uruguay)
(Ortega Sánchez Joaquín, Centro de Investigación
en Matemáticas, CIMAT, Guanajuato, México)
Abstract
In the context of combining time series forecasts, it is useful to determine whether a certain forecast incorporates all the relevant information contained in competing forecasts, because this tells us whether the competing forecasts can be fruitfully combined. In the linear forecast combination approach, for example, we can use forecast encompassing hypothesis tests to determine whether one forecast in the combination is encompassed by the competing forecasts (Newbold and Harvey, 2007). Nevertheless, for nonlinear combinations of time series forecasts there are no tests of forecast encompassing. In this work, we develop and apply a statistical test based on Wald statistics in order to assess the significance of an individual input variable in a nonlinear specification. We use a regression neural network (RNN) model with one hidden layer as a system for combining time series forecasts, where the input variables are the forecasts generated by several methods, in order to obtain a combined forecast that is more accurate than the individual forecasts with respect to some criterion. The Wald statistic is proposed by the authors as an alternative way to test the significance of the parameters associated with each input variable. This test can also be viewed as a model selection strategy based on statistical concepts; see Anders and Korn (1999). We show an application of the Wald test to testing forecast encompassing in the nonlinear combination produced by the RNN. We consider a nonlinear regression RNN with one hidden layer, with functional form given by
Y_t = f(X_t; θ) + ε_t     (5.9)

where

f(X_t; θ) = β_0 + X_t′β + Σ_{j=1}^{q} λ_j G(γ_{0j} + X_t′γ_j)     (5.10)

which is an augmented single-layer network; see Kuan and White (1994). The full vector of parameters θ = (β_0, β′, λ′, γ′, γ_0′) contains p + 1 + q(p + 2) parameters, where

β′ = (β_1, ..., β_p)
λ′ = (λ_1, λ_2, ..., λ_q)
γ′ = (γ_1′, γ_2′, ..., γ_q′), with γ_j′ = (γ_{j1}, ..., γ_{jp}) for j = 1, ..., q
γ_0′ = (γ_{01}, γ_{02}, ..., γ_{0q})
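Equation (5.10) can be evaluated directly. A sketch with a logistic choice for the activation G (the abstract does not fix G, so that choice is an assumption); x holds the p individual forecasts and there are q hidden units:

```python
import math

def G(z):
    """Logistic activation; a common, but here assumed, choice for G."""
    return 1.0 / (1.0 + math.exp(-z))

def combined_forecast(x, beta0, beta, lam, gamma0, gamma):
    """f(x; theta) of eq. (5.10): linear part beta0 + x'beta plus
    q hidden-unit terms lam_j * G(gamma0_j + x'gamma_j)."""
    linear = beta0 + sum(b * xi for b, xi in zip(beta, x))
    hidden = sum(l * G(g0 + sum(gj * xi for gj, xi in zip(row, x)))
                 for l, g0, row in zip(lam, gamma0, gamma))
    return linear + hidden
```

Setting all λ_j = 0 recovers a purely linear forecast combination, which is the nested special case against which the nonlinear terms are tested.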
Abstract
The class of all finite mixtures of generalized extreme value distributions is proved to be identifiable. In addition, estimates of the unknown parameters of the mixtures are obtained via the EM algorithm. The performance of the estimates is assessed by Monte Carlo simulation.
Daniel Paredes
Université Paul Sabatier Toulouse III, Institut de
Mathématiques de Toulouse
(Gamboa Fabrice, Université Paul Sabatier Toulouse
III, Institut de Mathématiques de Toulouse; Guérin
Léa, Université de Toulouse; INPT, UPS Laboratoire
de Génie Chimique).
Abstract
Research on particulate systems often requires the solution of a population balance equation, which is written in terms of the number density function. The number density function is defined in terms of internal coordinates (e.g. particle size and particle morphology coordinates), and the equation involves integral and derivative terms. Different methods exist for solving the population balance equation numerically. These methods are often computationally expensive and lose efficiency when applied to multivariate functions (often the number density function considers just one particle size coordinate, such as length or volume). Our aim is to find a method for solving this kind of population balance equation for aggregation and breakage processes, considering multivariate number density functions in terms of particle size and morphological coordinates, using a Fourier basis. Keywords: stochastic modeling, aggregation-breakage processes, method of moments, Fourier basis.
Abstract
In this talk, we will discuss the blow-up in finite time of stochastic differential equations driven by a Brownian motion. In particular, we discuss extensions of the Osgood criterion that can be applied to some nonautonomous stochastic differential equations with additive Wiener-integral noise.
Wilmer Pineda
Fundación Universitaria Los Libertadores
(Isaac Zainea, Universidad Central).
Abstract
Fractional Brownian motion (fBm) has been used successfully to model a variety of natural phenomena. Despite the great impact generated by this process, fractional Brownian motion is never a semimartingale, except in the case H = 1/2 corresponding to classical Brownian motion, and the fBm is not a Markov process. Hence, the approximation of fractional Brownian motion by other classes of stochastic processes is important. In this talk, we will discuss different ways to approximate fractional Brownian motion, including approximation by martingales, semimartingales, Poisson processes and random walks. We also discuss the corresponding modes of convergence and the advantages and disadvantages of these approaches.
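For reference, the process in question is the centered Gaussian process with covariance R(s, t) = ½(s^{2H} + t^{2H} − |s − t|^{2H}); a one-line sketch makes the H = 1/2 reduction to Brownian motion explicit:

```python
def fbm_cov(s, t, h):
    """Covariance of fractional Brownian motion with Hurst index h:
    R(s, t) = 0.5 * (s**(2h) + t**(2h) - |s - t|**(2h)).
    For h = 0.5 this reduces to min(s, t), the Brownian covariance."""
    return 0.5 * (s ** (2 * h) + t ** (2 * h) - abs(s - t) ** (2 * h))
```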
Abstract
I perform a statistical analysis of the income distribution in Colombian society and compare the results with those obtained in other societies. In particular, I show that Colombian society is characterized by the presence of a phase with a Boltzmann-like distribution, which includes most of the population, and a Pareto-like distribution, which involves the individuals with the highest incomes. In addition, I propose interpreting these results in the context of geometrothermodynamics to understand the phase transition structure of both distributions.
Abstract
The price of electric energy in the Colombian wholesale market is a random variable with high volatility. For this reason, since this market began operating in 1994, several models have been proposed to represent and estimate its behavior. It is common to find different models for estimating the price of electricity in the scientific literature, based on different premises. However, the characteristics of the Colombian electricity market (highly hybrid, with export commitments, etc.) do not allow the techniques reported in other contexts to be applied in the country.
Pedro Regueiro
University of California, Santa Cruz
(Abel Rodríguez, University of California, Santa
Cruz)
Abstract
The class of Bayesian stochastic blockmodels has become a popular approach
to relational data. This is due, in part, to the fact that inference on structural
properties of networks follows naturally in this framework. Here, we propose a
Bayesian multiscale stochastic blockmodel to identify and study possible hier-
archical organization on network data. The model utilizes a prior for the com-
munity structure closely related to the nested Chinese restaurant process. We
use a latent variable augmentation scheme to develop a Markov chain Monte Carlo algorithm that allows us to fit this model. Illustrations are provided with both simulated and real datasets.
Laura Rifo
University of Campinas
(Andrade, P., University of São Paulo)
Abstract
The Rosenblatt distribution is a one-parameter family arising from a non-central limit theorem for long-range dependent random variables. This family includes the standard normal distribution, the standardized chi-squared distribution, and weighted sums of chi-squared variates. Its analytical form is not manageable, and its moments, cumulants and empirical distribution have only recently been studied numerically. We apply a Bayesian likelihood-free methodology to obtain inferences for this family, comparing the performance of several statistics.
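The likelihood-free methodology referred to is of the approximate Bayesian computation (ABC) type: simulate from the model and keep parameter draws whose summary statistics land near the observed ones. A generic rejection sketch on a toy normal-mean model (the Rosenblatt simulator itself is not reproduced here):

```python
import random

def abc_rejection(observed, simulate, prior_draw, n_draws, tol, rng):
    """ABC rejection: accept a prior draw when the simulated summary
    statistic is within tol of the observed one. No likelihood needed."""
    kept = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - observed) < tol:
            kept.append(theta)
    return kept

# Toy stand-in model: summary statistic = mean of 40 N(theta, 1) draws.
def prior_draw(rng):
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(40)) / 40.0
```

The accepted draws approximate the posterior; shrinking tol (at the cost of fewer acceptances) sharpens the approximation.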
Alejandro Rodrguez
Universidad de Talca, Chile.
(Guillermo Ferreira, Universidad de Concepción,
Chile).
Abstract
In the context of time series analysis, conditional heteroscedasticity has an
important effect on the coverage of prediction intervals. Moreover, when pre-
diction intervals are constructed using unobserved component models (UCMs),
the problem increases due to the possible existence of several components that may or may not be conditionally heteroscedastic; consequently, the true coverage depends on correctly identifying the source of the heteroscedasticity. Proposals for testing homoscedasticity have been applied to the auxiliary residuals of the UCM; however, in most cases, these procedures are unable, on average, to identify the heteroscedastic component correctly.
Abstract
This study aims to contribute to research on households, families and living conditions, addressing a topic still lacking studies: the living conditions, family arrangements and household characteristics of the rural population, taking as empirical benchmark the elderly of rural settlements in the region of Ribeirão Preto, SP, Brazil, where a significant percentage of elderly people was found in two São Paulo agrarian-reform settlements: Monte Alegre and Guarani. In the analysis of the questionnaires, univariate descriptive analysis alone fails to capture the different forms of association among three or more variables; this can be resolved using multivariate techniques. The aim of this work was therefore to study, by means of multivariate techniques, the living conditions, family arrangements and household characteristics of the elderly rural population. Data were derived from a field survey in which a closed questionnaire, organized in thematic blocks, was applied in 355 households. Because the data collected are categorical responses, Multiple Correspondence Factor Analysis (MCFA) was applied to identify associations. This technique groups highly correlated variables, with a resulting reduction in the number of predictor variables in the model. Results and discussion: the axes generated by the MCFA showed a satisfactory contribution of the variables under study to identifying the most important relationships between them, and hence the groups. In Monte Alegre, 59% of the households, and in the Guarani settlement 65%, were headed by individuals 60 years old or older, mostly male. Regarding marital status, the percentages of married people stood out in both areas. The collected data also revealed that the vast majority belonged to complete nuclear families of elderly people. Notable percentages of illiteracy or incomplete primary schooling among the elderly were also found. The great majority had lived in the settlement for more than 15 years, with an average income of one minimum wage; however, 35% of them reported no income. As to origin, the Guarani elders came mostly from urban environments, while in Monte Alegre a greater percentage came from rural areas. The women were concentrated in the 40-to-60 age group; only 1% of them were heads of household, and the vast majority had no income. In the settlements studied it was found that, contrary to what occurs in urban areas, the vast majority of the elderly lived with their families. This demonstrates the social practice of gathering other relatives around the family, for shorter or longer periods, depending on need. Respondents referred to a list of chronic diseases resembling the general national statistics. An occupation-related health risk factor was found: exposure to pesticides and their side effects. Given this situation, the demographic transition in Brazil requires the formation of new strategies for the elderly in the São Paulo countryside, concerning their living conditions and welfare, in order to improve all the aspects raised here about the life conditions and well-being of elderly residents of rural settlements.
Luis Angel Rodríguez
CIMFAV, Facultad de Ingeniería, Universidad
de Valparaíso, Chile, and Dpto. de Matemáticas,
FACYT, Universidad de Carabobo, Venezuela
(Lisandro Fermín, CIMFAV, Facultad de Ingeniería,
Universidad de Valparaíso; Ricardo Ríos,
Universidad Central de Venezuela)
Abstract
We consider nonparametric estimation for functional autoregressive processes with Markov switching. First, we study the case where the complete data are available, i.e. when we observe the Markov-switching regime; we then estimate the regression function in each regime using a Nadaraya-Watson type estimator. Second, we introduce a nonparametric recursive algorithm for the case of a hidden Markov-switching regime, which restores the missing data by means of a Monte Carlo step and estimates the regression functions by a Robbins-Monro step. Consistency and asymptotic normality of the estimators are proved.
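The Nadaraya-Watson estimator used in the observed-regime case is a kernel-weighted average of the responses. A minimal sketch with a Gaussian kernel (the kernel choice is ours, for illustration):

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x0: kernel-weighted average of the
    responses, with Gaussian weights K((x0 - x_i) / h)."""
    ws = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)
```

In the Markov-switching setting, one such estimator is fitted per regime, using only the observations assigned to that regime.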
Abstract
We describe a basic notion that uses the characterization of a graph as a set of independencies, together with the notion of a minimal I-map. We then define the notion of a perfect map and show that not every distribution has a perfect map. We describe the concept of I-equivalence, which captures the equivalence relationship between two graphs that specify precisely the same set of independencies. Finally, we define a partially directed graph that provides a compact representation of an entire I-equivalence class, and we provide an algorithm for constructing this graph. We show the properties of the Bayesian network representation and its semantics. These results are crucial for understanding the cases in which we can construct a Bayesian network, and we present the multivariate normal distribution with a random graph.
Abstract
We show in this article that the jackknife empirical likelihood method proposed by Jing, Yuan & Zhou (2009) can be applied to construct design-based confidence intervals under unequal-probability sampling without replacement. This method is extremely simple to use in practice. A simulation study is conducted to compare the Monte Carlo performance of the 95% jackknife empirical likelihood confidence interval with the standard confidence interval based on the central limit theorem. In terms of coverage probability, the jackknife empirical likelihood interval generally outperforms the standard confidence interval. Key words: confidence interval, empirical likelihood, jackknife, unequal probability sampling.
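The building block of the jackknife empirical likelihood of Jing, Yuan & Zhou (2009) is the jackknife pseudo-value; standard empirical likelihood is then applied to the pseudo-values as if they were i.i.d. A sketch of the pseudo-values alone (the EL optimization step is omitted):

```python
def jackknife_pseudo_values(data, stat):
    """V_i = n*T(full sample) - (n-1)*T(sample without observation i).
    Jackknife empirical likelihood treats the V_i as approximately
    i.i.d. and applies empirical likelihood to them."""
    n = len(data)
    full = stat(data)
    return [n * full - (n - 1) * stat(data[:i] + data[i + 1:])
            for i in range(n)]

def mean(xs):
    return sum(xs) / len(xs)
```

For the sample mean, the pseudo-values reproduce the observations themselves; for nonlinear statistics they provide a linearization.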
Abstract
Regression models under the assumption of independent, normally distributed errors with varying dispersion are a very flexible statistical tool for data analysis, because they allow both the location and the dispersion parameters to depend on the explanatory variables, so that these models can be applied to a wide variety of practical situations. Statistical inference for this class of models was developed by Aitkin (1987) and Verbyla (1993) under the classical approach, and by Cepeda and Gamerman (2001) under the Bayesian approach. Xu and Zhang (2013) extended the proposal of Cepeda and Gamerman (2001) by including a nonparametric additive effect (described by a B-spline; see, for instance, de Boor (1978)) in the systematic component of the location parameter, i.e., assuming that the functional form of the dependence between the mean or median of the response distribution and a continuous explanatory variable is unknown. In practice, however, there are data sets in which the effect of a continuous explanatory variable on the dispersion parameter also has an unknown functional form. On the other hand, as is well known, inference for models under the assumption of normally distributed errors can be highly influenced by outlying observations of the response variable. Therefore, in this paper we study statistical inference and diagnostic methods, based on the Bayesian approach, for regression models under the assumption of independent additive errors following normal, Student-t, slash,
Abstract
Item Response Theory (IRT) has reached a major role in the area of educational assessment, as well as in several other areas of knowledge. Basically, IRT proposes models for latent traits, i.e., characteristics of individuals that cannot be observed directly; this type of variable must be inferred from the observation of related secondary variables. IRT provides methods to represent the relationship between the probability that an individual gives a correct answer to an item and his or her latent trait (ability or proficiency) in the area of knowledge assessed. In addition, a set of parameters describing the item also influences that probability. Depending on the case, the interest may lie in the estimation of item parameters (calibration), the estimation of individual abilities, and/or the estimation of average abilities. In many applications, we want to compare the average abilities of several populations in a particular subject (such as mathematics or Portuguese language) using items that have already been calibrated. In this case, we can estimate the population parameters via marginal maximum likelihood, as proposed by Zimowski & Bock (1997) and implemented in specific software. Usually, however, we do not know the item parameters, and we need to estimate them at an earlier stage (e.g. via the EM algorithm; see Dempster et al., 1977) and then estimate the population parameters afterwards (Andrade, Tavares and Valle, 2000). In this study, we propose a method for estimating the average abilities of a latent distribution in item response models. We consider the case of only two study populations, subjected to tests with common items. The proposal is based on a function of the difference between the proportions of correct answers of the two populations on the common items. We present some numerical results for the case where the item set is fixed across all replications, as well as when the items vary. We also performed a residual analysis and an exploration based on the sample size.
Christian Salas
(Universidad de La Frontera)
(Gregoire Timothy G., Yale University)
Abstract
Estimation of site productivity is crucial for both management and research purposes. Site index is the dominant height of a forest at a reference age, and is the index most commonly used for site productivity estimation in forestry. However, the concept is based on the assumption that the climate of a given site is fixed through time. We used stem-analysis data from more than 300 dominant trees of the native species Nothofagus dombeyi, spanning the geographical distribution of this species in south-central Chile. The solution of a differential equation with a power transformation was used as the growth model and was fitted using nonlinear mixed-effects models, adding random effects to one of its parameters. We then regress the random effects on site factors, habitat type, and climate variables.
Variance of an alternative
item count technique
Abstract
Several alternative methodologies for treating sensitive questions in surveys have been proposed in the literature (Warner, 1965; Devore, 1977; Miller, 1984; Droitcour et al., 1991; Kim and Warde, 2004; Chaudhuri, 2011; Imai, 2011; Hussain, Shaz and Shabbir, 2013, among many others). A review of these methods can be found in Trujillo and Gonzalez (2012). A particular type of these recent methods is known as item count techniques (ICT). The aim is to obtain estimates of the prevalence, or of the total number of individuals in a population possessing a particular sensitive characteristic. The questions associated with these variables normally carry problems of either nonresponse or bias. Moreover, most of the proposed ICT methods rely on the strong assumption of a sampling design corresponding to simple random sampling with replacement. In this work, we extend the estimators to a finite population under any (complex) survey sampling design, together with their corresponding variances. Some simulations confirm that the theoretical variance indeed coincides with the expression found for a finite population.
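Under the baseline design the abstract generalizes (simple random sampling), the item-count estimator is just a difference of group means: one group receives the list including the sensitive item, the other the list without it. A sketch:

```python
def ict_prevalence(treatment_counts, control_counts):
    """Item-count (list experiment) estimator under simple random
    sampling: the difference in mean item counts between the long-list
    and short-list groups estimates the prevalence of the sensitive
    trait. (The abstract extends this to arbitrary complex designs;
    this is only the baseline case.)"""
    mt = sum(treatment_counts) / len(treatment_counts)
    mc = sum(control_counts) / len(control_counts)
    return mt - mc
```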
Abstract
Each realization of a stochastic process is affected by measurement errors, whereas the prediction of time series only takes into account the randomness related to the variability of the stochastic process through time. Therefore, the uncertainty of the data is not considered in conventional modeling, and it becomes necessary to design or implement techniques to manage it. In this work, we exhibit the modeling of a time series based on fuzzy techniques. The implementation of if-then fuzzy rules to model the series can tackle the problem of uncertainty in the input data. The fuzzy techniques are applied using the Takagi-Sugeno-Kang (TSK) model to fit the time series and to make predictions robustly. The TSK model can address the uncertainty by recognizing the local behavior of the process and, moreover, yields an interpretable model of the system. Therefore, if-then fuzzy models have an advantage over conventional nonlinear modeling because of the local representation of the process. This local representation allows the description of a nonlinear system using mathematical functions and addresses the overall complexity underlying the dynamic process. The antecedent of a fuzzy rule divides the input space into local fuzzy regions, while the consequent describes the dynamics of these regions. The antecedent of the rule is defined over certain regions of the input space, and the consequent is usually an autoregressive model. Furthermore, the TSK model works as a local predictor because it is associated with a specific region of the input space. Inside these regions, the local predictions describe the dynamic behavior of part of a complete system captured by the antecedent part of the rule [1]. The model is applied to forecast the concentration level of air pollution based on the time series, where these measurements are prone to several sources of noise. Simulation results show a competitive performance in terms of mean square error.
1. A. Veloz, R. Salas, H. Allende-Cid and H. Allende (2012). SIFAR: Self-Identification of Lags of an Autoregressive TSK-based Model. In Proceedings of the 42nd IEEE International Symposium on Multiple-Valued Logic, ISMVL 2012, Victoria, BC, Canada, May 14-16, 2012, pp. 226-231. IEEE Press.
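A minimal sketch of a TSK-type predictor, assuming Gaussian antecedent memberships and AR(1) consequents. The rule parameters below are invented for illustration; the authors' SIFAR procedure additionally identifies the relevant lags.

```python
import math

def gauss(x, c, s):
    """Gaussian membership of x in the fuzzy region centred at c with spread s."""
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

def tsk_predict(y_lag, rules):
    """One-step TSK forecast from the lagged value y_lag.

    Each rule is (centre, spread, (a0, a1)): the antecedent is a Gaussian
    region of the input space, the consequent a local AR(1) model. The
    forecast is the membership-weighted average of the local predictions.
    """
    weights = [gauss(y_lag, c, s) for c, s, _ in rules]
    total = sum(weights) or 1.0
    local = [a0 + a1 * y_lag for _, _, (a0, a1) in rules]
    return sum(w * z for w, z in zip(weights, local)) / total

# Two illustrative rules: a low-concentration and a high-concentration regime.
rules = [(10.0, 5.0, (1.0, 0.8)), (40.0, 10.0, (5.0, 0.9))]
print(tsk_predict(12.0, rules))  # weighted mostly toward the first rule
```

Inputs far from every rule centre get near-zero weights, so in practice the rule base must cover the observed range of the series.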
CLAPEM 2014
Abstract
Markov transition models are a very important tool in several areas of knowledge when studies with repeated measures are conducted. They are characterized by modeling the response variable over time conditionally on the previous responses, known as the history. In addition, it is possible to include other covariates. In the case of binary responses, a matrix of transition probabilities from one state to another can be constructed. In this work, four different approaches to transition models were compared in order to assess which best estimates the causal effect of treatments in experimental studies where the outcome is a vector of binary responses measured over time. A simulation study was carried out considering balanced experiments with three treatments of categorical nature. To assess the best estimates, the standard error and bias, as well as the coverage percentage, were used. The results showed that marginalized transition models are more appropriate in situations where an experiment is conducted with a reduced number of repeated measurements.
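The transition-probability matrix mentioned above can be estimated by simple counting of observed one-step transitions. The sketch below shows the simplest pooled estimator, without covariates; the data are invented.

```python
def transition_matrix(sequences):
    """Estimate first-order transition probabilities P(Y_t = j | Y_{t-1} = i)
    for binary responses, pooling one-step transitions over all subjects."""
    counts = [[0, 0], [0, 0]]
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    # Normalize each row; guard against an empty row.
    return [[c / max(sum(row), 1) for c in row] for row in counts]

# Three subjects, four repeated binary measurements each (toy data).
data = [[0, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1]]
P = transition_matrix(data)
print(P)  # rows sum to 1
```

Covariate-dependent versions replace the row proportions with a logistic model for P(Y_t = 1 | Y_{t-1}, covariates).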
Abstract
To determine the proportion of nutrients consumed by ruminants, non-linear models are widely used in studies that seek to estimate the parameters of ruminal degradation kinetics through classical methods of univariate analysis. However, as these studies involve longitudinal data, the use of mixed-model methodology may be more favorable for describing this phenomenon. The aim of this study was to use non-linear mixed models (NLMM) in the parameter estimation of in situ ruminal degradation kinetics of sugar cane, in sheep fed diets with different roughage (R):concentrate (C) proportions. The data were obtained by the in situ technique using 4 adult male sheep, of no defined breed, cannulated in the rumen. The experiment was originally designed as a split plot, with whole plots in a Latin square with 4 animals, 4 periods and 4 treatments (diets with different roughage (R):concentrate (C) proportions: 100R:0C, 80R:20C, 60R:40C, 40R:60C), and as subplots, 13 incubation times (0, 12, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240 and 312 hours). We adopted the mixed logistic nonlinear model to explain the behavior of the degradability of indigestible neutral detergent fiber (iNDF) and indigestible acid detergent fiber (iADF) of sugar cane (in natura) as a function of the incubation times. Variance components were estimated by the maximum likelihood method. In the selection of the random and fixed parts of the model, we used the likelihood ratio test (LRT) and the AIC and BIC information criteria. The analyses were performed using the nlme package of the R software, version 3.0.1, considering a 5% significance level. The final model for iNDF included a random effect in the parameter I of the logistic model; only 3 curves were needed to describe its degradability over time, since the treatments with 60 and 40% roughage did not differ. As for iADF, the final model included random effects in the 2 parameters of the logistic model (I and k); however, 4 curves
were needed to describe the treatments, although the parameter k of the treatments with 80 and 40% roughage did not differ. For both variables, the highest percentage of roughage provided the highest degradation rate. The correlation of the longitudinal data was properly estimated, adequately explaining the extra variability caused by the effects of factors associated with the experimental design, through the inclusion of random effects in the model parameters. This fact, if not considered, can affect the estimates and the associated standard errors and, thus, alter the results significantly. Moreover, the mixed approach is quite attractive when the research also aims to understand the behavior of the degradability process over the incubation times.
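The mixed-model structure described above can be sketched as follows. The logistic parameterization (asymptote I, rate k, inflection tau) and all numeric values are illustrative assumptions, since the abstract does not give the exact functional form or the estimates.

```python
import math, random

def logistic_degradation(t, I, k, tau):
    """Logistic degradability curve: I is the asymptotic degradable fraction,
    k the degradation rate and tau the inflection time (an assumed form)."""
    return I / (1 + math.exp(-k * (t - tau)))

def simulate_animal_curves(times, I, k, tau, sd_b, n_animals, seed=9):
    """Mixed-model structure: each animal receives a Gaussian random effect
    b_i on the asymptote I, as selected for the iNDF variable in the study."""
    random.seed(seed)
    curves = []
    for _ in range(n_animals):
        b = random.gauss(0, sd_b)
        curves.append([logistic_degradation(t, I + b, k, tau) for t in times])
    return curves

# The 13 incubation times of the experiment; parameter values are invented.
times = [0, 12, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 312]
curves = simulate_animal_curves(times, I=60.0, k=0.03, tau=96.0,
                                sd_b=4.0, n_animals=4)
```

Fitting such a model is what `nlme::nlme` does in R: the fixed part gives the treatment curves, the random part the animal-specific deviations.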
Ali Satty
University of KwaZulu-Natal.
(Henry Mwambi, University of KwaZulu-Natal;
Geert Molenberghs, Hasselt University).
Abstract
This paper compares the performance of weighted generalized estimating equations (WGEE), multiple imputation based on generalized estimating equations (MI-GEE) and generalized linear mixed models (GLMM) for analyzing incomplete longitudinal binary data when the underlying study is subject to dropout. The paper aims to explore the performance of these methods in terms of handling dropouts that are missing at random (MAR). The methods are compared on simulated data. The longitudinal binary data were generated from a logistic regression model, and dropouts were generated under several different dropout rates and sample sizes. The methods were evaluated in terms of bias, accuracy and mean square error in cases where data are subject to random dropout. In conclusion, the MI-GEE method performs better in both small and large sample sizes.
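The MAR dropout mechanism used in such simulations can be sketched as follows: the probability of dropping out depends only on the previously observed response, never on the unobserved one. The logistic coefficients and dropout rates below are invented.

```python
import math, random

def simulate_mar_dropout(n, t_max, beta, gamma, seed=1):
    """Generate binary longitudinal responses from a logistic model in time
    and impose MAR dropout: the dropout hazard depends on the previous
    (observed) response, which keeps the mechanism missing at random."""
    random.seed(seed)
    data = []
    for _ in range(n):
        y, t = [], 0
        while t < t_max:
            p = 1 / (1 + math.exp(-(beta[0] + beta[1] * t)))
            y.append(1 if random.random() < p else 0)
            # MAR: higher dropout probability after an observed "1".
            if t > 0 and random.random() < (gamma if y[-2] == 1 else gamma / 4):
                break
            t += 1
        data.append(y)
    return data

# 200 subjects, up to 5 repeated binary measurements each.
data = simulate_mar_dropout(200, 5, (-1.0, 0.4), 0.3)
```

WGEE then weights each observed record by the inverse probability of still being in the study; MI-GEE instead fills in the missing tail before applying ordinary GEE.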
Adrien Saumard
(Universidad de Valparaíso).
Abstract
Penalization is a general tool in nonparametric statistics that allows one to select an estimator (or, equivalently, a model) among many others. In many situations, it is possible to design accurate penalties that are asymptotically optimal and non-asymptotically nearly optimal ([5]). However, it is well known that the optimal penalties prescribed by theory usually benefit from a slight over-penalization in practice ([4], [1]), for sample sizes that are small to moderate. In the case of the AIC criterion, a few non-asymptotic corrections have thus been proposed, such as the AICc criterion of Burnham and Anderson ([3], [4]) or the over-penalization proposed by Birgé and Rozenholc ([2]). However, these attempts are not sufficiently theoretically grounded to be suitably generalized to other situations.
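The over-penalization idea can be made concrete with the AICc correction of Burnham and Anderson, which adds to AIC a term that is large for small samples and vanishes asymptotically:

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return -2 * loglik + 2 * k

def aicc(loglik, k, n):
    """Small-sample correction of Burnham and Anderson: the extra term acts
    as an over-penalization that disappears as the sample size n grows."""
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

# With n = 20 observations and k = 5 parameters the correction is sizeable;
# with n = 2000 it is negligible.
print(aicc(-50.0, 5, 20) - aic(-50.0, 5))    # 60/14, about 4.29
print(aicc(-50.0, 5, 2000) - aic(-50.0, 5))  # about 0.03
```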
References
[1] S. Arlot. Choosing a penalty for model selection in heteroscedastic regres-
sion, June 2010. arXiv:0812.3141.
[2] L. Birgé and Y. Rozenholc. How many bins should be put in a regular histogram. ESAIM Probab. Stat., 10:24-45 (electronic), 2006.
[4] G. Claeskens and N. L. Hjort. Model selection and model averaging. Cam-
bridge Series in Statistical and Probabilistic Mathematics. Cambridge Uni-
versity Press, Cambridge, 2008.
Matthieu Saumard
(PUCV)
Abstract
Most testing procedures for functional data have been designed for the case of independent variables. The case of dependent variables is an important one to explore. Two tests similar to the ones we propose have already been established in the presence of dependent random variables; we refer to Aue et al. and Horváth et al. Nevertheless, those tests are designed to study the stability of the functional linear model or of the functional autoregressive process. Delsol et al. have considered structural tests in the presence of independent random variables. In this poster, we will present a simple test in the context of functional time series: a test of no effect in the functional linear model with dependent regressors.
Abstract
The concept of Item Response Theory (IRT) was founded around 1930, but it was axiomatized in the 1960s. IRT is one of the theories of latent modeling that emerged in the 1930s, which posit that human behavior is a result of hypothetical processes called latent traits. In Brazil, many applications in the field of Education have adopted IRT as a standard methodology for the calibration of items (questions) and the consequent estimation of these latent traits (or skills), which now serve selection processes for entrance to universities, student funding and scholarships in federal government projects. It is essential to develop studies to identify which method, if any, is most appropriate for estimating these skills, in order to generate the fairest possible results given the purposes of these tests. According to Andrade, Tavares & Valle (2000), IRT is based on a set of statistical models that seek to represent mathematically the probability that an individual gives/selects the right answer to an item as a function of the parameters of that item and the skills of that individual. The methodologies employed by this theory require computational implementations of the available methods. To estimate the parameters of the items, methods such as Marginal Maximum Likelihood, through the EM algorithm, have been adopted. The skills have generally been estimated by Maximum Likelihood (ML), EAP (Expected a Posteriori), MAP (Maximum a Posteriori), and weighted and biweighted variants. Several software applications implementing IRT are available, such as IRTPRO, BILOG-MG, PARSCALE, MULTILOG, TESTFACT, and LOGIST. In this study, we describe an implementation of the skill-estimation process aiming to compare the performance of some key software packages.
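As a toy illustration of the kind of model involved (not one of the software packages under comparison), the sketch below evaluates the three-parameter logistic model and estimates an ability by grid-search maximum likelihood; the item parameter values are invented, and production software uses Newton-type iterations instead of a grid.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic model: probability of a correct answer given
    ability theta, discrimination a, difficulty b and guessing parameter c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def ml_ability(responses, items, grid=None):
    """Maximum-likelihood ability estimate over a coarse grid (a sketch)."""
    grid = grid or [i / 100 for i in range(-400, 401)]
    def loglik(theta):
        ll = 0.0
        for u, (a, b, c) in zip(responses, items):
            p = p_3pl(theta, a, b, c)
            ll += math.log(p) if u == 1 else math.log(1 - p)
        return ll
    return max(grid, key=loglik)

# Three hypothetical calibrated items: (a, b, c).
items = [(1.2, -0.5, 0.2), (0.8, 0.0, 0.25), (1.5, 0.7, 0.2)]
print(ml_ability([1, 1, 0], items))
```

EAP and MAP differ from this only in combining the same likelihood with a prior on theta.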
Josiane Stein
Mathematics Institute, Federal University of Rio
Grande do Sul
(Sílvia R. C. Lopes, Mathematics Institute, Federal
University of Rio Grande do Sul; Ary V. Medino,
Mathematics Institute, Federal University of Rio
Grande do Sul).
Abstract
This work presents a continuous-time process derived from the generalized Langevin equation (GLE). The main interest is to study the GLE when the noise process has infinite second moment. For this situation, we consider the noise to be a symmetric α-stable Lévy process, which can also have infinite first moment. One goal is to study the dependence structure of the process, but the autocovariance function is not defined for processes with infinite second moment. We propose to use a dependence measure, the so-called codifference, and we propose an estimator for this dependence measure. Another interest of this work is to estimate the process parameters. We consider the maximum likelihood estimator for a particular case of the general process, namely the one derived from the solution of the classical Langevin equation. The continuous-time process resulting from this equation is called the Ornstein-Uhlenbeck (OU) process (see Barndorff-Nielsen and Shephard, 2001; Jongbloed et al., 2005; and Zhang and Zhang, 2013). Since the α-stable distribution has a closed-form density in only three cases, namely α ∈ {0.5, 1, 2} (see Samorodnitsky and Taqqu, 1994), it is necessary to use numerical methods for the process generation and the
estimation by maximizing the likelihood function. Consider the GLE given by

dV(t) = -\left( \int_0^t \Gamma(t-s)\, V(s)\, ds \right) dt + dL(t), \qquad (5.1)

where \{L(t)\}_{t \ge 0} is a symmetric α-stable Lévy process and V_0 is a random variable independent of L(t). Under a few conditions on \Gamma(\cdot) and for 1 < \alpha \le 2, the solution of this equation is given by

V(t) = V_0\, \rho(t) + \int_0^t \rho(t-s)\, dL(s), \qquad (5.2)

where \rho(\cdot) satisfies

\rho'(t) = -\int_0^t \Gamma(t-s)\, \rho(s)\, ds, \qquad \rho(0) = 1. \qquad (5.3)

One wants to define a dependence measure valid for any stationary process. If \{X(t)\}_{t \ge 0} is any stationary process, then the codifference function is given by

\tau(t) = \tau(X(t), X(0)) = \log E\{ e^{i(X(t) - X(0))} \} - \log E\{ e^{iX(t)} \} - \log E\{ e^{-iX(0)} \}. \qquad (5.4)

For the stochastic process given by (5.2), the codifference function can be calculated as

\tau(t) = \log \frac{\Phi_{V_0}(\rho(t) - 1)}{\Phi_{V_0}(\rho(t))} - \log \Phi_{V_0}(1), \qquad (5.5)

where \Phi_{V_0} denotes the characteristic function of V_0. In the Ornstein-Uhlenbeck case, sampling the process at spacing h > 0 yields the autoregressive representation

V(kh) = e^{-\lambda h}\, V((k-1)h) + Z_{k,h}, \qquad (5.6)

with

Z_{k,h} = \int_{(k-1)h}^{kh} e^{-\lambda(kh - s)}\, dZ_s \overset{d}{=} \left( \frac{1 - e^{-\alpha \lambda h}}{\alpha \lambda} \right)^{1/\alpha} S_k, \qquad (5.7)

where the S_k are i.i.d. symmetric α-stable random variables.
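A sample analogue of the codifference can be computed from empirical characteristic functions. The sketch below is an illustrative estimator, not necessarily the one proposed in the work; for simplicity it is checked on a Gaussian-driven discretized OU path (the case α = 2, where the codifference reduces to the autocovariance), with invented values of λ, h and the noise scale.

```python
import cmath, math, random

def ecf(xs, u):
    """Empirical characteristic function E[exp(iuX)] evaluated at u."""
    return sum(cmath.exp(1j * u * x) for x in xs) / len(xs)

def codifference(x, lag):
    """Sample codifference at a given lag: it replaces the autocovariance
    when second moments do not exist (for Gaussians the two coincide)."""
    pairs = list(zip(x[lag:], x[:len(x) - lag]))
    diff = ecf([a - b for a, b in pairs], 1.0)
    num = ecf([a for a, _ in pairs], 1.0)
    den = ecf([-b for _, b in pairs], 1.0)
    return (cmath.log(diff) - cmath.log(num) - cmath.log(den)).real

# Discretized OU path driven by Gaussian noise, just to see the decay in lag.
random.seed(7)
lam, h = 0.5, 0.1
x = [0.0]
for _ in range(20000):
    x.append(x[-1] * math.exp(-lam * h) + random.gauss(0, 0.1))
print(codifference(x, 1), codifference(x, 50))
```

For α < 2 the same estimator applies unchanged, which is the point of using the codifference instead of the (undefined) autocovariance.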
References
1. Barndorff-Nielsen, O. E. and Shephard, N. (2001). Non-Gaussian Ornstein-
Uhlenbeck-based models and some of their uses in financial economics.
Journal of the Royal Statistical Society, Series B, 63(2), 167-241.
Heliton Tavares
Federal University of Pará.
(Dalton Andrade, Federal University of Santa
Catarina; Tânia Macedo, Unesp).
Abstract
Magnetic Resonance Imaging (MRI) is a diagnostic imaging method well established in medical practice and still growing in terms of its development. Given its high ability to differentiate tissues, its range of applications extends to all parts of the human body and explores anatomical and functional aspects. Functional Magnetic Resonance Imaging (fMRI) stands out as one of the MRI techniques that has made it possible to explore brain functions such as memory, language and motor control. Several applications have emerged in the evaluation of cognitive processes, as well as in monitoring the growth of brain tumors, pre-surgical mapping, studies of mental chronometry, and also as a diagnostic method for Alzheimer's disease. Basically, fMRI analyzes blood flow to detect the brain areas activated by some stimulus or function; multiple images (slices) are obtained simultaneously, and this process is repeated over time, sometimes with several individuals. However, the brain performs other tasks in parallel, so the biggest challenge is to accurately identify the area activated by the stimulus under study. This work presents a proposal for identification of the active region based on Item Response Theory and compares it with the current methodology. Each image is subdivided into small voxels (pixels), which are categorized as active or inactive. We assume that the probability that a voxel is active is given by a one-parameter logistic model (LM1), whose parameter represents the activity level of the voxel. The parameters are estimated by Marginal Maximum Likelihood and represent the area activated by the function under study.
Francisco Torres-Avilés
Universidad de Santiago de Chile.
Abstract
This paper presents a review of the inference in gamma regression models
from a Bayesian perspective with emphasis on mixed models. The work begins
by presenting the usual construction of this class of regression models, and
later defines some extensions of them. We discuss the choice of the link
function, the elicitation of prior distributions, inclusion of random effects and
model selection through a simulation study. Finally, the methodology is illus-
trated using real data from the public health area.
Abstract
The traditional risk measure for an asset portfolio is the Value at Risk (VaR), owing to its good properties and easy interpretation. However, only a few references are devoted to the generalization of this concept to the multivariate context. In this work, we introduce the definition of a multivariate financial risk measure, MRVaR, based on the directional extremality quantile notion recently introduced in the literature. The directions in the definition of the MRVaR can be chosen by the investor according to her/his risk preferences. We state the main properties of this MRVaR, its non-parametric estimation and a robustness analysis. We also show the advantage of using this MRVaR with respect to other multivariate VaRs introduced in the recent literature. Finally, we illustrate our definition with the Archimedean copula family, for which it is possible to obtain an explicit expression of the MRVaR.
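The directional idea can be caricatured in a few lines: project the portfolio return onto a direction chosen by the investor and take a univariate empirical VaR of the projected loss. This is only a toy illustration of direction-dependent risk, not the MRVaR of the abstract; the Gaussian returns and the 95% level are invented.

```python
import math, random

def empirical_var(losses, level=0.95):
    """Empirical Value at Risk: the level-quantile of the loss distribution."""
    s = sorted(losses)
    return s[int(level * len(s)) - 1]

def directional_var(returns_2d, direction, level=0.95):
    """Toy directional VaR: project the bivariate return vector onto a unit
    direction (the investor's risk preference), then take a univariate VaR
    of the projected loss."""
    n = math.hypot(*direction)
    u = (direction[0] / n, direction[1] / n)
    projected_losses = [-(u[0] * r1 + u[1] * r2) for r1, r2 in returns_2d]
    return empirical_var(projected_losses, level)

random.seed(5)
rets = [(random.gauss(0, 0.01), random.gauss(0, 0.02)) for _ in range(10000)]
print(directional_var(rets, (1.0, 1.0)))
```

Varying the direction traces out a whole family of risk numbers, which is the feature the MRVaR formalizes.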
Soledad Torres
CIMFAV, Facultad de Ingeniería, Universidad de
Valparaíso
(Antoine Lejay, Université de Lorraine, IECN, UMR
7502, Vandœuvre-lès-Nancy, F-54500, France; CNRS,
IECL, UMR 7502, Vandœuvre-lès-Nancy, F-54500,
France; Inria, Villers-lès-Nancy, F-54600, France;
Ernesto Mordecki, Centro de Matemática, Facultad
de Ciencias, Universidad de la República).
Abstract
We study the asymptotic behavior of the maximum likelihood estimator corresponding to the observation of a trajectory of a skew Brownian motion through a uniform time discretization. We characterize the speed of convergence and the limiting distribution when the step size goes to zero, which in this case are non-classical, under the null hypothesis that the skew Brownian motion is a usual Brownian motion. This allows one to design a test on the skewness parameter.
Projective convergence
Abstract
We define the notion of projective convergence of probability measures with complete support. In relation to the topology induced by this type of convergence, we define the projective distance ρ and we give some of its topological properties, such as non-separability and completeness of the space. Our motiva-
Abstract
The agents who take part in electricity markets face uncertainty about the future behaviour of the market, hindering decision-making in the short, medium and long term. The spot price of energy determines the way commercial exchanges are carried out. In this work, several models are presented that capture the dynamics of the energy spot price in Colombia, together with the estimation of their parameters. Seasonality, reversion to a long-term average, and dependence on fundamental variables of this market are represented. The occurrence of the El Niño phenomenon generated alterations in the price of energy, both in its expected value and in its variance, as did the hydraulic generation of the system, the level of the river flows and the demand for energy. The electricity reform introduced by Laws 142 and 143 of 1994 in Colombia created a competitive wholesale market, in order to achieve efficiency in the electricity service and free entry for interested agents. This market is known as the Wholesale Energy Market (MEM), and in it participate the agents who carry out the activities of generation, transmission, distribution and commercialization, as well as the large consumers of electricity (Pérez et al., 1999). In 1995, the way was opened for free competition and private participation, whereby competition in energy generation helped the spot price to reflect real conditions and the variations it suffers due to different factors such as the occurrence of the El Niño phenomenon, availability of water and generation costs, among others (Botero et al., 2008). A suitable understanding of the factors that influence the energy spot price will allow the agents operating in this market to define strategies that maximize
their income and simultaneously to manage adequately the risk of variations in their cash flow by means of the available financial derivatives (Trespalacios et al., 2012).
Several statistical models have previously been proposed for fitting and forecasting the spot price: (Botero et al., 2008) implement processes from a family based on ARMA and ARIMA models, while (Gil et al., 2008) and (Lucia et al., 2002) present models that, besides taking into account the effect of the El Niño phenomenon, also consider explanatory variables such as the level of the river flows, the demand for energy and the generation of energy. The models evaluated by (Pilipovic, 1998) and (Geman et al., 2003) account for seasonal variation, reversion to the mean and jumps. For the modeling, we first analyzed the historical behavior of the energy spot price in Colombia from January 2000 to December 2013: its stationarity, the jumps or breaks that appear over time, and the factors to which these anomalies are owed. We then proposed the models considered suitable, starting from (Lucia et al., 2002), estimated the respective parameters, and predicted the energy spot price for the year 2014 with each of the models. In line with previous studies, we find that the energy spot price in Colombia presents patterns of seasonal variation and reversion to the mean. A change in the structure of the variance is observed at the end of 2013, explained by the reduction of the hydrological contributions in the western zone of the country without a climatically important event, possibly deepened by a small availability of natural gas to supply the thermal plants in periods of shortage.
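A minimal simulation in the spirit of the one-factor model of Lucia and Schwartz (2002): log-price equals a deterministic seasonal component plus a mean-reverting Ornstein-Uhlenbeck deviation. All parameter values below are illustrative, not estimates for the Colombian market.

```python
import math, random

def simulate_log_spot(days, kappa, sigma, mean_fn, seed=3):
    """Simulate S_t = exp(f(t) + X_t) with dX = -kappa*X dt + sigma dW
    (Euler step, daily grid): f(t) is the seasonal component, X the
    mean-reverting deviation that pulls the price back toward it."""
    random.seed(seed)
    x, path = 0.0, []
    dt = 1 / 365
    for t in range(days):
        x += -kappa * x * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
        path.append(math.exp(mean_fn(t) + x))
    return path

# Hypothetical annual seasonality around a log-level of 4.
season = lambda t: 4.0 + 0.2 * math.sin(2 * math.pi * t / 365)
prices = simulate_log_spot(730, kappa=5.0, sigma=0.8, mean_fn=season)
```

El Niño effects and jumps would enter as regime shifts in the mean level or as an added jump term; this skeleton shows only the seasonality-plus-reversion core.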
Abstract
Nonlinear regression models are commonly applied in areas such as Biology,
Chemistry, Medicine, Economics and Engineering. The analysis based on models under normal errors and constant variance is the most popular when the
variable of interest is continuous, due to desirable statistical properties and a
comprehensive developed theory. Nevertheless, the application of such mod-
els may be inadequate in some scenarios commonly found in practice. For
instance, as shown in this paper, ignoring the skewness of the response vari-
able distribution may introduce biases on the parameter estimates and/or on
the estimation of the associated variability measures. To deal with this prob-
lem, some proposals have been made in the literature to replace the normal-
ity assumption by more flexible classes of distributions. For example, in the
context of asymmetric and heavy-tailed responses, Lin et al. (2009) derived
diagnostic methods in nonlinear skew-t-normal regression models; Cancho et
al. (2010) studied nonlinear skew-normal regression models using classical and
Bayesian approaches; Lachos et al. (2011) introduced heteroscedastic nonlinear
regression models based on scale mixtures of skew-normal distributions; and
Labra et al. (2012) derived diagnostic methods for the class of regression models
introduced previously by Lachos et al. (2011). Although the models studied in
these papers are attractive, they have some limitations. For instance, modeling
the mean instead of the median and assuming that the skewness parameter is
constant across the observations. That being so, this paper provides a unified
theoretical framework for semiparametric regression analysis based on log-
normal, log-Student-t, Birnbaum-Saunders, Birnbaum-Saunders-t and other
skewed and strictly positive distributions, in which both the median and the
skewness of the response variable distribution are explicitly modeled. In this
setup, named here as log-symmetric regression models, the median is described
using a parametric nonlinear function, whereas the skewness is modeled using
a semiparametric function whose nonparametric component is approximated
by a natural cubic spline (see, for instance, Green and Silverman (1994)). In the
context of nonparametric and semiparametric models, it is possible to cite some
of the most important contributions. For instance, Hastie and Tibshirani (1990)
introduced the class of generalized additive models and Rigby and Stasinopou-
los (2005) introduced the generalized additive models for location, scale and
shape (GAMLSS), which deal with the semiparametric mixed joint modeling
of all parameters in a general class of distributions. Rigby and Stasinopoulos
(2006, 2007) also illustrated the use of semiparametric models based on the Box-Cox t distribution, and developed a very flexible implementation of GAMLSS in
the statistical package R (www.R-project.org). More recently, Ibacache-Pulgar
et al. (2013) derived diagnostic tools in symmetric homoscedastic semipara-
metric models.
Velasco Jairo
Universidad Nacional de Colombia
Abstract
We consider the form of a family of probability density functions that allows modeling and studying cases where the data cluster preferentially around more than one apparent mean. The functional form can be seen as
f(x, a, b, k) = \frac{2a\, e^{a(b-x)}}{\left[1 + e^{a(b-x)}\right]^{2}}\, \sin^{2}\!\left( \frac{2k\pi}{1 + e^{a(b-x)}} \right) I_{(-\infty,\infty)}(x) \qquad (1)
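Numerically one can check that (1) is a proper density: the logistic substitution u = 1/(1 + e^{a(b-x)}) turns it into 2 sin²(2kπu) on (0, 1), which integrates to one for integer k. A quick sketch with illustrative parameter values:

```python
import math

def f(x, a, b, k):
    """Multimodal density (1): a logistic kernel modulated by a squared
    sine, which places several modes along the logistic scale."""
    e = math.exp(a * (b - x))
    u = 1.0 / (1.0 + e)
    return 2 * a * e / (1 + e) ** 2 * math.sin(2 * k * math.pi * u) ** 2

# Trapezoidal check that the density integrates to 1 on a wide grid.
a, b, k = 1.0, 0.0, 2
step = 0.004
xs = [b + (i - 5000) * step for i in range(10001)]   # grid on [-20, 20]
ys = [f(x, a, b, k) for x in xs]
area = sum((y0 + y1) / 2 * step for y0, y1 in zip(ys, ys[1:]))
print(round(area, 4))  # close to 1.0
```

Note that f vanishes wherever 2kπ/(1 + e^{a(b-x)}) hits a multiple of π, which is what separates the clusters of probability mass.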
Sandra Vergara Cardozo
Universidad Nacional de Colombia
(Bryan F. Manly, Western Ecosystem Technology
Inc, Raydonal Ospina, Federal University of
Pernambuco).
Abstract
Resource selection functions (RSFs) are used to quantify how selective animals are in their use of habitat or food. A resource selection probability function (RSPF) can be estimated if N, the total number of units in the population, and n1, the total number of used units in the study period, are both known and small. An approximation of the RSPF can then be estimated using any standard program for logistic regression, but the variances of the estimates of the parameters are too small. Three methods of bootstrap sampling (parametric, nonparametric and a modified parametric method) are proposed for the estimation of the variances, with a discussion of the limitations of logistic regression for estimating the RSPF. The method for estimating the RSPF described here has potential applications in medicine, ecology and other areas.
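A sketch of the parametric bootstrap variant (the nonparametric and modified versions differ only in how the resamples are drawn): fit a logistic model, resample responses from the fitted probabilities, refit, and take the standard deviation of the refitted slopes. The single-covariate data and the plain gradient-ascent fitter below are stand-ins for a standard logistic-regression routine.

```python
import math, random

def fit_logistic(xs, ys, lr=0.5, iters=800):
    """Fit p(x) = 1/(1+exp(-(b0 + b1*x))) by gradient ascent on the
    log-likelihood (a stand-in for any standard logistic routine)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def bootstrap_se_b1(xs, b0, b1, reps=50, seed=11):
    """Parametric bootstrap for the slope: resample responses from the
    fitted use probabilities and refit, then take the SD of the slopes."""
    random.seed(seed)
    fits = []
    for _ in range(reps):
        ys = [1 if random.random() < 1 / (1 + math.exp(-(b0 + b1 * x))) else 0
              for x in xs]
        fits.append(fit_logistic(xs, ys)[1])
    m = sum(fits) / len(fits)
    return math.sqrt(sum((v - m) ** 2 for v in fits) / (len(fits) - 1))

xs = [i / 10 for i in range(-25, 25)]        # 50 hypothetical resource units
random.seed(3)
ys = [1 if random.random() < 1 / (1 + math.exp(-x)) else 0 for x in xs]
b0, b1 = fit_logistic(xs, ys)
se = bootstrap_se_b1(xs, b0, b1)
```

The bootstrap standard error is what replaces the understated asymptotic variances mentioned in the abstract.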
Ignacio Vidal
Universidad de Talca
(Felipe Vásquez, Universidad de Concepción; Walter
Gómez, Universidad de La Frontera).
Abstract
We consider the problem of describing consumer choice situations characterized by the simultaneous demand for multiple alternatives that are imperfect substitutes for one another. In this paper, we propose a modification of the main econometric technique for dealing with this problem, the so-called Kuhn-Tucker multiple discrete-continuous economic consumer demand model. Our proposed approach provides tractable forms for the densities and is based on the use of a symmetric probability distribution for the error. The considered utility function has a quadratic structure allowing non-additive preferences. Our proposal thus includes perfect and imperfect substitutes among the choice alternatives and gives an explicit functional form for the interaction of any pair of alternatives. The functional form can be coded using modern formal calculation software tools. For estimation by the maximum likelihood method, a constrained optimization problem appears that takes into account the mathematical assumptions of the whole approach. We illustrate our methodology with a real data set on the time use of the inhabitants of Santiago de Chile.
Abstract
The objective of this study was to estimate the hourly ozone concentration in the Greater Vitória region, Espírito Santo, Brazil, using an ARMAX-GARCH model, for the period from 2011/01/01 to 2011/12/31. The data were provided by the State Institute of Environment and Water Resources (IEMA). The models were estimated for three stations: Laranjeiras, Enseada do Suá and Cariacica. Some parameters measured at the stations were adopted as explanatory variables of the ozone concentration, namely temperature, relative humidity, wind speed and the concentration of nitrogen dioxide. These variables were significant and improved the fit of the estimated model. The hourly predictions for the day 2011/12/31 (reserved to verify the accuracy of the model) were close to the observed values, and the estimates generally followed the path of the daily ozone concentration. When compared with the ARMA and ARMAX models, the ARMAX-GARCH model proved to be more effective in the prediction of ozone pollution episodes (hourly concentrations above 80 µg/m³), reduced the number of estimated false alarms and showed a lower rate of undetected episodes.
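The model structure can be sketched by simulation: an autoregressive term, an exogenous regressor (here an invented temperature cycle standing in for the meteorological covariates), and GARCH(1,1) errors whose conditional variance feeds episodes of high volatility. Parameter values are illustrative, not the fitted ozone model.

```python
import math, random

def simulate_armax_garch(n, phi, beta_x, omega, alpha, beta, x, seed=2):
    """Simulate y_t = phi*y_{t-1} + beta_x*x_t + e_t with GARCH(1,1) errors:
    e_t = sigma_t * z_t and sigma_t^2 = omega + alpha*e_{t-1}^2
    + beta*sigma_{t-1}^2, z_t standard normal."""
    random.seed(seed)
    y = [0.0]
    e, s2 = 0.0, omega / (1 - alpha - beta)  # start at unconditional variance
    for t in range(1, n):
        s2 = omega + alpha * e ** 2 + beta * s2
        e = math.sqrt(s2) * random.gauss(0, 1)
        y.append(phi * y[-1] + beta_x * x[t] + e)
    return y

# Hypothetical hourly temperature cycle as the exogenous variable.
temp = [25 + 5 * math.sin(2 * math.pi * t / 24) for t in range(1000)]
oz = simulate_armax_garch(1000, 0.7, 0.5, 0.2, 0.1, 0.8, temp)
```

Episode forecasting then amounts to flagging hours whose predicted value exceeds the regulatory threshold, with the GARCH variance widening the alarm band in volatile periods.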
Abstract
The study of time series is one of the most important subjects in the statistical literature, the main purpose being to provide methods for modeling data sets that exhibit correlation over time and to allow predictions to be made. Integer-valued time series have attracted attention because they occur in many contexts, for example the number of accidents in a manufacturing plant each month, or the number of fish caught in a particular area of the sea each week, often as counts of events, objects or individuals in consecutive intervals or at consecutive points in time. In the last three decades, there has been increasing interest in proposing methodologies to study integer-valued time series, including how to obtain non-negative, integer-valued predictors. We center our attention on studying and proposing new forecasting procedures for the integer-valued first-order autoregressive process (INAR(1)) with Poisson marginal distribution, based on the binomial thinning operator, and for the integer-valued first-order autoregressive conditional heteroskedasticity process (INARCH(1)), which takes into account overdispersion.
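The Poisson INAR(1) recursion and its conditional-mean forecast can be sketched directly from the definition of the binomial thinning operator; the values of α and λ below are invented.

```python
import math, random

def binomial_thinning(x, alpha, rng):
    """alpha ∘ x: each of the x existing counts survives with prob alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate_inar1(n, alpha, lam, seed=4):
    """Poisson INAR(1): X_t = alpha ∘ X_{t-1} + eps_t, eps_t ~ Poisson(lam).
    The stationary marginal distribution is Poisson(lam / (1 - alpha))."""
    rng = random.Random(seed)
    def poisson(l):
        # Knuth's multiplication method, adequate for small l.
        L, k, p = math.exp(-l), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    x = [poisson(lam / (1 - alpha))]       # start from the stationary law
    for _ in range(n - 1):
        x.append(binomial_thinning(x[-1], alpha, rng) + poisson(lam))
    return x

def forecast_inar1(x_last, alpha, lam, h=1):
    """h-step conditional mean; rounding gives a coherent integer forecast."""
    return alpha ** h * x_last + lam * (1 - alpha ** h) / (1 - alpha)

series = simulate_inar1(500, alpha=0.5, lam=2.0)
print(round(forecast_inar1(series[-1], 0.5, 2.0)))
```

The conditional mean is generally non-integer, which is exactly the difficulty the new coherent forecasting procedures address.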
Special Session
Alexander Infanzon
SAS
Abstract
There is a shortage of data scientists worldwide. Data scientists are professionals who transform the clusters of data collected by different companies or groups into useful information. In order to win in the current market, create new products or generate new business, data scientists need to follow the trends that Big Data and analytics dictate. Although there are now a couple of dozen master's programs designed to meet the urgent need for Big Data analytics talent, there is still a shortage of these professionals. This talk focuses on creating awareness of the situation. It describes the skills required to be a successful data scientist and how SAS is helping to address this gap by launching the SAS Analytics U program.
Index of Authors
This publication was edited, printed and bound
in September 2014 in Bogotá, D. C., Colombia.
It was typeset in 10-point Minion Pro.
ISSN 2389-9069