0607122
Ph.D. advisors:
Dr. Jose Ignacio Latorre Sentis - Universitat de Barcelona
Dr. Stefano Forte - Universita di Milano
Universitat de Barcelona
July 2006
To Sonia
Acknowledgements
Completing a doctoral thesis is, more than an individual work, above all a communal undertaking. The acknowledgements are therefore the place to recognise the contribution of all those people whose help, of every kind, has allowed this thesis to reach a safe harbour.
First of all I would like to thank Jose Ignacio Latorre for accompanying me during these four years as supervisor of this thesis. From his very first lectures on High Energy Physics I was captivated by the passion with which he taught, the same passion with which he lives the adventure of scientific research. Jose Ignacio has been for me not only a master in science but also a model of honesty in every sphere.
I also take this opportunity to thank the co-supervisor of this thesis, Stefano Forte. From Stefano I have learned the rigour needed to be a good scientist, as well as the passion needed to face even the hardest problems without losing heart. His continuous help has been essential to carry out the research work developed in this thesis.
Many are the people with whom I have shared the pleasure of scientific research during these four years, learning continuously from all of them. In a special way the other members of the NNPDF Collaboration, Andrea Piccione and Luigi Del Debbio, but also many others: Antonio Pineda, Jorge Russo, Giovanni Ridolfi, Concha Gonzalez-Garcia, Michele Maltoni... I would also like to thank all those with whom I have shared many instructive discussions about physics, and from whom I have gradually learned the art of being a researcher. The list is very long: German Rodrigo, Joan Soto, Ignazio Scimemi, Thomas Becher, Einan Gardi...
I would also like to thank all the colleagues of the Departament d'Estructura i Constituents de la Materia with whom I have spent these years, sharing the adventure of writing a doctoral thesis: Xavi, Enrique, Roman, Diego, Dani, Miriam, Luca, Jaume, Alex... But especially Manel and Arnau: since we met in the first year of the degree, we have accompanied each other throughout the whole degree and the doctorate. It seems that the far-off day we could barely glimpse when we were taking Fonaments de Fisica is now close at hand for the three of us. I am also grateful to Joan Soto and Nuria Barberan for passing on to me both their passion and their rigour in teaching the courses I have helped to deliver during the last two years. Finally, I thank Oriol and Rosa for their continuous help with the many practical details that have come up during these years.
So many friends have accompanied me during these four years in the thrilling adventure of life, making it possible to continually renew my passion for everything, including scientific research, that I apologise in advance to all those I fail to mention. To Juan Ramon and Roger especially, for their unstinting support and their continuous help in caring for a scientific vocation that is not always easy. To Lluis, for passing on to me his passion for knowledge and for life, and for offering me a friendship that lasts to this day. To all the friends with whom we have lived these years together: Nestor, Raquel, Jorge, Xavi, Alfonso, Anna, Miquel C., Cristina, Miquel, Josep C., Josep M., Marc, Albert..., thank you once again for your tireless company along this road.
I am also deeply grateful to my physics friends in Milan: Giuliano, Tommy, Betta, Paola, Maria, and to the whole cultural association Euresis, for helping me live my scientific vocation with a horizon open to the whole of reality. I also take this opportunity to thank the friends in Madrid of the Asociacion Universitas, for their constant commitment to living academic life always aware of the reasons behind one's vocation, thereby making it possible to continually renew the interest in both teaching and research.
Above all, however, I want to thank my family for all the support and tireless help they have given me throughout these years. My father Eduardo has always been my reference, both academically and personally. The passion and joy with which my father lives his work at the University have been my greatest motivation, first to study physics and later to choose an academic career. My mother Carmen has also always helped me to value study and work, educating me in an interest in the whole of reality, something I will never be able to thank her for enough. My brother Ignacio has always been a model for me, for the passion and seriousness with which he has always approached his studies, and now his work. I am also very grateful to my in-laws, Raul, Margarita, Olga and Montserrat, for their continuous support at every level during the completion of this thesis.
Finally, all my gratitude goes to my wife, Sonia. Thanks to her, to her help and her encouragement, I have been able to complete this thesis. Her tireless support in every circumstance is what has allowed me to begin each day with fully renewed enthusiasm, both in research and in teaching. For all this, I can only thank her once again for all these years in which we have accompanied each other in the thrilling adventure of life and marriage.
List of publications
L. Del Debbio, S. Forte, J. I. Latorre, A. Piccione and J. Rojo [3], Unbiased deter-
mination of the proton structure function F2p with faithful uncertainty estimation,
JHEP 0503 (2005) 080, [arXiv:hep-ph/0501067].
S. Forte, G. Ridolfi, J. Rojo and M. Ubiali [4], Borel resummation of soft gluon
radiation and higher twists, Phys. Lett. B 635 (2006) 313, [arXiv:hep-ph/0601048].
J. Rojo [5], Neural network parametrization of the lepton energy spectrum in B meson
decays, JHEP 0605 (2006) 040 [arXiv:hep-ph/0601229].
J. Mondejar, A. Pineda and J. Rojo [6], Heavy meson semileptonic differential decay
rate in two dimensions in the large Nc , arXiv:hep-ph/0605248.
Conference proceedings
J. Rojo, A probability measure in the space of spectral functions and structure func-
tions [7], proceedings of the QCD International Conference, Montpellier 2004, Nucl.
Phys. B (Proc. Suppl.) 152 (2006) 57, arXiv:hep-ph/0407147.
J. Rojo, L. Del Debbio, S. Forte, J. I. Latorre and A. Piccione [NNPDF Collaboration]
[8], The neural network approach to parton fitting, proceedings of DIS05 workshop,
arXiv:hep-ph/0505044.
J. Rojo, L. Del Debbio, S. Forte, J. I. Latorre and A. Piccione [NNPDF Collabora-
tion] [9], The neural network approach to parton distributions, contribution to the
proceedings of the HERA-LHC workshop, arXiv:hep-ph/0509059.
J. Rojo, L. Del Debbio, S. Forte, J. I. Latorre and A. Piccione [NNPDF Collaboration]
[10], The neural network approach to parton fitting, proceedings of the ACAT05
workshop, arXiv:hep-ph/0509067.
Contents
1 Introduction
2 Summary
8 Conclusions
Chapter 1
Introduction
Figure 1.1:
The location of the 27 km tunnel that houses the LHC, near Geneva (the Alps can be glimpsed behind)
(left), and one of its detectors, ATLAS, which is as large as a seven-storey building (right).
However, extracting new physics from the high energy proton-proton collisions that will be produced at the LHC is an extremely complicated task, for a series of reasons
that we detail in what follows. The main one is that this new physics will be hidden among a multitude of known physics processes, due to the strong interaction between quarks and gluons (the elementary particles that make up the proton), as determined by the Standard Model, in particular by the theory known as Quantum Chromodynamics (QCD). These processes will be far more frequent than the collisions in which the sought-after new physics effects are produced. Therefore, the discovery potential of the LHC, as well as its ability to perform precision measurements of the properties of this new physics, depends crucially on our quantitative understanding of strong interaction processes and of their associated uncertainties. In Fig. 1.2 we show the outcome of a characteristic proton-proton collision at the LHC. It must be stressed that, out of the billions of collisions like this one that will take place every year at the LHC, it will be necessary to single out the few that contain genuinely relevant information.
Figure 1.2:
Outcome of a typical proton-proton collision at the LHC: one can see the tracks left by the hundreds of
particles produced in each collision, as well as the total energy carried by each of these particles.
Among the various sources of uncertainty associated with processes involving strongly interacting particles, one of the most important is due to the parton distributions (Parton Distribution Functions, PDFs) of the proton. These distributions are a measure of the fraction of the total energy of the proton carried by each of its constituents, the quarks of different flavours and the gluons. Parton distributions cannot be computed in perturbation theory; instead they must be extracted from the experimental data coming from a large variety of high energy physics processes, such as deep-inelastic scattering (DIS) between leptons (elementary particles with no strong interaction) and protons.
We will denote the parton distributions by $q_i(x, Q^2)$, where $Q^2$ is the typical energy of the collision process, and the variable $x$ denotes the fraction of the
total energy of the proton carried by parton $i$; there is an independent parton distribution for each quark and each antiquark flavour, plus one for the gluons. One should keep in mind that in QCD energies are large or small in comparison with the proton mass $M_p$, which sets the characteristic scale of the theory. These parton distributions can be determined thanks to the so-called Factorization Theorem of Quantum Chromodynamics. According to this theorem, any collision cross section (which measures the probability that two particles collide and interact) in processes involving the strong interaction can be separated into the product of two terms: a coefficient which depends on the process at hand, and which can be computed in perturbation theory, and a set of parton distributions which are universal, that is, independent of the particular details of the process. In Fig. 1.3 we show a sketch of the interior of a proton, with the different quarks interacting with each other through gluons, and where the arrows indicate the spin, the internal angular momentum of each particle. The motion of these quarks and gluons inside the proton is dictated by these parton distributions.
Figure 1.3:
The interior of a proton is made of quarks and antiquarks of different flavours, together with gluons which,
through the strong interaction, keep these quarks confined inside the proton.
For instance, structure functions in deep-inelastic scattering can be written as
$$ F(x, Q^2) = C_i\!\left(x, \alpha_s(Q^2)\right) \otimes q_i(x, Q^2) , \qquad (1.2) $$
that is, as the convolution of a process-dependent coefficient $C_i(x)$ with the parton distributions $q_i(x)$, which are identical for any high energy collision process involving strongly interacting particles in the initial state. The parton distributions only need to be determined at a relatively low initial scale $Q_0^2 \gtrsim M_p^2$, since their dependence on the energy (known as evolution) is dictated by the perturbation theory of Quantum Chromodynamics. One should also keep in mind that, in general, different combinations of parton distributions contribute differently to each observable. Although the largest source of experimental information on parton distributions comes from the high precision measurements of the deep-inelastic scattering processes described above, other processes are essential, such as the production of jets (bunches of hadrons, that is, of particles that feel the strong interaction).
Figure 1.4:
Sketch of the HERA deep-inelastic scattering experiment in Hamburg.
The main problem, in view of all the above, is to determine the uncertainties of a continuous function such as the parton distributions, that is, a probability density in the space of these functions, starting from a finite set of experimental data. There exists a whole set of standard techniques to determine the parton distributions from the experimental measurements, and various strategies have been used to also estimate, in different ways,
the errors associated with these distributions. All these strategies have been useful to estimate approximately the size of these uncertainties, but they nevertheless suffer from a number of problems. Within the standard techniques, parton distributions are parametrized with relatively simple functions, typically polynomials of the form
$$ q_i(x, Q_0^2) = A_i\, x^{b_i} (1 - x)^{c_i} \left( 1 + d_i x + e_i x^2 \right) , \qquad i = 1, \ldots, N_f + 1 , \qquad (1.4) $$
where the parameters $A_i, b_i, \ldots$ are determined from a fit to the experimental data. The first problem appears immediately: we are restricting ourselves to the space of parton distributions parametrized by Eq. 1.4, which is clearly unjustified, since there is nothing in Quantum Chromodynamics, the theory which in principle determines the shape of these distributions, implying that the parton distributions must have the very specific functional form of Eq. 1.4. Second, if one wants to propagate the errors associated with Eq. 1.4 to other observables, such as cross sections of the form of Eq. 1.3, linearized approximations are needed, which, as is well known, are not valid in a wide range of situations. Finally, the third drawback of the standard technique is that, in the presence of experimental data coming from different experiments, incompatibilities between the data (for instance, the structure function of Eq. 1.2 measured by two different experiments being very different) imply that the condition used to determine the errors from the $\chi^2$ error function is not $\Delta\chi^2 = 1$, as basic statistics dictates, but rather $\Delta\chi^2 = 100$. In practice this choice implies that the errors on the parton distributions have no rigorous statistical meaning, since they depend on an arbitrary parameter $\Delta\chi^2$. In Fig. 1.5 we show an example of a recent determination of parton distributions together with their associated uncertainties. Note that there are two types of errors involved: the usual statistical errors (due to the finite number of measurements) and the systematic errors, which in general depend on the measurement process and are correlated among different experimental data points.
Figure 1.5:
An example of a recent determination of parton distributions. Note that each distribution carries an
associated uncertainty band, which is a measure of the errors of each of the parton distributions.
For all the reasons described so far, and in view of the crucial importance that a faithful estimation of the uncertainties associated with parton distributions has for precision physics at the LHC, it is clearly desirable to investigate alternative strategies for the determination of these distributions which overcome the problems of the standard technique described above. In Ref. [11] a novel technique was presented, consisting in the determination of a probability density in a space of functions, which was applied to the parametrization of structure functions, such as Eq. 1.2, in deep-inelastic lepton-proton scattering. This novel technique used a combination of Monte Carlo methods, for the construction of samplings of the experimental data, together with artificial neural networks as universal interpolants. The use of artificial neural networks instead of polynomial functions like Eq. 1.4 in the parametrization of parton distributions removes the dependence of the results on an arbitrarily chosen functional form. In turn, the Monte Carlo sampling technique allows a statistically rigorous estimation of the uncertainties associated with the parametrized function, and moreover the propagation of these errors to other observables can be performed in full generality, without the need for linearized approximations.
In this thesis we have extended the results presented in Ref. [11], which applied to structure functions in deep-inelastic scattering, in several directions. First, we have applied this general technique for parametrizing experimental data to other processes of interest in high energy physics. The processes studied are the hadronic decays of the tau lepton, the semileptonic decays of the B meson, and deep-inelastic scattering from two points of view: at the level of structure functions, including all the available experimental data, and at the level of nonsinglet parton distributions, that is, for the combination in which the gluon decouples from the rest of the distributions. In each of these applications the relation between the experimental data and the function to be parametrized was different, demonstrating that the technique described in this thesis is completely general and valid for a large variety of situations.
Moreover, in this thesis we have extended the original strategy of Ref. [11] through the introduction of new algorithms to train the artificial neural networks, in particular the so-called Genetic Algorithms. These algorithms are required to train neural networks with highly nonlinear error functions, as happens in the various applications described in this thesis. Finally, we have improved the statistical techniques used to validate the results obtained, that is, the probability density constructed in the different cases. In the next Chapter we describe in some detail the contents of this thesis and the results that have been obtained.
Chapter 2
Summary
This thesis is organized as described in some detail in what follows. In Chapter 4 we review the basic elements of Quantum Chromodynamics, which, as explained above, is the sector of the Standard Model of particle physics that governs the so-called strong interaction between elementary particles. We also describe the high energy processes to which we will apply the general strategy for parametrizing experimental data that constitutes the main subject of this thesis. Together with this review, we present a detailed description of the standard technique, discussed in the previous Chapter, which is commonly used to extract the parton distributions and their associated errors from a set of experimental data.
Next, in Chapter 5 we describe in full detail the general technique to construct the probability density of a function from experimental measurements, that is, a technique to parametrize experimental data without any assumption on the functional form of the function being parametrized and with a faithful estimation of the associated uncertainties, which allows errors to be propagated to arbitrary observables without linear approximations. This Chapter is divided into three parts, Monte Carlo methods, artificial neural networks, and statistical estimators, which we now describe.
In the first part of this Chapter we describe the Monte Carlo methods used to construct a sampling of the experimental data which contains all the information provided by the experiments, including errors and correlations. We also introduce a set of statistical estimators that allow us to quantify how well this Monte Carlo sampling reproduces the features of the experimental measurements. This technique allows a faithful estimation of the errors associated with the parametrized function, and we will show that it is equivalent to the errors defined from a $\chi^2$ error function whenever linear approximations in the error propagation are sufficient.
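As an illustration of the idea, the following minimal Python sketch generates Monte Carlo replicas of a set of measurements from their central values and covariance matrix. It is a simplified stand-in for the replica generation described in Chapter 5 (which treats statistical, systematic and normalization uncertainties separately), and all names and numbers in it are purely illustrative.

    import numpy as np

    def generate_replicas(central, cov, n_rep, seed=0):
        """Draw n_rep pseudo-data replicas from a multivariate Gaussian
        with the given central values and covariance matrix."""
        rng = np.random.default_rng(seed)
        return rng.multivariate_normal(central, cov, size=n_rep)

    # Toy example: three correlated measurements of a structure function.
    central = np.array([0.40, 0.35, 0.30])
    sigma   = np.array([0.02, 0.03, 0.02])
    corr    = np.array([[1.0, 0.5, 0.2],
                        [0.5, 1.0, 0.5],
                        [0.2, 0.5, 1.0]])
    cov = np.outer(sigma, sigma) * corr

    replicas = generate_replicas(central, cov, n_rep=1000)

    # Check that the sample mean and covariance reproduce the experimental input,
    # which is the requirement placed on the Monte Carlo sampling.
    print(replicas.mean(axis=0))
    print(np.cov(replicas, rowvar=False))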
In the second part of Chapter 5 we introduce artificial neural networks, which we will use as universal interpolants, as well as the different methods used to train these networks, that is, to make them learn the patterns present in the experimental data. Artificial neural networks are a standard tool in many scientific fields, from biology to computer science, and in particular they are routinely used in experimental high energy physics, in applications such as the classification of events according to their properties. In Fig. 2.1 we show an artificial neural network of the kind used in this thesis, known as a multi-layer feed-forward perceptron. We will also describe how our technique allows the incorporation of theoretical information of very different kinds, such as sum rules or constraints implied by the kinematics of the process.
Figure 2.1:
Schematic diagram of a multi-layer feed-forward artificial neural network.
Artificial neural networks have the interesting property that any continuous function, no matter how complicated and regardless of the number of parameters it depends on, can be represented in terms of a multi-layer feed-forward artificial neural network. A second important property of artificial neural networks is that they are very efficient at combining in an optimal way the experimental information coming from different measurements of the same quantity. That is, when the separation between experimental data points in the space of input variables is smaller than a certain correlation length, the neural network combines this experimental information efficiently, so that the corresponding output pattern is more accurate than the individual experimental measurements.
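For concreteness, the short Python sketch below implements the forward pass of a small feed-forward perceptron of the type shown in Fig. 2.1. The architecture, activation function and weights are purely illustrative and are not those used in the fits described later.

    import numpy as np

    def sigmoid(x):
        # Logistic activation, a common choice for multi-layer perceptrons.
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x, weights, biases):
        """Forward pass through a fully connected feed-forward network.
        weights[l] has shape (n_out, n_in) for layer l, biases[l] shape (n_out,)."""
        a = np.atleast_1d(x)
        for W, b in zip(weights[:-1], biases[:-1]):
            a = sigmoid(W @ a + b)           # hidden layers: nonlinear activation
        return weights[-1] @ a + biases[-1]  # linear output layer

    # Illustrative 1-4-1 architecture with random weights.
    rng = np.random.default_rng(1)
    weights = [rng.normal(size=(4, 1)), rng.normal(size=(1, 4))]
    biases  = [rng.normal(size=4), rng.normal(size=1)]

    print(forward(0.1, weights, biases))   # network output for input x = 0.1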
The usefulness of artificial neural networks rests on the existence of several training algorithms. This process is called learning, since it requires no a priori knowledge of the functional form describing the experimental data. In particular, in this thesis we have introduced the so-called genetic algorithms for the training of artificial neural networks. Genetic algorithms have a number of advantages over deterministic minimization methods that make them preferable for problems which, like those addressed in this thesis, are highly nonlinear and have an enormous parameter space.
The advantages of genetic algorithms, which as their name indicates are inspired by the mechanisms of evolution and natural selection observed in nature, can be summarized in three points. First of all, these algorithms work simultaneously on populations of solutions, which allows them to explore different regions of parameter space at the same time. Second, they need no additional information about the function to be minimized, such as its gradient. Finally, they combine stochastic elements applied under deterministic rules, which improves their efficiency in problems with many local extrema, without the loss of effectiveness that a purely random search would entail.
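A minimal Python sketch of such a genetic minimizer is given below; it evolves a population of parameter vectors by mutation and selection on a generic figure of merit. It is only meant to illustrate the three points above, and the population size, mutation rate and fitness function are arbitrary choices, not those of the actual fits.

    import numpy as np

    def genetic_minimize(fitness, n_par, n_pop=50, n_gen=200, mut_size=0.1, seed=0):
        """Minimize fitness(params) with a simple mutation-plus-selection loop."""
        rng = np.random.default_rng(seed)
        # Work on a whole population of candidate solutions at once.
        pop = rng.normal(size=(n_pop, n_par))
        for _ in range(n_gen):
            # Mutate: add random shifts (no gradient information is needed).
            children = pop + mut_size * rng.normal(size=pop.shape)
            candidates = np.vstack([pop, children])
            # Deterministic selection of the best individuals.
            scores = np.array([fitness(p) for p in candidates])
            pop = candidates[np.argsort(scores)[:n_pop]]
        return pop[0]

    # Toy figure of merit: a chi^2-like function with many local structures.
    def chi2_like(p):
        return np.sum((p - 1.0) ** 2) + 0.3 * np.sum(np.sin(5.0 * p) ** 2)

    best = genetic_minimize(chi2_like, n_par=5)
    print(best, chi2_like(best))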
Finally, in the third part of the Chapter we analyze the statistical tools that allow us to validate the result obtained, that is, to determine quantitatively how well the probability density that we have constructed reproduces the features of the experimental data, as well as its dependence on various parameters, such as the number of neural networks used in the parametrization. These statistical techniques also make it possible to determine on solid grounds features of the probability density such as the training length of the neural networks or the optimal value of the $\chi^2$ error function.
The set of artificial neural networks trained on the ensemble of Monte Carlo replicas of the experimental data for a function $F$ defines the probability density in the space of functions $F$ that we were seeking. With this probability density we can compute the expectation values of arbitrary functionals of the function $F$, $\mathcal{F}[F]$, from the ensemble of networks, just as with ordinary probability distributions. In this way we can determine the mean of $F$, its variance and its correlations, using the standard definitions of these statistical estimators.
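In practice these estimators reduce to averages over the replica sample, as in the following short Python illustration (the replica predictions here are generated at random purely as placeholders):

    import numpy as np

    # F_rep[k, j]: prediction of the k-th replica network at the j-th point x_j.
    rng = np.random.default_rng(2)
    F_rep = 0.5 + 0.05 * rng.normal(size=(100, 10))   # placeholder replica sample

    mean = F_rep.mean(axis=0)                  # <F(x_j)> over the ensemble
    var  = F_rep.var(axis=0, ddof=1)           # variance of F(x_j)
    corr = np.corrcoef(F_rep, rowvar=False)    # correlation between points x_i, x_j

    print(mean, np.sqrt(var))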
Chapter 6 describes in detail four different applications of the technique introduced in Chapter 5. First we analyze the parametrization of the spectral function $\rho_{V-A}(s)$, that is, the difference between the spectral functions of the vector and axial channels, in the hadronic decays of the $\tau$ lepton, which have been measured with great precision at LEP (the Large Electron Positron collider), the predecessor of the LHC at CERN. As a by-product of this analysis we determine the QCD vacuum condensates $\langle O_k \rangle$, which are nonperturbative parameters that must be extracted from experimental data. The determination of these condensates from experimental data provides information on fundamental aspects of Quantum Chromodynamics, such as the mechanism of chiral symmetry breaking, and has been the subject of intense study in recent years. In Fig. 2.2 we show the values obtained for the QCD vacuum condensates $\langle O_k \rangle$, as described in the corresponding section of this thesis.
Figure 2.2:
The results for the QCD vacuum condensates obtained in this thesis, as a function of the integration
parameter $s_0$. The error bands correspond to 1-$\sigma$ uncertainties coming from the parametrization of $\rho_{V-A}(s)$.
at large values of the variable $x$, the two results are perfectly consistent, as expected.
Figure 2.3:
The results for the proton structure function $F_2^p(x, Q^2)$ obtained in this thesis, compared with the
original results of Ref. [11].
In the third place, we study the semileptonic decays of the B meson, compare our results with a number of theoretical predictions, and extract the b quark mass $m_b$ from the parametrization of the lepton energy spectrum. The determination of nonperturbative parameters, such as the heavy quark masses or the elements of the CKM matrix, is one of the main motivations for the theoretical and experimental analyses of B meson decays. In Fig. 2.4 we show the results obtained for the parametrization of the lepton spectrum, for different combinations of experiments included in the fit, as described in detail in Section 6.3.
This application demonstrates how our general technique for parametrizing experimental data can be applied to the reconstruction of a function when the only accessible experimental information consists of truncated integrals of that function. This allows a more general comparison of theoretical predictions with experimental data, without additional assumptions and with a faithful estimation of the experimental errors. Moreover, the development of these techniques will make it possible in the future to apply them to other situations of interest in B meson physics, such as the parametrization of the shape function, which contains the dominant nonperturbative effects in a number of processes such as radiative B meson decays.
The most important application of all is described in the last section of Chapter 6: the parametrization of parton distributions. We first describe a new technique to implement the dependence of these distributions on the energy $Q^2$, and then describe the construction of the probability density in the space of nonsinglet parton distributions, starting from experimental data on structure functions. In Fig. 2.5 we show the results for the nonsinglet parton distribution obtained in this thesis for two different values $Q^2_{\rm min}$ of the kinematical cut. This kinematical cut is required because only the experimental data on the structure function $F_2^{NS}(x, Q^2)$ with sufficiently large $Q^2$ can be treated within the perturbation theory of Quantum Chromodynamics.
In Fig. 2.6 we show the scheme of the procedure used in the parametrization of the nonsinglet parton distribution.
Figure 2.4:
Comparison of the results for the parametrization of the lepton energy spectrum with all experiments
included in the fit, with the case where only the data of the BaBar experiment are considered.
Figure 2.5:
Comparison of the results for the parton distribution $xq_{NS}(x, Q_0^2)$ for two different values of the
kinematical cut: the reference one, $Q^2_{\rm min} = 3$ GeV$^2$, and a more conservative one, $Q^2_{\rm min} = 9$ GeV$^2$.
Figure 2.6:
General strategy followed for the parametrization of the nonsinglet parton distribution $q_{NS}(x, Q_0^2)$ from
experimental data on the structure function $F_2^{NS}(x, Q^2)$.
Chapter 3
The dependence of the parton distributions on the scale $Q^2$ is determined in QCD perturbation theory by the so-called parton evolution equations. Note that in general different combinations of parton distributions contribute to different observables. The inclusion of a wide variety of hard-scattering data is thus crucial to disentangle the various parton distributions. Even if the backbone of the determinations of parton distributions is the high precision deep-inelastic scattering data, essential experimental input comes from other measurements like jet production, heavy boson production or the Drell-Yan process. In Fig. 3.2 we show the set of parton distribution functions from a recent global QCD analysis.
The requirements of precision physics at hadron colliders, especially the Large Hadron Collider,
Figure 3.1:
The location of the LHC experiment near Geneva (left) and one of its detectors, ATLAS (right).
make it now mandatory to determine accurately not only the parton distributions themselves but also the uncertainties associated with them. Note that a detailed knowledge of parton distributions and their associated uncertainties is essential if one wants to perform accurate measurements, since a typical LHC cross section $\sigma(x, Q^2)$ reads
$$ \sigma(x, Q^2) = C_{ij}\!\left(x, \alpha_s(Q^2)\right) \otimes q_i(x, Q^2) \otimes q_j(x, Q^2) , \qquad (3.2) $$
which involves the convolution of two parton distributions, one for each of the two partons in the initial state of the proton-proton collision. The problem is especially acute since the kinematic region covered by the LHC overlaps only partially with the kinematical range of the experiments used to determine the parton distributions, like HERA, as can be seen in Fig. 3.3, and therefore one has to extrapolate the parton distributions into an unknown kinematical region. Also for this reason it is essential to determine the uncertainties in the parton distributions and propagate them into the extrapolation region probed by the LHC.
The main problem to be faced here is that one is trying to determine an uncertainty on a function, i.e., a probability measure on a space of functions, and to extract it from a finite set of experimental data. Within the framework of the standard approach to the determination of parton distributions from experimental data, several techniques have been constructed to assess these uncertainties. However, even if all these techniques have been useful to estimate the size of the uncertainties, they suffer from several drawbacks. First of all, since in the standard approach parton distributions are parametrized with relatively simple functional forms like
$$ q_i(x, Q_0^2) = A_i\, x^{b_i} (1 - x)^{c_i} \left( 1 + d_i x + e_i x^2 + \ldots \right) , \qquad (3.3) $$
where the parameters $A_i, b_i, \ldots$ are fitted from experimental data, the estimation of the uncertainties is restricted to the space parametrized by Eq. 3.3 and therefore depends heavily on the assumptions made for the non-perturbative shapes of the parton distributions. Second, in order to propagate the uncertainties in the parton distributions to an arbitrary observable, linearized approximations in the error propagation have to be used, whose validity is at best doubtful. Finally, in the presence of experimental data from different experiments, it has been argued that incompatibilities between different experiments force the tolerance condition on the $\chi^2$ used to estimate the errors to be not the textbook value $\Delta\chi^2 = 1$ but a much larger value, $\Delta\chi^2 = 50$ or 100. Even if this choice can be justified to some extent, in practice parton uncertainties determined with this condition lose their statistical meaning, since they depend on the choice of the free parameter $\Delta\chi^2$.
Figure 3.2:
Parton distribution functions $q_i(x, Q_0^2)$ as determined from experimental data in a recent QCD global
analysis [12].
Therefore, in view of the crucial importance of parton distribution functions for LHC precision physics, it is worth investigating alternative approaches for the determination of parton distributions and their associated uncertainties that bypass the problems of the standard approach discussed above. In Ref. [11] a novel technique was presented to determine a probability measure in a space of functions, which was applied to the parametrization of deep-inelastic structure functions. This technique uses a combination of Monte Carlo samplings of the experimental data together with artificial neural networks as universal unbiased interpolants. The use of neural networks instead of fixed functional forms as in Eq. 3.3 avoids the bias introduced by the assumption of a given functional form, and the Monte Carlo sampling provides a statistically rigorous estimation of the function uncertainties, which allows for a general error propagation to arbitrary observables without the need of linearized approximations.
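The difference between the two ways of propagating errors can be made concrete with a few lines of Python: uncertainties are propagated either by applying a (nonlinear) observable replica by replica, or through the usual linearized formula. Everything in the snippet (the Gaussian replica sample and the choice of observable) is a toy illustration, not the observables used in the thesis.

    import numpy as np

    rng = np.random.default_rng(3)

    # Replicas of a parametrized quantity f (here a placeholder Gaussian sample).
    f_rep = rng.normal(loc=1.0, scale=0.3, size=10000)

    def observable(f):
        # A deliberately nonlinear observable built from the parametrized function.
        return np.exp(f)

    # Monte Carlo propagation: apply the observable replica by replica.
    mc_err = observable(f_rep).std()

    # Linearized propagation: |dO/df| at the central value times the error on f.
    lin_err = np.abs(observable(f_rep.mean())) * f_rep.std()   # d/df exp(f) = exp(f)

    print(mc_err, lin_err)   # the two estimates differ once the nonlinearity matters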
In this thesis we extend the work of Ref. [11]¹ in different ways: first, we have applied the general strategy to parametrize experimental data to other processes of interest, in particular to the case of parton distribution functions. In all these cases the relation between the parametrized quantity and the experimental data was completely different, so it has been shown that the approach of Ref. [11] is suited for a variety of situations. Second, we have extended the basic technique with the introduction of new minimization algorithms, especially genetic algorithms, as well as new statistical estimators to assess the stability of the obtained probability measure in different respects. Finally, several improvements of the neural network training have been introduced to optimize its efficiency.
The outline of the present thesis is as follows. In Chapter 4 we review the basic elements of Quantum Chromodynamics, as well as those high energy processes that will be used later in the thesis. We also present a review of the standard approach to parametrize parton distribution
¹ See also Ref. [13].
Figure 3.3:
The kinematical coverage of the LHC compared to that of the HERA collider and fixed target
deep-inelastic scattering experiments.
functions with their associated uncertainties. In Chapter 5 we introduce the general technique to
parametrize experimental data in an unbiased way with faithful estimation of the uncertainties,
which is the main subject of this thesis. Then in Chapter 6 we present four applications of the
general strategy, and special attention is paid to the most important one, the parametrization of
the nonsinglet parton distribution. Two appendices summarize background material on statistical
analysis of experimental data and on the current status of the standard approach to global fits of
parton distributions.
Chapter 4
In this first Chapter we review the basic elements of Quantum Chromodynamics (QCD), the gauge
theory of the strong interactions. After a brief description of the theoretical foundations of QCD,
we describe in some detail the deep-inelastic scattering process. This process is the most important
source of information on parton distribution functions, whose parametrization, as we have discussed
in the Introduction, is the main motivation for the set of techniques developed in the present thesis.
We will also consider two other high energy processes, since they have provided testing grounds for our strategy to parametrize experimental data: the semileptonic decays of the B meson and the hadronic tau decays.
Parton distribution functions, as has been discussed in the Introduction, have to be extracted from experimental data by means of a QCD analysis of a variety of hard scattering measurements. In the second part of this Chapter we present a summary of the standard approach to global fits of parton distributions, and we discuss in some detail the different methods, with their advantages and drawbacks, which are commonly used to estimate the associated uncertainties of parton distribution functions.
Let us describe the elements of the above equation. The $N_f$ spinor quark fields of different flavor are labeled $q_i$, each with mass $m_i$. The Dirac matrices $\gamma^\mu$ appear due to the fermionic nature of quarks, and are defined by the anticommutation relation $\{\gamma^\mu, \gamma^\nu\} = 2 g^{\mu\nu}$. The covariant derivative reads, in terms of the gluon field $A^A_\mu$,
$$ \left( D_\mu \right)_{ab} = \partial_\mu\, \delta_{ab} + i g\, t^A_{ab} A^A_\mu , \qquad (4.2) $$
where $\mu$ is a Lorentz index, $\mu = 0, 1, 2, 3$, and $a, b$ are color indices in the fundamental representation, $a, b = 1, \ldots, N$. The matrices $t^A$, where $A$ is a color index in the adjoint representation, $A = 1, \ldots, N^2 - 1$, are the SU(N) generators in the fundamental representation. The last term in Eq. 4.1 is the field-strength tensor of the gluon field,
$$ F^A_{\mu\nu} = \partial_\mu A^A_\nu - \partial_\nu A^A_\mu - g f^{ABC} A^B_\mu A^C_\nu , \qquad (4.3) $$
where $f^{ABC}$ are the structure constants of SU(3), defined by the commutation relation
$$ \left[ t^A, t^B \right] = i f^{ABC}\, t^C . \qquad (4.4) $$
The last term in Eq. 4.3 describes the self-interaction of the gluons, a typical feature of non-abelian gauge theories like QCD which renders the theory asymptotically free, as discussed below. Finally, $g$ is the QCD coupling constant, which is, together with the quark masses, the only free parameter of the theory.
From this Lagrangian, using the standard rules of Quantum Field Theory [16], one can compute several observables, like cross sections or decay rates, as a perturbative series expansion in powers of $\alpha_s = g^2/4\pi$. Radiative quantum corrections induce a dependence of the strong coupling on the typical energy of the process $E$, which at lowest order reads
$$ \alpha_s = \alpha_s(E) = \frac{1}{\beta_0 \ln\left( E^2 / \Lambda^2_{\rm QCD} \right)} , \qquad (4.5) $$
where $\beta_0$ is the first coefficient of the QCD $\beta$ function, which determines the running of $\alpha_s$ with the energy. The main feature of QCD can be seen from Eq. 4.5: the theory is asymptotically free [17, 18], which means that the coupling constant vanishes when the typical energies of the process become very large with respect to the characteristic scale of the theory, $\Lambda_{\rm QCD}$. Asymptotic freedom allows us to apply QCD to many high energy processes for which perturbation theory is meaningful, since in this case $E \gg \Lambda_{\rm QCD}$ and therefore $\alpha_s(E) \ll 1$. On the other hand, the same asymptotic freedom property implies that the theory becomes strongly coupled at low energies, $E \sim \Lambda_{\rm QCD}$, and in this non-perturbative regime the standard tools of perturbation theory are useless, and one has to resort to other methods, like lattice computations [19]. In Fig. 4.1 we show a comparison of different extractions of the strong coupling $\alpha_s(E)$ with the theoretical QCD predictions [20].
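As a concrete illustration of Eq. 4.5, a few lines of Python suffice to evaluate the lowest-order running coupling; the values of $\beta_0$ (for five active flavors) and $\Lambda_{\rm QCD}$ used below are only indicative.

    import math

    def alpha_s_LO(E, n_f=5, lambda_qcd=0.2):
        """Lowest-order running coupling of Eq. 4.5, with E and Lambda_QCD in GeV."""
        beta0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)   # one-loop beta-function coefficient
        return 1.0 / (beta0 * math.log(E**2 / lambda_qcd**2))

    # The coupling decreases logarithmically as the energy increases (asymptotic freedom).
    for E in (2.0, 10.0, 91.2, 1000.0):
        print(E, alpha_s_LO(E))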
For single scale observables, that is, for processes which depend only on a single hard scale $Q^2$ (where a hard scale is an energy scale which satisfies the condition $Q^2 \gg \Lambda^2_{\rm QCD}$), the perturbative expansion in powers of the strong coupling reads, for the processes we are interested in,
$$ R(Q^2) = \sum_{n=1}^{\infty} a_n\, \alpha_s^n(Q^2) , \qquad (4.6) $$
where without loss of generality we have assumed that the perturbative expansion of the observable $R(Q^2)$ starts at order $\alpha_s(Q^2)$. The best example of this first case is the total cross section of $e^+e^-$ annihilation into hadrons, where the hard scale is identified with the center of mass energy of the collision, $Q^2 = s$.
Perturbative expansions become more complicated as the number of hard scales present in the process increases. In this case it is not always true that observables can be expanded in a simple power series in $\alpha_s(Q^2)$. To be definite, let us consider the deep-inelastic scattering process, which is the best known example of a two-scale process, and which will be discussed in detail in the next Section. In this case the two hard scales will be denoted by $Q^2$ and $W^2$. In deep-inelastic scattering, as we will see shortly, the nonsinglet structure function at leading order is given by
$$ F_2^{NS}(N, Q^2) = \left[ \frac{\alpha_s(Q^2)}{\alpha_s(Q_0^2)} \right]^{\gamma(N)/\beta_0} F_2^{NS}(N, Q_0^2) , \qquad (4.7) $$
Figure 4.1:
Comparison of different determinations of the strong coupling $\alpha_s(Q)$ at several energies $Q$ from a variety of
processes, as summarized in Ref. [20].
where $N$ is the variable conjugate to the ratio $x = Q^2/(Q^2 + W^2)$, and therefore $F_2^{NS}$ cannot be expanded in integer powers of $\alpha_s(Q^2)$. Another example is the singlet structure function at small $x$ [21], which behaves as
$$ F_2(x, Q^2) \;\underset{x \to 0}{\sim}\; \exp \sqrt{ \ln(1/x)\, \ln \ln \left( 1/\alpha_s(Q^2) \right) } , \qquad (4.8) $$
which again is not expandable in a simple series in powers of $\alpha_s(Q^2)$.
In some cases, however, perturbative computations which depend on two hard scales have a perturbative expansion of a form similar to Eq. 4.6, that is, an expansion in integer powers of the strong coupling, like the deep-inelastic scattering coefficient functions, which are of the form
$$ C(Q^2, W^2) = \sum_{k=1}^{\infty} \alpha_s^k(Q^2)\, b_k\!\left( \frac{Q^2}{W^2} \right) . \qquad (4.9) $$
Note that, unlike in Eq. 4.6, the coefficients of the expansion $b_k$ are not pure numbers but functions of the ratio of scales $Q^2/W^2$. In these coefficients $b_k$ one can encounter large logarithmic terms of the form $\ln^p\!\left(Q^2/W^2\right)$. These logarithmic terms can become large in some kinematical regions and need to be resummed to all orders [22] in order to trust perturbation theory. Note that typical perturbative expansions like Eqs. 4.6 and 4.9 are at best asymptotic expansions, and often their large order behavior introduces divergences, leading to ambiguities in the value of the summed perturbative expansion related to nonperturbative corrections [23]. Note also that since the coupling $\alpha_s(Q^2)$ increases as the value of $Q^2$ decreases, the perturbative expansions Eqs. 4.6 and 4.9 are meaningful only for large enough values of $Q^2 \gg \Lambda^2_{\rm QCD}$, that is, for the so-called hard processes [24].
The foundation for results like Eqs. 4.6 and 4.9 is the Operator Product Expansion. This expansion allows one to organize any quantity as a series in inverse powers of $Q^2$, the typical energy scale of the process, by means of an expansion of composite operators in terms of simpler operators of the appropriate dimension. The Operator Product Expansion [25] can be used to parametrize power-suppressed nonperturbative effects, which are mostly relevant at low $Q^2$, in terms of matrix elements of local operators, and in this way to extend the validity range of theoretical predictions. For example, for a single scale observable, the operator product expansion reads schematically
$$ R\left(Q^2\right) = \sum_{k=0}^{\infty} c_k\!\left(\alpha_s(Q^2)\right) \frac{\langle O_k \rangle}{Q^k} , \qquad (4.10) $$
where $\langle O_k \rangle$ are nonperturbative expectation values of operators with the appropriate dimensions, and where the leading order result corresponds to the unit operator, $O_0 = 1$,
$$ c_0\!\left(\alpha_s(Q^2)\right) = \sum_{n=1}^{\infty} a_n\, \alpha_s^n(Q^2) . \qquad (4.11) $$
It is clear from Eq. 4.10 that at large enough values of $Q^2$ only the leading term in the OPE is relevant for phenomenological purposes. The nonperturbative matrix elements $\langle O_k \rangle$ in Eq. 4.10 can be extracted from experimental data in some processes and then used in other processes to increase the accuracy of the theoretical prediction, as in the case of parton distribution functions, which will be analyzed in detail in Section 4.1.2.
Now we review the high energy processes that will be used in applications of the general technique
to parametrize experimental data which is the main subject of this thesis: deep-inelastic scattering,
semileptonic B meson decays and hadronic tau decays.
$$ x = \frac{Q^2}{2\, p \cdot q} , \qquad Q^2 = -q^2 , \qquad (4.12) $$
where $q$ is the momentum carried by the virtual gauge boson and $p$ is the momentum carried by the incoming proton. Other important variables are the invariant mass of the final hadronic state $W^2$ and the inelasticity $y$, given by
$$ W^2 = Q^2\, \frac{1 - x}{x} , \qquad y = \frac{q \cdot p}{k \cdot p} , \qquad (4.13) $$
where $k$ is the momentum carried by the initial state lepton. The applicability of perturbation theory requires that both scales $Q^2$ and $W^2$ are large, because otherwise either higher twist corrections¹ become relevant, or the perturbative expansion breaks down due to the presence of large logarithms of the form $\alpha_s^k(Q^2) \ln^k(1 - x)$. The kinematical cut in $W^2$ can be lowered by including the effects of threshold resummation [32]. Note that even if $Q^2$ is large, $W^2$ can be small provided $x$ is large enough. The relation between perturbative threshold resummation and higher twist nonperturbative corrections is still an open issue [4].
Figure 4.2:
The deep-inelastic scattering process: the hard scattering of a lepton off a hadron, typically a
proton.
The deep-inelastic scattering cross section can be decomposed, using kinematics and Lorentz invariance, in terms of structure functions $F_i(x, Q^2)$. These structure functions parametrize the structure of the proton as seen by the virtual gauge boson. If the incoming lepton is a charged lepton (an electron or a muon), then for $Q^2 \ll M_Z^2$ the double differential deep-inelastic scattering cross section reads
$$ \frac{d^2\sigma^{\rm em}}{dx\, dQ^2} = \frac{4\pi\alpha^2}{Q^4} \left[ \left( 1 + (1-y)^2 \right) F_1(x, Q^2) + \frac{1-y}{x} \left( F_2(x, Q^2) - 2x F_1(x, Q^2) \right) \right] , \qquad (4.14) $$
where $\alpha$ is the electromagnetic coupling. If the incoming lepton is a neutrino, then the cross section
¹ The twist expansion is the operator product expansion applied to the deep-inelastic scattering process.
reads
$$ \frac{d^2\sigma^{\nu p}}{dx\, dy} = \frac{G_F^2\, M E}{\pi} \left[ \left( 1 - y - \frac{M x y}{2 E} \right) F_2(x, Q^2) + y^2 x F_1(x, Q^2) + y \left( 1 - \frac{y}{2} \right) x F_3(x, Q^2) \right] , \qquad (4.15) $$
where $G_F$ is the Fermi constant, $M$ the mass of the target hadron and $E$ the neutrino energy in the hadron rest frame. Note that Eq. 4.15 holds both for charged current ($W^\pm$ exchange) and neutral current ($Z$ exchange) neutrino scattering, even if the decomposition of the structure functions $F_i(x, Q^2)$ in terms of parton distributions is different in the two cases [33]. All the structure functions defined above have been measured in several experiments, and the most precisely known is the charged lepton neutral current structure function $F_2(x, Q^2)$, thanks to the high precision measurements at HERA and in fixed target experiments. In Fig. 4.3 we show a summary of the available data on this structure function from different experiments. Note that the effects of QCD evolution, that is, the dependence of $F_2(x, Q^2)$ on the scale $Q^2$, are clearly observed in the experimental data, especially in the small-$x$ region.
Figure 4.3:
The deep-inelastic structure function $F_2^p(x, Q^2)$ as measured by several different experiments (HERA,
BCDMS and NMC). Note the dependence of $F_2^p(x, Q^2)$ on $Q^2$, as dictated by perturbative QCD.
Structure functions $F_i(x, Q^2)$ depend on the momentum distribution of partons (quarks and gluons) inside the proton, which is determined by low energy nonperturbative dynamics, and therefore cannot be computed in perturbation theory. However, using the factorization theorem [34], each structure function $F_i(x, Q^2)$ can be written as a convolution of hard-scattering coefficients $C_{ij}(x, \alpha_s(Q^2))$, which depend only on the short-distance (perturbative) physics, and parton distribution functions $q_j(x, Q^2)$ [35], which parametrize the (non-perturbative) structure of the proton. The factorized structure function reads
$$ F_i(x, Q^2) = \int_x^1 \frac{dy}{y}\, C_{ij}\!\left(y, \alpha_s(Q^2)\right)\, q_j\!\left( \frac{x}{y}, Q^2 \right) . \qquad (4.16) $$
The coefficient functions $C_{ij}(x, \alpha_s(Q^2))$ can be computed in perturbation theory as a power series expansion in $\alpha_s(Q^2)$. Parton distributions can be formally defined [36] by means of suitable operator matrix elements in the proton,
$$ q_i\!\left(x, Q^2\right) = \int \frac{dy^-}{4\pi}\, e^{-i x p^+ y^-} \left\langle p \left|\, \bar\psi_i(0, y^-)\, W[y^-, 0]\, \gamma^+\, \psi_i(0) \,\right| p \right\rangle_{R} , \qquad (4.17) $$
where $W[y^-, 0]$ is a Wilson line (a path-ordered exponential of the gluon field) and the parton distributions are renormalized at the scale $\mu^2 = Q^2$. Since these parton distributions are non-perturbative, they must be determined from experimental data. The techniques used to extract parton distributions from hard scattering data together with the associated uncertainties will be discussed in the next Section.
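To make Eq. 4.16 concrete, the short Python sketch below evaluates such a convolution numerically for a toy coefficient function and a toy parton distribution; both functions and the quadrature are illustrative only, not those used in the fits described later.

    import numpy as np

    def convolve(C, q, x, Q2, n=2000):
        """Numerically evaluate F(x, Q2) = int_x^1 dy/y C(y, Q2) q(x/y, Q2), as in Eq. 4.16."""
        y = np.linspace(x, 1.0, n)
        integrand = C(y, Q2) * q(x / y, Q2) / y
        # Composite trapezoidal rule over the grid in y.
        return ((integrand[:-1] + integrand[1:]) * 0.5 * np.diff(y)).sum()

    # Toy inputs (purely illustrative shapes, not real QCD quantities):
    C_toy = lambda y, Q2: 1.0 + 0.2 * np.log(Q2) * (1.0 - y)   # smooth "coefficient function"
    q_toy = lambda z, Q2: z**0.5 * (1.0 - z)**3                 # valence-like "parton distribution"

    print(convolve(C_toy, q_toy, x=0.1, Q2=10.0))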
At leading order in the strong coupling $\alpha_s(Q^2)$, the expressions for the different structure functions in terms of parton distribution functions are relatively simple. For example, for charged lepton scattering off a proton target one has
$$ F_2^{\rm em}(x, Q^2) = x \left[ \frac{4}{9} \left( u + \bar u + c + \bar c \right) + \frac{1}{9} \left( d + \bar d + s + \bar s \right) \right]\!\left(x, Q^2\right) , \qquad (4.18) $$
$$ \frac{d^2\sigma}{dx\, dQ^2} = \frac{4\pi\alpha^2}{x Q^4} \left[ x y^2 F_1^{NC}(x, Q^2) + (1 - y) F_2^{NC}(x, Q^2) + y \left( 1 - \frac{y}{2} \right) x F_3^{NC}(x, Q^2) \right] , \qquad (4.22) $$
where in terms of parton distribution functions one has
$$ F_2^{NC}(x) = 2x F_1^{NC}(x) = x \sum_{i=1}^{N_f} \left[ q_i(x) + \bar q_i(x) \right] C_q(Q^2) , \qquad (4.23) $$
$$ x F_3^{NC}(x) = x \sum_{i=1}^{N_f} \left[ q_i(x) - \bar q_i(x) \right] D_q(Q^2) , \qquad (4.24) $$
$$ D_q(Q^2) = -2 e_q A_e A_q P_Z + 4 V_e A_e V_q A_q P_Z^2 , \qquad P_Z = \frac{Q^2}{Q^2 + M_Z^2} , \qquad (4.26) $$
where $e_q$ are the electromagnetic charges of the quarks and $V_i$ and $A_i$ are the vector and axial couplings of the fermions to the $Z$ boson. Note the appearance of the parity-violating structure function $F_3(x, Q^2)$, which is sensitive to the helicity of the incoming leptons (that is, it is different, for example, for an electron and for a positron).
Even if parton distribution functions $q_i(x, Q_0^2)$ are of non-perturbative origin, it can be shown that their dependence on the scale $Q^2$ is dictated by perturbative QCD, provided the scale $Q^2$ is large enough. The dependence of the parton distributions on the scale $Q^2$, also known as their evolution with $Q^2$, is governed by the perturbative DGLAP [38, 39, 40] evolution equations. These equations can be used to evolve with $Q^2$ any combination of parton distributions; however, their form is much simpler if suitable combinations are defined. For nonsinglet combinations of parton distributions, defined as differences between quark distributions,
$$ q_{NS, ij}(x, Q_0^2) \equiv \left( q_i - q_j \right)(x, Q_0^2) , \qquad (4.27) $$
where $i, j$ label either a quark or an antiquark, the DGLAP evolution equation reads
$$ \frac{d\, q_{NS}(x, Q^2)}{d \ln Q^2} = \frac{\alpha_s(Q^2)}{2\pi} \int_x^1 \frac{dy}{y}\, P_{NS}\!\left(y, \alpha_s(Q^2)\right)\, q_{NS}\!\left( \frac{x}{y}, Q^2 \right) , \qquad (4.28) $$
where $P_{NS}(x, \alpha_s(Q^2))$ are the nonsinglet splitting functions. These splitting functions can be computed perturbatively as an expansion in powers of $\alpha_s(Q^2)$. For instance, the leading order expression for the nonsinglet splitting function is given by
$$ P_{NS}^{(0)}(x) = C_F \left[ \frac{1 + x^2}{(1 - x)_+} + \frac{3}{2}\, \delta(1 - x) \right] . \qquad (4.29) $$
It is clear from its definition that the gluon decouples from the evolution of nonsinglet parton distributions. The remaining independent combination of parton distributions is called the singlet parton distribution, defined as the sum over all quark and antiquark flavors,
$$ \Sigma(x, Q^2) \equiv \sum_{i=1}^{N_f} \left[ q_i(x, Q^2) + \bar q_i(x, Q^2) \right] . \qquad (4.30) $$
In the singlet sector, the DGLAP equation is a two-dimensional matrix equation: the singlet distribution evolves coupled to the gluon distribution according to the singlet DGLAP evolution equation,
$$ \frac{d}{d \ln Q^2} \begin{pmatrix} \Sigma(x, Q^2) \\ g(x, Q^2) \end{pmatrix} = \frac{\alpha_s(Q^2)}{2\pi} \int_x^1 \frac{dy}{y} \begin{pmatrix} P_{qq}(y) & P_{qg}(y) \\ P_{gq}(y) & P_{gg}(y) \end{pmatrix} \begin{pmatrix} \Sigma(x/y, Q^2) \\ g(x/y, Q^2) \end{pmatrix} , \qquad (4.31) $$
in terms of the singlet matrix of splitting functions.
The DGLAP evolution equations, Eqs. 4.28 and 4.31, can be solved using a wide variety of techniques. A particularly useful method is the transformation of the evolution equations to Mellin space, also known as moment space, using the integral transformation
$$ q_i\!\left(N, Q^2\right) \equiv \int_0^1 dx\, x^{N-1}\, q_i\!\left(x, Q^2\right) . \qquad (4.32) $$
In Mellin space the nonsinglet DGLAP evolution equation, Eq. 4.28, is no longer an integro-differential equation but rather a simple differential equation,
$$ \frac{d\, q_{NS}(N, Q^2)}{d \ln Q^2} = \frac{\alpha_s(Q^2)}{2\pi}\, \gamma_{NS}\!\left(N, \alpha_s(Q^2)\right)\, q_{NS}\!\left(N, Q^2\right) , \qquad (4.33) $$
where the anomalous dimension $\gamma_{NS}(N, \alpha_s(Q^2))$ is the Mellin transform of the splitting function,
$$ \gamma_{NS}\!\left(N, \alpha_s(Q^2)\right) = \int_0^1 dx\, x^{N-1}\, P_{NS}\!\left(x, \alpha_s(Q^2)\right) . \qquad (4.34) $$
The main advantage of this method is that in Mellin space the DGLAP equations can be solved analytically. In the nonsinglet sector, for example, Eq. 4.33 has the solution
$$ q_{NS}(N, Q^2) = \Gamma\!\left(N, \alpha_s(Q^2), \alpha_s(Q_0^2)\right)\, q_{NS}(N, Q_0^2) , \qquad (4.35) $$
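The structure of Eq. 4.33 can be made explicit with a few lines of Python that integrate it numerically for a fixed moment N, using the lowest-order running coupling of Eq. 4.5 and a toy anomalous dimension; every numerical value below is illustrative rather than taken from the actual fits.

    import math

    def alpha_s_LO(Q2, n_f=4, lambda2=0.04):
        # Lowest-order coupling of Eq. 4.5, with Q2 and Lambda^2 in GeV^2.
        beta0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
        return 1.0 / (beta0 * math.log(Q2 / lambda2))

    def evolve_moment(qN0, Q2_0, Q2, gammaN, n_steps=1000):
        """Integrate Eq. 4.33, dq/dlnQ2 = alpha_s/(2 pi) * gamma(N) * q, with Euler steps."""
        t0, t1 = math.log(Q2_0), math.log(Q2)
        dt = (t1 - t0) / n_steps
        q = qN0
        for i in range(n_steps):
            t = t0 + i * dt
            q += dt * alpha_s_LO(math.exp(t)) / (2.0 * math.pi) * gammaN * q
        return q

    # Toy second moment of a nonsinglet distribution, evolved from 4 GeV^2 to 100 GeV^2
    # with an illustrative (negative) anomalous dimension.
    print(evolve_moment(qN0=0.25, Q2_0=4.0, Q2=100.0, gammaN=-1.2))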
The first term on the right-hand side is the naive result of the parton model. Note that, in contrast to the case of the GLS sum rule discussed above, the Gottfried sum rule has no justification in full QCD, and indeed it is violated, as was observed from experimental measurements of the nonsinglet structure function by the NMC collaboration [44]. In this case perturbative corrections turn out to be negligible. The measurement of this sum rule showed that isospin symmetry of the sea quarks does not hold in the proton, that is, $\bar u(x) \neq \bar d(x)$.
that the situation was considerably different for hadrons with at least one heavy quark, where heavy means that its mass $m_H$ satisfies
$$ m_H \gg \Lambda_{\rm QCD} . \qquad (4.39) $$
This is so because the heavy quark mass provides a large mass scale, so that a variety of quantities related to such systems, like decay rates or spectroscopic properties, can be computed in perturbation theory as an expansion in $\alpha_s(m_H^2) \ll 1$. The heavy quark condition Eq. 4.39 is satisfied by the charm, bottom and top quarks, even if in the latter case it is of no practical interest, since the top quark does not hadronize due to its short lifetime.
For this reason, hadronic states with b quarks have become a theoretical laboratory for perturbative QCD. They have proved to be a useful environment for the development of several effective field theories, like Non-Relativistic QCD [45], Heavy Quark Effective Theory [46] and more recently the Soft Collinear Effective Theory [47, 48]. All these effective theories make use, in one form or another, of the condition Eq. 4.39 to simplify the dynamics of the relevant processes.
In this thesis we will focus our attention, for reasons to be described in the following, on the inclusive semileptonic decays of B mesons into charmed final states. This process is useful to determine the CKM matrix element $V_{cb}$ with high accuracy, as well as the b quark mass $m_b$. The process, $B \to X_c \ell \bar{\nu}$, is represented in Fig. 4.4. The most inclusive observable that can be measured in this process is the total semileptonic decay rate. This decay rate can be written as a perturbative series expansion in $\alpha_s(m_b^2)$, where, as discussed before, the b quark mass $m_b$ plays the role of the hard scale of the process, together with an expansion in inverse powers of the b quark mass parametrized by nonperturbative matrix elements of local operators, in the OPE spirit [49]. This expansion in powers of $1/m_b$ is also known as the heavy quark expansion [50]. The inclusion of the leading nonperturbative effects through the heavy quark expansion is crucial to analyze this process, and in general heavy meson physics, since the heavy quark masses are not so large compared to $\Lambda_{QCD}$. Therefore, typical nonperturbative corrections of order $\mathcal{O}\left(\Lambda_{QCD}/m_b\right)$ have to be taken into account for precision theoretical computations.
With these caveats, we can write for the total B meson semileptonic decay rate the following expression,
\[ \Gamma\left( B \to X_c \ell \bar{\nu} \right) = \frac{G_F^2\, m_b^5}{192\, \pi^3}\, |V_{cb}|^2\, \left( 1 + A_{ew} \right) A_{pert}(\rho) \left[ z_0(\rho) \left( 1 + \frac{\lambda_1}{2 m_b^2} \right) + \frac{\lambda_2}{2 m_b^2}\, g(\rho) + \mathcal{O}\!\left( \frac{1}{m_b^3} \right) \right] \, , \qquad (4.40) \]
\[ z_0(\rho) = 1 - 8\rho + 8\rho^3 - \rho^4 - 12\rho^2 \log\rho \, , \qquad \rho = \frac{m_c^2}{m_b^2} \, , \qquad (4.41) \]
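As a quick numerical check of Eq. 4.41, the phase-space factor $z_0(\rho)$ can be evaluated directly; the charm and bottom masses used below are illustrative values assumed only for this sketch, not taken from the text.

```python
import math

def z0(rho):
    """Tree-level phase-space factor of Eq. (4.41)."""
    return 1 - 8 * rho + 8 * rho**3 - rho**4 - 12 * rho**2 * math.log(rho)

# Illustrative (assumed) quark masses in GeV
mc, mb = 1.25, 4.6
print(z0((mc / mb) ** 2))  # roughly 0.58: a sizeable phase-space suppression
```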
Figure 4.4:
The semileptonic decay of a B meson into a charmed hadronic state and a lepton-neutrino pair.
Much more detailed information on the underlying dynamics of the process can be obtained by
analyzing less inclusive observables. These observables can be constructed from the triple differential
decay rate for this process,
\[ B(p) \to \ell(p_\ell) + \bar{\nu}(p_\nu) + X_c(r) \, , \qquad (4.44) \]
which depends on three different kinematical variables $q^2$, $r$ and $E_\ell$, where $q = p_\ell + p_\nu$ is the total four-momentum of the leptonic system, $r = p - q$ is the four-momentum of the charmed hadronic final state, with invariant mass $r^2 = M_X^2$, and $E_\ell$ is the lepton energy in the rest frame of the decaying b quark. This triple differential distribution can be decomposed, using the kinematics of the process and the symmetries of the theory, in terms of three structure functions,
"
d3 2 G2F |Vcb |2 2
(q , r, El ) = q W1 (q 2 , u)
dq 2 drdEl 16 4
#
q 2
2v pl 2v pl v q + W2 (q 2 , u) + q 2 (2v pl v q) W3 (q 2 , u) , (4.45)
2
where $u^2 = r^2 - m_c^2$, $v = p/m_b$, and the quantities with a hat are dimensionless quantities normalized to $m_b$. All the structure functions $W_i(\hat{q}^2,\hat{u})$ have both a perturbative expansion in powers of $\alpha_s$ and a nonperturbative expansion in powers of $1/m_b$, which can be computed in the framework of the heavy quark expansion. For example, the $\mathcal{O}(\alpha_s)$ corrections for all the differential distributions that can be constructed from Eq. 4.45 have become available recently [51, 52].
Typical observables which are accessible in experiments are convolutions of the differential spectra
with suitable weight functions over a large enough range, with kinematical cuts. A particular case
of these observables are the moments of differential decay distributions. In this thesis we will study
the leptonic moments, defined as
\[ L_n(E_0) \equiv \int_{E_0}^{E_{max}} dE_\ell\, \left( E_\ell \right)^n \int dq^2\, dr\, \frac{d^3\Gamma}{dq^2\, dr\, dE_\ell}\left(q^2,r,E_\ell\right) \, , \qquad (4.46) \]
where E0 is a lower cut on the lepton energy, required experimentally to select this decay mode
from the background from other B meson decays, and Emax is the maximum energy allowed from
Figure 4.5:
The folded lepton energy spectrum as measured by the Babar collaboration as a function of the lepton
momentum (left) and the corresponding leading order theoretical prediction (right).
for precision studies of hadronic physics has been extensively studied [65], especially thanks to the high quality data provided by the LEP accelerator at CERN [66, 67]. Not only do hadronic tau decays provide one of the most precise determinations of the strong coupling $\alpha_s(M_\tau)$, but, since the tau mass is not so large compared to $\Lambda_{QCD}$, the non-perturbative effects, parametrized by vacuum condensates, can be extracted from experimental data in a clean way. In this Section we will briefly review the theoretical foundations of the QCD analysis of hadronic tau decays.
The hadronic decays of the tau lepton (see Fig. 4.6) are of the form
\[ \tau \to \nu_\tau\, X \, , \qquad (4.53) \]
where X is a hadronic system, composed mainly of pions, with vanishing total strangeness. The final hadronic state can be separated into scalar, vector and axial-vector contributions, since parity is maximally violated in $\tau$ decays. The hadronic invariant mass-squared s distribution can be measured for each decay channel. These invariant mass distributions $dN_{V/A}/ds$ are related to the so-called spectral functions $\rho_i(s)$ by
\[ \rho_{V/A}(s) = K_{V/A}(s)\, \frac{dN_{V/A}(s)}{ds} \, , \qquad (4.54) \]
for vector (V) and axial-vector (A) final states, where $K_{V/A}(s)$ is a purely kinematic factor. In Fig. 4.7 we show the contributions from the different decay modes to the vector and the axial-vector spectral functions [65].
Spectral functions are the observables that give access to the inner structure of hadronic tau
decays. For reasons to be described in the following, we are in particular interested in the difference
between the vector and the axial vector spectral functions,
Figure 4.6:
The hadronic decays of the tau lepton. The final state of the decay consists of a hadronic system X, composed mainly of pions, and an undetected neutrino.
is recovered, but current experimental data do not allow one to draw any conclusion on how large s must be in reality. Therefore, the spectral function $\rho_{V-A}(s)$ is generated entirely by nonperturbative QCD dynamics, and provides a laboratory for the study of these non-perturbative contributions, which turn out to be small and therefore difficult to measure in other processes where the perturbative contribution dominates. As will be discussed shortly, these nonperturbative contributions are organized by the operator product expansion and are parametrized by vacuum matrix elements of local operators, the so-called QCD vacuum condensates.
As is well known [63], the basis of the comparison of theoretical predictions with experimental data in the hadronic decays of the tau lepton is the fact that unitarity and analyticity connect the spectral functions of hadronic tau decays to the imaginary part of the hadronic vacuum polarization tensor,
\[ \Pi^{\mu\nu}_{ij,U}(q) \equiv i \int d^4x\, e^{iqx}\, \langle 0 |\, T\!\left( U^{\mu}_{ij}(x)\, U^{\nu}_{ij}(0)^{\dagger} \right) | 0 \rangle \, , \qquad (4.57) \]
of vector $U^{\mu}_{ij} \equiv V^{\mu}_{ij} = \bar{q}_j \gamma^{\mu} q_i$ or axial-vector $U^{\mu}_{ij} \equiv A^{\mu}_{ij} = \bar{q}_j \gamma^{\mu}\gamma_5 q_i$ color-singlet quark currents in the corresponding quantum states. After Lorentz decomposition is used to separate the correlation function into its $J=1$ and $J=0$ components,
\[ \Pi^{\mu\nu}_{ij,U}(q) = \left( -g^{\mu\nu} q^2 + q^{\mu} q^{\nu} \right) \Pi^{(1)}_{ij,U}(q^2) + q^{\mu} q^{\nu}\, \Pi^{(0)}_{ij,U}(q^2) \, , \qquad (4.58) \]
for non-strange quark currents one identifies
\[ \mathrm{Im}\, \Pi^{(1)}_{ud,V/A}(s) = \frac{1}{2\pi}\, \rho_{V/A}(s) \, . \qquad (4.59) \]
Since, as we have shown before, the spectral functions $\rho_{V/A}(s)$ can be measured experimentally, Eq. 4.59 provides the basis for the comparison between theoretical predictions, in the framework of the operator product expansion for the hadronic correlator, Eq. 4.57, and experimental data in terms of spectral functions, Eq. 4.54. This relation then allows us to apply all the technology of QCD vacuum correlation functions to hadronic tau decays.
Figure 4.7:
The different contributions to the total vector (left) and axial-vector (right) spectral functions from the hadronic decays of the $\tau$ lepton as measured by the ALEPH detector.
As has been mentioned before, the basic tool to study in a systematic way the power corrections introduced by nonperturbative dynamics is the operator product expansion. Since the approach of Ref. [25], the operator product expansion has been used to perform QCD calculations in the intermediate energy regions where nonperturbative effects come into play but perturbative QCD is still relevant. In general, the OPE of a two-point correlation function $\Pi^{(J)}(s)$ takes the form [63]
\[ \Pi^{(J)}(s) = \sum_{D=0,2,4,\ldots} \frac{1}{(-s)^{D/2}} \sum_{\mathrm{dim}\,\mathcal{O}=D} C^{(J)}(s,\mu)\, \langle \mathcal{O}(\mu) \rangle \, , \qquad (4.60) \]
where the arbitrary scale $\mu$ separates the long-distance nonperturbative effects, absorbed into the vacuum expectation values $\langle \mathcal{O}(\mu) \rangle$, from the short-distance effects, which are included in the Wilson coefficients $C^{(J)}(s,\mu)$, and $J$ denotes the spin ($J=0,1$) component of the correlator. The operator of dimension $D=0$ is the unit operator (the perturbative series). In the case we are interested in, the vector$-$axial-vector correlator, the operator product expansion has no perturbative term and reads
\[ \Pi_{V-A}(Q^2) = \sum_{n=1}^{\infty} \frac{1}{Q^{2n+4}}\, C_{2n+4}(Q^2,\mu^2)\, \langle \mathcal{O}_{2n+4}(\mu^2) \rangle \, , \qquad (4.61) \]
where we observe that the $D=6$ term is the first non-vanishing non-perturbative contribution, in the limit of massless light quarks, to the $\rho_{V-A}(s)$ spectral function, and, moreover, it has been shown to be the dominant one. Therefore, this spectral function should provide a source for a clean extraction of the value of the nonperturbative contributions from experimental data.
Even if the $\rho_{V-A}(s)$ spectral function is purely non-perturbative, it has to satisfy several sum rules that can be derived rigorously from QCD. Sum rules have always been an important tool for studies of non-perturbative aspects of QCD, and have been applied to a wide variety of processes, from deep-inelastic scattering, as we have seen in Section 4.1.2, to heavy quark bound states [68, 69], introduced in Section 4.1.3. Now we will review one of the classical examples of low-energy QCD sum rules, the chiral sum rules. The application of chiral symmetry together with the optical theorem leads to low-energy sum rules involving the difference of vector and axial-vector spectral functions, $\rho_{V-A}(s)$. These sum rules are dispersion relations between the real and absorptive parts of a two-point correlation function that transforms symmetrically under $SU(2)_L \times SU(2)_R$ in the case of non-strange
The rationale for this functional form is that parton distributions parametrized this way follow quark
counting rules [73] at large x and Regge behavior [74] at small x, and Pi (x) is a smooth polynomial
in x that interpolates between the small-x and the large-x regions. Note that neither of the two
limiting behaviors (large-x and small-x) of the parton distributions can be derived in a rigorous way from Quantum Chromodynamics, so they are phenomenological expectations rather than firm theoretical predictions.
In principle one should parametrize and extract from experimental data the $2N_f+1$ independent parton distributions. In practice, however, one has to make some assumptions, since available data cannot constrain all of them. For example, since there is scarce experimental information on the valence strange distribution $s-\bar{s}$, it is typically set to zero, $s=\bar{s}$. Another example of this kind of simplification was the assumption that the sea $\bar{u}$ and $\bar{d}$ distributions were the same. As more and better data become available, some of these assumptions are shown not to be true, and one has to allow more freedom in the parametrization of the parton distributions. For example, the NMC measurements of the Gottfried sum rule (see [43] and references therein) showed that for the proton $\bar{u}(x,Q^2) \neq \bar{d}(x,Q^2)$. Since that measurement, the assumption $\bar{u}=\bar{d}$ has not been used anymore in global fits of parton distributions.
To be definite, we now show the explicit parametrizations from a recent global analysis of parton distributions [75] by the MRST collaboration. The up and down valence distributions are parametrized as
\[ x u_V(x,Q_0^2) \equiv x \left( u - \bar{u} \right)(x,Q_0^2) = A_u\, x^{b_u} (1-x)^{c_u} \left( 1 + d_u \sqrt{x} + e_u x \right) \, , \qquad (4.67) \]
\[ x d_V(x,Q_0^2) \equiv x \left( d - \bar{d} \right)(x,Q_0^2) = A_d\, x^{b_d} (1-x)^{c_d} \left( 1 + d_d \sqrt{x} + e_d x \right) \, , \qquad (4.68) \]
and the sea combination of parton distributions is given by
\[ x S(x,Q_0^2) = A_s\, x^{b_s} (1-x)^{c_s} \left( 1 + d_s \sqrt{x} + e_s x \right) \, , \qquad (4.69) \]
The total number of fitted parameters in this case is around 20. With the above relatively simple
parametrization one can describe a wealth of hard-scattering processes, thus showing that QCD fac-
torization [34] holds in the majority of high energy processes involving strongly interacting particles.
Note that Ref. [75] allows for the gluon parton distribution to become negative, as can be seen from Eq. 4.71. This would appear to be in conflict with the interpretation of parton distributions as probability distributions of the momentum that quarks and gluons carry inside the proton.
However, it has been emphasized [76] that parton distributions are not physical quantities, in partic-
ular beyond the leading order approximation they depend on the renormalization scheme. Physical
quantities, on the other hand, like cross sections and structure functions, satisfy positivity bounds,
that is, even if parton distributions are allowed to become negative, structure functions are not, since they are observable quantities and therefore must be positive.
The next step of the global fitting approach is to evolve the set of parton distributions that
have been parametrized, Eq. 4.66 at the initial evolution scale Q20 , to the scale Q2 where there
is experimental data using the solution of the DGLAP evolution equations, Eqns. 4.28 and 4.31.
Then for each specific process one adds the contribution from the perturbative coefficient functions
to construct the corresponding observable, for example deep-inelastic structure functions, as in Eq.
4.16. The number of hard-scattering processes that are nowadays used to constrain the shapes of
the different parton distribution functions is rather large. These include:
- deep-inelastic scattering structure functions, $F_2(x,Q^2)$, $F_3(x,Q^2)$ and $F_L(x,Q^2)$, both in charged lepton and in neutrino DIS,
- the Drell-Yan process, $pp \to X l\bar{l}$, in hadronic collisions,
- jet production, both in $ep$ collisions and in $pp$ collisions,
- gauge boson production, $pp \to W(Z)\, X$,
and other processes like prompt photon production and heavy quark production. In any case, it has to be emphasized that deep-inelastic scattering is, and will remain in the coming years, the most important source of information on parton distribution functions, especially the precision structure function measurements of the HERA collider [30]. The LHC will not only make extensive use of parton distributions, but it will also be useful to constrain their shapes, since it probes a kinematic region not accessible at available colliders, for example with processes like the differential rapidity distribution of gauge boson production [31].
The final step of a global fit is to minimize a suitable statistical estimator, for example the diagonal statistical error function
\[ \chi^2(\{a_i\}) = \sum_{i=1}^{N_{dat}} \frac{\left( F_i^{(\mathrm{exp})} - F_i^{(\mathrm{QCD})}(\{a_l\}) \right)^2}{\sigma_{i,\mathrm{stat}}^2} \, , \qquad (4.75) \]
where $F_i^{(\mathrm{exp})}$ is the experimental measurement and $F_i^{(\mathrm{QCD})}(\{a_i\})$ the theoretical prediction as a function of the parameters $\{a_i\}$ that describe the set of parton distributions. These parameters are determined by the condition that the statistical estimator is minimized, that is, one wants to determine the set of parameters $\{a_i^{(0)}\}$ that satisfies the condition
\[ \chi^2\!\left( \{ a_l^{(0)} \} \right) \equiv \min_{\{a_l\}} \chi^2\!\left( \{a_l\} \right) \, . \qquad (4.76) \]
Until a few years ago, in global fits of parton distribution functions the error function to be minimized was the statistical error function, Eq. 4.75, where $\sigma_{i,\mathrm{stat}}$ is the total uncorrelated statistical uncertainty. However, the precision of modern experimental data made it compulsory to consider the effects of the correlated systematic uncertainties. This was so both because the statistical accuracy of the data became higher and because the experimental groups began to provide the contributions from the different sources of systematic errors together with the experimental data. The inclusion of correlated systematics can be done with two equivalent definitions of the error function $\chi^2$. The first one uses the explicit form of the experimental covariance matrix,
the explicit form of the experimental covariance matrix,
Ndat
1 X (exp) (QCD) (exp) (QCD)
2 = Fi Fi (cov1 )ij Fj Fj , (4.77)
Ndat i,j=1
q
2
where ij is the correlation matrix of experimental data and tot,j = stat,j 2
+ sys,j is the total
experimental uncertainty. Since the number of data points is typically large, the inversion of the covariance matrix might lead to numerical instabilities. For this reason, an equivalent form of Eq. 4.77 was proposed that does not involve explicitly the covariance matrix. This second equivalent definition is given by
\[ \chi^2 = \frac{1}{N_{dat}} \sum_{i=1}^{N_{dat}} \frac{1}{\sigma_{\mathrm{stat},i}^2} \left( F_i^{(\mathrm{exp})} - F_i^{(\mathrm{QCD})} - \sum_{k=1}^{K} r_k\, \beta_{ki} \right)^2 + \sum_{k=1}^{K} r_k^2 \, , \qquad (4.79) \]
where $\beta_{ki}$ is the contribution to the $i$-th data point from the $k$-th of the $K$ sources of correlated systematic uncertainty. In this approach one has to minimize this $\chi^2$ both with respect to the parameters $r_k$, which determine the effect of the correlated systematics, and with respect to the parameters $\{a_i\}$ defining the QCD model $F^{(\mathrm{QCD})}$. The problem with this definition is that the number of correlated systematics can be very large, which leads to a formidable minimization task. A way to overcome this difficulty is to perform the minimization with respect to the parameters $r_k$ analytically. In this case one obtains the simplified expression
\[ \chi^2(\{a_l\}) = \frac{1}{N_{dat}} \left[ \sum_{i=1}^{N_{dat}} \frac{\left( F_i^{(\mathrm{exp})} - F_i^{(\mathrm{QCD})} \right)^2}{\sigma_{i,\mathrm{stat}}^2} - \sum_{k,k'=1}^{K} B_k \left( A^{-1} \right)_{kk'} B_{k'} \right] \, , \qquad (4.80) \]
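Both definitions of the error function are straightforward to implement numerically. The sketch below evaluates the covariance-matrix form of Eq. 4.77; the construction of the covariance matrix from statistical errors and correlated systematic shifts, $\mathrm{cov}_{ij} = \delta_{ij}\sigma_{\mathrm{stat},i}^2 + \sum_k \beta_{ki}\beta_{kj}$, is the standard one and is an assumption of this sketch, as are all names and toy numbers.

```python
import numpy as np

def chi2_covariance(f_exp, f_qcd, cov):
    """Error function of Eq. (4.77): chi^2 per data point with the full
    experimental covariance matrix."""
    diff = f_exp - f_qcd
    return diff @ np.linalg.solve(cov, diff) / len(f_exp)

def covariance_from_systematics(sigma_stat, beta):
    """Assumed construction: cov_ij = delta_ij sigma_stat_i^2 + sum_k beta_ki beta_kj,
    with beta of shape (K, Ndat) holding the K correlated systematic shifts."""
    return np.diag(sigma_stat**2) + beta.T @ beta

# Toy example with 3 data points and one fully correlated systematic source
f_exp = np.array([1.00, 0.90, 0.80])
f_qcd = np.array([0.98, 0.93, 0.79])
sigma_stat = np.array([0.02, 0.02, 0.03])
beta = np.array([[0.01, 0.01, 0.01]])  # K = 1
print(chi2_covariance(f_exp, f_qcd, covariance_from_systematics(sigma_stat, beta)))
```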
Figure 4.8:
The benchmark parton distribution functions with associated uncertainties as described in Ref. [31]: the
valence down distribution (left) and the gluon distribution (right).
here. This is so because theoretical errors are not gaussian, and the best one can do to estimate their effects is to provide some suitable prescriptions, but no rigorous statistical analysis is available for this type of uncertainty.
\[ H_{ij} = \frac{\partial^2 \chi^2(\{a_m\},\{r_n\})}{\partial a_i\, \partial a_j} \, , \qquad (4.83) \]
defined with respect to the set of parameters $\{a_l\}$ defining the QCD model, a second Hessian matrix is defined as
\[ V_{jk} = \frac{\partial^2 \chi^2(\{a_m\},\{r_n\})}{\partial a_j\, \partial r_k} \, , \qquad (4.84) \]
which contains information on the variation of the error function $\chi^2$ with respect to the systematic uncertainties parametrized by the $r_k$ parameters. Then the covariance matrix for the systematic uncertainties is given by
\[ C^{\mathrm{sys}} = H^{-1}\, V\, V^{T}\, H^{-1} \, , \qquad (4.85) \]
and the total covariance matrix is constructed as the sum of the two contributions, the statistical and the systematic,
\[ C^{\mathrm{tot}} = C^{\mathrm{stat}} + C^{\mathrm{sys}} \, , \qquad (4.86) \]
where $C^{\mathrm{stat}} = H^{-1}$ is the standard statistical covariance matrix. In this method the uncertainty in any quantity that depends on the parton distribution functions, $F[\{q_i\}]$, is computed from
\[ \left( \Delta F \right)^2 = \sum_{i,j=1}^{N_{par}} \frac{\partial F}{\partial a_i}\, C_{ij}\, \frac{\partial F}{\partial a_j} \, , \qquad (4.87) \]
by substituting $C$ with the appropriate covariance matrix, $C^{\mathrm{stat}}$, $C^{\mathrm{sys}}$ or $C^{\mathrm{tot}}$, to obtain the statistical, correlated systematic or total experimental error band respectively. $N_{par}$ is the total number of parameters used in the parametrization of the set of parton distributions at the starting evolution scale, Eq. 4.66. This method is not statistically rigorous, but it has the virtue that it does not assume gaussianly distributed systematic uncertainties. It gives a conservative error estimate as compared with other methods, like for example the Hessian method with the $\Delta\chi^2=1$ rule, to be discussed shortly. This method suffers from two serious drawbacks: first of all, it assumes that linear error propagation gives a decent estimate of the total error propagation, an assumption that has been shown not to be correct in many cases. Second, as can be seen from the above formulae, the estimation of the uncertainties depends heavily on the functional form that one has assumed for the set of parton distributions.
Another popular method is the so-called Hessian method. In the Hessian method [82] one assumes
that the deviation in 2 for the global fit from the minimum value is quadratic in the deviation of
the parameters specifying the input parton distributions {ai } from their values at the minimum
{a0i }. First one determines the best fit set of parton distributions from the minimization of Eq.
4.79, that is, including the contribution from correlated systematic uncertainties, as opposed to the
Offset method. Then to estimate the associated uncertainty one can write
\[ \Delta\chi^2 \equiv \chi^2 - \chi^2_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} H_{ij} \left( a_i - a_i^0 \right) \left( a_j - a_j^0 \right) \, , \qquad (4.88) \]
where $H$ is the Hessian matrix, defined in Eq. 4.83. Standard linear propagation implies that the error on an observable $F[\{q_i\}]$ is given by
\[ \left( \Delta F \right)^2 = \Delta\chi^2 \sum_{i,j=1}^{N_{par}} \frac{\partial F}{\partial a_i}\, C^{\mathrm{stat}}_{ij}(\{a_m\})\, \frac{\partial F}{\partial a_j} \, , \qquad (4.89) \]
where the covariance matrix of the parameters, $C^{\mathrm{stat}}$, is again the inverse of the Hessian matrix, Eq. 4.83, and $\Delta\chi^2$ is the allowed variation in $\chi^2$. Textbook statistics implies that one should have $\Delta\chi^2 = 1$; however, it has been argued that a higher value, $\Delta\chi^2 = 100$, is required in order to estimate the uncertainties in a faithful way, due to the fact that data from different experiments are sometimes incompatible [83].
For practical purposes, it is numerically more stable to diagonalize the covariance matrix and work in the basis of eigenvectors, defined by
\[ \sum_{j=1}^{N_{par}} H_{ij}(\{a_l\})\, v_{jk} = \lambda_k\, v_{ik} \, , \qquad i,k = 1,\ldots,N_{par} \, . \qquad (4.90) \]
One has to take into account that, since variations in some directions of parameter space degrade the quality of the fit far more quickly than variations in others, the eigenvalues $\lambda_k$ span several orders of magnitude. In the Hessian method, one ends up with a set of $2N_{par}$ sets of parton distributions $S_i^{\pm}$. One can therefore propagate the uncertainty associated to the parton distributions to any given observable $F[\{q_i\}]$ with the following master formula, equivalent to Eq. 4.89,
\[ \Delta F = \frac{1}{2} \left[ \sum_{i=1}^{N_{par}} \left( F(\{S_i^+\}) - F(\{S_i^-\}) \right)^2 \right]^{1/2} \, . \qquad (4.91) \]
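In practice this master formula is applied to the $2N_{par}$ eigenvector sets distributed together with the central fit. A minimal sketch, assuming the symmetric form of Eq. 4.91 and purely illustrative numbers:

```python
import numpy as np

def hessian_uncertainty(F_plus, F_minus):
    """Symmetric Hessian uncertainty of Eq. (4.91), built from an observable
    evaluated on the eigenvector sets S_i^+ and S_i^-."""
    F_plus, F_minus = np.asarray(F_plus), np.asarray(F_minus)
    return 0.5 * np.sqrt(np.sum((F_plus - F_minus) ** 2))

# Toy example: an observable evaluated on 3 pairs of eigenvector sets
F_plus = [1.05, 1.02, 1.01]   # F(S_i^+)
F_minus = [0.97, 0.99, 1.00]  # F(S_i^-)
print(hessian_uncertainty(F_plus, F_minus))
```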
The drawbacks of this method are similar to those of the Offset method: first of all, one assumes that the linearized approximation in error propagation is valid, and second, errors estimated with this method depend heavily on the functional form chosen for the parametrization of parton distributions. Finally, the introduction of non-standard tolerance criteria $\Delta\chi^2 > 1$ does not allow one to give a statistically rigorous meaning to the resulting uncertainties.
Figure 4.9:
Relative errors of parton distributions in the recent CTEQ6 analysis [84]: up quark parton distribution
(left) and gluon parton distribution (right).
The final technique that will be discussed is the Lagrange multiplier method [85], which overcomes some of the drawbacks of the above methods, especially the linearized approximations. Investigating the allowed variation of a specific physical observable is more rigorous with this method than with the previously discussed ones. In the Lagrange multiplier approach, one performs a global fit while constraining the value of a physical quantity $F[\{q_i\}]$ in the neighborhood of the value $F^{(0)}$ obtained in the unconstrained global fit. The starting point is the best-fit set of parton distributions $S_0$, characterized by the parameters $\{a_i^{(0)}\}$. The uncertainty associated to the observable $F$ is estimated in two steps. First we use the Lagrange multiplier method to determine how the minimum of $\chi^2(\{a_i\})$ increases as $F$ deviates from the best estimate $F^{(0)}$, and then one determines the appropriate tolerance $\Delta\chi^2$ of $\chi^2(\{a_i\})$. The first step is then to minimize the constrained error function
\[ \Psi\left( \lambda, \{a_i\} \right) = \chi^2(\{a_i\}) + \lambda\, F(\{a_i\}) \, , \qquad (4.92) \]
for various values of $\lambda$, following the chain of constrained minimizations. This procedure generates a parametric relationship between $\chi^2$ and the observable $F$, in terms of the parameter $\lambda$, so that given an allowed value of $\Delta\chi^2$ it is straightforward to derive an allowed range for the observable $F$, without any linearized approximation. In practice this procedure generates a sample of sets of parton distributions $\{S_n\}$ equal in size to the dimension of the parton distribution parameter space, $N_{par}$, since the minimization is performed in each direction of the parameter space; these sets are then used to assess the range of variation of $F$ allowed by the data, using Eq. 4.89.
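A toy version of the Lagrange multiplier scan fits in a few lines: for each value of $\lambda$ one minimizes the constrained function of Eq. 4.92 and records the resulting pair ($F$, $\chi^2$). The $\chi^2$ and observable below are placeholder functions standing in for the global fit, and the use of scipy.optimize.minimize is an implementation choice of this sketch, not the procedure of Ref. [85].

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins for the global-fit chi2 and for the observable F[{q_i}]
def chi2(a):
    return (a[0] - 1.0) ** 2 / 0.1**2 + (a[1] - 2.0) ** 2 / 0.2**2

def observable(a):
    return a[0] + 0.5 * a[1]

def constrained_scan(lambdas, a_start):
    """Minimize Psi = chi2 + lambda*F for each lambda (Eq. 4.92) and return
    the parametric (F, chi2) curve traced out by the constrained minima."""
    curve = []
    for lam in lambdas:
        res = minimize(lambda a: chi2(a) + lam * observable(a), a_start)
        curve.append((observable(res.x), chi2(res.x)))
    return curve

for F_val, c2 in constrained_scan(np.linspace(-5, 5, 5), a_start=[1.0, 2.0]):
    print(f"F = {F_val:.3f}   chi2 = {c2:.3f}")
```

Reading off the values of $F$ for which $\chi^2$ exceeds the chosen tolerance directly gives the allowed range of the observable.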
It is clear that the Lagrange multiplier method provides a sample of parton distribution sets
tailored to assess the uncertainty associated to the physical problem at hand. This procedure does
Figure 4.10:
Set of parton distributions with uncertainties computed with different methods from a recent QCD
analysis of the ZEUS collaboration [86].
not involve some of the approximations of the Hessian and Offset methods, but it still has the problem of the introduction of non-standard tolerance criteria. Moreover, this method suffers from a large practical disadvantage: the global fits of parton distributions must be repeated every time one needs to determine the uncertainty of a different observable that depends on the parton distributions.
In Figs. 4.9 and 4.10 we show the errors associated to different parton distributions from two different QCD global analyses. Note that the gluon parton distribution in particular has a rather large uncertainty, especially at large and small x, where there are no direct constraints on its shape from experimental data. The valence parton distributions, on the other hand, are known with higher accuracy, since they are directly constrained by the precise deep-inelastic scattering data.
It must be emphasized that, whatever method is used, nowadays there is a common format to represent the uncertainties in parton distributions. This is accomplished by providing, on top of the best-fit parton distribution set, additional sets of error parton distributions that span the region of parton distribution parameter space allowed by the corresponding experimental uncertainties, estimated with one of the methods described above. With this sample of parton distribution sets, one computes the uncertainty in any physical quantity $F[\{q_i\}]$ by means of Eq. 4.91. All the modern sets from the different global QCD analyses of parton distribution functions, including the error sets, are available through the LHAPDF library [87].
The standard approach introduced in this Section for the determination of unpolarized parton distributions and the associated uncertainties is also used for similar global QCD analyses which involve different types of parton distributions, like polarized parton distributions [88, 89, 90], which measure the fraction of the total spin of the proton carried by the different partons, or nuclear parton distributions [91, 92, 93], which measure how the parton distributions of nucleons within heavy nuclei are modified with respect to those of the free nucleon. The standard approach is also used for parametrizations of the photon [94] and pion [95] parton distributions from experimental data. However, in all the above cases experimental data are more scarce and have larger uncertainties than in the case of unpolarized structure functions.
In Section 5.1 we will introduce an alternative approach to estimate faithfully the uncertainties associated to a function parametrized from experimental data. This approach (the Monte Carlo approach) can be shown to be equivalent to the more common technique to determine confidence levels based on the $\Delta\chi^2=1$ condition, assuming that in the latter case linear error propagation is a good enough approximation. In Appendix A.2 we show explicitly this equivalence within a simple model.
Chapter 5
This Chapter constitutes the core of the present thesis: we describe the strategy that will be used
to parametrize experimental data in an unbiased way with faithful estimation of the associated
uncertainties, using a combination of Monte Carlo techniques and neural networks as unbiased
interpolants. This strategy has three main parts. In the first part, one constructs a Monte Carlo
sampling of the experimental data for a given observable, which determines the probability measure
of this observable over a finite set of data points. Then we use neural networks as basic interpolating
tools, to construct a continuous probability density for the observable under consideration. This
parametrization strategy has the advantage that it does not require any assumption, other than continuity, on the functional form of the underlying law fulfilled by the given observable. It also provides a faithful estimation of the uncertainties, which can then be propagated to other quantities without the need of any linearized approximation. Finally, the third step consists of the statistical validation of the constructed probability measure in the space of the parametrized function by means
of suitable statistical estimators. In summary, in this Chapter we will describe how given a measured
observable F , the associated probability measure in the space of functions, P [F ], can be constructed,
so that expectation values of arbitrary functionals of $F$, $\mathcal{F}[F]$, can be computed as with standard probability distributions, that is
\[ \left\langle \mathcal{F}[F] \right\rangle = \int \mathcal{D}F\, \mathcal{P}[F]\, \mathcal{F}[F] \, . \qquad (5.1) \]
In Fig. 5.1 we show a summary of our parametrization strategy for the particular case of the proton
structure function.
Figure 5.1:
Summary of the strategy presented in this thesis to parametrize experimental data in an unbiased way
with faithful estimation of the uncertainties for the particular case of the proton structure function
F2 (x, Q2 ). As can be seen, the three main steps of our approach are the Monte Carlo generation of replicas
of experimental data, the neural network training and finally the statistical validation of the results.
as the quadratic sum of the statistical uncertainty $\sigma_{\mathrm{stat},i}$ and the $N_u$ uncorrelated systematic uncertainties. The contribution to the $i$-th data point from the $j$-th source of correlated systematic uncertainty is denoted by $\sigma_{\mathrm{sys},ji}$, and $\sigma_N$ is the total normalization uncertainty. Finally, $r_N^{(k)}$, $r_{t,i}^{(k)}$ and $r_{\mathrm{sys},j}^{(k)}$ are zero-mean univariate gaussian random numbers with the same correlations as the corresponding uncertainties. For example, within a given experiment, for all the data points of the $k$-th replica we use the same random number for the normalization uncertainty, $r_N^{(k)}$, since this uncertainty is correlated among all these measurements. The condition that these random numbers are gaussianly distributed implies that
\[ \left\langle r_i^{(k)} \right\rangle_{rep} = \mathcal{O}\!\left( \frac{1}{\sqrt{N_{rep}}} \right) \, , \qquad \left\langle \left( r_i^{(k)} \right)^2 \right\rangle_{rep} = 1 + \mathcal{O}\!\left( \frac{1}{\sqrt{N_{rep}}} \right) \, . \qquad (5.5) \]
where the total error $\sigma_{\mathrm{tot},i}$ for the $i$-th point is given by
\[ \sigma_{\mathrm{tot},i} = \sqrt{ \sigma_{t,i}^2 + \sigma_{\mathrm{sys},i}^2 + \left( F_i^{(\mathrm{exp})} \sigma_N \right)^2 } \, , \qquad (5.8) \]
and finally the total correlated uncertainty $\sigma_{\mathrm{sys},i}$ is obtained by adding all correlated systematics in quadrature,
\[ \sigma_{\mathrm{sys},i}^2 = \sum_{j=1}^{N_{\mathrm{sys}}} \sigma_{\mathrm{sys},ji}^2 \, . \qquad (5.9) \]
The presence of correlated systematic uncertainties which are not symmetric for some experiments deserves further comment. As is well known [96, 97, 98, 99], asymmetric errors cannot be combined in a simple multi-gaussian framework, and in particular they cannot be added to gaussian errors in quadrature. For the treatment of asymmetric uncertainties, we will follow the approach of Refs. [98, 99], which, on top of several theoretical advantages, is closest to the ZEUS error analysis and thus adequate for a faithful reproduction of the ZEUS data on deep-inelastic structure functions, which is where these asymmetric uncertainties appear. In this approach, a data point with central value $x_0$ and right and left asymmetric uncertainties $\sigma_R$ and $\sigma_L$ (not necessarily positive) is described by a symmetric gaussian distribution, centered at
\[ \langle x \rangle = x_0 + \frac{\sigma_R - \sigma_L}{2} \, , \qquad (5.10) \]
and with width
\[ \sigma_x^2 = \left( \frac{\sigma_R + \sigma_L}{2} \right)^2 \, . \qquad (5.11) \]
The ensuing distribution can then be treated in the standard gaussian way.
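The symmetrization of Eqs. 5.10-5.11 amounts to a shift of the central value and an averaged width. A minimal sketch, with the sign convention for $\sigma_R$ and $\sigma_L$ assumed as written above:

```python
def symmetrize(x0, sigma_right, sigma_left):
    """Symmetric gaussian equivalent of an asymmetric error, Eqs. (5.10)-(5.11):
    shifted central value and symmetrized width."""
    mean = x0 + 0.5 * (sigma_right - sigma_left)
    width = 0.5 * (sigma_right + sigma_left)
    return mean, width

print(symmetrize(1.00, sigma_right=0.08, sigma_left=0.05))  # (1.015, 0.065)
```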
Once the sampling of the experimental data in terms of a set of replicas of artificial data has been
generated, it defines the probability measure of the observable F in those data points where there
exist experimental measurements. From this probability density one can compute any estimator as
with standard probability distributions. That is, if $\mathcal{A}[F]$ is any functional of the observable $F$ (the simplest example is the observable itself at the $i$-th data point, $\mathcal{A}[F]=F_i$), then its average as computed from the probability measure is given by
\[ \left\langle \mathcal{A}^{(\mathrm{art})} \right\rangle_{rep} = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} \mathcal{A}^{(\mathrm{art})(k)} \, . \qquad (5.12) \]
Experimental data on the observable $F$ might be available without information on the separation of the different sources of systematic uncertainty, which are then in general collected together in the correlation matrix $\rho_{ij}^{(\mathrm{exp})}$ of the experimental data. In this second case, the equivalent of Eq. 5.3 is given by
\[ F_i^{(\mathrm{art})(k)} = F_i^{(\mathrm{exp})} + r_i^{(k)}\, \sigma_{\mathrm{tot},i} \, , \qquad (5.13) \]
where $\{r_i^{(k)}\}$ are gaussianly distributed random numbers with the same correlations as the experimental data points, that is, they verify
\[ \frac{ \left\langle r_i^{(k)} r_j^{(k)} \right\rangle_{rep} - \left\langle r_i^{(k)} \right\rangle_{rep} \left\langle r_j^{(k)} \right\rangle_{rep} }{ \sqrt{ \left\langle r_i^{(k)\,2} \right\rangle_{rep} - \left\langle r_i^{(k)} \right\rangle_{rep}^2 }\, \sqrt{ \left\langle r_j^{(k)\,2} \right\rangle_{rep} - \left\langle r_j^{(k)} \right\rangle_{rep}^2 } } = \left\langle r_i^{(k)} r_j^{(k)} \right\rangle_{rep} + \mathcal{O}\!\left( \frac{1}{N_{rep}} \right) = \rho_{ij}^{(\mathrm{exp})} \, , \qquad (5.14) \]
where averages over replicas have been defined in Eq. 5.12. In this situation the covariance matrix can be computed using
\[ \mathrm{cov}_{ij} = \rho_{ij}\, \sigma_{\mathrm{tot},i}\, \sigma_{\mathrm{tot},j} \, . \qquad (5.15) \]
The number of replicas $N_{rep}$ of the experimental data that needs to be generated with either of the two equivalent approaches described above is determined by the condition that the Monte Carlo sample reproduces the central values, errors and correlations of the experimental data. The comparison between the experimental data and the Monte Carlo sample can be made quantitative by means of statistical estimators that will be described in the following Section.
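When only the total errors and their correlation matrix are available, the replica generation of Eq. 5.13 can be implemented by drawing correlated gaussian random numbers, for instance through a Cholesky factorization of the correlation matrix. The sketch below is one such implementation with invented toy numbers; it also checks that the sample reproduces the experimental central values and errors, as required above.

```python
import numpy as np

def generate_replicas(f_exp, sigma_tot, corr, n_rep, seed=0):
    """Monte Carlo replicas following Eq. (5.13): gaussian fluctuations around
    the experimental central values with the experimental correlations,
    generated via a Cholesky factorization of the correlation matrix."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    # r has shape (n_rep, n_dat) and the required correlation pattern
    r = rng.standard_normal((n_rep, len(f_exp))) @ L.T
    return f_exp + r * sigma_tot

# Toy example: two data points with 60% correlation
f_exp = np.array([1.0, 0.8])
sigma_tot = np.array([0.05, 0.04])
corr = np.array([[1.0, 0.6], [0.6, 1.0]])
replicas = generate_replicas(f_exp, sigma_tot, corr, n_rep=1000)
print(replicas.mean(axis=0), replicas.std(axis=0))  # should reproduce f_exp, sigma_tot
```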
\[ \left\langle F_i^{(\mathrm{art})} \right\rangle_{rep} = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} F_i^{(\mathrm{art})(k)} \, . \qquad (5.16) \]
Associated variance:
\[ \sigma_i^{(\mathrm{art})} = \sqrt{ \left\langle \left( F_i^{(\mathrm{art})} \right)^2 \right\rangle_{rep} - \left\langle F_i^{(\mathrm{art})} \right\rangle_{rep}^2 } \, . \qquad (5.17) \]
\[ PE\left[ \left\langle F^{(\mathrm{art})} \right\rangle_{rep} \right]_{dat} = \frac{1}{N_{dat}} \sum_{i=1}^{N_{dat}} \frac{ \left\langle F_i^{(\mathrm{art})} \right\rangle_{rep} - F_i^{(\mathrm{exp})} }{ F_i^{(\mathrm{exp})} } \, . \qquad (5.21) \]
We define analogously $\left\langle V\left[ \sigma^{(\mathrm{art})} \right] \right\rangle_{dat}$, $\left\langle V\left[ \rho^{(\mathrm{art})} \right] \right\rangle_{dat}$, $\left\langle V\left[ \mathrm{cov}^{(\mathrm{art})} \right] \right\rangle_{dat}$, and $PE\left[ \left\langle \sigma^{(\mathrm{art})} \right\rangle_{rep} \right]_{dat}$, $PE\left[ \left\langle \rho^{(\mathrm{art})} \right\rangle_{rep} \right]_{dat}$ and $PE\left[ \left\langle \mathrm{cov}^{(\mathrm{art})} \right\rangle_{rep} \right]_{dat}$, for errors, correlations and covariances respectively.
These estimators indicate how close the averages over generated data are to the experimental values. Note that in averages over correlations and covariances one has to use the fact that correlation and covariance matrices are symmetric, and thus one has to be careful to avoid double counting. For example, the percentage error on the correlation will be defined as
\[ PE\left[ \left\langle \rho^{(\mathrm{art})} \right\rangle_{rep} \right]_{dat} = \frac{2}{N_{dat}(N_{dat}+1)} \sum_{i=1}^{N_{dat}} \sum_{j=i}^{N_{dat}} \frac{ \left\langle \rho_{ij}^{(\mathrm{art})} \right\rangle_{rep} - \rho_{ij}^{(\mathrm{exp})} }{ \rho_{ij}^{(\mathrm{exp})} } \, , \qquad (5.22) \]
Average variance:
\[ \left\langle \sigma^{(\mathrm{art})} \right\rangle_{dat} = \frac{1}{N_{dat}} \sum_{i=1}^{N_{dat}} \sigma_i^{(\mathrm{art})} \, . \qquad (5.26) \]
We define analogously $\left\langle \rho^{(\mathrm{art})} \right\rangle_{dat}$ and $\left\langle \mathrm{cov}^{(\mathrm{art})} \right\rangle_{dat}$, as well as the corresponding experimental quantities. These quantities are interesting because, even if the scatter correlations $r$ are close to 1, there could still be a systematic bias in the estimators of Eqns. 5.17-5.19. This is so because, even if all scatter correlations are very close to 1, some of the quantities in Eqns. 5.17-5.19 could be sizably smaller than their experimental counterparts, while still being proportional to them.
The typical scaling of the various quantities with the number of generated replicas $N_{rep}$ follows the standard behavior of gaussian Monte Carlo samples [100]. For instance, variances on central values scale as $1/N_{rep}$, while variances on errors scale as $1/\sqrt{N_{rep}}$. Also, because
\[ V\left[ \rho_{ij}^{(\mathrm{art})} \right] = \frac{1}{N_{rep}} \left( 1 - \left( \rho_{ij}^{(\mathrm{exp})} \right)^2 \right)^2 \, , \qquad (5.27) \]
as can be checked using Eq. A.6 in Appendix A.1, the estimated correlation fluctuates more for small values of $\rho^{(\mathrm{exp})}$, and thus the average correlation tends to be larger than the corresponding experimental value.
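The statistical estimators above are simple averages over the replica sample. The following sketch computes the replica mean, the replica variance and the percentage error on central values for a toy sample, and illustrates the scaling with $N_{rep}$; absolute values are used in the percentage error for definiteness, which is an assumption of this sketch.

```python
import numpy as np

def percentage_error_central(replicas, f_exp):
    """Estimators in the spirit of Eqs. (5.16), (5.17) and (5.21): replica
    averages, replica variances, and the percentage error on central values."""
    f_art = replicas.mean(axis=0)          # Eq. (5.16)
    sigma_art = replicas.std(axis=0)       # Eq. (5.17)
    pe = 100.0 * np.mean(np.abs(f_art - f_exp) / np.abs(f_exp))  # cf. Eq. (5.21)
    return f_art, sigma_art, pe

# Toy check of how central-value fluctuations shrink with N_rep
rng = np.random.default_rng(1)
f_exp = np.array([1.0, 0.8, 0.5])
for n_rep in (10, 100, 1000):
    replicas = f_exp + 0.05 * rng.standard_normal((n_rep, 3))
    print(n_rep, percentage_error_central(replicas, f_exp)[2])
```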
\[ h_i^{(l+1)} = \sum_{j=1}^{n^{(l)}} \omega_{ij}^{(l)}\, \xi_j^{(l)} + \theta_i^{(l+1)} \, , \qquad (5.29) \]
where $\theta_i^{(l)}$ is the activation threshold of the given neuron, $n_l$ is the number of neurons in the $l$-th layer, $L$ is the number of layers of the neural network, and $g(x)$ is the activation function of the neuron, which we will take to be a sigmoid,
\[ g(x) = \frac{1}{1+e^{-x}} \, , \qquad (5.30) \]
except in the last layer, where we use a linear activation function, $g(x)=x$. This enhances the sensitivity of the neural network, avoiding the saturation of the neurons in the last layer. The fact that the activation function $g(x)$ is non-linear allows the neural network to reproduce nontrivial functions. A schematic diagram of a feed-forward artificial neural network can be seen in Fig. 5.2.
Therefore, multilayer feed-forward neural networks can be viewed as functions $F: \mathbb{R}^{n_1} \to \mathbb{R}^{n_L}$ parametrized by weights, thresholds and activation functions,
\[ \xi_j^{(L)} = F\!\left[ \xi_i^{(1)}; \left\{ \omega_{ij}^{(l)}, \theta_i^{(l)}, g \right\} \right] \, , \qquad j = 1,\ldots,n_L \, . \qquad (5.31) \]
It can be proven that any continuous function, no matter how complex, can be represented by a multilayer feed-forward neural network. In particular, it can be shown [101, 103] that two hidden layers suffice to represent an arbitrarily complicated function [104].
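A feed-forward network of this kind is only a few lines of code. The sketch below implements Eqs. 5.29-5.31 for an arbitrary architecture, with sigmoid activations in the hidden layers and a linear last layer; the 2-5-3-1 architecture and the random initialization are illustrative choices, not those used in this thesis.

```python
import numpy as np

def sigmoid(x):
    """Activation function of Eq. (5.30)."""
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, weights, thresholds):
    """Multilayer feed-forward network of Eqs. (5.29)-(5.31): sigmoid activation
    in all layers except the last one, which is linear."""
    a = np.asarray(x, dtype=float)
    for l, (w, theta) in enumerate(zip(weights, thresholds)):
        h = w @ a + theta  # Eq. (5.29)
        a = h if l == len(weights) - 1 else sigmoid(h)
    return a

# A 2-5-3-1 architecture with randomly initialized parameters
rng = np.random.default_rng(0)
sizes = [2, 5, 3, 1]
weights = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
thresholds = [rng.standard_normal(sizes[l + 1]) for l in range(len(sizes) - 1)]
print(feed_forward([0.1, 0.5], weights, thresholds))
```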
Figure 5.2:
Schematic diagram of a multi-layer feed forward neural network.
Artificial neural networks are a common tool in experimental particle physics [102, 105, 106] in applications like pattern recognition, where in this context a pattern typically means an experimental measurement as a function of the set of variables which characterize the event. We are interested in artificial neural networks as a means to parametrize physical quantities in an unbiased way, that is, without the need of any assumption on their functional form. The physical quantities to be parametrized can either be direct experimental measurements (as in the case of deep-inelastic structure functions), or they can be related to experimental data through a functional dependence (as in the case of parton distribution functions). Neural networks not only provide universal unbiased interpolants for a given physical quantity in the region where there is experimental information; they also have the property that their behavior in extrapolation regions is not determined by their behavior in the data region, unlike what happens in fits with simple functional forms, so they are also useful to assess the uncertainty in extrapolation regions (that is, in regions without information from experimental measurements).
Therefore, the performance of neural networks is worse for extrapolation than for interpolation.
That is, new input patterns that are not a mixture of some of the input training patterns cannot be
classified in an efficient way in terms of the learnt patterns. In particular, for a feed forward neural
network with one hidden layer, the generalization error (that is, the error in the classification of
patterns which are very different from the patterns used in the training) is of the order O (Npar /Ndat )
[107], where Npar is the number of neural network parameters and Ndat the number of training
patterns.
Another interesting property of artificial neural networks is that they are efficient in combining in
an optimal way experimental information from different measurements of the same quantity. That
is, when the separation of data points in the space of possible input patterns is smaller than a
certain correlation length, the neural network manages to combine the corresponding experimental
information, thus leading to a determination of the output pattern (in the present case the function
to parametrize) which is more accurate than the individual input patterns (the measurements from
the different experiments).
For this thesis three different estimators have been used: the central values error function,
\[ E_1^{(k)} = \frac{1}{N_{dat}} \sum_{i=1}^{N_{dat}} \left( F_i^{(\mathrm{net})(k)} - F_i^{(\mathrm{art})(k)} \right)^2 \, , \qquad (5.32) \]
where $\sigma_{t,i}$ is the total uncorrelated uncertainty, Eq. 5.4, and finally the error function taking into account correlated systematics,
\[ E_3^{(k)} = \frac{1}{N_{dat}} \sum_{i,j=1}^{N_{dat}} \left( F_i^{(\mathrm{net})(k)} - F_i^{(\mathrm{art})(k)} \right) \left( \mathrm{cov}^{(k)} \right)^{-1}_{ij} \left( F_j^{(\mathrm{net})(k)} - F_j^{(\mathrm{art})(k)} \right) \, , \qquad (5.34) \]
where in all the above expressions $F_i^{(\mathrm{net})(k)}$ is the prediction of the $k$-th neural network for the $i$-th data point. Note that in the above equation we have defined the covariance matrix $\left( \mathrm{cov}^{(k)} \right)_{ij}$ with the normalization uncertainty included as an overall rescaling of the errors due to the normalization offset of that replica, namely
\[ \left( \mathrm{cov}^{(k)} \right)_{ij} \equiv \sum_{p=1}^{N_{\mathrm{sys}}} \hat{\sigma}^{(k)}_{\mathrm{sys},pi}\, \hat{\sigma}^{(k)}_{\mathrm{sys},pj} + \delta_{ij}\, \hat{\sigma}^{(k)\,2}_{t,i} \, , \qquad (5.35) \]
with
\[ \hat{\sigma}^{(k)}_{a,i} = \left( 1 + r_N^{(k)} \sigma_N \right) \sigma_{a,i} \, , \qquad (5.36) \]
where $r_N^{(k)}$ is the same as in Eq. 5.3. Eq. 5.35 is to be compared with the total covariance matrix, Eq. 5.6. This is necessary in order to avoid a biased treatment of the normalization errors [98]. In Appendix A.3 we describe, within an explicitly solvable model, the effects of an incorrect treatment of normalization uncertainties.
Several relations hold between these estimators. Properties of minimization using the error function defined with the covariance matrix, Eq. 5.34, can be found in Refs. [98, 108]. For example, it can be shown that the following relation holds,
\[ E_2^{(k)} \leq E_3^{(k)} \, , \qquad (5.37) \]
since in the latter case correlated systematic uncertainties are included. The above constraint is useful to cross-check that the experimental covariance matrix, Eq. 5.35, has been correctly computed and inverted, without numerical instabilities.
The usefulness of neural networks is due to the availability of a training algorithm. This algorithm allows one to select the values of the weights and thresholds such that the neural network reproduces a given set of input-output data, also known as patterns. This procedure is called learning since, unlike in a standard fitting procedure, there is no need to know in advance the underlying rule which describes the data. In our case the learning process consists of the minimization of one of the estimators described above. Therefore we need to define a suitable minimization strategy to train the neural networks. During the course of this thesis we have used different minimization algorithms, which are described in the following Section.
where the $h$ function is defined in Eq. 5.29, and then back-propagated to the rest of the network by
\[ \Delta_i^{(l-1)k} = g'\!\left( h_i^{(l)k} \right) \sum_{j=1}^{n_l} \Delta_j^{(l)k}\, \omega_{ij}^{(l)} \, . \qquad (5.43) \]
The procedure is iterated until a suitable convergence criterion is satisfied, as will be described shortly. To avoid getting stuck in local minima, it is useful to add a momentum term to the variations of the parameters. This means that Eq. 5.39 should be replaced by
\[ \delta\omega_{ij}^{(l)} = -\eta\, \frac{\partial \chi^2}{\partial \omega_{ij}^{(l)}} + \alpha\, \delta\omega_{ij}^{(l)}(\mathrm{last}) \, , \qquad (5.44) \]
\[ \delta\theta_i^{(l)} = -\eta\, \frac{\partial \chi^2}{\partial \theta_i^{(l)}} + \alpha\, \delta\theta_i^{(l)}(\mathrm{last}) \, , \]
where (last) indicates the value of the update of the weights and thresholds previous to the current one, and $\alpha$ determines the size of the momentum term. The drawback of back-propagation minimization is that it is not especially suited for non-linear error functions, or for error functions which depend on the neural network output through convolutions.
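The update rule of Eq. 5.44 can be sketched generically as a gradient step plus a memory of the previous step. The learning-rate and momentum symbols below (eta, alpha) follow common usage and are assumptions of this sketch; how the gradients themselves are obtained (the back-propagated quantities of Eq. 5.43) is not repeated here.

```python
def momentum_update(params, grads, prev_steps, eta=0.01, alpha=0.9):
    """One parameter update in the spirit of Eq. (5.44): a gradient-descent step
    of size eta plus a momentum term alpha times the previous step."""
    new_params, new_steps = [], []
    for p, g, last in zip(params, grads, prev_steps):
        step = -eta * g + alpha * last
        new_params.append(p + step)
        new_steps.append(step)
    return new_params, new_steps

# Toy usage on two scalar parameters of chi2 = p0**2 + p1**2
params, steps = [1.0, -0.5], [0.0, 0.0]
for _ in range(100):
    grads = [2 * params[0], 2 * params[1]]
    params, steps = momentum_update(params, grads, steps)
print(params)  # both parameters approach zero
```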
One major improvement of this minimization algorithm during the course of the present thesis was the implementation of weighted training, in order to increase the efficiency of the algorithm when the experimental data consist of many different experiments. Since weighted training does not depend on the minimization algorithm, we discuss it later in this Section.
Genetic Algorithms.
Genetic algorithms are the generic name for function optimization algorithms that do not suffer from the drawbacks that deterministic minimization strategies have when applied to problems with a large parameter space. Genetic algorithms for neural network training were introduced in this context in [1], and this minimization strategy has been used in different high energy physics applications [109]. This method is especially suitable for finding the global minima of highly nonlinear problems, such as the one we are facing in this thesis. Genetic algorithms have several advantages with respect to deterministic minimization methods:
1. They simultaneously work on populations of solutions, rather than tracing the progress
of one point through the parameter space. This gives them the advantage of checking
many regions of the parameter space at the same time, lowering the possibility that a
global minimum gets missed.
2. No outside knowledge such as local gradient of the minimized function is required.
3. They have a built-in mix of stochastic elements applied under deterministic rules, which
improves their behavior in problems with many local extrema, without the serious per-
formance loss that a purely random search would bring.
All the power of genetic algorithms lies in the repeated application of three basic operations: mutation, crossover and selection, which we describe in the following. The first step is to encode the information of the parameter space of the function we want to minimize into an ordered chain, called a chromosome. If $N_{par}$ is the size of the parameter space, then a point in this parameter space will be represented by a chromosome,
\[ \vec{a} = \left( a_1, a_2, a_3, \ldots, a_{N_{par}} \right) \, . \qquad (5.45) \]
In our case each bit $a_i$ of a chromosome corresponds to either a weight $\omega_{ij}^{(l)}$ or a threshold $\theta_i^{(l)}$ of a neural network. Once we have the parameters of the neural network written as a chromosome, we replicate that chain until we have a number $N_{tot}$ of chromosomes. Each chromosome has an associated fitness $f$, which is a measure of how close it is to the best possible chromosome (the solution of the minimization problem under consideration). In our case, the fitness of a chromosome is given by the inverse of the error function to minimize,
\[ f(\vec{a}) = \frac{1}{\chi^2(\vec{a})} \, , \qquad (5.46) \]
so chromosomes with larger fitness correspond to those with a smaller value of the function to minimize, $\chi^2$.
Then we apply the three basic operations:
1 See for example Refs. [109, 1, 110] and references therein. In particular we follow closely the description of genetic
software libraries
Mutation:
Select randomly a bit (an element of the chromosome) and mutate it. The size of the mutation is called the mutation rate $\eta$, and if the $k$-th bit has been selected, the mutation is implemented as
\[ a_k \to a_k + \eta \left( r - \frac{1}{2} \right) \, , \qquad (5.47) \]
where $r$ is a uniform random number between 0 and 1. Over the span of several generations, even a stagnated chromosome position can become reactivated by mutation. The optimal size of the mutation rate must be determined for each particular problem, or it can be adjusted dynamically as a function of the number of iterations.
Crossover:
This operation helps in obtaining a child generation which is genetically different from the parents. Crossover means selecting at random pairs of individuals, determining randomly for each pair a crossover bit $s$, and from this crossover point interchanging the bits between the two individuals.
Selection:
Once mutation and crossover have been performed on the population of individuals characterized by chromosomes, the selection operation ensures that the individuals with the best fitness propagate into the next generation of the genetic algorithm. Several selection operators can be used. The simplest method is simply to select, out of the total population of $N_{tot}$ individuals, the $N_{chain}$ chromosomes with the best fitness. Later we will describe a more efficient selection method based on probabilistic selection; a schematic implementation of the three basic operations is sketched right after this list.
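The sketch below is a bare-bones realization of the three operations just described, with fitness given by Eq. 5.46 and deterministic selection; the population size, mutation rate and the toy error function are illustrative choices, not the tuned settings used in the fits of this thesis.

```python
import numpy as np

def genetic_minimize(chi2, n_par, n_tot=40, n_generations=200, eta=0.2, seed=0):
    """Bare-bones genetic algorithm: mutation (Eq. 5.47), one-point crossover and
    deterministic selection of the fittest chromosomes, with fitness 1/chi2 (Eq. 5.46)."""
    rng = np.random.default_rng(seed)
    population = rng.uniform(-1, 1, size=(n_tot, n_par))
    for _ in range(n_generations):
        # Mutation: shift one randomly chosen bit of each chromosome by eta*(r - 1/2)
        mutated = population.copy()
        for chrom in mutated:
            chrom[rng.integers(n_par)] += eta * (rng.random() - 0.5)
        # Crossover: swap the tails of random pairs beyond a random crossover point
        children = mutated.copy()
        for a, b in zip(children[0::2], children[1::2]):
            s = rng.integers(1, n_par)
            a[s:], b[s:] = b[s:].copy(), a[s:].copy()
        # Selection: keep the n_tot fittest individuals of parents + offspring
        pool = np.vstack([population, children])
        fitness = np.array([1.0 / chi2(c) for c in pool])  # Eq. (5.46)
        population = pool[np.argsort(-fitness)[:n_tot]]
    return population[0]

# Toy usage: minimize a quadratic "error function" in 5 parameters
best = genetic_minimize(lambda a: float(np.sum((a - 0.3) ** 2)) + 1e-9, n_par=5)
print(best)
```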
The procedure is repeated iteratively until a suitable convergence criterion is satisfied. Each iteration of the procedure is called a generation. A general feature of genetic algorithms is that the fitness approaches its optimal value within a relatively small number of generations.
The theoretical concept behind the success of genetic algorithms is that of patterns or schemata within the chromosomes [112]. Rather than operating only with the $N_{tot}$ individuals of each generation, a genetic algorithm works with a much larger number of schemata that partly match the actual chromosomes. If for simplicity we consider chromosomes whose elements can take only binary values, the concept of schemata means that a chromosome like 10110 matches $2^5$ schemata, such as **11* or 1*1*0. The generalization of this concept to continuous parameters is straightforward. Since fit chromosomes are handed down to the next generation more often than unfit ones, the number of copies of a certain schema $S$ associated with fit chromosomes will increase from one generation to the next,
\[ n_S(t+1) = n_S(t)\, \frac{f(S)}{f_{tot}} \, , \qquad (5.48) \]
where $f(S)$ is the average fitness of all individuals whose chromosomes match schema $S$ and $f_{tot}$ is the average fitness of all individuals. This implies that, if we assume that a certain schema gives all matching chromosomes an approximately constant fitness advantage $c$ over the average, one obtains an exponential growth of the number of copies of this schema from one generation to the next,
\[ n_S(t) = n_S(0) \left( 1 + c \right)^t \, . \qquad (5.50) \]
The above derivation is a rough approximation to the behavior in realistic cases, and it needs
to be corrected for effects like mutation. To this purpose we define two measures on schemata:
1. The defining length $\delta$ is the distance between the two furthest fixed positions. In the above example, the defining length of 1*1*0 is $\delta = 4$.
2. The order $o$ of a schema is the number of fixed positions it contains. In the above example, $o = 3$.
With these measures, if $L$ is the length of a chromosome (that is, the number of parameters $N_{par}$ of the function we are minimizing), the following bound can be derived:
\[ n_S(t+1) \geq n_S(t)\, \frac{f(S)}{f_{tot}} \left( 1 - \frac{\delta(S)}{L-1} - o(S)\, p_m \right) \, . \qquad (5.51) \]
The first correction includes the effect of crossover and the second implements the effect of mutations, since in a schema of order $o$ there is a probability $(1-p_m)^o \simeq (1 - o\, p_m)$ that the schema survives mutation, where $p_m = 1/N_{par}$ is the probability to select a given bit at random out of the total $N_{par}$ bits of a chromosome chain. A consequence of Eq. 5.51 is that short, low-order schemata of high fitness are the building blocks towards a solution of the problem. During a run of the genetic algorithm, the selection operator ensures that the building blocks associated with fitter individuals propagate through the population. It can be shown [112] that in a population of size $N_{tot}$, approximately $\mathcal{O}(N_{tot}^3)$ schemata are processed in each generation.
The basic genetic algorithm that we have introduced above can be extended in many ways to address specific problems. Several improvements over this basic version of the algorithm have been implemented in the course of this thesis. The first one is the introduction of multiple mutations, $N_{mut} \geq 2$. This is helpful to avoid local minima, thereby increasing the speed of training. It is crucial that the rates for these additional mutations are large, in order to allow for jumps from a local minimum to a deeper one. That is, after the first mutation is performed, additional mutations are performed,
\[ a_l \to a_l + \eta_p \left( r - \frac{1}{2} \right) \, , \qquad p = 2, \ldots, N_{mut} \, , \qquad (5.52) \]
with mutation rate $\eta_p$ and probability $P_p$. Each time a new mutation is performed, a different bit $a_l$ of the chromosome chain is selected at random.
Second, we have introduced a probabilistic selection of the mutated chromosome chains, instead of the basic deterministic selection. This is helpful in allowing for mutations which only become beneficial after a combination of several individual mutations, and allows for a more efficient exploration of the whole space of minima. Once one has the mutated population of $N_{tot}$ individuals, instead of selecting the $N_{chain}$ individuals with smallest $\chi^2$, one first selects the chromosome with smallest $\chi^2$, with value $\chi^2(\vec{a}_1)$, and then selects the remaining $N_{chain}$ individuals according to the probability
\[ P_i(\vec{a}_i) = \exp\left( - \frac{ \chi^2(\vec{a}_i) - \chi^2(\vec{a}_1) }{ T } \right) \, , \qquad i = 2, \ldots, N_{tot} \, , \qquad (5.53) \]
where $T$ is the temperature of the system, in analogy with the Metropolis algorithm in Monte Carlo simulations [113]. At high temperatures even the chromosomes with bad fitness have some probability to propagate to the next generation, while at low temperature one recovers the deterministic selection rule. With this selection criterion, individuals with accidentally bad fitness but with relevant schemata can propagate to the next generations. Of course, after a number of generations only the individuals with good fitness will propagate through the generations.
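One simple way to realize this probabilistic selection is to always keep the best chromosome and draw the remaining ones with weights given by Eq. 5.53; normalizing the weights and sampling without replacement, as done below, is an implementation choice of this sketch rather than a prescription taken from the text.

```python
import numpy as np

def probabilistic_selection(chi2_values, n_chain, temperature, seed=0):
    """Probabilistic selection in the spirit of Eq. (5.53): the chromosome with the
    smallest chi2 is always kept, and the remaining n_chain - 1 slots are filled by
    sampling the other individuals with weights exp(-(chi2_i - chi2_best)/T).
    Returns the indices of the selected individuals."""
    rng = np.random.default_rng(seed)
    chi2_values = np.asarray(chi2_values, dtype=float)
    order = np.argsort(chi2_values)
    best, rest = order[0], order[1:]
    weights = np.exp(-(chi2_values[rest] - chi2_values[best]) / temperature)
    chosen = rng.choice(rest, size=n_chain - 1, replace=False, p=weights / weights.sum())
    return np.concatenate(([best], chosen))

# Toy usage: 10 mutated chromosomes, keep 4, moderate temperature
print(probabilistic_selection(chi2_values=np.arange(1.0, 11.0), n_chain=4, temperature=2.0))
```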
Conjugate Gradient
Conjugate gradient minimization exploits in an efficient way the information on both the function to minimize, $\chi^2(\vec{a})$, and its gradient, $\nabla\chi^2(\vec{a})$, where, as in Eq. 5.45, $\vec{a}$ stands for the set of weights and thresholds that characterize the neural networks. Starting with an arbitrary initial vector $\vec{g}_0$ and letting $\vec{h}_0 = \vec{g}_0$, the conjugate gradient method constructs two sequences of vectors from the recurrence of Eq. 5.54, where we have assumed that the function to be minimized can be approximated by a quadratic form,
\[ \chi^2(\vec{a}) \simeq C - \vec{B}\cdot\vec{a} + \frac{1}{2}\, \vec{a}\cdot A\cdot\vec{a} + \mathcal{O}\!\left( a^3 \right) \, , \qquad (5.55) \]
and the scalars $\lambda_i$ and $\gamma_i$ are defined as
\[ \lambda_i = \frac{ \vec{g}_i \cdot \vec{g}_i }{ \vec{h}_i \cdot A \cdot \vec{h}_i } \, , \qquad (5.56) \]
\[ \gamma_i = \frac{ \vec{g}_{i+1} \cdot \vec{g}_{i+1} }{ \vec{g}_i \cdot \vec{g}_i } \, , \qquad (5.57) \]
where the dimension of each of the above vectors is $N_{par}$, the size of the parameter space. With these definitions the following orthogonality and conjugacy conditions hold:
\[ \vec{g}_i \cdot \vec{g}_j = 0 \, , \qquad \vec{h}_i \cdot A \cdot \vec{h}_j = 0 \, , \qquad \vec{g}_i \cdot \vec{h}_j = 0 \, , \qquad j < i \, . \qquad (5.58) \]
If one knew the Hessian matrix $A$ in Eq. 5.56, the conjugate gradient method would take us directly to the minimum of $\chi^2(\vec{a})$, but in general this Hessian matrix is not available. However, the following property can be proven: suppose we happen to have $\vec{g}_i = -\nabla\chi^2(\vec{a}_i)$ at some point $\vec{a}_i$ of the parameter space. If we now proceed from the point $\vec{a}_i$ along the direction $\vec{h}_i$ to the local minimum of $\chi^2$ located at the point $\vec{a}_{i+1}$, and then set $\vec{g}_{i+1} = -\nabla\chi^2(\vec{a}_{i+1})$, then the vector $\vec{g}_{i+1}$ constructed in this way is the same as the one that would have been constructed from Eq. 5.54 if the Hessian matrix had been known.
Once we have constructed a set of conjugate directions, the minimization of the function $\chi^2(\vec{a})$ is straightforward: starting from an initial point $\vec{a}_1$ in parameter space, one has to find the quantity $\lambda_k$ that minimizes
\[ \chi^2\!\left( \vec{a}_k + \lambda_k\, \vec{g}_k \right) \, , \qquad (5.59) \]
and then one sets
\[ \vec{a}_{k+1} = \vec{a}_k + \lambda_k\, \vec{g}_k \, , \qquad (5.60) \]
and this procedure is repeated from $k=1$ to $k=N_{par}$. If the function to be minimized were a quadratic form, conjugate gradient minimization would find the exact minimum after a single such cycle of iterations. Obviously, for real functions the length of the minimization is typically larger. Conjugate gradient minimization is also a suitable minimization algorithm for nonlinear error functions, and the main reason for its efficiency is the optimal use of the information contained in the gradient of the function to be minimized, $\nabla\chi^2$.
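For a quick illustration of conjugate gradient minimization on a quadratic error function, one can rely on an existing implementation. The sketch below uses the nonlinear conjugate gradient routine of SciPy on a toy two-parameter $\chi^2$; it is not the implementation used in this thesis, and the matrix and offset are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Toy quadratic error function chi2(a) = (a - a0)^T A (a - a0) and its gradient
A = np.array([[3.0, 0.5], [0.5, 1.0]])
a0 = np.array([0.2, -0.4])

def chi2(a):
    d = a - a0
    return d @ A @ d

def grad_chi2(a):
    return 2.0 * A @ (a - a0)

# Nonlinear conjugate gradient as implemented in SciPy (method="CG")
result = minimize(chi2, x0=np.zeros(2), jac=grad_chi2, method="CG")
print(result.x)  # converges to a0
```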
Having described the different minimization algorithms that will be used for neural network training, we now discuss the weighted training procedure. The essential idea of weighted training is that during the neural network training some experimental points are given more weight in the error function than others. This is useful, for example, to learn in a more efficient way those data points with smaller errors. In weighted training, one separates the $N_{dat}$ data points into $N_{sets}$ sets, each with $N_l$, $l=1,\ldots,N_{sets}$, data points. A typical partition is that in which each set corresponds to a different experiment incorporated in the fit, $N_{sets}=N_{exp}$, but arbitrary partitions can be
considered, like different kinematical regions. The cross-correlations between points corresponding
to different sets are neglected. The weighted error function that is minimized is then
Nsets
1 X
2minim = Nl zl 2l , (5.61)
Ndat
l=1
where zl is the relative weight assigned to each of the sets and 2l is the error function, either Eq.
5.33 or 5.34, for the data points that belong to the l-th set. There are several ways to select the
relative weights zl of each experiment. A possible parametrization of the values of zl is
bl
2l
z l = al , (5.62)
2max
where $\chi^2_{\rm max} = \max_l\{\chi^2_l\}$ and where $a_l$, $b_l$ are to be determined on a case-by-case basis. If $b_l = 0$ the
relative weight of each set is kept fixed during the training, while if $b_l \neq 0$ the relative weights are
dynamically adjusted during the training, so that sets with a larger $\chi^2_l$ have a higher
relative weight $z_l$. Note that the final $\chi^2$, Eq. 5.76, is the standard unweighted one, and that the
minimization of $\chi^2_{\rm minim}$ during the neural network training is only a useful strategy to obtain
a more even distribution of $\chi^2_l$ among the different data sets. In Fig. 5.3 we show, with an example
from Section 6.4.2, the parametrization of the nonsinglet parton distribution $q_{NS}(x,Q_0^2)$, how weighted
training helps in obtaining more similar values of the $\chi^2$ for the different experiments incorporated
in the fit.
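As a minimal illustration of the weighted error function of Eq. 5.61 with the weights of Eq. 5.62, the following sketch (with hypothetical per-set $\chi^2$ values and hypothetical parameters $a_l$, $b_l$) computes $\chi^2_{\rm minim}$ from the $\chi^2_l$ of each data set.

```python
import numpy as np

def weighted_chi2(chi2_sets, n_sets, a, b):
    """chi2_sets: chi2_l of each set; n_sets: N_l points per set;
    a, b: weight parameters of Eq. 5.62 (hypothetical values)."""
    chi2_sets = np.asarray(chi2_sets, dtype=float)
    n_sets = np.asarray(n_sets, dtype=float)
    z = a * (chi2_sets / chi2_sets.max()) ** b      # relative weights, Eq. 5.62
    n_dat = n_sets.sum()
    return np.sum(n_sets * z * chi2_sets) / n_dat   # weighted error, Eq. 5.61

# Example: two sets with different chi2_l and dynamically adjusted weights (b != 0).
print(weighted_chi2([1.8, 1.1], [351, 288],
                    a=np.array([1.0, 1.0]), b=np.array([1.0, 1.0])))
```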
Figure 5.3: A comparison of a training to experimental data with and without weighted training (WT). As can be seen
in the figure, at the end of the neural network training, the $\chi^2$ of the two experiments incorporated in the
fit (BCDMS and NMC, see Section 6.4.2) are more similar in the case of weighted training than in the case
of unweighted training.
An important issue to optimize the efficiency of the neural network training is the choice of
the optimal architecture or topology of the neural network. The choice of the architecture of the
neural network (the number of layers L and the number of neurons nl in each layer) cannot be
derived from general rules and must be tailored to each specific problem. An essential condition is
that the neural network has to be redundant, that is, it has to have a larger number of parameters
than the minimum number required to satisfactorily fit the patterns we want to learn, in this case
experimental data. However, the architecture cannot be arbitrarily large because then the training
length becomes very large. A suitable criterion to choose the optimal architecture is to select the
architecture next to the first stable architecture, that is, the first architecture that can fit
the data and gives the same fit as an architecture with one neuron fewer. This way one is confident
that the neural network is redundant for the problem under consideration.
A more systematic approach to determine the optimal architecture of a neural network which
parametrizes a function F is related to how many terms are needed in an expansion of F in terms
of the activation function g. Starting from a large neural network, we can reduce the architecture
down to the optimal size with the weight decay approach, in which weights which are rarely updated
are allowed to decay according to
$$\Delta\omega^{(l)}_{ij} = -\eta\,\frac{\partial\chi^2}{\partial\omega^{(l)}_{ij}} - \epsilon\,\omega^{(l)}_{ij}\ , \qquad (5.63)$$
where $\epsilon$ is the decay parameter, typically a small number. This corresponds to adding an extra
complexity term to the error function [114],
$$\chi^2 \to \chi^2 + \frac{\epsilon}{2}\sum_{i,j,l}\omega^{(l)\,2}_{ij}\ , \qquad (5.64)$$
that is, larger weights lead to larger contributions to the error function. A more advanced complexity
term is
$$\chi^2 \to \chi^2 + \frac{\epsilon}{2}\sum_{i,j,l}\frac{\omega^{(l)\,2}_{ij}/\omega_0^2}{1 + \omega^{(l)\,2}_{ij}/\omega_0^2}\ , \qquad (5.65)$$
which again penalizes those weights whose absolute value is larger. With this technique, the
network is forced to contain only the weights that are really needed to represent the problem under
consideration.
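A minimal sketch of the complexity terms of Eqs. 5.64 and 5.65, assuming the weights are stored as a list of per-layer numpy arrays; the decay parameter, the scale $\omega_0$ and the base error function value are hypothetical.

```python
import numpy as np

def chi2_with_weight_decay(chi2_data, weights, eps=1e-4):
    """Add the weight-decay complexity term of Eq. 5.64 to the data error function.
    weights: list of per-layer weight matrices omega^(l)_{ij}."""
    penalty = 0.5 * eps * sum(np.sum(w ** 2) for w in weights)
    return chi2_data + penalty

def chi2_with_soft_decay(chi2_data, weights, eps=1e-4, omega0=1.0):
    """Variant with the saturating penalty of Eq. 5.65, which mainly affects large weights."""
    penalty = 0.5 * eps * sum(np.sum((w / omega0) ** 2 / (1.0 + (w / omega0) ** 2))
                              for w in weights)
    return chi2_data + penalty
```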
For each training, the parameters that define the behavior of the neural networks, their weights
and thresholds, are initialized at random. To explore the space of minima in a more efficient way,
it has been checked to be especially useful to initialize at random not only the values of the neural
network parameters but also the range in which these parameters are initialized. That is, the end
points of the parameter initialization range, Eqs. 5.66 and 5.67, are themselves determined at random
in terms of $\langle\omega\rangle$, the average value of the neural network parameters, and of $\sigma_\omega$, the associated variance.
It has to be taken into account that a minimization strategy is defined not only by an algorithm
to vary parameters so that a given quantity is minimized, but also by a convergence condition
that determines when the minimization is stopped. Several stopping criteria, each with its own
advantages and drawbacks, have been considered during the course of this thesis.
A first criterion is the dynamical stopping of the training. With this criterion, for each replica,
we stop the training either when the condition $E^{(k)} \le \chi^2_{\rm stop}$ is satisfied or, if the previous condition
cannot be fulfilled, when the maximum number of iterations of the minimization algorithm, $N_{\rm max}$,
is reached. That is, the length of the neural network training is different for each replica. The
value of the $\chi^2_{\rm stop}$ parameter is determined by the value of the total $\chi^2$, Eq. 5.76, that defines the
parametrization. The dynamical stopping of the training allows one to avoid both overlearning and
insufficient training, as discussed in detail below.
The method to define the optimal value of $N_{\rm max}$ is the following: first perform a training with
a very large value of $N_{\rm max}$. For some of the replicas, the condition $E^{(k)} \le \chi^2_{\rm stop}$ will not be satisfied, due
to statistical fluctuations in the generation of the data replicas. If $E^{(k)}_{N_{\rm max}}$ is the final value of the
error function at the end of the training of the $k$-th replica, which has not reached the dynamical
stopping criterion, then determine for each such replica the iteration $N^{(k)}_{\rm it}$ for which the condition
$$E^{(k)}_{N^{(k)}_{\rm it}} \le \left(1+r^2\right) E^{(k)}_{N_{\rm max}}\ ,\qquad N^{(k)}_{\rm it}\le N_{\rm max}\ , \qquad (5.68)$$
was first verified, where $r^2$ is the required tolerance. The new and optimal value of $N_{\rm max}$ is then
determined as the average value of these iterations,
$$N_{\rm max} = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}} N^{(k)}_{\rm it}\ , \qquad (5.69)$$
where the sum is over those replicas which have not been dynamically stopped. With dynamical
stopping of the training, the final distribution of error functions E (k) is even and there is no problem
of overlearning, as will be discussed in brief.
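The determination of the optimal $N_{\rm max}$ from Eqs. 5.68-5.69 can be sketched as follows; the array `E_history`, containing the error function of each non-stopped replica as a function of the training iteration, is hypothetical.

```python
import numpy as np

def optimal_nmax(E_history, r2=0.01):
    """E_history: array of shape (N_rep, N_max) with E^(k) at each iteration,
    for the replicas that did not reach the dynamical stopping condition.
    Returns the average iteration at which Eq. 5.68 was first satisfied (Eq. 5.69)."""
    E_final = E_history[:, -1]                                    # E^(k)_{Nmax}
    n_it = np.array([np.argmax(E_k <= (1.0 + r2) * E_fin) + 1     # first iteration fulfilling Eq. 5.68
                     for E_k, E_fin in zip(E_history, E_final)])
    return int(np.mean(n_it))                                     # average over replicas, Eq. 5.69
```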
An alternative criterion to stop the neural network minimization is to fix the maximum number
of iterations of the minimization algorithm to a value large enough that one is sure that, within
a given tolerance, the fit has converged. However, this approach does not take into account that the
length of the training depends on the particular replica, so it is not computationally
efficient and might lead to overlearning.
The rigorous statistical method to determine the value of $\chi^2_{\rm stop}$ with dynamical stopping of the
training is the so-called overlearning criterion. Since the number of parameters of the neural network
parametrization is large, in principle, in the absence of inconsistent data, the final $\chi^2$ could
be lowered to arbitrarily low values for a large enough neural network. The overlearning criterion
to determine the length of the training states that the training should be stopped when the neural
network begins to overlearn, that is, it begins to follow the statistical fluctuations of the experimental
data rather than the underlying law. The onset of overlearning can be determined by separating the
data set into two disjoint sets, called the training set and the validation set. One then minimizes
the error function, Eq. 5.34, computed only with the data points of the training set, and analyzes
the dependence of the error function of the validation set as a function of the number of iterations.
Then one computes the total $\chi^2$, Eq. 5.76, for both the training, $\chi^2_{\rm tr}$, and validation, $\chi^2_{\rm val}$,
subsets. It might turn out that statistical fluctuations are large, and one has to average over a large
enough number of partitions to obtain stable results. The onset of overlearning is determined as the
point in the neural network training at which the $\chi^2_{\rm val}$ of the validation set saturates or even rises
while the $\chi^2_{\rm tr}$ of the training set is still decreasing. This implies that the neural network is learning
only statistical fluctuations, and signals the point where the training should be stopped.
A drawback of this approach is that one needs to assume that the training subset reproduces
the main features of the full set of data points. While this is the case in global fits of parton
distributions, where in each kinematical region there are several overlapping measurements, in other
physical situations experimental data is much more scarce and the overlearning criterion cannot be
used to determine the length of the training.
In Fig. 5.4 we show the expected behavior at the onset of overlearning. One can see that
the $\chi^2$ of the validation set begins to increase while the $\chi^2$ of the training set is still decreasing.
This point signals the onset of overlearning, that is, the fact that the neural network is learning
statistical fluctuations rather than the underlying law in the experimental data. Note also that for
some problems, for example those in which the data points are very dense, overlearning may not
be possible.
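A minimal sketch of how the onset of overlearning could be located from the training and validation $\chi^2$ profiles (hypothetical arrays, e.g. already averaged over partitions): the training is stopped when $\chi^2_{\rm val}$ stops decreasing while $\chi^2_{\rm tr}$ still decreases.

```python
import numpy as np

def overlearning_onset(chi2_tr, chi2_val, window=10):
    """chi2_tr, chi2_val: chi2 of the training/validation subsets vs. training iteration.
    Returns the first iteration where chi2_val saturates or rises over a trailing window
    while chi2_tr is still decreasing."""
    chi2_tr, chi2_val = np.asarray(chi2_tr), np.asarray(chi2_val)
    for i in range(window, len(chi2_val)):
        val_flat_or_rising = chi2_val[i] >= chi2_val[i - window]
        tr_decreasing = chi2_tr[i] < chi2_tr[i - window]
        if val_flat_or_rising and tr_decreasing:
            return i
    return len(chi2_val)  # no overlearning detected within the available training
```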
An alternative criterion to determine the optimal $\chi^2$ which defines the neural network training
is the so-called leave-one-out strategy [115]. In this strategy, out of the total $N_{\rm dat}$ data points,
Figure 5.4: An example of a neural network learning process with overlearning. The point where the $\chi^2$ computed
with the validation set begins to rise while the $\chi^2$ of the training set is still decreasing is the sign of this
overlearning.
one selects one data point at random and leaves it out of the training on all the remaining points.
One then computes the prediction of the neural network for the point that has been left out. This
procedure is repeated over all the data points, and the total $\chi^2$ computed with the predictions
for the points that have been left out is the average $\chi^2$ for a pure neural network prediction for a
point in the same range. If the underlying law followed by the data points is clear, the $\chi^2$ of a
prediction can be as good as the total $\chi^2$ of the trained data points, and this allows one to determine
the optimal value of the total $\chi^2$ to aim for in the neural network training. The conditions of the
leave-one-out strategy can be modified to take into account different types of correlations between
the data points, for example choosing sets of independent points at random instead of single data
points.
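The leave-one-out estimate of the $\chi^2$ of a pure prediction can be sketched as follows; `train_and_predict` is a hypothetical function that trains on all points except one and returns the prediction for the excluded point.

```python
import numpy as np

def leave_one_out_chi2(x, y, sigma, train_and_predict):
    """For each point, train on the remaining points and predict the excluded one;
    return the resulting chi2 per data point of the pure predictions."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        preds[i] = train_and_predict(x[mask], y[mask], sigma[mask], x[i])
    return np.mean(((y - preds) / sigma) ** 2)
```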
Theoretical information on the function to be parametrized, such as kinematical constraints or sum rules, can be implemented in several ways:
1. Lagrange multipliers:
The theoretical constraints are added to the error function as penalty terms whose relative weights are fixed by stability requirements,
$$\chi^{2(k)}_{\rm tot} = \chi^{2(k)}_{\rm dat} + \sum_{i=1}^{N_{\rm con}}\lambda_i\left[ F_i\left[F^{({\rm net})(k)}\right] - F^{\rm (theo)}_i\right]^2\ , \qquad (5.70)$$
where $\chi^{2(k)}_{\rm dat}$ is the contribution from the experimental errors, Eq. 5.33 or 5.34, and where the
Lagrange multipliers $\lambda_i$ are computed via a stability analysis. They have to be such that the
constraints are imposed with the required accuracy, but at the same time the space of minima
of $\chi^{2(k)}_{\rm tot}$ has to be close to that of the data error function, $\chi^{2(k)}_{\rm dat}$.
2. Pseudo-data:
If the constraints to be imposed on the function $F$ are local, they can be implemented as if
they were data points. That is, if the theoretical constraint is
$$F(x_1, x_2, \ldots, x_j, \ldots) = b_j\ , \qquad (5.71)$$
then the total error function to be minimized is
$$\chi^{2(k)}_{\rm tot} = \chi^{2(k)}_{\rm dat} + \sum_{j=1}^{N_{\rm con}}\frac{1}{\sigma_j^2}\left[ F^{({\rm net})(k)}(x_1, x_2, \ldots, x_j, \ldots) - b_j\right]^2\ , \qquad (5.72)$$
where $\sigma_j$ is the accuracy with which the corresponding constraint must be satisfied. This
approach differs from the Lagrange multiplier approach in that the accuracy to which the
constraints are to be satisfied is fixed by several requirements, for example by requiring the
error $\sigma_j$ to be a fixed fraction of the average error of the experimental data points. Note that
this is equivalent to adding to the experimental data set artificial data points with central
value $F(x_1, x_2, \ldots, x_j, \ldots) = b_j$ and total uncorrelated error $\sigma_j$. The addition of artificial
data points is helpful to constrain the shape of the parametrized function in regions without
experimental data, for example near kinematical thresholds (a minimal sketch of this pseudo-data approach is given after this list).
3. Hard-wired parametrization:
Another method is to hard-wire the constraint in the neural network parametrization. This
method is specially useful to implement the vanishing of the function for some kinematical
region. If we have
F (x1 , x2 , . . . , xj , . . .) = 0 , (5.73)
then we redefine the function to be parametrized as
$$F^{(k)}(x_1, x_2, \ldots) \equiv \prod_{j=1}^{N_{\rm con}}\left(x_j - \hat{x}_j\right)^{n_j}\, F^{({\rm net})(k)}(x_1, x_2, \ldots)\ , \qquad (5.74)$$
where $\hat{x}_j$ denotes the value of the $j$-th argument at which the constraint Eq. 5.73 holds, and
where now it is the function $F^{({\rm net})}$ that is parametrized with a neural network. The introduction
of a partial functional-form dependence in the parametrization does not introduce any functional-form
bias, since one can check that the results of the fit do not depend on the values of the exponents $n_j$,
provided they satisfy $n_j > 0$. The default values of the $n_j$ parameters have to be determined from
a stability analysis to assess the optimal preprocessing of the neural network training.
All of the above techniques can be combined for a given parametrization. Using any of these
techniques, it has to be shown that in each case the result of the fit is modified as expected by the
implementation of the kinematical constraints. For example, the dominant contribution to $\chi^2_{\rm tot}$ has
always to be that of the experimental data points.
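The pseudo-data implementation of a local constraint (item 2 above, Eq. 5.72) can be sketched as follows: the constraint $F(x_j)=b_j$ is appended to the data set as an artificial point with error $\sigma_j$. The arrays, the kinematical point and the 10% error choice are hypothetical.

```python
import numpy as np

def add_pseudo_data(x, y, sigma, x_constraint, b_constraint, sigma_constraint):
    """Append artificial data points enforcing F(x_constraint) = b_constraint
    with uncorrelated error sigma_constraint (cf. Eq. 5.72)."""
    x_new = np.concatenate([x, np.atleast_1d(x_constraint)])
    y_new = np.concatenate([y, np.atleast_1d(b_constraint)])
    s_new = np.concatenate([sigma, np.atleast_1d(sigma_constraint)])
    return x_new, y_new, s_new

# Example: enforce the vanishing of the parametrized function near a kinematical threshold,
# with an error equal to 10% of the average experimental error (hypothetical choice).
x, y, sigma = np.array([0.1, 0.3, 0.5]), np.array([1.2, 0.8, 0.4]), np.array([0.1, 0.1, 0.1])
print(add_pseudo_data(x, y, sigma, x_constraint=1.0, b_constraint=0.0,
                      sigma_constraint=0.1 * sigma.mean()))
```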
just as with standard probability measures. We now define such estimators, which can be divided
into two categories: statistical estimators to assess how well the sample of trained neural networks
reproduces the features of the experimental data, and statistical estimators which assess the stability
of a given fit with respect to some parameters, for example the number of trained neural networks.
where now the total experimental covariance matrix, Eq. 5.6, includes the contribution from the
normalization uncertainties. If experimental errors were correctly estimated and there were no
incompatibilities between measurements, on statistical grounds one would expect $\chi^2 \simeq 1$, with
deviations from this value scaling with the number of data points as $1/\sqrt{N_{\rm dat}}$. To be precise,
instead of $N_{\rm dat}$ one should use the number of degrees of freedom $N_{\rm dof}$, the difference between the
number of data points and the number of free parameters $N_{\rm par}$ of the theoretical model, and the
standard deviation of the $\chi^2$ distribution is given by $\sigma_{\chi^2} = \sqrt{2/N_{\rm dof}}$.
Another important estimator is the average error,
$$\langle E\rangle = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}} E^{(k)}\ , \qquad (5.77)$$
where $E^{(k)}$ is either Eq. 5.33 or Eq. 5.34, depending on the estimator which has been used in the
neural network training. It is instructive to estimate the typical values that the average error, Eq.
5.77, can take. We will now compute Eq. 5.77 in a simplified model in which correlated systematics
are neglected. Let us consider a set of measurements $m_i$ of $F$. If one assumes that the experimental
measurements are distributed gaussianly around the true values $t_i$ of the observable, one has
$$m_i = t_i + s_i\,\sigma_i\ , \qquad (5.78)$$
where $\sigma_i$ is the total error and $s_i$ a univariate zero-mean gaussian random number. The $k$-th replica of
generated artificial data, $g^{(k)}$, is as in Eq. 5.3 without correlated uncertainties,
$$g^{(k)}_i = m_i + r^{(k)}_i\,\sigma_i = t_i + \left(s_i + r^{(k)}_i\right)\sigma_i\ , \qquad (5.79)$$
where $r^{(k)}_i$ is another univariate zero-mean gaussian random number. Now let us assume that the
best-fit neural networks are distributed around the true values of the observable $F$ with error $\bar\sigma_i$.
For the $k$-th neural network $n^{(k)}$ we will have
$$n^{(k)}_i = t_i + l^{(k)}_i\,\bar\sigma_i\ , \qquad (5.80)$$
where the $l^{(k)}_i$ are highly correlated with both $r^{(k)}_i$ and $t_i$. For a large enough set of measurements, and a
large enough number of generated replicas, so that correlations between $s_i$ and $r^{(k)}_i$ can be neglected,
it can be seen that the average error is given by
$$\langle E\rangle_{\rm rep} = 2 + \left\langle\frac{\bar\sigma^2}{\sigma^2}\right\rangle_{\rm dat} - 2\left\langle\frac{\bar\sigma}{\sigma}\right\rangle_{\rm dat}\left\langle r\,l\right\rangle_{\rm rep}\ , \qquad (5.81)$$
where the error function is the diagonal error, Eq. 5.33. Note also that within the same model, the
error function of the $k$-th net as compared to the experimental measurements, defined as
$$\widetilde{E}^{(k)} = \frac{1}{N_{\rm dat}}\sum_{i=1}^{N_{\rm dat}}\frac{\left(m_i - n^{(k)}_i\right)^2}{\sigma^2_{{\rm stat},i}}\ , \qquad (5.82)$$
has as average
$$\left\langle\widetilde{E}\right\rangle_{\rm rep} = 1 + \left\langle\frac{\bar\sigma^2}{\sigma^2}\right\rangle_{\rm dat} \ge 1\ , \qquad (5.83)$$
and finally, in the case that the neural network prediction coincides with the true value of the
function, $\bar\sigma_i = 0$, the average error is $\left\langle\widetilde{E}\right\rangle_{\rm rep} = 1$, just as expected from textbook statistics.
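The simplified model of Eqs. 5.78-5.80 can be checked numerically; the sketch below generates toy measurements, replicas and "fit" values with uncorrelated fluctuations and compares the resulting average errors with Eqs. 5.81 and 5.83. All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dat, n_rep = 200, 1000
t = np.ones(n_dat)                 # true values t_i
sigma = 0.1 * np.ones(n_dat)       # total experimental errors sigma_i
sigbar = 0.03 * np.ones(n_dat)     # assumed spread of the nets around the truth

s = rng.standard_normal(n_dat)                   # measurement fluctuations, Eq. 5.78
m = t + s * sigma
r = rng.standard_normal((n_rep, n_dat))          # replica fluctuations, Eq. 5.79
g = m + r * sigma
l = rng.standard_normal((n_rep, n_dat))          # net fluctuations, Eq. 5.80 (uncorrelated here)
n = t + l * sigbar

E = np.mean(((g - n) / sigma) ** 2)              # average of the diagonal error, Eq. 5.33
E_tilde = np.mean(((m - n) / sigma) ** 2)        # Eq. 5.82 averaged over replicas
print(E, 2 + np.mean(sigbar ** 2 / sigma ** 2))        # compare with Eq. 5.81 (here <rl> = 0)
print(E_tilde, 1 + np.mean(sigbar ** 2 / sigma ** 2))  # compare with Eq. 5.83
```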
Now we define the estimators which are used to assess how well the sample of trained neural
networks reproduces the sample of experimental data. These estimators are:
Average for each experimental point,
$$\left\langle F^{\rm (net)}_i\right\rangle_{\rm rep} = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}} F^{({\rm net})(k)}_i\ . \qquad (5.84)$$
Associated variance,
$$\sigma^{\rm (net)}_i = \sqrt{\left\langle\left(F^{\rm (net)}_i\right)^2\right\rangle_{\rm rep} - \left\langle F^{\rm (net)}_i\right\rangle_{\rm rep}^2}\ . \qquad (5.85)$$
Associated covariance,
$$\rho^{\rm (net)}_{ij} = \frac{\left\langle F^{\rm (net)}_i F^{\rm (net)}_j\right\rangle_{\rm rep} - \left\langle F^{\rm (net)}_i\right\rangle_{\rm rep}\left\langle F^{\rm (net)}_j\right\rangle_{\rm rep}}{\sigma^{\rm (net)}_i\,\sigma^{\rm (net)}_j}\ , \qquad (5.86)$$
$$\mathrm{cov}^{\rm (net)}_{ij} = \rho^{\rm (net)}_{ij}\,\sigma^{\rm (net)}_i\,\sigma^{\rm (net)}_j\ . \qquad (5.87)$$
As in the case of the artificial replicas, the three above quantities provide the estimators of the
experimental central values, errors and correlations as computed from the sample of trained
neural networks (a minimal numerical sketch of these estimators is given after this list).
Mean variance and percentage error on central values over the $N_{\rm dat}$ data points,
$$\left\langle V\left[F^{\rm (net)}\right]\right\rangle_{\rm dat} = \frac{1}{N_{\rm dat}}\sum_{i=1}^{N_{\rm dat}}\left(\left\langle F^{\rm (net)}_i\right\rangle_{\rm rep} - F^{\rm (exp)}_i\right)^2\ , \qquad (5.88)$$
$$\left\langle PE\left[F^{\rm (net)}\right]\right\rangle_{\rm dat} = \frac{1}{N_{\rm dat}}\sum_{i=1}^{N_{\rm dat}}\left|\frac{\left\langle F^{\rm (net)}_i\right\rangle_{\rm rep} - F^{\rm (exp)}_i}{F^{\rm (exp)}_i}\right|\ . \qquad (5.89)$$
We define analogously the mean variance and the percentage error for errors, correlations and
covariances.
Scatter correlation,
$$r\left[F^{\rm (net)}\right] = \frac{\left\langle F^{\rm (exp)}\left\langle F^{\rm (net)}\right\rangle_{\rm rep}\right\rangle_{\rm dat} - \left\langle F^{\rm (exp)}\right\rangle_{\rm dat}\left\langle\left\langle F^{\rm (net)}\right\rangle_{\rm rep}\right\rangle_{\rm dat}}{\sigma^{\rm (exp)}_s\,\sigma^{\rm (net)}_s}\ , \qquad (5.90)$$
where the scatter variance $\sigma^{\rm (net)}_s$ associated to the neural network prediction is defined as
$$\sigma^{\rm (net)}_s = \sqrt{\left\langle\left\langle F^{\rm (net)}\right\rangle_{\rm rep}^2\right\rangle_{\rm dat} - \left\langle\left\langle F^{\rm (net)}\right\rangle_{\rm rep}\right\rangle_{\rm dat}^2}\ . \qquad (5.91)$$
$$\left\langle\widetilde{E}\right\rangle = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}}\widetilde{E}^{(k)}\ , \qquad (5.93)$$
$$\widetilde{E}^{(k)} = \frac{1}{N_{\rm dat}}\sum_{i,j=1}^{N_{\rm dat}}\left(F^{({\rm net})(k)}_i - F^{\rm (exp)}_i\right)\left(\mathrm{cov}^{(k)}\right)^{-1}_{ij}\left(F^{({\rm net})(k)}_j - F^{\rm (exp)}_j\right)\ . \qquad (5.94)$$
The $R$-ratio is interesting since it provides an estimator which is capable of determining whether,
when error reduction is observed, it is due to the fact that the neural network, by
combining data points, has found the underlying law, or whether on the other hand it is due to
artificial smoothing. To verify this property, assume that correlated systematic uncertainties
can be neglected, as we have done in the example above; then the estimator reads
$$R = \frac{1 + \left\langle\frac{\bar\sigma^2}{\sigma^2}\right\rangle_{\rm dat}}{2 + \left\langle\frac{\bar\sigma^2}{\sigma^2}\right\rangle_{\rm dat} - 2\left\langle\frac{\bar\sigma}{\sigma}\right\rangle_{\rm dat}\left\langle r\,l\right\rangle_{\rm rep}}\ , \qquad (5.95)$$
so that, if error reduction is due to the fact that the neural network has found the underlying
law that describes the experimental data, one will have $\bar\sigma \ll \sigma$ and therefore the $R$-ratio will
satisfy $R \simeq 1/2$.
Note that even if the covariance matrix is determined from the correlation matrix and the total
error through Eq. 5.7, the reproduction of the covariances by the probability measure has to be validated
independently of that of the correlations and errors. This is so because, even if it is clear that when correlations
and errors are correctly reproduced the covariances will also be reproduced, the converse is not necessarily
true.
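The estimators of Eqs. 5.84-5.87 and the scatter correlation of Eq. 5.90 can be computed directly from the array of network predictions; a minimal sketch, with a hypothetical array `F_net` of shape (N_rep, N_dat) and experimental central values `F_exp`:

```python
import numpy as np

def replica_estimators(F_net, F_exp):
    """F_net: network predictions, shape (N_rep, N_dat); F_exp: experimental central values."""
    mean = F_net.mean(axis=0)                               # Eq. 5.84
    sigma = F_net.std(axis=0)                               # Eq. 5.85
    rho = np.corrcoef(F_net, rowvar=False)                  # Eq. 5.86
    cov = rho * np.outer(sigma, sigma)                      # Eq. 5.87
    # Scatter correlation between experimental values and net averages, cf. Eq. 5.90.
    r_scatter = np.corrcoef(F_exp, mean)[0, 1]
    return mean, sigma, rho, cov, r_scatter
```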
for the sample of trained networks. The estimators in the above section were used to compare the
results of either the replica generation or the neural network fit to experimental data. In this section
we want to define statistical estimators which allow us to compare a given fit with another fit. Since
in practical applications we will have a reference fit and compare other fits with it, let us label the
first fit as the reference fit, denoted by the superscript (ref), and the second fit as the current
fit, denoted by the superscript (fit).
These estimators are divided into estimators for central values, standard deviations and correlations.
The estimators are computed for a set of $\widetilde{N}_{\rm dat}$ points which need not be the same points
where there is experimental data. In particular, one can compute these estimators in the extrapolation
region, to check the stability of the fit also in the region where there is no experimental data.
These estimators are given by:
Central values:
Relative error:
$$\left\langle {\rm RE}\left[F\right]\right\rangle_{\rm dat} = \frac{1}{\widetilde{N}_{\rm dat}}\sum_{i=1}^{\widetilde{N}_{\rm dat}}\frac{\left|\left\langle F^{\rm (ref)}_i\right\rangle - \left\langle F^{\rm (fit)}_i\right\rangle\right|}{\sqrt{V\left[\left\langle F^{\rm (ref)}_i\right\rangle\right] + V\left[\left\langle F^{\rm (fit)}_i\right\rangle\right]}} = \left\langle\frac{\left|\left\langle F^{\rm (ref)}_i\right\rangle - \left\langle F^{\rm (fit)}_i\right\rangle\right|}{\sqrt{V\left[\left\langle F^{\rm (ref)}_i\right\rangle\right] + V\left[\left\langle F^{\rm (fit)}_i\right\rangle\right]}}\right\rangle_{\rm dat}\ , \qquad (5.96)$$
Scatter correlation:
$$r\left[\left\langle F\right\rangle\right] = \frac{\left\langle\left\langle F^{\rm (fit)}\right\rangle\left\langle F^{\rm (ref)}\right\rangle\right\rangle_{\rm dat} - \left\langle\left\langle F^{\rm (fit)}\right\rangle\right\rangle_{\rm dat}\left\langle\left\langle F^{\rm (ref)}\right\rangle\right\rangle_{\rm dat}}{\sqrt{\left\langle\left\langle F^{\rm (fit)}\right\rangle^2\right\rangle_{\rm dat} - \left\langle\left\langle F^{\rm (fit)}\right\rangle\right\rangle_{\rm dat}^2}\,\sqrt{\left\langle\left\langle F^{\rm (ref)}\right\rangle^2\right\rangle_{\rm dat} - \left\langle\left\langle F^{\rm (ref)}\right\rangle\right\rangle_{\rm dat}^2}}\ . \qquad (5.97)$$
Standard deviations:
Relative error:
$$\left\langle {\rm RE}\left[\sigma^2\right]\right\rangle_{\rm dat} = \frac{1}{\widetilde{N}_{\rm dat}}\sum_{i=1}^{\widetilde{N}_{\rm dat}}\frac{\left|\sigma^{{\rm (ref)}\,2}_i - \sigma^{{\rm (fit)}\,2}_i\right|}{\sqrt{V\left[\sigma^{{\rm (fit)}\,2}_i\right] + V\left[\sigma^{{\rm (ref)}\,2}_i\right]}} = \left\langle\frac{\left|\sigma^{{\rm (ref)}\,2}_i - \sigma^{{\rm (fit)}\,2}_i\right|}{\sqrt{V\left[\sigma^{{\rm (fit)}\,2}_i\right] + V\left[\sigma^{{\rm (ref)}\,2}_i\right]}}\right\rangle_{\rm dat}\ , \qquad (5.98)$$
Scatter correlation:
$$r\left[\sigma\right] = \frac{\left\langle\sigma^{\rm (fit)}\sigma^{\rm (ref)}\right\rangle_{\rm dat} - \left\langle\sigma^{\rm (fit)}\right\rangle_{\rm dat}\left\langle\sigma^{\rm (ref)}\right\rangle_{\rm dat}}{\sqrt{\left\langle\sigma^{{\rm (fit)}\,2}\right\rangle_{\rm dat} - \left\langle\sigma^{\rm (fit)}\right\rangle_{\rm dat}^2}\,\sqrt{\left\langle\sigma^{{\rm (ref)}\,2}\right\rangle_{\rm dat} - \left\langle\sigma^{\rm (ref)}\right\rangle_{\rm dat}^2}}\ . \qquad (5.99)$$
Correlations:
Relative error:
$$\left\langle {\rm RE}\left[\rho\right]\right\rangle = \frac{1}{\widetilde{N}_{\rm dat}\left(\widetilde{N}_{\rm dat}+1\right)/2}\sum_{i=1}^{\widetilde{N}_{\rm dat}}\sum_{j=i}^{\widetilde{N}_{\rm dat}}\frac{\left|\rho^{\rm (ref)}_{ij} - \rho^{\rm (fit)}_{ij}\right|}{\sqrt{V\left[\rho^{\rm (fit)}_{ij}\right] + V\left[\rho^{\rm (ref)}_{ij}\right]}} = \left\langle\frac{\left|\rho^{\rm (ref)}_{ij} - \rho^{\rm (fit)}_{ij}\right|}{\sqrt{V\left[\rho^{\rm (fit)}_{ij}\right] + V\left[\rho^{\rm (ref)}_{ij}\right]}}\right\rangle_{\rm dat}\ , \qquad (5.100)$$
Scatter correlation:
$$r\left[\rho\right] = \frac{\left\langle\rho^{\rm (fit)}\rho^{\rm (ref)}\right\rangle_{\rm dat} - \left\langle\rho^{\rm (fit)}\right\rangle_{\rm dat}\left\langle\rho^{\rm (ref)}\right\rangle_{\rm dat}}{\sqrt{\left\langle\rho^{{\rm (fit)}\,2}\right\rangle_{\rm dat} - \left\langle\rho^{\rm (fit)}\right\rangle_{\rm dat}^2}\,\sqrt{\left\langle\rho^{{\rm (ref)}\,2}\right\rangle_{\rm dat} - \left\langle\rho^{\rm (ref)}\right\rangle_{\rm dat}^2}}\ , \qquad (5.101)$$
where averages and variances of central values, errors and correlations can be computed using the
expressions in Appendix A.1. Two probability measures are said to be equivalent to each other if
the conditions $\left\langle {\rm RE}\left[F\right]\right\rangle_{\rm dat} \lesssim 1$, $\left\langle {\rm RE}\left[\sigma^2\right]\right\rangle_{\rm dat} \lesssim 1$ and $\left\langle {\rm RE}\left[\rho\right]\right\rangle_{\rm dat} \lesssim 1$ are fulfilled. For the scatter
correlations one expects, for two compatible probability measures, the condition $r \simeq 1$ to hold. Note
that the $\widetilde{N}_{\rm dat}$ points where the stability estimators defined above are computed do not need to be
those where there is experimental data; for instance, these estimators can be used to compute the
stability of a probability measure also in the extrapolation region. Since fluctuations might be large,
it might be necessary to average over different partitions of the sets of trained neural networks to
obtain stable results.
Another relevant estimator is the so-called pull. The average pull of two different fits is defined
as
$$\langle P\rangle_{\rm dat} = \frac{1}{N_{\rm dat}}\sum_{i=1}^{N_{\rm dat}} P_i\ ,\qquad P_i = \frac{\left\langle F^{\rm (ref)}_i\right\rangle - \left\langle F^{\rm (fit)}_i\right\rangle}{\sqrt{\sigma^{{\rm (fit)}\,2}_i + \sigma^{{\rm (ref)}\,2}_i}}\ . \qquad (5.102)$$
For two fits to be consistent within the respective uncertainties the condition $P_i \lesssim 1$ should be
satisfied. Note that the condition $P_i \lesssim 1$ is necessary for two fits to be compatible within errors, but
it is not sufficient for two fits to define the same probability measure, since in particular, if either
$\sigma^{\rm (fit)}$ or $\sigma^{\rm (ref)}$ is much larger than the other error, then the two fits will be very different yet the pull
will still satisfy $P_i \lesssim 1$.
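A minimal sketch of the pull of Eq. 5.102 between two fits, given their central values and errors at the same set of points (all arrays hypothetical):

```python
import numpy as np

def average_pull(F_ref, sigma_ref, F_fit, sigma_fit):
    """Point-by-point pull and its average, Eq. 5.102."""
    pull = (F_ref - F_fit) / np.sqrt(sigma_ref ** 2 + sigma_fit ** 2)
    return pull, pull.mean()
```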
Finally, there are other conditions that must be verified for the sample of trained neural networks
of a given fit to be considered satisfactory. These conditions are related to the criteria used to
stop the minimization of the individual replicas. If we use dynamical stopping of the training, as
described in Section 5.2.2, the distribution of errors $E^{(k)}$ at the end of the training over the trained
replica sample must be peaked around the average result $\langle E\rangle$, Eq. 5.77, because the opposite case
would mean that the averaged result is obtained by combining good fits with bad fits (in the sense of
fits with large values of $E^{(k)}$). Another consistency requirement is that the distribution of training lengths,
where the training length measures the length of the minimization procedure, must not be peaked near
$N_{\rm max}$, the maximum number of iterations, because if it is, this means that the dynamical stopping of
the minimization is not being effective, and one has instead a fixed-training-length minimization. In
Section 6.3 we show how these two conditions are satisfied in a particular example.
Chapter 6
In this Chapter we review four applications of the general strategy to parametrize experimental
data that has been described in the previous Chapter. First we describe a parametrization of the
vector minus axial-vector spectral function $\rho_{V-A}(s)$ from the hadronic decays of the tau lepton, which
incorporates theoretical information in the form of sum rules, with the motivation of extracting
from this parametrization the values of the nonperturbative vacuum condensates. Then we present
a parametrization of the proton structure function $F_2^p(x,Q^2)$, which updates the results of Ref. [11]
by including the data from the HERA experiments. This parametrization is not only interesting as a
new application of the general technique, but it also allows us to develop the necessary techniques to
apply our general strategy to problems with data coming from many different experiments. The third
application is a parametrization of the lepton energy distribution $d\Gamma(E_\ell)/dE_\ell$ in the semileptonic
decays of the B meson. As a byproduct of this parametrization we will provide a determination of
the b quark mass $m_b(m_b)$.
We devote special attention to the last and most important application, which is the main motiva-
tion for the development of the techniques described in Chapter 5: the neural network parametriza-
tion of parton distributions. To this purpose a new strategy to solve the DGLAP evolution equations
is introduced. We then review the parametrization of the non-singlet parton distribution $q_{NS}(x,Q_0^2)$
from experimental data on the non-singlet structure function $F_2^{NS}(x,Q^2)$, and we devote special
attention to the comparison of our results with those of the standard approach to parton distributions
described in Section 4.2.
and correlations, together with information from theoretical constraints: the chiral sum rules, Eqs.
4.62-4.65, and the asymptotic vanishing of the spectral function in the perturbative $s\to\infty$ limit,
Eq. 4.56. This way we will obtain a determination of the QCD vacuum condensates which is
unbiased with respect to the parametrization of the spectral function and with faithful estimation
and propagation of the experimental uncertainties. All sources of uncertainty related to the method
of analysis are kept under control, and their contribution to the total error in the extraction of the
condensates is estimated, as discussed in [1].
Since the relevant spectral function for the determination of the condensates is the $\rho_{V-A}(s)$
spectral function, we need a simultaneous measurement of the vector and axial-vector spectral
functions from the hadronic decays of the tau lepton. Data from the ALEPH Collaboration [116]
and from the OPAL Collaboration [117] will be used¹, which provide a simultaneous determination
of the vector and axial-vector spectral functions in the same kinematic region and also provide the
full set of correlated uncertainties for these measurements. There exist additional data on spectral
functions coming from electron-positron annihilation, but their quality is lower than that of the data from
hadronic tau decays, and they will not be incorporated into the present analysis.
Experimental data do not consist of the spectral function directly, but rather of the invariant
mass-squared spectra for both the vector and axial-vector components, which are related to the
spectral functions by a kinematic factor and a branching ratio normalization,
$$\rho_{V/A}(s) = \frac{M_\tau^2}{6\,|V_{ud}|^2\, S_{EW}}\,\frac{B(\tau\to V/A\,\nu_\tau)}{B(\tau\to e\,\bar\nu_e\,\nu_\tau)}\,\frac{1}{N_{V/A}}\frac{dN_{V/A}(s)}{ds}\left[\left(1-\frac{s}{M_\tau^2}\right)^2\left(1+\frac{2s}{M_\tau^2}\right)\right]^{-1}\ ,\qquad s\le M_\tau^2\ . \qquad (6.1)$$
In the above equation,
$$\frac{1}{N_{V/A}}\frac{dN_{V/A}(s)}{ds}\ , \qquad (6.2)$$
is the normalized invariant mass distribution, $M_\tau$ is the tau lepton mass, $s$ is the invariant mass of
the hadronic final state and $S_{EW}$ are the electroweak radiative corrections.
In the following, $\rho^{\rm (exp)}_{V-A,i}$ will denote the $i$-th data point,
$$\rho^{\rm (exp)}_{V-A,i} = \rho_{V-A}(s_i) \equiv \rho_V(s_i) - \rho_A(s_i)\ ,\qquad i = 1,\ldots,N_{\rm dat}\ , \qquad (6.3)$$
where $N_{\rm dat}$ is the number of available data points. Fig. 6.1 shows the experimental data used in the
present analysis from the two LEP experiments. Note that errors are small in the low and middle
$s$ regions and that they become larger as we approach the tau mass threshold. The last points are
almost zero in the invariant mass spectrum, since near threshold the reduced phase space implies a
lack of statistics, and they are enhanced in the spectral function only by the large kinematic factor
for $s$ near $M_\tau^2$ (the last term in Eq. 6.1), so special care must be taken with the physical significance
of these points.
It is clear that the asymptotic vanishing of the spectral function implied by perturbative QCD,
Eq. 4.56, is not reached for $s \le M_\tau^2$, at least for the central experimental values, and therefore it
should be enforced artificially on the parametrization of the $\rho_{V-A}(s)$ spectral function that we will
construct. The method that we use takes advantage of the smooth, unbiased interpolation capability
of neural networks: artificial points are added to the data set with adjusted
¹ In this analysis we used the ALEPH data as presented in Ref. [116]. Recently, the ALEPH collaboration released
their final results on hadronic spectral functions and branching fractions of the hadronic tau decays [118], which have
reduced uncertainties due to the larger statistics, especially in the large $s$ region. It would be interesting to repeat the
present analysis with this updated experimental data on the $\rho_{V-A}(s)$ spectral function, as has been done with other
physical applications [65].
Figure 6.1: Experimental data for the $\rho_{V-A}(s)$ spectral function, $v_1(s)-a_1(s)$, from the ALEPH (left) and OPAL (right) collaborations.
Note that errors increase as we approach the kinematic threshold at $s = M_\tau^2$.
Table 6.1: Comparison between experimental data and the Monte Carlo sample of artificial replicas for the $\rho_{V-A}(s)$ spectral function.

  $N_{\rm rep}$                        10      100     1000
  $r[\rho_{V-A}(s)^{\rm (art)}]$       0.98    0.99    0.99
  $r[\sigma^{\rm (art)}]$              0.98    0.99    0.99
  $r[\rho^{\rm (art)}]$                0.61    0.95    0.99
errors in a region where $s$ is high enough that the $\rho_{V-A}(s)$ spectral function should vanish, as
discussed in Section 5.2.4. That is, we add to the experimental data a set of artificial points
with error $\sigma_i$, where $N_{\rm tot}$ is the total number of points (data and artificial). Once these artificial
points are included, the neural network smoothly interpolates between the real and artificial data
points, also taking into account the constraints of the sum rules, as explained below.
As has been described in Chapter 5, the first step in the parametrization of the spectral function
$\rho_{V-A}(s)$ is the generation of a Monte Carlo sample of replicas of the experimental data. The
experimental data for the invariant hadronic mass spectrum consist of the central values, the total
error and the correlations between different invariant mass bins. Therefore, to generate replicas of
the experimental data, we use Eq. 5.13, which in the present case reads
$$\rho^{({\rm art})(k)}_{V-A,i} = \rho^{\rm (exp)}_{V-A,i} + r^{(k)}_i\,\sigma_{{\rm tot},i}\ ,\qquad k = 1,\ldots,N_{\rm rep}\ , \qquad (6.6)$$
where the gaussian random numbers $r^{(k)}_i$ have the same correlations as the experimental data. The
replica generation has the statistical estimators that can be seen in Table 6.1. It can be seen that a
Monte Carlo sample with $N_{\rm rep} = 1000$ is required to have scatter correlations larger than 0.99 for
central values, errors and correlations.
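The generation of replicas with the same correlations as the experimental data, Eq. 6.6, can be sketched by drawing correlated gaussian numbers through a Cholesky decomposition of the experimental correlation matrix; all arrays below are hypothetical.

```python
import numpy as np

def generate_replicas(rho_exp, sigma_tot, corr_matrix, n_rep, seed=0):
    """rho_exp: experimental central values; sigma_tot: total errors;
    corr_matrix: experimental correlation matrix; returns an array of shape (n_rep, n_dat)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(corr_matrix)                    # builds correlated r^(k)_i
    r = rng.standard_normal((n_rep, len(rho_exp))) @ chol.T   # gaussian numbers with corr_matrix
    return rho_exp + r * sigma_tot                            # Eq. 6.6
```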
Once the Monte Carlo sample of replicas has been generated, following the method described
above, one has to train a neural network in each replica. The estimator which will be minimized in
the present case is the diagonal error, Eq. 5.33, which reads
$$E^{(k)}_2 = \frac{1}{N_{\rm tot}}\sum_{i=1}^{N_{\rm tot}}\frac{\left(\rho^{({\rm art})(k)}_{V-A,i} - \rho^{({\rm net})(k)}_{V-A,i}\right)^2}{\sigma^2_{{\rm stat},i}}\ ,\qquad k = 1,\ldots,N_{\rm rep}\ , \qquad (6.7)$$
where $\rho^{({\rm net})(k)}_{V-A,i}$ is the $k$-th neural network, trained on the $k$-th replica of the experimental data. Note
that, as explained in [11], correlations are correctly incorporated in the parametrization of $\rho_{V-A}(s)$
through the Monte Carlo pseudo-data generation, Eq. 6.6. As has been described in Section 4.1.4,
the parametrization of the $\rho_{V-A}(s)$ spectral function has to satisfy several theoretical constraints.
These theoretical constraints, the chiral sum rules, are implemented using the Lagrange Multipliers
technique, as described in Section 5.2.4. Therefore the total error function to be minimized will be
$$E^{(k)}_{\rm tot} = E^{(k)}_2 + \sum_{i=1}^{N_{\rm con}}\lambda_i\left[ F_i\left[\rho^{({\rm net})(k)}_{V-A}(s)\right] - F^{\rm (theo)}_i\right]^2\ , \qquad (6.8)$$
where, for example, for the first theoretical constraint, the DMO sum rule, Eq. 4.62, one has
$$F_1\left[\rho^{({\rm net})(k)}_{V-A}(s)\right] = \frac{1}{4\pi^2}\int_0^{s_0} ds\,\frac{\rho^{({\rm net})(k)}_{V-A}(s)}{s}\ ,\qquad F^{\rm (theo)}_1 = \frac{f_\pi^2\left\langle r_\pi^2\right\rangle}{3} - F_A\ , \qquad (6.9)$$
and similarly for the remaining chiral sum rules, up to $N_{\rm con} = 4$. These sum rules act as constraints
on the neural network output, that is, the main contribution to the error function $E^{(k)}_{\rm tot}$ (which
determines the learning of the network) still comes from the experimental errors, that is, the $E^{(k)}_2$
term, and the sum rules are only relevant in the region where the errors are larger. The relative
weights of the chiral sum rules $\lambda_i$ are determined according to a stability analysis, as discussed in
[1].
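The convolution entering the DMO-type constraint (cf. Eq. 6.9 as reconstructed above) can be evaluated numerically from any parametrization of the spectral function; a minimal sketch with a simple trapezoidal sum, where `rho_net` is a hypothetical callable returning the parametrized $\rho_{V-A}(s)$:

```python
import numpy as np

def dmo_sum_rule(rho_net, s_max, n_points=500, s_min=1e-3):
    """Approximate (1/4 pi^2) * int ds rho_{V-A}(s)/s with a trapezoidal sum.
    The small-s cutoff s_min regulates the numerical integration."""
    s = np.linspace(s_min, s_max, n_points)
    f = rho_net(s) / s
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))
    return integral / (4.0 * np.pi ** 2)

# Example with a toy spectral function (hypothetical, not a trained network):
print(dmo_sum_rule(lambda s: s * np.exp(-s), s_max=3.0))
```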
The length of the training is fixed by studying the behavior of the error function $E^{(0)}_{\rm tot}$ for the
neural net fitted to the central experimental values, and requiring that $E^{(0)}_{\rm tot}$ stabilizes at a value close to
one, which on statistical grounds is the value expected for a correct fit. The minimization algorithm
that is used for the neural network training is genetic algorithms, introduced in Section 5.2.3, which
is required since the total error function to be minimized, Eq. 6.8, depends non-linearly on the
$\rho_{V-A}(s)$ spectral function through the convolutions of the chiral sum rules.
With the strategy discussed above, $N_{\rm rep}$ neural networks are trained on the $N_{\rm rep}$ Monte Carlo
replicas of the experimental data for the spectral function $\rho_{V-A}(s)$. As has been described in Chapter
5, once the probability measure in the space of spectral functions $\mathcal{P}[\rho_{V-A}]$ has been constructed,
it is crucial to validate the results with suitable statistical estimators. A number of checks is then
performed in order to be sure that an unbiased representation of the probability density has been
obtained. The values for the scatter correlations for central values, errors and correlations are
presented in Table 6.2. It is seen that the central values and the correlations are well reproduced,
whereas this is not the case for the total errors. The average standard deviation for each data point
computed from the Monte Carlo sample of nets is substantially smaller than the experimental error.
This is due to the fact that the network is combining the information from different data points
by capturing an underlying law. This effect is enhanced by the inclusion of the sum rule constraints.
All networks have to fulfill these constraints which forces the fit to behave smoothly in a region
where experimental data errors are very large. This should be understood as a success of the fitting
procedure.
The constructed probability measure for the $\rho_{V-A}(s)$ spectral function has the theoretical
constraints from the chiral sum rules built in. For example, it can be checked that the two Weinberg chiral sum
rules are well verified by our neural network parametrization, and thus have been incorporated into the
information contained in the experimental data. This fact will be crucial because different extraction
Table 6.2: Comparison between experimental data and the averages computed from the sample of trained neural networks for the $\rho_{V-A}(s)$ spectral function.

  $N_{\rm rep}$                        10       100
  $r[\rho_{V-A}(s)^{\rm (net)}]$       0.98     0.98
  $r[\sigma^{\rm (net)}]$             -0.21    -0.20
  $r[\rho^{\rm (net)}]$                0.80     0.85
methods, differing in the combinations of these chiral sum rules used, can be shown to be equivalent in the
asymptotic region $s_0\to\infty$. In Fig. 6.2 the two Weinberg sum rules, Eqs. 4.63 and 4.64, evaluated
with the neural network parametrization of the spectral function $\rho_{V-A}(s)$, are represented. Both
chiral sum rules are well verified in the asymptotic region, beyond the range of available experimental
data. This will also ensure the stability of the evaluation of the condensates with respect to the
specific value of $s_0$ chosen, as long as it stays in the asymptotic region.
Figure 6.2: The two Weinberg chiral sum rules, Eqs. 4.63 and 4.64, evaluated from the neural network parametrization
of $\rho_{V-A}(s)$ as functions of the upper integration limit $s_0$ [GeV$^2$] (left: first Weinberg sum rule, right: second Weinberg sum rule), compared with the exact results. Only central values are shown.
Using the neural network parametrization of the $\rho_{V-A}(s)$ spectral function, we can compute any
given sum rule with associated uncertainties. Because the neural network parametrization retains all the
experimental information, we can view values coming from the neural networks as direct experimental
determinations of convolutions of the spectral function $\rho_{V-A}(s)$. The values of the condensates
$\langle O_6\rangle$, $\langle O_8\rangle$ and higher dimensional condensates are then extracted from the value of an appropriate
sum rule, to be discussed in brief. The method we will follow is to evaluate the vacuum
condensates as a function of the upper limit of integration for each replica, and then to compute the mean
and standard deviation. As has been explained before, it is crucial to represent the value of the
different sum rules as a function of the upper limit of integration, to check both their convergence and
their stability.
Once the neural network parametrization of the $\rho_{V-A}(s)$ spectral function has been constructed
and validated, we can use it to determine the nonperturbative chiral vacuum condensates. These
condensates can be determined, by virtue of the dispersion relation, from another sum rule, that is, a
convolution of the $\rho_{V-A}(s)$ spectral function with an appropriate weight function. As discussed in
Section 4.1.4, we define the operator product expansion of the chiral correlator in the following way,
$$\Pi(Q^2)\Big|_{V-A} = \sum_{n=1}^{\infty}\frac{1}{Q^{2n+4}}\,C_{2n+4}(Q^2,\mu^2)\left\langle O_{2n+4}(\mu^2)\right\rangle \equiv \sum_{n=1}^{\infty}\frac{1}{Q^{2n+4}}\left\langle O_{2n+4}\right\rangle\ . \qquad (6.10)$$
The Wilson coefficients, including radiative corrections, are absorbed into the nonperturbative vac-
uum expectation values, to facilitate comparison with the current literature. As has been explained
in Sect. 4.1.4 the analytic structure of the chiral correlator implies that it has to obey the dispersion
relation,
$$\Pi_{V-A}(Q^2) = \frac{1}{\pi}\int_0^{\infty} ds\,\frac{1}{s+Q^2}\,{\rm Im}\,\Pi_{V-A}(s) = \frac{1}{\pi}\sum_{n=0}^{\infty}\frac{(-1)^n}{Q^{2n+2}}\int_0^{\infty} ds\, s^n\,{\rm Im}\,\Pi_{V-A}(s)\ . \qquad (6.11)$$
Recalling that the imaginary part of the chiral correlator is proportional to the spectral function,
$$\mathrm{Im}\,\Pi_{V-A}(s) = \frac{1}{2\pi}\,\rho_{V-A}(s)\ , \qquad (6.12)$$
and comparing terms of the same order in the $1/Q^2$ expansion in Eq. 6.10 and in Eq. 6.11, it can be
seen that condensates of arbitrary dimension are given by
$$\left\langle O_{2n+2}\right\rangle = (-1)^n\int_0^{s_0} ds\, s^n\,\frac{1}{2\pi^2}\,\rho_{V-A}(s)\ ,\qquad n \ge 2\ , \qquad (6.13)$$
which, if the asymptotic regime has been reached, should be independent of the upper integration limit
for large enough $s_0$.
Since all the previous integrals have to be cut at some finite energy $s_0 \le M_\tau^2$, because no experimental
information on the $\rho_{V-A}(s)$ spectral function is available above $M_\tau^2$, a truncation of the
integration has to be performed, which introduces a source of uncertainty that competes with all other sources
of statistical and systematic error, and a theoretical bias which is difficult to estimate. Many techniques have been
developed to deal with these finite energy integrals, leading to the so-called Finite Energy Sum Rules
(FESR). A detailed analysis of some alternative techniques and methods to extract the condensates
can be found in the original work [1].
Using Eq. 6.13, we can extract from our neural network parametrization the values of the
nonperturbative condensates. To be explicit, one would compute the dimension 6 condensate from
the sample of trained neural networks in the following way,
$$\left\langle O_6\right\rangle = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}}\left\langle O_6^{(k)}\right\rangle = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}}\int_0^{s_0} ds\, s^2\,\frac{1}{2\pi^2}\,\rho^{({\rm net})(k)}_{V-A}(s)\ . \qquad (6.14)$$
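The average over replicas in Eq. 6.14 can be sketched as follows, with `rho_nets` a hypothetical list of callables, one trained network per replica:

```python
import numpy as np

def condensate_O6(rho_nets, s0, n_points=500):
    """Dimension-six condensate, Eq. 6.14: mean and standard deviation over the replica sample."""
    s = np.linspace(0.0, s0, n_points)
    values = []
    for rho in rho_nets:
        f = s ** 2 * rho(s) / (2.0 * np.pi ** 2)              # integrand of Eq. 6.14
        values.append(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s)))  # trapezoidal sum
    values = np.array(values)
    return values.mean(), values.std()
```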
Stable results are obtained for the dimension six condensate $\langle O_6\rangle$, whereas higher condensates, e.g.
$\langle O_8\rangle$, are less stable. Fig. 6.3 shows the outcome for $\langle O_6\rangle$ and $\langle O_8\rangle$ including the propagation of
statistical errors. It is clearly seen that convergence in the limit of integration $s_0$ is obtained thanks
to the addition of sum rules and endpoints in the learning procedure. The central values for the
condensates in the asymptotic limit, that is, in the limit of large $s_0$, come out to be:
The value of $\langle O_6\rangle$ is a cross-check of the validity of our treatment: not only are there strong
theoretical arguments that support the fact that $\langle O_6\rangle$ is negative [119, 120], but also all previous
determinations with different techniques yield negative results, the majority of them being compatible
with ours within errors.
Figure 6.3: Condensates $\langle O_6\rangle$ and $\langle O_8\rangle$ as a function of $s_0$. The error bands include the propagation of experimental
uncertainties.
We note that our evaluation of the condensates is compatible with some of the previous evaluations
and has a similar error. This is, however, misleading, as the error quoted here is only statistical, and
a discussion of systematic errors is needed. The discussion of the various sources of error is crucial
to our treatment. The first criterion to judge the reliability of a QCD sum rule is its independence,
at large values of $s_0$, from the value of the upper integration limit, that is, its saturation. We then
need to identify values of the final condensates which are stable against the integration limit
of the sum rule. This stability criterion is complemented by demanding independence of the results
of the specific polynomial entering the sum rule. Further criteria are stability with respect to the
artificial endpoints added to the data and with respect to the relative weights in the error function
used to train the neural networks. A detailed analysis of the contributions of the different sources
of uncertainty to the values of the condensates can be found in [1]. These uncertainties include the
statistical error propagation from the experimental covariance matrix, which is the best understood
and treated error source in our analysis, the choice of the finite energy sum rule, the dependence
on the implementation of the asymptotic vanishing of the spectral function, and the dependence on
the implementation of the chiral sum rules. For example, in Fig. 6.4 we show that the final values
of the condensates do not depend on the precise finite energy sum rule used in their extraction.
Our final determination of the nonperturbative condensates including all relevant sources of
Figure 6.4: Determination of $\langle O_6\rangle$ and $\langle O_8\rangle$ as functions of $s_0$ [GeV$^2$], using different finite energy sum rules
($Z_{6a}$, $Z_{6b}$ for dimension 6 and $Z_{8a}$, $Z_{8b}$, $Z_{8c}$ for dimension 8), as described in Ref. [1].
Table 6.3: Summary of different extractions of the QCD vacuum condensates. Appropriate rescalings have been
performed to allow the comparison of the different determinations. Note that some of the above determinations
appeared after the original publication of Ref. [1].
uncertainties is
The values of the QCD nonperturbative condensates have been extracted previously from the
experimental data, with different techniques and different results, as summarized in Table 6.3. We
also include the results of works which were published after the original publication of Ref. [1]. Note
that our results agree, at least in sign, with those of Refs. [121, 122, 123, 124, 125]. See [126, 127]
for a detailed comparison of the different methods for the extraction of the condensates.
Since the work presented in Ref. [1] was published, several studies have appeared which
also determine the values of the higher dimensional condensates from experimental data using a wide
variety of methods and techniques [131, 126, 127, 129, 132, 133, 125, 134], ranging from large-$N_C$ methods
to new sum rule approaches and even a determination inspired by higher dimensional string theories.
The spread of the results obtained for the higher dimensional vacuum condensates using different
techniques shows that their determination from experimental data is still an open issue.
Summarizing, in this part of the thesis we have presented a bias-free neural
network parametrization of the $\rho_{V-A}(s)$ spectral function, inferred from the data, which retains all
the information on experimental errors and correlations, and is supplemented with the additional
theoretical input of the chiral sum rules. As a byproduct of this analysis, we have performed an
extraction of the nonperturbative vacuum condensates $\langle O_6\rangle$ and $\langle O_8\rangle$ aimed at minimizing the
sources of theoretical bias which might be a cause of concern in existing determinations of these
condensates from spectral functions. Our final results give negative central values for the dimension
6 and 8 condensates. These results take into account the propagation of statistical errors and their
correlations. Higher dimension condensates carry larger errors, although the signs of the condensates
seem to remain unaltered.
Figure 6.5: Kinematic range in the $(x, Q^2)$ plane of the available experimental data for the proton structure function $F_2^p(x,Q^2)$ (BCDMS, NMC, E665, ZEUS94, ZEUSBPC95, ZEUSSVX95, ZEUSBPT97, ZEUS97, H1SVX95, H197, H1LX97, H199, H100).
Table 6.4: Experiments included in this analysis. All values of $\sigma$ and cov are given as percentages.

  Experiment   Ref.        x range                  Q² range [GeV²]   Ndat   ⟨σ_stat⟩   ⟨σ_sys⟩   ⟨σ_sys/σ_stat⟩   σ_N   ⟨σ_tot⟩   ⟨ρ⟩    ⟨cov⟩
  NMC          [138]       2.0×10^-3 – 6.0×10^-1    0.5 – 75          288    3.7        2.3       0.76             2.0   5.0       0.17   3.8
  BCDMS        [139,140]   6.0×10^-2 – 8.0×10^-1    7 – 260           351    3.2        2.0       0.56             3.0   5.4       0.52   13.1
  E665         [141]       8.0×10^-4 – 6.0×10^-1    0.2 – 75          91     8.7        5.2       0.67             2.0   11.0      0.21   21.7
  ZEUS94       [142]       6.3×10^-5 – 5.0×10^-1    3.5 – 5000        188    7.9        3.5       1.04             2.0   10.2      0.12   6.4
  ZEUSBPC95    [143]       2.0×10^-6 – 6.0×10^-5    0.11 – 0.65       34     2.9        6.6       2.38             2.0   7.6       0.61   34.1
  ZEUSSVX95    [144]       1.2×10^-5 – 1.9×10^-3    0.6 – 17          44     3.8        5.7       1.53             1.0   7.1       0.10   4.1
  ZEUS97       [145]       6.0×10^-5 – 6.5×10^-1    2.7 – 30000       240    5.0        3.1       0.93             3.0   6.7       0.29   7.0
  ZEUSBPT97    [146]       6.0×10^-7 – 1.3×10^-3    0.045 – 0.65      70     2.6        3.6       1.40             1.8   4.9       0.41   8.8
  H1SVX95      [147]       6.0×10^-6 – 1.3×10^-3    0.35 – 3.5        44     6.7        4.6       0.74             3.0   8.9       0.36   28.1
  H197         [148]       3.2×10^-3 – 6.5×10^-1    150 – 30000       130    12.5       3.2       0.31             1.5   13.3      0.06   10.9
  H1LX97       [149]       3.0×10^-5 – 2.0×10^-1    1.5 – 150         133    2.6        2.2       0.87             1.7   3.9       0.30   3.9
  H199         [150]       2.0×10^-3 – 6.5×10^-1    150 – 30000       126    14.7       2.8       0.24             1.8   15.2      0.05   11.0
  H100         [151]       1.3×10^-3 – 6.5×10^-1    100 – 30000       147    9.4        3.2       0.42             1.8   10.4      0.09   8.6
As has been discussed in Section 4.1.2, deep-inelastic structure functions are defined by parametrizing
the deep-inelastic neutral current scattering cross section as
$$\frac{d^2\sigma}{dx\,dQ^2} = \frac{4\pi\alpha^2}{x\,Q^4}\left[ x y^2\, F_1(x,Q^2) + (1-y)\,F_2(x,Q^2) + y\left(1-\frac{y}{2}\right) F_3(x,Q^2)\right]\ . \qquad (6.18)$$
We will construct a parametrization of the structure function F2 (x, Q2 ), which provides the bulk of
the contribution to Eq. 6.18. In fact, a large number of experiments present their results in terms
of the reduced cross section,
$$\tilde\sigma(x,Q^2) \equiv \frac{x\,Q^4}{4\pi\alpha^2}\,\frac{1}{1+(1-y)^2}\,\frac{d^2\sigma(x,Q^2)}{dx\,dQ^2}\ , \qquad (6.19)$$
which is equal to F2 (x, Q2 ) in most of the (x, Q2 ) kinematical range. For all experiments the
longitudinal structure function FL (x, Q2 ) contribution, defined as
$$F_L(x,Q^2) = \left(1+\frac{4M^2x^2}{Q^2}\right)F_2(x,Q^2) - 2x\,F_1(x,Q^2)\ , \qquad (6.20)$$
to the cross section has already been subtracted by the experimental collaborations, except for
ZEUSBPC95, where we subtracted it using the values published by the same experiment. Note
that the structure function $F_2$ receives contributions from both $\gamma$ and $Z$ exchange, though the $Z$
contribution is only non-negligible for the high-$Q^2$ datasets ZEUS94, H197, H199 and H100. We
will construct a parametrization of the structure function $F_2$ defined in Eq. 6.18, i.e. containing all
contributions, since it is closer to the quantity which is experimentally measured, the reduced cross
section Eq. 6.19. When the experimental collaborations provide separately the contributions to $F_2$
due to $\gamma$ or $Z$ exchange, we have recombined them in order to get the full $F_2$ of Eq. 6.18.
Experimental data on deep-inelastic structure functions consist of central values, statistical
errors, and the contributions from the different sources of correlated and uncorrelated uncertainties.
Uncorrelated systematic errors are added in quadrature to the statistical errors to construct the
total uncorrelated uncertainty. On top of this, for some experiments, in particular ZEUS94,
ZEUSSVX95 and ZEUSBPT97, some uncertainties are asymmetric. For the treatment
of asymmetric uncertainties we follow the prescription discussed in Section 5.1.1.
The construction of a parametrization of $F_2(x,Q^2)$ according to the general strategy described in
Chapter 5 consists of three steps: generation of a set of Monte Carlo replicas of the original experimental
data, training of a neural network on each replica, and finally statistical validation of the
constructed probability measure. The Monte Carlo replicas of the original experiment are generated
as a multi-gaussian distribution: each replica is given, following Eq. 5.3, by a set of values
$$F^{({\rm art})(k)}_i = \left(1 + r^{(k)}_N\,\sigma_N\right)\left[ F^{\rm (exp)}_i + \sum_{l=1}^{N_{\rm sys}} r^{(k)}_{{\rm sys},l}\,\sigma_{{\rm sys},l,i} + r^{(k)}_{t,i}\,\sigma_{t,i}\right]\ ,\qquad k = 1,\ldots,N_{\rm rep}\ , \qquad (6.21)$$
Table 6.5: Statistical estimators for the Monte Carlo sample of artificial replicas of the $F_2^p(x,Q^2)$ data, for samples of 10, 100 and 1000 replicas.

  $N_{\rm rep}$                                      10          100         1000
  $\langle PE[\langle F^{\rm (art)}\rangle_{\rm rep}]\rangle$    1.88%       0.64%       0.20%
  $r[F^{\rm (art)}]$                                 0.99919     0.99992     0.99999
  $\langle V[\sigma^{\rm (art)}]\rangle_{\rm dat}$   6.7×10^-4   2.0×10^-4   6.9×10^-5
  $\langle PE[\sigma^{\rm (art)}]\rangle_{\rm dat}$  37.21%      11.77%      3.43%
  $\langle\sigma^{\rm (art)}\rangle_{\rm dat}$       0.0292      0.0317      0.0316
  $r[\sigma^{\rm (art)}]$                            0.945       0.995       0.999
  $\langle V[\rho^{\rm (art)}]\rangle_{\rm dat}$     8.1×10^-2   7.8×10^-3   7.3×10^-4
  $\langle\rho^{\rm (art)}\rangle_{\rm dat}$         0.3048      0.3115      0.2920
  $r[\rho^{\rm (art)}]$                              0.696       0.951       0.995
  $\langle V[{\rm cov}^{\rm (art)}]\rangle_{\rm dat}$ 5.2×10^-7  6.8×10^-8   6.9×10^-9
  $\langle{\rm cov}^{\rm (art)}\rangle_{\rm dat}$    0.00013     0.00018     0.00015
  $r[{\rm cov}^{\rm (art)}]$                         0.687       0.941       0.994
where the various errors are defined in Eqns. 5.4-5.8. As has been discussed before, the value of Nrep
is determined in such a way that the Monte Carlo set of replicas models faithfully the probability
distribution of the data in the original set. A comparison of expectation values, variances and
correlations of the Monte Carlo set with the corresponding input experimental values as a function
of the number of replicas is shown in Fig. 6.6, where we display scatter plots of the central values
and errors for samples of 10, 100 and 1000 replicas. The corresponding plot for correlations is
essentially the same as that shown in Ref. [11]. A more quantitative comparison is performed using
the statistical estimators as defined in Section 5.1. The results for these estimators are presented in
Table 6.5. Note in particular the scatter correlations r for central values, errors and correlations,
which indicate the size of the spread of data around a straight line. The table shows that a sample
of 1000 replicas is sufficient to ensure average scatter correlations of 99% and accuracies of a few
percent on structure functions, errors and correlations.
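The replica generation of Eq. 6.21, with an overall normalization fluctuation, correlated systematics and a total uncorrelated error, can be sketched as follows (all arrays hypothetical):

```python
import numpy as np

def f2_replica(F_exp, sigma_N, sigma_sys, sigma_t, rng):
    """One artificial replica following Eq. 6.21.
    F_exp: central values (N_dat,); sigma_N: normalization error;
    sigma_sys: correlated systematics (N_sys, N_dat); sigma_t: total uncorrelated error."""
    r_N = rng.standard_normal()                       # one normalization fluctuation per replica
    r_sys = rng.standard_normal(sigma_sys.shape[0])   # one fluctuation per systematic source
    r_t = rng.standard_normal(F_exp.shape[0])         # one uncorrelated fluctuation per point
    return (1.0 + r_N * sigma_N) * (F_exp + r_sys @ sigma_sys + r_t * sigma_t)
```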
$N_{\rm rep}$ neural networks are then trained on the Monte Carlo replicas, by training each neural
network on all the $F^{({\rm art})(k)}_i$ data points of the $k$-th replica. The architecture of the networks is the
same as in Ref. [11]. The training is subdivided into three epochs, each based on the minimization
of a different error function, as described in Section 5.2.3. First one minimizes $E^{(k)}_1$, then $E^{(k)}_2$, and
finally correlated systematics are incorporated in the minimization of $E^{(k)}_3$. The rationale behind
this three-step procedure is that the true minimum which the fitting procedure must determine is
that of the error function with correlated systematics, $E^{(k)}_3$, Eq. 5.34. However, this is nonlocal and
time consuming to compute. It is therefore advantageous to look for its rough features at first, then
refine its search, and finally determine its actual location.
The minimum during the first two epochs is found using back-propagation (BP), discussed in
Section 5.2.3. This method is not suitable for the minimization of the nonlocal function Eq. 5.34.
In Ref. [11] BP was used throughout, and the third epoch was omitted. This is acceptable provided
the total systematics is small in comparison to the statistical errors, and indeed it was verified that
a good approximation to the minimum of Eq. 5.34 could be obtained from the ensemble of neural
networks. This is no longer the case for the present extended data set, as we shall see explicitly in
brief. Therefore, the full $E^{(k)}_3$, Eq. 5.34, is minimized in the third training epoch by means of genetic
algorithms (GA), also discussed in Section 5.2.3.
Figure 6.6: Scatter plot of experimental data versus Monte Carlo artificial data for both central values and
errors.
At the end of the GA training we are left with a sample of $N_{\rm rep}$ neural networks, from which e.g.
the value of the structure function at $(x,Q^2)$ can be computed as
$$F_2(x,Q^2) = \frac{1}{N_{\rm rep}}\sum_{k=1}^{N_{\rm rep}} F^{({\rm net})(k)}(x,Q^2)\ . \qquad (6.22)$$
The goodness of fit of the final set is thus measured by the $\chi^2$ per data point, Eq. 5.76, which, given
the large number of data points, is essentially identical to the $\chi^2$ per degree of freedom.
In order to apply the general strategy to the present problem several points must be considered:
the choice of training parameters and training length, the choice of the actual data set, and the
choice of theoretical constraints. We now address these issues in turn. The parameters and length
for the first two training epochs have been determined by inspection of the fit of a neural network
to the central experimental values. Clearly, this choice is less critical, in that it is only important in
order for the training to be reasonably fast, but it does not impact on the final result.
After these first two training epochs, the diagonal error function $E^{(0)}_2$, Eq. 5.33, for a training
on central values is of order two for the central data set, with a training length similar to that which
was required to reach $E^{(0)}_2 \approx 1.3$ for the smaller data set of Ref. [11]. The value of $E^{(0)}_3$,
Eq. 5.34, which is always bounded by it, $E_3 \le E_2$, is accordingly smaller (see Table 6.6). The training
algorithm then switches to GA minimization of $E_3$, Eq. 5.34. The determination of the length
of this training epoch is critical, in that it controls the form of the final fit. This can only be done
by looking at the features of the full Monte Carlo sample of neural networks.
Before addressing this issue, however, it turns out to be necessary to consider the possibility of
introducing cuts in the data set. Indeed, consider the results obtained after a GA training of $4\times10^4$
generations on the central data set, displayed in Table 6.6. This is a rather long training: indeed, in
each GA generation all the data are shown to the nets. Hence in $4\times10^4$ GA generations the data
are shown to the nets $0.7\times10^8$ times, comparable to the number of times they are shown to the nets
during BP training. It is apparent that whereas $E_3 \approx 1$ for most experiments, it remains abnormally
high for NMC and especially for ZEUS94 and ZEUSSVX95. Because of the weighted training which
has been adopted, this is unlikely to be due to insufficient training on these data sets, and is more
likely related to problems of these data sets themselves.
Whereas ZEUSSVX95 only contains a small number of data points, NMC and ZEUS94 each account
for more than 10% of the total number of data points, and thus they can bias the final results
considerably. The case of NMC was discussed in detail in Ref. [11]. This data set is the only one to
cover the medium-$x$, medium-small $Q^2$ region (see Fig. 6.5) and thus it cannot be excluded from the
fit. As discussed in Ref. [11], the relatively large value of $E_3$ for this experiment is a consequence of
Table 6.6: The uncorrelated error function $E_2^{(0)}$, Eq. 5.33, and the correlated one $E_3^{(0)}$, Eq. 5.34, for the
fit to the central data points: (A) after the back-propagation training epoch and (B) after the final genetic
algorithms training epoch.

  Experiment      (A) E2    (A) E3    (B) E2    (B) E3
  Total           2.05      1.54      2.03      1.36
  NMC             1.97      1.56      1.74      1.54
  BCDMS           1.93      1.66      1.32      1.26
  E665            1.64      1.37      1.83      1.38
  ZEUS94          3.15      2.26      3.01      2.21
  ZEUSBPC95       4.18      1.32      5.18      1.24
  ZEUSSVX95       3.37      1.88      5.68      2.11
  ZEUS97          2.33      1.54      3.02      1.37
  ZEUSBPT97       2.82      1.97      2.08      1.22
  H1SVX95         3.21      0.96      4.74      1.09
  H197            0.86      0.76      1.08      0.87
  H1LX97          1.96      1.46      1.50      1.18
  H199            1.15      1.07      1.10      1.01
  H100            1.59      1.50      1.48      1.26
internal inconsistencies within the data set. A value of $E_3 \approx 1.5$ indicates that the neural nets do
not reproduce the subset of data which are incompatible with the bulk, as it should be, whereas a
value $E_3 \approx 1$ could only be obtained by overlearning, i.e. essentially by fitting irrelevant fluctuations
(see Ref. [11]).
Let us now consider the case of ZEUS94. The kinematic region of this experiment is entirely
covered by the ZEUS97, H197, H199 and H100 experiments. We can therefore study the impact of
excluding this experiment from the global fit, without information loss. The results obtained in this
case are displayed in Table 6.7: when the experiment is not fitted, the $E_3$ value for all experiments
with which it overlaps improves, and so does the global $E_3$, whereas $E_3$ for ZEUS94 itself only
deteriorates by a comparable amount, despite the fact that the experiment is now not fitted at all.
We conclude that the experiment should be excluded from the fit, since its inclusion results in a
deterioration of the fit quality, whereas its exclusion does not entail information loss. Difficulties in
the inclusion of this experiment in global fits were already pointed out in Refs. [152, 153], where it
was suggested that they may be due to underestimated or non-gaussian uncertainties. It is likely
that ZEUSSVX95 has similar problems. However, its inclusion in the fit is no reason of concern, even
if its high E3 value were due to incompatibility of this experiment with the others or underestimate
of its experimental uncertainties, because of the small number of data points. It is therefore retained
in the data set. Our final data set thus includes all experiments in Table 6.4, except ZEUS94. We
are thus left with Ndat =1698 data points.
For the sake of future applications, it is interesting to ask how the procedure of selecting exper-
iments in the data set can be automatized. This can be done in an iterative way as follows: first, a
neural net (or sample of neural nets) is trained on only one experiment; then, the total E3 for the
full data set is computed using this neural net (or sample of nets); the procedure is then repeated for
all experiments, and the experiment which leads to the smallest total E3 is selected. In the second
iteration, the net (or sample of nets) is trained on the experiment selected in the first iteration plus
any of the other experiments, thereby leading to the selection of a second experiment to be added
to that selected previously, and so on. The process can be terminated before all experiments are
selected, for instance if it is seen that the addition of a new experiment does not lead to a significant
The neural network approach to parton distributions 94
TABLE 4
Experiment E3
Total 1.25
NMC 1.51
BCDMS 1.24
E665 1.23
ZEUS94 2.28
ZEUSBPC95 1.16
ZEUSSVX95 2.08
ZEUS97 1.37
ZEUSBPT97 1.00
H1SVX95 1.04
H197 0.84
H1LX97 1.19
H199 1.00
H100 1.24
Table 6.7: The same fit as the last column of Table 6.6 if the ZEUS94 data are excluded from the fit.
Figure 6.7: Dependence of the total 2 Eq. 5.76 on the length of training: (left) total training (right) detail
of the GA training.
the latter is computed from the structure function averaged over nets Eq. 6.22, but also because of
the different treatment of normalization errors in the respective covariance matrices, Eq. 5.35 and
Eq. 5.6. Besides the 2 we also list the values of various quantities, defined in Section 5.3, which
can be used to assess the goodness of fit.
The quality of the final fit is somewhat better than that of the fit to the central data points
shown in Table 6.7. In particular, with the exception of NMC (which is likely to have internal
inconsistencies [11]) and ZEUSSVX95 (which is likely to have the same problems as those of ZEUS94)
the 2 per degree of freedom is of order 1 for all experiments. It is interesting to note that the 2
for the neural network average is rather better than the average hE3 i. The (scatter) correlation
between experimental data and the neural network prediction equals one to about 1% accuracy,
with the exception of NMC, ZEUSSVX95 (which have the aforementioned problems) and E665.
The E665 kinematic region overlaps almost entirely (apart from very small Q2 < 1 GeV2 ) with that
of NMC and BCDMS, while having lower accuracy (this is why the experiment was not included
in the fits of Ref. [11]). The data points corresponding to this experiment are therefore essentially
predicted by the fit to other experiments, thus explaining the somewhat smaller scatter correlation.
The average neural network variance is in general substantially smaller than the average ex-
perimental error, typically by a factor 3 4. This is the reason why hEi > 2 : the neural nets
fluctuate less about central experimental values than the Monte Carlo replicas. In the presence of
substantial error reduction, the (scatter) correlation between network covariance and experimental
error is generally not very high, and can take low values when a small number of data points from
one experiment is enough to determine the outcome of the fit, such as in the case of the NMC
experiment, even more so for E665 [11].
As discussed extensively in Ref. [11] it is important to make sure that this is due to the fact
that information from individual data points is combined through an underlying law by the neural
networks, and not due to parametrization bias. To this purpose, the R-estimator has been introduced
in Section 5.3, where it was shown that in the presence of substantial error reduction R > 1 if there
is parametrization bias, whereas R 0.5 in the absence of parametrization bias. It is apparent from
Tables 6.8 and 6.9 that indeed R 0.5 for all experiments. Note that, contrary to what was found in
ref. [11], there is now some error reduction also for the BCDMS experiment, though by a somewhat
smaller amount than for other experiments. We will come back to this issue when comparing results
to those of Ref. [11].
Further evidence that the error reduction is not due to parametrization bias can be obtained
The neural network approach to parton distributions 96
F2p (x, Q2 )
Nrep 1000
2 1.18
hEi 2.52
r F (net) 0.99
(net)R 0.54
V
dat 1.2 103
P
E (net) dat 80%
(exp)
(net) dat 0.027
dat
0.008
(net)
(net)
r 0.73
V
dat 0.20
(exp)
(net) dat 0.31
dat
0.67
(net)
(art)
r 0.54
V
cov dat 3.3 107
(exp)
cov (net) dat 1.3 104
cov
dat
3.6 105
r cov(net) 0.49
Table 6.8:
Estimators of the final results for the constructed probability measure of F2 (x, Q2 ).
by studying the dependence of (net) dat on the length of training. This dependence is shown in
Fig. 6.8 for the BCDMS experiment. It is apparent that the error reduction is correlated with the
goodness of fit displayed in Fig. 6.7, and it occurs during the GA training, thereby suggesting
that error reduction occurs when the neural networks start reproducing an underlying law. If error
reduction were due to parametrization bias it would be essentially independent of the length of
training.
The point-to-point correlation of the neural nets is somewhat larger than that of the data,
as one might expect as a consequence of an underlying law which is being learnt by the neural
networks. In fact, for the NMC experiment the increase in correlation essentially compensates the
reduction in error, in such a way that the average covariance of the nets and the data are essentially
the same. This again shows that in the case of the NMC experiment a small number of points
is sufficient to predict the remaining ones. For all other experiments, however, the covariance of
the nets is substantially smaller than that of the data. As a consequence the (scatter) correlation
of covariance remains relatively high for all experiments, except NMC, and especially E665 whose
points are essentially predicted by the fit to other experiments.
The structure function and associated one- error band is compared to the data as a function of
x for a pair of typical values of Q2 in Fig. 6.9. In Fig. 6.10 the behavior of the structure function as
a function of x at fixed Q2 and as a function of Q2 at fixed x is also shown. It is apparent that in the
data region the error on the neural nets is rather smaller than that on the data used to train them.
The error however grows rapidly as soon as the nets are extrapolated outside the region of the data,
specially in the small-x region and in the large-Q2 region. At large x, however, the extrapolation
is kept under control by the kinematic constraint F2 (1, Q2 ) = 0. Note that the increase of the
uncertainties in the extrapolation region, as opposed to the case for fits with functional forms, is due
to the fact that the behavior of neural networks in this region is not determined by the corresponding
behavior in the data (interpolation) region.
The number of possible phenomenological applications of the neural network parametrization of
The neural network approach to parton distributions 97
Figure 6.8:
Dependence of (net) dat
on the length of training for the BCDMS experiment.
the proton
structure function F2p (x, Q2 ) is rather large, from determinations of the strong coupling
s Q2 , comparison of different theoretical predictions at low x, checks of sum rules, or its effects
of the extraction of polarized structure functions and polarized parton distributions. Another pos-
sibility is a detailed quantitative study of F2p (x, Q2 ) in the transition region between perturbative
and nonperturbative regimes around Q2 = 1 GeV2 . Our parametrization is specially suited for this
purpose since it incorporates all the information from experimental data without introducing any
bias from the functional form behavior of the transition region. In Fig. 6.11 we show the logarithmic
derivative of the structure function, defined as
2 ln F2 (x, Q2 )
(Q , x0 ) , (6.23)
d ln x
x=x0
for two different values of x0 . Note that at very low Q2 expectations are that the Pomeron exponent,
(Q2 = 0) = 0.08 is recovered.
Let us finally compare the determination of F2 (x, Q2 ) presented here with that of Ref. [11], which
was based on pre-HERA data. In Fig. 6.12 one- error bands for the two parametrizations are
compared, whereas in Fig. 6.13 we display the relative pull of the two parametrizations, introduced
in Section 5.3, which in the present situation is given by
where (x, Q2 ) is the error on the structure function determined as the variance of the neural
network sample. In view of the fact that the old fit only included BCDMS and NMC data, it is
interesting to consider four regions: (a) the BCDMS region (large x, intermediate Q2 , e.g. x = 0.3,
Q2 = 20 GeV2 ); (b) the NMC region (intermediate x, not too large Q2 , e.g. x = 0.1, Q2 = 2 GeV2 );
(c) the HERA region (small x and large Q2 , e.g. 0.01 and Q2 > 10 GeV2 ); (d) the region where
neither the old nor the new fit had data (very large or very small Q2 ). In region (a) the new fit
is rather more precise than the old one, for reasons to be discussed shortly, while central values
agree, with P < 1). In region (b) the new fit is significantly more precise than the old one, while
central values agree to about one sigma. In region (c) the new fit is rather accurate while the old fit
had large errors, but P 1 nevertheless, because the HERA rise of F2 is outside the error bands
The neural network approach to parton distributions 98
Figure 6.9:
Final results for F2 (x, Q2 ) compared to experimental data. For the neural network result, the one- error
band is shown.
extrapolated from NMC. This shows that even though errors on extrapolated data grow rapidly they
become unreliable when extrapolating far from the data. Finally in region (d) all errors are very
large and P is consequently small, except at small x and large Q2 , where the new fits extrapolate
the rise in the HERA data, which is missing altogether in the old fits.
Let us finally come to the issue of the BCDMS error, which, as already mentioned, is reduced
somewhat in the current fit in comparison to the data and the previous fit. This may appear
surprising, in that the new fit does not contain any new data in the BCDMS region. However, as
is apparent from Fig. 6.8, this error reduction takes place in the GA training stage, when E3 is
minimized. Furthermore, we have verified that if the uncorrelated E2 is minimized during the GA
training no error reduction is observed for BCDMS. Hence, we conclude that the reason why error
reduction for BCDMS was not found in Ref. [11] is that in that reference neural networks were
trained by minimizing E2 . In fact, as discussed above, the BCDMS experiment turns out to require
the longest time to learn, especially after inclusion of the HERA data. Error reduction only obtains
after this lengthy minimization process.
In summary, we have presented a determination of the probability density in the space of structure
functions for the structure function F2 (x, Q2 ) for the proton, based on all available data from the
NMC, BCDMS, E665, ZEUS and H1 collaborations. Our results take the form of a Monte Carlo
sample of 1000 neural networks, each of which provides a determination of the structure function
for all (x, Q2 ). The structure function and its statistical moments (errors, correlations and so on)
can be determined by averaging over this sample. The results of this part of the thesis are made
available as a FORTRAN routine which gives F2 (x, Q2 ), determined by a set of parameters, and
1000 sets of parameters corresponding to the Monte Carlo sample of structure functions. They can
be downloaded from the web site http://sophia.ecm.ub.es/f2neural/.
This works updates and upgrades that of Ref. [11], where similar results were obtained from the
BCDMS and NMC data only. The main improvements in the present work are related to the need
of handling a large number of experimental data, affected by large correlated systematics. Apart
from many smaller technical aspects, the main improvement introduced here is the use of genetic
algorithms to train neural networks on top of back-propagation. This has allowed for a more accurate
The neural network approach to parton distributions 99
Figure 6.10:
One- error band for the structure function F2 (x, Q2 ) computed from neural nets. Note the different scale
on the y axis in the two plots.
Figure 6.11:
The logarithmic derivative of F2 (x, Q2 ), (Q2 ), as determined from the neural network parametrization in
the transition region between perturbative and nonperturbative regimes.
Figure 6.12:
Comparison of the parametrization of F2 (x, Q2 ) of ref. [11] (old) with that described in [3] (new). The
pairs of curves correspond to a one- error band.
The neural network approach to parton distributions 101
(net)
R
0.59 0.50 0.56
V 0.002 1.9 105 0.0013
(net) dat
PE 0.63 0.56 0.89
(exp) dat
0.017 0.007 0.032
(net) dat
0.008 0.005 0.008
dat
r (net) 0.23 0.98 0.17
(net)
V 0.51 0.69 0.29
(exp) dat
0.17 0.52 0.20
(net) dat
0.84 0.86 0.60
dat
r (net) 0.08 0.73 0.05
(art)
V cov 2.4 109 1.8 109 4.5 109
(exp) dat
cov 4.4 105 3.8 105 1.7 104
(net) dat
cov 5.2 105 2.3 105 3.3 105
(net)dat
r cov -0.03 0.98 0.16
Experiment ZEUSBPC95 ZEUSSVX95 ZEUS97 ZEUSBPT97 H1SVX95 H197 H1LX97 H199 H100
2 1.02 2.08 1.35 0.86 0.67 0.71 1.07 0.90 1.11
hEi
(net)
2.07 2.03 2.24 2.08 2.03 1.91 2.41 1.93 2.11
r F 0.98 0.96 0.99 0.99 0.97 0.99 0.99 0.98 0.99
(net)
R
0.51 0.66 0.55 0.55 0.44 0.46 0.53 0.48 0.54
V 4.3 104 0.0035 0.0010 1.3 104 0.0043 0.0030 0.0005 0.003 0.0013
(net) dat
PE 0.91 0.94 0.87 0.72 0.96 0.95 0.75 0.96 0.93
(exp) dat
0.022 0.061 0.037 0.012 0.063 0.040 0.027 0.051 0.030
(net) dat
0.006 0.013 0.011 0.006 0.011 0.008 0.008 0.008 0.009
dat
r (net) 0.85 0.72 0.86 0.73 0.84 0.87 0.42 0.82 0.89
(net)
V 0.09 0.30 0.12 0.14 0.118 0.14 0.31 0.16 0.14
(exp) dat
0.61 0.24 0.28 0.40 0.36 0.06 0.29 0.05 0.09
(net) dat
0.77 0.64 0.39 0.63 0.57 0.27 0.58 0.29 0.26
dat
r (net) 0.53 0.40 0.66 0.60 0.48 0.51 0.69 0.37 0.55
(art)
V cov 6.4 108 1.9 106 3.4 107 1.4 109 3.0 106 3.8 107 3.8 108 2.7 107 1.7 107
(exp) dat
cov 2.8 104 8.5 104 3.7 104 5.8 105 0.0014 1.0 104 2.1 104 1.4 104 9.6 105
(net) dat
cov 2.8 105 1.2 104 3.2 105 2.3 105 7.0 105 1.5105 6.9 105 1.6 105 2.2 105
(net)dat
r cov 0.69 0.48 0.77 0.65 0.53 0.61 0.57 0.54 0.58
Table 6.9: Final results for the individual experiments: fixed target (top) and HERA (bottom)
The neural network approach to parton distributions 102
Figure 6.13:
The relative pull, Eq. 6.24, of the new and old F2 (x, Q2 ) parametrizations. The one- band is also shown.
Note that in the kinematical region where experimental data for the two versions of the fit overlaps, the
pull always verifies P (x, Q2 ) 1, as expected from consistency arguments.
The neural network approach to parton distributions 103
Table 6.10: Features of experimental data on lepton moments Mn (E0 ),Eqns. 6.25-6.27, where n stands for
the order of the moment and E0 the lower cut in the lepton energy, see Eq. 4.49. Experimental errors are
given as percentages.
The final published observables for the semileptonic decays of the B meson are the moments of
the lepton energy spectrum, Eq. 4.49, with different cuts in the lepton energy, rather than the full
differential spectrum itself. The data that we use for the present parametrization of the lepton energy
spectrum is the following: the Babar Collaboration [60] provides the partial branching fractions,
Z Emax
d
M0 (E0 ) = B L0 (E0 , 0) = B (El ) dEl , (6.25)
E0 dEl
The neural network approach to parton distributions 104
one ends up with Eq. 6.26, and similarly for the remaining moments. The difference with the Babar
data is that the partial branching fraction Eq. 6.25 is not measured, and that the Belle data cover
a somewhat larger lepton energy range. since the lowest value of E0 of their data set is E0 = 0.4
GeV. These moments, for six different values of E0 from 0.4 to 1.5 GeV, make up a total of 18 data
points. Finally the Cleo Collaboration [156] provides the moments Mn (E0 ) for n = 1, 2, for energies
between 0.6 to 1.5 GeV, for a total of 20 data points (10 for n = 1 and 10 for n = 2). The correlations
for this experiment are larger since measurements of the same moment at different energies E0 are
highly correlated. The three collaborations provide also the total and statistical errors, as well as the
correlation between different measurements. These characteristics are summarized in Table 6.10.
(exp)
We have noticed that the experimental correlation matrices, ij , as presented with the pub-
lished data of the three experiments [155, 60, 156], are not positive definite (see also [159]). The
source of this problem can be traced to the fact that off-diagonal elements of the experimental corre-
lation matrix are large, as expected since moments with similar energy cut contain almost the same
amount of information and are therefore highly correlated. Then one can check that some eigen-
values are negative and small, and this is pointing that the source of the problem is an insufficient
accuracy in the computation of the elements of the correlation matrix.
However, whatever is the original source of the problem, the fact that the experimental correlation
matrices are not positive definite has an important consequence: the technique introduced in Chapter
5 for the generation of a sample of replicas of the experimental data taking into account correlations
relies on the existence of a positive definite correlation matrix, and therefore if the experimental
correlation matrix is not positive definite, our strategy cannot be applied.
A method to overcome these difficulties while keeping the maximum amount on information on
experimental correlations as possible consists on removing those data points for which the exper-
(exp)
imental correlations are larger than a maximum correlation, ij max . The value of max is
determined separately for each experiment as the maximum value for which the resulting correlation
matrix is positive definite. In Table 6.11 we show the value of max for each experiment, together
with the features of the remaining experimental data after cutting those data points with too large
correlations. Similar considerations about the need to take a restricted subset of data points due to
the large point to point correlations have been discussed in the context of global fits in B physics,
see for example [160, 161].
As has been discussed in Chapter 5, the first step of our strategy to parametrize experimental
data is the generation of an ensemble of replicas of the original measurements, which in the present
case consists in the measured moments, which we will denote by
(exp)
Mi , i = 1, . . . , Ndat , (6.29)
The neural network approach to parton distributions 105
Table 6.11: Features of experimental data that is included in the fit, after cutting data points with too
large correlations. Experimental error are given as percentages.
(exp)
where Mi stands for any of Eqns. 6.25-6.27, and Ndat is the total number of experimental data
points, together with the total error and the correlation matrix.
To generate replicas we proceed in the usual way. Since experimental data consists on central
values for the moments, together with its total error and the associated correlations, the k-th replica
of the experimental data is constructed, following Eq. 5.13, as
(art)(k) (exp) (k)
Mj = Mj + rj tot,j , j = 1, . . . , Ndat , , k = 1, . . . , Nrep . (6.30)
One can check that the sample of replicas is able to reproduce the errors and the correlations of
the experimental data. In Table 6.12 we have the statistical estimators for the replica generation.
One can observe that to reach the desired accuracy of a few percent and to have scatter correlations
r 0.99 for central values, errors and correlations, a sample of 1000 replicas is needed.
The next step is to train a neural network parametrizing the lepton spectrum for each replica of
the experimental data. Therefore we parametrize the lepton energy spectrum, Eq. 4.48,
(net)(k)
d
(El ), k = 1, . . . , Nrep , (6.31)
dEl
where El is the lepton energy, with a neural network. From this neural network parametrization
one can compute the corresponding moments, to compare with experimental data. For example, the
leptonic moment L1 (E0 , 0) is computed for the k-th neural network as
Z Emax (net)(k)
(k) d
L1 (E0 , 0) = El (El )dEl . (6.32)
E0 dEl
Now we describe the details of the neural network training. Concerning the topology of the
neural network, in this case we find that an acceptable compromise is given by an architecture
The neural network approach to parton distributions 106
1-4-3-1. Concerning the training strategy, in the present situation we will have a single training
(exp)
epoch minimizing the diagonal error defined in Eq. 5.33 but with the total error i,tot instead
of only the statistical error with dynamical stopping of the training, as discussed in Section 5.2.3.
The minimization technique that we will use for the neural network training is Genetic Algorithms,
which is suitable for minimization of highly nonlinear error functions, as in the present case. We
also use weighted training to obtain a more even 2 distribution between the different experiments.
See the original work [5] for additional details on the neural network training.
In situations in which experimental data consists of different experiments, as it is the case now
(with Babar, Belle and Cleo), one has also to address the issue of possible inconsistency between
different experiments, that is, the possibility that a subset of points from the two experiments in the
same kinematical region do not agree with each other within experimental errors. This issue has been
discussed in Section 6.2, regarding the possible inconsistency of the ZEUS94 experiment with the
other experimental data sets in the parametrization of the structure function F2p (x, Q2 ). This issue
is of paramount importance in the context of global parton distribution fits, see for example [83].
As has been discussed in detail in Ref. [5], in this case the three experiments are fully compatible,
as can be seen in Fig. 6.14 for a training to the experimental data.
Figure 6.14:
(0)
Dependence of the error function E2 for a training of all three experiments on central experimental data.
(0)
Note that at the end of the minimization E2 1 for all experiments.
The lepton energy spectrum, Eq. 6.32, as parametrized with a neural network, has to satisfy
three constraints independently of the dynamics of the process. First of all, it is zero outside the
region where it has kinematical support, in particular it has to vanish at the kinematical endpoints,
El = 0 and El = Emax . Second, the spectrum is a positive definite quantity (since any integral over
it is an observable, a partial branching ratio), therefore it must satisfy a local positivity condition.
As we have discussed in Section 5.2.4, there are several methods in our strategy to introduce kine-
matical constraints in an unbiased way. We have found that for the present application, the optimal
method to implement the kinematical constraints that the spectrum should vanish at the endpoints
is hard-wiring them into the parametrization, that is, the lepton energy spectrum parametrized by
a neural net will be given by
(net)(k)
d (L)
(El ) = Eln1 1 (El )(Emax El )n2 , (6.33)
dEl
(L)
with n1 , n2 positive numbers, and 1 is the output of the neural network for a given value of El .
Assuming this functional behavior at the endpoints of the spectrum introduces no bias since it can
The neural network approach to parton distributions 107
be shown that our results do not depend on the value of n1 and n2 . For the reference training we
have verified that n1 = 1 and n2 = 1 are suitable values for these polynomial exponents.
We will impose the remaining kinematical constraint, the positivity constraint, as a Lagrange
multiplier in the total error. That is, the total quantity to be minimized is the sum of two terms,
" (net) #
(k) (k) d
Etot = E2 + p P , (6.34)
dEl
(k)
where E2 is Eq. 5.33 and the the positivity condition is implemented penalizing those configurations
in which a region of the spectrum is negative,
" (net) # Z Emax (net) (net) !
d d d
P = dEl (El ) (El ) , (6.35)
dEl 0 dEl dEl
since P is zero for a positive spectrum. The relative weight p is determined via a stability analysis,
requiring that p is large enough so that the constraint is verified, but small enough so that experi-
mental data can still be learned in an efficient way. As we will show in brief, the implementation of
the kinematical constraints plays an essential role in the parametrization of the lepton spectrum. In
particular, if kinematic constraints are not included in the fit the error at small El grows very large
and the extrapolation to E0 = 0 becomes completely unreliable.
Figure 6.15:
Lepton energy spectrum, Eq. 4.48, as parametrized by the Monte Carlo ensemble of neural networks. The
bands represent the 1- uncertainty region.
The set of neural networks parametrizing the lepton energy distribution trained on the Monte
Carlo sample of replicas of the experimental data defines the probability measure in the space of
lepton energy spectra. From this probability measure, as explained in Chapter 5, expectation values
The neural network approach to parton distributions 108
Figure 6.16:
Comparison of the different experimental moments, Eqns. 6.25-6.27 as obtained from our parametrization
with the original measurements, as a function of the lower cut on the lepton energy E0 .
We now present the results for this parametrization, from which averages and moments can be
computed with the corresponding uncertainties. In Fig. 6.15 we plot the resulting lepton energy
spectrum with uncertainties. In Fig. 6.16 we compare the computation of the moments of the lepton
energy spectrum using our parametrization to the experimental results from Babar, Belle and Cleo.
We observe good agreement for all the data points. As it has been explained in Section 5.3, it is
crucial to validate the results of the parametrization using suitable statistical estimators. In Table
6.13 we have the most relevant statistical estimators for all the data points, and in Table 6.14 the
same estimators for the different experiments included in the fit.
With the results described in this section the total branching ratio can be computed, even if
experimental information was restricted to a finite value of E0 . This is possible because the continuity
condition implicit in the neural network definition together with the kinematic constraints allow for
an accurate extrapolation from the experimental data with lowest El = 0.4 GeV to the kinematic
The neural network approach to parton distributions 109
endpoint El = 0. Note that this is not true if the Belle data is not included in the fit, see Fig. 6.21.
The result that is obtained for the partial decay rate, computed from the neural network sample,
Nrep Z (net)(k)
1 X Emax d
hB (B Xc l)i = hM0 (El = 0)i = B dEl (El ) , (6.37)
Nrep 0 dEl
k=1
and with the direct Delphi measurement of the total branching ratio [163], which is measured without
restrictions on the lepton energy, which yields
Is is observed that the three results are compatible, even if our determination is somewhat closer,
both in the central value and in the size of the uncertainty, to the Delphi measurement. The small
error in our determination of B (B Xc l) shows that the technique discussed in this work can be
used also to extrapolate in a faithful way into regions where there is no experimental data available.
10 100 1000
2
tot
2
1.31 1.11 1.11
D h iE 2.50 2.23 2.21
P E hM irep 9% 8% 5%
r [M ]
(net)
0.999 0.999 0.999
P
E dat 67% 58% 10%
(exp)
(net) dat 0.00267 0.00267 0.00267
dat
0.00180 0.00168 0.00168
r (net)
(exp) 0.77 0.85 0.85
(net) dat 0.166 0.166 0.166
dat
0.32 0.245 0.245
r
(exp)(net) 0.35 0.38 0.38
cov (net) dat 1.4 106 1.4 106 1.4 106
cov
dat
7.8 107 6.7 107 6.7 107
r cov(net) 0.49 0.53 0.53
Table 6.13: Statistical estimators for the ensemble of trained neural networks, for 10, 100 and 1000 trained
replicas.
Important information on the quality of the fit can be obtained from the analysis of the de-
pendence of the different statistical estimators with respect to the number of genetic algorithms
generations, like the total 2 or the average error.
This
dependence is represented in Fig. 6.17.
Note that at the end of the training 2tot 1 and 2 2, as expected. Note also that the fit has
reached convergence since the 2tot profile is very flat for a large number of generations.
This can be repeated for other statistical estimators, like for example the average spread of
the data points as computed from the neural network ensemble, defined in Section 5.3.1, which is
(exp)
compared with the same quantity computed from the experimental data, i in Fig. 6.18 as a
The neural network approach to parton distributions 110
Table 6.14: Statistical estimators for the ensemble of trained neural networks, for those experiments included
in the fit. The replica sample consists of Nrep =1000 neural networks.
function of the number of genetic algorithms generations. The fact that one has error reduction,
as has been explained in [11], is the sign that the network has found an underlying law to the
experimental data, in this case the lepton energy spectrum.
Another relevant estimator is the scatter correlations of the spread of the points (see Fig. 6.18).
(net)
One can define similarly a scatter correlation for the net correlation ij , also represented in Fig.
6.18 for the Babar experimental data. We observe that when the training ends both values of r
are close to 1, a sign that errors and correlations are being faithfully reproduced. Another relevant
(k)
estimator of the goodness of the fit is the distribution of both E2 and of the training lengths
over the replica sample, see Fig. 6.19. We have checked that these two distributions satisfy the
requirements described in Section 5.3.
We can compare also fits with and without the inclusion of kinematical constraints, to see which
is the effect of their implementation. The endpoint constraint at El = Emax is crucial to obtain
convergent results. In Fig. 6.20 one can observe that when the endpoint constraint at El = 0 and
the positivity constraint are removed the error becomes very large at small El , and on top of that
sightly negative at large El near the endpoint. Note that the physical value for the spectrum at the
endpoint, d/dEl (El = 0) = 0, is contained within the small-El error bars, as expected.
In Fig. 6.21 we show a comparison of a fit of a single experiment, Babar. We observe that when
only the Babar data is incorporated in the fit, the error at small values of El is much larger. This
is so because, as discussed above, the Belle data, which extends to lower values of El , is crucial to
determine the low El behavior of the lepton spectrum. Finally, in Fig. 6.21 we show that our results
are independent of the precise choice of n1 and n2 in Eq. 6.33. In particular using n1 = 1.5 and
n2 = 1.5 results in the same fit as when using the reference values, n1 = 1 and n2 = 2.
As one example of the applications of the present parametrization of the lepton energy spectrum,
in this section our results are compared with the theoretical calculation of Ref. [52] (AGRU). Their
formalism allows the computation of moments of arbitrary differential distributions from semileptonic
B meson decays, with arbitrary kinematical cuts, like the lepton energy spectrum in charmed decays
that is analyzed in the present work. In particular their computation of the lepton energy spectrum
The neural network approach to parton distributions 111
Figure 6.17:
Total 2tot , Eq. 5.76, of the replica sample, compared with average error, 2 .
with E E/mb , and where the leading order partonic semileptonic decay rate LO is given by
LO = 0 |Vcb |2 z0 () , (6.42)
where 0 is defined in Eq. 4.51, m2c /m2b and the phase space factor is defined in Eq. 4.41. These
moments can be related to the experimentally measured moments, defined in Eqns. 6.25- 6.27, in a
straightforward way, for example for the first two moments one has
N1
M0 = B 0 N0 , M 1 = mb , (6.43)
N0
and similarly for the other moments.
In Figs. 6.22 the results of [52] both at leading order (LO) and at next-to-leading order (NLO)
are compared with the moments obtained from our parametrization as a function of the lower cut
in the lepton energy E0 . Comparing results at different perturbative orders is interesting to asses
the behavior of the perturbative expansion. One should take into account in this comparison that
the results of [52] are purely perturbative, therefore the difference between the two results could
be an indicator of the size of the missing nonperturbative corrections. Another interesting feature
is that while for M0 , the partial branching fraction, the NLO corrections are sizable and bring the
theoretical prediction in better agreement with the experimental measurement, for M1 (which is the
ratio of two perturbative expansions) the size of the perturbative corrections is much smaller. In
addition, we show in Fig. 6.23 the comparison for the n = 4 moment (not measured experimentally)
with an estimation of the uncertainties in the theoretical prediction, as described in [5].
A more general comparison with theoretical predictions should include also the known nonpertur-
bative power corrections up to order O(1/m3b ) to the expressions for the moments of the spectrum,
The neural network approach to parton distributions 112
Figure 6.18:
Average error of the data points as computed from the neural network sample, Eq. 5.17, as compared with
the experimental value (left). Dependence during the training of the values of the scatter correlations, Eq.
5.90, during the training, for the Babar experimental data (right).
since in this case the difference of the theoretical results from our parametrization would indicate the
size of the missing unknown corrections, both perturbative and nonperturbative. A more detailed
study of this point, together with an analysis of possible violations of local quark-hadron duality
[164], is left for future work.
As another application of our parametrization of the lepton energy spectrum, it will be used
to determine the b quark mass m1S b from the experimental data using a novel strategy. To this
purpose the technique of Ref. [165] will be used, which consists on the minimization of the size
of higher order corrections to obtain sets of moments of the lepton energy spectrum which have
reduced theoretical uncertainty for the extraction of nonperturbative parameters like 1S and 1 .
In Ref. [5] we describe in detail the method we use to determine the b quark mass from our neural
network parametrization of the leptonic spectrum, which is summarized here.
The moments that minimize the impact of the higher order nonperturbative corrections are given
by
R Emax 1.4 d
El dEl dEl
R1 R1.3Emax , (6.44)
d
1 El dE l
dEl
and R Emax d
El1.7 dE dEl
R2 R1.4
Emax
l
. (6.45)
d
0.8 El1.2 dE l
dEl
The full expression for this moments in terms of heavy-quark non-perturbative parameters can be
found in Ref. [165]. These leptonic moments R1 and R2 depend on 9 nonperturbative parameters,
up to O(1/m3b ): 1S , 1 and 2 , and six matrix elements, 1 , 2 , 1 , 2 , 3 and 4 , that contribute at
order 1/m3b in the heavy quark expansion, of which not all of them are independent [58, 166].
The most relevant feature of these leptonic moments R1 and R2 is that they have non-integer
powers and to the best of our knowledge have not been yet experimentally measured, at least in a
published form. Therefore, the values of R1 and R2 that will be used in this analysis are extracted
from our neural network parametrization of the lepton spectrum, which allows the computation
of arbitrary moments, together with their associated error and correlation. Let us recall that the
The neural network approach to parton distributions 113
Figure 6.19:
(k)
Distribution of error functions, E2 , over the sample of trained replicas (left) and distribution of training
lenghts in GA generations (right).
and similarly for R2 , and the error and the correlation of the moments R1 and R2 are computed
in the standard way. The following values for the moments with the associated errors and their
correlation are obtained,
(net) (net)
R1 = 1.017 0.003, R2 = 0.938 0.004, 12 = 0.94 , (6.47)
that as expected are highly correlated. Then to determine the nonperturbative parameters 1S and
1 the associated 2fit is minimized,
2
X (net) (th) (net) (th)
2fit = Ri Ri cov1 ij Rj Rj , (6.48)
i,j=1
(net)
where cov1
ij is the inverse of the covariance matrix associated to the two moments R1 and
(net) (th)
R2 , and Ri 1S , 1 is the theoretical prediction for these moments as a function of the two
nonperturbative parameters [165].
Once the values of 1S and 1 have been determined from the minimization of Eq. 6.48, one
obtains for the b quark mass m1S b mass in the 1S scheme the following value:
m1S
b = mB 1S = 4.84 0.16
exp
0.05th GeV = 4.84 0.17tot GeV , (6.49)
From the above results one observes that the dominant source of uncertainty is the experimental
uncertainty, that is, the uncertainty associated to our parametrization of the lepton energy spectrum.
This determination of the b quark mass is consistent with determinations from other analysis. The
b quark mass has been determined using different techniques, like the sum rule approach, using
either non-relativistic [167, 168, 169] or relativistic [170, 171] sum rules, global fits of moments of
The neural network approach to parton distributions 114
Figure 6.20:
Comparison of the lepton energy spectrum when no kinematic constraints are incorporated.
differential distributions in B decays, [159, 163, 161], the renormalon analysis of Ref. [172], and
several other methods related to heavy-quarkonium physics [173, 174] (see [175] for a review). To
compare our results with some of the above references, it is useful to relate the m1S
b mass to the
MS-bar mb (mb ) mass [176], and once the conversion is performed the value
mb (mb ) = 4.31 0.17tot GeV , (6.50)
is obtained, where we have used s (MZ2 ) = 0.1182 and included perturbative corrections up to
two loops. It turns out that our determination of m1S b is not competitive with respect to other
determinations due to the large uncertainties associated to the fact that only two moments we use
in the determination of these parameters. The inclusion of additional moments in the fit would
constrain more the nonperturbative parameters and reduce the experimental uncertainty associated
to them.
For the nonperturbative parameter 1 the following value is obtained
1 = 0.17 0.15exp 0.05th GeV2 = 0.17 0.16tot GeV2 . (6.51)
As in the determination of 1S it can be seen that the theoretical uncertainties are smaller than the
experimental ones, which are the dominant ones. Our result for the parameter 1 is consistent with
other extractions in the context of global fits of B decay data [159, 163], but again not competitive
due to the rather large uncertainties.
To summarize, in this part of the thesis we have presented a determination of the probability
density in the space of the lepton energy spectrum from semileptonic B meson decays, based on the
latest available data from the Babar, Belle and Cleo collaborations. In addition, this application
shows the implementation of a well defined strategy to reconstruct functions with uncertainties when
the only available experimental information comes through convolutions of these functions. As a
byproduct of our analysis, using our parametrization of the lepton spectrum, we have extracted the
nonperturbative parameters 1S and 1 , with a method that reduces the contributions from the
theoretical uncertainties.
The number of possible applications of this strategy to other problems in B physics is rather
large, and is discussed in some detail in Ref. [5]. The most promising application is to use our
neural network strategy to construct a parametrization of the shape function S(k) of the B meson, a
The neural network approach to parton distributions 115
Figure 6.21:
Comparison of the lepton energy spectrum when the parameters n1 and n2 are changed (left). Comparison
of the lepton energy spectrum when only Babar data is incorporated in the fit with the result when all
experiments are incorporated in the fit (right).
universal characteristic of the B meson governing inclusive decay spectra in processes with massless
partons in the final state, as extracted from the B Xs and B Xu l decay modes. In this case
we have additional theoretical information on its behaviour. For example, at tree level its moments
Z
An dkk n S(k) , (6.52)
Figure 6.22:
Comparison of the results of Ref. [52] on the partial branching ratio Eq. 6.25 both at leading order (LO)
and at next-to-leading order (NLO) with the same quantity computed from our parametrization.
Figure 6.23:
Comparison of the results of Ref. [52] on the n = 4 moment and at next-to-leading order (NLO), with the
corresponding theoretical uncertainties, with the same quantity computed from the neural network
parametrization parametrization.
The neural network approach to parton distributions 117
Then we will interpolate the x-space evolution factor (x), so that the heavy numerical task of its
computation is decoupled from the determination of the parameters describing the parton distribu-
tion from experimental data. With all these considerations taken into account one ends up with a
fast and efficient evolution code, which will described in this Section.
First of all we define the notation that will be used for the strong coupling constant. The
convention that we use is
X
das (Q2 ) s (Q2 )
Q2 = k as (Q2 )k+1 , as (Q2 ) = , (6.54)
Q2 4
k=0
the scheme-dependent coefficent 2 is given in [181], and Nf is the number of active quark flavors.
The explicit solution for the above equation at the NNLO accuracy can be written in terms of a
boundary condition s MZ2 and is given by
!
2
2
2
2
s Q2 NLO
s Q = s Q 1 + s Q s Q s MZ2 (b2 b21 ) + s 2
Q NLO b1 ln ,
NNLO LO LO LO s (MZ2 )
(6.56)
where we have defined in a recursive way,
Q2
s Q2 NLO = s Q2 LO 1 b1 s Q2 LO ln 1 + 0 s MZ2 ln 2 , (6.57)
MZ
s MZ2
s Q2 LO = Q2
. (6.58)
1 + 0 s (MZ2 ) ln M 2
Z
In the following discussion we assume that in each case we use the appropriate anomalous dimension
for each type of nonsinglet parton distribution, and since we are restricted now to the analysis
of nonsinglet parton distributions, we will assume also that all quantities (parton distributions or
anomalous dimensions) correspond to this nonsinglet sector. The nonsinglet anomalous dimension
has been computed in powers of s Q2 up to NNLO,
!2
2
(0) s Q2 (1) s Q2
N, s Q = (N ) + (N ) + (2) (N ) , (6.61)
4 4
where the N-space anomalous dimension was computed at LO in Ref. [40], at NLO in Ref. [184],
and recently the full NNLO contribution was computed in Ref. [28]. The leading order nonsinglet
anomalous dimension has the explicit form
" N
#
X 1 2 4
(0) (N ) = CF 3 4 + , CF = , (6.62)
k N (N + 1) 3
k=1
which will be needed for the computation of the large-x limit of the x-space evolution factor (x).
The solution to the nonsinglet evolution equation, Eq. 6.59, reads at NNLO accuracy
q(N, Q2 ) = (N, s Q2 , s Q20 )q(N, Q20 ) , (6.64)
The neural network approach to parton distributions 119
where c lies to the right of the rightmost singularity of (N ). That is, one can check that the Mellin
transform of the LHS of Eq. 6.68 is equal to the RHS of Eq. 6.64,
Z 1
dxxN 1 q(x, Q2 ) = N, s Q2 , s Q20 q(N, Q20 ) . (6.70)
0
The complicated numerical task in this evolution formalism is the numerical computation of the
Mellin inverse transformation, Eq. 6.69. However note that as opposite to standard N-space par-
ton evolution methods, the evolution of the parton distribution can be decoupled from the Mellin
inversion of the evolution factor, which is the most time consuming task.
We will use the Fixed Talbot algorithm to perform the numerical inversion of the Mellin transform
Eq. 6.69. For a detailed description of this algorithm and its efficiency see Refs. [185, 186].
The Fixed Talbot algorithm for the numerical inversion of Mellin-Laplace transforms is specially
accurate for the following reason: the numerical computation of inverse Mellin transforms is in
general difficult since the integrand is highly oscillatory since the integration path moves through
the complex plane. The Fixed Talbot algorithm bypasses this problem by choosing a path in the
complex plane which minimizes the imaginary part of the integrand and therefore minimizes the
impact of these oscillations. In the Fixed Talbot algorithm a generic inverse Mellin transform is
computed as Z
1
f (x) = xN f (N )dN , (6.71)
2i C
where the Talbot path C is defined by the condition
Figure
6.24:
The x-space evolution factor x, s Q2 , s Q20 for different perturbative orders at Q2 = 104 GeV2
(left) and for different evolution lenghts at leading order (right). In all cases the starting evolution scale is
Q20 = 2 GeV2 . Note the sharp rise of the evolution factor at large values of x.
where M is the number of required precision digits. The Talbot path minimizes the contribution
from oscillatory terms to the inverse Mellin Eq. 6.71, and therefore avoids the associated numerical
instabilities. The resulting integral, Eq. 6.71, can be computed with adaptative gaussian quadratures
to any desired accuracy.
The x-space evolution factor x, s Q2 , s Q20 , Eq. 6.69, has a flat behavior at intermediate
x together with a growing at both large and small x, where the behavior in both limits can be
computed analytically. Now the explicit analytic expressions for the evolution factor (x) at both
large x and small x will be computed. These results are interesting both for themselves, to obtain
a more quantitative understanding of our evolution technique, and also for practical purposes, since
they allow to perform a more efficient interpolation of (x). In Fig. 6.24 we represent the x-space
evolution factor (x) for different perturbative orders and different evolution lenghts. Note that
in the x range relevant for nonsinglet evolution, x 5 103 , the perturbative expansion for (x)
appears to be near to convergence at the NNLO level. Note also that the small-x behaviour is very
smooth in the relevant x range, unlike the large-x one.
First of all, the large x limit of the evolution kernel will be computed in two equivalent ways.
At leading order in s Q2 , in the large x limit, the dominant contribution to the evolution kernel
comes from the large N limit of the LO anomalous dimension, Eq. 6.63, and therefore one has
Z +i !2CF ln N/b0
dN N s Q2
x1 (x) = x , (6.74)
i 2i s (Q20 )
2
2
1 1 0 s (Q2 )
x1 x, s Q , s Q0 = ln , (6.76)
2CF s (Q0 )
2 x
b0 ln s (Q2 )
The neural network approach to parton distributions 121
where () = 1/(), which coincides with Eq. 6.76 once one takes into account that for large x
one has that ln(1/x) (1 x). On top of the computation of the large-x limit of (x), the above
derivation shows that x-space evolution factor can be defined as a distribution, just as standard
splitting functions.
We can compute also analytically the small x behavior. In the nonsinglet sector (x) grows at
small x as dictated by Double Asymptotic Scaling [188]. To see this, recall that (x) at low x is
given by Eq. 6.69 expanding the LO anomalous dimension around its rightmost singularity, in this
case the N = 0 pole, so that one has
Z +i ! !
2
2
dN 1 2CF s Q20 1 1
x0 x, s Q , s Q0 = exp N ln + ln + . (6.79)
i 2i x 0 s (Q2 ) N 2
If in the expression above we perform a saddle point integration, one obtains that the leading smallx
behavior of the nonsinglet x-space evolution kernel is given by
s !
2
2
2CF 1 s (Q20 )
x0 x, s Q , s Q0 = N exp 2 ln ln , (6.80)
0 x s (Q2 )
where N is a normalization factor. The growing of (x) at low x is more important for larger values
of Q2 , as can be seen in Fig. 6.24. In the singlet sector, the leading behavior of (x) at low x can
also be computed exactly and is given again by Double Asymptotic Scaling [21],
s
2
2
x0 x, s Q , s Q0 = N2 1
e exp 2CF ln s (Q0 ) ln 1 , (6.81)
x 0 s (Q2 ) x
which is much larger at small x than the corresponding non-singlet result, Eq. 6.80. In the singlet
case therefore, one would need to substract the effects of the small-x growth of the evolution factor,
in a similar way that is done now with the large-x growth in the interpolation of (x), to be discussed
in the following.
It is known that all splitting functions Pij (x, s Q2 ), except the non diagonal entries of the
singlet matrix, diverge when x = 1. Therefore the nonsinglet evolution factor (x), Eq. 6.69, will
The neural network approach to parton distributions 122
Figure 6.25:
The x-space evolution factor (x) once the leading large-x behaviour has been substracted, Eq. 6.97, for
different perturbative orders for Q2 = 104 GeV2 . Note that the resulting function is much more smooth,
and therefore much more efficient to interpolate.
Inserting this decomposition in Eq. 6.82 one finds that it can be written as
Z 1 Z
2 dy x 2 2
2
1
f (x, Q ) = (y) f , Q yf x, Q0 + f x, Q0 (y)dy , (6.84)
x y y 0 x
so that now thanks to the subtraction that we have performed all the integrals in Eq. 6.84 converge
and thus can be computed numerically. Note that this is equivalent to the definition of the x-space
evolution factor in terms of the plus distribution prescription,
Z 1 Z
dy x 2 1
f (x, Q2 ) = (y)+ f , Q0 + f x, Q20 (y)dy , (6.85)
x y y x
At leading order the anomalous dimension satisfies (0) (N = 1) = 0 and therefore it follows that
= 1. This result follows from momentum conservation. At higher orders this result applies only to
certain combinations of nonsinglet parton distributions. For practical implementations we use the
equality Z x
x, s Q2 , s Q20 = (N = 1) dy(y, s Q2 , s Q20 ) , (6.89)
0
since in the nonsinglet sector the integral in the above equation is very fast to compute. Note
that the above property does not hold in the singlet sector since the gluon-gluon and gluon-quark
anomalous dimensions have a pole at N = 1.
We have benchmarked our evolution formalism with the benchmark evolution tables first pre-
sented in Ref. [183] and recently updated including the full NNLO anomalous dimension in Ref. [31].
The results and the accuracy of this benchmark can be seen in Table 6.15, where we use exactly the
same parameters than in [183, 31] for parton evolution in the Fixed Flavor Number (FFN) scheme.
More details about this benchmarking procedure of QCD evolution codes can be found in Ref [183].
We have checked that the accuracy is always of the order O 105 , which is the required accuracy
on modern QCD evolution codes. Similar checks have been performed for the evolution of other
parton distributions as well as for evolution in the Variable Flavor Number (VFN) scheme.
The experimental observable that determines the nonsinglet parton distribution is the nonsinglet
structure function, defined as the difference between structure functions in the proton and in the
deuteron,
F2N S (x, Q2 ) F2p (x, Q2 ) F2d (x, Q2 ) , (6.90)
which is related to the nonsinglet parton distribution via a perturbative coefficient function,
Z 1
NS 2 dy 2
x 2
F2 (x, Q ) = x CN S (y, s Q )qN S ,Q , (6.91)
x y y
where the coefficient function has the following expansion up to NNLO in perturbation theory,
!2
2 s Q2 (1) s Q2 (2)
CN S (x, s Q ) = (1 x) + CN S (x) + CN S (x) . (6.92)
4 4
The NLO coefficient C (1) (x) was computed in Ref. [184], while the NNLO nonsinglet coefficient
function was first computed in Ref. [189]. However, for the NNLO coefficient function we do not
use the exact result but rather the N-space parametrization of Ref. [190], which is fast and accurate
enough for our purposes.
The way that we incorporate the coefficient functions into our evolution formalism is through a
redefinition of the x-space evolution kernel,
Z c+i
dN N
e s Q2 , s Q20 ) =
(x, x C N, s Q2 (N, s Q2 , s Q20 ) , (6.93)
ci 2i
The neural network approach to parton distributions 124
so that now the nonsinglet structure function can be written in terms of the parton distribution at
the initial evolution scale as
Z 1
dy e x 2
F2N S (x, Q2 ) = x (y, s Q2 , s Q20 )q , Q0 . (6.94)
x y y
The rationale behind this procedure is to improve the speed of the evolution code, that is, if coefficient
functions where introduced as in Eq. 6.91, we would need to perform an additional convolution
integral each time a structure function was computed. The only drawback of this method is that
the evolution factor becomes process-dependent, while the bare evolution factor (x), Eq. 6.69, is
process independent and indeed it could be used by itself as an alternative procedure for evolution
of standard parton dstributions.
We use a Variable Flavor Number (VFN) scheme with zero mass partons to incorporate the
effects of heavy quark thresholds in the evolution. At NNLO one has to take into account that both
the strong coupling and the parton distributions are discontinuous when crossing the heavy quark
thresholds. We compute s Q2 in the Variable Flavor Number scheme, taking into account the
discontinuity at NNLO at heavy quark thresholds,
2
2 2 C2
s,f +1 (mf ) = s,f (mf ) + s,f (m2f )3 , (6.95)
4
where s,f is the strong coupling in the effective theory with Nf active light quarl flavors, m2f is
the position of the heavy quarl threshold, and C2 = 14/3 was computed in [191]. For the parton
distribution at heavy quark thresholds, the corresponding N-space matching condition is given by
2 !
2
(nf +1) (n ) s,f (m )
qN S (N, m2h ) = qN Sf (N, m2h ) 1 + h
AN
qq
S,(2)
(N )) , (6.96)
4
where the NNLO matching coefficients are determined in Ref. [192]. A more refined treatment of
heavy quark mass effects [193, 194] is postponed to the case of singlet evolution, since it is known
that the influcence of heavy quark mass effects in the nonsinglet sector is rather small.
The evolution approach described above is very accurate but also very CPU time consuming.
This is so since one has to compute both the N-space evolution factor and its Mellin inverse each time
one wants to evolve a parton distribution. This is specially a problem in our approach, where we
are parametrizing our parton distributions with neural networks, with a very large parameter space
and thus the minimization routine requires more time than in the standard approach. What we do
then is to interpolate the evolution kernel (x) so that the the evolution of parton distributions, Eq.
6.86, is much faster.
The only problem is since (x) grows heavily at large x, it is numerically difficult to interpolate
it. A way to overcome this difficulty is to substract to the exact result for (x) the large-x behavior
Eq. 6.76 so that the resulting function to be interpolated is a smooth one. We interpolate the
subtracted x-space evolution kernel, defined as
e s Q2 , s Q20 )
(x,
e (int) 2 2
(x, s Q , s Q0 ) = , (6.97)
x1 (x, s (Q2 ) , s (Q20 ))
where x1 (x) is given by Eq. 6.76 and the inclusion of the coefficient function does not affect
the leading large x behavior. We also interpolate (x) in Eq. 6.86. In Fig 6.25 we represent the
behaviour of the substracted evolution factor (int) (x), it is clear that this functional dependence is
much more efficient to interpolate than that of the bare evolution factor, represented in Fig. 6.25.
Therefore, the nonsinglet structure function will be given in terms of an interpolated evolution factor
as
Z 1
dy e (int) x 2
F2N S (x, Q2 ) = x (x, s Q2 , s Q20 )x1 (x, s Q2 , s Q20 )q , Q0 , (6.98)
x y y
The neural network approach to parton distributions 125
x2 F2N S,LT (T MC , Q2 ) M 2 x3
F2N S,LT,T MC (x, Q2 ) = 3/2 2 + 6 2 2 I2 (T MC , Q2 ) , (6.99)
T MC Q
x xuv (x, Q20 ) (LH) xuv (x, Q20 ) (FT) Rel. error
Leading order
107 5.7722 105 5.7722 105 3.3760 106
106 3.3373 10 4
3.3373 104 1.6880 106
105 1.8724 10 3
1.8724 103 1.9212 106
104 1.0057 10 2
1.0057 102 1.4095 106
103 5.0392 10 2
5.0392 102 2.6145 106
102 2.1955 10 1
2.1955 101 3.1065 106
1
0.1 5.7267 10 5.7267 101 6.4524 106
1
0.3 3.7925 10 3.7925 101 9.2674 106
1
0.5 1.3476 10 1.3476 101 1.1307 105
2
0.7 2.3123 10 2.3122 102 2.1165 105
4
0.9 4.3443 10 4.3440 104 6.3630 105
Next-to-Leading order
107 1.0616 104 1.0616 104 2.1462 106
106 5.4177 10 4
5.4177 104 8.7799 106
105 2.6870 10 3
2.6870 103 9.7796 106
104 1.2841 10 2
1.2841 102 1.3380 105
103 5.7926 10 2
5.7926 102 8.5063 106
102 2.3026 10 1
2.3026 101 3.0757 107
1
0.1 5.5452 10 5.5452 101 7.6419 107
1
0.3 3.5393 10 3.5393 101 2.6979 106
1
0.5 1.2271 10 1.2271 101 2.4466 105
2
0.7 2.0429 10 2.0429 102 1.4810 105
4
0.9 3.6096 10 3.6094 104 6.0762 105
Next-to-Next-to-Leading order
107 1.5287 104 1.5287 104 1.5497 105
106 6.9176 10 4
6.9176 104 5.0711 106
105 3.0981 10 3
3.0981 103 9.5455 106
104 1.3722 10 2
1.3722 102 1.8022 105
103 5.9160 10 2
5.9160 102 5.0631 106
102 2.3078 10 1
2.3078 101 2.4853 106
1
0.1 5.5177 10 5.5177 101 2.4747 106
1
0.3 3.5071 10 3.5071 101 2.8430 107
1
0.5 1.2117 10 1.2117 101 3.5893 105
2
0.7 2.0077 10 2.0077 102 5.5823 106
4
0.9 3.5111 10 3.5109 104 5.8172 105
Table 6.15:
Benchmark tables for the evolution formalism described in this Section in the Fixed Flavor Number Scheme
with Nf = 4 active flavors. The procedure for the benchmarking is described in detail in Section of 1.3 of
Ref. [183] and in Section 4.4 of [31]. We use the same notation as in the above references.
The neural network approach to parton distributions 127
Figure 6.26:
General strategy for the parametrization of the nonsinglet parton distribution qNS (x, Q20 ) from
experimental data on the nonsinglet structure function F2NS (x, Q2 ).
For the parametrization of the nonsinglet parton distribution qN S (x, Q20 ), the same data on the
nonsinglet structure function F2N S (x, Q2 ) that was used in Ref. [11] from the NMC [138] and the
BCDMS [139, 140] collaborations will be used in the present analysis. The main features of this
experimental data from was discussed in Ref. [11]. While BCDMS covers the large-x, large-Q2
kinematical region, NMC covers the complementary small-x, small Q2 region, with some overlap in
the intermediate regions. BCDMS data is rather precise, while the small-x NMC data has larger
uncertainties. The kinematical coverage of the F2N S experimental data available from these two
experiments can be seen in Fig. 6.27. The requirement of a simultanous measurement of the proton
The neural network approach to parton distributions 128
Figure 6.27:
Kinematical
coverage of the available experimental data on the F2NS (x, Q2 ) structure function in the
x, Q plane. Note that fixed target scattering geometry implies that large-Q2 data is at large-x and
2
conversely.
F2p and deuteron F2d structure functions for the determination of the nonsinglet combination, Eq.
6.90, restricts sizeably this kinematical coverage as compared with the one for F2p only, see Fig. 6.5.
However, there are proposals [196, 197] for future deep-inelastic scattering facilities which should
extend this kinematical range in the small-x region and in addition reduce the associated statistical
and systematic uncertainties. The inclusion of additional high statistics data from JLAB [198, 199]
at the largest values of x would require the use of resummed parton evolution.
The main difference with respect the dataset used in Ref. [11] is that now we have applied
to this dataset the kinematical cuts Q2 3 GeV2 and W 2 6.25 GeV2 . The motivation for
the two kinematical cuts is to remove those data points for which the application of perturbation
theory is questionable, as has been discussed in Section 4.1.2. The cut in Q2 is required to remove
experimental data for which higher twist corrections might be sizable, and the cut in W 2 removes
those data points at very large x for which the application of unresummed perturbation theory is
not reliable. After these kinematical cuts, we are left with a total of Ndat = 483 data points.
In Table 6.16 the features of the experimental data that is included in the present analysis are
summarized. Note that the average error is substantially larger for the NMC experiment than for
the more precise BCDMS measurements. All the experiments included in our analysis provide full
correlated systematics, as well as normalization errors. The description of the different uncertainties
of the experimental data that will be used can be found in Ref. [11].
Table 6.16: Experiments included in this analysis. Note that the values of and cov are given as percentages.
The data from the two experiments partially overlaps in the medium-x, medium-Q2 region.
The first step of our strategy, as discussed in Chapter 5, is to construct a Monte Carlo sample
The neural network approach to parton distributions 129
F2N S (x, Q2 )
D h
Nrep iE 10 100 1000
(art)
PE F 20% 6.4% 1.3%
(art) rep
r F(art) 0.97
5
0.997 0.999
V
(art) dat 6.1 10 1.9 105 6.7 106
PE dat
(art) 33% 11% 3%
dat
0.011 0.011 0.011
(art)
r
(art) 0.94 0.994 0.999
V
dat 0.10 9.4 103 1.0 103
(art)
dat
0.182 0.097 0.100
(art)
r
(art) 0.47 0.79 0.97
V
cov dat 5.5 109 1.7 1010 5.7 1011
(art)
cov
(art)dat 1.3 105 7.6 106 8.1 106
r cov 0.41 0.81 0.975
of replicas of the experimental data. Note that as in the case of the parametrization of the lepton
energy spectra discussed in Section 6.3, the quantity for which the sample of replicas is generated,
F2N S (x, Q2 ), does not coincide with the quantity that is parametrized with neural networks, the
parton distribution qN S (x, Q20 ). Since in this case the experimental data is the same as in Ref. [11],
we use the same relations to generate the Monte Carlo sample, namely Eq. 5.3, which for the present
situation reads,
Nsys
N S(art)(k) (k) N S(exp) (k)
X (k)
Fi = 1 + rN N Fi + rt,i t,i + rsys,j sys,ji , i = 1, . . . , Ndat , k = 1, . . . , Nrep .
j=1
(6.102)
In Table 6.17 the statistical estimators from the sample of generated replicas are presented, for the
data set that is used in the fit, where the statistical estimators have been defined in Section 5.1.
This table shows that a sample of 1000 replicas is sufficient to ensure average scatter correlations of
99% and accuracies of a few percent on structure functions, errors and correlations.
Once the Monte Carlo sample of replicas of the structure function F2N S (x, Q2 ) has been generated,
the second step is to train a neural network on each replica of the experimental data. However the
situation is now more complicated than in the structure function case [11]. Now, while the Monte
Carlo sample is constructed from the experimental data on the nonsinglet structure function, the
neural networks parametrize the nonsinglet parton distribution. Therefore, following Eqns. 6.86
and 6.91, the k-th neural network which parametrizes a parton distribution at the starting evolution
scale Q20 is related to the k-th replica of the experimental data on the nonsinglet structure function
at (x, Q2 ) as
Z 1
N S(net)(k) dy e (net)(k) x 2
F2 x, Q2 = x x, s Q2 , s Q20 qN S ,Q , k = 1, . . . , Nrep ,
x y y 0
(6.103)
where the evolution factor e which includes the effect of the perturbative coefficient function
CN S x, s Q2 has been defined in Eq. 6.93.
In the present analysis the value of the strong coupling s Q2 will be kept fixed at the current
The neural network approach to parton distributions 130
1
NX dat 1
(k) N S(net)(k) N S(art)(k) (k) N S(net)(k) N S(art)(k)
E3 = Fi Fi cov Fj Fj ,
Ndat Npar i,j=1 ij
(6.105)
with the covariance matrix defined in Eq. 5.35. Note that the error function is normalized to the
number of degrees of freedom [100]. The minimization algorithm used is genetic algorithms with
dynamical stopping of the training. Weighted training will also be used in order to guarantee that
at the end of the training the total 2 , Eq. 5.76, which now reads
NX dat
D E D E
1 N S(net) N S(exp) N S(net) N S(exp)
2 = Fi Fi cov1 ij Fj Fj ,
Ndat Npar i,j=1 rep rep
(6.106)
as computed for the two experiments has a similar value.
The architecture of the neural network that parametrizes qN S (x, Q20 ) must be determined as
discussed in Section 5.2.3. This optimal architecture is obtained by the requirements that it must be
complex enough to reproduce the experimental data patterns and that the results are independent
of the precise number of neurons. In particular it has been checked that the results are stable for
architectures with one more or one less neuron than the reference architecture, 2-5-3-1. This ensures
that the network architecture is redundant for the problem under consideration.
To define the optimal training strategy we should determine which is the suitable value of the
2stop parameter that defines the dynamical stopping of the training, as discussed in Section 5.2.3.
We will use the overlearning criterion, introduced in the same Section, to determine its value. The
overlearning criterion to determine the length of the training states that the training should be
stopped when the neural network begins to overlearn, that is, it begins to follow the statistical fluc-
tuations of the experimental data rather than the underlying physical law. The onset of overlearning
can be determined by separating the data set into two disjoint sets, called the training set and the
validation set. One then minimizes the error function, Eq. 5.34, computed only with the data points
of the training set, and analyzes the dependence of the error function Eq. 5.34 of the validation set
as a function of the number of generations.
Then one computes the total 2 , for both the training and validation subsets. It turns out that
in the present case fluctuations in the data set turn out to be very large, and one has to average over
a large enough number of partitions to obtain stable results. The onset of overlearning is determined
as the length of the training such that the 2 of the validation set saturates or even rises while the
2 of the training set is still decreasing.
The neural network approach to parton distributions 131
In Fig 6.28 we show the 2 for both the training and validation subsets as a function of genetic
algorithms generations, averaged over a large enough number of partitions. The training partition
contains the 30% of all the data points, selected at random, while the validation partition includes
the complementary 70% data points. If Npart is the number of different partitions used
to determine
the onset of overlearning, in Fig. 6.28 we show both the average value of the total 2 part and its
variance 2 , defined as
Npart
2 1 X 2
part = l , (6.107)
Npart
l=1
Npart
2
1 X 2
2 2 = 2l 2 part , (6.108)
Npart
l=1
where 2l is the value of the error function for the l-th partition. The number of required partitions
Npart has to be large enough so that the resulting distribution is gaussian and therefore the standard
deviation, Eq. 6.108, has the standard statistical interpretation. In Fig. 6.29 we show the histograms
for the distributions of 2tr,l and 2val,l over partitions. One observes that in the present case Npart =
20 is enough to achieve convergence. In Fig. 6.30 we show the relation between the total 2 and the
2stop used in the dynamical stopping of the training. This relation is needed in the determination
of the appropiate value of 2stop in the dynamical stopping of the training to achieve a given value
of the total 2 . The overlearning anaysis points out to the fact that the final total 2 should be
around 2 0.8, which from Fig. 6.30 implies a value 2stop 2.0 for the dynamical stopping of the
training.
Figure 6.28:
The average 2 part
over partitions as a function of the number of genetic algorithms generations. The
bands show also the associated variance 2 . The up-right plot shows the ratio 2val /2tr .
With the stability estimators defined in Section 5.3.2, one can assess which is the required number
of trained replicas Nrep to obtain stability of the results. To this purpose, one compares in Table
The neural network approach to parton distributions 132
Distribution of 2 Distribution of 2
12 12
10 Training 10 Validation
Entries 20 Entries 20
Mean 0.757 Mean 0.788
Error 0.060 Error 0.035
8 8
6 6
4 4
2 2
0 0
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
2 2
Figure 6.29:
Distribution of 2l for the different partitions used in the overlearning test, for both the training (left) and
the validation (right) subset of points.
6.18 the results for the probability measure of qN S (x, Q20 ) for different values of Nrep . That is, one
compares the probability measure as obtained from training a sample of neural networks on Nrep
Monte Carlo replicas of the experimental data with another probability measure, constructed from
a different set of Nrep Monte Carlo replicas. One expects that the differences in this comparison are
reduced as the value of Nrep is increased. For these stability comparisons, we use in the data regions
Nedat = 14 points linearly spaced between x = 0.04 and x = 0.75, and the same number of points in
the extrapolation region logarithmically spaced between x = 0.001 and x = 0.01. It can be observed
that the required stability is obtained for Nrep = 500, as expected.
Table 6.18: Stability estimators, defined in Section 5.3.2 for the probability measure of qNS (x, Q20 ) as a
function of the number of trained replicas Nrep , both in the data and in the extrapolation region.
As discussed in detail in Ref. [201], the most unbiased fitting strategy is to parametrize with a
neural network the quantity xqN S (x, Q20 ). On top of that, the nonsinglet parton distribution has to
satisy the kinematical constraint that
qN S (x = 1, Q20 ) = 0 , for all Q20 , (6.109)
that is, parton distributions vanish in the elastic limit. This constraint will be implemented with
one of the techniques discussed in Section 5.2.4, the hard-wiring of the kinematical constraint in
the neural network parametrization. With this two considerations, we write the non-singlet parton
distribution in Eq. 6.103 as
(net)(k)
(net)(k) qeN S (x, Q20 )
qN S (x, Q20 ) = (1 x) , (6.110)
x
(net)(k)
where now it is the quantity qeN S (x, Q20 ) the one that is parametrized with a neural network. It
can be checked that the neural network complemented with some simple functional form dependence
The neural network approach to parton distributions 133
Figure 6.30:
The total 2 , as computed in Eq. 5.76 from averages over replicas, as a function of the 2stop used in the
dynamical stopping of the training.
is still an unbiased approximant to the true value of the nonsinglet parton distribution. In particular
we will show that the results of the fit are not affected if in the expression
(net)(k)
(net)(k) m qeN S (x, Q20 )
qN S,mn (x, Q20 ) = (1 x) , (6.111)
xn
the values of the parameters m and n are modified from their default values, m = 1 and n = 1.
The stability of the results with respect to reasonable variations of the m, n exponents can be made
quantitative by means of the stability statistical estimators introduced in Section 5.3.2. Fig. 6.31
show in two particular cases how the results of the fit are stable againts different choices of the
polynomial exponents m and n in Eq. 6.111. This comparison is made more quantitative with the
stability estimators, as can be seen in Table 6.19.
Table 6.19: The stability estimators defined in Section 5.3.2 for the comparison of fits with different
polynomial exponents, see Fig. 6.31. The method used to compute these estimators is the same that has
been used to assess the stability with respect Nrep .
Now we discuss how the results for the parametrization of qN S (x, Q20 ) depend on the kinematical
cut in Q2 . The kinematical cut in Q2 has been chosen rather low (Q2 3 GeV2 ) since first
of all Target Mass Corrections taken into account in the theoretical expression for the nonsinglet
structure function F2N S , Eq. 6.99, and second we have not observed evidence, within experimental
uncertainties, of a dynamical higher twist correction. High quality data at large-x would allow
The neural network approach to parton distributions 134
0.2 0.04
0.15
0.02
0.1
0
0.05
-0.02
0
10-2 10-1
x x
Figure 6.31:
The nonsinglet parton distribution qNS (x, Q20 ) for different values of the largex exponent (right) and of
the small-x exponent (left). The polynomial exponents m and n are defined in Eq. 6.111.
to extract the higher twist contribution HT(x) to the nonsinglet structure function with higher
accuracy with a variety of techniques [202, 203], even if it is known that the magnitude of the
extracted HT correction is reduced if evolution is performed at the NNLO level, as it is in our case.
The kinematical cut in the invariant mass of the final hadronic state W 2 Q2 (1 x) 6.25 GeV2
removes those points at the largest values of x for which a Sudakov resummed evolution [32, 204]
would be needed.
xqns(x, Q2)
0
0.08
Q2min=3 GeV2
0.06
Q2min=9 GeV2
0.04
0.02
-0.02
10-2 10-1
x
Figure 6.32:
Comparison of the results for the nonsinglet parton distribution xqNS (x, Q20 ) for two different kinematical
cuts: the one for the reference fit Q2 3 GeV2 and another with a more conservative one Q2 9 GeV2 .
In Fig. 6.32 we compare the results of the reference final fit, with the standard kinematical cut
in Q2 3 GeV2 with another fit with exactly the same training parameters but with kinematics
restricted to Q2 9 GeV2 . It is clear that the uncertainties in the parametrization of qN S (x, Q20 )
grow sizeably at medium and small x once the subset of data points with 3 Q2 9 GeV2 is
removed from the fit. Note that the size this subset of points is rather large, 150 data points, the
The neural network approach to parton distributions 135
Large x Small x
0.3 4
CTEQ6.1
MRST02
3 NNPDF
0.25
2
0.2
1
(x, Q2)
(x, Q2)
0
0
CTEQ6.1
MRST02
0.15 0
NNPDF
NS
NS
q
q
0.1 -1
-2
0.05
-3
0
-4
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10-2 10-1
x x
Figure 6.33:
The neural network parametrization of the nonsinglet parton distribution, compared to the CTEQ and
MRST results, both at largex (left) and at small-x (right). Note that uncertainties grow very fast in the
small-x region, since there is no experimental data for x <
10 .
2
FNS 2
2 (x, Q ) FNS 2
2 (x, Q )
0.07 0.07
Q2 = 5 GeV2 2
Q = 50 GeV
2
0.06 0.06
NNPDF05 NNPDF05
0.05 0.05
Neural F2NS Neural F2NS
0.04 0.04
0.03 0.03
0.02 0.02
0.01 0.01
0 0
-0.01 -0.01
-0.02 -2 -0.02 -2
10 10-1 10 10-1
x x
Figure 6.34:
Comparison of the nonsinglet structure function F2NS (x, Q2 ) as computed with the parametrization of
qNS (x, Q20 ) discussed here with the structure function parametrization of Ref. [11], for Q2 = 5 GeV2 (left)
and Q2 = 50 GeV2 (right).
Once the set of Nrep neural networks have been trained with the strategy described above, the
third step of our approach is to validate these results by means of statistical estimators, defined in
Section 5.3. Table 6.20 summarizes the most important estimators for both the total number of
data points and the individual experiments. Note that the same patterns as in Ref. [11] is observed:
errors are reduced, which is sign that the neural network has found the underlying law from the
experimental data, while correlations are increased, in a way that the covariances as computed for
the probability measure for qN S (x, Q20 ) approximately reproduce the corresponding experimental
quantities. Note also that the total 2 for the two experiments included in the fit, NMC and
BCDMS, is very similar. This result is only achieved after a detailed study of the optimal weighted
training strategy, as discussed in Section 5.2.3.
Now we discuss our final results for the parametrization of the nonsinglet parton distribution
The neural network approach to parton distributions 136
F2N S (x, Q2 )
Total NMC BCDMS
2 0.77 0.78 0.75
hEi 2.07 1.94 2.18
r F (art)
(exp) 0.80 0.78 0.95
(net) dat 0.011 0.017 0.006
dat
0.004 0.005 0.003
(art)
r
(exp) 0.51 0.-0.04 0.91
(net) dat 0.18 0.04 0.16
dat
0.47 0.43 0.50
(art)
r
(exp) 0.31 0.19 0.39
6
cov (art) dat 8.6 10 1.0 105 7.2 106
6
cov
dat
9.5 10 1.4 105 5.5 106
r cov(art) 0.25 0.22 0.76
qN S (x, Q20 ), and particular attention is paid to the comparison with the same parton distribution
as obtained with the standard approach described in Section 4.2. In Fig. 6.33, the results of our
parametrization of qN S (x, Q20 ) are presented, both at large x and at small x, where we also compare
with the results of the CTEQ (the CTEQ61 analysis of Ref. [12]) and MRST (the MRST2001E of
Ref. [205]) collaborations. The most striking feature of our results is that the size of the error band
at small x is much larger than in the fits with the standard approach. Note that even if the global
fits of Refs. [12, 205] include much more data than the present analysis, the low x behavior of the
nonsinglet parton distribution is only constrained by the data on the nonsinglet structure function
F2N S that is used in this analysis.
We can also analyze the effects that including the information of QCD parton evolution has
on the parametrization of experimental data. These effects can be seen in Figs. 6.34, where the
results of the present analysis are compared with the parametrization of the nonsinglet structure
function F2N S (x, Q2 ) from Ref. [11]. One observes that the two results are consistent within the
respective uncertainties. Note that due to the effects of QCD evolution, experimental measurements
of the nonsinglet structure function F2N S (x, Q2 ) for different values of Q2 correspond to the same
measurement of qN S (x, Q20 ) repeated many times.
In summary, in this part of the thesis we have described the first application of the technique
described in Chapter 5 to the parametrization of parton distribution functions. The main result
appears to be that uncertainties in parton distributions obtained with the standard approach are
underestimated, specially in the extrapolation region. The results of this part of the thesis will be
continued in Ref. [180], were the full singlet evolution will be considered.
Chapter 7
In this thesis we have described in detail a general technique to parametrize experimental data in an
unbiased way with faithful estimation of the associated experimental uncertainties. This technique
was first introduced in Ref. [11] and during the course of this thesis it has been improved and
extended to the analysis of other processes of interest. In particular we have shown the first appli-
cation to the problem that motivated its development, the parametrization of parton distribution
functions.
The general strategy introduced in Ref. [11] has been extended in several ways. First of all, the
technique has been applied to different situations than the original application, the parametrization
of deep-inelastic structure functions from a moderately large data set. In these new applications,
for example, we have faced problems with a very large amount of experimental data from different
experiments and problems in which the parametrized quantity is related to the experimental data
in through complicated relations, like convolutions. Second, new minimization algorithms for neural
network training have been introduced, in particular genetic algorithms which are suited for the
minimization of highly nonlinear error functions. We have also introduced more refined criteria to
assess the optimal point to stop the training of the neural networks. Additional statistical estimators
to assess several characteristics of the probability measures have also been introduced. Finally,
the application of the strategy discussed in this thesis to several different processes, deep-inelastic
structure functions, hadronic tau decays, semileptonic B meson decays and parton distribution
functions, increases our confidence on its general validity.
The number of possible applications of the general strategy to parametrize experimental data
described in this thesis is rather large. The most important one is to generalize the results of Section
6.4 to the singlet sector [180] and to produce a full set of neural network parton distributions with
a faithful estimation of their uncertainties. Another promising application is the parametrization of
the nonperturbative shape function from semileptonic and radiative B meson decays. Astroparticle
physics is another field in which several applications of the neural network approach have been
envisaged. In particular, there is an ongoing project [206] in which the general strategy discussed
in this thesis is used to determine the atmosferic neutrino flux from experimental data on neutrino
event rates.
137
The neural network approach to parton distributions 138
Chapter 8
Conclusiones
En la presente tesis doctoral hemos descrito en detalle una novedosa tecnica general para construir
la densidad de probabilidad de una funcion a partir de medidas experimentales, esto es, una tecnica
para parametrizar datos experimentales, sin necesidad de hacer hipotesis alguna sobre la forma
funcional de la funcion a parametrizar y con una estimacion fidedigna de las incertidumbres asoci-
adas, que permite una propagacion de los errores a observables arbitrarios sin necesidad de asumir
aproximaciones lineales. Esta tecnica fue introducida en [11] y durante el transcurso de esta tesis
doctoral ha sido mejorada en diferentes aspectos y extendida mediante su aplicacion a otros procesos
de interes. En particular hemos mostrado su aplicacion al proceso que motivo originariamente el
desarrollo de esta tecnica, la parametrizacion de distribuciones de partones en el proton.
La tecnica general ha sido extendida en diversas direcciones. Primero de todo, hemos demostrado
la generalidad de esta tecnica mediante su aplicacion a cuatro procesos de interes, cada uno de ellos
diferente a la aplicacion original descrita en [11]. Por ejemplo, hemos tratado problemas con un gran
numero de datos provenientes de diferentes experimentos, asi como problemas en los que la relacion
entre los datos experimentales y la funcion que estamos parametrizando viene dada por una serie
de convoluciones. En segundo lugar, nuevos algoritmos para el entrenamiento de redes neuronales
han sido introducidos, en particular los conocidos como algoritmos geneticos, que son necesarios
para la minimizacion de funciones de error altamente no lineales. Finalmente, hemos ampliado el
conjunto de tecnicas estadisticas usadas en la validacion de los resultados, esto es, en la comprovacion
quantitativa de como la densidad de probabilidad construida reproduce las caracteristicas de los datos
experimentales.
Las posibles aplicaciones de la estrategia general para parametrizar datos experimentales descrita
en la presente tesis doctoral son ciertamente numerosas. La mas imporante de estas es la general-
izacion de los resultados decritos en la Seccion 6.4 al sector singlet de las distribuciones de partones,
para obtener de esta manera un conjunto completo de distribuciones de partones parametrizadas con
redes neuronales con estimacion fidedigna de las incertidumbres asociadas. Otra aplicacion promete-
dora es la parametrizacion de la shape function, una funcion que contiene los efectos no perturbativos
dominantes en un cierto tipo de desitegraciones del meson B. Finalmente, otra aplicacion interesante
consistiria en la parametrizacion del flujo de neutrinos atmosfericos a partir de datos experimentales
de detecciones de neutrinos en experimentos como Super Kamiokande.
139
The neural network approach to parton distributions 140
Appendix A
Unless otherwise indicated, all the averages are performed with respect to the Nrep measurements,
and we assume that we are in the limit where Nrep is very large. The above estimators describe
features of the underlying probability density of the experimental data, and they approach the true
values as the number of measurements Nrep becomes very large.
This result is quantitatively described by the variances of the different estimators. These esti-
mator measure the difference of the values of the mean, variance of the data and correlations as
determined with averages over measurements with respect to their true values. These variances,
written in terms of moments of the Fi , are given by
1. Variance of the mean:
1 D E 2
V [Fi ] = (Fi hFi i)2 = i , (A.4)
Nrep Nrep
141
The neural network approach to parton distributions 142
It is clear from the above expressions that the variances of the mean, the error and the correlations
decrease when the number of measurements is increased. This implies that as statistics are increased,
the measured values of the mean, error and correlations are closer to their true values. Therefore,
to compare different probability measures, the variances of the mean, the error and the correlation
as defined above should be used.
12 22
2 = . (A.9)
12+ 22
To obtain the proof of the above results, note that if errors are gaussianly distributed, the maximum
likelihood condition imply that the mean x minimizes the 2 function
(x1 x) (x2 x)
2 = + , (A.10)
12 22
and the variance is determined by the condition
2 = 2 (x + ) 2 (x) , (A.11)
The neural network approach to parton distributions 143
which for 2 = 1 leads to Eq. A.9. Note that these properties only hold for gaussian measurements.
An alternative way to compute the mean and the variance of the combined measurements x1
and x2 is the Monte Carlo method: generate Nrep replicas of the pair of values x1 , x2 gaussianly
distributed with the appropriate error,
(k) (k)
x1 = x1 + r1 1 , k = 1, . . . , Nrep , (A.12)
(k) (k)
x2 = x2 + r2 2 , k = 1, . . . , Nrep , (A.13)
(k)
where r are univariate gaussian random numbers. One can then show that for each pair, the
weighted average
(k) (k)
x 2 + x1 22
x(k) = 1 22 , (A.14)
1 + 22
is gaussianly distributed with central value and width equal to the one determined in the previous
case. That is, it can be show that for a large enough value of Nrep ,
D E Nrep
1 X (k)
x(k) = x =x, (A.15)
rep Nrep
k=1
With this modification, the sample of Monte Carlo replicas of x1 and x2 also reproduces the exper-
imental correlations. This can be seen with the standard definition of the correlation,
* (k)
(k)
+
x1 x1 x2 x2
= hr1 r2 irep = 12 . (A.18)
1 2
rep
Therefore, the Monte Carlo approach also correctly takes into account the effects of correlations
between measurements.
In realistic cases, the two procedures are equivalent only up to linearizations of the underlying
law which describes the experimental data. We take the Monte Carlo procedure to be more faithful
in that it does not involve linearizing the underlying law in terms of the parameters. Note that as
emphasized before, the error estimation technique that is described in this thesis does not depend
on whether one uses neural networks or polynomials as interpolants. Conversely, one could derive
1- errors on the parameters of the neural network as an alternative to estimate the uncertainties
in the parametrized function.
As an example of the application of the Monte Carlo error estimation to standard fits of parton
distributions with polynomial functional forms, we repeat the nonsinglet fit of Refs. [207, 208].
The neural network approach to parton distributions 144
We use exactly the same techniques as discussed in Section 6.4.2 but with a functional form to
parametrize qN S (x, Q20 ) instead of a neural network, which is taken to have the functional dependence
[207]
1
qN S (x, Q20 ) = uv dv 2(d u)MRST (x, Q20 ) , (A.19)
6
were we have defined
1
uv (x, Q20 ) = Auv xau (1 x)bu 1 1.108x 2 + 26.283x , (A.20)
1
dv (x, Q20 ) = Adv xad (1 x)bd 1 + 0.895x 2 + 18.179x , (A.21)
(d u)MRST (x, Q20 ) = 1.195x0.24 (1 x)9.10 (1 + 14.05x 45.52x) , (A.22)
and the (d u) combination is taken from the MRST global analysis [75]. The normalization
constants are fixed by the conservation of the number of valence quarks
Z 1 Z 1
dx uv (x) = 2 , dx dv (x) = 1 . (A.23)
0 0
The values of the parameters obtained from a fit to the experimental data are summarized in Table
A.1, were we compare with the results of the original fit [207]. In particular one observes that the
exponent which governs the small-x behavior of the nonsinglet parton distribution, au , is correctly
reproduced as expected, since at small-x the experimental data that determines the behavior of
qN S (x, Q20 ) is the same in the two cases.
au bu ad bd
Refs. [207, 208] -0.686 4.199 -0.587 6.190
NNPDF -0.705 0.844 0.384 1.035
Table A.1:
The results of a fit to the nonsinglet structure function F2NS (x, Q2 ) for a parton distribution with functional
dependence given by Eq. A.19, compared with the results of the fit of [207, 208].
Note that Refs. [207, 208] have a different parametrization above and below x = 0.3, while we
take only the one they use for x < 0.3 and that are the large x behavior they use also HERA data.
One observes in Fig. A.1 that the small-x behavior of our polynomial fit coincides precisely with the
small-x behavior of the nonsinglet parton distributions from the MRST and CTEQ global analysis.
Note also that at medium and smallx the uncertainties as determined with the standard methods
introduced in Section 4.2 appear to be underestimated.
xqns(x, Q2) NS
0 F2 (x, Q2)
0.08 0.07 2 2 2
11 GeV < Q <12 GeV
0.07 NNPDF05
0.06
CTEQ
Neural network fit
0.06 MRST
0.05
0.05 Polynomial fit
Data
0.04 0.04
0.03 0.03
0.02
0.02
0.01
0.01
0
0
-0.01
-0.02 -2 -0.01
10 10-1 10-2 10-1
x x
Figure A.1:
The results of a polynomial fit with the Monte Carlo method for error estimation with the parametrization
of Ref. [207], compared with the corresponding fit with neural networks (left) and with the standard fits of
the CTEQ and MRST collaborations (right).
The best theoretical prediction in this case will therefore correspond to fitting a constant which
we will denote by k. The diagonal error, Eq. 5.33, will be given by
Note that to derive such expression one has to assume that the experimental measurements are
gaussianly distributed. If normalization uncertainties are neglected, then the expression for the
covariance matrix error, Eq. 5.34, is given by
2 c2 2 2 2
1 + (x1 k) 12 22
+ (x2 k) 2 c 2 2 (x1 k) (x2 k) 2 c 2
E3 = E2 1 2 1 2
, (A.25)
2 1 1
1 + c 2 + 2
1 2
and finally the full 2 , including normalization uncertainties, Eq. 5.76, is given by
c2 +x22 f2 2 +x2 2 c2 +x1 x2 f2
1 + (x1 k)2 12 22
+ (x2 k)2 c2 12 f 2 (x1 k) (x2 k) 12 22
2 = E2 21 2 2 2 2
. (A.26)
1 1 2 x x 2
1 + c2 12
+ 2 + f 12 + 22 + (x1 x2 ) c2 f2
2 1 2 1 2
Note that for example as c 0 the correlated error function E3 reduces to the uncorrelated one
E2 , as expected.
Once we have defined the different error functions, we can compute the values of k for which
each of the different error functions has a minimum. This is achieved imposing the conditions
d d d 2
E2 (k) = 0, E3 (k) = 0, (k) =0. (A.27)
dk dk dk
k=k2 k=k3 k=k2
For the diagonal error, Eq. A.24 one has the standard weighted average
x1 22 + x2 12
k2 = , (A.28)
12 + 22
The neural network approach to parton distributions 146
then for the covariance matrix error, Eq. A.25 one has the same minimum as before
x1 22 + x2 12
k3 = k2 = . (A.29)
12 + 22
This points to the fact that in a realistic case the result of a minimization of the diagonal error
function, Eq. 5.33 should be rather similar so the corresponding result when the minimized error
function is Eq. 5.34. Finally for the full 2 with normalization uncertainties one has
x1 22 + x2 12
k2 = , (A.30)
12 + 22 + (x1 x2 )2 f2
which as can be rather different from the naive estimation Eq. A.28 if data are incompatible and
normalization error is sizeable. This is true even if normalization effects are small if the measured
values are inconsistent, since the effect of normalization uncertainties is proportional to f (x1 x2 )
are thus can be arbitrarily large. In particular one has a sizeable effect if
f2 2
(x1 x2 ) 1 , (A.31)
12 + 22
so one can have much larger effects than those naively expected from the value of f . This shows that
the error function with normalization errors as defined in Eq. A.26 leads to completely unexpected
and anti-intuitive results.
The quantities that are relevant to compute are the values of the different error functions at the
different possible minima ki . The first one is the value of the diagonal error when minimizing the
same error, then one has
(x1 x2 )2
E2 (k2 ) = . (A.32)
12 + 22
Another interesting quantity is the ratio between E2 and E3 when minimizing either E2 or E3 (since
as long as normalization uncertainties are not included the minimum is the same for both quantities).
This ratio is given by
2 +2
E2 (k2 ) 1 + c 12 22
1 2
= 2 , (A.33)
E3 (k3 ) 1 + c (x1x 2)
2 2
1 2
This ratio is typically of order 1, showing why both errors are comparable in fits without normal-
ization error. In particular if x1 = x2 one has
E2 (k2 ) 2 + 2
= 1 + c 1 2 2 2 1 . (A.34)
E3 (k2 ) 1 2
Finally let us consider the case in which we minimize the full 2 with normalizations errors included
in the experimental covariance matrix Eq. 5.76, and we compute the following ratio at the value
k2 for which Eq. 5.76 has a minimum
2
2
4 x1 x22
E2 k2 1 + (x12x
+
2)
2 f 2 + 2 + 2
2
1 2 1 2 f
= h i . (A.35)
E2 (k2 ) (x x ) 2 2
1 f2 2 +2
1 2
1 2
Note that the above quantity depends not only on (x1 x2 ) but also on the absolute magnitude
x1 , x2 of the measurement. The above ratio always verifies the property
E2 k2
1. (A.36)
E2 (k2 )
The neural network approach to parton distributions 147
Therefore including normalization errors in the minimized 2 results not only in a lower value of the
fitted parameter k, but also on a larger diagonal error E2 , and the two effects arise from the same
source: combination of inconsistent data and normalization errors. This effect can be very large
even if normalization errors themselves are small due to the presence of inconsistent data. One can
explicitely check that the same conclusions hold in the case that the inconsistent data comes from
different experiments.
To check the effects of the incorrect treatment of the normalization uncertainties in a more
realistic fit, in Fig. A.2 we have repeated the F2 (x, Q2 ) parametrization described in Section 6.2
but with the incorrect treatment of normalization uncertainties, that is, with the minimization of
the error function with the covariance matrix Eq. 5.6, rather than with Eq. 5.35, which does not
include the normalization uncertainties. One observes that the results of the incorrect treatment of
the normalization errors is that the structure function is systematically lower than the result with
the correct treatment, and this effect is much larger than the size one would naively expect since
normalization errors are of the order of 2 3%.
Figure A.2:
Comparison of parametrizations of the proton structure function F2 (x, Q2 ) with the correct and incorrect
treatment of normalization uncertainties. Note that the effect is much larger than the one naively expected
from the relative size of normalization errors, N 2 3%.
The neural network approach to parton distributions 148
Appendix B
In this Appendix we summarize the present status of global fits of parton distribution functions. The
standard approach to the determination of parton distributions from experimental data has been
discussed in detail in Section 4.2. We do not attempt now to review all the huge available literature
in the subject but rather to provide the reader with a brief description of the current status of the
field. Much more detailed information can be obtained from the original references as well as from
proceedings of workshops like [183, 31].
During the 1980s and the early 1990s many sets of parton distributions were developed to try
to describe all the available hard scattering data [209, 210, 211, 212, 213, 214]. Today the most
commonly used sets of parton distributions are those of the CTEQ and MRST Collaborations.
This is so because these collaborations take into account all modern data from a wide variety of
experiments as well as the progress in perturbative QCD computations, and provide regular updates
of their parton distributions sets. Now we review the current status of the global analysis of these
two groups. Note that even if these two groups release new versions of their sets rather frequently,
in general these updates are only minor changes of a basic set, like CTEQ4 or CTEQ5. Note also
that all modern QCD analysis provide estimations of the uncertainties associated to the parton
distribution functions.
The MRS(T) Collaboration presented his first global parton fit in Ref. [215]. Then this global
fit was sequentially improved from a series of works: [216, 217, 218, 219, 220, 221, 75]. One of
their latest sets of parton distributions is MRST2001 [221], which is described in some detail in the
following. The experimental data that is used in the fit is given by:
Neutral current deep-inelastic structure functions from the H1 and ZEUS experiments at the
HERA e p collider.
The ZEUS measurement of the charm contribution to the DIS structure function, F2cc (x, Q2 ).
The fixed target DIS structure functions measurements from the CCFR, BCDMS, NMC and
E665 experiments, as well as preliminary data from the NuTeV experiment, for different types
of targets.
Inclusive jet cross sections from the D0 and CDF detectors at Fermilab pp collider Tevatron.
The E866 measurements of the Drell-Yan process for both proton and neutron targets, as well
as previous measurements by the E605 experiment.
The measurement of the W-lepton asymmetry from the CDF detector at Tevatron.
Note that the prompt photon data, that was used for some time in parton fits, is not included any
more due to theoretical problems as well as possible inconsistencies.
149
The neural network approach to parton distributions 150
MRST2001 is a global NLO QCD analysis with starting evolution scale Q0 = 1 GeV, that uses
the M S renormalization scheme and the Thorne-Roberts scheme for the treatment of heavy quark
mass effects. The kinematical cuts are given by Q2 2 GeV2 and W 2 12.5 GeV2 . The parton
distributions at the initial evolution scale are parametrized by
xuV (x, Q20 ) = x (u u) (x, Q20 ) = Au xbu (1 x)cu 1 + du x + eu x , (B.1)
xdV (x, Q20 ) = x d d (x, Q20 ) = Ad xbd (1 x)cd 1 + dd x + ed x , (B.2)
xS(x, Q20 ) = As xbs (1 x)cs 1 + ds x + es x , (B.3)
xS(x, Q20 ) 2x u + d + s , (B.4)
xg(x, Q20 ) = Ag xbg (1 x)cg 1 + dg x + eg x Fg xgg (1 x)hg , (B.5)
2u, 2d, 2s = 0.4S + , 0.4S , 0.2S , (B.6)
x = x d u = A xb (1 x)c 1 + d x + e x2 . (B.7)
As well as the parameters that describe the nonperturbative
shape of the parton distributions, for
this analysis also the strong coupling s MZ2 is fitted, resulting in a value consistent with the
current world average [20]. The associated uncertainties to the above parton distributions from
experimental errors were discussed in detail in Ref. [205] and those associated to experimental
uncertainties, like the perturbative order, higher twist corrections or ln 1/x and ln(1 x) effects,
in Ref. [80]. In Fig. B.1 we show the parton distributions that result from the global analysis
discussed above at the scale Q2 = 104 GeV2 . Note that at such a large energy scale, the gluon
distribution becomes dominant at medium and small x, and the contribution from heavy quarks
becomes sizeable. At large evolution lengths Q2 the shape of the parton distributions is essentially
determined by perturbative evolution and becomes less dependent of the initial nonperturbative
condition at Q20 .
Figure B.1:
The MRST2001 partons at the scale Q2 = 104 GeV2 .
The CTEQ Collaboration has performed global analysis of parton distributions since the early 90s
[222, 223, 224, 225] until now. One of their latest work is named CTEQ6 [84], that is summarized in
the following. As will be seen this analysis is very close to that of the MRST2001 analysis discussed
above. CTEQ6 uses as experimental input:
The neural network approach to parton distributions 151
Neutral current deep-inelastic structure functions from the H1 and ZEUS experiments ate the
HERA e p collider.
The fixed target DIS structure functions measurements from the CCFR, BCDMS and NMC
experiments.
Inclusive jet cross sections in several rapidity bins from the D0 detector at Fermilab pp collider
Tevatron.
The E866 measurements of the Drell-Yan deuteron to proton ratio, and the E605 measurement
of the Drell-Yan cross section.
The measurement of the W-lepton asymmetry from the CDF detector at Tevatron.
For all these experiments, all the information on correlated systematic uncertainties is available.
Note that even if global QCD analysis succeed in describing a wide variety of hard-scattering data,
the precision DIS structure function measurements from HERA and fixed target experiments still
provide the backbone of parton distribution analysis.
Figure B.2:
A comparison of the CTEQ6M partons with the MRST2001 partons at the initial evolution scale Q = 2
GeV.
The nonperturbative input to the parton global analysis, as has been discussed in Section 4.2, are
the parametrization of the parton distributions at a starting evolution scale, which in the CTEQ6
set is taken to be Q0 = 1.3 GeV. Let us recall (see Section 4.1.2) that the parton distributions at
Q Q0 are determined by the NLO DGLAP evolution equations. The functional form used in the
CTEQ6 analysis is
A5
xf (x, Q0 ) = A0 xA1 (1 x)A2 eA3 x 1 + eA4 x , (B.8)
with independent parameters for the flavor combinations
u u, d d, g, and u + d. The strange
parton distribution is kept fixed to s = s = 0.2 u + d /2. Also the sea quark ratio
d
= A0 xA1 (1 x)A2 + (1 + A3 x) (1 x)A4 , (B.9)
u
The neural network approach to parton distributions 152
is parametrized. Some parameters are held fixed, for a total of 20 free parameters to model the
nonperturbative parton distribution shape at the input scale Q0 . From the CTEQ6 global analysis
not only the best-fit set of parton distributions are determined from experimental data, also the as-
sociated uncertainties in the parton distributions are estimated using some of the methods discussed
in Section 4.2, see the original reference [84] for a more detailed description of the results of this
global analysis.
Note that the two main groups performing global fits of parton distributions, MRST and CTEQ,
use a very similar set of experimental data, similar assumptions on the nonperturbative shape of
the parton distributions and quite similar methods to determine the associated errors to the parton
distributions. This means that the spread of the MRST and CTEQ results by itself cannot be
taken, even at a qualitative level, as a measure of the uncertainty in the determination of parton
distributions. In Fig. B.2 we show the results of the CTEQ6 global QCD analysis discussed above
compared with the MRST2001 partons. Note the good agreement between the two analysis for the
u(x) and d(x) partons, while the difference is larger for the gluon, as was to be expected since the
gluon distribution has rather large uncertainties.
Finally let us review some other recent determinations of parton distributions. These global
analysis are less frequently used in phenomenological studies than the sets from the CTEQ and
MRST collaborations, since in none of the following cases all the available hard-scattering data is
included. S. Alekhin has presented several QCD analysis of deep-inelastic scattering data [226, 227],
with emphasis in the consistency of the statistical analysis of the data. The GRV published a series
of parton analysis [228, 229, 230, 231] whose main feature was that a very low starting evolution
scale Q20 for the evolution. However, since the release of their last set of parton distributions [231]
no updates have appeared. In particular this group did not produce an estimation of the associated
uncertainties in their parton distributions. Finally, in the last years different groups have published
additional global fits of parton distributions, each one with different motivations. For example the
analysis of BPZ [232] devotes special attention to the accurate determination of the sea strange
parton distribution, while the Fermi partons [233], which have some overlap with our approach, also
attempt to construct a probability measure in the space of parton distributions.
Summarizing, global fits of parton distributions have been a field of active development in the
recent years. Two sets of parton distributions (MRST and CTEQ) are nowadays commonly used in
most phenomenological applications, since they include all available experimental data and provide
regular updates of their parton sets. Note that as discussed before, in both cases the experimental
data, the theoretical assumptions and the technique to assess the uncertainties are rather similar,
and therefore theoretical predictions for observables using the two sets of parton distributions are
in general in good agreement.
Bibliography
[1] J. Rojo and J. I. Latorre, Neural network parametrization of spectral functions from hadronic
tau decays and determination of qcd vacuum condensates, JHEP 01 (2004) 055,
[hep-ph/0401047].
[2] J. Brugues, J. Rojo, and J. G. Russo, Non-perturbative states in type ii superstring theory
from classical spinning membranes, Nucl. Phys. B710 (2005) 117138, [hep-th/0408174].
[3] NNPDF Collaboration, L. Del Debbio, S. Forte, J. I. Latorre, A. Piccione, and J. Rojo,
Unbiased determination of the proton structure function f2p with faithful uncertainty
estimation, hep-ph/0501067.
[4] S. Forte, G. Ridolfi, J. Rojo, and M. Ubiali, Borel resummation of soft gluon radiation and
higher twists, Phys. Lett. B635 (2006) 313319, [hep-ph/0601048].
[5] J. Rojo, Neural network parametrization of the lepton energy spectrum in semileptonic b
meson decays, JHEP 05 (2006) 040, [hep-ph/0601229].
[6] J. Mondejar, A. Pineda, and J. Rojo, Heavy meson semileptonic differential decay rate in two
dimensions in the large n(c), hep-ph/0605248.
[7] J. Rojo Chacon, A probability measure in the space of spectral functions and structure
functions, Nucl. Phys. Proc. Suppl. 152 (2006) 5760, [hep-ph/0407147].
[8] NNPDF Collaboration, J. Rojo, L. Del Debbio, S. Forte, J. I. Latorre, and A. Piccione, The
neural network approach to parton fitting, AIP Conf. Proc. 792 (2005) 376379,
[hep-ph/0505044].
[9] M. Dittmar et al., Parton distributions: Summary report for the hera - lhc workshop,
hep-ph/0511119.
[10] the NNPDF Collaboration, A. Piccione, L. Del Debbio, S. Forte, J. I. Latorre, and J. Rojo,
Neural network approach to parton distributions fitting, Nucl. Instrum. Meth. A559 (2006)
203206, [hep-ph/0509067].
[11] S. Forte, L. Garrido, J. I. Latorre, and A. Piccione, Neural network parametrization of
deep-inelastic structure functions, JHEP 05 (2002) 062, [hep-ph/0204232].
[12] D. Stump et al., Inclusive jet production, parton distributions, and the search for new
physics, JHEP 10 (2003) 046, [hep-ph/0303013].
[13] A. Piccione, Aspects of qcd perturbative evolution, hep-ph/0207204.
[14] R. K. Ellis, W. J. Stirling, and B. R. Webber, Qcd and collider physics, Camb. Monogr. Part.
Phys. Nucl. Phys. Cosmol. 8 (1996) 1435.
153
The neural network approach to parton distributions 154
[15] CTEQ Collaboration, R. Brock et al., Handbook of perturbative qcd: Version 1.0, Rev. Mod.
Phys. 67 (1995) 157248.
[16] M. E. Peskin and D. V. Schroeder, An introduction to quantum field theory, . Reading, USA:
Addison-Wesley (1995) 842 p.
[17] D. J. Gross and F. Wilczek, Ultraviolet behavior of non-abelian gauge theories, Phys. Rev.
Lett. 30 (1973) 13431346.
[18] H. D. Politzer, Reliable perturbative results for strong interactions?, Phys. Rev. Lett. 30
(1973) 13461349.
[19] J. B. Kogut, A review of the lattice gauge theory approach to quantum chromodynamics, Rev.
Mod. Phys. 55 (1983) 775.
[20] S. Bethke, Determination of the qcd coupling s , J. Phys. G26 (2000) R27,
[hep-ex/0004021].
[21] R. D. Ball and S. Forte, Double asymptotic scaling at hera, Phys. Lett. B335 (1994) 7786,
[hep-ph/9405320].
[22] G. Sterman, Partons, factorization and resummation, hep-ph/9606312.
[23] M. Beneke, Renormalons, Phys. Rept. 317 (1999) 1142, [hep-ph/9807443].
[24] Y. L. Dokshitzer, D. Diakonov, and S. I. Troian, Hard processes in quantum
chromodynamics, Phys. Rept. 58 (1980) 269395.
[25] M. A. Shifman, A. I. Vainshtein, and V. I. Zakharov, Qcd and resonance physics. sum rules,
Nucl. Phys. B147 (1979) 385447.
[26] J. D. Bjorken, Asymptotic sum rules at infinite momentum, Phys. Rev. 179 (1969)
15471553.
[27] A. Vogt, S. Moch, and J. A. M. Vermaseren, The three-loop splitting functions in qcd: The
singlet case, Nucl. Phys. B691 (2004) 129181, [hep-ph/0404111].
[28] S. Moch, J. A. M. Vermaseren, and A. Vogt, The three-loop splitting functions in qcd: The
non-singlet case, Nucl. Phys. B688 (2004) 101134, [hep-ph/0403192].
[29] S. Moch, A. Vogt, and J. Vermaseren, Sudakov resummations at higher orders, Acta Phys.
Polon. B36 (2005) 32953308, [hep-ph/0511113].
[30] H. Abramowicz and A. Caldwell, Hera collider physics, Rev. Mod. Phys. 71 (1999)
12751410, [hep-ex/9903037].
[31] M. Dittmar et al., Parton distributions: Summary report, hep-ph/0511119.
[32] G. Corcella and L. Magnea, Soft-gluon resummation effects on parton distributions, Phys.
Rev. D72 (2005) 074017, [hep-ph/0506278].
[33] J. M. Conrad, M. H. Shaevitz, and T. Bolton, Precision measurements with high energy
neutrino beams, Rev. Mod. Phys. 70 (1998) 13411392, [hep-ex/9707015].
[34] J. C. Collins, D. E. Soper, and G. Sterman, Factorization of hard processes in qcd, Adv. Ser.
Direct. High Energy Phys. 5 (1988) 191, [hep-ph/0409313].
[35] G. Altarelli, Partons in quantum chromodynamics, Phys. Rept. 81 (1982) 1.
The neural network approach to parton distributions 155
[36] J. C. Collins, What exactly is a parton density?, Acta Phys. Polon. B34 (2003) 3103,
[hep-ph/0304122].
[37] J. Callan, Curtis G. and D. J. Gross, High-energy electroproduction and the constitution of
the electric current, Phys. Rev. Lett. 22 (1969) 156159.
[38] V. N. Gribov and L. N. Lipatov, Deep inelastic ep scattering in perturbation theory, Sov. J.
Nucl. Phys. 15 (1972) 438450.
[39] Y. L. Dokshitzer, Calculation of the structure functions for deep inelastic scattering and
e+ e annihilation by perturbation theory in quantum chromodynamics. (in russian), Sov.
Phys. JETP 46 (1977) 641653.
[40] G. Altarelli and G. Parisi, Asymptotic freedom in parton language, Nucl. Phys. B126 (1977)
298.
[41] R. P. Feynmann, Photon hadron interactions, W.A. Benjamin, New York (1972).
[42] D. J. Gross and C. H. Llewellyn Smith, High-energy neutrino - nucleon scattering, current
algebra and partons, Nucl. Phys. B14 (1969) 337347.
[43] R. Abbate and S. Forte, Re-evaluation of the gottfried sum using neural networks,
hep-ph/0511231.
[44] New Muon Collaboration, M. Arneodo et al., A reevaluation of the gottfried sum, Phys.
Rev. D50 (1994) 13.
[45] G. T. Bodwin, E. Braaten, and G. P. Lepage, Rigorous qcd analysis of inclusive annihilation
and production of heavy quarkonium, Phys. Rev. D51 (1995) 11251171, [hep-ph/9407339].
[46] M. Neubert, Heavy quark symmetry, Phys. Rept. 245 (1994) 259396, [hep-ph/9306320].
[47] M. Beneke, A. P. Chapovsky, M. Diehl, and T. Feldmann, Soft-collinear effective theory and
heavy-to-light currents beyond leading power, Nucl. Phys. B643 (2002) 431476,
[hep-ph/0206152].
[48] C. W. Bauer, S. Fleming, D. Pirjol, and I. W. Stewart, An effective field theory for collinear
and soft gluons: Heavy to light decays, Phys. Rev. D63 (2001) 114020, [hep-ph/0011336].
[49] D. Benson, I. I. Bigi, T. Mannel, and N. Uraltsev, Imprecated, yet impeccable: On the
theoretical evaluation of (b xc l), Nucl. Phys. B665 (2003) 367401, [hep-ph/0302262].
[50] T. Mannel, Operator product expansion for inclusive semileptonic decays in heavy quark
effective field theory, Nucl. Phys. B413 (1994) 396412, [hep-ph/9308262].
[51] M. Trott, Improving extractions of |vcb | and mb from the hadronic invariant mass moments
of semileptonic inclusive b decay, Phys. Rev. D70 (2004) 073003, [hep-ph/0402120].
[52] V. Aquila, P. Gambino, G. Ridolfi, and N. Uraltsev, Perturbative corrections to semileptonic
b decay distributions, hep-ph/0503083.
[53] I. I. Y. Bigi, M. A. Shifman, N. G. Uraltsev, and A. I. Vainshtein, Qcd predictions for lepton
spectra in inclusive heavy flavor decays, Phys. Rev. Lett. 71 (1993) 496499,
[hep-ph/9304225].
[54] M. Jezabek and J. H. Kuhn, Lepton spectra from heavy quark decay, Nucl. Phys. B320
(1989) 20.
The neural network approach to parton distributions 156
[55] M. Gremm and I. Stewart, Order 2s 0 correction to the charged lepton spectrum in b xc l
decays, Phys. Rev. D55 (1997) 12261232, [hep-ph/9609341].
[56] S. J. Brodsky, G. P. Lepage, and P. B. Mackenzie, On the elimination of scale ambiguities in
perturbative quantum chromodynamics, Phys. Rev. D28 (1983) 228.
[57] A. V. Manohar and M. B. Wise, Inclusive semileptonic b and polarized lambda_b decays from qcd,
Phys. Rev. D49 (1994) 1310-1329, [hep-ph/9308246].
[58] M. Gremm and A. Kapustin, Order 1/m_b^3 corrections to inclusive semileptonic b decay, Phys.
Rev. D55 (1997) 6924-6932, [hep-ph/9603448].
[59] P. Gambino and N. Uraltsev, Moments of semileptonic b decay distributions in the 1/mb
expansion, Eur. Phys. J. C34 (2004) 181-189, [hep-ph/0401063].
[60] BABAR Collaboration, B. Aubert et al., Measurement of the electron energy spectrum and
its moments in inclusive b -> x e nu decays, Phys. Rev. D69 (2004) 111104, [hep-ex/0403030].
[61] Y.-S. Tsai, Decay correlations of heavy leptons in e+ e- -> l+ l-, Phys. Rev. D4 (1971) 2821.
[62] A. Hocker, Measurement of the spectral functions of the tau lepton and applications to
quantum chromodynamics. Ph.D. thesis, LAL-97-18.
[63] E. Braaten, S. Narison, and A. Pich, Qcd analysis of the tau hadronic width, Nucl. Phys.
B373 (1992) 581-612.
[64] F. Le Diberder and A. Pich, Testing qcd with tau decays, Phys. Lett. B289 (1992) 165-175.
[65] M. Davier, A. Hocker, and Z. Zhang, The physics of hadronic tau decays, hep-ph/0507078.
[66] ALEPH Collaboration, R. Barate et al., Studies of quantum chromodynamics with the aleph
detector, Phys. Rept. 294 (1998) 1-165.
[67] S. Kluth, Tests of quantum chromo dynamics at e+ e- colliders, hep-ex/0603011.
[68] E. de Rafael, An introduction to sum rules in qcd, hep-ph/9802448.
[69] S. Narison, Spectral function sum rules in quantum chromodynamics. 1. charged currents
sector, Nucl. Phys. B155 (1979) 115.
[70] T. Das, V. S. Mathur, and S. Okubo, Low-energy theorem in the radiative decays of charged
pions, Phys. Rev. Lett. 19 (1967) 859-861.
[71] S. Weinberg, Precise relations between the spectra of vector and axial vector mesons, Phys.
Rev. Lett. 18 (1967) 507-509.
[72] T. Das, G. S. Guralnik, V. S. Mathur, F. E. Low, and J. E. Young, Electromagnetic mass
difference of pions, Phys. Rev. Lett. 18 (1967) 759-761.
[73] S. J. Brodsky and G. R. Farrar, Scaling laws for large momentum transfer processes, Phys.
Rev. D11 (1975) 1309.
[74] A. C. Irving and R. P. Worden, Regge phenomenology, Phys. Rept. 34 (1977) 117-231.
[75] A. D. Martin, R. G. Roberts, W. J. Stirling, and R. S. Thorne, Nnlo global parton analysis,
Phys. Lett. B531 (2002) 216-224, [hep-ph/0201127].
[76] G. Altarelli, S. Forte, and G. Ridolfi, On positivity of parton distributions, Nucl. Phys. B534
(1998) 277-296, [hep-ph/9806345].
[77] G. L. Fogli, E. Lisi, A. Marrone, D. Montanino, and A. Palazzo, Getting the most from the
statistical analysis of solar neutrino oscillations, Phys. Rev. D66 (2002) 053010,
[hep-ph/0206162].
[78] G. Altarelli, R. D. Ball, and S. Forte, Perturbatively stable resummed small x evolution
kernels, hep-ph/0512237.
[79] R. S. Thorne et al., Questions on uncertainties in parton distributions, J. Phys. G28 (2002)
2717-2722, [hep-ph/0205233].
[80] A. D. Martin, R. G. Roberts, W. J. Stirling, and R. S. Thorne, Uncertainties of predictions
from parton distributions. ii: Theoretical errors, Eur. Phys. J. C35 (2004) 325-348,
[hep-ph/0308087].
[81] M. Botje, A qcd analysis of hera and fixed target structure function data, Eur. Phys. J. C14
(2000) 285-297, [hep-ph/9912439].
[82] J. Pumplin et al., Uncertainties of predictions from parton distribution functions. ii: The
hessian method, Phys. Rev. D65 (2002) 014013, [hep-ph/0101032].
[83] J. C. Collins and J. Pumplin, Tests of goodness of fit to multiple data sets, hep-ph/0105207.
[84] J. Pumplin et al., New generation of parton distributions with uncertainties from global qcd
analysis, JHEP 07 (2002) 012, [hep-ph/0201195].
[85] D. Stump et al., Uncertainties of predictions from parton distribution functions. i: The
lagrange multiplier method, Phys. Rev. D65 (2002) 014012, [hep-ph/0101051].
[86] A. Cooper-Sarkar and C. Gwenlan, Comparison and combination of zeus and h1 pdf
analyses, hep-ph/0508304.
[87] M. R. Whalley, D. Bourilkov, and R. C. Group, The les houches accord pdfs (lhapdf) and
lhaglue, hep-ph/0508110.
[88] G. Altarelli, R. D. Ball, S. Forte, and G. Ridolfi, Theoretical analysis of polarized structure
functions, Acta Phys. Polon. B29 (1998) 1145-1173, [hep-ph/9803237].
[89] Asymmetry Analysis Collaboration, Y. Goto et al., Polarized parton distribution
functions in the nucleon, Phys. Rev. D62 (2000) 034017, [hep-ph/0001046].
[90] J. Blumlein and H. Bottcher, Qcd analysis of polarized deep inelastic scattering data and
parton distributions, Nucl. Phys. B636 (2002) 225-263, [hep-ph/0203155].
[91] M. Hirai, S. Kumano, and T. H. Nagai, Nuclear corrections of parton distribution functions,
Nucl. Phys. Proc. Suppl. 139 (2005) 21-26, [hep-ph/0408135].
[92] D. de Florian and R. Sassot, Nuclear parton distributions at next to leading order, Phys. Rev.
D69 (2004) 074028, [hep-ph/0311227].
[93] K. J. Eskola, H. Honkanen, V. J. Kolhinen, P. V. Ruuskanen, and C. A. Salgado, Nuclear
parton distributions in the dglap approach, hep-ph/0110348.
[94] M. Gluck, E. Reya, and A. Vogt, Photonic parton distributions, Phys. Rev. D46 (1992)
1973-1979.
[95] P. J. Sutton, A. D. Martin, R. G. Roberts, and W. J. Stirling, Parton distributions for the
pion extracted from drell-yan and prompt photon experiments, Phys. Rev. D45 (1992)
2349-2359.
[117] OPAL Collaboration, K. Ackerstaff et al., Measurement of the strong coupling constant alpha_s
and the vector and axial-vector spectral functions in hadronic tau decays, Eur. Phys. J. C7
(1999) 571-593, [hep-ex/9808019].
[118] ALEPH Collaboration, S. Schael et al., Branching ratios and spectral functions of tau
decays: Final aleph measurements and physics implications, Phys. Rept. 421 (2005) 191-284,
[hep-ex/0506072].
[119] E. Witten, Some inequalities among hadron masses, Phys. Rev. Lett. 51 (1983) 2351.
[120] J. Comellas, J. I. Latorre, and J. Taron, Constraints on chiral perturbation theory parameters
from qcd inequalities, Phys. Lett. B360 (1995) 109-116, [hep-ph/9507258].
[121] J. Bijnens, E. Gamiz, and J. Prades, Matching the electroweak penguins q7, q8 and spectral
correlators, JHEP 10 (2001) 009, [hep-ph/0108240].
[122] V. Cirigliano, J. F. Donoghue, E. Golowich, and K. Maltman, Determination of
<(pi pi)(i=2)|q7,8|k0> in the chiral limit, Phys. Lett. B522 (2001) 245-256, [hep-ph/0109113].
[123] C. A. Dominguez and K. Schilcher, Finite energy chiral sum rules in qcd, Phys. Lett. B581
(2004) 193-198, [hep-ph/0309285].
[124] V. Cirigliano, E. Golowich, and K. Maltman, Qcd condensates for the light quark v-a
correlator, Phys. Rev. D68 (2003) 054013, [hep-ph/0305118].
[125] J. Bordes, C. A. Dominguez, J. Penarrocha, and K. Schilcher, Chiral condensates from tau
decay: A critical reappraisal, hep-ph/0511293.
[126] S. Friot, D. Greynat, and E. de Rafael, Chiral condensates, q7 and q8 matrix elements and
large-nc qcd, JHEP 10 (2004) 043, [hep-ph/0408281].
[127] S. Narison, V-a hadronic tau decays: A laboratory for the qcd vacuum, Phys. Lett. B624
(2005) 223-232, [hep-ph/0412152].
[128] M. Davier, L. Girlanda, A. Hocker, and J. Stern, Finite energy chiral sum rules and tau
spectral functions, Phys. Rev. D58 (1998) 096014, [hep-ph/9802447].
[129] B. L. Ioffe, Qcd at low energies, Prog. Part. Nucl. Phys. 56 (2006) 232-277,
[hep-ph/0502148].
[130] M. Knecht, S. Peris, and E. de Rafael, A critical reassessment of q7 and q8 matrix elements,
Phys. Lett. B508 (2001) 117-126, [hep-ph/0102017].
[131] K. N. Zyablyuk, V-a sum rules with d = 10 operators, Eur. Phys. J. C38 (2004) 215-223,
[hep-ph/0404230].
[132] O. Cata, M. Golterman, and S. Peris, Duality violations and spectral sum rules, JHEP 08
(2005) 076, [hep-ph/0506004].
[133] M. Davier, A. Hocker, and Z. Zhang, The physics of hadronic tau decays, hep-ph/0507078.
[134] J. Hirn, N. Rius, and V. Sanz, Geometric approach to condensates in holographic qcd,
hep-ph/0512240.
[135] W.-K. Tung, Status of global qcd analysis and the parton structure of the nucleon,
hep-ph/0409145.
[137] Spin Muon Collaboration, B. Adeva et al., A next-to-leading order qcd analysis of the spin
structure function g1, Phys. Rev. D58 (1998) 112002.
[138] New Muon Collaboration, M. Arneodo et al., Measurement of the proton and deuteron
structure functions, f2^p and f2^d, and of the ratio sigma_l/sigma_t, Nucl. Phys. B483 (1997) 3-43,
[hep-ph/9610231].
[139] BCDMS Collaboration, A. C. Benvenuti et al., A high statistics measurement of the proton
structure functions f2(x, q^2) and r from deep inelastic muon scattering at high q^2, Phys. Lett.
B223 (1989) 485.
[140] BCDMS Collaboration, A. C. Benvenuti et al., A high statistics measurement of the
deuteron structure functions f2(x, q^2) and r from deep inelastic muon scattering at high q^2,
Phys. Lett. B237 (1990) 592.
[141] E665 Collaboration, M. R. Adams et al., Proton and deuteron structure functions in muon
scattering at 470-gev, Phys. Rev. D54 (1996) 3006-3056.
[142] ZEUS Collaboration, M. Derrick et al., Measurement of the f2 structure function in deep
inelastic e+ p scattering using 1994 data from the zeus detector at hera, Z. Phys. C72 (1996)
399-424, [hep-ex/9607002].
[143] ZEUS Collaboration, J. Breitweg et al., Measurement of the proton structure function f2 and
sigma_tot(gamma* p) at low q^2 and very low x at hera, Phys. Lett. B407 (1997) 432-448, [hep-ex/9707025].
[144] ZEUS Collaboration, J. Breitweg et al., Zeus results on the measurement and phenomenology
of f2 at low x and low q^2, Eur. Phys. J. C7 (1999) 609-630, [hep-ex/9809005].
[145] ZEUS Collaboration, S. Chekanov et al., Measurement of the neutral current cross section
and f2 structure function for deep inelastic e+ p scattering at hera, Eur. Phys. J. C21 (2001)
443-471, [hep-ex/0105090].
[146] ZEUS Collaboration, J. Breitweg et al., Measurement of the proton structure function f2 at
very low q^2 at hera, Phys. Lett. B487 (2000) 53-73, [hep-ex/0005018].
[147] H1 Collaboration, C. Adloff et al., A measurement of the proton structure function f2(x, q^2)
at low x and low q^2 at hera, Nucl. Phys. B497 (1997) 3-30, [hep-ex/9703012].
[148] H1 Collaboration, C. Adloff et al., Measurement of neutral and charged current
cross-sections in positron proton collisions at large momentum transfer, Eur. Phys. J. C13
(2000) 609-639, [hep-ex/9908059].
[149] H1 Collaboration, C. Adloff et al., Deep-inelastic inclusive e p scattering at low x and a
determination of alpha_s, Eur. Phys. J. C21 (2001) 33-61, [hep-ex/0012053].
[150] H1 Collaboration, C. Adloff et al., Measurement of neutral and charged current cross
sections in electron proton collisions at high q^2, Eur. Phys. J. C19 (2001) 269-288,
[hep-ex/0012052].
[151] H1 Collaboration, C. Adloff et al., Measurement and qcd analysis of neutral and charged
current cross sections at hera, Eur. Phys. J. C30 (2003) 1-32, [hep-ex/0304003].
[159] C. W. Bauer, Z. Ligeti, M. Luke, A. V. Manohar, and M. Trott, Global analysis of inclusive b
decays, Phys. Rev. D70 (2004) 094017, [hep-ph/0408002].
[160] BABAR Collaboration, B. Aubert et al., Determination of the branching fraction for
b -> xc l nu decays and of |vcb| from hadronic mass and lepton energy moments, Phys. Rev. Lett.
93 (2004) 011803, [hep-ex/0404017].
[161] O. Buchmueller and H. Flaecher, Fits to moment measurements from b -> xc l nu and b -> xs gamma
decays using heavy quark expansions in the kinetic scheme, hep-ph/0507253.
[162] Particle Data Group Collaboration, S. Eidelman et al., Review of particle physics, Phys.
Lett. B592 (2004) 1.
[163] DELPHI Collaboration, J. Abdallah et al., Determination of heavy quark non-perturbative
parameters from spectral moments in semileptonic b decays, hep-ex/0510024.
[171] G. Corcella and A. H. Hoang, Uncertainties in the ms-bar bottom quark mass from
relativistic sum rules, Phys. Lett. B554 (2003) 133-140, [hep-ph/0212297].
[172] A. Pineda, Determination of the bottom quark mass from the upsilon(1s) system, JHEP 06 (2001)
022, [hep-ph/0105008].
[173] N. Brambilla, Y. Sumino, and A. Vairo, Quarkonium spectroscopy and perturbative qcd: A
new perspective, Phys. Lett. B513 (2001) 381-390, [hep-ph/0101305].
[174] N. Brambilla, Y. Sumino, and A. Vairo, Quarkonium spectroscopy and perturbative qcd:
Massive quark loop effects, Phys. Rev. D65 (2002) 034001, [hep-ph/0108084].
[175] N. Brambilla et al., Heavy quarkonium physics, hep-ph/0412158.
[176] A. H. Hoang, Bottom quark mass from upsilon mesons: Charm mass effects,
hep-ph/0008102.
[177] S. W. Bosch, B. O. Lange, M. Neubert, and G. Paz, Factorization and shape-function effects
in inclusive b-meson decays, Nucl. Phys. B699 (2004) 335-386, [hep-ph/0402094].
[178] C. W. Bauer and A. V. Manohar, Shape function effects in b -> xs gamma and b -> xu l nu
decays, Phys. Rev. D70 (2004) 034024, [hep-ph/0312109].
[179] I. Bizjak, A. Limosani, and T. Nozaki, Determination of the b-quark leading shape function
parameters in the shape function scheme using the belle b -> xs gamma photon energy spectrum,
hep-ex/0506057.
[180] NNPDF Collaboration, L. Del Debbio, S. Forte, J. I. Latorre, A. Piccione, and J. Rojo, The
neural network approach to parton distribution functions: The singlet case.
[181] A. Vogt, Efficient parton evolution with pegasus, hep-ph/0407089.
[182] M. Botje, Qcdnum manual, http://www.nikhef.nl/~h24/qcdnum/.
[183] W. Giele et al., The qcd/sm working group: Summary report, hep-ph/0204316.
[184] G. Curci, W. Furmanski, and R. Petronzio, Evolution of parton densities beyond leading
order: The nonsinglet case, Nucl. Phys. B175 (1980) 27.
[185] J. Abate and P. Valko, Multi-precision laplace transform inversion, International Journal for
Numerical Methods in Engineering 60 (2003) 979-993.
[186] J. Abate and P. Valko, Comparison of sequence accelerators for the gaver method of
numerical laplace transform inversion, Computers and Mathematics with Applications 48
(2004) 629-636.
[187] S. Forte and G. Ridolfi, Renormalization group approach to soft gluon resummation, Nucl.
Phys. B650 (2003) 229-270, [hep-ph/0209154].
[188] S. Forte and R. D. Ball, Universality and scaling in perturbative qcd at small x, Acta Phys.
Polon. B26 (1995) 2097-2134, [hep-ph/9512208].
[189] E. B. Zijlstra and W. L. van Neerven, Order alpha_s^2 qcd corrections to the deep inelastic proton
structure functions f2 and fl, Nucl. Phys. B383 (1992) 525-574.
[190] W. L. van Neerven and A. Vogt, Nnlo evolution of deep-inelastic structure functions: The
non-singlet case, Nucl. Phys. B568 (2000) 263-286, [hep-ph/9907472].
[191] K. G. Chetyrkin, B. A. Kniehl, and M. Steinhauser, Strong coupling constant with flavour
thresholds at four loops in the ms-bar scheme, Phys. Rev. Lett. 79 (1997) 2184-2187,
[hep-ph/9706430].
[192] M. Buza, Y. Matiounine, J. Smith, and W. L. van Neerven, Charm electroproduction viewed
in the variable-flavour number scheme versus fixed-order perturbation theory, Eur. Phys. J.
C1 (1998) 301-320, [hep-ph/9612398].
[193] J. Amundson, C. Schmidt, W.-K. Tung, and X. Wang, Charm production in deep inelastic
scattering from threshold to high q^2, JHEP 10 (2000) 031, [hep-ph/0005221].
[194] R. S. Thorne and R. G. Roberts, An ordered analysis of heavy flavour production in deep
inelastic scattering, Phys. Rev. D57 (1998) 6871-6898, [hep-ph/9709442].
[195] H. Georgi and H. D. Politzer, Freedom at moderate energies: Masses in color dynamics,
Phys. Rev. D14 (1976) 1829.
[196] A. Deshpande, R. Milner, R. Venugopalan, and W. Vogelsang, Study of the fundamental
structure of matter with an electron ion collider, Ann. Rev. Nucl. Part. Sci. 55 (2005) 165,
[hep-ph/0506148].
[197] J. B. Dainton, M. Klein, P. Newman, E. Perez, and F. Willeke, Deep inelastic electron
nucleon scattering at the lhc, hep-ex/0603016.
[198] M. Osipenko et al., The proton structure function f2 with clas, hep-ex/0309052.
[199] CLAS Collaboration, M. Osipenko et al., The deuteron structure function f2d with clas,
hep-ex/0507098.
[200] J. Pumplin, A. Belyaev, J. Huston, D. Stump, and W. K. Tung, Parton distributions and the
strong coupling strength alpha_s, hep-ph/0512167.
[201] NNPDF Collaboration, L. Del Debbio, S. Forte, J. I. Latorre, A. Piccione, and J. Rojo, The
neural network approach to parton distribution functions: The nonsinglet case.
[202] A. V. Kotikov and V. G. Krivokhijine, f2 structure function and higher-twist contributions
(nonsinglet case), hep-ph/9805353.
[203] U.-K. Yang and A. Bodek, Parton distributions, d/u, and higher twist effects at high x, Phys.
Rev. Lett. 82 (1999) 2467-2470, [hep-ph/9809480].
[204] G. Sterman and W. Vogelsang, Soft-gluon resummation and pdf theory uncertainties,
hep-ph/0002132.
[205] A. D. Martin, R. G. Roberts, W. J. Stirling, and R. S. Thorne, Uncertainties of predictions
from parton distributions. i: Experimental errors, Eur. Phys. J. C28 (2003) 455-473,
[hep-ph/0211080].
[206] C. Gonzalez-Garcia, M. Maltoni, and J. Rojo, Neural network parametrization of the
atmospheric neutrino flux.
[207] J. Blumlein, H. Bottcher, and A. Guffanti, Non-singlet qcd analysis of the structure function
f2 in 3-loops, Nucl. Phys. Proc. Suppl. 135 (2004) 152-155, [hep-ph/0407089].
[208] J. Blumlein, H. Bottcher, and A. Guffanti, Non-singlet qcd analysis of f2 in 3-loops, AIP
Conf. Proc. 747 (2005) 50-53.
[209] E. Eichten, I. Hinchliffe, K. D. Lane, and C. Quigg, Super collider physics, Rev. Mod. Phys.
56 (1984) 579-707.
[210] D. W. Duke and J. F. Owens, q^2 dependent parametrizations of parton distribution functions,
Phys. Rev. D30 (1984) 49-54.
[211] P. N. Harriman, A. D. Martin, W. J. Stirling, and R. G. Roberts, Parton distributions
extracted from data on deep inelastic lepton scattering, prompt photon production and the
drell-yan process, Phys. Rev. D42 (1990) 798-810.
[212] M. Diemoz, F. Ferroni, E. Longo, and G. Martinelli, Parton densities from deep inelastic
scattering to hadronic processes at super collider energies, Z. Phys. C39 (1988) 21.
[213] J. F. Owens and W.-K. Tung, Parton distribution functions of hadrons, Ann. Rev. Nucl.
Part. Sci. 42 (1992) 291-332.
[214] J. Kwiecinski, A. D. Martin, W. J. Stirling, and R. G. Roberts, Parton distributions at small
x, Phys. Rev. D42 (1990) 3645-3659.
[215] A. D. Martin, R. G. Roberts, and W. J. Stirling, Structure function analysis and psi, jet, w,
z production: Pinning down the gluon, Phys. Rev. D37 (1988) 1161.
[216] A. D. Martin, R. G. Roberts, and W. J. Stirling, Implications of new deep inelastic scattering
data for parton distributions, Phys. Lett. B206 (1988) 327.
[217] A. D. Martin, W. J. Stirling, and R. G. Roberts, New information on parton distributions,
Phys. Rev. D47 (1993) 867-882.
[218] A. D. Martin, W. J. Stirling, and R. G. Roberts, Parton distributions of the proton, Phys.
Rev. D50 (1994) 6734-6752, [hep-ph/9406315].
[219] A. D. Martin, R. G. Roberts, and W. J. Stirling, Parton distributions: A study of the new
hera data, alphas, the gluon and p anti-p jet production, Phys. Lett. B387 (1996) 419-426,
[hep-ph/9606345].
[220] A. D. Martin, R. G. Roberts, W. J. Stirling, and R. S. Thorne, Parton distributions: A new
global analysis, Eur. Phys. J. C4 (1998) 463-496, [hep-ph/9803445].
[227] S. Alekhin, Parton distributions from deep-inelastic scattering data, Phys. Rev. D68 (2003)
014002, [hep-ph/0211096].
[228] M. Gluck, E. Reya, and A. Vogt, Radiatively generated parton distributions for high-energy
collisions, Z. Phys. C48 (1990) 471-482.
[229] M. Gluck, E. Reya, and A. Vogt, Parton distributions for high-energy collisions, Z. Phys.
C53 (1992) 127-134.
[230] M. Gluck, E. Reya, and A. Vogt, Dynamical parton distributions of the proton and small x
physics, Z. Phys. C67 (1995) 433-448.
[231] M. Gluck, E. Reya, and A. Vogt, Dynamical parton distributions revisited, Eur. Phys. J. C5
(1998) 461-470, [hep-ph/9806404].
[232] V. Barone, C. Pascaud, and F. Zomer, A new global analysis of deep inelastic scattering data,
Eur. Phys. J. C12 (2000) 243-262, [hep-ph/9907512].
[233] W. T. Giele, S. A. Keller, and D. A. Kosower, Parton distribution function uncertainties,
hep-ph/0104052.
[Figure: (top) contributions to the error function, total and from the data errors, as a function of the number of generations; (bottom) partial contributions to the error function, from the data errors and from the sum rule SR1, as a function of the relative weight of SR1, wsr_1.]