License: arXiv.org perpetual non-exclusive license
arXiv:2312.00439v1 [stat.ME] 01 Dec 2023
\floatsetup

[figure]style=plain, subcapbesideposition=top [subfigure]position=top,justification=raggedright \floatsetup[table]style=plaintop [subfigure]position=top,justification=raggedright, singlelinecheck=off

Modeling the Ratio of Correlated Biomarkers Using Copula Regression

Moritz Berger Department of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Venusberg Campus 1, 53127 Bonn Nadja Klein Chair of Uncertainty Quantification and Statistical Learning, Research Center for Trustworthy Data Science and Security (UA-Ruhr) and Department of Statistics (Technische Universität Dortmund), Joseph-von-Fraunhofer Straße 25, 44227 Dortmund Michael Wagner German Center for Neurodegenerative Diseases, Venusberg Campus 1, 53127 Bonn Matthias Schmid Department of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Venusberg Campus 1, 53127 Bonn German Center for Neurodegenerative Diseases, Venusberg Campus 1, 53127 Bonn
Abstract

Modeling the ratio of two dependent components as a function of covariates is a frequently pursued objective in observational research. Despite the high relevance of this topic in medical studies, where biomarker ratios are often used as surrogate endpoints for specific diseases, existing models are based on oversimplified assumptions, assuming e.g. independence or strictly positive associations between the components. In this paper, we close this gap in the literature and propose a regression model where the marginal distributions of the two components are linked by Frank copula. A key feature of our model is that it allows for both positive and negative correlations between the components, with one of the model parameters being directly interpretable in terms of Kendall’s rank correlation coefficient. We study our method theoretically, evaluate finite sample properties in a simulation study and demonstrate its efficacy in an application to diagnosis of Alzheimer’s disease via ratios of amyloid-beta and total tau protein biomarkers.

Keywords: Distributional regression, Frank copula, Gamma distribution, Negative dependence, Ratio outcome

1 Introduction

A common objective in observational research is to analyze the ratio of two (possibly dependent) components U,V+𝑈𝑉superscriptU,V\in\mathbb{R}^{+}italic_U , italic_V ∈ blackboard_R start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT (Long et al., 2016). Typical examples are, among others, (i) the LDL/HDL cholesterol ratio in cardiovascular research (Natarajan et al., 2003), defined as the ratio of the low-density lipoprotein and the high-density lipoprotein concentrations in plasma or serum, (ii) the CD4/CD8 ratio in HIV research (Caby et al., 2016), which measures the ratio of T helper cells to cytotoxic T cells in the human immune system, (iii) the testosterone over epitestosterone (T/E) ratio in antidoping research (Sottas et al., 2007), and (iv) the GEFC/REFC ratio in ophthalmic research, corresponding to the green and red emission components in fundus autofluorescence imaging (Wintergerst et al., 2022). In many of such studies, biomarker ratios are used as early indicators or even as surrogate endpoints for a specific disease. In these cases, the focus is not only on the characterization of the marginal ratio distribution, but also on modeling this distribution as a function of a set of covariates X=(X1,,Xp)𝑋superscriptsubscript𝑋1subscript𝑋𝑝topX=(X_{1},\ldots,X_{p})^{\top}italic_X = ( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT. Usually, this amounts to specifying a regression model that includes the ratio as outcome variable (Berger et al., 2019).

When setting up a model relating the ratio outcome R=U/V𝑅𝑈𝑉R=U/Vitalic_R = italic_U / italic_V to the covariates X𝑋Xitalic_X, a common assumption is that both components follow either log-normal or gamma distributions, thereby accounting for the nonnegativity of the component values and the skewness of their distributions (Mitchell et al., 2015; Van Domelen et al., 2021). In the former case it is easily derived that the ratio is again log-normally distributed. The latter case, which will be dealt with in this paper, is considerably less straightforward but is often preferred in practice due to its increased efficiency (Firth, 1988; Wiens, 1999; Berger et al., 2019).

In the special case where U𝑈Uitalic_U and V𝑉Vitalic_V are independently gamma distributed, the ratio R=U/V𝑅𝑈𝑉R=U/Vitalic_R = italic_U / italic_V follows a generalized beta distribution of the second kind, in the following abbreviated by GB2 (Kleiber and Kotz, 2003). A regression approach for the GB2 distribution has been proposed by Tulupyev et al. (2013), who studied the determinants of alcohol abuse in HIV-positive persons using the framework of vector generalized additive models (VGAMs; Yee, 2015). More recent examples include Safari-Katesari and Zaroudi (2020), Bourguignon et al. (2021), Medeiros et al. (2023) and dos Santos et al. (2023). The case of correlated gamma distributed components has earlier been studied by Lee et al. (1979) and Tubbs (1986). Based on Kibble’s bivariate gamma distribution for (U,V)𝑈𝑉(U,V)( italic_U , italic_V ) (Kibble, 1941), Berger et al. (2019) developed the extended GB2 (eGB2) model for the ratio of two positively correlated gamma distributed components. Their model is characterized by three parameters, of which one is directly interpretable in terms of the Pearson correlation coefficient between the two components. Conceptually, the extended GB2 model can be seen as a distributional regression model embedded in the framework of generalized additive models for location, scale and shape (GAMLSS; Rigby and Stasinopoulos, 2005; Berger and Schmid, 2020).

Despite its major importance in biostatistics, no regression modeling strategy exists (to the best of our knowledge) for ratio outcomes with two negatively correlated gamma distributed components. Negatively correlated measurements are encountered in numerous applications, for example in dementia research, where ratios of cerebrospinal fluid (CSF) biomarkers are used for the early diagnosis of Alzheimer’s disease (Koyama et al., 2012). Importantly, measurements of the widely employed amyloid-β𝛽\betaitalic_β 42 protein and total tau protein biomarkers are known to exhibit a negative correlation (Tapiola et al., 2009). In recent publications, the Gaussian regression model has been used for modeling ratios of CSF biomarkers (e.g., Xu et al., 2020). Clearly, this model neither accounts for the characteristics of the bivariate distribution of (U,V)𝑈𝑉(U,V)( italic_U , italic_V ) nor for the skewness in the distribution of the ratio outcome R𝑅Ritalic_R.

Motivated by these problems, and to address the current shortcomings in modeling ratio outcomes with negatively correlated gamma distributed components, we propose a regression model where the joint bivariate distribution of the two gamma distributed components is defined by Frank copula (Genest, 1987). By this choice, the model flexibly accounts for either negative or positive associations between the two components (measured by Spearman’s or Kendall’s rank correlation coefficient). It also allows for modeling different characteristics of the two marginal distributions, including possibly unequal rate and shape parameters. By relating the covariates X𝑋Xitalic_X to the parameters of the marginals, as well as to the association parameter defined by the copula, our model further allows to derive the conditional probability density function of R|Xconditional𝑅𝑋R\,|\,Xitalic_R | italic_X as a function of covariates. This, in turn, allows for the analysis of conditional distributional parameters (like the expected value, median or quantiles), including valid inferential conclusions for these quantities.

We apply the new approach to data from a multi-center observational cohort study conducted by the German Dementia Network (DCN; Kornhuber et al., 2009). Study participants were diagnosed with either mild cognitive impairment (MCI), Alzheimer’s Disease (AD), or other dementia. The study aims at determining the diagnostic and prognostic power of clinical, laboratory and imaging methods. This task is considered to be a major challenge, as the period from the first clinical symptoms of AD to disease onset might take years to decades (Sperling et al., 2013). Consequently, as biomarker ratios like the amyloid-β𝛽\betaitalic_β 42/total tau ratio are considered to be strong predictors of AD progression, it is of high interest to relate these measurements to patient characteristics like age, sex and educational level (Jack et al., 2015). As will be demonstrated in Section 4, the proposed copula regression model can be suitably applied to address this problem, resulting in meaningful descriptive and inferential findings regarding the associations between the biomarker ratio and individual patient characteristics.

The rest of the paper is organized as follows: Section 2 derives the distributional copula regression model, states novel theoretical results with implications for the interpretation of covariate effects, and presents estimation, prediction and inference. The efficacy of our approach is demonstrated empirically in a simulation study in Section 3 and in our main application to AD progression in Section 4. The main findings of the paper are discussed in Section 5.

2 Methods

Section 2.1 starts by deriving the distribution of the ratio of two gamma distributed components with dependence induced by a Frank copula. Details on model specification and fitting are given in Section 2.2. Section 2.3 covers the prediction of distributional parameters and inference.

2.1 Distributional concept

Let U𝑈Uitalic_U and V𝑉Vitalic_V be two gamma distributed random variables with probability density functions (PDF)

fU(u)=λUδUΓ(δU)uδU1exp(λUu)andfV(v)=λVδVΓ(δV)vδV1exp(λVv),formulae-sequencesubscript𝑓𝑈𝑢superscriptsubscript𝜆𝑈subscript𝛿𝑈Γsubscript𝛿𝑈superscript𝑢subscript𝛿𝑈1subscript𝜆𝑈𝑢andsubscript𝑓𝑉𝑣superscriptsubscript𝜆𝑉subscript𝛿𝑉Γsubscript𝛿𝑉superscript𝑣subscript𝛿𝑉1subscript𝜆𝑉𝑣\displaystyle f_{U}(u)=\frac{\lambda_{U}^{\delta_{U}}}{\Gamma(\delta_{U})}u^{% \delta_{U}-1}\exp(-\lambda_{U}u)\ \ \ \text{and}\ \ \ f_{V}(v)=\frac{\lambda_{% V}^{\delta_{V}}}{\Gamma(\delta_{V})}v^{\delta_{V}-1}\exp(-\lambda_{V}v)\,,italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) = divide start_ARG italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG italic_u start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT roman_exp ( - italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT italic_u ) and italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v ) = divide start_ARG italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) end_ARG italic_v start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT roman_exp ( - italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT italic_v ) , (1)

where λUsubscript𝜆𝑈\lambda_{U}italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT, λVsubscript𝜆𝑉\lambda_{V}italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT >0absent0>0> 0 denote the rate parameters and δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT, δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT >0absent0>0> 0 denote the shape parameters of fUsubscript𝑓𝑈f_{U}italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT and fVsubscript𝑓𝑉f_{V}italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT, respectively. We allow for dependencies between U𝑈Uitalic_U and V𝑉Vitalic_V, and assume that their joint distribution can be described by Frank copula with copula function Cθsubscript𝐶𝜃C_{\theta}italic_C start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT. By Sklar’s theorem, the joint distribution of (U,V)𝑈𝑉(U,V)( italic_U , italic_V ) is thus given by

FU,V(u,v)subscript𝐹𝑈𝑉𝑢𝑣\displaystyle F_{U,V}(u,v)italic_F start_POSTSUBSCRIPT italic_U , italic_V end_POSTSUBSCRIPT ( italic_u , italic_v ) =Cθ(FU(u),FV(v))absentsubscript𝐶𝜃subscript𝐹𝑈𝑢subscript𝐹𝑉𝑣\displaystyle=C_{\theta}\left(F_{U}(u),F_{V}(v)\right)= italic_C start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) , italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v ) )
=1θlog(1+(exp(θFU(u))1)(exp(θFV(v))1)exp(θ)1),absent1𝜃1𝜃subscript𝐹𝑈𝑢1𝜃subscript𝐹𝑉𝑣1𝜃1\displaystyle=-\frac{1}{\theta}\,\log\left(1+\frac{(\exp(-\theta F_{U}(u))-1)(% \exp(-\theta F_{V}(v))-1)}{\exp(-\theta)-1}\right)\,,= - divide start_ARG 1 end_ARG start_ARG italic_θ end_ARG roman_log ( 1 + divide start_ARG ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) ) - 1 ) ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v ) ) - 1 ) end_ARG start_ARG roman_exp ( - italic_θ ) - 1 end_ARG ) , (2)

where FU,Vsubscript𝐹𝑈𝑉F_{U,V}italic_F start_POSTSUBSCRIPT italic_U , italic_V end_POSTSUBSCRIPT, FUsubscript𝐹𝑈F_{U}italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT and FVsubscript𝐹𝑉F_{V}italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT denote the joint bivariate and marginal cumulative distribution functions (CDFs) of U𝑈Uitalic_U and V𝑉Vitalic_V, respectively. The parameter θ{0}𝜃0\theta\in\mathbb{R}\setminus\{0\}italic_θ ∈ blackboard_R ∖ { 0 } determines the association between U𝑈Uitalic_U and V𝑉Vitalic_V. It can be shown that Kendall’s rank correlation coefficient τ[1,1]𝜏11\tau\in[-1,1]italic_τ ∈ [ - 1 , 1 ] is a monotone increasing function of θ𝜃\thetaitalic_θ, given by

τ(θ)=1+4θ(1θ0θtet1𝑑t1)𝜏𝜃14𝜃1𝜃superscriptsubscript0𝜃𝑡superscript𝑒𝑡1differential-d𝑡1\tau(\theta)=1\,+\,\frac{4}{\theta}\left(\frac{1}{\theta}\int_{0}^{\theta}% \frac{t}{e^{t}-1}\,dt-1\right)italic_τ ( italic_θ ) = 1 + divide start_ARG 4 end_ARG start_ARG italic_θ end_ARG ( divide start_ARG 1 end_ARG start_ARG italic_θ end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_θ end_POSTSUPERSCRIPT divide start_ARG italic_t end_ARG start_ARG italic_e start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT - 1 end_ARG italic_d italic_t - 1 ) (3)

(Harry, 2014). As a consequence, the CDF in (2.1) allows for (possibly highly) positive or negative correlations between the two components U𝑈Uitalic_U and V𝑉Vitalic_V. The joint PDF of (U,V)𝑈𝑉(U,V)( italic_U , italic_V ) is given by

fU,V(u,v)subscript𝑓𝑈𝑉𝑢𝑣\displaystyle f_{U,V}(u,v)italic_f start_POSTSUBSCRIPT italic_U , italic_V end_POSTSUBSCRIPT ( italic_u , italic_v ) =2uvFU,V(u,v)=cθ(FU(u),FV(v))fU(u)fV(v)absentsuperscript2𝑢𝑣subscript𝐹𝑈𝑉𝑢𝑣subscript𝑐𝜃subscript𝐹𝑈𝑢subscript𝐹𝑉𝑣subscript𝑓𝑈𝑢subscript𝑓𝑉𝑣\displaystyle=\frac{\partial^{2}}{\partial u\partial v}{F_{U,V}(u,v)}=c_{% \theta}\big{(}F_{U}(u),F_{V}(v)\big{)}\,f_{U}(u)\,f_{V}(v)= divide start_ARG ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ italic_u ∂ italic_v end_ARG italic_F start_POSTSUBSCRIPT italic_U , italic_V end_POSTSUBSCRIPT ( italic_u , italic_v ) = italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) , italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v ) ) italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v )
=θexp(θFU(u))exp(θFV(v))(exp(θ)1)fU(u)fV(v)((exp(θ)1)+(exp(θFU(u))1)(exp(θFV(v))1))2,absent𝜃𝜃subscript𝐹𝑈𝑢𝜃subscript𝐹𝑉𝑣𝜃1subscript𝑓𝑈𝑢subscript𝑓𝑉𝑣superscript𝜃1𝜃subscript𝐹𝑈𝑢1𝜃subscript𝐹𝑉𝑣12\displaystyle=\frac{-\theta\,\exp(-\theta F_{U}(u))\,\exp(-\theta F_{V}(v))\,(% \exp(-\theta)-1)\,f_{U}(u)\,f_{V}(v)}{((\exp(-\theta)-1)+(\exp(-\theta F_{U}(u% ))-1)\,(\exp(-\theta F_{V}(v))-1))^{2}}\,,= divide start_ARG - italic_θ roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) ) roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v ) ) ( roman_exp ( - italic_θ ) - 1 ) italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v ) end_ARG start_ARG ( ( roman_exp ( - italic_θ ) - 1 ) + ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u ) ) - 1 ) ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v ) ) - 1 ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , (4)

where cθ(a,b):=2/(ab)Cθ(a,b)assignsubscript𝑐𝜃𝑎𝑏superscript2𝑎𝑏subscript𝐶𝜃𝑎𝑏c_{\theta}(a,b):=\partial^{2}/(\partial a\partial b)\,C_{\theta}(a,b)italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_a , italic_b ) := ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( ∂ italic_a ∂ italic_b ) italic_C start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_a , italic_b ) is the PDF of Frank copula.

We derive the resulting PDF of the ratio R𝑅Ritalic_R, an interpretable representation thereof and the CDF in the following three propositions.

Proposition 1.

Let the PDF of (U,V)𝑈𝑉(U,V)( italic_U , italic_V ) be defined by (2.1). Then the PDF of the ratio R:=U/Vassign𝑅𝑈𝑉R:=U/Vitalic_R := italic_U / italic_V is given by

fR(r)=01subscript𝑓𝑅𝑟superscriptsubscript01\displaystyle f_{R}(r)=\int_{0}^{1}italic_f start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_r ) = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT |FV1(s)|cθ(FU(rFV1(s)),s)fU(rFV1(s))dssuperscriptsubscript𝐹𝑉1𝑠subscript𝑐𝜃subscript𝐹𝑈𝑟superscriptsubscript𝐹𝑉1𝑠𝑠subscript𝑓𝑈𝑟superscriptsubscript𝐹𝑉1𝑠𝑑𝑠\displaystyle{\left|F_{V}^{-1}(s)\right|\,c_{\theta}\left(F_{U}(r\,F_{V}^{-1}(% s)),s\right)\,f_{U}(rF_{V}^{-1}(s))}\,ds| italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) | italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) , italic_s ) italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) italic_d italic_s
=01absentsuperscriptsubscript01\displaystyle=\int_{0}^{1}= ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT exp(θFU(rFV1(s)))exp(θs)(θ)(exp(θ)1)((exp(θ)1)+(exp(θFU(rFV1(s)))1)(exp(θs)1))2𝜃subscript𝐹𝑈𝑟superscriptsubscript𝐹𝑉1𝑠𝜃𝑠𝜃𝜃1superscript𝜃1𝜃subscript𝐹𝑈𝑟superscriptsubscript𝐹𝑉1𝑠1𝜃𝑠12\displaystyle\frac{\exp(-\theta\,F_{U}(r\,F_{V}^{-1}(s)))\,\exp(-\theta s)(-% \theta)(\exp(-\theta)-1)}{((\exp(-\theta)-1)+(\exp(-\theta\,F_{U}(r\,F_{V}^{-1% }(s)))-1)(\exp(-\theta s)-1))^{2}}divide start_ARG roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ) roman_exp ( - italic_θ italic_s ) ( - italic_θ ) ( roman_exp ( - italic_θ ) - 1 ) end_ARG start_ARG ( ( roman_exp ( - italic_θ ) - 1 ) + ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ) - 1 ) ( roman_exp ( - italic_θ italic_s ) - 1 ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG
×FV1(s)fU(rFV1(s))ds,absentsuperscriptsubscript𝐹𝑉1𝑠subscript𝑓𝑈𝑟superscriptsubscript𝐹𝑉1𝑠𝑑𝑠\displaystyle\times F_{V}^{-1}(s)\,f_{U}(r\,F_{V}^{-1}(s))\,ds\,,× italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) italic_d italic_s , (5)

where |||\cdot|| ⋅ | denotes the absolute value function.

Proof.

Proposition 1 can be derived from Proposition 1 of Ly et al. (2019), who provided analytical results for the PDF of the quotient U/V𝑈𝑉U/Vitalic_U / italic_V of two random variables whose dependence structure can be described by an absolutely continuous copula. ∎

Refer to caption
Figure 1: Examples of the PDF of R=U/V𝑅𝑈𝑉R=U/Vitalic_R = italic_U / italic_V derived in Proposition 1 for parameters λU,λV{1,2}subscript𝜆𝑈subscript𝜆𝑉12\lambda_{U},\lambda_{V}\in\{1,2\}italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ∈ { 1 , 2 }, δU,δV{2,3}subscript𝛿𝑈subscript𝛿𝑉23\delta_{U},\delta_{V}\in\{2,3\}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ∈ { 2 , 3 } and θ{10,1,10}𝜃10110\theta\in\{-10,1,10\}italic_θ ∈ { - 10 , 1 , 10 } (corresponding to rank correlation coefficients τ{0.67,0.11,0.67}𝜏0.670.110.67\tau\in\{-0.67,0.11,0.67\}italic_τ ∈ { - 0.67 , 0.11 , 0.67 }). In each panel the three lines refer to θ=10𝜃10\theta=-10italic_θ = - 10 (dotted), θ=1𝜃1\theta=1italic_θ = 1 (solid) and θ=10𝜃10\theta=10italic_θ = 10 (dashed). Vertical lines refer to the median values of R𝑅Ritalic_R.
Remark 1.

Figure 1 visualizes the PDF of R𝑅Ritalic_R for different values of the rate, shape and association parameters. The figure illustrates that the form of the PDF is strongly related to the ratio of marginal means 𝔼(U)/𝔼(V)=λVδU/λUδV𝔼𝑈𝔼𝑉subscript𝜆𝑉subscript𝛿𝑈subscript𝜆𝑈subscript𝛿𝑉\mathbb{E}(U)/\mathbb{E}(V)=\lambda_{V}\delta_{U}/\lambda_{U}\delta_{V}blackboard_E ( italic_U ) / blackboard_E ( italic_V ) = italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT / italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT, which is highest in the lower left panel (𝔼(U)/𝔼(V)=3𝔼𝑈𝔼𝑉3\mathbb{E}(U)/\mathbb{E}(V)=3blackboard_E ( italic_U ) / blackboard_E ( italic_V ) = 3) where the dispersion is very large, and lowest in the upper right panel (𝔼(U)/𝔼(V)=1/3𝔼𝑈𝔼𝑉13\mathbb{E}(U)/\mathbb{E}(V)=1/3blackboard_E ( italic_U ) / blackboard_E ( italic_V ) = 1 / 3) where the PDFs are heavily right-skewed. Figure 1 also describes the association between the PDF and the Kendall’s rank correlation coefficient. In each of the nine cases the mode of the PDF increases as θ𝜃\thetaitalic_θ increases. Of note, the median of R𝑅Ritalic_R does not vary with θ𝜃\thetaitalic_θ (being equal for the three PDFs in each panel).

Proposition 2.

Let the PDF of (U,V)𝑈𝑉(U,V)( italic_U , italic_V ) be defined by (2.1). Then the PDF of the random variable R𝑅Ritalic_R in (1) can be re-written as

fR(r)=01subscript𝑓𝑅𝑟superscriptsubscript01\displaystyle f_{R}(r)=\int_{0}^{1}italic_f start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_r ) = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT cθ(1Γ(δU)γ(δU,rΛγ1(δV,Γ(δV)s)),s)subscript𝑐𝜃1Γsubscript𝛿𝑈𝛾subscript𝛿𝑈𝑟Λsuperscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠𝑠\displaystyle{c_{\theta}\left(\frac{1}{\Gamma(\delta_{U})}\,\gamma\left(\delta% _{U},r\,\Lambda\,\gamma^{-1}\left(\delta_{V},\Gamma(\delta_{V})s\right)\right)% ,s\right)}italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG italic_γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_r roman_Λ italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) ) , italic_s )
×ΛδUr(δU1)Γ(δU)(γ1(δV,Γ(δV)s))δUexp(rΛγ1(δV,Γ(δV)s))ds,absentsuperscriptΛsubscript𝛿𝑈superscript𝑟subscript𝛿𝑈1Γsubscript𝛿𝑈superscriptsuperscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠subscript𝛿𝑈𝑟Λsuperscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠𝑑𝑠\displaystyle\times\,\frac{\Lambda^{\delta_{U}}\,r^{(\delta_{U}-1)}}{\Gamma(% \delta_{U})}\left(\gamma^{-1}\left(\delta_{V},\Gamma(\delta_{V})s\right)\right% )^{\delta_{U}}\,\exp\left(-r\,\Lambda\,\gamma^{-1}\left(\delta_{V},\Gamma(% \delta_{V})s\right)\right)\,ds\,,× divide start_ARG roman_Λ start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_r start_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT - 1 ) end_POSTSUPERSCRIPT end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG ( italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) ) start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT roman_exp ( - italic_r roman_Λ italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) ) italic_d italic_s , (6)

where Λ:=λU/λVassignnormal-Λsubscript𝜆𝑈subscript𝜆𝑉\Lambda:=\lambda_{U}/\lambda_{V}roman_Λ := italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT / italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT denotes the ratio of the two rate parameters and γ(,)𝛾normal-⋅normal-⋅\gamma(\cdot,\cdot)italic_γ ( ⋅ , ⋅ ) is the lower incomplete gamma function.

Proof. The proof of Proposition 2 is given in Appendix A.

Remark 2.

By Proposition 2 the PDF of R𝑅Ritalic_R can be written as a function of the ratio of the rate parameters Λ=λU/λVΛsubscript𝜆𝑈subscript𝜆𝑉\Lambda=\lambda_{U}/\lambda_{V}roman_Λ = italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT / italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT. This facilitates the interpretation of the proposed regression model, as it results in a sparser representation of the covariate effects. In particular, covariate effects can be investigated using one-dimensional hypothesis tests and p-values, see our application in Section 4. Figure 2 illustrates how the median of R𝑅Ritalic_R is related to ΛΛ\Lambdaroman_Λ. It is seen that the median decreases with ΛΛ\Lambdaroman_Λ independent of the values of the shape parameters and the association parameter.

Proposition 3.

Let the CDF of (U,V)𝑈𝑉(U,V)( italic_U , italic_V ) be defined by (2.1). Then the CDF of the random variable R𝑅Ritalic_R is given by

FR(r)=01AfU(rFV1(s))rfV(FV1(s))(exp(θs)1)+(A1)exp(θs)exp(θ)+(A1)exp(θs)A𝑑s,subscript𝐹𝑅𝑟superscriptsubscript01𝐴subscript𝑓𝑈𝑟superscriptsubscript𝐹𝑉1𝑠𝑟subscript𝑓𝑉superscriptsubscript𝐹𝑉1𝑠𝜃𝑠1𝐴1𝜃𝑠𝜃𝐴1𝜃𝑠𝐴differential-d𝑠\displaystyle F_{R}(r)=\int_{0}^{1}\,{\frac{A\,f_{U}(r\,F_{V}^{-1}(s))\,\frac{% r}{f_{V}(F_{V}^{-1}(s))}\,(\exp(-\theta s)-1)+(A-1)\exp(-\theta s)}{\exp(-% \theta)+(A-1)\exp(-\theta s)-A}}\,ds\,,italic_F start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_r ) = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT divide start_ARG italic_A italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) divide start_ARG italic_r end_ARG start_ARG italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) end_ARG ( roman_exp ( - italic_θ italic_s ) - 1 ) + ( italic_A - 1 ) roman_exp ( - italic_θ italic_s ) end_ARG start_ARG roman_exp ( - italic_θ ) + ( italic_A - 1 ) roman_exp ( - italic_θ italic_s ) - italic_A end_ARG italic_d italic_s , (7)

where A=exp(θFU(rFV1(s)))𝐴𝜃subscript𝐹𝑈𝑟superscriptsubscript𝐹𝑉1𝑠A=\exp\left(-\theta\,F_{U}(rF_{V}^{-1}(s))\right)italic_A = roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ).

Proof.

By Equation (9) of Ly et al. (2019), the CDF of R𝑅Ritalic_R is derived as

FR(r)subscript𝐹𝑅𝑟\displaystyle F_{R}(r)\,italic_F start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_r ) =FV(0)=0+01sgn(FV1(s))=1sCθ(FU(rFV1(s)),s)𝑑sabsentsubscriptsubscript𝐹𝑉0absent0superscriptsubscript01subscriptsgnsuperscriptsubscript𝐹𝑉1𝑠absent1𝑠subscript𝐶𝜃subscript𝐹𝑈𝑟superscriptsubscript𝐹𝑉1𝑠𝑠differential-d𝑠\displaystyle\,=\underbrace{F_{V}(0)}_{=0}\,+\,\int_{0}^{1}{\underbrace{\text{% sgn}(F_{V}^{-1}(s))}_{=1}\,\frac{\partial}{\partial s}C_{\theta}\left(F_{U}(r% \,F_{V}^{-1}(s)),s\right)}\,ds= under⏟ start_ARG italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( 0 ) end_ARG start_POSTSUBSCRIPT = 0 end_POSTSUBSCRIPT + ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT under⏟ start_ARG sgn ( italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) end_ARG start_POSTSUBSCRIPT = 1 end_POSTSUBSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ italic_s end_ARG italic_C start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) , italic_s ) italic_d italic_s
=1θ01slog(1+(exp(θFU(rFV1(s)))1)(exp(θs)1)exp(θ)1)𝑑sabsent1𝜃superscriptsubscript01𝑠1𝜃subscript𝐹𝑈𝑟superscriptsubscript𝐹𝑉1𝑠1𝜃𝑠1𝜃1differential-d𝑠\displaystyle\,=-\frac{1}{\theta}\,\int_{0}^{1}{\frac{\partial}{\partial s}% \log\left(1+\frac{\big{(}\exp(-\theta\,F_{U}(rF_{V}^{-1}(s)))-1\big{)}\,(\exp(% -\theta s)-1)}{\exp(-\theta)-1}\right)}\,ds= - divide start_ARG 1 end_ARG start_ARG italic_θ end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ italic_s end_ARG roman_log ( 1 + divide start_ARG ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ) - 1 ) ( roman_exp ( - italic_θ italic_s ) - 1 ) end_ARG start_ARG roman_exp ( - italic_θ ) - 1 end_ARG ) italic_d italic_s
=01(exp(θFU(rFV1(s)))fU(rFV1(s))rfV(FV1(s))(exp(θs)1)exp(θ)1+(exp(θFU(rFV1(s)))1)(exp(θs)1)\displaystyle\,=\int_{0}^{1}\left({\frac{\exp(-\theta\,F_{U}(rF_{V}^{-1}(s)))% \,f_{U}(r\,F_{V}^{-1}(s))\,\frac{r}{f_{V}(F_{V}^{-1}(s))}\,(\exp(-\theta s)-1)% }{\exp(-\theta)-1+\big{(}\exp(-\theta\,F_{U}(rF_{V}^{-1}(s)))-1\big{)}\,(\exp(% -\theta s)-1)}}\right.= ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( divide start_ARG roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ) italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) divide start_ARG italic_r end_ARG start_ARG italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) end_ARG ( roman_exp ( - italic_θ italic_s ) - 1 ) end_ARG start_ARG roman_exp ( - italic_θ ) - 1 + ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ) - 1 ) ( roman_exp ( - italic_θ italic_s ) - 1 ) end_ARG
+(exp(θFU(rFV1(s)))1)exp(θs)exp(θ)1+(exp(θFU(rFV1(s)))1)(exp(θs)1))ds,\displaystyle\hskip 24.18501pt\left.+\frac{\big{(}\exp(-\theta\,F_{U}(rF_{V}^{% -1}(s)))-1\big{)}\,\exp(-\theta s)}{\exp(-\theta)-1+\big{(}\exp(-\theta\,F_{U}% (rF_{V}^{-1}(s)))-1\big{)}\,(\exp(-\theta s)-1)}\right)ds\,,+ divide start_ARG ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ) - 1 ) roman_exp ( - italic_θ italic_s ) end_ARG start_ARG roman_exp ( - italic_θ ) - 1 + ( roman_exp ( - italic_θ italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ) ) ) - 1 ) ( roman_exp ( - italic_θ italic_s ) - 1 ) end_ARG ) italic_d italic_s ,

where sgn()sgn\text{sgn}(\cdot)sgn ( ⋅ ) is the sign function. Rearrangement of the last equation gives (7). ∎

Refer to caption
Figure 2: Median of R=U/V𝑅𝑈𝑉R=U/Vitalic_R = italic_U / italic_V for parameters Λ[0.1,4]Λ0.14\Lambda\in[0.1,4]roman_Λ ∈ [ 0.1 , 4 ] and δU,δV{2,3}subscript𝛿𝑈subscript𝛿𝑉23\delta_{U},\delta_{V}\in\{2,3\}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ∈ { 2 , 3 }, as calculated from the formula in Proposition 2. Note that the median of R𝑅Ritalic_R does not vary with θ𝜃\thetaitalic_θ.

2.2 Regression specification and estimation

To model the entire distribution of R𝑅Ritalic_R as a function of covariates X=(X1,,Xp)𝑋superscriptsubscript𝑋1subscript𝑋𝑝topX=(X_{1},\ldots,X_{p})^{\top}italic_X = ( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, we propose to relate both the logarithmic rate parameters λUsubscript𝜆𝑈\lambda_{U}italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT and λVsubscript𝜆𝑉\lambda_{V}italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT and the association parameter θ𝜃\thetaitalic_θ to predictor functions of the form

log(λU|X)conditionalsubscript𝜆𝑈𝑋\displaystyle\log(\lambda_{U}|X)roman_log ( italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT | italic_X ) =ηU=βU0+βU1X1++βUpXp,absentsubscript𝜂𝑈subscript𝛽𝑈0subscript𝛽𝑈1subscript𝑋1subscript𝛽𝑈𝑝subscript𝑋𝑝\displaystyle=\eta_{U}=\beta_{U0}+\beta_{U1}X_{1}+\ldots+\beta_{Up}X_{p}\,,= italic_η start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = italic_β start_POSTSUBSCRIPT italic_U 0 end_POSTSUBSCRIPT + italic_β start_POSTSUBSCRIPT italic_U 1 end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + … + italic_β start_POSTSUBSCRIPT italic_U italic_p end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , (8)
log(λV|X)conditionalsubscript𝜆𝑉𝑋\displaystyle\log(\lambda_{V}|X)roman_log ( italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT | italic_X ) =ηV=βV0+βV1X1++βVpXpandformulae-sequenceabsentsubscript𝜂𝑉subscript𝛽𝑉0subscript𝛽𝑉1subscript𝑋1subscript𝛽𝑉𝑝subscript𝑋𝑝and\displaystyle=\eta_{V}=\beta_{V0}+\beta_{V1}X_{1}+\ldots+\beta_{Vp}X_{p}\quad% \text{and}= italic_η start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT = italic_β start_POSTSUBSCRIPT italic_V 0 end_POSTSUBSCRIPT + italic_β start_POSTSUBSCRIPT italic_V 1 end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + … + italic_β start_POSTSUBSCRIPT italic_V italic_p end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT and (9)
θ|Xconditional𝜃𝑋\displaystyle\theta|Xitalic_θ | italic_X =ηθ=βθ0+βθ1X1++βθpXp,absentsubscript𝜂𝜃subscript𝛽𝜃0subscript𝛽𝜃1subscript𝑋1subscript𝛽𝜃𝑝subscript𝑋𝑝\displaystyle=\eta_{\theta}=\beta_{\theta 0}+\beta_{\theta 1}X_{1}+\ldots+% \beta_{\theta p}X_{p}\,,= italic_η start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT = italic_β start_POSTSUBSCRIPT italic_θ 0 end_POSTSUBSCRIPT + italic_β start_POSTSUBSCRIPT italic_θ 1 end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + … + italic_β start_POSTSUBSCRIPT italic_θ italic_p end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , (10)

where 𝜷U=(βU0,,βUp)subscript𝜷𝑈superscriptsubscript𝛽𝑈0subscript𝛽𝑈𝑝top\boldsymbol{\beta}_{U}=(\beta_{U0},\ldots,\beta_{Up})^{\top}bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = ( italic_β start_POSTSUBSCRIPT italic_U 0 end_POSTSUBSCRIPT , … , italic_β start_POSTSUBSCRIPT italic_U italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, 𝜷V=(βV0,,βVp)subscript𝜷𝑉superscriptsubscript𝛽𝑉0subscript𝛽𝑉𝑝top\boldsymbol{\beta}_{V}=(\beta_{V0},\ldots,\beta_{Vp})^{\top}bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT = ( italic_β start_POSTSUBSCRIPT italic_V 0 end_POSTSUBSCRIPT , … , italic_β start_POSTSUBSCRIPT italic_V italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and 𝜷θ=(βθ0,,βθp)subscript𝜷𝜃superscriptsubscript𝛽𝜃0subscript𝛽𝜃𝑝top\boldsymbol{\beta}_{\theta}=(\beta_{\theta 0},\ldots,\beta_{\theta p})^{\top}bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT = ( italic_β start_POSTSUBSCRIPT italic_θ 0 end_POSTSUBSCRIPT , … , italic_β start_POSTSUBSCRIPT italic_θ italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT are sets of real-valued coefficients. Analogous to classical gamma regression (Chapter 5.3 of Fahrmeir et al., 2022), the use of the logarithmic transformation in (8) and (9) ensures positivity of the rate parameters. Since θ{0}𝜃0\theta\in\mathbb{R}\setminus\{0\}italic_θ ∈ blackboard_R ∖ { 0 }, no transformation is needed for the association parameter. As a result of (8) and (9) it holds that log(Λ|X)=ηUηVconditionalΛ𝑋subscript𝜂𝑈subscript𝜂𝑉\log(\Lambda|X)=\eta_{U}-\eta_{V}roman_log ( roman_Λ | italic_X ) = italic_η start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT - italic_η start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT.

Remark 3.

In principle, our approach allows to make use of the full flexibility of GAMLSS by relating all distributional parameters (including the shape parameters δU,δVsubscript𝛿𝑈subscript𝛿𝑉\delta_{U},\delta_{V}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT) to the covariates and by including nonlinear effects in the predictor functions. However, in our application we found that the specification in (8) – (10) provides a sufficient fit, thereby meeting a compromise between model fit and model complexity. Furthermore, it greatly simplifies the interpretability of the results (as we will further elaborate in Section 4). Based on these considerations, our model assumes that the shape parameters δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT and δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT do not depend on X𝑋Xitalic_X, but can be treated as nuisance parameters.

Definition 1.

The regression model for the ratio R=U/V𝑅𝑈𝑉R=U/Vitalic_R = italic_U / italic_V with the distribution from Proposition 3 and with covariate-dependent parameters as specified in (8)–(10) will be termed Frank copula with Gamma Distributed Marginals (FCGAM) in the following. The FCGAM model imposes the constraint δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT, δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT >1absent1>1> 1 to ensure that the two marginals both exhibit a unimodal, right-skewed distribution, which is the common form of biomarker distributions in medical applications.

Corollary 1.

For a set of i.i.d. observations (u1,v1,𝐱1),,(un,vn,𝐱n)superscriptsubscript𝑢1subscript𝑣1superscriptsubscript𝐱1toptopnormal-…superscriptsubscript𝑢𝑛subscript𝑣𝑛superscriptsubscript𝐱𝑛toptop(u_{1},v_{1},\boldsymbol{x}_{1}^{\top})^{\top},\ldots,(u_{n},v_{n},\boldsymbol% {x}_{n}^{\top})^{\top}( italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , … , ( italic_u start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT with ratios r1=u1/v1,,rn=un/vnformulae-sequencesubscript𝑟1subscript𝑢1subscript𝑣1normal-…subscript𝑟𝑛subscript𝑢𝑛subscript𝑣𝑛r_{1}=u_{1}/v_{1},\ldots,r_{n}=u_{n}/v_{n}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_u start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_v start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT and model coefficients 𝛄=(𝛃U,𝛃V,𝛃θ,δU,δV)𝛄superscriptsuperscriptsubscript𝛃𝑈topsuperscriptsubscript𝛃𝑉topsuperscriptsubscript𝛃𝜃topsubscript𝛿𝑈subscript𝛿𝑉top\boldsymbol{\gamma}=\left(\boldsymbol{\beta}_{U}^{\top},\boldsymbol{\beta}_{V}% ^{\top},\boldsymbol{\beta}_{\theta}^{\top},\delta_{U},\delta_{V}\right)^{\top}bold_italic_γ = ( bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, the log-likelihood function of the FCGAM model is given by

(𝜷U,𝜷V,\displaystyle\ell(\boldsymbol{\beta}_{U},\boldsymbol{\beta}_{V},roman_ℓ ( bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , 𝜷θ,δU,δV;u1,,un,v1,,vn,𝒙1,,𝒙n)\displaystyle\boldsymbol{\beta}_{\theta},\delta_{U},\delta_{V};u_{1},\ldots,u_% {n},v_{1},\ldots,v_{n},\boldsymbol{x}_{1},\ldots,\boldsymbol{x}_{n})bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ; italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_u start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT )
=i=1n{\displaystyle=\sum_{i=1}^{n}\,\bigg{\{}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT { log(fU,V(ui,vi|𝒙i,𝜷U,𝜷V,𝜷θ,δU,δV))}\displaystyle\log\Big{(}f_{U,V}(u_{i},v_{i}|\boldsymbol{x}_{i},\boldsymbol{% \beta}_{U},\boldsymbol{\beta}_{V},\boldsymbol{\beta}_{\theta},\delta_{U},% \delta_{V})\Big{)}\bigg{\}}roman_log ( italic_f start_POSTSUBSCRIPT italic_U , italic_V end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) ) }
=i=1n{\displaystyle=\sum_{i=1}^{n}\,\bigg{\{}= ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT { log(exp(𝒙i𝜷θFU(ui;𝒙i𝜷U,δU))exp(𝒙i𝜷θFV(vi;𝒙i𝜷V,δV))\displaystyle\log\Big{(}\exp(-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{% \theta}\,F_{U}(u_{i};\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{U},\delta_{U% }))\exp(-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{\theta}\,F_{V}(v_{i};% \boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{V},\delta_{V}))roman_log ( roman_exp ( - bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) ) roman_exp ( - bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) )
×(𝒙i𝜷θ)(exp(𝒙i𝜷θ)1)fU(ui;𝒙i𝜷U,δU)fV(vi;𝒙i𝜷V,δV))\displaystyle\ \times\ (-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{\theta})% (\exp(-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{\theta})-1)f_{U}(u_{i};% \boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{U},\delta_{U})\,f_{V}(v_{i};% \boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{V},\delta_{V})\Big{)}× ( - bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ) ( roman_exp ( - bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ) - 1 ) italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) )
22\displaystyle-2\,- 2 log((exp(𝒙i𝜷θ)1)+(exp(𝒙i𝜷θFU(ui;𝒙i𝜷U,δU))1)\displaystyle\log\,\Big{(}(\exp(-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{% \theta})-1)+(\exp(-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{\theta}\,F_{U}% (u_{i};\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{U},\delta_{U}))-1)roman_log ( ( roman_exp ( - bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ) - 1 ) + ( roman_exp ( - bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) ) - 1 )
×(exp(𝒙i𝜷θFV(vi;𝒙i𝜷V,δV))1))}.\displaystyle\ \times\ (\exp(-\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{% \theta}\,F_{V}(v_{i};\boldsymbol{x}_{i}^{\top}\boldsymbol{\beta}_{V},\delta_{V% }))-1)\Big{)}\bigg{\}}\,.× ( roman_exp ( - bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) ) - 1 ) ) } . (11)
Corollary 2.

Under the usual regularity assumptions, the estimator

𝜸^=(𝜷^U,\displaystyle\hat{\boldsymbol{\gamma}}=(\hat{\boldsymbol{\beta}}^{\top}_{U},over^ start_ARG bold_italic_γ end_ARG = ( over^ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , 𝜷^V,𝜷^θ,δ^U,δ^V):=\displaystyle\hat{\boldsymbol{\beta}}^{\top}_{V},\hat{\boldsymbol{\beta}}^{% \top}_{\theta},\hat{\delta}_{U},\hat{\delta}_{V})^{\top}:=over^ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , over^ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT , over^ start_ARG italic_δ end_ARG start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , over^ start_ARG italic_δ end_ARG start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT :=
𝑎𝑟𝑔𝑚𝑎𝑥𝜸=𝜷U,𝜷V,𝜷θ,δU,δV(𝜷U,𝜷V,𝜷θ,δU,δV;u1,,un,v1,,vn,𝒙1,,𝒙n)𝜸subscript𝜷𝑈subscript𝜷𝑉subscript𝜷𝜃subscript𝛿𝑈subscript𝛿𝑉𝑎𝑟𝑔𝑚𝑎𝑥subscript𝜷𝑈subscript𝜷𝑉subscript𝜷𝜃subscript𝛿𝑈subscript𝛿𝑉subscript𝑢1subscript𝑢𝑛subscript𝑣1subscript𝑣𝑛subscript𝒙1subscript𝒙𝑛\displaystyle\underset{\boldsymbol{\gamma}=\boldsymbol{\beta}_{U},\boldsymbol{% \beta}_{V},\boldsymbol{\beta}_{\theta},\delta_{U},\delta_{V}}{\text{argmax}}% \ell(\boldsymbol{\beta}_{U},\boldsymbol{\beta}_{V},\boldsymbol{\beta}_{\theta}% ,\delta_{U},\delta_{V};u_{1},\ldots,u_{n},v_{1},\ldots,v_{n},\boldsymbol{x}_{1% },\ldots,\boldsymbol{x}_{n})\,start_UNDERACCENT bold_italic_γ = bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT end_UNDERACCENT start_ARG argmax end_ARG roman_ℓ ( bold_italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , bold_italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ; italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_u start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , bold_italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) (12)

is consistent and asymptotically normal for nnormal-→𝑛n\to\inftyitalic_n → ∞.

Implementational details. Maximization of the log-likelihood function in (1) can be carried out using the R function FCGAMoptim(), which is part of the supplementary material to this paper. The optimization algorithm is based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm implemented in the R function optim(), setting the additional constraint δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT, δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT >1absent1>1> 1.

2.3 Prediction of distributional parameters and inference

Prediction. For a new observation with covariate values 𝒙~~𝒙\tilde{\boldsymbol{x}}over~ start_ARG bold_italic_x end_ARG, predictions of the conditional PDF fR(r|𝒙~)subscript𝑓𝑅conditional𝑟~𝒙{f}_{R}(r|\tilde{\boldsymbol{x}})italic_f start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_r | over~ start_ARG bold_italic_x end_ARG ) can be obtained by computing the maximum likelihood estimate (MLE) and by plugging the estimated parameters Λ^|𝒙~=exp(𝒙~𝜷^U𝒙~𝜷^V)conditional^Λ~𝒙superscript~𝒙topsubscript^𝜷𝑈superscript~𝒙topsubscript^𝜷𝑉\hat{\Lambda}\,|\,\tilde{\boldsymbol{x}}=\exp(\tilde{\boldsymbol{x}}^{\top}% \hat{\boldsymbol{\beta}}_{U}-\tilde{\boldsymbol{x}}^{\top}\hat{\boldsymbol{% \beta}}_{V})over^ start_ARG roman_Λ end_ARG | over~ start_ARG bold_italic_x end_ARG = roman_exp ( over~ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG bold_italic_β end_ARG start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT - over~ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG bold_italic_β end_ARG start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ), θ^|𝒙~=𝒙~𝜷^θconditional^𝜃~𝒙superscript~𝒙topsubscript^𝜷𝜃\hat{\theta}\,|\,\tilde{\boldsymbol{x}}=\tilde{\boldsymbol{x}}^{\top}\hat{% \boldsymbol{\beta}}_{\theta}over^ start_ARG italic_θ end_ARG | over~ start_ARG bold_italic_x end_ARG = over~ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG bold_italic_β end_ARG start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT and δ^Usubscript^𝛿𝑈\hat{\delta}_{U}over^ start_ARG italic_δ end_ARG start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT, δ^Vsubscript^𝛿𝑉\hat{\delta}_{V}over^ start_ARG italic_δ end_ARG start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT in Equation (2). The predicted PDF can then be used to predict any distributional parameter of interest (like the expected value, median or quantiles). For example, denoting the predicted PDF by f^R(r|𝒙~)subscript^𝑓𝑅conditional𝑟~𝒙\hat{f}_{R}(r|\tilde{\boldsymbol{x}})over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_r | over~ start_ARG bold_italic_x end_ARG ), the predicted median can be calculated by

r^med|𝒙~=min{r+|0rf^R(s|𝒙~)𝑑s0.5}.conditionalsubscript^𝑟med~𝒙minconditional-set𝑟superscriptsuperscriptsubscript0𝑟subscript^𝑓𝑅conditional𝑠~𝒙differential-d𝑠0.5\displaystyle\hat{r}_{\text{med}}\,|\,\tilde{\boldsymbol{x}}=\text{min}\bigg{% \{}r\in\mathbb{R}^{+}\,\Big{|}\,\int_{0}^{r}\hat{f}_{R}(s|\tilde{\boldsymbol{x% }})\,ds\geq 0.5\bigg{\}}\,.over^ start_ARG italic_r end_ARG start_POSTSUBSCRIPT med end_POSTSUBSCRIPT | over~ start_ARG bold_italic_x end_ARG = min { italic_r ∈ blackboard_R start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT | ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_s | over~ start_ARG bold_italic_x end_ARG ) italic_d italic_s ≥ 0.5 } . (13)

Inference. Despite the asymptotic results from Corollary 2, more reliable finite-sample confidence intervals have been established in additive models (Wood, 2017). This is particularly the case for the quantities of interest here (such as the median of R𝑅Ritalic_R above). The reason is that these are nonlinear transformations of the original model coefficients such that confidence intervals would show an additional finite-sample bias due to the application of the ΔΔ\Deltaroman_Δ-rule. Following Wood (2017), we thus propose to construct confidence intervals of 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ using a Bayesian approach, which we accordingly refer to as credible intervals. Assuming flat priors on 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ, the posterior distribution of 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ is given by

𝜸|u1,,un,v1,,vnN(𝜸^,J^1(𝜸^)),similar-toconditional𝜸subscript𝑢1subscript𝑢𝑛subscript𝑣1subscript𝑣𝑛𝑁^𝜸superscript^𝐽1^𝜸\displaystyle\boldsymbol{\gamma}\,|\,u_{1},\ldots,u_{n},v_{1},\ldots,v_{n}\,% \sim\,N\big{(}\hat{\boldsymbol{\gamma}},\hat{J}^{-1}(\hat{\boldsymbol{\gamma}}% )\big{)}\,,bold_italic_γ | italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_u start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∼ italic_N ( over^ start_ARG bold_italic_γ end_ARG , over^ start_ARG italic_J end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( over^ start_ARG bold_italic_γ end_ARG ) ) , (14)

where J^(𝜸^)^𝐽^𝜸\hat{J}(\hat{\boldsymbol{\gamma}})over^ start_ARG italic_J end_ARG ( over^ start_ARG bold_italic_γ end_ARG ) is the Hessian of the negative log-likelihood evaluated at 𝜸^^𝜸\hat{\boldsymbol{\gamma}}over^ start_ARG bold_italic_γ end_ARG (Equation (6.26) of Wood, 2017). Consequently, approximate (1α)%percent1𝛼(1-\alpha)\%( 1 - italic_α ) % credible intervals for the coefficients 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ can be obtained by drawing a large sample from the posterior distribution (14) and by calculating the α/2𝛼2\alpha/2italic_α / 2 and (1α/2)1𝛼2(1-\alpha/2)( 1 - italic_α / 2 ) percentiles from this sample (Wood, 2017, p. 293). In our simulations (Section 3) and in the analysis of the DCN study data (Section 4) we used samples of size 10,000 throughout.

3 Simulations

We conducted three simulation studies to investigate the performance of the FCGAM model. Our main aims were (a) to analyze the model fit and the coverage of the credible intervals, (b) to evaluate how the performance of the FCGAM approach is affected by the sample size and the choice of the association parameter θ𝜃\thetaitalic_θ, and (c) to benchmark our method against alternative ones, in particular against the extended GB2 model by Berger et al. (2019) which assumes the correlation between U𝑈Uitalic_U and V𝑉Vitalic_V to be positive.

3.1 Experimental design

In all simulations the ratio outcome was generated according to the PDF of the FCGAM model derived in Proposition 1. Similar to the application data in Section 4, we considered two standard normally distributed covariates X1,X2N(0,1)similar-tosubscript𝑋1subscript𝑋2𝑁01X_{1},X_{2}\sim N(0,1)italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∼ italic_N ( 0 , 1 ) and two binary covariates X3,X4B(1,0.5)similar-tosubscript𝑋3subscript𝑋4𝐵10.5X_{3},X_{4}\sim B(1,0.5)italic_X start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ∼ italic_B ( 1 , 0.5 ), which were equi-correlated with Pearson correlation coefficient 0.4. For each n{200,500,1000}𝑛2005001000n\in\{200,500,1000\}italic_n ∈ { 200 , 500 , 1000 } we simulated 1000100010001000 independent data sets.

In Simulation Study 1, we considered scenarios with fixed negative correlation (the case which motivated our development of the FCGAM model), setting βθ0{1,5,10}subscript𝛽𝜃01510\beta_{\theta 0}\in\{-1,-5,-10\}italic_β start_POSTSUBSCRIPT italic_θ 0 end_POSTSUBSCRIPT ∈ { - 1 , - 5 , - 10 } and βθ1==βθ4=0subscript𝛽𝜃1subscript𝛽𝜃40\beta_{\theta 1}=\ldots=\beta_{\theta 4}=0italic_β start_POSTSUBSCRIPT italic_θ 1 end_POSTSUBSCRIPT = … = italic_β start_POSTSUBSCRIPT italic_θ 4 end_POSTSUBSCRIPT = 0. This resulted in the respective rank correlation coefficients τ{0.11,0.46,0.67}𝜏0.110.460.67\tau\in\{-0.11,-0.46,-0.67\}italic_τ ∈ { - 0.11 , - 0.46 , - 0.67 }. The rate parameters were related to the four covariates through the coefficients βU=(0,0.4,0.4,0.2,0.2)subscript𝛽𝑈superscript00.40.40.20.2top\beta_{U}=(0,0.4,-0.4,0.2,-0.2)^{\top}italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = ( 0 , 0.4 , - 0.4 , 0.2 , - 0.2 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and βV=(0,0.2,0.2,0.4,0.4)subscript𝛽𝑉superscript00.20.20.40.4top\beta_{V}=(0,-0.2,0.2,-0.4,0.4)^{\top}italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT = ( 0 , - 0.2 , 0.2 , - 0.4 , 0.4 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT. The shape parameters were set to δU=2subscript𝛿𝑈2\delta_{U}=2italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = 2 and δV=6subscript𝛿𝑉6\delta_{V}=6italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT = 6 in all scenarios.

In Simulation Study 2, we considered scenarios with fixed positive correlation (the case which has already been covered by the eGB2 model but also applies to the FCGAM model), setting βθ0{1,5,10}subscript𝛽𝜃01510\beta_{\theta 0}\in\{1,5,10\}italic_β start_POSTSUBSCRIPT italic_θ 0 end_POSTSUBSCRIPT ∈ { 1 , 5 , 10 } and βθ1==βθ4=0subscript𝛽𝜃1subscript𝛽𝜃40\beta_{\theta 1}=\ldots=\beta_{\theta 4}=0italic_β start_POSTSUBSCRIPT italic_θ 1 end_POSTSUBSCRIPT = … = italic_β start_POSTSUBSCRIPT italic_θ 4 end_POSTSUBSCRIPT = 0. This resulted in the respective rank correlation coefficients τ{0.11,0.46,0.67}𝜏0.110.460.67\tau\in\{0.11,0.46,0.67\}italic_τ ∈ { 0.11 , 0.46 , 0.67 }. To ensure that the outcome values were in a meaningful range (comparable to Simulation Study 1) we set the regression coefficients to βU=(0,0.4,0.4,0.2,0.2)subscript𝛽𝑈superscript00.40.40.20.2top\beta_{U}=(0,0.4,-0.4,0.2,-0.2)^{\top}italic_β start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = ( 0 , 0.4 , - 0.4 , 0.2 , - 0.2 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and βV=(0,0.2,0.2,0.4,0.4)subscript𝛽𝑉superscript00.20.20.40.4top\beta_{V}=(0,0.2,-0.2,0.4,-0.4)^{\top}italic_β start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT = ( 0 , 0.2 , - 0.2 , 0.4 , - 0.4 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, and the shape parameters to δU=2subscript𝛿𝑈2\delta_{U}=2italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = 2 and δV=2subscript𝛿𝑉2\delta_{V}=2italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT = 2.

In Simulation Study 3, we evaluated how the model fit of the FCGAM model was affected when falsely assuming a dependence of θ𝜃\thetaitalic_θ on X1,,X4subscript𝑋1subscript𝑋4X_{1},\ldots,X_{4}italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT, or when ignoring a present dependence of θ𝜃\thetaitalic_θ on X1,,X4subscript𝑋1subscript𝑋4X_{1},\ldots,X_{4}italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT. For this we reconsidered the data sets from Simulation Study 1 with τ=0.11𝜏0.11\tau=-0.11italic_τ = - 0.11, and additionally considered scenarios where the association parameter θ𝜃\thetaitalic_θ was related to the four covariates through the coefficient vector βθ=(0,1,1,0.5,0.5)subscript𝛽𝜃superscript0110.50.5top\beta_{\theta}=(0,1,-1,0.5,-0.5)^{\top}italic_β start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT = ( 0 , 1 , - 1 , 0.5 , - 0.5 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT (resulting in covariate-dependent rank correlation coefficients τisubscript𝜏𝑖\tau_{i}italic_τ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, with the remaining parameters as in Simulation Study 1). In both cases we fitted the FCGAM model with covariate-dependent θ𝜃\thetaitalic_θ (according to (10)) and with constant θ=βθ0𝜃subscript𝛽𝜃0\theta=\beta_{\theta 0}italic_θ = italic_β start_POSTSUBSCRIPT italic_θ 0 end_POSTSUBSCRIPT.

Benchmark methods. We evaluated the fits of the 1000 FCGAM models by computing the predictive log-likelihood values on 1000 independent test data sets. The test data sets (of size n𝑛nitalic_n each) were also used to compare the FCGAM model to alternative models. To this purpose, we evaluated the predictive log-likelihood values of the following benchmark methods, where (ii), (iii) and (vi) are univariate regression models for R𝑅Ritalic_R, (iv) is a univariate regression model for log(R)𝑅\log(R)roman_log ( italic_R ), and (v) and (vii) are distributional regression models:

  • (i)

    The copula-based FCGAM model introduced in Section 2.2.

  • (ii)

    The extended GB2 model (Berger et al., 2019) (eGB2) assuming a positive correlation between U𝑈Uitalic_U and V𝑉Vitalic_V.

  • (iii)

    The simple GB2 model (GB2) assuming zero Pearson correlation between U𝑈Uitalic_U and V𝑉Vitalic_V.

  • (iv)

    A Gaussian regression model with log-transformed outcome values (LN).

  • (v)

    A Gaussian GAMLSS with log-transformed outcome values, where both the mean and the standard deviation were related to the covariates (LN.LSS). The standard deviation was modeled using the log link.

  • (vi)

    A Gamma regression model with the original outcome values (GA). The mean parameter was related to the covariates and was modeled using the log link.

  • (vii)

    A Gamma GAMLSS with the original outcome values, where both the mean and the scale parameters were related to the covariates (GA.LSS) using the log link.

3.2 Results

Point estimates of the FCGAM coefficients. Figure 3 presents the coefficient estimates β^Usubscript^𝛽𝑈\hat{\beta}_{U}over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT in Simulation Study 1 with negative (but covariate-independent) correlation between U𝑈Uitalic_U and V𝑉Vitalic_V. The boxplots show that on average the estimated coefficients are very close to the true ones, regardless of the association parameter θ𝜃\thetaitalic_θ. Accordingly, the finite-sample bias of the MLEs is small in all scenarios (with varying n𝑛nitalic_n and θ𝜃\thetaitalic_θ). From Figure 3 it can also be seen that, as expected, the variance of the estimates decreases with increasing sample size, in particular for the two binary covariates X3subscript𝑋3X_{3}italic_X start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT and X4subscript𝑋4X_{4}italic_X start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT. In contrast, the correlation (determined by the value of θ𝜃\thetaitalic_θ) has only a small impact on the variance of the estimates. The coefficient estimates β^Vsubscript^𝛽𝑉\hat{\beta}_{V}over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT (presented in Supplementary Figure S1) exhibit even smaller variances in the scenarios with n=500𝑛500n=500italic_n = 500 and n=1000𝑛1000n=1000italic_n = 1000.

The coefficient estimates β^Usubscript^𝛽𝑈\hat{\beta}_{U}over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT and β^Vsubscript^𝛽𝑉\hat{\beta}_{V}over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT from Simulation Study 2 with positive correlation between U𝑈Uitalic_U and V𝑉Vitalic_V are shown in Supplementary Figures S2 and S3, respectively. In both cases the bias is small throughout all scenarios. Regarding the variance of the estimates, the results are largely the same as in Figure 3.

Refer to caption
Figure 3: Point estimates of the FCGAM coefficients in Simulation Study 1. The boxplots visualize the MLEs of the coefficients βU1=0.4subscript𝛽𝑈10.4\beta_{U1}=0.4italic_β start_POSTSUBSCRIPT italic_U 1 end_POSTSUBSCRIPT = 0.4, βU2=0.4subscript𝛽𝑈20.4\beta_{U2}=-0.4italic_β start_POSTSUBSCRIPT italic_U 2 end_POSTSUBSCRIPT = - 0.4, βU3=0.2subscript𝛽𝑈30.2\beta_{U3}=0.2italic_β start_POSTSUBSCRIPT italic_U 3 end_POSTSUBSCRIPT = 0.2 and βU40.2subscript𝛽𝑈40.2\beta_{U4}-0.2italic_β start_POSTSUBSCRIPT italic_U 4 end_POSTSUBSCRIPT - 0.2 that were obtained from fitting the FCGAM model to 1000 data sets of size n𝑛nitalic_n each. The red lines refer to the true values of the coefficients.

Coverage proportions of the credible intervals. The coverage proportions of the 95% credible intervals obtained from the FCGAM fits are presented in Table 1. They range between 0.9280.9580.9280.9580.928-0.9580.928 - 0.958 (Simulation Study 1) and between 0.9280.9620.9280.9620.928-0.9620.928 - 0.962 (Simulation Study 2), which is close to the nominal coverage of 95%. There were only minor differences with regard to sample size and the correlation coefficient. This result demonstrates that not only point estimation but also inference works well for highly positive or negative correlations and fairly small samples.

Table 1: Coverage proportions of the FCGAM credible intervals. For each coefficient βUjsubscript𝛽𝑈𝑗\beta_{Uj}italic_β start_POSTSUBSCRIPT italic_U italic_j end_POSTSUBSCRIPT, j=1,,4,𝑗14j=1,\ldots,4,italic_j = 1 , … , 4 , and βVjsubscript𝛽𝑉𝑗\beta_{Vj}italic_β start_POSTSUBSCRIPT italic_V italic_j end_POSTSUBSCRIPT, j=1,,4,𝑗14j=1,\ldots,4,italic_j = 1 , … , 4 , the table contains the coverage proportion of the 95% credible interval, as obtained from fitting the FCGAM model to 1000 independent data sets of size n𝑛nitalic_n each.
Simulation Study 1 βU1subscript𝛽𝑈1\beta_{U1}italic_β start_POSTSUBSCRIPT italic_U 1 end_POSTSUBSCRIPT βU2subscript𝛽𝑈2\beta_{U2}italic_β start_POSTSUBSCRIPT italic_U 2 end_POSTSUBSCRIPT βU3subscript𝛽𝑈3\beta_{U3}italic_β start_POSTSUBSCRIPT italic_U 3 end_POSTSUBSCRIPT βU4subscript𝛽𝑈4\beta_{U4}italic_β start_POSTSUBSCRIPT italic_U 4 end_POSTSUBSCRIPT βV1subscript𝛽𝑉1\beta_{V1}italic_β start_POSTSUBSCRIPT italic_V 1 end_POSTSUBSCRIPT βV2subscript𝛽𝑉2\beta_{V2}italic_β start_POSTSUBSCRIPT italic_V 2 end_POSTSUBSCRIPT βV3subscript𝛽𝑉3\beta_{V3}italic_β start_POSTSUBSCRIPT italic_V 3 end_POSTSUBSCRIPT βV4subscript𝛽𝑉4\beta_{V4}italic_β start_POSTSUBSCRIPT italic_V 4 end_POSTSUBSCRIPT
n𝑛nitalic_n=200 θ=1𝜃1\theta=-1italic_θ = - 1 0.938 0.932 0.936 0.938 0.949 0.949 0.954 0.950
θ=5𝜃5\theta=-5italic_θ = - 5 0.958 0.942 0.955 0.941 0.943 0.935 0.946 0.952
θ=10𝜃10\theta=-10italic_θ = - 10 0.937 0.938 0.932 0.947 0.952 0.946 0.940 0.949
n𝑛nitalic_n=500 θ=1𝜃1\theta=-1italic_θ = - 1 0.949 0.937 0.940 0.938 0.952 0.953 0.946 0.930
θ=5𝜃5\theta=-5italic_θ = - 5 0.954 0.938 0.928 0.935 0.952 0.932 0.935 0.949
θ=10𝜃10\theta=-10italic_θ = - 10 0.934 0.948 0.939 0.951 0.955 0.947 0.958 0.943
n𝑛nitalic_n=1000 θ=1𝜃1\theta=-1italic_θ = - 1 0.949 0.937 0.955 0.947 0.944 0.946 0.948 0.937
θ=5𝜃5\theta=-5italic_θ = - 5 0.947 0.942 0.936 0.936 0.957 0.950 0.950 0.950
θ=10𝜃10\theta=-10italic_θ = - 10 0.933 0.945 0.934 0.953 0.943 0.948 0.949 0.943
Simulation Study 2 βU1subscript𝛽𝑈1\beta_{U1}italic_β start_POSTSUBSCRIPT italic_U 1 end_POSTSUBSCRIPT βU2subscript𝛽𝑈2\beta_{U2}italic_β start_POSTSUBSCRIPT italic_U 2 end_POSTSUBSCRIPT βU3subscript𝛽𝑈3\beta_{U3}italic_β start_POSTSUBSCRIPT italic_U 3 end_POSTSUBSCRIPT βU4subscript𝛽𝑈4\beta_{U4}italic_β start_POSTSUBSCRIPT italic_U 4 end_POSTSUBSCRIPT βV1subscript𝛽𝑉1\beta_{V1}italic_β start_POSTSUBSCRIPT italic_V 1 end_POSTSUBSCRIPT βV2subscript𝛽𝑉2\beta_{V2}italic_β start_POSTSUBSCRIPT italic_V 2 end_POSTSUBSCRIPT βV3subscript𝛽𝑉3\beta_{V3}italic_β start_POSTSUBSCRIPT italic_V 3 end_POSTSUBSCRIPT βV4subscript𝛽𝑉4\beta_{V4}italic_β start_POSTSUBSCRIPT italic_V 4 end_POSTSUBSCRIPT
n𝑛nitalic_n=200 θ=1𝜃1\theta=1italic_θ = 1 0.935 0.929 0.934 0.935 0.947 0.933 0.959 0.954
θ=5𝜃5\theta=5italic_θ = 5 0.952 0.940 0.954 0.941 0.949 0.951 0.954 0.947
θ=10𝜃10\theta=10italic_θ = 10 0.938 0.939 0.939 0.947 0.951 0.944 0.948 0.953
n𝑛nitalic_n=500 θ=1𝜃1\theta=1italic_θ = 1 0.941 0.948 0.936 0.928 0.940 0.942 0.946 0.940
θ=5𝜃5\theta=5italic_θ = 5 0.948 0.941 0.929 0.929 0.952 0.944 0.941 0.962
θ=10𝜃10\theta=10italic_θ = 10 0.931 0.947 0.937 0.951 0.948 0.944 0.949 0.937
n𝑛nitalic_n=1000 θ=1𝜃1\theta=1italic_θ = 1 0.950 0.944 0.954 0.944 0.952 0.939 0.953 0.935
θ=5𝜃5\theta=5italic_θ = 5 0.955 0.942 0.934 0.934 0.956 0.952 0.949 0.947
θ=10𝜃10\theta=10italic_θ = 10 0.928 0.946 0.934 0.955 0.935 0.956 0.948 0.945

Distributional prediction. The root mean squared error (RMSE) of the estimated conditional median values computed from (13) are given in Table 2. In Simulation Study 1 the performance is quite similar for all three values of θ𝜃\thetaitalic_θ, whereas in Simulation Study 2 the RMSE considerably decreases with increasing value of θ𝜃\thetaitalic_θ. This indicates that estimating the median value works best for highly positive correlations where the PDF of R𝑅Ritalic_R is rather diffuse with a large mode value (compare Figure 1). It is also seen from Table 2 that the means and the standard deviations of the RMSE decrease with increasing sample size.

Table 2: RMSE of the estimated conditional median values of R𝑅Ritalic_R. The table presents the mean RMSE of the estimated conditional median of R𝑅Ritalic_R, as obtained from fitting the FCGAM model to 1000 independent data sets of size n𝑛nitalic_n each. Standard deviations of the RMSE values (across the 1000 data sets) are given in brackets.
Simulation Study 1 θ=1𝜃1\theta=-1italic_θ = - 1 θ=5𝜃5\theta=-5italic_θ = - 5 θ=10𝜃10\theta=-10italic_θ = - 10
n𝑛nitalic_n = 200 0.076 (0.035) 0.084 (0.039) 0.081 (0.037)
n𝑛nitalic_n = 500 0.049 (0.021) 0.053 (0.023) 0.051 (0.021)
n𝑛nitalic_n = 1000 0.034 (0.015) 0.038 (0.016) 0.036 (0.016)
Simulation Study 2 θ=1𝜃1\theta=1italic_θ = 1 θ=5𝜃5\theta=5italic_θ = 5 θ=10𝜃10\theta=10italic_θ = 10
n𝑛nitalic_n = 200 0.154 (0.056) 0.101 (0.035) 0.063 (0.021)
n𝑛nitalic_n = 500 0.097 (0.033) 0.064 (0.022) 0.040 (0.014)
n𝑛nitalic_n = 1000 0.069 (0.023) 0.046 (0.015) 0.029 (0.010)
Refer to caption
Figure 4: Comparison of the FCGAM model to alternative methods in Simulation Study 1. The boxplots visualize the predictive log-likelihood values obtained from the FCGAM model and from the benchmark methods (ii) to (vii). All models were fitted to 1000 independent data sets and evaluated on independently generated test data sets of the same size. In each panel, the dashed horizontal line indicates the median predictive log-likelihood value of the best performing method.

Comparison to alternative models. Figure 4 and Supplementary Figure S4 show the prediction accuracy (i.e. the predicted log-likelihood values on the test sets) of the FCGAM model and the benchmark methods (ii) to (vi). In Simulation Study 1 with negative correlation, it can be observed that the FCGAM model achieves the highest accuracy in all scenarios (Figure 4). The superiority is even more evident when the sample size and the value of the correlation coefficient are increased. The extended GB2 and simple GB2 methods yield similar performances as the Gaussian models with log-transformed outcome (LN and LN.LSS), whereas the Gamma regression models (GA and GA.LSS) result in the lowest accuracy. For both LN and GA the GAMLSS models are not superior to their simple counterparts.

In Simulation Study 2 with positive correlation, the results change considerably (Supplementary Figure S4). As expected, the performance of the FCGAM and eGB2 models is largely the same, as the eGB2 model also assumes gamma distributed components with positive correlation. The simple GB2 model (assuming uncorrelated components) and the Gaussian models with log-transformed outcomes (LN and LN.LSS) perform comparably well in the scenarios with θ=1𝜃1\theta=1italic_θ = 1, but deteriorated with increasing correlation (θ=5𝜃5\theta=5italic_θ = 5 and θ=10𝜃10\theta=10italic_θ = 10). Again, the Gamma regression models (GA and GA.LSS) exhibit the worst performance by far.

Table 3: Analysis of misspecified models for the association parameter in Simulation Study 3. The table presents the mean RMSE of the estimated conditional median values (upper part) and the mean of the predictive log-likelihood values (lower part), as obtained from fitting the FCGAM model to 1000 independent data sets and evaluating the fits on 1000 independently generated test data sets. Standard deviations (across the 1000 data sets) are given in brackets. The left part of the table refers to the scenarios with covariate-dependent correlation coefficients (in the observed range τi[0.483,,0.464]subscript𝜏𝑖0.4830.464\tau_{i}\in[-0.483,\ldots,0.464]italic_τ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ [ - 0.483 , … , 0.464 ]), whereas the right part refers to the scenarios with fixed negative correlation τ=0.11𝜏0.11\tau=-0.11italic_τ = - 0.11. The terms “modeled θ𝜃\thetaitalic_θ” and “constant θ𝜃\thetaitalic_θ” refer to the FCGAM models with a covariate-dependent predictor function for θ𝜃\thetaitalic_θ (as in (10)) and an intercept-only predictor function for θ𝜃\thetaitalic_θ (βθ1==βθp=0subscript𝛽𝜃1subscript𝛽𝜃𝑝0\beta_{\theta 1}=\ldots=\beta_{\theta p}=0italic_β start_POSTSUBSCRIPT italic_θ 1 end_POSTSUBSCRIPT = … = italic_β start_POSTSUBSCRIPT italic_θ italic_p end_POSTSUBSCRIPT = 0 in (10)), respectively.
RMSE Covariate-dependent correlation Fixed correlation
modeled θ𝜃\thetaitalic_θ constant θ𝜃\thetaitalic_θ modeled θ𝜃\thetaitalic_θ constant θ𝜃\thetaitalic_θ
n𝑛nitalic_n = 200 0.069 (0.031) 0.082 (0.042) 0.079 (0.032) 0.077 (0.034)
n𝑛nitalic_n = 500 0.042 (0.018) 0.058 (0.026) 0.050 (0.020) 0.047 (0.018)
n𝑛nitalic_n = 1000 0.031 (0.013) 0.049 (0.021) 0.036 (0.015) 0.034 (0.014)
Predictive Covariate-dependent correlation Fixed correlation
log-likelihood modeled θ𝜃\thetaitalic_θ constant θ𝜃\thetaitalic_θ modeled θ𝜃\thetaitalic_θ constant θ𝜃\thetaitalic_θ
n𝑛nitalic_n = 200 -6.769 (19.176) -7.116 (19.133) -20.165 (19.163) -19.356 (19.056)
n𝑛nitalic_n = 500 -10.736 (29.031) -12.853 (29.087) -44.585 (29.092) -43.979 (29.050)
n𝑛nitalic_n = 1000 -18.237 (41.177) -23.330 (40.965) -85.656 (40.989) -85.071 (40.967)

Misspecified models for the association parameter in Simulation Study 3. The RMSE of the estimated conditional median values and the predictive log-likelihood values of the FCGAM fits are summarized in Table 3. It is seen that ignoring the dependence of θ𝜃\thetaitalic_θ on the covariates (left part of Table 3) decreases both the predictive ability and the model fit. In the scenario with n=1000𝑛1000n=1000italic_n = 1000 (large sample size), the difference in predictive log-likelihood values of 5.093 suggests “considerably less” empirical support for the model with constant θ𝜃\thetaitalic_θ (according to the rules of thumb provided in Burnham and Anderson, 2002). On the other hand, when unnecessarily modeling the dependence of θ𝜃\thetaitalic_θ on X1,,X4subscript𝑋1subscript𝑋4X_{1},\ldots,X_{4}italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT (right part of Table 3) the predictive ability and the model fit are mostly unaffected (showing only negligible differences in the RMSE and the predictive log-likelihood values).

Overall summary. Taken together, we make the following key empirical observations:

  1. 1.

    Point estimates from the FCGAM model are reliable and nearly unbiased even for small sample sizes.

  2. 2.

    The FCGAM model outperforms the eGB2 model in case of negative correlation and is en par with the eGB2 model when the correlation is positive.

  3. 3.

    In all scenarios, the Gamma regression models perform worst. In particular, they perform worse than the Gaussian models with log-transformed outcome.

  4. 4.

    Falsely modeling the association parameter does not deteriorate predictive performance to a large degree, whereas the FCGAM model with covariate-dependent θ𝜃\thetaitalic_θ improves the fit when the true association depends on the covariates.

4 Cohort Study of the German Dementia Competence Network

Background. The multi-center cohort study conducted by the German Dementia Competence Network (DCN; Kornhuber et al., 2009) enrolled patients aged older than 50 years that were diagnosed with either mild cognitive impairment (MCI), Alzheimer’s disease (AD) or other dementia. Recruitment took place between 2003 and 2007. The main objective of the original study was to establish biomarkers for the diagnosis and prognosis of AD using clinical, laboratory and imaging measurements. Here, we investigate covariates that are potentially associated with amyloid-β𝛽\betaitalic_β 42, amyloid-β𝛽\betaitalic_β 40 and total tau protein concentrations measured in cerebrospinal fluid samples. These analyses are of high relevance for clinical routine in the neurosciences, since biomarkers enable the detection of AD pathology long before the occurrence of the first clinically obvious symptoms (Sperling et al., 2013). Thus, relating covariates to biomarker values provides insight into disease pathology and prevention at the individual patient level. In the neurosciences, amyloid-β𝛽\betaitalic_β 42, amyloid-β𝛽\betaitalic_β 40 and total tau protein concentrations are usually not analyzed separately but in terms of their ratios. More specifically, the amyloid-β𝛽\betaitalic_β 42/40 ratio and amyloid-β𝛽\betaitalic_β 42/total tau ratio are considered to be strong predictors of AD progression (Koyama et al., 2012). Therefore we focus on the group of MCI patients and relate their ratios to patient-related risk factors for dementia.

Refer to caption
(a)
Refer to caption
(b)
Figure 5: Analysis of the DCN study data. Distribution of the amyloid-β𝛽\betaitalic_β 42/40 ratios (a) and the amyloid-β𝛽\betaitalic_β 42/total tau ratios (b) in patients with MCI (n=330𝑛330n=330italic_n = 330).

Description of the data. In the DCN study, amyloid-β𝛽\betaitalic_β and total tau baseline concentrations were measured in 374 patients diagnosed with MCI. In all other MCI patients, CSF biosamples were not collected due to either logistic reasons or lack of consent to the invasive procedure of lumbar puncture. Exclusion of patients that did not meet the inclusion criterion (age \leq 50 years; 7777 patients) and of patients with missing values in at least one of the considered risk factors (37373737 patients) resulted in an analysis data set of n=330𝑛330n=330italic_n = 330 patients. For details on the handling of missing values we refer to Berger et al. (2019). The unconditional distributions of the amyloid-β𝛽\betaitalic_β 42/40 ratio and the amyloid-β𝛽\betaitalic_β 42/total tau ratio are visualized in Figure 5. While the values of the amyloid-β𝛽\betaitalic_β 42/40 ratios are all smaller than 0.3, the amyloid-β𝛽\betaitalic_β 42/total tau ratios range between 0.2 and 13, exhibiting a heavily right-skewed distribution. Kendall’s rank correlation coefficient between the two components is measured to be τ=0.307𝜏0.307\tau=0.307italic_τ = 0.307 (amyloid-β𝛽\betaitalic_β 42/40) and τ=0.269𝜏0.269\tau=-0.269italic_τ = - 0.269 (amyloid-β𝛽\betaitalic_β 42/total tau). Thus, our analysis had to deal with both positive and negative correlations between the ratio components. As mentioned before, this problem was our main motivation for the development of the FCGAM model.

The risk factors included in the analysis are summarized in Table 4. These were: (i) sex, (ii) age in years, (iii) educational level (measured by the number of years of education), and (iv) a binary variable indicating whether a patient was a carrier of the apolipoprotein Eϵitalic-ϵ\epsilonitalic_ϵ4 (ApoE ϵitalic-ϵ\epsilonitalic_ϵ4) allele, which is a strong genetic predictor of AD.

Table 4: Description and summary statistics of the two ratio outcomes and the covariates used for the analysis of the DCN study data (Q1 = first quartile, Q3 = third quartile). All numbers refer to a subset of patients diagnosed with MCI (n=330𝑛330n=330italic_n = 330). For details on the collection of the data, see Kornhuber et al. (2009).
Variable Summary statistics
min Q1 median Q3 max mean sd
Amyloid-β𝛽\betaitalic_β 42/40 0.03 0.08 0.10 0.14 0.26 0.11 0.04
Amyloid-β𝛽\betaitalic_β 42/total tau 0.19 0.91 2.13 3.72 12.95 2.70 2.34
Age (years) 51 60 66 73 89 66.51 8.11
Education (years) 2 11 11 13 19 12.18 2.96
Sex  male: 194 (58.8%) female: 136 (41.2%)
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4  no: 182 (55.2%) yes: 148 (44.8%)

Model fitting I. In a preliminary analysis, we fitted GA and GA.LSS models for the components amyloid-β𝛽\betaitalic_β 42, amyloid-β𝛽\betaitalic_β 40 and total tau, where either the rate parameters only (GA) or both the rate and the shape parameters (GA.LSS) were related to the four covariates. According to the Bayesian information criterion (BIC) the simple GA models (BIC =4899.988absent4899.988=4899.988= 4899.988 for amyloid-β𝛽\betaitalic_β 42, BIC =6248.201absent6248.201=6248.201= 6248.201 for amyloid-β𝛽\betaitalic_β 40 andBIC =4574.974absent4574.974=4574.974= 4574.974 for total tau) showed better fits than the respective GA.LSS models (BIC =4914.396absent4914.396=4914.396= 4914.396 for amyloid-β𝛽\betaitalic_β 42, BIC =6266.375absent6266.375=6266.375= 6266.375 for amyloid-β𝛽\betaitalic_β 40 and BIC =4581.902absent4581.902=4581.902= 4581.902 for total tau). This result indicates that it is sufficient to relate the two rate parameters to the covariates. Furthermore, it supports the assumptions of the proposed FCGAM model, which treats the shape parameters δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT and δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT as nuisance parameters.

The fits of the FCGAM model with covariate-dependent association parameter are presented in Supplementary Table S1. According to the credible intervals given in columns 4 and 6, none of the risk factors is found to affect the association parameter θ𝜃\thetaitalic_θ. Applying Equation (3) yielded the mean estimated rank correlations τ^(θ^)=0.35^𝜏^𝜃0.35\hat{\tau}(\hat{\theta})=0.35over^ start_ARG italic_τ end_ARG ( over^ start_ARG italic_θ end_ARG ) = 0.35 (range: 0.21 to 0.49) for amyloid-β𝛽\betaitalic_β 42/40 and τ^(θ^)=0.23^𝜏^𝜃0.23\hat{\tau}(\hat{\theta})=-0.23over^ start_ARG italic_τ end_ARG ( over^ start_ARG italic_θ end_ARG ) = - 0.23 (range: -0.41 to 0.03) for amyloid-β𝛽\betaitalic_β 42/total tau. Both estimates are close to the respective unconditional rank correlations.

Model fitting II. Based on the above findings and to further reduce model complexity, we fitted FCGAM models with constant association parameter θ𝜃\thetaitalic_θ (setting the coefficients βθ,Age,,βθ,ApoE ϵ4subscript𝛽𝜃Agesubscript𝛽𝜃ApoE ϵ4\beta_{\theta,\text{Age}},\ldots,\beta_{\theta,\text{ApoE $\epsilon 4$}}italic_β start_POSTSUBSCRIPT italic_θ , Age end_POSTSUBSCRIPT , … , italic_β start_POSTSUBSCRIPT italic_θ , ApoE italic_ϵ 4 end_POSTSUBSCRIPT to zero). We then calculated the BIC from these reduced models along with their counterparts obtained from the models with covariate-dependent θ𝜃\thetaitalic_θ. For amyloid-β𝛽\betaitalic_β 42/40, the BIC values were 11063.7411063.7411063.7411063.74 (constant θ𝜃\thetaitalic_θ) and 11084.7511084.7511084.7511084.75 (modeled θ𝜃\thetaitalic_θ). For amyloid-β𝛽\betaitalic_β 42/total tau, the BIC values were 9249.999249.999249.999249.99 (constant θ𝜃\thetaitalic_θ) and 9267.4859267.4859267.4859267.485 (modeled θ𝜃\thetaitalic_θ). This result suggests that the reduced models with constant θ𝜃\thetaitalic_θ meet a better compromise between model fit and model complexity than the respective full models with covariate-dependent θ𝜃\thetaitalic_θ. This observation is confirmed by randomized quantile residuals of the reduced FCGAM models (Figure 6) indicating only slight deviations from normality.

Refer to caption
(a)
Refer to caption
(b)
Figure 6: Analysis of the amyloid-β𝛽\betaitalic_β 42/40 ratios (a) and the amyloid-β𝛽\betaitalic_β 42/total tau ratios (b) in the DCN study data. The figure presents normal quantile-quantile plots of the quantile residuals obtained from the FCGAM model fits.

Main results. The results obtained from the reduced FCGAM models are shown in Table 5 and Supplementary Figures S5 and S6. The upper part of Table 5 refers to the parameter Λ=λU/λVΛsubscript𝜆𝑈subscript𝜆𝑉\Lambda=\lambda_{U}/\lambda_{V}roman_Λ = italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT / italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT, reporting the differences β^Λj:=β^Ujβ^Vjassignsubscript^𝛽Λ𝑗subscript^𝛽𝑈𝑗subscript^𝛽𝑉𝑗\hat{\beta}_{\Lambda j}:=\hat{\beta}_{Uj}-\hat{\beta}_{Vj}over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT roman_Λ italic_j end_POSTSUBSCRIPT := over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_U italic_j end_POSTSUBSCRIPT - over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_V italic_j end_POSTSUBSCRIPT. Note that the coefficient estimates are very similar to the respective estimates of the more complex model in Table S1. For example, for amyloid-β𝛽\betaitalic_β 42/total tau one obtains β^Λ,ApoE ϵ4=0.3786subscript^𝛽ΛApoE ϵ40.3786\hat{\beta}_{\Lambda,\text{ApoE $\epsilon 4$}}=0.3786over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT roman_Λ , ApoE italic_ϵ 4 end_POSTSUBSCRIPT = 0.3786 (Table 5) and β^Λ,ApoE ϵ4=0.2411+0.1406=0.3817subscript^𝛽ΛApoE ϵ40.24110.14060.3817\hat{\beta}_{\Lambda,\text{ApoE $\epsilon 4$}}=0.2411+0.1406=0.3817over^ start_ARG italic_β end_ARG start_POSTSUBSCRIPT roman_Λ , ApoE italic_ϵ 4 end_POSTSUBSCRIPT = 0.2411 + 0.1406 = 0.3817 (Table S1). The credible intervals in Table 5 were obtained by drawing a sample of size 10,000 from the posterior distribution in (14) and by calculating the 2.5%percent2.52.5\%2.5 % and 97.5%percent97.597.5\%97.5 % percentiles from the sampled differences βUjβVjsubscript𝛽𝑈𝑗subscript𝛽𝑉𝑗{\beta}_{Uj}-{\beta}_{Vj}italic_β start_POSTSUBSCRIPT italic_U italic_j end_POSTSUBSCRIPT - italic_β start_POSTSUBSCRIPT italic_V italic_j end_POSTSUBSCRIPT. According to the results of the FCGAM model, there is strong evidence for an effect of the risk factors age and ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 on theamyloid-β𝛽\betaitalic_β 42/40 and amyloid-β𝛽\betaitalic_β 42/total tau ratios. As depicted in Supplementary Figures S5(a) and S6(a), both the expected amyloid-β𝛽\betaitalic_β 42/40 ratio and the expected amyloid-β𝛽\betaitalic_β 42/total tau ratio decrease with increasing age, implying a higher risk of progression to AD in older patients. Similarly, the expected ratios of ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 carriers are strongly reduced compared to patients not carrying the allele (Supplementary Figures S5(d) and S6(d), confirming the important role of this genetic risk factor in AD progression). The figures also illustrate how the estimated median values as well as the 10%percent1010\%10 % and 90%percent9090\%90 % percentiles of the distributions change with the covariates. In contrast to age and ApoE ϵitalic-ϵ\epsilonitalic_ϵ4, Table 5 shows no evidence for an effect of sex and educational level on the two ratio outcomes. These results are in full agreement with the findings by Berger et al. (2019), who fitted an eGB2 model with amyloid-β𝛽\betaitalic_β 42/40 outcome to the DCN study data.

Table 5: Analysis of the amyloid-β𝛽\betaitalic_β 42/40 ratios (left) and the amyloid-β𝛽\betaitalic_β 42/total tau ratios (right) in the DCN study data. The table presents the coefficient estimates with 95%percent9595\%95 % credible intervals (calculated by the procedure described in Section 2.3), as obtained from fitting FCGAM models with constant association parameter θ𝜃\thetaitalic_θ.
amyloid-β𝛽\boldsymbol{\beta}bold_italic_β 42/40 amyloid-β𝛽\boldsymbol{\beta}bold_italic_β 42/total tau
Parameter Covariate β^^𝛽\hat{\beta}over^ start_ARG italic_β end_ARG 95%percent9595\%95 % CI β^^𝛽\hat{\beta}over^ start_ARG italic_β end_ARG 95%percent9595\%95 % CI
ΛΛ\Lambdaroman_Λ Age (years) 0.00890.00890.00890.0089 [0.0041;0.0137]0.00410.0137[0.0041;0.0137][ 0.0041 ; 0.0137 ] 0.02560.02560.02560.0256 [0.0156;0.0377]0.01560.0377[0.0156;0.0377][ 0.0156 ; 0.0377 ]
Education (years) 0.00170.0017-0.0017- 0.0017 [0.0152;0.0117]0.01520.0117[-0.0152;0.0117][ - 0.0152 ; 0.0117 ] 0.01500.0150-0.0150- 0.0150 [0.0461;0.0168]0.04610.0168[-0.0461;0.0168][ - 0.0461 ; 0.0168 ]
Sex (male) . . . .
Sex (female) 0.07240.07240.07240.0724 [0.0073;0.1547]0.00730.1547[-0.0073;0.1547][ - 0.0073 ; 0.1547 ] 0.02830.02830.02830.0283 [0.1544;0.2146]0.15440.2146[-0.1544;0.2146][ - 0.1544 ; 0.2146 ]
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (no) . . . .
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (yes) 0.19670.19670.19670.1967 [0.1157;0.2766]0.11570.2766[0.1157;0.2766][ 0.1157 ; 0.2766 ] 0.37860.37860.37860.3786 [0.1965;0.5584]0.19650.5584[0.1965;0.5584][ 0.1965 ; 0.5584 ]
θ𝜃\thetaitalic_θ 3.53253.53253.53253.5325 [2.7671;4.2888]2.76714.2888[2.7671;4.2888][ 2.7671 ; 4.2888 ] 2.06112.0611-2.0611- 2.0611 [2.8064;1.3073]2.80641.3073[-2.8064;-1.3073][ - 2.8064 ; - 1.3073 ]
δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT 6.05866.05866.05866.0586 [5.1169;7.0218]5.11697.0218[5.1169;7.0218][ 5.1169 ; 7.0218 ] 5.80905.80905.80905.8090 [4.9062;6.7092]4.90626.7092[4.9062;6.7092][ 4.9062 ; 6.7092 ]
δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT 10.015110.015110.015110.0151 [8.5182;11.5039]8.518211.5039[8.5182;11.5039][ 8.5182 ; 11.5039 ] 2.67182.67182.67182.6718 [2.2900;3.0542]2.29003.0542[2.2900;3.0542][ 2.2900 ; 3.0542 ]

5 Discussion

The main contribution of this work is a copula-based regression model that relates the ratio of two gamma distributed components to a set of covariates. Our model is primarily designed for the analysis of ratio outcomes in medical research, which is an important task, for instance, in neurology (Novellino et al., 2021), infectiology (Caby et al., 2016) and pharmacology (Cawley et al., 2022). Importantly, when biomarker ratios are used as clinical metrics or indicators of clinical outcomes, our model may be used to relate the respective ratio values to a set of risk factors and/or confounding variables. A prototypical example is given by the prognosis of AD progression considering ratios of amyloid-β𝛽\betaitalic_β and total tau protein biomarkers, as presented in Section 4 of this paper.

Conceptually, the FCGAM model developed in this paper has the following advantages: First, by assuming the ratio components to follow marginal gamma distributions, the FCGAM model represents the two biomarkers by real-valued random variables with positive support and right-skewed (marginal) distributions. These distributional characteristics, which are common to most biomarkers encountered in medical research, are directly incorporated in the definition of the proposed copula model. As a consequence, the resulting ratio density incorporates the full information contained in the marginal densities of the components of the ratio. We emphasize that this property does not apply to simpler modeling approaches approximating the ratio by a single log-normal or gamma-distributed variable. In fact, without consideration of the paired components themselves, these approximations inevitably bear the risk of a loss of information (“neglected companion”; Kerkhof et al., 2019). This issue has been demonstrated by the results of our simulation study (Section 3), which resulted in an increased estimation accuracy of the proposed copula-based approach in all data-generating scenarios. We also stress that linking the two marginal distributions by a copula does in general not restrict our model to the use of two gamma distributions for the ratio components. In fact, although our model can be seen as the most relevant use case in many medical applications, the marginal distributions can in principle be replaced by arbitrary parametric distributions. For instance, our model can in a straightforward manner be extended to situations where one biomarker is discrete or ordinal.

Second, the proposed FCGAM model has a high flexibility regarding the direction of the association between the two ratio components. Importantly, by the choice of Frank copula, the FCGAM model allows for both positive and negative values of the (rank) correlation between the components U𝑈Uitalic_U and V𝑉Vitalic_V, thereby improving previous modeling approaches that restricted this correlation to be zero (Yee, 2015) or positive (Berger et al., 2019). As demonstrated in the simulation study in Section 3, the FCGAM model indeed performs better in terms of estimation accuracy when the association between U𝑈Uitalic_U and V𝑉Vitalic_V is negative. On the other hand, it does not perform worse than the aforementioned approach when the association between U𝑈Uitalic_U and V𝑉Vitalic_V is positive.

Third, although the proposed model incorporates the full information contained in the marginal densities fUsubscript𝑓𝑈f_{U}italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT and fVsubscript𝑓𝑉f_{V}italic_f start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT, it provides a rather simple interpretation of the associations between the ratio U/V𝑈𝑉U/Vitalic_U / italic_V and the covariates. This is because the FCGAM model reduces the original five-parameter set (λU,δU,λV,δV,θ)superscriptsubscript𝜆𝑈subscript𝛿𝑈subscript𝜆𝑉subscript𝛿𝑉𝜃top(\lambda_{U},\delta_{U},\lambda_{V},\delta_{V},\theta)^{\top}( italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_θ ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT (including all parameters of the marginal densities and the association parameter θ𝜃\thetaitalic_θ) to the restricted set (Λ,δU,δV,θ)superscriptΛsubscript𝛿𝑈subscript𝛿𝑉𝜃top(\Lambda,\delta_{U},\delta_{V},\theta)^{\top}( roman_Λ , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_θ ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT with Λ=λU/λVΛsubscript𝜆𝑈subscript𝜆𝑉\Lambda=\lambda_{U}/\lambda_{V}roman_Λ = italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT / italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT. As a consequence, when treating δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT, δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT (and possibly also θ𝜃\thetaitalic_θ) as nuisance parameters, the associations between U/V𝑈𝑉U/Vitalic_U / italic_V and each of the covariates can be investigated using one-dimensional coefficient estimates and single-parameter hypothesis tests. Similarly, the association between the components U𝑈Uitalic_U and V𝑉Vitalic_V has a natural interpretation in terms of Kendall’s rank correlation, being related to θ𝜃\thetaitalic_θ by the one-to-one relationship given in Equation (3).

Finally, beside the flexibility in specifying other marginal distributions than the gamma distribution, the FCGAM model may be extended in many other ways. For example, Frank copula could be replaced by other copulas (noting that the results on ratio densities are also valid for other absolutely continuous copulas; see Ly et al., 2019). When there is particular interest in the tail dependencies of U𝑈Uitalic_U and V𝑉Vitalic_V, benchmark experiments to identify the best fitting copula and/or marginal distributions could be performed using resampling techniques (e.g. bootstrapping or subsampling). It should be noted, however, that other copulas from the literature might be less flexible regarding the range of θ𝜃\thetaitalic_θ (and thus also the range of possible associations between the components U𝑈Uitalic_U and V𝑉Vitalic_V; see e.g. Ghosh et al., 2022, for a recent overview of copulas allowing for modeling negative dependence). For example, it is not possible to model negative associations between U𝑈Uitalic_U and V𝑉Vitalic_V using non-rotated Gumbel or Joe copulas.

Despite our biostatistical focus, the proposed FCGAM methodology is a general statistical modeling approach that can in principle be used in any research discipline. Interesting areas are e.g. environmental research (Perri and Porporato, 2022) and information engineering (Mekić et al., 2012).

6 Appendices

It contains the proof of Proposition 2 (Appendix A), additional simulation results (Appendix B) as well as further results of the analysis of the DCN study data (Appendix C).

7 Acknowledgments

Moritz Berger acknowledges support by the grant BE 7543/1-1 of the German research foundation (DFG). Nadja Klein acknowledges support by the Emmy Noether grant KL 3037/1-1 of the DFG. The analysis of the DCN study data was supported by the German Federal Ministry of Education and Research (Kompetenznetz Demenzen, grant 01GI0420)

References

  • Berger and Schmid (2020) M. Berger and M. Schmid. Flexible modeling of ratio outcomes in clinical and epidemiological research. Statistical Methods in Medical Research, 29:2250–2268, 2020.
  • Berger et al. (2019) M. Berger, M. Wagner, and M. Schmid. Modeling biomarker ratios with gamma distributed components. The Annals of Applied Statistics, 13:548–572, 2019.
  • Bourguignon et al. (2021) M. Bourguignon, M. Santos-Neto, and M. de Castro. A new regression model for positive random variables with skewed and long tail. Metron, 79:33–55, 2021.
  • Burnham and Anderson (2002) K. P. Burnham and D. A. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, New York, 2nd edition, 2002.
  • Caby et al. (2016) F. Caby, A. Guihot, S. Lambert-Niclot, M. Guiguet, D. Boutolleau, R. Agher, M.-A. Valantin, R. Tubiana, V. Calvez, A.-G. Marcelin, B. Autran, D. Costagliola, and C. Katlama. Determinants of a low CD4/CD8 ratio in HIV-1–infected individuals despite long-term viral suppression. Clinical Infectious Diseases, 62:1297–1303, 2016.
  • Cawley et al. (2022) A. Cawley, B. Keen, K. Tou, M. Elbourne, and J. Keledjian. Biomarker ratios. Drug Testing and Analysis, 14:983–990, 2022.
  • dos Santos et al. (2023) K. H. dos Santos, T. L. Pereira, T. C. de Souza, and M. Bourguignon. A misspecification test for beta prime regression models. Communications in Statistics – Simulation and Computation, 52:4561–4574, 2023.
  • Fahrmeir et al. (2022) L. Fahrmeir, T. Kneib, S. Lang, and B. D. Marx. Regression: Models, Methods and Applications. Springer, Berlin, Heidelberg, 2nd edition, 2022.
  • Firth (1988) D. Firth. Multiplicative errors: Log-normal or gamma? Journal of the Royal Statistical Society, Series B, 50:266–268, 1988.
  • Genest (1987) C. Genest. Frank’s family of bivariate distributions. Biometrika, 74:549–555, 1987.
  • Ghosh et al. (2022) S. Ghosh, P. Bhuyan, and M. Finkelstein. On a bivariate copula for modeling negative dependence: Application to New York air quality data. Statistical Methods & Applications, 31:1329–1353, 2022.
  • Harry (2014) J. Harry. Dependence Modeling with Copulas. Chapman & Hall, New York, 2014.
  • Jack et al. (2015) C. R. Jack, H. J. Wiste, S. D. Weigand, D. S. Knopman, P. Vemuri, M. M. Mielke, V. Lowe, M. L. Senjem, J. L. Gunter, M. M. Machulda, B. E. Gregg, V. S. Pankratz, W. A. Rocca, and R. C. Petersen. Age, sex, and APOE ε𝜀\varepsilonitalic_ε4 effects on memory, brain structure, and β𝛽\betaitalic_β-amyloid across the adult life span. JAMA Neurology, 72:511–519, 2015.
  • Kerkhof et al. (2019) P. L. M. Kerkhof, R. A. Peace, and N. Handly. Ratiology and a complementary class of metrics for cardiovascular investigations. Physiology, 34:250–263, 2019.
  • Kibble (1941) W. F. Kibble. A two-variate gamma type distribution. Sankhya: The Indian Journal of Statistics, 5:137–150, 1941.
  • Kleiber and Kotz (2003) C. Kleiber and S. Kotz. Statistical Size Distributions in Economics and Actuarial Sciences. Wiley, Hoboken, 2003.
  • Kornhuber et al. (2009) J. Kornhuber, K. Schmidtke, L. Frölich, R. Perneczky, S. Wolf, H. Hampel, F. Jessen, I. Heuser, O. Peters, M. Weih, H. Jahn, C. Luckhaus, M. Hüll, H.-J. Gertz, J. Schröder, J. Pantel, O. Rienhoff, S. A. Seuchter, E. Rüther, F. Henn, W. Maier, and J. Wiltfang. Early and differential diagnosis of dementia and mild cognitive impairment: Design and cohort baseline characteristics of the German Dementia Competence Network. Dementia and Geriatric Cognitive Disorders, 27:404–417, 2009.
  • Koyama et al. (2012) A. Koyama, O. I. Okereke, T. Yang, D. Blacker, D. J. Selkoe, and F. Grodstein. Plasma amyloid-beta as a predictor of dementia and cognitive decline – a systematic review and meta-analysis. Archives of Neurology, 69:824–831, 2012.
  • Lee et al. (1979) R.-Y. Lee, B. S. Holland, and J. A. Flueck. Distribution of a ratio of correlated gamma random variables. SIAM Journal on Applied Mathematics, 36:304–320, 1979.
  • Long et al. (2016) Q. Long, X. Zhang, Y. Zhao, B. A. Johnson, and R. M. Bostick. Modeling clinical outcome using multiple correlated functional biomarkers: A Bayesian approach. Statistical Methods in Medical Research, 25:520–537, 2016.
  • Ly et al. (2019) S. Ly, K.-H. Pho, S. Ly, and W.-K. Wong. Determining distribution for the quotients of dependent and independent random variables by using copulas. Journal of Risk and Financial Management, 12:42, 2019.
  • Medeiros et al. (2023) F. M. C. Medeiros, M. C. Araújo, and M. Bourguignon. Improved estimators in beta prime regression models. Communications in Statistics – Simulation and Computation, 52:5125–5138, 2023.
  • Mekić et al. (2012) E. Mekić, M. Stefanović, P. Spalević, N. Sekulović, and A. Stanković. Statistical analysis of ratio of random variables and its application in performance analysis of multihop wireless transmissions. Mathematical Problems in Engineering, 2012:841092, 2012.
  • Mitchell et al. (2015) E. M. Mitchell, R. H. Lyles, and E. F. Schisterman. Positing, fitting, and selecting regression models for pooled biomarker data. Statistics in Medicine, 34:2544–2558, 2015.
  • Natarajan et al. (2003) S. Natarajan, H. Glick, M. Criqui, D. Horowitz, S. R. Lipsitz, and B. Kinosian. Cholesterol measures to identify and treat individuals at risk for coronary heart disease. American Journal of Preventive Medicine, 25:50–57, 2003.
  • Novellino et al. (2021) F. Novellino, A. Donato, N. Malara, J. L. M. Madrigal, and G. Donato. Complete blood cell count-derived ratios can be useful biomarkers for neurological diseases. International Journal of Immunopathology and Pharmacology, 35:20587384211048264, 2021.
  • Perri and Porporato (2022) S. Perri and A. Porporato. Environmental concentrations as ratios of random variables. Environmental Research Letters, 17:024011, 2022.
  • Rigby and Stasinopoulos (2005) R. A. Rigby and D. M. Stasinopoulos. Generalized additive models for location, scale and shape (with discussion). Journal of the Royal Statistical Society, Series C, 54:507–554, 2005.
  • Safari-Katesari and Zaroudi (2020) H. Safari-Katesari and S. Zaroudi. Count copula regression model using generalized beta distribution of the second kind. Statistics in Transition, New Series, 21:1–12, 2020.
  • Sottas et al. (2007) P.-E. Sottas, N. Baume, C. Saudan, C. Schweizer, M. Kamber, and M. Saugy. Bayesian detection of abnormal values in longitudinal biomarkers with an application to T/E ratio. Biostatistics, 8:285–296, 2007.
  • Sperling et al. (2013) R. A. Sperling, J. Karlawish, and K. A. Johnson. Preclinical Alzheimer disease – the challenges ahead. Nature Reviews Neurology, 9:54–58, 2013.
  • Tapiola et al. (2009) T. Tapiola, I. Alafuzoff, S.-K. Herukka, L. Parkkinen, P. Hartikainen, H. Soininen, and T. Pirttilä. Cerebrospinal fluid β𝛽\betaitalic_β-amyloid 42 and tau proteins as biomarkers of Alzheimer-type pathologic changes in the brain. Archives of Neurology, 66:382–389, 2009.
  • Tubbs (1986) J. D. Tubbs. Moments for a ratio of correlated gamma variates. Communications in Statistics – Theory and Methods, 15:251–259, 1986.
  • Tulupyev et al. (2013) A. Tulupyev, A. Suvorova, J. Sousa, and D. Zelterman. Beta prime regression with application to risky behavior frequency screening. Statistics in Medicine, 32:4044–4056, 2013.
  • Van Domelen et al. (2021) D. R. Van Domelen, E. M. Mitchell, N. J. Perkins, E. F. Schisterman, A. K. Manatunga, Y. Huang, and R. H. Lyles. Gamma models for estimating the odds ratio for a skewed biomarker measured in pools and subject to errors. Biostatistics, 22:250–265, 2021.
  • Wiens (1999) B. L. Wiens. When log-normal and gamma models give different results: A case study. The American Statistician, 53:89–93, 1999.
  • Wintergerst et al. (2022) M. W. M. Wintergerst, N. R. Merten, M. Berger, C. Dysli, J. H. Terheyden, E. Poletti, F. G. Holz, V. S. Schäfer, M. Schmid, T. Ach, and R. P. Finger. Spectrally resolved autofluorescence imaging in posterior uveitis. Scientific Reports, 12:14337, 2022.
  • Wood (2017) S. N. Wood. Generalized Additive Models: An Introduction with R. CRC press, Boca Raton, 2nd edition, 2017.
  • Xu et al. (2020) W. Xu, L. Tan, B.-J. Su, H. Yu, Y.-L. Bi, X.-F. Yue, Q. Dong, and J.-T. Yu. Sleep characteristics and cerebrospinal fluid biomarkers of Alzheimer’s disease pathology in cognitively intact older adults: The CABLE study. Alzheimer’s & Dementia, 16:1146–1152, 2020.
  • Yee (2015) T. W. Yee. Vector Generalized Linear and Additive Models: With an Implementation in R. Springer, New York, 2015.

Appendix Appendix A Proof of Proposition 2

By Proposition 1, the PDF of the ratio R=U/V𝑅𝑈𝑉R=U/Vitalic_R = italic_U / italic_V is given by

fR(r;λU,λV,δU,δV,θ)=01subscript𝑓𝑅𝑟subscript𝜆𝑈subscript𝜆𝑉subscript𝛿𝑈subscript𝛿𝑉𝜃superscriptsubscript01\displaystyle f_{R}(r;\lambda_{U},\lambda_{V},\delta_{U},\delta_{V},\theta)=% \int_{0}^{1}italic_f start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_r ; italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_θ ) = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT cθ(FU(rFV1(s;λV,δV);λU,δU),s)subscript𝑐𝜃subscript𝐹𝑈𝑟superscriptsubscript𝐹𝑉1𝑠subscript𝜆𝑉subscript𝛿𝑉subscript𝜆𝑈subscript𝛿𝑈𝑠\displaystyle\,c_{\theta}\left(F_{U}(r\,F_{V}^{-1}(s;\lambda_{V},\delta_{V});% \lambda_{U},\delta_{U}),s\right)italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_F start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ; italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) ; italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) , italic_s )
×|FV1(s;λV,δV)|fU(rFV1(s;λV,δV),λU,δU)dsabsentsuperscriptsubscript𝐹𝑉1𝑠subscript𝜆𝑉subscript𝛿𝑉subscript𝑓𝑈𝑟superscriptsubscript𝐹𝑉1𝑠subscript𝜆𝑉subscript𝛿𝑉subscript𝜆𝑈subscript𝛿𝑈𝑑𝑠\displaystyle\times\,\left|F_{V}^{-1}(s;\lambda_{V},\delta_{V})\right|\,f_{U}(% rF_{V}^{-1}(s;\lambda_{V},\delta_{V}),\lambda_{U},\delta_{U})\,ds× | italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ; italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) | italic_f start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ; italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) , italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) italic_d italic_s
=01absentsuperscriptsubscript01\displaystyle=\int_{0}^{1}= ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT cθ(1Γ(δU)γ(δU,λUrFV1(s;λV,δV)),s)subscript𝑐𝜃1Γsubscript𝛿𝑈𝛾subscript𝛿𝑈subscript𝜆𝑈𝑟superscriptsubscript𝐹𝑉1𝑠subscript𝜆𝑉subscript𝛿𝑉𝑠\displaystyle c_{\theta}\left(\frac{1}{\Gamma(\delta_{U})}\,\gamma\left(\delta% _{U},\lambda_{U}\,r\,F_{V}^{-1}(s;\lambda_{V},\delta_{V})\right),s\right)italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG italic_γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ; italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) ) , italic_s )
×FV1(s;λV,δV)λUδUΓ(δU)(rFV1(s;λV,δV))δU1absentsuperscriptsubscript𝐹𝑉1𝑠subscript𝜆𝑉subscript𝛿𝑉superscriptsubscript𝜆𝑈subscript𝛿𝑈Γsubscript𝛿𝑈superscript𝑟superscriptsubscript𝐹𝑉1𝑠subscript𝜆𝑉subscript𝛿𝑉subscript𝛿𝑈1\displaystyle\times\,F_{V}^{-1}(s;\lambda_{V},\delta_{V})\ \frac{\lambda_{U}^{% \delta_{U}}}{\Gamma(\delta_{U})}\left(r\,F_{V}^{-1}(s;\lambda_{V},\delta_{V})% \right)^{\delta_{U}-1}× italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ; italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) divide start_ARG italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG ( italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ; italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT
×exp(λUrFV1(s;λV,δV))dsabsentsubscript𝜆𝑈𝑟superscriptsubscript𝐹𝑉1𝑠subscript𝜆𝑉subscript𝛿𝑉𝑑𝑠\displaystyle\times\,\exp\left(-\lambda_{U}\,r\,F_{V}^{-1}(s;\lambda_{V},% \delta_{V})\right)\,ds× roman_exp ( - italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT italic_r italic_F start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_s ; italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) ) italic_d italic_s
=01absentsuperscriptsubscript01\displaystyle=\int_{0}^{1}= ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT cθ(1Γ(δU)γ(δU,λUrγ1(δV,Γ(δV)s)λV),s)subscript𝑐𝜃1Γsubscript𝛿𝑈𝛾subscript𝛿𝑈subscript𝜆𝑈𝑟superscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠subscript𝜆𝑉𝑠\displaystyle c_{\theta}\left(\frac{1}{\Gamma(\delta_{U})}\,\gamma\left(\delta% _{U},\lambda_{U}\,r\ \frac{\gamma^{-1}\left(\delta_{V},\Gamma(\delta_{V})\,s% \right)}{\lambda_{V}}\right),s\right)italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG italic_γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT italic_r divide start_ARG italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) end_ARG start_ARG italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT end_ARG ) , italic_s )
×λUδUΓ(δU)(γ1(δV,Γ(δV)s)λV)δUrδU1absentsuperscriptsubscript𝜆𝑈subscript𝛿𝑈Γsubscript𝛿𝑈superscriptsuperscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠subscript𝜆𝑉subscript𝛿𝑈superscript𝑟subscript𝛿𝑈1\displaystyle\times\,\frac{\lambda_{U}^{\delta_{U}}}{\Gamma(\delta_{U})}\left(% \frac{\gamma^{-1}\left(\delta_{V},\Gamma(\delta_{V})\,s\right)}{\lambda_{V}}% \right)^{\delta_{U}}r^{\delta_{U}-1}× divide start_ARG italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG ( divide start_ARG italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) end_ARG start_ARG italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT end_ARG ) start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_r start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT
×exp(λUrγ1(δV,Γ(δV)s)λV)dsabsentsubscript𝜆𝑈𝑟superscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠subscript𝜆𝑉𝑑𝑠\displaystyle\times\,\exp\left(-\lambda_{U}\,r\ \frac{\gamma^{-1}\left(\delta_% {V},\Gamma(\delta_{V})\,s\right)}{\lambda_{V}}\right)\,ds× roman_exp ( - italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT italic_r divide start_ARG italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) end_ARG start_ARG italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT end_ARG ) italic_d italic_s
=Λ=λU/λV01Λsubscript𝜆𝑈subscript𝜆𝑉superscriptsubscript01\displaystyle\overset{\Lambda=\lambda_{U}/\lambda_{V}}{=}\int_{0}^{1}start_OVERACCENT roman_Λ = italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT / italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT end_OVERACCENT start_ARG = end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT cθ(1Γ(δU)γ(δU,rΛγ1(δV,Γ(δV)s)),s)subscript𝑐𝜃1Γsubscript𝛿𝑈𝛾subscript𝛿𝑈𝑟Λsuperscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠𝑠\displaystyle{c_{\theta}\left(\frac{1}{\Gamma(\delta_{U})}\,\gamma\left(\delta% _{U},r\,\Lambda\,\gamma^{-1}\left(\delta_{V},\Gamma(\delta_{V})s\right)\right)% ,s\right)}italic_c start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG italic_γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_r roman_Λ italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) ) , italic_s )
×ΛδUr(δU1)Γ(δU)(γ1(δV,Γ(δV)s))δUabsentsuperscriptΛsubscript𝛿𝑈superscript𝑟subscript𝛿𝑈1Γsubscript𝛿𝑈superscriptsuperscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠subscript𝛿𝑈\displaystyle\times\,\frac{\Lambda^{\delta_{U}}\,r^{(\delta_{U}-1)}}{\Gamma(% \delta_{U})}\,\left(\gamma^{-1}\left(\delta_{V},\Gamma(\delta_{V})s\right)% \right)^{\delta_{U}}× divide start_ARG roman_Λ start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_r start_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT - 1 ) end_POSTSUPERSCRIPT end_ARG start_ARG roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT ) end_ARG ( italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) ) start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT end_POSTSUPERSCRIPT
×exp(rΛγ1(δV,Γ(δV)s))dsabsent𝑟Λsuperscript𝛾1subscript𝛿𝑉Γsubscript𝛿𝑉𝑠𝑑𝑠\displaystyle\times\,\exp\left(-r\,\Lambda\,\gamma^{-1}\left(\delta_{V},\Gamma% (\delta_{V})s\right)\right)\,ds× roman_exp ( - italic_r roman_Λ italic_γ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , roman_Γ ( italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ) italic_s ) ) italic_d italic_s
=fRabsentsubscript𝑓𝑅\displaystyle=f_{R}= italic_f start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT (r;Λ,δU,δV,θ),𝑟Λsubscript𝛿𝑈subscript𝛿𝑉𝜃\displaystyle(r;\Lambda,\delta_{U},\delta_{V},\theta)\,,( italic_r ; roman_Λ , italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT , italic_θ ) ,

where γ(,)𝛾\gamma(\cdot,\cdot)italic_γ ( ⋅ , ⋅ ) denotes the lower incomplete gamma function. \square

Appendix Appendix B Further Simulation Results

Refer to caption
Figure S1: Point estimates of the FCGAM coefficients in Simulation Study 1. The boxplots visualize the MLEs of the coefficients βV1=0.2subscript𝛽𝑉10.2\beta_{V1}=-0.2italic_β start_POSTSUBSCRIPT italic_V 1 end_POSTSUBSCRIPT = - 0.2, βV2=0.2subscript𝛽𝑉20.2\beta_{V2}=0.2italic_β start_POSTSUBSCRIPT italic_V 2 end_POSTSUBSCRIPT = 0.2, βV3=0.4subscript𝛽𝑉30.4\beta_{V3}=-0.4italic_β start_POSTSUBSCRIPT italic_V 3 end_POSTSUBSCRIPT = - 0.4 and βV4=0.4subscript𝛽𝑉40.4\beta_{V4}=0.4italic_β start_POSTSUBSCRIPT italic_V 4 end_POSTSUBSCRIPT = 0.4 that were obtained from fitting the FCGAM model to 1000 data sets of size n𝑛nitalic_n each. The red lines refer to the true values of the coefficients.
Refer to caption
Figure S2: Point estimates of the FCGAM coefficients in Simulation Study 2. The boxplots visualize the MLEs of the coefficients βU1=0.4subscript𝛽𝑈10.4\beta_{U1}=0.4italic_β start_POSTSUBSCRIPT italic_U 1 end_POSTSUBSCRIPT = 0.4, βU2=0.4subscript𝛽𝑈20.4\beta_{U2}=-0.4italic_β start_POSTSUBSCRIPT italic_U 2 end_POSTSUBSCRIPT = - 0.4, βU3=0.2subscript𝛽𝑈30.2\beta_{U3}=0.2italic_β start_POSTSUBSCRIPT italic_U 3 end_POSTSUBSCRIPT = 0.2 and βU4=0.2subscript𝛽𝑈40.2\beta_{U4}=-0.2italic_β start_POSTSUBSCRIPT italic_U 4 end_POSTSUBSCRIPT = - 0.2 that were obtained from fitting the FCGAM model to 1000 data sets of size n𝑛nitalic_n each. The red lines refer to the true values of the coefficients.
Refer to caption
Figure S3: Point estimates of the FCGAM coefficients in Simulation Study 2. The boxplots visualize the MLEs of the coefficients βV1=0.2subscript𝛽𝑉10.2\beta_{V1}=0.2italic_β start_POSTSUBSCRIPT italic_V 1 end_POSTSUBSCRIPT = 0.2, βV2=0.2subscript𝛽𝑉20.2\beta_{V2}=-0.2italic_β start_POSTSUBSCRIPT italic_V 2 end_POSTSUBSCRIPT = - 0.2, βV3=0.4subscript𝛽𝑉30.4\beta_{V3}=0.4italic_β start_POSTSUBSCRIPT italic_V 3 end_POSTSUBSCRIPT = 0.4 and βV4=0.4subscript𝛽𝑉40.4\beta_{V4}=-0.4italic_β start_POSTSUBSCRIPT italic_V 4 end_POSTSUBSCRIPT = - 0.4 that were obtained from fitting the FCGAM model to 1000 data sets of size n𝑛nitalic_n each. The red lines refer to the true values of the coefficients.
Refer to caption
Figure S4: Comparison of the FCGAM model to alternative methods in Simulation Study 2. The boxplots visualize the predictive log-likelihood values obtained from the FCGAM model and from the benchmark methods (ii) to (vii). All models were fitted to 1000 independent data sets and evaluated on independently generated test data sets of the same size. In each panel, the dashed horizontal line indicates the median predictive log-likelihood value of the best performing method.

Appendix Appendix C Further Results of the Analysis of the DCN Study Data

Table S1: Analysis of the amyloid-β𝛽\betaitalic_β 42/40 ratios (left) and the amyloid-β𝛽\betaitalic_β 42/total tau ratios (right) in the DCN study data. The table presents the coefficient estimates with 95%percent9595\%95 % credible intervals (calculated by the procedure described in Section 2.3), as obtained from fitting FCGAM models with covariate-dependent association parameter θ𝜃\thetaitalic_θ.
amyloid-β𝛽\boldsymbol{\beta}bold_italic_β 42/40 amyloid-β𝛽\boldsymbol{\beta}bold_italic_β 42/total tau
Parameter Covariate β^^𝛽\hat{\beta}over^ start_ARG italic_β end_ARG 95%percent9595\%95 % CI β^^𝛽\hat{\beta}over^ start_ARG italic_β end_ARG 95%percent9595\%95 % CI
λUsubscript𝜆𝑈\lambda_{U}italic_λ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT Age (years) 0.00730.00730.00730.0073 [0.0019;0.0127]0.00190.0127[0.0019;0.0127][ 0.0019 ; 0.0127 ] 0.00920.00920.00920.0092 [0.0037;0.0148]0.00370.0148[0.0037;0.0148][ 0.0037 ; 0.0148 ]
Education (years) 0.00790.00790.00790.0079 [0.0071;0.0224]0.00710.0224[-0.0071;0.0224][ - 0.0071 ; 0.0224 ] 0.00350.00350.00350.0035 [0.0123;0.0192]0.01230.0192[-0.0123;0.0192][ - 0.0123 ; 0.0192 ]
Sex (male) . . . .
Sex (female) 0.00170.00170.00170.0017 [0.0905;0.0933]0.09050.0933[-0.0905;0.0933][ - 0.0905 ; 0.0933 ] 0.02470.0247-0.0247- 0.0247 [0.1178;0.0680]0.11780.0680[-0.1178;0.0680][ - 0.1178 ; 0.0680 ]
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (no) . . . .
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (yes) 0.17740.17740.17740.1774 [0.0871;0.2681]0.08710.2681[0.0871;0.2681][ 0.0871 ; 0.2681 ] 0.24110.24110.24110.2411 [0.1503;0.3325]0.15030.3325[0.1503;0.3325][ 0.1503 ; 0.3325 ]
λVsubscript𝜆𝑉\lambda_{V}italic_λ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT Age (years) 0.00150.0015-0.0015- 0.0015 [0.0056;0.0025]0.00560.0025[-0.0056;0.0025][ - 0.0056 ; 0.0025 ] 0.01600.0160-0.0160- 0.0160 [0.0242;0.0079]0.02420.0079[-0.0242;-0.0079][ - 0.0242 ; - 0.0079 ]
Education (years) 0.01000.01000.01000.0100 [0.0013;0.0212]0.00130.0212[-0.0013;0.0212][ - 0.0013 ; 0.0212 ] 0.01660.01660.01660.0166 [0.0071;0.0406]0.00710.0406[-0.0071;0.0406][ - 0.0071 ; 0.0406 ]
Sex (male) . . . .
Sex (female) 0.07510.0751-0.0751- 0.0751 [0.1460;0.0061]0.14600.0061[-0.1460;-0.0061][ - 0.1460 ; - 0.0061 ] 0.07460.0746-0.0746- 0.0746 [0.2127;0.0663]0.21270.0663[-0.2127;0.0663][ - 0.2127 ; 0.0663 ]
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (no) . . . .
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (yes) 0.01410.0141-0.0141- 0.0141 [0.0811;0.0526]0.08110.0526[-0.0811;0.0526][ - 0.0811 ; 0.0526 ] 0.14060.1406-0.1406- 0.1406 [0.2745;0.0065]0.27450.0065[-0.2745;-0.0065][ - 0.2745 ; - 0.0065 ]
θ𝜃\thetaitalic_θ Age (years) 0.03690.03690.03690.0369 [0.0645;0.1347]0.06450.1347[-0.0645;0.1347][ - 0.0645 ; 0.1347 ] 0.06370.06370.06370.0637 [0.0356;0.1640]0.03560.1640[-0.0356;0.1640][ - 0.0356 ; 0.1640 ]
Education (years) 0.12470.12470.12470.1247 [0.1453;0.3928]0.14530.3928[-0.1453;0.3928][ - 0.1453 ; 0.3928 ] 0.01650.01650.01650.0165 [0.2416;0.2750]0.24160.2750[-0.2416;0.2750][ - 0.2416 ; 0.2750 ]
Sex (male) . . . .
Sex (female) 0.56790.5679-0.5679- 0.5679 [2.1432;1.0406]2.14321.0406[-2.1432;1.0406][ - 2.1432 ; 1.0406 ] 0.93240.9324-0.9324- 0.9324 [2.4944;0.6044]2.49440.6044[-2.4944;0.6044][ - 2.4944 ; 0.6044 ]
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (no) . . . .
ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 (yes) 0.56100.56100.56100.5610 [0.9635;2.0978]0.96352.0978[-0.9635;2.0978][ - 0.9635 ; 2.0978 ] 1.28971.2897-1.2897- 1.2897 [2.8033;0.1928]2.80330.1928[-2.8033;0.1928][ - 2.8033 ; 0.1928 ]
δUsubscript𝛿𝑈\delta_{U}italic_δ start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT 5.78225.78225.78225.7822 [4.9088;5.6759]4.90885.6759[4.9088;5.6759][ 4.9088 ; 5.6759 ] 5.69005.69005.69005.6900 [4.8084;6.5605]4.80846.5605[4.8084;6.5605][ 4.8084 ; 6.5605 ]
δVsubscript𝛿𝑉\delta_{V}italic_δ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT 10.074010.074010.074010.0740 [8.4946;11.6237]8.494611.6237[8.4946;11.6237][ 8.4946 ; 11.6237 ] 2.60232.60232.60232.6023 [2.2232;2.9831]2.22322.9831[2.2232;2.9831][ 2.2232 ; 2.9831 ]
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure S5: Analysis of the amyloid-β𝛽\betaitalic_β 42/40 ratios in the DCN study data. The black lines refer to the estimated PDFs for a covariate profile of a randomly selected study participant (60 years of age, Sex = male, Education = 8 years, ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 = no). The gray lines refer to a situation where the participant would have been 70 years of age (a), would have been female (b), would have had 18 years of education (c), and would have been a carrier of the ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 allele (d). The vertical lines correspond to the estimated mean values (solid), median values (dashed) and 10%percent1010\%10 % and 90%percent9090\%90 % percentiles (dotted).
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure S6: Analysis of the amyloid-β𝛽\betaitalic_β 42/total tau ratios in the DCN study data. The black lines refer to the estimated PDFs for a covariate profile of a randomly selected study participant (60 years of age, Sex = male, Education = 8 years, ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 = no). The gray lines refer to a situation where the participant would have been 70 years of age (a), would have been female (b), would have had 18 years of education (c), and would have been a carrier of the ApoE ϵitalic-ϵ\epsilonitalic_ϵ4 allele (d). The vertical lines correspond to the estimated mean values (solid), median values (dashed) and 10%percent1010\%10 % and 90%percent9090\%90 % percentiles (dotted).