SSRN Id4377891

Methods for estimating Salmonella concentration in chilled chicken using highly left-censored contamination data

Tianmei Sun a,1, Yangtai Liu a,1, Shufei Gao b, Xiaojie Qin a, Zijie Lin a, Xin Dou a, Xiang Wang a, Hui Zhang c, Qingli Dong a,*

a School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
b College of Science, University of Shanghai for Science and Technology, Shanghai, China
c Jiangsu Academy of Agricultural Sciences, Nanjing, China
1 These authors contributed equally to this work

* Corresponding author: Qingli Dong
Mailing address: School of Health Science and Engineering, University of Shanghai for Science and Technology, 516 Jun Gong Rd., Shanghai 200093, China.
Tel.: +86-21-5527-1117
Email: [email protected]

This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=4377891

Abstract

Salmonella is a common chicken-borne pathogen that causes human infections. Data below the detection limit, referred to as left-censored data, are frequently encountered in pathogen detection. The approach used to handle censored data is considered to affect the accuracy of the estimated microbial concentration. In this study, a set of Salmonella contamination data was collected from chilled chicken samples using the most probable number (MPN) method; 90.42% (217/240) of the values were non-detects. Two simulated datasets with fixed censoring degrees of 73.60% and 90.00% were generated based on the real-sampling Salmonella dataset for comparison. Three methodologies were applied for handling left-censored data: (i) substitution with different alternatives, (ii) the distribution-based maximum likelihood estimation (MLE) method, and (iii) the multiple imputation (MI) method. For each dataset, the negative binomial (NB) distribution-based MLE and the zero-modified NB distribution-based MLE were preferable for highly censored data and resulted in the lowest root mean square error (RMSE). Replacing the censored data with half the limit of quantification was the next best method. The mean concentration of Salmonella in the monitoring data estimated by the NB-MLE and zero-modified NB-MLE methods was 0.68 MPN/g. This study provides a practical statistical method for handling highly left-censored bacterial data.

Keywords: Foodborne pathogen; Limit of detection; Count distribution; Maximum likelihood estimation; Multiple imputation method

1. Introduction

Foodborne disease caused by Salmonella is an important public health concern worldwide (Scallan et al., 2011). Salmonella can be transmitted among animals, foods, and humans, and can survive in various environments (Batz et al., 2012). Among the risk sources, contaminated chicken and chicken-associated products are the most common triggers of human salmonellosis, which may lead to gastroenteritis, enteric fever, typhoid fever, and bacteremia syndromes (Foley et al., 2013; Andrews-Polymenis et al., 2010).

To effectively evaluate and control the effect of Salmonella on public health, the risk to consumers of chicken products is usually estimated by a quantitative microbiological risk assessment (QMRA) approach (Pouillot et al., 2012; Khalid et al., 2020; Xiao et al., 2021). A complete QMRA consists of four stages: (a) hazard identification, (b) hazard characterization, (c) exposure assessment, and (d) risk characterization (Membre & Boue, 2018). Among these stages, exposure assessment depends on microbiological data that characterize the detection and concentration of pathogens in food samples (Danyluk & Schaffner, 2011; Pouillot et al., 2013).

One of the challenges of monitoring pathogens in food samples is that available quantitative data are limited because analyses are labor-intensive, and they are poorly informative due to a high proportion of non-detects (i.e., a test has been performed and the pathogen was not detected) (Commeau et al., 2012). In statistical terms, observations below the limit of quantification (LOQ) are called left-censored data. As defined by Canales et al. (2018), the censoring degree of microbial data can be divided into four levels: low (10%), medium (35%), high (65%), and severe (90%). Ignoring these left-censored data may introduce bias into the final concentration estimate. In practice, the seasonal and monthly prevalence of Salmonella in chicken samples varies greatly, and the concentrations are relatively low (Ta et al., 2014; Zhu et al., 2014). At the retail level, contamination often occurs in clusters because initial contaminants multiply into colonies, biofilms are disrupted, or pathogens grow locally in non-liquid foods, which results in highly censored concentration data (Jongenburger et al., 2012a; Jongenburger et al., 2012b).

In recent years, there has been considerable development of, and interest in, methods for left-censored data in environmental and health-related research (Helsel, 2010; Krol et al., 2016; Petterson et al., 2015). Substituting non-detects with a particular value (e.g., zero, LOQ, LOQ/2, or LOQ/√2) has been a common approach (Shoari & Dube, 2018). However, it is well recognized that substitution methods may introduce unnecessary errors, especially when the dataset is highly censored (Helsel, 2010). Instead of substitution, methods based on maximum likelihood estimation (MLE) have been regarded as the “gold standard” for handling censored data (Ganser & Hewett, 2010). MLE requires the assumption that a specific distribution closely fits the shape of the data (Helsel, 2012). When pathogens are randomly distributed in food, a Poisson distribution (i.e., the variance is equal to the mean) is widely used as a standard framework for the analysis of observed count data (Mussida et al., 2013). However, in practice, some real-sampling counts exhibit over-dispersion or clustering (i.e., the variance is greater than the mean) (Sun et al., 2020). Under these circumstances, it is necessary to use more flexible models such as heterogeneous Poisson models and zero-modified models (Hinde & Demetrio, 1998; Gonzales-Barron et al., 2010). The most commonly used heterogeneous Poisson distribution is the negative binomial, which arises as a mixture of Poisson and gamma distributions (Gonzales-Barron & Butler, 2011). Zero-modified models are mainly divided into zero-inflated models and hurdle models, both of which use a logit model with binomial assumptions to deal with non-detects and observed counts (Mullahy, 1986; Gonzales-Barron et al., 2010). Among the more principled methods for dealing with left-censored data, multiple imputation (MI) has emerged as a popular approach due to its flexibility and ability to provide unbiased estimates (Rezvan et al., 2015; Sullivan et al., 2021). However, the performance of different MI methods remains to be explored for highly censored microbial data.

The objectives of the present study were (i) to collect concentration data for Salmonella in chilled chicken in Shanghai, (ii) to evaluate substitution, MLE, and MI methods for handling left-censored Salmonella data, and (iii) to precisely quantify the contamination level of Salmonella in chilled chicken at the retail level.

2. Material and methods

2.1. Data collection

Between October 2018 and September 2019, a total of 240 chilled chicken samples were randomly collected from retail markets in Shanghai, P. R. China. To account for seasonal variation in Salmonella growth characteristics, sampling was conducted in 12 batches of 20 samples each. Each sample was weighed, labelled, and placed in a separate sterile bag before being transported to the laboratory in an icebox within 2 h for microbiological testing.

Salmonella was quantified using the three-tube, three-dilution most probable number (MPN) method described by the Food Safety and Inspection Service of the United States Department of Agriculture (USDA/FSIS, 2014, 2022), with slight modification. Briefly, a 25 g homogenized sample unit was added to 225 mL of buffered peptone water (BPW). Then, three 10 mL aliquots of this mixture were taken directly from the bag and placed into three empty sterile tubes, and three 1 mL aliquots were transferred from these tubes to three further tubes containing 9 mL of BPW. A second 10-fold dilution in BPW was prepared from each of these tubes. All tubes were incubated at 37°C for 20~24 h. For selective enrichment, 0.1 ± 0.02 mL of pre-enriched culture was inoculated into 10 mL of tetrathionate brilliant green broth (TTB) and 10 mL of selenite cystine broth (SC), which were incubated for 22~24 h at 42°C and 37°C, respectively. A loopful of each TTB and SC broth culture was streaked onto xylose lysine tergitol 4 (XLT4) agar plates and incubated at 37°C for 22~24 h. Typical Salmonella colonies were selected from each plate and further confirmed. The MPN value was determined from the number of positive tubes in each of the three sets and the standard MPN table. Since the 10-fold serial dilution in this quantitative analysis ranged from 10⁻¹ to 10⁻³, the theoretical LOQ was determined to be 3 MPN/g according to the MPN calculation method derived by Jarvis et al. (2010). The contamination level in this real-sampling quantitative dataset was reported either as a number (e.g., 10 MPN/g) or as below the LOQ (< 3 MPN/g).
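
The link between tube patterns and the MPN value can be illustrated with a short Python sketch (the study itself read values from the standard MPN table and used R for its analyses). The sketch solves the usual MPN likelihood equation by bisection. The function name `mpn_per_gram` is hypothetical, and the sample masses of 0.1, 0.01 and 0.001 g per tube are an assumption chosen because they match a 10⁻¹ to 10⁻³ dilution series with a lowest tabulated value of about 3 MPN/g.

```python
import math

def mpn_per_gram(positives, tubes, grams):
    """Maximum-likelihood MPN (per gram) for a dilution series.

    positives[j] of tubes[j] tubes were positive at level j, and each
    tube at that level received grams[j] of sample (assumed masses).
    """
    if sum(positives) == 0:
        return 0.0  # no positive tube: the result is reported as < LOQ
    if all(p == t for p, t in zip(positives, tubes)):
        raise ValueError("all tubes positive: the MPN is unbounded")
    target = sum(t * g for t, g in zip(tubes, grams))

    def lhs(mpn):
        # sum_j p_j * g_j / (1 - exp(-mpn * g_j)); decreases as mpn grows
        return sum(p * g / -math.expm1(-mpn * g)
                   for p, g in zip(positives, grams) if p > 0)

    lo, hi = 1e-6, 1e6
    for _ in range(200):  # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if lhs(mid) > target:
            lo = mid  # trial MPN is too small
        else:
            hi = mid
    return math.sqrt(lo * hi)

grams = [0.1, 0.01, 0.001]  # assumed sample mass per tube at each level
low_a = mpn_per_gram([0, 1, 0], [3, 3, 3], grams)
low_b = mpn_per_gram([1, 0, 0], [3, 3, 3], grams)
```

Under these assumptions, the two weakest positive patterns give values close to the tabulated 3.0 and 3.6 MPN/g, so any sample with no positive tube can only be reported as below roughly 3 MPN/g.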

Two more datasets were simulated to represent the “true” distribution of Salmonella concentrations (C) in chilled chicken at retail. For each dataset, we assumed a uniform distribution:

$$C \sim \mathrm{Uniform}(a, b) \tag{1}$$

where $a$ is the theoretical LOQ of the MPN method and $b$ is the maximum value of the real-sampling Salmonella dataset. In a previous study, we estimated the pooled prevalence of Salmonella in Chinese retail raw chicken to be 26.4%, which means that approximately 73.6% of samples were censored (Sun et al., 2021). Additionally, we included a censoring rate similar to that of the real-sampling dataset as a control to examine the relationship between method selection and the censoring degree of the data. Accordingly, two copies of the simulated Salmonella dataset were altered by completely random censoring so that approximately 73.6% and 90% of the concentration data were below the theoretical LOQ. The methods for handling left-censored data were then applied to the real-sampling dataset (dataset R), the 73.6% censored dataset (dataset A), and the 90% censored dataset (dataset B), and the outcomes were compared.
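
A minimal Python sketch of this simulation step is given here for illustration only (the study used R). It assumes a = 3 MPN/g (the theoretical LOQ), b = 29 MPN/g (the largest value reported for the real-sampling dataset), and n = 240. The function name `simulate_censored`, the seed, and the use of `None` to mark a censored value are hypothetical choices.

```python
import random

def simulate_censored(n, loq, maximum, censor_fraction, seed=0):
    """Draw n 'true' concentrations from Uniform(loq, maximum), then mark a
    completely random subset of them as non-detects (None)."""
    rng = random.Random(seed)
    true_conc = [rng.uniform(loq, maximum) for _ in range(n)]
    n_censored = round(censor_fraction * n)
    censored_idx = set(rng.sample(range(n), n_censored))
    observed = [None if i in censored_idx else c
                for i, c in enumerate(true_conc)]
    return true_conc, observed

# Dataset A (about 73.6% censored) and dataset B (90% censored)
true_a, obs_a = simulate_censored(240, 3.0, 29.0, 0.736)
true_b, obs_b = simulate_censored(240, 3.0, 29.0, 0.90)
n_censored_a = sum(v is None for v in obs_a)
n_censored_b = sum(v is None for v in obs_b)
```

With n = 240, rounding the target fractions gives 177 and 216 censored values, which is why the realized censoring rate of dataset A is 73.75% rather than exactly 73.6%.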

2.2. Concentration estimation under left-censored data

2.2.1. Substitution methods

Substitution is the easiest method to implement and readily accommodates different substitution constants (Ganser & Hewett, 2010). In the three datasets of Salmonella contamination in chilled chicken samples, each non-detect (ND) was simply replaced with a fixed value (zero, LOQ, LOQ/2, or LOQ/√2), and conventional statistical analysis was then performed on each revised dataset. In particular, replacing NDs with zero was chosen as the baseline method of this study.
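
The four substitution rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the function name `substitute`, the rule labels, and the four-value example are hypothetical, and non-detects are represented as `None`.

```python
import math

def substitute(observed, loq, rule):
    """Replace each non-detect (None) with the constant named by `rule`."""
    constants = {
        "zero": 0.0,
        "loq/2": loq / 2,
        "loq/sqrt2": loq / math.sqrt(2),
        "loq": loq,
    }
    fill = constants[rule]
    return [fill if v is None else v for v in observed]

def mean(values):
    return sum(values) / len(values)

# Hypothetical sample: three non-detects and one quantified value (MPN/g)
sample = [None, None, None, 5.0]
means = {rule: mean(substitute(sample, 3.0, rule))
         for rule in ("zero", "loq/2", "loq/sqrt2", "loq")}
```

Because every non-detect receives the same constant, the estimated mean rises monotonically from the zero rule to the LOQ rule, and the more heavily censored the data, the more the result is driven by the chosen constant.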

2.2.2. The distribution-based MLE methods

The MLE method for left-censored data is based on a likelihood function L; for a distribution with two parameters (mean and variance), L(mean, variance) defines the likelihood of matching the observed data (Helsel, 2012). The Poisson, negative binomial, zero-modified Poisson, and zero-modified negative binomial distributions were fitted to the three datasets of Salmonella contamination levels in chicken samples.

(1) Poisson distribution

The index of dispersion (I), measured by the variance-to-mean ratio ($\sigma^2/\mu$), is used to describe the distribution of microorganisms in food (Mussida et al., 2013). In the Poisson distribution model, pathogen cells are considered to be randomly distributed in the samples ($I = 1$, i.e., the variance is equal to the mean). The single-parameter Poisson distribution models the number of microorganisms in its simplest form, in which the mean concentration ($\lambda$) is constant. The probability mass function of the Poisson distribution is given by:

$$\Pr(Y_i) = \frac{\exp(-\lambda)\,\lambda^{Y_i}}{Y_i!} \tag{2}$$

where $Y_i$ is the Salmonella count at observation $i$ ($i = 1, 2, 3, \ldots, n$) and $\lambda$ is the mean parameter of the Salmonella contamination levels.

To fit the Poisson distribution, the log-likelihood function (LL) is obtained as:

$$LL_{\mathrm{P}} = \sum_{i=1}^{n} \left[ -\lambda + Y_i \ln(\lambda) - \ln(Y_i!) \right] \tag{3}$$
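
As an illustration, Eq. (3) can be evaluated with the Python standard library, using `math.lgamma(y + 1)` for ln(y!). The counts below are hypothetical, and the function name `poisson_loglik` is not taken from the study, which used R.

```python
import math

def poisson_loglik(lam, counts):
    """Eq. (3): Poisson log-likelihood of fully observed counts."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1)
               for y in counts)

counts = [0, 0, 0, 1, 4]             # hypothetical counts
lam_hat = sum(counts) / len(counts)  # the Poisson MLE is the sample mean
ll_at_mle = poisson_loglik(lam_hat, counts)
```

Evaluating the function slightly below and above `lam_hat` gives a lower log-likelihood on both sides, which is a quick check that the sample mean maximizes Eq. (3).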

(2) Negative binomial distribution

When over-dispersion occurs in the microbial dataset (i.e., the variance is greater than the mean), the Poisson distribution is not suitable for describing the physical distribution of the microorganisms. Gonzales-Barron et al. (2012) assumed that the random variation of microbial concentration (cells/g) within the mass of a food batch follows a gamma distribution, while the number of microbial counts (cells) in the sample follows a Poisson distribution with the parameter ($\lambda$) given by the random gamma concentration. Thus, the Poisson distribution can be generalized by adding a gamma distribution to adjust for the heterogeneity in the count data. This modification yields a more flexible distribution, namely the discrete Poisson-gamma distribution, also known as the negative binomial (NB) distribution. The discrete probability of the Salmonella contamination level $Y_i$, sampled from a batch of chilled chicken, can then be described as:

$$\Pr(Y_i) = \frac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})} \left[ \frac{\alpha^{-1}}{\alpha^{-1} + \lambda} \right]^{\alpha^{-1}} \left[ \frac{\lambda}{\alpha^{-1} + \lambda} \right]^{Y_i} \tag{4}$$

where $\Gamma(\cdot)$ represents the gamma function; $Y_i$ is the number of Salmonella cells present in the $i$th sample; $\lambda$ is the mean parameter of the contamination levels; and $\alpha$ is the dispersion parameter.

For distribution fitting, the log-likelihood function of the NB distribution was calculated as:

$$LL_{\mathrm{NB}} = \sum_{i=1}^{n} \left\{ \ln\left[ \frac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})} \right] - (Y_i + \alpha^{-1}) \ln(1 + \alpha\lambda) + Y_i \ln(\alpha\lambda) \right\} \tag{5}$$
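
The following Python sketch evaluates the NB probability mass function of Eq. (4) on the log scale, in the same form as the summand of Eq. (5), and checks numerically that the probabilities sum to one with mean λ and variance λ + αλ². It is illustrative only: the value λ = 0.68 echoes the mean reported in this study, whereas α = 2.0 and the function names are hypothetical.

```python
import math

def nb_logpmf(y, lam, alpha):
    """Log of Eq. (4): NB with mean lam and dispersion alpha."""
    inv = 1.0 / alpha
    return (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
            - (y + inv) * math.log1p(alpha * lam)
            + y * math.log(alpha * lam))

def nb_loglik(lam, alpha, counts):
    """Eq. (5): sum of the log-probabilities over all counts."""
    return sum(nb_logpmf(y, lam, alpha) for y in counts)

lam, alpha = 0.68, 2.0  # alpha is a hypothetical dispersion value
probs = [math.exp(nb_logpmf(y, lam, alpha)) for y in range(500)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

The variance exceeds the mean for any α > 0, which is the over-dispersion that the single-parameter Poisson model cannot represent.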

(3) Zero-inflated Poisson distribution

When a high proportion of NDs appears in the microbial dataset, the variance may not be adequately accounted for by the Poisson or even the NB distribution (Hall, 2000). Zero-inflated distributions are used to model this excess of zeros. In a zero-inflated model, it is usually assumed that the pathogen counts originate from two groups: one group accounts for a portion of the NDs, and the other follows a standard count distribution (such as a Poisson or an NB distribution). If the standard count distribution is Poisson, the probability mass function of the zero-inflated Poisson (ZIP) distribution is given by:

$$\Pr(Y_i) = \begin{cases} p_0 + (1 - p_0)\exp(-\lambda), & Y_i < \mathrm{LOQ} \\ (1 - p_0)\dfrac{\exp(-\lambda)\,\lambda^{Y_i}}{Y_i!}, & Y_i \ge \mathrm{LOQ} \end{cases} \tag{6}$$

where $Y_i$ is the number of Salmonella cells present in the $i$th sample; $p_0$ is the probability of zero counts resulting from uncontaminated sample units; and $\lambda$ is the contamination level of Salmonella in the batch of chilled chicken samples.

To fit the ZIP distribution, the log-likelihood function was expressed as:

$$LL_{\mathrm{ZIP}} = \sum_{i=1}^{n} \left\{ I(Y_i < \mathrm{LOQ}) \ln\left[ p_0 + (1 - p_0)\exp(-\lambda) \right] + I(Y_i \ge \mathrm{LOQ}) \left[ \ln(1 - p_0) + Y_i \ln(\lambda) - \lambda - \ln(Y_i!) \right] \right\} \tag{7}$$
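
Eq. (7) translates almost line by line into code. The Python sketch below is a hedged illustration, not the study's R implementation: the function name `zip_loglik` is hypothetical, non-detects are assumed to be stored as any value below the LOQ (here 0), and the example data are invented.

```python
import math

def zip_loglik(p0, lam, counts, loq):
    """Eq. (7): ZIP log-likelihood with non-detects (y < loq) handled
    through the zero-probability term of Eq. (6)."""
    ll = 0.0
    for y in counts:
        if y < loq:   # non-detect branch
            ll += math.log(p0 + (1 - p0) * math.exp(-lam))
        else:         # quantified count branch
            ll += (math.log(1 - p0) + y * math.log(lam)
                   - lam - math.lgamma(y + 1))
    return ll

counts = [0, 0, 5]  # hypothetical: two non-detects and one count of 5
ll_zip = zip_loglik(0.5, 2.0, counts, loq=3)
ll_no_inflation = zip_loglik(0.0, 2.0, counts, loq=3)
```

Setting `p0` to zero removes the extra-zero component, so for data whose non-detects are true zeros the value reduces to the ordinary Poisson log-likelihood of Eq. (3).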

(4) Zero-inflated negative binomial distribution

In some cases, even if the NDs are adequately fitted, over-dispersion of the underlying count distribution may still exist (Wang & Hailemariam, 2018b). If the zero-inflated model fails to take this over-dispersion into account, it may lead to biased estimates. Replacing the Poisson terms in Eq. (6) with the NB yields the zero-inflated negative binomial (ZINB) distribution, whose probability mass function becomes:

$$\Pr(Y_i) = \begin{cases} p_0 + (1 - p_0) \left[ \dfrac{\alpha^{-1}}{\alpha^{-1} + \lambda} \right]^{\alpha^{-1}}, & Y_i < \mathrm{LOQ} \\ (1 - p_0) \dfrac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})} \left[ \dfrac{\alpha^{-1}}{\alpha^{-1} + \lambda} \right]^{\alpha^{-1}} \left[ \dfrac{\lambda}{\alpha^{-1} + \lambda} \right]^{Y_i}, & Y_i \ge \mathrm{LOQ} \end{cases} \tag{8}$$

where $\Gamma(\cdot)$ is the gamma function; $Y_i$ represents the number of Salmonella cells present in the $i$th sample; $p_0$ is the probability of zero counts resulting from uncontaminated sample units; and $\lambda$ is the contamination level of Salmonella in the batch of chilled chicken samples.

The log-likelihood function of the ZINB distribution was then computed as:

$$LL_{\mathrm{ZINB}} = \sum_{i=1}^{n} \left\{ I(Y_i < \mathrm{LOQ}) \ln\left[ p_0 + (1 - p_0) \left( \frac{\alpha^{-1}}{\alpha^{-1} + \lambda} \right)^{\alpha^{-1}} \right] + I(Y_i \ge \mathrm{LOQ}) \left[ \ln(1 - p_0) + \ln\Gamma(Y_i + \alpha^{-1}) - \ln\Gamma(Y_i + 1) - \ln\Gamma(\alpha^{-1}) + Y_i \ln(\alpha\lambda) - (Y_i + \alpha^{-1}) \ln(1 + \alpha\lambda) \right] \right\} \tag{9}$$

(5) Hurdle Poisson distribution

The hurdle model is another zero-modified model that can deal with a high percentage of NDs. The idea underlying the hurdle model is that a binomial probability distribution determines whether a microbial observation is an ND or a positive count (≥ LOQ). If the observation is positive, the ‘hurdle’ is crossed, and the conditional distribution of a positive count is governed by a truncated-at-zero distribution model (Mullahy, 1986). The hurdle Poisson (HP) model is the most popular hurdle model; it uses a truncated Poisson distribution to describe the positive counts in the microbial dataset. Thus, the probability mass function of the HP distribution is given by:

$$\Pr(Y_i) = \begin{cases} \omega_0, & Y_i < \mathrm{LOQ} \\ (1 - \omega_0) \dfrac{\exp(-\lambda)\,\lambda^{Y_i}}{\left(1 - \exp(-\lambda)\right) Y_i!}, & Y_i \ge \mathrm{LOQ} \end{cases} \tag{10}$$

where $Y_i$ is the number of Salmonella cells present in the $i$th sample; $\lambda$ is the concentration of Salmonella in the chilled chicken samples; and $\omega_0$ is the probability of zero counts arising from the negative samples.

The log-likelihood function for fitting the HP distribution to count data is:

$$LL_{\mathrm{HP}} = \sum_{i=1}^{n} \left\{ I(Y_i < \mathrm{LOQ}) \ln(\omega_0) + I(Y_i \ge \mathrm{LOQ}) \left[ \ln(1 - \omega_0) - \lambda + Y_i \ln(\lambda) - \ln\left(1 - \exp(-\lambda)\right) - \ln(Y_i!) \right] \right\} \tag{11}$$
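
A short Python sketch of Eq. (11) follows, for illustration only. The function name `hp_loglik` and the ten-value dataset are hypothetical, and non-detects are assumed to be stored as values below the LOQ. Because the hurdle part and the truncated-count part of the likelihood separate, the sketch also checks that the log-likelihood peaks in ω₀ at the observed proportion of non-detects.

```python
import math

def hp_loglik(omega0, lam, counts, loq):
    """Eq. (11): hurdle Poisson log-likelihood."""
    ll = 0.0
    for y in counts:
        if y < loq:   # non-detect: hurdle not crossed
            ll += math.log(omega0)
        else:         # positive count: zero-truncated Poisson
            ll += (math.log(1 - omega0) - lam + y * math.log(lam)
                   - math.log(-math.expm1(-lam)) - math.lgamma(y + 1))
    return ll

counts = [0] * 9 + [5]  # hypothetical: nine non-detects, one count of 5
omega_hat = sum(y < 3 for y in counts) / len(counts)  # proportion of NDs
ll_peak = hp_loglik(omega_hat, 4.0, counts, loq=3)
```

For any fixed λ, moving ω₀ away from the observed non-detect proportion (0.9 in this example) lowers the log-likelihood, so the two components can be estimated separately.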

(6) Hurdle negative binomial distribution

Similarly, the probability mass function of the hurdle negative binomial (HNB) distribution can be described mathematically as follows:

$$\Pr(Y_i) = \begin{cases} \omega_0, & Y_i < \mathrm{LOQ} \\ (1 - \omega_0) \dfrac{\dfrac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})} (1 + \alpha\lambda)^{-(Y_i + \alpha^{-1})} (\lambda\alpha)^{Y_i}}{1 - \left( \dfrac{\alpha^{-1}}{\alpha^{-1} + \lambda} \right)^{\alpha^{-1}}}, & Y_i \ge \mathrm{LOQ} \end{cases} \tag{12}$$

where $\Gamma(\cdot)$ represents the gamma function; $Y_i$ is the number of Salmonella cells present in the $i$th sample; $\lambda$ is the mean parameter of the contamination levels; and $\omega_0$ is the probability of zero counts arising from the negative samples.

The log-likelihood function of the HNB distribution to be fitted was given by:

$$LL_{\mathrm{HNB}} = \sum_{i=1}^{n} \left\{ I(Y_i < \mathrm{LOQ}) \ln(\omega_0) + I(Y_i \ge \mathrm{LOQ}) \left[ \ln(1 - \omega_0) + \ln\Gamma(Y_i + \alpha^{-1}) - \ln\left( 1 - \left( \frac{\alpha^{-1}}{\alpha^{-1} + \lambda} \right)^{\alpha^{-1}} \right) - \ln\Gamma(Y_i + 1) - \ln\Gamma(\alpha^{-1}) - (Y_i + \alpha^{-1}) \ln(1 + \alpha\lambda) + Y_i \ln(\lambda\alpha) \right] \right\} \tag{13}$$
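
The positive-count branch of Eq. (12) can be checked numerically with a brief Python sketch: summed over all counts from 1 upward, the zero-truncated NB term should contribute exactly 1 − ω₀. The parameter values and function names below are hypothetical, and the code is an illustration rather than the study's R implementation.

```python
import math

def hnb_logpmf_positive(y, omega0, lam, alpha):
    """Log of the second branch of Eq. (12): zero-truncated NB scaled by
    (1 - omega0)."""
    inv = 1.0 / alpha
    log_nb_zero = -inv * math.log1p(alpha * lam)  # ln Pr(NB count = 0)
    return (math.log(1 - omega0)
            + math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
            - (y + inv) * math.log1p(alpha * lam)
            + y * math.log(alpha * lam)
            - math.log(-math.expm1(log_nb_zero)))  # ln(1 - Pr(NB = 0))

def hnb_loglik(omega0, lam, alpha, counts, loq):
    """Eq. (13): hurdle NB log-likelihood."""
    return sum(math.log(omega0) if y < loq
               else hnb_logpmf_positive(y, omega0, lam, alpha)
               for y in counts)

omega0, lam, alpha = 0.9, 2.0, 1.5  # hypothetical parameter values
positive_mass = sum(math.exp(hnb_logpmf_positive(y, omega0, lam, alpha))
                    for y in range(1, 500))
```

The truncation term in the denominator is what renormalizes the NB probabilities after the zero class has been handed over to the hurdle component.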

2.2.3. The MI method

MI is a way to solve the problem of complex incomplete data (Rubin, 1996). In this study, the multivariate imputation by chained equations (MICE) method was used. To estimate the parameters of the three highly censored Salmonella datasets, the imputation method was set to predictive mean matching (pmm). The MI-pmm method is a general semi-parametric imputation method that imputes the censored values in the dataset from observed values. The MI-pmm method is implemented in the R package ‘mice’ (version 3.11.0), documented at https://cran.r-project.org/web/packages/mice/mice.pdf.

2.3. Statistical analysis

The Poisson, NB, ZIP, ZINB, HP, and HNB distributions were fitted by MLE to all three Salmonella datasets. The goodness-of-fit (GOF) statistics and estimated parameters of each distribution model were calculated. A standard way to compare models is to use information criteria based on the fitted log-likelihood (LL) function. Commonly used criteria include the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC):

$$AIC = 2k_1 - 2LL \tag{14}$$

$$BIC = k_2 \ln(n) - 2LL \tag{15}$$

where $k_1$ and $k_2$ represent the number of parameters in the model, and $n$ represents the number of observations.
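
Eqs. (14) and (15) amount to one line of code each. The Python sketch below is illustrative only: the log-likelihood of −100 is an invented number, not a result from this study, and n = 240 matches the sample size used here.

```python
import math

def aic(loglik, k):
    """Eq. (14): Akaike Information Criterion."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Eq. (15): Bayesian Information Criterion."""
    return k * math.log(n) - 2 * loglik

# Hypothetical two-parameter fit to n = 240 observations
example_aic = aic(-100.0, k=2)
example_bic = bic(-100.0, k=2, n=240)
```

Since ln(240) is about 5.48, each additional parameter costs more under BIC than under AIC at this sample size, so BIC favors the simpler distribution whenever the two criteria disagree.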

In addition, cumulative probability plots and observed-minus-predicted probability plots of these distribution models were used to illustrate the quality of the fit. For each distribution, predicted probabilities were computed from the probability mass functions shown in Eqs. (2), (4), (6), (8), (10), and (12).

Comparisons among the substitution methods, distribution-based MLE methods, and MI method were performed using root mean square errors (RMSEs), calculated as:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2} \tag{16}$$

where $Y_i$ is the observed value (i.e., the baseline data in this study), $\hat{Y}_i$ is the predicted value, and $n$ represents the number of observations. A lower RMSE indicates an estimate closer to the known value. All analyses were performed in R (Version 3.6.0, http://www.R-project.org/).
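
For completeness, Eq. (16) can be written as a small Python function. The function name `rmse` and the three-value example are hypothetical and serve only to show the calculation.

```python
import math

def rmse(observed, predicted):
    """Eq. (16): root mean square error between paired values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical baseline values and predictions (MPN/g)
example_rmse = rmse([0.0, 0.0, 4.0], [1.0, 1.0, 2.0])
```

In this example the squared errors are 1, 1 and 4, so the RMSE equals the square root of 2.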

3. Results and Discussion

3.1. Overall Salmonella prevalence and MPN determination

A total of 240 retail chicken samples were collected, of which 9.58% (23/240) were positive for Salmonella. The quantitative method used in this study expressed the Salmonella load in chicken samples in MPN units, with values ranging from 3 MPN/g to 29 MPN/g. Table 1 summarizes the prevalence and MPN enumeration data of Salmonella in retail chicken samples. According to the quantitative analysis, most Salmonella-positive chicken samples (17 out of 23) were contaminated at a level between 3 and 10 MPN/g, with only three samples exceeding 10 MPN/g. These results were consistent with those reported in previous studies conducted in China (Wang et al., 2014; Zhu et al., 2014; Yang et al., 2020). As shown in Table 1, the Salmonella level in the majority of samples was below the LOQ of the microbiological method (3 MPN/g). In the real-sampling dataset, 217 out of 240 samples were non-detects, that is, left-censored. The numbers of censored data in simulated datasets A and B were 177 and 216, respectively. In other words, in descending order of censoring degree, the datasets are: the real-sampling dataset, simulated dataset B, and simulated dataset A. With most observations below the LOQ, the count dataset contains a substantial amount of censoring, which is not uncommon in quantitative microbiological investigations but can make statistical inference challenging.

3.2. Fitting of data by probabilistic distributions

The datasets, which contained 73.75%, 90.00%, and 90.42% left-censored data, were each fitted to the Poisson, NB, ZIP, ZINB, HP, and HNB distributions. When these six distributions were fitted to the real-sampling Salmonella dataset, the best fits were obtained with the NB and HNB distributions (lower AIC and BIC) (Table 2). The NB distribution produced a better fit than the Poisson distribution. As expected, the Poisson distribution did not fit well because it has no parameters to account for over-dispersion and zero-inflation (Sun et al., 2019). Therefore, it is necessary to introduce a heterogeneity term to accommodate the unobserved heterogeneity in the count data, resulting in a more flexible heterogeneous Poisson distribution (i.e., the NB distribution). Table 2 also shows that the standard deviation (SD) of the NB distribution was 3.81, which was no longer compressed because the over-dispersion had been accounted for. With regard to the zero-modified distributions (i.e., ZIP, ZINB, HP, and HNB), adding a zero-modified component did not improve on the fit obtained by the simple NB distribution, except in the case of the HNB distribution. As visualized in Fig. 1(a) and Fig. 2(a), comparing the observed probabilities of the real-sampling Salmonella data (circle markers) with the Poisson distribution (dashed lines) indicated that the Poisson distribution underestimated the proportion of zero counts in the dataset. The probability difference plot in Fig. 3(a) showed that the probabilities predicted by the HNB and NB distributions were closer to the observed values of Salmonella. Because the Poisson distribution is too restrictive by definition, the ZIP and HP distributions were able to address only the proportion of zero counts and still underestimated the SD of the data (see Table 2). However, when fitting the real-sampling data of Salmonella in chilled chicken samples with a censoring rate of 90.42%, the ZINB distribution did not converge. Based on these findings, the NB or HNB distribution can be applied to estimate the Salmonella contamination level, giving a mean of 0.68 MPN/g.

For simulated dataset A, the fitting characteristics of the Poisson, NB, ZIP, ZINB, HP, and HNB distributions were broadly consistent with those for the real-sampling Salmonella dataset. In Table 3, the ZINB and HNB distributions showed a better fit to the over-dispersed dataset containing left-censored data. Moreover, the statistical performance of the ZINB distribution was almost the same as that of the HNB distribution. The mean level of Salmonella contamination in simulated dataset A obtained by the ZINB or HNB distribution was 3.92 MPN/g. Probability density (or mass) estimates for the Salmonella counts of simulated dataset A are shown in Fig. 1(b). Although the NB distribution was sufficient to model the data, it slightly underestimated the probability of zero counts in this dataset compared with the ZINB and HNB distributions. In Fig. 2(b) and Fig. 3(b), the cumulative probability and probability difference of the ZINB distribution followed a trend similar to those of the HNB distribution. While the NB distribution provides the simplest way of modeling microbial data, the zero-modified NB distributions are expected to outperform the NB distribution because of their capacity to describe the extra zero counts (Gonzales-Barron et al., 2010).

For simulated dataset B, the fitting results showed two similarities with simulated dataset A and the real-sampling dataset (Table 4). First, the NB distribution fitted the count data better than the Poisson distribution (lower AIC and BIC). Second, the zero-modified distributions were not much better than the simple NB distribution in fitting the over-dispersed data. In addition, although the proportion of NDs in simulated dataset B was similar to that in the real-sampling Salmonella dataset, the fitting results of the six distribution models differed slightly from those for the real-sampling dataset. In Table 4, the ZIP and HP distributions had lower AIC and BIC values, producing a slightly better fit than the other distributions (i.e., the NB or HNB distribution). For comparison, Fig. 1(c) and Fig. 2(c) show that the HP distribution gave slightly higher estimates than the ZIP distribution at zero values. However, according to the SDs of simulated dataset B obtained with the six distributions (Table 4), the SDs of the NB-based distributions were greater than those of the Poisson-based distributions. This can be explained by the fact that, while the zero-modified Poisson distributions can account for the probability at zero, the count distribution component still requires more flexibility, which can only be provided by a heterogeneous count distribution such as the NB distribution (Gonzales-Barron et al., 2010). The NB distribution has already been used to describe the clustering of Campylobacter contamination in broilers by better handling the heterogeneity within the dataset (Reich et al., 2018). According to the probability difference plot in Fig. 3(c), the probabilities predicted by the NB, ZIP, and HP distributions were close to the observed values for the Salmonella counts in the dataset. Similarly, when fitting the simulated Salmonella dataset with a censoring rate of 90.00%, the ZINB distribution did not converge either.

Based on these findings, this study demonstrated the superiority of the NB-based distributions over the Poisson-based distributions for representing highly censored data. On the one hand, the NB distribution is better at describing low microbial counts than the Poisson distribution, so it constitutes the main framework for deriving statistical models of low-incidence microbial data (Gonzales-Barron et al., 2014). On the other hand, the central tendency estimate of the NB distribution is the arithmetic mean, which can reflect the actual concentration more accurately than the mean of the logs (geometric means, as given by the lognormal and Poisson-lognormal distributions) (Gonzales-Barron & Butler, 2011; Jongenburger et al., 2015). A previous study showed that the ZINB distribution was able to represent data with 13% NDs (coliforms) and 42% NDs (E. coli) with considerable accuracy (Gonzales-Barron et al., 2010), whereas its ability to fit severely censored data (≥90%), as in this study, has yet to be demonstrated.

3.3. Method comparison

Under high censoring, the substitution method, the distribution-based MLE method, and the MI method were applied to the left-censored microbial counts of Salmonella in chilled chicken samples. The estimated parameters and statistical performance are presented in Table 5. In this work, in addition to treating a non-detect sample as zero (baseline), we also evaluated three alternative substitutes (i.e., LOQ/2, LOQ/√2, and LOQ). The RMSEs increased when NDs were replaced in the following order: LOQ/2, LOQ/√2, and LOQ. Among the alternatives evaluated, it seems more reasonable to use the corresponding LOQ/2 or LOQ/√2 for each ND than to replace all of them with a single value such as the average or median of the dataset (Rajal et al., 2007; Poma et al., 2019). For the real sampling dataset, the mean level of Salmonella contamination was 2.07 MPN/g and 2.63 MPN/g when LOQ/2 and LOQ/√2, respectively, were used to represent NDs. Additionally, any negative result is tied to the LOQ (which can be modified by more sensitive and efficient methods), so the common practice of substituting NDs with zero is unrealistic (Poma et al., 2019). Using zero may lead to an underestimation of public health risks if decisions to protect public health are based on these estimates. Conversely, replacing an ND with a value related to the LOQ acknowledges a possible false-negative case, that is, the target pathogen may still be present although its concentration is below the LOQ of the detection method.
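
The effect of the substitution value on the mean can be checked by simple arithmetic. The following is a minimal sketch, not the authors' code: it assumes the LOQ of 3 MPN/g stated in the Methods, the 217 NDs among 240 samples reported for the real sampling dataset, and the rounded baseline mean of 0.72 MPN/g from Table 5. The function name and constants are illustrative. Because the baseline treats every ND as zero, replacing each ND with a value s raises the mean by 217·s/240.

```python
import math

N_TOTAL = 240         # samples in the real sampling dataset
N_NONDETECT = 217     # samples below the LOQ (90.42%)
LOQ = 3.0             # MPN/g, limit of quantification from the Methods
BASELINE_MEAN = 0.72  # MPN/g, Table 5 value with non-detects set to zero (rounded)


def substituted_mean(baseline_mean, n_total, n_nondetect, substitute):
    """Mean after every non-detect (previously counted as 0) is set to `substitute`."""
    return baseline_mean + n_nondetect * substitute / n_total


SUBSTITUTES = {
    "LOQ/2": LOQ / 2.0,
    "LOQ/sqrt(2)": LOQ / math.sqrt(2.0),
    "LOQ": LOQ,
}

means = {
    label: substituted_mean(BASELINE_MEAN, N_TOTAL, N_NONDETECT, value)
    for label, value in SUBSTITUTES.items()
}

for label, mean in means.items():
    print(f"{label}: {mean:.3f} MPN/g")
```

Under these assumptions the recomputed means agree with the Table 5 values (2.07, 2.63, and 3.43 MPN/g) to within about 0.01 MPN/g; the small gap presumably reflects rounding of the baseline mean.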

As mentioned before, the NB distribution and the HNB distribution can account for the large number of NDs in the dataset. Across all performance measures, NB-MLE and HNB-MLE were the best-performing methods overall, yielding similar descriptive statistics and small RMSEs. For the real sampling dataset, the mean concentration of Salmonella on raw chicken samples obtained by both the NB-MLE and HNB-MLE approaches was 0.68 MPN/g. Similarly, Gonzales-Barron et al. (2014) reported no significant difference in microbial prediction between the NB distribution and the HNB distribution for pre-chill batches of beef carcasses. In general, a better understanding of the actual physical distribution of pathogens can often improve decision-making on food quality and safety issues (Wang & Hailemariam, 2018a). A simple MLE method was found to be ineffective in fitting highly censored data because the distribution selection was not considered (Canales et al., 2018). Thus, the choice of probability distribution becomes more important for risk estimation when the microbial data are highly censored.
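
To illustrate how a distribution-based MLE can use NDs without substituting a value, the sketch below fits an NB distribution by a generic censored likelihood: each ND contributes P(Y < LOQ) and each detected sample contributes its NB probability. This is an assumption-laden illustration and not necessarily the likelihood used in this study. The counts are hypothetical, MPN/g values are treated as integers, the NB is parameterized by its mean (mu) and a dispersion parameter (alpha, so that the variance is mu + alpha·mu²), and a coarse grid search stands in for a numerical optimizer.

```python
import math


def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under an NB with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def censored_nb_loglik(detects, n_nondetects, loq, mu, alpha):
    """Log-likelihood with non-detects entering as P(Y < loq)."""
    p_below = sum(math.exp(nb_logpmf(y, mu, alpha)) for y in range(loq))
    loglik = n_nondetects * math.log(p_below)
    loglik += sum(nb_logpmf(y, mu, alpha) for y in detects)
    return loglik


def grid_mle(detects, n_nondetects, loq, mus, alphas):
    """Return (log-likelihood, mu, alpha) of the best point on the grid."""
    best = None
    for mu in mus:
        for alpha in alphas:
            loglik = censored_nb_loglik(detects, n_nondetects, loq, mu, alpha)
            if best is None or loglik > best[0]:
                best = (loglik, mu, alpha)
    return best


# Hypothetical data: 7 quantified samples (MPN/g) and 63 non-detects, LOQ = 3 MPN/g.
detects = [3, 4, 4, 7, 9, 15, 23]
mus = [k / 10 for k in range(1, 51)]     # candidate means 0.1 to 5.0
alphas = [k / 2 for k in range(1, 61)]   # candidate dispersions 0.5 to 30.0

loglik, mu_hat, alpha_hat = grid_mle(detects, 63, 3, mus, alphas)
print(f"mu = {mu_hat:.1f} MPN/g, alpha = {alpha_hat:.1f}, log-likelihood = {loglik:.2f}")
```

The grid bounds are arbitrary choices for this example; if the best point lies on a grid edge, the grid should be widened or replaced by a proper optimizer.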

The MI-pmm method did not perform as well as the other methods in this study. At worst, imputing NDs by predictive mean matching greatly overpredicted the concentration of Salmonella (i.e., the mean concentration of Salmonella in the real sampling dataset was 4.19 MPN/g). The MI method has nevertheless performed well in other simulation studies; for example, the lognormal distribution-based MI method and the uniform distribution-based MI method have been reported to be more suitable for fitting highly left-censored data than the simple MLE method (Canales et al., 2018). However, misspecification of the distribution also introduces bias. If the assumed distribution is not the one followed by the real data, a method that assumes a particular distribution may perform poorly (Shoari et al., 2015). Furthermore, adopting a more appropriate distribution within the MI method can offer better accuracy in handling highly left-censored data for specific foodborne pathogens, such as the NB (or zero-modified NB) distribution suggested for Salmonella in this study.
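
As a rough illustration of what a distribution-based imputation could look like, the sketch below fills each ND with a draw from an NB distribution truncated to values below the LOQ and averages the completed-data means over m imputations. It was not implemented or evaluated in this study. The parameter values, data, and function names are hypothetical, and the sketch is simplified: a full MI procedure would also redraw the distribution parameters for each imputation and pool the variances.

```python
import math
import random


def nb_pmf(y, mu, alpha):
    """Probability of count y under an NB with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def pooled_mean_nb_imputation(detects, n_nondetects, loq, mu, alpha, m=20, seed=1):
    """Average of m completed-data means, non-detects drawn from NB truncated to [0, loq)."""
    rng = random.Random(seed)
    support = list(range(loq))
    weights = [nb_pmf(y, mu, alpha) for y in support]  # random.choices rescales these
    n_total = len(detects) + n_nondetects
    completed_means = []
    for _ in range(m):
        imputed = rng.choices(support, weights=weights, k=n_nondetects)
        completed_means.append((sum(detects) + sum(imputed)) / n_total)
    return sum(completed_means) / m


# Hypothetical data and NB parameters (in practice taken from a fitted model).
detects = [3, 4, 4, 7, 9, 15, 23]
pooled = pooled_mean_nb_imputation(detects, n_nondetects=63, loq=3, mu=0.7, alpha=20.0)
print(f"pooled mean = {pooled:.2f} MPN/g")
```

By construction, every imputed value lies below the LOQ, so the pooled mean falls between the zero-substitution mean and the mean obtained by substituting the largest value below the LOQ.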

4. Conclusions
The methodology used to find the optimal fit for microbial data is critical, especially in the case of highly left-censored data, because different approaches can lead to very different quantitative estimates. The MLE method based on the NB model or the zero-modified NB model can help to estimate the contamination level of Salmonella more precisely when the count data are highly censored. Based on the monitoring data on Salmonella, the mean level of contamination obtained by the NB-MLE or zero-modified NB-MLE model was 0.68 MPN/g. Among the substitution methods for treating censored data, replacing NDs with LOQ/2 appears to be the next best method. Overall, the MI-pmm method alone generated poor outcomes. In the future, both the MLE method and the MI method could be extended by combining suitable distributions with alternative values for left-censored data, thereby bridging the current gap in addressing commonly encountered limit-of-detection issues in microbiology. If left-censored data for foodborne pathogens are interpreted with a more reliable model, QMRA results may be improved to better predict foodborne infections.

CRediT authorship contribution statement
Tianmei Sun: Methodology, Data curation, Software, Formal analysis, Writing - original draft. Yangtai Liu: Conceptualization, Methodology, Writing - review & editing. Shufei Gao: Methodology, Writing - review & editing. Xiaojie Qin: Validation, Resources. Zijie Lin: Conceptualization, Investigation. Xin Dou: Conceptualization, Investigation. Xiang Wang: Conceptualization, Methodology. Hui Zhang: Supervision. Qingli Dong: Conceptualization, Project administration, Funding acquisition, Writing - review & editing.

Declaration of Competing Interest
The authors declare that there is no conflict of interest.

Data availability
Data will be made available on request.

Acknowledgments
This study was supported by the National Natural Science Foundation of China (Grant No. 32102111) and the Shanghai Agriculture Applied Technology Development Program, China (Grant No. X2021-02-08-00-12-F00782).

References

Andrews-Polymenis, H. L., Baumler, A. J., McCormick, B. A., & Fang, F. C. (2010). Taming the elephant: Salmonella biology, pathogenesis, and prevention. Infection and Immunity, 78(6), 2356-2369. https://doi.org/10.1128/IAI.00096-10

Batz, M. B., Hoffmann, S., & Morris, J. G. (2012). Ranking the disease burden of 14 pathogens in food sources in the United States using attribution data from outbreak investigations and expert elicitation. Journal of Food Protection, 75(7), 1278-1291. https://doi.org/10.4315/0362-028X.JFP-11-418

Canales, R. A., Wilson, A. M., Pearce-Walker, J. I., Verhougstraete, M. P., & Reynolds, K. A. (2018). Methods for handling left-censored data in quantitative microbial risk assessment. Applied and Environmental Microbiology, 84(20), e01203-18. https://doi.org/10.1128/AEM.01203-18

Commeau, N., Parent, E., Delignette-Muller, M. L., & Cornu, M. (2012). Fitting a lognormal distribution to enumeration and absence/presence data. International Journal of Food Microbiology, 155(3), 146-152. https://doi.org/10.1016/j.ijfoodmicro.2012.01.023

Danyluk, M. D., & Schaffner, D. W. (2011). Quantitative assessment of the microbial risk of leafy greens from farm to consumption: preliminary framework, data, and risk estimates. Journal of Food Protection, 74(5), 700-708. https://doi.org/10.4315/0362-028X.JFP-10-373

Foley, S. L., Johnson, T. J., Ricke, S. C., Nayak, R., & Danzeisen, J. (2013). Salmonella pathogenicity and host adaptation in chicken-associated serovars. Microbiology and Molecular Biology Reviews, 77(4), 582-607. https://doi.org/10.1128/MMBR.00015-13

Ganser, G. H., & Hewett, P. (2010). An accurate substitution method for analyzing censored data. Journal of Occupational and Environmental Hygiene, 7(4), 233-244. https://doi.org/10.1080/15459621003609713

Gonzales-Barron, U., Kerr, M., Sheridan, J. J., & Butler, F. (2010). Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology, 136(3), 268-277. https://doi.org/10.1016/j.ijfoodmicro.2009.10.016

Gonzales-Barron, U., & Butler, F. (2011). A comparison between the discrete Poisson-gamma and Poisson-lognormal distributions to characterise microbial counts in foods. Food Control, 22(8), 1279-1286. https://doi.org/10.1016/j.foodcont.2011.01.029

Gonzales-Barron, U., Lenahan, M., Sheridan, J., & Butler, F. (2012). Use of a Poisson-gamma model to assess the performance of the EC process hygiene criterion for Enterobacteriaceae on Irish sheep carcasses. Food Control, 25(1), 172-183. https://doi.org/10.1016/j.foodcont.2011.10.035

Gonzales-Barron, U., Cadavez, V., & Butler, F. (2014). Conducting inferential statistics for low microbial counts in foods using the Poisson-gamma regression. Food Control, 37, 385-394. https://doi.org/10.1016/j.foodcont.2013.09.032

Hall, D. B. (2000). Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics, 56(4), 1030-1039. https://doi.org/10.1111/j.0006-341X.2000.01030.x

Helsel, D. (2010). Much ado about next to nothing: incorporating nondetects in science. Annals of Occupational Hygiene, 54(3), 257-262. https://doi.org/10.1093/annhyg/mep092

Helsel, D. R. (2012). Statistics for censored environmental data using Minitab and R (2nd ed.). John Wiley & Sons, Inc., Hoboken, New Jersey (Chapter 10).

Hinde, J., & Demetrio, C. G. B. (1998). Overdispersion: models and estimation. Computational Statistics & Data Analysis, 27(2), 151-170. https://doi.org/10.1016/S0167-9473(98)00007-3

Jarvis, B., Wilrich, C., & Wilrich, P. T. (2010). Reconsideration of the derivation of most probable numbers, their standard deviations, confidence bounds and rarity values. Journal of Applied Microbiology, 109(5), 1660-1667. https://doi.org/10.1111/j.1365-2672.2010.04792.x

Jongenburger, I., Bassett, J., Jackson, T., Zwietering, M. H., & Jewell, K. (2012a). Impact of microbial distributions on food safety I. Factors influencing microbial distributions and modelling aspects. Food Control, 26(2), 601-609.

Jongenburger, I., Bassett, J., Jackson, T., Gorris, L. G. M., Jewell, K., & Zwietering, M. H. (2012b). Impact of microbial distributions on food safety II. Quantifying impacts on public health and sampling. Food Control, 26(2), 546-554.

Jongenburger, I., den Besten, H. M. W., & Zwietering, M. H. (2015). Statistical aspects of food safety sampling. In M. P. Doyle, & T. R. Klaenhammer (Eds.), Annual review of food science and technology (vol. 6, pp. 479-503). Palo Alto: Annual Reviews.

Khalid, T., Hdaifeh, A., Federighi, M., Cummins, E., Boue, G., Guillou, S., & Tesson, V. (2020). Review of quantitative microbial risk assessment in poultry meat: the central position of consumer behavior. Foods, 9(11), 1661. https://doi.org/10.3390/foods9111661

Krol, A., Ferrer, L., Pignon, J. P., Proust-Lima, C., Ducreux, M., Bouche, O., … Rondeau, V. (2016). Joint model for left-censored longitudinal data, recurrent events and terminal event: predictive abilities of tumor burden for cancer evolution with application to the FFCD 2000-05 trial. Biometrics, 72(3), 907-916. https://doi.org/10.1111/biom.12490

Membre, J. M., & Boue, G. (2018). Quantitative microbiological risk assessment in food industry: Theory and practical application. Food Research International, 106, 1132-1139. https://doi.org/10.1016/j.foodres.2017.11.025

Mullahy, J. (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33(3), 341-365. https://doi.org/10.1016/0304-4076(86)90002-3

Mussida, A., Gonzales-Barron, U., & Butler, F. (2013). Effectiveness of sampling plans by attributes based on mixture distributions characterising microbial clustering in food. Food Control, 34(1), 50-60. https://doi.org/10.1016/j.foodcont.2013.04.001

Petterson, S., Grondahl-Rosado, R., Nilsen, V., Myrmel, M., & Robertson, L. J. (2015). Variability in the recovery of a virus concentration procedure in water: implications for QMRA. Water Research, 87, 79-86. https://doi.org/10.1016/j.watres.2015.09.006

Poma, H. R., Kundu, A., Wuertz, S., & Rajal, V. B. (2019). Data fitting approach more critical than exposure scenarios and treatment of censored data for quantitative microbial risk assessment. Water Research, 154, 45-53. https://doi.org/10.1016/j.watres.2019.01.041

Pouillot, R., Garin, B., Ravaonindrina, N., Diop, K., Ratsitorahina, M., Ramanantsoa, D., & Rocourt, J. (2012). A risk assessment of campylobacteriosis and salmonellosis linked to chicken meals prepared in households in Dakar, Senegal. Risk Analysis, 32(10), 1798-1819. https://doi.org/10.1111/j.1539-6924.2012.01796.x

Pouillot, R., Hoelzer, K., Chen, Y. H., & Dennis, S. (2013). Estimating probability distributions of bacterial concentrations in food based on data generated using the most probable number (MPN) method for use in risk assessment. Food Control, 29(2), 350-357.

Rajal, V. B., McSwain, B. S., Thompson, D. E., Leutenegger, C. M., Kildare, B. J., & Wuertz, S. (2007). Validation of hollow fiber ultrafiltration and real-time PCR using bacteriophage PP7 as surrogate for the quantification of viruses from water samples. Water Research, 41(7), 1411-1422. https://doi.org/10.1016/j.watres.2006.12.034

Reich, F., Valero, A., Schill, F., Bungenstock, L., & Klein, G. (2018). Characterisation of Campylobacter contamination in broilers and assessment of microbiological criteria for the pathogen in broiler slaughterhouses. Food Control, 87, 60-69.

Rezvan, P. H., Lee, K. J., & Simpson, J. A. (2015). The rise of multiple imputation: a review of the reporting and implementation of the method in medical research. BMC Medical Research Methodology, 15, Article 30. https://doi.org/10.1186/s12874-015-0022-1

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473-489. https://doi.org/10.2307/2291635

Scallan, E., Hoekstra, R. M., Angulo, F. J., Tauxe, R. V., Widdowson, M. A., Roy, S. L., … Griffin, P. M. (2011). Foodborne illness acquired in the United States-major pathogens. Emerging Infectious Diseases, 17(1), 7-15. https://doi.org/10.3201/eid1701.P11101

Shoari, N., Dube, J. S., & Chenouri, S. (2015). Estimating the mean and standard deviation of environmental data with below detection limit observations: considering highly skewed data and model misspecification. Chemosphere, 138, 599-608. https://doi.org/10.1016/j.chemosphere.2015.07.009

Shoari, N., & Dube, J. S. (2018). Toward improved analysis of concentration data: embracing nondetects. Environmental Toxicology and Chemistry, 37(3), 643-656. https://doi.org/10.1002/etc.4046

Sullivan, T. R., Yelland, L. N., Moreno-Betancur, M., & Lee, K. J. (2021). Multiple imputation for handling missing outcome data in randomized trials involving a mixture of independent and paired data. Statistics in Medicine, 40(27), 6008-6020. https://doi.org/10.1002/sim.9166

Sun, W. X., Jin, Y. Q., Dai, Y. X., Xiao, J. W., Wang, X., & Dong, Q. L. (2019). Application of zero-inflated models in quantitative exposure assessment of Listeria monocytogenes in bulk cooked meat (in Chinese). Food Science, 40(11), 49-54.

Sun, W. X., Sun, T. M., Wang, X., Liu, Q., & Dong, Q. L. (2020). Probabilistic model for estimating Listeria monocytogenes concentration in cooked meat products from presence/absence data. Food Research International, 131, 109040. https://doi.org/10.1016/j.foodres.2020.109040

Sun, T. M., Liu, Y. T., Qin, X. J., Aspridou, Z., Zheng, J. M., Wang, X., … Dong, Q. L. (2021). The prevalence and epidemiology of Salmonella in retail raw poultry meat in China: A systematic review and meta-analysis. Foods, 10(11), 2757. https://doi.org/10.3390/foods10112757

Ta, Y. T., Nguyen, T. T., To, P. B., Pham, D. X., Le, H. T. H., Thi, G. N., … Doyle, M. P. (2014). Quantification, serovars, and antibiotic resistance of Salmonella isolated from retail raw chicken meat in Vietnam. Journal of Food Protection, 77(1), 57-66. https://doi.org/10.4315/0362-028X.JFP-13-221

USDA/FSIS, 2014. Most probable number procedure and tables. Available at: https://www.fsis.usda.gov/sites/default/files/media_file/2021-03/MLG-Appendix-2.pdf. (accessed 3 June, 2022)

USDA/FSIS, 2022. Isolation and identification of Salmonella from meat, poultry, pasteurized egg and siluriformes (Fish) products and Carcass and Environmental Sponges. Available at: https://www.fsis.usda.gov/sites/default/files/media_file/documents/MLG-4.12.pdf. (accessed 3 June, 2022)

Wang, F. K., & Hailemariam, S. S. (2018a). Sampling plans for the zero-inflated Poisson distribution in the food industry. Food Control, 85, 359-368.

Wang, F. K., & Hailemariam, S. S. (2018b). Sampling plans for the zero-inflated negative binomial distribution in the food industry. Quality and Reliability Engineering International, 34(6), 1174-1184. https://doi.org/10.1002/qre.2316

Wang, Y. R., Chen, Q., Cui, S. H., Xu, X., Zhu, J. H., Luo, H. P., … Li, F. Q. (2014). Enumeration and characterization of Salmonella isolates from retail chicken carcasses in Beijing, China. Foodborne Pathogens and Disease, 11(2), 126-132. https://doi.org/10.1089/fpd.2013.1586

Xiao, X. N., Wang, W., Zhang, J. M., Liao, M., Rainwater, C., Yang, H., & Li, Y. B. (2021). A quantitative risk assessment model of Salmonella contamination for the yellow-feathered broiler chicken supply chain in China. Food Control, 121, 107612. https://doi.org/10.1016/j.foodcont.2020.107612

Yang, X. J., Huang, J. H., Zhang, Y. X., Liu, S. R., Chen, L., Xiao, C., … Wu, Q. P. (2020). Prevalence, abundance, serovars and antimicrobial resistance of Salmonella isolated from retail raw poultry meat in China. Science of The Total Environment, 713, 136385. https://doi.org/10.1016/j.scitotenv.2019.136385

Zhu, J. H., Wang, Y. R., Song, X. Y., Cui, S. H., Xu, H. B., Yang, B. W., … Li, F. Q. (2014). Prevalence and quantification of Salmonella contamination in raw chicken carcasses at the retail in China. Food Control, 44, 198-202. https://doi.org/10.1016/j.foodcont.2014.03.050

Tables

Table 1 Prevalence and MPN enumeration data of Salmonella obtained from 240 retail chilled chicken samples.

                                   No. (%) of total    Distribution of positive samples
Dataset              Sample size   positive samples    3 MPN/g   >3-10 MPN/g   >10-20 MPN/g   >20 MPN/g
Real sampling data   240           23 (9.58)           3         17            1              2
Simulated-A          240           63 (26.25)          0         22            26             15
Simulated-B          240           24 (10.00)          0         6             10             8

Table 2 Parameter estimates and fit statistics of the Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial distributions fitted to the real sampling dataset.

Method                     Mean (95% CI) a (MPN/g)   SD (MPN/g)   AIC      BIC
Poisson                    0.68 (0.00-2.00)          0.82         972.34   975.82
Negative binomial          0.68 (0.00-3.00)          3.81         297.11   304.07
Zero-inflated Poisson      0.68 (0.00-7.00)          2.25         356.70   363.66
Hurdle Poisson             0.69 (0.00-7.00)          2.26         356.70   363.66
Hurdle negative binomial   0.68 (0.00-5.00)          2.83         290.33   300.77

a 95% CI: 95% confidence interval.
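
The AIC and BIC columns of Table 2 can be cross-checked against Eqs. (14) and (15). The sketch below is illustrative only and assumes that both criteria use the same number of parameters k (the manuscript writes k1 and k2) and that n is the 240 samples of the real sampling dataset. Under that assumption, BIC - AIC = k·(Ln(n) - 2), so the parameter count and the log-likelihood implied by each row can be recovered. The function names are hypothetical.

```python
import math

N_SAMPLES = 240  # sample size of the real sampling dataset

# (AIC, BIC) pairs copied from Table 2
TABLE_2 = {
    "Poisson": (972.34, 975.82),
    "Negative binomial": (297.11, 304.07),
    "Zero-inflated Poisson": (356.70, 363.66),
    "Hurdle Poisson": (356.70, 363.66),
    "Hurdle negative binomial": (290.33, 300.77),
}


def implied_parameter_count(aic, bic, n):
    """Solve BIC - AIC = k * (ln(n) - 2) for k."""
    return (bic - aic) / (math.log(n) - 2.0)


def implied_loglik(aic, k):
    """Solve AIC = 2k - 2LL for LL."""
    return (2.0 * k - aic) / 2.0


for name, (aic, bic) in TABLE_2.items():
    k = round(implied_parameter_count(aic, bic, N_SAMPLES))
    print(f"{name}: k = {k}, LL = {implied_loglik(aic, k):.2f}")
```

The implied counts round to one parameter for the Poisson, two for the NB, ZIP, and hurdle Poisson, and three for the HNB, which matches the usual parameterization of these distributions and suggests that the tabulated AIC and BIC values are internally consistent.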

Table 3 […] distributions fitted to the simulated dataset A.

Method                            Mean (95% CI) a (MPN/g)   SD (MPN/g)   AIC       BIC
Poisson                           3.92 (1.00-7.00)          1.98         2997.95   3001.43
Zero-inflated negative binomial   3.92 (0.00-21.00)         7.49         700.06    710.50
Hurdle Poisson                    3.91 (0.00-18.00)         6.85         761.75    768.72

a 95% CI: 95% confidence interval.

Table 4 […] distributions fitted to the simulated dataset B.

Method           Mean (95% CI) a (MPN/g)   SD (MPN/g)   AIC       BIC
Poisson          1.63 (0.00-4.00)          1.28         1992.82   1996.30
Hurdle Poisson   1.63 (0.00-16.00)         4.01         350.23    357.20

a 95% CI: 95% confidence interval.

Table 5 Mean values (MPN/g), standard deviations (MPN/g), and RMSEs of Salmonella MPN counts in chilled chicken samples obtained with the censored-data methods.

            Simulated-A             Simulated-B             Real sampling data
Method *    Mean (SD)      RMSE     Mean (SD)      RMSE     Mean (SD)     RMSE
Baseline    3.92 (7.44)    -        1.63 (5.39)    -        0.72 (3.08)   -
LOQ/2       5.02 (6.86)    1.29     3.00 (5.04)    1.43     2.07 (2.78)   1.43
LOQ/√2      5.19 (6.77)    1.49     3.21 (4.99)    1.65     2.63 (2.67)   1.65
LOQ         6.13 (6.30)    2.60     4.37 (4.70)    2.87     3.43 (2.52)   2.85
NB-MLE      3.92 (13.52)   1.01     1.63 (10.17)   0.99     0.68 (3.81)   1.00
HNB-MLE     3.92 (7.50)    0.99     1.60 (10.36)   1.03     0.68 (2.83)   1.08
MI-pmm      11.50 (8.62)   11.65    16.44 (7.35)   17.10    4.19 (6.47)   7.58

* Baseline: the method of substituting non-detects with zero; LOQ/2: the method of substituting non-detects with LOQ/2; LOQ/√2: the method of substituting non-detects with LOQ/√2; LOQ: the method of substituting non-detects with LOQ; NB-MLE: the negative binomial distribution-based maximum likelihood estimation method; HNB-MLE: the hurdle negative binomial distribution-based maximum likelihood estimation method; MI-pmm: the multiple imputation method implemented by predictive mean matching.

Figures

Fig. 1. Predictive distribution of Salmonella counts as modelled by the Poisson (PO), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB) distributions with the real sampling dataset (a), simulated dataset A (b), and simulated dataset B (c). Circle markers represent observed probabilities of the data (Obs).

Fig. 2. Cumulative probability plot of the Salmonella counts as modelled by the Poisson (PO), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB) distributions with the real sampling dataset (a), simulated dataset A (b), and simulated dataset B (c). Circle markers represent observed probabilities of the data (Obs).

Fig. 3. Probability difference (observed minus fitted) in Salmonella counts as fitted by the Poisson (PO), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB) distributions with the real sampling dataset (a), simulated dataset A (b), and simulated dataset B (c).
