Methods for estimating Salmonella concentration in chilled chicken using highly left-censored contamination data

Tianmei Sun a,1, Yangtai Liu a,1, Shufei Gao b, Xiaojie Qin a, Zijie Lin a, Xin Dou a, Xiang Wang a, Hui Zhang c, Qingli Dong a,*

a School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
b College of Science, University of Shanghai for Science and Technology, Shanghai, China
c Jiangsu Academy of Agricultural Sciences, Nanjing, China
1 These authors contributed equally to this work

* Corresponding author: Qingli Dong
Mailing address: School of Health Science and Engineering, University of Shanghai for Science and Technology, 516 Jun Gong Rd., Shanghai 200093, China.
Tel.: +86-21-5527-1117
Email: [email protected]

This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=4377891

Abstract

Salmonella is a common chicken-borne pathogen that causes human infections. Data below the detection limit, referred to as left-censored data, are frequently encountered in pathogen detection. The way censored data are handled is considered to affect the accuracy of the estimated microbial concentration. In this study, a set of Salmonella contamination data was collected from chilled chicken samples using the most probable number (MPN) method; 90.42% (217/240) of the values were non-detects. Two simulated datasets with fixed censoring degrees of 73.60% and 90.00% were generated based on the real-sampling Salmonella dataset for comparison. Three methodologies were applied for handling left-censored data: (i) substitution with different alternatives, (ii) the distribution-based maximum likelihood estimation (MLE) method, and (iii) the multiple imputation (MI) method. For each dataset, the negative binomial (NB) distribution-based MLE and zero-modified NB distribution-based MLE were preferable for highly censored data and resulted in the lowest root mean square error (RMSE). Replacing the censored data with half the limit of quantification was the next best method. The mean concentration of Salmonella monitoring data estimated by the NB-MLE and zero-modified NB-MLE methods was 0.68 MPN/g. This study provides a practical statistical method for handling highly left-censored bacterial data.

Keywords: Foodborne pathogen; Limit of detection; Count distribution; Maximum likelihood estimation; Multiple imputation method

1. Introduction

Foodborne disease caused by Salmonella is an important public health concern worldwide (Scallan et al., 2011). Salmonella can be transmitted among animals, foods, and humans, and is able to survive in various environments (Batz et al., 2012). Among the risk sources, contaminated chicken and chicken-associated products are the most common triggers of human salmonellosis, which may lead to gastroenteritis, enteric fever, typhoid fever, and bacteremia syndromes (Foley et al., 2013; Andrews-Polymenis et al., 2010).

To evaluate and control the effect of Salmonella on public health, the risk to consumers of chicken products is usually estimated using a quantitative microbiological risk assessment (QMRA) approach (Pouillot et al., 2012; Khalid et al., 2020; Xiao et al., 2021). A complete QMRA consists of four stages: (a) hazard identification, (b) hazard characterization, (c) exposure assessment, and (d) risk characterization (Membre & Boue, 2018). Among these stages, exposure assessment depends on microbiological data that characterize the detection and concentration of pathogens in food samples (Danyluk & Schaffner, 2011; Pouillot et al., 2013).

One of the challenges of monitoring pathogens in food samples is that available quantitative data are limited because analyses are labor-intensive, and the data are poorly informative due to a high proportion of non-detects (i.e., a test has been performed and the pathogen was not detected) (Commeau et al., 2012). In statistical terms, observations below the limit of quantification (LOQ) are called left-censored data. As defined by Canales et al. (2018), the censoring degree of microbial data can be divided into four levels: low (10%), medium (35%), high (65%), and severe (90%). Ignoring these left-censored data may introduce bias into the final concentration estimate. In practice, the seasonal and monthly prevalence of Salmonella in chicken samples varies greatly, and the concentrations are relatively low (Ta et al., 2014; Zhu et al., 2014). At the retail level, because initial contaminants multiply into colonies, biofilms break up, or pathogens grow locally in non-liquid foods, contamination often occurs in clusters, resulting in highly censored concentration data (Jongenburger et al., 2012a; Jongenburger et al., 2012b).

In recent years, there has been considerable development of, and interest in, methods for left-censored data in environmental and health-related research (Helsel, 2010; Krol et al., 2016; Petterson et al., 2015). Substituting non-detects with a particular value (e.g., zero, LOQ, LOQ/2, or LOQ/√2) has been a common approach (Shoari & Dube, 2018). However, it is well recognized that substitution methods may introduce unnecessary errors, especially when the dataset is highly censored (Helsel, 2010). Instead of substitution, methods based on maximum likelihood estimation (MLE) have been regarded as the “gold standard” for handling censored data (Ganser & Hewett, 2010). MLE requires the assumption that a specific distribution closely fits the shape of the data (Helsel, 2012). When pathogens are randomly distributed in food, a Poisson distribution (i.e., the variance is equal to the mean) is widely used as a standard framework for the analysis of the observed count data (Mussida et al., 2013). In practice, however, some real-sampling counts exhibit over-dispersion or clustering (i.e., the variance is greater than the mean) (Sun et al., 2020). Under these circumstances, it is necessary to use more flexible models such as heterogeneous Poisson models and zero-modified models (Hinde & Demetrio, 1998; Gonzales-Barron et al., 2010). The most commonly used heterogeneous Poisson distribution is the negative binomial, which arises as a mixture of Poisson and gamma distributions (Gonzales-Barron & Butler, 2011). The zero-modified models are mainly divided into zero-inflated models and hurdle models, both of which use a logit model with binomial assumptions to deal with non-detects and observed counts (Mullahy, 1986; Gonzales-Barron et al., 2010). Among the more principled methods for dealing with left-censored data, multiple imputation (MI) has emerged as a popular approach due to its flexibility and ability to provide unbiased estimates (Rezvan et al., 2015; Sullivan et al., 2021). However, the performance of different MI methods remains to be explored for highly censored microbial data.

The objectives of the present study are (i) to collect concentration data for Salmonella in chilled chicken in Shanghai, (ii) to evaluate substitution, MLE, and MI methods for handling left-censored data in Salmonella datasets, and (iii) to quantify precisely the contamination level of Salmonella in chilled chicken at the retail level.

2. Material and methods

2.1. Data collection

Between October 2018 and September 2019, a total of 240 chilled chicken samples were randomly sampled from retail markets in Shanghai, P. R. China. To account for seasonal variation in Salmonella growth characteristics, 12 batches were collected, each consisting of 20 samples. Each sample was weighed, labelled, and placed in a separate sterile bag before being transported to the laboratory in an icebox within 2 h for microbiological testing.

Salmonella was quantified using the three-tube, three-dilution most probable number (MPN) method described by the Food Safety and Inspection Service of the United States Department of Agriculture (USDA/FSIS, 2014, 2022), with slight modification. Briefly, a 25 g homogenized sample unit was added to 225 mL of buffered peptone water (BPW). Then, three 10 mL aliquots of this mixture were taken directly from the bag and placed into three empty sterile tubes, and 1 mL from each of the original tubes was transferred to one of three further tubes containing 9 mL of BPW. A second 10-fold dilution in BPW was then prepared from each of these tubes. All tubes were incubated at 37°C for 20-24 h. For selective enrichment, 0.1 ± 0.02 mL of pre-enriched culture was inoculated into 10 mL of tetrathionate brilliant green broth (TTB) at 42°C and 10 mL of selenite cystine broth (SC) at 37°C and incubated for 22-24 h. A loopful of each TTB and SC broth culture was streaked onto xylose lysine tergitol 4 (XLT4) agar plates and incubated at 37°C for 22-24 h. Typical Salmonella colonies were selected from each plate and further confirmed. The MPN value was determined from the number of positive and negative tubes in each of the three sets and the standard MPN table. Since the 10-fold serial dilution in this quantitative analysis ranged from 10⁻¹ to 10⁻³, the theoretical LOQ was determined to be 3 MPN/g according to the MPN calculation method derived by Jarvis et al. (2010). Each contamination level in this real-sampling quantitative dataset was reported either as a number (e.g., 10 MPN/g) or as below the LOQ (< 3 MPN/g).

Two more datasets were simulated to represent the “true” distribution of Salmonella concentrations (C) in chilled chicken at retail. For each dataset, we assumed a uniform distribution:

$$C \sim \mathrm{Uniform}(a, b) \tag{1}$$

where a is the theoretical LOQ value of the MPN method, and b is the maximum value of the real-sampling Salmonella dataset. In a previous study, we found that the pooled prevalence of Salmonella in Chinese retail raw chicken was 26.4%, which means that approximately 73.6% of samples were censored (Sun et al., 2021). Additionally, we included a censoring rate similar to that of the real-sampling dataset as a control to examine the relationship between method selection and the censoring degree of the data. On this basis, two copies of the simulated Salmonella dataset were censored completely at random, so that approximately 73.6% and 90% of the concentration data in the datasets were below the theoretical LOQ. The methods for handling left-censored data were then applied to the real-sampling dataset (dataset R), the 73.6% censored dataset (dataset A), and the 90% censored dataset (dataset B), and the outcomes were compared.
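
The simulation step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' R code. It assumes a sample size of 240 per simulated dataset, an upper bound b of 29 MPN/g (the largest value reported for the real-sampling dataset), and that "completely random censoring" means marking a randomly chosen subset of samples as below the LOQ. The function name `simulate_censored` is hypothetical.

```python
import random

LOQ = 3.0     # theoretical LOQ of the MPN method (MPN/g), i.e. a in Eq. (1)
C_MAX = 29.0  # assumed b in Eq. (1): largest value reported for the real dataset


def simulate_censored(n, censor_fraction, a=LOQ, b=C_MAX, seed=0):
    """Draw n values from Uniform(a, b), then censor a random subset.

    Censored entries are returned as None (meaning "< LOQ"); the other
    entries keep their simulated concentration.
    """
    rng = random.Random(seed)
    values = [rng.uniform(a, b) for _ in range(n)]
    n_censored = round(n * censor_fraction)
    censored_idx = set(rng.sample(range(n), n_censored))
    return [None if i in censored_idx else v for i, v in enumerate(values)]


# Assumed sample size of 240, matching the real-sampling dataset.
dataset_a = simulate_censored(240, 0.736, seed=1)
dataset_b = simulate_censored(240, 0.90, seed=2)
print(sum(v is None for v in dataset_a), sum(v is None for v in dataset_b))
```

Under these assumptions the number of censored entries is fixed by rounding n times the censoring fraction, while which samples are censored depends on the seed.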

2.2. Concentration estimation under left-censored data

2.2.1. Substitution methods

The common substitution method is the easiest to implement and can readily accommodate multiple substitution constants (Ganser & Hewett, 2010). In the three datasets of Salmonella contamination in chilled chicken samples, each non-detect (ND) was simply replaced with one of four values (zero, LOQ, LOQ/2, or LOQ/√2), and conventional statistical analysis was then performed on each revised dataset. In particular, we chose replacement of NDs with zero as the baseline method of this study.
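
As a minimal illustration of substitution, the Python sketch below computes the mean concentration after replacing every non-detect with each candidate constant. It assumes the fourth alternative is LOQ/√2; the toy data and the helper name `substituted_mean` are hypothetical and not taken from the study.

```python
import math

LOQ = 3.0  # MPN/g

# Candidate replacement values for non-detects (fourth value assumed to be LOQ/sqrt(2)).
SUBSTITUTES = {
    "zero": 0.0,
    "LOQ/2": LOQ / 2,
    "LOQ/sqrt(2)": LOQ / math.sqrt(2),
    "LOQ": LOQ,
}


def substituted_mean(detected, n_nondetect, value):
    """Mean concentration after replacing every non-detect with `value`."""
    total = sum(detected) + n_nondetect * value
    return total / (len(detected) + n_nondetect)


# Hypothetical toy data: 3 quantified samples and 7 non-detects.
detected = [3.0, 6.0, 9.0]
for name, value in SUBSTITUTES.items():
    print(name, round(substituted_mean(detected, 7, value), 3))
```

The larger the substituted constant, the larger the estimated mean, which is why the choice matters most when nearly all observations are censored.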

2.2.2. The distribution-based MLE methods

The MLE method for left-censored data is based on a likelihood function L; for a distribution with two parameters (mean and variance), L(mean, variance) gives the likelihood that the distribution matches the observed data (Helsel, 2012). The Poisson, negative binomial, zero-modified Poisson, and zero-modified negative binomial distributions were then fitted to the three datasets of Salmonella contamination levels in chicken samples.

(1) Poisson distribution

The index of dispersion (I), measured by the variance-to-mean ratio (σ²/μ), is used to describe the distribution of microorganisms in food (Mussida et al., 2013). In the Poisson distribution model, pathogen cells are generally considered to be randomly distributed in the samples (I = 1, i.e., the variance is equal to the mean). The single-parameter Poisson distribution models the number of microorganisms in its simplest form, where the mean concentration (λ) is constant. The probability mass function of the Poisson distribution is given by:

$$\Pr(Y_i) = \frac{\exp(-\lambda)\,\lambda^{Y_i}}{Y_i!} \tag{2}$$

where Yi is the Salmonella count at observation i (i = 1, 2, 3, …, n); and λ is the mean parameter of the Salmonella contamination levels.

To fit the Poisson distribution, the log-likelihood function (LL) is obtained as:

$$LL_{\mathrm{P}} = \sum_{i=1}^{n} \left[ -\lambda + Y_i \ln(\lambda) - \ln(Y_i!) \right] \tag{3}$$


(2) Negative binomial distribution

When over-dispersion occurs in the microbial dataset (i.e., the variance is greater than the mean), the physical distribution of the microorganisms is not suitably described by the Poisson distribution. Gonzales-Barron et al. (2012) assumed that the random variation of microbial concentration (cells/g) within the mass of a food batch follows a gamma distribution, while the number of microbial counts (cells) in the sample follows a Poisson distribution with the parameter (λ) given by the random gamma concentration. Thus, the Poisson distribution can be further generalized by adding a gamma distribution to adjust for the heterogeneity in the count data. This modification yields a more flexible distribution, namely the discrete Poisson-gamma distribution, also known as the negative binomial (NB) distribution. The discrete probability of the Salmonella contamination level Yi, sampled from a batch of chilled chicken, can then be described as:

$$\Pr(Y_i) = \frac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})} \left[\frac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right]^{\alpha^{-1}} \left[\frac{\lambda}{\alpha^{-1} + \lambda}\right]^{Y_i} \tag{4}$$

where Γ(·) represents the gamma function; α is the dispersion parameter; Yi is the number of Salmonella cells present in the ith sample; and λ is the mean parameter of the contamination levels.

For distribution fitting, the log-likelihood function of the NB distribution was calculated as:

$$LL_{\mathrm{NB}} = \sum_{i=1}^{n} \left\{ \ln\left[\frac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})}\right] - (Y_i + \alpha^{-1})\ln(1 + \alpha\lambda) + Y_i\ln(\alpha\lambda) \right\} \tag{5}$$
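
As a consistency check, the Python sketch below codes the NB probability mass function of Eq. (4) and the log-likelihood of Eq. (5) separately and confirms numerically that they agree. The counts and parameter values are hypothetical, and the function names are illustrative only; this is not the fitting code used in the study.

```python
import math


def nb_log_pmf(y, lam, alpha):
    """Log of Eq. (4): NB probability of count y with mean lam and dispersion alpha."""
    r = 1.0 / alpha  # alpha^-1
    return (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        + r * math.log(r / (r + lam))
        + y * math.log(lam / (r + lam))
    )


def nb_log_likelihood(counts, lam, alpha):
    """Eq. (5): log-gamma ratio, minus (y + 1/alpha) ln(1 + alpha*lam), plus y ln(alpha*lam)."""
    r = 1.0 / alpha
    return sum(
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        - (y + r) * math.log(1 + alpha * lam)
        + y * math.log(alpha * lam)
        for y in counts
    )


counts = [0, 0, 0, 4, 0, 11, 0, 0, 3, 0]  # hypothetical counts
lam, alpha = 1.8, 5.0                     # hypothetical parameter values

ll_from_pmf = sum(nb_log_pmf(y, lam, alpha) for y in counts)
ll_direct = nb_log_likelihood(counts, lam, alpha)
total_prob = sum(math.exp(nb_log_pmf(y, lam, alpha)) for y in range(2000))
print(abs(ll_from_pmf - ll_direct) < 1e-9, abs(total_prob - 1.0) < 1e-6)
```

The two forms agree because α⁻¹/(α⁻¹ + λ) = 1/(1 + αλ) and λ/(α⁻¹ + λ) = αλ/(1 + αλ).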

(3) Zero-inflated Poisson distribution

When a high proportion of NDs appears in the microbial dataset, the variance function may not be accounted for by the Poisson or even the NB distribution (Hall, 2000). Zero-inflated distributions are used to model this excess of zeros. In a zero-inflated distribution model, it is usually assumed that the pathogen counts originate from two groups: one group accounts for the portion of NDs, and the other follows a standard count distribution (such as a Poisson or an NB distribution). If the standard count distribution is a Poisson, the probability mass function of the zero-inflated Poisson (ZIP) distribution is given by:

$$\Pr(Y_i) = \begin{cases} p_0 + (1 - p_0)\exp(-\lambda), & Y_i < LOQ \\ (1 - p_0)\,\dfrac{\exp(-\lambda)\,\lambda^{Y_i}}{Y_i!}, & Y_i \geq LOQ \end{cases} \tag{6}$$

where Yi is the number of Salmonella cells present in the ith sample; p0 is the probability of zero counts resulting from uncontaminated sample units; and λ is the contamination level of Salmonella in the batch of chilled chicken samples.

To fit the ZIP distribution, the log-likelihood function was expressed as:

$$LL_{\mathrm{ZIP}} = \sum_{i=1}^{n} \Big\{ I(Y_i < LOQ)\,\ln\left[p_0 + (1 - p_0)\exp(-\lambda)\right] + I(Y_i \geq LOQ)\left[\ln(1 - p_0) + Y_i\ln(\lambda) - \lambda - \ln(Y_i!)\right] \Big\} \tag{7}$$
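
The indicator structure of the ZIP log-likelihood can be sketched in Python as below. The code follows Eq. (7) as written in the text (one term for observations below the LOQ, one for observations at or above it). The toy data, the coarse grid search, and all names are hypothetical; the authors' actual optimizer in R is not described in this section.

```python
import math

LOQ = 3  # counts below this are treated as non-detects


def zip_log_likelihood(counts, p0, lam, loq=LOQ):
    """Eq. (7) as written: censored term for Y_i < LOQ, Poisson term for Y_i >= LOQ."""
    ll = 0.0
    for y in counts:
        if y < loq:
            ll += math.log(p0 + (1 - p0) * math.exp(-lam))
        else:
            ll += (math.log(1 - p0) + y * math.log(lam)
                   - lam - math.lgamma(y + 1))  # lgamma(y + 1) = ln(y!)
    return ll


# Hypothetical toy data: 18 non-detects (recorded as 0) and 2 positive counts.
counts = [0] * 18 + [4, 7]

# Coarse grid search over (p0, lam); a real analysis would use a numerical optimizer.
grid = [(p / 20, l / 4) for p in range(1, 20) for l in range(1, 61)]
p0_hat, lam_hat = max(grid, key=lambda t: zip_log_likelihood(counts, *t))
print(p0_hat, lam_hat)
```

A grid search is used here only to keep the sketch dependency-free; it returns the best grid point, not an exact maximum likelihood estimate.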

(4) Zero-inflated negative binomial distribution

In some cases, even if the NDs are adequately fitted, over-dispersion of the underlying count distribution may still exist (Wang & Hailemariam, 2018b). If the zero-inflated model fails to take this over-dispersion into account, the probability distribution may lead to biased estimates. Substituting the Poisson portions in Eq. (6) with the NB yields the zero-inflated negative binomial (ZINB) distribution, whose probability mass function becomes:

$$\Pr(Y_i) = \begin{cases} p_0 + (1 - p_0)\left[\dfrac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right]^{\alpha^{-1}}, & Y_i < LOQ \\ (1 - p_0)\,\dfrac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})} \left[\dfrac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right]^{\alpha^{-1}} \left[\dfrac{\lambda}{\alpha^{-1} + \lambda}\right]^{Y_i}, & Y_i \geq LOQ \end{cases} \tag{8}$$

where Γ(·) is the gamma function; Yi represents the number of Salmonella cells present in the ith sample; p0 is the probability of zero counts resulting from uncontaminated sample units; and λ is the contamination level of Salmonella in the batch of chilled chicken samples.

The log-likelihood function of the ZINB distribution to be fitted was then computed as:

$$LL_{\mathrm{ZINB}} = \sum_{i=1}^{n} \Big\{ I(Y_i < LOQ)\,\ln\left(p_0 + (1 - p_0)\left[\frac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right]^{\alpha^{-1}}\right) + I(Y_i \geq LOQ)\big[\ln(1 - p_0) + \ln\Gamma(Y_i + \alpha^{-1}) - \ln\Gamma(Y_i + 1) - \ln\Gamma(\alpha^{-1}) + Y_i\ln(\alpha\lambda) - (Y_i + \alpha^{-1})\ln(1 + \alpha\lambda)\big] \Big\} \tag{9}$$

(5) Hurdle Poisson distribution

The hurdle distribution model is another zero-modified model that can deal with a high percentage of NDs. The idea underlying the hurdle model is that a binomial probability distribution determines whether an observation is an ND or a positive count (≥ LOQ). If the microbial observation is positive, the ‘hurdle’ is crossed, and the conditional distribution of a positive count is governed by a truncated-at-zero distribution model (Mullahy, 1986). The hurdle Poisson (HP) model is the most popular hurdle model; it uses a truncated Poisson distribution to describe the positive counts in the microbial dataset. Thus, the probability mass function of the HP distribution is given by:

$$\Pr(Y_i) = \begin{cases} \omega_0, & Y_i < LOQ \\ (1 - \omega_0)\,\dfrac{\exp(-\lambda)\,\lambda^{Y_i}}{\left(1 - \exp(-\lambda)\right) Y_i!}, & Y_i \geq LOQ \end{cases} \tag{10}$$

where Yi is the number of Salmonella cells present in the ith sample; λ is the concentration of Salmonella in the chilled chicken samples; and ω0 is the probability of zero counts arising from the negative samples.

The log-likelihood function for fitting the HP distribution to count data is:

$$LL_{\mathrm{HP}} = \sum_{i=1}^{n} \Big\{ I(Y_i < LOQ)\,\ln(\omega_0) + I(Y_i \geq LOQ)\left[\ln(1 - \omega_0) - \lambda + Y_i\ln(\lambda) - \ln\left(1 - \exp(-\lambda)\right) - \ln(Y_i!)\right] \Big\} \tag{11}$$

(6) Hurdle negative binomial distribution

Similarly, the probability mass function of the hurdle negative binomial (HNB) distribution can be described as follows:

$$\Pr(Y_i) = \begin{cases} \omega_0, & Y_i < LOQ \\ (1 - \omega_0)\,\dfrac{\dfrac{\Gamma(Y_i + \alpha^{-1})}{\Gamma(Y_i + 1)\,\Gamma(\alpha^{-1})}\,(1 + \alpha\lambda)^{-(Y_i + \alpha^{-1})}\,(\lambda\alpha)^{Y_i}}{1 - \left(\dfrac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right)^{\alpha^{-1}}}, & Y_i \geq LOQ \end{cases} \tag{12}$$

where Γ(·) represents the gamma function; Yi is the number of Salmonella cells present in the ith sample; λ is the mean parameter of the contamination levels; and ω0 is the probability of zero counts arising from the negative samples.

The log-likelihood function of the HNB distribution to be fitted was given by:

$$LL_{\mathrm{HNB}} = \sum_{i=1}^{n} \Big\{ I(Y_i < LOQ)\,\ln(\omega_0) + I(Y_i \geq LOQ)\Big[\ln(1 - \omega_0) + \ln\Gamma(Y_i + \alpha^{-1}) - \ln\left(1 - \left(\frac{\alpha^{-1}}{\alpha^{-1} + \lambda}\right)^{\alpha^{-1}}\right) - \ln\Gamma(Y_i + 1) - \ln\Gamma(\alpha^{-1}) - (Y_i + \alpha^{-1})\ln(1 + \alpha\lambda) + Y_i\ln(\lambda\alpha)\Big] \Big\} \tag{13}$$

2.2.3. The MI method

MI is a way to solve the problem of complex incomplete data (Rubin, 1996). In this study, the multivariate imputation by chained equations (MICE) method was used. To estimate the parameters of the three highly censored Salmonella datasets, the predictive mean matching (pmm) function was applied. The MI-pmm method is a general semi-parametric imputation method that imputes the censored values in the dataset based on the observed values. The MI-pmm method is implemented in the R package ‘mice’ (version 3.11.0), documented at https://cran.r-project.org/web/packages/mice/mice.pdf.


2.3. Statistical analysis

The Poisson, NB, ZIP, ZINB, HP, and HNB distributions were fitted as MLE methods to all three Salmonella datasets. The goodness-of-fit (GOF) statistics and estimated parameters of each distribution model were calculated. A standard method for comparing the models is to refer to information criteria based on the fitted log-likelihood (LL) function. Commonly used criteria include the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC):

$$AIC = 2k_1 - 2LL \tag{14}$$

$$BIC = k_2\ln(n) - 2LL \tag{15}$$

where k1 and k2 represent the number of parameters in the model, and n represents the number of observations.
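
Eqs. (14) and (15) translate directly into code. In the Python sketch below, the log-likelihood values and parameter counts are made-up numbers chosen only to show the comparison; they are not results from this study.

```python
import math


def aic(log_lik, k):
    """Eq. (14): AIC = 2k - 2LL."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Eq. (15): BIC = k ln(n) - 2LL."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fitted log-likelihoods and parameter counts (NOT values from the paper).
fits = {"Poisson": (-310.2, 1), "NB": (-121.5, 2), "HNB": (-120.9, 3)}
n = 240  # number of observations

for name, (ll, k) in fits.items():
    print(name, round(aic(ll, k), 2), round(bic(ll, k, n), 2))
```

Lower values indicate a better trade-off between fit and complexity; BIC penalizes each extra parameter by ln(n) rather than 2, so it favors simpler models more strongly as n grows.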

Also, the cumulative probability plots and observed minus predicted probability plots of these distribution models were used to illustrate the quality of the fit. For each distribution, predicted probabilities were computed by the probability mass functions shown in Eqs. (2), (4), (6), (8), (10) and (12).

In addition, the substitution methods, distribution-based MLE methods, and MI method were compared using root mean square errors (RMSEs). The RMSE was calculated as follows:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2} \tag{16}$$

where Yi is the observed value (i.e., the baseline data in this study), Ŷi is the predicted value, and n represents the number of observations. A lower RMSE value indicates an estimate closer to the known value. All analyses were performed in R (Version 3.6.0, http://www.R-project.org/).
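
A minimal Python sketch of Eq. (16) is given below. The example compares a hypothetical baseline vector (NDs set to zero) with the same data after substituting LOQ/2 = 1.5 for the NDs; the numbers are illustrative and not from the study.

```python
import math


def rmse(observed, predicted):
    """Eq. (16): square root of the mean squared difference between two series."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


baseline = [0.0, 0.0, 0.0, 5.0]      # hypothetical baseline: NDs replaced with zero
half_loq = [1.5, 1.5, 1.5, 5.0]      # same data with NDs replaced by LOQ/2
print(round(rmse(baseline, half_loq), 4))
```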

3. Results and Discussion

3.1. Overall Salmonella prevalence and MPN determination

A total of 240 retail chicken samples were collected, of which 9.58% (23/240) were positive for Salmonella contamination. The quantitative method used in this study expressed the Salmonella load in chicken samples in MPN units, with values ranging from 3 MPN/g to 29 MPN/g. Table 1 summarizes the prevalence and MPN enumeration data of Salmonella in retail chicken samples. According to the quantitative analysis, most Salmonella-positive chicken samples (17 out of 23) were contaminated at a level between 3 and 10 MPN/g, with only three samples exceeding 10 MPN/g. The quantitative detection results for Salmonella in retail chicken samples were consistent with those reported in previous studies conducted in China (Wang et al., 2014; Zhu et al., 2014; Yang et al., 2020). As shown in Table 1, the number of Salmonella in the majority of these samples was below the LOQ of the microbiological method, which was 3 MPN/g. In the real sampling dataset, 217 out of 240 samples were non-detects, that is, left-censored data. The numbers of censored data in the simulated datasets A and B were 177 and 216, respectively. In other words, in descending order of left-censoring degree, the datasets rank as follows: real sampling dataset, simulated dataset B, and simulated dataset A. With most observations below the LOQ, the count dataset contains a substantial amount of censoring, which is not uncommon in quantitative microbiological investigations but can make statistical inference challenging.

3.2. Fitting of data by probabilistic distributions

Each dataset, with 73.75%, 90.00%, and 90.42% left-censored data (datasets A, B, and R, respectively), was fitted to the Poisson, NB, ZIP, ZINB, HP, and HNB distributions. When fitting these six distributions to the real sampling dataset of Salmonella, the best fit was obtained with the NB and HNB distributions (lower AIC and BIC) (Table 2). The NB distribution produced a better fit than the Poisson distribution. As expected, the Poisson distribution does not fit well since it has no model parameters to account for over-dispersion and zero-inflation (Sun et al., 2019). Therefore, it is necessary to introduce a heterogeneity term to accommodate the unobserved heterogeneity in the count data, resulting in a more flexible heterogeneous Poisson distribution (i.e., the NB distribution). Table 2 also shows that the standard deviation (SD) of the NB distribution was 3.81, which was no longer a compressed value because the over-dispersion had been well explained. With regard to the zero-modified distributions (i.e., ZIP, ZINB, HP, and HNB), adding the zero-modified part did not improve the fit obtained by the simple NB distribution, except for the HNB distribution. As visualized in Fig. 1(a) and Fig. 2(a), the observed probabilities of the real sampling data of Salmonella (circle markers) plotted against the Poisson distribution (dashed lines) indicate that the Poisson distribution underestimated the proportion of zero counts in the dataset. The probability difference plot in Fig. 3(a) shows that the probability values predicted by the HNB and NB distributions were closer to the observed values of Salmonella. Because the Poisson distribution is too restrictive by definition, the ZIP and HP distributions were only able to address the level of zero counts and still underestimated the SD of the data (see Table 2). However, when fitting the real sampling data of Salmonella in chilled chicken samples with a censoring rate of 90.42%, the ZINB distribution did not converge. Based on these findings, the NB or HNB distributions can be applied to estimate the Salmonella contamination level, giving a mean of 0.68 MPN/g.

For the simulated dataset A, the fitting characteristics of the Poisson, NB, ZIP, ZINB, HP, and HNB distributions were broadly consistent with those for the real sampling dataset of Salmonella. In Table 3, the ZINB and HNB distributions showed a better fit to the over-dispersed dataset containing left-censored data. Moreover, the statistical performance of the ZINB distribution was almost the same as that of the HNB distribution. The mean level of Salmonella contamination in the simulated dataset A, obtained by the ZINB or HNB distribution, was 3.92 MPN/g. Probability density (or mass) estimates for Salmonella counts of simulated dataset A are shown in Fig. 1(b). Compared with the ZINB and HNB distributions, the NB distribution, although sufficient to model the data, slightly underestimated the probability of zero counts in this dataset. In Fig. 2(b) and Fig. 3(b), the cumulative probability and probability difference of the ZINB distribution followed a similar trend to those of the HNB distribution. While the NB distribution provides the simplest way of modeling microbial data, the zero-modified NB distributions are expected to outperform the NB distribution due to their capacity to describe the extra zero counts (Gonzales-Barron et al., 2010).

316 For the simulated dataset B, the fitting results showed two similarities with the simulated

vie
317 dataset A and the real sampling dataset (Table 4). Firstly, the NB distribution fitted the count

data better than the Poisson distribution (lower AIC and BIC). Secondly, the zero-modified distributions were not much better than the simple NB distribution at fitting the over-dispersed data. In addition, although the proportion of NDs in simulated dataset B was similar to that in the real sampling dataset of Salmonella, the fitting results of the six distribution models differed slightly from those for the real sampling dataset. Table 4 shows that the ZIP distribution and the HP distribution had lower AIC and BIC values, producing a slightly better fit than the other distributions (i.e., the NB or HNB distribution). For comparison, Fig. 1(c) and Fig. 2(c) show that the HP distribution gave slightly higher estimates than the ZIP distribution at zero values. However, for simulated dataset B (Table 4), the SDs of the NB-based distributions were greater than those of the Poisson-based distributions. A possible explanation is that, while the zero-modified Poisson distributions can account for the probability at zero, the count distribution component still requires more flexibility, which can only be provided by a heterogeneous count distribution such as the NB distribution (Gonzales-Barron et al., 2010). The NB distribution has already been used to describe the clustering of Campylobacter contamination in broilers because it better handles the heterogeneity within the dataset (Reich et al., 2018). According to the probability difference plot in Fig. 3(c), the probability values predicted by the NB distribution, the ZIP distribution, and the HP distribution were close to those observed for the Salmonella counts in the dataset. Similarly, when fitting the simulated dataset of Salmonella in chilled chicken samples with a censoring rate of 90.00%, the ZINB distribution did not converge either.
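As a quick consistency check on the fit statistics discussed above, note that AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, so the gap BIC - AIC = k (ln n - 2) depends only on the number of fitted parameters k and the sample size n. The sketch below applies this to the AIC and BIC values reported in Table 2 with n = 240. The parameter counts (1 for Poisson, 2 for NB, ZIP, and HP, 3 for HNB) are assumptions based on the usual parameterization of each distribution and are not stated in the text.

```python
import math

N_SAMPLES = 240  # sample size reported in Table 1

# (AIC, BIC, k): AIC and BIC as reported in Table 2; k is an ASSUMED
# parameter count based on the usual parameterization of each distribution.
table2 = {
    "Poisson": (972.34, 975.82, 1),
    "Negative binomial": (297.11, 304.07, 2),
    "Zero-inflated Poisson": (356.70, 363.66, 2),
    "Hurdle Poisson": (356.70, 363.66, 2),
    "Hurdle negative binomial": (290.33, 300.77, 3),
}


def bic_minus_aic(k, n):
    """AIC = 2k - 2lnL and BIC = k ln(n) - 2lnL, so BIC - AIC = k (ln n - 2)."""
    return k * (math.log(n) - 2.0)


def check_table(rows, n):
    """Return {name: (reported gap, expected gap)} for each fitted model."""
    return {
        name: (round(bic - aic, 2), round(bic_minus_aic(k, n), 2))
        for name, (aic, bic, k) in rows.items()
    }


if __name__ == "__main__":
    for name, (reported, expected) in check_table(table2, N_SAMPLES).items():
        print(f"{name:26s} reported gap = {reported:5.2f}, expected = {expected:5.2f}")
```

Under these assumed parameter counts, the reported and expected gaps agree to within rounding, which suggests that the AIC and BIC columns of Table 2 are internally consistent.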

Based on these findings, this study demonstrated the superiority of the NB-based distributions over the Poisson-based distributions for representing highly censored data. On the one hand, the NB distribution is better at describing low microbial counts than the Poisson distribution, so it constitutes the main framework for deriving statistical models of low-incidence microbial data (Gonzales-Barron et al., 2014). On the other hand, the central tendency estimate of the NB distribution is the arithmetic mean, which reflects the actual concentration more accurately than the mean of the logs (i.e., the geometric mean given by the lognormal and Poisson-lognormal distributions) (Gonzales-Barron & Butler, 2011; Jongenburger et al., 2015). A previous study showed that the ZINB distribution was able to represent data with 13% NDs (coliform data) and 42% NDs (E. coli data) with considerable accuracy (Gonzales-Barron et al., 2010), whereas its ability to fit severely censored data (≥90%), as in this study, has yet to be proven.
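To illustrate why the arithmetic mean and the mean of the logs can differ, the sketch below compares the two for a small set of hypothetical concentrations. The values are invented for illustration and are not taken from this study. Back-transforming the mean of the log10 values gives the geometric mean, which is never larger than the arithmetic mean and falls well below it for skewed data.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """Back-transform the mean of the log10 values (requires values > 0)."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


# Hypothetical, strongly skewed concentrations (e.g. in MPN/g)
concentrations = [1.0, 10.0, 100.0, 1000.0]

am = arithmetic_mean(concentrations)  # 277.75
gm = geometric_mean(concentrations)   # 10 ** 1.5, about 31.6

if __name__ == "__main__":
    print(f"arithmetic mean = {am:.2f}, geometric mean = {gm:.2f}")
```

In this toy case the geometric mean is almost nine times smaller than the arithmetic mean, which is the kind of gap that matters when a central estimate feeds into an exposure calculation.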

3.3. Method comparison

Under high censoring, the substitution method, the distribution-based MLE method, and the MI method were applied to left-censored microbial count data of Salmonella in chilled chicken samples. The estimated parameters and statistical performance are presented in Table 5. In this work, in addition to treating a non-detect sample as zero (baseline), we evaluated three other alternatives (i.e., LOQ/2, LOQ/√2, and LOQ). The RMSEs increased when NDs were replaced in the following order: LOQ/2, LOQ/√2, and LOQ. Among the alternatives evaluated, it seems more reasonable to use the corresponding LOQ/2 or LOQ/√2 for each ND than to replace all of them with a single value such as the average or median of the dataset (Rajal et al., 2007; Poma et al., 2019). For the real sampling dataset, the mean level of Salmonella contamination was 2.07 MPN/g and 2.63 MPN/g when LOQ/2 and LOQ/√2, respectively, were used to represent NDs. Additionally, any negative result is tied to the LOQ (which can be modified by using more sensitive and efficient methods), so the common practice of substituting NDs with zero is unrealistic (Poma et al., 2019). Using zero may underestimate public health risks if decisions to protect public health are based on these estimates. Conversely, replacing an ND with an LOQ-related value allows for a possible false-negative case, that is, the target pathogen may still be present although its concentration is below the LOQ of the detection method.
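The effect of each substitution rule on the mean can be reproduced approximately from the summary figures alone. The sketch below assumes an LOQ of 3 MPN/g (an assumption, taken from the lowest positive category in Table 1), 217 NDs among 240 samples, and the rounded baseline mean of 0.72 MPN/g from Table 5. Because the baseline mean is rounded, the results can be expected to match Table 5 only to within about 0.01 MPN/g.

```python
import math

N_TOTAL = 240          # samples in the real sampling dataset
N_ND = 217             # non-detects (90.42%)
BASELINE_MEAN = 0.72   # MPN/g, NDs treated as zero (Table 5, rounded)
LOQ = 3.0              # MPN/g; ASSUMED from the lowest positive category in Table 1


def mean_after_substitution(substitute):
    """Mean when every ND is replaced by `substitute` instead of zero."""
    positive_sum = BASELINE_MEAN * N_TOTAL  # NDs add nothing at baseline
    return (positive_sum + N_ND * substitute) / N_TOTAL


substituted_means = {
    "LOQ/2": mean_after_substitution(LOQ / 2),
    "LOQ/sqrt(2)": mean_after_substitution(LOQ / math.sqrt(2)),
    "LOQ": mean_after_substitution(LOQ),
}

if __name__ == "__main__":
    for rule, mean in substituted_means.items():
        print(f"{rule:12s} mean = {mean:.2f} MPN/g")
```

With more than 90% NDs, the substituted value dominates the mean: each additional 1 MPN/g assigned to the NDs raises the dataset mean by about 0.9 MPN/g, which is why the choice of substitute matters so much at this censoring level.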


As mentioned before, the NB distribution and the HNB distribution can account for the large number of NDs in the dataset. Across all performance measures, NB-MLE and HNB-MLE were the best methods overall, yielding similar descriptive statistics and small RMSEs. For the real sampling dataset, the mean concentration of Salmonella on raw chicken samples obtained by both the NB-MLE and HNB-MLE approaches was 0.68 MPN/g. Earlier, Gonzales-Barron et al. (2014) likewise pointed out that there was no significant difference in microbial prediction between the NB distribution and the HNB distribution for pre-chill batches of beef carcasses. In general, a better understanding of the actual physical distribution of pathogens can often improve decision-making on food quality and safety issues (Wang & Hailemariam, 2018a). A simple MLE method was found to be ineffective in fitting highly censored data because distribution selection was not considered (Canales et al., 2018). Thus, the choice of probability distribution has a greater impact on risk estimation, especially when the microbial data are highly censored.
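A minimal sketch of distribution-based MLE for zero-heavy counts is given below, assuming the NB distribution is parameterized by its mean and a size (dispersion) parameter. The counts are hypothetical values chosen only to mimic a dataset with 217 zeros among 240 samples; they are not the study data, and the golden-section search over the size parameter is an illustrative choice rather than the authors' fitting procedure.

```python
import math


def poisson_loglik(counts, mean):
    return sum(y * math.log(mean) - mean - math.lgamma(y + 1) for y in counts)


def nb_loglik(counts, mean, size):
    """Negative binomial log-likelihood with variance = mean + mean**2 / size."""
    log_p = math.log(size / (size + mean))
    log_q = math.log(mean / (size + mean))
    return sum(
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * log_p + y * log_q
        for y in counts
    )


def fit_nb(counts, lo=-10.0, hi=10.0, iters=200):
    """MLE of (mean, size): the mean is the sample mean; log(size) is found
    by golden-section search on the profile log-likelihood."""
    mean = sum(counts) / len(counts)
    f = lambda log_size: nb_loglik(counts, mean, math.exp(log_size))
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if f(a) < f(b):
            lo = a  # maximum lies to the right of a
        else:
            hi = b  # maximum lies to the left of b
    return mean, math.exp((lo + hi) / 2)


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


# Hypothetical counts: 217 non-detects treated as zero plus 23 positives
counts = [0] * 217 + [3] * 3 + [4] * 6 + [6] * 6 + [9] * 5 + [15, 25, 40]

mean, size = fit_nb(counts)
aic_poisson = aic(poisson_loglik(counts, mean), 1)
aic_nb = aic(nb_loglik(counts, mean, size), 2)

if __name__ == "__main__":
    print(f"mean = {mean:.2f}, size = {size:.3f}")
    print(f"AIC Poisson = {aic_poisson:.1f}, AIC NB = {aic_nb:.1f}")
```

For data of this shape, the NB fit should give a much lower AIC than the Poisson fit, because a single Poisson rate cannot describe both the many zeros and the occasional high counts. A hurdle or zero-inflated variant would add one more parameter for the zero class.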

The MI-pmm method did not perform as well as the other methods in this study. At worst, imputing NDs by predictive mean matching greatly overpredicted the concentration of Salmonella (i.e., the mean concentration of Salmonella in the real sampling dataset was 4.19 MPN/g). The MI method has performed well in other simulation studies; for example, the lognormal distribution-based MI method and the uniform distribution-based MI method have been reported to be more suitable for fitting highly left-censored data than the simple MLE method (Canales et al., 2018). However, misspecification of the distribution also causes a certain bias: if the assumed distribution is not the one followed by the real data, a method that assumes a particular distribution may perform poorly (Shoari et al., 2015). Therefore, adopting a more appropriate distribution when using the MI method, such as the NB (or zero-modified NB) distribution for Salmonella suggested in this study, may offer better accuracy in handling highly left-censored data of specific foodborne pathogens.


4. Conclusions

The methodology used to find the optimal fit for microbial data is critical, especially for highly left-censored data, because different approaches can lead to very different quantitative estimates. The MLE method based on the NB model or the zero-modified NB model can help to estimate the contamination level of Salmonella more precisely when the count data are highly censored. Based on the monitoring data on Salmonella, the mean level of contamination obtained by the NB-MLE or zero-modified NB-MLE model was 0.68 MPN/g. Among the substitution methods for treating censored data, replacing NDs with LOQ/2 appears to be the next best method. Overall, the MI-pmm method alone generated poor outcomes. In the future, both the MLE method and the MI method could be extended by combining suitable distributions with alternative values for left-censored data, thereby bridging the current gap in solving commonly experienced limit-of-detection issues in microbiology. If left-censored data for foodborne pathogens are interpreted correctly by using a more reliable model, QMRA results may be improved to better predict foodborne infections.


CRediT authorship contribution statement

Tianmei Sun: Methodology, Data curation, Software, Formal analysis, Writing - original draft. Yangtai Liu: Conceptualization, Methodology, Writing - review & editing. Shufei Gao: Methodology, Writing - review & editing. Xiaojie Qin: Validation, Resources. Zijie Lin: Conceptualization, Investigation. Xin Dou: Conceptualization, Investigation. Xiang Wang: Conceptualization, Methodology. Hui Zhang: Supervision. Qingli Dong: Conceptualization, Project administration, Funding acquisition, Writing - review & editing.

Declaration of Competing Interest

The authors declare that there is no conflict of interest.

Data availability

Data will be made available on request.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (Grant No. 32102111) and the Shanghai Agriculture Applied Technology Development Program, China (Grant No. X2021-02-08-00-12-F00782).

References

Andrews-Polymenis, H. L., Baumler, A. J., McCormick, B. A., & Fang, F. C. (2010). Taming the elephant: Salmonella biology, pathogenesis, and prevention. Infection and Immunity, 78(6), 2356-2369. https://doi.org/10.1128/IAI.00096-10

Batz, M. B., Hoffmann, S., & Morris, J. G. (2012). Ranking the disease burden of 14 pathogens in food sources in the United States using attribution data from outbreak investigations and expert elicitation. Journal of Food Protection, 75(7), 1278-1291. https://doi.org/10.4315/0362-028X.JFP-11-418

Canales, R. A., Wilson, A. M., Pearce-Walker, J. I., Verhougstraete, M. P., & Reynolds, K. A. (2018). Methods for handling left-censored data in quantitative microbial risk assessment. Applied and Environmental Microbiology, 84(20), e01203-18. https://doi.org/10.1128/AEM.01203-18

Commeau, N., Parent, E., Delignette-Muller, M. L., & Cornu, M. (2012). Fitting a lognormal distribution to enumeration and absence/presence data. International Journal of Food Microbiology, 155(3), 146-152. https://doi.org/10.1016/j.ijfoodmicro.2012.01.023

Danyluk, M. D., & Schaffner, D. W. (2011). Quantitative assessment of the microbial risk of leafy greens from farm to consumption: preliminary framework, data, and risk estimates. Journal of Food Protection, 74(5), 700-708. https://doi.org/10.4315/0362-028X.JFP-10-373

Foley, S. L., Johnson, T. J., Ricke, S. C., Nayak, R., & Danzeisen, J. (2013). Salmonella pathogenicity and host adaptation in chicken-associated serovars. Microbiology and Molecular Biology Reviews, 77(4), 582-607. https://doi.org/10.1128/MMBR.00015-13

Ganser, G. H., & Hewett, P. (2010). An accurate substitution method for analyzing censored data. Journal of Occupational and Environmental Hygiene, 7(4), 233-244. https://doi.org/10.1080/15459621003609713

Gonzales-Barron, U., Kerr, M., Sheridan, J. J., & Butler, F. (2010). Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts. International Journal of Food Microbiology, 136(3), 268-277. https://doi.org/10.1016/j.ijfoodmicro.2009.10.016
Gonzales-Barron, U., & Butler, F. (2011). A comparison between the discrete Poisson-gamma and Poisson-lognormal distributions to characterise microbial counts in foods. Food Control, 22(8), 1279-1286. https://doi.org/10.1016/j.foodcont.2011.01.029

Gonzales-Barron, U., Lenahan, M., Sheridan, J., & Butler, F. (2012). Use of a Poisson-gamma model to assess the performance of the EC process hygiene criterion for Enterobacteriaceae on Irish sheep carcasses. Food Control, 25(1), 172-183. https://doi.org/10.1016/j.foodcont.2011.10.035

Gonzales-Barron, U., Cadavez, V., & Butler, F. (2014). Conducting inferential statistics for low microbial counts in foods using the Poisson-gamma regression. Food Control, 37, 385-394. https://doi.org/10.1016/j.foodcont.2013.09.032

Hall, D. B. (2000). Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics, 56(4), 1030-1039. https://doi.org/10.1111/j.0006-341X.2000.01030.x

Helsel, D. (2010). Much ado about next to nothing: incorporating nondetects in science. Annals of Occupational Hygiene, 54(3), 257-262. https://doi.org/10.1093/annhyg/mep092

Helsel, D. R. (2012). Statistics for censored environmental data using Minitab and R (2nd ed.). John Wiley & Sons, Inc., Hoboken, New Jersey (Chapter 10).

Hinde, J., & Demetrio, C. G. B. (1998). Overdispersion: models and estimation. Computational Statistics & Data Analysis, 27(2), 151-170. https://doi.org/10.1016/S0167-9473(98)00007-3

Jarvis, B., Wilrich, C., & Wilrich, P. T. (2010). Reconsideration of the derivation of most probable numbers, their standard deviations, confidence bounds and rarity values. Journal of Applied Microbiology, 109(5), 1660-1667. https://doi.org/10.1111/j.1365-2672.2010.04792.x

Jongenburger, I., Bassett, J., Jackson, T., Zwietering, M. H., & Jewell, K. (2012a). Impact of microbial distributions on food safety I. Factors influencing microbial distributions and modelling aspects. Food Control, 26(2), 601-609. https://doi.org/10.1016/j.foodcont.2012.02.004

Jongenburger, I., Bassett, J., Jackson, T., Gorris, L. G. M., Jewell, K., & Zwietering, M. H. (2012b). Impact of microbial distributions on food safety II. Quantifying impacts on public health and sampling. Food Control, 26(2), 546-554. https://doi.org/10.1016/j.foodcont.2012.01.064

Jongenburger, I., den Besten, H. M. W., & Zwietering, M. H. (2015). Statistical aspects of food safety sampling. In M. P. Doyle, & T. R. Klaenhammer (Eds). Annual review of food science and technology (vol. 6, pp. 479-503). Palo Alto: Annual Reviews.

Khalid, T., Hdaifeh, A., Federighi, M., Cummins, E., Boue, G., Guillou, S., & Tesson, V. (2020). Review of quantitative microbial risk assessment in poultry meat: the central position of consumer behavior. Foods, 9(11), 1661. https://doi.org/10.3390/foods9111661

Krol, A., Ferrer, L., Pignon, J. P., Proust-Lima, C., Ducreux, M., Bouche, O., … Rondeau, V. (2016). Joint model for left-censored longitudinal data, recurrent events and terminal event: predictive abilities of tumor burden for cancer evolution with application to the FFCD 2000-05 trial. Biometrics, 72(3), 907-916. https://doi.org/10.1111/biom.12490

Membre, J. M., & Boue, G. (2018). Quantitative microbiological risk assessment in food industry: Theory and practical application. Food Research International, 106, 1132-1139. https://doi.org/10.1016/j.foodres.2017.11.025

Mullahy, J. (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33(3), 341-365. https://doi.org/10.1016/0304-4076(86)90002-3

Mussida, A., Gonzales-Barron, U., & Butler, F. (2013). Effectiveness of sampling plans by attributes based on mixture distributions characterising microbial clustering in food. Food Control, 34(1), 50-60. https://doi.org/10.1016/j.foodcont.2013.04.001

Petterson, S., Grondahl-Rosado, R., Nilsen, V., Myrmel, M., & Robertson, L. J. (2015). Variability in the recovery of a virus concentration procedure in water: implications for QMRA. Water Research, 87, 79-86. https://doi.org/10.1016/j.watres.2015.09.006

Poma, H. R., Kundu, A., Wuertz, S., & Rajal, V. B. (2019). Data fitting approach more critical than exposure scenarios and treatment of censored data for quantitative microbial risk assessment. Water Research, 154, 45-53. https://doi.org/10.1016/j.watres.2019.01.041
Pouillot, R., Garin, B., Ravaonindrina, N., Diop, K., Ratsitorahina, M., Ramanantsoa, D., & Rocourt, J. (2012). A risk assessment of campylobacteriosis and salmonellosis linked to chicken meals prepared in households in Dakar, Senegal. Risk Analysis, 32(10), 1798-1819. https://doi.org/10.1111/j.1539-6924.2012.01796.x

Pouillot, R., Hoelzer, K., Chen, Y. H., & Dennis, S. (2013). Estimating probability distributions of bacterial concentrations in food based on data generated using the most probable number (MPN) method for use in risk assessment. Food Control, 29(2), 350-357. https://doi.org/10.1016/j.foodcont.2012.05.041

Rajal, V. B., McSwain, B. S., Thompson, D. E., Leutenegger, C. M., Kildare, B. J., & Wuertz, S. (2007). Validation of hollow fiber ultrafiltration and real-time PCR using bacteriophage PP7 as surrogate for the quantification of viruses from water samples. Water Research, 41(7), 1411-1422. https://doi.org/10.1016/j.watres.2006.12.034

Reich, F., Valero, A., Schill, F., Bungenstock, L., & Klein, G. (2018). Characterisation of Campylobacter contamination in broilers and assessment of microbiological criteria for the pathogen in broiler slaughterhouses. Food Control, 87, 60-69. https://doi.org/10.1016/j.foodcont.2017.12.013

Rezvan, P. H., Lee, K. J., & Simpson, J. A. (2015). The rise of multiple imputation: a review of the reporting and implementation of the method in medical research. BMC Medical Research Methodology, 15, Article 30. https://doi.org/10.1186/s12874-015-0022-1

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473-489. https://doi.org/10.2307/2291635

Scallan, E., Hoekstra, R. M., Angulo, F. J., Tauxe, R. V., Widdowson, M. A., Roy, S. L., … Griffin, P. M. (2011). Foodborne illness acquired in the United States-major pathogens. Emerging Infectious Diseases, 17(1), 7-15. https://doi.org/10.3201/eid1701.P11101

Shoari, N., Dube, J. S., & Chenouri, S. (2015). Estimating the mean and standard deviation of environmental data with below detection limit observations: considering highly skewed data and model misspecification. Chemosphere, 138, 599-608. https://doi.org/10.1016/j.chemosphere.2015.07.009
Shoari, N., & Dube, J. S. (2018). Toward improved analysis of concentration data: embracing nondetects. Environmental Toxicology and Chemistry, 37(3), 643-656. https://doi.org/10.1002/etc.4046

Sullivan, T. R., Yelland, L. N., Moreno-Betancur, M., & Lee, K. J. (2021). Multiple imputation for handling missing outcome data in randomized trials involving a mixture of independent and paired data. Statistics in Medicine, 40(27), 6008-6020. https://doi.org/10.1002/sim.9166

Sun, W. X., Jin, Y. Q., Dai, Y. X., Xiao, J. W., Wang, X., & Dong, Q. L. (2019). Application of zero-inflated models in quantitative exposure assessment of Listeria monocytogenes in bulk cooked meat (in Chinese). Food Science, 40(11), 49-54.

Sun, W. X., Sun, T. M., Wang, X., Liu, Q., & Dong, Q. L. (2020). Probabilistic model for estimating Listeria monocytogenes concentration in cooked meat products from presence/absence data. Food Research International, 131, 109040. https://doi.org/10.1016/j.foodres.2020.109040

Sun, T. M., Liu, Y. T., Qin, X. J., Aspridou, Z., Zheng, J. M., Wang, X., … Dong, Q. L. (2021). The prevalence and epidemiology of Salmonella in retail raw poultry meat in China: A systematic review and meta-analysis. Foods, 10(11), 2757. https://doi.org/10.3390/foods10112757

Ta, Y. T., Nguyen, T. T., To, P. B., Pham, D. X., Le, H. T. H., Thi, G. N., … Doyle, M. P. (2014). Quantification, serovars, and antibiotic resistance of Salmonella isolated from retail raw chicken meat in Vietnam. Journal of Food Protection, 77(1), 57-66. https://doi.org/10.4315/0362-028X.JFP-13-221

USDA/FSIS, 2014. Most probable number procedure and tables. Available at: https://www.fsis.usda.gov/sites/default/files/media_file/2021-03/MLG-Appendix-2.pdf. (accessed 3 June, 2022)

USDA/FSIS, 2022. Isolation and identification of Salmonella from meat, poultry, pasteurized egg and siluriformes (Fish) products and Carcass and Environmental Sponges. Available at: https://www.fsis.usda.gov/sites/default/files/media_file/documents/MLG-4.12.pdf. (accessed 3 June, 2022)
Wang, F. K., & Hailemariam, S. S. (2018a). Sampling plans for the zero-inflated Poisson distribution in the food industry. Food Control, 85, 359-368. https://doi.org/10.1016/j.foodcont.2017.10.021

Wang, F. K., & Hailemariam, S. S. (2018b). Sampling plans for the zero-inflated negative binomial distribution in the food industry. Quality and Reliability Engineering International, 34(6), 1174-1184. https://doi.org/10.1002/qre.2316

Wang, Y. R., Chen, Q., Cui, S. H., Xu, X., Zhu, J. H., Luo, H. P., … Li, F. Q. (2014). Enumeration and characterization of Salmonella isolates from retail chicken carcasses in Beijing, China. Foodborne Pathogens and Disease, 11(2), 126-132. https://doi.org/10.1089/fpd.2013.1586

Xiao, X. N., Wang, W., Zhang, J. M., Liao, M., Rainwater, C., Yang, H., & Li, Y. B. (2021). A quantitative risk assessment model of Salmonella contamination for the yellow-feathered broiler chicken supply chain in China. Food Control, 121, 107612. https://doi.org/10.1016/j.foodcont.2020.107612

Yang, X. J., Huang, J. H., Zhang, Y. X., Liu, S. R., Chen, L., Xiao, C., … Wu, Q. P. (2020). Prevalence, abundance, serovars and antimicrobial resistance of Salmonella isolated from retail raw poultry meat in China. Science of The Total Environment, 713, 136385. https://doi.org/10.1016/j.scitotenv.2019.136385

Zhu, J. H., Wang, Y. R., Song, X. Y., Cui, S. H., Xu, H. B., Yang, B. W., … Li, F. Q. (2014). Prevalence and quantification of Salmonella contamination in raw chicken carcasses at the retail in China. Food Control, 44, 198-202. https://doi.org/10.1016/j.foodcont.2014.03.050
Tables

Table 1 Prevalence and MPN enumeration data of Salmonella obtained from 240 retail chilled chicken samples. The last four columns give the distribution of positive samples by concentration.

| Dataset | Sample size | No. (%) of total positive samples | 3 MPN/g | >3-10 MPN/g | >10-20 MPN/g | >20 MPN/g |
|---|---|---|---|---|---|---|
| Real sampling data | 240 | 23 (9.58) | 3 | 17 | 1 | 2 |
| Simulated-A | 240 | 63 (26.25) | 0 | 22 | 26 | 15 |
| Simulated-B | 240 | 24 (10.00) | 0 | 6 | 10 | 8 |
Table 2 Parameter estimates and fit statistics of the Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial distributions fitted to the real sampling dataset.

| Method | Mean (95% CI) a (MPN/g) | SD (MPN/g) | AIC | BIC |
|---|---|---|---|---|
| Poisson | 0.68 (0.00-2.00) | 0.82 | 972.34 | 975.82 |
| Negative binomial | 0.68 (0.00-3.00) | 3.81 | 297.11 | 304.07 |
| Zero-inflated Poisson | 0.68 (0.00-7.00) | 2.25 | 356.70 | 363.66 |
| Hurdle Poisson | 0.69 (0.00-7.00) | 2.26 | 356.70 | 363.66 |
| Hurdle negative binomial | 0.68 (0.00-5.00) | 2.83 | 290.33 | 300.77 |

a 95% CI: 95% confidence interval.
Table 3 Parameter estimates and fit statistics of the Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial distributions fitted to the simulated dataset A.

| Method | Mean (95% CI) a (MPN/g) | SD (MPN/g) | AIC | BIC |
|---|---|---|---|---|
| Poisson | 3.92 (1.00-7.00) | 1.98 | 2997.95 | 3001.43 |
| Negative binomial | 3.92 (0.00-22.00) | 13.52 | 804.55 | 811.51 |
| Zero-inflated Poisson | 3.91 (0.00-7.00) | 6.85 | 761.75 | 768.72 |
| Zero-inflated negative binomial | 3.92 (0.00-21.00) | 7.49 | 700.06 | 710.50 |
| Hurdle Poisson | 3.91 (0.00-18.00) | 6.85 | 761.75 | 768.72 |
| Hurdle negative binomial | 3.92 (0.00-21.00) | 7.50 | 700.06 | 710.50 |

a 95% CI: 95% confidence interval.
Table 4 Parameter estimates and fit statistics of the Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial distributions fitted to the simulated dataset B.

| Method | Mean (95% CI) a (MPN/g) | SD (MPN/g) | AIC | BIC |
|---|---|---|---|---|
| Poisson | 1.63 (0.00-4.00) | 1.28 | 1992.82 | 1996.30 |
| Negative binomial | 1.63 (0.00-6.00) | 10.17 | 367.28 | 374.24 |
| Zero-inflated Poisson | 1.63 (0.00-7.00) | 5.05 | 350.24 | 357.20 |
| Hurdle Poisson | 1.63 (0.00-16.00) | 4.01 | 350.23 | 357.20 |
| Hurdle negative binomial | 1.60 (0.00-5.00) | 10.36 | 370.56 | 381.00 |

a 95% CI: 95% confidence interval.
Table 5 The mean values (MPN/g), standard deviations (MPN/g), and RMSEs of Salmonella MPN counts in chilled chicken samples obtained with the censored-data methods.

| Method * | Simulated-A: Mean (SD) | Simulated-A: RMSE | Simulated-B: Mean (SD) | Simulated-B: RMSE | Real sampling data: Mean (SD) | Real sampling data: RMSE |
|---|---|---|---|---|---|---|
| Baseline | 3.92 (7.44) | - | 1.63 (5.39) | - | 0.72 (3.08) | - |
| LOQ/2 | 5.02 (6.86) | 1.29 | 3.00 (5.04) | 1.43 | 2.07 (2.78) | 1.43 |
| LOQ/√2 | 5.19 (6.77) | 1.49 | 3.21 (4.99) | 1.65 | 2.63 (2.67) | 1.65 |
| LOQ | 6.13 (6.30) | 2.60 | 4.37 (4.70) | 2.87 | 3.43 (2.52) | 2.85 |
| NB-MLE | 3.92 (13.52) | 1.01 | 1.63 (10.17) | 0.99 | 0.68 (3.81) | 1.00 |
| HNB-MLE | 3.92 (7.50) | 0.99 | 1.60 (10.36) | 1.03 | 0.68 (2.83) | 1.08 |
| MI-pmm | 11.50 (8.62) | 11.65 | 16.44 (7.35) | 17.10 | 4.19 (6.47) | 7.58 |

* Baseline: the method of substituting non-detects with zero; LOQ/2: the method of substituting non-detects with LOQ/2; LOQ/√2: the method of substituting non-detects with LOQ/√2; LOQ: the method of substituting non-detects with LOQ; NB-MLE: the negative binomial distribution-based maximum likelihood estimation method; HNB-MLE: the hurdle negative binomial distribution-based maximum likelihood estimation method; MI-pmm: the multiple imputation method implemented by predictive mean matching.
Figures

Fig. 1. Predictive distribution of Salmonella counts as modelled by the Poisson (PO), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB) distributions with the real sampling dataset (a), simulated dataset A (b), and simulated dataset B (c). Circle markers represent observed probabilities of the data (Obs).
Fig. 2. Cumulative probability plot of the Salmonella counts as modelled by the Poisson (PO), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB) distributions with the real sampling dataset (a), simulated dataset A (b), and simulated dataset B (c). Circle markers represent observed probabilities of the data (Obs).
Fig. 3. Probability difference (observed minus fitted) in Salmonella counts as fitted by the Poisson (PO), negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB) distributions with the real sampling dataset (a), simulated dataset A (b), and simulated dataset B (c).