You Et Al., 2023

Marine Pollution Bulletin 189 (2023) 114712

Dynamics of fecal coliform bacteria along Canada's coast

Shuai You a, Xiaolin Huang a, Li Xing c, Mary Lesperance a, Charles LeBlanc d, L. Paul Moccia e, Vincent Mercier f, Xiaojian Shao b, Youlian Pan b,*, Xuekui Zhang a,*

a University of Victoria, 3800 Finnerty Road, Victoria, BC V8W 2Y2, Canada
b Digital Technologies Research Centre, National Research Council Canada, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
c University of Saskatchewan, 105 Administration Place, Saskatoon, Saskatchewan S7N 5A2, Canada
d Shellfish Water Classification Program - Atlantic Region, Environment and Climate Change Canada, Government of Canada, 443 University Ave., Moncton, NB E1A 3E9, Canada
e Shellfish Water Classification Program - Pacific Region, Environment and Climate Change Canada, Government of Canada, 2645 Dollarton Highway, Vancouver, BC V7H 1B1, Canada
f National Coordination, Environment and Climate Change Canada, Government of Canada, 443 University Ave., Moncton, NB E1A 3E9, Canada


The vast coastline provides Canada with a flourishing seafood industry including bivalve shellfish production. To sustain a healthy bivalve molluscan shellfish production, the Canadian Shellfish Sanitation Program was established to monitor the health of shellfish harvesting habitats, and fecal coliform bacteria data have been collected at nearly 15,000 marine sample sites across six coastal provinces in Canada since 1979. We applied Functional Principal Component Analysis and subsequent correlation analyses to find annual variation patterns of bacteria levels at sites in each province. The overall magnitude and the seasonality of fecal contamination were modelled by functional principal component one and two, respectively. The amplitude was related to human and warm-blooded animal activities; the seasonality was strongly correlated with river discharge driven by precipitation and snow melt in British Columbia, but such correlation in provinces along the Atlantic coast could not be properly evaluated due to lack of data during winter.

Keywords: Fecal coliform; Longitudinal measurements; Functional principal component analysis; British Columbia; River discharge; Precipitation

1. Introduction

Fecal coliform bacteria are potentially pathogenic microorganisms that originate in the intestines of human beings and warm-blooded animals. They can be transported in fecal excrement to various water environments from sources like sewage effluent as well as wildlife and waterfowl (Ksoll et al., 2007). Contaminated water has been shown to pose a risk to human health. For example, exposure to fecally contaminated water is associated with an increased risk of diarrhea (Laborde et al., 1993); consumption of water contaminated by fecal coliform bacteria is associated with typhoid fever (Mermin et al., 1999), gastrointestinal illness (Weissman et al., 1976), and hepatitis (Bergeisen et al., 1985). As a result, fecal coliform bacteria levels are regarded as one type of indicator that reflects the fecal contamination level of water (Noble et al., 2003). A high level of fecal coliform bacteria in the water indicates a degradation in the marine environment; as such, these levels are widely measured to monitor the quality of surrounding water environments across the world, such as in the US (Soueidan et al., 2021), China (Aram et al., 2021), and Korea (Park et al., 2006).

Besides monitoring, a large variety of studies have examined the bacteria to learn about their existence and distribution. Relationships between the bacteria and environmental factors have been investigated. For example, the Pearson correlation coefficient was used to determine that fecal coliforms were negatively correlated with salinity and dissolved oxygen (Kazemitabar et al., 2021). Rainfall-induced surface runoff containing manure would lead to fecal pollution, while a simulation study found that such transport could be effectively reduced by subsurface manure injection (Hilaire et al., 2022). Prediction models for the bacteria levels have also been applied to aquatic environments. The Soil and Water Assessment Tool (SWAT) model, which was originally developed to predict the long-term influence of land management actions on different watersheds, was modified to predict fecal coliform contamination (Cho et al., 2012); machine learning techniques, such as Artificial Neural Networks, Cubist, and Multiple Linear Regression, were fitted and compared in terms of their predictive performance (Sbahi et al., 2021). Moreover, source tracking of the bacteria, e.g., using colored dissolved organic matter (Clark et al., 2007), is another direct approach for finding the origin of the pollution. For instance, wild animals and humans were identified as two dominant sources of fecal pollution in an urban watershed in Florida (Whitlock et al., 2002).

These efforts help protect human beings from the consumption of seafood contaminated with fecal coliform bacteria. In seawater, the contaminants are concentrated/biomagnified by filter-feeding bivalve shellfish. Therefore, to sustain a healthy bivalve molluscan shellfish industry, the Canadian Shellfish Sanitation Program (CSSP) was established in 1948 to monitor the health of shellfish harvesting areas. As a part of the program, Environment and Climate Change Canada (ECCC) has been monitoring and recording the concentration of fecal coliform bacteria, which are mostly E. coli (Dufour, 1977), along with salinity since 1979. Currently, bacteriological water quality is monitored at an average of 6000 sites annually, designated for shellfish harvesting across 6 coastal provinces in Canada: British Columbia (BC), Quebec (QC), New Brunswick (NB), Prince Edward Island (PE), Nova Scotia (NS), and Newfoundland and Labrador (NL). Data from nearly 15,000 unique sites were evaluated in this study. Prior to 1999, multi-day surveys were conducted in selected locations. After 1999, the samples were better distributed over space and time. Over time, ECCC routinely assessed fecal coliform levels at sites and accordingly adjusted shellfish harvesting schedules for closures and re-openings of specific harvesting areas. The assessment typically involved statistical analyses of recent multi-year sample sets of data at particular shellfish harvesting sites, identification of potential sources of point and non-point pollution, and classification of the sites based on their fecal contamination levels. To complement this effort, we conducted a sweeping study across regions using multi-year datasets to provide a macro-scale understanding of potential water quality impairment and make further recommendations for coastal management. For a better understanding of potential causal factors, river water flow data were retrieved from the Environment Canada Data Explorer, developed and made available by the Government of Canada. Using this rich data collected over four decades, we aimed to find temporal fluctuation patterns of fecal coliform bacteria levels at sites in each coastal province and their associations with temporal physical climate and oceanographic factors, which would help with more efficient monitoring of coastal marine environmental contamination in Canada.

The temporal bacteria data in this study are of extremely high dimension due to the large number of distinct time points and locations at which the measurements were taken. Concisely summarizing the association between two sets of high-dimensional temporal data is a challenge. Temporal data often carry substantial information in the correlation between data observed at adjacent time points. Hence, we decided to use lower-dimensional surrogate variables to replace the high-dimensional temporal data. Principal Component Analysis (PCA) is a popular tool for this type of task, as it finds sets of uncorrelated linear combinations of the input features, i.e., principal components (PCs), that decompose the variance of the features among the samples (Jolliffe and Cadima, 2016). As a result, the originally high-dimensional data of each sample are summarized into a collection of sample-specific values indicating how observations of the sample differ in each of the directions represented by the PCs. However, PCA does not consider the temporal order in the data, which was a key attribute of our data, and PCA requires that the input data form a matrix without any missing values, whereas our data were collected at sparse and irregular time points drawn from a continuum, i.e., a 40-year time period. We therefore reduced the dimension of our data using the non-parametric Functional Principal Component Analysis (FPCA) based on empirical conditional estimation (Wang et al., 2016; Yao et al., 2005).

In this paper, we unveil two prominent patterns in the fecal coliform bacteria levels via application of FPCA to data collected in British Columbia. The data characteristics represented by these two patterns and their relationships with specific regional geographical factors are then illustrated through linear regression models and correlation analyses. This workflow was subsequently applied in provinces along the Atlantic coast in Canada, where data availability was significantly limited. The structure of the article is as follows. Section 2 provides information regarding the available data, the preprocessing procedure, and the methods used. Pattern discovery and characterization, and their correlations with geographical factors, are presented in Section 3. Finally, Section 4 provides a discussion, conclusion, and consideration of further research directions.

2. Data and methods

2.1. Data

Data consist of measurements of fecal coliform concentrations collected at nearly 15,000 shellfish harvesting sites across 6 coastal provinces in Canada, including British Columbia (BC), Quebec (QC), New Brunswick (NB), Prince Edward Island (PE), Nova Scotia (NS), and Newfoundland and Labrador (NL). At each site, water samples of unit volume were taken for monitoring purposes from 1979 to 2018/2019. The number of fecal coliform bacteria in each sample (bacteria count per 100 mL of water) was then measured according to the standard method (Braun-Howland et al., 2017) and ranged from 2 to 1600 FC/100 mL MPN. Temperature (°C) and salinity (‰) were collected along with each bacteria-count measurement. In addition, we obtained two other datasets: 1) precipitation records over the 240-h period prior to each sampling, and 2) GPS coordinates of the CSSP monitoring sites. In total, the data were in the form of 18 matrices, with 3 for each province.

2.2. Data preprocessing

The amount of available data for each province varied across times and sites. To screen out data that were not suitable for analyzing the variation patterns of fecal coliform contamination levels, the three exclusion criteria described in You et al. (2022) were applied, with an adjustment to the third criterion. Briefly, we first focused only on data collected after 1999, when the sampling was more frequent and regular at sites across the six provinces, and thus sites that did not have any measurements after 1999 were removed. Secondly, we excluded the sites showing no contamination. In this regard, sites with all measurements below 2 fecal coliform bacteria per 100 mL of water sample, which is below the detection limit, were removed.

As the data were collected to serve the CSSP monitoring program, the data density over the 20 years was very low and too unreliable to generate meaningful results. To address the data sparsity problem, we decided to pool the data. Based on the exploratory data analysis of raw data, we found no significant cross-year differences in the bacterial measurements (Supplementary Fig. S1). This motivated us to pool the 20 years of data onto a scale of one year (365 days) at each site. We further reduced the time resolution from 365 days to 52 weeks by averaging over 7-day periods. Data in each weekly bin were summarized using the mean of all counts within the bin. After pooling, we excluded sites that had consecutively missing measurements over any 4-week period because they would not reliably reveal seasonal variation of the bacteria concentrations. Consequently, 847 (20.9 %) of the 4062 sites in BC were retained. More detailed descriptions of data preprocessing, including data pooling and binning and missing data handling, are provided in You et al. (2022).

Sites in BC tended to have data across the year, whereas the sites in the five provinces on the Atlantic coast did not have data in the winter months. Most measurements were available between late May and mid-November in QC, NB, PE and NS. In NL, the measurements covered an even shorter time period. Therefore, we combined the data in QC, NB, PE and NS into one dataset and focused on the subset that ranged from the 19th (mid-May) to the 45th week (early to mid-November). In NL, we focused on the subset from the 20th week to the 38th week (early October). After the same screening, 906 (28.2 %) of the 3213 sites in QC, 776 (39.0 %) of the 1989 sites in NB, 700 (78.2 %) of the 895 sites in PE, 1041 (31.0 %) of the 3359 sites in NS, and 1006 (78.5 %) of the 1282 sites in NL were retained.

A log base 10 transformation was applied to the weekly mean counts before subsequent analyses. The precipitation data corresponding to the sites, as well as the water flow data in recent years of different rivers obtained from Environment Canada Data Explorer (https://www. w/quantity/monitoring/survey/data-products-services/national-archive-hydat.html), were preprocessed in the same manner, except for the logarithm transformation. Cumulative precipitation over the 5 days prior to each sampling and weekly average water flows were used for downstream correlation analyses.

2.3. Dimension reduction via functional principal component analysis

The measurements at a site taken at different time points were treated as realizations drawn from an underlying function of time. Through the application of FPCA on sparse data via the “fdapace” package in R (Gajardo et al., 2021), kernel functions were applied during estimation of the FPCA model, and a moderate amount of missing data at a site was acceptable. More available measurement data would contribute to more reliable results, while missing measurements at individual sites could be interpolated based on the learned model.

If we model the measurement at the i-th site and time j as $X_i(j)$, then the functional principal components (FPCs), $\phi_k$, are a set of ordered orthonormal eigenfunctions, derived one by one with their order marked by the subscript k, that maximize the variance of $\int_J (X_i(j) - \mu(j))\,\phi_k(j)\,dj$, $i = 1, \dots, n$, where $\mu$ is the mean function averaged over all functions over the time continuum, n is the sample size, and J is the domain of time j.

Upon finding the FPCs, an appropriate number, K, is selected as the number of FPCs that users consider an optimal summary of the variance in the samples. In the “fdapace” package, based on the Principal Analysis by Conditional Estimation (PACE) algorithm (Yao et al., 2005), under the assumption that the random measurement errors in the centered data follow a Gaussian distribution with a zero mean and a constant variance, the maximum number of FPCs that can be obtained is n − 2. Typically, the FPCs explain decreasing proportions of variance in the centered data, and most of the variance is explained by the top several FPCs. As a result, selecting a small K can result in the loss of significant information, whereas selecting a large K may include in the model patterns related to outliers or noise. Users mostly start from the first FPC and stop when a sizeable cumulative percentage of variance has been explained while the remaining FPCs seem to convey barely any useful information. We followed this procedure in this study.

With the FPCs and K determined, the value at the j-th time point of the i-th sample's function can be well approximated by a linear combination of the K selected FPCs and the corresponding K FPC scores, $\beta_{i,k}$, $k = 1, \dots, K$, i.e.,

$$X_i(j) \approx \mu(j) + \sum_{k=1}^{K} \beta_{i,k}\,\phi_k(j), \tag{1}$$

where

$$\beta_{i,k} = \int_J \left(X_i(j) - \mu(j)\right)\phi_k(j)\,dj \tag{2}$$

is the score assigned to the i-th sample's function corresponding to the k-th FPC, which measures how strongly this function differs from the mean function along the direction of the k-th FPC. Therefore, most of the information in each sample is summarized into a collection of FPC scores that are the coordinates of the sample in the newly constructed K-dimensional FPC space.

2.4. Correlation analysis and visualization

We aimed to find the correlations between potential data characteristics and the variation patterns found via FPCA. As the FPC scores quantified such variation patterns, we chose to fit linear regression models to test whether the FPC scores could help predict the individual characteristics revealed by the corresponding FPCs in the preprocessed data. Through linear regression, values of one variable, Y, that represented a certain data characteristic were expressed by a linear model of the scores of a certain FPC. The coefficients in this model were calculated by minimizing the residual sum of squares (RSS) between the observed and the fitted values of Y. To measure how well the model explained the change of Y in terms of the FPC scores, we referred to R², which was the proportion of the variance of Y that was explained by the variance of the FPC scores. In the meantime, the p-value, which was the probability of the observed and more extreme situations happening under the assumption that the coefficients in the model were all 0, also provided evidence for the existence of the modelled relationship. A high (e.g., >0.5) R² and a low (≤0.05) p-value would suggest a significant utility of the FPC scores to explain/predict Y. The built-in function “lm” in R performs this method and returns the two statistics, and one can find more about linear regression analysis in Seber and Lee (2012).

The correlations between these patterns and potential environmental factors and other contributors to fecal contamination were also of interest. The causality between these pairs was uncertain, and correlation analysis was performed to find the magnitude and direction of the relationship between each pair of variables. We considered two correlations: Pearson correlation, which measures the linear relationship between two variables, and the rank-based Spearman correlation, which instead measures the monotonic relationship. A comparison between the two illustrates the advantage of Spearman correlation when the data have high kurtosis, outliers, or a heavy-tailed distribution (De Winter et al., 2016). Since the data in this study were highly sparse and skewed, and abrupt peaks of bacteria levels were frequently observed without intermediate measurements, Spearman correlation was more appropriate and was used throughout this study. From the values of a pair of variables at each site across the time scale, the Spearman's rank correlation coefficient, ρ, measured the concordance between the respective ranks of the paired values. A p-value was also calculated as the probability of seeing those and more extreme pairs of ranks if they were purely random. Consequently, a large positive or negative ρ with a low p-value (≤0.05) would provide evidence of a strong positive or negative correlation between the two variables. The exact calculation of ρ and the p-value is described in Zar (1972), while a further introduction to correlation analysis can be found in Gogtay and Thatte (2017). Users can derive these statistics via the “cor.test” function in R.

The “ggmap” package in R (Kahle and Wickham, 2013) was used for visualization of the scores in all regions in this study. After the variation patterns and the data characteristics represented by them were found, the FPC scores functioned as indicators of the strengths of those characteristics. Mapping them onto the corresponding sites' geographical locations enabled the discovery of potential environmental factors that were correlated with the observed bacteria levels at the sites.

An interactive website, which uses the “shiny” package in R (Chang et al., 2015) to visualize both the data and the analytical results, is available at and updated periodically.
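
The pooling, weekly binning, log transformation, and gap-based exclusion described in Section 2.2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline (the study was carried out in R, and the full procedure is given in You et al. (2022)). The function names, the (day_of_year, count) input layout, the folding of days 365 and 366 into the last weekly bin, and the reading of "consecutively missing measurements over any 4-week period" as a run of four or more empty weekly bins are all assumptions made for the example.

```python
import math
from collections import defaultdict


def weekly_log10_means(samples, n_weeks=52):
    """Pool one site's samples onto a single year and summarize them by week.

    samples: iterable of (day_of_year, count) pairs, already pooled across years
             (hypothetical layout). Returns a list of length n_weeks holding the
             log10 of the mean count in each weekly bin, or None for empty bins.
    """
    bins = defaultdict(list)
    for day, count in samples:
        # Days 1-7 go to week 0, days 8-14 to week 1, and so on; the leftover
        # days at the end of the year are folded into the last bin (assumption).
        week = min((day - 1) // 7, n_weeks - 1)
        bins[week].append(count)
    weekly = [None] * n_weeks
    for week, counts in bins.items():
        # The mean is taken on the raw counts first, then log-transformed.
        weekly[week] = math.log10(sum(counts) / len(counts))
    return weekly


def longest_gap(weekly):
    """Length of the longest run of consecutive empty weekly bins."""
    longest = run = 0
    for value in weekly:
        run = run + 1 if value is None else 0
        longest = max(longest, run)
    return longest


def keep_site(weekly, max_gap=4):
    """Keep a site only if no run of empty weeks reaches max_gap (assumed rule)."""
    return longest_gap(weekly) < max_gap
```

For the Atlantic provinces, the same functions would be applied only to the retained range of weeks, since the winter bins are empty by design there.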

3. Results

3.1. Coastal fecal coliform patterns in British Columbia (BC)

The data from the coast of British Columbia (BC) were available across the entire year, so we started to explore the distribution patterns based on this relatively complete dataset.

3.1.1. Variance decomposition of fecal coliform bacteria levels

Functional Principal Component Analysis (FPCA) was applied to model the variance in the data. The results are presented in Fig. 1. In Fig. 1a, the mean function appeared at its lowest near the end of March and at its highest in November; the top 2 functional principal components (FPCs) explained 95 % of the variance from the mean function at the sites, and their variations over time are depicted in Fig. 1b.

The FPC1 explained 74 % of the variance. According to the black curve in Fig. 1b, it stayed positive throughout the year, and thus, during approximation of the bacteria levels at a site via Eq. (1), a larger FPC1 score would contribute to a larger amount of the FPC1 value added to the approximation in every week, and we would see an overall increase in the approximated bacteria levels at the corresponding site. This relationship indicated that the FPC1 reflected the overall amplitude of the bacterial level at a site across a year. Therefore, sites with higher overall amplitudes were assigned larger FPC1 scores.

This finding was supported by a significant correlation between the maximum level of bacteria at a site throughout the year and the percentile of its FPC1 score (R² = 0.61, p < 2.2e-16, Fig. 2a). The maximum values at different sites appeared at different amplitudes and different times across the year, and the separation between the sites with the top and bottom 10 % of FPC1 scores was obvious (Fig. 2b). These findings suggested that the FPC1 scores indeed reflected the overall amplitude of the bacteria levels at sites in BC.

Apart from the FPC1, the FPC2 accounted for 21 % of the variance. It was negative before May and after November, while positive in the middle of a year (Fig. 1b). As a result, the FPC2 scores indicated positive or negative contributions in different seasons of the year as reflected by Eq. (1). This time-related pattern indicated the seasonality in a year; sites with higher and positive FPC2 scores had their highest bacteria levels during the middle of a year, while sites with lower and negative FPC2 scores had their highest at the beginning or the end of a year.

For a clearer depiction of this seasonality, we looked into the sites with stronger seasonal variations in their bacteria levels among those with the highest overall amplitudes of bacteria levels, i.e., those with the top 10 % of the FPC1 scores (the red dots in Fig. 2b). Two groups were extracted based on two extrema (the top and the bottom 10 %) of the FPC2 scores. Their approximated bacteria levels according to Eq. (1) are represented by the red and blue curves in Fig. 3a and b, corresponding to 28 and 20 sites, respectively. These approximations have been validated in more detail (Supplementary Fig. S2). While the seasonal variations of the two groups were drastically different, sites within each group showed synchronized temporal patterns in their approximated bacteria levels across a year. Among sites with the higher FPC2 scores, the highest approximated bacteria levels appeared in summer and fall (Fig. 3a), whereas the sites with lower FPC2 scores had low values from spring to early fall (Fig. 3b). Across different ranges (every 10 %) of FPC1 scores, this contrast between high and low FPC2 scores was consistent across all amplitudes of bacterial levels represented by the entire spectrum of FPC1 scores (Supplementary Fig. S2). The FPC2 thus justifiably explained seasonal variation patterns in the bacteria level at each site.

3.1.2. Amplitudes of bacteria levels around BC coast

With the main variation patterns confirmed, for efficient visualization of the relative overall amplitudes of bacteria levels, sites were grouped into 10 bins of equal size in the order of increasing FPC1 scores. Sequential colors were assigned to these bins, blue for the ones with the lowest FPC1 scores and red for those with the highest (Fig. 4).

According to Fig. 4, many high-amplitude sites appeared in the Strait of Georgia in close proximity to the city of Vancouver, southeast Vancouver Island between Courtenay, Nanaimo, and Victoria, and along the west coast of Vancouver Island, while low-amplitude sites were along the northwest of the mainland side of the strait and in between the small islands from Texada Island to Desolation Sound (marked as the grey circles in Fig. 4). This distribution was confirmed by marking the sites with the highest and the lowest 10 % of the FPC1 scores, respectively (Supplementary Fig. S3).

The sites around major cities with large populations usually had high FPC1 scores that indicated the overall amplitudes of fecal coliform concentration. Such cases were around Vancouver (662,248 people), Victoria (91,867 people), Nanaimo (99,863 people), and Courtenay (28,420 people), according to the 2021 Census of Population by Statistics Canada. Between Victoria and Nanaimo, some sites appeared low-amplitude. Compared to the high-amplitude sites in this region,

Fig. 1. Functional Principal Component Analysis (FPCA) on preprocessed BC fecal coliform data: (a) the mean function, μ, was calculated as the average bacteria levels at all sites in the weeks; the variance from the mean function at the sites was then decomposed into Functional Principal Components (FPCs); (b) the top 2 FPCs together explained 95 % of the variance, with corresponding proportions of variance explained in the brackets. (For interpretation of the references to colors in this figure, readers can refer to the web version of this article.)
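
As a rough illustration of how Eq. (1) and Eq. (2) relate a weekly curve, the mean function, and the FPC scores, the sketch below discretizes the integral on a 52-point weekly grid. It is a dense-grid toy example in Python, not the PACE estimator in “fdapace” that the study used for sparse data. The two components `PHI1` and `PHI2` are hypothetical orthonormal functions chosen only to mimic the shapes described for Fig. 1b (one positive all year, one negative at both ends of the year); they are not the estimated FPCs, and all names are assumptions.

```python
import math

WEEKS = 52
DT = 1.0 / WEEKS                                # width of one weekly step on [0, 1]
GRID = [(j + 0.5) * DT for j in range(WEEKS)]   # mid-point of each week

# Hypothetical orthonormal components (assumptions, not the paper's estimates).
PHI1 = [1.0 for _ in GRID]                                            # positive all year
PHI2 = [-math.sqrt(2.0) * math.cos(2.0 * math.pi * t) for t in GRID]  # negative at both ends


def inner(f, g):
    """Riemann-sum approximation of the integral of f(j) * g(j) over the year."""
    return sum(a * b for a, b in zip(f, g)) * DT


def fpc_score(x, mu, phi):
    """Discretized Eq. (2): project the centered curve onto one component."""
    centered = [xj - mj for xj, mj in zip(x, mu)]
    return inner(centered, phi)


def reconstruct(mu, phis, scores):
    """Discretized Eq. (1): mean function plus score-weighted components."""
    return [
        m + sum(beta * phi[j] for beta, phi in zip(scores, phis))
        for j, m in enumerate(mu)
    ]
```

With a component that is positive in every week, such as `PHI1`, a larger score raises the reconstructed curve in every week, which is the reasoning behind reading FPC1 scores as overall amplitude.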

Fig. 2. Characteristics of FPC1: (a) the maximum measurement at each site against the corresponding percentile of FPC1 score, (b) the maximum measurement at a site with an FPC1 score in the highest (red) or the lowest (blue) 10 % against the week of occurrence. (For interpretation of the references to colors in this figure, readers can refer to the web version of this article.)
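
The kind of simple linear regression behind the R² reported for Fig. 2a can be sketched as below: coefficients are obtained by minimizing the residual sum of squares, and R² is the share of the variance of Y explained by the fit. This is a minimal Python sketch with a hypothetical function name; the study itself used the “lm” function in R, and the p-value calculation is omitted here.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared). x could hold FPC score percentiles
    and y a site-level characteristic such as the yearly maximum (illustrative
    roles only).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                       # minimizes the residual sum of squares
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - rss / tss
```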

locations of the low-amplitude ones were farther away from the coast. This is reasonable since a large proportion of fecal coliform contamination would be expected to originate from inland populated or agricultural areas and travel seaward.

Fig. 4 also shows a number of sites with high amplitudes of fecal coliform contamination on the west coast of Vancouver Island, which is not highly populated but is subject to some of the highest levels of rainfall in the province. Non-point-source pollution due to human and wildlife activities is usually regarded as the cause of the contamination in such areas. The spring months are replete with wildlife such as birds, marine mammals, and upland animals, and certain areas are very popular with boaters, tourists, and outdoor enthusiasts. Contributions from these animal and human activities would bring higher amounts of fecal matter to the area.

3.1.3. Seasonality of bacteria levels and leading factors

The seasonality in bacteria levels represented by the FPC2 was also of major interest. Using the same procedure, assigning 10 sequential colors to 10 bins of increasing FPC2 scores resulted in distributions of the FPC2 scores among the sites (Fig. 5). There was a clear contrast in the bacteria levels at the sites in the vicinity of major municipalities, such as the area from Campbell River to Victoria and around Vancouver City, as compared to elsewhere. The sites away from cities usually showed red, where the bacteria levels correlated with the discharge patterns of rivers driven by snow melt, which peaked in summer, from the mountains in BC (Fig. 3c and e). The sites around the major cities appeared to show blue, where the bacteria levels correlated with the regional precipitation pattern (Fig. 3d). The sites located in between the cities on the east coast of Vancouver Island were also mostly blue, which agreed with the discharge patterns of small rivers mostly driven by precipitation that peaked in winter and early spring, followed by a drought in summer, but then increased again in the fall due to increasing rainfall (Fig. 3f).

River discharge. The annual discharge patterns of the Squamish River and the Fraser River agreed closely with the seasonal variation of bacteria levels of the sites with high FPC2 scores (Fig. 3a, c, e). The Squamish River is a proglacial river to the northwest of Vancouver City, driven by the meltwater from the Coast Mountains (Hickin, 1984). It forms one of the many similar paths for the inland water to access the Strait of Georgia, on the southwest coast of the mainland. The surrounding sites at these discharging locations from the north side of the Strait of Georgia to the northwest of Vancouver City were dominated by red colors in Fig. 5. According to Fig. 3a, these sites typically had high bacteria levels in the summer and low bacteria levels in winter. The approximated bacteria levels at individual sites in Fig. 3a were highly correlated with Squamish River annual water flow from 1999 to 2018 in Fig. 3c (ρ ≥ 0.74, p < 2.2e-16).

The Fraser River is the longest river in BC. It has a much larger drainage area that flows from some of the warmest parts of BC, including parts of the Rocky Mountains, to the Strait of Georgia. Its seasonal flow pattern (Fig. 3e) was similar to the Squamish River's, except that the Fraser River flow appeared to peak a few weeks earlier according to flow data collected at a farther inland location. Approximated bacteria levels at individual sites shown in Fig. 3a were also highly correlated with Fraser River annual water flow patterns from 1999 to 2016 in Fig. 3e (ρ ≥ 0.62, p < 1.4e-06).

Precipitation. Populated regions around and in between cities in BC are surrounded by blue-ish colors in Fig. 5, indicating low FPC2 scores. Bacteria levels at sites in these regions, resembling those of the sites in Fig. 3b, were high in winter, the season of high precipitation, and low in summer, the dry season in coastal BC.

Moreover, certain river systems are sensitive to or mostly driven by precipitation. On the east side of Vancouver Island, the Courtenay River, Englishman River, Nanaimo River, Chemainus River, and Cowichan River are such examples (Fig. 3f). Sites at the discharging locations of these rivers were mostly covered in blue-ish colors in Fig. 5, which suggested the contribution of precipitation to the high concentrations of fecal matter in the rainy seasons in these areas.

The correlation between the bacteria levels at those sites with the strongest but opposite seasonal variation (sites in Fig. 3b) and the precipitation measured in BC supports this observation. At measuring locations (sectors) corresponding to individual sites, each measurement was the cumulative precipitation over the 5 days prior to each fecal coliform sampling, and the mean of all such measurements within the weekly bins was acquired for analysis. The weekly precipitation levels at sites in Fig. 3b (those with the highest 10 % of FPC1 scores and the lowest 10 % of FPC2 scores) are presented in Fig. 3d. We obtained the correlation values between the approximated fecal coliform bacteria levels and the approximated precipitation levels (by FPCA, with >95 % of variance explained) at those individual sites. The correlations were all found to be significant (ρ ≥ 0.54, p < 5.7e-05). In Fig. 6, the correlation analysis between approximated levels of fecal coliform bacteria and precipitation was performed at all sites in BC; sites with significantly

S. You et al. Marine Pollution Bulletin 189 (2023) 114712

Fig. 3. Characteristics of FPC2, river discharge, and precipitation along BC coast: approximated bacteria levels at sites with the highest 10 % FPC1 scores and the
highest 10 % (a) and lowest 10 % (b) of FPC2 scores (corresponding to the sites with red and blue dots of Fig. 2b; the mean values are shown as the dashed black
curves); average weekly flow of the Squamish River from 1999 to 2018 (c) and the Fraser River from 1999 to 2016 (e); (d) weekly precipitation at the sites presented
in (b), with the mean values shown as the solid black curve; (f) average flows of small local rivers on the east coast of Vancouver Island, between Victoria and
Courtenay. (For interpretation of the references to colors in this figure, readers can refer to the web version of this article.)
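
The weekly precipitation curves in panel (d) are described in the text as 5-day cumulative precipitation before each sampling, averaged within weekly bins and then correlated with bacteria levels. The following is a minimal sketch of that preprocessing, not the authors' code. The function names and the toy numbers are hypothetical, the sampling day is assumed to be excluded from the 5-day window, and Pearson's coefficient is used as an assumption because the text does not state which correlation coefficient ρ denotes.

```python
from collections import defaultdict
from math import sqrt


def cumulative_precip(daily, day_index, window=5):
    """Total precipitation over the `window` days before `day_index`.

    Assumption: the sampling day itself is excluded from the window.
    """
    start = max(0, day_index - window)
    return sum(daily[start:day_index])


def weekly_means(samples):
    """Average all values that fall in the same week-of-year bin.

    `samples` is an iterable of (week, value) pairs pooled across years.
    """
    bins = defaultdict(list)
    for week, value in samples:
        bins[week].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(bins.items())}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily precipitation (mm) and a sampling event on day 7.
daily_mm = [0.0, 2.0, 4.0, 0.0, 1.0, 3.0, 0.0, 0.0, 5.0, 1.0]
five_day_total = cumulative_precip(daily_mm, day_index=7)

# Hypothetical (week, 5-day total) pairs and weekly bacteria levels.
precip_by_week = weekly_means([(1, 7.0), (1, 9.0), (2, 4.0), (3, 1.0)])
bacteria_by_week = {1: 30.0, 2: 20.0, 3: 9.0}
weeks = sorted(precip_by_week)
rho = pearson([precip_by_week[w] for w in weeks],
              [bacteria_by_week[w] for w in weeks])
```

In the paper the correlation is computed between FPCA-approximated curves rather than raw weekly means; the sketch only illustrates the binning and correlation steps.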

positive correlations were marked, with the correlations grouped into 10 bins.
Other than the urban areas, some scattered sites at which the bacteria levels were highly correlated with precipitation appeared to the northwest of Vancouver City, the north of Nanaimo, and the east of Courtenay (around Texada Island, marked as the large grey circle in Fig. 4). During periods of high precipitation, the bacteria within the upland drainage area may enter river systems and be discharged into the marine environment, where they can become entrained in surface waters.
In summary, the FPCA analyses at the sites in British Columbia indicated that the majority of the variance among their fecal coliform bacteria levels reflected differences in the overall amplitude of fecal contamination in a year, as captured by FPC1, whereas the seasonal variation of the contamination level was captured by FPC2. The high


Fig. 4. Distribution of FPC1 scores in BC: the sites under analysis are marked as colored points on BC provincial map via “ggmap” package in R software; the ordered
scores of the sites are grouped into 10 bins of equal size, and sequential colors from blue to red are assigned to sites in corresponding bins for efficient visualization of
the relative magnitudes of FPC1 scores. (For interpretation of the references to colors in this figure, readers can refer to the web version of this article.)
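
The binning described in this caption (ordered scores split into 10 bins of equal size, colored from blue to red) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the linear blue-to-red interpolation is an assumed palette, not the one used with "ggmap" in the paper.

```python
def equal_size_bins(scores, n_bins=10):
    """Rank sites by score and split them into `n_bins` groups of equal size.

    `scores` maps site id -> FPC score. Returns site id -> bin number,
    where bin 1 holds the lowest scores and bin `n_bins` the highest.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {site: (rank * n_bins) // n + 1 for rank, site in enumerate(ordered)}


def bin_color(bin_no, n_bins=10):
    """Assumed palette: linear RGB interpolation from blue (bin 1) to red."""
    t = (bin_no - 1) / (n_bins - 1)
    return (round(255 * t), 0, round(255 * (1 - t)))


# Hypothetical scores for 20 sites.
site_scores = {f"site{i}": float(i) for i in range(20)}
site_bins = equal_size_bins(site_scores)
site_colors = {site: bin_color(b) for site, b in site_bins.items()}
```

Ranking before binning means each color covers the same number of sites, so the map shows relative rather than absolute score magnitudes.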

and low FPC2 scores revealed distinct patterns of seasonal variation. The sites with high FPC2 scores had high amounts of fecal matter in the summer months, during the seasonal presence of high human/animal activities in the regions or high discharge from rivers that carried fecal material from inland areas. Sites with low FPC2 scores, on the other hand, were affected by regional precipitation or discharge from smaller regional river systems that were sensitive to precipitation. These sites were generally in the urbanized regions or near areas of high agricultural activities.

3.2. Fecal coliform patterns along Atlantic coast

We applied the same techniques described above to coastal marine environments across Atlantic Canada and analyzed the data in the same way.

3.2.1. Variance decomposition of fecal coliform bacteria in QC, NB, PE, and NS combined, and in NL
Based on data availability, FPCA was applied to weeks 19 to 45 for the provinces of Quebec (QC), New Brunswick (NB), Prince Edward Island (PE), and Nova Scotia (NS) combined, and to weeks 20 to 38 for Newfoundland and Labrador (NL) alone (Supplementary Fig. S4).
The FPC1 in these two scenarios explained 89 % and 79 % of the variance, respectively. Its values were positive over all the selected weeks, which suggested positive contributions in Eq. (1). Therefore, as in the BC scenario, every FPC1 score indicated the overall amplitude of the bacteria level at a site in these provinces from 1999 to 2019. The linear relationship between the maximum observed bacteria level at a site and the corresponding FPC1 score (Fig. 7a, c) was significant both in the 4 provinces combined (R² = 0.68, p < 2.2e-16) and in NL alone (R² = 0.74, p < 2.2e-16). Meanwhile, in each scenario, the occurrences of the maximum measurements at sites were quite evenly distributed over the selected weeks (Fig. 7b, d), and there was very little overlap between the maximum measurements collected at sites with the highest and the lowest 10 % of FPC1 scores, which justified our reference to FPC1 as a reflection of the overall amplitude of the fecal contamination at the corresponding site in these 2 scenarios.
The FPC2 explained 7 % and 17 % of the variance among bacteria levels in the 4 provinces combined and NL alone, respectively. Each FPC2 showed a change from positive to negative values around the 35th week (Supplementary Fig. S4b, d). In the former scenario, with FPC2 accounting for such a small percentage of variance, the approximated bacteria levels at sites with the two extrema of FPC2 scores did not present major differences in their variation patterns; in the latter scenario, the differences were larger. Among sites with the highest 10 % of FPC1 scores, the increasing trend of bacteria levels stopped after around the 33rd week at those with the highest 10 % of FPC2 scores, while those with the lowest 10 % of FPC2 scores had generally increasing bacteria levels over the range (Supplementary Fig. S5).
The two leading factors found to correlate with the seasonality of the bacteria levels in BC, river discharge and precipitation, were then studied in these provinces along the Atlantic coast (Supplementary Fig. S6).
In QC, NB, PE, and NS combined and in NL alone, the river discharge patterns had similar trends in a year. Some of the stream-gauged rivers were selected in these provinces, and they all peaked around the 20th


Fig. 5. Distribution of FPC2 scores in BC: in the same way as for FPC1 scores, sites with relative magnitudes of FPC2 scores, indicated by the sequential colors, are
marked along BC coast. (For interpretation of the references to colors in this figure, readers can refer to the web version of this article.)
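
The FPC1 and FPC2 scores mapped in Figs. 4 and 5 summarize each site's annual curve by its projections onto the two leading components. As a rough illustration of that idea, the sketch below runs ordinary PCA (power iteration with deflation) on toy curves already evaluated on a common grid. It is a simplified stand-in for FPCA, not the authors' procedure, which also has to handle sparse longitudinal measurements. All names and numbers are hypothetical.

```python
from math import sqrt


def center(curves):
    """Subtract the pointwise mean curve from every site's curve."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    return [[c[j] - mean[j] for j in range(m)] for c in curves]


def covariance(centered):
    """Sample covariance between grid points, estimated across sites."""
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def leading_component(cov, iters=500):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    m = len(cov)
    v = [1.0 + 0.1 * i for i in range(m)]  # arbitrary non-symmetric start
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))
    return eigval, v


def deflate(cov, eigval, v):
    """Remove one eigenpair so the next power iteration finds the following one."""
    m = len(cov)
    return [[cov[i][j] - eigval * v[i] * v[j] for j in range(m)] for i in range(m)]


def project(centered, v):
    """Score of each site: inner product of its centered curve with `v`."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]


# Toy curves: site-specific amplitude plus a small early/late seasonal contrast.
amplitudes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
contrasts = [0.3, -0.3, 0.0, 0.0, -0.3, 0.3]
season = [-1.0, -1.0, 1.0, 1.0]
curves = [[a + b * s for s in season] for a, b in zip(amplitudes, contrasts)]

centered = center(curves)
cov = covariance(centered)
total_var = sum(cov[i][i] for i in range(len(cov)))
eig1, pc1 = leading_component(cov)
eig2, pc2 = leading_component(deflate(cov, eig1, pc1))
scores1 = project(centered, pc1)
share1 = eig1 / total_var  # fraction of variance explained by the first component
```

In this toy example the first component has the same sign at every grid point, so its scores order the sites by overall amplitude, while the second component changes sign within the year and captures the seasonal contrast. This mirrors how FPC1 and FPC2 are interpreted in the text.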

week, followed by a trough that lasted until around the 40th week. The major rivers, such as the Saint John River in NB and the Churchill River in NL (Supplementary Fig. S6a, c), had increasing flows afterwards, whereas the smaller rivers remained in their trough for the rest of the year.
The precipitation patterns appeared similar in the 2 scenarios (Supplementary Fig. S6b, d). The precipitation was less variable over the selected weeks along the Atlantic coast.
The seasonal variations of the bacteria levels and their correlation with corresponding river discharge and precipitation patterns at sites in these provinces over the selected range of time did not appear as prominent as in BC. Given the major seasonal gaps in the bacteria count data, further analyses on this part would require more complete data. For example, our analysis of fecal coliform data covered the time range from mid or late May to early October in the case of Newfoundland and Labrador, and to mid-November in the case of the other four provinces. This range would miss the spring precipitation peak, which had already declined by mid-May. The fall peak of precipitation would be missed as well in the case of Newfoundland and Labrador.

3.2.2. Amplitudes of bacteria levels and populated areas in QC, NB, PE, NS, and NL
The FPC1 scores for sites in the 4 provinces combined and in NL were each ordered and separated into 10 bins of equal size. The same 10 sequential colors as in Fig. 4 were then assigned to the bins. The distributions of the colors are mapped to the sites' geographical locations in QC (Supplementary Fig. S7), NB (Supplementary Fig. S8), PE (Supplementary Fig. S9), NS (Supplementary Fig. S10), and NL (Supplementary Fig. S11).
The distributions of FPC1 scores in these provinces agreed with those in BC: colors indicating high amplitudes of bacteria levels usually surrounded populated areas, such as cities, villages, and communities. For example, such areas included Gaspe and Rimouski in QC; Tracadie, Caraquet, and Bathurst in NB; Alberton, Summerside, Souris, Montague, and Cornwall in PE; Digby, Yarmouth, Shelburne, and New Glasgow in NS; and Robert's Arm in NL.

4. Discussion

This paper illustrates the main variation patterns in the fecal coliform bacteria levels on a one-year scale and their correlations with surrounding geographical factors at sites across coastal areas in Canada. Functional principal component analysis indicated that over 95 % of the variance in the bacteria levels at sites across the 6 provinces could be decomposed into differences in the overall amplitude (FPC1) and in the seasonality (FPC2). The former is correlated with regional human/animal population activities, while the latter is correlated with the precipitation around urbanized areas and with the discharge of meltwater-driven rivers elsewhere. These findings are most evident in British Columbia, where the data are available throughout the year, and are similar in the other five provinces along the Atlantic coast (Quebec, New Brunswick, Prince Edward Island, Nova Scotia, and Newfoundland and Labrador), where the data in winter and early spring are not available. A summary of the findings is presented in Table 1.
Regions with more human activities, such as population centers and areas with seasonal tourist activities (for example, the west coast of Vancouver Island), tended to have relatively higher fecal coliform contamination. The relationship between the fecal contamination level and


Fig. 6. Correlation between precipitation and fecal coliform bacteria levels: distribution of sites with significant correlation between their approximated bacteria and
precipitation levels; the sites were grouped into bins according to corresponding p-values (sites with p-values smaller than 1e-10 were all grouped into bin No. 10).
(For interpretation of the references to colors in this figure, readers can refer to the web version of this article.)

human population has been acknowledged in many studies. For example, a statistically significant positive correlation between fecal coliform levels and human population has been reported in the headwaters of the May River, South Carolina (Soueidan et al., 2021), and more fecal indicator bacteria are recorded at more central urban outlets in the coastal zone of Gabon, Central Western Africa (Leboulanger et al., 2021). The insufficient treatment of human fecal matter appears to be a major cause of this phenomenon (Kataržytė et al., 2018; Durand et al., 2020). The biggest source of fecal pollution at Lake Pontchartrain, a tourist attraction in Louisiana, is also human-associated (Xue et al., 2018). Moreover, the distribution of the contaminants across the coast is affected by coastal currents in Todos Santos Bay, Mexico (Tanahara et al., 2021), which could also be the case around Texada Island in the Strait of Georgia in British Columbia: according to the average surface circulation in the Strait of Georgia (Thompson, 1981), the amplitudes of fecal contamination at these sites might be raised by some of the strongest surface circulation currents in the strait, which carried the highly contaminated water from the surrounding cities.
Regarding the seasonality of the fecal contamination, in British Columbia, the staggered periods of high meltwater and high precipitation allowed us to find regions with distinct fecal contamination seasonality correlated with meltwater or precipitation. Around urban areas, storm water or high precipitation can cause an overflow and worse water treatment than usual (Hall et al., 1998). In BC, precipitation in late fall and winter was usually very high, and certain locations suffered from inflow and infiltration, where rainwater leaked into the sanitary sewer and could overwhelm the treatment system. Some moderately populated areas that coincided with higher overall amplitudes of fecal contamination used onsite wastewater systems for sewage disposal, which have been shown to relate to fecal contamination in the surrounding areas (Carroll et al., 2005). These systems could be sensitive to heavy rainfall, resulting in more contamination reaching marine waters (Humphrey et al., 2018). Meanwhile, runoff caused by high rainfall has been widely recognized in the literature as a factor strongly correlated with fecal contamination. For example, during wet seasons, the fecal coliform levels were significantly higher in Pacific Northwest estuaries (Zimmer-Faust et al., 2018), around the Jaranman Saryangdo area in Korea (Mok et al., 2016; Jung et al., 2017), and at two Costa Rican beaches (Laureano-Rosario et al., 2021). Runoff caused by precipitation might also result in flooding of the watershed and fecal matter on the surface being washed into the river (Boithias et al., 2021), for example in the areas with upland agricultural activities along the east coast of Vancouver Island. Riverbank filtration has been shown to be important for reducing fecal pollution (Wang et al., 2022), so we also suspect that, with enough rainfall, the riverbank height is exceeded and a pulse of high-turbidity water may be washed into the marine waters, leading to high fecal coliform levels, as suspended sediments can play a significant role in the transportation of the bacteria (Chen and Liu, 2017). We infer that runoff elsewhere that was driven by meltwater-driven river discharge should have similar effects, except for influences on sewage discharge systems. Meltwater-driven rivers like the Squamish River and the Fraser River in BC could carry substantial amounts of fecal matter from upstream areas into marine waters in summer, potentially leading to peaks of bacteria levels at more distant locations. Fecal pollution entering the coastal marine environments in these ways could also be influenced by the surface water


Fig. 7. Characteristics of FPC1 in the provinces along Atlantic coast: the maximum measurement at a site against the corresponding percentile of FPC1 score in the 4
provinces combined (a) and NL alone (c); the maximum measurement at a site with an FPC1 score in the highest (red) or the lowest (blue) 10 % against the week of
occurrence in the 4 provinces combined (b) and NL alone (d). (For interpretation of the references to colors in this figure, readers can refer to the web version of
this article.)
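
Panels (a) and (c) relate each site's maximum measurement to the percentile of its FPC1 score, and the text reports the strength of the linear relationship as R². The sketch below shows how such percentile ranks and an ordinary least-squares R² could be computed. The function names and the data are hypothetical, and it is an assumption that the reported R² comes from a simple one-predictor linear fit of this kind.

```python
def percentile_ranks(values):
    """Percentile (0 to 100) of each value by rank; assumes at least 2 values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = 100.0 * rank / (len(values) - 1)
    return ranks


def linear_r2(x, y):
    """R² of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical FPC1 scores and maximum observed levels at four sites.
fpc1_scores = [-2.0, 0.5, -0.3, 3.1]
max_levels = [1.0, 2.0, 3.0, 4.0]
pct = percentile_ranks(fpc1_scores)
r2 = linear_r2(pct, max_levels)
```

An R² near 1 would indicate that sites with higher FPC1 scores consistently show higher maxima, which is the reading of FPC1 as overall amplitude given in the text.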

Table 1
Summary of the findings: significant R² and/or ρ values found between factors; sites1 refers to sites with the highest 10 % of FPC1 scores and the highest 10 % of FPC2 scores (those in Fig. 3a); sites2 refers to those with the highest 10 % of FPC1 scores and the lowest 10 % of FPC2 scores (those in Fig. 3b).

| | BC | QC, NB, PE, and NS | NL |
|---|---|---|---|
| FPC1 score vs. maximum FC bacteria level in | R² = 0.61, p < 2.2e-16 | R² = 0.68, p < 2.2e-16 | R² = 0.74, p < 2.2e-16 |

| | Meltwater-driven water flow at sites1 | Approximated precipitation at sites2 |
|---|---|---|
| Approximated FC bacteria level at high-amplitude sites with strong seasonality in BC vs. | The Squamish River: ρ ≥ 0.74, p < 2.2e-16; The Fraser River: ρ ≥ 0.62, p < 1.4e-06 | ρ ≥ 0.54, p < 5.7e-05 |

The seasonality pattern was not sufficiently clear along the Atlantic coast because the fecal coliform data in high precipitation seasons were not available. Regarding the precipitation patterns in the five provinces along the Atlantic coast, high precipitation usually occurred in the spring and fall. Even though summers were not as drastically dry as in BC, the precipitation in these provinces was relatively low in the summer compared to the other seasons. Nevertheless, due to the small numbers of fecal coliform measurements in winter and early spring there, we were unable to model the possible association between fecal bacteria levels and precipitation. Thus, our result should not be interpreted as evidence of no association between precipitation and fecal bacteria levels. For specific sites, there could be a significant correlation between precipitation and fecal bacteria levels, especially at the sites with high FPC1 scores. Sites with significantly positive correlations between approximated levels of the two in each of the 5 provinces were grouped into 10 bins and plotted in Supplementary Figs. S12-S16.
Additional analyses have been done on other factors in British Columbia. Fecal coliform bacteria levels have been found to be negatively correlated with salinity levels (Aslan-Yılmaz et al., 2004; Ortega et al., 2009; Soueidan et al., 2021), which agrees with our findings (results not shown). However, salinity could change as a result of freshwater


discharge and rainfall (Bonilla et al., 2007). The effect of temperature on
the fecal coliform levels was hard to determine as it shared a similar

org/10.1016/j.marpolbul.2023.114712.

