Provenance and Annotation of Data and Processes: 7th International Provenance and Annotation Workshop, IPAW 2018, London, UK, July 9–10, 2018, Proceedings. Khalid Belhajjame, Ashish Gehani, Pinar Alper (Eds.)
Khalid Belhajjame
Ashish Gehani
Pinar Alper (Eds.)
Provenance
LNCS 11017
Lecture Notes in Computer Science 11017
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Khalid Belhajjame • Ashish Gehani • Pinar Alper (Eds.)
Provenance
and Annotation of Data
and Processes
7th International Provenance
and Annotation Workshop, IPAW 2018
London, UK, July 9–10, 2018
Proceedings
Editors
Khalid Belhajjame, Paris Dauphine University, Paris, France
Pinar Alper, University of Luxembourg, Belvaux, Luxembourg
Ashish Gehani, SRI International, Menlo Park, CA, USA
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume contains the proceedings of the 7th International Provenance and
Annotation Workshop (IPAW), held during July 9–10, 2018, at King’s College in
London, UK. For the third time, IPAW was co-located with the Workshop on the
Theory and Practice of Provenance (TaPP). Together, the two leading provenance
workshops anchored Provenance Week 2018, a full week of provenance-related
activities that included a shared poster session and three other workshops on algorithm
accountability, incremental re-computation, and security. The proceedings of IPAW
include 12 long papers that report in depth on the results of provenance research,
two system demonstration papers, and 19 poster papers.
IPAW 2018 provided a rich program with a variety of provenance-related topics
ranging from the capture and inference of provenance to its use and application. Since
provenance is a key ingredient to enable reproducibility, several papers have
investigated means for enabling dataflow steering and process re-computation. The modeling
of provenance and its simulation has been the subject of a number of papers, which
tackled issues that seek, among other things, to model provenance in software
engineering activities or to use provenance to model aspects of the European Union General
Data Protection Regulation. Other papers investigated inference techniques to
propagate beliefs in provenance graphs, efficiently update RDF graphs, mine similarities
between processes, and discover workflow schema-level dependencies. This year’s
program also featured extensions of the W3C Prov recommendation to support new
features, e.g., versioning of mutable entities, or cater for new domain knowledge, e.g.,
astronomy.
In closing, we would like to thank the members of the Program Committee for their
thoughtful reviews, Vasa Curcin and Simon Miles for the local organization of IPAW
and the Provenance Week at King’s College, London, and the authors and participants
for making IPAW a successful event.
Program Committee
Pinar Alper University of Luxembourg, Luxembourg
Ilkay Altintas SDSC, USA
David Archer Galois, Inc., USA
Khalid Belhajjame University of Paris-Dauphine, France
Vanessa Braganholo UFF, Brazil
Kevin Butler University of Florida, USA
Sarah Cohen-Boulakia LRI, University of Paris-Sud, France
Oscar Corcho Universidad Politécnica de Madrid, Spain
Vasa Curcin King’s College London, UK
Susan Davidson University of Pennsylvania, USA
Daniel de Oliveira Fluminense Federal University, Brazil
Saumen Dey University of California, Davis, USA
Alban Gaignard CNRS, France
Daniel Garijo Information Sciences Institute, USA
Ashish Gehani SRI International, USA
Paul Groth Elsevier Labs, The Netherlands
Trung Dong Huynh King’s College London, UK
Grigoris Karvounarakis LogicBlox, Greece
David Koop University of Massachusetts Dartmouth, USA
Bertram Ludaescher University of Illinois at Urbana-Champaign, USA
Tanu Malik University of Chicago, USA
Marta Mattoso Federal University of Rio de Janeiro, Brazil
Deborah McGuinness Rensselaer Polytechnic Institute (RPI), USA
Simon Miles King’s College London, UK
Paolo Missier Newcastle University, UK
Luc Moreau King’s College London, UK
Beth Plale Indiana University Bloomington, USA
Satya Sahoo Case Western Reserve University, USA
Stian Soiland-Reyes The University of Manchester, UK
Jun Zhao University of Oxford, UK
Additional Reviewers
Reproducibility
PROV Extensions
Scientific Workflows
Applications
System Demonstrations
Quine: A Temporal Graph System for Provenance Storage and Analysis
Ryan Wright
1 Introduction
Consider data analytics processes that exhibit the following characteristics. C1:
they are resource-intensive and thus expensive when repeatedly executed over time,
e.g., on a cloud or HPC cluster; C2: they require sophisticated implementations to run
efficiently, such as workflows with a nested structure; C3: they depend on multiple
reference datasets and software libraries and tools, some of which are versioned
and evolve over time; C4: they apply to a possibly large population of input instances.
This is not an uncommon set of characteristics. A prime example is data
processing for high throughput genomics, where the genomes (or exomes) of
a cohort of patient cases are processed, individually or in batches, to produce
lists of variants (genetic mutations) that form the basis for a number of
diagnostic purposes. These variant calling and interpretation pipelines take batches
of 20–40 patient exomes and require hundreds of CPU-hours to complete (C1).
Initiatives like the 100K Genome project in the UK (www.genomicsengland.co.uk)
provide a perspective on the scale of the problem (C4).
© Springer Nature Switzerland AG 2018
K. Belhajjame et al. (Eds.): IPAW 2018, LNCS 11017, pp. 3–15, 2018.
https://doi.org/10.1007/978-3-319-98379-0_1
J. Cala and P. Missier
Fig. 1. A typical variant discovery pipeline processing a pool of input samples. Each
step is usually implemented as a workflow or script that combines a number of tools
run in parallel.
Figure 1, taken from our prior work [5], shows the nested workflow structure
(C2) of a typical variant calling pipeline based on the GATK (Genome Analysis
Toolkit) best practices from the Broad Institute (see footnote 1). Each task in the
pipeline relies on some GATK (or other open source) tool, which in turn requires lookups in
public reference datasets. For most of these processes and reference datasets new
versions are issued periodically or on an as-needed basis (C3). The entire pipeline
may be variously implemented as an HPC cluster script or workflow. Each single
run of the pipeline creates a hierarchy of executions which are distributed across
worker nodes and coordinated by the orchestrating top-level workflow or script
(cf. the “Germline Variant Discovery” workflow depicted in the figure).
Upgrading one or more of the versioned elements risks invalidating previously
computed knowledge outcomes, e.g. the sets of variants associated with
patient cases. Thus, a natural reaction to a version change in a dependency is to
upgrade the pipeline and then re-process all the cases. However, as we show in
the example at the end of this section, not all version changes affect each case
equally, or in a way that completely invalidates prior outcomes. Also, within each
pipeline execution only some of the steps may be affected. We therefore need a
system that can perform more selective re-processing in reaction to a change. In
[6] we have described our initial results in developing such a system for selective
re-computation over a population of cases in reaction to changes, called ReComp.
ReComp is a meta-process designed to detect the scope of a single change or of
a combination of changes, estimate the impact of those changes on the population
in scope, prioritise the cases for re-processing, and determine the minimal
amount of re-processing required for each of those cases. Note that, while ideally
the process of upgrading P is controlled by ReComp, in reality we must also
account for upgrades of P that are performed “out-of-band” by developers, as
we have assumed in our problem formulation.
1 https://software.broadinstitute.org/gatk/best-practices
Provenance Annotation and Analysis 5
Briefly, ReComp consists of the macro-steps shown in Fig. 2. The work presented
in this paper is instrumental to the ReComp design, as it addresses the
very first step (S1) indicated in the figure, in a way that is generic and agnostic
to the type of process and data.
The problem we address in this paper is to identify, for each change front
CF, the smallest set of those executions that are affected by CF. We call this the
re-computation front C relative to CF. We address this problem in a complex
general setting where many types of time-interleaved changes are allowed, where
many configurations are enabled by any of these changes, and where executions
may reflect any of these configurations, and in particular individual cases x may
be processed using any such different configurations. The example from the next
section illustrates how this setting can manifest itself in practice.
Our main contribution is a generic algorithm for discovering the re-computation
front that applies to a range of processes, from simple black-box, single-component
programs where P is indivisible, to complex hierarchical workflows where P
consists of subprograms Pi which may themselves be defined in terms of subprograms.
Following a tradition from the literature to use provenance as a means
to address re-computation [2,6,12], our approach also involves collecting and
exploiting both execution provenance for each E and elements of process–subprocess
dependencies as mentioned above. To the best of our knowledge this
particular use of provenance and the algorithm have not been proposed before.
Now consider the new change $C_2 = \{a_3 \xrightarrow{wDF} a_2,\ b_2 \xrightarrow{wDF} b_1\}$, affecting both
$D_1$ and $D_2$, and suppose both $x_1$ and $x_2$ are going to be re-processed. Then,
for each $x$ we retrieve the latest executions that are affected by the change,
in this case $E_2$, $E_1$, as their provenance may help optimise the re-processing
of $x_1$, $x_2$ using the new change front $\{a_3, b_2\}$. After re-processing we have two
new executions, $E_3 = P(x_1, [a_3, b_2])$ and $E_4 = P(x_2, [a_3, b_2])$, which may have been
optimised using $E_2$ and $E_1$, respectively, as indicated by their ordering: $E_3 \succ E_2$,
$E_4 \succ E_1$ (see Fig. 3/right).
To continue with the example, let us now assume that the provenance for
a new execution, $E_5 = P(x_1, [a_1, b_2])$, appears in the system. This may have
been triggered by an explicit user action independently from our re-processing
system. Note that the user has disregarded the fact that the latest version of
$a_i$ is $a_3$. The corresponding scenario is depicted in Fig. 4/left. We now have two
executions for $x_1$ with two configurations. Note that although $E_0 \prec E_5$ holds, it
is not reflected by a corresponding $E_5 \xrightarrow{wIB} E_0$ in our re-computation system
because $E_5$ was an explicit user action. However, consider another change event:
$\{b_3 \xrightarrow{wDF} b_2\}$. For $x_2$, the affected execution is $E_4$, as this is the single latest
execution in the ordering recorded so far for $x_2$. But for $x_1$ there are now two
executions that need to be brought up-to-date, $E_3$ and $E_5$, as these are the
maximal elements in the set of executions for $x_1$ according to the order:
$E_0 \prec E_2 \prec E_3$, $E_0 \prec E_5$. We call these executions the recomputation front for
$x_1$ relative to change front $\{a_3, b_3\}$, in this case.
This situation, depicted in Fig. 4/right, illustrates the most general case
where the entire set of previous executions needs to be considered when
re-processing an input with a new configuration. Note that the two independent
executions E3 and E5 have merged into the new E6.
Formally, the recomputation front for $x \in X$ and for a change front $CF =
\{w_1 \ldots w_k\}$, $k \le m$, is the set of maximal executions $E = P(x, [v_1 \ldots v_m])$ where
$v_i \le w_i$ for $1 \le i \le m$.
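The definition can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Execution` record, the `precedes` list of (earlier, later) pairs, and the integer version numbers are all hypothetical representations chosen for this example. The configurations given for E0, E1 and E2 are assumed values (only E3, E4 and E5 are spelled out in the text), and every configuration is assumed to list every component named in the change front.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    name: str     # e.g. "E3"
    case: str     # the input x that was processed
    config: dict  # component -> version number (hypothetical encoding)


def later_than(name, precedes):
    """Names of all executions that follow `name` in the recorded order."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        for earlier, later in precedes:
            if earlier == current and later not in seen:
                seen.add(later)
                stack.append(later)
    return seen


def recomputation_front(executions, precedes, case, change_front):
    """Maximal executions of `case` whose versions do not exceed the change front."""
    candidates = {
        e.name
        for e in executions
        if e.case == case
        and all(e.config[c] <= w for c, w in change_front.items())
    }
    # An execution is maximal if no other candidate follows it in the order.
    return sorted(n for n in candidates if not (later_than(n, precedes) & candidates))


# Configurations of E0, E1 and E2 are assumptions made for illustration.
executions = [
    Execution("E0", "x1", {"a": 1, "b": 1}),
    Execution("E1", "x2", {"a": 1, "b": 1}),
    Execution("E2", "x1", {"a": 2, "b": 1}),
    Execution("E3", "x1", {"a": 3, "b": 2}),
    Execution("E4", "x2", {"a": 3, "b": 2}),
    Execution("E5", "x1", {"a": 1, "b": 2}),
]
precedes = [("E0", "E2"), ("E2", "E3"), ("E1", "E4"), ("E0", "E5")]

front_x1 = recomputation_front(executions, precedes, "x1", {"a": 3, "b": 3})
front_x2 = recomputation_front(executions, precedes, "x2", {"a": 3, "b": 3})
print(front_x1, front_x2)
```

Under these assumptions the sketch selects E3 and E5 for x1 and E4 for x2, matching the example in the text. The order is stored as explicit pairs rather than derived from wasInformedBy statements alone, because the text notes that the ordering of E0 and E5 holds without a corresponding wasInformedBy statement.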
To illustrate this problem let us focus on a small part of our pipeline – the
alignment step (Align Sample and Align Lane). Figure 5 shows this step modelled
using ProvONE. $P_0$ denotes the top program – the Align Sample workflow, $SP_0$
is the Align Lane subprogram, $SSP_0$–$SSP_3$ represent the subsub-programs of
bioinformatic tools like bwa and samtools, while $SP_1$–$SP_3$ are the invocations
of the samtools program. Programs have input and output ports (the dotted
grey arrows) and ports $p_1$–$p_8$ are related with default artefacts $a_0$, $b_0$, etc.
specified using the provone:hasDefaultParam statement. The artefacts refer to the
code of the executable file and data dependencies; e.g. $e_0$ represents the code of
samtools. Programs are connected to each other via ports and channels, which
in the figure are identified using reversed double arrows.
Running this part of the pipeline would generate the runtime provenance
information with the structure resembling the program specification (cf. Fig. 6).
The main difference between the static program model and runtime information
is that during execution all ports transfer some data – either default artefacts
indicated in the program specification, data provided by the user, e.g. input
sample or the output data product. When introducing a change in this context,
e.g. $\{b_1 \xrightarrow{wDF} b_0,\ e_1 \xrightarrow{wDF} e_0\}$, two things are important. Firstly, the usage of
the artefacts is captured at the sub-execution level ($SSE_1$, $SSE_3$ and $SE_1$–$SE_3$)
while $E_0$ uses these artefacts indirectly. Secondly, to rerun the alignment step
it is useful to consider the sub-executions grouped together under $E_0$, which
determines the end of processing and delivers data $y_0$ and $z_0$ meaningful for the
user. We can capture both these elements using the tree structure that naturally
fits the hierarchy of executions encoded with ProvONE. We call this tree the
restart tree as it indicates the initial set of executions that need to be rerun. The
tree also provides references to the changed artefacts, which is useful to perform
further steps of the ReComp meta-process. Figure 6 shows in blue the restart
tree generated as a result of a change in artefacts b and e.
Fig. 6. An execution trace for the program shown in Fig. 5 with the restart
tree and artefact references highlighted in blue. The line styles in the original
figure distinguish the wasPartOf relation between executions, the used statements,
and the sequence of the “Ej used z wasGeneratedBy Ei” statements. (Color figure online)
Finding the restart tree involves building paths from the executions that used
changed artefacts, all the way up to the top-level execution following the
wasPartOf relation. The tree is formed by merging all paths with the same top-level
execution.
Listing 1.2. Function to generate the path from the given execution to its top-level
parent.
function path_to_root(ChangedItem, Exec): Path
    OutPath := [ChangedItem]
    repeat
        for wIB in iter_wasinformedby(Exec)
            if typeof(wIB) is "recomp:re-execution" then
                return []
        OutPath.append(Exec)
        Exec := get_parent(Exec)
    until Exec = null
    return OutPath
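As a rough illustration of Listing 1.2 and of the path-merging step described before it, the following Python sketch builds restart trees from a toy provenance record. It is a sketch under stated assumptions, not the authors' code: the `parent` dictionary stands in for the wasPartOf relation, `wib_types` is a hypothetical map from an execution to the types of the wasInformedBy statements that the listing's iterator would yield for it, and the choice of which sub-execution used which changed artefact is made up for the example.

```python
RE_EXECUTION = "recomp:re-execution"


def path_to_root(changed_item, execution, parent, wib_types):
    """Path from a changed artefact up to the top-level execution.

    Returns [] as soon as an execution on the way carries a
    wasInformedBy statement of type recomp:re-execution.
    """
    path = [changed_item]
    while execution is not None:
        if RE_EXECUTION in wib_types.get(execution, []):
            return []
        path.append(execution)
        execution = parent.get(execution)  # follow wasPartOf upwards
    return path


def restart_trees(usages, parent, wib_types):
    """Merge all non-empty paths that share the same top-level execution.

    Returns {root: {node: set of children}}; changed artefacts are leaves.
    """
    trees = {}
    for changed_item, execution in usages:
        path = path_to_root(changed_item, execution, parent, wib_types)
        if not path:
            continue
        tree = trees.setdefault(path[-1], {})
        # Link every node on the path to the element directly below it.
        for child, node in zip(path, path[1:]):
            tree.setdefault(node, set()).add(child)
    return trees


# Hypothetical wasPartOf hierarchy loosely modelled on the alignment step.
parent = {"SSE1": "SE0", "SSE3": "SE0", "SE0": "E0", "SE1": "E0", "E0": None}
# Hypothetical (changed artefact, execution that used it) pairs.
usages = [("b1", "SSE1"), ("e1", "SSE3"), ("e1", "SE1")]

trees = restart_trees(usages, parent, wib_types={})
print(trees)
```

Here all three paths end at E0, so they merge into a single tree rooted at E0 with SE0 and SE1 as its children. Storing the tree as a child map keeps the references to the changed artefacts at the leaves, which the text says is useful for later steps of the meta-process.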
2 https://github.com/ReComp-team/IPAW2018
4 Related Work
A recent survey by Herschel et al. [9] lists a number of applications of provenance
like improving collaboration, reproducibility and data quality. It does not
highlight, however, the importance of process re-computation which we believe needs
much more attention nowadays. Large, data-intensive and complex analytics
requires effective means to refresh its outcomes while keeping the re-computation
costs under control. This is the goal of the ReComp meta-process [6]. To the best
of our knowledge no prior work addresses this or a similar problem.
Previous research on the use of provenance in re-computation focused on
the final steps of our meta-process: partial or differential re-execution. In [4]
Bavoil et al. optimised re-execution of VisTrails dataflows. Similarly, Altintas
et al. [2] proposed the “smart” rerun of workflows in Kepler. Both consider data
dependencies between workflow tasks such that only the parts of the workflow
affected by a change are rerun. Starflow [3] allowed the structure of a workflow
and subworkflow downstream of a change to be discovered using static, dynamic
and user annotations. Ikeda et al. [10] proposed a solution to determine the
fragment of a data-intensive program that needs to be rerun to refresh stale results.
Also, Lakhani et al. [12] discussed rollback and re-execution of a process.
We note two key differences between the previous work and ours. First, we
consider re-computation in the view of a whole population of past executions;
executions that may not even belong to the same data analysis. From the
population, we select only those which are affected by a change, and for each we find
the restart tree. Second, the restart tree is a concise and effective way to represent
the change in the context of a past, possibly complex hierarchical execution. The
tree may be very effectively computed and also used to start a partial rerun. And
using the restart tree, partial re-execution does not need to rely on a data cache
that may involve high storage costs for data-intensive analyses [15].
Another use of provenance to track changes has been proposed in [8,11] and
recently in [14]. They address the evolution of workflows/scripts, i.e. the changes
in the process structure that affect the outcomes. Their work is complementary
to our view, though. They use provenance to understand what has changed in
the process e.g. to link the execution results together or decide which execution
provides the best results. We, instead, observe changes in the environment and
then react to them by finding the minimal set of executions that require refresh.
References
1. Alper, P., Belhajjame, K., Curcin, V., Goble, C.: LabelFlow framework for annotating workflow provenance. Informatics 5(1), 11 (2018)
2. Altintas, I., Barney, O., Jaeger-Frank, E.: Provenance collection support in the Kepler scientific workflow system. In: Moreau, L., Foster, I. (eds.) IPAW 2006. LNCS, vol. 4145, pp. 118–132. Springer, Heidelberg (2006). https://doi.org/10.1007/11890850_14
3. Angelino, E., Yamins, D., Seltzer, M.: StarFlow: a script-centric data analysis environment. In: McGuinness, D.L., Michaelis, J.R., Moreau, L. (eds.) IPAW 2010. LNCS, vol. 6378, pp. 236–250. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17819-1_27
4. Bavoil, L., et al.: VisTrails: enabling interactive multiple-view visualizations. In: VIS 05. IEEE Visualization, 2005, No. Dx, pp. 135–142. IEEE (2005)
5. Cala, J., Marei, E., Xu, Y., Takeda, K., Missier, P.: Scalable and efficient whole-exome data processing using workflows on the cloud. Future Gener. Comput. Syst. 65, 153–168 (2016)
6. Cala, J., Missier, P.: Selective and recurring re-computation of Big Data analytics tasks: insights from a Genomics case study. Big Data Res. (2018). https://doi.org/10.1016/j.bdr.2018.06.001. ISSN 2214-5796
7. Cuevas-Vicenttín, V., et al.: ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance (2016)
8. Freire, J., Silva, C.T., Callahan, S.P., Santos, E., Scheidegger, C.E., Vo, H.T.: Managing rapidly-evolving scientific workflows. In: Proceedings of the 2006 International Conference on Provenance and Annotation of Data, pp. 10–18 (2006)
9. Herschel, M., Diestelkämper, R., Ben Lahmar, H.: A survey on provenance: what for? what form? what from? VLDB J. 26(6), 1–26 (2017)
10. Ikeda, R., Das Sarma, A., Widom, J.: Logical provenance in data-oriented workflows. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 877–888. IEEE (2013)
11. Koop, D., Scheidegger, C.E., Freire, J., Silva, C.T.: The provenance of workflow upgrades. In: McGuinness, D.L., Michaelis, J.R., Moreau, L. (eds.) IPAW 2010. LNCS, vol. 6378, pp. 2–16. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17819-1_2
12. Lakhani, H., Tahir, R., Aqil, A., Zaffar, F., Tariq, D., Gehani, A.: Optimized rollback and re-computation. In: 2013 46th Hawaii International Conference on System Sciences, No. I, pp. 4930–4937. IEEE (Jan 2013)
13. Moreau, L., et al.: PROV-DM: the PROV data model. Technical report, World Wide Web Consortium (2012)
14. Pimentel, J.F., Murta, L., Braganholo, V., Freire, J.: noWorkflow: a tool for collecting, analyzing, and managing provenance from python scripts. Proc. VLDB Endow. 10(12), 1841–1844 (2017)
15. Woodman, S., Hiden, H., Watson, P.: Applications of provenance in performance prediction and data storage optimisation. Future Gener. Comput. Syst. 75, 299–309 (2017)
CHAPTER XVII
TIME was spreading its rust and its vines over everything, eating
away the edges of his passions and fastening the hinges of his will
so that it could not turn.
The hate he felt for Chalender was slowly paralyzed. Having
forborne the killing of him lest the public be apprised of what he had
killed him for, it followed that Chalender must be treated politely
before the public for the same reason. Thus justice and etiquette
were both suborned to keep people from wondering and saying,
Why?
Being unable to avoid Chalender, he had to greet him casually, to
pass the time of day, even to smile at Chalender’s flippancies. Under
such custom the grudge itself decayed, or retreated at least to the
place where old heartbreaks and horrors make their lair.
There was much talk of Chalender’s splendid engineering work.
His section of the aqueduct prospered exceedingly. He had a way
with his men and though there was an occasional outburst, he kept
them happier and busier than they were in most of the other
sections.
He had a joke or a picturesque sarcasm for everyone, and the
men were aware that his lightness was not a disguise for cowardice.
They remembered that when two of them had fought with picks, he
had jumped into the ditch between them. He could now walk up to
drunken brutes of far superior bulk and take away their weapons,
and often their tempers. He composed quarrels with a laugh or
leaped in with a quick slash of his fist on the nearest nose.
People said to RoBards: “Fine lad, Harry Chalender, great friend of
yours, isn’t he? Plucky devil, too.”
That was hard to deny without an ugly explanation. It would have
been peculiarly crass to sneer or snarl at a man held in favor for
courage.
So the tradition prospered that Chalender and RoBards were
cronies. It was a splendid mask for the ancient resentment. And by
and by the disguise became the habitual wear, the feelings adapted
themselves to their clothes. He would have felt naked without them.
RoBards had to shake himself now and then to remind himself that
he was growing not only tolerant of Chalender, but fond of him.
This was not entirely satisfactory to Patty. She had a woman’s
terrified love of conflict in her behalf. A woman who sees a man slain
on her account suffers beyond doubt, but there is a glory in her
martyrdom. Patty’s intrigue had ended in a disgusting armistice, a
smirking truce. It was comfortable to have a husband and a home,
but it was ignominious to have the husband at peace with the
intruder.
The aqueduct was all the while growing, a vast cubical stone
serpent increasing bone by bone and scale by scale.
It still lacked a head, and RoBards the lawyer like a tiny Siegfried
continued to assail the dragon everywhere, seeking a mortal spot.
The Croton dam was yet to be built, as well as two big bridges and
two great reservoirs in the city. It grew plain that the seven miles
within the island of Manhattan would cost nearly as much as the
original estimates for the whole forty-six.
And the times were cruelly hard. The estimates rose as the
difficulty of raising money increased. Four and a half million dollars
were disbursed without the error of a cent, and the devotion and
dogged heroism of all the water army won even RoBards’
admiration.
By the beginning of 1841 thirty-two miles were finished, including
Harry Chalender’s section. He was called next to aid the work of
completing the dam. A new lake now submerged four hundred acres
of hills and vales with a smooth sheet of water.
Then the laborers on the upper line struck for higher wages and
marched down the aqueduct, driving away or gathering into their
own ranks all the workmen they met. They overawed the rural police,
but when the Mayor of New York called out the militia, the laborers
were forced back to their jobs.
The building of the dam was a work of titanic nicety. The rock
bottom of gneiss was so far down that an artificial foundation had to
be laid under a part of the wall, while a long tunnel and a gateway
must be cut through living rock. A protection wall was building from a
rock abutment, but there came a vast rain on the fifth of January and
it fell upon the deep snow for two days and nights. The overfall had
been raised to withstand a rise of six feet, but the flood came surging
up a foot an hour until it lifted a sea fifteen feet above the apron of
the dam.
Foreseeing a devastation to come, a young man named Albert
Brayton played the Paul Revere and ran with the alarm until he was
checked by a gulf where Tompkins Bridge had stood a while before.
Then he got a horn and played the Angel Gabriel: blew a mighty
blast to warn the sleeping folk on the other shore that their Judgment
Day had come.
The earthen embankment of the dam dissolved and took the
heavy stone work with it. Just before dawn the uproar of the torrent
wakened the farmers miles away as the catapult of water hurtled
down the river, sweeping with it barns, stables, homes, grist mills,
cattle, people, and every bridge across the Croton’s whole length, till
it flung them upon the Hudson’s icy waste.
The Quaker Bridge, which carried the Albany stages, went
swirling; also the Pines Bridge that Washington and his men had
traversed time and again. At Bailey’s iron and wire mills the snarling
wave fell so swiftly upon the settlement that it made driftwood of the
factory and flung fifty women and men from their beds into the
current. There was such a fleet of uprooted trees afloat that all of the
people were saved except two stout men who overweighted the
boughs they clung to. A Mr. Bailey waded breast deep carrying his
father and a box of gold in his arms and got them both to safety.
Harry Chalender played the hero as usual. After one laborer on
the dam had lost his outstretched hand and was drowned, he ran
along the black waters and darting in here and there brought forth
whatever his hand found, whether girl or babe, lowing calf or
squeaking pig. He brought one swirling bull in by the tail and had like
to have been gored to death for his courtesy. But with his wonted
nimbleness he stepped aside, and the bull charging past him
plunged into another arm of the stream and went sailing down with
all fours in air.
The collapse of the dam was a grave shock to the public
confidence. It meant a heavy loss in precious cash and its time
equivalent, but the Crotonians grew only a little grimmer, a little more
determined.
There was much blazon of Chalender in the newspapers, and a
paragraph describing how meek he was about the strength and
courage of his own hands and how proud of the fact that his section
at Sing Sing had stood the battering rams of the deluge without a
quiver.
Patty’s comment on this was a domestic sniff: “I suppose he got
his feet so wet he’ll catch a terrible cold. Well, I hope he doesn’t
come here to be nursed. If he should I’ll send him packing mighty
quick, I’ll tell you.”
Comment was difficult for RoBards, to whom the mention of
Chalender’s mere name was the twisting of a rusty nail in his heart,
but his mind leaped with a wonderful meditation:
There had been progress not only in the building of the aqueduct
but in the laying of a solid causeway under the feet of his family. A
sudden storm had swept Patty’s emotions over the dam of restraint
and wrecked their lives for a while, but now the damage was so well
repaired that she could speak with light contempt of the man who
had carried her heart away; she could say that she would shut in his
face the door to the home he had all but destroyed. Plainly the house
was now her home, too, and Chalender vagrant outside.
This thought filled RoBards’ heart with a flood of overbrimming
tenderness for Patty. He watched her when she tossed the
newspaper to the floor and caught her more exciting baby from its
cradle to her breast. She laughed and nuzzled the child and crushed
him to her heart and made up barbaric new words to call him. Calling
him Davie Junior and little Davikins was in itself a way of making
love to her husband by the proxy of their child.
The sunlight that made a shimmering aureole about her flashed in
her eyes shining with the tears of rapture. RoBards understood one
thing at last about her: She wanted someone to caress and to
defend.
He had always read her wrong. He had offered to be her
champion and to shelter her under his strong arms. But Chalender
had won her by being hungry for her and by stretching his arms
upward to drag her down to him.
RoBards felt that he had never really won Patty because he had
always been trying to be lofty and noble. She had rushed to him
always when he was dejected or helpless with anger; but he had
always lost her as soon as he recovered his self-control.
He wished that he might learn to play the weakling before her to
keep her busy about him. But he could not act so uncongenial a part
at home or abroad.
CHAPTER XVIII
Back of the house and above it on a hilltop too rocky for clearing,
too rough for pasture even, was a little pool ringed around with huge
boulders. No one could explain them, though the Indians had
believed that they had been hurled in a battle of giants.
Tall trees stood up among them and canopied the pool with such
shadow that on the hottest days there was a chill there.
RoBards had brought Patty hither on their first visit to Tuliptree
Farm as bride and groom fugitive from the cholera plague. She had
cried out in delight at the spookiness of the place and he had called
it the Tarn of Mystery. He was not quite sure what a tarn might be but
the word had a somber color that he liked. And Patty had shuddered
deliciously, rounding her eyes and her lips with a murmurous “ooh!”
like a girl hearing a ghost story late at night.
He had helped her to skip from rock to rock like an Alpine climber
among glaciers, but when they came close to the pool glowing as an
emerald of unimaginable weight, she had recoiled from it in disgust,
because it seemed to her but a sheet of green scum. He explained
to her that what revolted her was an almost solid field of drenched
tiny leaves. But he could not persuade her to come near and admire.
She hated the look of it, and when she saw a tiny water snake
wriggling through it in pursuit of a frog, she fled in loathing.
In the fall the leaves came down from the trees in slow spirals.
They lay on the surface of the pool, which had not water enough to
draw them into its plant-choked shallows. The sharpening winds
swept them across the surface in little flocks.
The children loved to play beside the Tarn, though Patty told them
stories of Indians that had murdered and been murdered there. She
whispered to RoBards that when she saw the Tarn it always hinted of
suicide or assassination. The farmer, Mr. Albeson, laughed at this,
but his wife, Abby—even the children called her Abby—said they
was stories about the place. She had forgotten just what they was,
but like as not they was dead bodies there. Folks enough had
vanished during the Revolution, and maybe some of them was still
laying out there waiting for Judgment Day to rouse them up.
It was to this moody retreat that RoBards hurried now. He took one
rail fence at a leap and landed running, like a hurdler. He stumbled
and fell and was up again. Keith clambered after his father, crawled
through the fence and over the rocks till he came where Immy lay
bruised and stunned. Keith saw his father drop to his knees and lift
the child, clench her to his breast, and shake his head over her, then
raise his eyes to the sky and say something to God that the boy
could not hear.
The boy had always been reproached for tears and had been told,
“You’re a big man now and big men don’t cry.” Yet he could see that
his father was crying, crying like a little frightened girl. This strange
thing twisted the boy’s heart and his features and he pushed forward
to comfort his father. He was near enough to hear his sister
moaning:
“Papa—papa—I’m hurt—Immy’s hurt!”
Before the boy could touch him, RoBards lowered Immy gently in
the autumn leaves and put up his head and let out a strange sound
like a wolf’s howl.
Then he struggled to his feet, and ran here and there, looking,
looking. He climbed one of the high boulders about the Tarn and
stared this way and that; leaped down and vanished.
Keith ran past Immy whimpering and struggled up the steep slab
of the same boulder on all fours. Before he reached the top he could
hear voices, his father’s in horrible anger, and another voice in terror.
It was Jud Lasher’s voice and there was so much fear in it that
Keith’s own heart froze.
Sprawling at the peak of the boulder, he peered over, and there he
saw his father beating and kicking and hurling Jud Lasher about on
the sharp stones. He swung his fist like the scythe the farmer swung
and slashed Jud’s head and swept him to the ground; then picked