
Genomics in the Cloud: Using Docker, GATK, and WDL in Terra, 1st Edition
Geraldine A. Van der Auwera and Brian D. O’Connor

1. Foreword
2. Preface
    a. Purpose, Scope, and Intended Audience of This Book
        i. What You Will Learn from This Book
        ii. What Computational Experience Is Needed for the Exercises?
    b. Conventions Used in This Book
    c. Using Code Examples
    d. O’Reilly Online Learning
    e. How to Contact Us
    f. Acknowledgments
3. 1. Introduction
    a. The Promises and Challenges of Big Data in Biology and Life Sciences
    b. Infrastructure Challenges
    c. Toward a Cloud-Based Ecosystem for Data Sharing and Analysis
        i. Cloud-Hosted Data and Compute
        ii. Platforms for Research in the Life Sciences
        iii. Standardization and Reuse of Infrastructure
    d. Being FAIR
    e. Wrap-Up and Next Steps
4. 2. Genomics in a Nutshell: A Primer for Newcomers to the Field
    a. Introduction to Genomics
        i. The Gene as a Discrete Unit of Inheritance (Sort Of)
        ii. The Central Dogma of Biology: DNA to RNA to Protein
        iii. The Origins and Consequences of DNA Mutations
        iv. Genomics as an Inventory of Variation in and Among Genomes
        v. The Challenge of Genomic Scale, by the Numbers
    b. Genomic Variation
        i. The Reference Genome as Common Framework
        ii. Physical Classification of Variants
        iii. Germline Variants Versus Somatic Alterations
    c. High-Throughput Sequencing Data Generation
        i. From Biological Sample to Huge Pile of Read Data
        ii. Types of DNA Libraries: Choosing the Right Experimental Design
    d. Data Processing and Analysis
        i. Mapping Reads to the Reference Genome
        ii. Variant Calling
        iii. Data Quality and Sources of Error
        iv. Functional Equivalence Pipeline Specification
    e. Wrap-Up and Next Steps
5. 3. Computing Technology Basics for Life Scientists
    a. Basic Infrastructure Components and Performance Bottlenecks
        i. Types of Processor Hardware: CPU, GPU, TPU, FPGA, OMG
        ii. Levels of Compute Organization: Core, Node, Cluster, and Cloud
        iii. Addressing Performance Bottlenecks
    b. Parallel Computing
        i. Parallelizing a Simple Analysis
        ii. From Cores to Clusters and Clouds: Many Levels of Parallelism
        iii. Trade-Offs of Parallelism: Speed, Efficiency, and Cost
    c. Pipelining for Parallelization and Automation
        i. Workflow Languages
        ii. Popular Pipelining Languages for Genomics
        iii. Workflow Management Systems
    d. Virtualization and the Cloud
        i. VMs and Containers
        ii. Introducing the Cloud
        iii. Categories of Research Use Cases for Cloud Services
    e. Wrap-Up and Next Steps
6. 4. First Steps in the Cloud
    a. Setting Up Your Google Cloud Account and First Project
        i. Creating a Project
        ii. Checking Your Billing Account and Activating Free Credits
    b. Running Basic Commands in Google Cloud Shell
        i. Logging in to the Cloud Shell VM
        ii. Using gsutil to Access and Manage Files
        iii. Pulling a Docker Image and Spinning Up the Container
        iv. Mounting a Volume to Access the Filesystem from Within the Container
    c. Setting Up Your Own Custom VM
        i. Creating and Configuring Your VM Instance
        ii. Logging into Your VM by Using SSH
        iii. Checking Your Authentication
        iv. Copying the Book Materials to Your VM
        v. Installing Docker on Your VM
        vi. Setting Up the GATK Container Image
        vii. Stopping Your VM…to Stop It from Costing You Money
    d. Configuring IGV to Read Data from GCS Buckets
    e. Wrap-Up and Next Steps
7. 5. First Steps with GATK
    a. Getting Started with GATK
        i. Operating Requirements
        ii. Command-Line Syntax
        iii. Multithreading with Spark
        iv. Running GATK in Practice
    b. Getting Started with Variant Discovery
        i. Calling Germline SNPs and Indels with HaplotypeCaller
        ii. Filtering Based on Variant Context Annotations
    c. Introducing the GATK Best Practices
        i. Best Practices Workflows Covered in This Book
        ii. Other Major Use Cases
    d. Wrap-Up and Next Steps
8. 6. GATK Best Practices for Germline Short Variant Discovery
    a. Data Preprocessing
        i. Mapping Reads to the Genome Reference
        ii. Marking Duplicates
        iii. Recalibrating Base Quality Scores
    b. Joint Discovery Analysis
        i. Overview of the Joint Calling Workflow
        ii. Calling Variants per Sample to Generate GVCFs
        iii. Consolidating GVCFs
        iv. Applying Joint Genotyping to Multiple Samples
        v. Filtering the Joint Callset with Variant Quality Score Recalibration
        vi. Refining Genotype Assignments and Adjusting Genotype Confidence
        vii. Next Steps and Further Reading
    c. Single-Sample Calling with CNN Filtering
        i. Overview of the CNN Single-Sample Workflow
        ii. Applying 1D CNN to Filter a Single-Sample WGS Callset
        iii. Applying 2D CNN to Include Read Data in the Modeling
    d. Wrap-Up and Next Steps
9. 7. GATK Best Practices for Somatic Variant Discovery
    a. Challenges in Cancer Genomics
    b. Somatic Short Variants (SNVs and Indels)
        i. Overview of the Tumor-Normal Pair Analysis Workflow
        ii. Creating a Mutect2 PoN
        iii. Running Mutect2 on the Tumor-Normal Pair
        iv. Estimating Cross-Sample Contamination
        v. Filtering Mutect2 Calls
        vi. Annotating Predicted Functional Effects with Funcotator
    c. Somatic Copy-Number Alterations
        i. Overview of the Tumor-Only Analysis Workflow
        ii. Creating a Somatic CNA PoN
        iii. Applying Denoising
        iv. Performing Segmentation and Call CNAs
        v. Additional Analysis Options
    d. Wrap-Up and Next Steps
10. 8. Automating Analysis Execution with Workflows
    a. Introducing WDL and Cromwell
    b. Installing and Setting Up Cromwell
    c. Your First WDL: Hello World
        i. Learning Basic WDL Syntax Through a Minimalist Example
        ii. Running a Simple WDL with Cromwell on Your Google VM
        iii. Interpreting the Important Parts of Cromwell’s Logging Output
        iv. Adding a Variable and Providing Inputs via JSON
        v. Adding Another Task to Make It a Proper Workflow
    d. Your First GATK Workflow: Hello HaplotypeCaller
        i. Exploring the WDL
        ii. Generating the Inputs JSON
        iii. Running the Workflow
        iv. Breaking the Workflow to Test Syntax Validation and Error Messaging
    e. Introducing Scatter-Gather Parallelism
        i. Exploring the WDL
        ii. Generating a Graph Diagram for Visualization
    f. Wrap-Up and Next Steps
11. 9. Deciphering Real Genomics Workflows
    a. Mystery Workflow #1: Flexibility Through Conditionals
        i. Mapping Out the Workflow
        ii. Reverse Engineering the Conditional Switch
    b. Mystery Workflow #2: Modularity and Code Reuse
        i. Mapping Out the Workflow
        ii. Unpacking the Nesting Dolls
    c. Wrap-Up and Next Steps
12. 10. Running Single Workflows at Scale with Pipelines API
    a. Introducing the GCP Genomics Pipelines API Service
        i. Enabling Genomics API and Related APIs in Your Google Cloud Project
    b. Directly Dispatching Cromwell Jobs to PAPI
        i. Configuring Cromwell to Communicate with PAPI
        ii. Running Scattered HaplotypeCaller via PAPI
        iii. Monitoring Workflow Execution on Google Compute Engine
    c. Understanding and Optimizing Workflow Efficiency
        i. Granularity of Operations
        ii. Balance of Time Versus Money
        iii. Suggested Cost-Saving Optimizations
        iv. Platform-Specific Optimization Versus Portability
    d. Wrapping Cromwell and PAPI Execution with WDL Runner
        i. Setting Up WDL Runner
        ii. Running the Scattered HaplotypeCaller Workflow with WDL Runner
        iii. Monitoring WDL Runner Execution
    e. Wrap-Up and Next Steps
13. 11. Running Many Workflows Conveniently in Terra
    a. Getting Started with Terra
        i. Creating an Account
        ii. Creating a Billing Project
        iii. Cloning the Preconfigured Workspace
    b. Running Workflows with the Cromwell Server in Terra
        i. Running a Workflow on a Single Sample
        ii. Running a Workflow on Multiple Samples in a Data Table
        iii. Monitoring Workflow Execution
        iv. Locating Workflow Outputs in the Data Table
        v. Running the Same Workflow Again to Demonstrate Call Caching
    c. Running a Real GATK Best Practices Pipeline at Full Scale
        i. Finding and Cloning the GATK Best Practices Workspace for Germline Short Variant Discovery
        ii. Examining the Preloaded Data
        iii. Selecting Data and Configuring the Full-Scale Workflow
        iv. Launching the Full-Scale Workflow and Monitoring Execution
        v. Options for Downloading Output Data—or Not
    d. Wrap-Up and Next Steps
14. 12. Interactive Analysis in Jupyter Notebook
    a. Introduction to Jupyter in Terra
        i. Jupyter Notebooks in General
        ii. How Jupyter Notebooks Work in Terra
    b. Getting Started with Jupyter in Terra
        i. Inspecting and Customizing the Notebook Runtime Configuration
        ii. Opening Notebook in Edit Mode and Checking the Kernel
        iii. Running the Hello World Cells
        iv. Using gsutil to Interact with Google Cloud Storage Buckets
        v. Setting Up a Variable Pointing to the Germline Data in the Book Bucket
        vi. Setting Up a Sandbox and Saving Output Files to the Workspace Bucket
    c. Visualizing Genomic Data in an Embedded IGV Window
        i. Setting Up the Embedded IGV Browser
        ii. Adding Data to the IGV Browser
        iii. Setting Up an Access Token to View Private Data
    d. Running GATK Commands to Learn, Test, or Troubleshoot
        i. Running a Basic GATK Command: HaplotypeCaller
        ii. Loading the Data (BAM and VCF) into IGV
        iii. Troubleshooting a Questionable Variant Call in the Embedded IGV Browser
    e. Visualizing Variant Context Annotation Data
        i. Exporting Annotations of Interest with VariantsToTable
        ii. Loading R Script to Make Plotting Functions Available
        iii. Making Density Plots for QUAL by Using makeDensityPlot
        iv. Making a Scatter Plot of QUAL Versus DP
        v. Making a Scatter Plot Flanked by Marginal Density Plots
    f. Wrap-Up and Next Steps
15. 13. Assembling Your Own Workspace in Terra
    a. Managing Data Inside and Outside of Workspaces
        i. The Workspace Bucket as Data Repository
        ii. Accessing Private Data That You Manage Outside of Terra
        iii. Accessing Data in the Terra Data Library
    b. Re-Creating the Tutorial Workspace from Base Components
        i. Creating a New Workspace
        ii. Adding the Workflow to the Methods Repository and Importing It into the Workspace
        iii. Creating a Configuration Quickly with a JSON File
        iv. Adding the Data Table
        v. Filling in the Workspace Resource Data Table
        vi. Creating a Workflow Configuration That Uses the Data Tables
        vii. Adding the Notebook and Checking the Runtime Environment
        viii. Documenting Your Workspace and Sharing It
    c. Starting from a GATK Best Practices Workspace
        i. Cloning a GATK Best Practices Workspace
        ii. Examining GATK Workspace Data Tables to Understand How the Data Is Structured
        iii. Getting to Know the 1000 Genomes High Coverage Dataset
        iv. Copying Data Tables from the 1000 Genomes Workspace
        v. Using TSV Load Files to Import Data from the 1000 Genomes Workspace
        vi. Running a Joint-Calling Analysis on the Federated Dataset
    d. Building a Workspace Around a Dataset
        i. Cloning the 1000 Genomes Data Workspace
        ii. Importing a Workflow from Dockstore
        iii. Configuring the Workflow to Use the Data Tables
    e. Wrap-Up and Next Steps
16. 14. Making a Fully Reproducible Paper
    a. Overview of the Case Study
        i. Computational Reproducibility and the FAIR Framework
        ii. Original Research Study and History of the Case Study
        iii. Assessing the Available Information and Key Challenges
        iv. Designing a Reproducible Implementation
    b. Generating a Synthetic Dataset as a Stand-In for the Private Data
        i. Overall Methodology
        ii. Retrieving the Variant Data from 1000 Genomes Participants
        iii. Creating Fake Exomes Based on Real People
        iv. Mutating the Fake Exomes
        v. Generating the Definitive Dataset
    c. Re-Creating the Data Processing and Analysis Methodology
        i. Mapping and Variant Discovery
        ii. Variant Effect Prediction, Prioritization, and Variant Load Analysis
        iii. Analytical Performance of the New Implementation
    d. The Long, Winding Road to FAIRness
    e. Final Conclusions
17. Glossary
18. Index

Praise for Genomics in the Cloud

This book captures the essence of what’s been learned
about bringing genomics to the cloud. And it lays out an
accessible path for newcomers to join this exciting and
important ecosystem.
—Eric S. Lander, Founding Director, The Broad
Institute of MIT and Harvard

This book is a fantastic introduction to modern genome
analysis using state-of-the-art tools and practices. It covers
everything a reader needs to get their own analyses running
in an open, repeatable way. This is the quintessential primer
on the GATK and cloud-based analysis with Terra.
—Jonathan Smith, Principal Software Engineer,
The Broad Institute of MIT and Harvard

This is a great primer about reproducible bioinformatics in
the cloud. Geraldine and Brian are at the forefront of this
field, so we are learning from the best. And for those who
have yet to work with Terra, look no further for an excellent
introduction to it!
—Jessica Maia, Data Scientist, BD

Transferring from physics to cancer research as I did, I
learned genomics, sequencing, statistics piecemeal. I could
have used a book like this back then, because no matter
how much time you’ve spent in the field or if it’s your first
contact, there’s something new to learn and an appreciation
for the bigger picture to be gained.
—Aaron Chevalier, PhD Candidate, Boston
University

Genomics in the Cloud covers everything from the science of
genomic analysis to the computing technologies used to
process this data at massive scale; presented in a way that
lets you jump right in and run the same tools in the cloud
that are used by biologists, researchers, and clinicians
worldwide.
—Andrew Moschetti, Senior Solutions Architect,
Google Cloud Life Sciences

As the volume of genomic data increases, implementing
analysis using best practice cloud patterns becomes more
and more important. In this book, you’ll learn these patterns
via practical examples that you can try out using your own
data and research questions.
—Lynn Langit, Cloud Architect, Google
Developer Expert and AWS Community Hero

Genomics in the Cloud is an excellent introduction both to
genomics and cloud-based research, perfect for those who
wish to capitalize on the cloud environment to move their
research forward and for those who wish to better
understand this space.
—David E. Mohs, Software Engineer, The Broad
Institute of MIT and Harvard

Genomics in the Cloud
Using Docker, GATK, and WDL in Terra

Geraldine A. Van der Auwera and Brian D. O’Connor

Genomics in the Cloud

by Geraldine A. Van der Auwera and Brian D. O’Connor

Copyright © 2020 The Broad Institute, Inc. and Brian
O’Connor. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway
North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or
sales promotional use. Online editions are also available for
most titles (http://oreilly.com). For more information, contact
our corporate/institutional sales department: 800-998-9938 or
[email protected].

Acquisitions Editor: Rachel Novak

Development Editor: Michele Cronin

Production Editor: Katherine Tozer

Copyeditor: Octal Publishing, LLC

Proofreader: Sharon Wilkey

Indexer: Ellen Troutman-Zaig

Interior Designer: David Futato

Cover Designer: Karen Montgomery

Illustrator: Rebecca Demarest

April 2020: First Edition

Revision History for the First Edition

2020-04-02: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491975190
for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media,
Inc. Genomics in the Cloud, the cover image, and related trade
dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors, and
do not represent the publisher’s views. While the publisher and
the authors have used good faith efforts to ensure that the
information and instructions contained in this work are
accurate, the publisher and the authors disclaim all
responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and
instructions contained in this work is at your own risk. If any
code samples or other technology this work contains or
describes is subject to open source licenses or the intellectual
property rights of others, it is your responsibility to ensure that
your use thereof complies with such licenses and/or rights.

978-1-491-97519-0

[LSI]

Foreword

I migrated from mathematics into the field of genomics in 1985
—roughly a year before the field officially came into existence.
The word genomics was coined in 1986, which also saw the
first public debate, at the Cold Spring Harbor Laboratory, about
the notion of mounting a Human Genome Project.

It’s hard to imagine how much has changed since then.
Computers hardly figured in biomedicine—the initial design for
the Whitehead Institute for Biomedical Research, founded in
the early 1980s, included no provision for a computer. Large
amounts of data were seen as a nuisance, not an asset—in a
Nature article reporting on the Human Genome Project debate,
the journal’s biology editor wrote, “If the skill and ingenuity of
modern biology are already stretched to interpret sequences of
known importance, such as those of the DMD and CGD genes,
what possible use could be made of more sequences?”

Despite such doubts, biologists eventually decided to press on
—launching the Human Genome Project, their first major data
gathering effort, in 1990. One of the important motivations
was the prospect of deploying systematic methods—rather
than guesswork—to discover the genes responsible for human
diseases. In 1980, a brilliant biologist, David Botstein, had
conceived how to find the location of genes for rare monogenic
diseases by tracing their inheritance in families relative to a
genetic map of DNA variants across the human genome.
Realizing the full power of the idea, though, would require
mapping—and eventually sequencing—the entire human
genome.

The Human Genome Project was an extraordinary collaboration
that spanned six countries and twenty institutions, took
thirteen years, and cost $3 billion. When the dust settled, the
world had the three billion nucleotide-long DNA sequence of a
single human genome.

With this project completed, many biologists thought that
business would return to usual. But what happened next was
even more remarkable. Over the next 15 years, biology
became an information science—in which the generation of
massive amounts of data reshaped the field. For example:

• Genetic mapping in families revealed the genes
  responsible for more than 5,000 serious rare
  monogenic disorders.

• New kinds of genetic mapping in populations led to the
  discovery of ~100,000 robust associations of specific
  genetic regions with common diseases and traits.

• Genetic analysis of thousands of tumors uncovered
  hundreds of new genes in which mutations propelled
  cancer.

Remarkably, the cost of sequencing a human genome fell by a
factor of five million—from $3 billion to $600—and the cost is
likely to reach $100 in the coming years. More than one million
genomes have been sequenced so far. Overall, genomic data of
all kinds is doubling roughly every eight months.

None of this would have been possible without the
development of powerful new computational methods and
tools to work with the many new types of data that were being
generated. A good example is the Genome Analysis Toolkit,
developed by colleagues at the Broad Institute, which you’ll
read a lot more about in this book.

Today, life sciences are in the midst of new data explosions.
Many countries are undertaking systematic efforts to collect
genomic and medical data into national biobanks, which will
give researchers the ability to probe even further into the
genetics of both common and rare diseases and traits. It will
be especially important to ensure that the world’s full genetic
diversity is represented in these large-scale efforts—not just
people of European descent.

Because of the amazing technological progress in recent years,
we can now read out not just the DNA blueprint, but how this
blueprint is read out as RNA in individual cells. Methods have
been developed to read out gene expression at the single-cell
level, with an initial analysis of 18 cells soon leading to
analyses of more than 18 million cells. This work has given rise
to an international Human Cell Atlas project, involving more
than 60 countries around the world. These datasets are
beginning to make it possible to use computational methods,
including modern machine learning, to systematically infer the
underlying circuitry of cells.

As the biological applications burgeon, though, we are often
held back by systemic limitations in how we access and share
data. Most of the world’s biomedical data has traditionally been
held in silos—accessible only through servers from which each
authorized researcher or group must download their own
copies to their own institution’s computing infrastructure. From
a purely technical standpoint, this is unsustainable. Instead of
bringing data to researchers, we need systems that allow
researchers to operate on the data where it resides. We also
need more transparent models for managing custody of the
data, as well as efficient ways to assess, enforce and audit who
can access the data and for what purpose. We should aim to
abide by these four principles: (1) copying data should not be
the default mode of sharing data; (2) security and auditing
should be baked in and enterprise-grade; (3) large-scale
analysis should be accessible to all research groups; and (4)
computational resources should be elastic, so that they can be
scaled up or down as needed.

Cloud computing has emerged as the leading solution for the
technical aspect of these challenges. In practice, though, it
creates new obstacles that require creative solutions.

At the Broad Institute, we started moving to the cloud four
years ago, to cope with the rising tide of genomic data. We cut
our teeth by converting our genomic data-processing operation
from a traditional on-premises system to one that runs on the
cloud from the moment the data is generated in our genome
sequence platform. That move required rethinking every aspect
of the process and building entirely new systems from scratch
to handle the terabytes of data that come streaming off
sequencing machines every day. But that was just the
beginning. Once the data was up on the cloud, we hit the next
obstacle: the available cloud services, in their current state,
can be daunting to use for life sciences researchers without
advanced training. So, we teamed up with partners to develop
a software and analysis platform, Terra.

Other such platforms also have emerged as the move to the
cloud has picked up steam in biomedical research. Today we
are working with many other groups to build a federated data
ecosystem of interconnected components that offer
complementary services and capabilities. We expect these
platforms will help facilitate the kind of open collaboration that
is needed to bring together data, tools, and expertise spanning
multiple domains and disciplines. We also want to lower the
technical thresholds for individual researchers to participate in
the cloud-based ecosystem, especially those with fewer IT
resources at their disposal.

By all accounts, the transition of genomics to the cloud is still
in its early phases. At the Broad Institute, we’ve learned many
hard lessons on our own journey to the cloud, and we’re
learning more every day. In a time of such disruptive change,
it’s essential that groups share their experiences with each
other.

That’s why I’m so excited that the incomparable Geraldine Van
der Auwera, longtime advocate for the research community at
the Broad Institute, and Brian O’Connor, an ardent campaigner
for software and data interoperability at UCSC, have written
this book. The book captures the essence of what we have
learned so far, and lays out an accessible path for newcomers
to join the genomics cloud ecosystem.

Eric S. Lander, Founding Director, The Broad Institute
of MIT and Harvard

Preface

If cloud technology is the future of biomedical science, then for
genomics, the future is already here.

Genomics is the first biomedical discipline to move en masse to
the cloud. Perhaps inevitably so, given that it was the first to
experience explosive growth in data generation, leading to
rapidly escalating compute and storage requirement issues that
a cloud infrastructure is ideally positioned to address. Major
genomic datasets and their derived resources are now
available in the cloud, and many tools like the industry-leading
Genome Analysis Toolkit (GATK) produced by the Broad
Institute are now offered in forms optimized to run efficiently
on a cloud infrastructure. As a result, many researchers making
use of genomic data and related analysis tools are now or will
soon be confronted with the need to learn to use cloud
resources, which can represent a huge challenge to many.
Meanwhile, many informatics and bioinformatics support staff
are being pulled in to help researchers to achieve this
transition, sometimes with only minimal or no training relevant
to the science of genomics. Taken together, these two
populations form a continuum of people who need to get on
the same page and work together to solve the challenges they
face.

Purpose, Scope, and Intended Audience of This Book

With this book, we aim to provide a hands-on orientation tour
of major tools, mechanisms, and processes involved in
performing genomic analysis in the cloud that can serve as a
middle ground for the majority of people on this spectrum. We
try to assume as little prior knowledge as possible, and we
provide two primer-style chapters, one focused on genomics
and one on technology, to ensure that everyone has a firm
grounding in the fundamental concepts we rely on from both
domains. In addition, we deliberately chose a particular open
source technology stack—GATK, Workflow Description
Language (WDL), Terra, Docker, and Google Cloud Platform—
that provides end-to-end functionality and is backed by robust
user support systems in order to guarantee a successful
educational experience.

To be clear, this book is not intended to be comprehensive,
either in terms of tooling options or the scientific scope of
genomic analyses. Our operational definition of genomics,
centered on variant discovery and immediately related
analyses, is intentionally narrow; and for every step of the
processes we describe, there often exist several, if not many,
alternative tools that you could substitute for those we chose
to showcase. However, we designed the topics and exercises
presented here to provide patterns and takeaways that are
largely transferable and extensible to other tools and analyses
in order to maximize their long-term value to readers. In
addition, we plan to release a series of companion blog posts
and other online materials that will show complementary
approaches using different platforms and technologies; see the
book’s GitHub repository and its companion website.

What You Will Learn from This Book

The very idea of doing genomics in the cloud might seem
intimidating on first approach, especially if you’re new to either
one or both, but it’s not as complicated as you might think.
Throughout this book, we walk you through all of the
important pieces of the puzzle, step by step. You’ll have the
opportunity to run genomic analyses involving the GATK,
selected for their broad appeal and interesting computational
approaches. You’ll do so first through the “bare” services
provided by the Google Cloud Platform (GCP) and then on
Terra, a scalable platform for biomedical research codeveloped
by the Broad Institute and Verily, an Alphabet company, on top
of GCP.

By the end of the book, you should expect to have learned or
achieved the following:

• Fundamentals of computational infrastructure and
  processes

• Fundamentals of genomics including biological
  underpinnings, formats, and conventions

• Beginner- to intermediate-level hands-on usage of the
  core technology stack:

    ◦ GATK, WDL, Terra, Docker, and Google Cloud

    ◦ GATK Best Practices for variant discovery as
      formulated by the GATK development team at
      the Broad Institute, covering germline short
      variants, somatic short variants, and somatic
      copy-number alterations

    ◦ Reading, authoring, and interpreting analysis
      workflows, first in a sandbox environment and
      then at scale through several modes of
      execution (from a standalone command-line
      package to a fully managed system)

    ◦ Managing data and workflow execution in a
      workspace environment

    ◦ Performing interactive analysis using Jupyter
      Notebooks

    ◦ Tying it all together: achieving computational
      reproducibility in publications through the use
      of cloud data storage, synthetic data
      generation, portable workflows, and
      containerized tools

• Secondary goals

    ◦ Increased familiarity with computational
      concepts such as scaling and optimization
      approaches

    ◦ Practical experience with several bioinformatics
      command-line packages, common commands,
      and file formats

What Computational Experience Is Needed for the Exercises?

For the exercises in Chapter 4 through Chapter 10, we assume
that you are already somewhat familiar with command-line
fundamentals, including the basics of navigating directories
and interacting with text files in a Bash shell; composing and
running simple commands; and the concepts of environment
variables, path, and working directory. For Chapter 8 through
Chapter 11 and Chapter 13, we assume that you are familiar
with the concept of writing scripts, though we do not require
you to have practical experience doing so. For Chapter 12 and
Chapter 14, we assume that you have heard of the
programming languages R and Python, and you will find it
easier to understand the more complex examples if you have
some familiarity with their syntax, though it is not required.

If at any point during the exercises you feel out of your depth
in terms of the computational tooling and terminology, we
recommend that you check out the lessons provided by the
Software Carpentry organization, which are specifically
designed for research scientists who have not had formal
computational training. The lessons on the Unix shell can be
particularly helpful if you don’t have any prior command-line
experience. They also have sets of lessons on Python and on R
as well as other topics relevant to the book like version control
with Git. These lessons are all open source and developed by
volunteers in the community who understand the everyday
challenges faced by researchers, so they’re a truly fantastic
resource.

Conventions Used in This Book


The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, file
extensions, table names and components, and workflows.

Constant width
Used for program listings, as well as within paragraphs to
refer to program elements such as variable or function
names, databases, data types, environment variables,
statements, and keywords.

Constant width bold
Shows text that should be typed literally by the user.

Constant width italic
Shows text that should be replaced with user-supplied
values or by values determined by context.

$ before code
Indicates a command run in the VM shell.

# before code
Indicates a command run in the Docker container.

NOTE
This element signifies a note.
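
As a small illustration of the two prompt conventions above, the
following Python sketch takes a line copied from an exercise and
reports where it is meant to be run, with the prompt marker removed.
The function name classify_command and the labels it returns are
hypothetical, chosen only for this illustration; they are not part of
the book's code or of any tool it covers.

```python
from typing import Tuple


def classify_command(line: str) -> Tuple[str, str]:
    """Split a line copied from the book into (where to run it, command).

    Hypothetical helper: assumes the prompt marker is followed by one space.
    """
    stripped = line.strip()
    if stripped.startswith("$ "):
        # "$" marks a command run in the VM shell
        return "vm-shell", stripped[2:]
    if stripped.startswith("# "):
        # "#" marks a command run inside the Docker container
        return "docker-container", stripped[2:]
    # No recognized prompt marker: return the text unchanged
    return "unknown", stripped


print(classify_command("$ pwd"))
print(classify_command("# ls"))
```

Leaving the marker out of what you paste matters most for #: an
interactive Bash shell treats a line starting with # as a comment by
default and ignores it.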

Using Code Examples


Supplemental material (code examples, exercises, full-size
color figures, etc.) is available for download on GitHub.

This book is here to help you get your job done. In general, if
example code is offered with this book, you may use it in your
programs and documentation. You do not need to contact us
for permission unless you’re reproducing a significant portion
of the code. For example, writing a program that uses several
chunks of code from this book does not require permission.
Selling or distributing examples from O’Reilly books does
require permission. Answering a question by citing this book
and quoting example code does not require permission.
Incorporating a significant amount of example code from this
book into your product’s documentation does require
permission.

We appreciate, but generally do not require, attribution. An
attribution usually includes the title, author, publisher, and
ISBN. For example: “Genomics in the Cloud by Geraldine A.
Van der Auwera and Brian D. O’Connor (O’Reilly). Copyright
2020 The Broad Institute, Inc. and Brian O’Connor,
978-1-491-97519-0.”

If you feel your use of code examples falls outside fair use or
the permission given above, feel free to contact us at
[email protected].

O’Reilly Online Learning

NOTE
For more than 40 years, O’Reilly Media has provided technology
and business training, knowledge, and insight to help companies
succeed.

Our unique network of experts and innovators share their
knowledge and expertise through books, articles, and our
online learning platform. O’Reilly’s online learning platform
gives you on-demand access to live training courses, in-depth
learning paths, interactive coding environments, and a vast
collection of text and video from O’Reilly and 200+ other
publishers. For more information, visit http://oreilly.com.

How to Contact Us
Please address comments and questions concerning this book
to the publisher:

O’Reilly Media, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95472

800-998-9938 (in the United States or Canada)

707-829-0515 (international or local)

707-829-0104 (fax)

We have a web page for this book, where we list errata,
examples, and any additional information. You can access this
page at https://oreil.ly/genomics-cloud.

Email [email protected] to comment or ask technical
questions about this book.

To learn more about our books, courses, and news, visit
http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments
We would like to thank our countless colleagues at the Broad
Institute and at the University of California, Santa Cruz (UCSC),
who contributed in so many ways to making this book a reality.

We are hugely indebted to all the past and present members of
the frontline support and education teams in the Data Sciences
Platform at the Broad Institute who developed and maintain
the original educational materials and resources on which we
based many of the hands-on exercises presented in this book.
Within the education team led by Robert Majovski, we’d like to
highlight the work of Soo Hee Lee, whose thoroughness and
exacting attention to detail produced some of the deepest
resources available about GATK tools; Allie Hajian and Anton
Kovalsky, who are tasked with the Herculean feat of
documenting how to use Terra even as it wriggles and evolves
from underneath them; and Kate Noblett, who wrote much of
the original WDL documentation and now coordinates GATK,
WDL, and Terra workshops with an iron hand. Within the
frontline support team led by Tiffany Miller, we’d like to
highlight the work of Beri Shifaw, who maintains the gatk-
workflows pipelines on GitHub and in Dockstore as well as the
featured workspaces in Terra; and Bhanu Gandham, who has
so enthusiastically taken on the responsibility of obsessing
about the well-being of the GATK user community. Other
contributing members from these two teams, past and present,
include Derek Caetano-Anolles, Sushma Chaluvadi, Sheila
Chandran, Elizabeth Kiernan, David Kling, Ron Levine, and
Adelaide Rhodes.

We also recognize and appreciate the growing role played by
the Broad DSP Field Engineering team led by Alexander
Baumann in this arena. Star among the stars, Yvonne Blanco
swooped in from the User Experience team to improve key
diagrams and illustrations with her impeccable design mojo.

We are eternally grateful to the many members of the GATK
development team who have provided critical input to
educational resources and lent their expertise in GATK
workshops across the globe. There are too many of them to
enumerate here, but within that team, we would like to
highlight the invaluable support of Eric Banks, Laura Gauthier,
Yossi Farjoun, and Lee Lichtenstein; the seemingly endless
patience of David Benjamin and Sam Lee; the unflappable
aplomb of David Roazen and jovial fatalism of Louis Bergelson;
the quiet expertise of Mark “Duplicates” Fleharty and the
cheerful expertise of Megan Shand. Special shout-out also to
Chris Norman for his work on the Barclay library, which powers
the GATK documentation system.

On a more personal level, Geraldine would like to thank
Mauricio Carneiro and Mark DePristo, past member and
founder of the original GATK team, respectively, for taking a
chance and hiring a confused microbiologist all those years
ago.

Speaking of too many to count, we could not begin to name
everyone involved in the development of the chapters on WDL,
Cromwell, and Terra, but we’d like to put in a special mention
for Adrian “Notebooks Guy” Sharma, William Disman, Ruchi
Munshi, and Kyle Vernest, who all contributed helpful insights
and put up with our constant badgering about issues we hoped
to see addressed before the book came out. On that note, we
owe a big thank you to Chris Llanwarne and Adam Nichols for
patching wdltool just in time for Chapter 9 to make a lot
more sense than it would have otherwise. And speaking of
badgering, our deepest apologies go to Eric Karofsky and
Jerôme Chadel from the User Experience team, who had to
endure a constant barrage of questions about what elements
of the Terra interface would change next and on what timeline.
We’re deeply grateful to Matthieu J. Miossec for collaborating
with us to develop the project we present in Chapter 14.

We are forever grateful to the reviewers who took the time to
read through early draft versions in order to help us identify
what didn’t work reliably and to understand what could be
improved. The book you see before you is very different from
what we originally gave them to evaluate, for the better. In this
category, we salute Titus Brown, Aaron Chevalier, Jeff Gentry,
Sean Horgan, Lynn Langit, Lee Lichtenstein, Jessica Maia,
David Mohs, Andrew Moschetti, Anubhav Shelat, and Jonn
Smith.

None of this would have been possible without the support of
our respective leadership teams. At the Broad Institute, we
would like to thank Eric Lander, Lee McGuire, and the Data
Sciences Platform leaders, particularly Anthony Philippakis, Eric
Banks again, and Danielle Ciofani for keeping the faith that this
book would eventually materialize. At UCSC, we thank the
Genomics Institute (GI) leadership including Benedict Paten
and the institute director, David Haussler, for their support
along with Greta Martin, whose organizational skills are
unrivaled, and Nadine Gassner, who keeps us funded so that
we can work on cool projects.

Within the UCSC GI, we want to thank the Computational
Genomics Platform (CGP) team whose members work on a
variety of projects that leverage Terra and other cloud-based
analysis ecosystem components we present in this book.
Contributors include Jesse Brennan, Amar Jandu, Natan Lao,
Melaina Legaspi, Geryl Pelayo, Charles Reid, Hannes Schmidt,
and Daniel Sotirhos. Within CGP, the Lighthouse Point team—
Michael Baumann (now at the Broad Institute), Lon Blauvelt,
Brian Hannafious, and Ash O’Farrell, led by Beth Sheets—
deserves special recognition for their role in writing excellent
research tutorials that helped inspire sections of the book.

We also want to thank the Dockstore teams at both UCSC and
the Ontario Institute for Cancer Research (OICR) for their
feedback on this effort and support building a platform for
workflow sharing that contributes to the Terra ecosystem.
Charles Overbeck leads the technical team at UCSC, and we
are grateful for contributions by Louise Cabansay, Abraham
Chavez, Andy Chen, Trevor Heathorn, Nneka Olunwa, Kevin
Osborn, Natalie Perez, Walter Shands, Emily Soth, Cricket
Sloan, and David Steinberg. Denis Yuen leads the technical
team at OICR with Lincoln Stein as the PI and contributions
from Ryan Bautista, Kitty Cao, Andy Chen, Vincent Chung,
Andrew Duncan, Victor Liu, Gary Luu, Shreya Radesh, and
Jennifer Wu.

Last but most definitely not least, we would like to thank our
loved ones for their patience and support during the more than
two years that it took us to produce this book. Geraldine hopes
that her lovely wife, Jessica, and daughters, Gabrielle and
Melanie, will be suitably impressed and somehow forget her
many late nights, obsessive behavior, and general inability to
complete any home-improvement projects during that time
period. Meanwhile, Brian thanks his partner Dhawal for his
infinite patience, understanding, and encouragement to finish
the book, along with his mom (Patty) and dad (Jim) for
providing the occasional and appreciated push to “get it done!”

Chapter 1. Introduction
We live in a time of great opportunity: technological advances
are making it possible to generate incredibly detailed and
comprehensive data about everything, from the sequence of
our entire genomes to the patterns of gene expression of
individual cells. Not only can we generate this type of data, we
can generate a lot of it.

Over the past 10 years, we’ve seen a stunning growth in the
amount of sequencing data produced worldwide, enabled by a
huge reduction in cost of short-read sequencing, a technology
that we explore in Chapter 2 (Figure 1-1). Recently developed
and up-and-coming technologies like long-read sequencing and
single-cell transcriptomics promise a future filled with similar
transformative drops in costs and greater access to ’omics
experimental designs than ever before.
Farewell, Jehanne, true maid,
Who is well beloved of Him;
Keep always the firm resolve
To be God’s shepherdess.

Then he departs, and there is a pause.

MICHEL.

Father, I have fully delivered
Your message, humbly,
Leaving nothing forgotten,
To the Maid, in truth;
Who, graciously,
With all her heart, wishes to serve you,
And all that you command
She will do and fulfill.

GOD.

I shall raise the kingdom up again,
And the enemies, confounded,
Shall be struck down by the Maid
And by her wholly vanquished;
For, as soon as she has seen them,
Such valor will be in her
That they will be dismayed by it.
In the kingdom they will have no more power.

Pause. Then says:

THE MAID.

O my God and my creator,
May it please you always to guide me.
You are my father and lord,
Whom I do not wish to gainsay.

XI

FRENCH POETRY OF THE 16TH CENTURY

PILGRIMAGE CANTICLE

1.

I will sing of the Lord’s
Greatness
In the presence of his Angels.
His holy Name I will bless
And will utter
Always his holy praises.

2.

Whether the torch of day
On its round
Has advanced along its course,
Whether it goes rising
Or setting,
It will see me at prayer.

3.

The little winged singers,
Awakened,
Will join the company;
Among the fields and woods
With their voices
They will complete the harmony.

4.

The air, darkened by whirlwinds,
At our sounds
Will calm its storm;
The sky that hears us
Will show
The rays of its fair face.

5.

Lift then my heart to you,
O great King,
Set me ablaze with your flame,
So that no converse
But yours
May draw my soul.

6.

Your majesty, O great God,
By any place
Could not be bounded,
And before you a hundred thousand years
Flowing by
Are not so much as a day.

7.

You are in the highest heavens,
Glorious,
You are in the lowest depths of the world;
You balance on three fingers
All the weight
Of this round machine.

8.

Your spirit penetrates all
To the very end,
Nothing is outside your presence;
You are that eye which sees all
And knows
The depths of the conscience.

9.

The Sky, which gleams with stars
All night,
Borrows its grace from you,
And all the peerless brilliance
Of the sun
Is but one ray of your face.

10.

Without effort, what you will
You can do,
And your willing is your only labor;
You can blot out this whole
All at once
With the mere wind of your breath.

11.

You make Kings walk
Under your laws,
And the princes of the earth,
Of whom you break, in the blink of an eye,
All the pride,
Which is as brittle as glass.

12.

You give us the harvests
In the seasons
That you alone make and ordain;
You give bountifully and sustain
With your goods
The life of all things.

13.

It is you who with rich enamel,
Without toil,
Gild our fair meadows;
It is you who give these fields,
Every year,
Their gay tapestries.

14.

God! Whoever does not see the benefits
That you bestow
On all human nature,
Though he seem a man outwardly
In his body,
Has only the appearance of one.

15.

If you show your wrath
Against us,
Everything is overturned and totters;
The earth trembles with dread,
Beside itself,
Before your immortal face.

16.

When you hurl through the air
A thousand lightning flashes
And the bursts of your thunderbolt,
If you did not restrain them,
You would reduce
This whole universe to dust.

17.

You make this sea
Foam with waves,
You cloud it over;
And then you hold back the winds,
Insolent,
To still these storms.

18.

You who command this flow
And ebb,
Let no harm afflict me,
And defend your pilgrim
On the way
When he crosses the sands.

19.

Angels who lend a hand
To humankind
In the course of our journey,
Be always my support
Until the harbor
Of this pilgrimage of mine.

20.

And you, receive these strains
Whose meaning
Is drawn from your works,
Which all, bowed with me
Before you,
Do you honor and homage.
Amen.

XII

IN THE SEVENTEENTH CENTURY

A POPULAR SONG IN HONOR OF SAINT MICHEL

Slow.

Saint Michel, Archangel of the seas,
Your power is without equal,
Having cast down Lucifer
Despite his infernal fury;
We bow down before you:
Holy Archangel, pray for us.

2.

You are the ornament of the Heavens,
And glory is assured to you,
Prince of the glorious Spirits
And protector of the Church:
We all have recourse to you,
Holy Archangel, pray for us.

3.

You defend the righteous,
And the poor man, in his need,
Will never lack for anything
When you are his defense:
We all have recourse to you,
Holy Archangel, pray for us.

4.

You comfort the Pilgrims
Who, to pay you their homage,
Call upon you along the roads
In order to obtain your intercession:
We all have recourse to you,
Holy Archangel, pray for us.

5.

It is you, glorious Archangel,
Who bear the weapon of victory;
We come to offer you our vows,
And to sing in your memory:
We all have recourse to you,
Holy Archangel, pray for us.

6.

We shall have none but you, at the moment
When the stern Judge comes
To hold his great judgment,
Who can soften his wrath:
We all have recourse to you,
Holy Archangel, pray for us.

7.

When, at the point of death,
The Devil seeks to take us by surprise,
Deign, in that final struggle,
To come from Heaven to defend us:
We all have recourse to you,
Holy Archangel, pray for us.

8.

We pray to you with joined hands,
Prostrate in your presence,
To help us in our needs;
Be, great Saint, our defense;
We all have recourse to you,
Holy Archangel, pray for us.

9.

O Saint Michel, who, in heaven,
Sing the praises of the Most High;
Saint Raphaël, Saint Gabriel,
Angels, Cherubim and Archangels:
Pray to the Redeemer for us;
Angels of Heaven, pray for us.

XIII

IN THE 18TH CENTURY

A LETTER TO MABILLON[52]

To the Reverend Father Dom Jan Mabillon, monk of the abbey of
Saint Germain des Prez, in Paris.
[Page 346]

From Mont Saint Michel, this 8 April 1706. P. C.

My Reverend Father,
I do not know whether I replied to the letter that Your Reverence did me the
honor of writing to me about our monastery, of which you wish to have plates
engraved. Being in doubt, I would rather write to you twice than fail to write
once. I must have written to you that I had searched for the drawing of our
monastery made by our fathers, but in vain. Our Intendant asked me for it
insistently: I was in the same difficulty with regard to him as I am with
regard to you. If I had someone here capable of making an accurate drawing of
it, I would have it done, but I have no one. It would deserve, more than any
other, without question, a place in your annals, but I would as soon, or
perhaps rather, not put it there at all if it is not well done and if
everything in it is not well marked.
The fountain of Saint Aubert is at the bottom of a great staircase that
descends from the foot of our building onto the shore; it stands on the shore
itself, right against the rock. It was formerly enclosed in a tower that the
sea overthrew, and the sea has entered the said fountain, which is usually
salty when the sea gets into it. It is a large well raised fifteen to twenty
feet above the shore. The end of our dormitory faces east, and the building
runs along the north and the south. It is nearly two hundred feet long. On the
first story are large vaulted halls with only very small openings, few in
number, because it is built in the manner of a fortress. At the eastern end
are the refectory on the second story, the kitchen, and the Knights’ Hall, at
the end of which is the staircase that descends to the fountain of Saint
Aubert. On the third story is a dormitory with the cloister, which is above
the Knights’ Hall and has no story above it; on the fourth story, a second
dormitory above the first; and a fifth story above that, where the classroom
is at one end and a granary at the other.
On the south side, another small wing has been joined to this building; it
begins only at the second story, that is, on a level with the refectory. It
has four stories: the first serves as a washhouse, the second is the guest
chamber; the other two stories occupy only a small part of the end of the
dormitory adjoining the cloister, because if the wing extended the whole
length of the dormitory it would rob it of all its daylight and the cells
would be useless; it takes up three of them, which serve no purpose. The third
story is a common room, and the fourth the library. There is only a space of
six to seven feet between the apse of the church and this small wing, which
serves as the entrance to the monastery.
I know of no small mountain opposite Tombelaine. Tombelaine is a rock to the
north of ours, a good quarter of a league away; it is reckoned at half a
league. The name is a diminutive of tombe; ours is called the mont de tombe
[in monte tombe]. The other is called Tombelaino, quasi tombula. There were
buildings there, which have all been razed by order of the court. It is a
priory whose revenue extends for the most part into the parish of Bassillé,
two leagues distant from the said Tombelaino, where there is a fief dependent
on it. To the north-east of Tombelaino there is a point of land that juts into
the sea and is very high, called the pignon butor, but there has never been
either church or chapel there. To the north-west is the point of Cancale.
These two points form a kind of crescent, or a very large bay; we are in the
middle of this bay, for the tide surrounds us to half a league to the south.
To the west of Tombelaine there is a mountain called Montdol, a good quarter
of a league from Dol and half a league at most from the seashore. I do not
know whether it is not this mountain you mean. There is a small priory there
dependent on this house, whose church stands on the mountain, together with a
village.
[Marginal sketch map in the original, labeled: cancal; le mont; la grande mer;
St Michel; le pignon butor]
It would be only right for our monastery to contribute to the engraving of
these plates, and had I heard of it while our first procurator was in Paris, I
would have charged him to give something to Your Reverence; but it would be
easier for me to draw water from our rock than money from our officers, and in
truth, even if they were willing, they could not do so at present. The poverty
is so great that it passes imagination. For three years I have owed something
to a bookseller in Rennes, whom I have not yet been able to have paid. I am
very sorry not to be able to satisfy Your Reverence’s very just request, for
no one could be, with more marked esteem and regard than I am,
My Reverend Father,
Your most humble and most obedient servant and confrere,

Brother Julien Doyte.


M. B.

XIV

DURING THE REVOLUTION[53]

[Page 351]
We, the mayor and municipal officers of the town of Mont Sᵗ Michel, being
occupied with the common business of the said town, were interrupted by an
extraordinary noise, which brought several inhabitants out of their houses;
and having gone out of our office into the main street, we saw the men named
Thomas Desplancher and Jean Desplancher his brother, inhabitants of this town,
striking each other with fists and feet, pulling each other by the hair, and
abusing each other shamefully with oaths; and as the said Desplanchers are in
the habit of causing disturbances in the town and daily troubling the public
peace, we therefore ordered, as a police measure, that the said Desplanchers
be provisionally held for the space of twenty-four hours in the prisons of
this town; wherefore we enjoined Sieur Turgot, officer of the guard for this
day, to command soldiers in sufficient number to take the said Desplanchers
prisoner; reserving the right to order greater penalties against the said
Desplanchers should the case arise. Given at our office on the twenty-seventh
of May, at about half past ten in the evening, seventeen hundred and ninety.
L’evatu, mayor; J. Richard, municipal officer; Blin, municipal officer;
Auquetil, procurator; L. Leroy, priest, clerk.
On the said day, at about eight o’clock in the evening, the said Sieur Turgot,
officer of the guard, appeared and declared to us that he had taken the said
Desplanchers prisoner in accordance with the above sentence, which he has
signed.
Charles Turgot.
On Thursday the twentieth of December seventeen hundred and ninety-two, we the
undersigned municipal officers, having learned that the new municipality has
been constituted, declare our municipal functions ended and closed on the said
day and year above.
F. Mouillaud, former officer; L. Leroy, former mayor; Charles
Turgot, former officer; Hevaut, former clerk.

In the year seventeen hundred and ninety-three, the second of October, Year 2
of the French Republic, one and indivisible,
At Mont Sᵗ Michel,
There appeared at the town hall Citizen Oury, envoy of the primary assembly of
the canton of Avranches, district of the said place, section of
Saint-Saturnin, who showed us: 1º the report and decree of 23 August last on
the civic requisition of young citizens for the defense of the fatherland; 2º
a proclamation of Citizen Le Carpentier, representative of the people sent by
the National Convention to the department of La Manche; 3º a commission
addressed to him by the said Citizen Le Carpentier in the name of public
safety and in accordance with article 4 of the decree of the National
Convention dated 16 August last; 4º lastly, a letter from the citizen
administrators of this district, of the 25th of this month, with an invitation
to publish and read out the above documents to the young citizens aged 18 to
25 so that they may know of their requisition; also to draw up a list by name
of all these same young men and keep a register of it; to give an exact count
of the number and quality of all horses other than those used in agriculture;
and lastly that of all muskets, especially those of regulation caliber; of all
of which the said Citizen Oury asked us for a formal record, and signed after
reading,
Oury.
In the year seventeen hundred and ninety-three, Year Two of the French
Republic, one and indivisible, the thirteenth of October seventeen hundred and
ninety-three, it was resolved by the General Council of the commune, after
hearing the procurator, that from this day an office is established in the
former presbytery, in which a box with three keys shall be made to collect all
letters that arrive, whether addressed to the priests detained in the castle
of this town, or addressed to the mayor, the municipal officers, or the
procurator of the commune, provided that it is stated on the address or inside
that they are to be handed to one of these detained priests; which box shall
be opened only twice a week, namely on Tuesday and Friday of each week. A
similar one shall be made at the castle gate, in which all the priests shall
place their letters, without any being taken by any member of the commune
except at the opening of the box. It is resolved that no parcel or assignat
shall be handed to any of these refractory priests except in the presence of
the commune; if bread comes, it shall be distributed equally. The gatekeeper
is forbidden to let anyone whatsoever go further than the gate, on pain of
being punished according to the decree of the General Council of the commune.
Resolved at the town hall on the day, month, and year above. The gatekeeper is
forbidden to leave his gate on pain of losing his pay, unless he has himself
replaced by a person capable of replacing him and the municipality consents.
Two words struck out, void.
J. Hevaut, officer; J. Richard, mayor; Tomas Fouché; Etienne
Vidal, procurator; Julien Menard; J. Basire; Jean Gainard.

On Thursday the thirtieth of August seventeen hundred and ninety-two, Year 4
of Liberty, at the town hall of Mont St Michel, there appeared before us, the
mayor and municipal officers of the said town of Mont St Michel, Sieur Henri
Jean Dufour, priest, formerly a Benedictine monk, who declared that he wished
to take at once the oath prescribed by law. We, the aforesaid municipal
officers, in recognition of the patriotic conduct of Sieur Dufour, well known
to us since the Revolution, and seeing that Sieur Dufour has presented himself
on several occasions and offered his oath to the municipality, did not think
we should delay any longer in admitting him to take his oath, which he did in
the following terms: I swear to be faithful to the Nation, to the Law, and to
the King, and to uphold with all my power the Constitution of the Kingdom
decreed by the National Constituent Assembly in the years 1789, 1790, 1791.
Which he signed with us on this present record, made and closed on the said
day and year above.
Henry-Jean Dufour, former Benedictine monk; L. Leroy, mayor;
F. Morillaud, officer; Jean Duval; F. Hevaut, clerk.

On Thursday the fourth of October seventeen hundred and ninety-two, Year 4 of
Liberty and 1st of Equality, we, the municipal officers of Mont St Michel, met
in extraordinary session at the home of Sieur Henry-Jean Dufour, priest,
formerly a Benedictine monk, by virtue of a request on his part to take the
required oath; wherefore we, the aforesaid municipal officers, in view of the
infirmity of Sieur Dufour, who is unable to come to the town hall, went
expressly to his home to receive his oath, which he pronounced at once in the
following terms: I swear to uphold liberty and equality or to die defending
them. Which he signed with us on the said day and year above.
Henry Jean Dufour; L. Leroy, mayor; F. Morillaud, officer;
F. Hevaut, p. clerk; Ch. Turgot, officer.

On the said Thursday the fourth of October seventeen hundred and ninety-two,
Year 4 of Liberty and 1st of Equality, there appeared before us, municipal
officers at the town hall of Mont-St-Michel, Sieur Claude Carton, priest,
formerly a Benedictine monk, who declared that he wished to comply with the
law and take the oath required by the decrees of the National Assembly; with
hand raised, he pronounced it at once in the following terms: I swear to
uphold liberty and equality or to die defending them. Which he signed with us
on the said day and year above.
Claude Carton; L. Leroy; F. Morillaud, officer; Ch. Turgot,
officer; F. Hevaut, clerk.
On the said Thursday the fourth of October seventeen hundred and ninety-two,
Year 4 of Liberty and 1st of Equality, there appeared before us, municipal
officers at the town hall of Mont Sᵗ Michel, Sieur Louis Augustin Pissès,
priest, formerly a Benedictine monk, who declared that he wished to comply
with the law and take the oath required by the decrees of the National
Assembly; with hand raised, he pronounced it at once in the following terms: I
swear to uphold liberty and equality or to die defending them. Which he signed
with us on the said day and year above.
Louis Aug. Pissis; L. Leroy, mayor; F. Morillaud, officer;
Ch. Turgot, officer; F. Hevaut, clerk.

On the said Thursday the fourth of October seventeen hundred and ninety-two,
Year 4 of Liberty and of Equality, there appeared before us, municipal
officers at the town hall of Mont St Michel, Sieur Jacque Besnard,
constitutional parish priest of the said place, who declared that he wished to
comply with the law and take the oath required by the decrees of the National
Assembly; with hand raised, he pronounced it at once in the following terms: I
swear to uphold liberty and equality or to die defending them. Which he signed
with us on the said day and year above.
Jacques Besnard, parish priest of the Mont; L. Leroy, mayor; Ch. Turgot,
officer; L. Hevaut, clerk; F. Morillaud, officer.

We, the mayor and municipal officers of the commune of Mont-St-Michel, certify
to whom it may concern that Citizen Nicolas de la Goude, priest, native of the
parish of Saint Lo Dourville, resident for several years in that of St Georges
de Bohom, is now at the town hall of Mont St Michel; we have issued him the
present certificate to serve him in case of need. Done this twenty-third of
August seventeen hundred and ninety-three, Year Two of the French Republic.
La Goude; J. Richard, mayor; Etienne Vidal, procurator.
According to the declaration that Nicolas de la Goude made to us on the said
day and year above. J. Richard, m.
We, the undersigned mayor and municipal officers of the commune of Mont St
Michel, certify that citizen Charles le Venard the elder, formerly prior of la
Mancellière, is alive and in being and presently lives at the château of Mont
St Michel. In witness whereof we have issued him the present certificate to
serve and avail him in case of need, at the common house, this tenth of
September, seventeen hundred and ninety-three, year two of the French
Republic, one and indivisible.
Charles Le Venard the elder; J. Richard, mayor.

We, the mayor and municipal officers of the commune of Mont St Michel,
district of Avranche, department of la Manche, in execution of the decree of
the National Convention of the fifteenth of March last, certify to whom it may
concern that citizen Jean Jacques Chatiaux, priest, is alive and in being and
has resided at the château of the said Mont St Michel without interruption
since the sixteenth of May last of the present year 1793. In witness whereof
we have issued the present certificate to serve and avail as is right to the
said citizen Chataux, the said certificate having first been posted for three
days at the common house of Mont St Michel, this fifteenth of October,
seventeen hundred and ninety-three, year two of the Republic, one and
indivisible.
J. Richard, mayor; Chateaux; Etienne Vidal, procurator; Jean
Gainard; Hevaut, clerk; J. Hamel.

The last ecclesiastics who were detained in the château of Mont Saint Michel
left it on the twenty-first of Germinal, year three of the French Republic,
one and indivisible.
Extract from the register of deliberations of the General Council of the
district of Avranches, of the seventh of October 1793, year 2 of the French
Republic, one and indivisible.
The assembled Council of the district of Avranches, having seen the petition
of the commune of Mont St Michel dated this day, in which it states that its
inhabitants and the priests detained in the former abbey are in pressing need,
that it cannot procure provisions for them in the countryside, and that it is
difficult, not to say impossible, to procure any at the market; considering
that the position of Mont Saint Michel, most often surrounded by the sea, does
not allow its inhabitants to go out freely to the market of Avranche, chief
town of the canton, and that the market of Pontorson is much closer; having
again examined the requisitions addressed thus far to the communes of this
canton and compared them; and the procureur syndic having been heard; has
resolved that the requisitions addressed to the communes surrounding Pontorson
are revoked as of next Thursday; that instead of the 340 rahiaux of grain
required thus far to supply the market of Pontorson, four hundred and ten
shall be required, on condition that the municipality sees to it that the
surplus goes specifically to the provisioning of Mont St Michel; that,
beginning with the market of the 16th of this month, the communes named in the
present resolution and entered in the register of deliberations shall
contribute to the supply of each week's market in the said proportion of at
least two thirds in wheat, rye and paumelle barley, and at most the remaining
third in buckwheat; that this time the municipalities shall be bound, on their
own responsibility, to address to the farmers of their communes, other than
the tenants of national and émigré estates, the requisitions necessary to have
the quantity assigned to them supplied, and to send a certified statement
thereof to the municipality of Pontorson in good time, so that it may identify
those who refuse to comply with the requisitions made of them and that, upon
its denunciation, they may be prosecuted with the full rigor of the law.
Signed: le Marié, president, and le Maistre. Certified true copy: Carbonnel,
le Maistre.
Issued a certificate of existence to citizen Jean Baptiste Monteuil, priest, a
native of the commune of la Haye du Puit, district of Carentan, department
aforesaid, who is alive and in being and has resided at the château of Mont St
Michel without interruption since the twenty-second of June. In witness
whereof we have issued the present certificate to serve and avail as is right
to the said citizen Monteuil, the said certificate having first been posted
for three days at the common house of Mont St Michel. This eleventh of
Pluviôse, 2nd year of the French Republic, one and indivisible.
J. Richard, mayor; Monteuil.
LIBERTY, EQUALITY. PLACE D'AVRANCHE
Commandant Leuperaiec of Avranche, commander of the national gendarmerie:
you will kindly, citizen, order the number of gendarmes you judge necessary
to escort to Mont-St-Michel the men named Le Mornier, Le Chevalier and
Saussons, formerly parish priest and curates of Sacé; they are at the house
of detention. You will add to them the man named le Souge, priest, who is in
the prison. Letellier
The 10th of Ventôse, year 2 of the French Republic, one and indivisible.
On the ninth of Floréal, second republican year, citizen Francois Grentel,
formerly parish priest at Vains St Leonard, canton and district of Avranche,
appeared before the municipal officers and notables in permanent session of
the commune of Mont libre, and declared that, in obedience to the order of
citizen le Carpentié, representative of the people at Port Malo, dated the
twenty-fourth of Germinal, which order has not yet been officially published,
he was coming to present himself at the said place of Mont libre, on the said
day and year as above.
F. Grentet; Etienne Vidal, national agent; F. Morillaud, officer
in permanent session.

Avranches, 7 Floréal, year 2 of the Republic, one and indivisible.

LIBERTY, EQUALITY, UNITY AND INDIVISIBILITY OF THE REPUBLIC
The national agent to the district of Avranches, to the citizens of the
district. Allow citizen Pierre Lainé, of the commune of St Sénier near
Avranches, to pass freely on his way to Mont St Michel, where he is escorting
citizen Pierre Affichard, priest, sentenced to reclusion in the common house
established there. The municipality of Mont St Michel shall permit citizen
Lainé to enter the said house to make the necessary arrangements for the said
Affichard.
Avranche, 7 Floréal, year 2 of the French Republic.
Frain, national agent.
