Exploratory Data Analysis Using R 1st Edition


Ronald K. Pearson

Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
Series Editor: Vipin Kumar

Computational Business Analytics
Subrata Das

Data Classification: Algorithms and Applications
Charu C. Aggarwal

Healthcare Data Analytics
Chandan K. Reddy and Charu C. Aggarwal

Accelerating Discovery: Mining Unstructured Information for Hypothesis Generation
Scott Spangler

Event Mining: Algorithms and Applications
Tao Li

Text Mining and Visualization: Case Studies Using Open-Source Tools
Markus Hofmann and Andrew Chisholm

Graph-Based Social Media Analysis
Ioannis Pitas

Data Mining: A Tutorial-Based Primer, Second Edition
Richard J. Roiger

Data Mining with R: Learning with Case Studies, Second Edition
Luís Torgo

Social Networks with Rich Edge Semantics
Quan Zheng and David Skillicorn

Large-Scale Machine Learning in the Earth Sciences
Ashok N. Srivastava, Ramakrishna Nemani, and Karsten Steinhaeuser

Data Science and Analytics with Python
Jesus Rogel-Salazar

Feature Engineering for Machine Learning and Data Analytics
Guozhu Dong and Huan Liu

Exploratory Data Analysis Using R
Ronald K. Pearson

For more information about this series please visit:


https://www.crcpress.com/Chapman--HallCRC-Data-Mining-and-Knowledge-Discovery-Series/book-series/CHDAMINODIS
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20180312

International Standard Book Number-13: 978-1-1384-8060-5 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents

Preface

Author

1 Data, Exploratory Analysis, and R
1.1 Why do we analyze data?
1.2 The view from 90,000 feet
1.2.1 Data
1.2.2 Exploratory analysis
1.2.3 Computers, software, and R
1.3 A representative R session
1.4 Organization of this book
1.5 Exercises

2 Graphics in R
2.1 Exploratory vs. explanatory graphics
2.2 Graphics systems in R
2.2.1 Base graphics
2.2.2 Grid graphics
2.2.3 Lattice graphics
2.2.4 The ggplot2 package
2.3 The plot function
2.3.1 The flexibility of the plot function
2.3.2 S3 classes and generic functions
2.3.3 Optional parameters for base graphics
2.4 Adding details to plots
2.4.1 Adding points and lines to a scatterplot
2.4.2 Adding text to a plot
2.4.3 Adding a legend to a plot
2.4.4 Customizing axes
2.5 A few different plot types
2.5.1 Pie charts and why they should be avoided
2.5.2 Barplot summaries
2.5.3 The symbols function
2.6 Multiple plot arrays
2.6.1 Setting up simple arrays with mfrow
2.6.2 Using the layout function
2.7 Color graphics
2.7.1 A few general guidelines
2.7.2 Color options in R
2.7.3 The tableplot function
2.8 Exercises

3 Exploratory Data Analysis: A First Look
3.1 Exploring a new dataset
3.1.1 A general strategy
3.1.2 Examining the basic data characteristics
3.1.3 Variable types in practice
3.2 Summarizing numerical data
3.2.1 “Typical” values: the mean
3.2.2 “Spread”: the standard deviation
3.2.3 Limitations of simple summary statistics
3.2.4 The Gaussian assumption
3.2.5 Is the Gaussian assumption reasonable?
3.3 Anomalies in numerical data
3.3.1 Outliers and their influence
3.3.2 Detecting univariate outliers
3.3.3 Inliers and their detection
3.3.4 Metadata errors
3.3.5 Missing data, possibly disguised
3.3.6 QQ-plots revisited
3.4 Visualizing relations between variables
3.4.1 Scatterplots between numerical variables
3.4.2 Boxplots: numerical vs. categorical variables
3.4.3 Mosaic plots: categorical scatterplots
3.5 Exercises

4 Working with External Data
4.1 File management in R
4.2 Manual data entry
4.2.1 Entering the data by hand
4.2.2 Manual data entry is bad but sometimes expedient
4.3 Interacting with the Internet
4.3.1 Previews of three Internet data examples
4.3.2 A very brief introduction to HTML
4.4 Working with CSV files
4.4.1 Reading and writing CSV files
4.4.2 Spreadsheets and csv files are not the same thing
4.4.3 Two potential problems with CSV files
4.5 Working with other file types
4.5.1 Working with text files
4.5.2 Saving and retrieving R objects
4.5.3 Graphics files
4.6 Merging data from different sources
4.7 A brief introduction to databases
4.7.1 Relational databases, queries, and SQL
4.7.2 An introduction to the sqldf package
4.7.3 An overview of R’s database support
4.7.4 An introduction to the RSQLite package
4.8 Exercises

5 Linear Regression Models
5.1 Modeling the whiteside data
5.1.1 Describing lines in the plane
5.1.2 Fitting lines to points in the plane
5.1.3 Fitting the whiteside data
5.2 Overfitting and data splitting
5.2.1 An overfitting example
5.2.2 The training/validation/holdout split
5.2.3 Two useful model validation tools
5.3 Regression with multiple predictors
5.3.1 The Cars93 example
5.3.2 The problem of collinearity
5.4 Using categorical predictors
5.5 Interactions in linear regression models
5.6 Variable transformations in linear regression
5.7 Robust regression: a very brief introduction
5.8 Exercises

6 Crafting Data Stories
6.1 Crafting good data stories
6.1.1 The importance of clarity
6.1.2 The basic elements of an effective data story
6.2 Different audiences have different needs
6.2.1 The executive summary or abstract
6.2.2 Extended summaries
6.2.3 Longer documents
6.3 Three example data stories
6.3.1 The Big Mac and Grande Latte economic indices
6.3.2 Small losses in the Australian vehicle insurance data
6.3.3 Unexpected heterogeneity: the Boston housing data

7 Programming in R
7.1 Interactive use versus programming
7.1.1 A simple example: computing Fibonacci numbers
7.1.2 Creating your own functions
7.2 Key elements of the R language
7.2.1 Functions and their arguments
7.2.2 The list data type
7.2.3 Control structures
7.2.4 Replacing loops with apply functions
7.2.5 Generic functions revisited
7.3 Good programming practices
7.3.1 Modularity and the DRY principle
7.3.2 Comments
7.3.3 Style guidelines
7.3.4 Testing and debugging
7.4 Five programming examples
7.4.1 The function ValidationRsquared
7.4.2 The function TVHsplit
7.4.3 The function PredictedVsObservedPlot
7.4.4 The function BasicSummary
7.4.5 The function FindOutliers
7.5 R scripts
7.6 Exercises

8 Working with Text Data
8.1 The fundamentals of text data analysis
8.1.1 The basic steps in analyzing text data
8.1.2 An illustrative example
8.2 Basic character functions in R
8.2.1 The nchar function
8.2.2 The grep function
8.2.3 Application to missing data and alternative spellings
8.2.4 The sub and gsub functions
8.2.5 The strsplit function
8.2.6 Another application: ConvertAutoMpgRecords
8.2.7 The paste function
8.3 A brief introduction to regular expressions
8.3.1 Regular expression basics
8.3.2 Some useful regular expression examples
8.4 An aside: ASCII vs. UNICODE
8.5 Quantitative text analysis
8.5.1 Document-term and document-feature matrices
8.5.2 String distances and approximate matching
8.6 Three detailed examples
8.6.1 Characterizing a book
8.6.2 The cpus data frame
8.6.3 The unclaimed bank account data
8.7 Exercises

9 Exploratory Data Analysis: A Second Look
9.1 An example: repeated measurements
9.1.1 Summary and practical implications
9.1.2 The gory details
9.2 Confidence intervals and significance
9.2.1 Probability models versus data
9.2.2 Quantiles of a distribution
9.2.3 Confidence intervals
9.2.4 Statistical significance and p-values
9.3 Characterizing a binary variable
9.3.1 The binomial distribution
9.3.2 Binomial confidence intervals
9.3.3 Odds ratios
9.4 Characterizing count data
9.4.1 The Poisson distribution and rare events
9.4.2 Alternative count distributions
9.4.3 Discrete distribution plots
9.5 Continuous distributions
9.5.1 Limitations of the Gaussian distribution
9.5.2 Some alternatives to the Gaussian distribution
9.5.3 The qqPlot function revisited
9.5.4 The problems of ties and implosion
9.6 Associations between numerical variables
9.6.1 Product-moment correlations
9.6.2 Spearman’s rank correlation measure
9.6.3 The correlation trick
9.6.4 Correlation matrices and correlation plots
9.6.5 Robust correlations
9.6.6 Multivariate outliers
9.7 Associations between categorical variables
9.7.1 Contingency tables
9.7.2 The chi-squared measure and Cramér’s V
9.7.3 Goodman and Kruskal’s tau measure
9.8 Principal component analysis (PCA)
9.9 Working with date variables
9.10 Exercises

10 More General Predictive Models
10.1 A predictive modeling overview
10.1.1 The predictive modeling problem
10.1.2 The model-building process
10.2 Binary classification and logistic regression
10.2.1 Basic logistic regression formulation
10.2.2 Fitting logistic regression models
10.2.3 Evaluating binary classifier performance
10.2.4 A brief introduction to glms
10.3 Decision tree models
10.3.1 Structure and fitting of decision trees
10.3.2 A classification tree example
10.3.3 A regression tree example
10.4 Combining trees with regression
10.5 Introduction to machine learning models
10.5.1 The instability of simple tree-based models
10.5.2 Random forest models
10.5.3 Boosted tree models
10.6 Three practical details
10.6.1 Partial dependence plots
10.6.2 Variable importance measures
10.6.3 Thin levels and data partitioning
10.7 Exercises

11 Keeping It All Together
11.1 Managing your R installation
11.1.1 Installing R
11.1.2 Updating packages
11.1.3 Updating R
11.2 Managing files effectively
11.2.1 Organizing directories
11.2.2 Use appropriate file extensions
11.2.3 Choose good file names
11.3 Document everything
11.3.1 Data dictionaries
11.3.2 Documenting code
11.3.3 Documenting results
11.4 Introduction to reproducible computing
11.4.1 The key ideas of reproducibility
11.4.2 Using R Markdown

Bibliography

Index
Preface

Much has been written about the abundance of data now available from the
Internet and a great variety of other sources. In his aptly named 2007 book Glut
[81], Alex Wright argued that the total quantity of data then being produced was
approximately five exabytes per year (5 × 10^18 bytes), more than the estimated
total number of words spoken by human beings in our entire history. And that
assessment was from a decade ago: increasingly, we find ourselves “drowning in
an ocean of data,” raising questions like “What do we do with it all?” and “How
do we begin to make any sense of it?”
Fortunately, the open-source software movement has provided us with—at
least partial—solutions like the R programming language. While R is not the
only relevant software environment for analyzing data—Python is another option
with a growing base of support—R probably represents the most flexible data
analysis software platform that has ever been available. R is largely based on
S, a software system developed by John Chambers, who was awarded the 1998
Software System Award by the Association for Computing Machinery (ACM)
for its development; the award noted that S “has forever altered the way people
analyze, visualize, and manipulate data.”
The other side of this software coin is educational: given the availability and
sophistication of R, the situation is analogous to someone giving you an F-15
fighter aircraft, fully fueled with its engines running. If you know how to fly it,
this can be a great way to get from one place to another very quickly. But it is
not enough to just have the plane: you also need to know how to take off in it,
how to land it, and how to navigate from where you are to where you want to
go. Also, you need to have an idea of where you do want to go. With R, the
situation is analogous: the software can do a lot, but you need to know both
how to use it and what you want to do with it.
The purpose of this book is to address the most important of these questions.
Specifically, this book has three objectives:

1. To provide a basic introduction to exploratory data analysis (EDA);

2. To introduce the range of “interesting” (good, bad, and ugly) features
we can expect to find in data, and why it is important to find them;

3. To introduce the mechanics of using R to explore and explain data.


This book grew out of materials I developed for the course “Data Mining Using
R” that I taught for the University of Connecticut Graduate School of Business.
The students in this course typically had little or no prior exposure to data
analysis, modeling, statistics, or programming. This was not universally true,
but it was typical, so it was necessary to make minimal background assumptions,
particularly with respect to programming. Further, it was also important to
keep the treatment relatively non-mathematical: data analysis is an inherently
mathematical subject, so it is not possible to avoid mathematics altogether,
but for this audience it was necessary to assume no more than the minimum
essential mathematical background.
The intended audience for this book is students—both advanced undergrad-
uates and entry-level graduate students—along with working professionals who
want a detailed but introductory treatment of the three topics listed in the
book’s title: data, exploratory analysis, and R. Exercises are included at the
ends of most chapters, and an instructor’s solution manual giving complete
solutions to all of the exercises is available from the publisher.
Author

Ronald K. Pearson is a Senior Data Scientist with GeoVera Holdings, a
property insurance company in Fairfield, California, involved primarily in the
exploratory analysis of data, particularly text data. Previously, he held the po-
sition of Data Scientist with DataRobot in Boston, a software company whose
products support large-scale predictive modeling for a wide range of business
applications and are based on Python and R, where he was one of the authors
of the datarobot R package. He is also the developer of the GoodmanKruskal R
package and has held a variety of other industrial, business, and academic posi-
tions. These positions include both the DuPont Company and the Swiss Federal
Institute of Technology (ETH Zürich), where he was an active researcher in the
area of nonlinear dynamic modeling for industrial process control, the Tampere
University of Technology where he was a visiting professor involved in teaching
and research in nonlinear digital filters, and the Travelers Companies, where he
was involved in predictive modeling for insurance applications. He holds a PhD
in Electrical Engineering and Computer Science from the Massachusetts Insti-
tute of Technology and has published conference and journal papers on topics
ranging from nonlinear dynamic model structure selection to the problems of
disguised missing data in predictive modeling. Dr. Pearson has authored or
co-authored five previous books, including Exploring Data in Engineering, the
Sciences, and Medicine (Oxford University Press, 2011) and Nonlinear Digital
Filtering with Python, co-authored with Moncef Gabbouj (CRC Press, 2016).
He is also the developer of the DataCamp course on base R graphics.

Chapter 1

Data, Exploratory Analysis, and R

1.1 Why do we analyze data?


The basic subject of this book is data analysis, so it is useful to begin by
addressing the question of why we might want to do this. There are at least
three motivations for analyzing data:

1. to understand what has happened or what is happening;

2. to predict what is likely to happen, either in the future or in other
circumstances we haven’t seen yet;

3. to guide us in making decisions.

The primary focus of this book is on exploratory data analysis, discussed further
in the next section and throughout the rest of this book, and this approach is
most useful in addressing problems of the first type: understanding our data.
That said, the predictions required in the second type of problem listed above
are typically based on mathematical models like those discussed in Chapters 5
and 10, which are optimized to give reliable predictions for data we have avail-
able, in the hope and expectation that they will also give reliable predictions for
cases we haven’t yet considered. In building these models, it is important to use
representative, reliable data, and the exploratory analysis techniques described
in this book can be extremely useful in making certain this is the case. Similarly,
in the third class of problems listed above—making decisions—it is important
that we base them on an accurate understanding of the situation and/or ac-
curate predictions of what is likely to happen next. Again, the techniques of
exploratory data analysis described here can be extremely useful in verifying
and/or improving the accuracy of our data and our predictions.


1.2 The view from 90,000 feet


This book is intended as an introduction to the three title subjects—data, its ex-
ploratory analysis, and the R programming language—and the following sections
give high-level overviews of each, emphasizing key details and interrelationships.

1.2.1 Data
Loosely speaking, the term “data” refers to a collection of details, recorded to
characterize a source like one of the following:
• an entity, e.g.: family history from a patient in a medical study; manufac-
turing lot information for a material sample in a physical testing applica-
tion; or competing company characteristics in a marketing analysis;
• an event, e.g.: demographic characteristics of those who voted for different
political candidates in a particular election;
• a process, e.g.: operating data from an industrial manufacturing process.
This book will generally use the term “data” to refer to a rectangular array
of observed values, where each row refers to a different observation of entity,
event, or process characteristics (e.g., distinct patients in a medical study), and
each column represents a different characteristic (e.g., diastolic blood pressure)
recorded—or at least potentially recorded—for each row. In R’s terminology,
this description defines a data frame, one of R’s key data types.
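
As a minimal sketch of this rectangular layout, the following R code builds a tiny data frame by hand. The patient labels, variable names, and values are invented purely for illustration and do not come from any real study; the NA entry stands for a characteristic that could have been recorded but was not.

```r
# Hypothetical example: three patients (rows), three characteristics (columns).
# All names and values are made up for illustration only.
patients <- data.frame(
  age         = c(54, 61, 47),          # years
  diastolicBP = c(82, NA, 76),          # mm Hg; NA marks a value not recorded
  smoker      = c(TRUE, FALSE, FALSE),
  row.names   = c("patient01", "patient02", "patient03")
)

nrow(patients)           # number of observations (rows)
ncol(patients)           # number of characteristics (columns)
is.data.frame(patients)  # confirms the object is a data frame
```

Each call to c() supplies one column, so all columns must have the same length; this is what makes the array rectangular.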
The mtcars data frame is one of many built-in data examples in R. This data
frame has 32 rows, each one corresponding to a different car. Each of these cars
is characterized by 11 variables, which constitute the columns of the data frame.
These variables include the car’s mileage (in miles per gallon, mpg), the number
of gears in its transmission, the transmission type (manual or automatic), the
number of cylinders, the horsepower, and various other characteristics. The
original source of this data was a comparison of 32 cars from model years 1973
and 1974 published in Motor Trend Magazine. The first six records of this data
frame may be examined using the head command in R:
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

An important feature of data frames in R is that both rows and columns have
names associated with them. In favorable cases, these names are informative,
as they are here: the row names identify the particular cars being characterized,
and the column names identify the characteristics recorded for each car.
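The row and column names of a data frame can also be examined directly. The following is a minimal sketch using only base R functions (dim, colnames, rownames, and str); the variable names dims, var_names, and car_names are introduced here purely for illustration.

```r
# Dimensions of the built-in mtcars data frame: rows (cars) by columns (variables)
dims <- dim(mtcars)
print(dims)

# Column names identify the characteristics recorded for each car
var_names <- colnames(mtcars)
print(var_names)

# Row names identify the individual cars; show only the first few
car_names <- rownames(mtcars)
print(head(car_names))

# str() gives a compact summary of each column's type and leading values
str(mtcars)
```

Checking dim first is a quick way to confirm that the data frame has the number of rows and columns that its description leads us to expect.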
1.2. THE VIEW FROM 90,000 FEET 3

A more complete description of this dataset is available through R’s built-in
help facility. Typing “help(mtcars)” at the R command prompt will bring up
a help page that gives the original source of the data, cites a paper from the
statistical literature that analyzes this dataset [39], and briefly describes the
variables included. This information constitutes metadata for the mtcars data
frame: metadata is “data about data,” and it can vary widely in terms of its
completeness, consistency, and general accuracy. Since metadata often provides
much of our preliminary insight into the contents of a dataset, it is extremely
important, and any limitations of this metadata—incompleteness, inconsistency,
and/or inaccuracy—can cause serious problems in our subsequent analysis. For
these reasons, discussions of metadata will recur frequently throughout this
book. The key point here is that, potentially valuable as metadata is, we cannot
afford to accept it uncritically: we should always cross-check the metadata with
the actual data values, with our intuition and prior understanding of the subject
matter, and with other sources of information that may be available.
As a specific illustration of this last point, a popular benchmark dataset for
evaluating binary classification algorithms (i.e., computational procedures that
attempt to predict a binary outcome from other variables) is the Pima Indi-
ans diabetes dataset, available from the UCI Machine Learning Repository, an
important Internet data source discussed further in Chapter 4. In this partic-
ular case, the dataset characterizes female adult members of the Pima Indians
tribe, giving a number of different medical status and history characteristics
(e.g., diastolic blood pressure, age, and number of times pregnant), along with
a binary diagnosis indicator with the value 1 if the patient had been diagnosed
with diabetes and 0 if they had not. Several versions of this dataset are
available: the one considered here was obtained from the UCI website on May 10,
2014, and it has
768 rows and 9 columns. In contrast, the data frame Pima.tr included in R’s
MASS package is a subset of this original, with 200 rows and 8 columns. The
metadata available for this dataset from the UCI Machine Learning Repository
now indicates that this dataset exhibits missing values, but there is also a note
that prior to February 28, 2011 the metadata indicated that there were no miss-
ing values. In fact, the missing values in this dataset are not coded explicitly
as missing with a special code (e.g., R’s “NA” code), but are instead coded as
zero. As a result, a number of studies characterizing binary classifiers have been
published using this dataset as a benchmark where the authors were not aware
that data values were missing, in some cases, quite a large fraction of the total
observations. As a specific example, the serum insulin measurement included in
the dataset is 48.7% missing.
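The kind of cross-check that exposes this problem can be sketched in a few lines of R. The example below does not use the actual UCI file: the data frame pima_toy, its column names, and its values are hypothetical, and the choice of which columns cannot plausibly be zero is an assumption made for illustration.

```r
# Hypothetical toy data in which zeros stand in for missing measurements
pima_toy <- data.frame(
  diastolic = c(72, 0, 64, 66, 0),
  insulin   = c(0, 0, 94, 168, 0),
  diabetes  = c(1, 0, 0, 1, 1)
)

# Columns where a zero is assumed to be physically implausible;
# the diabetes indicator is excluded because 0 is a legitimate value there
measured <- c("diastolic", "insulin")

# Fraction of zero values in each measured column
zero_frac <- colMeans(pima_toy[measured] == 0)
print(zero_frac)

# Recode the suspicious zeros as NA so R treats them as missing
pima_clean <- pima_toy
for (col in measured) {
  pima_clean[[col]][pima_clean[[col]] == 0] <- NA
}

# Fraction of explicitly missing values after recoding
na_frac <- colMeans(is.na(pima_clean[measured]))
print(na_frac)
```

Restricting the recoding to selected columns matters: applying it to every column would wrongly mark the patients without a diabetes diagnosis as missing.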
Finally, it is important to recognize the essential role our assumptions about
data can play in its subsequent analysis. As a simple and amusing example,
consider the following “data analysis” question: how many planets are there or-
biting the Sun? Until about 2006, the generally accepted answer was nine, with
Pluto the outermost member of this set. Pluto was subsequently re-classified
as a “dwarf planet,” in part because a larger, more distant body was found in
the Kuiper Belt and enough astronomers did not want to classify this object as
the “tenth planet” that Pluto was demoted to dwarf planet status. In his book,
4 CHAPTER 1. DATA, EXPLORATORY ANALYSIS, AND R

Is Pluto a Planet? [72], astronomer David Weintraub argues that Pluto should
remain a planet, based on the following defining criteria for planethood:

1. the object must be too small to generate, or to have ever generated, energy
through nuclear fusion;

2. the object must be big enough to be spherical;

3. the object must have a primary orbit around a star.

The first of these conditions excludes dwarf stars from being classed as planets,
and the third excludes moons from being declared planets (since they orbit
planets, not stars). Weintraub notes, however, that under this definition, there
are at least 24 planets orbiting the Sun: the eight now generally regarded as
planets, Pluto, and 15 of the largest objects from the asteroid belt between Mars
and Jupiter and from the Kuiper Belt beyond Pluto. This example illustrates
that definitions are both extremely important and not to be taken for granted:
everyone knows what a planet is, don’t they? In the broader context of data
analysis, the key point is that unrecognized disagreements in the definition of
a variable are possible between those who measure and record it, and those
who subsequently use it in analysis; these discrepancies can lie at the heart of
unexpected findings that turn out to be erroneous. For example, if we wish to
combine two medical datasets, characterizing different groups of patients with
“the same” disease, it is important that the same diagnostic criteria be used to
declare patients “diseased” or “not diseased.” For a more detailed discussion
of the role of definitions in data analysis, refer to Sec. 2.4 of Exploring Data in
Engineering, the Sciences, and Medicine [58]. (Although the book is generally
quite mathematical, this is not true of the discussions of data characteristics
presented in Chapter 2, which may be useful to readers of this book.)

1.2.2 Exploratory analysis


Roughly speaking, exploratory data analysis (EDA) may be defined as the art
of looking at one or more datasets in an effort to understand the underlying
structure of the data contained there. A useful description of how we might go
about this is offered by Diaconis [21]:

We look at numbers or graphs and try to find patterns. We pursue
leads suggested by background information, imagination, patterns
perceived, and experience with other data analyses.

Note that this quote suggests—although it does not strictly imply—that the
data we are exploring consists of numbers. Indeed, even if our dataset contains
nonnumerical data, our analysis of it is likely to be based largely on numerical
characteristics computed from these nonnumerical values. As a specific exam-
ple, categorical variables appearing in a dataset like “city,” “political party
affiliation,” or “manufacturer” are typically tabulated, converted from discrete
named values into counts or relative frequencies. These derived representations
can be particularly useful in exploring data when the number of levels—i.e., the
number of distinct values the original variable can exhibit—is relatively small.
In such cases, many useful exploratory tools have been developed that allow us
to examine the character of these nonnumeric variables and their relationship
with other variables, whether categorical or numeric. Simple graphical exam-
ples include boxplots for looking at the distribution of numerical values across
the different levels of a categorical variable, or mosaic plots for looking at the
relationship between categorical variables; both of these plots and other, closely
related ones are discussed further in Chapters 2 and 3.
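As a rough illustration of these ideas, the sketch below tabulates a variable with few levels and draws the two plots just mentioned, using the built-in mtcars data frame. Note that cyl and gear are stored as numbers in mtcars; treating them as categorical variables here is an assumption made for the sake of the example, and the names cyl_counts and cyl_freq are illustrative.

```r
# Tabulate the number of cylinders: named values become counts
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Convert the counts into relative frequencies
cyl_freq <- prop.table(cyl_counts)
print(round(cyl_freq, 3))

# Boxplot: distribution of a numerical variable (mpg) across the levels of cyl
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "Miles per gallon")

# Mosaic plot: relationship between two few-level variables (cyl and gear)
mosaicplot(table(mtcars$cyl, mtcars$gear),
           main = "Cylinders versus gears",
           xlab = "Number of cylinders", ylab = "Number of gears")
```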
Categorical variables with many levels pose more challenging problems, and
these come in at least two varieties. One is represented by variables like U.S.
postal zipcode, which identifies geographic locations at a much finer-grained
level than state does and exhibits about 40,000 distinct levels. A detailed dis-
cussion of dealing with this type of categorical variable is beyond the scope
of this book, although one possible approach is described briefly at the end of
Chapter 10. The second type of many-level categorical variable arises in settings
where the inherent structure of the variable can be exploited to develop special-
ized analysis techniques. Text data is a case in point: the number of distinct
words in a document or a collection of documents can be enormous, but special
techniques for analyzing text data have been developed. Chapter 8 introduces
some of the methods available in R for analyzing text data.
The mention of “graphs” in the Diaconis quote is particularly important
since humans are much better at seeing patterns in graphs than in large collec-
tions of numbers. This is one of the reasons R supports so many different graph-
ical display methods (e.g., scatterplots, barplots, boxplots, quantile-quantile
plots, histograms, mosaic plots, and many, many more), and one of the reasons
this book places so much emphasis on them. That said, two points are important
here. First, graphical techniques that are useful to the data analyst in finding
important structure in a dataset are not necessarily useful in explaining those
findings to others. For example, large arrays of two-variable scatterplots may be
a useful screening tool for finding related variables or anomalous data subsets,
but these are extremely poor ways of presenting results to others because they
essentially require the viewer to repeat the analysis for themselves. Instead, re-
sults should be presented to others using displays that highlight and emphasize
the analyst’s findings to make sure that the intended message is received. This
distinction between exploratory and explanatory displays is discussed further in
Chapter 2 on graphics in R and in Chapter 6 on crafting data stories (i.e., ex-
plaining your findings), but most of the emphasis in this book is on exploratory
graphical tools to help us obtain these results.
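An array of two-variable scatterplots of the kind described above can be sketched with the base R function pairs. The choice of the four mtcars variables below is arbitrary and made only for illustration; the name vars is likewise introduced here.

```r
# A handful of numeric variables to screen (an arbitrary, illustrative choice)
vars <- c("mpg", "disp", "hp", "wt")

# Scatterplot matrix: one panel for every pair of the selected variables
pairs(mtcars[, vars], main = "Pairwise scatterplots for selected mtcars variables")
```

Even with only four variables this display has twelve panels, which suggests why such arrays serve better for screening than for presenting results.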
The second point to note here is that the utility of any graphical display
can depend strongly on exactly what is plotted, as illustrated in Fig. 1.1. This
issue has two components: the mechanics of how a subset of data is displayed,
and the choice of what goes into that data subset. While both of these aspects
are important, the second is far more important than the first. Specifically, it is
important to note that the form in which data arrives may not be the most useful
for analysis. To illustrate, Fig. 1.1 shows two sets of plots, both constructed
Another random document with
no related content on Scribd:
invokes the Saxon Chronicle and other authorities in proof of the
credibility of his narrative, but these references themselves show
that he is not unconscious of the fact that his story stands in need of
extraneous support.

And yet, this artificiality being once conceded, how beautiful is the
structure! How fine the material, and how symmetrically it is put
together! Sometimes, perhaps, the narrative lags a little; sometimes
the descriptions, like those of Cedric’s hall or Athelstane’s castle, are
longer than the impatience of the reader cares to tolerate. Yet the
great scenes of the drama, how vividly do all these stand forth in our
memory! How splendid the stage setting and how well arranged the
incidents!

The story opens quietly. Gurth, the swineherd, and Wamba, the
jester of Cedric the Saxon, are driving home a herd of swine, when
they are overtaken by Prior Aymer and the Templar Brian de Bois-
Guilbert with their train. Then follows the supper scene at
Rotherwood, the residence of Cedric, where Ivanhoe, disguised as a
wandering palmer, returned from Palestine, visits his father’s home,
answers the boasting taunts of the Templar, saves the poor Jew,
Isaac of York, and is supplied with armor for the coming tourney.

Next follows one of the most celebrated scenes in literature, the
description of the passage at arms at Ashby, in which Ivanhoe, as
the “Disinherited Knight,” vanquishes all antagonists and names
Rowena, Cedric’s ward, as queen of love and beauty, and where, in
the melée on the second day, the “Black Sluggard,” another
unknown knight, turns the fortunes of the fray against the Templar.

Perhaps even more admirably constructed are the scenes which
follow,—the capture of Cedric, Rowena, Athelstane, Isaac and his
daughter Rebecca, by the Norman nobles, and their imprisonment in
the castle of Front-de-Boeuf, where they are separated from each
other, and where the events which take place simultaneously in
different parts of the castle are narrated with great vividness and
power. While Cedric and Athelstane are held for ransom, Front-de-
Boeuf seeks to extort a vast sum of money from poor Isaac by
preparing to roast him alive; De Bracy, a Norman noble, demands
the hand of Rowena as the price of her safety and that of Ivanhoe;
and the Templar besets Rebecca with his amorous importunities until
she prepares to fling herself from the parapet to escape his violence.
The interruption of these scenes by a bugle call from without, the
demand for the release of the captives made by Wamba, Gurth, the
Black Knight, and Locksley, captain of the outlaws, followed by the
siege and burning of the castle, constitute perhaps the climax of the
story, and are even more impressive than its third great scene, the
trial of Rebecca for sorcery, and her deliverance by Ivanhoe, who
appears as her champion at the last moment.

Certain episodes are almost as attractive as the main thread of the
narrative. For instance, the drinking bout between Friar Tuck and the
Black Knight (who turns out to be King Richard) in the chapel in the
forest.

There are improbabilities in this work which show us very clearly
that it is a creation of the imagination rather than a transcript of
observations from actual life. Take, for instance, the conversation
between Brian de Bois-Guilbert and the captive Rebecca in the
castle. It is safe to say that no knight, however profligate, ever
began a love-suit to a maiden with a satirical reminder that her
father was then being tortured for money in another part of the
castle, in such words as the following:

“Know, bright lily of the vale of Baca, that thy father is already in
the hands of a powerful alchemist, who knows how to convert
into gold and silver even the rusty bars of a dungeon grate. The
venerable Isaac is subjected to an alembic, which will distill from
him all he holds dear, without any assistance from my requests or
thy entreaty. Thy ransom must be paid by love and beauty, and in
no other coin will I accept it.”

And yet, in spite of such defects, the heroism displayed by Rebecca
in this particular scene has made it one of the most attractive in the
entire story. Rebecca is indeed one of the noblest characters in
fiction, and the portrait is natural and human, as well as heroic.
Although she was delivered from the stake by her champion, the
story ends sadly for her, since the knight whom she loves has
become the husband of Rowena. Scott tells us in his preface that he
has been censured for this, but he adds, with admirable taste, that
he thinks that a character of a highly virtuous and lofty stamp is
degraded rather than exalted by an attempt to reward virtue with
temporal prosperity.

But to my mind the most attractive person in the book is Wamba,
the jester. He appears to me in many ways a close imitation of some
of Shakespeare’s clowns. His jests are on an average quite as good,
and he everywhere awakens our liveliest interest and sympathy,
from the hour when he interposes his shield of brawn in front of the
Jew at the tournament until the time when he exchanges repartees
and songs with the Black Knight on their way through the forest.
There is, moreover, a strain of pathos in his merriment, and when he
enters the castle of Front-de-Boeuf, disguised as a monk, and
exchanges his garments with his master, remaining within the castle
in expectation of death, the gibes with which he accompanies his
sacrifice give to his character something very human, lovable, and
withal heroic. Even Shakespeare has hardly given us a better clown.

The resuscitation and the appearance of the Saxon noble Athelstane
at his own funeral feast is far from artistic. Scott himself calls it a
tour de force, and says he put it in at the vehement entreaties of his
friend and printer, who was inconsolable at the Saxon being
conveyed to the tomb,—an example which ought to be a warning to
authors to follow their own judgment rather than that of their
friends.

In the crucible of Scott’s imagination moral qualities are sometimes
fused together in such manner that the original ingredients are quite
undiscernible. Robin Hood and his outlaws become generous heroes,
and Friar Tuck, who is in reality a dissolute and hypocritical monk,
becomes amiable and attractive. Indeed, this great writer of
romance is filled with such ever present optimism and love of
honorable qualities, that it is almost impossible for him to draw the
picture of a really detestable man. His novels offer the strongest
possible contrast to the pessimistic realism of some of the more
recent works of fiction.

Men may differ in their estimates of Ivanhoe as a picture of human
life and character, but they can hardly differ in their estimate of it as
a beautiful piece of poetic imagination.

THE BETROTHED
ALESSANDRO MANZONI

“The Betrothed,” by Manzoni, has not received at the hands of the
English or American public that wide celebrity or high rank which it
deserves. It is a very great novel. Excepting only “Don Quixote,” and
some of the masterpieces of Thackeray, I know of nothing more
excellent in the whole range of fiction. There is no artificiality, no
sensationalism, no straining after effect; but the story proceeds
naturally and even quietly through events of great historic as well as
tragic interest, to its consummation.

The scene opens at a village on the shores of the lake of Como, on
an occasion when Don Abbondio, the curate of the parish, is stopped
on his way home by two “bravoes” of Don Rodrigo, a nobleman of
the locality, and warned, upon pain of death, not to celebrate the
marriage of Renzo Tramaglino and Lucia Mondella, which had been
fixed for the following day. The scene is a very vivid one, and the
terror of Don Abbondio is set forth in the liveliest manner. He is also
warned not to disclose the warning; “It will be the same as marrying
them,” says the bravo. But the poor priest is a leaky vessel, and
when he grumbles and complains to his housekeeper Perpetua, he
can not refrain from relating to her the awful threat. Dreadful are his
dreams that night of “bravoes, Don Rodrigo, Renzo, cries, muskets”;
and on the next day, when he makes blundering excuses to the
bridegroom and tries to overwhelm him with Latin quotations which
he can not understand, the truth all comes out, for Perpetua has
talked with Renzo about “overbearing tyrants,” and Renzo at last
worms the story, and even the name of the “tyrant,” out of the
frightened priest.

But the wedding is stopped, and Renzo betakes himself to Dr.
Azzecca Garbugli, learned in the law, who treats him encouragingly
and confidentially, so long as he thinks he has only a malefactor to
defend, quoting terrible edicts with the comforting assurance that he
can get him off, until he learns that Renzo has come, not to defeat
but to seek justice, and that too against the powerful Don Rodrigo.
Then he sends the poor fellow away, and will hear nothing in
justification of his suit.

But the unfortunate lovers have a friend in the person of Father
Cristoforo, a monk, who in his early life had killed a man in a rage,
and devoted the remainder of his days to the humility and
repentance of the cloister. He takes it upon himself to visit Don
Rodrigo, and in earnest and indignant words remonstrates with the
abandoned nobleman, but he is ordered from the house.

And now Agnese, the gossiping mother of Lucia, proposes to
accomplish the marriage by craft. The lovers are to make a
declaration before the curate in the presence of witnesses. This, it
seems, was a method recognized by law. Renzo undertakes his
preparations for the scheme; gains access to Don Abbondio’s house
through a friend, who comes under pretense of paying rent; but just
as they are making the mutual declaration they are interrupted by a
great outcry on the part of Don Abbondio, who throws the tablecloth
over Lucia’s face and stops the proceedings.

That same night Don Rodrigo has sent his bravoes to abduct Lucia.
They steal into the house, but find it empty, and are suddenly
startled by the ringing of the bell, which has followed the outcry of
Don Abbondio. “Each of the villains seems to hear in these peals his
name, surname, and nickname,” and they flee in consternation,
while the betrothed betake themselves to the convent of Father
Cristoforo, at Pescarenico; and the tumult aroused in the village by
these events, admirably pictured by the novelist, at length subsides.

Father Cristoforo sends Renzo to Milan, and the women to a convent
at Monza, where Lucia is to find refuge with “the Signora,” a nun of
high rank, who has been compelled by her father to assume the veil.
The Signora is proud, passionate, unreconciled. Her history, and the
schemes by which her consent to a monastic life had been extorted
by alternate persecutions and flatteries, are skillfully delineated, as
well as her intrigue with Egidio, an abandoned man, living in a house
adjoining the convent, which intrigue is followed by the mysterious
disappearance of a lay sister who has discovered the crime. But “the
Signora” now rejoices at the opportunity of thus sheltering an
innocent creature like Lucia, whom she takes under her protection.

Renzo reaches Milan at the time of the breaking out of the bread
riots, due to the prevailing famine. The looting and destruction of
one of the bake-houses is vividly described, and also the attack upon
the superintendent of provisions. Renzo can not keep out of these
exciting scenes, and becomes quite a hero, making a speech to the
crowd, innocent enough in purpose, but easily construed into
sedition by a secret agent of the government who hears it, attaches
himself to Renzo, acts as his guide to an inn in the neighborhood,
where the innocent young man unlawfully refuses to give his name
to the innkeeper, but unwittingly reveals it to his guide; then goes to
bed intoxicated, is arrested next morning, escapes from the officers
of justice in the midst of the crowd, flees from the city, and does not
stop until he has quit the duchy of Milan, crossed the Adda, and
taken refuge with his cousin Bortolo in the Bergamascan territory—
all of which is followed by proceedings declaring him a dangerous
outlaw,—luckily, however, after he is well out of reach.

Through the intrigues of Don Rodrigo, the monk Cristoforo is sent
away to Rimini, and the nobleman now betakes himself to the castle
of a great lord, whose name is not given, so dreadful were the
crimes he was said to have committed. The Unnamed took upon
himself the task of kidnapping Lucia from the convent, and for this
purpose availed himself of Egidio, who compelled the Signora to
betray the girl committed to her keeping and to send Lucia on a
pretended message, to be seized, thrown into a carriage, and driven
to that lair of robbers, the castle of the Unnamed. But so great are
her sufferings, so moving her piteous appeals, that even the heart of
the outlaw is touched, and he falters in his desperate scheme. Lucia
in her agony prays to the Madonna for deliverance, and, resolving to
sacrifice what she holds most dear, she determines to give up her
beloved Renzo, and vows to remain a virgin.

A fine description is given of the remorse which steals over the
conscience of the desperate malefactor, his despair at the
contemplation of a career which is now drawing near its close, with
its inevitable termination, and the thought, “If there should really be
another life!” He hears again the piteous words of Lucia when she
besought him to set her free, “God pardons so many sins for one
deed of mercy!”

When the morning breaks after a night of this remorse, he hears the
distant chiming of bells; learns of the festival of the people in the
neighborhood who were going to meet their bishop, Cardinal
Federigo Borromeo, and, by a sudden impulse, he too determines to
go and present himself to the cardinal. The history of this great
prelate, a saintly man, is given in detail—his works of charity, his
writings, his efforts in the cause of education. The Unnamed is
welcomed by the Cardinal with joy and genuine tenderness, and the
details of a religious conversion, often repulsive to an unsympathetic
reader, here become, through the author’s skill, both natural and
attractive.

Don Abbondio, to his great consternation, is now sent with the
celebrated outlaw to fetch Lucia from his castle. He goes thither,
trembling, grumbling, and complaining to himself like an old woman.
The poor girl is released, and believes, of course, that her
deliverance is due to the Madonna.

Shortly afterwards the cardinal, on the occasion of a visit to Don
Abbondio’s parish, takes the poor priest to task for his violated duty
in refusing to celebrate the marriage. There are few passages in
literature more impressive than the solemn severity of his reproof;—

“Signor Curate, why did you not unite in marriage this Lucia with
her betrothed husband?”....

“Don Abbondio began to relate the doleful history; but
suppressing the principal name, he merely substituted a great
Signor; thus giving to prudence the little that he could in such an
emergency.

“‘And you have no other motive?’ asked the Cardinal, having
attentively heard the whole.

“‘Perhaps I have not sufficiently explained myself,’ replied Don
Abbondio. ‘I was prohibited under pain of death to perform this
marriage.’

“‘And does this appear to you a sufficient reason for omitting a
positive duty?’

“‘I have always endeavored to do my duty, even at very great
inconvenience; but when one’s life is concerned....’

“‘And when you presented yourself to the church,’ said Federigo,
in a still more solemn tone, ‘to receive Holy Orders, did she
caution you about your life?’.... ‘He from whom we have received
teaching and example, in imitation of whom we suffer ourselves
to be called, and call ourselves, shepherds; when He descended
upon earth to execute His office, did He lay down as a condition
the safety of His life? And to save it, to preserve it, I say, a few
days longer upon earth, at the expense of charity and duty, did He
institute the holy unction, the imposition of hands, the gift of the
priesthood? Leave it to the world to teach this virtue, to advocate
this doctrine. What do I say? Oh, shame! the world itself rejects
it; the world also makes its own laws, which fix the limits of good
and evil; it, too, has its gospel, a gospel of pride and hatred; and
it will not have it said that the love of life is a reason for
transgressing its precepts. It will not, and it is obeyed. And we!
children and proclaimers of the promise! What would the Church
be, if such language as yours were that of all your brethren?’

“‘I repeat, my Lord,’ answered Don Abbondio, ‘that I shall be to
blame.... One can’t give one’s self courage.’

“‘And why then, I might ask you, did you undertake an office
which binds upon you a continual warfare with the passions of the
world?... Ah, if for so many years of pastoral labors you have
loved your flock (and how could you not love them?)—if you have
placed in them your affections, your cares, your happiness,
courage ought not to fail you in the moment of need; love is
intrepid.’”

This discourse, which is much longer than I have quoted, gives us
an admirable ideal of the episcopal office, and through the whole of
it the contrast between these two natures vividly appears, without
any apparent effort on the part of the author to produce it.

In the meantime, Renzo, who has been in hiding under an assumed
name, has established secret communication with Agnese, the
mother of his betrothed, and is naturally greatly disgusted to learn
of Lucia’s vow. Lucia has found refuge at Milan with a distinguished
lady, one Donna Prassede, who is a type of the “superior woman”—
one of those pestilent, unsympathetic natures, determined to do
good to others at whatever violence to their feelings; who feels
herself the instrument of Heaven and with a consciousness of innate
superiority, and great display of patronage, torments Lucia by
denouncing the unworthy outlaw to whom her affections have been
engaged.

Up to this point the narrative has traversed scenes common enough
to the period with which it deals; but here it takes up the story of
one of the most terrible public calamities which history records—the
appearance of the plague in Milan. The scenes of the preceding
famine are vividly described; the inefficacy of the ridiculous legal
remedies by which it was proposed to supply the lack of natural
resources; the establishment of the Lazaretto; the war raging in
Italy, which distracted the attention of the authorities; and, finally,
the invasion of the German army, by which the plague was
introduced into the territory of Milan. A historical account is given of
the introduction of the contagion, and the various stages of public
sentiment in regard to it.

“First, then, it was not the plague, absolutely not—by no means;
the very utterance of the term was prohibited. Then, it was
pestilential fevers; the idea was indirectly admitted in the
adjective. Then, it was not the true nor real plague; that is to say,
it was the plague, but only in a certain sense; not positively and
undoubtedly the plague, but something to which no other name
could be affixed. Lastly, it was the plague without doubt, without
dispute; but even then another idea was appended to it, the idea
of poison and witchcraft, which altered and confounded that
conveyed in the word they could no longer repress.”

There are descriptions of the processions in the streets, the
exhibition of the body of San Carlo Borromeo, and of the public rage
against the supposed poisoners. But the most vivid part of the
description begins when the author again takes up the thread of his
story and describes the return of Don Rodrigo from a carousal,
where he had excited great laughter by a funeral eulogium on his
kinsman, Count Attilio, who had been carried off by the disease two
days before. There is a powerful description of the coming on of the
fatal malady, on his return, and of the dreams that tormented him in
his sleep.

“He went on from one thing to another, till he seemed to find
himself in a large church, in the first ranks, in the midst of a great
crowd of people; there he was, wondering how he had got there,
how the thought had ever entered his head, particularly at such a
time; and he felt in his heart excessively vexed. He looked at the
bystanders; they had all pale, emaciated countenances, with
staring and glistening eyes, and hanging lips; their garments were
tattered and falling to pieces; and through the rents appeared
livid spots, and swellings. ‘Make room, you rabble!’ he fancied he
cried, looking towards the door, which was far, far away; and
accompanying the cry with a threatening expression of
countenance, but without moving a limb; nay, even drawing up
his body to avoid coming in contact with those polluted creatures,
who crowded only too closely upon him on every side. But not
one of the senseless beings seemed to move, nor even to have
heard him; nay, they pressed still more upon him; and, above all,
it felt as if some one of them, with his elbow, or whatever it might
be, was pushing against his left side, between the heart and arm-
pit, where he felt a painful, and as it were, heavy pressure. And if
he writhed himself to get rid of this uneasy feeling, immediately a
fresh unknown something began to prick him in the very same
place. Enraged, he attempted to lay his hand on his sword; and
then it seemed as if the thronging of the multitude had raised it
up level with his chest, and that it was the hilt of it which pressed
so in that spot; and the moment he touched it he felt a still
sharper stitch. He cried out, panted, and would have uttered a
still louder cry, when, behold! all these faces turned in one
direction. He looked the same way, perceived a pulpit, and saw
slowly rising above its edge something round, smooth, and
shining; then rose, and distinctly appeared, a bald head; then two
eyes, a face, a long and white beard, and the upright figure of a
friar, visible above the sides down to the girdle; it was Friar
Cristoforo! Darting a look around upon his audience, he seemed
to Don Rodrigo to fix his gaze on him, at the same time raising his
hand in exactly the attitude he had assumed in that room on the
ground floor in his palace. Don Rodrigo then himself lifted up his
hands in fury, and made an effort, as if to throw himself forward
and grasp that arm extended in the air; a voice, which had been
vainly and secretly struggling in his throat, burst forth in a great
howl; and he awoke. He dropped the arm he had in reality
uplifted, strove, with some difficulty, to recover the right meaning
of everything, and to open his eyes, for the light of the already
advanced day gave him no less uneasiness than that of the candle
had done; recognized his bed and his chamber; understood that
all had been a dream; the church, the people, the friar, all had
vanished—all, but one thing—that pain in his left side. Together
with this, he felt a frightful acceleration of palpitation at the heart,
a noise and humming in his ears, a raging fire within, and a
weight in all his limbs, worse than when he lay down. He
hesitated a little before looking at the spot that pained him; at
length, he uncovered it, and glanced at it with a shudder;—there
was a hideous spot, of a livid purple hue.”

The unhappy man now finds that he has been betrayed by Griso, the
chief of his bravoes, who, under pretense of bringing the doctor, has
introduced into the room the horrible monatti, whose duty it is to
drag away the dead to their graves and the sick to the Lazaretto.
They plunder the stricken man of his treasures before his eyes, and
then carry him away.

In the meantime Renzo, who has had the plague in the
Bergamascan territory, finds it safe to return home, amid the general
confusion, and proceeds to Milan to find Lucia. The terrible scenes in
the streets are graphically described, but the realism is combined
with a certain delicacy on the part of the author which renders even
its most dreadful details not wholly repulsive. For instance, Renzo
sees coming down the steps of one of the doorways:

“A woman with the delicate, yet majestic beauty, which is
conspicuous in the Lombard blood. Her gait was weary, but not
tottering; no tears fell from her eyes, though they bore tokens of
having shed many; there was something peaceful and profound in
her sorrow, which indicated a mind fully conscious and sensitive
enough to feel it.... She carried in her arms a little child, about
nine years old, now a lifeless body; but laid out and arranged,
with her hair parted on her forehead, and in a white and
remarkably clean dress, as if those hands had decked her out for
a long promised feast, granted as a reward. Nor was she lying
there, but upheld and adjusted on one arm, with her breast
reclining against her mother’s, like a living creature; save that a
delicate little hand, as white as wax, hung from one side with a
kind of inanimate weight, and the head rested upon her mother’s
shoulder with an abandonment deeper than that of sleep: her
mother; for, even if their likeness to each other had not given
assurance of the fact, the countenance which still depicted any
feeling would have clearly revealed it.”

“A horrible looking monatto approached the woman, and
attempted to take the burden from her arms, with a kind of
unusual respect, however, and with involuntary hesitation. But
she, slightly drawing back, yet with the air of one who shows
neither scorn nor displeasure, said, ‘No, don’t take her from me
yet; I must place her myself on this cart; here.’ So saying, she
opened her hand, displayed a purse which she held in it, and
dropped it into that which the monatto extended towards her. She
then continued: ‘Promise me not to take a thread from around
her, nor let any one else attempt to do so, and to lay her in the
ground thus.’

“The monatto laid his right hand on his heart; and then zealously,
and almost obsequiously, rather from the new feeling by which he
was, as it were, subdued, than on account of the unlooked-for
reward, hastened to make a little room on the car for the infant
dead. The lady, giving it a kiss on the forehead, laid it on the spot
prepared for it, as upon a bed, arranged it there, covering it with
a pure white linen cloth, and pronounced the parting words:
‘Farewell, Cecilia! rest in peace! This evening we, too, will join
you, to rest together forever. In the meanwhile, pray for us; for I
will pray for you and the others.’ Then, turning again to the
monatto, ‘You,’ said she, ‘when you pass this way in the evening,
may come to fetch me too, and not me only.’

“So saying, she re-entered the house, and, after an instant,
appeared at the window, holding in her arms another more
dearly-loved one, still living, but with the marks of death on its
countenance. She remained to contemplate these so unworthy
obsequies of the first child, from the time the car started until it
was out of sight, and then disappeared. And what remained for
her to do, but to lay upon the bed the only one that was left her,
and to stretch herself beside it, that they might die together, as
the flower already full blown upon the stem falls together with the
bud still enfolded in its calyx, under the scythe which levels alike
all the herbage of the field.”

Renzo learns that Lucia has been taken to the Lazaretto, and he
proceeds thither. The scenes in that dreadful abode of suffering are
described in detail. Here he meets Father Cristoforo, who in tending
the sick is already falling a victim.

“His voice was feeble, hollow, and as changed as everything else
about him. His eye alone was what it always was, or had
something about it even more bright and resplendent; as if
Charity, elevated by the approaching end of her labors, and
exulting in the consciousness of being near her source, restored
to it a more ardent and purer fire than that which infirmity was
every hour extinguishing.”

Renzo learns that Don Rodrigo himself is lying unconscious in one of
the miserable hovels, and, filled at first with rage at the recollection
of the man who has caused him so much wretchedness, he is at last
brought, by the commanding reproofs of Father Cristoforo, into such
a forgiving spirit that he can pray for his enemy’s salvation.

Renzo seeks Lucia in vain amid the procession of the few persons
who were going forth cured from the Lazaretto, but he finds her at
last, convalescent in one of the little huts in the women’s quarters. A
very characteristic conversation ensues between the lovers in regard
to the binding nature of her vow, which Renzo naturally disputes,
and calls Father Cristoforo to remonstrate and interpose. The good
father consolingly tells Lucia that she had no right to offer to the
Lord the will of another to whom she was already pledged; and by
virtue of the authority of the church he absolves her from her vow.
It is not long until the lovers, restored to their former happiness,
leave the Lazaretto; and the book concludes with the consummation
of their wishes—their marriage, and a happy wedded life.

A great deal of quiet satire pervades the story. Take, for instance,
the following, in the description of Lecco, at the very opening of the
book:

“At the time the events happened which we undertake to recount,
this town, already of considerable importance, was also a place of
defense, and for that reason had the honor of lodging a
commander, and the advantage of possessing a fixed garrison of
Spanish soldiers, who taught modesty to the damsels and
matrons of the country; bestowed from time to time marks of
their favor on the shoulders of a husband or a father; and never
failed, in autumn, to disperse themselves in the vineyards, to thin
the grapes, and lighten for the peasant the labors of the vintage.”

There is a great deal of homely philosophy intermixed with this
satire. For instance, the criticism of

“those prudent persons who shrink back with alarm from the
extreme of virtue as well as vice, are forever proclaiming that
perfection lies in the medium between the two, and fix that
medium exactly at the point which they have reached, and where
they find themselves very much at their ease.”

These delicate touches come in most appropriately, and, as it were,
spontaneously from the context. They are never lugged in head
foremost, for the evident purpose of saying a good thing.

The book abounds in apt similes; for instance, in the following
description of Perpetua’s vain efforts to keep a secret:

“But certain it is that such a secret in the poor woman’s breast
was like very new wine in an old and badly-hooped cask, which
ferments, and bubbles, and boils, and if it does not send the bung
into the air, works itself about till it issues in froth, and penetrates
between the staves, and oozes out in drops here and there, so
that one can taste it, and almost decide what kind of wine it is.”

When the bravoes, led by Griso, in the guise of a pilgrim, attempt to
carry off Lucia from her home and are suddenly thrown into
consternation by the pealing of the bell, the author tells us:

“It required all the authority of Griso to keep them together, so
that it might be a retreat and not a flight. Just as a dog urging a
drove of pigs, runs here and there after those that break the
ranks, seizes one by the ears, and drags him into the herd,
propels another with his nose, barks at a third that leaves the line
at the same moment, so the pilgrim laid hold of one of his troop
just passing the threshold, and drew him back, detained with his
staff others who had almost reached it, called after some who
were flying they knew not whither, and finally succeeded in
assembling them all in the middle of the courtyard.”

The characters are extremely well described. Perhaps the two lovers
are the least striking of any in the book. Lucia is a simple peasant
girl; Renzo, a rash, impulsive, kindly boy, easily led, a very natural,
grown-up child such as Italy produces in greater luxuriance than
colder and severer latitudes. There are no passionate love scenes in
the book. The affection of the betrothed for each other seems rather
an incident than the principal theme of the story. Don Ferrante, the
husband of Donna Prassede, is a fine type of scholastic pedantry.
The catalogue of his ridiculous acquirements in the absurd
philosophy and learning of the time, with long lists of authors now
unknown, reminds us of the studies of Don Quixote; Don Ferrante,
too, is skilled in the science of chivalry, wherein he enjoyed the title
of “Professor,” and “not only argued on it in a real, masterly manner,
but, frequently requested to interfere in affairs of honor, always gave
some decision.”

The officiousness of Donna Prassede is well set forth in the
following:

“It was well for Lucia that she was not the only one to whom
Donna Prassede had to do good.... Besides the rest of the family,
all of whom were persons more or less needing amendment and
guidance—besides all the other occasions which offered
themselves to her, or she contrived to find, of extending the same
kind offices, of her own free will, to many to whom she was under
no obligations; she had also five daughters, none of whom were
at home, but who gave her much more to think about than if they
had been. Three of these were nuns, two were married; hence
Donna Prassede naturally found herself with three monasteries,
and two houses to superintend; a vast and complicated
undertaking, and the more arduous, because two husbands,
backed by fathers, mothers, and brothers; three abbesses,
supported by other dignitaries, and by many nuns, would not
accept her superintendence. It was a complete warfare, alias five
warfares, concealed, and even courteous, up to a certain point,
but ever active, ever vigilant. There was in every one of these
places a continued watchfulness to avoid her solicitude, to close
the door against her counsels, to elude her inquiries, and to keep
her in the dark, as far as possible, on every undertaking. We do
not mention the resistance, the difficulties she encountered in the
management of other still more extraneous affairs; it is well
known that one must generally do good to men by force.”

The story, like some other of the greatest works of fiction—like Don
Quixote, Les Misérables, nay, like Henry Esmond itself, is somewhat
too prolix. The long historical citations, the extracts from the edicts
and proclamations of the time, look as if the author considered it
necessary to prove his story rather than to let it prove itself. That
Renzo and Lucia should leave Father Cristoforo to die alone is, to my
mind, the most serious blemish in the book; but in spite of these
shortcomings, “The Betrothed” is entitled to one of the first places in
the front rank of the masterpieces of fiction.

EUGENIE GRANDET
HONORÉ DE BALZAC

It is not quite fair to Balzac to judge him by any one of the stories in
his encyclopædic “Comédie Humaine.” The countless varieties of life
and character which he portrays show the author’s versatility and
power, and have perhaps a value from their very number which can
not be adequately treated when we consider only a single specimen
of his work. Many of his characters, it is true, are grotesques; some
are absolute deformities; others are hard to understand by any but a
Frenchman,—French human nature, as it seems to me, being a little
different from human nature elsewhere; but there is one great work
of his which, although it is not without its morbid side, must appeal
to the common consciousness of all mankind, and bring to every
human heart the conviction of its spiritual truth. “Eugenie Grandet”
is a novel of this universal kind of excellence.

The plot is a very simple one. M. Grandet is a miser who lives in an
old comfortless house in Saumur with his wife, his daughter Eugenie,
and big Nanon, the maid of all work. The Cruchots and the De
Grassins are intriguing for the hand of the heiress, and on Eugenie’s
birthday, when all these are assembled, a stranger unexpectedly
appears, Charles Grandet, her cousin, committed to the care of his
uncle by his father in Paris, who has become a bankrupt and has
determined upon suicide. Charles, however, knows nothing of this,
and is overcome with pitiful grief when he learns of his father’s
death. Eugenie, a simple-minded girl, falls in love with him, but the
old miser, anxious to get rid of him, sends him to the Indies.
Grandet’s tyranny over his wife and child is graphically portrayed.
The poor wife succumbs to it and dies. It is not long till the miser
follows her, and Eugenie is left alone with a colossal fortune for
which she cares nothing, and with a lover from whom she has
received no word. In the meantime Charles has acquired a fortune of
his own, and on his return writes to her that he wishes to marry
another. Her dream is over, the light of her life is extinguished; she
gives her hand without her heart to Cruchot, and upon his death
continues her hopeless life alone in the desolate home,
administering her estate with economy, but devoting its proceeds to
works of beneficence.

This is a story, the like of which has happened many a time in actual
life, but the cold skeleton of the tale as given above conveys not the
slightest idea of the warm flesh and blood with which it is invested.
The description of the old street and the dreary house and its
furniture is a literary jewel. The account of the way in which Grandet
accumulates his fortune, and of the neighborhood rumors regarding
his wealth, stirs our own acquisitiveness as we read it, and shows
him to be a very natural and almost inevitable sort of miser. He is
moreover a man of commanding ability, who extorts respect even
though he inspires abhorrence. The details of his habits, his
economies, and his schemes, as well as his personal appearance, are
admirably given. Equally lifelike are the descriptions of big Nanon,
the devoted house-servant, starved and overtasked, yet always
grateful to the master who took her when none others would; of the
wife, submissive, sensitive, magnanimous, and uncomplaining; and
of Eugenie, a girl who has grown up in perfect innocence of the
world, pure, beautiful, and of a generous and noble spirit. All these
are the subjects of an odious domestic tyranny on the part of
“Goodman Grandet,” the particulars of which are set forth with
powerful fidelity.

Charles is a rather uninteresting young dandy, who comes arrayed
for conquest. It is not unnatural that an artless girl like Eugenie
should fall in love with him, and her devices to procure him such
luxuries as a cake, a wax candle, and sugar for his coffee, add to the
charm of their simple love-making. The sympathy of the two women
in his sorrow contrasts sharply with the sordid calculations of the
miser, and the scene where Eugenie learns his needs by furtively
reading two of his letters (for even her good qualities are decidedly
of the French type) and then brings him her little store of gold, and
when he hesitates, begs him on her knees to take it—this scene is
very effective, as is also her despairing cry, after he departs, “O
mother, mother, if I had God’s power for one moment!”

But the more tragic parts of this simple drama are near its close,—
the stormy scene when Grandet learns that Eugenie has given
Charles her money, her imprisonment in a room of the old house,
her mother’s illness and patient death, and, ghastliest of all, the last
hours of the miser:

“So long as he could open his eyes, where the last sparks of life
seemed to linger, they used to turn at once to the door of the
room where all his treasures lay, and he would say to his
daughter, in tones that seemed to thrill with a panic of fear:

“‘Are they there still?’

“‘Yes, father.’

“‘Keep watch over the gold!... Let me see the gold.’

“Then Eugenie used to spread out the louis on a table before him,
and he would sit for whole hours with his eyes fixed on the louis
in an unseeing stare, like that of a child who begins to see for the
first time; and sometimes a weak infantine smile, painful to see,
would steal across his features.

“‘That warms me!’ he muttered more than once, and his face
expressed a perfect content.

“When the curé came to administer the sacrament, all the life
seemed to have died out of the miser’s eyes, but they lit up for
the first time for many hours at the sight of the silver crucifix, the
candlesticks, and holy water vessel, all of silver; he fixed his gaze
on the precious metal, and the wen on his face twitched for the
last time.

“As the priest held the gilded crucifix above him that the image of
Christ might be laid to his lips, he made a frightful effort to clutch
it—a last effort which cost him his life. He called Eugenie, who
saw nothing; she was kneeling beside him, bathing in tears the
hand that was growing cold already. ‘Give me your blessing,
father,’ she entreated. ‘Be very careful!’ the last words came from
him; ‘one day you will render an account to me of everything here
below.’ Which utterance clearly shows that a miser should adopt
Christianity as his religion.”

Then follows the long waiting of Eugenie; the dastardly letter sent
by Charles after his return; the noble dignity with which she releases
him and pays his father’s creditors to preserve the honor of one who
is quite careless of it himself, and then resigns herself to her
hopeless destiny.

“Eugenie Grandet” is a consummate work of art.

DEAD SOULS
NIKOLAI GOGOL

“Dead Souls,” the masterpiece of Gogol, is not very widely known
among English readers, but it is entitled to a high rank in literature.
Perhaps the fact that it is a torso has been one cause of this neglect,
for before the second volume was finished the author was overtaken
by that madness which clouded his last days. But the first volume is
practically complete in itself. It records the efforts of the smug,
shrewd, rascally Tchitschikoff to procure from various landowners
certain paper transfers of the serfs who had died on their estates
since the last enumeration in order to effect a fraudulent loan by
means of a list corresponding with the official register. The
description of the stranger, of his sudden arrival in a provincial city,
of the various estates he visits and the remarkable people he
encounters, and then, while his enterprise is prospering, of the
sudden spreading of the scandal through the town and his forced
flight to other regions—these things are told with a power of
portraiture which is amazing. The characters he describes are
sometimes grotesque, but they are faithful to the essentials of
human nature; even the wild Nozdreff and the massive Sobakevitch
are very real. Gogol has been called the Dickens of Russian
literature, and his portraits, while fewer in number and variety, are
less like puppets than many of those drawn by the English novelist.
His description of Pliushkin the miser is quite as striking as that of
L’Avare of Molière or Père Grandet of Balzac, while his account of the
way the gossip regarding Tchitschikoff started and circulated is as
fine as anything in “The School for Scandal.” He calls his book a
“poem,” and although it is quite devoid of versification or lofty
diction, yet if the word “poem” means a “work of original creative
art,” “Dead Souls” will fully justify the name.

It has the same sort of masterly quality as “Don Quixote,” and
transports us as completely to the scenes which it describes. His
patriotic apostrophe to Russia in the final chapter, and his description
of the swift flight of the hero in his troika, are picturesque and
eloquent to the last degree.

THE THREE GUARDSMEN
ALEXANDRE DUMAS

Probably there is no better example of the novel of adventure than
“The Three Guardsmen,” by Alexandre Dumas. The author claims in
his preface a historical origin for his novel. However that may be, the
plot seems plausible in spite of its extravagances, and never was
there a book in which men conspired and slaughtered each other
more merrily, nor in which the mere strenuous life without moral
accessories has found a more perfect embodiment.

The book in its way is a masterpiece. The style is simple and
luminous to such a degree as would hardly be possible in any other
language than that in which it was written. No work in the world is
more easy to read, to understand, or to translate. The old French
dictum that no words should be used in literature which can not be
understood upon the market-place here attains its highest
realization.

As for the characters, they are of the simplest type. The dashing
devil-may-care soldier and adventurer, the deep drinker, the heavy
player, the man who with equal gayety defies the bullets of the
enemy and the commonest precepts of morality, has here his
apotheosis. Perhaps the hero of the book even more than
D’Artagnan himself is Athos, the chief of the three musketeers, who,
having made an unfortunate marriage in his youth, has forsaken his
name and station and embarked upon a life of mere adventure. We
love him and admire him, and yet it is hard to tell why upon any
logical or ethical principles we should do either. Yet when he gets
very drunk, or when he hangs his wife because he finds that she
bears upon her shoulder the mark of a criminal conviction, we feel
that he has done in each case exactly the right thing. Generally a
novelist seeks by contrasting his hero with more commonplace
characters to set him off in relief, but in this novel almost everybody
is a hero, and all are equally and superlatively great and admirable,
except perhaps the poor woman who has been hanged and comes
to life again and engages in divers diabolical plots against the rest of
the world.

JANE EYRE
CHARLOTTE BRONTË

“Jane Eyre” is a book which impresses the reader with its power,—I
might say its masculine power, were it not for the fact that the
author gives us at every turn the woman’s point of view.

The narrative, like that of “David Copperfield,” is in the form of an
autobiography, and the plot, which is quite simple, has only that sort
of unity which the heroine gives it. Yet the work glows with intense
passion and the characters are so faithful to nature that they
convince us that vivid personal experience must have come to the
aid of the author’s imagination in delineating them.

Jane Eyre, an orphan, is abused and mistreated in childhood, first in
the family of Mrs. Reed, where she is brought up, and afterwards at
the Lowood charity school, where she is first a pupil and then
becomes a teacher. She seeks a situation as governess, and finds
employment at Thornfield Hall, the residence of a Mr. Rochester,
who, after a wild, dissipated, wandering life, has come, some time
before, into possession of this splendid property. Here she has the
charge of Adele, his ward.

There is a certain uncanny secret about Thornfield which the
governess finds herself unable to fathom. She hears wild laughter
and inarticulate sounds in a distant part of the Hall. One night
Rochester’s bed is mysteriously set on fire, and Jane Eyre saves his
life. On another occasion, while the house is full of guests, a horrible
shriek comes from the upper floor and a murder is well nigh
committed by some unknown creature who is hidden there.

In the meantime Mr. Rochester has become greatly interested in his
little governess, who, although quiet and plain in appearance, is
warm-hearted and high-spirited, with a strong sense of duty, great
courage, and an indomitable will. And she on her side becomes
fascinated and at last utterly devoted to her master, a man of
brilliant parts, strong, brusque, proud and autocratic. He offers her
his hand, and she accepts him, to learn, however, in the very
presence of the altar and during the wedding ceremony, that he has
another wife! It seems that in his early years he had been beguiled
into a marriage in the West Indies with a woman whose dissolute
courses had wrecked his life, and had terminated in her own
madness, and that this was the maniac who had occasioned the
strange scenes at the Hall.

Jane Eyre now flees from Thornfield, concealing all traces of her
whereabouts. She wanders amid incredible hardships and
destitution, and at last finds shelter at Moor House, the home of St.
John Rivers and his two sisters, who are afterwards discovered to be
her relatives, and with whom she divides a legacy which she
receives from a deceased uncle. St. John is a country clergyman of
high character, full of zeal, ambition, and fanaticism, and determined
to devote his life to missionary service in India. He seeks her hand,
but she realizes that it is not from love but to make her his fellow
laborer in the work of the Gospel. He has sought to inspire her with
his own enthusiasm, and she is on the point of yielding, when she
seems to hear the voice of Rochester calling to her in pain and
anguish. She returns to Thornfield, and finds that the Hall has been
consumed in a conflagration kindled by the maniac, and that
Rochester, who had sought in vain to save the life of the wretched
creature, has been himself rescued, blind and a cripple, from the
ruins. She seeks him and becomes his wife.

But the bare recital of these leading events gives very little idea of
the characters in this somber and tragic tale, or the feelings which
control their actions. The book must be read through to be
understood. From the very beginning the author strikes a resounding
chord in human nature. Brutality to children stirs us to fury, and no
one, not even Dickens or Victor Hugo, has painted this form of
tyranny in livelier colors than Charlotte Brontë. The conduct of Mrs.
Reed and of Rev. Mr. Brocklehurst, the sanctified and inhuman
director of Lowood school, arouses our hot resentment.

Of course there are blemishes in the book. Sometimes the
conversation is too carefully written to be natural. Then there is an
intrinsic improbability in the plot. Why should a young woman so
self-sufficient as the heroine consent to marry Rochester before she
had solved the secret of Thornfield? But these defects in the novel
are trifling by the side of its abounding excellences. At nearly every
point the heroine awakens our admiration; we feel (sometimes,
perhaps, in spite of our better judgment) that she is doing right; and
so masterly is the author’s portraiture that, in spite of many
repulsive features, she awakens a stronger sympathy for the seared
and blighted Rochester than for the pure and devoted yet inexorable
St. John Rivers. “Jane Eyre” is an eloquent novel. It is emphatically a
work of genius.

CARMEN
PROSPER MÉRIMÉE

It has always seemed to me that “Carmen” was a story of great
power and told with wonderful skill. I know not whether it be fact,
nor whether the author has learned it in the way he says; but so
convincing is the narrative, it seems to me impossible that it is a
mere product of the imagination. Yet the leading characters are so
abnormal that I sometimes wonder why I believe this story so
thoroughly. It must be because it is true.

The author, in pursuing certain archæological researches to discover
the site of the ancient battle of Munda, comes with his guide upon a
secluded amphitheatre among the rocks, where he suddenly
encounters an outlaw, José Navarro, whom he makes his friend by
the exchange of some simple courtesies and by warning him at the
humble venta where they lodge together, of the approach of the
officers of justice.

Some days afterwards, while the author is leaning upon the
parapet of the quay at Cordova, Carmen, a young gipsy girl of a
strange and savage beauty, comes and sits near him. After some
conversation he accompanies her to her residence to have his
fortune told. Suddenly the door opens, and Navarro, in a very bad
humor, enters the room. A quarrel ensues between him and Carmen
in the gipsy language, and it appears from the gestures that the
