Deep Learning in Bioinformatics
Techniques and Applications in Practice

Habib Izadkhah
Department of Computer Science
University of Tabriz
Tabriz, Iran
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing from the
publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found
at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may
be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any
injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data


A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library

ISBN: 978-0-12-823822-6

For information on all Academic Press publications


visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner


Acquisitions Editor: Chris Katsaropoulos
Editorial Project Manager: Joshua Mearns
Production Project Manager: Nirmala Arumugam
Designer: Victoria Pearson
Typeset by VTeX
To my wife Sepideh
and my children Amir Reza and Rose
Contents

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
CHAPTER 1 Why life science? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Why deep learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Contemporary life science is about data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Deep learning and bioinformatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 What will you learn? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
CHAPTER 2 A review of machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 What is machine learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Challenge with machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Overfitting and underfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.1 Mitigating overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.2 Adjusting parameters using cross-validation . . . . . . . . . . . . . . . . . . . . 15
2.4.3 Cross-validation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Types of machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Supervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.2 Unsupervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5.3 Reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 The math behind deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.1 Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.2 Relevant mathematical operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.3 The math behind machine learning: statistics . . . . . . . . . . . . . . . . . . . . 25
2.7 TensorFlow and Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.8 Real-world tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
CHAPTER 3 An introduction to the Python ecosystem for deep learning . . . . . . . . . . . . . . . 31
3.1 Basic setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 SciPy (scientific Python) ecosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Scikit-learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 A quick refresher in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.1 Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.2 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.3 Data type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.4 Control flow statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4.5 Data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.6 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.5 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6 Matplotlib crash course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.7 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.8 How to load dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.1 Considerations when loading CSV data . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.2 Pima Indians diabetes dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.3 Loading CSV files in NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8.4 Loading CSV files in Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.9 Dimensions of your data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.10 Correlations between features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.11 Techniques to understand each feature in the dataset . . . . . . . . . . . . . . . . . . . . 53
3.11.1 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.11.2 Box-and-whisker plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.11.3 Correlation matrix plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.12 Prepare your data for deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.12.1 Scaling features to a range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.12.2 Data normalizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.12.3 Binarize data (make binary) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.13 Feature selection for machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.13.1 Univariate selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.13.2 Recursive feature elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.13.3 Principal component analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.13.4 Feature importance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.14 Split dataset into training and testing sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.15 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
CHAPTER 4 Basic structure of neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 The neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 Layers of neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4 How is a neural network trained? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5 Delta learning rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.6 Generalized delta rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.7 Gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.7.1 Stochastic gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.7.2 Batch gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.7.3 Mini-batch gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.8 Example: delta rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.8.1 Implementation of the SGD method . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.8.2 Implementation of the batch method . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.9 Limitations of single-layer neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
CHAPTER 5 Training multilayer neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 Backpropagation algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96


5.3 Momentum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.4 Neural network models in Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.5 ‘Hello world!’ of deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.6 Tuning hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.7 Data preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.7.1 Vectorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.7.2 Value normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
CHAPTER 6 Classification in bioinformatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1.1 Binary classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1.2 Pima Indians onset of diabetes dataset . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.1.3 Label encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.2 Multiclass classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.1 Sigmoid and softmax activation functions . . . . . . . . . . . . . . . . . . . . . . 128
6.2.2 Types of classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
CHAPTER 7 Introduction to deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.2 Improving the performance of deep neural networks . . . . . . . . . . . . . . . . . . . 132
7.2.1 Vanishing gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.2.2 Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.2.3 Computational load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.3 Configuring the learning rate in Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.3.1 Adaptive learning rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.3.2 Layer weight initializers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.4 Imbalanced dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.5 Breast cancer detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.5.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.5.2 Introduction and task definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.5.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
7.6 Molecular classification of cancer by gene expression . . . . . . . . . . . . . . . . . . 163
7.6.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.6.2 Introduction and task definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
CHAPTER 8 Medical image processing: an insight to convolutional neural networks . . . . 175
8.1 Convolutional neural network architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.2 Convolution layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
8.3 Pooling layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.4 Stride and padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.5 Convolutional layer in Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.6 Coronavirus (COVID-19) disease diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . 184


8.6.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.6.2 Introduction and task definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.7 Predicting breast cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.7.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.7.2 Introduction and task definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.7.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.8 Diabetic retinopathy detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.8.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.8.2 Introduction and task definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.8.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.8.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
CHAPTER 9 Popular deep learning image classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
9.2 LeNet-5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
9.3 AlexNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
9.4 ZFNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.5 VGGNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.6 GoogLeNet/Inception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.7 ResNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.8 DenseNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.9 SE-Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
9.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
CHAPTER 10 Electrocardiogram (ECG) arrhythmia classification . . . . . . . . . . . . . . . . . . . . 249
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
10.2 MIT-BIH arrhythmia database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
10.3 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
10.4 Data augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
10.5 Architecture of the CNN model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
10.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
CHAPTER 11 Autoencoders and deep generative models in bioinformatics . . . . . . . . . . . . 261
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.2 Autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
11.2.1 Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.2.2 Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.2.3 Distance function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.3 Variant types of autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.3.1 Undercomplete autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.3.2 Deep autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
11.3.3 Convolutional autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269


11.3.4 Sparse autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
11.3.5 Denoising autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
11.3.6 Variational autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
11.3.7 Contractive autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
11.4 An example of denoising autoencoders – bone suppression in chest radiographs 284
11.4.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
11.5 Implementation of autoencoders for chest X-ray images (pneumonia) . . . . . . . 290
11.5.1 Undercomplete autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
11.5.2 Sparse autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
11.5.3 Denoising autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
11.5.4 Variational autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
11.5.5 Contractive autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
11.6 Generative adversarial network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
11.6.1 GAN network architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
11.6.2 GAN network cost function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
11.6.3 Cost function optimization process in GAN . . . . . . . . . . . . . . . . . . . . . 310
11.6.4 General GAN training process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
11.7 Convolutional generative adversarial network . . . . . . . . . . . . . . . . . . . . . . . . . 314
11.7.1 Deconvolution layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
11.7.2 DCGAN network structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
11.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
CHAPTER 12 Recurrent neural networks: generating new molecules and protein sequence
classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
12.2 Types of recurrent neural network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
12.3 The problem, short-term memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
12.4 Bidirectional LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
12.5 Generating new molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
12.5.1 Simplified molecular-input line-entry system . . . . . . . . . . . . . . . . . . . 329
12.5.2 A generative model for molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
12.5.3 Generating new SMILES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
12.5.4 Analyzing the generative model’s output . . . . . . . . . . . . . . . . . . . . . . . 337
12.6 Protein sequence classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
12.6.1 Protein structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
12.6.2 Protein function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
12.6.3 Prediction of protein function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
12.6.4 LSTM with dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
12.6.5 LSTM with bidirectional and CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
12.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
CHAPTER 13 Application, challenge, and suggestion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
13.2 Legendary deep learning architectures, CNN, and RNN . . . . . . . . . . . . . . . . . 347
13.3 Deep learning applications in bioinformatics . . . . . . . . . . . . . . . . . . . . . . . . . 348


13.4 Biological networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
13.4.1 Learning tasks on graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
13.4.2 Graph neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
13.5 Perspectives, limitations, and suggestions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
13.6 DeepChem, a powerful library for bioinformatics . . . . . . . . . . . . . . . . . . . . . . 357
13.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Acknowledgments

This book is the product of the sincere cooperation of many people. The author would like to thank all those who contributed to the process of writing and publishing this book. Dr. Masoud Kargar, Dr. Masoud Aghdasifam, Hamed Babaei, Mahsa Famil, Esmaeil Roohparver, Mehdi Akbari, Mahsa Hashemzadeh, and Shabnam Farsiani read the whole draft and made numerous suggestions that improved the presentation quality of the book; I thank them for all their effort and encouragement.
I wish to express my sincere appreciation to the team at Elsevier, particularly Chris Katsaropoulos, Senior Acquisitions Editor, for his guidance, comprehensive explanations of the issues, prompt replies to my e-mails, and, of course, his patience. I would also like to thank Joshua Mearns and Nirmala Arumugam for preparing the production process and coordinating the web page, as well as the production team. Finally, I thank the anonymous reviewers for their excellent work in identifying what needed to be restated, clarified, rewritten, or complemented.

Habib Izadkhah

Preface

Artificial Intelligence, Machine Learning, Deep Learning, and Big Data have become the latest hot buzzwords, with deep learning and bioinformatics being two of the hottest areas of contemporary research. Deep learning, an emerging branch of machine learning, is well suited to big data analytics. Deep learning methods have been applied extensively to various fields of science and engineering, including computer vision, speech recognition, natural language processing, social network analysis, and bioinformatics, where they have produced results comparable to, and in some cases superior to, those of domain experts. A vital strength of deep learning is its ability to analyze and learn from massive amounts of data, making it a valuable method for big data analytics.
Bioinformatics research has entered the era of big data. As biological data continue to grow, deep learning is expected to become increasingly important in the field and to be applied to the vast majority of analysis problems. Mining the potential value in biological data is of great significance to researchers and to the health care domain. Deep learning, which is especially powerful at handling big data, shows outstanding performance in biological data processing.
To practice deep learning, you need a basic understanding of the Python ecosystem. Python is a versatile language that offers a large number of libraries and features helpful for Artificial Intelligence and Machine Learning in particular, and, of course, you do not need to learn all of these libraries and features to work with deep learning. In this book, I first give you the necessary Python background to study deep learning. Then, I introduce deep learning in a way that is easy to understand and use, and I explore how deep learning can be applied to several important problems in bioinformatics, including drug discovery, de novo molecular design, protein structure prediction, gene expression regulation, protein sequence classification, and biomedical image processing. Through real-world case studies and working examples, you will discover various methods and strategies for building deep neural networks using the Keras library. The book gives you practical information on the bioinformatics domain, including best practices. I believe that this book will provide valuable insights for a successful career and will serve as a starting point for graduate students, researchers, and applied bioinformaticians in industry and academia who want to use deep learning techniques in their biological and bioinformatics studies.
This book
• provides the necessary Python background for practicing deep learning,
• introduces deep learning in an accessible way,
• provides the most practical information available in the domain for building efficient deep learning models,
• presents how deep learning can be utilized to address several important problems in bioinformatics,
• explores the legendary deep learning architectures, including convolutional and recurrent neural networks, for bioinformatics,
• discusses deep learning challenges and suggestions.

Habib Izadkhah
CHAPTER 1 Why life science?
1.1 Introduction
There are many paths people can follow, depending on their technical inclinations and their interest in data. Due to the availability of massive data in recent years, biomedical studies have drawn a great deal of attention. The advent of modern medicine has transformed many fundamental aspects of human life, and over the past 20 years there have been innovations affecting the lives of many people. Not so long ago, HIV/AIDS was considered a fatal disease; the ongoing development of antiviral treatments has significantly increased the life expectancy of patients in developed countries. Other diseases, such as hepatitis C, which was not effectively treatable a decade ago, can now be treated. Genetic breakthroughs have brought high hopes for the treatment of different diseases, and innovations in diagnosis and the availability of precision tools enable physicians to diagnose and target a specific disease in the human body. Many of these breakthroughs have relied on computational methods and will continue to benefit from them.

1.2 Why deep learning?


Living in the golden era of machine learning, we are now experiencing a revolution driven by machine learning programs.
In today's world, machine learning algorithms are indispensable to processes ranging from prediction to financial services. As a matter of fact, machine learning is a modern human invention that has not only led to developments in industries and businesses but also left a significant footprint on the individual lives of humans. Scientists are developing algorithms that enable digital assistants (e.g., Amazon Echo and Google Home) to speak well. There have also been notable advances in psychologist robots.
Sentiment analysis is another modern application of machine learning; it is the process of determining a speaker's or an author's attitudes or beliefs. Machine learning developments have also enabled multilingual translation. Beyond daily life, machine learning has affected many areas of the physical sciences. Its algorithms are employed for different purposes, ranging from the identification of new galaxies in telescopic images to the classification of subatomic reactions in the Large Hadron Collider.
The development of a class of machine learning methods known as deep neural networks has contributed to these technological advances. Although the technological infrastructure of artificial neural networks was developed in the 1950s and modified in the 1980s, the real power of this technique was not fully appreciated until the recent decade, in which many breakthroughs have been achieved in computer hardware. While Chapters 3 and 4 give a more comprehensive review of neural networks, and deep neural networks (deep learning) are presented in the subsequent chapters of the book, it is important
to know about some of the breakthroughs achieved with deep learning first.
A common application of deep learning is image recognition. Using deep learning for facial recognition covers a wide range of applications, from security and cell phone unlocking to the automated tagging of individuals who appear in an image. Companies now seek to use this feature to enable purchases without the need for credit cards. For instance, have you noticed that Facebook has an extraordinary feature that lets you know when your friends appear in your photos? Facebook used to make you click on photos and type your friends' names to tag them; now, as soon as a photo is uploaded, Facebook does the magic and tags everybody for you. This technology is called facial recognition.
Deep learning can also be used to restore images or remove noise from them. This capability is employed in security applications, the identification of criminals, and the quality enhancement of family photos or medical images. Generating synthetic images is another capability of deep learning: deep learning algorithms can generate new images of people's faces, objects, and even sceneries that have never existed. These images are used in graphic design, video game development, and movie production.
Many of these same deep learning developments, having led to a plethora of consumer applications, are now employed in bioinformatics and biomedicine, for example to classify tumor cells into various categories. Given the scarcity of medical data, synthetic images can be produced to generate new data.
Deep learning has also driven many speech recognition developments, which have become pervasive in search engines, cell phones, computers, TV sets, and other online devices everywhere.
So far, various speech recognition technologies have been developed, such as Alexa, Cortana, Google Assistant, and Siri, changing how humans interact with devices, homes, cars, and jobs. Through speech recognition technology, it is possible to talk with computers and devices, which can understand what the speech means and respond to it. The introduction of voice-controlled digital assistants into the speech recognition market has changed the outlook of this technology in the 21st century.
By analyzing its user's behavior, a recommender system suggests the most appropriate items (e.g., data, information, and goods). Helping users find what they are looking for faster, such a system is an approach to dealing with the problems caused by the ever-growing amount of information. Many companies that operate extensive websites now employ recommender systems to facilitate their processes. Given the different preferences of users at different ages, there is no doubt that users select different products; thus, recommender systems should yield correspondingly varied results. Recommender systems have significant effects on the revenues of companies; employed correctly, they can bring high profitability. For instance, Netflix has announced that 60% of the DVDs rented by its users are selected through recommender systems, which can greatly affect user choices of films.
Recommender systems can also be employed to prescribe appropriate medicines for patients. In fact, prescribing the right medicines is among the most important parts of treatment, and accurate decisions must be made based on a patient's current condition, history, and symptoms. In many cases, patients may need more than one medicine, or new medicines for another condition in addition to a previous disease. Such cases increase the chance of medical error in the prescription of medicines and the incidence of side effects from medicine misuse.
These are only a few of the innovations achieved through the use of deep learning methods in bioinformatics. Ranging from medical diagnosis and tumor detection to the production and prescription of customized medicines based on a specific genome, deep learning has attracted many large pharmaceutical and medical companies. Many deep learning ideas used in bioinformatics are inspired by the conventional applications of deep learning.
We are living in an interesting era in which biological data are converging with extensive scientific methods for processing that kind of data. Those who can combine data with novel methods to learn from data patterns can achieve significant scientific breakthroughs.

1.3 Contemporary life science is about data


As discussed earlier, the fundamental nature of the life sciences has changed. The large-scale use of machine experiments has significantly increased the amount of experimental data produced. For instance, signal processing and 3D imaging in empirical molecular biology can produce a large amount of raw information. In the 1980s, a biologist would conduct an experiment and draw a conclusion. The experiment would lack a sufficient amount of data because of computational limitations, and the experimental data would not be made available to others due to the absence of extensive communication tools. Modern biology, however, benefits from mechanisms that can generate millions of experimental data points in one or two days. Furthermore, experiments such as gene sequencing, which can generate massive datasets, have become inexpensive and easy to access.
Advances in gene sequencing can produce databases that link a person's genetic code to a multitude of health-related outcomes, including diabetes, cancer, and genetic diseases such as cystic fibrosis. Employing computational techniques for the analysis and extraction of data, scientists are now uncovering the causes of these diseases in order to develop novel treatment methods.
Disciplines that once relied primarily on human observation now benefit from datasets that cannot easily be analyzed manually because of their massive dimensions. Machine learning is now routinely used for image classification, and the outputs of these models are employed to detect and classify cancerous tumors and to evaluate the effects of potential treatments for a disease.
Advances in empirical techniques have resulted in the development of several databases which
list the structures of chemicals and their effects on a wide range of processes or biological activities.
These structure–activity relationships (SARs) lay the foundations for a discipline that is known as
cheminformatics. Scientists use these large datasets to develop predictive models. Moreover, good and rapid decision-making in the field of medicine can lead to the identification of problems and the optimization of solutions.
The huge amount of data requires a new generation of scientists who are competent in both the scientific and the computational areas. Those who possess this combination of skills will have the potential to work on the structures and procedures of big datasets and to make scientific discoveries.
Bioinformatics is an interdisciplinary science that includes methods and software for understanding
biological information. Bioinformatics uses a combination of computer science, statistics, and mathe-
matics to analyze and interpret biological information. In other words, bioinformatics is used to analyze
biological problems using computer algorithms, mathematical and statistical techniques.

1.4 Deep learning and bioinformatics


Deep learning, with its successful experimental results and wide applications, has the potential to change the future of medical science. Today, the use of artificial intelligence has become increasingly common in fields such as cancer diagnosis. Deep learning also enables computer vision, imaging, and more accurate medical diagnosis. So it is no surprise that a report from ReportLinker states that the market for artificial intelligence in the medical industry is expected to grow from $1.2 billion in 2018 to $26 billion in 2025.
Deep learning: the future of medical science
As deep learning has become so popular in industry, the question arises as to how it will affect our lives in the next few years. In medicine, although large amounts of patient data have been stored over the past few years, deep learning has so far been used mainly to analyze image or text data. More recently, deep learning has also been used to predict a wide range of clinical problems and outcomes. Deep learning has a promising future in medicine.
Today's interest in deep learning in medicine stems from two factors: first, the widespread growth of deep learning techniques, and second, the dramatic increase in health care data.
Using deep learning in e-health records
Electronic health systems store patient data such as demographic information, medical records, and test results. These systems can use deep learning algorithms to improve diagnostic accuracy and to reduce the time required to diagnose a disease. The algorithms use the data stored in electronic health systems to identify patterns in health trends and risk factors, and they draw conclusions based on the identified patterns. Researchers can also use data from e-health systems to create deep learning models that predict the likelihood of certain health-related outcomes.

1.5 What will you learn?


Let us briefly review what you will learn in this book:
Chapter 2 provides a brief introduction to machine learning. I begin with a definition of Artificial Intelligence from the Oxford Dictionary. Then, I provide a figure that shows the relationship between Artificial Intelligence, Machine Learning, and Deep Learning, and state the difference between traditional programming and machine learning methods. In Chapter 2, I also discuss the model: machine learning aims to automatically create a "model" from "data", which you can then use to make decisions. Machine learning typically proceeds by initially splitting a dataset into a training set, used to generate a model, and a test set, used to evaluate the model's performance. Chapter 2 also discusses generalization, which refers to a machine learning model's ability to perform well on unseen data rather than just the data it was trained on. The concept of generalizability gives rise to two further terms, underfitting and overfitting; if your model is overfitted, it will not generalize well. I describe these problems using an example and then summarize the meanings of the two concepts. To deal with the overfitting problem, there are in general two approaches, namely regularization and cross-validation, which I discuss; a small cross-validation sketch follows below.
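As a taste of what Chapter 2 covers, the following is a minimal, illustrative sketch of k-fold cross-validation with scikit-learn; the synthetic dataset and the logistic-regression model are stand-ins of my own, not examples taken from the book.

# A minimal cross-validation sketch (illustrative model and data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 200 samples, 8 features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on four folds, score on the held-out fold.
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy: %.3f" % scores.mean())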
There are different ways in which machines learn. In some cases, we train them (called supervised learning) and, in some other cases, machines learn on their own (called unsupervised learning). In Chapter 2, I discuss the three ways that a machine can learn, which are supervised learning, unsupervised
learning, and reinforcement learning.
To work with deep learning, you need to be familiar with a number of mathematical and statistical concepts. In Chapter 2, I outline some of the important concepts you will be working with, e.g., tensors (a small sketch follows below). Chapter 2 introduces the Keras library, with which we will implement deep learning projects, and ends by introducing several real-world tensors.
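To give a flavor of the tensor terminology used throughout the book, here is a small sketch using NumPy arrays; the shapes are illustrative choices of mine.

import numpy as np

scalar = np.array(7)                 # rank-0 tensor: a single number
vector = np.array([1.0, 2.0, 3.0])   # rank-1 tensor: shape (3,)
matrix = np.array([[1, 2], [3, 4]])  # rank-2 tensor: shape (2, 2)
batch = np.zeros((32, 28, 28))       # rank-3 tensor: e.g., 32 grayscale images

for t in (scalar, vector, matrix, batch):
    print(t.ndim, t.shape)           # number of axes and size of each axis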
Chapter 3 provides a brief introduction to the Python ecosystem. If you would like to make a career in the domain of deep learning, you need to know Python programming along with the Python ecosystem. According to GitHub, Python is the most popular programming language used for machine learning projects hosted on its service. To build effective deep learning models, you need some basic understanding of the Python ecosystem, e.g., the NumPy, Pandas, Matplotlib, and Scikit-learn libraries. This chapter introduces various Python libraries and examples that are very useful for developing deep learning applications.
The chapter begins by introducing four computing environments in which you can write Python programs with little or no local setup: IPython, the Jupyter notebook, Colaboratory, and Kaggle. Chapter 3 then gives general descriptions of the SciPy (Scientific Python) ecosystem and the Scikit-learn library. The chapter also covers the basic Python syntax you should be familiar with to understand the code and write a typical program: identifiers, comments, data types, control flow statements, data structures, and functions, each explained with examples.
NumPy is a core Python library that is widely used in deep learning applications. It supports multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them. In Chapter 3, I provide several examples of this library that you will need in deep learning applications. After the overview of NumPy, I discuss the Matplotlib library, a plotting library used for creating plots and charts. An easy way to load data is to use the Pandas library, which is built on top of NumPy. In Chapter 3, you learn how to use this library to load data. In Python, there exist several ways to load a CSV data file for use in deep learning algorithms; in Chapter 3 you will learn two frequently used ones: (1) loading CSV files with NumPy and (2) loading CSV files with Pandas. Reviewing the shape of the dataset, for example to see how much data we have in terms of rows and columns, is one of the most frequent data manipulation operations in deep learning applications, and Chapter 3 provides examples of this too. After that, I explain how you can use the Pearson correlation coefficient to determine the correlation between features; a sketch follows below.
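The following sketch shows both loading styles and a correlation check; the file name pima-indians-diabetes.csv is an assumption about how you have saved the dataset locally, and the file is assumed to be all-numeric with no header row.

import numpy as np
import pandas as pd

# (1) NumPy: returns a plain 2-D array of floats.
data = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")  # assumed local path
print(data.shape)  # (rows, columns)

# (2) Pandas: returns a labeled DataFrame.
df = pd.read_csv("pima-indians-diabetes.csv", header=None)
print(df.shape)

# Pairwise Pearson correlations between all features.
print(df.corr(method="pearson"))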
In Chapter 3, I explain histograms, box-and-whisker plots, and the correlation matrix plot, three techniques that you can use to understand each feature of your dataset independently. Deep learning algorithms use numerical features to learn from the data. However, when the features have different scales, such as "Age" in years and "Income" in hundreds of dollars, the features with larger scales can unduly influence the model. As a result, we want the features to be on a similar scale, which can be achieved through scaling techniques. In this chapter, you learn how to rescale and standardize data using Scikit-learn, as sketched below.
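As a quick preview, here is a minimal scaling sketch; the toy "Age, Income" matrix is an invented example of mine.

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[25.0, 50000.0],
              [40.0, 120000.0],
              [60.0, 30000.0]])  # toy "Age, Income" feature matrix

# Scale each feature to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# Standardize each feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)
print(X_minmax)
print(X_std)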
Bioinformatics datasets are often high-dimensional, so Chapter 3 introduces several feature selection methods. Feature selection is a key concept in machine learning: it selects the subset of features that contribute the most to the output, and it thus hugely impacts the performance of the constructed model. Chapter 3 ends by introducing the train_test_split() function, which allows you to split a dataset into training and test sets; both steps are sketched below.
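Here is a minimal sketch of both steps on synthetic stand-in data; the choice of k=5 features and a 20% test split are illustrative values of mine, not the book's settings.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Keep the 5 features with the strongest univariate relationship to y.
X_best = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Hold out 20% of the rows for final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X_best, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)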
Chapter 4 presents the basic structure of neural networks. In this chapter, I discuss the types of neural network and provide an example of how to train a single-layer neural network. Chapter 4 discusses gradient descent, which is used to update the network's weights; a toy example follows below. To this end, three gradient descent methods, namely stochastic gradient descent, batch gradient descent, and mini-batch gradient descent, are discussed. Chapter 4 ends with a discussion of the limitations of single-layer neural networks.
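To make the update rule concrete, here is a toy example of mine, not taken from the book: plain gradient descent fitting a single weight by least squares, using the rule w <- w - lr * gradient.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x        # data generated with true weight 2
w, lr = 0.0, 0.1   # initial weight and learning rate

for epoch in range(50):
    grad = np.mean(2 * (w * x - y) * x)  # derivative of the mean squared error
    w -= lr * grad                       # gradient descent update
print(round(w, 4))  # converges toward 2.0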
Training multilayer neural networks is discussed in Chapter 5. In this chapter, the backpropagation algorithm, an effective algorithm used to train neural networks, is introduced. After that, I explain how you can design a neural network in Keras. The MNIST dataset is often considered the "hello world" of deep learning: the task is to classify handwritten digits based on their appearance, assigning each input to the most similar group in order to identify the corresponding digit. In this chapter, I implement a handwritten-digit classification problem with dense layers in Keras (a minimal sketch follows below). Working through this problem, you can learn the components of neural networks without going into technical details. Chapter 5 ends with a discussion of two more general data preprocessing techniques, namely vectorization and value normalization. After studying this chapter, you will be able to design a deep learning network with dense layers.
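As a preview, the following is a minimal sketch of such a dense MNIST classifier; the layer sizes, epochs, and batch size are illustrative choices of mine rather than the book's exact values.

from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0  # vectorize + normalize
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # one output per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))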
Chapter 6 discusses the classification problem. Classification is a very important task in bioinformatics and refers to a predictive modeling problem where a class label is predicted for given input data. The Pima Indians Diabetes Database is employed to predict the onset of diabetes based on diagnostic measures. This dataset contains 768 observations with 8 input variables (i.e., features) and one output variable (diabetic or nondiabetic). In this chapter, using the Pima dataset, I implement a binary classifier in Keras to classify people into diabetic and nondiabetic categories. Neural networks expect numerical input values, so nonnumerical data must be converted to numerical form before it is fed to the network. Label encoding is a popular process for converting labels, i.e., categorical text values, into numeric values in order to make them understandable to machines. Chapter 6 explains how you can do this, and it also discusses multiclass classification; a minimal sketch combining label encoding with a binary classifier follows below.
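Here is one possible minimal version; the file name and the network's layer sizes are assumptions of mine, and LabelEncoder is only needed if the labels are stored as text.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from tensorflow import keras

df = pd.read_csv("pima-indians-diabetes.csv", header=None)  # assumed local path
X, y = df.iloc[:, :8].values, df.iloc[:, 8].values

# Encode labels (e.g., "Diabetic"/"Nondiabetic") as integers 0/1.
y = LabelEncoder().fit_transform(y)

model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(12, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of diabetes
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=16, verbose=0)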
Chapter 7 provides an overview of deep learning. Deep learning is a type of machine learning that has improved the ability to classify, recognize, detect, and generate, or, in one word, understand. Chapter 7 helps you understand why deep learning was introduced much later than single-layer neural networks and what challenges it faces. This chapter discusses the most important challenge of deep learning, namely overfitting, and how to deal with it. It also shows you how to build a deep neural network with two examples from the bioinformatics field, namely breast cancer classification and molecular classification of cancer by gene expression, using the Keras library.
In deep learning, the main problem is overfitting. The best way to reduce overfitting is to get more training data. When no further training data can be accessed, the next best solution is to limit the amount of information your model can store or is allowed to store; this is called regularization. In Chapter 7, I describe three techniques for dealing with overfitting, namely reducing the network's size, dropout, and weight regularization. Another important concept discussed in this chapter is how to deal with imbalanced datasets. A dataset is said to be imbalanced when there is a significant difference between the number of instances in one set of classes, called the majority classes, and another set of classes, called the minority classes. On imbalanced datasets, neural networks may not function well. To deal with this problem, the chapter introduces the RandomOverSampler class from the imbalanced-learn library; a sketch follows below.
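The sketch below combines both ideas on synthetic imbalanced data; note that RandomOverSampler lives in the separate imbalanced-learn package (imblearn), and the network shape is an illustrative choice of mine.

from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from tensorflow import keras

# Synthetic imbalanced data: roughly 90% majority class, 10% minority class.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Duplicate minority-class rows until the classes are balanced.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)

model = keras.Sequential([
    keras.layers.Input(shape=(X_res.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.5),  # randomly silence half the units while training
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_res, y_res, epochs=20, batch_size=32, verbose=0)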
Chapter 8 introduces the convolutional neural network, a deep neural network with important image processing applications. Such networks significantly improve the processing of information (images) through deep layers. In Chapter 8, I briefly explain the basic parts of a convolutional architecture. How convolution works can hardly be described in words alone; however, the concept and the steps for calculating it are simpler than they first seem, and I show how convolution works using a simple example. This chapter also discusses the pooling layer, which is used to reduce the image's size by summarizing neighboring pixels with a single value; in fact, it is a downsampling operation. In this chapter, I implement three medical image processing problems in Keras, namely predicting coronavirus disease (COVID-19), predicting breast cancer, and detecting diabetic retinopathy. After studying these three problems, you will have learned many practical concepts and techniques in image processing; a minimal CNN skeleton follows below.
Chapter 9 provides an overview of popular deep learning image classifiers. In this chapter, I analyze
eight well-known image classification architectures that have been ranked first in the ILSVRC compe-
tition in different years, along with their Keras codes. After studying this chapter, you will be able to
design high-precision convolutional networks for a problem of interest.
In Chapter 10, I discuss electrocardiogram (ECG) arrhythmia classification. Arrhythmia refers to
any irregular change from normal heart rhythms. This chapter helps you understand how to classify
ECG signals into normal and different types of arrhythmia using a convolutional neural network (CNN).
This chapter provides a Keras code to do this.
Chapter 11 discusses autoencoders and generative models and how to implement them. The net-
works discussed in this chapter, although seemingly identical to the previous chapters, use different
concepts called encoding and decoding. These concepts were not present in previous chapters. The au-
toencoders and generative models are a newly emerging field in deep learning, showing a lot of success
and receiving increasing attention in the deep learning area in the past couple of years. In this chapter,
I will discuss different types of deep generative model and focus on autoencoders’ variations, teaching
how to implement and train autoencoders and deep generators using Keras.
A large part of data, such as speech, protein sequences, sensor readings, videos, and text, is inherently sequential (serial). Sequential data are data whose current value depends on previous values. Recurrent neural networks (simple RNNs) are a good way to process sequential data because they take the sequence dependence into account in their calculations, but their capability to handle long sequences is limited. The long short-term memory network, LSTM for short, is a type of recurrent neural network used to handle long sequences. In Chapter 12, I discuss RNNs and LSTMs, as well as two important topics in bioinformatics, namely protein sequence classification and the design of new molecules.
Chapter 13 presents several deep learning applications in bioinformatics, then discusses several deep
learning challenges and ways we can overcome them.
CHAPTER 2
A review of machine learning
2.1 Introduction
“Machine learning can’t get something from nothing ... what it does is get more from less.” – Dr. Pedro Domingos, University of Washington

Before moving on to the meaning of machine learning, let us find out what Artificial Intelligence (AI) is. According to Webster’s dictionary, intelligence is the ability to learn and solve problems; in other words, intelligence is the skill to obtain and apply knowledge. Knowledge is information gained through experience and/or training. Artificial refers to something that is simulated or made by humans, not by nature.
Now we are ready to define AI. There is no unique definition of AI. The Oxford Dictionary defines AI as “the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.” AI, therefore, seeks to give a machine the abilities that the human mind contains.

2.2 What is machine learning?


Machine Learning (ML), as a branch of AI, is the study of algorithms and statistical inference by which a machine can learn on its own without being explicitly programmed, building on patterns and inference instead. Here is a basic definition of ML: machine learning is a data analysis method that learns from data and then employs what it has learned to make well-informed decisions. Many people think that the terms machine learning, deep learning, and artificial intelligence are the same, and they use these words interchangeably. These terms overlap and can easily be confused, but in computer science they are related, not identical. Fig. 2.1 depicts the relationships among these three terms. AI is an umbrella term often used to describe systems that make automatic decisions on their own. ML is a way of achieving AI, which means that by the use of machine learning we may be able to achieve AI in the future. While AI is the broad field of study that mimics human intelligence, machine learning is a specific branch of AI that trains a machine how to learn. Therefore, AI encompasses everything from machine learning to deep learning: deep learning is a subset of machine learning, and machine learning is a subset of AI.
Machine learning provides the system with the capability of automatically learning from historical
data without using explicit instructions. Fig. 2.2 shows the difference between traditional programming
and machine learning methods.

FIGURE 2.1
The relationship between artificial intelligence, machine learning, and deep learning.

FIGURE 2.2
Traditional programming (left) vs machine learning (right).

In machine learning, we can generate a program (also known as a learned model) by providing the inputs and the desired outputs of that program.

Machine learning is very popular now and is often synonymous with artificial intelligence.
In general, one cannot understand the concept of artificial intelligence without knowing how
machine learning works.

Let x and y be two vectors. In most machine learning problems, the aim is to create a
mathematical function as follows:
y = f (x).
This function may take many vectors as input, perhaps thousands or even millions, and may generate
many numbers as output. Here are some examples of functions you may want to create:
• x contains the health characteristics of a large number of people, e.g., Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, and Age, and f (x) should equal 1 if a person has diabetes and 0 otherwise.
• x is the structure of a protein (i.e., a sequence of amino acids) and f (x) must determine the function of the protein; depending on the dataset used, there can be many functions.
• x contains a number of color images and f (x) should equal 1 if the image shows breast cancer and 0 otherwise.
• x contains a number of chest radiograph (chest X-ray) images; f (x) should be a vector of numbers. The first element indicates whether the image contains a pleural thickening, the second whether it contains cardiomegaly, the third whether it contains a nodule, and so on for many types of objects.
As you can see, f (x) can be a very, very complex function! It usually takes a lot of inputs and tries
to extract patterns from them that cannot be extracted manually just by looking at the input numbers.
In machine learning, f (x) is called the model.
In machine learning, we basically try to build a model from the dataset, which is referred to as the “learned model,” to predict new and unseen data. This short description has implications that may not be obvious at first glance, so let me elaborate on it, just a few words first. Machine learning aims to automatically create a “model” from “data,” which you can then use to make decisions. In this context, data means information such as genes, proteins, images, documents, etc.
Before going further toward the model, let me step aside from it a bit. As you may have noticed, the definition of machine learning only describes the concepts of data and model and does not say anything about “learning.” The term machine learning itself describes the process of finding a model by analyzing data, without a human having to do it. Because this process, i.e., finding a model, is driven by data, we call it the “learning process.” Therefore, the data used for building
the “model” is called the training set. The first thing you need, of course, is a training set to train the
model. Fig. 2.3 depicts the overall process of machine learning.
I need to point out that the dataset used for a problem is initially split into two sets: training set and
test set. As mentioned earlier, the samples in the training set are used to train the model and the samples
in the test set are used to evaluate the performance of the resulting model. Fig. 2.4 shows this division.
After testing the model, if it is observed that the model is performing well enough, it can be used in the
real environment for new data.
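As a concrete illustration, here is a minimal sketch of this split using scikit-learn’s train_test_split function; the array sizes and the 80/20 ratio are illustrative assumptions, not requirements:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 8)        # 100 samples with 8 features each (made-up data)
y = np.random.randint(0, 2, 100)  # made-up binary labels

# Hold out 20% of the samples as the test set; the rest form the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (80, 8) (20, 8)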
The model is our main interest in this section, so let us now resume that discussion. In machine learn-
ing, the model is the final product we are looking for and this is what we actually use. The resulting

FIGURE 2.3
The overall process of machine learning.

FIGURE 2.4
Splitting data into two parts of training and test sets.

model can be a mathematical representation of a real-world process. For example, if we are developing
a prediction system to identify the risk of breast cancer at earlier stages of the disease, the prediction
system is the model that we are talking about. If the training data used in the learning process are com-
prehensive, the model constructed works as well as the experts themselves. Machine learning has two
steps of training and inference:
• Training refers to the process of creating a model,
• Inference refers to the process of using a trained model to make a prediction.

FIGURE 2.5
Using the model for prediction.

In machine learning, the output of the training process is a model, which we can then apply to real-world data. This process is depicted in Fig. 2.5. The training data used to create a model and the new data encountered in the real environment are often different.

2.3 Challenge with machine learning


I have just explained that machine learning is a data analysis method that automates model building to recognize patterns (rules) and make decisions with minimal human interference. This method is usually employed to perform tasks that normally require human intelligence, such as image recognition, where it is infeasible or difficult to design a conventional algorithm that performs the task effectively. Machine learning can solve this problem, but it creates issues of its own. The following is the fundamental issue that machine learning faces.
The difference between the data that the model was trained on and new, unseen data is the structural challenge that machine learning faces. It would not be an exaggeration to say that every problem in machine learning arises from this. For instance, suppose that we trained a model using only a few medical images of a particular disease. Will the model successfully recognize new medical images? The likelihood is very low.
Machine learning needs a comprehensive training set in order to work properly. No machine learning
algorithm can achieve the desired aim with small-sized or poor training data. Generalization is a term
used to express a model’s capability to cope with new data. Generalization usually refers to a machine

learning model’s ability to perform well on unseen data rather than just the data that it was trained on.
The ability of a model to generalize is crucial to the success of machine learning (learned model).

2.4 Overfitting and underfitting


One of the important considerations in machine learning is how to generalize the learned model to new data. Because the data that is collected is typically small, incomplete, noisy, and full of missing values, a constructed model must be generalizable.
Generalization refers to the fact that the concepts learned by a machine learning model can be well
generalized to the new examples encountered. So the concept of generalizability refers to the model’s
ability to make output (make a prediction) from new data that it has not yet seen.
Due to the concept of generalizability in machine learning, two other terms emerge, called Under-
fitting and Overfitting. Both of these concepts reflect the poor performance of the learner algorithm
in machine learning. Let us start with an example. Suppose you are studying for a final exam, and the teacher has given you 100 sample questions to prepare with. If you study in such a way that you can answer only these 100 sample questions perfectly, but answer incorrectly any other question that differs even slightly from them, it means that your mind has been overfit to the sample questions the teacher gave you.
The meanings of these two concepts are summarized as follows:
Overfitting occurs when learning is done well on the training data, but performance on unseen data is poor. In other words, the constructed model cannot be generalized. In summary,

Overfitting = Good Learning + Not Generalized

Overfitting is due to the model learning “too much” from the training data. When we simplify the
model to reduce the risk of overfitting, we call this process regularization.
Underfitting occurs when learning is not good even on the training data, and the model also performs poorly on other datasets. Underfitting is due to the model having “not learned enough” from the training data, yielding low generalization and inaccurate predictions.
In summary,
• In overfitting, the model’s accuracy is high on the training data and on data similar to it.
• In overfitting, the model’s accuracy on new, never-seen data is low.
• Overfitting occurs when the model is highly dependent on the training data and therefore cannot be generalized to new data.
• Overfitting occurs when a model learns the details and noise in the training data to the extent that they negatively affect the model’s performance on new data.
• Overfitting occurs when the model merely memorizes the training data instead of learning the scope of the problem and finding the relationship between the independent and dependent variables; this is what we mean by being very dependent on the training data.
• Underfitting occurs when the model is not sufficiently trained on the training data at the time of learning.

2.4.1 Mitigating overfitting


Overfitting negatively impacts the machine learning performance on unseen data. We can figure out
who is a professional and who is an amateur by looking at their strategies in dealing with overfitting.
To deal with the overfitting problem, in general, there are two ways, namely regularization and cross-
validation.
Regularization techniques help confront overfitting in machine learning aiming to build a model
as simple as possible. In simple terms, they reduce parameters and simplify the model. The resulting
simplified model:
1. Can reduce overfitting,
2. Usually converges faster to a local minimum (though the minima reached may be less optimal),
3. Is less likely to learn noise in the data, which may improve the model’s generalization capabilities.
We always need to determine whether a model could be generalized to new, unseen data, in other
words, whether the trained model is overfitted or not. Cross-validation is another method to deal with
overfitting. Validation is a very useful method to evaluate the effectiveness of your model, particularly
in cases where you need to reduce overfitting.

2.4.2 Adjusting parameters using cross-validation


In deep learning, we need to estimate the model parameters. If the number of parameters is large, the model becomes more complex and the estimation may not be easy to perform. On the other hand, increasing the number of parameters may reduce the efficiency of the model on new data. As mentioned earlier, such a problem is known as “overfitting.” A solution to this problem is to use “cross-validation,” in which the goal is to determine the appropriate number of parameters for the model. This method is sometimes called “rotation estimation” or “out-of-sample testing.” In such a case, the parameters estimated by cross-validation are called “out-of-sample estimates.”
To measure the performance of a model, two methods are usually used: (1) evaluation based on the
assumptions on which the model should work; and (2) evaluation based on the efficiency of the model
in predicting new values (not observed).
In Method 1, the evaluation of the model relies on the data (samples) observed and used to build the
model. For example, we expect the constructed model to have the least sum of squares of error com-
pared to any other model. It is clear that this method is possible based on the data on which the model is
based, but the performance of the model cannot be measured for new data that was not observed during
modeling. Method 2, which is called cross-validation, relies on data that is observed but not used when
building the model. This data is used to evaluate and measure the performance of the model to predict
new data.
Thus, to measure the efficiency of the model and its optimality, we resort to estimating the model
error based on the data that have been set aside for cross-validation. Estimating this error is commonly
referred to as an “out-of-sample error.” In the following, I describe cross-validation as a tool to measure
this error and examine different ways of implementing it.
Assume that observations from the population are available as a random sample that is to be used in modeling. The goal in cross-validation is to achieve a model whose number of parameters is optimal, that is, a model that does not overfit. To achieve this goal in machine learning, the data is usually divided into two parts: training data and test data.

FIGURE 2.6
Splitting the training data into two sets, namely training and validation sets. The validation set must not share any
samples with either the training set or the test set.

According to this separation, modeling will be based only on the training part of the data. But in the cross-validation method, hereinafter referred to as CV, the training set used to create the model is repeatedly split into two parts. Each time the CV process is repeated, part of the data is used to train the model and part to validate it. Thus, this process is a sampling method for estimating the model error. Fig. 2.6 illustrates the splitting of training data into two sets, training and validation sets.
The ratio of these parts is also debatable, which I will not discuss here, but usually 50% of the total
data is for training purposes, 25% for cross-validation, and the rest of the data for model testing.
It should be noted that the test data in the CV process may be used as training data in the next
iteration, so their nature is different from the data previously introduced as test data.
At each stage of the CV process, the model trained by applying the training samples is used to
predict the other part of CV data, and the “error” or “accuracy” of the model is calculated on the samples
that were not used to train the model. The average of these errors (accuracy) is usually considered as
the overall error (accuracy) of the model. Of course, it is better to report the standard deviation of
errors (accuracy). Thus, according to the number of different parameters (model complexity), different
models can be produced and their estimation error can be measured using the CV method. At the end,
we will choose a model as the most appropriate if it has the lowest error estimate.

2.4.3 Cross-validation methods


Based on the method of selecting the validation set, different CV methods have been introduced. In the
following, I discuss some of them.
Holdout method. In this method, the data is randomly divided into two parts, training and validation.
Model parameters are estimated by using training data and model error is calculated based on validation
data.
The simplicity of calculations and nonrepetition of the CV process in this method are its advantages.
This method seems appropriate if the training and validation data is homogeneous. However, since the
model error calculations are based on only one part, a suitable estimate for the model error may not be
provided.
2.5 Types of machine learning 17

Leave-One-Out method. In this method, one observation is removed from the training set, and the parameters are estimated based on the remaining observations. The model error is then calculated for the removed observation. Since only one observation is removed at each stage of the CV process, the number of iterations of the CV process is equal to the number of training samples. Each step is therefore simple to implement, although the full procedure can become costly for large training sets. This method is sometimes called LOO for short.
Leave-P-Out method. If in the LOO method the number of observations removed from the training set is equal to p, the method is called Leave-P-Out, or LPO for short. As a result, if n denotes the number of observations in the training set, the number of steps in the CV process will be the binomial coefficient C(n, p). Thus, at each stage of the process, p observations are removed from the training data and the model is estimated based on the remaining observations. The model error is then calculated for the removed observations. Finally, by averaging the obtained errors, the model error is estimated.
K-Fold method. If we randomly split the training samples into k subfolders or “folds” of the same size,
at each stage of the CV process, we can consider k − 1 of these subfolders as the training set and one
as the validation set. Fig. 2.7 illustrates the splitting of the training data into k folds. It is clear that by
selecting k = 5, the number of iterations of the CV process will be equal to 5 and it will be possible to
achieve the appropriate model quickly. This method is the gold standard for evaluating the performance of a machine learning algorithm.
Choosing the right number of folds is an important consideration in this approach. Each fold must contain enough data to provide a good estimate of the model’s performance; on the other hand, the number of folds should not be too small, so that there are enough folds to evaluate the model’s performance reliably.
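To make the procedure concrete, here is a minimal 5-fold sketch using scikit-learn; the classifier and the synthetic data are stand-ins chosen only for illustration:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 8)        # hypothetical feature matrix
y = np.random.randint(0, 2, 100)  # hypothetical binary labels

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])                # train on k - 1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))   # validate on the held-out fold

# The mean (and standard deviation) of the fold scores estimates model performance.
print(np.mean(scores), np.std(scores))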
Validation based on random sampling. In this method, sometimes known as Monte Carlo cross-validation, the dataset is randomly divided into training and validation sets. The model parameters are then estimated based on the training data, and the error or accuracy of the model is calculated using the validation data. By repeating the random separation of the data, the mean error or accuracy of the models is taken as the criterion for selecting the appropriate model (least error or highest accuracy). Due to the random selection of data, the ratio of training to validation set size does not depend on the number of iterations, and unlike the k-fold method, the CV process can be performed with any number of iterations. On the other hand, because the subsamples are selected randomly, some observations may never be used in the validation set while others may be used more than once in the model error estimates.
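scikit-learn implements this random-sampling scheme as ShuffleSplit; the following brief sketch assumes synthetic data as in the k-fold sketch, with an illustrative 10 iterations and 25% validation ratio:

import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 8)
y = np.random.randint(0, 2, 100)

# 10 random splits, each holding out 25% of the data for validation.
ss = ShuffleSplit(n_splits=10, test_size=0.25, random_state=42)
scores = []
for train_idx, val_idx in ss.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

print(np.mean(scores))  # mean accuracy over the random splits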

2.5 Types of machine learning


Generally, machines learn in one of two broad ways: in some cases we train them (called Supervised Learning), and in other cases machines learn on their own (called Unsupervised Learning). More precisely, there are three ways for a machine to learn (Fig. 2.8): Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Let us see how a machine learns in detail.

FIGURE 2.7
K-fold validation process.

2.5.1 Supervised learning


The supervised approach is similar to a student learning under the supervision of a teacher. The teacher teaches the student by solving examples, and the student then derives general rules from these examples and thus becomes able to solve new examples that he has not seen before. In supervised learning, we have a training dataset consisting of samples for which we know the truth, i.e., the correct output for each sample, and we train a model by telling it the truth through these samples. The training dataset takes the shape of the following pairs:

{input, correct output}


FIGURE 2.8
Three core types of machine learning techniques differing in their approach.

Table 2.1 The shape of training dataset.


Input Correct output
Input #1 correct output for Input #1
... ...
Input #n correct output for Input #n

Table 2.1 shows the shape of the training dataset in detail. In this table, you can see that the correct
output is provided for each input. Another name for the “correct output” is “class” or “label.”
Now that you know the meaning of the supervised process, let us look at how a supervised algorithm
works:
Step 1. Data preparation – the very first step, conducted before training a model in the supervised process, is to load labeled data into the system. This step usually takes the most time, as data preparation includes data labeling and some preprocessing operations, such as removing invalid data. Most of the tasks at this stage are performed by a human. At the end of this step, the prepared dataset is divided into training and test sets.
Step 2. Training process – the goal of this step is to find a relationship between input and output with
acceptable accuracy. Machine learning algorithms are used to find such a relationship. The output of
this step is a model made for the problem.
Step 3. Testing process – the model built in the second step will be tested on new data in this step to
determine its performance in the face of new and unseen data.
Step 4. Prediction – when the model is ready after training and testing, it can start making a prediction
or decision when new data is given to it.
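These four steps map directly onto a few lines of code; the following hedged sketch uses scikit-learn with synthetic stand-in data (the classifier choice is illustrative):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Step 1: prepare labeled data (synthetic placeholders here).
X = np.random.rand(200, 8)
y = np.random.randint(0, 2, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Step 2: training process.
model = LogisticRegression().fit(X_train, y_train)

# Step 3: testing process.
print(model.score(X_test, y_test))

# Step 4: prediction on new, unseen data.
new_sample = np.random.rand(1, 8)
print(model.predict(new_sample))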
There are two main supervised learning techniques: Regression and Classification. Table 2.2 shows a summary of what each of them does.
A classification algorithm classifies the input data (new observation) into one of several predefined
classes. It learns from the available dataset and then uses this learning to classify new observations.
There are two types of classification, which are binary and nonbinary classification. The classification

Table 2.2 Two supervised machine learning algorithms.


Classification Classifying something into classes, and predicting unseen data from created model
Regression Finding the relationship between variables

Table 2.3 Structure of training data.


Input Class
Feature 1 Feature 2 ... Feature n correct output
Input #1 Value 1 Value 2 ... Value n correct output for Input 1
Input #2 Value 1 Value 2 ... Value n correct output for Input 2

of humans into two groups, those with diabetes and those without, is an example of binary classification. Protein family classification is an example of nonbinary classification; in this problem, the proteins are classified into classes that share a similar function.
The structure of the training data of the classification problem, i.e., input and correct output pairs,
looks like in Table 2.3.

WHAT IS A FEATURE?
A feature in machine learning is any column value in the dataset that describes a piece of data.
For example, in the diagnosis of diabetes in a human, Pregnancies, Glucose, Blood Pressure,
etc., are examples of features. Note that we use features as independent variables.

Regression is another useful supervised machine learning technique, used to find a relationship between variables (features). It attempts to predict the output value when an input value is given. In contrast to classification, regression does not determine a class; instead, it involves predicting a numerical value.

2.5.2 Unsupervised learning


In unsupervised learning, no labeled training data is available; the dataset contains only inputs without correct outputs. Instead, the algorithm automatically identifies the patterns and relationships within the dataset and creates a structure from the data itself. This machine learning technique is employed when we do not know how to classify the given data but need to do so. Now, let
us use an example to see how unsupervised machine learning works.
Suppose we provide images of cucumbers, peaches, and bananas to the model; the machine learning algorithm then creates classes based on patterns and relationships, assigning the fruits to those classes. Now, if new data enters the model, the model adds it to one of the created classes.
There are two primary categories in unsupervised machine learning, clustering and dimensionality reduction. Table 2.4 shows a summary of what each of them does.
Just for reference, clustering is the process of dividing data points into several clusters so that the data points in the same cluster are more similar to each other than to the data points in other clusters. It is often unclear how classification and clustering differ from each other, as their processes look similar. Despite this similarity, the two methods take completely different approaches: clustering helps you to

Table 2.4 Two unsupervised machine learning algorithms.


Clustering Partitioning the given data into clusters
Dimensionality Reduction Reducing features to related and meaningful features aiming to improve accuracy

FIGURE 2.9
Several supervised and unsupervised algorithms.

find all kinds of unknown patterns in data. Something to bear in mind is that clustering and classification
are distinct terms. Some clustering approaches are:
• Partitioning methods
• Hierarchical clustering
• Fuzzy clustering
• Density-based clustering
• Model-based clustering
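As a small illustration of the clustering idea, the sketch below applies k-means (one instance of the partitioning methods listed above) to synthetic, unlabeled points:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(90, 2)  # 90 unlabeled 2D points (made up)

# Partition the points into 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])      # cluster assignment of the first 10 points
print(kmeans.cluster_centers_)  # coordinates of the 3 cluster centers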
Why reduce dimensionality? Among the reasons are time and space complexity, the desire to reduce the cost of collecting and viewing additional, unnecessary data, and better visualization when the data is 2D or 3D. Fig. 2.9 depicts several of the most used supervised and unsupervised algorithms.

2.5.3 Reinforcement learning


In recent years, Reinforcement Learning (RL) has achieved many successes in various fields, but there are also situations in which applying it is difficult. Reinforcement learning describes

a set of learning problems in which an “agent” must perform “actions” in an “environment” in order to
maximize the defined “reward function.”
Unlike in supervised learning, in reinforcement learning there is no labeled data or, in fact, no correct input and output pairs. Thus, a large part of learning takes place “online”: the agent actively interacts with its environment over many repetitions and gradually learns the “policy” that tells it what to do to maximize the “reward.”
Reinforcement learning has different goals compared to unsupervised learning. While the goal in unsupervised learning is to explore the distribution of the data in order to learn more about it, reinforcement learning aims to discover the right model of the data that maximizes the “total cumulative reward” for the agent.
Q-learning and SARSA (State–Action–Reward–State–Action) are two popular model-free algorithms for reinforcement learning. The difference between these algorithms lies in their search strategies.

2.6 The math behind deep learning


Let us review some of the basic mathematical concepts you need to know to practice deep learning. Let
us take a look at how the data is displayed.

2.6.1 Tensors
Tensor may be a new word to you. A tensor generalizes vectors and matrices to an arbitrary number of axes; intuitively, you can picture a higher-dimensional tensor as a matrix in which each cell holds multiple numbers instead of one. Typically, deep learning uses tensors as the primary data structure. Tensors are the basis of this field, which is why Google’s TensorFlow is so named. Now, what is a tensor? A tensor is essentially a container for storing data. Let us look at the several types of tensors:
Scalars (zero-dimensional tensors). A tensor that contains only one number is called a scalar. This
number can be an integer or a decimal number.
Vectors (one-dimensional tensors). An array of numbers or a one-dimensional tensor is called a vector.
In mathematical texts, we often see vectors written as follows:

x = (x1, x2, . . . , xn)^T,

or, in row form, [x1, . . . , xn].
A one-dimensional tensor has exactly one axis. If an array has four elements, it is called a 4D vector. There is a difference between a 4D vector and a 4D tensor: the 4D vector has only one axis containing four components, while the 4D tensor has four axes.
Matrices (two-dimensional tensors). A vector of vectors, or arrays, is called a matrix. A matrix has
two axes known as the row axis and the column axis. For example, the following matrix has three rows

FIGURE 2.10
3D tensor.

FIGURE 2.11
4D tensor.

and three columns (a 3 × 3 matrix):


[0 2 4]
[1 3 5]
[7 8 9]

In this example, [0, 2, 4] is the first row of the matrix.


Three- and higher-dimensional tensors. If each element of a vector contains a matrix, a 3D tensor is created. In other words, by stacking two-dimensional tensors, a three-dimensional tensor is created. Fig. 2.10 depicts a 3D tensor. Each two-dimensional tensor, or matrix, is called a channel here. For example, channel 1 in Fig. 2.10 is yellow (light gray in print version). So we say channel 0, channel 1, channel 2, and so on.
By stacking three-dimensional tensors together, a four-dimensional tensor is formed. In fact, a four-dimensional tensor is a vector, each element of which is a three-dimensional tensor. Fig. 2.11 shows a 4D tensor.
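These tensor types can be inspected directly in NumPy; a short sketch with arbitrary values:

import numpy as np

scalar = np.array(5)                       # 0D tensor: a single number
vector = np.array([1, 2, 3, 4])            # 1D tensor with one axis
matrix = np.array([[0, 2, 4],
                   [1, 3, 5],
                   [7, 8, 9]])             # 2D tensor: rows and columns
tensor3d = np.stack([matrix, matrix])      # 3D tensor: stacked matrices (channels)
tensor4d = np.stack([tensor3d, tensor3d])  # 4D tensor: a vector of 3D tensors

# ndim gives the number of axes, shape the size along each axis.
for t in (scalar, vector, matrix, tensor3d, tensor4d):
    print(t.ndim, t.shape)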

2.6.2 Relevant mathematical operations


Linear algebra is essential in machine learning and deep learning. In this section, I briefly review core
linear algebra operations you should know.
Dot product. The dot product, also commonly known as the “scalar product” or “inner product”, takes
two equal-length vectors, multiplies them together, and returns a single number. The dot product of two
vectors a = [a1 , a2 , . . . , an ] and b = [b1 , b2 , . . . , bn ] is defined as
a · b = Σ (i = 1 to n) ai bi = a1 × b1 + · · · + an × bn.

Let us see how we can apply dot product on two vectors with an example:

A = [1 2 3], B = [0 2 4],

A · B = 1 × 0 + 2 × 2 + 3 × 4 = 16.
The dot product of two normalized vectors is called the cosine similarity, i.e., the cosine of the angle between the vectors.
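In NumPy, the same computation is a one-liner; a quick sketch using the vectors above:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([0, 2, 4])

print(np.dot(a, b))  # 1*0 + 2*2 + 3*4 = 16

# Cosine similarity: the dot product of the two normalized vectors.
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)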
Elementwise product. Another common operation we see in practice is the elementwise product. You may often want to operate on each element of a vector or matrix while doing a computation. For example, you may want to add two matrices of the same dimensions by adding all of the corresponding elements together from the two source matrices. The addition (+) and subtraction (−) operators are defined to work on matrices as well as scalars. Like addition and subtraction on matrices, the elementwise product takes two matrices of the same dimensions and multiplies all of the corresponding elements. Let A and B be two matrices of the same dimensions as follows:
    [a11 a12 a13]        [b11 b12 b13]
A = [a21 a22 a23],   B = [b21 b22 b23].
    [a31 a32 a33]        [b31 b32 b33]

Their elementwise product is as follows:


         [a11 × b11  a12 × b12  a13 × b13]
A .∗ B = [a21 × b21  a22 × b22  a23 × b23].
         [a31 × b31  a32 × b32  a33 × b33]

Note that it should not be confused with the more common matrix product.
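In NumPy, the * operator performs the elementwise product, while @ performs the ordinary matrix product; a small sketch with arbitrary 2 × 2 matrices:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)  # elementwise product: [[5, 12], [21, 32]]
print(A @ B)  # matrix product (different!): [[19, 22], [43, 50]]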
Tensor product. Given two vectors, this product takes each element of one vector and multiplies it by all of the elements of the other vector, creating a new row in the resulting matrix. Let N and M be two vectors defined as

    [n11]        [m11]
N = [n21],   M = [m21].
    [n31]        [m31]

Their tensor product (also called the outer product), denoted by N ⊗ M, is computed as

        [n11 × m11  n11 × m21  n11 × m31]
N ⊗ M = [n21 × m11  n21 × m21  n21 × m31].
        [n31 × m11  n31 × m21  n31 × m31]

Transpose. The transpose of matrix A is a new matrix, A^T, whose rows are the columns of A. Here are a matrix and its transpose:

    [0 2 4]^T   [0 1 7]
    [1 3 5]   = [2 3 8].
    [7 8 9]     [4 5 9]
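Both the tensor (outer) product and the transpose are also one-liners in NumPy; a short sketch with arbitrary values:

import numpy as np

N = np.array([1, 2, 3])
M = np.array([4, 5, 6])

print(np.outer(N, M))  # 3x3 matrix: each element of N times each element of M

A = np.array([[0, 2, 4],
              [1, 3, 5],
              [7, 8, 9]])
print(A.T)  # transpose: rows become columns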

2.6.3 The math behind machine learning: statistics


As an interdisciplinary field, data science depends heavily on statistical concepts and techniques. “Statistics is the grammar of science,” as Karl Pearson famously said. Data science is clearly no exception to this rule, and machine learning and deep learning are among the techniques used in data science. Let us review the statistics that will be needed in this book. Some basic concepts in statistics are:
• Statistics and Data Types
• Measures of Central Tendency
• Measures of Variability
• Outliers and Noisy Data
Statistics and data types
As a basic classification in data science, values and data are divided into three groups: numerical data, categorical data, and ordinal data. In the following, I introduce each of these groups and, where a group has subcategories, I mention them as well.
Numerical data – values obtained by measurement or counting are considered numerical data. This type of data is usually divided into two subcategories, called discrete and continuous data.
• Discrete – these numerical values are a subset of the natural numbers (e.g., number of people, number of children, etc.).
• Continuous – if the values obtained are a subset of the real (decimal) numbers, the data type is considered continuous (e.g., weight, distance, etc.).
Categorical data – data related to attributes or qualitative characteristics are usually of the categorical type (e.g., place of birth, gender, type of car, etc.). Such data are suitable for labeling members of the statistical population. If we use numbers to represent or specify a class or group, a numerical coding operation is performed; note that these numbers should not be the basis of mathematical calculations.
Ordinal data – if the values of qualitative attributes are ordered, the data is called ordinal. For example, attributes that have rank or priority can form ordinal datasets. With the help of such features, members of the statistical population can be sorted (e.g., hotel rank, education level, etc.).

Measures of central tendency


Let us review central tendency measurements:
Mean. The mean is the average: all values are added together and the result is divided by the number of values.
Median. The middle point of the data is called the median. To calculate the median, you must first put all the values in order (from smallest to largest) and then take the middle value.
Mode. The value that occurs most frequently in the data is the mode.
Measures of variability
To understand the behavior of data, in addition to determining its central tendency, its degree of variability must also be determined. To measure data scatter, we use the various indicators introduced below:
Range. The distance between the maximum and the minimum value is the range.
Variance. It measures how far a set of numbers is spread out from its mean value.
Standard deviation. It is used to measure the dispersion of a dataset. Intuitively, it measures the average distance of data values from the mean and is calculated by taking the square root of the variance.
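All of these measures are available in Python’s standard statistics module; an illustrative sketch with made-up values:

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample values

print(statistics.mean(data))      # 5.0
print(statistics.median(data))    # 4.5
print(statistics.mode(data))      # 4
print(max(data) - min(data))      # range: 7
print(statistics.variance(data))  # sample variance
print(statistics.stdev(data))     # sample standard deviation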
Outliers and noisy data
One of the issues emphasized by data science and data mining experts is the importance of using
data on which preprocessing has been performed and it has the necessary validity. Noise and outlier
detection and reduction is one of the effective ways of improving data quality. Therefore, before using
the data, it is necessary to identify the noise and outlier data and deal with them in the right way.
Noise. Noise is caused by various factors, such as errors in the collection process or errors in entering information into the system; identifying it helps us design models with more awareness and accuracy.
Outlier. A data point that differs significantly from the other data is called an outlier. In other words, outliers are values that are “far away” from the main group of data and have the biggest effect on the mean. Outlier data can sometimes be a nuisance, and sometimes outlier detection is itself the goal, a kind of anomaly detection in which we seek to find the outlying data. Outliers can reduce the accuracy of a machine learning algorithm. For this reason, it is necessary to remove outliers from the dataset before starting to build the model; this can lead to a significant increase in accuracy.
It is noteworthy that some data are sometimes mistaken for outliers even though they are correct and simply do not follow the pattern governing the other data in the dataset. For example, in the population health dataset of a community, the data of cancer patients, who may be a very small part of the total data and certainly have characteristics distinct from other people’s data, are not considered outliers.
A box-and-whisker plot (sometimes called a boxplot) is a useful way of visually displaying the data distribution through its quartiles. Another important application of the boxplot is that from it we can recognize candidate outlier values (or bad data). Fig. 2.12 depicts the anatomy of a boxplot. It summarizes the data distribution using five numbers: the median (Q2, marked by a vertical line inside the box, denotes the 50th percentile of the data), the lower quartile (Q1, the 25th percentile), the upper quartile (Q3, the 75th percentile), the lower extreme, and the upper extreme. The whiskers go from each quartile to the minimum (lower extreme) or the maximum (upper extreme). The dots outside the whiskers show outlier values (here, data smaller than the minimum or larger than the maximum are considered outliers). In this chart, the maximum and minimum are calculated as follows:
IQR = Q3 − Q1,
Maximum = Max(data) if Max(data) ≤ Q3 + 1.5 × IQR, and Q3 + 1.5 × IQR if Max(data) > Q3 + 1.5 × IQR,
Minimum = Min(data) if Min(data) > Q1 − 1.5 × IQR, and Q1 − 1.5 × IQR if Min(data) ≤ Q1 − 1.5 × IQR.

FIGURE 2.12
Anatomy of a boxplot.
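The whisker bounds above translate directly into a few lines of NumPy; in this sketch the data values are made up, with one deliberately extreme point:

import numpy as np

data = np.array([3, 5, 6, 7, 8, 9, 10, 12, 30])  # 30 is a likely outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr  # minimum whisker bound
upper = q3 + 1.5 * iqr  # maximum whisker bound

outliers = data[(data < lower) | (data > upper)]
print(outliers)  # values outside the whiskers are candidate outliers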

There are many outlier detection techniques. Some of the popular techniques are:
• Density-based outlier detection method (e.g., k-nearest neighbor, local outlier factor, isolation
forests, etc.),
• Subspace-, correlation-, and tensor-based outlier detection methods,
• One-Class Support Vector Machines (OC-SVMs),
• Clustering-based outlier detection algorithms,
• And many others.

2.7 TensorFlow and Keras


Because we will use Keras to build deep learning models in this book, let us take a look at the concept.
Keras, MXNet, PyTorch, and TensorFlow are deep learning frameworks.

TensorFlow is a powerful, core open-source library for numerical computing, designed and developed specifically for machine learning and deep learning models. TensorFlow also supports distributed computing, so you can train massive neural networks on very large training datasets in a reasonable amount of time by dividing the computations among hundreds of servers. TensorFlow can train a network with millions of parameters on a training set consisting of billions of instances, each with millions of features. This is not surprising, as TensorFlow was developed by the Google Brain team and powers many of Google’s large-scale services.
TensorFlow is a very powerful library, but it is difficult to use directly for creating deep learning models. Keras is an open-source neural network library written in Python that provides an easy and convenient way to create a wide range of deep learning applications on top of TensorFlow, the default backend for Keras. Keras is a framework that we can use to build neural networks with just a few lines of code. Of course, Keras does not do all of this alone; in fact, Keras is a front end for deep learning frameworks such as TensorFlow and CNTK, which build and train the neural networks behind the scenes. This is why we call Keras a high-level framework: it eliminates much of the complexity of using these libraries.
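For instance, a tiny fully connected network can be defined in a few lines; in this sketch the layer sizes and the 8 input features are arbitrary placeholders:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A small fully connected network for binary classification.
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),  # 8 input features (assumed)
    Dense(1, activation='sigmoid'),                  # binary output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()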
Why Keras?
While deep neural networks have proven to be very effective at solving particular problems, the complexity of the key frameworks is a barrier to their use by novice machine learning developers. Several proposals have been made for improving and simplifying high-level APIs for building neural network models, all of which are broadly similar but differ in the details.
Keras is one of the leading high-level neural network APIs. It is written in Python and supports several backend neural network computing engines. Keras was created to be a user-friendly code interface, easy to understand and extend, and it supports modularity.
The motto of the Keras API is “deep learning for humans.” The main page of the Keras website (https://keras.io/) states that the API is “designed for human beings, not machines,” and “follows best practices for reducing cognitive load.”
The biggest reason to use Keras is its extensive documentation and developer guides. Beyond ease of learning and model building, Keras offers broad practical benefits: it supports a wide range of production deployment options, connects to at least five backend engines (TensorFlow, CNTK, Theano, MXNet, and PlaidML), and provides strong support for training Keras models on multiple GPUs. In addition, Keras is backed by Google, Microsoft, Amazon, Apple, Nvidia, Uber, and others.

2.8 Real-world tensors


Let us introduce the most important tensors we deal with in deep learning. I remind you that tensors are very important in deep learning and that the inputs of a network need to be provided as tensors. The data we use is almost always in one of the following forms.
Vector data (tabular data)
Vector data is equivalent to a two-dimensional tensor, and its shape is as follows:

(samples, features) or (number of samples, number of features)

Let us see how we can use this type of tensor with two examples.
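As a minimal preliminary sketch (the shape below echoes the Pima dataset mentioned in Chapter 1; the zero values are mere placeholders), such tabular data maps onto a 2D tensor:

import numpy as np

# A tabular dataset like Pima (768 people, 8 health features each)
# is stored as a 2D tensor of shape (samples, features).
X = np.zeros((768, 8))
print(X.ndim, X.shape)  # 2 (768, 8)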
Riccabocca’s participation in plots and conspiracies against the
Austrian dominions. These his former connection with the Carbonari
enabled him to track in their refuge in London; and his knowledge of
the characters he had to deal with fitted him well for the villanous
task he undertook.
He had, therefore, already collected witnesses sufficient for his
purposes, making up in number for their defects in quality.
Meanwhile, he had (as Harley had suspected he would) set spies
upon Randal’s movements; and the day before that young traitor
confided to him Violante’s retreat, he had, at least, got scent of her
father’s.
The discovery that Violante was under a roof so honoured, and
seemingly so safe as Lord Lansmere’s, did not discourage this bold
and desperate adventurer. We have seen him set forth to reconnoitre
the house at Knightsbridge. He had examined it well, and discovered
the quarter which he judged favourable to a coup-de-main, should
that become necessary.
Lord Lansmere’s house and grounds were surrounded by a wall,
the entrance being to the high-road, and by a porter’s lodge. At the
rear there lay fields crossed by a lane or by-road. To these fields a
small door in the wall, which was used by the gardeners in passing to
and from their work, gave communication. This door was usually
kept locked; but the lock was of the rude and simple description
common to such entrances, and easily opened by a skeleton key. So
far there was no obstacle which Peschiera’s experience in conspiracy
and gallantry did not disdain as trivial. But the Count was not
disposed to abrupt and violent means in the first instance. He had a
confidence in his personal gifts, in his address, in his previous
triumphs over the sex, which made him naturally desire to hazard
the effect of a personal interview; and on this he resolved with his
wonted audacity. Randal’s description of Violante’s personal
appearance, and such suggestions as to her character and the
motives most likely to influence her actions, as that young lynx-eyed
observer could bestow, were all that the Count required of present
aid from his accomplice.
Meanwhile we return to Violante herself. We see her now seated in
the gardens at Knightsbridge, side by side with Helen. The place was
retired, and out of sight from the windows of the house.
Violante.—“But why will you not tell me more of that early time?
You are less communicative even than Leonard.”
Helen, (looking down, and hesitatingly.)—“Indeed there is
nothing to tell you that you do not know; and it is so long since, and
things are so changed now.”
The tone of the last words was mournful, and the words ended
with a sigh.
Violante, (with enthusiasm.)—“How I envy you that past which
you treat so lightly! To have been something, even in childhood, to
the formation of a noble nature; to have borne on those slight
shoulders half the load of a man’s grand labour. And now to see
Genius moving calm in its clear career; and to say inly, ‘Of that
genius I am a part!’”
Helen, (sadly and humbly.)—“A part! Oh, no! A part? I don’t
understand you.”
Violante.—“Take the child Beatrice from Dante’s life, and should
we have a Dante? What is a poet’s genius but the voice of its
emotions? All things in life and in Nature influence genius; but what
influences it the most, are its sorrows and affections.”
Helen looks softly into Violante’s eloquent face, and draws nearer
to her in tender silence.
Violante, (suddenly.)—“Yes, Helen, yes—I know by my own heart
how to read yours. Such memories are ineffaceable. Few guess what
strange self-weavers of our own destinies we women are in our
veriest childhood!” She sunk her voice into a whisper: “How could
Leonard fail to be dear to you—dear as you to him—dearer than all
others?”
Helen, (shrinking back, and greatly disturbed.)—“Hush, hush! you
must not speak to me thus; it is wicked—I cannot bear it. I would not
have it be so—it must not be—it cannot!”
She clasped her hands over her eyes for a moment, and then lifted
her face, and the face was very sad, but very calm.
Violante, (twining her arm round Helen’s waist.)—“How have I
wounded you?—how offended? Forgive me—but why is this wicked?
Why must it not be? Is it because he is below you in birth?”
Helen.—“No, no—I never thought of that. And what am I? Don’t
ask me—I cannot answer. You are wrong, quite wrong, as to me. I
can only look on Leonard as—as a brother. But—but, you can speak
to him more freely than I can. I would not have him waste his heart
on me, nor yet think me unkind and distant, as I seem. I know not
what I say. But—but—break to him—indirectly—gently—that duty in
both forbids us both to—to be more than friends—than——”
“Helen, Helen!” cried Violante, in her warm, generous passion,
“your heart betrays you in every word you say. You weep; lean on me,
whisper to me; why—why is this? Do you fear that your guardian
would not consent? He not consent! He who—”
Helen.—“Cease—cease—cease.”
Violante.—“What! You can fear Harley—Lord L’Estrange? Fie;
you do not know him.”
Helen, (rising suddenly.)—“Violante, hold; I am engaged to
another.”
Violante rose also, and stood still, as if turned to stone; pale as
death, till the blood came, at first slowly, then with suddenness from
her heart, and one deep glow suffused her whole countenance. She
caught Helen’s hand firmly, and said, in a hollow voice—
“Another! Engaged to another! One word, Helen—not to him—not
to—Harley—to——”
“I cannot say—I must not. I have promised,” cried poor Helen, and
as Violante let fall her hand, she hurried away.
Violante sate down, mechanically. She felt as if stunned by a
mortal blow. She closed her eyes, and breathed hard. A deadly
faintness seized her; and when it passed away, it seemed to her as if
she were no longer the same being, nor the world around her the
same world—as if she were but one sense of intense, hopeless misery,
and as if the universe were but one inanimate void. So strangely
immaterial are we really—we human beings, with flesh and blood—
that if you suddenly abstract from us but a single, impalpable, airy
thought, which our souls have cherished, you seem to curdle the air,
to extinguish the sun, to snap every link that connects us to matter,
and to benumb everything into death, except woe.
And this warm, young, southern nature, but a moment before was
so full of joy and life, and vigorous, lofty hope. It never till now had
known its own intensity and depth. The virgin had never lifted the
veil from her own soul of woman. What, till then, had Harley
L’Estrange been to Violante? An ideal—a dream of some imagined
excellence—a type of poetry in the midst of the common world. It
had not been Harley the Man—it had been Harley the Phantom. She
had never said to herself, “He is identified with my love, my hopes,
my home, my future.” How could she? Of such, he himself had never
spoken; an internal voice, indeed, had vaguely, yet irresistibly,
whispered to her that, despite his light words, his feelings towards
her were grave and deep. O false voice! how it had deceived her. Her
quick convictions seized the all that Helen had left unsaid. And now
suddenly she felt what it is to love, and what it is to despair. So she
sate, crushed and solitary, neither murmuring nor weeping, only now
and then passing her hand across her brow, as if to clear away some
cloud that would not be dispersed; or heaving a deep sigh, as if to
throw off some load that no time henceforth could remove. There are
certain moments in life in which we say to ourselves, “All is over; no
matter what else changes, that which I have made my all is gone
evermore—evermore.” And our own thought rings back in our ears,
“Evermore—evermore!”
CHAPTER VIII.
As Violante thus sate, a stranger, passing stealthily through the
trees, stood between herself and the evening sun. She saw him not.
He paused a moment, and then spoke low, in her native tongue,
addressing her by the name which she had borne in Italy. He spoke
as a relation, and excused his intrusion: “For,” said he, “I come to
suggest to the daughter the means by which she can restore to her
father his country and his honours.”
At the word “father” Violante roused herself, and all her love for
that father rushed back upon her with double force. It does so ever—
we love most our parents at the moment when some tie less holy is
abruptly broken; and when the conscience says, “There, at least, is a
love that never has deceived thee!”
She saw before her a man of mild aspect and princely form.
Peschiera (for it was he) had banished from his dress, as from his
countenance, all that betrayed the worldly levity of his character. He
was acting a part, and he dressed and looked it.
“My father!” she said quickly, and in Italian. “What of him? And
who are you, signior? I know you not.”
Peschiera smiled benignly, and replied in a tone in which great
respect was softened by a kind of parental tenderness.
“Suffer me to explain, and listen to me while I speak.” Then,
quietly seating himself on the bench beside her, he looked into her
eyes, and resumed.
“Doubtless, you have heard of the Count di Peschiera?”
Violante.—“I heard that name, as a child, when in Italy. And
when she with whom I then dwelt, (my father’s aunt,) fell ill and
died, I was told that my home in Italy was gone, that it had passed to
the Count di Peschiera—my father’s foe.”
Peschiera.—“And your father, since then, has taught you to hate
this fancied foe?”
Violante.—“Nay; my father did but forbid me ever to breathe his
name.”
Peschiera.—“Alas! what years of suffering and exile might have
been saved your father, had he but been more just to his early friend
and kinsman; nay, had he but less cruelly concealed the secret of his
retreat. Fair child, I am that Giulio Franzini, that Count di Peschiera.
I am the man you have been told to regard as your father’s foe. I am
the man on whom the Austrian emperor bestowed his lands. And
now judge if I am in truth the foe. I have come hither to seek your
father, in order to dispossess myself of my sovereign’s gift. I have
come but with one desire, to restore Alphonso to his native land, and
to surrender the heritage that was forced upon me.”
Violante.—“My father, my dear father! His grand heart will have
room once more. Oh! this is noble enmity, true revenge. I understand
it, signior, and so will my father, for such would have been his
revenge on you. You have seen him?”
Peschiera.—“No, not yet. I would not see him till I had seen
yourself; for you, in truth, are the arbiter of his destinies, as of mine.”
Violante.—“I—Count? I—arbiter of my father’s destinies? Is it
possible!”
Peschiera, (with a look of compassionate admiration, and in a
tone yet more emphatically parental.)—“How lovely is that innocent
joy; but do not indulge it yet. Perhaps it is a sacrifice which is asked
from you—a sacrifice too hard to bear. Do not interrupt me. Listen
still, and you will see why I could not speak to your father until I had
obtained an interview with yourself. See why a word from you may
continue still to banish me from his presence. You know, doubtless,
that your father was one of the chiefs of a party that sought to free
Northern Italy from the Austrians. I myself was at the onset a warm
participator in that scheme. In a sudden moment I discovered that
some of its more active projectors had coupled with a patriotic
enterprise schemes of a dark nature—and that the conspiracy itself
was about to be betrayed to the government. I wished to consult with
your father; but he was at a distance. I learned that his life was
condemned. Not an hour was to be lost. I took a bold resolve, that
has exposed me to his suspicions, and to my country’s wrath. But my
main idea was to save him, my early friend, from death, and my
country from fruitless massacre. I withdrew from the intended
revolt. I sought at once the head of the Austrian government in Italy,
and made terms for the lives of Alphonso and of the other more
illustrious chiefs, which otherwise would have been forfeited. I
obtained permission to undertake myself the charge of securing my
kinsman in order to place him in safety, and to conduct him to a
foreign land, in an exile that would cease when the danger was
dispelled. But unhappily he deemed that I only sought to destroy
him. He fled from my friendly pursuit. The soldiers with me were
attacked by an intermeddling Englishman; your father escaped from
Italy—concealing his retreat; and the character of his flight
counteracted my efforts to obtain his pardon. The government
conferred on me half his revenues, holding the other at its pleasure. I
accepted the offer to save his whole heritage from confiscation. That
I did not convey to him, what I pined to do—viz., the information
that I held but in trust what was bestowed by the government, and
the full explanation of what seemed blamable in my conduct—was
necessarily owing to the secresy he maintained. I could not discover
his refuge; but I never ceased to plead for his recall. This year only I
have partially succeeded. He can be restored to his heritage and
rank, on one proviso—a guarantee for his loyalty. That guarantee the
government has named: it is the alliance of his only child with one
whom the government can trust. It was the interest of all Italian
nobility, that the representation of a house so great falling to a
female, should not pass away wholly from the direct line;—in a word,
that you should ally yourself with a kinsman. But one kinsman, and
he the next in blood, presented himself. Brief—Alphonso regains all
that he lost on the day in which his daughter gives her hand to Giulio
Franzini, Count di Peschiera.” “Ah,” continued the Count, mournfully,
“you shrink—you recoil. He thus submitted to your choice is indeed
unworthy of you. You are scarce in the spring of life. He is in its
waning autumn. Youth loves youth. He does not aspire to your love.
All that he can say is, love is not the only joy of the heart—it is joy to
raise from ruin a beloved father—joy to restore, to a land poor in all
but memories, a chief in whom it reverences a line of heroes. These
are the joys I offer to you—you, a daughter, and an Italian maid. Still
silent! Oh speak to me!”
Certainly this Count Peschiera knew well how woman is to be
wooed and won; and never was woman more sensitive to those high
appeals which most move all true earnest womanhood, than was the
young Violante. Fortune favoured him in the moment chosen. Harley
was wrenched away from her hopes, and love a word erased from her
language. In the void of the world, her father’s image alone stood
clear and visible. And she who from infancy had so pined to serve
that father, who had first learned to dream of Harley as that father’s
friend! She could restore to him all for which the exile sighed; and by
a sacrifice of self! Self-sacrifice, ever in itself such a temptation to the
noble! Still, in the midst of the confusion and disturbance of her
mind, the idea of marriage with another seemed so terrible and
revolting, that she could not at once conceive it; and still that instinct
of openness and honour, which pervaded all her character, warned
even her inexperience that there was something wrong in this
clandestine appeal to herself.
Again the Count besought her to speak; and with an effort she said,
irresolutely—
“If it be as you say, it is not for me to answer you; it is for my
father.”
“Nay,” replied Peschiera. “Pardon, if I contradict you. Do you know
so little of your father as to suppose that he will suffer his interest to
dictate to his pride. He would refuse, perhaps, even to receive my
visit—to hear my explanations; but certainly he would refuse to buy
back his inheritance by the sacrifice of his daughter to one whom he
has deemed his foe, and whom the mere disparity of years would
incline the world to say he had made the barter of his personal
ambition. But if I could go to him sanctioned by you—if I could say
your daughter overlooks what the father might deem an obstacle—
she has consented to accept my hand of her own free choice—she
unites her happiness, and blends her prayers, with mine,—then,
indeed, I could not fail of success: and Italy would pardon my errors,
and bless your name. Ah! Signorina, do not think of me save as an
instrument towards the fulfilment of duties so high and sacred—
think but of your ancestors, your father, your native land, and reject
not the proud occasion to prove how you revere them all!”
Violante’s heart was touched at the right chord. Her head rose—
her colour came back to her pale cheek—she turned the glorious
beauty of her countenance towards the wily tempter. She was about
to answer, and to seal her fate, when at that instant Harley’s voice
was heard at a little distance, and Nero came bounding towards her,
and thrust himself, with rough familiarity, between herself and
Peschiera. The Count drew back, and Violante, whose eyes were still
fixed on his face, started at the change that passed there. One quick
gleam of rage sufficed in an instant to light up the sinister secrets of
his nature—it was the face of the baffled gladiator. He had time but
for few words.
“I must not be seen here,” he muttered; “but to-morrow—in these
gardens—about this hour. I implore you, for the sake of your father—
his hopes, fortunes, his very life, to guard the secret of this interview
—to meet me again. Adieu!”
He vanished amidst the trees, and was gone—noiselessly,
mysteriously, as he had come.
CHAPTER IX.
The last words of Peschiera were still ringing in Violante’s ears
when Harley appeared in sight, and the sound of his voice dispelled
the vague and dreamy stupor which had crept over her senses. At
that voice there returned the consciousness of a mighty loss, the
sting of an intolerable anguish. To meet Harley there, and thus,
seemed impossible. She turned abruptly away, and hurried towards
the house. Harley called to her by name, but she would not answer,
and only quickened her steps. He paused a moment in surprise, and
then hastened after her.
“Under what strange taboo am I placed?” said he gaily, as he laid
his hand on her shrinking arm. “I inquire for Helen—she is ill, and
cannot see me. I come to sun myself in your presence, and you fly me
as if gods and men had set their mark on my brow. Child!—child!—
what is this? You are weeping?”
“Do not stay me now—do not speak to me,” answered Violante
through her stifling sobs, as she broke from his hand and made
towards the house.
“Have you a grief, and under the shelter of my father’s roof? A grief
that you will not tell to me? Cruel!” cried Harley, with inexpressible
tenderness of reproach in his soft tones.
Violante could not trust herself to reply. Ashamed of her self-
betrayal—softened yet more by his pleading voice—she could have
prayed to the earth to swallow her. At length, checking back her tears
by a heroic effort, she said, almost calmly, “Noble friend, forgive me.
I have no grief, believe me, which—which I can tell to you. I was but
thinking of my poor father when you came up; alarming myself about
him, it may be, with vain superstitious fears; and so—even a slight
surprise—your abrupt appearance, has sufficed to make me thus
weak and foolish; but I wish to see my father!—to go home—home!”
“Your father is well, believe me, and pleased that you are here. No
danger threatens him; and you, here, are safe.”
“I safe—and from what?”
Harley mused irresolute. He inclined to confide to her the danger
which her father had concealed; but had he the right to do so against
her father’s will?
“Give me,” he said, “time to reflect, and to obtain permission to
intrust you with a secret which, in my judgment, you should know.
Meanwhile, this much I may say, that rather than you should incur
the danger that I believe he exaggerates, your father would have
given you a protector—even in Randal Leslie.”
Violante started.
“But,” resumed Harley, with a calm, in which a certain deep
mournfulness was apparent, unconsciously to himself—“but I trust
you are reserved for a fairer fate, and a nobler spouse. I have vowed
to live henceforth in the common workday world. But for you, bright
child, for you, I am a dreamer still!”
Violante turned her eyes for one instant towards the melancholy
speaker. The look thrilled to his heart. He bowed his face
involuntarily. When he looked up, she had left his side. He did not
this time attempt to follow her, but moved away and plunged amidst
the leafless trees.
An hour afterwards he re-entered the house, and again sought to
see Helen. She had now recovered sufficiently to give him the
interview he requested.
He approached her with a grave and serious gentleness.
“My dear Helen,” said he, “you have consented to be my wife, my
life’s mild companion; let it be soon—soon—for I need you. I need all
the strength of that holy tie. Helen, let me press you to fix the time.”
“I owe you too much,” answered Helen, looking down, “to have a
will but yours. But your mother,” she added, perhaps clinging to the
idea of some reprieve—“your mother has not yet—”
“My mother—true. I will speak first to her. You shall receive from
my family all honour due to your gentle virtues. Helen, by the way,
have you mentioned to Violante the bond between us?”
“No—that is, I fear I may have unguardedly betrayed it, against
Lady Lansmere’s commands too—but—but—”
“So, Lady Lansmere forbade you to name it to Violante. This
should not be. I will answer for her permission to revoke that
interdict. It is due to Violante and to you. Tell your young friend all.
Ah, Helen, if I am at times cold or wayward, bear with me—bear with
me; for you love me, do you not?”