Machine Learning Refined

With its intuitive yet rigorous approach to machine learning, this text provides students
with the fundamental knowledge and practical tools needed to conduct research and
build data-driven products. The authors prioritize geometric intuition and algorithmic
thinking, and include detail on all the essential mathematical prerequisites, to offer a
fresh and accessible way to learn. Practical applications are emphasized, with examples
from disciplines including computer vision, natural language processing, economics,
neuroscience, recommender systems, physics, and biology. Over 300 color illustra-
tions are included and have been meticulously designed to enable an intuitive grasp
of technical concepts, and over 100 in-depth coding exercises (in Python) provide a
real understanding of crucial machine learning algorithms. A suite of online resources
including sample code, data sets, interactive lecture slides, and a solutions manual are
provided online, making this an ideal text both for graduate courses on machine learning
and for individual reference and self-study.

Jeremy Watt received his PhD in Electrical Engineering from Northwestern University,
and is now a machine learning consultant and educator. He teaches machine learning,
deep learning, mathematical optimization, and reinforcement learning at Northwestern
University.

Reza Borhani received his PhD in Electrical Engineering from Northwestern University,

and is now a machine learning consultant and educator. He teaches a variety of courses
in machine learning and deep learning at Northwestern University.

Aggelos K. Katsaggelos is the Joseph Cummings Professor at Northwestern University,

where he heads the Image and Video Processing Laboratory. He is a Fellow of IEEE,
SPIE, EURASIP, and OSA and the recipient of the IEEE Third Millennium Medal
(2000).
Machine Learning Refined

Foundations, Algorithms, and Applications

J E R E M Y W AT T
Northwestern University, Illinois

REZA BORHANI
Northwestern University, Illinois

A G G E L O S K . K AT S A G G E L O S
Northwestern University, Illinois
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.


It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title:
www.cambridge.org/9781108480727
DOI: 10.1017/9781108690935
© Cambridge University Press 2020
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2020
Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-48072-7 Hardback
Additional resources for this publication at www.cambridge.org/watt2
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To our families:

Deb, Robert, and Terri

Soheila, Ali, and Maryam

Ειρήνη, Ζωή, Σοφία, and Ειρήνη
Contents

Preface page xii
Acknowledgements xxii
1 Introduction to Machine Learning 1
1.1 Introduction 1
1.2 Distinguishing Cats from Dogs: a Machine Learning Approach 1
1.3 The Basic Taxonomy of Machine Learning Problems 6
1.4 Mathematical Optimization 16
1.5 Conclusion 18
Part I Mathematical Optimization 19
2 Zero-Order Optimization Techniques 21
2.1 Introduction 21
2.2 The Zero-Order Optimality Condition 23
2.3 Global Optimization Methods 24
2.4 Local Optimization Methods 27
2.5 Random Search 31
2.6 Coordinate Search and Descent 39
2.7 Conclusion 40
2.8 Exercises 42
3 First-Order Optimization Techniques 45
3.1 Introduction 45
3.2 The First-Order Optimality Condition 45
3.3 The Geometry of First-Order Taylor Series 52
3.4 Computing Gradients Efficiently 55
3.5 Gradient Descent 56
3.6 Two Natural Weaknesses of Gradient Descent 65
3.7 Conclusion 71
3.8 Exercises 71
4 Second-Order Optimization Techniques 75
4.1 The Second-Order Optimality Condition 75

4.2 The Geometry of Second-Order Taylor Series 78


4.3 Newton’s Method 81
4.4 Two Natural Weaknesses of Newton’s Method 90
4.5 Conclusion 91
4.6 Exercises 92
Part II Linear Learning 97
5 Linear Regression 99
5.1 Introduction 99
5.2 Least Squares Linear Regression 99
5.3 Least Absolute Deviations 108
5.4 Regression Quality Metrics 111
5.5 Weighted Regression 113
5.6 Multi-Output Regression 116
5.7 Conclusion 120
5.8 Exercises 121
5.9 Endnotes 124
6 Linear Two-Class Classification 125
6.1 Introduction 125
6.2 Logistic Regression and the Cross Entropy Cost 125
6.3 Logistic Regression and the Softmax Cost 135
6.4 The Perceptron 140
6.5 Support Vector Machines 150
6.6 Which Approach Produces the Best Results? 157
6.7 The Categorical Cross Entropy Cost 158
6.8 Classification Quality Metrics 160
6.9 Weighted Two-Class Classification 167
6.10 Conclusion 170
6.11 Exercises 171
7 Linear Multi-Class Classification 174
7.1 Introduction 174
7.2 One-versus-All Multi-Class Classification 174
7.3 Multi-Class Classification and the Perceptron 184
7.4 Which Approach Produces the Best Results? 192
7.5 The Categorical Cross Entropy Cost Function 193
7.6 Classification Quality Metrics 198
7.7 Weighted Multi-Class Classification 202
7.8 Stochastic and Mini-Batch Learning 203
7.9 Conclusion 205
7.10 Exercises 205

8 Linear Unsupervised Learning 208

8.1 Introduction 208

8.2 Fixed Spanning Sets, Orthonormality, and Projections 208

8.3 The Linear Autoencoder and Principal Component Analysis 213

8.4 Recommender Systems 219

8.5 K-Means Clustering 221

8.6 General Matrix Factorization Techniques 227

8.7 Conclusion 230

8.8 Exercises 231

8.9 Endnotes 233

9 Feature Engineering and Selection 237

9.1 Introduction 237

9.2 Histogram Features 238

9.3 Feature Scaling via Standard Normalization 249

9.4 Imputing Missing Values in a Dataset 254

9.5 Feature Scaling via PCA-Sphering 255

9.6 Feature Selection via Boosting 258

9.7 Feature Selection via Regularization 264

9.8 Conclusion 269

9.9 Exercises 269

Part III Nonlinear Learning 273

10 Principles of Nonlinear Feature Engineering 275

10.1 Introduction 275

10.2 Nonlinear Regression 275

10.3 Nonlinear Multi-Output Regression 282

10.4 Nonlinear Two-Class Classification 286

10.5 Nonlinear Multi-Class Classification 290

10.6 Nonlinear Unsupervised Learning 294

10.7 Conclusion 298

10.8 Exercises 298

11 Principles of Feature Learning 304

11.1 Introduction 304

11.2 Universal Approximators 307

11.3 Universal Approximation of Real Data 323


11.4 Naive Cross-Validation 335

11.5 Efficient Cross-Validation via Boosting 340

11.6 Efficient Cross-Validation via Regularization 350

11.7 Testing Data 361

11.8 Which Universal Approximator Works Best in Practice? 365

11.9 Bagging Cross-Validated Models 366



11.10 K-Fold Cross-Validation 373


11.11 When Feature Learning Fails 378
11.12 Conclusion 379
11.13 Exercises 380
12 Kernel Methods 383
12.1 Introduction 383
12.2 Fixed-Shape Universal Approximators 383
12.3 The Kernel Trick 386
12.4 Kernels as Measures of Similarity 396
12.5 Optimization of Kernelized Models 397
12.6 Cross-Validating Kernelized Learners 398
12.7 Conclusion 399
12.8 Exercises 399
13 Fully Connected Neural Networks 403
13.1 Introduction 403
13.2 Fully Connected Neural Networks 403
13.3 Activation Functions 424
13.4 The Backpropagation Algorithm 427
13.5 Optimization of Neural Network Models 428
13.6 Batch Normalization 430
13.7 Cross-Validation via Early Stopping 438
13.8 Conclusion 440
13.9 Exercises 441
14 Tree-Based Learners 443
14.1 Introduction 443
14.2 From Stumps to Deep Trees 443
14.3 Regression Trees 446
14.4 Classification Trees 452
14.5 Gradient Boosting 458
14.6 Random Forests 462
14.7 Cross-Validation Techniques for Recursively Defined Trees 464
14.8 Conclusion 467
14.9 Exercises 467
Part IV Appendices 471
Appendix A Advanced First- and Second-Order Optimization Methods 473
A.1 Introduction 473
A.2 Momentum-Accelerated Gradient Descent 473
A.3 Normalized Gradient Descent 478
A.4 Advanced Gradient-Based Methods 485

A.5 Mini-Batch Optimization 487

A.6 Conservative Steplength Rules 490

A.7 Newton’s Method, Regularization, and Nonconvex Functions 499

A.8 Hessian-Free Methods 502

Appendix B Derivatives and Automatic Differentiation 511

B.1 Introduction 511

B.2 The Derivative 511

B.3 Derivative Rules for Elementary Functions and Operations 514

B.4 The Gradient 516

B.5 The Computation Graph 517

B.6 The Forward Mode of Automatic Differentiation 520

B.7 The Reverse Mode of Automatic Differentiation 526

B.8 Higher-Order Derivatives 529

B.9 Taylor Series 531

B.10 Using the autograd Library 536

Appendix C Linear Algebra 546

C.1 Introduction 546

C.2 Vectors and Vector Operations 546

C.3 Matrices and Matrix Operations 553

C.4 Eigenvalues and Eigenvectors 556

C.5 Vector and Matrix Norms 559

References 564

Index 569
Preface

For eons we humans have sought out rules or patterns that accurately describe

how important systems in the world around us work, whether these systems

be agricultural, biological, physical, financial, etc. We do this because such rules

allow us to understand a system better, accurately predict its future behavior

and ultimately, control it. However, the process of finding the ”right” rule that

seems to govern a given system has historically been no easy task. For most of

our history data (glimpses of a given system at work) has been an extremely

scarce commodity. Moreover, our ability to compute, to try out various rules

to see which most accurately represents a phenomenon, has been limited to

what we could accomplish by hand. Both of these factors naturally limited

the range of phenomena scientific pioneers of the past could investigate and

inevitably forced them to use philosophical and/or visual approaches to rule-

finding. Today, however, we live in a world awash in data, and have colossal

computing power at our fingertips. Because of this, we lucky descendants of the

great pioneers can tackle a much wider array of problems and take a much more

empirical approach to rule-finding than our forbears could. Machine learning,

the topic of this textbook, is a term used to describe a broad (and growing)

collection of pattern-finding algorithms designed to properly identify system

rules empirically and by leveraging our access to potentially enormous amounts

of data and computing power.

In the past decade the user base of machine learning has grown dramatically.

From a relatively small circle in computer science, engineering, and mathe-

matics departments the users of machine learning now include students and

researchers from every corner of the academic universe, as well as members of

industry, data scientists, entrepreneurs, and machine learning enthusiasts. This

textbook is the result of a complete tearing down of the standard curriculum

of machine learning into its most fundamental components, and a curated re-

assembly of those pieces (painstakingly polished and organized) that we feel

will most benefit this broadening audience of learners. It contains fresh and

intuitive yet rigorous descriptions of the most fundamental concepts necessary

to conduct research, build products, and tinker.



Book Overview
The second edition of this text is a complete revision of our first endeavor, with

virtually every chapter of the original rewritten from the ground up and eight

new chapters of material added, doubling the size of the first edition. Topics from

the first edition, from expositions on gradient descent to those on One-versus-

All classification and Principal Component Analysis have been reworked and

polished. A swath of new topics have been added throughout the text, from

derivative-free optimization to weighted supervised learning, feature selection,

nonlinear feature engineering, boosting-based cross-validation, and more.

While heftier in size, the intent of our original attempt has remained un-

changed: to explain machine learning, from first principles to practical imple-

mentation, in the simplest possible terms. A big-picture breakdown of the second

edition text follows below.

Part I: Mathematical Optimization (Chapters 2–4)


Mathematical optimization is the workhorse of machine learning, powering not

only the tuning of individual machine learning models (introduced in Part II)

but also the framework by which we determine appropriate models themselves

via cross-validation (discussed in Part III of the text).

In this first part of the text we provide a complete introduction to mathemat-

ical optimization, from basic zero-order (derivative-free) methods detailed in

Chapter 2 to fundamental and advanced first-order and second-order methods

in Chapters 3 and 4, respectively. More specifically this part of the text con-

tains complete descriptions of local optimization, random search methodologies,

gradient descent, and Newton’s method.

Part II: Linear Learning (Chapters 5–9)


In this part of the text we describe the fundamental components of cost function

based machine learning, with an emphasis on linear models.

This includes a complete description of supervised learning in Chapters 5–7

including linear regression, two-class, and multi-class classification. In each of

these chapters we describe a range of perspectives and popular design choices

made when building supervised learners.

In Chapter 8 we similarly describe unsupervised learning, and Chapter 9 con-

tains an introduction to fundamental feature engineering practices including pop-

ular histogram features as well as various input normalization schemes, and

feature selection paradigms.



Part III: Nonlinear Learning (Chapters 10–14)


In the final part of the text we extend the fundamental paradigms introduced in

Part II to the general nonlinear setting.

We do this carefully beginning with a basic introduction to nonlinear super-

vised and unsupervised learning in Chapter 10, where we introduce the motiva-

tion, common terminology, and notation of nonlinear learning used throughout

the remainder of the text.

In Chapter 11 we discuss how to automate the selection of appropriate non-

linear models, beginning with an introduction to universal approximation. This

naturally leads to detailed descriptions of cross-validation, as well as boosting,

regularization, ensembling, and K-folds cross-validation.

With these fundamental ideas in-hand, in Chapters 12–14 we then dedicate an

individual chapter to each of the three popular universal approximators used in

machine learning: fixed-shape kernels, neural networks, and trees, where we discuss

the strengths, weaknesses, technical eccentricities, and usages of each popular

universal approximator.

To get the most out of this part of the book we strongly recommend that

Chapter 11 and the fundamental ideas therein are studied and understood before

moving on to Chapters 12–14.

Part IV: Appendices


This shorter set of appendix chapters provides a complete treatment on ad-

vanced optimization techniques, as well as a thorough introduction to a range

of subjects that the readers will need to understand in order to make full use of

the text.

Appendix A continues our discussion from Chapters 3 and 4, and describes

advanced first- and second-order optimization techniques. This includes a discussion

of popular extensions of gradient descent, including mini-batch optimization,

momentum acceleration, gradient normalization, and the result of combining these

enhancements in various ways (producing e.g., the RMSProp and Adam first

order algorithms) – and Newton’s method – including regularization schemes

and Hessian-free methods.

Appendix B contains a tour of computational calculus including an introduction to the derivative/gradient, higher-order derivatives, the Hessian matrix, numerical differentiation, forward and backward (backpropagation) automatic differentiation, and Taylor series approximations.

Appendix C provides a suitable background in linear and matrix algebra, including vector/matrix arithmetic, the notions of spanning sets and orthogonality, as well as eigenvalues and eigenvectors.



Readers: How To Use This Book


This textbook was written with first-time learners of the subject in mind, as

well as for more knowledgeable readers who yearn for a more intuitive and

serviceable treatment than what is currently available today. To make full use of

the text one needs only a basic understanding of vector algebra (mathematical

functions, vector arithmetic, etc.) and computer programming (for example,

basic proficiency with a dynamically typed language like Python). We provide


complete introductory treatments of other prerequisite topics including linear

algebra, vector calculus, and automatic differentiation in the appendices of the


text. Example ”roadmaps,” shown in Figures 0.1–0.4, provide suggested paths

for navigating the text based on a variety of learning outcomes and university

courses (ranging from a course on the essentials of machine learning to special

topics – as described further under ”Instructors: How to use this Book” below).

We believe that intuitive leaps precede intellectual ones, and to this end defer

the use of probabilistic and statistical views of machine learning in favor of a

fresh and consistent geometric perspective throughout the text. We believe that

this perspective not only permits a more intuitive understanding of individ-

ual concepts in the text, but also that it helps establish revealing connections

between ideas often regarded as fundamentally distinct (e.g., the logistic re-

gression and Support Vector Machine classifiers, kernels and fully connected

neural networks, etc.). We also highly emphasize the importance of mathemati-

cal optimization in our treatment of machine learning. As detailed in the ”Book

Overview” section above, optimization is the workhorse of machine learning

and is fundamental at many levels – from the tuning of individual models to

the general selection of appropriate nonlinearities via cross-validation. Because

of this a strong understanding of mathematical optimization is requisite if one

wishes to deeply understand machine learning, and if one wishes to be able to

implement fundamental algorithms.

To this end, we place significant emphasis on the design and implementa-

tion of algorithms throughout the text with implementations of fundamental

algorithms given in Python. These fundamental examples can then be used as


building blocks for the reader to help complete the text’s programming exer-

cises, allowing them to ”get their hands dirty” and ”learn by doing,” practicing

the concepts introduced in the body of the text. While in principle any program-

ming language can be used to complete the text’s coding exercises, we highly

recommend using Python for its ease of use and large support community. We
also recommend using the open-source Python libraries NumPy, autograd, and
matplotlib, as well as the Jupyter notebook editor to make implementing and
testing code easier. A complete set of installation instructions, datasets, as well

as starter notebooks for many exercises can be found at

https://github.com/jermwatt/machine_learning_refined
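
As a minimal sketch of the kind of workflow these libraries enable (the function g and the evaluation point below are arbitrary illustrative choices, not taken from the text), autograd can automatically produce the gradient of an ordinary Python function:

```python
# A minimal sketch of the autograd workflow recommended above; the function g
# and the point w are illustrative choices only.
import autograd.numpy as np   # thinly wrapped NumPy that autograd can differentiate through
from autograd import grad

def g(w):
    # a simple smooth function of a two-dimensional input
    return np.sum(w**2) + np.cos(w[0])

nabla_g = grad(g)              # grad returns a new function that evaluates the gradient of g
w = np.array([1.0, -2.0])
print(g(w), nabla_g(w))        # the value of g and its gradient at the point w
```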

Instructors: How To Use This Book


Chapter slides associated with this textbook, datasets, along with a large array of

instructional interactive Python widgets illustrating various concepts through-


out the text, can be found on the github repository accompanying this textbook

at

https://github.com/jermwatt/machine_learning_refined
This site also contains instructions for installing Python as well as a number

of other free packages that students will find useful in completing the text’s

exercises.

This book has been used as a basis for a number of machine learning courses

at Northwestern University, ranging from introductory courses suitable for un-

dergraduate students to more advanced courses on special topics focusing on

optimization and deep learning for graduate students. With its treatment of

foundations, applications, and algorithms this text can be used as a primary

resource for, or a fundamental component of, courses such as the following.

Machine learning essentials treatment: an introduction to the essentials

of machine learning is ideal for undergraduate students, especially those in

quarter-based programs and universities where a deep dive into the entirety

of the book is not feasible due to time constraints. Topics for such a course

can include: gradient descent, logistic regression, Support Vector Machines,

One-versus-All and multi-class logistic regression, Principal Component Anal-

ysis, K-means clustering, the essentials of feature engineering and selection,

cross-validation, regularization, ensembling, bagging, kernel methods, fully

connected neural networks, and trees. A recommended roadmap for such a

course – including recommended chapters, sections, and corresponding topics

– is shown in Figure 0.1.

Machine learning full treatment: a standard machine learning course based

on this text expands on the essentials course outlined above both in terms

of breadth and depth. In addition to the topics mentioned in the essentials

course, instructors may choose to cover Newton’s method, Least Absolute

Deviations, multi-output regression, weighted regression, the Perceptron, the

Categorical Cross Entropy cost, weighted two-class and multi-class classifica-

tion, online learning, recommender systems, matrix factorization techniques,

boosting-based feature selection, universal approximation, gradient boosting,

random forests, as well as a more in-depth treatment of fully connected neu-

ral networks involving topics such as batch normalization and early-stopping-

based regularization. A recommended roadmap for such a course – including

recommended chapters, sections, and corresponding topics – is illustrated in

Figure 0.2.

Mathematical optimization for machine learning and deep learning: such

a course entails a comprehensive description of zero-, first-, and second-order

optimization techniques from Part I of the text (as well as Appendix A) in-

cluding: coordinate descent, gradient descent, Newton’s method, quasi-Newton

methods, stochastic optimization, momentum acceleration, fixed and adaptive

steplength rules, as well as advanced normalized gradient descent schemes

(e.g., Adam and RMSProp). These can be followed by an in-depth description

of the feature engineering processes (especially standard normalization and

PCA-sphering) that speed up (particularly first-order) optimization algorithms.

All students in general, and those taking an optimization for machine learning

course in particular, should appreciate the fundamental role optimization plays

in identifying the ”right” nonlinearity via the processes of boosting- and regular-

ization-based cross-validation, the principles of which are covered in Chapter

11. Select topics from Chapter 13 and Appendix B – including backpropagation,

batch normalization, and forward/backward mode of automatic differentiation
– can also be covered. A recommended roadmap for such a course – including

recommended chapters, sections, and corresponding topics – is given in Figure

0.3.

Introductory portion of a course on deep learning : such a course is best suit-

able for students who have had prior exposure to fundamental machine learning

concepts, and can begin with a discussion of appropriate first order optimiza-

tion techniques, with an emphasis on stochastic and mini-batch optimization,

momentum acceleration, and normalized gradient schemes such as Adam and

RMSProp. Depending on the audience, a brief review of fundamental elements

of machine learning may be needed using selected portions of Part II of the text.

A complete discussion of fully connected networks, including a discussion of

backpropagation and forward/backward mode of automatic differentiation, as
well as special topics like batch normalization and early-stopping-based cross-

validation, can then be made using Chapters 11, 13 , and Appendices A and B of

the text. A recommended roadmap for such a course – including recommended

chapters, sections, and corresponding topics – is shown in Figure 0.4. Additional

recommended resources on topics to complete a standard course on deep learn-

ing – like convolutional and recurrent networks – can be found by visiting the

text’s github repository.



CHAPTER   SECTIONS              TOPICS
1         1–5                   Machine Learning Taxonomy
2         1–5                   Global/Local Optimization, Curse of Dimensionality
3         1–5                   Gradient Descent
5         1–2                   Least Squares Linear Regression
6         1–3, 5, 6, 8          Logistic Regression, Cross Entropy/Softmax Cost, SVMs
7         1–4, 6                One-versus-All Multi-Class Logistic Regression
8         1–3, 5                Principal Component Analysis, K-means
9         2, 7                  Feature Engineering, Feature Selection
10        1, 2, 4               Nonlinear Regression, Nonlinear Classification
11        1–4, 6, 7, 9          Universal Approximation, Cross-Validation, Regularization, Ensembling, Bagging
12        1–3                   Kernel Methods, The Kernel Trick
13        1, 2, 4               Fully Connected Networks, Backpropagation
14        1–4                   Regression Trees, Classification Trees

Figure 0.1 Recommended study roadmap for a course on the essentials of machine

learning, including requisite chapters (left column), sections (middle column), and

corresponding topics (right column). This essentials plan is suitable for

time-constrained courses (in quarter-based programs and universities) or self-study, or

where machine learning is not the sole focus but a key component of some broader

course of study. Note that chapters are grouped together visually based on text layout

detailed under ”Book Overview” in the Preface. See the section titled ”Instructors: How

To Use This Book” in the Preface for further details.



CHAPTER   SECTIONS              TOPICS
1         1–5                   Machine Learning Taxonomy
2         1–5                   Global/Local Optimization, Curse of Dimensionality
3         1–5                   Gradient Descent
4         1–3                   Newton’s Method
5         1–6                   Least Squares Linear Regression, Least Absolute Deviations, Multi-Output Regression, Weighted Regression
6         1–10                  Logistic Regression, Cross Entropy/Softmax Cost, The Perceptron, SVMs, Categorical Cross Entropy, Weighted Two-Class Classification
7         1–9                   One-versus-All Multi-Class Logistic Regression, Weighted Multi-Class Classification, Online Learning
8         1–7                   PCA, K-means, Recommender Systems, Matrix Factorization
9         1–3, 6, 7             Feature Engineering, Feature Selection, Boosting, Regularization
10        1–7                   Nonlinear Supervised Learning, Nonlinear Unsupervised Learning
11        1–12                  Universal Approximation, Cross-Validation, Regularization, Ensembling, Bagging, K-Fold Cross-Validation
12        1–7                   Kernel Methods, The Kernel Trick
13        1–8                   Fully Connected Networks, Backpropagation, Activation Functions, Batch Normalization, Early Stopping
14        1–8                   Regression/Classification Trees, Gradient Boosting, Random Forests

Figure 0.2 Recommended study roadmap for a full treatment of standard machine

learning subjects, including chapters, sections, as well as corresponding topics to cover.

This plan entails a more in-depth coverage of machine learning topics compared to the

essentials roadmap given in Figure 0.1, and is best suited for senior undergraduate/early

graduate students in semester-based programs and passionate independent readers. See

the section titled ”Instructors: How To Use This Book” in the Preface for further details.

CHAPTER   SECTIONS              TOPICS
1         1–5                   Machine Learning Taxonomy
2         1–7                   Global/Local Optimization, Curse of Dimensionality, Random Search, Coordinate Descent
3         1–7                   Gradient Descent
4         1–5                   Newton’s Method
7         8                     Online Learning
9         3–5                   Feature Scaling, PCA-Sphering, Missing Data Imputation
11        5, 6                  Boosting, Regularization
13        6                     Batch Normalization
A         1–8                   Momentum Acceleration, Normalized Schemes (Adam, RMSProp), Fixed Lipschitz Steplength Rules, Backtracking Line Search, Stochastic/Mini-Batch Optimization, Hessian-Free Optimization
B         1–10                  Forward/Backward Mode of Automatic Differentiation

Figure 0.3 Recommended study roadmap for a course on mathematical optimization

for machine learning and deep learning, including chapters, sections, as well as topics

to cover. See the section titled ”Instructors: How To Use This Book” in the Preface for

further details.

CHAPTER   SECTIONS              TOPICS
3         1–7                   Gradient Descent
10        1–5                   Nonlinear Regression, Nonlinear Classification, Nonlinear Autoencoder
11        1–4, 6                Universal Approximation, Cross-Validation, Regularization
13        1–8                   Fully Connected Networks, Backpropagation, Activation Functions, Batch Normalization, Early Stopping
A         1–6                   Momentum Acceleration, Normalized Schemes (Adam, RMSProp), Fixed Lipschitz Steplength Rules, Backtracking Line Search, Stochastic/Mini-Batch Optimization
B         1–10                  Forward/Backward Mode of Automatic Differentiation

Figure 0.4 Recommended study roadmap for an introductory portion of a course on

deep learning, including chapters, sections, as well as topics to cover. See the section

titled ”Instructors: How To Use This Book” in the Preface for further details.
Acknowledgements

This text could not have been written in anything close to its current form

without the enormous work of countless genius-angels in the Python open-


source community, particularly authors and contributors of NumPy, Jupyter,
and matplotlib. We are especially grateful to the authors and contributors of

autograd including Dougal Maclaurin, David Duvenaud, Matt Johnson, and


Jamie Townsend, as autograd allowed us to experiment and iterate on a host of

new ideas included in the second edition of this text that greatly improved it as

well as, we hope, the learning experience for its readers.

We are also very grateful for the many students over the years that provided

insightful feedback on the content of this text, with special thanks to Bowen

Tian who provided copious amounts of insightful feedback on early drafts of

the work.

Finally, a big thanks to Mark McNess Rosengren and the entire Standing

Passengers crew for helping us stay caffeinated during the writing of this text.
1 Introduction to Machine
Learning

1.1 Introduction
Machine learning is a unified algorithmic framework designed to identify com-

putational models that accurately describe empirical data and the phenomena

underlying it, with little or no human involvement. While still a young dis-

cipline with much more awaiting discovery than is currently known, today

machine learning can be used to teach computers to perform a wide array

of useful tasks including automatic detection of objects in images (a crucial

component of driver-assisted and self-driving cars), speech recognition (which

powers voice command technology), knowledge discovery in the medical sci-

ences (used to improve our understanding of complex diseases), and predictive

analytics (leveraged for sales and economic forecasting), to just name a few.

In this chapter we give a high-level introduction to the field of machine

learning as well as the contents of this textbook.

1.2 Distinguishing Cats from Dogs: a Machine Learning Approach
To get a big-picture sense of how machine learning works, we begin by dis-

cussing a toy problem: teaching a computer how to distinguish pictures of cats

from those of dogs. This will allow us to informally describe the

terminology and procedures involved in solving the typical machine learning

problem.

Do you recall how you first learned about the difference between cats and
dogs, and how they are different animals? The answer is probably no, as most
humans learn to perform simple cognitive tasks like this very early on in the

course of their lives. One thing is certain, however: young children do not need

some kind of formal scientific training, or a zoological lecture on felis catus and

canis familiaris species, in order to be able to tell cats and dogs apart. Instead,

they learn by example. They are naturally presented with many images of

what they are told by a supervisor (a parent, a caregiver, etc.) are either cats

or dogs, until they fully grasp the two concepts. How do we know when a

child can successfully distinguish between cats and dogs? Intuitively, when

they encounter new (images of) cats and dogs, and can correctly identify each

new example or, in other words, when they can generalize what they have learned

to new, previously unseen, examples.

Like human beings, computers can be taught how to perform this sort of task

in a similar manner. This kind of task where we aim to teach a computer to

distinguish between different types or classes of things (here cats and dogs) is

referred to as a classification problem in the jargon of machine learning, and is

done through a series of steps which we detail below.

1. Data collection. Like human beings, a computer must be trained to recognize

the difference between these two types of animals by learning from a batch of
examples, typically referred to as a training set of data. Figure 1.1 shows such a

training set consisting of a few images of different cats and dogs. Intuitively, the
larger and more diverse the training set the better a computer (or human) can

perform a learning task, since exposure to a wider breadth of examples gives

the learner more experience.

Figure 1.1 A training set consisting of six images of cats (highlighted in blue) and six

images of dogs (highlighted in red). This set is used to train a machine learning model

that can distinguish between future images of cats and dogs. The images in this figure

were taken from [1].

2. Feature design. Think for a moment about how we (humans) tell the difference
between images containing cats from those containing dogs. We use color, size,

the shape of the ears or nose, and/or some combination of these features in order

to distinguish between the two. In other words, we do not just look at an image

as simply a collection of many small square pixels. We pick out grosser details,

or features, from images like these in order to identify what it is that we are

looking at. This is true for computers as well. In order to successfully train a

computer to perform this task (and any machine learning task more generally)

we need to provide it with properly designed features or, ideally, have it find or

learn such features itself.

Designing quality features is typically not a trivial task as it can be very ap-

plication dependent. For instance, a feature like color would be less helpful in

discriminating between cats and dogs (since many cats and dogs share similar

hair colors) than it would be in telling grizzly bears and polar bears apart! More-

over, extracting the features from a training dataset can also be challenging. For

example, if some of our training images were blurry or taken from a perspective

where we could not see the animal properly, the features we designed might

not be properly extracted.

However, for the sake of simplicity with our toy problem here, suppose we

can easily extract the following two features from each image in the training set:

size of nose relative to the size of the head, ranging from small to large, and shape

of ears, ranging from round to pointy.



Figure 1.2 Feature space representation of the training set shown in Figure 1.1 where

the horizontal and vertical axes represent the features nose size and ear shape,

respectively. The fact that the cats and dogs from our training set lie in distinct regions

of the feature space reflects a good choice of features.

Examining the training images shown in Figure 1.1 , we can see that all cats

have small noses and pointy ears, while dogs generally have large noses and

round ears. Notice that with the current choice of features each image can now

be represented by just two numbers: a number expressing the relative nose size,

and another number capturing the pointiness or roundness of the ears. In other

words, we can represent each image in our training set in a two-dimensional



feature space where the features nose size and ear shape are the horizontal and

vertical coordinate axes, respectively, as illustrated in Figure 1.2.
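
To make this representation concrete, the sketch below stores such a training set as a pair of NumPy arrays, one row of two feature values per image together with a label for each image. The numerical values are invented purely for illustration, since the text reports no actual measurements.

```python
import numpy as np

# Hypothetical feature values for the twelve training images of Figure 1.1:
# each row holds (relative nose size, ear pointiness), both scaled to [0, 1].
features = np.array([
    [0.15, 0.90], [0.20, 0.85], [0.10, 0.95], [0.25, 0.80], [0.15, 0.75], [0.30, 0.90],  # cats
    [0.80, 0.20], [0.70, 0.30], [0.90, 0.15], [0.75, 0.25], [0.85, 0.10], [0.65, 0.35],  # dogs
])
labels = np.array([+1] * 6 + [-1] * 6)   # +1 for cat, -1 for dog

print(features.shape)   # (12, 2): twelve images, each represented by just two numbers
```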

3. Model training. With our feature representation of the training data the

machine learning problem of distinguishing between cats and dogs is now a

simple geometric one: have the machine find a line or a curve that separates

the cats from the dogs in our carefully designed feature space. Supposing for

simplicity that we use a line, we must find the right values for its two parameters

– a slope and vertical intercept – that define the line’s orientation in the feature

space. The process of determining proper parameters relies on a set of tools

known as mathematical optimization detailed in Chapters 2 through 4 of this text,

and the tuning of such a set of parameters to a training set is referred to as the

training of a model.

Figure 1.3 shows a trained linear model (in black) which divides the feature

space into cat and dog regions. This linear model provides a simple compu-

tational rule for distinguishing between cats and dogs: when the feature rep-

resentation of a future image lies above the line (in the blue region) it will be

considered a cat by the machine, and likewise any representation that falls below

the line (in the red region) will be considered a dog.
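
As a rough code-level illustration of what such training amounts to (using invented feature values, and a standard two-class cost as a stand-in for the optimization tools of Chapters 2 through 4 and the classification costs of Chapter 6), the sketch below fits a line to a small toy dataset with plain gradient descent and then applies the resulting decision rule:

```python
import numpy as np

# Toy training data: (nose size, ear pointiness) per image; +1 = cat, -1 = dog.
# All values are invented for illustration.
X = np.array([[0.15, 0.90], [0.20, 0.85], [0.10, 0.95],   # cats
              [0.80, 0.20], [0.70, 0.30], [0.90, 0.15]])  # dogs
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

def model(w, X):
    # linear model: a bias plus a weighted sum of the two features
    return w[0] + X @ w[1:]

def gradient(w):
    # gradient of the cost sum_p log(1 + exp(-y_p * model_p)) with respect to w
    s = -y / (1.0 + np.exp(y * model(w, X)))
    return np.array([np.sum(s), *(X.T @ s)])

w = np.zeros(3)
for _ in range(2000):            # plain gradient descent with a fixed steplength
    w -= 0.1 * gradient(w)

# the trained line gives a computational rule: above the line, cat; below it, dog
new_image = np.array([0.30, 0.80])       # small nose, fairly pointy ears
print("cat" if model(w, new_image) > 0 else "dog")
```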



Figure 1.3 A trained linear model (shown in black) provides a computational rule for

distinguishing between cats and dogs. Any new image received in the future will be

classified as a cat if its feature representation lies above this line (in the blue region), and

a dog if the feature representation lies below this line (in the red region).

Figure 1.4 A validation set of cat and dog images (also taken from [1]). Notice that the

images in this set are not highlighted in red or blue (as was the case with the training set

shown in Figure 1.1) indicating that the true identity of each image is not revealed to the

learner. Notice that one of the dogs, the Boston terrier in the bottom right corner, has

both a small nose and pointy ears. Because of our chosen feature representation the

computer will think this is a cat!

4. Model validation. To validate the efficacy of our trained learner we now show
the computer a batch of previously unseen images of cats and dogs, referred to

generally as a validation set of data, and see how well it can identify the animal

in each image. In Figure 1.4 we show a sample validation set for the problem at

hand, consisting of three new cat and dog images. To do this, we take each new

image, extract our designed features (i.e., nose size and ear shape), and simply

check which side of our line (or classifier) the feature representation falls on. In

this instance, as can be seen in Figure 1.5, all of the new cats and all but one dog

from the validation set have been identified correctly by our trained model.
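
Continuing the training sketch above (it reuses the model function and trained weights w defined there, with feature values again invented for illustration), validation reduces to checking which side of the learned line each new feature vector falls on:

```python
# A continuation of the earlier training sketch: it assumes the model function
# and the trained weights w from that sketch are still in scope.
import numpy as np

X_val = np.array([[0.20, 0.85], [0.10, 0.90], [0.25, 0.95],   # three new cats
                  [0.85, 0.25], [0.75, 0.15], [0.30, 0.80]])  # three new dogs; the last mimics the Boston terrier
y_val = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

predictions = np.sign(model(w, X_val))               # +1: above the line (cat), -1: below the line (dog)
print(predictions, np.mean(predictions == y_val))    # the terrier-like point is misclassified as a cat
```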

The misidentification of the single dog (a Boston terrier) is largely the result

of our choice of features, which we designed based on the training set in Figure

1.1, and to some extent our decision to use a linear model (instead of a nonlinear

one). This dog has been misidentified simply because its features, a small nose

and pointy ears, match those of the cats from our training set. Therefore, while

it first appeared that a combination of nose size and ear shape could indeed

distinguish cats from dogs, we now see through validation that our training set

was perhaps too small and not diverse enough for this choice of features to be

completely effective in general.


We can take a number of steps to improve our learner. First and foremost we

should collect more data, forming a larger and more diverse training set. Second,

we can consider designing/including more discriminating features (perhaps eye

color, tail shape, etc.) that further help distinguish cats from dogs using a linear

model. Finally, we can also try out (i.e., train and validate) an array of nonlinear

models with the hopes that a more complex rule might better distinguish be-

tween cats and dogs. Figure 1.6 compactly summarizes the four steps involved

in solving our toy cat-versus-dog classification problem.




Figure 1.5 Identification of (the feature representation of) validation images using our

trained linear model. The Boston terrier (pointed to by an arrow) is misclassified as a cat

since it has pointy ears and a small nose, just like the cats in our training set.

Data collection → Feature design → Model training → Model validation

Figure 1.6 The schematic pipeline of our toy cat-versus-dog classification problem. The

same general pipeline is used for essentially all machine learning problems.

1.3 The Basic Taxonomy of Machine Learning Problems


The sort of computational rules we can learn using machine learning generally

fall into two main categories called supervised and unsupervised learning, which
we discuss next.

1.3.1 Supervised learning


Supervised learning problems (like the prototypical problem outlined in Section

1.2) refer to the automatic learning of computational rules involving input/out-

put relationships. Applicable to a wide array of situations and data types, this

type of problem comes in two forms, called regression and classification, depend-

ing on the general numerical form of the output.

Regression
Suppose we wanted to predict the share price of a company that is about to

go public. Following the pipeline discussed in Section 1.2, we first gather a

training set of data consisting of a number of corporations (preferably active in

the same domain) with known share prices. Next, we need to design feature(s)

that are thought to be relevant to the task at hand. The company’s revenue is one

such potential feature, as we can expect that the higher the revenue the more

expensive a share of stock should be. To connect the share price (output) to the

revenue (input) we can train a simple linear model or regression line using our

training data.

Figure 1.7 (top-left panel) A toy training dataset consisting of ten corporations’ share

price and revenue values. (top-right panel) A linear model is fit to the data. This trend

line models the overall trajectory of the points and can be used for prediction in the

future as shown in the bottom-left and bottom-right panels.

The top panels of Figure 1.7 show a toy dataset comprising share price versus

revenue information for ten companies, as well as a linear model fit to this data.

Once the model is trained, the share price of a new company can be predicted

based on its revenue, as depicted in the bottom panels of this figure. Finally,

comparing the predicted price to the actual price for a validation set of data

we can test the performance of our linear regression model and apply changes

as needed, for example, designing new features (e.g., total assets, total equity,

number of employees, years active, etc.) and/or trying more complex nonlinear

models.

This sort of task, i.e., fitting a model to a set of training data so that predictions

about a continuous-valued output (here, share price) can be made, is referred to as


regression. We begin our detailed discussion of regression in Chapter 5 with the

linear case, and move to nonlinear models starting in Chapter 10 and throughout

Chapters 11–14. Below we describe several additional examples of regression to

help solidify this concept.
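
For a minimal code-level view of this idea (the revenue and share-price numbers below are invented, since the toy dataset's values are not given in the text), a regression line can be fit by least squares and used to predict the output for a new input:

```python
import numpy as np

# Invented toy data in the spirit of Figure 1.7: revenue (input) and
# share price (output) for ten hypothetical companies.
revenue     = np.array([0.5, 1.0, 1.8, 2.4, 3.1, 3.7, 4.2, 5.0, 5.6, 6.3])
share_price = np.array([10., 13., 18., 22., 27., 30., 35., 40., 44., 50.])

# fit the regression line  share_price ~ w0 + w1 * revenue  by least squares
w1, w0 = np.polyfit(revenue, share_price, deg=1)

new_company_revenue = 4.5
print(w0 + w1 * new_company_revenue)   # predicted share price for the new company
```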

Example 1.1 The rise of student loan debt in the United States

Figure 1.8 (data taken from [2]) shows the total student loan debt (that is money

borrowed by students to pay for college tuition, room and board, etc.) held

by citizens of the United States from 2006 to 2014, measured quarterly. Over

the eight-year period reflected in this plot the student debt has nearly tripled,

totaling over one trillion dollars by the end of 2014. The regression line (in

black) fits this dataset quite well and, with its sharp positive slope, emphasizes

the point that student debt is rising dangerously fast. Moreover, if this trend

continues, we can use the regression line to predict that total student debt will

surpass two trillion dollars by the year 2026 (we revisit this problem later in

Exercise 5.1).

Figure 1.8 Figure associated with Example 1.1, illustrating total student loan debt in the

United States measured quarterly from 2006 to 2014. The rapid increase rate of the debt,

measured by the slope of the trend line fit to the data, confirms that student debt is

growing very fast. See text for further details.


Another random document with
no related content on Scribd:
decentration of the eye as if a prism were prescribed, nature
supplying its own decentration.

Treatment for Correcting Esophoria


in Children
In case of esophoria, regardless of amount, slightly increased
spherical power is frequently prescribed for children. This will
naturally blur or fog the patient’s vision, but in their effort to
overcome the blur, accommodation is relaxed, usually tending to
correct the muscular defect.
In such cases, as a rule, a quarter diopter increased spherical
strength may frequently be added for each degree of esophoria as
determined before the optical correction was made. In a case of 6
degrees of esophoria, the refractionist may prescribe +1.50 diopter
spherical added to the optical correction, which, let us assume, is
+1.00 sph. = -1.00 cyl. ax. 180°, so that the treatment glasses would
be +2.50 sph. = -1.00 ax. 180° (See Procedure on Page 74).
At the end of each three months’ period, the patient should be
requested to return, when the binocular and the duction test should
again be made, comparing results with the work previously
accomplished. An improvement tending to build up the left weak
externus will possibly permit of a decrease of the excessive spherical
power, so that excessive spherical power is reduced until completely
removed, in all probability overcoming the muscular defect.
Esophoria is almost invariably a false condition and frequently is
outgrown under this treatment as the child advances in years. On the
other hand, esophoria uncared for in the child may tend to produce
exophoria in the adult.

How Optical Correction Tends to


Decrease 6° Esophoria in a Child
Assume binocular muscle test made
before optical correction shows
6° Esophoria.
+1. Sph. = -1. Cyl. Ax. 180.

Next, locate faulty muscle by making a duction


test, which shows how abduction of left eye is
made to equal that of right eye, change being
made quarterly with treatment lenses in
accordance with following rule. Note as
abduction is increased, esophoria is reduced.
Rule—prescribe a quarter diopter increased
sphere for each degree of imbalance or 0.25
× 6 equals:
+1.50 added to optical correction.
1/1/19 (assumed date) prescribed treatment
lenses equal:
+2.50 = -1. × 180°.

4/1/19 (3 months later) assuming abduction has


increased from 2° to 3° showing difference of
5 Es. or 0.25 × 5. equals +1.25 added to
optical correction, prescribed treatment lenses
equal:
+2.25 = -1. × 180.

7/1/19 (3 months later), assuming abduction has


increased from 3° to 4° showing difference of
4° Es. or 0.25 × 4 equals +1.00 which added
to optical correction would make prescribed
treatment lenses equal:
+2.00 = -1. × 180.

And so on, every three months treatment lenses


are prescribed until both right and left eye
show 8° of abduction. In this way the
treatment lenses are reduced to original
correction of +1.00 = -100 × 180. This would
have required six changes of lenses, three
months apart—thus consuming 18 months
time.
Chapter X
SECOND METHOD OF TREATMENT—
MUSCULAR EXERCISE

Made With Two Rotary Prisms


and Red Maddox Rod

Exophoria

I f a case is one of exophoria of six degrees, where the second


method of treatment or muscular exercise is in line of routine, it is
essential to first determine through a duction test and the
preparation of the diagram exactly which one of the four muscles are
faulty (Fig. 24).
Having determined, with the aid of the diagram, first, the
existence of 6 degrees of exophoria; second, 18 degrees of
adduction; third, a weak left internus—the next procedure is to
determine what degree of prism will enable the patient to obtain
single binocular vision, with both eyes looking “straight.”
To determine this, place both of the Ski-optometer’s rotary prisms
in position with the handle of each pointing outward horizontally. The
red line or indicator of each prism should then be placed at 30° of the
outer scale (Fig. 26).
The red Maddox rod should be horizontally positioned before the
eye, the white line on indicator pointing to 180° of the scale (Fig. 27).
The strength of the rotary prism before the right eye should
thereupon be reduced by rotating the prism indicator or red line
toward the upper zero (0) to a point where the patient first sees the
red streak—assuming that the red line appears at 42 degrees, that is
30 degrees before the left eye and 12 degrees before the right.

Fig. 26 (A and B)—First position of rotary


prisms to determine amount of prism
exercise to be employed for building up
the weak muscle.
The prism should then be still further reduced until the vertical
streak produced by the Maddox rod directly bisects the muscle
testing spot of light. Assuming that this point be thirty-eight degrees,
which is four degrees less, single binocular vision is produced.
Fig. 27—Position of red Maddox rod used
in conjunction with Fig. 26 for prism
exercising.
For example, sixty degrees of prism power (the combined power
of the two rotary prisms) will usually cause complete distortion.
Therefore, as outlined in Figure 28, the patient, seeing only out of
the right eye, will detect nothing but a white light. By gradually
reducing the strength of the prism before the right, which is the good
eye, the patient will eventually see a red streak off to the left. A
continued and gradual reduction to a point where the red streak
bisects the white light, will determine how much prism power is
required for the patient to obtain single binocular vision, thus
establishing the same image at the same time on each fovea or
retina (Fig. 20).
This has taught the patient to do that which he has never before
accomplished. Therefore, after having been taught how to make the
two eyes work in relation to each other, the natural tendency
thereafter will be to strive for the same relationship of vision with
both eyes. The refractionist should then aim to reduce the excessive
amount of prism required to give binocular vision, which can be
accomplished by muscular exercise.
It must always be remembered before the refractionist is ready to
employ the muscular exercise or second method, that the degree of
prism required to give the patient single binocular vision must be
determined with the optical correction in place. The exercise must be
practised daily in routine, a daily record being essential.

An Assumed Case
We will assume a case where 42 degrees is required to enable
the patient to first see the red streak as produced by the Maddox rod
to the extreme left. Through a continued gradual reduction of 4
degrees (or to 38 degrees), we next learn that the streak was carried
over until it bisected the white spot of light, giving single binocular
vision and producing a position of rest.
Fig. 28—Simplified chart showing the
prism action employed in developing a
weak ocular muscle through alternating
prism exercise. Either side of 38° in
excess of 4° causing diplopia.
The patient has now established the limitation of the exercise,
which is four degrees, this limitation being determined by the
difference between the point where the streak was first seen to the
extreme side and where it bisected the spot. The same amount of
four degrees should then be used for the opposite side, thus
reducing the prism strength to 34 degrees.
This again produces diplopia, because of the lesser amount of
prism power employed to give single binocular vision. The
refractionist should then return to 38 degrees, where single binocular
vision had originally been determined (Fig. 28), alternating back to
42, returning to 38, over to 34, back to 38, and so on. This procedure
should be employed once a day just after meals for about five
minutes, and repeated ten times, constantly striving for a slight
reduction of prism power from day to day.

Effect of Muscular Exercise


This muscular treatment, or constructive exercising, should
enable the patient to overcome his amount of four degrees in either
direction in about a week. Hence in the case showing 38 degrees for
single binocular vision, results may be looked for in about nine
weeks—four degrees divided into 38 degrees. While the patient is
undergoing the treatment, which is nothing more than the
strengthening of the interni muscles or developing adduction, it is
natural to believe that the amount of imbalance is likewise being
conquered. This, however, is readily determined from time to time by
making the binocular muscle test with the phorometer and Maddox
rod, as well as the duction chart test (Fig. 24), as previously outlined.
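The length of treatment quoted above follows from a simple division, which may be set down as follows (a minimal sketch; the function name is the writer's own and not part of any instrument's routine):

# Rough estimate of the length of treatment: the prism required for single
# binocular vision divided by the amount overcome per week (here the
# 4-degree exercise limit).
def estimated_weeks(prism_for_single_vision_deg, gain_per_week_deg):
    return prism_for_single_vision_deg / gain_per_week_deg

print(estimated_weeks(38, 4))   # 9.5, which the chapter rounds to about nine weeks
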
To fully appreciate the effect of this muscular treatment, the
reader need only hold his head in a stationary position, casting his
eyes several times from the extreme right to the extreme left, not
failing to note the apparent muscular strain. On the other hand, with
the aid of the Ski-optometer’s rotating prisms, the refractionist not
only has complete control of the patient’s muscles at all times, but
scientifically accomplishes muscular exercise without any tiresome
strain, overcoming all possible exertion.
After the case in question has been reduced to 30 degrees, the
rotary prism before the right eye is no longer needed and may be
removed; the same exercising procedure is then continued with the
remaining left-side rotary prism, reducing its power until it is likewise
down to zero.
When both prisms have been reduced to zero, each should again be
placed in position with the zero graduations vertical and the prism
indicator on the upper zero. Both prisms should then be turned
simultaneously about four degrees toward the nasal side of the
patient, thus tending to jointly force corresponding muscles of both
eyes.

Home Treatment for Muscular Exercise—Square Prism Set Used in Conjunction With the Ski-Optometer
Where a patient is unable to call each day for this muscular
treatment or exercise, the work will be greatly facilitated by
employing a specially designed set of square prisms ranging in
strength from ½ to 20 degrees for home treatment. As in the case
previously cited, it is necessary to carefully instruct the patient that
the interni muscles must be developed, hence prism base out with
apex in must be employed. Attention should then be directed to a
candle light, serving as a muscle testing spot of light and stationed in
a semi-dark room at an approximate distance of twenty feet.
Having determined through the Ski-optometer the strength of the
prism required after each office treatment, its equivalent should then
be placed in a special square prism trial-frame which permits rotation
of the prism, although the patient is frequently taught to twirl the lens
before the eye. This exercise may be continued for about five
minutes each day.
The patient should also be instructed to call at the end of each
week, when the work may be checked by means of the Ski-optometer's
rotary prisms, making the duction test as previously explained and
outlined in Fig. 24. It is then possible to determine whether or not
satisfactory results are being obtained; if they are not, the exercise
should be abandoned.
Should the second method employed in the work of muscular
imbalance not prove effective, the third method requiring the use of
prisms would be next in routine.
Chapter XI
THIRD METHOD OF TREATMENT—PRISM
LENSES

When and How Employed

As stated in the preceding chapter, on ascertaining the failure of
the second muscular treatment or method, prisms are employed
for constant wear. When prism lenses are used, whether the
case is exophoria or esophoria, or right or left hyperphoria, it is
always safe to prescribe one-quarter degree prism for each degree
of prism imbalance for each eye. For example, in a case of 6
degrees of esophoria, a prism of 1½ degree base out should be
prescribed for each eye; or in 6 degrees of exophoria, employ the
same amount of prism, but base in. In right hyperphoria, place the
prism base down before the right eye and up before the left, and vice
versa for left hyperphoria.
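The rule just stated lends itself to a simple computation. The following Python sketch is merely an illustration of that quarter-degree rule and of the base directions given above; the function name and the wording of its output are the writer's own.

# One-quarter degree of prism for each degree of imbalance, for each eye,
# with the base direction determined by the kind of imbalance.
def prism_prescription(kind, imbalance_deg):
    amount = 0.25 * imbalance_deg      # e.g. 6 degrees -> 1 1/2 degrees per eye
    if kind == "esophoria":
        return {"right eye": (amount, "base out"), "left eye": (amount, "base out")}
    if kind == "exophoria":
        return {"right eye": (amount, "base in"), "left eye": (amount, "base in")}
    if kind == "right hyperphoria":
        return {"right eye": (amount, "base down"), "left eye": (amount, "base up")}
    if kind == "left hyperphoria":
        return {"right eye": (amount, "base up"), "left eye": (amount, "base down")}
    raise ValueError("unknown kind of imbalance")

print(prism_prescription("esophoria", 6))
# {'right eye': (1.5, 'base out'), 'left eye': (1.5, 'base out')}
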
It is not always advisable, however, to allow the patient to wear
the same degree of prism for any length of time. Many authorities
suggest a constant change with the idea that a prism is nothing more
than a crutch. Should the same degree be constantly worn, even
though it afforded temporary relief, the eye would become
accustomed to it and the purpose of the prism entirely lost. Prisms
should be prescribed with extreme care, their use being identical
with that of dumb-bells, where weight is first increased to maximum
and subsequently reduced, viz.:

Prism Reduction Method


Where prisms are prescribed, it is considered good practice to
make a binocular muscle test and the duction test (Fig. 24) at the
end of each three months’ period, employing the phorometer,
Maddox rod, and rotary prisms, as already explained.
If the condition shows any decrease, the prism degree should be
proportionately decreased. For example, in the case originally
showing 6 degrees of exophoria, one-quarter degree prism for each
degree of imbalance was prescribed, or 1½ degree for each eye. If
the same case subsequently indicated 4 degrees, only one degree
for each eye should be prescribed—and so on, a gradual reduction
of prism value being constantly sought.
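Numerically, the reduction described above amounts to re-applying the same quarter-degree rule to each new, smaller measurement; a brief illustration follows (the figures are those of the example, the wording the writer's own):

# Re-apply the quarter-degree rule at each three-monthly test as the
# measured exophoria falls.
for measured_deg in (6, 4, 2):
    per_eye = 0.25 * measured_deg
    print(measured_deg, "degrees of exophoria ->", per_eye, "degree base in, each eye")
# 6 -> 1.5, 4 -> 1.0, 2 -> 0.5
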
Except in rare cases, prisms should not be prescribed with the
base or apex at oblique angles, as the eye is rarely at rest with such
a correction. An imbalance may be caused by a false condition in
one rectus and a true imbalance in the other, giving one the
impression that cyclophoria exists, as explained in a following
chapter.
Having now employed the three methods, the refractionist can
readily understand that a marked percentage of muscular imbalance
cases may be directly benefited through the aid of the Ski-optometer.
If these three methods of procedure fail, there is nothing left but the
fourth and last method—that of operative procedure.
Chapter XII
A CONDENSATION OF PREVIOUS CHAPTERS
ON THE PROCEDURE FOR MUSCLE TESTING
WITH THE SKI-OPTOMETER

The present chapter, intended for those desiring a synopsis or
condensed summary of muscular imbalance work, should prove
of the utmost assistance to the busy refractionist. Muscular
imbalance work can be successfully conducted if the following
routine is studied and memorized, with the Ski-optometer constantly
before the reader. The chapters containing the corresponding figures
and diagrams or illustrations will then be readily comprehended. It is
also important to carefully note the captions under each diagram.
1. Without any testing lenses before patient’s eyes, direct
attention to a 20-foot distant muscle testing spot of light (Fig. 9).
2. Place phorometer handle vertically (Fig. 16).
Place red Maddox rod vertically (Fig. 15). Patient should see a
white spot of light, and a red horizontal streak (Fig. 17).
Simply turn phorometer handle until horizontal streak bisects
white spot of light. Pointer then indicates amount of deviation on red
scale. Ignore cases of less than 1° of hyperphoria, whether right or
left (designated R. H. or L. H.).
3. Place phorometer handle horizontally (Fig. 19).
Place red Maddox rod horizontally (Fig. 18). Patient should see a
white spot of light and a vertical red streak (Fig. 20).
Simply turn phorometer handle until red streak bisects spot of
light. Pointer indicates amount of deviation on white scale, whether
esophoria or exophoria (designated Es or Ex).
4. Ignore all exophoria cases, less than 3°.
Ignore all esophoria cases, less than 5°—except in children,
ignore less than 3° of esophoria.
5. Always make the above binocular muscle test—with phorometer
and red Maddox rod—before the optical correction (the test for
spheres and cylinders) and again after the optical correction where
the case shows more than the 1-3-5 rule, to determine whether the
muscles are aggravated or benefited.
6. In cases showing more than the 1-3-5 rule, shown in No. 4
above, make monocular duction test first with rotary prism before
patient's right eye, then with rotary prism before left eye, to find the
faulty muscle and determine which eye is affected.
7. To test adduction, prism base out is required. Rotary prism’s
red line or indicator should be rotated from zero outwardly. To test
abduction, base in is required. Indicator should be rotated inwardly
from zero (Fig. 22). Power of adduction as compared with abduction
is normally 3 to 1—usually rated 24 to 8.
8. To test superduction, base down is required. Rotary prism’s
line or indicator should be rotated downward from zero. To test
subduction, base up is required. Indicator should be rotated upward
from zero. Power of superduction as compared with subduction is
normally equal—usually rated 2 for each (Fig. 23).
9. Direct patient’s attention to largest letter on distant chart,
usually letter “E,” rotating red line indicator of rotary prism outlined in
above No. 7 and No. 8, until diplopia is first procured.
10. The use of a duction chart on a record card quickly
designates the pull for each of the four muscles (Fig. 24), illustrating an
assumed case of—

1st—6D of Exophoria.
2nd—18° adduction (which must be developed to 24°).
3rd—Patient has a left weak internus.
11. Employ First Method—Optical Correction—to effect
treatment.
12. Assuming a case of a child with 6° of esophoria—8° of right
abduction and 2° left abduction indicating a left weak externus,
prescribe a quarter diopter increased plus spherical power for each
degree of imbalance, thus adding +1.50D spherical to optical
correction. This is the first method of treatment. This requires a
thorough reading of Chapter IX on Treatment for Correcting
Esophoria in Children and a careful study of the formula. For
synopsis see Page 74.
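The formula of No. 12 may likewise be set down in a line or two of Python; this is only an illustration of the quarter-diopter rule stated there, with names of the writer's own choosing.

# One-quarter of a diopter of added plus sphere for each degree of esophoria
# found in a child (the assumed case of No. 12: 6 degrees -> +1.50D).
def added_plus_sphere(esophoria_deg):
    return 0.25 * esophoria_deg

print("+%.2fD" % added_plus_sphere(6))   # +1.50D added to the optical correction
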

Four Methods of Treating an Imbalance Case


When the Preceding One Fails
1st—Optical correction;
2nd—Muscular exercise or treatment;
75% are Curable with First and Second Methods.
3rd—Prisms;
5% are Curable with Third Method.
4th—Operation;
20% are Curable with Fourth Method.
13. When first method of treatment fails, Employ Second
Method—Muscular Exercise—to effect treatment.
1st—Find degree of prism patient will accept to produce single
binocular vision with optical correction on, placing both rotary prisms
in position, handles horizontal, red line on 30° of temporal scale of
each, giving a total value of 60° (Fig. 26a and b).
2nd—Also place red Maddox rod before patient’s eye (rods
horizontal) (Fig. 18), calling patient’s attention to usual muscle
testing spot of light.
3rd—Reduce prism before good eye until red streak appears,
noting degree (which we assume shows 42°, the combined total
value of both prisms); slowly continue to decrease prism until streak
bisects spot. Assume this shows a total of 38°. Either side of 38° in
excess of 4° (38 to 42) produces diplopia. Prisms must only be
rotated from 38° to 42° back to 38° over to 34°—back to 38° over to
42°—back again to 38° and so on—exercise to be continued daily
ten times for five minutes (Fig. 28).
4th—At end of each week, duction test should again be made.
Duction chart should show a tendency to reduce exophoria by a
gradual building up of adduction; approximately one week is usually
sufficient to teach patient to hold streak within the spot (between 38°
and 42°). Exercise to be continued until both prisms are worked
down to zero. Exercise tends to teach patient how to establish same
image on each fovea or retina at same time.
5th—If patient is unable to call daily for treatment, employ home
treatment. (Read “Home Treatment for Muscular Exercising,” Page
82).
Employ Third Method—Use of Prisms for Constant Wear to
effect treatment.

Prisms
1st. Where a case cannot be reduced through use of first two
methods, as for example in a case of 6° of exophoria, prescribe ¼ of
amount of imbalance (¼ × 6 = 1½°) for each eye—base in; for
esophoria, base out; for hyperphoria, base down before the affected
eye and base up before the other (as stated in Chapter XI).
2nd. Advise patient to call every three months and make duction
test (Fig. 24). If no improvement in condition, after wearing prisms
six months, operative means is suggested.
Assuming a case is benefited, reduce prism power according to
rule: ¼° of prism for each degree of imbalance.

Cyclophoria
This work being of a technical nature, it is deemed best for the
reader to study Chapters XIII and XIV.
Chapter XIII
CYCLOPHORIA

Made with Maddox Rods and Rotary Prisms

Cyclophoria, a condition affecting the oblique muscles of the
eye, is caused by its rotation. It is detected in the following
manner by the combined use of the red and white Maddox rods
and the rotary prism.
Fig. 29—Position of rotary prism for
producing diplopia in testing cyclophoria
with prism placed at 8° base up.
Darken the room and direct the patient’s attention to the usual
muscle-testing spot of light, located approximately twenty feet away
and on a direct plane with the patient’s eye. The optical correction, if
one is required, should always be left in place—just as in making
other previously described muscle tests.
The rotary prism should then be brought before the patient’s right
eye with the handle pointing upward and with zero graduations
horizontal. The indicator or red line should then be rotated upward
from zero to eight upon the prism scale, creating the equivalent of a
prism of 8 diopters with base up (Fig. 29). This normally caused
