Healthcare Analytics Made Simple: Techniques in healthcare computing using machine learning and Python
Ebook, 558 pages, 9 hours


About this ebook

Add a touch of data analytics to your healthcare systems and get insightful outcomes




Key Features



  • Perform healthcare analytics with Python and SQL


  • Build predictive models on real healthcare data with pandas and scikit-learn


  • Use analytics to improve healthcare performance



Book Description



In recent years, machine learning technologies and analytics have been widely used across the healthcare sector. Healthcare Analytics Made Simple bridges the gap between practising doctors and data scientists. It equips data scientists to work with healthcare data and to gain better insight from that data in order to improve healthcare outcomes.






This book is a complete overview of machine learning for healthcare analytics, briefly describing the current healthcare landscape, machine learning algorithms, and Python and SQL programming languages. The step-by-step instructions teach you how to obtain real healthcare data and perform descriptive, predictive, and prescriptive analytics using popular Python packages such as pandas and scikit-learn. The latest research results in disease detection and healthcare image analysis are reviewed.






By the end of this book, you will understand how to use Python for healthcare data analysis, how to import, collect, clean, and refine data from electronic health record (EHR) surveys, and how to make predictive models with this data through real-world algorithms and code examples.




What you will learn



  • Gain valuable insight into healthcare incentives, finances, and legislation


  • Discover the connection between machine learning and healthcare processes


  • Use SQL and Python to analyze data


  • Measure healthcare quality and provider performance


  • Identify features and attributes to build successful healthcare models


  • Build predictive models using real-world healthcare data


  • Become an expert in predictive modeling with structured clinical data


  • See what lies ahead for healthcare analytics



Who this book is for



Healthcare Analytics Made Simple is for you if you are a developer who has a working knowledge of Python or a related programming language, even if you are new to healthcare or predictive modeling with healthcare data. Clinicians interested in analytics and healthcare computing will also benefit from this book. This book can also serve as a textbook for students enrolled in an introductory course on machine learning for healthcare.

Language: English
Release date: Jul 31, 2018
ISBN: 9781787283220

    Book preview

    Healthcare Analytics Made Simple - Vikas (Vik) Kumar

    Healthcare Analytics Made Simple

    Healthcare Analytics

    Made Simple

    Techniques in healthcare computing using machine learning and Python

    Vikas (Vik) Kumar

    BIRMINGHAM - MUMBAI

    Healthcare Analytics Made Simple

    Copyright © 2018 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Commissioning Editor: Veena Pagare

    Acquisition Editor: Divya Poojari

    Content Development Editor: Eisha Dsouza

    Technical Editor: Sneha Hanchate

    Copy Editor: Safis

    Project Coordinator: Namrata Swetta

    Proofreader: Safis Editing

    Indexer: Rekha Nair

    Graphics: Jisha Chirayil

    Production Coordinator: Shantanu Zagade

    First published: July 2018

    Production reference: 1280718

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-78728-670-2

    www.packtpub.com

    To my parents, Viren and Sarita; my sister, Monica; and Tuly, my 2018 Person of the Year.

    mapt.io

    Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

    Why subscribe?

    Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

    Improve your learning with Skill Plans built especially for you

    Get a free eBook or video every month

    Mapt is fully searchable

    Copy and paste, print, and bookmark content

    PacktPub.com

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

    Foreword

    Analytics is now an integral part of healthcare. It helps to optimize treatments, improve outcomes, and reduce the overall cost of care. The availability of biomedical, healthcare, and operational big data enables hospitals and health systems to leverage past data to predict the future of patients and their clinical pathways. Predictive modeling and healthcare data science also help to design care pathways and operational strategies that could help in streamlining various aspects of healthcare delivery. However, healthcare analytics is an exciting field that requires skills in biomedicine, data science, and the technical stack, including databases, programming, data visualization, statistics, and machine learning. While there are several books with an in-depth account of the healthcare space and analytics tools and methods, there are not many easy-to-read books that integrate these things together.

    In his new and exciting book, Dr. Vikas Kumar (Vik) has now blended the critical learning points of healthcare and computer science with mathematics and machine learning. Being a physician and a data scientist, Vik has done a tremendous job in compiling complex datasets and explaining several use cases in healthcare analytics with comprehensive code in MySQL and Python.

    I am sure that Healthcare Analytics Made Simple will be an important addition to the library of any data scientist who's interested in understanding the key concepts of biomedical and healthcare data. It will be an indispensable companion for readers from the domains of clinical informatics and health informatics to gain critical skills in the design, development, and validation of machine learning models. This book will also be useful for physicians and biomedical scientists who are interested in understanding the landscape of healthcare analytics. The book is a joy to read, and I enjoyed working through the examples. To conclude, Healthcare Analytics Made Simple is attempting to fill a gap in the field of healthcare analytics by providing a complete and comprehensive guide, resulting in an inter-disciplinary book that will be an easy read for computer scientists, software engineers, data scientists, and healthcare professionals alike.

    Dr. Shameer Khader, PhD

    Director of Healthcare Data Science and Bioinformatics

    Northwell Health, New York

    Contributors

    About the author

    Dr. Vikas (Vik) Kumar grew up in the United States in Niskayuna, New York. He earned his MD from the University of Pittsburgh, but shortly afterwards he discovered his true calling of computers and data science. He then earned his MS in the College of Computing at Georgia Institute of Technology and has subsequently worked as a data scientist for both healthcare and non-healthcare companies. He currently lives in Atlanta, Georgia.

    Thank you to Mark Braunstein, James Cheng, Shameer Khader, Bryant Menn, Srijita Mukherjee, and Bob Savage for their helpful comments on the book drafts.

    About the reviewer

    Seungjin Kim is currently a software engineer at Arcules, transforming video data into intelligence and providing a product based on distributed machine learning architecture. Previously, he was a software engineer at a genetic startup, providing a quality frontend user experience for patients accessing genetic products. He received his M.D. from the Medical School for International Health at the Ben-Gurion University of the Negev in Israel in 2015, and he received his B.S. in Computer Science and Engineering from the University of California in 2008.

    Packt is searching for authors like you

    If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

    Table of Contents

    Title Page

    Copyright and Credits

    Healthcare Analytics Made Simple

    Dedication

    Packt Upsell

    Why subscribe?

    PacktPub.com

    Foreword

    Contributors

    About the author

    About the reviewer

    Packt is searching for authors like you

    Preface

    Who this book is for

    What this book covers

    To get the most out of this book

    Download the example code files

    Download the color images

    Conventions used

    Get in touch

    Reviews

    Introduction to Healthcare Analytics

    What is healthcare analytics?

    Healthcare analytics uses advanced computing technology

    Healthcare analytics acts on the healthcare industry (DUH!)

    Healthcare analytics improves medical care

    Better outcomes

    Lower costs

    Ensure quality

    Foundations of healthcare analytics

    Healthcare

    Mathematics

    Computer science

    History of healthcare analytics

    Examples of healthcare analytics

    Using visualizations to elucidate patient care

    Predicting future diagnostic and treatment events

    Measuring provider quality and performance

    Patient-facing treatments for disease

    Exploring the software

    Anaconda

    Anaconda navigator

    Jupyter notebook

    Spyder IDE

    SQLite

    Command-line tools

    Installing a text editor

    Summary

    References

    Healthcare Foundations

    Healthcare delivery in the US

    Healthcare industry basics

    Healthcare financing

    Fee-for-service reimbursement

    Value-based care

    Healthcare policy

    Protecting patient privacy and patient rights

    Advancing the adoption of electronic medical records

    Promoting value-based care

    Advancing analytics in healthcare

    Patient data – the journey from patient to computer

    The history and physical (H&P)

    Metadata and chief complaint

    History of the present illness (HPI)

    Past medical history

    Medications

    Family history

    Social history

    Allergies

    Review of systems

    Physical examination

    Additional objective data (lab tests, imaging, and other diagnostic tests)

    Assessment and plan

    The progress (SOAP) clinical note

    Standardized clinical codesets

    International Classification of Disease (ICD)

    Current Procedural Terminology (CPT)

    Logical Observation Identifiers Names and Codes (LOINC)

    National Drug Code (NDC)

    Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT)

    Breaking down healthcare analytics

    Population

    Medical task

    Screening

    Diagnosis

    Outcome/Prognosis

    Response to treatment

    Data format

    Structured

    Unstructured

    Imaging

    Other data format

    Disease

    Acute versus chronic diseases

    Cancer

    Other diseases

    Putting it all together – specifying a use case

    Summary

    References and further reading

    Machine Learning Foundations

    Model frameworks for medical decision making

    Tree-like reasoning

    Categorical reasoning with algorithms and trees

    Corresponding machine learning algorithms – decision tree and random forest

    Probabilistic reasoning and Bayes theorem

    Using Bayes theorem for calculating clinical probabilities

    Calculating the baseline MI probability

    2 x 2 contingency table for chest pain and myocardial infarction

    Interpreting the contingency table and calculating sensitivity and specificity

    Calculating likelihood ratios for chest pain (+ and -)

    Calculating the post-test probability of MI given the presence of chest pain

    Corresponding machine learning algorithm – the Naive Bayes Classifier

    Criterion tables and the weighted sum approach

    Criterion tables

    Corresponding machine learning algorithms – linear and logistic regression

    Pattern association and neural networks

    Complex clinical reasoning

    Corresponding machine learning algorithm – neural networks and deep learning

    Machine learning pipeline

    Loading the data

    Cleaning and preprocessing the data

    Aggregating data

    Parsing data

    Converting types

    Dealing with missing data

    Exploring and visualizing the data

    Selecting features

    Training the model parameters

    Evaluating model performance

    Sensitivity (Sn)

    Specificity (Sp)

    Positive predictive value (PPV)

    Negative predictive value (NPV)

    False-positive rate (FPR)

    Accuracy (Acc)

    Receiver operating characteristic (ROC) curves

    Precision-recall curves

    Continuously valued target variables

    Summary

    References and further reading

    Computing Foundations – Databases

    Introduction to databases

    Data engineering with SQL – an example case

    Case details – predicting mortality for a cardiology practice

    The clinical database

    The PATIENT table

    The VISIT table

    The MEDICATIONS table

    The LABS table

    The VITALS table

    The MORT table

    Starting an SQLite session

    Data engineering, one table at a time with SQL

    Query Set #0 – creating the six tables

    Query Set #0a – creating the PATIENT table

    Query Set #0b – creating the VISIT table

    Query Set #0c – creating the MEDICATIONS table

    Query Set #0d – creating the LABS table

    Query Set #0e – creating the VITALS table

    Query Set #0f – creating the MORT table

    Query Set #0g – displaying our tables

    Query Set #1 – creating the MORT_FINAL table

    Query Set #2 – adding columns to MORT_FINAL

    Query Set #2a – adding columns using ALTER TABLE

    Query Set #2b – adding columns using JOIN

    Query Set #3 – date manipulation – calculating age

    Query Set #4 – binning and aggregating diagnoses

    Query Set #4a – binning diagnoses for CHF

    Query Set #4b – binning diagnoses for other diseases

    Query Set #4c – aggregating cardiac diagnoses using SUM

    Query Set #4d – aggregating cardiac diagnoses using COUNT

    Query Set #5 – counting medications

    Query Set #6 – binning abnormal lab results

    Query Set #7 – imputing missing variables

    Query Set #7a – imputing missing temperature values using normal-range imputation

    Query Set #7b – imputing missing temperature values using mean imputation

    Query Set #7c – imputing missing BNP values using a uniform distribution

    Query Set #8 – adding the target variable

    Query Set #9 – visualizing the MORT_FINAL_2 table

    Summary

    References and further reading

    Computing Foundations – Introduction to Python

    Variables and types

    Strings

    Numeric types

    Data structures and containers

    Lists

    Tuples

    Dictionaries

    Sets

    Programming in Python – an illustrative example

    Introduction to pandas

    What is a pandas DataFrame?

    Importing data

    Importing data into pandas from Python data structures

    Importing data into pandas from a flat file

    Importing data into pandas from a database

    Common operations on DataFrames

    Adding columns

    Adding blank or user-initialized columns

    Adding new columns by transforming existing columns

    Dropping columns

    Applying functions to multiple columns

    Combining DataFrames

    Converting DataFrame columns to lists

    Getting and setting DataFrame values

    Getting/setting values using label-based indexing with loc

    Getting/setting values using integer-based labeling with iloc

    Getting/setting multiple contiguous values using slicing

    Fast getting/setting of scalar values using at and iat

    Other operations

    Filtering rows using Boolean indexing

    Sorting rows

    SQL-like operations

    Getting aggregate row COUNTs

    Joining DataFrames

    Introduction to scikit-learn

    Sample data

    Data preprocessing

    One-hot encoding of categorical variables

    Scaling and centering

    Binarization

    Imputation

    Feature-selection

    Machine learning algorithms

    Generalized linear models

    Ensemble methods

    Additional machine learning algorithms

    Performance assessment

    Additional analytics libraries

    NumPy and SciPy

    matplotlib

    Summary

    Measuring Healthcare Quality

    Introduction to healthcare measures

    US Medicare value-based programs

    The Hospital Value-Based Purchasing (HVBP) program

    Domains and measures

    The clinical care domain

    The patient- and caregiver-centered experience of care domain

    Safety domain

    Efficiency and cost reduction domain

    The Hospital Readmission Reduction (HRR) program

    The Hospital-Acquired Conditions (HAC) program

    The healthcare-acquired infections domain

    The patient safety domain

    The End-Stage Renal Disease (ESRD) quality incentive program

    The Skilled Nursing Facility Value-Based Program (SNFVBP)

    The Home Health Value-Based Program (HHVBP)

    The Merit-Based Incentive Payment System (MIPS)

    Quality

    Advancing care information

    Improvement activities

    Cost

    Other value-based programs

    The Healthcare Effectiveness Data and Information Set (HEDIS)

    State measures

    Comparing dialysis facilities using Python

    Downloading the data

    Importing the data into your Jupyter Notebook session

    Exploring the data rows and columns

    Exploring the data geographically

    Displaying dialysis centers based on total performance

    Alternative analyses of dialysis centers

    Comparing hospitals

    Downloading the data

    Importing the data into your Jupyter Notebook session

    Exploring the tables

    Merging the HVBP tables

    Summary

    References

    Making Predictive Models in Healthcare

    Introduction to predictive analytics in healthcare

    Our modeling task – predicting discharge statuses for ED patients

    Obtaining the dataset

    The NHAMCS dataset at a glance

    Downloading the NHAMCS data

    Downloading the ED2013 file

    Downloading the list of survey items – body_namcsopd.pdf

    Downloading the documentation file – doc13_ed.pdf

    Starting a Jupyter session

    Importing the dataset

    Loading the metadata

    Loading the ED dataset

    Making the response variable

    Splitting the data into train and test sets

    Preprocessing the predictor variables

    Visit information

    Month

    Day of the week

    Arrival time

    Wait time

    Other visit information

    Demographic variables

    Age

    Sex

    Ethnicity and race

    Other demographic information

    Triage variables

    Financial variables

    Vital signs

    Temperature

    Pulse

    Respiratory rate

    Blood pressure

    Oxygen saturation

    Pain level

    Reason-for-visit codes

    Injury codes

    Diagnostic codes

    Medical history

    Tests

    Procedures

    Medication codes

    Provider information

    Disposition information

    Imputed columns

    Identifying variables

    Electronic medical record status columns

    Detailed medication information

    Miscellaneous information

    Final preprocessing steps

    One-hot encoding

    Numeric conversion

    NumPy array conversion

    Building the models

    Logistic regression

    Random forest

    Neural network

    Using the models to make predictions

    Improving our models

    Summary

    References and further reading

    Healthcare Predictive Models – A Review

    Predictive healthcare analytics – state of the art

    Overall cardiovascular risk

    The Framingham Risk Score

    Cardiovascular risk and machine learning

    Congestive heart failure

    Diagnosing CHF

    CHF detection with machine learning

    Other applications of machine learning in CHF

    Cancer

    What is cancer?

    ML applications for cancer

    Important features of cancer

    Routine clinical data

    Cancer-specific clinical data

    Imaging data

    Genomic data

    Proteomic data

    An example – breast cancer prediction

    Traditional screening of breast cancer

    Breast cancer screening and machine learning

    Readmission prediction

    LACE and HOSPITAL scores

    Readmission modeling

    Other conditions and events

    Summary

    References and further reading

    The Future – Healthcare and Emerging Technologies

    Healthcare analytics and the internet

    Healthcare and the Internet of Things

    Healthcare analytics and social media

    Influenza surveillance and forecasting

    Predicting suicidality with machine learning

    Healthcare and deep learning

    What is deep learning, briefly?

    Deep learning in healthcare

    Deep feed-forward networks

    Convolutional neural networks for images

    Recurrent neural networks for sequences

    Obstacles, ethical issues, and limitations

    Obstacles

    Ethical issues

    Limitations

    Conclusion of this book

    References and further reading

    Other Books You May Enjoy

    Leave a review - let other readers know what you think

    Preface

    The functional aim of this book is to demonstrate how Python packages are used for data analysis; how to import, collect, clean, and refine data from Electronic Health Record (EHR) surveys; and how to make predictive models with this data, with the help of real-world examples.

    Who this book is for

    Healthcare Analytics Made Simple is for you if you are a developer who has a working knowledge of Python or a related programming language, even if you are new to healthcare or predictive modeling with healthcare data. Clinicians interested in analytics and healthcare computing will also benefit from this book. This book can also serve as a textbook for students enrolled on an introductory course on machine learning for healthcare.

    What this book covers

    Chapter 1, Introduction to Healthcare Analytics, provides a definition of healthcare analytics, lists some foundational topics, provides a history of the subject, gives some examples of healthcare analytics in action, and includes download, installation, and basic usage instructions for the software in this book.

    Chapter 2, Healthcare Foundations, consists of an overview of how healthcare is structured and delivered in the US, provides a background on legislation that's relevant to healthcare analytics, describes clinical patient data and clinical coding systems, and provides a breakdown of healthcare analytics.

    Chapter 3, Machine Learning Foundations, describes some of the model frameworks used for medical decision making and describes the machine learning pipeline, from data import to model evaluation.

    Chapter 4, Computing Foundations – Databases, provides an introduction to the SQL language and demonstrates the use of SQL in healthcare with a healthcare predictive analytics example.

    Chapter 5, Computing Foundations – Introduction to Python, gives a basic overview of Python and the libraries that are important for performing analytics. We discuss variable types, data structures, functions, and modules in Python. We also give an introduction to the pandas and scikit-learn libraries.

    Chapter 6, Measuring Healthcare Quality, describes the measures used in healthcare performance, gives an overview of value-based programs in the US, and demonstrates how to download and analyze provider-based data in Python.

    Chapter 7, Making Predictive Models in Healthcare, describes the information contained in a publicly available clinical dataset, including downloading instructions. We then demonstrate how to make predictive models with this data, using Python, pandas, and scikit-learn.

    Chapter 8, Healthcare Predictive Models – A Review, reviews some of the current progress being made in healthcare predictive analytics for select diseases and application areas by comparing machine learning results to those obtained by using traditional methods.

    Chapter 9, The Future – Healthcare and Emerging Technologies, discusses some of the advances being made in healthcare analytics through using the internet, introduces the reader to deep learning techniques in healthcare, and states some of the challenges and limitations facing healthcare analytics.
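As a rough illustration of the model-evaluation and probabilistic-reasoning topics listed for Chapter 3, the following minimal sketch computes sensitivity, specificity, predictive values, and a post-test probability from a 2 x 2 table. The counts and the 5% baseline probability are made-up assumptions for illustration and do not come from the book.

```python
# Hypothetical 2 x 2 counts for a test result versus a disease.
# These numbers are invented for illustration; they are not from the book.
tp, fn = 40, 10    # diseased patients: test positive / test negative
fp, tn = 30, 120   # healthy patients: test positive / test negative

sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value
accuracy = (tp + tn) / (tp + fn + fp + tn)

# Positive likelihood ratio, then Bayes theorem in odds form.
lr_positive = sensitivity / (1 - specificity)
pretest_prob = 0.05            # assumed baseline probability of disease
pretest_odds = pretest_prob / (1 - pretest_prob)
posttest_odds = pretest_odds * lr_positive
posttest_prob = posttest_odds / (1 + posttest_odds)

print(f"Sn={sensitivity:.2f} Sp={specificity:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
print(f"Acc={accuracy:.2f} LR+={lr_positive:.2f} post-test={posttest_prob:.3f}")
```

With these assumed counts, a positive result raises the probability of disease from 5% to roughly 17%, which shows why a test's value depends on the baseline probability as well as on its sensitivity and specificity.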

    To get the most out of this book

    Helpful things to know include the following:

    High school math, such as basic probability, statistics, and algebra

    Basic familiarity with a programming language and/or basic programming concepts

    Basic familiarity with healthcare and a working knowledge of some clinical terminology

    Please follow the instructions in Chapter 1, Introduction to Healthcare Analytics for setting up Anaconda and SQLite.
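Once the setup is done, a quick check along the following lines can confirm that Python can talk to SQLite. This is only a sketch using Python's built-in sqlite3 module; the PATIENT_DEMO table, its columns, and the sample rows are hypothetical and are not the book's schema.

```python
import sqlite3

# An in-memory database keeps this check from touching any files on disk.
conn = sqlite3.connect(":memory:")

# PATIENT_DEMO and its columns are hypothetical names used only for this check.
conn.execute(
    "CREATE TABLE PATIENT_DEMO (pid INTEGER PRIMARY KEY, last_name TEXT, birth_year INTEGER)"
)
conn.executemany(
    "INSERT INTO PATIENT_DEMO (pid, last_name, birth_year) VALUES (?, ?, ?)",
    [(1, "Jones", 1969), (2, "Smith", 1985)],
)

# A simple derived column: approximate age as of 2018.
rows = conn.execute(
    "SELECT last_name, 2018 - birth_year AS approx_age FROM PATIENT_DEMO ORDER BY pid"
).fetchall()
conn.close()

print("SQLite library version:", sqlite3.sqlite_version)
print(rows)
```

If the script prints a version string and two rows, the Python side of the environment is ready for SQL examples.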

    Download the example code files

    You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

    You can download the code files by following these steps:

    Log in or register at www.packtpub.com.

    Select the SUPPORT tab.

    Click on Code Downloads & Errata.

    Enter the name of the book in the Search box and follow the onscreen instructions.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR/7-Zip for Windows

    Zipeg/iZip/UnRarX for Mac

    7-Zip/PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Healthcare-Analytics-Made-Simple. In case there's an update to the code, it will be updated on the existing GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/HealthcareAnalyticsMadeSimple_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.

    A block of code is set as follows:

    string_1 = '1'
    string_2 = '2'
    # + on two strings concatenates them, so this prints 12 rather than 3
    string_sum = string_1 + string_2
    print(string_sum)

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    test_split_string = 'Jones,Bill,49,Atlanta,GA,12345'
    # split(',') returns a list of the comma-separated fields, all as strings
    output = test_split_string.split(',')
    print(output)
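The two convention snippets above touch on points that matter when cleaning data: strings concatenate instead of adding numerically, and split() returns strings only. A minimal sketch of handling both follows; the field names are assumptions made for illustration, since the source shows only the raw string.

```python
# String "addition" concatenates; convert to int for numeric addition.
string_sum = '1' + '2'
numeric_sum = int('1') + int('2')

# Field names are assumed for illustration; the source gives only the raw string.
field_names = ['last_name', 'first_name', 'age', 'city', 'state', 'zip_code']
test_split_string = 'Jones,Bill,49,Atlanta,GA,12345'

# Pair each field name with the corresponding comma-separated value.
record = dict(zip(field_names, test_split_string.split(',')))
record['age'] = int(record['age'])  # split() yields strings, so convert explicitly

print(string_sum, numeric_sum)
print(record)
```

The ZIP code is deliberately left as a string, because it is an identifier rather than a quantity and converting it to an integer would drop any leading zeros.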

    Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text
