Building Statistical Models in Python: Develop useful models for regression, classification, time series, and survival analysis
About this ebook
The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation.
This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more.
By the end of this book, you’ll gain fluency in statistical modeling while harnessing the full potential of Python's rich ecosystem for data analysis.
Building Statistical Models in Python - Huy Hoang Nguyen
BIRMINGHAM—MUMBAI
Building Statistical Models in Python
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Ali Abidi
Publishing Product Manager: Sanjana Gupta
Senior Editor: Sushma Reddy
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Book Project Manager: Kirti Pisat
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Hemangini Bari
Production Designer: Prashant Ghare
Marketing Coordinator: Nivedita Singh
First published: August 2023
Production reference: 3310823
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul's Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80461-428-0
www.packtpub.com
To my parents, Thieu and Tang, for their enormous support and faith in me.
To my wife, Tam, for her endless love, dedication, and courage.
- Huy Hoang Nguyen
To my daughter, Lydie, for demonstrating how work and dedication regenerate inspiration and creativity. To my wife, Helene, for her love and support.
– Paul Adams
To my partner, Kate, who has always supported my endeavors.
– Stuart Miller
Contributors
About the authors
Huy Hoang Nguyen is a mathematician and data scientist with extensive experience in advanced mathematics, strategic leadership, and applied machine learning research. He holds a PhD in Mathematics, as well as two Master’s degrees in Applied Mathematics and Data Science. His previous work focused on Partial Differential Equations, Functional Analysis, and their applications in Fluid Mechanics. After transitioning from academia to the healthcare industry, he has undertaken a variety of data science projects, ranging from traditional machine learning to deep learning.
Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering, and classification. Paul holds an MSc in Data Science from Southern Methodist University.
Stuart Miller is a Machine Learning Engineer with a wide range of experience. Stuart has applied machine learning methods to various projects in industries ranging from insurance to semiconductor manufacturing. Stuart holds degrees in data science, electrical engineering, and physics.
About the reviewers
Krishnan Raghavan is an IT professional with over 20 years of experience in software development and delivery excellence across multiple domains and technologies, ranging from C++ and Java to Python, data warehousing, and big data tools and technologies.
When not working, Krishnan likes to spend time with his wife and daughter, reading fiction and nonfiction as well as technical books. Krishnan tries to give back to the community by being part of the GDG Pune Volunteer Group, helping the team organize events. Currently, he is unsuccessfully trying to learn how to play the guitar.
You can connect with Krishnan via LinkedIn.
I would like to thank my wife Anita and daughter Ananya for giving me the time and space to review this book.
Karthik Dulam is a Principal Data Scientist at EDB. He is passionate about all things data with a particular focus on data engineering, statistical modeling, and machine learning. He has a diverse background delivering machine learning solutions for the healthcare, IT, automotive, telecom, tax, and advisory industries. He actively engages with students as a guest speaker at esteemed universities delivering insightful talks on machine learning use cases.
I would like to thank my wife, Sruthi Anem, for her unwavering support and patience. I also want to thank my family, friends, and colleagues who have played an instrumental role in shaping the person I am today. Their unwavering support, encouragement, and belief in me have been a constant source of inspiration.
Table of Contents
Preface
Part 1: Introduction to Statistics
1
Sampling and Generalization
Software and environment setup
Population versus sample
Population inference from samples
Randomized experiments
Observational study
Sampling strategies – random, systematic, stratified, and clustering
Probability sampling
Non-probability sampling
Summary
2
Distributions of Data
Technical requirements
Understanding data types
Nominal data
Ordinal data
Interval data
Ratio data
Visualizing data types
Measuring and describing distributions
Measuring central tendency
Measuring variability
Measuring shape
The normal distribution and central limit theorem
The Central Limit Theorem
Bootstrapping
Confidence intervals
Standard error
Correlation coefficients (Pearson’s correlation)
Permutations
Permutations and combinations
Permutation testing
Transformations
Summary
References
3
Hypothesis Testing
The goal of hypothesis testing
Overview of a hypothesis test for the mean
Scope of inference
Hypothesis test steps
Type I and Type II errors
Type I errors
Type II errors
Basics of the z-test – the z-score, z-statistic, critical values, and p-values
The z-score and z-statistic
A z-test for means
z-test for proportions
Power analysis for a two-population pooled z-test
Summary
4
Parametric Tests
Assumptions of parametric tests
Normally distributed population data
Equal population variance
T-test – a parametric hypothesis test
T-test for means
Two-sample t-test – pooled t-test
Two-sample t-test – Welch’s t-test
Paired t-test
Tests with more than two groups and ANOVA
Multiple tests for significance
ANOVA
Pearson’s correlation coefficient
Power analysis examples
Summary
References
5
Non-Parametric Tests
When parametric test assumptions are violated
Permutation tests
The Rank-Sum test
The test statistic procedure
Normal approximation
Rank-Sum example
The Signed-Rank test
The Kruskal-Wallis test
Chi-square distribution
Chi-square goodness-of-fit
Chi-square test of independence
Chi-square goodness-of-fit test power analysis
Spearman’s rank correlation coefficient
Summary
Part 2: Regression Models
6
Simple Linear Regression
Simple linear regression using OLS
Coefficients of correlation and determination
Coefficients of correlation
Coefficients of determination
Required model assumptions
A linear relationship between the variables
Normality of the residuals
Homoscedasticity of the residuals
Sample independence
Testing for significance and validating models
Model validation
Summary
7
Multiple Linear Regression
Multiple linear regression
Adding categorical variables
Evaluating model fit
Interpreting the results
Feature selection
Statistical methods for feature selection
Performance-based methods for feature selection
Recursive feature elimination
Shrinkage methods
Ridge regression
LASSO regression
Elastic Net
Dimension reduction
PCA – a hands-on introduction
PCR – a hands-on salary prediction study
Summary
Part 3: Classification Models
8
Discrete Models
Probit and logit models
Multinomial logit model
Poisson model
The Poisson distribution
Modeling count data
The negative binomial regression model
Negative binomial distribution
Summary
9
Discriminant Analysis
Bayes’ theorem
Probability
Conditional probability
Discussing Bayes’ Theorem
Linear Discriminant Analysis
Supervised dimension reduction
Quadratic Discriminant Analysis
Summary
Part 4: Time Series Models
10
Introduction to Time Series
What is a time series?
Goals of time series analysis
Statistical measurements
Mean
Variance
Autocorrelation
Cross-correlation
The white-noise model
Stationarity
Summary
References
11
ARIMA Models
Technical requirements
Models for stationary time series
Autoregressive (AR) models
Moving average (MA) models
Autoregressive moving average (ARMA) models
Models for non-stationary time series
ARIMA models
Seasonal ARIMA models
More on model evaluation
Summary
References
12
Multivariate Time Series
Multivariate time series
Time-series cross-correlation
ARIMAX
Preprocessing the exogenous variables
Fitting the model
Assessing model performance
VAR modeling
Step 1 – visual inspection
Step 2 – selecting the order of AR(p)
Step 3 – assessing cross-correlation
Step 4 – building the VAR(p,q) model
Step 5 – testing the forecast
Step 6 – building the forecast
Summary
References
Part 5: Survival Analysis
13
Time-to-Event Variables – An Introduction
What is censoring?
Left censoring
Right censoring
Interval censoring
Type I and Type II censoring
Survival data
Survival Function, Hazard and Hazard Ratio
Summary
14
Survival Models
Technical requirements
Kaplan-Meier model
Model definition
Model example
Exponential model
Model example
Cox Proportional Hazards regression model
Step 1
Step 2
Step 3
Step 4
Step 5
Summary
Index
Other Books You May Enjoy
Preface
Statistics is the discipline of applying analytical methods to answer questions and solve problems with data, in both academic and industry settings. Many of its methods have been around for centuries, while others are much more recent. Statistical analyses and results are fairly straightforward to present to both technical and non-technical audiences. Furthermore, producing results with statistical analysis does not necessarily require large amounts of data or compute resources and can be done fairly quickly, especially when using programming languages such as Python, which is relatively easy to learn and work with.
While artificial intelligence (AI) and advanced machine learning (ML) tools have become more prominent and popular in recent years with the increased accessibility of compute power, performing statistical analysis as a precursor to larger-scale AI and ML projects enables a practitioner to assess feasibility and practicality before committing larger compute resources and project architecture development to those projects.
This book provides a wide variety of tools that are commonly used to test hypotheses and provide basic predictive capabilities to analysts and data scientists alike. You will walk through the basic concepts and terminology required to understand the statistical tools in this book before exploring the different tests and the conditions under which they are applicable. You will also learn how to assess the performance of these tests. Throughout, examples are provided in the Python programming language to get you started with understanding your data using the tools presented, which are applicable to some of the most common questions faced in the data analytics industry. The topics we will walk through include:
An introduction to statistics
Regression models
Classification models
Time series models
Survival analysis
Understanding the tools provided in these sections will provide the reader with a firm foundation from which further independent growth in the statistics domain can more easily be achieved.
Who this book is for
Professionals in most industries can benefit from the tools in this book. The tools provided are useful primarily at a higher level of inferential analysis, but can be applied to deeper levels depending on the industry in which the practitioner wishes to apply them. The target audiences of this book are:
Industry professionals with limited statistical or programming knowledge who would like to learn to use data for testing hypotheses they have in their business domain
Data analysts and scientists who wish to broaden their statistical knowledge and find a set of tools and their implementations for performing various data-oriented tasks
The ground-up approach of this book seeks to provide entry into the knowledge base for a wide audience and therefore should neither discourage novice-level practitioners nor exclude advanced-level practitioners from the benefits of the materials presented.
What this book covers
Chapter 1, Sampling and Generalization, describes the concepts of sampling and generalization. The discussion of sampling covers several common methods for sampling data from a population and discusses the implications for generalization. This chapter also discusses how to set up the software required for this book.
Chapter 2, Distributions of Data, provides a detailed introduction to types of data, common distributions used to describe data, and statistical measures. This chapter also covers common transformations used to change distributions.
Chapter 3, Hypothesis Testing, introduces the concept of statistical tests as a method for answering questions of interest. This chapter covers the steps to perform a test, the types of errors encountered in testing, and how to perform power analysis using the z-test.
Chapter 4, Parametric Tests, further discusses statistical tests, providing detailed descriptions of common parametric statistical tests, the assumptions of parametric tests, and how to assess the validity of parametric tests. This chapter also introduces the concept of multiple tests and provides details on corrections for multiple tests.
Chapter 5, Non-parametric Tests, discusses how to perform statistical tests when the assumptions of parametric tests are violated, using a class of tests that make fewer assumptions, called non-parametric tests.
Chapter 6, Simple Linear Regression, introduces the concept of a statistical model with the simple linear regression model. This chapter begins by discussing the theoretical foundations of simple linear regression and then discusses how to interpret the results of the model and assess the validity of the model.
Chapter 7, Multiple Linear Regression, builds on the previous chapter by extending the simple linear regression model into additional dimensions. This chapter also discusses issues that occur when modeling with multiple explanatory variables, including multicollinearity, feature selection, and dimension reduction.
Chapter 8, Discrete Models, introduces the concept of classification and develops a model for classifying variables into discrete levels of a categorical response variable. This chapter starts by developing the model for binary classification and then extends the model to multinomial classification. Finally, the Poisson and negative binomial models are covered.
Chapter 9, Discriminant Analysis, discusses several additional models for classification, including linear discriminant analysis and quadratic discriminant analysis. This chapter also introduces Bayes’ Theorem.
Chapter 10, Introduction to Time Series, introduces time series data, discussing the time series concept of autocorrelation and the statistical measures for time series. This chapter also introduces the white noise model and stationarity.
Chapter 11, ARIMA Models, discusses models for univariate time series. This chapter starts by discussing models for stationary time series and then extends the discussion to non-stationary time series. Finally, this chapter provides a detailed discussion on model evaluation.
Chapter 12, Multivariate Time Series, builds on the previous two chapters by introducing the concept of a multivariate time series and extends ARIMA models to multiple explanatory variables. This chapter also discusses time series cross-correlation.
Chapter 13, Time-to-Event Variables – An Introduction, introduces survival data, also called time-to-event data. This chapter discusses the concept of censoring and its impact on survival data. Finally, the chapter discusses the survival function, hazard, and hazard ratio.
Chapter 14, Survival Models, building on the previous chapter, provides an overview of several models for survival data, including the Kaplan-Meier model, the Exponential model, and the Cox Proportional Hazards model.
To get the most out of this book
You will need access to download and install open source packages implemented in the Python programming language and accessible through PyPi.org or the Anaconda Python distribution. A background in statistics is helpful but not necessary; this book does assume you have a decent background in basic algebra. Each part of this book is independent of the other parts, but the chapters within each part build upon each other. Thus, we advise you to begin each part with its first chapter to understand the content.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Building-Statistical-Models-in-Python. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.
A block of code is set as follows:
A = [3,5,4]
B = [43,41,56,78,54]
permutation_testing(A,B,n_iter=10000)
Any command-line input or output is written as follows:
pip install SomePackage
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Select System info from the Administration panel.
Tips or important notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share Your Thoughts
Once you’ve read Building Statistical Models in Python, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below
https://packt.link/free-ebook/978-1-80461-428-0
Submit your proof of purchase
That’s it! We’ll send your free PDF and other benefits to your email directly.
Part 1: Introduction to Statistics
This part will cover the statistical concepts that are foundational to statistical modeling.
It includes the following chapters:
Chapter 1, Sampling and Generalization
Chapter 2, Distributions of Data
Chapter 3, Hypothesis Testing
Chapter 4, Parametric Tests
Chapter 5, Non-Parametric Tests
1
Sampling and Generalization
In this chapter, we will describe the concept of populations and sampling from populations, including some common strategies for sampling. The discussion of sampling will lead to a section describing generalization, as it relates to using samples to draw conclusions about their respective populations. When modeling for statistical inference, it is necessary to ensure that samples can be generalized to populations. We will provide an in-depth overview of this bridge between sample and population through the subjects in this chapter.
We will cover the following main topics:
Software and environment setup
Population versus sample
Population inference from samples
Sampling strategies – random, systematic, stratified, and clustering
Software and environment setup
Python is one of the most popular programming languages for data science and machine learning, thanks to the large open source community that has driven the development of its data science libraries. Python’s ease of use and flexible nature have made it a prime candidate in the data science world, where experimentation and iteration are key features of the development cycle. While new languages for data science applications are in development, such as Julia, Python currently remains the key language for data science due to its wide breadth of open source projects, supporting applications from statistical modeling to deep learning. We have chosen to use Python in this book due to its positioning as an important language for data science and its demand in the job market.
Python is available for all major operating systems: Microsoft Windows, macOS, and Linux. Additionally, the installer and documentation can be found at the official website: https://www.python.org/.
This book is written for Python version 3.8 (or higher). It is recommended that you use the most recent version of Python available. The code in this book is unlikely to be compatible with Python 2.7, and most active libraries have dropped support for Python 2.7 since official support ended in 2020.
The libraries used in this book can be installed with the Python package manager, pip, which is included with contemporary versions of Python. More information about pip can be found here: https://docs.python.org/3/installing/index.html. After pip is installed, packages can be installed using pip on the command line. Here is basic usage at a glance:
Install a new package using the latest version:
pip install SomePackage
Install the package with a specific version, version 2.1 in this example:
pip install SomePackage==2.1
A package that is already installed can be upgraded with the --upgrade flag:
pip install SomePackage --upgrade
In general, it is recommended to use Python virtual environments to keep the dependencies of different projects separate from each other and from system directories. Python provides a virtual environment utility, venv, which, like pip, is part of the standard library in contemporary versions of Python. Virtual environments allow you to create isolated Python environments, each with its own set of installed dependencies. Using virtual environments can prevent package version issues and conflicts when working on multiple Python projects. Details on setting up and using virtual environments can be found here: https://docs.python.org/3/library/venv.html.
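For example, a virtual environment can be created, activated, and populated from the command line as follows (the environment name .venv is an arbitrary choice):

python -m venv .venv
source .venv/bin/activate
pip install SomePackage

On Windows, the activation command is .venv\Scripts\activate instead.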
While we recommend the use of Python and Python’s virtual environments for environment setups, a highly recommended alternative is Anaconda. Anaconda is a free (enterprise-ready) analytics-focused distribution of Python by Anaconda Inc. (previously Continuum Analytics). Anaconda distributions come with many of the core data science packages, common IDEs (such as Jupyter and Visual Studio Code), and a graphical user interface for managing environments. Anaconda can be installed using the installer found at the Anaconda website here: https://www.anaconda.com/products/distribution.
Anaconda comes with its own package manager, conda, which can be used to install new packages similarly to pip.
Install a new package using the latest version:
conda install SomePackage
Upgrade a package that is already installed:
conda upgrade SomePackage
Throughout this book, we will make use of several core libraries in the Python data science ecosystem, such as NumPy for array manipulations, pandas for higher-level data manipulations, and matplotlib for data visualization. The package versions used for this book are contained in the following list. Please ensure that the versions installed in your environment are equal to or greater than the versions listed. This will help ensure that the code examples run correctly:
statsmodels 0.13.2
Matplotlib 3.5.2
NumPy 1.23.0
SciPy 1.8.1
scikit-learn 1.1.1
pandas 1.4.3
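If you prefer to install all of these minimum versions in one step (a sketch; adjust the pins as needed for your environment), a single pip command will do it:

pip install "statsmodels>=0.13.2" "matplotlib>=3.5.2" "numpy>=1.23.0" "scipy>=1.8.1" "scikit-learn>=1.1.1" "pandas>=1.4.3"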
The packages used for the code in this book are shown in Figure 1.1. The __version__ attribute can be used to print a package’s version in code.
Figure 1.1 – Package versions used in this book
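As a quick sketch of such a check (assuming the packages listed previously are installed), the following Python snippet prints the name and installed version of each core package:

import matplotlib
import numpy
import pandas
import scipy
import sklearn
import statsmodels

# Print the installed version of each core package
for pkg in (statsmodels, matplotlib, numpy, scipy, sklearn, pandas):
    print(pkg.__name__, pkg.__version__)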
Having set up the technical environment for the book, let’s get into the statistics. In the next sections, we will discuss the concepts of population and sampling. We will demonstrate sampling strategies with code implementations.
Population versus sample
In general, the goal of statistical modeling is to answer a question about a group by making an inference about that group. The group we are making an inference on could be machines in a production factory, people voting in an election, or plants on different plots of land. The entire group, every individual item or entity, is referred to as the population. In most cases, the population of interest is so large that it is not practical or even possible to collect data on every entity in the population. For instance, using the voting example, it would probably not be possible to poll every person who voted in an election. Even if it were possible to reach all the voters for the election of interest, many voters may not consent to polling, which would prevent data collection on the entire population. An additional consideration would be the expense of polling such a large group. These factors make it practically impossible to collect population statistics in our example of vote polling. Such prohibitive factors exist in many cases where we may want to assess a population-level attribute. Fortunately, we do not need to collect data on the entire population of interest. Inferences about a population can be made using a subset of the population. This subset of the population is called a sample. This is the main idea of statistical modeling: a model will be created using a sample, and inferences will be made about the population.
In order to make valid inferences about the population of interest using a sample, the sample must be representative of the population of interest, meaning that the sample should contain the variation found in the population. For example, if we were interested in making an inference about plants in a field, it is unlikely that samples from one corner of the field would be sufficient for inferences about the larger population. There would likely be variations in plant characteristics over the entire field. We could think of various reasons why there might be variation. For this example, we will consider some examples from Figure 1.2.
Figure 1.2 – Field of plants
The figure shows that Sample A is near a forest. This sample area may be affected by the presence of the forest; for example, some of the plants in that sample may receive less sunlight than plants in the other samples. Sample B is shown to be in between the main irrigation lines. It’s conceivable that this sample receives more water on average than the other two samples, which may have an effect on the plants in this sample. The final sample, Sample C, is near a road. This sample may see other effects that are not seen in Sample A or B.
If samples were only taken from one of those sections, the inferences from those samples would be biased and would not be valid for the population. Thus, samples would need to be taken from across the entire field to create a sample that is more likely to be representative of the population of plants. When taking samples from populations, it is critical to ensure the sampling method is robust to possible issues, such as the influence of irrigation and shade in the previous example. Whenever taking a sample from a population, it’s important to identify and mitigate possible sources of bias because biases in data will affect your model and skew your conclusions.
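As a minimal sketch of this idea (the plant measurements below are fabricated purely for illustration), pandas can draw an equal number of plants from each section of the field so that every area is represented:

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Hypothetical plant heights, labeled by field section
field = pd.DataFrame({
    "section": np.repeat(["A", "B", "C"], 300),
    "height_cm": rng.normal(loc=50, scale=5, size=900),
})

# Sample 20 plants from each section so all areas contribute to the sample
sample = field.groupby("section").sample(n=20, random_state=0)
print(sample["section"].value_counts())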
In the next section, various methods for sampling from a dataset will be discussed. An additional consideration is the sample size, which impacts the types of statistical tools we can use, the distributional assumptions that can be made about the sample, and the confidence of inferences and predictions. The impact of sample size will be explored in depth in Chapter 2, Distributions of Data, and Chapter 3, Hypothesis Testing.
Population inference from samples
When using a statistical model to make inferential conclusions about a population from a sample subset of that population, the study design must account for similar degrees of uncertainty in its variables as those in the population. This is the variation mentioned earlier in this chapter. To appropriately draw inferential conclusions about a population, any statistical model must be structured around a chance mechanism. Studies structured around these chance mechanisms are called randomized experiments and provide an understanding of both correlation and causation.
Randomized experiments
There are two primary characteristics of a randomized experiment:
Random sampling, colloquially referred to as random selection
Random assignment of treatments, which is the nature of the study
Random sampling
Random sampling (also called random selection) is designed with the intent of creating a sample that is representative of the overall population, so that statistical models generalize to the population well enough to assign cause-and-effect outcomes. In order for random sampling to be successful, the population of interest must be well defined, and every member of the population must have a chance of being selected. In the example of polling voters, this means all voters must be willing to be polled; once all voters are entered into a lottery, random sampling can be used to subset voters for modeling. If instead only some voters are willing to participate, sampling is restricted to those volunteers, which introduces sampling bias into statistical modeling and can lead to skewed results. This sampling method is called self-selection. Any information obtained and modeled from self-selected samples – or any non-random samples – cannot be used for inference.
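As a minimal sketch of random sampling (the voter IDs below are hypothetical), NumPy can draw a simple random sample without replacement:

import numpy as np

rng = np.random.default_rng(seed=42)  # seeded only for reproducibility

# Hypothetical population of 10,000 voter IDs
population = np.arange(10_000)

# Draw a simple random sample of 500 voters; every voter has an equal chance
sample = rng.choice(population, size=500, replace=False)
print(sample[:10])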
Random assignment of treatments
The random assignment of treatments refers to two motivators:
The first motivator is to gain an understanding of specific input variables and their influence on the response – for example, understanding whether assigning treatment A to a specific individual may produce more favorable outcomes than a placebo.
The second motivator is to remove the impact of external variables on the outcomes of a study. These external variables, called confounding variables (or confounders), are important to remove as they often prove difficult to control. They may have unpredictable values or even be unknown to the researcher. The consequence of including confounders is that the outcomes of a study may not be replicable, which can be costly. While confounders can influence outcomes, they can also influence input variables, as well as the relationships between those variables.
Referring back to the example in the earlier section, Population versus sample, consider a farmer who decides to start using pesticides on his crops and wants to test two different brands. The farmer knows there are three distinct areas of the land: plot A, plot B, and plot C. To determine the success of the pesticides and prevent damage to the crops, the farmer randomly chooses 60 plants from each plot for testing (this is called stratified random sampling, where random sampling is performed separately within each plot). This