Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code
()
About this ebook
Data Forecasting and Segmentation Using Microsoft Excel guides you through basic statistics to test whether your data can be used to perform regression predictions and time series forecasts. The exercises covered in this book use real-life data from Kaggle, such as demand for seasonal air tickets and credit card fraud detection.
You’ll learn how to apply the grouping K-means algorithm, which helps you find segments of your data that are impossible to see with other analyses, such as business intelligence (BI) and pivot analysis. By analyzing groups returned by K-means, you’ll be able to detect outliers that could indicate possible fraud or a bad function in network packets.
By the end of this Microsoft Excel book, you’ll be able to use the classification algorithm to group data with different variables. You’ll also be able to train linear and time series models to perform predictions and forecasts based on past data.
Related to Data Forecasting and Segmentation Using Microsoft Excel
Related ebooks
Modern Time Series Forecasting with Python: Explore industry-ready time series forecasting using modern machine learning and deep learning Rating: 0 out of 5 stars0 ratingsA Handbook of Mathematical Models with Python: Elevate your machine learning projects with NetworkX, PuLP, and linalg Rating: 0 out of 5 stars0 ratingsCodeless Time Series Analysis with KNIME: A practical guide to implementing forecasting models for time series analysis applications Rating: 0 out of 5 stars0 ratingsPython Machine Learning By Example: The easiest way to get into machine learning Rating: 5 out of 5 stars5/5Hands-On Data Preprocessing in Python: Learn how to effectively prepare data for successful data analytics Rating: 0 out of 5 stars0 ratingsDeep Learning By Example: A hands-on guide to implementing advanced machine learning algorithms and neural networks Rating: 0 out of 5 stars0 ratingsPractical Data Analysis - Second Edition Rating: 0 out of 5 stars0 ratingsEconometrics and Data Science: Apply Data Science Techniques to Model Complex Problems and Implement Solutions for Economic Problems Rating: 0 out of 5 stars0 ratingsGo Machine Learning Projects: Eight projects demonstrating end-to-end machine learning and predictive analytics applications in Go Rating: 0 out of 5 stars0 ratingsManufacturing: Engineering, Management and Marketing Rating: 0 out of 5 stars0 ratingsMachine Learning for Time-Series with Python: Forecast, predict, and detect anomalies with state-of-the-art machine learning methods Rating: 0 out of 5 stars0 ratingsHands-On Machine Learning with ML.NET: Getting started with Microsoft ML.NET to implement popular machine learning algorithms in C# Rating: 0 out of 5 stars0 ratingsMachine Learning Algorithms for Data Scientists: An Overview Rating: 0 out of 5 stars0 ratingsMachine Learning and Predictive Modeling Rating: 0 out of 5 stars0 ratingsDATA ANALYSIS AND DATA SCIENCE: Unlock Insights and Drive Innovation with Advanced Analytical Techniques (2024 Guide) Rating: 0 out of 5 stars0 ratingsMastering Python for Data Science Rating: 3 out of 5 stars3/5Time Series Analysis on AWS: Learn how to build forecasting models and detect anomalies in your time series data Rating: 0 out of 5 stars0 ratingsThe Statistics and Machine Learning with R Workshop: Unlock the power of efficient data science modeling with this hands-on guide Rating: 0 out of 5 stars0 ratingsMachine Learning Solutions: Expert techniques to tackle complex machine learning problems using Python Rating: 0 out of 5 stars0 ratingsData Science Projects with Python.: A case study approach to gaining valuable insights from real data with machine learning Rating: 0 out of 5 stars0 ratingsApplied Logistic Regression Rating: 5 out of 5 stars5/5Distributed Machine Learning with Python: Accelerating model training and serving with distributed systems Rating: 0 out of 5 stars0 ratings
Computers For You
The Invisible Rainbow: A History of Electricity and Life Rating: 5 out of 5 stars5/5Slenderman: Online Obsession, Mental Illness, and the Violent Crime of Two Midwestern Girls Rating: 4 out of 5 stars4/5The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution Rating: 4 out of 5 stars4/5Mastering ChatGPT: 21 Prompts Templates for Effortless Writing Rating: 4 out of 5 stars4/5How to Create Cpn Numbers the Right way: A Step by Step Guide to Creating cpn Numbers Legally Rating: 4 out of 5 stars4/5Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are Rating: 4 out of 5 stars4/5Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics Rating: 4 out of 5 stars4/5SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL Rating: 4 out of 5 stars4/5Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad Rating: 5 out of 5 stars5/5The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology Rating: 4 out of 5 stars4/5Elon Musk Rating: 4 out of 5 stars4/5CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61 Rating: 0 out of 5 stars0 ratingsDeep Search: How to Explore the Internet More Effectively Rating: 5 out of 5 stars5/5Computer Science I Essentials Rating: 5 out of 5 stars5/5Alan Turing: The Enigma: The Book That Inspired the Film The Imitation Game - Updated Edition Rating: 4 out of 5 stars4/5Uncanny Valley: A Memoir Rating: 4 out of 5 stars4/5Excel 101: A Beginner's & Intermediate's Guide for Mastering the Quintessence of Microsoft Excel (2010-2019 & 365) in no time! Rating: 0 out of 5 stars0 ratingsCreating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates Rating: 4 out of 5 stars4/5People Skills for Analytical Thinkers Rating: 5 out of 5 stars5/5101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters Rating: 4 out of 5 stars4/5CompTIA Security+ Get Certified Get Ahead: SY0-701 Study Guide Rating: 5 out of 5 stars5/5The Hacker Crackdown: Law and Disorder on the Electronic Frontier Rating: 4 out of 5 stars4/5Tor and the Dark Art of Anonymity Rating: 5 out of 5 stars5/5
Reviews for Data Forecasting and Segmentation Using Microsoft Excel
0 ratings0 reviews
Book preview
Data Forecasting and Segmentation Using Microsoft Excel - Fernando Roque
BIRMINGHAM—MUMBAI
Data Forecasting and Segmentation Using Microsoft Excel
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Heramb Bhavsar
Senior Editor: David Sugarman
Content Development Editor: Sean Lobo
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Language Support Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Hemangini Bari
Production Designer: Sinhayna Bais
Marketing Coordinator: Priyanka Mhatre
First published: June 2022
Production reference: 1130522
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80324-773-1
www.packt.com
To Sesi: I hope that someday you help to protect our natural biodiversity by looking at the North star, the pyramids, and the numbers.
Thanks to Philly, Peter de la Union, and Mercedes Thaddeus for your continuous assistance.
Contributors
About the author
Fernando Roque has 24 years of experience in working with statistics for quality control and financial risk assessment of projects after planning, budgeting, and execution. In his work, Fernando applies Python k-means and time series machine learning algorithms, using Normalized Difference Vegetation Index (NDVI) drone images to find crop regions with more resilience to droughts. He also applies time series and k-means algorithms for supply chain management (logistics) and inventory planning for seasonal demand.
About the reviewer
Ashwini Badgujar is a machine learning engineer at Impulselogic Inc. She has been involved in the machine learning domain for the last 3 years. She has research experience in Natural Language Processing and computer vision, specifically neural networks. She has also worked in the earth science domain, working on a number of earth science projects in collaboration with machine learning. She has worked at NASA in data processing and fire project analysis in calculating mixing heights, and at Comcast, where she worked on optimizing machine learning models.
Antonio L. Amadeu is a data science consultant, who is passionate about artificial intelligence and neural networks. He researches and applies machine learning and deep learning algorithms in his daily challenges, solving all types of issues in various business industries. He has worked for big companies such as Unilever, Lloyds Bank, TE Connectivity, Microsoft, and Samsung. As an aspiring astrophysicist, he does some research on astronomy object classification, using machine and deep learning techniques and International Virtual Observation Alliance (IVOA). He also participates in research at the Institute of Astronomy, Geophysics, and Atmospheric Sciences at Universidade de São Paulo.
Table of Contents
Preface
Part 1 – An Introduction to Machine Learning Functions
Chapter 1: Understanding Data Segmentation
Segmenting data concepts
Grouping data in segments of two and three variables
Summary
Questions
Answers
Further reading
Chapter 2: Applying Linear Regression
Understanding the influence of variables in linear regression
Projecting values from predictor variables
Summary
Questions
Answers
Further reading
Chapter 3: What is Time Series?
Technical requirements
Understanding time series data
Designing the time series data model
Analyzing the air passenger 10-year data chart
Conducting a Durbin-Watson test on our 10-year data
Computing the centered moving average of each period lag of the data
Analyzing the seasonal irregularity
Trending component of the time series
Doing the forecast
Summary
Questions
Answers
Further reading
Part 2 – Grouping Data to Find Segments and Outliers
Chapter 4: Introduction to Data Grouping
Technical requirements
Grouping with the K-means machine learning function
Finding groups of multiple variables
Calculating centroids and the optimal number of segments for one variable
Calculating centroids and the optimal number of segments for two or more variables
Understanding outliers
Summary
Questions
Answers
Further reading
Chapter 5: Finding the Optimal Number of Single Variable Groups
Technical requirements
Finding an optimal number of groups for one variable
Instructions to run the required add-in in Excel
Running K-means elbow to get the optimal number of groups
Passing the data values to the K-means elbow function
Executing and interpreting the resulting chart of the optimal number of groups
Running the K-means function to get the centroids or group average
Finding the groups and centroids of one-variable data with K-means and Excel
Assigning values for every group
Calculating the centroid or the average point for every group
Exploring the range of values for each segment
Finding the segments for products and profits
Finding the optimal number of groups using the K-means elbow function
Running the K-means function to do the group segmentation
Summary
Questions
Answers
Further reading
Chapter 6: Finding the Optimal Number of Multi-Variable Groups
Technical requirements
Calculating the optimal number of groups for two and three variables
Finding the optimal number of segments for two variables – revenue and quantity
Using the elbow function to get the number of groups for three variables – revenue, quantity, and month of sale
Determining the groups and average value (centroids) of two and three variables
Getting the groups with the K-means algorithm for two and three variables
Visualizing centroids or the average value of each group for two and three variables
Charting the product value range of each group for revenue, quantity, and month
Using the Elbow and K-means functions with four variables
Summary
Questions and answers
Answers
Further reading
Chapter 7: Analyzing Outliers for Data Anomalies
Technical requirements
Representing the data in a 3D chart
Kaggle credit card fraud dataset
Kaggle suspicious logins
Kaggle insurance money amount complaints
K-means data grouping
Running the elbow algorithm
Kaggle credit card fraud dataset
Running the K-means function
Pivot analysis of the outliers
Kaggle credit card fraud dataset
Kaggle suspicious logins
Kaggle insurance money amount complaints
Summary
Questions
Answers
Further reading
Part 3 – Simple and Multiple Linear Regression Analysis
Chapter 8: Finding the Relationship between Variables
Technical requirements
Charting the predictive model's regression variables
Plotting the variables to analyze the possible relationship
Calculating the linear model confidence percentage
Coefficient of determination
Correlation coefficient
Statistical significance of the slope
The regression model's value ranges
Summary
Questions
Answers
Further reading
Chapter 9: Building, Training, and Validating a Linear Model
Technical requirements
Calculating the intercept and slope with formulas
Computing coefficient significance – t-statistics and p-value
Coefficient of determination
Coefficient of correlation
t-statistics and p-value
Getting the residual standard error
Calculating the r-squared
Calculating the f-statistics
Training and testing the model
Doing prediction scenarios with the regression model
Summary
Questions
Answers
Chapter 10: Building, Training, and Validating a Multiple Regression Model
Technical requirements
Exploring the variables with more influence
Calculating t-statistics and p-values
Determination coefficient
Correlation coefficient
t-statistics
p-value
Calculating residuals standard error and f-statistics
Calculating residuals standard errors
Calculating f-statistics
Training and testing the model
Writing a linear regression model formula
Building the prediction model
Summary
Questions
Answers
Further reading
Part 4 – Predicting Values with Time Series
Chapter 11: Testing Data for Time Series Compliance
Technical requirements
Visualizing seasonal trends
Researching autocorrelation – past values' influence over present values
Performing the Durbin-Watson autocorrelation test
Calculating Durbin-Watson by hand in Excel
Summary
Questions and answers
Answers
Further reading
Chapter 12: Working with Time Series Using the Centered Moving Average and a Trending Component
Technical requirements
Calculating the CMA
Calculating the moving average and CMA
Estimating the season irregular and season components
Calculating the trend line
Producing the forecast – season and trend line
Summary
Questions
Answers
Further reading
Chapter 13: Training, Validating, and Running the Model
Technical requirements
Training the model
Conducting the Durbin-Watson test
Building and training the forecast model
Testing the forecast model
Doing the forecast
Summary
Questions
Answers
Other Books You May Enjoy
Preface
This book is about giving you basic statistical knowledge to work with machine learning using complex algorithms to classify data, such as the K-means method. You will use an included add-in for Excel to practice the concepts of grouping statistics without the need for a deep programming background in the R language or Python.
The book covers three topics of machine learning:
Data segmentation
Linear regression
Forecasts with time series
Data segmentation has many practical applications because it allows applying different strategies depending on the segment data ranges. It has applications in marketing and inventory rotation to act accordingly to the location and season of the sales.
The linear regression statistical concepts in this book will help you to explore whether the variables that we are using are useful to build a predictive model.
The time series model helps to do a forecast depending on the different seasons of the year. It has applications in inventory planning to allocate the correct quantities of products and avoid stalled cash flow in the warehouses. The time series depends on statistical tests to see whether the present values depend on the past, so they are useful to forecast the future.
Who this book is for
This book is for any professional that needs to analyze the data generated by the industry or academic scope using machine learning principles and algorithms. This book can help to better understand the different groups of data to apply a different approach to each one. Then, you can use the statistical tests of this book to see the most relevant variables that affect your performance using projections with linear regression. You will be able to link these variables with the time and season and use time series analysis to build a forecast that could help you to improve your planning in your professional scope.
What this book covers
Chapter 1, Understanding Data Segmentation, looks at how classifying the data of similar values is an approach for planning a strategy depending on the characteristics of the range of values of the groups. This strategy is more important when you deal with a problem with several variables, for example, finding the different groups of revenues for each season of the year and the quantities delivered for logistics demand planning.
Chapter 2, Applying Linear Regression, shows that the target of linear regression is to use related variables to predict the behavior of the values and build scenarios of what could happen in different situations, using the regression model as a framework for foreseeing the situations.
Chapter 3, What is Time Series?, examines how a time series model could do a forecast of the data, taking into account the seasonal trends based on the past time values.
Chapter 4, An Introduction to Data Grouping, delves into the importance of finding a different approach for each group. In complex multivariable problems, we need the assistance of machine learning algorithms such as K-means to find the optimal number of segments and the group's values range.
Chapter 5, Finding the Optimal Number of Single Variable Groups, shows how running an add-in for Excel that uses the K-means algorithm can help to get the optimal number of groups for the data that we are researching. In this case, we will start with a problem of just one variable to explain the concepts.
Chapter 6, Finding the Optimal Number of Multi-Variable Groups, demonstrates how to use the Excel add-in to do the grouping of problems of several variables, for example, the classification of quantity, revenue, and season of the inventory rotation.
Chapter 7, Analyzing Outliers for Data Anomalies, delves into another approach to data segmentation: researching what happens with the values that have a long-distance separation of all the groups. These values are anomalies, such as very short value expenses happening at non-business hours that could indicate evidence of possible fraud attempts.
Chapter 8, Finding the Relationship between Variables, shows how we have to do statistical tests of the relationship of the variables to check whether they are useful to design a predictive model before building a linear model.
Chapter 9, Building, Training, and Validating a Linear Model, talks about what happens after the relationship of the variables is statistically tested as useful to build a predictive model; we will use a portion of the data (regularly 20%) to test the model and see whether it gives a good sense of results similar to the known data.
Chapter 10, Building, Training, and Validating a Multiple Regression Model, discusses multiple regression, which involves three or more variables. We will see how to apply the statistical tests to see the most useful variables to build the predictive model. Then, we will test the regression with 20% of the data and see whether it makes sense to use the model to build new scenarios with unknown data.
Chapter 11, Testing Data for Time Series Compliance, shows how the time series forecast relies on the relationship of the present values to the past values. We will apply statistical methods to find whether the data is useful for a forecast model.
Chapter 12, Working with Time Series Using the Centered Moving Average and a Trending Component, explores the forecast model's dependence on two components: the centered moving average (which gives the seasonal ups and downs variations) and the linear regression (which gives the positive or negative orientation of the trend). Once we have these calculations, we will be able to test and use the model.
Chapter 13, Training, Validating, and Running the Model, covers the statistical tests time series and then models with 80% of the data. Then, we will test the time series with the remaining 20% and see whether the model returns results that make sense depending on our experience. Finally, we will use the model to do forecasts.
To get the most out of this book
To better understand this book, you must have a basic knowledge of statistical concepts such as average and standard deviation. You must also be able to use statistical functions in Excel to mark the cells' ranges input for calculations.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Data-Forecasting-and-Segmentation-Using-Microsoft-Excel. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803247731_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.
A block of code is set as follows:
html, body, #map {
height: 100%;
margin: 0;
padding: 0
}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
$ mkdir css
$ cd css
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Select System info from the Administration panel.
Tips or Important Notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming