Discover millions of ebooks, audiobooks, and so much more with a free trial

From $11.99/month after trial. Cancel anytime.

Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code
Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code
Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code
Ebook495 pages3 hours

Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Data Forecasting and Segmentation Using Microsoft Excel guides you through basic statistics to test whether your data can be used to perform regression predictions and time series forecasts. The exercises covered in this book use real-life data from Kaggle, such as demand for seasonal air tickets and credit card fraud detection.
You’ll learn how to apply the grouping K-means algorithm, which helps you find segments of your data that are impossible to see with other analyses, such as business intelligence (BI) and pivot analysis. By analyzing groups returned by K-means, you’ll be able to detect outliers that could indicate possible fraud or a bad function in network packets.
By the end of this Microsoft Excel book, you’ll be able to use the classification algorithm to group data with different variables. You’ll also be able to train linear and time series models to perform predictions and forecasts based on past data.

LanguageEnglish
Release dateMay 27, 2022
ISBN9781803235264
Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code

Related to Data Forecasting and Segmentation Using Microsoft Excel

Related ebooks

Computers For You

View More

Related articles

Reviews for Data Forecasting and Segmentation Using Microsoft Excel

Rating: 0 out of 5 stars
0 ratings

0 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Data Forecasting and Segmentation Using Microsoft Excel - Fernando Roque

    Cover.png

    BIRMINGHAM—MUMBAI

    Data Forecasting and Segmentation Using Microsoft Excel

    Copyright © 2022 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Publishing Product Manager: Heramb Bhavsar

    Senior Editor: David Sugarman

    Content Development Editor: Sean Lobo

    Technical Editor: Rahul Limbachiya

    Copy Editor: Safis Editing

    Language Support Editor: Safis Editing

    Project Coordinator: Aparna Ravikumar Nair

    Proofreader: Safis Editing

    Indexer: Hemangini Bari

    Production Designer: Sinhayna Bais

    Marketing Coordinator: Priyanka Mhatre

    First published: June 2022

    Production reference: 1130522

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80324-773-1

    www.packt.com

    To Sesi: I hope that someday you help to protect our natural biodiversity by looking at the North star, the pyramids, and the numbers.

    Thanks to Philly, Peter de la Union, and Mercedes Thaddeus for your continuous assistance.

    Contributors

    About the author

    Fernando Roque has 24 years of experience in working with statistics for quality control and financial risk assessment of projects after planning, budgeting, and execution. In his work, Fernando applies Python k-means and time series machine learning algorithms, using Normalized Difference Vegetation Index (NDVI) drone images to find crop regions with more resilience to droughts. He also applies time series and k-means algorithms for supply chain management (logistics) and inventory planning for seasonal demand.

    About the reviewer

    Ashwini Badgujar is a machine learning engineer at Impulselogic Inc. She has been involved in the machine learning domain for the last 3 years. She has research experience in Natural Language Processing and computer vision, specifically neural networks. She has also worked in the earth science domain, working on a number of earth science projects in collaboration with machine learning. She has worked at NASA in data processing and fire project analysis in calculating mixing heights, and at Comcast, where she worked on optimizing machine learning models.

    Antonio L. Amadeu is a data science consultant, who is passionate about artificial intelligence and neural networks. He researches and applies machine learning and deep learning algorithms in his daily challenges, solving all types of issues in various business industries. He has worked for big companies such as Unilever, Lloyds Bank, TE Connectivity, Microsoft, and Samsung. As an aspiring astrophysicist, he does some research on astronomy object classification, using machine and deep learning techniques and International Virtual Observation Alliance (IVOA). He also participates in research at the Institute of Astronomy, Geophysics, and Atmospheric Sciences at Universidade de São Paulo.

    Table of Contents

    Preface

    Part 1 – An Introduction to Machine Learning Functions

    Chapter 1: Understanding Data Segmentation

    Segmenting data concepts

    Grouping data in segments of two and three variables

    Summary

    Questions

    Answers

    Further reading

    Chapter 2: Applying Linear Regression

    Understanding the influence of variables in linear regression

    Projecting values from predictor variables

    Summary

    Questions

    Answers

    Further reading

    Chapter 3: What is Time Series?

    Technical requirements

    Understanding time series data

    Designing the time series data model

    Analyzing the air passenger 10-year data chart

    Conducting a Durbin-Watson test on our 10-year data

    Computing the centered moving average of each period lag of the data

    Analyzing the seasonal irregularity

    Trending component of the time series

    Doing the forecast

    Summary

    Questions

    Answers

    Further reading

    Part 2 – Grouping Data to Find Segments and Outliers

    Chapter 4: Introduction to Data Grouping

    Technical requirements

    Grouping with the K-means machine learning function

    Finding groups of multiple variables

    Calculating centroids and the optimal number of segments for one variable

    Calculating centroids and the optimal number of segments for two or more variables

    Understanding outliers

    Summary

    Questions

    Answers

    Further reading

    Chapter 5: Finding the Optimal Number of Single Variable Groups

    Technical requirements

    Finding an optimal number of groups for one variable

    Instructions to run the required add-in in Excel

    Running K-means elbow to get the optimal number of groups

    Passing the data values to the K-means elbow function

    Executing and interpreting the resulting chart of the optimal number of groups

    Running the K-means function to get the centroids or group average

    Finding the groups and centroids of one-variable data with K-means and Excel

    Assigning values for every group

    Calculating the centroid or the average point for every group

    Exploring the range of values for each segment

    Finding the segments for products and profits

    Finding the optimal number of groups using the K-means elbow function

    Running the K-means function to do the group segmentation

    Summary

    Questions

    Answers

    Further reading

    Chapter 6: Finding the Optimal Number of Multi-Variable Groups

    Technical requirements

    Calculating the optimal number of groups for two and three variables

    Finding the optimal number of segments for two variables – revenue and quantity

    Using the elbow function to get the number of groups for three variables – revenue, quantity, and month of sale

    Determining the groups and average value (centroids) of two and three variables

    Getting the groups with the K-means algorithm for two and three variables

    Visualizing centroids or the average value of each group for two and three variables

    Charting the product value range of each group for revenue, quantity, and month

    Using the Elbow and K-means functions with four variables

    Summary

    Questions and answers

    Answers

    Further reading

    Chapter 7: Analyzing Outliers for Data Anomalies

    Technical requirements

    Representing the data in a 3D chart

    Kaggle credit card fraud dataset

    Kaggle suspicious logins

    Kaggle insurance money amount complaints

    K-means data grouping

    Running the elbow algorithm

    Kaggle credit card fraud dataset

    Running the K-means function

    Pivot analysis of the outliers

    Kaggle credit card fraud dataset

    Kaggle suspicious logins

    Kaggle insurance money amount complaints

    Summary

    Questions

    Answers

    Further reading

    Part 3 – Simple and Multiple Linear Regression Analysis

    Chapter 8: Finding the Relationship between Variables

    Technical requirements

    Charting the predictive model's regression variables

    Plotting the variables to analyze the possible relationship

    Calculating the linear model confidence percentage

    Coefficient of determination

    Correlation coefficient

    Statistical significance of the slope

    The regression model's value ranges

    Summary

    Questions

    Answers

    Further reading

    Chapter 9: Building, Training, and Validating a Linear Model

    Technical requirements

    Calculating the intercept and slope with formulas

    Computing coefficient significance – t-statistics and p-value

    Coefficient of determination

    Coefficient of correlation

    t-statistics and p-value

    Getting the residual standard error

    Calculating the r-squared

    Calculating the f-statistics

    Training and testing the model

    Doing prediction scenarios with the regression model

    Summary

    Questions

    Answers

    Chapter 10: Building, Training, and Validating a Multiple Regression Model

    Technical requirements

    Exploring the variables with more influence

    Calculating t-statistics and p-values

    Determination coefficient

    Correlation coefficient

    t-statistics

    p-value

    Calculating residuals standard error and f-statistics

    Calculating residuals standard errors

    Calculating f-statistics

    Training and testing the model

    Writing a linear regression model formula

    Building the prediction model

    Summary

    Questions

    Answers

    Further reading

    Part 4 – Predicting Values with Time Series

    Chapter 11: Testing Data for Time Series Compliance

    Technical requirements

    Visualizing seasonal trends

    Researching autocorrelation – past values' influence over present values

    Performing the Durbin-Watson autocorrelation test

    Calculating Durbin-Watson by hand in Excel

    Summary

    Questions and answers

    Answers

    Further reading

    Chapter 12: Working with Time Series Using the Centered Moving Average and a Trending Component

    Technical requirements

    Calculating the CMA

    Calculating the moving average and CMA

    Estimating the season irregular and season components

    Calculating the trend line

    Producing the forecast – season and trend line

    Summary

    Questions

    Answers

    Further reading

    Chapter 13: Training, Validating, and Running the Model

    Technical requirements

    Training the model

    Conducting the Durbin-Watson test

    Building and training the forecast model

    Testing the forecast model

    Doing the forecast

    Summary

    Questions

    Answers

    Other Books You May Enjoy

    Preface

    This book is about giving you basic statistical knowledge to work with machine learning using complex algorithms to classify data, such as the K-means method. You will use an included add-in for Excel to practice the concepts of grouping statistics without the need for a deep programming background in the R language or Python.

    The book covers three topics of machine learning:

    Data segmentation

    Linear regression

    Forecasts with time series

    Data segmentation has many practical applications because it allows applying different strategies depending on the segment data ranges. It has applications in marketing and inventory rotation to act accordingly to the location and season of the sales.

    The linear regression statistical concepts in this book will help you to explore whether the variables that we are using are useful to build a predictive model.

    The time series model helps to do a forecast depending on the different seasons of the year. It has applications in inventory planning to allocate the correct quantities of products and avoid stalled cash flow in the warehouses. The time series depends on statistical tests to see whether the present values depend on the past, so they are useful to forecast the future.

    Who this book is for

    This book is for any professional that needs to analyze the data generated by the industry or academic scope using machine learning principles and algorithms. This book can help to better understand the different groups of data to apply a different approach to each one. Then, you can use the statistical tests of this book to see the most relevant variables that affect your performance using projections with linear regression. You will be able to link these variables with the time and season and use time series analysis to build a forecast that could help you to improve your planning in your professional scope.

    What this book covers

    Chapter 1, Understanding Data Segmentation, looks at how classifying the data of similar values is an approach for planning a strategy depending on the characteristics of the range of values of the groups. This strategy is more important when you deal with a problem with several variables, for example, finding the different groups of revenues for each season of the year and the quantities delivered for logistics demand planning.

    Chapter 2, Applying Linear Regression, shows that the target of linear regression is to use related variables to predict the behavior of the values and build scenarios of what could happen in different situations, using the regression model as a framework for foreseeing the situations.

    Chapter 3, What is Time Series?, examines how a time series model could do a forecast of the data, taking into account the seasonal trends based on the past time values.

    Chapter 4, An Introduction to Data Grouping, delves into the importance of finding a different approach for each group. In complex multivariable problems, we need the assistance of machine learning algorithms such as K-means to find the optimal number of segments and the group's values range.

    Chapter 5, Finding the Optimal Number of Single Variable Groups, shows how running an add-in for Excel that uses the K-means algorithm can help to get the optimal number of groups for the data that we are researching. In this case, we will start with a problem of just one variable to explain the concepts.

    Chapter 6, Finding the Optimal Number of Multi-Variable Groups, demonstrates how to use the Excel add-in to do the grouping of problems of several variables, for example, the classification of quantity, revenue, and season of the inventory rotation.

    Chapter 7, Analyzing Outliers for Data Anomalies, delves into another approach to data segmentation: researching what happens with the values that have a long-distance separation of all the groups. These values are anomalies, such as very short value expenses happening at non-business hours that could indicate evidence of possible fraud attempts.

    Chapter 8, Finding the Relationship between Variables, shows how we have to do statistical tests of the relationship of the variables to check whether they are useful to design a predictive model before building a linear model.

    Chapter 9, Building, Training, and Validating a Linear Model, talks about what happens after the relationship of the variables is statistically tested as useful to build a predictive model; we will use a portion of the data (regularly 20%) to test the model and see whether it gives a good sense of results similar to the known data.

    Chapter 10, Building, Training, and Validating a Multiple Regression Model, discusses multiple regression, which involves three or more variables. We will see how to apply the statistical tests to see the most useful variables to build the predictive model. Then, we will test the regression with 20% of the data and see whether it makes sense to use the model to build new scenarios with unknown data.

    Chapter 11, Testing Data for Time Series Compliance, shows how the time series forecast relies on the relationship of the present values to the past values. We will apply statistical methods to find whether the data is useful for a forecast model.

    Chapter 12, Working with Time Series Using the Centered Moving Average and a Trending Component, explores the forecast model's dependence on two components: the centered moving average (which gives the seasonal ups and downs variations) and the linear regression (which gives the positive or negative orientation of the trend). Once we have these calculations, we will be able to test and use the model.

    Chapter 13, Training, Validating, and Running the Model, covers the statistical tests time series and then models with 80% of the data. Then, we will test the time series with the remaining 20% and see whether the model returns results that make sense depending on our experience. Finally, we will use the model to do forecasts.

    To get the most out of this book

    To better understand this book, you must have a basic knowledge of statistical concepts such as average and standard deviation. You must also be able to use statistical functions in Excel to mark the cells' ranges input for calculations.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Data-Forecasting-and-Segmentation-Using-Microsoft-Excel. If there's an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803247731_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.

    A block of code is set as follows:

    html, body, #map {

    height: 100%;

    margin: 0;

    padding: 0

    }

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    [default]

    exten => s,1,Dial(Zap/1|30)

    exten => s,2,Voicemail(u100)

    exten => s,102,Voicemail(b100)

    exten => i,1,Voicemail(s0)

    Any command-line input or output is written as follows:

    $ mkdir css

    $ cd css

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Select System info from the Administration panel.

    Tips or Important Notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming

    Enjoying the preview?
    Page 1 of 1