
Business Intelligence: Lab Manual (CSP130)

The document is a lab manual for a business intelligence course submitted by student Kartickey Sharma. It contains 9 sections covering topics like Excel, what-if analysis, regression analysis, classification analysis, forecast analysis, trendlines, t-tests, data visualization, and histograms. The manual provides instructions and step-by-step explanations for experiments analyzing data using these business intelligence techniques in Excel.


BUSINESS INTELLIGENCE

LAB MANUAL
(CSP130)

COMPUTER SCIENCE AND ENGINEERING


B.E. Batch-2017
in
DECEMBER-2020

Under Guidance of: Dr. Pradeepta Sarangi, Professor & Program Head (DCSE)

Submitted By: Kartickey Sharma, BE – CSE – 7th, Roll No. 1710991383

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CHITKARA UNIVERSITY
Name Kartickey Sharma
Roll No.1710991383

PUNJAB

INDEX

1) Introduction to Excel

2) What if Analysis.

3) Regression Analysis

4) Classification Analysis

5) Forecast Analysis

6) Trendline

7) Testing Analysis (t-Test)

8) Data Visualization

9) Histogram

INTRODUCTION TO EXCEL

Excel is a spreadsheet program from Microsoft and part of the Microsoft Office suite of
productivity software. It was released on September 30, 1985. Excel can create and edit
spreadsheets saved with the .xls or .xlsx file extension. General uses of Excel include
cell-based calculations, pivot tables, and various graphing tools. For instance, with an Excel
spreadsheet you could create a monthly budget, track business expenses, or sort and organize
large amounts of data.

Unlike a word processor, such as Microsoft Word, the Excel documents consist of columns
and rows of data, made up of individual cells. Each of these cells can contain either text or
numerical values that can be calculated using formulas.

Excel Overview:

Below is an example of Microsoft Excel with each of its major sections highlighted. See the
formula bar, cell, column, row, or sheet tab.

EXPERIMENT – 1

AIM: What-If Analysis


What-if analysis examines how a result changes under different input values and conditions.
For example, if there are 100 lectures in total and the attendance criterion is 60%, how many
lectures must each student attend? Here the answer is 60% of 100, i.e. 60 lectures. Such
scenarios and values are obtained by applying a formula together with What-If Analysis.

STEPS FOR WHAT IF ANALYSIS:

1) First, take a scenario


Let us consider a simple dataset, where the invoice amount is Rs. 10000, on which there
is 9% CGST and 9% SGST, which thus amounts to a total bill of Rs. 11800.

Total Bill = Invoice Amount + Invoice Amount * (CGST + SGST).

2) On the Data tab, click What-If Analysis > Scenario Manager.


3) Click Add.
4) Enter the scenario name and the changing cells.

5) Enter a value for each changing cell.


6) Select the scenario you want to view.
7) Click Show.
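The total-bill formula from step 1 and the idea of switching between scenarios can be sketched outside Excel as follows. This is only an illustration: the function name `total_bill` and the two extra scenarios are assumptions made for the example, and only the base case (Rs. 10000 with 9% CGST and 9% SGST) comes from the text.

```python
# Total Bill = Invoice Amount + Invoice Amount * (CGST + SGST)
def total_bill(invoice_amount, cgst=0.09, sgst=0.09):
    """Return the total bill for an invoice amount and two tax rates."""
    return invoice_amount + invoice_amount * (cgst + sgst)


# Each scenario overrides the "changing cells". Only the base case is taken
# from the manual; the other two are hypothetical values for illustration.
scenarios = {
    "Base case": {"invoice_amount": 10000, "cgst": 0.09, "sgst": 0.09},
    "Lower tax (assumed)": {"invoice_amount": 10000, "cgst": 0.06, "sgst": 0.06},
    "Larger invoice (assumed)": {"invoice_amount": 15000, "cgst": 0.09, "sgst": 0.09},
}

for name, inputs in scenarios.items():
    print(f"{name}: Rs. {total_bill(**inputs):.2f}")
```

Each dictionary entry plays the role of one saved scenario, and the loop corresponds to clicking Show for each scenario in turn.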

Goal Seek
In Goal Seek, one value is fixed as the goal, and Excel automatically changes another value
until that goal is reached.
Steps of Goal Seek

1) Click What-If Analysis > Goal Seek.


2) Enter the Set cell, the To value, and the By changing cell.
3) Review the values entered.
4) Click OK. Goal Seek is applied and the values are changed.

Goal Seek changes values in the same sheet, while Scenario Manager gives its results on a new
sheet.
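The idea behind Goal Seek (fix the result, let the tool search for the input) can be illustrated with a small search routine. This is a sketch under assumptions: it uses plain bisection, which is not necessarily the method Excel uses internally, and the names `goal_seek` and `total_bill` as well as the search range are hypothetical.

```python
def total_bill(invoice_amount, tax_rate=0.18):
    """Total bill with a combined tax rate (9% CGST + 9% SGST = 18%)."""
    return invoice_amount * (1 + tax_rate)


def goal_seek(func, target, low, high, tol=1e-6, max_iter=200):
    """Find x in [low, high] with func(x) close to target.

    Assumes func is increasing on the interval (bisection search).
    """
    for _ in range(max_iter):
        mid = (low + high) / 2
        if func(mid) < target:
            low = mid   # result too small: search the upper half
        else:
            high = mid  # result large enough: search the lower half
        if high - low < tol:
            break
    return (low + high) / 2


# "Set cell" = total bill, "To value" = 11800, "By changing cell" = invoice amount
invoice = goal_seek(total_bill, target=11800, low=0, high=100000)
print(round(invoice, 2))
```

Here the total bill is the Set cell, 11800 is the To value, and the invoice amount is the changing cell.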

EXPERIMENT – 2

AIM: Regression Analysis

This example shows how to perform a regression analysis in Excel and how to interpret
the Summary Output.
Suppose you have the height and weight of 10 individuals. We will fit a regression of height
on weight, i.e. we will predict height when weight is known.

Steps to perform Regression Analysis:


1) On the Data tab, in the Analysis group, click Data Analysis.

2) Select Regression and click OK.



3) Select the Y Range (A1:A11). This is the response variable (also called the dependent
variable).
4) Select the X Range (B1:B11). This is the explanatory variable (also called the
independent variable). If there are several explanatory variables, their columns must be
adjacent to each other.
5) Check Labels.
6) Click in the Output Range box and select cell A14.
7) Check Residuals.
8) Click OK.

R Square
R Square equals 0.954782248, which is a very good fit: about 95.48% of the variation in height
is explained by the independent variable weight. The closer R Square is to 1, the better the
regression line fits the data.
Significance F and P-values
To check whether your results are reliable (statistically significant), look at Significance F. If this
value is less than 0.05, you are OK. If Significance F is greater than 0.05, it is probably better
to stop using this set of independent variables. Delete a variable with a high P-value (greater
than 0.05) and rerun the regression until Significance F drops below 0.05.

Coefficients
This section gives the values of the coefficients, which can be used to build the model for
future predictions. Our regression equation for prediction becomes
height = 1.4153*weight + 61.38.

Residuals
The residuals show how far the actual data points are from the predicted data points
(using the equation), i.e. residual = actual - predicted. For example, for the first data point
(weight 63, height 151), the predicted value is 1.4153*63 + 61.38 = 150.5439, giving a
residual of 151 - 150.5439 = 0.4561.

You can also create a scatter plot of these residuals.
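The prediction and residual calculations can be sketched in a few lines. The coefficients are the ones reported in the text. The function names and the example input (weight 63, observed height 151, the figures used in the residual arithmetic above) are chosen only for illustration.

```python
# Coefficients taken from the regression output quoted in the text
SLOPE = 1.4153
INTERCEPT = 61.38


def predict_height(weight):
    """Predicted height from the fitted line height = SLOPE*weight + INTERCEPT."""
    return SLOPE * weight + INTERCEPT


def residual(actual_height, weight):
    """Residual = actual value minus predicted value."""
    return actual_height - predict_height(weight)


print(predict_height(63))
print(residual(151, 63))
```

A positive residual means the actual point lies above the regression line, and a negative one means it lies below.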



EXPERIMENT – 3

AIM: Classification Analysis.


The ABC method and, more rarely, the XYZ method are used to analyse an assortment of goods
and the «prospects» of clients, suppliers, and debtors.

a) ABC analysis:

ABC analysis is based on the famous Pareto principle, which states that 20% of efforts give
80% of the result. Transformed and detailed, this law has been applied in developing the
methods discussed here.

The ABC method sorts a list of values into three groups that have different impact
on the final result. It allows you to:

● highlight the positions with the greatest "weight" in the total result;
● analyse groups of positions instead of an extensive list;
● work with all positions of one group using one algorithm.

After the ABC method is applied, the values in the list are divided into three
groups:

1. A - the most important for the total (20% of positions give 80% of the result).
2. B - average in importance (30% of positions give 15% of the result).
3. C - the least important (50% of positions give 5% of the result).

These values are not mandatory. The methods for determining the boundaries of the ABC groups
differ depending on the indicator analysed. But if significant deviations are detected, it is
worth checking what is wrong.

Conditions for using ABC analysis:

● the analysed objects have a numerical characteristic;
● the list being analysed consists of homogeneous positions (you cannot compare
washing machines and light bulbs, because these goods occupy very different
price ranges);
● the most objective values are selected (ranking by monthly
revenue is more reliable than ranking by daily receipts).
ABC analysis can be applied to, for example:

● the commercial range of goods (analysed by profit);
● the client base (analysed by volume of orders);
● the supplier base (analysed by shipments);
● the debtors (analysed by the sum of indebtedness).

The ranking method is very simple, but handling large volumes of data without special
programs is problematic. The Excel spreadsheet greatly simplifies ABC analysis.

The general scheme:

1. Identify the purpose of the analysis. Determine the object (what is analysed) and the
parameter (the principle by which it will be sorted into groups).
2. Sort the parameter values in descending order.
3. Sum the numeric data (parameters such as revenue, the amount of debt, the volume of
orders, etc.).
4. Find the share of each value in the total.
5. Calculate the cumulative share for each value in the list.
6. Find the value in the list at which the cumulative share approaches
80%. This is the lower limit of group A. The upper limit is the first value in the list.
7. Find the value in the list at which the cumulative share is close to 95% (80% +
15%). This is the lower limit of group B.
8. Everything below belongs to group C.
9. Count the number of values in each category and the total number of positions in
the list.
10. Find the share of each category in the total.

ABC-ANALYSIS STEPS:

1) Create three columns. The first column should be a list of the part numbers for every
item. The second column should contain the unit cost of each listed item. The third
column is where you input the annual demand for each item.

2) Add a final total row to the table. We need the sum of the values in the
column «Unit Cost». Go to cell B13 and press the hotkey combination ALT + «=» to
insert the SUM function with its range filled in: =SUM(B2:B12).

3) Calculate the share of each element in the total. Create a fourth
column «% Of Unit Cost». Enter this formula in the first cell: =B2/$B$13 (the
reference to the sum must be absolute). "Stretch" the formula to the last cell of the
column, then apply the «Percent» cell format with CTRL+SHIFT+5.

4) Calculate the cumulative share. Add a fifth column «Accumulated
Unit Cost» to the table. For the first position, it equals the individual share, so
in cell E2 enter: =D2. For the second position, it is the individual share plus the
cumulative share of the previous position. Enter this formula in the second cell:
=D3+E2. "Stretch" it to the end of the column. For the last position the value must be 100%.

5) Assign each position to a group. A cumulative share of less than 80% puts it in group A,
less than 95% in group B, and everything else in group C.
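The ABC steps (sort in descending order, compute the cumulative share, assign groups at the 80% and 95% limits) can be sketched as below. This is a minimal illustration, not the worksheet itself: the function name `abc_classify` and the item values are hypothetical.

```python
def abc_classify(values, a_limit=0.80, b_limit=0.95):
    """Assign each item to group A, B or C by cumulative share of the total.

    values: dict mapping item name -> numeric value (e.g. unit cost).
    """
    total = sum(values.values())
    # Sort in descending order so the largest contributors come first
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)

    groups = {}
    running_sum = 0
    for name, value in ranked:
        running_sum += value
        cumulative_share = running_sum / total
        if cumulative_share < a_limit:
            groups[name] = "A"
        elif cumulative_share < b_limit:
            groups[name] = "B"
        else:
            groups[name] = "C"
    return groups


# Hypothetical data for illustration only
items = {"P1": 500, "P2": 250, "P3": 150, "P4": 60, "P5": 40}
print(abc_classify(items))
```

The strict "less than" comparisons mirror step 5. Whether a position that lands exactly on a limit belongs to the higher or the lower group is a convention that can be changed.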

b) XYZ Analysis:

This method is often used in addition to the ABC analysis. The combined term ABC-XYZ
analysis is even found in the literature.

The acronym XYZ denotes the level of predictability of the object being
analysed. This level is measured by the coefficient of variation, which characterizes the
scatter of the data around the average value.

The coefficient of variation is a relative measure with no specific units
of measurement. It is quite informative, even by itself. However, trend and seasonality
significantly increase the coefficient of variation, and as a result the estimated
predictability is reduced. This error may lead to wrong decisions and is a major drawback
of the XYZ method. It is nevertheless…

Possible objects for analysis include volume of sales, number of suppliers, revenue, etc.
Most often this method is used to identify the goods for which there is strong
demand.

XYZ-analysis algorithm:

1. Calculate the coefficient of variation of demand for each product
category. The analyst estimates the percentage deviation of the sales volume from the
mean.
2. Sort the product range by the coefficient of variation.
3. Classify the positions into three groups: X, Y or Z.

The criteria for the classification and the characteristics of the groups:

1. «X» - 0 - 10% (the coefficient of variation) - goods with the strongest demand.
2. «Y» - 10 - 25% - goods with volatile sales.
3. «Z» - over 25% - goods having random demand.

STEPS:
1) Take the month-wise sales of each product.

2) Calculate the coefficient of variation for each commodity group. The formula for the
variability of sales volume: =STDEVP(B2:M2)/AVERAGE(B2:M2).

3) Classify the values: assign each product to group «X», «Y» or «Z». We use the
built-in function «IF»: =IF(N2<=10%,"X",IF(N2<=25%,"Y","Z"))
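The two XYZ formulas (population standard deviation divided by the mean, then a nested IF) can be mirrored as follows. This is a sketch: the function names and the monthly sales figures are hypothetical. `statistics.pstdev` is the population standard deviation, which corresponds to STDEVP.

```python
from statistics import mean, pstdev


def coefficient_of_variation(sales):
    """Population standard deviation divided by the mean (STDEVP/AVERAGE)."""
    return pstdev(sales) / mean(sales)


def xyz_class(sales):
    """Classify a sales series as X, Y or Z using the 10% and 25% limits."""
    cv = coefficient_of_variation(sales)
    if cv <= 0.10:
        return "X"
    if cv <= 0.25:
        return "Y"
    return "Z"


# Hypothetical monthly sales for three products (12 months each)
steady = [100] * 12
volatile = [80, 120] * 6
random_demand = [10, 200] * 6

for label, series in [("steady", steady), ("volatile", volatile), ("random", random_demand)]:
    print(label, xyz_class(series))
```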

EXPERIMENT – 4

AIM: Forecast Analysis.


Forecasting is a technique for making predictions about the future by using historical data
as inputs and analyzing trends.
This method is commonly used to make educated guesses on cash flows, plan budgets,
anticipate future expenses or sales, and so on. However, forecasting does not tell the future
definitively; it only shows probabilities. So, you should always double-check the results
before making a decision.
Microsoft Excel offers a few different forecasting tools, including built-in features, functions,
and graphs. Depending on your needs, you can choose one of the following methods:

● Exponential smoothing forecast - time series forecasting based on historical data with
seasonal or other cycles.
● Linear forecast - predicting future values using linear regression.

a) How to forecast in Excel using exponential smoothing


Exponential smoothing forecasting in Excel is based on the AAA version (additive
error, additive trend and additive seasonality) of the Exponential Triple
Smoothing (ETS) algorithm, which smoothes out minor deviations in past data trends
by detecting seasonality patterns and confidence intervals.
This forecasting method is best suited for non-linear data models with seasonal or
other recurring patterns. It is available in Excel 2016, Excel 2019, and Excel for
Office 365.
You can do such a forecast with your own formulas or have Excel create a forecast
sheet for you automatically.
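To give a feel for what exponential smoothing does, here is a deliberately simplified sketch. It implements double exponential smoothing (Holt's linear method with level and trend only). It is not Excel's AAA triple smoothing, because it has no seasonal component and no error model. The function name `holt_forecast`, the smoothing constants, and the sample series are assumptions made for illustration.

```python
def holt_forecast(values, horizon, alpha=0.5, beta=0.5):
    """Forecast `horizon` future points with Holt's linear (double) smoothing.

    alpha smooths the level, beta smooths the trend (both between 0 and 1).
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")

    # Simple initialisation: first value as level, first difference as trend
    level = values[0]
    trend = values[1] - values[0]

    for observed in values[1:]:
        previous_level = level
        # New level: blend of the observation and the previous one-step forecast
        level = alpha * observed + (1 - alpha) * (previous_level + trend)
        # New trend: blend of the latest level change and the previous trend
        trend = beta * (level - previous_level) + (1 - beta) * trend

    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical yearly sales figures
print(holt_forecast([10, 12, 14, 16, 18], horizon=2))
```

Larger values of alpha and beta make the forecast react faster to recent changes, and smaller values smooth out more of the short-term deviations.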

Create an exponential forecast sheet automatically


The Forecast Sheet feature introduced in Excel 2016 makes time series forecasting super-
easy. Basically, you only need to appropriately organize the source data, and Excel will do
the rest.

Arranging data
In your Excel worksheet, enter two data series into adjacent columns:

● Time series - date or time entries that are observed sequentially at a regular interval
like hourly, daily, monthly, yearly, etc.

● Data values series - corresponding numeric values that will be predicted for future
dates.
In this example, we will try to forecast sales for the next few years based on
historical data. Note that column A contains dates (the 1st of every year from
2000) in a custom format that displays only the year. However, these are fully functional
dates, not text values.

Creating a forecast sheet


With the two data series in place, carry out the following steps to build a forecasting model:
1. Select both data series. In most cases, it is sufficient to select just one cell in any of your
series, and Excel picks up the rest of the data automatically.
2. Go to the Data tab > Forecast group and click the Forecast Sheet button.

3. The Create Forecast Worksheet window shows a forecast preview and asks you to
choose:

o Graph type: line (default) or column chart

o End date for forecasting

4. When done, click the Create button.



Excel immediately creates a new sheet containing a table with your original and predicted
values as well as a chart that visually represents this data.

Customizing Excel forecast


If you would like to change any of the default options of your forecast, click Options in the
lower-left part of the Create Forecast Worksheet window

Forecast Start - the start date for forecasting. You can either select a date from the date
picker or type it directly in the box.

● If your data is seasonal, it is recommended to start a forecast before the last historical point.
● To see how well the predictions match the known values, pick a date before the end of the
historical data. In this case, only data prior to the start date will be used for forecasting (this
back-testing method is also known as hindcasting).

Confidence Interval - a range in which the predictions are expected to fall. On the line chart,
it is represented by the two finer lines on each side of the forecast line; on the column chart -
by the error bar values.
Confidence interval can help you understand the forecast accuracy. A smaller interval
indicates more confidence for a specific point. The default level is 95%, meaning that 95% of
future points are expected to fall within the range.
You can check and uncheck the Confidence Interval box to show or hide it. And you can
change the default value by using the up or down arrows.
Seasonality - the length of the seasonal pattern in which regular and predictable data
fluctuations occur. For example, in a yearly pattern where each data point represents a month,
the seasonality is 12.

Excel identifies the seasonal cycle automatically but also allows you to set it manually. When
Excel is unable to detect seasonality (usually, with less than 2 cycles of historical data), the
predictions revert to a linear trend.
Include Forecast Statistics - additional statistical information on the forecast. Check this
box if you want Excel to generate a table of additional statistics such as smoothing constants
(Alpha, Beta, Gamma) and error metrics (MASE, SMAPE, MAE, RMSE). All these values
are calculated by using the FORECAST.ETS.STAT function.
Timeline Range - the range used for your timeline series. By default, it includes all dates in
your source table, but you can change it here.
Values Range - the range used for your value series. It should match the Timeline Range.
Fill Missing Points Using - controls how missing points are handled. By default, Excel uses
the Interpolation approach where the missing points are filled based on the weighted average
of neighboring points. Alternatively, you can select Zeros to treat the missing points as zero
values.
Duplicate Aggregates Using - determines how multiple values with the same timestamp are
calculated. The default option is the average, but you can pick any other calculation method
from the list, e.g. Median, Max or Min.

b) Exponential smoothing forecast formulas


A forecast sheet created by Excel contains two columns with your original data (timeline
series and the corresponding data series) and three calculated columns (forecast values and
two confidence bounds).
Naturally, nothing prevents you from building a similar forecasting model yourself by using
the following formulas.

Forecasted values (FORECAST.ETS function)


The future values are calculated with the FORECAST.ETS function, which has the following
syntax:
FORECAST.ETS(target_date, values, timeline, [seasonality], [data_completion],
[aggregation])
For our sample forecast sheet, Excel has created this formula:
=FORECAST.ETS(A13, $B$2:$B$12, $A$2:$A$12, 1, 1)

Where:

● A13 is the target date
● $B$2:$B$12 is the data values range
● $A$2:$A$12 is the time series range
● 1 in the 4th argument (seasonality) - tells Excel to detect seasonality automatically.
● 1 in the 5th argument (data completion) - tells Excel to complete missing points as the average
of the neighbouring points.
● The 6th argument (aggregation) is omitted, which means that multiple values with the same
time stamp are to be aggregated using AVERAGE.

Confidence interval (FORECAST.ETS.CONFINT function)


To return a confidence interval for the forecast value at a specified date,
the FORECAST.ETS.CONFINT function is used.
For our sample data set, the confidence interval can be calculated with this formula:
=FORECAST.ETS.CONFINT(A13, $B$2:$B$12, $A$2:$A$12, 0.95, 1, 1)

Where:

● A13 is the target date
● $B$2:$B$12 is the data values range
● $A$2:$A$12 is the time series range
● 0.95 - the confidence level is equal to 95%.
● 1 in the 5th argument (seasonality) - automatic detection of seasonality.
● 1 in the 6th argument (data completion) - missing points are completed based on the average
of the neighbouring points.
● 7th argument omitted (aggregation) - aggregate multiple data values with the same time stamp
by using the AVERAGE function.

In the automatically created Forecast Sheet, Excel does not output the confidence interval
value. Instead, it uses the FORECAST.ETS.CONFINT function in combination with the
forecast value to calculate the Confidence Bounds, provided the Confidence Interval box is
checked in the Options section.
To get the lower bound, you subtract the confidence interval from the forecasted value:
=H13 - FORECAST.ETS.CONFINT(A13, $B$2:$B$12, $A$2:$A$12, 0.95, 1, 1)

To get the upper bound, you add the confidence interval to the forecasted value:
=H13 + FORECAST.ETS.CONFINT(A13, $B$2:$B$12, $A$2:$A$12, 0.95, 1, 1)

Where H13 is the forecasted value returned by FORECAST.ETS.
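The bound arithmetic is simple enough to show directly. In this sketch, `forecast` stands for the value returned by FORECAST.ETS and `interval` for the value returned by FORECAST.ETS.CONFINT. The function name and the numbers are hypothetical.

```python
def confidence_bounds(forecast, interval):
    """Return (lower bound, upper bound) around a forecasted value."""
    return forecast - interval, forecast + interval


# Hypothetical forecast of 100.0 with a confidence interval of 12.5
lower, upper = confidence_bounds(100.0, 12.5)
print(lower, upper)
```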



c) How to forecast using linear regression in Excel:


For data without seasonality or other cycles, you can predict future values by using linear
regression. This method is also suited for small and simple data sets that do not have enough
historical data to detect seasonality.
Microsoft Excel does not provide a built-in feature to do linear forecasting automatically, but
it does have a special function for this, more precisely, two functions: FORECAST and
FORECAST.LINEAR.
Both functions have the same purpose, syntax and return the same results. The difference is
only in Excel versions:

● In Excel 2016 and Excel 2019, both functions are available, but it is recommended to use
newer FORECAST.LINEAR.
● In Excel 2013, 2010 and 2007, only the FORECAST function is available.

The detailed explanation of the functions' syntax can be found in the tutorial "How to use
FORECAST function in Excel". For now, let us focus on a linear forecast example.

Linear forecast formulas:


Suppose you have the sales data for the previous year and want to predict this year's sales.
With just one cycle of historical data, Excel cannot identify a seasonality pattern, therefore
exponential smoothing is not an option. Instead, let us do a linear forecast with one of these
formulas:
In Excel 2016 and 2019:
=FORECAST.LINEAR(A13, $B$2:$B$12, $A$2:$A$12)

In Excel 2013 and earlier versions:

=FORECAST(A13, $B$2:$B$12, $A$2:$A$12)

Where:

● A13 is the target date


● $B$2:$B$12 is the data values range
● $A$2:$A$12 is the time series range

Note that we lock both ranges with absolute cell references to prevent them
from changing when we copy the formula down the column.
So, you enter one of the above formulas in an empty cell in row 13 (the row of the first
target date, A13) and drag it down to as many cells as needed to obtain the forecasts.
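The calculation behind a linear forecast is an ordinary least-squares line through the known points, evaluated at the target x. The sketch below follows the argument order of FORECAST.LINEAR (x, known y values, known x values). The function name `forecast_linear` and the data are hypothetical.

```python
def forecast_linear(x, known_ys, known_xs):
    """Predict y at x from a least-squares line fitted to the known points."""
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("known_xs and known_ys must have equal length >= 2")

    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n

    # Slope = sum((x - mean_x)*(y - mean_y)) / sum((x - mean_x)^2)
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    if sxx == 0:
        raise ValueError("known_xs must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x


# Hypothetical data: periods 1..5 with sales 3, 5, 7, 9, 11; forecast period 6
print(forecast_linear(6, [3, 5, 7, 9, 11], [1, 2, 3, 4, 5]))
```

In a worksheet, dates are stored as serial numbers, so the same arithmetic applies when the x values are dates.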

d) Linear regression forecasting graph:


To better understand the future strategies, you can visually represent the predicted values in
a line chart.
To draw a linear forecast graph, here's what you need to
do:

1. Copy the last historical data value to the Forecast column. In this example, we copy the value
from B31 to C31. This helps achieve the effect of a continuous uninterrupted line.
2. Select 3 columns of data: time series, historical data values and forecasted values.
3. On the Insert tab, in the Charts group, click the Insert Line or Area Chart icon and
choose the first chart type (2-D Line).

EXPERIMENT – 5

AIM: Trendline.
This example teaches you how to add a trendline to a chart in Excel.
1. Select the chart.
2. Click the + button on the right side of the chart, click the arrow next to Trendline and then
click More Options.

The Format Trendline pane appears.


3. Choose a Trend/Regression type. Click Linear.
4. Specify the number of periods to include in the forecast. Type 3 in the Forward box.
5. Check "Display Equation on chart" and "Display R-squared value on chart".

Result:

y = 45909.9x + 6502.9 and R² = 0.771.
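A linear trendline equation and its R-squared value can be reproduced with a short calculation. The sketch below is an illustration on hypothetical data, since the chart data behind the result above is not listed in the manual. The function name `linear_trendline` is an assumption.

```python
def linear_trendline(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Hypothetical data; the last 3 lines extend the trend 3 periods forward
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
slope, intercept, r2 = linear_trendline(xs, ys)
for future_x in (6, 7, 8):
    print(future_x, slope * future_x + intercept)
```

Evaluating the fitted equation at the next three x values corresponds to typing 3 in the Forward box.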


Calculated Result:

EXPERIMENT – 6

AIM: Testing Analysis (t-Test).


This example shows how to perform a t-Test in Excel. The t-Test is used to test the null
hypothesis that the means of two populations are equal.
The data are the marks obtained by a student in 11 subjects in ST1 and ST2.
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0

STEPS:
1) On the Data tab, in the Analysis group, click Data Analysis.

2) Select t-Test: Two-Sample Assuming Unequal Variances and click OK.

3) Click in the Variable 1 Range box and select the range B2:B12.

4) Click in the Variable 2 Range box and select the range C2:C12.
5) Click in the Hypothesized Mean Difference box and type 0 (H0: μ1 - μ2 = 0).
6) Click in the Output Range box and select cell E1.

7) Click OK.
Result:

Conclusion: We do a two-tail test (inequality). If t Stat < -t Critical two-tail or t Stat > t Critical
two-tail, we reject the null hypothesis. This is not the case here: -2.086 < -1.272 < 2.086. Therefore,
we do not reject the null hypothesis. The observed difference between the sample means (15.18
- 20) is not convincing enough to say that the average marks in the ST1 and ST2
exams differ significantly.
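The t statistic for a two-sample test with unequal variances (Welch's t-test) and the decision rule from the conclusion can be sketched as follows. The function names and the sample marks are hypothetical. The critical value is passed in rather than computed, because the Python standard library has no t-distribution table.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_test(sample1, sample2):
    """Return (t statistic, degrees of freedom) for two samples with unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    # Sample variance of each group divided by its size
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2

    t_stat = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


def reject_null(t_stat, t_critical_two_tail):
    """Two-tail rule: reject H0 if t Stat < -t Critical or t Stat > t Critical."""
    return abs(t_stat) > t_critical_two_tail


# Hypothetical marks for two exams
t_stat, df = welch_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, df)

# Values quoted in the conclusion above: t Stat = -1.272, t Critical = 2.086
print(reject_null(-1.272, 2.086))
```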

EXPERIMENT – 7

AIM: Data Visualization.


Data visualization covers pie charts, bar graphs, line graphs, and other graphical
representations of data.
1. Select the attributes you want to represent graphically, then click the
specific graph type.

BAR GRAPH

PIE CHART

AREA GRAPH

EXPERIMENT – 8

AIM: Histogram
1) First, enter the bin numbers (upper levels) in the range C3:C7.

2) On the Data tab, in the Analysis group, click Data Analysis.

3) Select Histogram and click OK.

4) Click in the Input Range box and select the range A2:A13.


5) Click in the Bin Range box and select the range C3:C7.
6) Click the Output Range option button, click in the Output Range box and select cell
E1.

7) Check Chart Output.


8) Click OK.
9) Click the legend on the right side and press Delete.
10) Properly label your bins.
11) To remove the space between the bars, right click a bar, click Format Data Series, and
change the Gap Width to 0%.
12) To add borders, right click a bar, click Format Data Series, click the Fill & Line icon,
click Border, and select a color.
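The counting behind a histogram with upper-limit bins can be sketched as below: each value is counted in the first bin whose upper limit is greater than or equal to it, and values above the last limit go into a final "More" bucket. The function name `histogram_counts` and the sample data are hypothetical.

```python
from bisect import bisect_left


def histogram_counts(data, bins):
    """Count values per bin, where each bin number is an inclusive upper limit.

    Returns one count per bin plus a final count for values above the last bin.
    """
    limits = sorted(bins)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # Index of the first limit that is >= value; len(limits) means "More"
        counts[bisect_left(limits, value)] += 1
    return counts


# Hypothetical data with bin upper limits 10 and 20
print(histogram_counts([1, 5, 10, 11, 20, 25], [10, 20]))
```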
Result:
