Business Intelligence: Lab Manual (CSP130)
LAB MANUAL
(CSP130)
PUNJAB
INDEX
1) Introduction to Excel
2) What-If Analysis
3) Regression Analysis
4) Classification Analysis
5) Forecast Analysis
6) Trendline
8) Data Visualization
9) Histogram
Name Kartickey Sharma
Roll No.1710991383
INTRODUCTION TO EXCEL
Excel is a spreadsheet program developed by Microsoft and part of the Microsoft Office suite of
productivity software. It was released on September 30, 1985. Excel can create and edit
spreadsheets that are saved with the .xls or .xlsx file extension. General uses of Excel
include cell-based calculations, pivot tables and various graphing tools. For instance,
with an Excel spreadsheet, you could create a monthly budget, track business expenses, or sort
and organize large amounts of data.
Unlike documents in a word processor such as Microsoft Word, Excel documents consist of
columns and rows of data, made up of individual cells. Each of these cells can contain either
text or numerical values that can be calculated using formulas.
Excel Overview:
Below is an example of Microsoft Excel with each of its major sections highlighted. See the
formula bar, cell, column, row, or sheet tab.
EXPERIMENT – 1
Goal Seek
In Goal Seek, one value (the result of a formula) is fixed as a goal, and the value it depends
on is changed automatically until the goal is reached.
Steps of Goal Seek
Goal Seek makes its changes in the same sheet, while Scenario Manager gives its results on a
new sheet.
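The idea behind Goal Seek can be sketched outside Excel as a search for the input that makes a formula hit a target. The Python below is only an illustration under stated assumptions: it uses bisection, which is not necessarily the method Excel uses, and the function name goal_seek, the total formula and its numbers are hypothetical.

```python
def goal_seek(formula, target, low, high, tol=1e-9, max_iter=200):
    """Find x between low and high so that formula(x) is close to target.

    Uses bisection, so the target must lie between formula(low) and formula(high).
    """
    f_low = formula(low) - target
    f_high = formula(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by low and high")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = formula(mid) - target
        if abs(f_mid) < tol:
            return mid
        if f_low * f_mid <= 0:
            high = mid                 # the solution is in the lower half
        else:
            low, f_low = mid, f_mid    # the solution is in the upper half
    return (low + high) / 2


# Hypothetical worksheet formula: total = price * quantity, quantity fixed at 40.
def total(price):
    return price * 40


# Which price gives a total of 1000?
price_needed = goal_seek(total, target=1000, low=0, high=100)
print(price_needed)
```

As in Excel, the goal (1000) is fixed and only one input (the price) is allowed to change.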
EXPERIMENT – 2
This example teaches you how to perform a regression analysis in Excel and how to interpret
the Summary Output.
Below you can find our data. Suppose you have the height and weight of 10 individuals. We
will fit a regression of height on weight, i.e. we will predict height if we know weight.
3) Select the Y Range (A1:A11). This is the response variable (also called the dependent
variable).
4) Select the X Range (B1:B11). This is the explanatory variable (also called the
independent variable). If there are several explanatory variables, their columns must be
adjacent to each other.
5) Check Labels.
6) Click in the Output Range box and select cell A14.
7) Check Residuals.
8) Click OK.
R Square
R Square equals 0.954782248, which is a very good fit. About 95.48% of the variation in height
is explained by the independent variable weight. The closer R Square is to 1, the better the
regression line fits the data.
Significance F and P-values
To check if your results are reliable (statistically significant), look at Significance F. If this
value is less than 0.05, you are OK. If Significance F is greater than 0.05, it is probably better
to stop using this set of independent variables. Delete a variable with a high P-value (greater
than 0.05) and rerun the regression until Significance F drops below 0.05.
Coefficients
This section gives the values of the coefficients, which can be used to build the model for
future predictions. Our regression equation for prediction becomes
height = 1.4153*weight + 61.38.
Residuals
The residuals show you how far away the actual data points are from the predicted data points
(using the equation). For example, the first data point equals 150. Using the equation, the
predicted data point equals 1.4153*63 + 61.38 = 150.5439, giving a residual of 150 - 150.5439
= -0.5439.
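The quantities in the Summary Output (coefficients, R Square, residuals) can be reproduced by hand for a single explanatory variable. The Python below is a minimal sketch of ordinary least squares, not Excel's implementation; the function names and the five weight/height pairs are made up for illustration and are not the data from this experiment.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    return slope * x_value + intercept


def residuals(x, y, slope, intercept):
    """Actual value minus predicted value for every data point."""
    return [yi - predict(xi, slope, intercept) for xi, yi in zip(x, y)]


def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in residuals(x, y, slope, intercept))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical sample (not the manual's data).
weight = [55, 60, 65, 70, 75]
height = [140, 146, 152, 161, 166]

slope, intercept = fit_line(weight, height)
print(slope, intercept)                      # coefficients of the fitted line
print(r_squared(weight, height, slope, intercept))
print(residuals(weight, height, slope, intercept))

# Prediction with the equation reported in this experiment.
print(predict(63, 1.4153, 61.38))            # about 150.5439
```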
EXPERIMENT – 3
a) ABC analysis:
ABC analysis is based on the well-known Pareto principle, which states that 20% of the effort
gives 80% of the result. Transformed and detailed, this law is applied in the method
discussed here.
The ABC method sorts a list of values into three groups, which have different impacts
on the final result.
1. A: the most important for the total (20% of the items give 80% of the result).
2. B: average in importance (30% of the items give 15% of the result).
3. C: the least important (50% of the items give 5% of the result).
These values are not mandatory. The methods for determining the boundaries of the ABC groups
differ depending on the indicators analysed. But if significant deviations are detected, it is
worth asking what is wrong.
1. Identify the purpose of the analysis. Determine the object (what is analysed) and the
parameter (the criterion by which items are sorted into groups).
2. Sort the parameter values in descending order.
3. Sum the numeric data (parameters such as revenue, the amount of debt, the volume of
orders, etc.).
4. Find the share of each parameter value in the total.
5. Calculate the cumulative share for each value in the list.
6. Find the value in the list at which the cumulative share approaches 80%. This is the
lower limit of group A. The upper limit is the first value in the list.
7. Find the value in the list at which the cumulative share is close to 95% (80% + 15%).
This is the lower limit of group B.
8. Everything below belongs to group C.
9. Count the number of values in each category and the total number of positions in
the list.
10. Find the share of each category in the total.
ABC-ANALYSIS STEPS:
1) Create three columns. The first column should be a list of the part numbers for every
item. The second column should contain the unit cost of each listed item. The third
column is where you input the annual demand for each item.
2) Add a total row to the table. We need to find the sum of the values in the
column «Unit Cost». Go to cell B13 and press the hotkey combination ALT + «=» to
insert the SUM function with its arguments filled in: =SUM(B2:B12).
3) Calculate the share of each element in the total amount. Create the fourth
column «% Of Unit Cost» and enter the formula in the first cell: =B2/$B$13 (the
reference to the sum must be absolute). Drag the formula down to the last cell of the
column. Then apply the «Percent» format to the cells with CTRL+SHIFT+5.
4) Calculate the cumulative share. Add a fifth column «Accumulated Unit Cost» to the
table. For the first position, it equals the individual share, so in cell E2 enter: =D2.
For the second position, it is the individual share plus the cumulative share of the
previous position, so in the second cell enter: =D3+E2. Drag the formula down to the end
of the column. For the last position the value must be 100%.
5) Assign each position to a group. A cumulative share of less than 80% puts it in group A,
less than 95% in group B, and all the others in group C.
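The worksheet steps above (sort, share of total, cumulative share, group) can be sketched in a few lines of Python. This is only an illustration: the function name abc_classify, the part numbers and the unit costs are hypothetical, and the 80% and 95% limits are the ones quoted in this experiment.

```python
def abc_classify(unit_costs, a_limit=0.80, b_limit=0.95):
    """Return (item, share, cumulative_share, group) rows, largest cost first."""
    total = sum(unit_costs.values())
    # Step: sort the values in descending order.
    ranked = sorted(unit_costs.items(), key=lambda pair: pair[1], reverse=True)
    rows = []
    running = 0.0
    for item, cost in ranked:
        running += cost
        share = cost / total            # same role as =B2/$B$13
        cumulative = running / total    # same role as =D3+E2
        if cumulative < a_limit:
            group = "A"
        elif cumulative < b_limit:
            group = "B"
        else:
            group = "C"
        rows.append((item, share, cumulative, group))
    return rows


# Hypothetical part numbers and unit costs.
unit_costs = {"P1": 500, "P2": 250, "P3": 100, "P4": 80, "P5": 40, "P6": 30}
rows = abc_classify(unit_costs)
groups = {item: group for item, _, _, group in rows}
for row in rows:
    print(row)
```

With these made-up costs, P1 and P2 fall in group A, P3 and P4 in group B, and P5 and P6 in group C.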
b) XYZ Analysis:
This method is often used in addition to ABC analysis. The combined term ABC-XYZ analysis
is even found in the literature.
The acronym XYZ refers to the level of predictability of the object being analyzed. It is
usually measured by the coefficient of variation, which characterizes the scatter of the data
around the average value.
The coefficient of variation is a relative measure, which has no specific units of
measurement. It is informative enough, even on its own. However, trend and seasonality in the
data significantly increase the coefficient of variation. As a result, the measured
predictability is reduced. This error may lead to wrong decisions, which is a major drawback
of the XYZ method. Nevertheless…
Possible objects for analysis include the volume of sales, the number of suppliers, revenue, etc.
Most often, this method is used to identify the goods for which demand is stable.
XYZ-analysis algorithm:
1. Calculate the coefficient of variation of demand for each product category. The analyst
estimates the percentage deviation of the sales volume from the mean.
2. Sort the product range by the coefficient of variation.
3. Classify the positions into three groups: X, Y or Z.
   1. «X»: 0 - 10% (coefficient of variation), goods with the most stable demand.
   2. «Y»: 10 - 25%, goods with volatile sales.
   3. «Z»: over 25%, goods with random demand.
STEPS:
1) Let us take the month-wise sales of each product.
2) Calculate the coefficient of variation for each commodity group. The formula for the
variability of sales volume: =STDEVP(B2:M2)/AVERAGE(B2:M2).
3) Classify the values: assign each product to group «X», «Y» or «Z». We use the
built-in function «IF»: =IF(N2<=10%,"X",IF(N2<=25%,"Y","Z"))
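The two worksheet formulas above translate directly into Python. The sketch below uses the population standard deviation (statistics.pstdev, the counterpart of STDEVP); the function names, the product names and the twelve monthly sales figures per product are made up for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(sales):
    """Population standard deviation divided by the mean, like STDEVP/AVERAGE."""
    return pstdev(sales) / mean(sales)


def xyz_group(cv):
    """Same thresholds as =IF(N2<=10%,"X",IF(N2<=25%,"Y","Z"))."""
    if cv <= 0.10:
        return "X"
    if cv <= 0.25:
        return "Y"
    return "Z"


# Hypothetical monthly sales for three products.
monthly_sales = {
    "steady":   [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100],
    "volatile": [100, 120, 80, 110, 90, 100, 120, 80, 110, 90, 100, 100],
    "erratic":  [10, 200, 0, 150, 20, 300, 5, 90, 0, 250, 15, 160],
}

classes = {name: xyz_group(coefficient_of_variation(sales))
           for name, sales in monthly_sales.items()}
print(classes)
```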
EXPERIMENT – 4
● Exponential smoothing forecast - time series forecasting based on historical data with
seasonal or other cycles.
● Linear forecast - predicting future values using linear regression.
Arranging data
In your Excel worksheet, enter two data series into adjacent columns:
● Time series - date or time entries that are observed sequentially at a regular interval
like hourly, daily, monthly, yearly, etc.
● Data values series - corresponding numeric values that will be predicted for future
dates.
In this example, we will try to forecast sales for the next few years based on the following
historical data. Note that column A contains dates (the 1st day of every year from 2000) in a
custom format that displays only the year. However, these are fully functional dates, not
text values.
3. The Create Forecast Worksheet window shows a forecast preview and asks you to
choose:
Excel immediately creates a new sheet containing a table with your original and predicted
values as well as a chart that visually represents this data.
Forecast Start - the start date for forecasting. You can either select a date from the date
picker or type it directly in the box.
● If your data is seasonal, it is recommended to start a forecast before the last historical point.
● To see how well the predictions match the known values, pick a date before the end of the
historical data. In this case, only data prior to the start date will be used for forecasting (this
back-testing method is also known as hindcasting).
Confidence Interval - a range in which the predictions are expected to fall. On the line chart,
it is represented by the two finer lines on each side of the forecast line; on the column chart -
by the error bar values.
Confidence interval can help you understand the forecast accuracy. A smaller interval
indicates more confidence for a specific point. The default level is 95%, meaning that 95% of
future points are expected to fall within the range.
You can check and uncheck the Confidence Interval box to show or hide it. And you can
change the default value by using the up or down arrows.
Seasonality - the length of the seasonal pattern in which regular and predictable data
fluctuations occur. For example, in a yearly pattern where each data point represents a month,
the seasonality is 12.
Excel identifies the seasonal cycle automatically but also allows you to set it manually. When
Excel is unable to detect seasonality (usually, with less than 2 cycles of historical data), the
predictions revert to a linear trend.
Include Forecast Statistics - additional statistical information on the forecast. Check this
box if you want Excel to generate a table of additional statistics such as smoothing constants
(Alpha, Beta, Gamma) and error metrics (MASE, SMAPE, MAE, RMSE). All these values
are calculated by using the FORECAST.ETS.STAT function.
Timeline Range - the range used for your timeline series. By default, it includes all dates in
your source table, but you can change it here.
Values Range - the range used for your value series. It should match the Timeline Range.
Fill Missing Points Using - controls how missing points are handled. By default, Excel uses
the Interpolation approach where the missing points are filled based on the weighted average
of neighboring points. Alternatively, you can select Zeros to treat the missing points as zero
values.
Duplicate Aggregates Using - determines how multiple values with the same timestamp are
calculated. The default option is the average, but you can pick any other calculation method
from the list, e.g. Median, Max or Min.
In the automatically created Forecast Sheet, Excel does not output the confidence interval
value. Instead, it uses the FORECAST.ETS.CONFINT function in combination with the
forecast value to calculate the Confidence Bounds, provided the Confidence Interval box is
checked in the Options section.
To get the lower bound, you subtract the confidence interval from the forecasted value:
=H13 - FORECAST.ETS.CONFINT(A13, $B$2:$B$12, $A$2:$A$12, 0.95, 1, 1)
To get the upper bound, you add the confidence interval to the forecasted value:
=H13 + FORECAST.ETS.CONFINT(A13, $B$2:$B$12, $A$2:$A$12, 0.95, 1, 1)
● In Excel 2016 and Excel 2019, both functions are available, but it is recommended to use the
newer FORECAST.LINEAR.
● In Excel 2013, 2010 and 2007, only the FORECAST function is available.
The detailed explanation of the functions' syntax can be found in the tutorial "How to use
FORECAST function in Excel". For now, let us focus on a linear forecast example.
Note that we lock both ranges with absolute cell references to prevent them from changing
when we copy the formula down the column.
So, you enter one of the above formulas in any empty cell in row 14 and drag it down to as
many cells as needed to get the forecast values.
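A linear forecast of this kind can be sketched outside Excel as an ordinary least-squares line extended to a future point. The Python below is a minimal illustration, not Excel's implementation: the function name forecast_linear, the use of plain year numbers instead of Excel dates, and the sales figures are all assumptions made for the example.

```python
def forecast_linear(x, known_ys, known_xs):
    """Predict y at x from a least-squares line through (known_xs, known_ys)."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    # Written around the means to stay accurate when x values are large.
    return mean_y + slope * (x - mean_x)


# Hypothetical history: one sales figure per year.
years = [2000, 2001, 2002, 2003, 2004]
sales = [100, 112, 119, 131, 138]

# The history plays the role of the locked (absolute) ranges;
# only the target year changes from one forecast to the next.
forecast = {year: forecast_linear(year, sales, years) for year in (2005, 2006)}
print(forecast)  # {2005: 148.5, 2006: 158.0}
```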
1. Copy the last historical data value to the Forecast column. In this example, we copy the
value from B31 to C31. This will help us achieve the effect of a continuous uninterrupted line.
2. Select 3 columns of data: time series, historical data values and forecasted values.
3. On the Insert tab, in the Charts group, click the Insert Line or Area Chart icon and
choose the first chart type (2-D Line).
EXPERIMENT – 5
AIM: Trendline.
This example teaches you how to add a trendline to a chart in Excel.
1. Select the chart.
2. Click the + button on the right side of the chart, click the arrow next to Trendline and then
click More Options.
Result:
EXPERIMENT – 6
STEPS:
1) On the Data tab, in the Analysis group, click Data Analysis.
3) Click in the Variable 1 Range box and select the range B2:B12.
4) Click in the Variable 2 Range box and select the range C2:C12.
5) Click in the Hypothesized Mean Difference box and type 0 (H0: μ1 - μ2 = 0).
6) Click in the Output Range box and select cell E1.
7) Click OK.
Result:
Conclusion: We do a two-tail test (inequality). If t Stat < -t Critical two-tail or t Stat >
t Critical two-tail, we reject the null hypothesis. This is not the case: -2.086 < -1.272 <
2.086. Therefore, we do not reject the null hypothesis. The observed difference between the
sample means (15.18 - 20) is not convincing enough to say that the average scores in the st1
and st2 exams differ significantly.
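The decision rule in the conclusion can be written out as a short check. The sketch below assumes an equal-variance (pooled) two-sample t-test, which the text above does not state explicitly; the function names and the two tiny samples are hypothetical, and the critical value is passed in rather than computed.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_stat(sample1, sample2):
    """t statistic for H0: mean1 - mean2 = 0, assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def reject_null(t_stat, t_critical):
    """Two-tail rule: reject when t Stat falls outside [-t Critical, t Critical]."""
    return t_stat < -t_critical or t_stat > t_critical


# Values reported in this experiment: t Stat = -1.272, t Critical two-tail = 2.086.
print(reject_null(-1.272, 2.086))   # False, so the null hypothesis is not rejected

# Hypothetical samples, only to show the statistic being computed.
t_example = pooled_t_stat([1, 2, 3], [2, 3, 4])
print(t_example)
```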
EXPERIMENT – 7
BAR GRAPH
PIE CHART
AREA GRAPH
EXPERIMENT – 8
AIM: Histogram
1) First, enter the bin numbers (upper levels) in the range C3:C7.
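The role of the bin numbers can be illustrated with a small counting routine. In the sketch below each bin number is treated as an inclusive upper level, and values above the last bin go into a final "More" bucket; this mirrors how the bins are described here, but the function name, the five bin values and the data are assumptions made for the example.

```python
def histogram_counts(values, bins):
    """Count values per bin, where each bin number is an inclusive upper level.

    Returns one count per bin plus a final count for values above the last bin.
    """
    upper_levels = sorted(bins)
    counts = [0] * (len(upper_levels) + 1)
    for value in values:
        for index, upper in enumerate(upper_levels):
            if value <= upper:
                counts[index] += 1
                break
        else:
            counts[-1] += 1   # larger than every bin: the "More" bucket
    return counts


# Hypothetical bin numbers (five cells, like C3:C7) and data.
bins = [10, 20, 30, 40, 50]
data = [1, 5, 10, 11, 20, 25, 40, 41, 60]

counts = histogram_counts(data, bins)
print(counts)  # [3, 2, 1, 1, 1, 1]
```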