A Review On Data Analytics For Supply Chain Management
Malini M. Patil
Department of Information Science and Engineering
JSS Academy of Technical Education, Bengaluru-560060, India
Email: [email protected]
Abstract: The present study bridges the gap between two intersecting domains, data science and supply chain management. Supply chain data can be analyzed for inventory management, forecasting and prediction, with results delivered in the form of reports, queries and forecasts. Because of price, weather patterns, economic volatility and the complex nature of business, the forecasts may not be accurate. This has resulted in the growth of supply chain analytics, which is the application of qualitative and quantitative methods to solve relevant problems and to predict outcomes while considering the quality of data. Because of issues like increased collaboration between companies, customers, retailers and governmental organizations, companies are adopting Big Data solutions. Big Data applications can be linked to Supply Chain Management across fields like procurement, transportation, warehouse operations, marketing and smart logistics. As supply chain networks become vast, more complex and driven by demands for more exacting service levels, the type of data that is managed and analyzed also becomes more complex. The present work aims at providing an overview of the adoption of Data Analytics capabilities as part of a “next generation” architecture by developing a linear regression model on sales data. The paper also surveys how big data techniques can be used for storage, processing, management, interpretation and visualization of data in the field of supply chain.

Index Terms: Big Data, Supply Chain Management, Supply Chain Analytics, Supply Chain Network, regression analysis, Smart logistics, Big Data Management Systems.

I. INTRODUCTION
After the 1990s, with great changes in the operating rules of the world economy and market competition patterns, enterprises have identified the need for globalization of economic development. Companies need to rely on the integration of their own and external resources available in the market. This integration enables the working pattern of a system that gives quick response to the needs of the market and to the ad-hoc situations that arise in it. Such a system is referred to as Supply Chain Management (SCM) [1]. SCM is defined as managing the flow of information, material and resources across and within the network of upstream and downstream organizations [2]. A supply chain can be defined as a network of flows of products, financial deliverables, customer services, information, material and resources. Management of the multiple relationships in a supply chain is referred to as supply chain management. Factors like product success, customer satisfaction and growth of an organization depend on the successful execution of SCM [5].
The term Big Data was first used by two NASA researchers in 1997 to refer to the visualization challenge for systems with large data sets which are ubiquitous in nature [6]. Big data can be understood as data which is complex, large in volume and rapidly growing, with numerous, autonomous and independent data sources [33]. Data has increased on a large scale in various fields over the past couple of decades, because of which the term “Big Data” has been coined [20]. Big data has an impact in various domains: it helps in renovating the supply chain, managing customer fidelity in marketing, health, optimizing routes and reducing cost in transportation, reducing risk in finance etc. [7]. Deployment of a Big data management system for supply chain management will achieve greater benefits as the system becomes more agile.
Big Data can be defined as a large volume of data, both unstructured and structured. Due to the rise of social media, the Internet of Things and mobile devices, there is a massive increase in real time data generation [1]. As per the survey, more than 1200 exabytes of data are generated every year from different data sources [1]. Most of the data generated are not structured. The amount of unstructured data is approximately 80%, and these data are difficult to store, analyze and
The analysis of Big Data leads to insights that help in taking better decisions and strategic business moves, which is termed Analytics.
The complete process of extraction from Big Data consists of two parts: data management and data analytics. Data management comprises “processes and supporting technologies to gather data, store data, to prepare and retrieve it for analysis”. The techniques used to analyze and gather intelligence from Big data are referred to as Big Data Analytics (BDA). Analytics is defined as the collection and analysis of qualitative and quantitative data for decision making. BDA is the application of advanced analytic techniques to huge data sets. Gartner explains that only 15% of Fortune 500 companies will be able to make full use of big data for creating value and only 8% of them are currently using Big Data Analytics [1].
Big data analytics has become necessary for the supply chain management of any organization, yet many companies are struggling to unveil its business value [2]. A challenge for Big data analytics is to analyze the irregular patterns of data arriving alongside the huge data sets already present [26]. BDA is important in creating an integrated view of operational performance and customer satisfaction of both sender and recipient in the SCM [16]. It is challenging to fully address supply chain susceptibility because of the complexity in supply chain components, processes, suppliers and services [25]. Supply chain analytics has developed its own identity in supply chain management by using Business Intelligence tools for the analysis of customer behavior, optimization of upstream and downstream operations and insight into advanced routing solutions. Examples are component complexity, supplier complexity, process complexity and service complexity [3].
According to Waller and Fawcett (2013), research in the field of Big Data intersecting with SCM could illuminate a “great number of new opportunities” for both academia and practitioners [2]. They also pointed out that very little literature on predictive analytics, data science and Big Data is available in the field of SCM. SCM and logistics are not new ideas. Logistics can be defined as the process of managing the procurement, flow of products and storage of materials, parts and finished inventory to maximize profit through the cost effective fulfillment of orders [10]. SCM is a more extensive field than logistics. The concept of logistics evolved as a subfield of supply chain management whose main vision is creating and presenting a single plan for the flow of materials and information. In 1980, logistics was defined as fulfilling the care, transportation, loading and unloading, packing and processing of commodities between the manufacturer and the consumer, along with various other functions [28]. SCM extends the logistics framework by implementing connectivity and coordination between entities like suppliers and customers [10].
The professionalism, skills, flexibility, reliability, attitude, behavior, reputation and integrity of human resources in logistics companies are especially important from the client’s point of view [23]. Fig.1 shows the flow of supply chain management. It explains the various activities involved in producing a product or service, from planning through delivery of materials to the end customer. Strictly speaking, SCM is not a chain of processes; instead it is a network of multiple businesses and relationships [8].
II. BIG DATA
The name Big Data was first used by two NASA researchers in 1997 to describe the challenge of visualizing large data sets. Thereafter, researchers and specialists in the field of Information Management have gradually paid attention to Big Data. The Big Data process is classified into 7 phases as shown in Fig. 2. The Data Discovery phase includes accessing the resources needed for solving the Big data analytics problem and becoming familiar with the data [18]. In the data preparation stage, ELTL (Extract, Load, Transform, Load) operations are applied to the required data. Huge volumes of data from different sources cause a high probability of errors [30]. So the data needs to be transformed, cleaned and audited before it is loaded into data warehouses [30]. Technologies can be used if required in this phase. In the subsequent phases, especially in the next phase, the project team has to decide on the methods, workflows and
techniques required for analyzing the data, along with evaluation and interpretation of the data. In the model execution phase, the model chosen in the previous phase is executed with the appropriate data sets available. Once the results are available, they are communicated and, if possible, optimized in the remaining phases [18].
Big Data has a positive impact in different domains: it helps in reconditioning the supply chain, increasing sales and marketing, real route optimization etc. [7]. The Big data analytics process has been explained from the perspective of supply chain data [18]. The following section provides the taxonomy of ten main attributes of Big data applications in supply chain management.
1. Volume:
Volume refers to the huge amount of data generated from emails, Twitter, photos and videos every second. In SCM, volume relates to the data generated from the use of sensors, bar codes, ERP, transport management systems and database technologies. Previously volume was measured in gigabytes; it is now measured in zettabytes (ZB) or even yottabytes (YB). There are different forms and ways of storing the Big data generated by the supply chain industry [20]. One is the relational database management system (RDBMS), a structured model employed to see, analyze, manufacture and store the huge amount of supply chain management data. Also, the data clusters in Big data storage include components like:
Direct attached storage (DAS): includes different types of hard disks/hard drives which are attached to the DBMS
Network storage (NS): comes in two forms, Network attached storage (NAS) and Storage Area Network (SAN) [20]
Fig.2. Big Data Process.
2. Velocity:
It mainly refers to the speed at which data is collected, analyzed and transferred. It affects efficiency as well as the decision making models and algorithms in the field of SCM.
3. Variety:
It refers to the different forms of data: structured, unstructured or semi-structured [35]. It also includes different types of data, from XML to video to SMS. The variety of data in the field of SCM includes data from diverse sources like retailers, distributors, suppliers, inventory, sales, consumers etc. [9]. The Big Data collection process in SCM includes two kinds of sources: upstream and downstream. Data from upstream sources includes the supplier’s side, through the intermediate stream or warehouse side. Data from downstream sources includes the logistics, distribution or retailer side [20].
4. Veracity:
Correctness or trustworthiness of data is referred to as veracity. In SCM it concerns the quality of data, compliance issues etc.
5. Value:
It refers to the monetary worth of data. It is challenging to monitor the value of reports, statistics, impacts on the insights etc.
6. Variability:
Variability in big data refers to data which is not consistent or is liable to vary or change. In the supply chain, variability appears in information sharing, integration, quality control, unexpected delays in the supply process etc.
7. Visualization:
Analyzing the data graphically is termed visualization. Visualization is more effective in conveying meaning than spreadsheets and reports or numbers and formulas. Supply chain data can be visualized using ERP, custom developed reports or graphical methods.
8. Virality:
It measures the speed of data movement from one network to another. From the supply chain management view, it is essential for carrying out the logistics process.
9. Viscosity:
It mainly refers to data latency, that is, the delay in data. It can be understood as an element of velocity.
10. Volatility:
It refers to how long the data is valid and how long it should be stored. It is mainly associated with old and new data.
RESEARCH FINDING 1: From the literature survey it is found that supply chain management is a big network of multiple business strategies and relationships. In the Big data ecosystem, where data is completely ubiquitous, it is challenging to justify the Big data dimensions (all V’s). For example, the Big data dimension volume relates to the data generated from transport management systems, enterprise resource planning and many more. A similar explanation can be found for another dimension of Big data, Variety, which is most often referenced under the data collection process in SCM. Other issues related to Big data dimensions are quality of data, information sharing, development of customized reports, logistics process,
validation of data and others, which make it really challenging to achieve completeness across all dimensions of Big data for SCM.
CHAIN MANAGEMENT
Although the largest growth in the field of data analytics has been experienced in customer insight, analytics has many applications across the end to end supply chain. As the acquisition and transportation cost per entry is driven to a minimum, corrupted measurements and errors are inevitably found in large scale data [29]. Since many of the data sources continuously generate data in real time, analytics must often be performed in real time as well [29]. Applications of advanced analytic techniques to supply chain management have been described in [14]. Supply chain data analytics has been classified into three types: Descriptive, Predictive and Prescriptive analytics.
A. Descriptive Analytics:
Descriptive Analytics (DA) is mainly used to analyze “what is happening” now and to answer the question of “what happened” in the past. This is the first level of analytics, and 90% of organizations apply this strategy for the betterment of the future. DA identifies the historical data and analyzes the pattern. Descriptive Analytics mainly aims at identifying the problems and opportunities in the field of SCM within the existing processes and functions [17].
Descriptive Analytics uses techniques like:
Data Modeling
Regression Analysis
Visualization
OLAP (online analytical processing) operations like drill down, up and across to identify the areas.
Big data tools for supply chain analytics are summarized in Table 1. OLAP operations for the supply chain may include shipments, products, logistics, customers, suppliers and other dimensions like rates and cost. The applications of Descriptive analytics provide managers with real time data regarding the quantities and locations of goods in the supply chain.
B. Predictive Analytics:
Predictive analytics (PA) uses both quantitative and qualitative methods to analyze real time and historical data to estimate the past and future levels of integration of business processes among functions or companies, as well as the associated costs and service levels [9]. Predictive analytics aims at projecting what will happen in the future and why it may happen [17]. PA includes algorithms/techniques such as [2]:
1. Time series methods & advanced forecasting. These methods are used for predicting sales in SCM.
2. Statistical algorithms such as Discriminant Analysis, k-NN, Naive Bayes (NB) and Bayes
3. Decision trees, CART and Random Forests, which use a hierarchical sequential structure
4. Clustering algorithms, used to group homogeneous elements in a data set.
5. Frequent pattern mining algorithms
Predictive analytics mainly focuses on forecasting at the strategic, tactical and operational levels, which is based on the planning process in terms of network design, production planning, inventory management and capacity planning [14]. Predictive analytics uses mathematical algorithms and programming in order to predict the patterns within data.
To understand Descriptive and Predictive Analytics, an experiment is performed on sales data, which is an available benchmark dataset. The results are found to be interesting. A predictive model is developed based on regression analysis. A detailed explanation is provided in section VI.
C. Prescriptive Analytics:
DA and PA are focused on what will happen and when, whereas Prescriptive Analytics addresses “why it has happened”. It collects data continuously to re-predict events, which enables decision makers to increase prediction accuracy for taking better decisions. Prescriptive analytics explains the reasons behind certain events. It is mainly associated with simulation and optimization [2]. The aim of Prescriptive analytics is to improve business performance [17]. Three classes of algorithms used under this analytics method are:
Decision trees
Fuzzy Rule-Based System
Switching Neural Networks (Logic Learning Machine)
Prescriptive analytics is focused on optimization using mathematical and simulation techniques in order to provide decision support tools built on descriptive and predictive methods.
RESEARCH FINDING 2: The Big data tools available for supply chain analytics are mainly used for data exploration, integration of data, statistical analysis, proper visualization methods and understanding the data warehouse system. A few of them are R-prog, Informatica and PASW. The main observation is about LINGO and DSM, which are mainly used for documentation purposes based on a customer support system. Pentaho is used to handle structured and unstructured volumes of data. To
summarize, the integration of all tools is an important task in SCA for developing a Decision support system.

Table 1. Big data tools for Supply Chain Analytics
Name of the Tool | Description
LINGO | Optimization tool for linear, nonlinear and mathematical programming, introduced by John H Thomson in 1989. This tool can be used for easy model expressions and convenient data options, and it also helps with documentation.
DSM | Drop shipping management tool which is mainly used to increase arbitrage opportunities. It is one of the best tools in the field of drop shipping arbitrage. It can be used for handling customer support based on the sales history, ticketing system and statistical reports.

functions are forecasting, inventory management, transport management and also human resources. Big data can address supply chain issues like timely response, time delivery, real time planning, supplier and customer relationship management etc. [15]. Operating an effective supply chain involves a continuous flow of information, which in turn helps to create better material flow [8]. The main focus in the supply chain is the customer. So achieving a good customer focused system is one of the aims of SCM [8]. The key supply chain processes/challenges are listed and shown in Fig. 3.
better utilization of resources, social and environmental responsibility etc. Introducing or adopting green logistics is a complex process which requires cross disciplinary coordination and also changes in the current operation process [19]. This can also be achieved by introducing new practices in the area of supply and distribution that link them to other participants like suppliers and customers in the value chain. This link must be supported by management staff, their characteristics and also by human resources [19].
C. Route Optimization
Route optimization is a very important factor in efficiently controlling the physical flow of the supply chain. Some of the challenges are optimization of a single route trucking trip, allocation of appropriate resources per trip and cost reduction. By achieving these goals, the benefit from route optimization is route efficiency.
D. Space Optimization
The key parameters in space optimization are maximizing space utilization, improving productivity and minimizing cost. The benefit from space optimization is better utilization of space.
E. Last mile delivery problem
The main challenge in this problem is delivering thousands of packages to customers in an efficient way. Another challenge here is time bound delivery of goods. The benefit of overcoming this problem is customer satisfaction. BDA helps with the last mile delivery problem by increasing the level of operational efficiency [15].
F. Redelivery Consignments
Some of the parameters in redelivery consignments are proper packaging and handling and efficient transportation, which reduce the redelivery of products. The benefit of achieving the above parameters is monetary efficiency through minimizing redelivery.
G. Custom clearance time
The main parameter in custom clearance time is maintaining proper documentation which represents the client at the time of customs examination and assessment. The benefit of achieving the given parameter is avoiding detention charges.
H. Track and Trace
Some of the parameters under track and trace are “near-real-time” tracking and status & position information [11]. The advantage of achieving this parameter is performance improvement in track & trace.
RESEARCH FINDING 3: The literature survey reveals that the Big data challenges in logistics management of the supply chain system mainly relate to the stakeholders, that is customers, and the key business functions. It is found that network optimization, route optimization and space optimization form the basis for proper management of physical space, the physical flow of the supply chain and a proper strategy for distribution of products to customers. Minimization of cost, time and space are the main parameters. The above challenges also reflect the development of customer end activities such as efficient delivery of products, packaging and handling and documentation processes of related activities. A complete model can be developed by taking real world data and establishing that the above challenges are met using a Big data approach.
Logistics is a part of supply chain management. Logistics can be defined as the process of managing the procurement, storage and movement of goods along with the related information flow to maximize the profit of the organization through cost effective fulfillment of orders. Logistics has been identified as a core element of supply chain management [21]. The aim of logistics is to serve customers in a cost effective way. Sustainability in logistics can be defined as a cultural issue based on the demonstration of many companies and organizations. It can be a trend setting the business model or setting up new market opportunities and also preparing for future scenarios [22]. The term intelligent or smart logistics can be defined as different logistics operations which are planned, managed and controlled in a smarter way compared to conventional solutions [21]. Besides planning, managing and controlling the objects and resources of logistics, the aggregation and processing of the collected data is also an important task of smart logistics [27]. Some of the approaches to improve logistics by making it more intelligent are as follows.
A. Autonomous Logistics
It describes the ability of logistics objects to process the information, to provide and to execute their own
B. Product intelligence
The way of storing and transporting any physical order or product instance in an efficient manner.
C. Intelligent transport systems
It mainly refers to innovative services related to transport and traffic management. These enable users to be better informed and to make safer and smarter use of the transport network.
D. Physical Internet
Physical internet suggests exploiting the digital internet metaphor in order to develop a physical internet towards meeting the global logistics sustainability challenge.
E. Intelligent cargo
Capabilities under Intelligent Cargo are self-identification, context detection, access to services, status monitoring and registering.
F. Self-organizing logistics
36 A Review on Data Analytics for Supply Chain Management: A Case study
comparison of means (t-tests and one-way ANOVA); linear regression, logistic regression and many more.
C. Scatter plot:
In order to determine whether a linear relationship exists between the variables (dependent and independent), it is suggested to run a scatter plot on the given data set. If the graph shows no linear relationship, then there is no need for linear regression. From Fig. 4, it is found that the points on the graph are linear. This indicates that a linear relationship exists between the variables and simple regression can be applied. The scatter plot obtained is shown in Fig. 4.
found from the table that the significance of ANOVA for sales_quan_Feb is found to be 0.
The regression coefficients for the predictor (Expected_sales) depict the difference in response per unit difference in the predictor. They are tabulated in Table 3(c).
From the coefficients in Table 3(c), the regression equation is shown below.
DV = 23411.25 + 0.02 * IV (3)
Where
DV = Dependent variable
IV = Independent variable
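
As a rough illustration of equation (3) and of the linearity check described under the scatter plot, the sketch below evaluates the fitted line, computes a Pearson correlation as a numeric counterpart to inspecting the plot, and recovers an intercept and slope by ordinary least squares. Only the coefficients 23411.25 and 0.02 come from equation (3). The function names and the input values are hypothetical and are not the sales data used in the paper.

```python
from math import sqrt
from statistics import mean

# Coefficients reported in equation (3) of the text.
INTERCEPT = 23411.25
SLOPE = 0.02


def predict_dv(iv, intercept=INTERCEPT, slope=SLOPE):
    """Evaluate DV = intercept + slope * IV (helper name is hypothetical)."""
    return intercept + slope * iv


def pearson_r(xs, ys):
    """Pearson correlation; values near +1 or -1 suggest a linear relationship."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_simple_regression(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    # Made-up IV values, NOT the paper's data; the DV values are generated
    # from equation (3), so the points lie exactly on the fitted line.
    iv_values = [1000.0, 2000.0, 3000.0, 4000.0]
    dv_values = [predict_dv(iv) for iv in iv_values]

    print("correlation:", pearson_r(iv_values, dv_values))
    print("fitted (intercept, slope):", fit_simple_regression(iv_values, dv_values))
    print("predicted DV for IV = 10000:", predict_dv(10000.0))
```

With real sales data the points would scatter around the line, so the correlation would fall below 1 and the fitted coefficients would be estimates rather than exact values.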
