Path Breaking Case Studies in E-Commerce Using Data Mining: Rupesh Sanchati, P.C. Patidar, Gaurav Kulkarni
Volume 1, Issue 1
International Journal of Computer Technology and Electronics Engineering (IJCTEE)
automated. Also, because the mined models will fit well with the existing system, computing return on investment can be much easier.

The lowering of several significant hurdles to the applicability of data mining will allow many more companies to implement intelligent systems for e-commerce. However, there is an even more compelling reason why it will succeed. As implied above, the volume of data collected by systems for e-commerce dwarfs prior collections of commerce data. Manual analysis will be impossible, and even traditional semi-automated analyses will become unwieldy. Data mining will soon become essential for understanding customers.

The lessons described in this paper are based on case studies and an extensive study of contemporary literature. While lessons can be drawn on both the business implementation and the technical fronts, in this paper we attempt to summarize our inferences on the business front.

B. Integrating E-Commerce and Data Mining: Architecture

In this section we give a high-level overview of an architecture for an e-commerce system with integrated data mining. In this architecture there are three main components: Business Data Definition, Customer Interaction, and Analysis. Connecting these components are three data transfer bridges: Stage Data, Build Data Warehouse, and Deploy Results. The relationship between the components and the data transfer bridges is illustrated in Figure 1.

Figure 1: Architecture of Integrated Data Mining with E-Commerce

In the Business Data Definition component the e-commerce business user defines the data and metadata associated with their business. This data includes merchandising information (e.g., products, assortments, and price lists), content information (e.g., web page templates, articles, images, and multimedia) and business rules (e.g., personalized content rules, promotion rules, and rules for cross-sells and up-sells). From a data mining perspective the key to the Business Data Definition component is the ability to define a rich set of attributes (metadata) for any type of data.

The Customer Interaction component provides the interface between customers and the e-commerce business. This interaction could take place through a web site (e.g., a marketing site or a web store), customer service (via telephony or email), wireless application, or even a bricks-and-mortar point of sale system. For effective analysis of all of these data sources, a data collector needs to be an integrated part of the Customer Interaction component. To provide maximum utility, the data collector should not only log sale transactions, but it should also log other types of customer interactions, such as web page views for a web site.

The Analysis component provides an integrated environment for decision support utilizing data transformations, reporting, data mining algorithms, visualization, and OLAP tools. The richness of the available metadata gives the Analysis component significant advantages over horizontal decision support tools, in both power and ease-of-use.

The Stage Data bridge connects the Business Data Definition component to the Customer Interaction component. This bridge transfers (or stages) the data and metadata into the Customer Interaction component. Having a staging process has several advantages, including the ability to test changes before having them implemented in production, allowing for changes in the data formats and replication between the two components for efficiency, and enabling e-commerce businesses to have zero down-time.

The Build Data Warehouse bridge links the Customer Interaction component with the Analysis component. This bridge transfers the data collected within the Customer Interaction component to the Analysis component and builds a data warehouse for analysis purposes. The Build Data Warehouse bridge also transfers all of the business data defined within the Business Data Definition component (which was transferred to the Customer Interaction component using the Stage Data bridge).

The last bridge, Deploy Results, is the key to closing the loop and making analytical results actionable. It provides the ability to transfer models, scores, results and new attributes constructed using data transformations back into the Business Data Definition and Customer Interaction components for use in business rules for personalization.

III. PROPOSED FRAMEWORK

After analyzing various retail e-commerce sites, we propose some analyses that would be useful in practice. In each of the following subsections we describe the lessons learned from path breaking case studies.

Case 1: Bot Analysis

Web robots, spiders, crawlers, and aggregators, which we collectively call bots, are automated programs that create traffic to websites. Bots include search engines, such as Google, web monitoring software, such as Keynote and Gomez, and shopping comparison agents, such as mySimon. Because such bots crawl sites and may bring in additional
human traffic through referrals, it is not a good idea for websites to block them from accessing the site. In addition to these good bots, there are e-mail harvesters, which look for e-mail addresses that are then sold as e-mail lists, offline browsers (e.g., Internet Explorer has such an option), and many experimental bots by students and companies trying out new ideas.

Observations:
1. Bots account for 5 to 40% of sessions. Due to the volume and type of traffic that they generate, bots can dramatically skew site statistics.
2. Even when the human traffic is fluctuating substantially, the bot traffic remains the same.
3. After registering with search engines, the external bot traffic increases substantially, as expected.

Lesson:
1. Accurately identifying bots and eliminating them before performing any type of analysis on the website is critical.
2. A traffic increase immediately after registering with search engines should not be over-interpreted, because a substantial part of it might be bot traffic.
3. Many commercial web analytic packages include basic bot detection through a list of known bots, identified by their user agent or IP. However, such lists must be updated regularly to keep track of new, evolving, and mutating bots.

Case 2: Session Timeout Threshold

Figure 2: Setting a suitable session timeout threshold.

Observations:
1. If the session timeout threshold were set to 25 minutes, then for client A, 7% of all sessions would experience timeout and 8.25% of sessions with active shopping carts would lose their carts as a result. However, for client B, the numbers are 3.5% and 5% respectively.
2. Several user sessions were experiencing a timeout as a result of a low timeout threshold and lost their active
shopping cart.

Lesson:
1. The software should save the shopping cart automatically at timeout and restore it when the visitor returns.
2. Clients must determine the timeout threshold only after careful analysis of their own data.
3. Setting the session timeout threshold high means that fewer users experience a timeout, which improves the user experience.
4. However, a high threshold also means that a larger number of sessions must be kept active (in memory) at the website, resulting in a higher load on the website's system resources.
5. Setting an appropriate session timeout threshold therefore involves a trade-off between website memory utilization (which may impact performance) and user experience, and the two must be balanced.
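
The effect of a candidate threshold can be estimated from a site's own logs before it is changed. The following is a minimal sketch, assuming each visit is available as a sorted list of request times in minutes. The function names and the sample visits are hypothetical, and memory load is not modeled.

```python
from typing import List, Sequence


def max_gap(request_minutes: Sequence[float]) -> float:
    """Largest idle gap (in minutes) between consecutive requests of one visit."""
    gaps = [b - a for a, b in zip(request_minutes, request_minutes[1:])]
    return max(gaps, default=0.0)


def timeout_rate(visits: List[Sequence[float]], threshold: float) -> float:
    """Fraction of visits with at least one idle gap longer than the threshold."""
    if not visits:
        return 0.0
    timed_out = sum(1 for v in visits if max_gap(v) > threshold)
    return timed_out / len(visits)


# Hypothetical data: request times in minutes since the start of each visit.
visits = [
    [0, 2, 5, 9],         # longest idle gap: 4 minutes
    [0, 12, 40],          # longest idle gap: 28 minutes
    [0, 1, 3, 35, 36],    # longest idle gap: 32 minutes
    [0, 20, 26],          # longest idle gap: 20 minutes
]

# Sweep a few candidate thresholds (in minutes).
rates = {t: timeout_rate(visits, t) for t in (10, 25, 30, 60)}
for threshold, rate in rates.items():
    print(f"threshold={threshold:>2} min -> {rate:.0%} of visits time out")
```

With these made-up visits, a 25-minute threshold times out half of the visits and a 60-minute threshold times out none, at the cost of keeping idle sessions in memory for longer.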
Case 3: Simpson's Paradox

On a few occasions it becomes difficult to present insights that are seemingly counter-intuitive. For instance, when analyzing a client's data we came across an example of Simpson's paradox (Simpson, 1951). Simpson's paradox occurs when the correlation between two variables is reversed when a third variable is controlled.

We were comparing customers with at least two purchases and looking at their channel preferences, i.e., where they made purchases. Do people who shop from the web only spend more on average than people who shop from more than one channel, such as the web and physical retail stores?

Observations:
1. The line chart in Figure 3 shows that for each group of shoppers who shopped once, twice, three times, four times, five times, and more than five times respectively, the average spending per customer on the web-only channel is more than the average spending per customer on multiple channels.
2. However, the bar chart in Figure 3 shows that the average spending per customer for multi-channel customers is more than that of the web-only channel.

Figure 3: Average yearly spending per customer for multi-channel and web-only purchasers by number of purchases (left), and average yearly spending per customer for multi-channel and web-only purchasers (right).

Lesson:
1. Explain counter-intuitive insights. The reversal of the trend in the above case happens because a weighted average is being computed, and the number of customers who shopped more than five times on the web is much smaller than the number of customers who shopped more than five times across multiple channels. Such insights must be explained to business users.

Figure 4: Clarification of Simpson's paradox

Case 4: Search Effectiveness Analysis

Significant time and effort is spent in designing forms that are aesthetically pleasing. The eventual use of the collected form data for the purpose of data mining must also be kept in mind when designing forms.

Observation:
1. On the basis of average sales per visit, customers who search are worth two times as much as customers who do not search.
2. Failed searches hurt sales severely.
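
The comparison behind these observations is an average of sales per visit, computed separately for each group of visits. The following is a minimal sketch, assuming each visit record carries a search flag, a failed-search flag, and the sales attributed to the visit. The `Visit` type, its fields, and the sample values are hypothetical and are not the data behind the observations.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Visit:
    searched: bool        # visitor used site search at least once
    failed_search: bool   # at least one search returned no results
    sales: float          # revenue attributed to the visit


def avg_sales(visits: Iterable[Visit]) -> float:
    """Average sales per visit; 0.0 for an empty group."""
    group = list(visits)
    return sum(v.sales for v in group) / len(group) if group else 0.0


# Invented sample visits, for illustration only.
visits = [
    Visit(searched=True, failed_search=False, sales=40.0),
    Visit(searched=True, failed_search=False, sales=20.0),
    Visit(searched=True, failed_search=True, sales=0.0),
    Visit(searched=False, failed_search=False, sales=10.0),
    Visit(searched=False, failed_search=False, sales=0.0),
    Visit(searched=False, failed_search=False, sales=20.0),
]

search_avg = avg_sales(v for v in visits if v.searched)
no_search_avg = avg_sales(v for v in visits if not v.searched)
failed_avg = avg_sales(v for v in visits if v.failed_search)

print(f"search: {search_avg:.2f}  no search: {no_search_avg:.2f}  failed search: {failed_avg:.2f}")
```

Reporting the failed-search group separately makes it visible when failed searches drag down the otherwise higher value of visits that include a search.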
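
Returning to the Simpson's paradox lesson of Case 3, the weighted-average reversal can be reproduced with a few lines of arithmetic. The following is a minimal sketch. The customer counts and spending figures are invented purely for illustration and are not the client's data.

```python
# Each entry maps a purchase-frequency group to
# (number of customers, average yearly spending per customer).
# All numbers are hypothetical.
web_only = {"2 purchases": (900, 100.0), "more than 5": (100, 1000.0)}
multi_channel = {"2 purchases": (100, 90.0), "more than 5": (900, 900.0)}


def overall_average(groups):
    """Customer-weighted average spending across all groups."""
    total_customers = sum(n for n, _ in groups.values())
    total_spending = sum(n * avg for n, avg in groups.values())
    return total_spending / total_customers


# Within every group, web-only customers spend more on average...
web_wins_each_group = all(
    web_only[g][1] > multi_channel[g][1] for g in web_only
)

# ...yet the overall averages point the other way, because most web-only
# customers sit in the low-spending group and most multi-channel customers
# sit in the high-spending group.
web_overall = overall_average(web_only)
multi_overall = overall_average(multi_channel)

print(web_wins_each_group, web_overall, multi_overall)
```

In this made-up example the web-only channel is ahead in both groups, but the overall multi-channel average is higher because its customers are concentrated in the frequent-purchase group.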
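
Returning to Case 1, the list-based bot detection mentioned in its lessons can be sketched as a filter that runs before any analysis. This is a minimal sketch, assuming sessions are dictionaries with `ip` and `user_agent` keys. The signature lists, field names, and sample sessions are hypothetical, and, as the lesson notes, real lists need regular updates.

```python
from typing import Dict, List, Tuple

# Hypothetical signature lists; a real deployment would load and refresh them.
KNOWN_BOT_AGENT_TOKENS = ("googlebot", "crawler", "spider")
KNOWN_BOT_IPS = {"192.0.2.10"}  # placeholder address from a documentation range


def is_known_bot(session: Dict[str, str]) -> bool:
    """True if the session matches a listed IP or a listed user-agent token."""
    if session.get("ip") in KNOWN_BOT_IPS:
        return True
    agent = session.get("user_agent", "").lower()
    return any(token in agent for token in KNOWN_BOT_AGENT_TOKENS)


def split_sessions(
    sessions: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Separate sessions into (human, bot) lists using the known-bot lists."""
    humans = [s for s in sessions if not is_known_bot(s)]
    bots = [s for s in sessions if is_known_bot(s)]
    return humans, bots


# Invented sample sessions, for illustration only.
sessions = [
    {"ip": "198.51.100.7", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    {"ip": "198.51.100.8", "user_agent": "Googlebot/2.1"},
    {"ip": "192.0.2.10", "user_agent": "Mozilla/5.0"},
    {"ip": "198.51.100.9", "user_agent": "ExampleSpider/0.1"},
]

humans, bots = split_sessions(sessions)
bot_share = len(bots) / len(sessions)
print(f"{len(humans)} human session(s), {len(bots)} bot session(s), bot share {bot_share:.0%}")
```

A list-based filter only catches bots that are already known, so sessions it passes may still include unlisted bots.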