Big Data Analytics For Dummies
by Barry Schoenborn
These materials are © 2014 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Big Data Analytics Infrastructure For Dummies,® IBM Limited Edition
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2014 by John Wiley & Sons, Inc.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as
permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written
permission of the Publisher. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011,
fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, The Dummies Way, Dummies.com,
Making Everything Easier, and related trade dress are trademarks or registered trademarks of John
Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used
without written permission. IBM and the IBM logo are registered trademarks of International
Business Machines Corporation. All other trademarks are the property of their respective owners.
John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.
For general information on our other products and services, or how to create a custom For Dummies
book for your business or organization, please contact our Business Development Department in the
U.S. at 877-409-4177, contact [email protected], or visit www.wiley.com/go/custompub.
For information about licensing the For Dummies brand for products or services, contact
BrandedRights&[email protected].
ISBN: 978-1-118-92136-4 (pbk); ISBN: 978-1-118-92311-5 (ebk)
Manufactured in the United States of America
Publisher’s Acknowledgments
We’re proud of this book and of the people who worked on it. For details on how to create a
custom For Dummies book for your business or organization, contact [email protected]
or visit www.wiley.com/go/custompub. For details on licensing the For Dummies brand
for products or services, contact BrandedRights&[email protected].
Some of the people who helped bring this book to market include the following:
Project Editor: Carrie A. Johnson
Acquisitions Editor: Connie Santisteban
Editorial Manager: Rev Mengle
Business Development Representative: Sue Blessing
Custom Publishing Project Specialist: Michael Sullivan
Production Coordinator: Melissa Cossell
Special Help: Rick Perret and Herbert Schultz
Table of Contents
Introduction
About This Book
Icons Used in This Book
Beyond the Book
IBM Storage
Looking at storage basics
IBM FlashSystem
IBM System Software
IBM Elastic Storage
Platform Computing
With Big Data and Analytics (BD&A), you get a new approach
to gathering, processing, and understanding information. You
may hear the term a lot, but many people still don’t fully know
what it is. Very broadly, BD&A is a combined hardware/software
architecture that gathers vast quantities of disparate data for
fast analysis. The analysis produces meaningful information. The
result is that you can make faster and better-informed business
decisions. The benefit is competitive advantage and (hopefully)
increased profitability for your operation.
Beyond the Book
You can find additional information about BD&A (and about
IBM’s approach to it) by visiting the following websites:
Chapter 1: Getting to Know Big Data and Analytics
Volume
The first attribute of Big Data is volume. Big Data projects tend
to imply terabytes to petabytes of information. However, some
smaller industries and organizations are likely to deal with
mere gigabytes or terabytes of data.
Velocity
The second attribute of Big Data is velocity: the speed at
which information arrives, is analyzed, and is delivered. The
velocity of data moving through an organization’s systems
ranges from batch integration and loading of data at
predetermined intervals to real-time streaming of data. The
former is typical of traditional data warehousing. The latter
belongs to the world of technologies such as complex event
processing (CEP), rules engines, text analytics, inferencing,
and machine learning.
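As a rough illustration of the two ends of the velocity range, the
following Python sketch processes the same readings once as a batch
(everything loaded first, then analyzed) and once as a stream (one
event at a time, with a running result). The function names and
sample values are hypothetical and are not taken from any IBM product.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch style: wait until every reading is loaded, then analyze once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the answer as each reading arrives."""
    total = 0.0
    count = 0
    for value in readings:
        total += value
        count += 1
        yield total / count  # an up-to-date result after every event


# Hypothetical sample data for illustration only.
sample = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_average(sample)              # one answer, at the end
stream_results = list(streaming_average(sample))  # one answer per event
```

The batch version gives a single answer after all the data is in,
while the streaming version always has a current answer available,
which is the trade-off the velocity attribute describes.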
Variety
The third attribute of Big Data is variety. In the past, enterprises
had only to deal with a manageable number of data sources.
Times have changed. Today’s business environment includes
not only more data but also more types of data than ever before.
Disparate data is data from a variety of data sources and in a
variety of formats, and is a major challenge that business analytics
and Big Data projects must contend with.
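One common way to cope with disparate data is to convert each source
format into a single shared record shape before analysis. The Python
sketch below does this for a CSV snippet and a JSON snippet using only
the standard library. The field names and sample values are made up
for illustration.

```python
import csv
import io
import json
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical sample inputs in two different formats.
CSV_SOURCE = "customer,amount\nalice,120\nbob,75\n"
JSON_SOURCE = '[{"name": "carol", "total": 200}]'


def from_csv(text: str) -> List[Record]:
    """Map CSV columns onto the shared record shape."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"customer": r["customer"], "amount": r["amount"]} for r in rows]


def from_json(text: str) -> List[Record]:
    """Map differently named JSON fields onto the same shape."""
    return [
        {"customer": item["name"], "amount": str(item["total"])}
        for item in json.loads(text)
    ]


# Once both sources share a shape, they can be analyzed together.
records = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)
total_amount = sum(int(r["amount"]) for r in records)
```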
Understanding the Business Need for BD&A
An excellent technique for beating the competition and
prospering is reducing the time gap between awareness of a
condition (the trigger event) and action on it. This gap is called
time to insight. Table 1-1 is a simple comparison of traditional
time to insight versus BD&A time to insight.
Table 1-1  Traditional versus BD&A Time to Insight

Timeline            Days (without BD&A)   Days (with BD&A)
Event               0                     0
Time to get data    14                    1
Time to analyze     7                     1
Time to insight     21                    2
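
The arithmetic behind Table 1-1 can be sketched in a few lines of
Python. Time to insight is taken here as the time to get the data plus
the time to analyze it, which matches the totals in the table. The
function and variable names are illustrative.

```python
def time_to_insight(days_to_get_data: int, days_to_analyze: int) -> int:
    """Days from the trigger event until insight is available."""
    return days_to_get_data + days_to_analyze


# Figures taken from Table 1-1.
without_bda = time_to_insight(days_to_get_data=14, days_to_analyze=7)
with_bda = time_to_insight(days_to_get_data=1, days_to_analyze=1)

days_saved = without_bda - with_bda
speedup = without_bda / with_bda

print(f"Without BD&A: {without_bda} days")
print(f"With BD&A:    {with_bda} days")
print(f"Days saved:   {days_saved} ({speedup:.1f}x faster)")
```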
Competitive advantage
Competitive advantage occurs when an enterprise acquires or
develops attributes or processes that allow it to outperform its
competitors. It’s easy to define but harder to achieve. Different
executives have their favorite sources of competitive advantage.
Broadly, they include one or more of the following:
Enhanced ROI
A universal business goal is increasing ROI — also known as
“getting more bang for the buck.” There are many ways to
measure ROI, including
In the past, brand loyalty could get you through, but that’s not
true anymore. That’s because your customer base’s choices
have widened, largely through the Internet.
You can use BD&A to learn what customers want and (therefore)
help them get it. You have to deliver on customer wants and
needs. Customer experience improves when you know what
attracts customers (products, interactions, purchasing,
warranties, “rewards,” and so forth).
But first, here’s a difficult reality: It’s a certainty that your
current systems can’t keep pace with the growth of data and
neither can your IT budget. The volume of data grows fast, and
the parts needed to manage it grow increasingly complex.
Promoting success
The key items needed for success are managing risk, promoting
agility, acting strategically, and practicing forward thinking.
These are covered in this section.
Master plan
Build against a master plan. Plan for all types of data and
all types of analytics, with an efficient IT infrastructure as a
given. Collaborate. Cultivate new partnerships and roles. IT
and line-of-business (LoB) executives must work together to
understand and develop the master plan.
Infrastructure plan
Build an infrastructure designed to enable new levels of insights
derived from exploiting all relevant data. The platform should
be fluent in all forms of data and analytics: transactional data,
Hadoop data, and so forth.
New approach
Take a new approach in thinking about the infrastructure for
BD&A:
Chapter 2: Looking at Infrastructure for Big Data and Analytics
Even if that were true, the executive must stay constantly alert
to the possibilities of increasing revenue, reducing expenses,
and competing more effectively. So chances are that the
infrastructure that meets business goals today won’t meet them
tomorrow.
Needs analysis
An old IT saying is that “it proceeds from need.” After the
customer declares her business goals, mathematical tools
address the infrastructure that meets those goals. For
example:
Introducing the Infrastructure Components of BD&A
This section addresses four core infrastructure capabilities
critical for optimal BD&A infrastructure: scalability, parallel
processing, low-latency resources, and data optimization.
Scalability of infrastructure
Your BD&A needs will change, most likely because they grow as
your enterprise grows. To accommodate this growth, the
infrastructure must feature scalability and resilience.
Parallel processing
Parallelism exists in processor design as well as in system-level
software. Of course, you need an intelligently designed traffic
cop that understands the workloads so data and processing
instructions can be threaded efficiently through the hardware
layer, from processor to memory to storage. Greater core and
thread densities can help boost data and processing performance
as the number of users grows.
The same can be said for system-level software and for Big
Data in particular. File system parallelism is coming of age. Big
Data may need to be accessed in multiple ways from multiple
places and at multiple rates of speed.
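To make the idea of spreading work across many workers concrete, here
is a minimal Python sketch that divides a data set into chunks and
processes the chunks with a worker pool from the standard library’s
concurrent.futures module. It shows the general split, process, and
combine pattern only. The chunk size, worker count, and count_matches
function are assumptions chosen for the example, and the sketch says
nothing about how IBM hardware or file systems schedule work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(data: List[int], chunk_size: int) -> List[List[int]]:
    """Divide the data so each worker gets its own slice."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def count_matches(chunk: List[int]) -> int:
    """Hypothetical per-chunk analysis: count the even values."""
    return sum(1 for value in chunk if value % 2 == 0)


def parallel_count(data: List[int], chunk_size: int = 250, workers: int = 4) -> int:
    """Fan the chunks out to a worker pool, then combine the partial results."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_matches, chunks))
    return sum(partial_counts)


data = list(range(1000))          # hypothetical sample data
total_even = parallel_count(data)
```

Threads are used so the sketch runs anywhere without extra setup. For
CPU-bound work in standard Python, ProcessPoolExecutor from the same
module is the usual choice, because threads there share one
interpreter lock.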
Data optimization
In the world of BD&A, data management takes on a whole new
meaning. What data do I keep? Do I need to store it in a
different location? What are the costs of dealing with the data?
How quickly and when do I need it? These are all good questions,
and they require new capabilities.
Big Data isn’t just about “social media” Hadoop data. It’s about
all your data and how internal and external data sources
provide you with the best insights. Consider the following:
Network infrastructure
A network is a combination of points or nodes and the lines that
connect them. In the world of IT, the definition holds up, but has
several variants. The variations depend largely on the perspective
of the people discussing the network, and they’re all legitimate.
In all cases, they contribute to the BD&A architecture.
The variants include the following:
✓ Operating departments: In a business or government environment,
network likely refers to the connection between desktop computers
and processors. That’s why it’s called a Local Area Network
(LAN).
✓ Data center: Network often refers to the system of redundant full
crosspoint switches that move data between storage and servers,
and vice versa.
✓ Failover: There’s much more to disaster recovery (DR) than
local backups or offsite backups. Backups aren’t all that
helpful when a flood or earthquake hits. Sophisticated companies
have robust connections between regional data centers. Should
disaster strike, processing will “fail over” to another data
center over a Wide Area Network.
✓ Branch offices: Many businesses have a central data center and
branch offices, but banking surely offers the most dramatic
example. Bank of America has 5,600 banking centers and 16,200
automated teller machines. That implies an enormous network.
✓ Mergers: Bank mergers are a special source of concern. There
are security issues, compatibility issues, and generally no (or
few) branch closings. IBM assisted Bilbao Bizkaia Kutxa (BBK)
when it acquired CajaSur. Both institutions were about the same
size. That implies an enormous network.
✓ Internet: Large Internet retailers have robust data center
networks, but also get information from and send information to
many users connected over the Internet.
Speed/performance
BD&A provides insightful answers, but you need the horsepower
to get the job done. The infrastructure must have the
power to meet current needs and grow to meet expanding
needs. An application may have to meet certain performance
thresholds. For example, for a customer service
analytics application, you can’t wait two minutes to get an
answer; the answer has to be available in seconds.
You need to balance cost against speed and the value of business
outcomes, and you need to consider future growth. A
well-architected solution should achieve the best balance among
all requirements.
Availability
Resilience is the ability of a system to absorb or avoid damage
without suffering complete failure. A well-planned BD&A
architecture allows for processor, storage, and facility failures (with
protection via “failover” mechanisms), software failures, and at
the least a strong backup and restore capability.
Ensuring availability begins with evaluating where the data
comes from and where the insights flow. If insights are
important to your real-time operations and processes,
resilience (and disaster recovery) is vital. Solutions may range
from simple rebuilds of a failed disk to simultaneous
asynchronous data replication at a remote site.
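The failover idea can be sketched as a simple loop that tries a
primary site and falls back to a replica when the primary is
unavailable. Everything here (the site names, the fetch functions, and
the simulated outage) is hypothetical and stands in for whatever
replication and failover mechanism a real deployment uses.

```python
from typing import Callable, List, Tuple


class SiteUnavailable(Exception):
    """Raised when a (simulated) data center cannot serve a request."""


def primary_site(query: str) -> str:
    # Simulated outage at the primary data center.
    raise SiteUnavailable("primary data center is down")


def remote_site(query: str) -> str:
    # The replica at the remote site answers from its copy of the data.
    return f"result for {query!r} from remote site"


def query_with_failover(
    query: str, sites: List[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each site in order; return (site name, result) from the first that works."""
    for name, fetch in sites:
        try:
            return name, fetch(query)
        except SiteUnavailable:
            continue  # fail over to the next site in the list
    raise SiteUnavailable("no site could serve the request")


sites = [("primary", primary_site), ("remote", remote_site)]
served_by, result = query_with_failover("daily sales", sites)
```

The caller gets an answer even though the primary site is down, which
is the behavior a failover mechanism is meant to provide.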
Access
Understanding data access is the basis for three kinds of
efficiency:
Chapter 3
Looking at IBM Infrastructure Choices
In This Chapter
▶ Seeing a world of infrastructure options
▶ Examining the storage universe
IBM Servers
You may want to become familiar with key IBM product line
names. They’re the core of a BD&A infrastructure and fall into
the worlds of processing and storage.
and create new business models. For each entry point, certain
infrastructure design points (speed, access, availability) are
more important than others. Certain design points are better
enabled by specific hardware and software infrastructure
capabilities and architectures.
IBM System z
IBM’s mainframe is System z. Much of the world’s transactional
and financial data still resides on mainframe systems, and the
System z BD&A solutions are centered on the concept of moving
analytics processing closer to the data, integrating and managing
the full variety, velocity, and volume of data that makes up
advanced analytics business solutions. This approach relies on
the structured data already residing on System z but also
incorporates feeds from unstructured and streaming data sources.
IBM System z mainframes deliver the ability to easily handle
massive data volumes while also scaling quickly and cost-effectively
as needs arise, efficiently protecting and governing data and
delivering leading-edge computing performance.
IBM PureSystems
IBM PureSystems is a series of integrated systems that allow
choice and flexibility in server deployments, including both
x86 and non-x86 server nodes. The nodes are connected by
IBM Storage
IBM has a portfolio of storage capabilities for BD&A deploy-
ments. In this section, you take a look at those and get the
lowdown on storage basics.
✓ Self-optimizing
✓ Cloud-agile
✓ Reduced space requirements
✓ Intuitive management
✓ No manual intervention required
✓ Reduced administrator time
IBM FlashSystem
Huge amounts of data are generated by sensors and computers
in addition to data created by people. It’s hard to keep up
with data growth. And then, without high-performance storage,
it can be impossible to extract meaningful insights from
rapidly accumulating data in areas such as customer satisfaction,
operational efficiency, financial processes, risk, fraud,
and compliance management.
Platform Computing
When you have big data problems to solve, you need to maximize
the potential of your computing power and the supporting
infrastructure to accelerate your applications at scale,
extract insight from your data, and make better decisions
faster. IBM Platform Computing provides that by pooling your
computing resources, managing them efficiently across multiple
groups, and getting the most out of your IT investment.
Business Need
The Oncor smart meter deployment creates immense volumes
of data, including an average of 2,900 records a month for
each of its 3+ million customers. The company needed a more
efficient and cost-effective alternative to its existing storage
setup.
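A quick back-of-the-envelope calculation shows the scale involved. The
Python sketch below uses only the two figures quoted above and treats
“3+ million” as exactly 3 million customers, so the results are lower
bounds. The annual figure simply assumes the monthly average holds for
12 months.

```python
RECORDS_PER_CUSTOMER_PER_MONTH = 2_900   # average quoted in the text
CUSTOMERS = 3_000_000                    # "3+ million", so a lower bound

records_per_month = RECORDS_PER_CUSTOMER_PER_MONTH * CUSTOMERS
records_per_year = records_per_month * 12  # assumes the monthly average holds

print(f"Records per month: {records_per_month:,}")
print(f"Records per year:  {records_per_year:,}")
```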
Solution
Oncor deployed IBM Power Servers and System z with IBM
ProtecTIER.
Fashion
Bernard Chaus (Chaus) is an American clothing company
that retails consumer products. The clothing line is made to
enhance the dynamic woman’s personal style. The company
is involved with wholesale distribution and services. Chaus
found its sales and merchandising tracking efforts were
weighted toward time-consuming report creation and data
issues. That, combined with the increasingly competitive
nature of the business, prevented Chaus from effectively
analyzing and addressing its selling opportunities.
Chapter 4: Solving Big Problems with Powerful Solutions
EDI is a particular set of standards for computer-to-computer
exchange of information.
Travel
Go Voyages, an online travel company in Paris, France, offers
a full range of services including accommodations, cruises,
short- and long-term vacation packages, and insurance. Go
Voyages has 450 employees with websites in English, French,
German, Italian, and Portuguese.
Responding to customer behavior
Go Voyages has deployed a business intelligence solution,
built on IBM hardware and software technologies, to create
and support a comprehensive data storage environment for
the company’s travel data and payment records. The solution
also enables customer data to be integrated and analyzed
using the software’s predictive modeling capability.
with a short layover — and with substantial savings. The
analysis allows Go Voyages to tailor the response to flight
inquiries and to increase the probability of ticket sales.
Healthcare
Park Nicollet Health Services provides healthcare services
through its network of 25 clinics, six urgent care sites, and one
hospital. From its headquarters in St. Louis Park, Minnesota,
the organization provides inpatient and outpatient services,
emergency care, pharmacy services, and health and wellness
programs. Although many other healthcare organizations
operate hospitals as separate entities from physicians, which
can inhibit information sharing, Park Nicollet set out more
than two decades ago to create an integrated care network
designed to encourage collaboration.
Insurance
The Swiss Re Group, headquartered in Zurich, is a leading
worldwide wholesale provider of reinsurance, insurance, and
other insurance-based forms of risk transfer. Its global client
base consists of insurance companies, mid-to-large-sized
corporations, and public sector clients. From standard products
to tailor-made coverage across all lines of business, Swiss Re
deploys its capital strength, expertise, and innovation power
to enable the risk-taking on which enterprises and progress in
society depend.
Overview
With 2.5 billion transactions and records dating back nearly
30 years, Swiss Re needed a solution with the flexibility and
capacity to manage its growing data workloads. The company
wanted to draw insights from claims information held in multiple
international locations and supplied by numerous insurance
companies. These insights would improve internal business users’
understanding of risk, their agility in identifying profitable
segments, and their ability to make decisions quickly.
The company needed a system that would also scale to meet
future data and analytics needs.
Solution
Swiss Re chose to deploy IBM zEnterprise System with IBM
DB2 for z/OS to perform data analysis and reporting from
a central location, with IBM DB2 Analytics Accelerator
(IDAA), powered by IBM Netezza technology, to deliver faster
responses to individual analytic queries.
In addition to its public transport duties, SSB acts as an IT
service provider for other public transport companies and
medium-sized companies from various industries. Its IT service
business specializes in SAP ERP hosting, and the company also
provides general data center services and runs online shops,
marketplaces, and business-to-business portals for its clients.
Business need
Slow response times or lack of application availability could
have a huge impact on SSB’s ability to properly coordinate its
activity and deliver safe, efficient urban transport. It could also
cause significant disruption to its clients’ operations, leading
to lost revenues if clients chose to switch hosting providers.
The company wanted to ensure fast access to enterprise data,
in order to provide its own executives and its clients with the
information they needed to perform timely, accurate analysis
and reporting and guide better decision-making.
Solution
SSB chose IBM Power Systems servers, running IBM AIX, to
deliver the performance it needed for its business-critical
systems, including an extensive SAP ERP application landscape.
To manage growing quantities of data and provide fast access
to vital information, the company extended its existing
storage environment, based on IBM System Storage SAN Volume
Controller and high-performance IBM FlashSystem technology.
Chapter 5: Ten Questions LoB and IT Execs Should Ask Each Other
No matter what you do, make sure you and your IT team
understand the infrastructure implications early.