Big Data


What is Data?

In order to understand 'Big Data', you first need to know what data is.

Data is the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

What is Big Data?

Big data is a combination of structured, semi-structured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications.

Systems that process and store big data have become a common component of data
management architectures in organizations. Big data is often characterized by the 3Vs: the
large volume of data in many environments, the wide variety of data types stored in big data systems
and the velocity at which the data is generated, collected and processed. These characteristics were
first identified by Doug Laney, then an analyst at Meta Group Inc., in 2001; Gartner further popularized
them after it acquired Meta Group in 2005. More recently, several other Vs have been added to
different descriptions of big data, including veracity, value and variability.

Although big data doesn't equate to any specific volume of data, big data deployments often involve
terabytes (TB), petabytes (PB) and even exabytes (EB) of data captured over time.

Importance of big data

Companies use the big data accumulated in their systems to improve operations, provide
better customer service, create personalized marketing campaigns based on specific
customer preferences and, ultimately, increase profitability. Businesses that utilize big data
hold a potential competitive advantage over those that don't since they're able to make faster
and more informed business decisions, provided they use the data effectively.

For example, big data can provide companies with valuable insights into their customers that can be used to refine marketing campaigns and techniques in order to increase customer engagement and conversion rates.

Furthermore, utilizing big data enables companies to become increasingly customer-centric. Historical and real-time data can be used to assess the evolving preferences of consumers, consequently enabling businesses to update and improve their marketing strategies and become more responsive to customer desires and needs.

Big data is also used by medical researchers to identify disease risk factors and by doctors to
help diagnose illnesses and conditions in individual patients. In addition, data derived from
electronic health records (EHRs), social media, the web and other sources provides
healthcare organizations and government agencies with up-to-the-minute information on
infectious disease threats or outbreaks.

In the energy industry, big data helps oil and gas companies identify potential drilling locations
and monitor pipeline operations; likewise, utilities use it to track electrical grids. Financial
services firms use big data systems for risk management and real-time analysis of market
data. Manufacturers and transportation companies rely on big data to manage their supply
chains and optimize delivery routes. Other government uses include emergency response,
crime prevention and smart city initiatives.
Characteristics Of Big Data
(i) Volume – The name Big Data itself refers to enormous size. The volume of data plays a crucial role in determining its value, and whether a particular dataset can be considered big data at all depends on its volume. Hence, 'Volume' is one characteristic which needs to be considered when dealing with Big Data.

(ii) Variety – The next aspect of Big Data is its variety.

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
During earlier days, spreadsheets and databases were the only sources of data considered by most
of the applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs,
audio, etc. are also being considered in the analysis applications. This variety of unstructured data
poses certain issues for storage, mining and analyzing data.

(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast data is generated and processed to meet demands determines the real potential of the data.

Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

(iv) Variability – This refers to the inconsistency which can be shown by the data at times,
thus hampering the process of being able to handle and manage the data effectively.
Types Of Big Data
Big data can be found in three forms:

1. Structured
2. Unstructured
3. Semi-structured

Structured

Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Over time, computer science has achieved great success in developing techniques for working with such data (where the format is well known in advance) and in deriving value from it. However, issues now arise when such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.

Do you know? 10²¹ bytes equal 1 zettabyte; in other words, one billion terabytes form a zettabyte.

Looking at these figures one can easily understand why the name Big Data is given
and imagine the challenges involved in its storage and processing.
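A quick arithmetic sketch makes the zettabyte figure above concrete, using the decimal (SI) byte units the text refers to:

```python
# Decimal (SI) byte units, as used in the text.
TB = 10**12  # one terabyte
ZB = 10**21  # one zettabyte

# One zettabyte is one billion terabytes.
terabytes_per_zettabyte = ZB // TB
print(terabytes_per_zettabyte)  # 1000000000
```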

Do you know? Data stored in a relational database management system is one example of 'structured' data.

Examples Of Structured Data


An 'Employee' table in a database is an example of Structured Data
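A minimal sketch of such a table, using Python's built-in sqlite3 module; the column names and sample rows are illustrative assumptions, not taken from the text:

```python
import sqlite3

# An illustrative 'Employee' table; the schema is an assumption for
# demonstration purposes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Prashant Rao", "Finance", 65000.0), (2, "Seema R.", "Admin", 72000.0)],
)

# The fixed schema is what makes this data 'structured': every row
# conforms to the same format, known in advance.
for row in conn.execute("SELECT name, dept FROM Employee ORDER BY emp_id"):
    print(row)
```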

Unstructured
Any data with unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays, organizations have a wealth of data available to them but, unfortunately, they don't know how to derive value from it, since this data is in its raw, unstructured form.

Examples Of Unstructured Data


The output returned by 'Google Search'

Semi-structured
Semi-structured data can contain both forms of data. We can see semi-structured data as structured in form, but it is not actually defined by, for example, a table definition as in a relational DBMS. An example of semi-structured data is data represented in an XML file.

Examples Of Semi-structured Data

Personal data stored in an XML file-

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
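Records like these can be parsed with any standard XML library. A minimal sketch using Python's xml.etree.ElementTree; the enclosing <recs> root element is added here only because well-formed XML requires a single root:

```python
import xml.etree.ElementTree as ET

# Three of the <rec> elements from the text, wrapped in a root element
# so the document is well-formed XML.
xml_data = """<recs>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
</recs>"""

root = ET.fromstring(xml_data)
# The tags give the data structure, but no fixed schema is enforced,
# which is what makes it semi-structured rather than structured.
people = [
    {"name": rec.findtext("name"),
     "sex": rec.findtext("sex"),
     "age": int(rec.findtext("age"))}
    for rec in root
]
print(people[0])  # {'name': 'Prashant Rao', 'sex': 'Male', 'age': 35}
```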

Data Growth over the years


Please note that web application data, which is unstructured, consists of log files, transaction history files, etc. OLTP systems are built to work with structured data, wherein data is stored in relations (tables).

How big data is stored and processed

The need to handle big data velocity imposes unique demands on the underlying compute
infrastructure. The computing power required to quickly process huge volumes and varieties of data
can overwhelm a single server or server cluster. Organizations must apply adequate processing
capacity to big data tasks in order to achieve the required velocity. This can potentially demand
hundreds or thousands of servers that can distribute the processing work and operate collaboratively
in a clustered architecture, often based on technologies like Hadoop and Apache Spark.

Achieving such velocity in a cost-effective manner is also a challenge. Many enterprise leaders are
reticent to invest in an extensive server and storage infrastructure to support big data workloads,
particularly ones that don't run 24/7. As a result, public cloud computing is now a primary vehicle for
hosting big data systems. A public cloud provider can store petabytes of data and scale up the
required number of servers just long enough to complete a big data analytics project. The business
only pays for the storage and compute time actually used, and the cloud instances can be turned off
until they're needed again.

To improve service levels even further, public cloud providers offer big data capabilities through
managed services that include the following:

Amazon EMR (formerly Elastic MapReduce)

Microsoft Azure HDInsight

Google Cloud Dataproc

In cloud environments, big data can be stored in the following:

Hadoop Distributed File System (HDFS);

lower-cost cloud object storage, such as Amazon Simple Storage Service (S3);

NoSQL databases; and

relational databases.


For organizations that want to deploy on-premises big data systems, commonly used Apache
open source technologies in addition to Hadoop and Spark include the following:

YARN, Hadoop's built-in resource manager and job scheduler, which stands for Yet Another
Resource Negotiator but is commonly known by the acronym alone;

the MapReduce programming framework, also a core component of Hadoop;

Kafka, an application-to-application messaging and data streaming platform;

the HBase database; and

SQL-on-Hadoop query engines, like Drill, Hive, Impala and Presto.
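To make the MapReduce programming model mentioned above concrete, here is a deliberately simplified, single-machine sketch of its three phases (map, shuffle, reduce) as a word count; Hadoop's real framework distributes these same phases across a cluster, and this is not its actual API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (key, value) pair for each word.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data is big", "data flows fast"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```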

Users can install the open source versions of the technologies themselves or turn to
commercial big data platforms offered by Cloudera, which merged with former rival
Hortonworks in January 2019, or Hewlett Packard Enterprise (HPE), which bought the assets
of big data vendor MapR Technologies in August 2019. The Cloudera and MapR platforms
are also supported in the cloud.

The human side of big data analytics


Ultimately, the value and effectiveness of big data depend on the workers tasked with
understanding the data and formulating the proper queries to direct big data analytics projects.
Some big data tools fill specialized niches and enable less technical users to use everyday
business data in predictive analytics applications. Other technologies -- such as Hadoop-
based big data appliances -- help businesses implement a suitable compute infrastructure to
tackle big data projects, while minimizing the need for hardware and distributed software
know-how.

Big data can be contrasted with small data, another evolving term that's often used to describe
data whose volume and format can be easily used for self-service analytics. A commonly
quoted axiom is that "big data is for machines; small data is for people."

Volume is the most commonly cited characteristic of big data. A big data environment doesn't
have to contain a large amount of data, but most do because of the nature of the data being
collected and stored in them. Clickstreams, system logs and stream processing systems are
among the sources that typically produce massive volumes of big data on an ongoing basis.

Benefits of Big Data Processing


The ability to process Big Data brings multiple benefits, such as:

o Businesses can utilize outside intelligence while making decisions

Access to social data from search engines and sites like Facebook and Twitter is enabling organizations to fine-tune their business strategies.

o Improved customer service

Traditional customer feedback systems are being replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are used to read and evaluate consumer responses.
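As a toy illustration of "reading and evaluating" consumer responses, here is a deliberately naive keyword-based scoring sketch; real systems use trained NLP models, and the word lists here are invented for the example:

```python
# Illustrative word lists (assumptions for this sketch, not a real lexicon).
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "refund", "bad"}

def score(response: str) -> int:
    # Count positive matches minus negative matches in the response.
    words = set(response.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

feedback = [
    "love the fast delivery",
    "app is slow and broken",
]
print([score(f) for f in feedback])  # [2, -2]
```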

o Early identification of risks to products and services, if any


o Better operational efficiency

Big Data technologies can be used to create a staging area or landing zone for new data before identifying what data should be moved to the data warehouse. In addition, such integration of Big Data technologies and the data warehouse helps an organization offload infrequently accessed data.

USES

How is big data used?


The diversity of big data makes it inherently
complex, resulting in the need for systems capable
of processing its various structural and semantic
differences.

Big data is used in nearly every industry to identify patterns and trends, answer questions,
gain insights into customers, and tackle complex problems. Companies and organizations use the
information for a multitude of reasons like growing their businesses, understanding customer
decisions, enhancing research, making forecasts and targeting key audiences for advertising.
BIG DATA EXAMPLES
• Personalized e-commerce shopping experiences
• Financial market modeling

• Compiling trillions of data points to speed up cancer research

• Media recommendations from streaming services like Spotify, Hulu and Netflix

• Predicting crop yields for farmers

• Analyzing traffic patterns to lessen congestion in cities


• Data tools recognizing retail shopping habits and optimal product placement

• Big data helping sports teams maximize their efficiency and value
• Recognizing trends in education habits from individual students, schools and districts

Here are a few industries in which the big data revolution is already underway:

Finance
The finance and insurance industries utilize big data and predictive analytics for fraud detection, risk
assessments, credit rankings, brokerage services and blockchain technology, among other uses.
Financial institutions are also using big data to enhance their cybersecurity efforts and
personalize financial decisions for customers.

Healthcare
Hospitals, researchers and pharmaceutical companies are adopting big data solutions to improve and
advance healthcare. With access to vast amounts of patient and population data, healthcare is
enhancing treatments, performing more effective research on diseases like cancer and Alzheimer’s,
developing new drugs, and gaining critical insights on patterns within population health.

Media & Entertainment


If you've ever used Netflix, Hulu or any other streaming service that provides recommendations, you've witnessed big data at work.

Media companies analyze our reading, viewing and listening habits to build individualized
experiences. Netflix even uses data on graphics, titles and colors to make decisions about customer
preferences.
Agriculture
From engineering seeds to predicting crop yields with amazing accuracy, big data and automation are rapidly enhancing the farming industry.

With the influx of data in the last two decades, information is more abundant than food in many
countries, leading researchers and scientists to use big data to tackle hunger and malnutrition. With
groups like the Global Open Data for Agriculture & Nutrition (GODAN) promoting open and
unrestricted access to global nutrition and agricultural data, some progress is being made in the fight
to end world hunger.

More Application Areas

• Advertising & marketing


• Business
• E-commerce & retail
• Education
• Internet of Things
• Sports

Summary

• Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time.
• Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.
• Big Data can be 1) structured, 2) unstructured, or 3) semi-structured.
• Volume, Variety, Velocity, and Variability are a few characteristics of Big Data.
• Improved customer service, better operational efficiency, and better decision making are a few advantages of Big Data.
