Assignment: Ce Marketing Research & Data Analytics
BIG DATA
Big Data is a term for a massive volume of both structured and unstructured data
that is so large it is difficult to process using traditional database and software techniques. In
most enterprise scenarios, the data is too big, moves too fast, or exceeds current processing
capacity. Big data is a field that treats ways to analyze, systematically
extract information from, or otherwise deal with data sets that are too large or complex to be
handled by traditional data-processing application software. Data with many cases (rows)
offer greater statistical power, while data with higher complexity (more attributes or
columns) may lead to a higher false discovery rate. Big data challenges include capturing
data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating,
information privacy and data source. Big data was originally associated with three key
concepts: volume, variety, and velocity. Other concepts later attributed to big data are
veracity (i.e., how much noise is in the data) and value.
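To make the false-discovery point concrete, here is a minimal Python sketch (the function names `pearson` and `count_false_hits` are illustrative, not from any library). It assumes every column is pure random noise, so any column flagged as correlated with the target is a false discovery, and it uses the large-sample cut-off |r| > 1.96/sqrt(n) as an approximate two-sided test at the 5% level.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def count_false_hits(n_rows, n_columns, seed=0):
    """Count noise columns that appear 'significantly' correlated with a noise target.

    Every column is independent random noise, so each hit is a false discovery.
    The cut-off |r| > 1.96 / sqrt(n_rows) is a large-sample approximation of a
    two-sided test at the 5% level, so roughly 5% of columns pass by chance.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    cutoff = 1.96 / math.sqrt(n_rows)
    hits = 0
    for _ in range(n_columns):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(column, target)) > cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    for columns in (10, 100, 1000):
        print(columns, "columns ->", count_false_hits(500, columns), "false hits")
```

Because each noise column has roughly a 5% chance of passing the test by luck, the expected number of false hits grows in proportion to the number of columns tested: about 0.5 for 10 columns and about 50 for 1,000. Adding rows works the other way: the cut-off 1.96/sqrt(n) shrinks, so a genuine correlation of a given size becomes easier to detect.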
Current usage of the term big data tends to refer to the use of predictive analytics, user
behavior analytics, or certain other advanced data analytics methods that extract value from
data, and seldom to a particular size of data set. "There is little doubt that the quantities of
data now available are indeed large, but that's not the most relevant characteristic of this new
data ecosystem." Analysis of data sets can find new correlations to "spot business trends,
prevent diseases, combat crime and so on." Scientists, business executives, practitioners of
medicine, advertising and governments alike regularly run into difficulties with large data sets in
areas including Internet search, fintech, urban informatics, and business informatics.
Scientists encounter limitations in e-Science work, including meteorology, genomics,
complex physics simulations, biology and environmental research.
EXAMPLES
The New York Stock Exchange generates about one terabyte of new trade data per
day.
Social Media
The statistic shows that 500+ terabytes of new data are ingested into the databases of the social
media site Facebook every day. This data is mainly generated by photo and video
uploads, message exchanges, comments, etc.
Jet Engines
A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time.
With many thousands of flights per day, data generation reaches many petabytes.
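The figures above are easier to compare once they are scaled up. The sketch below does the arithmetic in Python; the 252 trading days per year, the number of flights, the flight length and the engine count are hypothetical inputs chosen for illustration, not figures from the statistics above.

```python
TB_PER_PB = 1000  # decimal (SI) units: 1 petabyte = 1,000 terabytes


def nyse_trade_data_tb(tb_per_day=1.0, trading_days=252):
    """Trade data accumulated over one year; 252 trading days is an assumed typical count."""
    return tb_per_day * trading_days


def engine_data_pb(flights_per_day, hours_per_flight, engines_per_flight,
                   tb_per_engine_per_half_hour=10.0):
    """Daily jet-engine data in petabytes, scaled from the ~10 TB per 30 minutes figure."""
    half_hours = hours_per_flight * 2
    total_tb = (flights_per_day * engines_per_flight * half_hours
                * tb_per_engine_per_half_hour)
    return total_tb / TB_PER_PB


if __name__ == "__main__":
    # Hypothetical inputs: 5,000 flights a day, 2 hours each, 2 engines per aircraft.
    print(nyse_trade_data_tb(), "TB per year of NYSE trade data")
    print(engine_data_pb(5000, 2, 2), "PB per day of engine data")
```

With those assumed inputs it prints 252.0 TB per year for the NYSE and 400.0 PB per day for engine data, which shows how quickly a per-engine figure multiplies into petabytes.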
Types Of Big Data
Structured
Any data that can be stored, accessed and processed in a fixed format is termed
'structured' data. Over time, computer science has achieved considerable
success in developing techniques for working with this kind of data (where the format is
known in advance) and in deriving value from it.
Unstructured
Any data with an unknown form or structure is classified as unstructured data. In addition to
its huge size, unstructured data poses multiple challenges when it comes to processing
it and deriving value from it.
Semi-structured
Semi-structured data can contain both forms of data. It appears structured in form,
but it is not actually defined by, e.g., a table definition in a
relational DBMS. An example of semi-structured data is data represented in an XML
file.
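A minimal sketch of the three forms side by side, using Python's standard `xml.etree.ElementTree` module for the XML case. The employee names, fields and the `parse_employees` helper are hypothetical, chosen only to illustrate the distinction.

```python
import xml.etree.ElementTree as ET

# Structured: a fixed schema known in advance, like a row in a relational table.
EMPLOYEE_COLUMNS = ("emp_id", "name", "department")
structured_row = dict(zip(EMPLOYEE_COLUMNS, (101, "Ana Silva", "Finance")))

# Unstructured: free text with no declared fields; meaning must be inferred.
unstructured_note = "Ana from Finance asked for last quarter's sales figures."

# Semi-structured: tags label each value, but there is no table definition,
# and records may carry different fields.
SEMI_STRUCTURED_XML = """
<employees>
  <employee><name>Ana Silva</name><age>35</age></employee>
  <employee><name>Ben Okafor</name><department>Sales</department></employee>
</employees>
"""


def parse_employees(xml_text):
    """Turn each <employee> element into a dict of whatever child tags it has."""
    root = ET.fromstring(xml_text.strip())
    return [{child.tag: child.text for child in emp} for emp in root.findall("employee")]


if __name__ == "__main__":
    print(structured_row["department"])
    print(len(unstructured_note.split()), "words, but no fields")
    for record in parse_employees(SEMI_STRUCTURED_XML):
        print(record)
```

Parsing yields one dictionary per employee, but the dictionaries have different keys (`age` in one, `department` in the other), which a fixed table definition would not allow.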
Characteristics Of Big Data
Volume
The name Big Data itself refers to enormous size. The size of data plays a crucial
role in determining the value that can be derived from it. Whether particular data can
actually be considered Big Data also depends on its volume.
Variety
Variety refers to heterogeneous sources and the nature of data, both structured and
unstructured. In earlier days, spreadsheets and databases were the only sources of
data considered by most applications.
Velocity
The term 'velocity' refers to the speed at which data is generated. How fast the data is
generated and processed to meet demand determines the real potential in the data.
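One common way to quantify velocity is to measure the arrival rate of records over a sliding time window. The `SlidingRateMeter` class below is a hypothetical helper written for illustration, not part of any streaming framework; it keeps the timestamps of recent events in a `collections.deque` and discards those that fall outside the window.

```python
from collections import deque


class SlidingRateMeter:
    """Events per second over the most recent window_seconds (hypothetical helper)."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.timestamps = deque()  # arrival times still inside the window, oldest first

    def record(self, t):
        """Register one event arriving at time t (seconds; times must not decrease)."""
        self.timestamps.append(t)
        self._evict(t)

    def rate(self, now):
        """Average arrival rate over the window ending at `now`."""
        self._evict(now)
        return len(self.timestamps) / self.window

    def _evict(self, now):
        # Keep only events in the half-open window (now - window, now].
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


if __name__ == "__main__":
    meter = SlidingRateMeter(window_seconds=10)
    for second in range(100):  # one event per second for 100 seconds
        meter.record(second)
    print(meter.rate(99))   # 10 events in the last 10 s -> 1.0 per second
    print(meter.rate(105))  # traffic stopped at t=99 -> 0.4 per second
```

Each timestamp is appended once and removed once, so the cost per event is amortized constant regardless of how fast records arrive.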
Variability
This refers to the inconsistency the data can show at times, which hampers the
ability to handle and manage the data effectively.
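Variability usually has to be handled by normalizing records before analysis. The Python sketch below assumes hypothetical sensor readings whose key names, units and value types change from record to record; the `normalize_reading` function is illustrative only. It maps each reading onto one consistent shape and discards records it cannot repair.

```python
def normalize_reading(reading):
    """Map an inconsistent sensor record to {'sensor': str, 'celsius': float}.

    Returns None for records that cannot be repaired. The inconsistencies handled
    here (alternate key names, Fahrenheit vs Celsius, numbers sent as strings,
    blank values) are hypothetical examples of variability.
    """
    sensor = reading.get("sensor", reading.get("sensor_id"))
    value = reading.get("temp", reading.get("temperature"))
    if sensor is None or value is None:
        return None
    try:
        value = float(value)
    except (TypeError, ValueError):  # e.g. "" or "n/a"
        return None
    unit = str(reading.get("unit") or "C").upper()  # assume Celsius when no unit is given
    if unit == "F":
        value = (value - 32) * 5 / 9
    return {"sensor": str(sensor), "celsius": round(value, 2)}


if __name__ == "__main__":
    raw = [
        {"sensor": "a1", "temp": "21.5"},
        {"sensor_id": 7, "temperature": 212, "unit": "f"},
        {"sensor": "b2", "temp": ""},
    ]
    cleaned = [r for r in (normalize_reading(x) for x in raw) if r is not None]
    print(cleaned)
```

Returning None for unrepairable records, rather than raising, lets a pipeline filter them out and keep processing the rest of the stream.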
Banking and Securities
The big data challenges in this industry include securities fraud early warning, tick
analytics, card fraud detection, archival of audit trails, enterprise credit risk
reporting, trade visibility, customer data transformation, social analytics for
trading, IT operations analytics, and IT policy compliance analytics, among
others.
Communications, Media and Entertainment
Consumers expect rich media on demand, in different formats and on a variety of
devices. Some big data challenges in the communications, media and entertainment
industry include:
Government
The use and adoption of big data within governmental processes allows
efficiencies in terms of cost, productivity, and innovation, but does not come
without its flaws. Data analysis often requires multiple parts of government
(central and local) to work in collaboration and create new and innovative
processes to deliver the desired outcome.
CRVS (Civil Registration and Vital Statistics) collects the status of all certificates,
from birth to death, which makes CRVS a source of big data for governments.
International development
Research on the effective usage of information and communication technologies
for development (also known as ICT4D) suggests that big data technology can
make important contributions but also present unique challenges to international
development. Advancements in big data analysis offer cost-effective
opportunities to improve decision-making in critical development areas such as
health care, employment, economic productivity, crime, security, and natural
disaster and resource management. Additionally, user-generated data offers new
opportunities to give the unheard a voice. However, longstanding challenges for
developing regions such as inadequate technological infrastructure and
economic and human resource scarcity exacerbate existing concerns with big
data such as privacy, imperfect methodology, and interoperability issues.