Full Ebook of Research Practitioners Handbook On Big Data Analytics 1St Edition S Sasikala Online PDF All Chapter
RESEARCH PRACTITIONER’S
HANDBOOK ON BIG DATA ANALYTICS
S. Sasikala, PhD
Renuka Devi D, PhD
S. Sasikala, PhD
S. Sasikala, PhD, is Associate Professor and Research Supervisor in
the Department of Computer Science, IDE, and Director of Network
Operation and Edusat Programs at the University of Madras, Chennai,
India. She has 23 years of teaching experience and has coordinated
computer-related courses with dedication and sincerity. She served as
Head-in-charge of the Centre for Web-based Learning for three years,
beginning in 2019. She holds various posts at the university, including
Nodal Officer for the UGC Student Redressal Committee, Coordinator
for Online Course Development at IDE, and President of the Alumni
Association at IDE. She has chaired various Board of Studies
meetings held at the institution and has acted as an advisor for research.
She takes part in administrative activities and contributes actively to
research by guiding research scholars, writing and editing textbooks,
and publishing regularly in reputed journals. Her research interests
include image processing, data mining, machine learning, networks, big
data, and AI. She has published two books in the domain of computer
science, over 27 research articles in leading journals and conference
proceedings, and four book chapters, including in publications from
IEEE, Scopus, Elsevier, Springer, and Web of Science. She has also
received best paper awards and women’s achievement awards. She is an
active reviewer and editorial member for international journals and
conferences. She has been invited to speak on various emerging topics
and has chaired sessions at international conferences.
2. Preprocessing Methods
Abstract
2.1 Data Mining: Need of Preprocessing
2.2 Preprocessing Methods
2.3 Challenges of Big Data Streams in Preprocessing
2.4 Preprocessing Methods
Keywords
References
With recent developments in the digital era, data is growing at a rapid
rate. Big data refers to collections of data so large and complex that
conventional database management systems and data processing tools
cannot handle them. This book focuses on the core concepts of big data
analytics, along with its tools, techniques, and methodologies, from a
research perspective. It covers both theoretical and practical approaches,
which makes it suitable for a broad spectrum of readers. It is intended as
a comprehensive handbook for the research domain of big data
analytics.
Chapter 1 introduces the fundamentals of big data, its terminology,
the types of analytics, and big data tools and techniques.
Chapter 2 outlines the need for data preprocessing and the various
methods for carrying it out. Both text and image preprocessing methods
are highlighted, and the challenges of processing streaming data are
also discussed.
Chapter 3 covers various feature selection methods and algorithms;
the research problems related to each category are discussed with specific
examples.
Chapter 4 describes the core methods for big data streams and the
prerequisites for parallelization. This chapter also explains the streaming
architecture. The Hadoop architecture is described comprehensively,
together with its parallel processing components.
Chapter 5 presents big data classification techniques, and various
learning methodologies are explained with examples. Building on these,
deep learning algorithms and architectures are also introduced.
Chapter 6 highlights applications across verticals, with research problems
and solutions.
CHAPTER 1
ABSTRACT
1.1 INTRODUCTION
The term “big data” applies to the evolution and application of technologies
that provide people with the right knowledge from a mass of data at the
right moment; its use has been growing exponentially for a long
time. The task is not only to deal with exponentially growing data volumes
but also to cope with the complexity of increasingly heterogeneous formats
and increasingly dynamic and integrated data management (Anuradha, 2015).
Its meaning varies according to the group involved, such as a customer or
a service provider.
Research Practitioner’s Handbook on Big Data Analytics. S. Sasikala, PhD, D. Renuka Devi, &
Raghvendra Kumar, PhD (Editor)
© 2023 Apple Academic Press, Inc. Co-published with CRC Press (Taylor & Francis)
Big data is data that is vast in scale relative to the retrieval system,
comprising structured and unstructured data with multiple data patterns
to be processed. Data is recorded, processed, and analyzed, from traffic
flows and music downloads to web history and medical information, so
that infrastructure and utilities can deliver the measurable performance
that the world depends on every day. If an organization merely holds on
to the information without analyzing it, or does not store the information
because it appears to be of little use, this could be to the organization’s
detriment.
These businesses track everything that happens on their websites
and use it to generate income (Loshin, 2013; Anuradha, 2015), to improve
the overall customer experience, and for their own benefit. There are
many examples of such activities, and their number is
rising as more and more enterprises understand the power of information.
For technology researchers, this poses the challenge of devising more
comprehensive and practical solutions that address current problems
and demands.
The information society is now transitioning to a knowledge-based
society. It needs a greater volume of data to extract better information.
The knowledge society is a society in which data plays a significant role in
the economic, cultural, and political arenas. Thus, big data analytics plays
a significant role in all facets of life.
1.3.1.3 METADATA
Metadata is information that does not represent the data itself but
describes the attributes and structure of a dataset. Metadata tracking is
important to the collection, storage, and interpretation of big data
because it records the origin of the data as well as every step of its
collection (Figure 1.2).
Metadata also helps manage data in certain situations. An application
can, for instance, keep metadata about the resolution of a picture and the
number of colors used. It could instead obtain this detail from the image
itself, but at the expense of a longer loading time.
Figure 1.2 shows an example of metadata containing the resolution and
other characteristics of an image.
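As a minimal sketch of the image example above, the snippet below reads resolution and color information from the header of a PNG file without decoding any pixel data. The choice of PNG, the function name `read_png_metadata`, and the sample values are assumptions made for illustration; they do not come from the book.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour-type codes defined by the PNG format, mapped to readable names.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "indexed",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def read_png_metadata(header: bytes) -> dict:
    """Return width, height, bit depth and colour type from a PNG header.

    Only the first 26 bytes of the file are needed; no pixels are decoded.
    """
    if len(header) < 26 or header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    # Bytes 8..15: length and type of the first chunk, which must be IHDR.
    _length, chunk_type = struct.unpack(">I4s", header[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("IHDR chunk missing")
    # Bytes 16..25: width, height (4 bytes each), bit depth, colour type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, "unknown"),
    }


# Hypothetical header for a 1920x1080, 8-bit truecolor image.
sample_header = PNG_SIGNATURE + struct.pack(">I4sIIBB", 13, b"IHDR", 1920, 1080, 8, 2)
metadata = read_png_metadata(sample_header)
print(metadata)
```

Storing such a record alongside the image lets a catalogue answer questions about resolution or color depth without opening the file again, which is the trade-off the paragraph above describes.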
Now that we have set the groundwork for later discussion, let us move on
to the main characteristics of big data. For a dataset to be called big data,
it must have more than one of the attributes usually referred to as the
five Vs, shown in Figure 1.5 (Renuka Devi and Sasikala, 2020).
These five attributes help distinguish data classified as big from other
data. Doug Laney described several of them in early 2001, when he
published an article explaining the effect of the volume, velocity, and
variety of e-commerce data on enterprise data warehouses.
1.4.1 VOLUME
1.4.2 VELOCITY
Velocity is the pace at which data is produced, or how rapidly it
arrives. In simpler terms, it is data in motion. Imagine the amount of
data received every day from Twitter, YouTube, or other social networking
platforms. These platforms must store it, process it, and be able to
retrieve it later. Here are a few examples of how rapidly data is growing:
• On each trading day, the New York stock exchange collects 1 TB
of data.
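To make a daily volume such as the one above concrete, it can be converted into an average ingest rate. The sketch below is only arithmetic under stated assumptions: 1 TB is taken as 10^12 bytes, and the data is assumed to arrive evenly over a full 24-hour day; the function name is hypothetical.

```python
TERABYTE = 10**12            # assumption: decimal terabyte
SECONDS_PER_DAY = 24 * 60 * 60


def average_throughput_mb_per_s(bytes_received: float, seconds: float) -> float:
    """Average arrival rate in megabytes (10^6 bytes) per second."""
    return bytes_received / seconds / 1e6


rate = average_throughput_mb_per_s(TERABYTE, SECONDS_PER_DAY)
print(f"{rate:.2f} MB/s")  # about 11.57 MB/s sustained over the whole day
```

If the same volume arrives only during trading hours, the required rate is proportionally higher, so the window over which data arrives matters as much as the daily total.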
1.4.3 VARIETIES
1.4.4 VERACITY
1.4.5 VALUE
Value is the most significant V of big data, although it is not specific
to big data; it applies equally to small data. After addressing all the
other Vs (volume, velocity, variety, variability, and veracity), which
takes a lot of time, commitment, and energy, it is time to determine
whether the data is worth storing and whether investing in storage,
either on-premises or in the cloud, is justified.
One aspect of value is that a huge amount of data must be stored before
it can yield valuable information in return. Storing this volume of data
used to carry enormous costs, but storage and retrieval technology is now
far less expensive. An organization wants to be sure that the data brings
it value. The analysis must also satisfy legal considerations.
that characteristic for statistical analysis (Saltz et al., 2020; Michael and
Miller, 2013).
Big data typically refers to data that exceeds the usual storage,
retrieval, and computational capacity of traditional databases and data
mining techniques. Big data requires software and techniques as a resource
for evaluating large-scale data and deriving trends from it (Goul et al.,
2020). The analysis of structured data evolves because of the variety and
velocity of the data being manipulated.
Therefore, the wide variety of data means that the systems in place,
built to aid data processing, are no longer adequate for interpreting
data and generating reports. Analysis consists of efficiently identifying
associations across data that is constantly changing, so that the data can
be put to use.
Big data analytics is the process by which vast datasets are
gathered, processed, and evaluated to uncover trends and
useful knowledge. It refers to a group of tools and techniques
that use new methods of integration to retrieve large volumes
of previously unknown knowledge from datasets that are more complex,
larger in scale, and different in kind from conventional databases. It
focuses mainly on solving new problems, or old problems, in the most
productive and reliable way possible.
Big data is the compilation of large and dynamic databases and data types,
including huge volumes of data, social network data processed by data
mining applications, and real-time data. A vast volume of heterogeneous
digital data exists, in which massive datasets are measured in terabytes
and petabytes (Saltz et al., 2020). The various types of analytics
are discussed below (Figure 1.10).
This consists of posing the question: What’s going on? (Figure 1.11) It
is a preliminary step in the collection of data that provides a historical
dataset. Methods of data mining coordinate knowledge and help discover
Diagnostic analytics answers the question: why did it happen? (Figure 1.12)
It searches for a problem’s root cause and is used to determine why
something happened. This form of analytics seeks to recognize the origins
of incidents and actions and to understand them.
includes past evidence and what could happen are anticipated through
predictive analytics. To identify the right answer, prescriptive analytics
utilizes these criteria.
Streaming analytics is real-time analytics in which data is collected
from sensors or devices (Figure 1.15). This kind of analytics helps
identify and understand patterns in the data as it is generated, and can
provide immediate analysis and solutions.
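One simple way to picture analytics on data as it arrives is a rolling statistic that updates with every new reading. The sketch below is an illustration only: the function `rolling_mean`, the window size, and the sample sensor values are assumptions, and a real deployment would read from a device or message queue rather than a list.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_mean(readings: Iterable[float], window: int) -> Iterator[Tuple[float, float]]:
    """Yield (reading, mean of the most recent `window` readings) per arrival."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]      # the oldest reading is about to be evicted
        recent.append(value)
        total += value
        yield value, total / len(recent)


# Hypothetical temperature readings arriving one at a time.
sensor_stream = [20.0, 21.0, 19.0, 30.0]
results = list(rolling_mean(sensor_stream, window=3))
for reading, mean in results:
    print(f"reading={reading:.1f} rolling_mean={mean:.2f}")
```

Because the function is a generator, each result is available as soon as its reading arrives, and memory use is bounded by the window size rather than by the length of the stream.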
Features
• Benchmarked at processing one million 100-byte messages per second per
node.
• Storm guarantees that each unit of data is processed at least once.
• Excellent horizontal scalability.
• Built-in fault tolerance.
• Automatic restart after crashes.
• Written in Clojure.
• Works with a directed acyclic graph (DAG) topology.
• Output files are in JavaScript Object Notation (JSON) format.
• Numerous use cases, including real-time analytics, log analysis, ETL,
continuous computation, distributed RPC, and deep learning.
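The at-least-once guarantee in the list above can be illustrated with a toy replay loop. This is not Storm's API: the function `process_at_least_once`, the retry limit, and the flaky handler are hypothetical, and the sketch only shows the general idea that a message whose processing is not confirmed gets replayed, which can cause duplicates.

```python
from collections import deque


def process_at_least_once(messages, handler, max_attempts=3):
    """Run handler on every message, replaying any message whose handler fails.

    Returns the messages that still failed after max_attempts tries.
    """
    pending = deque((msg, 1) for msg in messages)
    failed = []
    while pending:
        msg, attempt = pending.popleft()
        try:
            handler(msg)
        except Exception:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))   # replay later
            else:
                failed.append(msg)
    return failed


seen = []                      # every processing attempt, in order
failures_left = {"b": 1}       # "b" fails once, simulating a lost acknowledgement


def flaky_handler(msg):
    seen.append(msg)           # the work happens before the failure is noticed
    if failures_left.get(msg, 0) > 0:
        failures_left[msg] -= 1
        raise RuntimeError("acknowledgement lost")


failed = process_at_least_once(["a", "b", "c"], flaky_handler)
print(seen)     # "b" appears twice: processed at least once, not exactly once
print(failed)
```

The duplicate entry is the price of the guarantee, so handlers in an at-least-once system are usually written to be idempotent.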
1.7.1.2 TALEND
Talend is a big data tool that simplifies and automates big data
integration. Its graphical wizard generates native code. It also
supports big data integration, master data management, and data
quality checks.
Features
• Streamlines ETL and ELT for big data.
• Achieves the speed and scale of Spark.
• Accelerates the move to real-time processing.
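The difference between ETL and ELT mentioned in the feature list is only the point at which the transformation runs. The sketch below is a generic illustration using Python's standard `csv` and `sqlite3` modules, not Talend itself; the table names, the sample data, and the cleaning rule are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract with stray whitespace and one invalid amount.
RAW_CSV = "id,amount\n1, 10.50 \n2,abc\n3,7\n"


def extract(text):
    """Parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def run_etl(conn, text):
    """ETL: clean the rows in application code, then load only valid rows."""
    conn.execute("CREATE TABLE sales_etl (id INTEGER, amount REAL)")
    clean = []
    for row in extract(text):
        try:
            clean.append((int(row["id"]), float(row["amount"].strip())))
        except ValueError:
            continue                      # drop rows that cannot be parsed
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)


def run_elt(conn, text):
    """ELT: load the raw rows as text, then transform inside the database."""
    conn.execute("CREATE TABLE sales_raw (id TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO sales_raw VALUES (?, ?)",
        [(row["id"], row["amount"]) for row in extract(text)],
    )
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT CAST(id AS INTEGER) AS id, CAST(TRIM(amount) AS REAL) AS amount
        FROM sales_raw
        WHERE TRIM(amount) GLOB '[0-9]*'
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_CSV)
run_elt(conn, RAW_CSV)
etl_total = conn.execute("SELECT SUM(amount) FROM sales_etl").fetchone()[0]
elt_total = conn.execute("SELECT SUM(amount) FROM sales_elt").fetchone()[0]
print(etl_total, elt_total)
```

Both routes produce the same cleaned table here; ELT keeps the raw rows in the database, which is useful when the transformation rules are likely to change.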
CHAPTER III.
In the Sixth Book (Chap. iv. Sect. 1.) we have already seen how
the conception on the laws of fluid equilibrium was, by Pascal and
others, extended to air, as well as water. But though air presses and
is pressed as water presses and is pressed, pressure produces upon
air an effect which it does not, in any obvious degree, produce upon
water. Air which is pressed is also compressed, or made to occupy a
smaller space; and is consequently also made more dense, or
condensed; and on the other hand, when the pressure upon a
portion of air is diminished, the air expands or is rarefied. These
broad facts are evident. They are expressed in a general way by
saying that air is an elastic fluid, yielding in a certain degree to
pressure, and recovering its previous dimensions when the pressure
is removed.
But when men had reached this point, the questions obviously
offered themselves, in what degree and according to what law air
yields to pressure; when it is compressed, what relation does the
density bear to the pressure? The use which had been made of
tubes containing columns of mercury, by which the pressure of
portions of air was varied and measured, suggested obvious modes
of devising experiments by which this question might be answered.
Such experiments accordingly were made by Boyle about 1650; and
the result at which he arrived was, that when air is thus compressed,
the density is as the pressure. Thus if the pressure of the
atmosphere in its common state be equivalent to 30 inches of
mercury, as shown by the barometer; if air included in a tube be
pressed by 30 additional inches of mercury, its density will be
doubled, the air being compressed into one half the space. If the
pressure be increased threefold, the density is also trebled; and so
on. The same law was soon afterwards (in 1676) proved
experimentally by Mariotte. And this law of the air’s elasticity, that the
density is as the pressure, is sometimes called the Boylean Law, and
sometimes the Law of Boyle and Mariotte.
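The numerical example in the paragraph above can be checked with a few lines of arithmetic. The sketch below assumes, as the text does, that density is directly proportional to pressure at constant temperature and that the atmosphere is equivalent to 30 inches of mercury; the function names are chosen here for illustration.

```python
def density_ratio(atmosphere_inches: float, added_inches: float) -> float:
    """Density of the enclosed air relative to its uncompressed state."""
    total_pressure = atmosphere_inches + added_inches
    return total_pressure / atmosphere_inches


def volume_fraction(atmosphere_inches: float, added_inches: float) -> float:
    """Volume of the enclosed air as a fraction of its original volume."""
    return 1.0 / density_ratio(atmosphere_inches, added_inches)


# 30 inches of atmosphere plus 30 additional inches of mercury.
print(density_ratio(30, 30), volume_fraction(30, 30))
# Threefold pressure: 30 inches of atmosphere plus 60 additional inches.
print(density_ratio(30, 60))
```

The first case doubles the density and halves the volume, and the second trebles the density, matching the figures given in the text.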
Air retains its aerial character permanently; but there are other
aerial substances which appear as such, and then disappear or
change into some other condition. Such are termed vapors. And the
discovery of their true relation to air was the result of a long course
of researches and speculations.
[2nd Ed.] [It was found by M. Cagniard de la Tour (in 1823), that at
a certain temperature, a liquid, under sufficient pressure, becomes
clear transparent vapor or gas, having the same bulk as the liquid.
This condition Dr. Faraday calls the Cagniard de la Tour state, (the
Tourian state?) It was also discovered by Dr. Faraday that carbonic-
acid gas, and many other gases, which were long conceived to be
permanently elastic, are really reducible to a liquid state by
pressure. 39 And in 1835, M. Thilorier found the means of reducing
liquid carbonic acid to a solid form, by means of the cold produced in
evaporation. More recently Dr. Faraday has added several
substances usually gaseous to the list of those which could
previously be shown in the liquid state, and has reduced others,
including ammonia, nitrous oxide, and sulphuretted hydrogen, to a
solid consistency. 40 After these discoveries, we may, I think,
reasonably doubt whether all bodies are not capable of existing in
the three consistencies of solid, liquid, and air.
39 Phil. Trans. 1823.
We may note that the law of Boyle and Mariotte is not exactly true
near the limit at which the air passes to the liquid state in such cases
as that just spoken of. The diminution of bulk is then more rapid than
the increase of pressure.
De Luc also marked very precisely (as Wallerius had done) the
difference between vapor and air; the former being capable of
change of consistence by cold or pressure, the latter not so. Pictet,
in 1786, made a hygrometrical experiment, which appeared to him to
confirm De Luc’s views; and De Luc, in 1792, published a concluding
essay on the subject in the Philosophical Transactions. Pictet’s
Essay on Fire, in 1791, also demonstrated that “all the train of
hygrometrical phenomena takes place just as well, indeed rather
quicker, in a vacuum than in air, provided the same quantity of
moisture is present.” This essay, and De Luc’s paper, gave the
death-blow to the theory of the solution of water in air.
Yet this theory did not fall without an obstinate struggle. It was
taken up by the new school of French chemists, and connected with
their views of heat. Indeed, it long appears as the prevalent opinion.
Girtanner, 47 in his Grounds of the Antiphlogistic Theory, may be
considered as one of the principal expounders of this view of the
matter. Hube, of Warsaw, was, however, the strongest of the
defenders of the theory of solution, and published upon it repeatedly
about 1790. Yet he appears to have been somewhat embarrassed
with the increase of the air’s elasticity by vapor. Parrot, in 1801,
proposed another theory, maintaining that De Luc had by no means
successfully attacked that of solution, but only De Saussure’s
superfluous additions to it.
47 Fischer, vol. vii. 473.
The other difficulty was first fully removed by Mr. Dalton. When his
attention was drawn to the subject of vapor, he saw insurmountable
objections to the doctrine of a chemical union of water and air. In
fact, this doctrine was a mere nominal explanation; for, on closer
examination, no chemical analogies supported it. After some
reflection, and in the sequel of other generalizations concerning
gases, he was led to the persuasion, that when air and steam are
mixed together, each follows its separate laws of equilibrium, the
particles of each being elastic with regard to those of their own kind
only: so that steam may be conceived as flowing among the particles
of air 50 “like a stream of water among pebbles;” and the resistance
which air offers to evaporation arises, not from its weight, but from
the inertia of its particles.
50 Manchester Memoirs, vol. v. p. 581.
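Dalton's view, that each gas in a mixture follows its own law of equilibrium, is commonly expressed as the law of partial pressures: the total pressure of the mixture is the sum of the pressures each component would exert alone. The sketch below illustrates this with hypothetical figures in inches of mercury; the numbers and the function name are assumptions, not values from the text.

```python
def total_pressure(partial_pressures: dict) -> float:
    """Total pressure of a gas mixture as the sum of its partial pressures."""
    return sum(partial_pressures.values())


# Hypothetical mixture, pressures in inches of mercury.
mixture = {"dry air": 29.0, "water vapor": 0.5}
total = total_pressure(mixture)
print(total)
print({gas: p / total for gas, p in mixture.items()})  # share of each component
```

On this reading the vapor adds its own elasticity to that of the air, which is the increase that the solution theory found difficult to explain.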