RESEARCH PRACTITIONER’S
HANDBOOK ON BIG DATA ANALYTICS

S. Sasikala, PhD
Renuka Devi D, PhD

Raghvendra Kumar, PhD


Editor
First edition published 2023
Apple Academic Press Inc.
1265 Goldenrod Circle, NE, Palm Bay, FL 32905 USA
760 Laurentian Drive, Unit 19, Burlington, ON L7N 0A4, CANADA

CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742 USA
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN UK

© 2023 by Apple Academic Press, Inc.


Apple Academic Press exclusively co-publishes with CRC Press, an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the authors, editors, and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors, editors, and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write
and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright
Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on
CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification
and explanation without intent to infringe.

Library and Archives Canada Cataloguing in Publication


Title: Research practitioner's handbook on big data analytics / S. Sasikala, PhD, D. Renuka Devi, Raghvendra Kumar, PhD.
Names: Sasikala, S., 1970- author. | Devi, D. Renuka, 1980- author. | Kumar, Raghvendra, 1987- author.
Description: First edition. | Includes bibliographical references and index.
Identifiers: Canadiana (print) 20220437475 | Canadiana (ebook) 20220437491 | ISBN 9781774910528 (hardcover) |
ISBN 9781774910535 (softcover) | ISBN 9781003284543 (ebook)
Subjects: LCSH: Big data—Handbooks, manuals, etc. | LCSH: Big data—Research—Handbooks, manuals, etc. |
LCSH: Data mining—Handbooks, manuals, etc. | LCSH: Electronic data processing—Handbooks, manuals, etc. |
LCGFT: Handbooks and manuals.
Classification: LCC QA76.9.B45 S27 2023 | DDC 005.7—dc23
Library of Congress Cataloging-in-Publication Data

CIP data on file with US Library of Congress

ISBN: 978-1-77491-052-8 (hbk)


ISBN: 978-1-77491-053-5 (pbk)
ISBN: 978-1-00328-454-3 (ebk)
About the Authors

S. Sasikala, PhD
S. Sasikala, PhD, is Associate Professor and Research Supervisor in
the Department of Computer Science, IDE, and Director of Network
Operation and Edusat Programs at the University of Madras, Chennai,
India. She has 23 years of teaching experience and has coordinated
computer-related courses with dedication and sincerity. She has acted as
Head-in-charge of the Centre for Web-based Learning for three years,
beginning in 2019. She holds various posts at the university, including
Nodal Officer for the UGC Student Redressal Committee, Coordinator
for Online Course Development at IDE, and President of the Alumni
Association at IDE. She has been an active chair in various Board of Studies
meetings held at the institution and has acted as an advisor for research.
She has participated in administrative activities and contributes
enthusiastically to research by guiding research scholars, writing and
editing textbooks, and consistently publishing articles in many reputed
journals. Her research interests include image, data mining,
machine learning, networks, big data, and AI. She has published two
books in the domain of computer science, over 27 research articles in
leading journals and conference proceedings, and four book chapters,
including in publications from IEEE, Scopus, Elsevier, Springer, and
Web of Science. She has also received best paper awards and women’s
achievement awards. She is an active reviewer and editorial member for
international journals and conferences. She has been invited to give
talks on various emerging topics and has chaired sessions at
international conferences.

Renuka Devi D, PhD


Renuka Devi D, PhD, is Assistant Professor in the Department of
Computer Science, Stella Maris College (Autonomous), Chennai, India.
She has 12 years of teaching experience. Her research interests include
data mining, machine learning, big data, and AI. She actively participates
in continued learning through conferences and professional research. She
has published eight research papers and a book chapter in publications
from IEEE, Scopus, and Web of Science. She has also presented papers
at international conferences and received best paper awards.
About the Editor

Raghvendra Kumar, PhD


Raghvendra Kumar, PhD, is Associate Professor in the Computer Science
and Engineering Department at GIET University, India. He was formerly
associated with the Lakshmi Narain College of Technology, Jabalpur,
Madhya Pradesh, India. He also serves as Director of the IT and Data
Science Department at the Vietnam Center of Research in Economics,
Management, Environment, Hanoi, Viet Nam. Dr. Kumar serves as Editor
of the book series Internet of Everything: Security and Privacy Paradigm
(CRC Press/Taylor & Francis Group) and the book series Biomedical
Engineering: Techniques and Applications (Apple Academic Press). He
has published a number of research papers in international journals and
conferences. He has served in many roles for international and national
conferences, including organizing chair, volume editor,
keynote speaker, session chair or co-chair, publicity chair, publication
chair, advisory board member, and technical program committee member.
He has also served as a guest editor for many special issues of reputed
journals. He has authored and edited over 20 computer science books in the
fields of Internet of Things, data mining, biomedical engineering, big data,
robotics, graph theory, and Turing machines. He is the Managing Editor of the
International Journal of Machine Learning and Networked Collaborative
Engineering. He received a best paper award at the IEEE Conference
2013 and the Young Achiever Award–2016 from the IEAE Association for his
research work in the field of distributed databases. His research areas are
computer networks, data mining, cloud computing and secure multiparty
computations, theory of computer science, and design of algorithms.
Contents

Abbreviations
Preface
Introduction

1. Introduction to Big Data Analytics
Abstract
1.1 Introduction
1.2 Wider Variety of Data
1.3 Types and Sources of Big Data
1.4 Characteristics of Big Data
1.5 Data Property Types
1.6 Big Data Analytics
1.7 Big Data Analytics Tools with Their Key Features
1.8 Techniques of Big Data Analysis
Keywords
References

2. Preprocessing Methods
Abstract
2.1 Data Mining: Need of Preprocessing
2.2 Preprocessing Methods
2.3 Challenges of Big Data Streams in Preprocessing
2.4 Preprocessing Methods
Keywords
References

3. Feature Selection Methods and Algorithms
Abstract
3.1 Feature Selection Methods
3.2 Types of FS
3.3 Online FS Methods
3.4 Swarm Intelligence in Big Data Analytics
3.5 Particle Swarm Optimization
3.6 Bat Algorithm
3.7 Genetic Algorithms
3.8 Ant Colony Optimization
3.9 Artificial Bee Colony Algorithm
3.10 Cuckoo Search Algorithm
3.11 Firefly Algorithm
3.12 Grey Wolf Optimization Algorithm
3.13 Dragonfly Algorithm
3.14 Whale Optimization Algorithm
Keywords
References

4. Big Data Streams
Abstract
4.1 Introduction
4.2 Stream Processing
4.3 Benefits of Stream Processing
4.4 Streaming Analytics
4.5 Real-Time Big Data Processing Life Cycle
4.6 Streaming Data Architecture
4.7 Modern Streaming Architecture
4.8 The Future of Streaming Data in 2019 and Beyond
4.9 Big Data and Stream Processing
4.10 Framework for Parallelization on Big Data
4.11 Hadoop
Keywords
References

5. Big Data Classification
Abstract
5.1 Classification of Big Data and its Challenges
5.2 Machine Learning
5.3 Incremental Learning for Big Data Streams
5.4 Ensemble Algorithms
5.5 Deep Learning Algorithms
5.6 Deep Neural Networks
5.7 Categories of Deep Learning Algorithms
5.8 Application of DL-Big Data Research
Keywords
References

6. Case Study
6.1 Introduction
6.2 Healthcare Analytics: Overview
6.3 Big Data Analytics Healthcare Systems
6.4 Healthcare Companies Implementing Analytics
6.5 Social Big Data Analytics
6.6 Big Data in Business
6.7 Educational Data Analytics
Keywords
References

Index


Abbreviations

ABC Artificial Bee Colony
ACO Ant Colony Optimization
AI Artificial Intelligence
ANNs Artificial Neural Networks
APIs Application Programming Interfaces
BA Bat Algorithm
BI Business Intelligence
BMU Best Matching Unit
CDC Centers for Disease Control and Prevention
CHOA Children’s Healthcare of Atlanta
CNNs Convolutional Neural Networks
CT Computed Tomography
DA Dragonfly Algorithm
DBNs Deep Belief Networks
DL Deep Learning
EMRs Electronic Medical Records
ETL Extract–Transform–Load
FHCN Family HealthCare Network
FS Feature Selection
GA Genetic Algorithms
GANs Generative Adversarial Networks
GWO Grey Wolf Optimization Algorithm
HDFS Hadoop Distributed File System
HQL Hive Query Language
JSON JavaScript Object Notation
KNN K-Nearest Neighbor
KPIs Key Performance Indicators
LR Linear Regression
LSTMs Long Short-Term Memory networks
MDP Markov Decision Process
ML Machine Learning
MLP Multilayer Perceptron
NPCR National Program of Cancer Registries
OFS Online Feature Selection
OLAP Online Analytical Processing
PSO Particle Swarm Optimization
RBFNs Radial Basis Function Networks
RBMs Restricted Boltzmann Machines
ReLU Rectified Linear Unit
RNNs Recurrent Neural Networks
SI Swarm Intelligence
SOMs Self-Organizing Maps
SVM Support Vector Machine
WOA Whale Optimization Algorithm
Preface

Academicians, researchers, and students around the globe are showing
unprecedented interest in big data analytics. Over the decades, groups
engaged in big data analytics knowledge transfer have worked intensively
to shape its various nuances and techniques and to deliver them across the
country. This would not have been possible without the Midas touch of
researchers who have carried out extensive research across domains and
connected big data analytics with other evolving technologies. This book
discusses major contributions and perspectives in big data research and
how these concepts serve global markets (the IT industry) in laying concrete
foundations on the same technology. We hope that this book will offer a
wider perspective to researchers and academicians in all walks of life, on
par with their ways of using big data analytics in both theory and
practice.
Introduction

With recent developments in the digital era, data is growing at a rapid
rate. Big data refers to collections of data that are so large and intricate
that conventional database management systems and data processing
tools cannot process them. This book focuses on the core concepts of
big data analytics, along with its tools, techniques, and methodologies,
from a research perspective. It covers both theoretical and practical
approaches so as to serve a broad spectrum of readers, and it aims to be
a comprehensive handbook for the research domain of big data
analytics.
Chapter 1 introduces the fundamentals of big data, its terminology,
the types of analytics, and big data tools and techniques.
Chapter 2 outlines the need for preprocessing data and the various methods
for doing so. Both text and image preprocessing methods are highlighted,
and the challenges of processing streaming data are also discussed.
Chapter 3 describes various feature selection methods and algorithms,
and discusses the research problems related to each category with specific
examples.
Chapter 4 describes the core methods of big data streams and the
prerequisites for parallelization. It also explains the streaming
architecture and covers the Hadoop architecture comprehensively, along
with the components of parallel processing.
Chapter 5 covers big data classification techniques and explains various
learning methodologies with examples. It also introduces deep learning
algorithms and architectures.
Chapter 6 highlights applications across verticals, with research problems
and solutions.
CHAPTER 1

Introduction to Big Data Analytics

ABSTRACT

This chapter introduces the fundamentals of big data analytics, its
terminology, the types of data analytics, and big data tools and techniques.
The fundamentals extend to understanding the various types of big data,
its five V's characteristics, and the sources of big data. The core types of
big data analytics are explained with examples. Big data analytics refers to
a group of tools and techniques that use new forms of integration to
retrieve vast volumes of hidden knowledge from large datasets that are
more complex, larger in scale, and distinct from conventional databases. In
recent times, various tools have been developed for deeper analytics and
visualization. The commonly used big data analytics tools are discussed
in detail, covering their key features and applications and highlighting
their potential advantages.

1.1 INTRODUCTION

The term “big data” applies to the evolution and application of technologies
that provide people with the right knowledge at the right moment from a
mass of data that has long been growing exponentially in our society.
The task is to deal not only with exponentially growing data volumes
but also with the complexities of increasingly heterogeneous formats and
increasingly dynamic and integrated data management (Anuradha, 2015).
The meaning of the term varies depending on whether the group concerned
is a customer or a service provider.


Big data, a concept introduced by the Internet giants, presents itself as a
solution intended to give anyone access to giant datasets in real time. Big
data is difficult to define exactly because the very notion of “big” differs
from one field to another in terms of data volume. It does not denote a
single technology but a group of techniques and technologies.
This is an evolving field, and the meaning shifts as we learn how
to apply this new concept and leverage its value. The digital data generated
is partly the product of the use of Internet-connected devices. Smartphones,
laptops, and computers thus relay information about their users.
Connected smart objects relay information about how users use everyday
objects.
In addition to connected devices, information comes from a broad
variety of sources: population data, climate data, scientific and medical
data, data on energy use, and so on. The data includes information on the
location, travel, preferences, usage patterns, recreational activities, and
ventures of smartphone users, among other things. Information about how
infrastructure, machinery, and facilities are used is also available (Loshin,
2013). The amount of digital data is rising exponentially with the
ever-increasing number of Internet and cell phone users.
In the last decade, the amount of data that one must contend with has
risen to unprecedented heights and, at the same time, the price of data
storage has steadily decreased. Private firms and academic organizations
collect terabytes of data regarding the experiences of their customers,
industry, social media, and sensors from devices such as cell phones and
cars. The challenge of this era is to make sense of this sea of data.
This is where big data analytics comes into focus. Big data analytics
primarily means gathering data from multiple sources, combining it in a
form that analysts can consume, and eventually delivering data products
that benefit the business of the enterprise. The essence of big data
analytics is the process of transforming vast volumes of unstructured raw
data retrieved from multiple sources into a data product usable by
organizations.
It is important to use this knowledge for predictive forecasting, for
marketing, and for many other purposes. Such tasks cannot be executed
within the required time frame with conventional solutions, because their
storage and computing capacity is not suited to tasks of this kind. A
clearer description helps explain the meaning of big data; a more reliable
definition is given by Kim and Pries (2015):

Data that is vast in scale concerning the retrieval system, with several
organized and unstructured data to be processed comprising multiple data
patterns. Data is registered, processed, and analyzed from traffic flows
and downloading of music, to web history and medical information, to
allow infrastructure and utilities to deliver the measurable performance
that the world depends on every day. If it just keeps hanging on to the
information without analyzing it, or if it does not store the information,
finding it to be of little use, it could be to the detriment of the organization.

Businesses track everything that users do on their websites and use this
information to generate revenue (Loshin, 2013; Anuradha, 2015), to improve
the overall customer experience, and for their own benefit. Examples of
such activities are plentiful, and they are increasing as more and more
enterprises understand the power of information.
For technology researchers, this poses the challenge of coming up with more
comprehensive and practical solutions that can address current problems
and demands.
The information society is now transitioning to a knowledge-based
society. Such a society needs a greater volume of data to extract better
information. The knowledge society is a society in which data plays a
significant role in the economic, cultural, and political arenas. Thus, big
data analytics plays a significant role in all facets of life.

1.2 WIDER VARIETY OF DATA

The range of data sources continues to grow. Internal operational
systems, such as enterprise resource planning and CRM applications, have
traditionally been the main source of data used in predictive analysis
(Kim and Pries, 2015). However, the variety of data sources that feed
into analytical processes is increasing, in order to expand information and
understanding, and now covers a broader spectrum of sources such as:
• Internet data (e.g., clickstream, social networking, links to social
networks).
• Primary research (e.g., polls, studies, observations).
• Secondary research (e.g., demand and competitive data, business
surveys, customer data, organization data).
• Location data (e.g., data from smart devices, geospatial data).
• Image data (e.g., film, monitoring, satellite images).
• Supply chain data (e.g., electronic data interchange (EDI),
distributor catalogs and prices, consistency data).
• Device data (e.g., sensors, programmable logic controllers (PLCs),
radio frequency (RF) devices, laboratory information management
systems (LIMS), telemetry).
The broad spectrum of data adds to the problems of ingesting the data
into data storage. The variety of data also complicates the transformation
(that is, converting the data into a form that can be used in analytics
processing) and the computational workload of processing the data.
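
The transformation step mentioned above can be illustrated with a small
sketch. It is only an illustration under stated assumptions: the two inputs
(a clickstream export in CSV and a device feed in JSON lines), the field
names, and the common record shape are all hypothetical and do not come
from any particular tool.

```python
import csv
import io
import json

# Hypothetical raw inputs: a clickstream export (CSV) and a device feed (JSON lines).
CLICKSTREAM_CSV = (
    "user,page,ts\n"
    "alice,/home,2023-01-01T10:00:00\n"
    "bob,/cart,2023-01-01T10:05:00\n"
)
SENSOR_JSONL = '{"device": "plc-7", "reading": 21.5, "time": "2023-01-01T10:01:00"}\n'


def from_clickstream(text):
    """Map each CSV row onto the common (source, entity, timestamp, payload) shape."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"source": "clickstream", "entity": row["user"],
               "timestamp": row["ts"], "payload": {"page": row["page"]}}


def from_sensors(text):
    """Map each JSON line onto the same common shape."""
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        yield {"source": "sensor", "entity": rec["device"],
               "timestamp": rec["time"], "payload": {"reading": rec["reading"]}}


def ingest():
    """Combine both sources into one list ordered by time."""
    records = list(from_clickstream(CLICKSTREAM_CSV)) + list(from_sensors(SENSOR_JSONL))
    # ISO 8601 timestamps written in the same format sort correctly as strings.
    return sorted(records, key=lambda r: r["timestamp"])
```

Each new source needs its own mapping function, which is one reason a
wider variety of data makes ingestion and transformation more costly.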
Big data is considered a collection of vast and dynamic datasets that are
challenging to store and process using conventional databases and
data analysis techniques. Big data is obtained from both conventional and
modern channels and, when accurately refined, can be used for study and
review. Organizations grow over time, and the data they produce grows
rapidly with them.
The challenge is to provide a platform that presents the entire data in a
single, coherent view. Another task is to organize this data in detail
so that it makes sense and is beneficial. Big data is constantly generated
all around us. Social networking platforms and digital outlets handle much
of this immense volume of data. Sensors, mobile devices, and networks are
the key to how this massive volume of data is transmitted (Loshin, 2013).

1.3 TYPES AND SOURCES OF BIG DATA

1.3.1 TYPES OF DATA

The overview of categories of data is presented in this section.

1.3.1.1 STRUCTURED DATA

Data that is organized in a predefined schema and has a fixed format is
considered structured data (Figure 1.1). Examples of structured data
include data from conventional databases and sources such as mainframes,
SQL Server, Oracle, DB2, Sybase, Access, Excel, txt files, and Teradata.
Relational database management systems deal mostly with this kind of
data.

FIGURE 1.1 Structured data.
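
As a minimal sketch of what a predefined schema means in practice, the
example below uses Python's built-in sqlite3 module. The `customer` table,
its columns, and the sample rows are hypothetical and chosen only for
illustration.

```python
import sqlite3


def build_customer_table():
    """Create an in-memory table whose columns and types are fixed up front."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " city TEXT NOT NULL,"
        " balance REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (id, name, city, balance) VALUES (?, ?, ?, ?)",
        [(1, "Asha", "Chennai", 120.0),
         (2, "Ravi", "Hanoi", 75.5),
         (3, "Mei", "Chennai", 300.25)],
    )
    return conn


def total_balance_by_city(conn):
    """Because every row has the same shape, aggregation is a single query."""
    rows = conn.execute(
        "SELECT city, SUM(balance) FROM customer GROUP BY city ORDER BY city"
    )
    return dict(rows.fetchall())


def rejects_incomplete_row(conn):
    """A row that violates the schema (missing name) is refused by the database."""
    try:
        conn.execute(
            "INSERT INTO customer (id, name, city, balance) VALUES (4, NULL, 'Oslo', 1.0)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

The fixed format is what lets a relational system validate rows on insert
and answer aggregate queries without inspecting each record's layout.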

1.3.1.2 MULTISTRUCTURED DATA

Multistructured data is unmodeled: even when a schema may exist, the data
still has to be organized before use and is often overlooked. It arises
from human and machine interactions, and includes data on developing
economies, e-commerce, and other third-party data, such as temperature,
currency conversion, demographics, tables, etc.

1.3.1.3 METADATA

Metadata is information that is not the data itself but describes the
attributes and structure of a dataset. Tracking metadata is important to
the collection, storage, and interpretation of big data because it
records the origin of the data, as well as all the steps taken to
collect it (Figure 1.2).
Metadata helps manage data in certain situations. An application can,
for instance, keep metadata about the resolution of a picture and the
number of colors used. It could instead obtain these details from the
image itself, but at the expense of a longer loading time.
Figure 1.2 shows an example of metadata that contains the details of
an image, its resolution, and its other characteristics.
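
The idea of keeping metadata alongside an image can be sketched in a few lines of Python. This is only an illustration: the `ImageMetadata` class, its field names, and the sample values are assumptions made for this example and do not come from any particular library.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ImageMetadata:
    """Hypothetical record that describes an image; it is not the image itself."""
    filename: str
    width_px: int
    height_px: int
    color_count: int

    def resolution(self) -> str:
        # Answer a common question from metadata alone, without loading pixels.
        return f"{self.width_px}x{self.height_px}"


# Illustrative values only; a real system would fill these in at ingestion time.
catalog = {
    "scan_001.png": ImageMetadata("scan_001.png", 1920, 1080, 256),
}

meta = catalog["scan_001.png"]
print(meta.resolution())
print(asdict(meta))
```

Looking up the record in `catalog` avoids opening the image file, which is the trade-off the paragraph above describes.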

FIGURE 1.2 Metadata.



1.3.1.4 UNSTRUCTURED DATA

Data that has no predefined schema is considered unstructured data
(Figure 1.3), and it is not possible to store or query such data using
conventional databases or data models. Social networking data belongs
to this category, such as chatter, text analytics, blogs, messages,
mentions, clicks, marks, etc.
The relationship between structured data (data that is easy to describe,
index, and analyze) and unstructured data (data that tends to avoid easy
classification, requires a lot of storage space, and is typically more difficult
to analyze) is “level set” (Engelbrecht and Preez, 2020).

FIGURE 1.3 Unstructured data.

Unstructured data is content that lacks a predetermined data model or
does not fit well into a relational database. Text-heavy unstructured data
is common, but it may also contain information like dates, numbers, and
facts.
The term semistructured data describes structured data that does not
conform to a formal data model. It nevertheless contains tags that
separate textual elements, which makes it possible to enforce
hierarchies within the data.
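
A short Python sketch can make the notion of tags and hierarchies concrete. JSON is used here as one common semistructured format; the record, its field names, and the `total_quantity` helper are hypothetical and chosen only for illustration.

```python
import json

# Illustrative record: keys act as tags that label each element,
# and nesting expresses the hierarchy within the data.
raw = """
{
  "order_id": 1001,
  "customer": {"name": "A. Kumar", "city": "Chennai"},
  "items": [
    {"sku": "BK-12", "qty": 2},
    {"sku": "PN-07", "qty": 5, "gift_wrap": true}
  ]
}
"""

order = json.loads(raw)


def total_quantity(record: dict) -> int:
    """Sum the quantities of all items in one order record."""
    return sum(item["qty"] for item in record["items"])


# Records need not share identical fields: "gift_wrap" appears on one item only.
wrapped = [item["sku"] for item in order["items"] if item.get("gift_wrap", False)]

print(order["customer"]["city"])
print(total_quantity(order))
print(wrapped)
```

No schema is declared anywhere, yet the tags let a program navigate the hierarchy, which is what separates semistructured data from plain unstructured text.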
• Every 2 years, the amount of data (all data, everywhere) doubles.
• The world is becoming more transparent. People are growing more
comfortable parting with data that used to feel sacred and
confidential, and are starting to accept this.
• The latest data is largely unstructured. Unstructured data accounts
for almost 95% of the latest data, while structured data accounts for
just 5%.
• Unlike structured data, which tends to grow linearly, unstructured
data tends to grow far more rapidly.
Unstructured data is therefore vastly underused. Imagine large
reserves of oil or other natural resources that are ready to be exploited
but are just lying there. That is the current state of unstructured data.
Tomorrow will be a different story, because there is a lot of profit to
be made by clever people and businesses that can effectively exploit
unstructured data.

1.3.2 SOURCES OF BIG DATA

The various sources of big data are given in Figure 1.4.


Social media: Big data firms such as Facebook and Google collect
data from every operation they carry out. Facebook, Twitter,
LinkedIn, journals, SlideShare, Instagram, chatter, WordPress, Jive, and
so on are other examples.
Public web: This includes Wikipedia info, healthcare, the World Bank
data, economy, weather, traffic, and so on.
Archives: This covers records with all materials, such as medical records,
consumer communications, insurance reports, documents scanned, etc.
Docs: Big data sources include documents in all formats, such as HTML,
CSV, PDF, XLS, Word, XML, etc.
Media: Images, videos, audio, live streams, podcasts, etc.
Data storage: The various databases and file systems used to store
data also serve as a source of big data.
Machine log information: System data, device logs, audit logs,
detailed call records (CDRs), various smartphone applications,
mobile locations, etc.
Sensor information: Information from systems connected to medical
devices, road monitors, rockets, traffic surveillance software, video
games, home appliances, air-conditioning equipment, office buildings,
etc.
Thus, big data is a combination of unstructured, structured, and
multistructured data.

FIGURE 1.4 Sources of big data.

1.4 CHARACTERISTICS OF BIG DATA

Now that we have set the groundwork for later discussion, let us move on
to the main characteristics of big data. For a dataset to be called big
data, it must have more than one of the attributes usually referred to as
the five Vs, shown in Figure 1.5 (Renuka Devi and Sasikala, 2020).
These five attributes help distinguish data classified as big from
other kinds of data. Doug Laney first described several of them in
early 2001, when he published an article explaining the effect of the
volume, velocity, and variety of e-commerce data on enterprise data
warehouses.

FIGURE 1.5 Five Vs of big data.

Veracity was later added to this list to account for data with a low
signal-to-noise ratio. The goal of using big data is to carry out data
analysis in such a way that high-quality results are produced in a
timely fashion, giving the organization the greatest possible value.

1.4.1 VOLUME

In earlier years, business data meant only the data created by a
company's own employees. Now, as the use of technology increases, it is
not just employee-generated data but also machine-generated data from
the systems used by businesses and their customers. Also, with the growth
of social networking and other Internet tools, people are sharing and
uploading a great deal of content: images, photographs, tweets, and so
on. Just imagine, the population of the planet is 7 billion, and nearly
6 billion of them have mobile phones.
There are many sensors in the cell phone itself, such as a gyro-meter,
which generate data for each event, and this data is now being collected
and analyzed (Saltz et al., 2020). Table 1.1 lists memory sizes to give
an understanding of the conversions between the different units.

TABLE 1.1 Memory Sizes.

Size                 Equivalent
1 bit                Binary digit
8 bits               1 byte
1024 bytes           1 KB (kilobyte)
1024 KB              1 MB (megabyte)
1024 MB              1 GB (gigabyte)
1024 GB              1 TB (terabyte)
1024 TB              1 PB (petabyte)
1024 PB              1 EB (exabyte)
1024 EB              1 ZB (zettabyte)
1024 ZB              1 YB (yottabyte)
1024 YB              1 brontobyte
1024 brontobytes     1 geopbyte
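
The conversions in Table 1.1 all follow the same rule: each unit is 1024 times the previous one. A minimal Python sketch of that rule is shown below; the function names are assumptions made for this example, and the list stops at the yottabyte because the larger names in the table are informal.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest listed unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


def to_bytes(value: float, unit: str) -> float:
    """Convert a value in the given unit to bytes (each step is a factor of 1024)."""
    return value * 1024 ** UNITS.index(unit)


print(human_readable(1024))
print(human_readable(to_bytes(600, "TB")))
```

For example, `to_bytes(1, "PB")` multiplies by 1024 five times, once for each row between bytes and petabytes in the table.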

In the big data context, volume refers to a quantity of data so large,
relative to the processing system, that it cannot be captured, stored,
and analyzed using conventional methods. It covers both data at rest
that has already been gathered and data that is continuously produced
by streaming.
Take Facebook, for example. It has 2 billion active users who
constantly share their status, images, and videos, comment on each
other's posts, and record likes, dislikes, and many more things on this
social networking platform. According to statistics given by Facebook,
600 TB of data is ingested into its servers every day. Figure 1.6 shows
a graph of how data has grown in previous years, where it stands at
present, and where it is heading in the future (Michael and Miller,
2013).
Take another example, that of an airplane. One statistic indicates
that it produces 10 TB of data for every hour of flight time. Now think
of how, with thousands of flights per day, the volume of data produced
will exceed several petabytes per day (Michael and Miller, 2013).
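
The airplane estimate can be checked with simple arithmetic. The sketch below uses the 10 TB figure quoted in the text; the number of flights and the number of flight-time intervals per flight are assumptions chosen only to illustrate the calculation.

```python
TB_PER_FLIGHT_INTERVAL = 10   # figure quoted in the text, per interval of flight time
INTERVALS_PER_FLIGHT = 2      # assumption for illustration
FLIGHTS_PER_DAY = 5_000       # assumption standing in for "thousands of flights per day"
TB_PER_PB = 1024              # conversion factor from Table 1.1


def daily_volume_pb(flights: int, intervals: int, tb_per_interval: float) -> float:
    """Total data produced per day, expressed in petabytes."""
    total_tb = flights * intervals * tb_per_interval
    return total_tb / TB_PER_PB


daily_pb = daily_volume_pb(FLIGHTS_PER_DAY, INTERVALS_PER_FLIGHT, TB_PER_FLIGHT_INTERVAL)
print(f"{daily_pb:.1f} PB per day")
```

Under these assumed inputs the total is 100,000 TB, which is a little under 100 PB per day, consistent with the claim that the volume runs to many petabytes.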

FIGURE 1.6 Data growth.

The data produced in the last 2 years amounts to 90% of all the data
ever created, and the world's data doubles every 1.2 years. One study
further states that by 2020, 40 zettabytes of data will be produced. Not
that long ago, storing such a large volume of data was considered a
challenge because the cost of storage was very high.
But now that the cost of storage is declining, it is no longer a problem.
Solutions such as Hadoop, along with numerous algorithms that assist in
ingesting and analyzing this immense volume of data, even make it look
like a resource.
Velocity is the second characteristic of big data. Let us explore what
it is.

1.4.2 VELOCITY

Velocity is the pace at which data is produced, or how rapidly the data
arrives. In simpler terms, it is data in motion. Imagine the amount of
data received every day by Twitter, YouTube, or other social networking
platforms. They must store it, process it, and be able to retrieve it
later. Here are a few examples of how rapidly data is growing:
• On each trading day, the New York Stock Exchange collects 1 TB
of data.
• One hundred and twenty hours of video are uploaded to YouTube
every minute.
• Modern vehicles generate data through approximately 100 sensors
that track everything from fuel and tire pressure to surrounding
obstacles.
• Every minute, 200 million emails are sent.
• With the growth of social networking, more data means more
revealing information about groups of people in different regions.
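
To get a feel for these rates, the per-minute and per-day figures can be converted to per-second rates. The sketch below uses the email and stock exchange figures quoted above; the length of a trading day (6.5 hours) and the use of binary units are assumptions made for this example.

```python
EMAILS_PER_MINUTE = 200_000_000   # figure quoted in the text
NYSE_TB_PER_TRADING_DAY = 1       # figure quoted in the text
TRADING_HOURS_PER_DAY = 6.5       # assumption: a regular 9:30 to 16:00 session
MB_PER_TB = 1024 * 1024           # binary units, as in Table 1.1


def per_second(count_per_minute: float) -> float:
    """Convert a per-minute count into a per-second rate."""
    return count_per_minute / 60


def mb_per_second(tb_per_day: float, hours_per_day: float) -> float:
    """Average throughput in MB/s for data collected over the given hours."""
    return tb_per_day * MB_PER_TB / (hours_per_day * 3600)


emails_per_second = per_second(EMAILS_PER_MINUTE)
nyse_mb_per_second = mb_per_second(NYSE_TB_PER_TRADING_DAY, TRADING_HOURS_PER_DAY)
print(f"{emails_per_second:,.0f} emails per second")
print(f"{nyse_mb_per_second:.1f} MB per second")
```

Expressing velocity as a sustained per-second rate is often more useful for sizing a processing system than a daily total.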
Figure 1.7 shows the amount of time people spend on popular social
networking websites. Based on these user habits, imagine the
frequency at which data is produced. This is merely a snapshot of what
is going on out there. The period over which data makes sense and
remains useful is another component of velocity.
Will the data age and lose value over time, or will it remain valuable
permanently? This matters because data that has aged and lost its
validity can mislead the analysis. We have addressed two characteristics
of big data so far. Variety is the third one. Now let us explore it.

FIGURE 1.7 Velocity of data.



1.4.3 VARIETIES

This section reviews the classification of data. Data may be structured
or unstructured. Structured data has a predefined schema or a data
model with predefined columns, data types, and so on; unstructured data,
on the other hand, has none of these attributes.
Figure 1.8 gives a long list of such data, including papers, addresses,
text messages from social media, photographs, still images, audio,
graphs, and sensor output from all kinds of machine-generated sources,
such as computers, RFID tags, machine logs, and GPS signals from mobile
phones. Structured and unstructured data are discussed in more detail
in separate chapters of this book.

FIGURE 1.8 Data variation.

Let us take one example: 30 billion pieces of content are posted on
Facebook every month. There are 400 million messages sent every day.
Every month, 4 billion hours of video are viewed on YouTube. These are
all examples of unstructured data being generated that needs to be
processed, either for a better customer experience or to generate
revenue for the businesses themselves. Veracity is the fourth
characteristic of big data.

1.4.4 VERACITY

This characteristic deals with data uncertainty. The uncertainty may be
due to poor data quality or to noise in the data. It is human nature
not to fully trust the information given to us. This is one of the main
reasons why one in three company leaders do not trust the information
they use for decision-making.
In a sense, velocity and variety depend on the data being clean before
analysis and decision-making, whereas veracity is the opposite of these
characteristics, as it arises from data uncertainty.
The sources of data veracity are presented in Figure 1.9.

FIGURE 1.9 Veracity—sources.

The biggest problem is that there is no time to clean up streaming or
high-speed data to remove uncertainty. Data such as event data is
produced by machines and sensors, and if we wait to clean and process
it first, the data can lose its value. So we must process it as is,
taking the uncertainty into account.
Veracity is all about uncertainty and how much confidence one has in
the data, so in the big data context we may have to redefine what
trustworthy data means. In my view, it concerns the way data is used or
analyzed to make decisions: the trust one has in the data affects the
value and impact of the decisions made. Now let us look at the fifth
characteristic of big data, which is value.

1.4.5 VALUE

In terms of big data, this is the most significant V, but it is not
specific to big data; it is equally valid for small data. After
addressing all the other Vs (volume, velocity, variety, and veracity),
which requires a lot of time, commitment, and energy, it is time to
determine whether it is worth storing the data and investing in
storage, either on-premises or in the cloud.
One aspect of value is that a huge amount of data must be stored before
it can yield valuable information in return. Earlier, storing this
volume of data carried enormous costs, but storage and retrieval
technology is now far less costly. An organization wants to be sure
that the data gives it value. The analysis must also satisfy legal
considerations.

1.5 DATA PROPERTY TYPES

1.5.1 QUALITATIVE AND QUANTITATIVE PROPERTIES

There are two main types of properties (Table 1.2):

• Qualitative properties are properties that can be observed but
cannot be measured or computed as a numerical result (Tsai et al.,
2015). This type of property is used to describe a subject in a
more abstract way, including impressions, opinions, views, and
motivations. This gives depth of understanding of a subject but
also makes it more challenging to analyze. This type of property is
regarded as unstructured. If the data type is qualitative, the
measurement is nonstatistical. Qualitative properties ask (or
answer) why?
• Quantitative properties rely on numbers and mathematical equations
and can be measured and computed. This type of property is regarded
as structured and statistical. Quantitative properties ask (or
answer) how much? or how many?
A quantitative property can be read as numerical data about a specific
subject. If a property is qualitative, it can be converted into a
quantitative one by assigning numerical values to that characteristic
for statistical analysis (Saltz et al., 2020; Michael and Miller, 2013).
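
Converting a qualitative property into a quantitative one can be sketched as follows. The survey answers and the five-point scale are hypothetical; the choice of codes is an assumption, and a different scale would give different statistics.

```python
from statistics import mean

# Hypothetical survey answers (qualitative) and an assumed ordinal scale.
responses = ["satisfied", "neutral", "very satisfied", "dissatisfied", "satisfied"]
SCALE = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}


def encode(answers: list[str], scale: dict[str, int]) -> list[int]:
    """Replace each qualitative label with its numerical code."""
    return [scale[answer] for answer in answers]


scores = encode(responses, SCALE)
print(scores)
print(mean(scores))
```

Once the labels are encoded, ordinary statistical measures such as the mean can be applied, which is what makes the property usable for statistical analysis.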

TABLE 1.2 Qualitative and Quantitative Properties.

Factor        Qualitative                      Quantitative
Meaning       Data in which objects are        Data that can be measured and
              classified by attributes and     expressed numerically
              properties
Analysis      Nonstatistical                   Statistical
Type of data  Unstructured                     Structured
Question      Why?                             How many or how much?
Used to       Get an initial understanding     Recommend a final course of action
Methodology   Exploratory                      Conclusive

1.5.2 DISCRETE AND CONTINUOUS PROPERTY

The discrete and continuous properties are given in Table 1.3.

• Discrete data is a type of statistical data that can take only a
finite number of distinct values and often lacks an inherent order.
It is also known as categorical data, since it falls into separate,
indivisible categories.
• Continuous data is a type of statistical data that can take any
value within a specified range. If a domain can take an infinite and
uncountable range of values, then the domain is referred to as
continuous.

TABLE 1.3 Discrete and Continuous Properties.

Factor               Discrete                       Continuous
Meaning              A variable that takes a        A variable that takes an
                     finite number of isolated      infinite and uncountable
                     values                         number of values
Represented by       Separated points               Connected points
Values (provenance)  Values are obtained by        Values are obtained by
                     counting                       measuring
Values (assume)      Distinct or separate values    Any value between two values
Classification       Nonoverlapping                 Overlapping

1.6 BIG DATA ANALYTICS

Big data typically refers to data that exceeds the usual storage,
processing, and computational capacity of traditional databases and
data mining techniques. As a resource, big data requires software and
techniques that can be used to evaluate and derive trends from
large-scale data (Goul et al., 2020). The analysis of structured data
evolves because of the variety and speed of the data being handled.
Therefore, it is no longer enough to interpret data and generate
reports; the wide variety of data means that the systems in place must
be able to aid in the processing of data. The analysis consists of
efficiently identifying the associations among varied and constantly
changing data in order to help exploit it.
Big data analytics refers to the process by which vast datasets are
gathered, processed, and evaluated to uncover trends and useful
knowledge. It comprises a group of tools and techniques that use new
forms of integration to retrieve vast volumes of hidden knowledge from
large datasets that are more complex, larger in scale, and distinct
from conventional databases. It mainly focuses on solving new problems,
or old problems, in the most productive and reliable way possible.

1.6.1 TYPES OF BIG DATA ANALYTICS

Big data is a collection of large and dynamic datasets and data types,
including huge volumes of data, social network analysis applications
for data mining, and real-time data. It comprises a vast volume of
heterogeneous digital data, with massive datasets measured in terabytes
and petabytes (Saltz et al., 2020). The various types of analytics
are discussed below (Figure 1.10).

1.6.1.1 DESCRIPTIVE ANALYTICS

Descriptive analytics asks the question: What is happening?
(Figure 1.11) It is a preliminary stage of data processing that
produces a summary of historical data. Data mining methods organize the
data and help discover trends that provide insight. Descriptive
analytics surfaces likely patterns and probabilities and gives an idea
of what could happen in the future.
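
A minimal sketch of descriptive analytics is a summary of a historical series. The monthly sales figures below are hypothetical, and the set of statistics in `describe` is one reasonable choice among many, not a prescribed method.

```python
from statistics import mean, median, pstdev

# Hypothetical monthly sales figures (historical data).
monthly_sales = [120, 135, 128, 150, 162, 158]


def describe(values: list[float]) -> dict[str, float]:
    """Return basic descriptive statistics for a historical series."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": pstdev(values),
    }


summary = describe(monthly_sales)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The summary says what has happened so far; it does not by itself explain why or predict what comes next.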

FIGURE 1.10 Types of analytics.

FIGURE 1.11 Descriptive analytics.



1.6.1.2 DIAGNOSTIC ANALYTICS

Diagnostic analytics answers the question: Why did it happen?
(Figure 1.12) It searches for the root cause of a problem and is used
to determine why something happened. This type seeks to identify and
understand the causes of incidents and behaviors.

FIGURE 1.12 Diagnostic analytics.

1.6.1.3 PREDICTIVE ANALYTICS

Predictive analytics answers the question: What is going to happen?
(Figure 1.13) It uses past data to foresee the future; it is all about
forecasting. Predictive analytics uses multiple techniques, such as
data mining and artificial intelligence, to evaluate existing data and
build scenarios of what might happen.
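
As a very small illustration of forecasting from past data, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The data is hypothetical, and a linear trend is an assumption chosen for simplicity; real predictive models are usually richer.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical past observations: month index and units sold.
months = [1, 2, 3, 4, 5]
units = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, units)
forecast_month_6 = slope * 6 + intercept
print(forecast_month_6)
```

The fitted line summarizes the past, and evaluating it at a future month gives the forecast.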

1.6.1.4 PRESCRIPTIVE ANALYTICS

Prescriptive analytics asks the question: What should be done?
(Figure 1.14) It is dedicated to determining the best step to take.
Descriptive analytics provides past evidence, and predictive analytics
anticipates what could happen. Prescriptive analytics uses both of
these inputs to identify the right course of action.

FIGURE 1.13 Predictive analytics.

FIGURE 1.14 Prescriptive analytics.



1.6.1.5 DECISIVE ANALYTICS

Decisive analytics is a set of techniques for visualizing information
and recommending courses of action to support human decision-making
when a person is presented with a set of alternatives.

1.6.1.6 STREAMING ANALYTICS

Streaming analytics is real-time analytics in which the data is
collected from sensors or devices (Figure 1.15). This kind of analytics
helps us identify and understand patterns in the data as it is being
generated and can provide immediate analysis and solutions.
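
One way to picture streaming analytics is a computation that updates as each reading arrives instead of waiting for a complete dataset. The sketch below flags readings that jump well above a rolling mean; the class and function names, the window size, the threshold, and the sample readings are all assumptions made for illustration.

```python
from collections import deque


class RollingMean:
    """Keep the mean of the most recent `window` readings from a stream."""

    def __init__(self, window: int) -> None:
        self.values = deque(maxlen=window)

    def update(self, reading: float) -> float:
        self.values.append(reading)  # the oldest reading drops out automatically
        return sum(self.values) / len(self.values)


def detect_spikes(stream, window=3, threshold=5.0):
    """Yield readings that exceed the rolling mean of earlier readings by `threshold`."""
    rolling = RollingMean(window)
    baseline = None
    for reading in stream:
        if baseline is not None and reading - baseline > threshold:
            yield reading
        baseline = rolling.update(reading)


# Hypothetical temperature readings arriving one at a time.
sensor_readings = [20.0, 20.5, 21.0, 30.0, 21.5, 21.0]
spikes = list(detect_spikes(sensor_readings))
print(spikes)
```

Because only the last few readings are kept in memory, the same logic works on an unbounded stream.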

FIGURE 1.15 Streaming analytics.

1.6.1.7 LOCATION ANALYTICS

Location analytics is mainly concerned with the analysis of
geographical locations (Figure 1.16).
The key features are as follows:

– Enhances business systems with location-based predictions drawn
from geographical analysis.
– Uses maps for visual analytics and to locate target groups.
– Spatial analytics: combines geographical information systems
with other types of analytics.
– Explores temporal and spatial patterns to locate specific
activities or behavior.
– Rich data collection: combines consumers' demographic details,
maps, lifestyle, and other socio-environmental factors.
– Relies on the presence of maps/GPS on all electronic gadgets.

FIGURE 1.16 Location analytics (charts of confirmed cases and deaths
over time; snapshot for August 17, 2020: 1,811,779 confirmed cases,
weekly change of -78,815, or -4.14%).

Source: World Health Organization covid cases.

1.6.1.8 WEB ANALYTICS


This type of analytics is based on the usage of websites and the
interactions that take place on them (Figure 1.17).
The main features are as follows:

– Now: the study of the behavior of web users.
– Future: the study of one mechanism for how society makes decisions.
– Example: behavior of web users:
• The number of people who clicked on a particular product or webpage.
• The user's location, the time spent on the website, the total
number of sites visited, and the user's experience of the website
interface.
• What can this type of analytics tell us?
• Aids in decision making and in presenting the inferred information.
– Commercially, it is the collection and analysis of data from a website
to determine which aspects of the website achieve the business
objectives.
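
The measures listed above can be computed from a click log with a few lines of Python. The log format, user identifiers, and page paths below are hypothetical; real web analytics tools collect far richer data.

```python
from collections import Counter

# Hypothetical click log: (user, page, seconds spent on page).
clicks = [
    ("u1", "/home", 12),
    ("u2", "/product/42", 45),
    ("u1", "/product/42", 30),
    ("u3", "/home", 5),
    ("u3", "/product/42", 60),
]

# How many times each page was viewed.
page_views = Counter(page for _, page, _ in clicks)

# Total time each user spent on the site.
time_on_site = Counter()
for user, _, seconds in clicks:
    time_on_site[user] += seconds

# How many distinct users visited each page.
unique_visitors = {
    page: len({u for u, p, _ in clicks if p == page}) for page in page_views
}

print(page_views.most_common(1))
print(dict(time_on_site))
print(unique_visitors)
```

Counts like these are the raw material for deciding which parts of a website serve the business objectives.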

FIGURE 1.17 Web analytics.

1.6.1.9 VISUAL ANALYTICS

This type of analytics uses visualization techniques to present
complex problems in a simpler form (Figure 1.18).

1.7 BIG DATA ANALYTICS TOOLS WITH THEIR KEY FEATURES

Big data analytics software is widely used to analyze large data
collections. These analytical tools help to recognize emerging
developments in the industry, consumer expectations, and other
information (Husamaldin and Saeed, 2020).
With the increase in big data volume and the tremendous development of
cloud computing, cutting-edge big data analytics tools have become the
cornerstone of effective data processing. This section reviews the top
big data analytics tools and their key features.

FIGURE 1.18 Visual analytics.



1.7.1 BIG DATA ANALYTICS TOOLS

1.7.1.1 APACHE STORM

Apache Storm is a free, open-source big data computation system. It is
an Apache product with a real-time framework for data stream processing
that can be used with any programming language. It offers a
distributed, fault-tolerant, real-time processing system with real-time
computation capabilities. The Storm scheduler distributes the workload
across multiple nodes according to the topology configuration and works
well with the Hadoop Distributed File System (HDFS).

Features
• Benchmarked at processing 1 million 100-byte messages per second
per node.
• Guarantees that each unit of data is processed at least once.
• Great horizontal scalability.
• Built-in fault tolerance.
• Automatic restart after crashes.
• Written in Clojure.
• Works with a directed acyclic graph (DAG) topology.
• Output files are in JavaScript Object Notation (JSON) format.
• Has numerous use cases, including real-time analytics, log analysis,
ETL, continuous computing, distributed RPC, and machine learning.
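
The idea of a directed acyclic graph topology can be sketched in plain Python. This example does not use Storm or its API; the component names are hypothetical, and the code only shows how processing stages connected as a DAG can be ordered and chained (in Storm's terminology the source would be a spout and the processing stages bolts).

```python
from graphlib import TopologicalSorter

# Hypothetical topology: each key lists the components it receives data from.
topology = {
    "sentence_source": set(),
    "split_words": {"sentence_source"},
    "count_words": {"split_words"},
    "report": {"count_words"},
}

# A valid processing order exists only if the graph has no cycles.
order = list(TopologicalSorter(topology).static_order())
print(order)


def split_words(sentences):
    """Stage that turns each sentence into a stream of words."""
    for sentence in sentences:
        yield from sentence.split()


def count_words(words):
    """Stage that keeps a running count for each word."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words(split_words(["big data", "big streams"]))
print(counts)
```

In a real deployment each stage would run in parallel on several nodes, with the framework handling distribution and retries.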

1.7.1.2 TALEND

Talend is a big data tool that simplifies and automates big data
integration. Its graphical wizard generates native code. It also
supports big data integration, master data management, and data
quality checks.

Features
• Streamlines ETL and ELT for big data.
• Achieves the speed and scale of Spark.
• Accelerates the move to real-time.
Another random document with
no related content on Scribd:
CHAPTER III.

The Relation of Vapor and Air.

Sect. 1.—The Boylean Law of the Air’s Elasticity.

IN the Sixth Book (Chap. iv. Sect. 1.) we have already seen how
the conception on the laws of fluid equilibrium was, by Pascal and
others, extended to air, as well as water. But though air presses and
is pressed as water presses and is pressed, pressure produces upon
air an effect which it does not, in any obvious degree, produce upon
water. Air which is pressed is also compressed, or made to occupy a
smaller space; and is consequently also made more dense, or
condensed; and on the other hand, when the pressure upon a
portion of air is diminished, the air expands or is rarefied. These
broad facts are evident. They are expressed in a general way by
saying that air is an elastic fluid, yielding in a certain degree to
pressure, and recovering its previous dimensions when the pressure
is removed.

But when men had reached this point, the questions obviously
offered themselves, in what degree and according to what law air
yields to pressure; when it is compressed, what relation does the
density bear to the pressure? The use which had been made of
tubes containing columns of mercury, by which the pressure of
portions of air was varied and measured, suggested obvious modes
of devising experiments by which this question might be answered.
Such experiments accordingly were made by Boyle about 1650; and
the result at which he arrived was, that when air is thus compressed,
the density is as the pressure. Thus if the pressure of the
atmosphere in its common state be equivalent to 30 inches of
mercury, as shown by the barometer; if air included in a tube be
pressed by 30 additional inches of mercury, its density will be
doubled, the air being compressed into one half the space. If the
pressure be increased threefold, the density is also trebled; and so
on. The same law was soon afterwards (in 1676) proved
experimentally by Mariotte. And this law of the air’s elasticity, that the
density is as the pressure, is sometimes called the Boylean Law, and
sometimes the Law of Boyle and Mariotte.
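
The proportion stated above can be checked with a short calculation. The sketch below uses the figures from the text (an atmosphere equivalent to 30 inches of mercury, plus 30 additional inches); the function names are illustrative, and the inverse relation for volume assumes the temperature is held constant.

```python
def density_ratio(initial_pressure: float, final_pressure: float) -> float:
    """Boyle's law as stated in the text: density is proportional to pressure."""
    return final_pressure / initial_pressure


def volume_ratio(initial_pressure: float, final_pressure: float) -> float:
    """At constant temperature the volume varies inversely with the pressure."""
    return initial_pressure / final_pressure


atmosphere = 30.0     # inches of mercury, as in the text
added_column = 30.0   # additional inches of mercury pressing on the air
total = atmosphere + added_column

print(density_ratio(atmosphere, total))
print(volume_ratio(atmosphere, total))
```

Doubling the pressure doubles the density and halves the space occupied, as the passage describes.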

Air retains its aerial character permanently; but there are other
aerial substances which appear as such, and then disappear or
change into some other condition. Such are termed vapors. And the
discovery of their true relation to air was the result of a long course
of researches and speculations.

[2nd Ed.] [It was found by M. Cagniard de la Tour (in 1823), that at
a certain temperature, a liquid, under sufficient pressure, becomes
clear transparent vapor or gas, having the same bulk as the liquid.
This condition Dr. Faraday calls the Cagniard de la Tour state, (the
Tourian state?) It was also discovered by Dr. Faraday that carbonic-
acid gas, and many other gases, which were long conceived to be
permanently elastic, are really reducible to a liquid state by
pressure. 39 And in 1835, M. Thilorier found the means of reducing
liquid carbonic acid to a solid form, by means of the cold produced in
evaporation. More recently Dr. Faraday has added several
substances usually gaseous to the list of those which could
previously be shown in the liquid state, and has reduced others,
including ammonia, nitrous oxide, and sulphuretted hydrogen, to a
solid consistency. 40 After these discoveries, we may, I think,
reasonably doubt whether all bodies are not capable of existing in
the three consistencies of solid, liquid, and air.
39 Phil. Trans. 1823.

40 Ib. Pt. 1. 1845.

We may note that the law of Boyle and Mariotte is not exactly true
near the limit at which the air passes to the liquid state in such cases
as that just spoken of. The diminution of bulk is then more rapid than
the increase of pressure.

The transition of fluids from a liquid to an airy consistence appears
to be accompanied by other curious phenomena. See Prof. Forbes’s
papers on the Color of Steam under certain circumstances, and on
the Colors of the Atmosphere, in the Edin. Trans. vol. xiv.]

Sect. 2.—Prelude to Dalton’s Doctrine of Evaporation.

Visible clouds, smoke, distillation, gave the notion of Vapor; vapor
was at first conceived to be identical with air, as by Bacon. 41 It was
easily collected, that by heat, water might be converted into vapor. It
was thought that air was thus produced, in the instrument called the
æolipile, in which a powerful blast is caused by a boiling fluid; but
Wolfe showed that the fluid was not converted into air, by using
camphorated spirit of wine, and condensing the vapor after it had
been formed. We need not enumerate the doctrines (if very vague
hypotheses may be so termed) of Descartes, Dechales, Borelli. 42
The latter accounted for the rising of vapor by supposing it a mixture
of fire and water; and thus, fire being much lighter than air, the
mixture also was light. Boyle endeavored to show that vapors do not
permanently float in vacuo. He compared the mixture of vapor with
air to that of salt with water. He found that the pressure of the
atmosphere affected the heat of boiling water; a very important fact.
Boyle proved this by means of the air-pump; and he and his friends
were much surprised to find that when air was removed, water only
just warm boiled violently. Huyghens mentions an experiment of the
same kind made by Papin about 1673.
41 Bacon’s Hist. Nat. Cent. i. p. 27.

42 They may be seen in Fischer, Geschichte der Physik, vol. ii. p. 175.

The ascent of vapor was explained in various ways in succession,
according to the changes which physical science underwent. It was a
problem distinctly treated of, at a period when hydrostatics had
accounted for many phenomena; and attempts were naturally made
to reduce this fact to hydrostatical principles. An obvious hypothesis,
which brought it under the dominion of these principles, was, to
suppose that the water, when converted into vapor, was divided into
small hollow globules;—thin pellicles including air or heat. Halley
gave such an explanation of evaporation; Leibnitz calculated the
dimensions of these little bubbles; Derham managed (as he
supposed) to examine them with a magnifying glass: Wolfe also
examined and calculated on the same subject. It is curious to see so
much confidence in so lame a theory; for if water became hollow
globules in order to rise as vapor, we require, in order to explain the
formation of these globules, new laws of nature, which are not even
hinted at by the supporters of the doctrine, though they must be
far more complex than the hydrostatical law by which a hollow
sphere floats.
Newton’s opinion was hardly more satisfactory; he 43 explained
evaporation by the repulsive power of heat; the parts of vapors,
according to him, being small, are easily affected by this force, and
thus become lighter than the atmosphere.
43 Opticks, Qu. 31.

Muschenbroek still adhered to the theory of globules, as the
explanation of evaporation; but he was manifestly discontented with
it; and reasonably apprehended that the pressure of the air would
destroy the frail texture of these bubbles. He called to his aid a
rotation of the globules (which Descartes also had assumed); and,
not satisfied with this, threw himself on electrical action as a reserve.
Electricity, indeed, was now in favor, as hydrostatics had been
before; and was naturally called in, in all cases of difficulty.
Desaguliers, also, uses this agent to account for the ascent of vapor,
introducing it into a kind of sexual system of clouds; according to
him, the male fire (heat) does a part, and the female fire (electricity)
performs the rest. These are speculations of small merit and no
value.

In the mean time, Chemistry made great progress in the
estimation of philosophers, and had its turn in the explanation of the
important facts of evaporation. Bouillet, who, in 1742, placed the
particles of water in the interstices of those of air, may be considered
as approaching to the chemical theory. In 1743, the Academy of
Sciences of Bourdeaux proposed the ascent of vapors as the subject
of a prize; which was adjudged in a manner very impartial as to the
choice of a theory; for it was divided between Kratzenstein, who
advocated the bubbles, (the coat of which he determined to be
1/50,000th of an inch thick,) and Hamberger, who maintained the truth
to be the adhesion of particles of water to those of air and fire. The
latter doctrine had become much more distinct in the author’s mind
when seven years afterwards (1750) he published his Elementa
Physices. He then gave the explanation of evaporation in a phrase
which has since been adopted,—the solution of water in air; which
he conceived to be of the same kind as other chemical solutions.

This theory of solution was further advocated and developed by Le
Roi; 44 and in his hands assumed a form which has been extensively
adopted up to our times, and has, in many instances, tinged the
language commonly used. He conceived that air, like other solvents,
might be saturated; and that when the water was beyond the
amount required for saturation, it appeared in a visible form. The
saturating quantity was held to depend mainly on warmth and wind.
44 Ac. R. Sc. Paris, 1750.

This theory was by no means devoid of merit; for it brought
together many of the phenomena, and explained a number of the
experiments which Le Roi made. It explained the facts of the
transparency of vapor, (for perfect solutions are transparent,) the
precipitation of water by cooling, the disappearance of the visible
moisture by warming it again, the increased evaporation by rain and
wind; and other observed phenomena. So far, therefore, the
introduction of the notion of the chemical solution of water in air was
apparently very successful. But its defects are of a very fatal kind; for
it does not at all apply to the facts which take place when air is
excluded.

In Sweden, in the mean time, 45 the subject had been pursued in a
different, and in a more correct manner. Wallerius Ericsen had, by
various experiments, established the important fact, that water
evaporates in a vacuum. His experiments are clear and satisfactory;
and he inferred from them the falsity of the common explanation of
evaporation by the solution of water in air. His conclusions are drawn
in a very intelligent manner. He considers the question whether
water can be changed into air, and whether the atmosphere is, in
consequence, a mere collection of vapors; and on good reasons,
decides in the negative, and concludes the existence of
permanently-elastic air different from vapor. He judges, also, that
there are two causes concerned, one acting to produce the first
ascent of vapors, the other to support them afterwards. The first,
which acts in a vacuum, he conceives to be the mutual repulsion of
the particles; and since this force is independent of the presence of
other substances, this seems to be a sound induction. When the
vapors have once ascended into the air, it may readily be granted
that they are carried higher, and driven from side to side by the
currents of the atmosphere. Wallerius conceives that the vapor will
rise till it gets into air of the same density as itself, and being then in
equilibrium, will drift to and fro.
45 Fischer, Gesch. Phys. vol. v. p. 63.

The two rival theories of evaporation, that of chemical solution and
that of independent vapor, were, in various forms, advocated by the
next generation of philosophers. De Saussure may be considered as
the leader on one side, and De Luc on the other. The former
maintained the solution theory, with some modifications of his own.
De Luc denied all solution, and held vapor to be a combination of
the particles of water with fire, by which they became lighter than air.
According to him, there is always fire enough present to produce this
combination, so that evaporation goes on at all temperatures.
This mode of considering independent vapor as a combination of
fire with water, led the attention of those who adopted that opinion to
the thermometrical changes which take place when vapor is formed
and condensed. These changes are important, and their laws
curious. The laws belong to the induction of latent heat, of which we
have just spoken; but a knowledge of them is not absolutely
necessary in order to enable us to understand the manner in which
steam exists in air.

De Luc’s views led him 46 also to the consideration of the effect of
pressure on vapor. He explains the fact that pressure will condense
vapor, by supposing that it brings the particles within the distance at
which the repulsion arising from fire ceases. In this way, he also
explains the fact, that though external pressure does thus condense
steam, the mixture of a body of air, by which the pressure is equally
increased, will not produce the same effect; and therefore, vapors
can exist in the atmosphere. They make no fixed proportion of it; but
at the same temperature we have the same pressure arising from
them, whether they are in air or not. As the heat increases, vapor
becomes capable of supporting a greater and greater pressure, and
at the boiling heat, it can support the pressure of the atmosphere.
46 Fischer, vol. vii. p. 453. Nouvelles Idées sur la Météorologie,
1787.

De Luc also marked very precisely (as Wallerius had done) the
difference between vapor and air; the former being capable of
change of consistence by cold or pressure, the latter not so. Pictet,
in 1786, made a hygrometrical experiment, which appeared to him to
confirm De Luc’s views; and De Luc, in 1792, published a concluding
essay on the subject in the Philosophical Transactions. Pictet’s
Essay on Fire, in 1791, also demonstrated that “all the train of
hygrometrical phenomena takes place just as well, indeed rather
quicker, in a vacuum than in air, provided the same quantity of
moisture is present.” This essay, and De Luc’s paper, gave the
death-blow to the theory of the solution of water in air.

Yet this theory did not fall without an obstinate struggle. It was
taken up by the new school of French chemists, and connected with
their views of heat. Indeed, it long appears as the prevalent opinion.
Girtanner, 47 in his Grounds of the Antiphlogistic Theory, may be
considered as one of the principal expounders of this view of the
matter. Hube, of Warsaw, was, however, the strongest of the
defenders of the theory of solution, and published upon it repeatedly
about 1790. Yet he appears to have been somewhat embarrassed
with the increase of the air’s elasticity by vapor. Parrot, in 1801,
proposed another theory, maintaining that De Luc had by no means
successfully attacked that of solution, but only De Saussure’s
superfluous additions to it.
47 Fischer, vol. vii. 473.

It is difficult to see what prevented the general reception of the
doctrine of independent vapor; since it explained all the facts very
simply, and the agency of air was shown over and over again to be
unnecessary. Yet, even now, the solution of water in air is hardly
exploded. M. Gay Lussac, 48 in 1800, talks of the quantity of water
“held in solution” by the air; which, he says, varies according to its
temperature and density by a law which has not yet been
discovered. And Professor Robison, in the article “Steam,” in the
Encyclopædia Britannica (published about 1800), says, 49 “Many
philosophers imagine that spontaneous evaporation, at low
temperatures, is produced in this way (by elasticity alone). But we
cannot be of this opinion; and must still think that this kind of
evaporation is produced by the dissolving power of the air.” He then
gives some reasons for his opinion. “When moist air is suddenly
rarefied, there is always a precipitation of water. But by this new
doctrine the very contrary should happen, because the tendency of
water to appear in the elastic form is promoted by removing the
external pressure.” Another main difficulty in the way of the doctrine
of the mere mixture of vapor and air was supposed to be this; that if
they were so mixed, the heavier fluid would take the lower part, and
the lighter the higher part, of the space which they occupied.
48 Ann. Chim. tom. xliii.

49 Robison’s Works, ii. 37.

The former of these arguments was repelled by the consideration
that in the rarefaction of air, its specific heat is changed, and thus its
temperature reduced below the constituent temperature of the vapor
which it contains. The latter argument is answered by a reference to
Dalton’s law of the mixture of gases. We must consider the
establishment of this doctrine in a new section, as the most material
step to the true notion of evaporation.

Sect. 3.—Dalton’s Doctrine of Evaporation.

A portion of that which appears to be the true notion of evaporation
was known, with greater or less distinctness, to several of the
physical philosophers of whom we have spoken. They were aware
that the vapor which exists in air, in an invisible state, may be
condensed into water by cold: and they had noticed that, in any state
of the atmosphere, there is a certain temperature lower than that of
the atmosphere, to which, if we depress bodies, water forms upon
them in fine drops like dew; this temperature is thence called the
dew-point. The vapor of water which exists anywhere may be
reduced below the degree of heat which is necessary to constitute it
vapor, and thus it ceases to be vapor. Hence this temperature is also
called the constituent temperature. This was generally known to the
meteorological speculators of the last century, although, in England,
attention was principally called to it by Dr. Wells’s Essay on Dew, in
1814. This doctrine readily explains how the cold produced by
rarefaction of air, descending below the constituent temperature of
the contained vapor, may precipitate a dew; and thus, as we have
said, refutes one obvious objection to the theory of independent
vapor.

The other difficulty was first fully removed by Mr. Dalton. When his
attention was drawn to the subject of vapor, he saw insurmountable
objections to the doctrine of a chemical union of water and air. In
fact, this doctrine was a mere nominal explanation; for, on closer
examination, no chemical analogies supported it. After some
reflection, and in the sequel of other generalizations concerning
gases, he was led to the persuasion, that when air and steam are
mixed together, each follows its separate laws of equilibrium, the
particles of each being elastic with regard to those of their own kind
only: so that steam may be conceived as flowing among the particles
of air 50 “like a stream of water among pebbles;” and the resistance
which air offers to evaporation arises, not from its weight, but from
the inertia of its particles.
50 Manchester Memoirs, vol. v. p. 581.

It will be found that the theory of independent vapor, understood
with these conditions, will include all the facts of the case;—gradual
evaporation in air; sudden evaporation in a vacuum; the increase of
the air’s elasticity by vapor; condensation by its various causes;
and other phenomena.
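
The principle just stated can be put in a few lines of arithmetic. The sketch below is only an illustration, assuming that each gas presses independently and that the force of the vapor depends on the temperature alone; the function name and the figures in inches of mercury are hypothetical and are not taken from Dalton.

```python
# Minimal sketch of the doctrine of independent vapor: the vapor exerts the
# same force at a given temperature whether air is present or not, and the
# total pressure is simply the sum of the two.

def total_pressure(air_pressure, vapor_pressure):
    """Pressure of a mechanical mixture in which each gas presses independently."""
    return air_pressure + vapor_pressure

# Hypothetical illustrative figure (inches of mercury), not from the text.
vapor_at_fixed_temperature = 0.5

# A vacuum, rarefied air, and air at the ordinary atmospheric pressure.
for air in (0.0, 15.0, 30.0):
    total = total_pressure(air, vapor_at_fixed_temperature)
    # The vapor's own share is recovered unchanged, whatever the air may be.
    print(air, total, total - air)
```

On this view the increase of the air's elasticity by vapor is nothing more than the second term of the sum, and evaporation in a vacuum is the case in which the first term is zero.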

But Mr. Dalton also made experiments to prove his fundamental
principle, that if two different gases communicate, they will diffuse
themselves through each other; 51 —slowly, if the opening of
communication be small. He observes also, that all the gases had
equal solvent powers for vapor, which could hardly have happened,
had chemical affinity been concerned. Nor does the density of the air
make any difference.
51 New System of Chemical Philosophy, vol. i. p. 151.

Taking all these circumstances into the account, Mr. Dalton
abandoned the idea of solution. “In the autumn of 1801,” he says, “I
hit upon an idea which seemed to be exactly calculated to explain
the phenomena of vapor: it gave rise to a great variety of
experiments,” which ended in fixing it in his mind as a true idea.
“But,” he adds, “the theory was almost universally misunderstood,
and consequently reprobated.”

Mr. Dalton answers various objections. Berthollet had urged that
we can hardly conceive the particles of an elastic substance added
to those of another, without increasing its elasticity. To this Mr. Dalton
replies by adducing the instance of magnets, which repel each other,
but do not repel other bodies. One of the most curious and ingenious
objections is that of Mr. Gough, who argues, that if each gas is elastic
with regard to itself alone, we should hear, produced by one stroke,
four sounds; namely, first, the sound through aqueous vapor;
second, the sound through azotic gas; third, the sound through
oxygen gas; fourth, the sound through carbonic acid. Mr. Dalton’s
answer is, that the difference of time at which these sounds would
come is very small; and that, in fact, we do hear sounds double and
treble.

In his New System of Chemical Philosophy, Mr. Dalton considers
the objections of his opponents with singular candor and impartiality.
He there appears disposed to abandon that part of the theory which
negatives the mutual repulsion of the particles of the two gases, and
to attribute their diffusion through one another to the different size of
the particles, which would, he thinks, 52 produce the same effect.
52 New System, vol. i. p. 188.

In selecting, as of permanent importance, the really valuable part
of this theory, we must endeavor to leave out all that is doubtful or
unproved. I believe it will be found that in all theories hitherto
promulgated, all assertions respecting the properties of the particles
of bodies, their sizes, distances, attractions, and the like, are
insecure and superfluous. Passing over, then, such hypotheses, the
inductions which remain are these;—that two gases which are in
communication will, by the elasticity of each, diffuse themselves in
one another, quickly or slowly; and—that the quantity of steam
contained in a certain space of air is the same, whatever be the air,
whatever be its density, and even if there be a vacuum. These
propositions may be included together by saying, that one gas is
mechanically mixed with another; and we cannot but assent to what
Mr. Dalton says of the latter fact,—“this is certainly the touchstone of
the mechanical and chemical theories.” This doctrine of the
mechanical mixture of gases appears to supply answers to all the
difficulties opposed to it by Berthollet and others, as Mr. Dalton has
shown; 53 and we may, therefore, accept it as well established.
53 New System, vol. i. p. 160, &c.

This doctrine, along with the principle of the constituent
temperature of steam, is applicable to a large series of
meteorological and other consequences. But before considering the
applications of theory to natural phenomena, which have been
made, it will be proper to speak of researches which were carried on,
in a great measure, in consequence of the use of steam in the arts: I
mean the laws which connect its elastic force with its constituent
temperature.

Sect. 4.—Determination of the Laws of the Elastic Force of Steam.

The expansion of aqueous vapor at different temperatures is
governed, like that of all other vapors, by the law of Dalton and Gay-
Lussac, already mentioned; and from this, its elasticity, when its
expansion is resisted, will be known by the law of Boyle and
Mariotte; namely, by the rule that the pressure of airy fluids is as the
condensation. But it is to be observed, that this process of
calculation goes on the supposition that the steam is cut off from
contact with water, so that no more steam can be generated; a case
quite different from the common one, in which the steam is more
abundant as the heat is greater. The examination of the force of
vapor, when it is in contact with water, must be briefly noticed.

During the period of which we have been speaking, the progress
of the investigation of the laws of aqueous vapor was much
accelerated by the growing importance of the steam-engine, in
which those laws operated in a practical form. James Watt, the
main improver of that machine, was thus a great contributor to
speculative knowledge, as well as to practical power. Many of his
improvements depended on the laws which regulate the quantity of
heat which goes to the formation or condensation of steam; and the
observations which led to these improvements enter into the
induction of latent heat. Measurements of the force of steam, at all
temperatures, were made with the same view. Watt’s attention had
been drawn to the steam-engine in 1759, by Robison, the former
being then an instrument-maker, and the latter a student at the
University of Glasgow. 54 In 1761 or 1762, he tried some experiments
on the force of steam in a Papin’s Digester; 55 and formed a sort of
working model of a steam-engine, feeling already his vocation to
develope the powers of that invention. His knowledge was at that
time principally derived from Desaguliers and Belidor, but his own
experiments added to it rapidly. In 1764 and 1765, he made a more
systematical course of experiments, directed to ascertain the force of
steam. He tried this force, however, only at temperatures above the
boiling-point; and inferred it at lower degrees from the supposed
continuity of the law thus obtained. His friend Robison, also, was
soon after led, by reading the account of some experiments of Lord
Charles Cavendish, and some others of Mr. Nairne, to examine the
same subject. He made out a table of the correspondence of the
elasticity and the temperature of vapor, from thirty-two to two
hundred and eighty degrees of Fahrenheit’s thermometer. 56 The
thing here to be remarked, is the establishment of a law of the
pressure of steam, down to the freezing-point of water. Ziegler of
Basle, in 1769, and Achard of Berlin, in 1782, made similar
experiments. The latter examined also the elasticity of the vapor of
alcohol. Betancourt, in 1792, published his Memoir on the expansive
force of vapors; and his tables were for some time considered the
most exact. Prony, in his Architecture Hydraulique (1796),
established a mathematical formula, 57 on the experiments of
Betancourt, who began his researches in the belief that he was first
in the field, although he afterwards found that he had been
anticipated by Ziegler. Gren compared the experiments of
Betancourt and De Luc with his own. He ascertained an important
fact, that when water boils, the elasticity of the steam is equal to that
of the atmosphere. Schmidt at Giessen endeavored to improve the
apparatus used by Betancourt; and Biker, of Rotterdam, in 1800,
made new trials for the same purpose.
54 Robison’s Works, vol. ii. p. 113.

55 Denis Papin, who made many of Boyle’s experiments for him,
had discovered that if the vapor be prevented from rising, the
water becomes hotter than the usual boiling-point; and had hence
invented the instrument called Papin’s Digester. It is described in
his book, La manière d’amolir les os et de faire cuire toutes sortes
de viandes en fort peu de temps et à peu de frais. Paris, 1682.

56 These were afterwards published in the Encyclopædia
Britannica; in the article “Steam,” written by Robison.

57 Architecture Hydraulique, Seconde Partie, p. 163.

In 1801, Mr. Dalton communicated to the Philosophical Society of
Manchester his investigations on this subject; observing truly, that
though the forces at high temperatures are most important when
steam is considered as a mechanical agent, the progress of
philosophy is more immediately interested in accurate observations
on the force at low temperatures. He also found that his elasticities
for equidistant temperatures resembled a geometrical progression,
but with a ratio constantly diminishing. Dr. Ure, in 1818, published in
the Philosophical Transactions of London, experiments of the same
kind, valuable from the high temperatures at which they were made,
and for the simplicity of his apparatus. The law which he thus
obtained approached, like Dalton’s, to a geometrical progression. Dr.
Ure says, that a formula proposed by M. Biot gives an error of near
nine inches out of seventy-five, at a temperature of 266 degrees.
This is very conceivable, for if the formula be wrong at all, the
geometrical progress rapidly inflames the error in the higher portions
of the scale. The elasticity of steam, at high temperatures, has also
been experimentally examined by Mr. Southern, of Soho, and Mr.
Sharpe, of Manchester. Mr. Dalton has attempted to deduce certain
general laws from Mr. Sharpe’s experiments; and other persons
have offered other rules, as those which govern the force of steam
with reference to the temperature: but no rule appears yet to have
assumed the character of an established scientific truth. Yet the law
of the expansive force of steam is not only required in order that the
steam-engine may be employed with safety and to the best
advantage; but must also be an important point in every consistent
thermotical theory.

[2nd Ed.] [To the experiments on steam made by private
physicists, are to be added the experiments made on a grand scale
by order of the governments of France and of America, with a view
to legislation on the subject of steam-engines. The French
experiments were made in 1823, under the direction of a
commission consisting of some of the most distinguished members
of the Academy of Sciences; namely, MM. de Prony, Arago, Girard,
and Dulong. The American experiments were placed in the hands of
a committee of the Franklin Institute of the State of Pennsylvania,
consisting of Prof. Bache and others, in 1830. The French
experiments went as high as 435° of Fahrenheit’s thermometer,
corresponding to a pressure of 60 feet of mercury, or 24
atmospheres. The American experiments were made up to a
temperature of 346°, which corresponded to 274 inches of mercury,
more than 9 atmospheres. The extensive range of these
experiments affords great advantages for determining the law of the
expansive force. The French Academy found that their experiments
indicated an increase of the elastic force according to the fifth power
of a binomial 1 + mt, where t is the temperature. The American
Institute were led to a sixth power of a like binomial. Other
experimenters have expressed their results, not by powers of the
temperature, but by geometrical ratios. Dr. Dalton had supposed that
the expansion of mercury being as the square of the true
temperature above its freezing-point, the expansive force of steam
increases in geometrical ratio for equal increments of temperature.
And the author of the article Steam in the Seventh Edition of the
Encyclopædia Britannica (Mr. J. S. Russell), has found that the
experiments are best satisfied by supposing mercury, as well as
steam, to expand in a geometrical ratio for equal increments of the
true temperature.
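
The figures quoted above may be checked by a short calculation. The sketch below converts the columns of mercury into atmospheres, taking 30 inches of mercury to the atmosphere, and then evaluates a fifth power of the binomial 1 + mt. The text gives neither the value of m nor the unit of t; the coefficient 0.7153, and the convention that t counts hundreds of Centigrade degrees above the boiling-point, are the values commonly quoted for the French formula and are assumptions here, not statements drawn from this chapter.

```python
def inches_of_mercury_to_atmospheres(inches, inches_per_atmosphere=30.0):
    """Convert a mercury column to atmospheres (30 inches per atmosphere assumed)."""
    return inches / inches_per_atmosphere

def fifth_power_law(temp_f, m=0.7153):
    """Elastic force in atmospheres from the form (1 + m*t)**5.

    Assumptions (not given in the text): t counts hundreds of Centigrade
    degrees above the boiling-point, and m = 0.7153.
    """
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    t = (temp_c - 100.0) / 100.0
    return (1.0 + m * t) ** 5

# The French limit: 60 feet of mercury.
print(inches_of_mercury_to_atmospheres(60 * 12))            # 24.0
# The American limit: 274 inches of mercury.
print(round(inches_of_mercury_to_atmospheres(274), 2))      # 9.13
# The assumed formula evaluated at the French upper temperature of 435 degrees.
print(fifth_power_law(435.0))
```

Under these assumptions the formula gives one atmosphere at the boiling-point and a value close to the 24 atmospheres recorded at 435°, which is as much as so simple an expression can be expected to show.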

It appears by such calculation, that while dry gas increases in the
ratio of 8 to 11, by an increase of temperature from freezing to
boiling water; steam in contact with water, by the same increase of
temperature above boiling water, has its expansive force increased
in the proportion of 1 to 12. By an equal increase of temperature,
mercury expands in about the ratio of 8 to 9.
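
The ratio of 8 to 11 for dry gas may be unpacked by simple arithmetic. The sketch below assumes only that the interval from freezing to boiling water is 180 degrees of Fahrenheit's scale and that the expansion is spread evenly over it; the variable names are illustrative.

```python
from fractions import Fraction

# Dry gas: 8 volumes at freezing become 11 volumes at boiling.
gas_ratio = Fraction(11, 8)

# Fractional expansion over the whole interval, reckoned on the volume at freezing.
expansion_over_interval = gas_ratio - 1
print(expansion_over_interval)          # 3/8

# Spread evenly over 180 Fahrenheit degrees (an assumption of uniform expansion).
per_degree_fahrenheit = expansion_over_interval / 180
print(per_degree_fahrenheit)            # 1/480
```

The result, an expansion of three-eighths of the original bulk over the interval, or 1/480 for each degree of Fahrenheit, shows how slow the increase of a dry gas is beside the twelvefold increase which the same paragraph ascribes to steam in contact with water.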

Recently, MM. Magnus of Berlin, Holzmann and Regnault, have
made series of observations on the relation between temperature
and elasticity of steam. 58
58 See Taylor’s Scientific Memoirs, Aug. 1845, vol. iv. part xiv.,
and Ann. de Chimie.
Prof. Magnus measured his temperatures by an air-thermometer;
a process which, I stated in the first edition, seemed to afford the
best promise of simplifying the law of expansion. His result is, that
the elasticity proceeds in a geometric series when the
temperature proceeds in an arithmetical series nearly; the
differences of temperature for equal augmentations of the ratio of
elasticity being somewhat greater for the higher temperatures.
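
The character of this result, a geometric series whose ratio slowly diminishes, may be illustrated numerically. The sketch below does not use the observations of Magnus or of Dalton; it assumes a modern exponential approximation of the Magnus type for the elasticity of vapor over water, with coefficients (6.1094, 17.625, 243.04) that are a present-day fit and an assumption here, and temperatures in Centigrade degrees.

```python
import math

def saturation_pressure_hpa(temp_c):
    """Elasticity of vapor over water, in hectopascals.

    Magnus-type approximation with modern coefficients (an assumption,
    not data from the experiments described in the text).
    """
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))

# Temperatures in arithmetical progression, from freezing to boiling.
temps = [0, 20, 40, 60, 80, 100]
pressures = [saturation_pressure_hpa(t) for t in temps]

# Ratio of each elasticity to the one before it.
ratios = [later / earlier for earlier, later in zip(pressures, pressures[1:])]

for t, r in zip(temps[1:], ratios):
    print(t, round(r, 2))
```

Each ratio exceeds unity, but each is smaller than the last; so that to obtain the same multiplication of the elasticity a somewhat larger step of temperature is required at the higher part of the scale.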

The forces of the vapors of other liquids in contact with their
liquids, determined by Dr. Faraday, as mentioned in Chap. ii. Sect. 1,
are analogous to the elasticity of steam here spoken of.]

~Additional material in the 3rd edition.~

Sect. 5.—Consequences of the Doctrine of Evaporation.—Explanation of Rain, Dew, and Clouds.

The discoveries concerning the relations of heat and moisture which
were made during the last century, were principally suggested by
meteorological inquiries, and were applied to meteorology as fast as
they rose. Still there remains, on many points of this subject, so
much doubt and obscurity, that we cannot suppose the doctrines to
have assumed their final form; and therefore we are not here called
upon to trace their progress and connexion. The principles of
atmology are pretty well understood; but the difficulty of observing
the conditions under which they produce their effects in the
atmosphere is so great, that the precise theory of most
meteorological phenomena is still to be determined.

We have already considered the answers given to the question:
According to what rules does transparent aqueous vapor resume its
form of visible water? This question includes, not only the problems
of Rain and Dew, but also of Clouds; for clouds are not vapor, but
water, vapor being always invisible. An opinion which attracted much
notice in its time, was that of Hutton, who, in 1784, endeavored to
prove that if two masses of air saturated with transparent vapor at
different temperatures are mixed together, the precipitation of water
in the form either of cloud or of drops will take place. The reason he
assigned for the opinion was this: that the temperature of the mixture
is a mean between the two temperatures, but that the force of the
vapor in the mixture, which is the mean of the forces of the two
component vapors, will be greater than that which corresponds to
the mean temperature, since the force increases faster than the
temperature; 59 and hence some part of the vapor will be
precipitated. This doctrine, it will be seen, speaks of vapor as
“saturating” air, and is therefore, in this form, inconsistent with
Dalton’s principle; but it is not difficult to modify the expression so as
to retain the essential part of the explanation.
59 Edin. Trans. vol. 1. p. 42.
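
Hutton's reasoning rests on the fact that the force of vapor increases faster than the temperature, and this may be tried numerically. The sketch below follows the argument as stated: two equal masses, each holding as much vapor as its temperature allows, are mixed, and both the temperature and the force of the vapor are taken as simple means. The Magnus-type formula and its coefficients are a modern approximation and an assumption here; the function names and the temperatures of 0 and 30 Centigrade degrees are illustrative only.

```python
import math

def saturation_pressure_hpa(temp_c):
    """Greatest force of vapor at a given temperature, in hectopascals.

    Magnus-type approximation with modern coefficients (an assumption).
    """
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))

def mixing_excess(temp_a, temp_b):
    """Force of vapor in an equal mixture, less what the mean temperature can support."""
    mixed_vapor = (saturation_pressure_hpa(temp_a)
                   + saturation_pressure_hpa(temp_b)) / 2.0
    mean_temp = (temp_a + temp_b) / 2.0
    return mixed_vapor - saturation_pressure_hpa(mean_temp)

# A cold and a warm mass of air, each holding its full quantity of vapor.
excess = mixing_excess(0.0, 30.0)
print(excess > 0)   # True: the surplus must be precipitated
```

The excess is positive for any two unequal temperatures, because the curve of the force of vapor bends upward; the explanation therefore survives when the language of "saturation" is exchanged for that of independent vapor.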

Dew.—The principle of a “constituent temperature” of steam, and
the explanation of the “dew-point,” were known, as we have said
(chap. iii. sect. 3,) to the meteorologists of the last century; but we
perceive how incomplete their knowledge was, by the very gradual
manner in which the consequences of this principle were traced out.
We have already noticed, as one of the books which most drew
attention to the true doctrine, in this country at least, Dr. Wells’s
Essay on Dew, published in 1814. In this work the author gives an
account of the progress of his opinions; 60 “I was led,” he says, “in
the autumn of 1784, by the event of a rude experiment, to think it
probable that the formation of dew is attended with the production of
