
Emerging Trends in CO and IT (22618)

Unit-1 Artificial Intelligence

Contents
1.1 Introduction of AI
● Concept
● Scope of AI
● Components of AI
● Types of AI
● Application of AI
1.2 Data Visualization
● Data types in data visualization
● Scales map of data values in aesthetics
● Use of coordinate system in data visualization
● Use of colors to represent data values
● Representing - Amounts, Distribution, and Proportions
1.3 Data Storytelling
● Introduction
● Ineffectiveness of Graphical representation of data
● Explanatory Analysis
○ Who
○ What
○ How
1.4 Concept of machine learning and deep learning.

1.1 Introduction of AI
A branch of Computer Science named Artificial Intelligence (AI) pursues creating computers and machines as intelligent as human beings. John McCarthy, the father of Artificial Intelligence, described AI as “The science and engineering of making intelligent machines, especially intelligent computer programs”. Artificial Intelligence (AI) is a branch of science which deals with helping machines find solutions to complex problems in a more human-like fashion.
Artificial Intelligence has been defined in different ways by various researchers during its evolution, such as “Artificial Intelligence is the study of how to make computers do things which, at the moment, people do better.”
There are other possible definitions, like “AI is a collection of hard problems which can be solved by humans and other living things, but for which we don’t have good algorithms for solving them”, e.g., understanding spoken natural language, medical diagnosis, circuit design, learning, self-adaptation, reasoning, chess playing, proving math theorems, etc.


● Data: Data is defined as symbols that represent properties of objects, events and their
environment.
● Information: Information is a message that contains relevant meaning, implication, or
input for decision and/or action.
● Knowledge: It is the (1) cognition or recognition (know-what), (2) capacity to
act(know-how), and (3) understanding (know-why) that resides or is contained within
the mind or in the brain.
● Intelligence: It requires the ability to sense the environment, to make decisions, and to
control action.

1.1.1 Concept:
Artificial Intelligence is one of the emerging technologies that tries to simulate human reasoning in AI systems. The art and science of bringing learning, adaptation and self-organization to the machine is the art of Artificial Intelligence. Artificial Intelligence is the ability of a computer program to learn and think. Artificial intelligence (AI) is an area of computer science that emphasizes the creation of intelligent machines that work and react like humans. AI is built on three important concepts:
Machine learning: When you command your smartphone to call someone, or when you chat
with a customer service chatbot, you are interacting with software that runs on AI. But this
type of software actually is limited to what it has been programmed to do. However, we expect
to soon have systems that can learn new tasks without humans having to guide them. The idea
is to give them a large number of examples for any given chore, and they should be able to
process each one and learn how to do it by the end of the activity.
Deep learning: The machine learning example I provided above is limited by the fact that
humans still need to direct the AI’s development. In deep learning, the goal is for the software
to use what it has learned in one area to solve problems in other areas. For example, a program
that has learned how to distinguish images in a photograph might be able to use this learning
to seek out patterns in complex graphs.
Neural networks: These consist of computer programs that mimic the way the human brain
processes information. They specialize in clustering information and recognizing complex
patterns, giving computers the ability to use more sophisticated processes to analyze data.
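As a rough illustration of these concepts, the short sketch below trains a tiny neural network classifier. It assumes the scikit-learn library and uses a made-up XOR-style dataset purely for demonstration; it is an indicative example only, not part of the prescribed content.

from sklearn.neural_network import MLPClassifier

# Made-up toy dataset: inputs and XOR-style target labels.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# One hidden layer of 8 units loosely mimics "neurons" processing the input.
model = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=2000, random_state=1)
model.fit(X, y)                      # the network learns from the examples
print(model.predict([[1, 0]]))       # prediction for a new input pattern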


1.1.2 Scope of AI:


The ultimate goal of artificial intelligence is to create computer programs that can solve
problems and achieve goals like humans would. There is scope for developing machines in robotics, computer vision, language detection, game playing, expert systems, speech recognition and much more.
The following factors characterize a career in artificial intelligence:
● Automation
● Robotics
● The use of sophisticated computer software
Individuals considering pursuing a career in this field require specific education based on the
foundations of math, technology, logic and engineering perspectives. Apart from these, good
communication skills (written and verbal) are imperative to convey how AI services and tools
will help when employed within industry settings.

AI Approach:
The difference between machine and human intelligence is that humans think and act rationally, compared to machines. Historically, all four approaches to AI have been followed, each by different people with different methods.
each by different people with different methods.

Figure 1.1 AI Approaches


Think Well:
Develop formal models of knowledge representation, reasoning, learning, memory and problem solving that can be rendered in algorithms. There is often an emphasis on systems that are provably correct and guarantee finding an optimal solution.
Act Well:
For a given set of inputs, generate an appropriate output that is not necessarily correct but gets
the job done.
● A heuristic (heuristic rule, heuristic method) is a rule of thumb, strategy, trick,
simplification, or any other kind of device which drastically limits search for solutions
in large problem spaces.
● Heuristics do not guarantee optimal solutions; in fact, they do not guarantee any
solution at all:
● all that can be said for a useful heuristic is that it offers solutions which are good enough most of the time.
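To make the idea of a heuristic concrete, here is a small hypothetical sketch in Python: greedy best-first search over a toy graph, where rule-of-thumb distance estimates limit which nodes are explored but do not guarantee an optimal (or any) solution.

import heapq

# Toy graph and made-up heuristic estimates of remaining distance to the goal G.
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['G'], 'G': []}
h = {'A': 4, 'B': 2, 'C': 3, 'D': 1, 'G': 0}

def greedy_best_first(start, goal):
    frontier = [(h[start], start, [start])]   # priority queue ordered by heuristic value
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path                        # a good-enough path, not necessarily optimal
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph[node]:
            heapq.heappush(frontier, (h[nxt], nxt, path + [nxt]))
    return None                                # a heuristic may find no solution at all

print(greedy_best_first('A', 'G'))             # e.g. ['A', 'B', 'D', 'G']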

Think like humans:


Cognitive science approach: focus not just on behavior and I/O, but also look at the reasoning process. The computational model should reflect “how” results were obtained. Provide a new language for expressing cognitive theories and new mechanisms for evaluating them. GPS (General Problem Solver): the goal was not just to produce humanlike behavior (like ELIZA), but to produce a sequence of steps of the reasoning process that was similar to the steps followed by a person in solving the same task.
Act like humans:
Behaviorist approach: not interested in how you get results, just in the similarity to human results.
Example:
ELIZA: A program that simulated a psychotherapist interacting with a patient and successfully passed the Turing Test. It was coded at MIT during 1964-1966 by Joseph Weizenbaum. The first script was DOCTOR. The script was a simple collection of syntactic patterns, not unlike regular expressions. Each pattern had an associated reply which might include bits of the input (after simple transformations such as my → your). Weizenbaum was shocked at the reactions: psychiatrists thought it had potential, and people unequivocally anthropomorphized it.
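The pattern-and-reply idea behind the DOCTOR script can be sketched in a few lines of Python; the rules below are hypothetical stand-ins, not Weizenbaum's original script.

import re

# Hypothetical ELIZA-style rules: a regex pattern plus a reply template that may
# reuse part of the input after simple word transformations (my -> your, I -> you).
rules = [
    (r'I need (.*)', "Why do you need {0}?"),
    (r'I am (.*)',   "How long have you been {0}?"),
    (r'My (.*)',     "Tell me more about your {0}."),
]

def reflect(text):
    swaps = {'my': 'your', 'i': 'you', 'me': 'you', 'am': 'are'}
    return ' '.join(swaps.get(word.lower(), word) for word in text.split())

def respond(sentence):
    for pattern, reply in rules:
        match = re.match(pattern, sentence, re.IGNORECASE)
        if match:
            return reply.format(reflect(match.group(1)))
    return "Please, go on."                    # default reply when no pattern matches

print(respond("I need a holiday"))             # Why do you need a holiday?
print(respond("I am feeling low"))             # How long have you been feeling low?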

1.1.3 Components of AI
The core components and constituents of AI are derived from the concept of logic, cognition
and computation; and the compound components, built-up through core components are
knowledge, reasoning, search, natural language processing, vision etc.
Levels, core concepts, compound components and coarse components:
● Logic level: core concepts are Proposition, Induction, Tautology, Model and Temporal Logic; compound components are Knowledge, Reasoning, Control and Search; coarse components are Knowledge-based systems, Heuristic Search and Theorem Proving.
● Cognition level: core concepts are Learning, Adaptation and Self-organization; compound components are Belief Desire Intention, Co-operation and Co-ordination; coarse components are Multi-Agent systems and AI Programming.
● Functional level: core concepts are Memory and Perception; compound components are Vision and Utterance; coarse components are Natural Language Processing and Speech Processing.
The core entities are inseparable constituents of AI in that these concepts are fused at atomic
level. The concepts derived from logic are propositional logic, tautology, predicate calculus,
model and temporal logic. The concepts of cognitive science are of two types: one is
functional which includes learning, adaptation and self-organization, and the other is memory
and perception which are physical entities. The physical entities generate some functions to
make the compound components.


The compound components are made of some combination of the logic and cognition stream.
These are knowledge, reasoning and control generated from constituents of logic such as
predicate calculus, induction and tautology and some from cognition (such as learning and
adaptation). Similarly, belief, desire and intention are models of mental states that are
predominantly based on cognitive components but less on logic. Vision, utterance (vocal) and expression (written) are the combined effect of memory and perceiving organs or body sensors such as the ears, eyes and vocal cords. The gross level contains the constituents at the third level, which are knowledge-based systems (KBS), heuristic search, automatic theorem proving, multi-agent systems, AI languages such as PROLOG and LISP, and natural language processing (NLP).
Speech processing and vision are based mainly on the principle of pattern recognition.
AI Dimension: The philosophy of AI in a three-dimensional representation consists of logic, cognition and computation in the x-direction, and knowledge, reasoning and interface in the y-direction. The x-y plane is the foundation of AI. The z-direction consists of correlated systems of physical origin such as language, vision and perception, as shown in Figure 1.2.

Figure 1.2 Three-dimensional model of AI


The First Dimension (Core)


The theory of logic, cognition and computation constitutes the fusion factors for the formation of one of the foundations on the x-axis. Philosophy, from its very inception, covered all the facets, directions and dimensions of human thinking. Aristotle's theory of syllogism, Descartes' and Kant's critiques of pure reasoning, and the contributions of many other philosophers based knowledge on logic. It was Charles Babbage and George Boole who demonstrated the power of computational logic. Modern philosophers such as Bertrand Russell correlated logic with mathematics, but it was Turing who developed the theory of computation for mechanization. In the 1960s, Marvin Minsky pushed the logical formalism to integrate reasoning with knowledge.

Cognition:
Computers have become popular in a short span of time for the simple reason that they adopted and projected the information processing paradigm (IPP) of human beings: sensing organs as input, mechanical movement organs as output, and the central nervous system (CNS) in the brain as the control and computing device. Short-term and long-term memory were not distinguished by computer scientists; taken together, they were simply termed memory.
At a deeper level, the interaction of stimuli with the stored information to produce new information requires the processes of learning, adaptation and self-organization. These functionalities in information processing, at a certain level of abstraction of brain activities, demonstrate a state of mind which exhibits specific behavior qualifying as intelligence. Computational models were developed and incorporated in machines which mimicked the functionalities of human origin. The creation of such traits of human beings in computing devices and processes originated the concept of intelligence in machines as a virtual mechanism. These virtual machines were, in due course of time, termed artificially intelligent machines.

Computation
The theory of computation developed by Turing (the finite state automaton) was a turning point from the mathematical model to logical computation. Chomsky's linguistic computational theory generated a model for syntactic analysis through a regular grammar.

The Second Dimension


The second dimension contains knowledge, reasoning and interface which are the
components of knowledge-based system (KBS). Knowledge can be logical; it may be
processed as information which is subject to further computation. This means that any
item on the y-axis is correlated with any item on the x-axis to make the foundation of
any item on the z-axis. Knowledge and reasoning are difficult to prioritize: it is hard to say whether knowledge is formed first and then reasoning is performed, or whether reasoning is present first and knowledge is then formed. Interface is a means of communication between one domain and another. Here, it connotes a different concept than the user's interface. The
formation of a permeable membrane or transparent solid structure between two domains
of different permittivity is termed interface. For example, in the industrial domain, the
robot is an interface. A robot exhibits all traits of human intelligence in its course of
action to perform mechanical work. In the KBS, the user's interface is an example of
the interface between computing machine and the user. Similarly, a program is an
interface between the machine and the user. The interface may be between human and
human, i.e. experts in one domain to experts in another domain. The human-to-machine interface is a program and the machine-to-machine interface is hardware. These interfaces are in the context of
computation and AI methodology.

The Third Dimension


The third dimension leads to the orbital or peripheral entities, which are built on the
foundation of x-y plane and revolve around these for development. The entities include
an information system. NLP, for example, is formed on the basis of the linguistic
computation theory of Chomsky and concepts of interface and knowledge on y-
direction. Similarly, vision has its basis on some computational model such as
clustering, pattern recognition computing models and image processing algorithms on
the x-direction and knowledge of the domain on the y-direction.
The third dimension is basically the application domain. Here, if the entities are near
the origin, more and more concepts are required from the x-y plane. For example, consider information and automation: these are entities far away on the z-direction, but they contain some concepts of the cognition and computation models respectively on the x-direction, and concepts of knowledge (data), reasoning and interface on the y-direction.
In general, any quantity in any dimension is correlated with some entities on the other
dimension. The implementation of the logical formalism was accelerated by the rapid
growth in electronic technology, in general and multiprocessing parallelism in
particular.

1.1.4 Types of AI
Artificial Intelligence can be divided into various types. There are mainly two categorizations, one based on the capabilities and the other based on the functionality of AI. The following flow diagram explains the types of AI.


Figure 1.3 Types of AI


AI type-1: Based on Capabilities


1. Weak AI or Narrow AI:
● Narrow AI is a type of AI which is able to perform a dedicated task with
intelligence. Narrow AI is the most common and currently available type of AI in the world of Artificial Intelligence.
● Narrow AI cannot perform beyond its field or limitations, as it is only trained for
one specific task. Hence it is also termed as weak AI. Narrow AI can fail in
unpredictable ways if it goes beyond its limits.
● Apple Siri is a good example of Narrow AI, but it operates with a limited pre-
defined range of functions.
● IBM's Watson supercomputer also comes under Narrow AI, as it uses an Expert
system approach combined with Machine learning and natural language
processing.
● Some examples of Narrow AI are playing chess, purchasing suggestions on e-commerce sites, self-driving cars, speech recognition, and image recognition.
2. General AI:
● General AI is a type of intelligence which could perform any intellectual task with efficiency like a human.
● The idea behind general AI is to make a system which could be smarter and think like a human on its own.
● Currently, no system exists which comes under general AI and can perform any task as perfectly as a human.
● Researchers worldwide are now focused on developing machines with General AI.
● Systems with general AI are still under research, and it will take a lot of effort and time to develop such systems.
3. Super AI:
● Super AI is a level of intelligence of systems at which machines could surpass human intelligence and can perform any task better than a human, with cognitive properties. It is an outcome of general AI.
● Some key characteristics of strong AI include the ability to think, reason, solve puzzles, make judgments, plan, learn, and communicate on its own.
● Super AI is still a hypothetical concept of Artificial Intelligence. Developing such systems in the real world is still a world-changing task.

Artificial Intelligence type-2: Based on functionality


1. Reactive Machines
● Purely reactive machines are the most basic types of Artificial Intelligence.
● Such AI systems do not store memories or past experiences for future actions.
● These machines only focus on current scenarios and react on it as per possible
best action.
● IBM's Deep Blue system is an example of reactive machines.
● Google's AlphaGo is also an example of reactive machines.
2. Limited Memory
● Limited memory machines can store past experiences or some data for a short
period of time.
● These machines can use stored data for a limited time period only.
● Self-driving cars are one of the best examples of Limited Memory systems.
These cars can store recent speed of nearby cars, the distance of other cars, speed
limit, and other information to navigate the road.

3. Theory of Mind
● Theory of Mind AI should understand the human emotions, people, beliefs, and
be able to interact socially like humans.
● This type of AI machine is still not developed, but researchers are making lots of effort and improvements toward developing such AI machines.

4. Self-Awareness
● Self-awareness AI is the future of Artificial Intelligence. These machines will
be super intelligent, and will have their own consciousness, sentiments, and self-
awareness.
● These machines will be smarter than the human mind.
● Self-aware AI does not exist in reality yet; it is still a hypothetical concept.

1.1.5 Application of AI
AI has been dominant in various fields such as:
● Gaming: AI plays a crucial role in strategic games such as chess, poker, tic-tac-toe, etc., where the machine can think of a large number of possible positions based on heuristic knowledge.
● Natural Language Processing: It is possible to interact with the computer that
understands natural language spoken by humans.
● Expert Systems: There are some applications which integrate machine,


software, and special information to impart reasoning and advising. They provide
explanation and advice to the users.
● Vision Systems: These systems understand, interpret, and comprehend visual
input on the computer. For example,
• A spying aeroplane takes photographs, which are used to figure out spatial information or a map of the area.
• Doctors use a clinical expert system to diagnose the patient.
• Police use computer software that can recognize the face of a criminal against the stored portrait made by a forensic artist.
● Speech Recognition: Some intelligent systems are capable of hearing and comprehending language in terms of sentences and their meanings while a human talks to them. They can handle different accents, slang words, noise in the background, change in a human's voice due to a cold, etc.
● Handwriting Recognition: Handwriting recognition software reads text written on paper with a pen or on a screen with a stylus. It can recognize the shapes of the letters and convert them into editable text.
● Intelligent Robots: Robots are able to perform the tasks given by a human.
They have sensors to detect physical data from the real world such as light, heat,
temperature, movement, sound, bump, and pressure. They have efficient
processors, multiple sensors and huge memory, to exhibit intelligence. In addition,
they are capable of learning from their mistakes and they can adapt to the new
environment.

1.2 Data Visualization


1.2.1 Introduction –
Data visualization is the graphical representation of information and data. By
using visual elements like charts, graphs, and maps, data visualization tools
provide an accessible way to see and understand trends, outliers, and patterns in
data. It also provides an excellent way to present data to non-technical audiences
without confusion. The first and foremost objective of data visualization is to
convey data correctly. Whenever we visualize data, we take data values and
convert them in a systematic and logical way into the visual elements that make
up the final graphic. Even though there are many different types of data
visualizations, and on first glance a scatterplot, a pie chart, and a heatmap don’t
seem to have much in common, all these visualizations can be described with a
common language that captures how data values are turned into blobs of ink on
paper or colored pixels on a screen. The key insight is the following: all data
visualizations map data values into quantifiable features of the resulting graphic.
We refer to these features as aesthetics.
1.2.2 Data types in data visualization –
When we consider types of data in data visualization, we consider various types
of data in use as well as aesthetics too. Aesthetics describe every aspect of a given
graphical element. For example, in Figure 1.4 -
A critical component of every graphical element is of course its position, which
describes where the element is located. In standard 2D graphics, we describe
positions by an x and y value, but other coordinate systems and one- or three-
dimensional visualizations are possible. Next, all graphical elements have a shape,
a size, and a color. Even if we are preparing a black-and-white drawing, graphical
elements need to have a color to be visible: for example, black if the background
is white or white if the background is black. Finally, to the extent we are using
lines to visualize data, these lines may have different widths or dash–dot patterns.
There are many other aesthetics we may encounter in a data visualization. For
example, if we want to display text, we may have to specify font family, font face,
and font size, and if graphical objects overlap, we may have to specify whether
they are partially transparent.

Figure 1.4 Commonly used aesthetics in data visualization: position, shape, size, color, line width, line type. Some of these aesthetics can represent both continuous and discrete data (position, size, line width, color), while others can usually only represent discrete data (shape, line type).

All aesthetics are categorized into two groups:


(1) Those that can represent continuous data and
(2) those that cannot represent continuous data.
Continuous data values are values for which arbitrarily fine intermediates exist.
For example, time duration is a continuous value. Between any two durations, say
50 seconds and 51 seconds, there are arbitrarily many intermediates, such as 50.5
seconds, 50.51 seconds, 50.50001 seconds, and so on. By contrast, number of
persons in a room is a discrete value. A room can hold 5 persons or 6, but not 5.5.
For the examples in Figure 1.4, position, size, color, and line width can represent
continuous data, but shape and line type can usually only represent discrete data.
Next, we’ll consider the types of data we may want to represent in our
visualization. You may think of data as numbers, but numerical values are only
two out of several types of data we may encounter. In addition to continuous and
discrete numerical values, data can come in the form of discrete categories, in the
form of dates or times, and as text (Table 1.1). When data is numerical, we also
call it quantitative and when it is categorical, we call it qualitative. Variables
holding qualitative data are factors, and the different categories are called levels.
The levels of a factor are most commonly without order (as in the example of dog, cat, fish in Table 1.1 given below), but factors can also be ordered, when there is
an intrinsic order among the levels of the factor (as in the example of good, fair,
poor in Table 1.1).
Table 1.1 Types of variables encountered in Data Visualization Scenario

● Quantitative/numerical continuous. Example: 1.3, 5.7, 83, 1.5 × 10⁻². Appropriate scale: Continuous. Description: Arbitrary numerical values. These can be integers, rational numbers, or real numbers.
● Quantitative/numerical discrete. Example: 1, 2, 3, 4. Appropriate scale: Discrete. Description: Numbers in discrete units. These are most commonly but not necessarily integers. For example, the numbers 0.5, 1.0, 1.5 could also be treated as discrete if intermediate values cannot exist in the given dataset.
● Qualitative/categorical unordered. Example: dog, cat, fish. Appropriate scale: Discrete. Description: Categories without order. These are discrete and unique categories that have no inherent order. These variables are also called factors.
● Qualitative/categorical ordered. Example: good, fair, poor. Appropriate scale: Discrete. Description: Categories with order. These are discrete and unique categories with an order. For example, “fair” always lies between “good” and “poor.” These variables are also called ordered factors.
● Date or time. Example: Jan. 5 2018, 8:03am. Appropriate scale: Continuous or Discrete. Description: Specific days and/or times. Also, generic dates, such as July 4 or Dec. 25 (without year).
● Text. Example: The quick brown fox jumps over the lazy dog. Appropriate scale: None, or Discrete. Description: Free-form text. Can be treated as categorical if needed.

Let's consider an example: Table 1.2 below shows the first few rows of a dataset providing the daily temperature normals (average daily temperatures over a 30-year window) for four US locations. This table contains five variables: month, day,
location, station ID, and temperature (in degrees Fahrenheit). Month is an ordered
factor, day is a discrete numerical value, location is an unordered factor, station ID
is similarly an unordered factor, and temperature is a continuous numerical value.

Table 1.2 First 8 rows of a dataset listing daily temperature normals for four weather stations

Month  Day  Location      Station ID    Temperature (F)
Jan    1    Chicago       USW00014819   25.6
Jan    1    San Diego     USW00093107   55.2
Jan    1    Houston       USW00012918   53.9
Jan    1    Death Valley  USC00042319   51.0
Jan    2    Chicago       USW00014819   25.5
Jan    2    San Diego     USW00093107   55.3
Jan    2    Houston       USW00012918   53.8
Jan    2    Death Valley  USC00042319   51.2
Data source: National Oceanic and Atmospheric Administration (NOAA).
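In Python, the variable types in Table 1.2 can be represented roughly as follows. This is only a sketch assuming the pandas library; for brevity only the first four rows of the table are reproduced.

import pandas as pd

# First four rows of Table 1.2; Month is stored as an ordered factor (categorical).
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Jan"],
    "Day": [1, 1, 1, 1],                                                   # discrete numerical
    "Location": ["Chicago", "San Diego", "Houston", "Death Valley"],       # unordered factor
    "Station ID": ["USW00014819", "USW00093107", "USW00012918", "USC00042319"],
    "Temperature": [25.6, 55.2, 53.9, 51.0],                               # continuous numerical (F)
})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df["Month"] = pd.Categorical(df["Month"], categories=months, ordered=True)

print(df.dtypes)   # shows which columns are numeric and which are categorical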


Figure 1.5 The data representation of the above-mentioned example

Figure 1.6 Monthly normal mean temperatures for the same example
1.2.3 Use of coordinate system in Data Visualization
To make any sort of data visualization, we need to define position scales, which determine where in a graphic different data values are located. We cannot visualize
data without placing different data points at different locations, even if we just arrange
them next to each other along a line. For regular 2D visualizations, two numbers are
required to uniquely specify a point, and therefore we need two position scales. These
two scales are usually but not necessarily the x and y axes of the plot. We also have to
specify the relative geometric arrangement of these scales. Conventionally, the x axis
runs horizontally and the y axis vertically, but we could choose other arrangements.
For example, we could have the y axis run at an acute angle relative to the x axis, or
we could have one axis run in a circle and the other run radially. The combination of
a set of position scales and their relative geometric arrangement is called a coordinate
system.
● Cartesian coordinates –
The most widely used coordinate system for data visualization is the 2D Cartesian
coordinate system, where each location is uniquely specified by an x and a y value.
The x and y axes run orthogonally to each other, and data values are placed in an even
spacing along both axes. The two axes are continuous position scales, and they can
represent both positive and negative real numbers. To fully specify the coordinate
system, we need to specify the range of numbers each axis covers. Any data values
between these axis limits are placed at the appropriate respective location in the plot.
Any data values outside the axis limits are discarded.

Figure 1.7 Cartesian coordinate system's sample example

Figure 1.8 Daily temperature normals for Houston using different aspect ratios

A Cartesian coordinate system can have two axes representing two different units. This
situation arises quite commonly whenever we’re mapping two different types of
variables to x and y. For example, consider the image below, where we plot temperature versus days of the year. The y axis is measured in degrees Fahrenheit, with a grid line every 20 degrees, and the x axis is measured in months, with a grid line at the first of every
third month. Whenever the two axes are measured in different units, we can stretch or
compress one relative to the other and maintain a valid visualization of the data. Which
version is preferable may depend on the story we want to convey. A tall and narrow
figure emphasizes change along the y axis and a short and wide figure does the opposite.
Ideally, we want to choose an aspect ratio that ensures that any important differences in
position are noticeable.
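A small sketch of how axis limits and figure proportions are set in practice, assuming the matplotlib library and made-up monthly temperature values rather than the actual NOAA normals:

import matplotlib.pyplot as plt

# Made-up monthly mean temperatures (degrees F) for one location, for illustration only.
months = range(1, 13)
temps = [26, 30, 40, 51, 61, 71, 76, 75, 67, 55, 43, 31]

fig, ax = plt.subplots(figsize=(8, 3))   # a short, wide figure de-emphasizes change along y
ax.plot(months, temps)
ax.set_xlim(1, 12)                       # the axis limits define the visible coordinate range
ax.set_ylim(0, 100)
ax.set_xlabel("Month")
ax.set_ylabel("Temperature (F)")
plt.show()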
● Nonlinear Axes –
In a Cartesian coordinate system, the grid lines along an axis are spaced evenly both in
data units and in the resulting visualization. We refer to the position scales in these
coordinate systems as linear. While linear scales generally provide an accurate


representation of the data, there are scenarios where nonlinear scales are preferred. In a
nonlinear scale, even spacing in data units corresponds to uneven spacing in the
visualization, or conversely even spacing in the visualization corresponds to uneven
spacing in data units. The most commonly used nonlinear scale is the logarithmic scale,
or log scale for short. Log scales are linear in multiplication, such that a unit step on the
scale corresponds to multiplication with a fixed value. To create a log scale, we need to
log-transform the data values while exponentiating the numbers that are shown along the axis grid lines. This process is demonstrated in Figure 1.9, which shows the numbers 1, 3.16, 10, 31.6, and 100 placed on linear and log scales. The numbers 3.16 and 31.6 may seem like strange choices, but they were selected because they are exactly halfway between 1 and 10 and between 10 and 100 on a log scale.

Figure 1.9 Representation of Linear and logarithmic scales
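A minimal sketch (matplotlib assumed) that places the numbers mentioned above on a linear and on a logarithmic axis:

import matplotlib.pyplot as plt

values = [1, 3.16, 10, 31.6, 100]        # 3.16 and 31.6 are the halfway points on a log scale

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 3))
ax1.scatter(values, [0] * len(values))
ax1.set_xscale("linear")                 # even spacing in data units
ax1.set_title("linear scale")

ax2.scatter(values, [0] * len(values))
ax2.set_xscale("log")                    # even spacing in multiplicative steps
ax2.set_title("log scale")

plt.tight_layout()
plt.show()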


● Coordinate systems with curved axes –
All the coordinate systems we have encountered so far have used two straight axes
positioned at a right angle to each other, even if the axes themselves established a
nonlinear mapping from data values to positions. There are other coordinate systems,
however, where the axes themselves are curved. In particular, in the polar coordinate
system, we specify positions via an angle and a radial distance from the origin, and
therefore the angle axis is circular. Polar coordinates can be useful for data of a periodic
nature, such that data values at one end of the scale can be logically joined to data values
at the other end. For example, consider the days in a year. December 31st is the last day
of the year, but it is also one day before the first day of the year. If we want to show
how some quantity varies over the year, it can be appropriate to use polar coordinates
with the angle coordinate specifying each day. Let’s apply this concept to the
temperature normals. Because temperature normals are average temperatures that are
not tied to any specific year, Dec. 31st can be thought of as 366 days later than Jan. 1st
(temperature normals include Feb. 29th) and also 1 day earlier.


By plotting the temperature normals in a polar coordinate system, we emphasize this


cyclical property they have. The polar version highlights how similar the temperatures

are in Death Valley, Houston, and San Diego from late fall to early spring. In the
Cartesian coordinate system, this fact is obscured because the temperature values in late
December and in early January are shown in opposite parts of the figure and therefore
don’t form a single visual unit.

Figure 1.10 Representation of data on curved axes
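A minimal polar-coordinate sketch in matplotlib, mapping the day of the year to the angle coordinate; the smooth temperature curve below is made up purely for illustration, not the real normals.

import math
import matplotlib.pyplot as plt

days = range(366)
temps = [60 + 20 * math.sin(2 * math.pi * (d - 100) / 366) for d in days]   # made-up values
angles = [2 * math.pi * d / 366 for d in days]                              # day of year -> angle

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, temps)                   # radial distance encodes temperature
ax.set_title("Temperature over the year (polar coordinates)")
plt.show()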


1.2.4 Use of colors to represent data values
There are three fundamental use cases for color in data visualizations:
i. to distinguish groups of data from each other,
ii. to represent data values, and
iii. to highlight.
The types of colors we use and the way in which we use them are quite different for
these three cases.
i. Color as a tool to distinguish –
We frequently use color as a means to distinguish discrete items or groups that do
not have an intrinsic order, such as different countries on a map or different
manufacturers of a certain product. In this case, we use a qualitative color scale.


Such a scale contains a finite set of specific colors that are chosen to look clearly
distinct from each other while also being equivalent to each other. The second
condition requires that no one color should stand out relative to the others. Also,
the colors should not create the impression of an order, as would be the case with
a sequence of colors that get successively lighter. Such colors would create an
apparent order among the items being colored, which by definition have no order.
Many appropriate qualitative color scales are readily available. Figure 1.11 shows three representative examples. In particular, the ColorBrewer project provides a
nice selection of qualitative color scales, including both fairly light and fairly dark
colors [Brewer 2017].

Figure 1.11. Example qualitative color scales. The Okabe Ito scale is the default scale
used throughout this book [Okabe and Ito 2008]. The ColorBrewer Dark2 scale is
provided by the ColorBrewer project [Brewer 2017]. The ggplot2 hue scale is the
default qualitative scale in the widely used plotting software ggplot2.

Figure 1.12 Representing data using various colors to distinguish regions



ii. Color as a tool to represent data values –


Color can also be used to represent quantitative data values, such as income,
temperature, or speed. In this case, we use a sequential color scale. Such a scale contains
a sequence of colors that clearly indicate which values are larger or smaller than which
other ones, and how distant two specific values are from each other. The second point
implies that the color scale needs to be perceived to vary uniformly across its entire
range. Sequential scales can be based on a single hue (e.g., from dark blue to light blue)
or on multiple hues (e.g., from dark red to light yellow). Multihued scales tend to follow
color gradients that can be seen in the natural world, such as dark red, green, or blue to
light yellow, or dark purple to light green. The reverse (e.g., dark yellow to light blue)
looks unnatural and doesn’t make a useful sequential scale.

Figure 1.13 Example sequential color scales. The ColorBrewer Blues scale is a monochromatic scale that varies from dark to light blue. The Heat and Viridis scales are multihue scales that vary from dark red to light yellow and from dark blue via green to light yellow, respectively.
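A small sketch of a sequential scale in practice, assuming matplotlib and its built-in Viridis colormap, with random values standing in for real data:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(10, 12)             # random values standing in for real data

fig, ax = plt.subplots()
im = ax.imshow(data, cmap="viridis")      # sequential multihue scale: dark blue via green to yellow
fig.colorbar(im, ax=ax, label="value")    # the colorbar shows the value-to-color mapping
plt.show()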
iii. Color as a tool to highlight –
Color can also be an effective tool to highlight specific elements in the data. There may
be specific categories or values in the dataset that carry key information about the story
we want to tell, and we can strengthen the story by emphasizing the relevant figure
elements to the reader. An easy way to achieve this emphasis is to color these figure
elements in a color or set of colors that vividly stand out against the rest of the figure.
This effect can be achieved with accent color scales, which are color scales that contain
both a set of subdued colors and a matching set of stronger, darker, and/or more
saturated colors.


Figure 1.14 Example accent color scales, each with four base colors and three accent colors. Accent color scales can be derived in several different ways: (top) we can take an existing color scale (e.g., the Okabe Ito scale) and lighten and/or partially desaturate some colors while darkening others; (middle) we can take gray values and pair them with colors; (bottom) we can use an existing accent color scale (e.g., the one from the ColorBrewer project).

1.2.5 Representing - Amounts, Distribution, and Proportions


Commonly used plots and charts to visualize different types of data -
i. Amounts

The most common approach to visualizing amounts (i.e., numerical values shown for
some set of categories) is using bars, either vertically or horizontally. However, instead
of using bars, we can also place dots at the location where the corresponding bar would
end.

If there are two or more sets of categories for which we want to show amounts, we can
group or stack the bars. We can also map the categories onto the x and y axes and
show amounts by color, via a heatmap.
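For instance, a sketch of showing the same made-up amounts with bars and with dots (matplotlib assumed):

import matplotlib.pyplot as plt

# Made-up amounts for a few categories, for illustration only.
categories = ["A", "B", "C", "D"]
amounts = [23, 17, 35, 29]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(categories, amounts)              # the classic bar chart
ax1.set_title("bars")

ax2.scatter(amounts, categories)          # dots placed where each bar would end
ax2.set_title("dots")
plt.tight_layout()
plt.show()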

ii. Distributions

Histograms and density plots provide the most intuitive visualizations of a distribution,
but both require arbitrary parameter choices and can be misleading. Cumulative
densities and quantile-quantile (q-q) plots always represent the data faithfully but can
be more difficult to interpret.

Boxplots, violin plots, strip charts, and sina plots are useful when we want to visualize many distributions at once and/or if we are primarily interested in overall shifts among the distributions. Stacked histograms and overlapping densities allow a more in-depth comparison of a smaller number of distributions, though stacked histograms can be difficult to interpret and are best avoided. Ridgeline plots can be a useful alternative to violin plots and are often useful when visualizing very large numbers of distributions or changes in distributions over time.

iii. Proportions

Proportions can be visualized as pie charts, side-by-side bars, or stacked bars. As for
amounts, when we visualize proportions with bars, the bars can be arranged either
vertically or horizontally. Pie charts emphasize that the individual parts add up to a
whole and highlight simple fractions. However, the individual pieces are more easily
compared in side-by-side bars. Stacked bars look awkward for a single set of
proportions, but can be useful when comparing multiple sets of proportions.
When visualizing multiple sets of proportions or changes in proportions across
conditions, pie charts tend to be space-inefficient and often obscure relationships.
Grouped bars work well as long as the number of conditions compared is moderate, and
stacked bars can work for large numbers of conditions. Stacked densities are appropriate
when the proportions change along a continuous variable.


When proportions are specified according to multiple grouping variables, mosaic plots,
tree maps, or parallel sets are useful visualization approaches. Mosaic plots assume that
every level of one grouping variable can be combined with every level of another
grouping variable, whereas tree maps do not make such an assumption. Tree maps work
well even if the subdivisions of one group are entirely distinct from the subdivisions of
another. Parallel sets work better than either mosaic plots or tree maps when there are
more than two grouping variables.
1.3 Data Storytelling
1.3.1 Introduction
Data storytelling is a methodology for communicating information, tailored to a specific
audience, with a compelling narrative. It is the last ten feet of your data analysis and
arguably the most important aspect. Data storytelling is the concept of building a
compelling narrative based on complex data and analytics that help tell your story and
influence and inform a particular audience.
● The benefits of data storytelling
✔ Adding value to your data and insights.
✔ Interpreting complex information and highlighting essential key points for the
audience.
✔ Providing a human touch to your data.
✔ Offering value to your audience and industry.
✔ Building credibility as an industry and topic thought leader.
1.3.2. Ineffectiveness of Graphical representation of data
Data visualization plays a significant role in determining how receptive your audience
is to receiving complex information. Data visualization helps transform boundless
amounts of data into something simpler and digestible. Here, you can supply the visuals
needed to support your story. Effective data visualizations can help:
● Reveal patterns, trends, and findings from an unbiased viewpoint.
● Provide context, interpret results, and articulate insights.
● Streamline data so your audience can process information.
● Improve audience engagement.

1.3.3. The three key elements of data storytelling


Through a structured approach, data storytelling and data visualization work together
to communicate your insights through three essential elements: narrative, visuals, and
data. As you create your data story, it is important to combine the following three
elements to write a well-rounded anecdote of your theory and the resulting actions you’d
like to see from users.
1. Build your narrative
As you tell your story, you need to use your data as supporting pillars to your insights.
Help your audience understand your point of view by distilling complex information
into informative insights. Your narrative and context are what will drive the linear
nature of your data storytelling.
2. Use visuals to enlighten
Visuals can help educate the audience on your theory. When you connect the visual
assets (charts, graphs, etc.) to your narrative, you engage the audience with otherwise
hidden insights that provide the fundamental data to support your theory. Instead of
presenting a single data insight to support your theory, it helps to show multiple pieces
of data, both granular and high level, so that the audience can truly appreciate your
viewpoint.
3. Show data to support
Humans are not naturally attracted to analytics, especially analytics that lack
contextualization using augmented analytics. Your narrative offers enlightenment,
supported by tangible data. Context and critique are integral to the full interpretation of
your narrative. Using business analytic tools to provide key insights and understanding
to your narrative can help provide the much-needed context throughout your data story.

By combining the three elements above, your data story is sure to create an emotional
response in your audience. Emotion plays a significant role in decision-making. And by
linking the emotional context and hard data in your data storytelling, you’re able to
influence others. When these three key elements are successfully integrated, you have
created a data story that can influence people and drive change.

1.3.4. Explanatory Analysis


Exploratory analysis is what you do to understand the data and figure out what might
be noteworthy or interesting to highlight to others. When it comes to explanatory
analysis, there are a few things to think about and be extremely clear on before
visualizing any data or creating content. First, to whom are you communicating? It is
important to have a good understanding of who your audience is and how they perceive
you. This can help you to identify common ground that will help you ensure they hear
your message. Second, what do you want your audience to know or do? You should be
clear how you want your audience to act and take into account how you will
communicate to them and the overall tone that you want to set for your communication.
It’s only after you can concisely answer these first two questions that you’re ready to
move forward with the third: How can you use data to help make your point?
1.3.4.1. Who -


o Your audience - The more specific you can be about who your audience is, the
better position you will be in for successful communication. Avoid general
audiences, such as “internal and external stakeholders” or “anyone who might be
interested”—by trying to communicate to too many different people with disparate
needs at once, you put yourself in a position where you can’t communicate to any
one of them as effectively as you could if you narrowed your target audience.
Sometimes this means creating different communications for different audiences.
Identifying the decision maker is one way of narrowing your audience. The more
you know about your audience, the better positioned you’ll be to understand how
to resonate with them and form a communication that will meet their needs and
yours.
o You - It’s also helpful to think about the relationship that you have with your
audience and how you expect that they will perceive you. Will you be encountering
each other for the first time through this communication, or do you have an
established relationship? Do they already trust you as an expert, or do you need to
work to establish credibility? These are important considerations when it comes to
determining how to structure your communication and whether and when to use
data, and may impact the order and flow of the overall story you aim to tell.

1.3.4.2. What -
o Action - What do you need your audience to know or do? This is the point where
you think through how to make what you communicate relevant for your audience
and form a clear understanding of why they should care about what you say. You
should always want your audience to know or do something. If you can’t concisely
articulate that, you should revisit whether you need to communicate in the first place.
o Mechanism - How will you communicate to your audience? The method you will
use to communicate to your audience has implications on a number of factors,
including the amount of control you will have over how the audience takes in the
information and the level of detail that needs to be explicit. We can think of the
communication mechanism along a continuum, with live presentation at one end and a written document or email at the other. Consider the level
of control you have over how the information is consumed as well as the amount of
detail needed at either end of the spectrum.
1.3.4.3. How -
Finally—and only after we can clearly articulate who our audience is and what we
need them to know or do—we can turn to the data and ask the question: What data is
available that will help make my point? Data becomes supporting evidence of the
story you will build and tell.
1.4 Concept of machine learning and deep learning
1.4.1 Machine Learning:
● Machine learning is a branch of science that deals with programming the systems
in such a way that they automatically learn and improve with experience. Here, learning
means recognizing and understanding the input data and making wise decisions based
on the supplied data.
● It is very difficult to cater to all the decisions based on all possible inputs. To
tackle this problem, algorithms are developed. These algorithms build knowledge from
specific data and past experience with the principles of statistics, probability theory,
logic, combinatorial optimization, search, reinforcement learning, and control theory.
The developed algorithms form the basis of various applications such as:
● Vision processing
● Language processing
● Forecasting (e.g., stock market trends)
● Pattern recognition
● Games
● Data mining
● Expert systems
● Robotics
Machine learning is a vast area and it is quite beyond the scope of this chapter to cover all its features. There are several ways to implement machine learning techniques; the most commonly used ones are supervised and unsupervised learning.
1.4.2. Supervised Learning: Supervised learning deals with learning a function from
available training data. A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used for mapping new examples. Common
examples of supervised learning include:
● classifying e-mails as spam,
● labeling webpages based on their content, and
● voice recognition.
There are many supervised learning algorithms such as neural networks, Support Vector Machines (SVMs), and Naive Bayes classifiers. Apache Mahout, for example, implements a Naive Bayes classifier.
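A minimal supervised-learning sketch, assuming scikit-learn and a tiny made-up "spam" dataset, in the spirit of the e-mail classification example above:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set: e-mails labelled as spam (1) or not spam (0).
emails = ["win money now", "cheap loan offer", "meeting at noon", "project status report"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)      # word counts as features

model = MultinomialNB()
model.fit(X, labels)                      # learn a mapping from the training data

new_email = vectorizer.transform(["cheap money offer"])
print(model.predict(new_email))           # likely [1], i.e. classified as spam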
1.4.3. Unsupervised Learning: Unsupervised learning makes sense of unlabeled data
without having any predefined dataset for its training. Unsupervised learning is an
extremely powerful tool for analyzing available data and looking for patterns and trends.
It is most commonly used for clustering similar input into logical groups. Common
approaches to unsupervised learning include:
● k-means
● self-organizing maps, and
● hierarchical clustering
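And an unsupervised-learning sketch: k-means clustering of a few made-up 2D points, again assuming scikit-learn.

from sklearn.cluster import KMeans

# Made-up unlabeled 2D points; no predefined labels are supplied for training.
points = [[1, 2], [1, 4], [0, 2], [10, 2], [10, 4], [11, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)                     # cluster assignment for each point, e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)            # the two cluster centres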

1.4.4. Deep Learning


Deep learning is a subfield of machine learning in which the algorithms are inspired by the structure and function of the brain, in the form of artificial neural networks. Most of the value of deep learning today is through supervised learning, or learning from labelled data.

Each algorithm in deep learning goes through the same process. It includes a hierarchy of nonlinear transformations of the input that can be used to generate a statistical model as output. Consider the following steps that define the machine learning process:
● Identifies relevant data sets and prepares them for analysis.
● Chooses the type of algorithm to use
● Builds an analytical model based on the algorithm used.
● Trains the model on test data sets, revising it as needed.
● Runs the model to generate test scores.
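These steps can be sketched end-to-end with scikit-learn on one of its bundled datasets; this is a rough illustration only, not a prescribed procedure.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 1. Identify a relevant data set and prepare it for analysis.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2-3. Choose an algorithm and build an analytical model based on it.
model = LogisticRegression(max_iter=1000)

# 4. Train the model, revising it as needed.
model.fit(X_train, y_train)

# 5. Run the model to generate test scores.
print(model.score(X_test, y_test))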
Deep learning has evolved hand-in-hand with the digital era, which has brought about
an explosion of data in all forms and from every region of the world. This data, known
simply as big data, is drawn from sources like social media, internet search engines, e-
commerce platforms, and online cinemas, among others. This enormous amount of data
is readily accessible and can be shared through fintech applications like cloud
computing.
However, the data, which normally is unstructured, is so vast that it could take decades
for humans to comprehend it and extract relevant information. Companies realize the
incredible potential that can result from unraveling this wealth of information and are increasingly adopting AI systems for automated support.

1.4.5. Applications of Machine Learning and Deep Learning


● Computer vision, which is used for facial recognition and attendance marking through fingerprints, or vehicle identification through number plates.
● Information retrieval from search engines, such as text search for image search.
● Automated email marketing with specified target identification.
● Medical diagnosis of cancer tumors or anomaly identification of any chronic
disease.
● Natural language processing for applications like photo tagging. The best example of this scenario is Facebook.
● Online Advertising.

References:
● https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_overview.htm
● https://www.javatpoint.com/introduction-to-artificial-intelligence
● https://www.tutorialspoint.com/tensorflow/tensorflow_machine_learning_deep_learning.htm
● Storytelling with Data by Cole Nussbaumer Knaflic, Wiley Publication, ISBN 9781119002253
● Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures by Claus O. Wilke, O'Reilly Publication, March 2019

Sample Multiple Choice Questions
