Eti Chapter 1
Contents
1.1 Introduction of AI
● Concept
● Scope of AI
● Components of AI
● Types of AI
● Application of AI
1.2 Data Visualization
● Data types in data visualization
● Scales map of data values in aesthetics
● Use of coordinate system in data visualization
● Use of colors to represent data values
● Representing - Amounts, Distribution, and Proportions
1.3 Data Storytelling
● Introduction
● Ineffectiveness of Graphical representation of data
● Explanatory Analysis
○ Who
○ What
○ How
1.4 Concept of machine learning and deep learning.
1.1 Introduction of AI
A branch of computer science named Artificial Intelligence (AI) pursues creating computers and machines that are as intelligent as human beings. John McCarthy, the father of Artificial Intelligence, described AI as "the science and engineering of making intelligent machines, especially intelligent computer programs". Artificial Intelligence (AI) is a branch of science which deals with helping machines find solutions to complex problems in a more human-like fashion.
Artificial Intelligence has been defined in different ways by various researchers during its evolution, such as: "Artificial Intelligence is the study of how to make computers do things which, at the moment, people do better."
There are other possible definitions, like "AI is a collection of hard problems which can be solved by humans and other living things, but for which we don't have good algorithms for solving them", e.g., understanding spoken natural language, medical diagnosis, circuit design, learning, self-adaptation, reasoning, chess playing, proving math theorems, etc.
● Data: Data is defined as symbols that represent properties of objects, events, and their environment.
● Information: Information is a message that contains relevant meaning, implication, or
input for decision and/or action.
● Knowledge: It is the (1) cognition or recognition (know-what), (2) capacity to act (know-how), and (3) understanding (know-why) that resides or is contained within the mind or brain.
● Intelligence: It requires the ability to sense the environment, to make decisions, and to control action.
1.1.1 Concept:
Artificial Intelligence is one of the emerging technologies that try to simulate human reasoning in AI systems. The art and science of bringing learning, adaptation and self-organization to the machine is the art of Artificial Intelligence. Artificial Intelligence is the ability of a computer program to learn and think. Artificial Intelligence (AI) is an area of computer science that emphasizes the creation of intelligent machines that work and react like humans. AI is built on these three important concepts:
Machine learning: When you command your smartphone to call someone, or when you chat
with a customer service chatbot, you are interacting with software that runs on AI. But this
type of software actually is limited to what it has been programmed to do. However, we expect
to soon have systems that can learn new tasks without humans having to guide them. The idea
is to give them a large number of examples for any given chore, and they should be able to
process each one and learn how to do it by the end of the activity.
Deep learning: The machine learning example I provided above is limited by the fact that
humans still need to direct the AI’s development. In deep learning, the goal is for the software
to use what it has learned in one area to solve problems in other areas. For example, a program
that has learned how to distinguish images in a photograph might be able to use this learning
to seek out patterns in complex graphs.
Neural networks: These consist of computer programs that mimic the way the human brain
processes information. They specialize in clustering information and recognizing complex
patterns, giving computers the ability to use more sophisticated processes to analyze data.
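The basic building block of such a network can be illustrated with a minimal sketch (Python used purely for illustration; the weights and inputs are made-up values): a single artificial "neuron" computes a weighted sum of its inputs and squashes the result through a nonlinear activation function.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus bias,
    squashed through a sigmoid activation into the range (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

# With zero weights and zero bias the neuron outputs exactly 0.5.
print(neuron([1.0, 2.0], [0.0, 0.0], 0.0))  # 0.5
```

A real neural network connects many such units in layers and learns the weights from data; this sketch only shows the computation performed by one unit.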
AI Approach:
The difference between machine and human intelligence is that humans think and act rationally, whereas a machine need not. Historically, all four approaches to AI (thinking humanly, thinking rationally, acting humanly, and acting rationally) have been followed, each by different people with different methods.
1.1.3 Components of AI
The core components and constituents of AI are derived from the concepts of logic, cognition and computation; the compound components, built up through the core components, are knowledge, reasoning, search, natural language processing, vision, etc.
| Level      | Core                                                     | Compound                                               | Coarse components                                          |
| Logic      | Proposition, Induction, Tautology, Model, Temporal logic | Knowledge, Reasoning, Control, Search                  | Knowledge-based systems, Heuristic search, Theorem proving |
| Cognition  | Learning, Adaptation, Self-organization                  | Belief, Desire, Intention; Co-operation, Co-ordination | Multi-agent systems, AI programming                        |
| Functional | Memory, Perception                                       | Vision, Utterance                                      | Natural Language Processing, Speech processing             |
The core entities are inseparable constituents of AI in that these concepts are fused at atomic
level. The concepts derived from logic are propositional logic, tautology, predicate calculus,
model and temporal logic. The concepts of cognitive science are of two types: one is
functional which includes learning, adaptation and self-organization, and the other is memory
and perception which are physical entities. The physical entities generate some functions to
make the compound components.
The compound components are made of some combination of the logic and cognition stream.
These are knowledge, reasoning and control generated from constituents of logic such as
predicate calculus, induction and tautology and some from cognition (such as learning and
adaptation). Similarly, belief, desire and intention are models of mental states that are predominantly based on cognitive components but less on logic. Vision, utterance (vocal) and expression (written) are the combined effect of memory and perceiving organs or body sensors such as the ears, eyes and vocal cords. The gross level contains the constituents at the third level, which are knowledge-based systems (KBS), heuristic search, automatic theorem proving, multi-agent systems, AI languages such as PROLOG and LISP, and natural language processing (NLP). Speech processing and vision are based mainly on the principle of pattern recognition.
AI Dimension: The philosophy of AI in a three-dimensional representation consists of logic, cognition and computation in the x-direction, and knowledge, reasoning and interface in the y-direction. The x-y plane is the foundation of AI. The z-direction consists of correlated systems of physical origin such as language, vision and perception, as shown in Figure 1.2.
Cognition:
Computers have become so popular in a short span of time for the simple reason that they adapted and projected the information processing paradigm (IPP) of human beings: sensing organs as input, mechanical movement organs as output, and the central nervous system (CNS) in the brain as the control and computing device. Short-term and long-term memory were not distinguished by computer scientists; as a whole, it was termed memory.
At a deeper level, the interaction of stimuli with the stored information to produce new information requires the processes of learning, adaptation and self-organization. These functionalities in the information processing, at a certain level of abstraction of brain activities, demonstrate a state of mind which exhibits certain specific behavior to qualify as intelligence. Computational models were developed and incorporated in machines which mimicked the functionalities of human origin. The creation of such traits of human beings in computing devices and processes originated the concept of intelligence in machines as a virtual mechanism. These virtual machines were in due course termed artificially intelligent machines.
Computation
The theory of computation developed by Turing (finite state automata) was a turning point from mathematical models to logical computation. Chomsky's linguistic computational theory generated a model for syntactic analysis through a regular grammar.
1.1.4 Types of AI
Artificial Intelligence can be divided into various types; there are mainly two categorizations, one based on capabilities and one based on functionality of AI. The following flow diagram explains the types of AI.
Types of AI
● Super AI is a level of intelligence at which systems could surpass human intelligence and could think, reason, solve puzzles, make judgments, plan, learn, and communicate on their own.
● Super AI is still a hypothetical concept of Artificial Intelligence. Developing such systems in the real world is still a world-changing task.
2. Theory of Mind
● Theory of Mind AI should understand human emotions, people and beliefs, and be able to interact socially like humans.
● Such AI machines have not yet been developed, but researchers are making lots of efforts and improvements towards developing them.
3. Self-Awareness
● Self-Awareness AI is the future of Artificial Intelligence. These machines will be super intelligent, and will have their own consciousness, sentiments, and self-awareness.
● These machines will be smarter than the human mind.
● Self-Awareness AI does not yet exist in reality; it is a hypothetical concept.
1.1.5 Application of AI
AI has been dominant in various fields, such as:
● Gaming: AI plays a crucial role in strategic games such as chess, poker, tic-tac-toe, etc., where a machine can think of a large number of possible positions based on heuristic knowledge.
● Natural Language Processing: It is possible to interact with the computer that
understands natural language spoken by humans.
● Expert Systems: There are some applications which integrate machine,
software, and special information to impart reasoning and advising. They provide
explanation and advice to the users.
● Vision Systems: These systems understand, interpret, and comprehend visual
input on the computer. For example,
• A spying aeroplane takes photographs, which are used to figure out spatial information or a map of the area.
• Doctors use a clinical expert system to diagnose the patient.
• Police use computer software that can recognize the face of a criminal against the stored portrait made by a forensic artist.
● Speech Recognition: Some intelligent systems are capable of hearing and comprehending language in terms of sentences and their meanings while a human talks to them. They can handle different accents, slang words, noise in the background, change in a human's voice due to cold, etc.
● Handwriting Recognition: The handwriting recognition software reads the text
written on paper by a pen or on screen by a stylus. It can recognize the shapes of
the letters and convert it into editable text.
● Intelligent Robots: Robots are able to perform the tasks given by a human.
They have sensors to detect physical data from the real world such as light, heat,
temperature, movement, sound, bump, and pressure. They have efficient
processors, multiple sensors and huge memory, to exhibit intelligence. In addition,
they are capable of learning from their mistakes and they can adapt to the new
environment.
1.2 Data Visualization
1.2.1 Introduction –
Data visualization is the graphical representation of information and data. By
using visual elements like charts, graphs, and maps, data visualization tools
provide an accessible way to see and understand trends, outliers, and patterns in
data. It also provides an excellent way to present data to non-technical audiences
without confusion. The first and foremost objective of data visualization is to
convey data correctly. Whenever we visualize data, we take data values and
convert them in a systematic and logical way into the visual elements that make
up the final graphic. Even though there are many different types of data
visualizations, and at first glance a scatterplot, a pie chart, and a heatmap don't
seem to have much in common, all these visualizations can be described with a
common language that captures how data values are turned into blobs of ink on
paper or colored pixels on a screen. The key insight is the following: all data
visualizations map data values into quantifiable features of the resulting graphic.
We refer to these features as aesthetics.
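This mapping can be made concrete with a small sketch (Python used purely for illustration; the axis and pixel ranges are made-up values): a position scale is simply a function that converts data values into coordinates of the final graphic.

```python
def linear_scale(domain, range_):
    """Return a function mapping data values (the domain) linearly onto
    an aesthetic (the range), e.g. a pixel position along the x axis."""
    (d0, d1), (r0, r1) = domain, range_
    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)
    return scale

# Map temperatures from 0-100 degrees F onto x positions 0-600 pixels.
x = linear_scale(domain=(0, 100), range_=(0, 600))
print(x(50))  # 300.0 -- halfway through the data range, halfway across the axis
print(x(25))  # 150.0
```

The same idea applies to every aesthetic: color scales, size scales, and shape scales are all functions from data values to graphical features, differing only in what the output is.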
1.2.2 Data types in data visualization –
When we consider types of data in data visualization, we consider various types
of data in use as well as aesthetics too. Aesthetics describe every aspect of a given
graphical element. For example, in Figure 1.4 -
A critical component of every graphical element is of course its position, which
describes where the element is located. In standard 2D graphics, we describe
positions by an x and y value, but other coordinate systems and one- or three-
dimensional visualizations are possible. Next, all graphical elements have a shape,
a size, and a color. Even if we are preparing a black-and-white drawing, graphical
elements need to have a color to be visible: for example, black if the background
is white or white if the background is black. Finally, to the extent we are using
lines to visualize data, these lines may have different widths or dash–dot patterns.
There are many other aesthetics we may encounter in a data visualization. For
example, if we want to display text, we may have to specify font family, font face,
and font size, and if graphical objects overlap, we may have to specify whether
they are partially transparent.
Figure 1.4 Commonly used aesthetics in data visualization: position, shape, size, color, line width, line type. Some of these aesthetics can represent both continuous and discrete data (position, size, line width, color), while others can usually only represent discrete data (shape, line type).
Aesthetics fall into one of two groups: those that can represent continuous data and those that cannot. Continuous data values are values for which arbitrarily fine intermediates exist. For example, time duration is a continuous value: between any two durations, say 50 seconds and 51 seconds, there are arbitrarily many intermediates, such as 50.5 seconds, 50.51 seconds, 50.50001 seconds, and so on. By contrast, the number of persons in a room is a discrete value. A room can hold 5 persons or 6, but not 5.5.
For the examples in Figure 1.4, position, size, color, and line width can represent
continuous data, but shape and line type can usually only represent discrete data.
Next, we’ll consider the types of data we may want to represent in our
visualization. You may think of data as numbers, but numerical values are only
two out of several types of data we may encounter. In addition to continuous and
discrete numerical values, data can come in the form of discrete categories, in the
form of dates or times, and as text (Table 1.1). When data is numerical, we also
call it quantitative and when it is categorical, we call it qualitative. Variables
holding qualitative data are factors, and the different categories are called levels.
The levels of a factor are most commonly without order (as in the example of dog, cat, fish in Table 1.1 below), but factors can also be ordered, when there is an intrinsic order among the levels of the factor (as in the example of good, fair, poor in Table 1.1).
Table 1.1 Types of variables encountered in data visualization scenarios

| Type of variable                   | Example                                      | Appropriate scale      | Description                                                                                                                                                                           |
| Quantitative/numerical continuous  | 1.3, 5.7, 83, 1.5 × 10^-2                    | Continuous             | Arbitrary numerical values. These can be integers, rational numbers, or real numbers.                                                                                                 |
| Quantitative/numerical discrete    | 1, 2, 3, 4                                   | Discrete               | Numbers in discrete units. These are most commonly but not necessarily integers. For example, the numbers 0.5, 1.0, 1.5 could also be treated as discrete if intermediate values cannot exist in the given dataset. |
| Qualitative/categorical unordered  | dog, cat, fish                               | Discrete               | Categories without order. These are discrete and unique categories that have no inherent order. These variables are also called factors.                                              |
| Qualitative/categorical ordered    | good, fair, poor                             | Discrete               | Categories with order. These are discrete and unique categories with an order. For example, "fair" always lies between "good" and "poor." These variables are also called ordered factors. |
| Date or time                       | Jan. 5 2018, 8:03am                          | Continuous or discrete | Specific days and/or times. Also generic dates, such as July 4 or Dec. 25 (without year).                                                                                             |
| Text                               | The quick brown fox jumps over the lazy dog. | None, or discrete      | Free-form text. Can be treated as categorical if needed.                                                                                                                              |
Let's consider an example. Table 1.2 below shows the first few rows of a dataset providing the daily temperature normals (average daily temperatures over a 30-year window) for four US locations. This table contains five variables: month, day, location, station ID, and temperature (in degrees Fahrenheit). Month is an ordered factor, day is a discrete numerical value, location is an unordered factor, station ID is similarly an unordered factor, and temperature is a continuous numerical value.
Table 1.2 First 8 rows of a dataset listing daily temperature normals for four weather stations

| Month | Day | Location     | Station ID  | Temperature (°F) |
| Jan   | 1   | Chicago      | USW00014819 | 25.6             |
| Jan   | 1   | San Diego    | USW00093107 | 55.2             |
| Jan   | 1   | Houston      | USW00012918 | 53.9             |
| Jan   | 1   | Death Valley | USC00042319 | 51.0             |
| Jan   | 2   | Chicago      | USW00014819 | 25.5             |
| Jan   | 2   | San Diego    | USW00093107 | 55.3             |
| Jan   | 2   | Houston      | USW00012918 | 53.8             |
| Jan   | 2   | Death Valley | USC00042319 | 51.2             |
Data source: National Oceanic and Atmospheric Administration (NOAA).
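The distinction between ordered and unordered factors can be sketched in plain Python (a toy representation, not a real data-frame library): an ordered factor carries an explicit level order that sorting and comparison respect, independent of alphabetical order.

```python
# Levels of an ordered factor: the order is intrinsic, not alphabetical.
MONTH_LEVELS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def month_key(month):
    """Sort key that respects the factor's level order."""
    return MONTH_LEVELS.index(month)

months = ["Mar", "Jan", "Dec", "Feb"]
print(sorted(months))                 # alphabetical: ['Dec', 'Feb', 'Jan', 'Mar']
print(sorted(months, key=month_key))  # factor order: ['Jan', 'Feb', 'Mar', 'Dec']

# An unordered factor such as location has distinct levels but no such ordering.
locations = {"Chicago", "San Diego", "Houston", "Death Valley"}
```

Libraries such as pandas provide this behavior directly via an ordered categorical dtype; the point here is only that "ordered" is a property declared on the variable, not something inferred from the values.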
Figure 1.6 Monthly normal mean temperatures for the locations in the example above
1.2.3 Use of coordinate system in Data Visualization
To make any sort of data visualization, we need to define position scales, which determine where in a graphic different data values are located. We cannot visualize data without placing different data points at different locations, even if we just arrange
them next to each other along a line. For regular 2D visualizations, two numbers are
required to uniquely specify a point, and therefore we need two position scales. These
two scales are usually but not necessarily the x and y axes of the plot. We also have to
specify the relative geometric arrangement of these scales. Conventionally, the x axis
runs horizontally and the y axis vertically, but we could choose other arrangements.
For example, we could have the y axis run at an acute angle relative to the x axis, or
we could have one axis run in a circle and the other run radially. The combination of
a set of position scales and their relative geometric arrangement is called a coordinate
system.
● Cartesian coordinates –
The most widely used coordinate system for data visualization is the 2D Cartesian
coordinate system, where each location is uniquely specified by an x and a y value.
The x and y axes run orthogonally to each other, and data values are placed in an even
spacing along both axes. The two axes are continuous position scales, and they can
represent both positive and negative real numbers. To fully specify the coordinate
system, we need to specify the range of numbers each axis covers. Any data values
between these axis limits are placed at the appropriate respective location in the plot.
Maharashtra State Board of Technical Education Page 14 of 151
Emerging Trends in CO and IT (22618)
Figure 1.8 Daily temperature normals for Houston using different aspect ratios
A Cartesian coordinate system can have two axes representing two different units. This
situation arises quite commonly whenever we’re mapping two different types of
variables to x and y. For example, consider the image below, where we plot temperature versus day of the year. The y axis is measured in degrees Fahrenheit, with a grid line every 20 degrees, and the x axis is measured in months, with a grid line at the first of every
third month. Whenever the two axes are measured in different units, we can stretch or
compress one relative to the other and maintain a valid visualization of the data. Which
version is preferable may depend on the story we want to convey. A tall and narrow
figure emphasizes change along the y axis and a short and wide figure does the opposite.
Ideally, we want to choose an aspect ratio that ensures that any important differences in
position are noticeable.
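Because the two axes carry different units, stretching one relative to the other just means changing one scale's output range. A minimal sketch (Python used for illustration; the axis limits and figure sizes are made-up values) shows the same data point landing at different figure positions under two aspect ratios:

```python
def make_transform(xlim, ylim, width, height):
    """Map data coordinates to figure coordinates for a Cartesian plot
    of the given size. Changing `width` relative to `height` changes the
    aspect ratio without invalidating the visualization."""
    def to_figure(x, y):
        fx = (x - xlim[0]) / (xlim[1] - xlim[0]) * width
        fy = (y - ylim[0]) / (ylim[1] - ylim[0]) * height
        return fx, fy
    return to_figure

# Temperature (deg F) vs. day of year, rendered at two aspect ratios.
wide   = make_transform(xlim=(1, 365), ylim=(20, 110), width=600, height=200)
narrow = make_transform(xlim=(1, 365), ylim=(20, 110), width=200, height=600)

print(wide(183, 65))    # (300.0, 100.0) -- the same data point...
print(narrow(183, 65))  # (100.0, 300.0) -- ...at a different figure position
```

The wide figure spreads changes along x and compresses changes along y, which is exactly the emphasis trade-off the text describes.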
● Nonlinear Axes –
In a Cartesian coordinate system, the grid lines along an axis are spaced evenly both in
data units and in the resulting visualization. We refer to the position scales in these cases as linear.
For example, in a polar (circular) coordinate system it becomes apparent how similar the temperatures are in Death Valley, Houston, and San Diego from late fall to early spring. In the Cartesian coordinate system, this fact is obscured because the temperature values in late December and in early January are shown in opposite parts of the figure and therefore don't form a single visual unit.
1.2.4 Use of colors to represent data values
i. Color as a tool to distinguish –
Color can be used to distinguish discrete items or groups that do not have an intrinsic order, using a qualitative color scale. Such a scale contains a finite set of specific colors that are chosen to look clearly distinct from each other while also being equivalent to each other. The second condition requires that no one color should stand out relative to the others. Also, the colors should not create the impression of an order, as would be the case with a sequence of colors that get successively lighter. Such colors would create an apparent order among the items being colored, which by definition have no order. Many appropriate qualitative color scales are readily available. Figure 1.11 shows three representative examples. In particular, the ColorBrewer project provides a nice selection of qualitative color scales, including both fairly light and fairly dark colors [Brewer 2017].
Figure 1.11. Example qualitative color scales. The Okabe Ito scale is the default scale
used throughout this book [Okabe and Ito 2008]. The ColorBrewer Dark2 scale is
provided by the ColorBrewer project [Brewer 2017]. The ggplot2 hue scale is the
default qualitative scale in the widely used plotting software ggplot2.
ii. Color as a tool to represent data values –
Color can also represent continuous data values, using a sequential color scale that varies continuously between colors. Figure 1.13 shows representative examples.
Figure 1.13. Example sequential color scales. The ColorBrewer Blues scale is a monochromatic scale that varies from dark to light blue. The Heat and Viridis scales are multihue scales that vary from dark red to light yellow and from dark blue via green to light yellow, respectively.
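The two kinds of color scale can be sketched in a few lines of Python (an illustrative sketch; the Okabe-Ito hex codes are as commonly published, and the endpoint colors of the sequential scale are made-up values): a qualitative scale assigns one distinct color per category, while a sequential scale interpolates between a light and a dark color.

```python
# Okabe-Ito qualitative palette (hex codes as commonly published).
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

def qualitative_colors(categories, palette=OKABE_ITO):
    """Assign one distinct palette color per unordered category."""
    return {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}

def sequential_color(t, low=(222, 235, 247), high=(8, 48, 107)):
    """Sequential scale: linearly interpolate between a light and a dark
    RGB color for t in [0, 1] (endpoint colors here are illustrative)."""
    return tuple(round(l + t * (h - l)) for l, h in zip(low, high))

print(qualitative_colors(["dog", "cat", "fish"]))
print(sequential_color(0.0))  # lightest end: (222, 235, 247)
print(sequential_color(1.0))  # darkest end: (8, 48, 107)
```

Note how the qualitative assignment carries no ordering information, whereas the sequential scale encodes magnitude directly in lightness.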
iii. Color as a tool to highlight –
Color can also be an effective tool to highlight specific elements in the data. There may
be specific categories or values in the dataset that carry key information about the story
we want to tell, and we can strengthen the story by emphasizing the relevant figure
elements to the reader. An easy way to achieve this emphasis is to color these figure
elements in a color or set of colors that vividly stand out against the rest of the figure.
This effect can be achieved with accent color scales, which are color scales that contain
both a set of subdued colors and a matching set of stronger, darker, and/or more
saturated colors.
Figure 4-7. Example accent color scales, each with four base colors and three accent colors. Accent color scales can be derived in several different ways: (top) we can take an existing color scale (e.g., the Okabe Ito scale) and lighten and/or partially desaturate some colors while darkening others; (middle) we can take gray values and pair them with colors; (bottom) we can use an existing accent color scale (e.g., the one from the ColorBrewer project).
1.2.5 Representing Amounts, Distributions, and Proportions
i. Amounts –
The most common approach to visualizing amounts (i.e., numerical values shown for
some set of categories) is using bars, either vertically or horizontally. However, instead
of using bars, we can also place dots at the location where the corresponding bar would
end.
If there are two or more sets of categories for which we want to show amounts, we can
group or stack the bars. We can also map the categories onto the x and y axes and
show amounts by color, via a heatmap.
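The idea of bars whose lengths encode amounts can be sketched without any plotting library (an illustrative text-mode rendering; the data values are made-up):

```python
def text_bars(amounts, width=20):
    """Render amounts as horizontal text bars, one per category.
    Bar length is proportional to the value (longest bar = `width`)."""
    peak = max(amounts.values())
    lines = []
    for label, value in amounts.items():
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:>12} | {bar} {value}")
    return "\n".join(lines)

print(text_bars({"Chicago": 25.6, "San Diego": 55.2, "Houston": 53.9}))
```

A dot plot would place a single marker at the bar's end position instead, and a heatmap would map the same values to a color scale rather than to a length.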
ii. Distributions
Histograms and density plots provide the most intuitive visualizations of a distribution,
but both require arbitrary parameter choices and can be misleading. Cumulative densities and quantile-quantile (q-q) plots always represent the data faithfully but can be more difficult to interpret.
Boxplots, violin plots, strip charts, and sina plots are useful when we want to visualize many distributions at once and/or if we are primarily interested in overall shifts among the distributions. Stacked histograms and overlapping densities allow a more in-depth comparison of a smaller number of distributions, though stacked histograms can be difficult to interpret and are best avoided. Ridgeline plots can be a useful alternative to violin plots and are often useful when visualizing very large numbers of distributions or changes in distributions over time.
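The "arbitrary parameter choice" behind a histogram is the bin width, which a short sketch makes concrete (pure Python for illustration; the data values are made-up):

```python
def histogram(values, bin_width, start=0.0):
    """Count values into bins of the given width. The bin width is the
    arbitrary parameter the text warns about: different widths can make
    the same data look quite different."""
    counts = {}
    for v in values:
        left = start + ((v - start) // bin_width) * bin_width  # left bin edge
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

data = [1.2, 1.9, 2.1, 2.2, 3.7, 4.4, 4.5]
print(histogram(data, bin_width=1.0))  # {1.0: 2, 2.0: 2, 3.0: 1, 4.0: 2}
print(histogram(data, bin_width=2.0))  # {0.0: 2, 2.0: 3, 4.0: 2}
```

The same seven values yield visibly different shapes under the two bin widths, which is exactly why histograms can mislead.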
iii. Proportions
Proportions can be visualized as pie charts, side-by-side bars, or stacked bars. As for
amounts, when we visualize proportions with bars, the bars can be arranged either
vertically or horizontally. Pie charts emphasize that the individual parts add up to a
whole and highlight simple fractions. However, the individual pieces are more easily
compared in side-by-side bars. Stacked bars look awkward for a single set of
proportions, but can be useful when comparing multiple sets of proportions.
When visualizing multiple sets of proportions or changes in proportions across
conditions, pie charts tend to be space-inefficient and often obscure relationships.
Grouped bars work well as long as the number of conditions compared is moderate, and
stacked bars can work for large numbers of conditions. Stacked densities are appropriate
when the proportions change along a continuous variable.
When proportions are specified according to multiple grouping variables, mosaic plots,
tree maps, or parallel sets are useful visualization approaches. Mosaic plots assume that
every level of one grouping variable can be combined with every level of another
grouping variable, whereas tree maps do not make such an assumption. Tree maps work
well even if the subdivisions of one group are entirely distinct from the subdivisions of
another. Parallel sets work better than either mosaic plots or tree maps when there are
more than two grouping variables.
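The arithmetic behind proportions and stacked bars is simple enough to sketch directly (pure Python for illustration; the vote counts are made-up values):

```python
def proportions(amounts):
    """Convert raw amounts to proportions of the whole."""
    total = sum(amounts.values())
    return {k: v / total for k, v in amounts.items()}

def stacked_segments(amounts):
    """Compute (start, end) positions of each segment of a stacked bar
    on a 0-1 scale, in the order the categories are given."""
    segments, pos = {}, 0.0
    for k, p in proportions(amounts).items():
        segments[k] = (pos, pos + p)
        pos += p
    return segments

votes = {"A": 30, "B": 50, "C": 20}
print(proportions(votes))       # {'A': 0.3, 'B': 0.5, 'C': 0.2}
print(stacked_segments(votes))  # segment boundaries stack up to 1.0
```

A pie chart encodes the same proportions as angles (each proportion times 360 degrees), and side-by-side bars encode them as separate lengths, which is why comparing individual pieces is easier there.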
1.3 Data Storytelling
1.3.1 Introduction
Data storytelling is a methodology for communicating information, tailored to a specific
audience, with a compelling narrative. It is the last ten feet of your data analysis and
arguably the most important aspect. Data storytelling is the concept of building a
compelling narrative based on complex data and analytics that help tell your story and
influence and inform a particular audience.
● The benefits of data storytelling
✔ Adding value to your data and insights.
✔ Interpreting complex information and highlighting essential key points for the
audience.
✔ Providing a human touch to your data.
✔ Offering value to your audience and industry.
✔ Building credibility as an industry and topic thought leader.
1.3.2. Ineffectiveness of Graphical representation of data
Data visualization plays a significant role in determining how receptive your audience
is to receiving complex information. Data visualization helps transform boundless
amounts of data into something simpler and digestible. Here, you can supply the visuals
needed to support your story. Effective data visualizations can help:
● Reveal patterns, trends, and findings from an unbiased viewpoint.
● Provide context, interpret results, and articulate insights.
● Streamline data so your audience can process information.
● Improve audience engagement.
As you create your data story, it is important to combine the following three elements to write a well-rounded anecdote of your theory and the resulting actions you'd like to see from users.
1. Build your narrative
As you tell your story, you need to use your data as supporting pillars to your insights.
Help your audience understand your point of view by distilling complex information
into informative insights. Your narrative and context are what will drive the linear
nature of your data storytelling.
2. Use visuals to enlighten
Visuals can help educate the audience on your theory. When you connect the visual
assets (charts, graphs, etc.) to your narrative, you engage the audience with otherwise
hidden insights that provide the fundamental data to support your theory. Instead of
presenting a single data insight to support your theory, it helps to show multiple pieces
of data, both granular and high level, so that the audience can truly appreciate your
viewpoint.
3. Show data to support
Humans are not naturally attracted to analytics, especially analytics that lack
contextualization using augmented analytics. Your narrative offers enlightenment,
supported by tangible data. Context and critique are integral to the full interpretation of
your narrative. Using business analytic tools to provide key insights and understanding
to your narrative can help provide the much-needed context throughout your data story.
By combining the three elements above, your data story is sure to create an emotional
response in your audience. Emotion plays a significant role in decision-making. And by
linking the emotional context and hard data in your data storytelling, you’re able to
influence others. When these three key elements are successfully integrated, you have
created a data story that can influence people and drive change.
1.3.4. Explanatory Analysis
1.3.4.1. Who -
o Your audience - The more specific you can be about who your audience is, the
better position you will be in for successful communication. Avoid general
audiences, such as “internal and external stakeholders” or “anyone who might be
interested”—by trying to communicate to too many different people with disparate
needs at once, you put yourself in a position where you can’t communicate to any
one of them as effectively as you could if you narrowed your target audience.
Sometimes this means creating different communications for different audiences.
Identifying the decision maker is one way of narrowing your audience. The more
you know about your audience, the better positioned you’ll be to understand how
to resonate with them and form a communication that will meet their needs and
yours.
o You - It’s also helpful to think about the relationship that you have with your
audience and how you expect that they will perceive you. Will you be encountering
each other for the first time through this communication, or do you have an
established relationship? Do they already trust you as an expert, or do you need to
work to establish credibility? These are important considerations when it comes to
determining how to structure your communication and whether and when to use
data, and may impact the order and flow of the overall story you aim to tell.
1.3.4.2. What -
o Action - What do you need your audience to know or do? This is the point where
you think through how to make what you communicate relevant for your audience
and form a clear understanding of why they should care about what you say. You
should always want your audience to know or do something. If you can’t concisely
articulate that, you should revisit whether you need to communicate in the first place.
o Mechanism - How will you communicate to your audience? The method you will
use to communicate to your audience has implications for a number of factors,
including the amount of control you will have over how the audience takes in the
information and the level of detail that needs to be explicit. We can think of the
communication mechanism along a continuum, with live presentation at the left and
a written document or email at the right, as shown in Figure 1.1. Consider the level
of control you have over how the information is consumed as well as the amount of
detail needed at either end of the spectrum.
1.3.4.3. How -
Finally—and only after we can clearly articulate who our audience is and what we
need them to know or do—we can turn to the data and ask the question: What data is
available that will help make my point? Data becomes supporting evidence of the
story you will build and tell.
1.4 Concept of machine learning and deep learning
1.4.1 Machine Learning:
● Machine learning is a branch of science that deals with programming the systems
in such a way that they automatically learn and improve with experience. Here, learning
means recognizing and understanding the input data and making wise decisions based
on the supplied data.
● It is very difficult to cater for all the decisions needed for all possible inputs. To tackle this problem, algorithms are developed. These algorithms build knowledge from
specific data and past experience with the principles of statistics, probability theory,
logic, combinatorial optimization, search, reinforcement learning, and control theory.
The developed algorithms form the basis of various applications such as:
● Vision processing
● Language processing
● Forecasting (e.g., stock market trends)
● Pattern recognition
● Games
● Data mining
● Expert systems
● Robotics
Machine learning is a vast area and it is quite beyond the scope of this chapter to cover all its features. There are several ways to implement machine learning techniques; however, the most commonly used ones are supervised and unsupervised learning.
1.4.2. Supervised Learning: Supervised learning deals with learning a function from
available training data. A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used for mapping new examples. Common
examples of supervised learning include:
● classifying e-mails as spam,
● labeling webpages based on their content, and
● voice recognition.
There are many supervised learning algorithms, such as neural networks, Support Vector Machines (SVMs), and Naive Bayes classifiers. Mahout implements the Naive Bayes classifier.
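The spam-classification example above can be sketched as a toy multinomial Naive Bayes classifier (an illustrative sketch in plain Python, not Mahout's implementation; the training e-mails are made-up):

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs. Returns per-label word
    counts and per-label document counts."""
    word_counts = {}          # label -> Counter of words
    label_counts = Counter()  # label -> number of documents
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label maximizing log P(label) + sum of log P(word | label),
    with add-one (Laplace) smoothing for unseen words."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # prior
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)   # likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

mails = [("win money now", "spam"), ("cheap money offer", "spam"),
         ("meeting agenda attached", "ham"), ("lunch meeting today", "ham")]
wc, lc = train(mails)
print(classify("free money", wc, lc))       # spam
print(classify("project meeting", wc, lc))  # ham
```

The classifier learns a function from the labeled training data and then maps new, unseen examples to labels, which is exactly the "inferred function" the text describes.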
1.4.3. Unsupervised Learning: Unsupervised learning makes sense of unlabeled data without having any predefined dataset for its training. Unsupervised learning is an extremely powerful tool for analyzing available data and looking for patterns and trends. It is most commonly used for clustering similar input into logical groups. Common approaches to unsupervised learning include:
● k-means
● self-organizing maps, and
● hierarchical clustering
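The k-means approach listed above can be sketched in a few lines (a toy one-dimensional version in plain Python with naive initialization, not a production implementation; the data points are made-up):

```python
def kmeans(points, k, iterations=20):
    """Toy 1-D k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = points[:k]  # naive initialization from the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # two centroids, near 1.0 and 10.0
```

No labels are supplied anywhere: the algorithm discovers the two groups purely from the structure of the data, which is the essence of clustering similar input into logical groups.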
Each algorithm in machine learning and deep learning goes through the same general process. It includes a hierarchy of nonlinear transformations of the input that can be used to generate a statistical model as output. Consider the following steps that define the machine learning process:
● Identifies relevant data sets and prepares them for analysis.
● Chooses the type of algorithm to use
● Builds an analytical model based on the algorithm used.
● Trains the model on training data sets, revising it as needed.
● Runs the model to generate test scores.
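The steps above can be sketched end to end with a toy model (an illustrative nearest-centroid classifier on made-up data, not a real ML library):

```python
# 1. Identify a relevant dataset and prepare it (here: toy labeled points).
data = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
        (8.0, "high"), (8.5, "high"), (9.0, "high")]
train, test = data[::2], data[1::2]  # simple split into train and test sets

# 2-3. Choose an algorithm and build the analytical model: nearest-centroid.
def build_model(train):
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# 4. Train the model on the training set.
model = build_model(train)

def predict(model, x):
    return min(model, key=lambda label: abs(x - model[label]))

# 5. Run the model on the test set to generate a score.
accuracy = sum(predict(model, x) == label for x, label in test) / len(test)
print(model, accuracy)
```

The same five-step skeleton (prepare data, choose algorithm, build, train, score) applies unchanged when the toy model is replaced by a real learner.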
1.4.4. Deep Learning:
Deep learning has evolved hand-in-hand with the digital era, which has brought about
an explosion of data in all forms and from every region of the world. This data, known
simply as big data, is drawn from sources like social media, internet search engines, e-
commerce platforms, and online cinemas, among others. This enormous amount of data
is readily accessible and can be shared through fintech applications like cloud
computing.
However, the data, which is normally unstructured, is so vast that it could take decades for humans to comprehend it and extract relevant information. Companies realize the incredible potential that can result from unraveling this wealth of information and are increasingly adopting AI systems for automated support.
References:
● https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_overview.htm
● https://www.javatpoint.com/introduction-to-artificial-intelligence
● https://www.tutorialspoint.com/tensorflow/tensorflow_machine_learning_deep_learning.htm
● Storytelling with Data by Cole Nussbaumer Knaflic, Wiley Publication, ISBN 9781119002253
● Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures by Claus O. Wilke, O'Reilly Publication, March 2019