Data Science M-1 Notes


Module: 1

Data Science
Data science has become one of the most in-demand jobs of the 21st century. Organizations
of every kind are looking for candidates with knowledge of data science. This tutorial
introduces data science, along with data science job roles, tools for data science,
components of data science, applications, etc.

So let's start.

What is Data Science?

Data science is the in-depth study of massive amounts of data. It involves extracting
meaningful insights from raw, structured, and unstructured data using the scientific
method, different technologies, and algorithms.
It is a multidisciplinary field that uses tools and techniques to manipulate data so that
you can find something new and meaningful.

Data science uses powerful hardware, programming systems, and efficient algorithms to
solve data-related problems. It is closely tied to the future of artificial intelligence.

In short, we can say that data science is all about:

o Asking the correct questions and analyzing the raw data.
o Modeling the data using various complex and efficient algorithms.
o Visualizing the data to get a better perspective.
o Understanding the data to make better decisions and find the final result.

Suppose we want to travel from station A to station B by car. We need to make some
decisions: which route will get us there fastest, which route will have no traffic jams,
and which will be the most cost-effective. All these decision factors act as input data,
and from them we get an appropriate answer. This analysis of data is called data
analysis, which is a part of data science.

Need for Data Science:

Some years ago, data was scarcer and mostly available in structured form, which could
easily be stored in Excel sheets and processed using BI tools.

But in today's world, data is becoming vast: approximately 2.5 quintillion bytes of data
are generated every day, which has led to a data explosion. Research estimated that by
2020, 1.7 MB of data would be created every second for every person on earth. Every
company requires data to work, grow, and improve its business.

Handling such a huge amount of data is a challenging task for every organization. To
handle, process, and analyze it, we require complex, powerful, and efficient algorithms
and technology, and that technology came into existence as data science. Following are
some of the main reasons for using data science technology:

o With the help of data science technology, we can convert massive amounts of raw and
unstructured data into meaningful insights.
o Data science technology is being adopted by various companies, whether big brands or
startups. Google, Amazon, Netflix, etc., which handle huge amounts of data, use data
science algorithms for a better customer experience.
o Data science is helping to automate transportation, for example by creating
self-driving cars, which are seen as the future of transportation.
o Data science can help with different predictions, such as surveys, elections, flight
ticket confirmation, etc.

Data science Jobs:

As per various surveys, data scientist is becoming one of the most in-demand jobs of the
21st century due to the increasing demand for data science. Some people also call it
"the hottest job title of the 21st century". Data scientists are experts who can use
various statistical tools and machine learning algorithms to understand and analyze data.

The average salary for a data scientist ranges from approximately $95,000 to $165,000
per annum, and as per various studies, about 11.5 million jobs will be created by the
year 2026.

Types of Data Science Job

If you learn data science, you get the opportunity to pursue various exciting job roles
in this domain. The main job roles are given below:

1. Data Scientist
2. Data Analyst
3. Machine learning expert
4. Data engineer
5. Data Architect
6. Data Administrator
7. Business Analyst
8. Business Intelligence Manager

Some of the key job titles in data science are explained below.

1. Data Analyst:

A data analyst is an individual who mines huge amounts of data, models the data, and
looks for patterns, relationships, trends, and so on. Ultimately, the analyst produces
visualizations and reports that support decision-making and problem-solving.

Skill required: To become a data analyst, you need a good background in mathematics,
business intelligence, data mining, and basic statistics. You should also be familiar
with computer languages and tools such as MATLAB, Python, SQL, Hive, Pig, Excel, SAS, R,
JS, Spark, etc.

2. Machine Learning Expert:

A machine learning expert works with the various machine learning algorithms used in
data science, such as regression, clustering, classification, decision trees, random
forests, etc.

Skill required: Computer programming languages such as Python, C++, R, and Java, and
tools such as Hadoop. You should also have an understanding of various algorithms,
analytical problem-solving skills, probability, and statistics.

3. Data Engineer:

A data engineer works with massive amounts of data and is responsible for building and
maintaining the data architecture of a data science project. A data engineer also creates
the data set processes used in modeling, mining, acquisition, and verification.

Skill required: A data engineer must have in-depth knowledge of SQL, MongoDB,
Cassandra, HBase, Apache Spark, Hive, and MapReduce, along with knowledge of languages
such as Python, C/C++, Java, Perl, etc.

4. Data Scientist:

A data scientist is a professional who works with an enormous amount of data to come
up with compelling business insights through the deployment of various tools,
techniques, methodologies, algorithms, etc.

Skill required: To become a data scientist, one should have technical language skills
such as R, SAS, SQL, Python, Hive, Pig, Apache Spark, and MATLAB. Data scientists must
also have an understanding of statistics, mathematics, and visualization, as well as
communication skills.

Prerequisite for Data Science

Non-Technical Prerequisite:
o Curiosity: To learn data science, one must be curious. When you are curious and ask
various questions, you can understand the business problem easily.
o Critical thinking: It is also required for a data scientist so that you can find
multiple new ways to solve the problem efficiently.
o Communication skills: Communication skills are most important for a data scientist
because, after solving a business problem, you need to communicate the solution to
the team.

Technical Prerequisite:
o Machine learning: To understand data science, one needs to understand the
concept of machine learning. Data science uses machine learning algorithms to
solve various problems.
o Mathematical modeling: Mathematical modeling is required to make fast
mathematical calculations and predictions from the available data.
o Statistics: A basic understanding of statistics, such as mean, median, and standard
deviation, is required. It is needed to extract knowledge and obtain better results from
the data.
o Computer programming: For data science, knowledge of at least one programming
language is required. R and Python are commonly required languages, and Spark is a
commonly required framework for data science.
o Databases: An in-depth understanding of databases and query languages such as SQL is
essential for data science to get the data and to work with data.
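
As a small illustration of the statistics mentioned above, the sketch below computes the
mean, median, and standard deviation with Python's standard statistics module. The sample
values and variable names are made up for illustration only.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
daily_orders = [12, 15, 11, 15, 20, 18, 14]

mean_orders = statistics.mean(daily_orders)      # arithmetic average
median_orders = statistics.median(daily_orders)  # middle value of the sorted data
stdev_orders = statistics.stdev(daily_orders)    # sample standard deviation

print(f"mean={mean_orders:.2f} median={median_orders} stdev={stdev_orders:.2f}")
```

The standard deviation here is the sample version (dividing by n - 1); statistics.pstdev
would give the population version instead.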

Difference between BI and Data Science

BI stands for business intelligence, which is also used for data analysis of business
information. Below are some differences between BI and data science:

Criterion: Data source
  Business intelligence: Deals with structured data, e.g., a data warehouse.
  Data science: Deals with structured and unstructured data, e.g., weblogs, feedback, etc.

Criterion: Method
  Business intelligence: Analytical (historical data).
  Data science: Scientific (goes deeper to know the reason for the data report).

Criterion: Skills
  Business intelligence: Statistics and visualization are the two skills required.
  Data science: Statistics, visualization, and machine learning are the required skills.

Criterion: Focus
  Business intelligence: Focuses on both past and present data.
  Data science: Focuses on past data, present data, and also future predictions.

Data Science Components:

The main components of Data Science are given below:

1. Statistics: Statistics is one of the most important components of data science. It is
a way to collect and analyze large amounts of numerical data and find meaningful insights
in it.

2. Domain expertise: Domain expertise binds data science together. It means specialized
knowledge or skill in a particular area. In data science, there are various areas for
which we need domain experts.

3. Data engineering: Data engineering is a part of data science that involves acquiring,
storing, retrieving, and transforming data. It also includes adding metadata (data about
data) to the data.

4. Visualization: Data visualization means representing data in a visual context so that
people can easily understand its significance. It makes huge amounts of data easy to
take in through visuals.

5. Advanced computing: Advanced computing does the heavy lifting of data science. It
involves designing, writing, debugging, and maintaining the source code of computer
programs.

6. Mathematics: Mathematics is a critical part of data science. It involves the study of
quantity, structure, space, and change. For a data scientist, a good knowledge of
mathematics is essential.

7. Machine learning: Machine learning is the backbone of data science. It is all about
training a machine so that it can act like a human brain. In data science, we use various
machine learning algorithms to solve problems.

Tools for Data Science

Following are some tools required for data science:

o Data analysis tools: R, Python, Statistics, SAS, Jupyter, RStudio, MATLAB, Excel,
RapidMiner.
o Data warehousing: ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift.
o Data visualization tools: R, Jupyter, Tableau, Cognos.
o Machine learning tools: Spark, Mahout, Azure ML Studio.

Data Science Lifecycle

The life cycle of data science is shown in the diagram below.

The main phases of data science life cycle are given below:

1. Discovery: The first phase is discovery, which involves asking the right questions.
When you start any data science project, you need to determine the basic requirements,
priorities, and project budget. In this phase, we determine all the requirements of the
project, such as the number of people, technology, time, data, and the end goal, and
then we can frame the business problem at a first hypothesis level.

2. Data preparation: Data preparation is also known as Data Munging. In this phase,
we need to perform the following tasks:

o Data cleaning
o Data Reduction
o Data integration
o Data transformation

After performing all the above tasks, we can easily use this data for further processing.
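
The four data preparation tasks can be sketched on a toy data set as follows. This is a
minimal illustration in plain Python, not a prescribed method: the records, the regions
lookup, and all field names are hypothetical, and the tasks are applied in a convenient
order (cleaning, integration, reduction, transformation) rather than the order listed.

```python
# Hypothetical raw records from two sources (all values are made up).
sales = [
    {"customer_id": 1, "amount": 250.0, "note": "ok"},
    {"customer_id": 2, "amount": None, "note": "missing amount"},
    {"customer_id": 3, "amount": 100.0, "note": "ok"},
    {"customer_id": 3, "amount": 100.0, "note": "ok"},  # duplicate row
    {"customer_id": 4, "amount": 400.0, "note": "ok"},
]
regions = {1: "north", 3: "south", 4: "north"}

# Data cleaning: drop rows with a missing amount and exact duplicates.
seen = set()
cleaned = []
for row in sales:
    key = (row["customer_id"], row["amount"])
    if row["amount"] is None or key in seen:
        continue
    seen.add(key)
    cleaned.append(row)

# Data integration: attach the region from the second source.
integrated = [
    {**row, "region": regions.get(row["customer_id"], "unknown")} for row in cleaned
]

# Data reduction: keep only the columns needed for modeling.
reduced = [{"amount": r["amount"], "region": r["region"]} for r in integrated]

# Data transformation: min-max scale the amount to the range [0, 1].
low = min(r["amount"] for r in reduced)
high = max(r["amount"] for r in reduced)
prepared = [
    {**r, "amount_scaled": (r["amount"] - low) / (high - low)} for r in reduced
]

for row in prepared:
    print(row)
```

In practice these steps are usually done with a data-frame library, but the logic of each
task is the same.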

3. Model planning: In this phase, we determine the methods and techniques for
establishing the relationships between input variables. We apply exploratory data
analysis (EDA), using various statistical formulas and visualization tools, to understand
the relationships between variables and to see what the data can tell us. Common tools
used for model planning are:

o SQL Analysis Services
o R
o Python

4. Model building: In this phase, the process of model building starts. We create
datasets for training and testing purposes. We apply different techniques, such as
association, classification, and clustering, to build the model.

Following are some common model-building tools:

o SAS Enterprise Miner
o SPSS Modeler
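
Creating the training and testing datasets mentioned in the model-building phase can be
sketched as a simple shuffled split. The helper name split_train_test, the 80/20 ratio,
and the seed are assumptions made for this illustration; they do not come from any of the
tools listed above.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set: ten record identifiers.
records = list(range(10))
train, test = split_train_test(records)
print(len(train), len(test))
```

The model is fitted on the training part only, and the held-out test part is used later
to check how well it generalizes.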

5. Operationalize: In this phase, we deliver the final reports of the project, along
with briefings, code, and technical documents. This phase gives you a clear overview of
complete project performance and other components on a small scale before the full
deployment.

6. Communicate results: In this phase, we check whether we have reached the goal that we
set in the initial phase. We communicate the findings and final result to the business
team.

Applications of Data Science:
o Image recognition and speech recognition:
Data science is currently used for image and speech recognition. When you upload an
image on Facebook, you start getting suggestions to tag your friends. This automatic
tagging suggestion uses an image recognition algorithm, which is part of data science.
When you say something using "Ok Google", Siri, Cortana, etc., and these devices respond
to your voice, this is made possible by speech recognition algorithms.
o Gaming world:
In the gaming world, the use of machine learning algorithms is increasing day by day.
EA Sports, Sony, and Nintendo are widely using data science to enhance the user
experience.
o Internet search:
When we want to search for something on the internet, we use search engines such as
Google, Yahoo, Bing, Ask, etc. All these search engines use data science technology to
make the search experience better, so that you get a search result within a fraction of
a second.
o Transport:
Transport industries are also using data science technology to create self-driving cars.
Self-driving cars are expected to help reduce the number of road accidents.
o Healthcare:
In the healthcare sector, data science provides many benefits. It is being used for
tumor detection, drug discovery, medical image analysis, virtual medical bots, etc.
o Recommendation systems:
Many companies, such as Amazon, Netflix, Google Play, etc., use data science technology
to create a better user experience with personalized recommendations. For example, when
you search for something on Amazon and start getting suggestions for similar products,
this is because of data science technology.
o Risk detection:
Finance industries have always had problems with fraud and the risk of losses, but with
the help of data science, these can be mitigated.
Most finance companies are looking for data scientists to avoid risk and losses while
increasing customer satisfaction.

Business Analytics vs. Data Science

Business Analytics: The statistical study of business data to gain insights.
Data Science: The study of data using statistics, algorithms and technology.

Business Analytics: Uses mostly structured data.
Data Science: Uses both structured and unstructured data.

Business Analytics: Does not involve much coding. It is more statistics oriented.
Data Science: Coding is widely used. This field is a combination of traditional
analytics practice with good computer science knowledge.

Business Analytics: The whole analysis is based on statistical concepts.
Data Science: Statistics is used at the end of analysis following coding.

Business Analytics: Studies trends and patterns specific to business.
Data Science: Studies almost every trend and pattern.

Business Analytics: Top industries where business analytics is used: finance,
healthcare, marketing, retail, supply chain, telecommunications.
Data Science: Top industries/applications where data science is used: e-commerce,
finance, machine learning, manufacturing.

Importance of Data Science for Business

There are many ways in which Data Science helps businesses run better:

1. Business Intelligence for Making Smarter Decisions

Traditional Business Intelligence was more descriptive and static in nature.
With the addition of Data Science, however, it has become a more dynamic
field. Data Science has enabled Business Intelligence to incorporate a wide
range of business operations.
With the massive increase in the volume of data, businesses need data
scientists to analyze and derive meaningful insights from the data.
These insights help companies analyze information at a large scale and
develop the necessary decision-making strategies.
The process of decision making involves the evaluation and assessment of
the various factors involved in it. Decision making is a four-step process:
1. Understanding the context and nature of the problem that we are required
to solve.
2. Exploring and quantifying the quality of the data.
3. Implementation of the right algorithm and tools for finding a solution to
the problems.
4. Using story-telling to translate our insights for a better understanding of
In this way, businesses use Data Science to facilitate the decision-making
process.

2. Making Better Products

Companies should be able to attract customers to their products. They need
to develop products that suit customers' requirements and provide guaranteed
satisfaction. Therefore, industries require data to develop their products in
the best possible way.
The process involves analyzing customer reviews to find the best fit for the
products. This analysis is carried out with the advanced analytical tools of
Data Science.
Furthermore, industries use current market trends to devise products for the
masses. These market trends give businesses clues about the current need for
a product. Businesses evolve with innovation.
With the growth in data, industries are able to implement not only newer
products but also various innovative strategies.
For example, Airbnb uses Data Science to improve its services. The data
generated by customers is processed and analyzed. Airbnb then uses it to
address customer requirements and offer premier facilities to its customers.

3. Managing Businesses Efficiently

Businesses today are data rich. They possess a plethora of data that lets
them gain insights through proper analysis. Data Science platforms unearth
the hidden patterns in the data and help to make meaningful analyses and
predictions of events.
With Data Science, businesses can manage themselves more efficiently. Both
large-scale businesses and small startups can benefit from Data Science in
order to grow further.
Data Scientists help to analyze the health of businesses. With Data Science,
companies can predict the success rate of their strategies. Data Scientists
are responsible for turning raw data into processed, usable data.
This helps in summarizing the performance of the company and the health of
the product. Data Science identifies the key metrics that are essential for
determining business performance.
Based on this, the business can take important measures to quantify and
evaluate its performance and take appropriate management steps. It can also
help the managers to analyze and determine the potential candidates for the
Using data science, businesses can also foster leadership development by
tracking the performance, success rate, and other important metrics. With
workforce analytics, industries can evaluate what is best working for the
For example, Data Science can be used to monitor the performance of
employees. Using this, managers can analyze the contributions made by
employees and determine when they should be promoted, how to manage their
perks, etc.

4. Predictive Analytics to Predict Outcomes

Predictive analytics is one of the most important parts of business. With the
advent of advanced predictive tools and technologies, companies have expanded
their capability to deal with diverse forms of data.
In formal terms, predictive analytics is the statistical analysis of data that
uses machine learning algorithms to predict future outcomes from historical
data. There are several predictive analytics tools, such as SAS, IBM SPSS,
SAP HANA, etc.
There are various applications of predictive analytics in businesses, such as
customer segmentation, risk assessment, sales forecasting, and market
analysis. With predictive analytics, businesses have an edge over others, as
they are able to foresee future events and take appropriate measures in
response.
Predictive analytics has its own specific implementation depending on the type
of industry. Regardless of that, it shares a common role in predicting
future events.

5. Leveraging Data for Business Decisions

In the previous section, we saw how Data Science plays an important role in
predicting the future. These predictions are necessary for businesses to learn
about future outcomes. Based on them, businesses make data-driven decisions.
In the past, many businesses made poor decisions because they lacked surveys
or relied solely on ‘gut feelings’. This resulted in some disastrous decisions
and losses in the millions.
However, with a plethora of data and the necessary data tools, it is now
possible for industries to make calculated, data-driven decisions.
Furthermore, business decisions can be made with the help of powerful tools
that can not only process data faster but also provide accurate results.

6. Assessing Business Decisions

After making decisions based on forecasts of future occurrences, companies
need to assess them. This is possible through several hypothesis testing
tools.
After implementing the decisions, businesses should understand how these
decisions affect their performance and growth. If a decision leads to any
negative outcome, they should analyze it and eliminate the problem that is
slowing down their performance.
There are various procedures through which businesses can evaluate their
decisions and plan a suitable action strategy. These decisions revolve around
customer requirements, company goals, and the needs of the project.
Furthermore, by assessing future growth based on the present course of
action, businesses can increase profits considerably with the help of Data
Science.

7. Automating Recruitment Processes

Data Science has played a key role in bringing automation to several
industries. It has taken over mundane and repetitive jobs. One such job is
resume screening. Every day, companies have to deal with hordes of
applicants’ resumes.
Some major businesses can even attract thousands of resumes for a position.
To make sense of all these resumes and select the right candidate,
businesses make use of Data Science.
Data Science technologies like image recognition can convert the visual
information in a resume into a digital format. The data is then processed
using analytical algorithms like clustering and classification to identify
the right candidate for the job.
Furthermore, businesses study the relevant trends and analyze potential
applicants for the job. This allows them to reach out to candidates and gain
in-depth insight into the job-seeker market.
Now, let’s look at a case study of Walmart and discuss how it uses data to
modify its supply chain and understand the needs of customers.

Data Science Case Study

Walmart – Leveraging Data to Make Business Better
Walmart is the world’s largest retailer. It is one of the many major
companies leveraging Big Data to make its business more efficient. Walmart
handles a plethora of customer data: about 2.5 petabytes of data are
collected from customers every hour.
This data is unstructured and is processed using Hadoop and NoSQL. Walmart
tracks and monitors various factors that might affect sales at its stores.
Some of the ways in which Walmart is using data science are:
• Walmart is using data science to make store checkouts more efficient.
At certain times of the day, the checkouts can become crowded. This makes
it difficult for Walmart employees to manage customers during rush hours.
With the help of predictive analytics, however, Walmart can analyze data
and determine the best form of checkout for each store, that is,
self-checkout or facilitated checkout.
• Walmart is using real-time analytics to analyze the purchasing
patterns of the customers. This allows it to stock up on products that
are in demand and also on products that will be in demand in the future,
based on several factors.
• Walmart is managing supply chain and logistics with the help of data
science. It manages its inventory and analyzes the rate of its depletion,
thereby taking the necessary steps to mitigate it through efficient logistics.
Walmart also analyzes the transportation lanes for the company’s trucks
to follow. It specifies an optimized route using data science, thereby
reducing cost and time.
• Walmart is personalizing the shopping experience by analyzing the
preferences and behaviour of the customers. Using data science, it
tracks customers’ purchasing patterns and recommends further products and
discounts to improve their shopping experience.

Main Components of Data Science

The main components or processes are as follows:

1. Data Exploration
This is the most important step, and it also consumes the most time: around
70 per cent of the time is spent on data exploration. The main ingredient of
data science is data, and when we get data, it is seldom in a correct,
structured form. There is a lot of noise in the data; noise here means
unwanted data that is not required. So what do we do in this step? This step
involves sampling and transforming the data: we check the observations (rows)
and features (columns) and remove the noise using statistical methods. This
step is also used to check the relationships among the features (columns) in
the data set, that is, whether the features depend on each other or are
independent of each other, and whether there are missing values. In short,
the data is transformed and readied for further use, which is why this is one
of the most time-consuming steps.

2. Modeling
By now, our data is prepared and ready to go. This is the second step, where
we actually use Machine Learning algorithms and fit a model to the data. The
selection of a model depends on the type of data we have and the business
requirement. For example, the model selected for recommending an article to
a customer will be different from the model required for predicting the
number of articles that will be sold on a particular day. Once the model is
decided, we fit it to the data.

3. Testing the Model

This is the next step, and it is very important for the performance of the
model. The model is tested with test data to check its accuracy and other
characteristics, and the required changes are made to the model to get the
desired result. If we do not get the desired accuracy, we can go back to
step 2 (modeling), select a different model, repeat step 3, and choose the
model that gives the best result as per the business requirement.
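
The test-and-select loop described above can be sketched as follows: each candidate
model is scored on held-out test data, and the one with the best accuracy is kept. The
two threshold rules stand in for real trained models, and the test data and names are
hypothetical.

```python
def accuracy(model, test_rows):
    """Return the fraction of test rows whose label the model predicts correctly."""
    correct = sum(1 for features, label in test_rows if model(features) == label)
    return correct / len(test_rows)


# Hypothetical held-out test data: (hours_studied, passed) pairs.
test_rows = [(1, False), (2, False), (3, False), (5, True), (6, True), (8, True)]

# Two hypothetical candidate "models": simple threshold rules.
candidates = {
    "threshold_2": lambda hours: hours >= 2,
    "threshold_4": lambda hours: hours >= 4,
}

# Score every candidate on the same test data and keep the best one.
scores = {name: accuracy(model, test_rows) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Accuracy is only one of the characteristics worth checking; depending on the business
requirement, other measures may matter more.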

4. Deploying Models
Once we get the desired result through proper testing as per the business
requirements, we finalize the model that gives the best test results and
deploy it in the production environment.


Some of the key job options that one can explore in the field are:
1. Data Scientist
2. Data Engineer
3. Data Analyst
4. Machine Learning Engineer
5. Data Journalist
6. Database Admin
7. Financial Analyst
8. Business Analyst
9. Product Analyst
10. Business Intelligence Analyst
11. Marketing Analyst
12. Quantitative Analyst
13. Data Visualization Specialist
14. Functional Analyst
15. Data System Developer
Let’s explore these top careers in detail.


1. Data Scientist

As one of the most prominent Data Analytics career opportunities, Data Scientists collect and
analyze data that can be communicated as actionable insights. Data Scientists often work with
complex data and advanced analytics and require strong expertise in Data Analytics, including
programming languages like Python and R, data visualization tools, and other vital skills.


2. Data Engineer

A Data Engineer often focuses on massive data sets and is tasked with optimizing the
organization’s infrastructure around several Data Analytics processes. Data Engineers require
not just strong expertise in data visualization and programming but also experience in
developing and testing solutions that meet requirements.


3. Data Analyst

As one of the most prominent Data Analytics career opportunities, Data Analysts are required
in several industries to interpret and represent data in various forms in order to derive
actionable insights. Data Analysts need to be well versed in Excel, Access, SharePoint, and
SQL, while also knowing data mining, data modeling, and data visualization tools. At times,
Data Analysts also help to segregate and simplify data from several systems that needs to be
shared for further analytics.


4. Machine Learning Engineer

Machine Learning Engineering is an advanced Data Analytics career path that combines the
expertise of Data Science and Machine Learning/Artificial Intelligence. A Machine Learning
Engineer needs to be skilled in Deep Learning, Python and other programming languages, Big
Data Analytics, and data visualization tools. According to reports, Machine Learning
Engineers are sought after in organizations like Apple Inc, Accenture, PwC, JP Morgan, and
other companies, with an average annual salary of $111,855.

5. Data Journalist

Data and information are integral to an organization, especially when it comes to journalism.
So, publications and news agencies require Data Journalists, who find data, distinguish the
useful data from the bad, and analyze it for simplicity. If you are wondering how to make a
career in Data Analytics after obtaining a degree in Journalism, note that Data Journalists
need good technical skills in SQL and Python while being experts in Data Visualization and
Statistics. The field also requires a sound knowledge of various other areas to
contextualize the data for further usage.


6. Database Admin

Wondering how to start a Data Analytics career from scratch? Then, a Database Admin is a
great role to consider. Database Administrators are primarily responsible for ensuring that the
database is working correctly, and they should be familiar with several database tools like
SQL, NoSQL, and more. Database Admins monitor and optimize performance and implement,
configure, and troubleshoot database instances to ensure optimal health of the environment.


7. Financial Analyst

Data is also used for financial planning, investments, and making other financial decisions.
This is where Financial Analysis comes in. Financial Analysts are Data Analysts who bring
Finance domain expertise and use it to interpret the various insights, making this role a
perfect fit for those looking for career growth in Data Analytics with a background in Finance.
Financial Analysts have working knowledge of Data Science, while also having expertise in
financial assets like bonds, stocks, trading, and other domain-specific instruments.


8. Business Analyst

A Business Analyst (BA) is now mainstream in every organization, especially product, IT
services, and other organizations utilizing technology. A Business Analyst has to have
expertise in the chosen domain and is tasked with testing, updating, installing, and maintaining
the business process systems for the organization, which includes data processing and other
tools. A BA may not have in-depth knowledge of Data Science, but having functional expertise
is always a big plus.


9. Product Analyst

Similar to a Business Analyst, a Product Analyst is responsible for managing product-related
processes like planning, analyzing existing or new products, and ensuring proper working of
the products used within the organization. If you are wondering how to switch careers to Data
Analytics, understand that Product Analysts need to have a strong knowledge of tools in Data
Science like R, Python, SQL, and others.


10. Business Intelligence Analyst

Business Intelligence (BI) tools help organizations convert raw data into insights, and Business
Intelligence Analysts need to have a strong familiarity with the BI tools. BI Analysts work with
Data Scientists and Analysts to provide data visualizations through charts and graphs and
create reports used for making significant business decisions.


A great option for a career in Data Analytics in India, Marketing Analysts use data to make
smart decisions for marketing and sales-related activities. Marketing Analysts are experts in
data crunching, while also being aware of the marketing aspects of a business.

Marketing Analysts need to use relevant data to make predictions, identify opportunities, and
streamline processes. This requires strong communication skills in addition to technical skills,
as the person in this role needs to often interact with the client-facing teams.

A highly sought-after role for those looking for a career in Big Data analytics in India,
Quantitative Analysts use data to understand and flag potential investment opportunities or
risks. Since their analyses feed investment-related decisions involving trading models, stock
predictions, commodities, and exchange rates, these positions usually pay well.


Similar to the Business Intelligence Analyst role, Data Visualization Specialists need to have a
strong familiarity with various data processing tools like Tableau, Qlik, Datameer, SAP,
TIBCO, and more. In addition to business and functional knowledge, this role has an excellent
scope for Data Analysts who have in-depth familiarity with all the programming languages that
are used for data analysis.


Great for those who want to make a career change to Data Analytics, Functional Analysis is
another stream that leverages the Data Science background to identify and mitigate system
risks, functions, and records. Functional Analysts find problems, recommend system and
technology upgrades, while being in charge of the system procedure, to ensure that operations
are efficient and effective.


With a focus on building products that effectively leverage an organization’s data for decision-
making and insights, Data System Developers are professionals who are the heart of the
database and the system. They help to design, build, and maintain an organization’s data and
analytics infrastructure and also facilitate the process for Data Scientists, Analysts, and other

What Do Data Scientists Do?

In simple terms, a data scientist’s job is to analyze data for actionable insights.

Specific tasks include:

• Identifying the data-analytics problems that offer the greatest
opportunities to the organization
• Determining the correct data sets and variables
• Collecting large sets of structured and unstructured data from disparate sources
• Cleaning and validating the data to ensure accuracy, completeness, and
• Devising and applying models and algorithms to mine the stores of big data
• Analyzing the data to identify patterns and trends
• Interpreting the data to discover solutions and opportunities
• Communicating findings to stakeholders using visualization and other

“More generally, a data scientist is someone who knows how to extract
meaning from and interpret data, which requires both tools and methods from
statistics and machine learning, as well as being human. She spends a lot of
time in the process of collecting, cleaning, and munging data, because data is
never clean. This process requires persistence, statistics, and software
engineering skills—skills that are also necessary for understanding biases in
the data, and for debugging logging output from code.

Once she gets the data into shape, a crucial part is exploratory data analysis,
which combines visualization and data sense. She’ll find patterns, build
models, and algorithms—some with the intention of understanding product
usage and the overall health of the product, and others to serve as prototypes
that ultimately get baked back into the product. She may design experiments,
and she is a critical part of data-driven decision making. She’ll communicate
with team members, engineers, and leadership in clear language and with data
visualizations so that even if her colleagues are not immersed in the data
themselves, they will understand the implications.”

Would You Make a Good Data Scientist?

To find out, ask yourself: Do you . . .

• Hold a degree in mathematics, statistics, computer science, management
information systems, or marketing?
• Have substantial work experience in any of these areas?
• Have an interest in data collection and analysis?
• Enjoy individualized work and problem solving?
• Communicate well both verbally and visually?
• Want to broaden your skills and take on new challenges?
If you answered yes to any of these questions, you may find a lot to like in the
field of data science.

Data scientists require knowledge of math or statistics. A natural curiosity is
also important, as is creative and critical thinking. What can you do with all
the data? What undiscovered opportunities lie hidden within? You must have
a knack for connecting the dots and a desire to search out the answers to
questions that have not yet been asked if you are to realize the data’s full
potential.

Data scientists are also highly educated. According to industry
resource KDnuggets, 88 percent of data scientists have at least a master’s
degree and 46 percent have PhDs.

You also need some background in computer programming so you can devise
the models and algorithms necessary to mine the stores of big data. Python
and R are two of the premier programming environments for data science.

You must be something of an entrepreneur. A head for business strategy is
important. Although you may work with other data specialists or even with an
interdisciplinary team of professionals, you will not be successful if you
cannot devise your own methods and build your own infrastructures to slice
and dice the data that will lead you to your new discoveries and new visions
for the future.

You must also be able to communicate complex ideas to your nontechnical
stakeholders in a way they can easily understand. Data-science software tools
can help you visualize your findings, but you will also need the verbal
communication skills to tell the story clearly.

What Does a Data Analyst Do?

A data analyst collects, cleans, and interprets data sets in order to answer a
question or solve a problem. They can work in many industries, including
business, finance, criminal justice, science, medicine, and government.

What kind of customers should a business target in its next ad campaign? What
age group is most vulnerable to a particular disease? What patterns in behavior
are connected to financial fraud?

These are the types of questions you might be pressed to answer as a data
analyst. Read on to find out more about what a data analyst is, what skills you'll
need, and how you can start on a path to become one.

What is data analysis?

Data analysis is the process of gleaning insights from data to help inform better
business decisions. The process of analyzing data typically moves through five
iterative phases:
• Identify the data you want to analyze
• Collect the data
• Clean the data in preparation for analysis
• Analyze the data
• Interpret the results of the analysis
Data analysis can take different forms, depending on the question you’re trying
to answer. Briefly, descriptive analysis tells us what happened, diagnostic
analysis tells us why it happened, predictive analysis forms projections about
the future, and prescriptive analysis recommends what actions to take.
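
As a rough illustration of the difference between descriptive and predictive analysis, the Python sketch below summarizes a small series and then projects the next value with a least-squares line. The monthly sales figures and the function names describe and forecast_next are hypothetical, chosen only for this example; real projects would typically use richer data and models.

```python
from statistics import mean


def describe(values):
    """Descriptive analysis: summarize what already happened."""
    return mean(values)


def forecast_next(values):
    """Predictive analysis: fit a least-squares line and project one step ahead."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(values)


# Hypothetical monthly sales, invented for illustration only.
monthly_sales = [100, 110, 120, 130]
average_sales = describe(monthly_sales)      # what happened on average
next_month = forecast_next(monthly_sales)    # projection for the next month
print(average_sales, next_month)
```

Diagnostic and prescriptive analysis usually need domain context (why sales moved, what to do about it), so they are not reduced to a formula here.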

Data analyst tasks and responsibilities

A data analyst is a person whose job is to gather and interpret data in order to
solve a specific problem. The role includes plenty of time spent with data, but
entails communicating findings too.

Here’s what many data analysts do on a day-to-day basis:

• Gather data: Analysts often collect data themselves. This could include
conducting surveys, tracking visitor characteristics on a company website, or
buying datasets from data collection specialists.
• Clean data: Raw data might contain duplicates, errors, or outliers. Cleaning
the data means maintaining the quality of data in a spreadsheet or through a
programming language so that your interpretations won’t be wrong or skewed.
• Model data: This entails creating and designing the structures of a database.
You might choose what types of data to store and collect, establish how data
categories are related to each other, and work through how the data actually
• Interpret data: Interpreting data will involve finding patterns or trends in data
that will help you answer the question at hand.
• Present: Communicating the results of your findings will be a key part of your
job. You do this by putting together visualizations like charts and graphs,
writing reports, and presenting information to interested parties.
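
The "clean data" step above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the records are hypothetical (customer, amount) pairs, a missing or negative amount is treated as an entry error, and outliers are flagged with a median-based rule whose threshold of 5 is an arbitrary illustrative choice, not a standard.

```python
from statistics import median


def clean_records(records):
    """Drop exact duplicate rows and rows with missing or negative amounts."""
    seen = set()
    cleaned = []
    for customer, amount in records:
        if amount is None or amount < 0:
            continue  # assumed to be a data-entry error
        if (customer, amount) in seen:
            continue  # exact duplicate of an earlier row
        seen.add((customer, amount))
        cleaned.append((customer, amount))
    return cleaned


def flag_outliers(values, threshold=5.0):
    """Return values whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD)."""
    mid = median(values)
    mad = median([abs(v - mid) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be flagged
    return [v for v in values if abs(v - mid) / mad > threshold]


# Hypothetical raw data containing a duplicate, two errors, and one outlier.
raw = [("a", 20), ("b", 22), ("b", 22), ("c", None),
       ("d", -5), ("e", 21), ("f", 19), ("g", 500)]
cleaned = clean_records(raw)
outliers = flag_outliers([amount for _, amount in cleaned])
print(cleaned, outliers)
```

Whether a flagged value is an error or a genuine extreme is a judgment call, which is why the sketch reports outliers instead of deleting them.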

What tools do data analysts use?

During the process of data analysis, analysts often use a wide variety of tools to
make their work more accurate and efficient. Some of the most common tools
in the data analytics industry include:
• Microsoft Excel
• Google Sheets
• Tableau
• R or Python
• Microsoft Power BI
• Jupyter Notebooks

Data analyst salary and job outlook

The average base salary for a data analyst in the US was $68,577 as of June 2021,
according to Glassdoor. This can vary depending on your seniority, where in
the US you’re located, and other factors.

Data analysts are in high demand. The World Economic Forum listed the role as
number two among growing jobs in the US. The Bureau of Labor Statistics also
reports related occupations as having extremely high growth rates [1].

From 2019 to 2029, operations research analyst positions are expected to grow
by 28 percent, market research analysts by 18 percent, and mathematicians and
statisticians by 33 percent. That’s a lot higher than the total employment
growth rate of four percent.

Types of data analysts

As advancing technology has rapidly expanded the types and amount of
information we can collect, knowing how to gather, sort, and analyze data has
become a crucial part of almost any industry. You’ll find data analysts in the
criminal justice, fashion, food, technology, business, environment, and public
sectors—among many others.

People who perform data analysis might have other titles such as:
• Medical and healthcare analyst
• Marketing research analyst
• Business analyst
• Operations research analyst
• Intelligence analyst

How to become a data analyst

There’s more than one path toward a career as a data analyst. Whether you’re
just graduating from school or looking to switch careers, the first step is often
assessing what transferable skills you have and building the new skills you’ll
need in this new role.

Data analyst technical skills

• Database tools: Microsoft Excel and SQL should be mainstays in any data
analyst’s toolbox. While Excel is ubiquitous across industries, SQL can handle
larger sets of data and is widely regarded as a necessity for data analysis.
• Programming languages: Learning a statistical programming language
like Python or R will let you handle large sets of data and perform complex
equations. Though Python and R are among the most common, it’s a good idea
to look at several job descriptions of a position you’re interested in to
determine which language will be most useful to your industry.
• Data visualization: Presenting your findings in a clear and compelling way is
crucial to being a successful data analyst. Knowing how best to present
information through charts and graphs will make sure colleagues, employers,
and stakeholders will understand your work. Tableau, Jupyter Notebook, and
Excel are among the many tools used to create visuals.
• Statistics and math: Knowing the concepts behind what data tools are actually
doing will help you tremendously in your work. Having a solid grasp of
statistics and math will help you determine which tools are best to use to solve
a particular problem, help you catch errors in your data, and have a better
understanding of the results.
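
To make the SQL and statistics skills above more concrete, here is a small, hedged Python sketch that loads a few rows into an in-memory SQLite database, aggregates them with SQL, and then computes the mean and sample standard deviation. The orders table, its columns, and the function name summarize_orders are assumptions made up for this illustration.

```python
import sqlite3
from statistics import mean, stdev


def summarize_orders(orders):
    """Aggregate (region, amount) rows with SQL, then compute basic statistics."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

    # SQL handles the grouping: row count and average amount per region.
    per_region = conn.execute(
        "SELECT region, COUNT(*), AVG(amount) "
        "FROM orders GROUP BY region ORDER BY region"
    ).fetchall()

    # Pull the raw amounts back out for overall statistics.
    amounts = [row[0] for row in conn.execute("SELECT amount FROM orders")]
    conn.close()
    return per_region, mean(amounts), stdev(amounts)


# Hypothetical order data, for illustration only.
sample = [("north", 10.0), ("north", 30.0), ("south", 20.0)]
per_region, overall_mean, overall_stdev = summarize_orders(sample)
print(per_region, overall_mean, overall_stdev)
```

Knowing what AVG and the standard deviation actually measure is what lets an analyst notice, for example, that two regions with the same average can have very different spreads.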

If that seems like a lot, don’t worry—there are plenty of courses that will walk
you through the basics of the hard skills you need as a data analyst. This IBM
Data Analyst Professional Certificate course on Coursera can be a good place
to start.

Data analyst workplace skills

• Problem solving: A data analyst needs to have a good understanding of the
question being asked and the problem that needs to be solved. They also should
be able to find patterns or trends that might reveal a story. Having the critical
thinking skills will allow you to focus on the right types of data, recognize the
most revealing methods of analysis, and catch gaps in your work.
• Communication: Being able to get your ideas across to other people will be
crucial to your work as a data analyst. Strong written and speaking skills to
communicate with colleagues and other stakeholders are good assets in data analysis.
• Industry knowledge: Knowing about the industry you work in—healthcare,
business, finance, or otherwise—will give you an advantage in your work and
in job applications. If you’re trying to break into a specific industry, take some
time to pay attention to the news in your industry, or read a book on the
subject. This can familiarize you with the industry’s main issues and trends.

Paths to becoming a data analyst

Acquiring these skills is the first step to becoming a data analyst. Here are a
few routes you can take to get them that are flexible enough to fit in around
your life.
• Professional certificate: Entry-level professional certificate programs usually
require no previous experience in the field. They can teach you basic skills like
SQL or statistics while giving you the chance to create projects for your
portfolio and provide real-time feedback on your work. Several professional
certificate programs on Coursera do just that.
• Bachelor's degree: The Bureau of Labor Statistics recommends a bachelor’s
degree for jobs that involve data analysis. If you’re considering getting a degree
to become a data analyst, focusing your coursework in statistics, math, or
computer science can give you a head start with potential employers. Many
online bachelor’s degrees have flexible scheduling so you can fit a degree in
around your priorities.
• Self-study: If you want a path that doesn’t include formal training, it’s possible
to learn the skills necessary for data analysis on your own.
