Data Science M-1 Notes
Data Science
Data science has become one of the most in-demand careers of the 21st century. Many
organizations are looking for candidates with knowledge of data science. This tutorial
gives an introduction to data science, covering job roles, tools, components,
applications, and more.
Data science uses powerful hardware, programming systems, and efficient algorithms to
solve data-related problems. It is closely tied to the future of artificial intelligence.
Example:
Suppose we want to travel from station A to station B by car. We need to decide which
route will get us there fastest, which route will have no traffic jams, and which route
will be most cost-effective. These decision factors act as input data, and analyzing
them gives us an appropriate answer. This analysis of data is called data analysis,
which is a part of data science.
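The route decision above can be sketched in a few lines of Python. This is only an illustration: the route names, travel times, traffic delays, and costs are made-up values, and a real system would use measured data.

```python
# Hypothetical candidate routes from station A to station B.
# All numbers are made-up illustration values, not real measurements.
routes = [
    {"name": "highway", "minutes": 35, "traffic_delay": 15, "cost": 6.0},
    {"name": "city road", "minutes": 45, "traffic_delay": 5, "cost": 3.5},
    {"name": "ring road", "minutes": 40, "traffic_delay": 0, "cost": 5.0},
]


def total_minutes(route):
    """Expected travel time including the estimated traffic delay."""
    return route["minutes"] + route["traffic_delay"]


# Each decision factor becomes a question we ask of the input data.
fastest = min(routes, key=total_minutes)
cheapest = min(routes, key=lambda route: route["cost"])

print("Fastest route:", fastest["name"], total_minutes(fastest), "minutes")
print("Cheapest route:", cheapest["name"], cheapest["cost"])
```

Here the fastest and the cheapest route differ, which is why the decision factors usually have to be weighed against each other.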
Need for Data Science:
Some years ago, data was smaller in volume and mostly available in a structured form,
so it could be easily stored in Excel sheets and processed using BI tools.
Today, data is growing vast: approximately 2.5 quintillion bytes of data are generated
every day, which has led to a data explosion. Researchers estimated that by 2020, 1.7 MB
of data would be created every second for every person on earth. Every company requires
data to work, grow, and improve its business.
Handling such a huge amount of data is a challenging task for every organization.
To handle, process, and analyze it, we require complex, powerful, and efficient
algorithms and technology, and that technology came into existence as data science.
Following are some main reasons for using data science technology:
o With the help of data science technology, we can convert massive amounts of
raw and unstructured data into meaningful insights.
o Data science technology is being adopted by various companies, whether big brands
or startups. Google, Amazon, Netflix, etc., which handle huge amounts of data,
use data science algorithms for a better customer experience.
o Data science is helping to automate transportation, for example by creating
self-driving cars, which may be the future of transportation.
o Data science can help with predictions for surveys, elections,
flight ticket confirmation, etc.
The average salary range for a data scientist is approximately $95,000 to $165,000
per annum, and according to various studies, about 11.5 million jobs will be created
by the year 2026.
Some of the main job roles in data science are:
1. Data Scientist
2. Data Analyst
3. Machine learning expert
4. Data engineer
5. Data Architect
6. Data Administrator
7. Business Analyst
8. Business Intelligence Manager
1. Data Analyst:
A data analyst is an individual who mines huge amounts of data, models the
data, and looks for patterns, relationships, trends, and so on. In the end, they come
up with visualizations and reports that support decision making and
problem solving.
Skill required: To become a data analyst, you need a good background
in mathematics, business intelligence, data mining, and basic knowledge of statistics.
You should also be familiar with some computer languages and tools such
as MATLAB, Python, SQL, Hive, Pig, Excel, SAS, R, JS, Spark, etc.
2. Machine Learning Expert:
A machine learning expert works with the various machine learning
algorithms used in data science, such as regression, clustering, classification, decision
trees, random forests, etc.
Skill required: Computer programming languages such as Python, C++, R, and Java, along
with familiarity with Hadoop. You should also have an understanding of various
algorithms, analytical problem-solving skills, probability, and statistics.
3. Data Engineer:
A data engineer works with massive amounts of data and is responsible for building and
maintaining the data architecture of a data science project. A data engineer also creates
the data set processes used in modeling, mining, acquisition, and verification.
Skill required: A data engineer must have in-depth knowledge of SQL, MongoDB,
Cassandra, HBase, Apache Spark, Hive, and MapReduce, along with programming knowledge
of Python, C/C++, Java, Perl, etc.
4. Data Scientist:
A data scientist is a professional who works with an enormous amount of data to come
up with compelling business insights through the deployment of various tools,
techniques, methodologies, algorithms, etc.
Skill required: To become a data scientist, one should have technical skills in
languages and tools such as R, SAS, SQL, Python, Hive, Pig, Apache Spark, and MATLAB.
Data scientists must also understand statistics, mathematics, and visualization, and
have good communication skills.
Prerequisite for Data Science
Non-Technical Prerequisite:
o Curiosity: To learn data science, one must be curious. When you are
curious and ask many questions, you can understand the business problem
more easily.
o Critical thinking: A data scientist also needs critical thinking in order to find
multiple new ways to solve a problem efficiently.
o Communication skills: Communication skills are very important for a data
scientist because, after solving a business problem, you need to communicate the
solution to the team.
Technical Prerequisite:
o Machine learning: To understand data science, one needs to understand the
concept of machine learning. Data science uses machine learning algorithms to
solve various problems.
o Mathematical modeling: Mathematical modeling is required to make fast
mathematical calculations and predictions from the available data.
o Statistics: A basic understanding of statistics, such as the mean, median, and
standard deviation, is required. It is needed to extract knowledge and obtain better
results from the data.
o Computer programming: For data science, knowledge of at least one
programming language is required. R and Python are commonly used programming
languages for data science, and Spark is a commonly used data processing framework.
o Databases: An in-depth understanding of databases and of SQL is essential for
data science in order to retrieve and work with data.
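As a small illustration of the statistics prerequisite listed above, the following sketch computes the mean, median, and standard deviation with Python's built-in statistics module. The sales figures are hypothetical values chosen for the example.

```python
import statistics

# Hypothetical sample: number of items sold per day over one week.
daily_sales = [12, 15, 11, 15, 45, 13, 15]

mean_sales = statistics.mean(daily_sales)      # arithmetic average
median_sales = statistics.median(daily_sales)  # middle value after sorting
stdev_sales = statistics.stdev(daily_sales)    # sample standard deviation

print("mean:", mean_sales)
print("median:", median_sales)
print("standard deviation:", round(stdev_sales, 2))
```

The single unusually high day (45) pulls the mean above the median, which is one reason to look at more than one summary statistic.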
Difference between BI and Data Science
BI stands for business intelligence, which is also used for data analysis of business
information. Below are some differences between BI and data science:
o Data source:
  Business intelligence: deals with structured data, e.g., a data warehouse.
  Data science: deals with structured and unstructured data, e.g., weblogs, feedback, etc.
o Focus:
  Business intelligence: focuses on both past and present data.
  Data science: focuses on past data, present data, and also future predictions.
2. Domain Expertise: In data science, domain expertise binds data science together.
Domain expertise means specialized knowledge or skills of a particular area. In data
science, there are various areas for which we need domain experts.
The main phases of data science life cycle are given below:
1. Discovery: The first phase is discovery, which involves asking the right questions.
When you start any data science project, you need to determine the basic
requirements, priorities, and project budget. In this phase, we determine all the
requirements of the project, such as the number of people, technology, time, data, and
the end goal, and then we can frame the business problem as an initial hypothesis.
2. Data preparation: Data preparation is also known as Data Munging. In this phase,
we need to perform the following tasks:
o Data cleaning
o Data Reduction
o Data integration
o Data transformation
After performing all the above tasks, we can easily use this data in the later
phases.
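The four data preparation tasks can be sketched on a tiny data set as follows. This is a minimal illustration using only plain Python: the records, field names, and the choice of min-max scaling as the transformation are assumptions made for the example, not a prescribed method.

```python
# Hypothetical raw order records; None marks a missing value.
orders = [
    {"customer_id": 1, "amount": 120.0, "note": "gift"},
    {"customer_id": 2, "amount": None, "note": ""},
    {"customer_id": 3, "amount": 80.0, "note": "rush"},
    {"customer_id": 3, "amount": 80.0, "note": "rush"},  # duplicate row
]
# Hypothetical second data source: customer id -> region.
customers = {1: "north", 2: "south", 3: "north"}

# 1. Data cleaning: drop exact duplicates and rows with a missing amount.
seen = set()
cleaned = []
for row in orders:
    key = (row["customer_id"], row["amount"], row["note"])
    if row["amount"] is None or key in seen:
        continue
    seen.add(key)
    cleaned.append(row)

# 2. Data reduction: keep only the columns needed later.
reduced = [{"customer_id": r["customer_id"], "amount": r["amount"]} for r in cleaned]

# 3. Data integration: attach the region from the second source.
integrated = [dict(r, region=customers[r["customer_id"]]) for r in reduced]

# 4. Data transformation: min-max scale the amount into the range [0, 1].
#    (Assumes at least two distinct amounts; otherwise high - low would be 0.)
amounts = [r["amount"] for r in integrated]
low, high = min(amounts), max(amounts)
prepared = [
    dict(r, amount_scaled=(r["amount"] - low) / (high - low)) for r in integrated
]

for row in prepared:
    print(row)
```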
3. Model Planning: In this phase, we determine the various methods and
techniques to establish the relationships between input variables. We apply exploratory
data analysis (EDA), using various statistical formulas and visualization tools, to
understand the relationships between variables and to see what the data can tell us. Common
tools used for model planning are:
4. Model-building: In this phase, the process of model building starts. We create
datasets for training and testing purposes. We apply different techniques, such as
association, classification, and clustering, to build the model.
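The idea of separate training and testing datasets can be sketched as follows. The data, the 80/20 split ratio, and the choice of a 1-nearest-neighbour classifier are assumptions made for this illustration; a real project would typically use a machine learning library.

```python
import random

# Hypothetical labelled data: (hours_active, purchases) -> customer segment.
data = [
    ((1.0, 0.0), "casual"), ((1.5, 1.0), "casual"), ((2.0, 0.0), "casual"),
    ((1.2, 1.0), "casual"), ((0.8, 0.0), "casual"),
    ((8.0, 9.0), "loyal"), ((9.0, 10.0), "loyal"), ((7.5, 8.0), "loyal"),
    ((8.5, 11.0), "loyal"), ((9.5, 9.0), "loyal"),
]

rng = random.Random(42)  # fixed seed so the split is reproducible
shuffled = data[:]
rng.shuffle(shuffled)

# Hold out 20% of the examples for testing; train on the rest.
split = int(0.8 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]


def predict(point):
    """1-nearest-neighbour: return the label of the closest training example."""
    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, point))
    return min(train, key=squared_distance)[1]


# Evaluate only on examples the model has not seen during training.
correct = sum(predict(features) == label for features, label in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Measuring accuracy on the held-out test set, rather than on the training data, gives a more honest estimate of how the model behaves on new data.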
5. Operationalize: In this phase, we deliver the final reports of the project, along
with briefings, code, and technical documents. This phase gives you a clear overview
of complete project performance and other components on a small scale before full
deployment.
6. Communicate results: In this phase, we check whether we reached the goal that we
set in the initial phase. We communicate the findings and final results to the
business team.
Applications of Data Science:
o Image recognition and speech recognition:
Data science is currently used for image and speech recognition. When you
upload an image on Facebook, you start getting suggestions to tag your
friends. This automatic tagging suggestion uses an image recognition algorithm,
which is part of data science.
When you say something to "Ok Google", Siri, Cortana, etc., these
assistants respond to your voice commands. This is possible because of speech
recognition algorithms.
o Gaming world:
In the gaming world, the use of machine learning algorithms is increasing day by
day. EA Sports, Sony, and Nintendo widely use data science to enhance the user
experience.
o Internet search:
When we want to search for something on the internet, we use
search engines such as Google, Yahoo, Bing, Ask, etc. All these search engines
use data science technology to improve the search experience, so you
get search results within a fraction of a second.
o Transport:
Transport industries are also using data science technology to create self-driving cars.
Self-driving cars are expected to help reduce the number of road accidents.
o Healthcare:
In the healthcare sector, data science is providing lots of benefits. Data science is
being used for tumor detection, drug discovery, medical image analysis, virtual
medical bots, etc.
o Recommendation systems:
Many companies, such as Amazon, Netflix, Google Play, etc., use data
science technology to create a better user experience with personalized
recommendations. For example, when you search for something on Amazon and
start getting suggestions for similar products, this is because of data science
technology.
o Risk detection:
Finance industries have always had problems with fraud and the risk of losses, but with
the help of data science, these can be reduced.
Most finance companies look for data scientists to avoid risk and
losses of any kind while increasing customer satisfaction.
Business Analytics vs. Data Science:
o Business analytics is the statistical study of business data to gain insights.
  Data science is the study of data using statistics, algorithms and technology.
o Business analytics uses mostly structured data.
  Data science uses both structured and unstructured data.
o In business analytics, the whole analysis is based on statistical concepts.
  In data science, statistics is used at the end of analysis following coding.
o Top industries where business analytics is used: finance, healthcare, marketing, retail,
  Top industries/applications where data science is used: e-commerce, finance, machine
1. Data Exploration
This is the most important step, and it consumes the most time: around
70 percent of the time is spent on data exploration. The main
ingredient of data science is data, and the data we receive is seldom
in a correct, structured form. It usually contains a lot of noise, meaning
unwanted data that is not required. This step involves sampling and
transforming the data: we check the observations (rows) and features
(columns) and remove the noise using statistical methods. This step is also
used to check the relationships among the features (columns) in the data
set, that is, whether the features are dependent on each other or
independent of each other, and whether there are missing values.
In short, the data is transformed and made ready for further use.
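Two of the checks described above, counting missing values and measuring the relationship between two features, can be sketched as follows. The feature names and values are hypothetical, and the Pearson correlation coefficient is used here as one common way to measure a linear relationship.

```python
import math

# Hypothetical observations (rows) with two features (columns).
rows = [
    {"ad_spend": 10.0, "sales": 25.0},
    {"ad_spend": 20.0, "sales": 45.0},
    {"ad_spend": None, "sales": 30.0},  # missing value
    {"ad_spend": 30.0, "sales": 65.0},
    {"ad_spend": 40.0, "sales": 85.0},
]

# Count the missing values in each feature.
missing = {col: sum(row[col] is None for row in rows) for col in rows[0]}

# Keep only complete rows before measuring the relationship.
complete = [row for row in rows if None not in row.values()]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


correlation = pearson(
    [row["ad_spend"] for row in complete],
    [row["sales"] for row in complete],
)

print("missing values per feature:", missing)
print("correlation between ad_spend and sales:", round(correlation, 3))
```

A coefficient close to 1 or -1 suggests the two features are strongly linearly dependent, while a value near 0 suggests little linear relationship.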
2. Modeling
By now, our data is prepared and ready to go. This is the second step,
where we actually use machine learning algorithms. The selection of a
model depends on the type of data we have and the business requirement.
For example, the model selected for recommending an article to a customer
will be different from the model required for predicting the number of
articles that will be sold on a particular day. Once the model is decided,
we fit the data to the model.
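As an illustration of fitting data to a model, the sketch below fits a straight line by ordinary least squares to predict the number of articles sold on a given day. The sales history is made up, and a simple linear model is an assumption chosen for the example; the appropriate model depends on the data and the business requirement.

```python
# Hypothetical history: (day_index, articles_sold).
history = [(1, 12), (2, 14), (3, 16), (4, 18), (5, 20)]

days = [day for day, _sold in history]
sold = [count for _day, count in history]

# Ordinary least squares for a line: sold = intercept + slope * day.
n = len(days)
mean_day = sum(days) / n
mean_sold = sum(sold) / n
slope = sum((d - mean_day) * (s - mean_sold) for d, s in zip(days, sold)) / sum(
    (d - mean_day) ** 2 for d in days
)
intercept = mean_sold - slope * mean_day


def predict_sales(day):
    """Predict the number of articles sold on the given day index."""
    return intercept + slope * day


print("slope:", slope, "intercept:", intercept)
print("predicted sales on day 6:", predict_sales(6))
```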
4. Deploying Models
Once we get the desired result through proper testing as per the business
requirements, we finalize the model that gives the best test results
and deploy it in the production environment.
1. DATA SCIENTIST
As one of the most prominent Data Analytics career opportunities, Data Scientists collect and
analyze data and communicate the results as actionable insights. Data Scientists often
work with complex data and advanced analytics and require strong expertise in Data
Analytics, including programming languages like Python and R, data visualization tools, and
other vital skills.
2. DATA ENGINEER
A Data Engineer often focuses on massive data sets and is tasked with optimizing the
organization’s infrastructure around several Data Analytics processes. Data Engineers need
not just strong expertise in data visualization and programming but also
experience in developing and testing solutions that meet the organization’s requirements.
3. DATA ANALYST
As one of the most prominent Data Analytics career opportunities, Data Analysts are required
in several industries to interpret and represent data in various forms in order to derive
actionable insights. Data Analysts need to be well versed with Excel, Access, SharePoint, and
SQL, while also knowing data mining, data modeling, and data visualization tools. At times,
Data Analysts also help to segregate and simplify data from several systems that needs to be
shared for further analytics.
4. MACHINE LEARNING ENGINEER
Machine Learning Engineering is an advanced Data Analytics career path that combines the
expertise of Data Science and Machine Learning/Artificial Intelligence. A Machine Learning
Engineer needs to be skilled in Deep Learning, Python, and other programming languages, Big
Data Analytics, and data visualization tools. According to reports, Machine Learning
Engineers are sought after in organizations like Apple Inc, Accenture, PwC, JP Morgan, and
other companies, with an average annual salary of $111,855.
5. DATA JOURNALIST
Data and information are integral to an organization, especially when it comes to journalism.
Publications and news agencies therefore require Data Journalists, who are primarily the ones
finding data, distinguishing the useful data from the bad, and analyzing it for simplicity. If you
are wondering how to make a career in Data Analytics after obtaining a degree in Journalism,
this role is one option. Data Journalists need good technical skills in SQL and Python while
being experts in Data Visualization and Statistics. The field also requires sound knowledge of
various other areas to contextualize the data for further usage.
6. DATABASE ADMIN
Wondering how to start a Data Analytics career from scratch? Then, a Database Admin is a
great role to consider. Database Administrators are primarily responsible for ensuring that the
database is working correctly, and they should be familiar with several database tools like
SQL, NoSQL, and more. Database Admins monitor and optimize performance and implement,
configure, and troubleshoot database instances to ensure optimal health of the environment.
7. FINANCIAL ANALYST
Data is also used for financial planning, investments, and making other financial decisions.
This is where Financial Analysis comes in. Financial Analysts are Data Analysts who bring
Finance domain expertise and use the various insights to interpret it, making this role a perfect
fit for those looking for career growth in Data Analytics with a background in Finance.
Financial Analysts have working knowledge of Data Science, while also having expertise in
financial assets like bonds, stocks, trading, and other domain-specific instruments.
8. BUSINESS ANALYST
A Business Analyst (BA) role is now mainstream in every organization, especially product, IT
services, and other organizations utilizing technology. A Business Analyst has to have
expertise in the chosen domain and is tasked with testing, updating, installing, and maintaining
the business process systems for the organization, which include data processing and other
tools. A BA may not have in-depth knowledge of Data Science, but having functional expertise
is always a big plus.
9. PRODUCT ANALYST
10. BUSINESS INTELLIGENCE ANALYST
Business Intelligence (BI) tools help organizations convert raw data into insights, and Business
Intelligence Analysts need to have a strong familiarity with the BI tools. BI Analysts work with
Data Scientists and Analysts to provide data visualizations through charts and graphs and
create reports used for making significant business decisions.
11. MARKETING ANALYST
A great option for a career in Data Analytics in India, Marketing Analysts use data to make
smart decisions for marketing and sales-related activities. Marketing Analysts are experts in
data crunching, while also being aware of the marketing aspects of a business.
Marketing Analysts need to use relevant data to make predictions, identify opportunities, and
streamline processes. This requires strong communication skills in addition to technical skills,
as the person in this role needs to often interact with the client-facing teams.
12. QUANTITATIVE ANALYST
A highly sought-after role for those looking for a career in Big Data analytics in India,
Quantitative Analysts use data to understand and flag potential investment opportunities or
risks. Since their analyses inform investment-related decisions involving trading models,
stock predictions, commodities, and exchange rates, these
positions usually pay well.
13. DATA VISUALIZATION SPECIALIST
Similar to the Business Intelligence Analyst role, Data Visualization Specialists need to have a
strong familiarity with various data processing tools like Tableau, Qlik, Datameer, SAP,
TIBCO, and more. In addition to business and functional knowledge, this role has an excellent
scope for Data Analysts who have in-depth familiarity with all the programming languages that
are used for data analysis.
14. FUNCTIONAL ANALYST
Great for those who want to make a career change to Data Analytics, Functional Analysis is
another stream that leverages the Data Science background to identify and mitigate system
risks, functions, and records. Functional Analysts find problems, recommend system and
technology upgrades, while being in charge of the system procedure, to ensure that operations
are efficient and effective.
15. DATA SYSTEM DEVELOPER
With a focus on building products that effectively leverage an organization’s data for decision-
making and insights, Data System Developers are professionals who are the heart of the
database and the system. They help to design, build, and maintain an organization’s data and
analytics infrastructure and also facilitate the process for Data Scientists, Analysts, and other
Developers.
What Do Data Scientists Do?
In simple terms, a data scientist’s job is to analyze data for actionable
insights.
Once she gets the data into shape, a crucial part is exploratory data analysis,
which combines visualization and data sense. She’ll find patterns, build
models, and algorithms—some with the intention of understanding product
usage and the overall health of the product, and others to serve as prototypes
that ultimately get baked back into the product. She may design experiments,
and she is a critical part of data-driven decision making. She’ll communicate
with team members, engineers, and leadership in clear language and with data
visualizations so that even if her colleagues are not immersed in the data
themselves, they will understand the implications.
You also need some background in computer programming so you can devise
the models and algorithms necessary to mine the stores of big data. Python
and R are two of the premier programming environments for data science.
What kind of customers should a business target in its next ad campaign? What
age group is most vulnerable to a particular disease? What patterns in behavior
are connected to financial fraud?
These are the types of questions you might be pressed to answer as a data
analyst. Read on to find out more about what a data analyst is, what skills you'll
need, and how you can start on a path to become one.
Data analysts are in high demand. The World Economic Forum listed it as
number two in growing jobs in the US. The Bureau of Labor Statistics also
reports related occupations as having extremely high growth rates [1].
From 2019 to 2029, operations research analyst positions are expected to grow
by 28 percent, market research analysts by 18 percent, and mathematicians and
statisticians by 33 percent. That’s a lot higher than the total employment
growth rate of four percent.
People who perform data analysis might have other titles such as:
• Medical and healthcare analyst
• Marketing research analyst
• Business analyst
• Operations research analyst
• Intelligence analyst
If that seems like a lot, don’t worry—there are plenty of courses that will walk
you through the basics of the hard skills you need as a data analyst. This IBM
Data Analyst Professional Certificate course on Coursera can be a good place
to start.
Data analyst workplace skills
• Problem solving: A data analyst needs to have a good understanding of the
question being asked and the problem that needs to be solved. They also should
be able to find patterns or trends that might reveal a story. Having the critical
thinking skills will allow you to focus on the right types of data, recognize the
most revealing methods of analysis, and catch gaps in your work.
• Communication: Being able to get your ideas across to other people will be
crucial to your work as a data analyst. Strong written and speaking skills to
communicate with colleagues and other stakeholders are good assets in data
analysts.
• Industry knowledge: Knowing about the industry you work in—healthcare,
business, finance, or otherwise—will give you an advantage in your work and
in job applications. If you’re trying to break into a specific industry, take some
time to pay attention to the news in your industry, or read a book on the
subject. This can familiarize you with the industry’s main issues and trends.