Basics of Data Analytics

The key takeaways are that data analytics involves collecting, processing, analyzing and communicating data to support decision making. It uses techniques from statistics, machine learning and data visualization. The main goals are to describe past events, diagnose reasons for outcomes, predict likely future events and prescribe optimal courses of action.

The main steps involved in the data analysis process are: data requirements specification, data collection, data processing, data cleaning, data analysis and communication of results.

The four basic types of data analytics are: descriptive analytics, diagnostic analytics, predictive analytics and prescriptive analytics.

What Is Data Analytics?

Data analytics is the science of analyzing raw data in order to draw conclusions from that
information. Many of the techniques and processes of data analytics have been automated into
mechanical processes and algorithms that work over raw data and present the results for human
consumption.

Data Analytics Basics: Statistics + Coding + Business Thinking

Data analytics (DA) is the process of examining data sets in order to draw conclusions
about the information they contain, increasingly with the aid of specialized systems and
software. Data analytics technologies and techniques are widely used in commercial
industries to enable organizations to make more-informed business decisions, and by
scientists and researchers to verify or disprove scientific models, theories, and hypotheses.

The data analysis process consists of the following phases, which are iterative in nature:

- Data Requirements Specification
- Data Collection
- Data Processing
- Data Cleaning
- Data Analysis
- Communication

Data analysis is the process of collecting, transforming, cleaning, and modeling data with
the goal of discovering useful information. The results are then communicated, suggesting
conclusions and supporting decision-making. Data visualization is sometimes used to
portray the data and make useful patterns easier to discover.

Data Requirements Specification


The data required for analysis is determined by a question or an experiment. Based on the
requirements of those directing the analysis, the data necessary as inputs to the analysis is
identified (e.g., the population of people to be studied).
Data Collection
Data collection is the process of gathering information on targeted variables identified in the data
requirements. The emphasis is on accurate and honest collection, so that the data gathered is valid
and the decisions based on it are sound. Data collection provides both a baseline to measure
against and a target to improve upon.
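
As an illustration, collected data often arrives as a CSV export or an API response. Here is a
minimal sketch in Python, assuming a hypothetical file named sales.csv with date, region, and
units_sold columns:

    import pandas as pd

    # Load collected observations from a CSV file.
    # "sales.csv" and its column names are hypothetical examples.
    sales = pd.read_csv("sales.csv", parse_dates=["date"])

    # Inspect what was collected: row count, columns, and data types.
    print(sales.shape)
    print(sales.dtypes)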
Data Processing
The data that is collected must be processed or organized for analysis. This includes structuring
the data as required by the relevant analysis tools. For example, the data might have to be
placed into rows and columns in a table within a spreadsheet or statistical application.
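
For instance, raw records collected as key-value pairs can be organized into an analysis-ready
table. A minimal sketch using pandas, with made-up example records:

    import pandas as pd

    # Raw records as collected (hypothetical example data).
    records = [
        {"month": "Jan", "region": "North", "units_sold": 120},
        {"month": "Jan", "region": "South", "units_sold": 95},
        {"month": "Feb", "region": "North", "units_sold": 130},
    ]

    # Organize the records into rows and columns.
    df = pd.DataFrame(records)

    # Pivot into a month-by-region table, a common analysis-ready layout.
    table = df.pivot(index="month", columns="region", values="units_sold")
    print(table)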
Data Cleaning
The processed and organized data may be incomplete, contain duplicates, or contain errors.
Data cleaning is the process of preventing and correcting these errors, and the appropriate
cleaning steps depend on the type of data. For example, when cleaning financial data, certain
totals might be compared against reliably published figures or defined thresholds.
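
A minimal sketch of common cleaning steps with pandas, continuing the hypothetical sales
data from above:

    import pandas as pd

    # Hypothetical raw data with a duplicate row and missing values.
    df = pd.DataFrame({
        "month": ["Jan", "Jan", "Feb", "Feb", None],
        "units_sold": [120, 120, 130, None, 90],
    })

    # Remove exact duplicate rows.
    df = df.drop_duplicates()

    # Drop rows where the key field is missing.
    df = df.dropna(subset=["month"])

    # Fill missing numeric values with the column median (one common choice).
    df["units_sold"] = df["units_sold"].fillna(df["units_sold"].median())
    print(df)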
Data Analysis
Data that has been processed, organized, and cleaned is ready for analysis. Various data
analysis techniques are available to understand, interpret, and derive conclusions based on the
requirements.
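
For example, a first pass at analysis often consists of summary statistics and group
comparisons. A minimal sketch, again with made-up data:

    import pandas as pd

    df = pd.DataFrame({
        "region": ["North", "South", "North", "South"],
        "units_sold": [120, 95, 130, 90],
    })

    # Overall summary statistics: count, mean, std, quartiles, etc.
    print(df["units_sold"].describe())

    # Compare groups: average units sold per region.
    print(df.groupby("region")["units_sold"].mean())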
Communication
The results of the data analysis are reported in whatever format the users require to support
their decisions and further action.
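
As noted above, data visualization is often part of communicating results. A minimal sketch
using matplotlib; the monthly figures are made up:

    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr"]
    units_sold = [215, 240, 255, 230]

    # A simple line chart communicates the trend at a glance.
    plt.plot(months, units_sold, marker="o")
    plt.title("Units Sold per Month")
    plt.xlabel("Month")
    plt.ylabel("Units sold")
    plt.show()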

Types of Data Analytics

Data analytics is broken down into four basic types.


1. Descriptive analytics describes what has happened over a given period of time. Have the
number of views gone up? Are sales stronger this month than last?
2. Diagnostic analytics focuses more on why something happened. This involves more
diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales? Did that
latest marketing campaign impact sales?
3. Predictive analytics moves to what is likely going to happen in the near term. What
happened to sales the last time we had a hot summer? How many weather models predict
a hot summer this year?
4. Prescriptive analytics suggests a course of action. If the likelihood of a hot summer,
measured as the average of these five weather models, is above 58%, we should add an
evening shift to the brewery and rent an additional tank to increase output.
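
As a toy illustration of this prescriptive rule, the decision logic can be written directly in
code; the five model probabilities and the 58% threshold below are hypothetical values taken
from the running example:

    # Hypothetical hot-summer probabilities from five weather models.
    model_predictions = [0.62, 0.55, 0.60, 0.64, 0.57]

    # Average the models to estimate the likelihood of a hot summer.
    likelihood = sum(model_predictions) / len(model_predictions)

    # Prescriptive rule from the example: act if the average exceeds 58%.
    if likelihood > 0.58:
        print(f"Likelihood {likelihood:.0%}: add an evening shift and rent a tank.")
    else:
        print(f"Likelihood {likelihood:.0%}: keep current capacity.")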
Common terminology used in data analytics
Business Intelligence (BI). Developing intelligent applications capable of extracting data
from both the internal and external environments to help executives make strategic decisions in an
organization.
1. Automatic identification and capture (AIDC). It is any method that can automatically
identify and collect data on items, and store them in a computer system.
2. Avro. It is a data serialization system that facilitates encoding of a database schema in
Hadoop.
3. Behavioral analytics. It involves using data about people’s behavior to infer their intent
and predict their future actions.
4. Big Data Scientist. A professional who can develop algorithms that make sense from big
data.
5. Cascading. A software layer that provides a higher level of abstraction over Hadoop,
allowing developers to create complex jobs in different JVM-based programming
languages.
6. Classification analysis. It is a systematic process of obtaining crucial and relevant
information about raw data and its metadata.
7. Database. A digital collection of logically related and shared data.
8. Database administrator (DBA). A professional, often certified, who is responsible for
developing and maintaining the integrity of the database.
9. Database management system (DBMS). Software that creates and manipulates
databases in a structured format.
10. Data cleansing. The process of reviewing and revising data to eliminate duplicate entries,
correct spelling mistakes and add missing data.
11. Data collection. Any process that leads to the acquisition of data.
12. Data-directed decision making. Using data as the basis for making crucial
decisions.
13. Data exhaust. The by-product data generated as a side effect of a person's use of a
database system.
14. Data feed. A means for a person to receive a stream of data, such as RSS.
15. Data governance. A set of processes that promotes the integrity of the data stored in a
database system.
16. Data integration. The act of combining data from diverse and disparate sources and
presenting it in a single coherent and unified view.
17. Data integrity. The validity or correctness of data stored in a database. It ensures
accuracy, timeliness, and completeness of data.
18. Data migration. The process of moving data from one storage location or server to
another while maintaining its format.
19. Data mining. The process of obtaining patterns or knowledge from large sets of
databases.
20. Data science. A discipline that incorporates statistics, data visualization,
machine learning, computer programming, and data mining to solve complex
problems in organizations.
21. Data scientist. A professional who is knowledgeable in data science.
22. Machine learning. Using algorithms that allow computers to analyze data, extract
information, and take specific actions based on specific events or patterns.
23. MongoDB. A document-oriented NoSQL database system developed as open
source. It stores data structures in JSON-like documents with a dynamic
schema.
24. Qualitative analysis. The process of analyzing qualitative data by interpreting words and
text.
25. Quantitative analysis. The process of analyzing quantitative data by interpreting
numerical data.
26. Quartiles. The lower quartile (Q1) is the value below which the bottom 25% of
sampled data lies, and the upper quartile (Q3) is the value above which the top 25% of
sampled data lies (see the sketch after this list).
27. R. It is an open source programming language for performing data analysis.
28. Random sample. A sample in which every member of a given population has an
equal chance of being selected. A random sample is representative of the population
being studied.
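
As a small illustration of the quartiles and random sample entries above, here is a minimal
sketch using NumPy; the population parameters and sizes are made-up values:

    import numpy as np

    rng = np.random.default_rng(seed=42)

    # Hypothetical population of 1,000 measurements.
    population = rng.normal(loc=50, scale=10, size=1000)

    # Random sample: every member has an equal chance of selection.
    sample = rng.choice(population, size=100, replace=False)

    # Quartiles: 25% of the data lies below Q1, and 25% lies above Q3.
    q1, q3 = np.percentile(sample, [25, 75])
    print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}")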
