New Ebook Guide To AI & Data Science
CAREERS
YOUR BLUEPRINT IN ARTIFICIAL
INTELLIGENCE & DATA SCIENCE
MOHAMMAD ARSHAD
2023 EDITION
Table of Contents
Preface
1. Roadmap to get into Data Science
2. Basic Python
3. The Significance of Python in DS
4. Numpy Guide
5. Pandas Guide
6. Loc and Iloc in Pandas
7. Data Visualization with Seaborn
8. Matplotlib Guide
9. Regression Analysis
10. Logistic Regression Explained
11. Introduction to Decision Tree
12. KMeans Cheat Sheet
13. Hypothesis Testing
14. Data Science Roadmap
15. SQL Operators
Objective
Introduction to Decision Tree and KMeans
Cheat Sheet: Exploring classification and
clustering algorithms.
DAX in PowerBI: A guide to Data Analysis
Expressions in Power BI.
Audience
Concluding Remarks
6 Steps to Master
1. Excel
2. PowerBI
3. SQL for Data Science
4. Python for Data Science
5. Maths & Stats for DS
6. Capstone Project
Master Excel
Mastering Excel for Data Science involves learning how to
effectively use Excel as a tool to collect, clean, analyze, and
visualize data.
PowerBI
Power BI is an important tool for Data Science that offers a wide
range of benefits.
SQL For DS
SQL (Structured Query Language) is a programming language that
is widely used for Data Science. It is used to manipulate and
retrieve data stored in relational databases, making it an essential
tool for data analysis.
Python For DS
Python is a popular programming language used extensively in the
field of Data Science.
Maths/Stats For DS
Mathematics and statistics are fundamental to the field of Data
Science.
Capstone Project
Project work is an essential part of Data Science education.
The Significance
of Python in Data
Science &
Artificial
Intelligence
Introduction
In the realm of data science, where vast
amounts of information are harnessed
to extract valuable insights and drive
informed decision-making, the choice
of programming language is of
paramount importance. Among the
plethora of languages available, Python
has risen to prominence as the
preferred tool for data scientists. In this
chapter, we delve into the multifaceted
significance of Python in the field of
data science, elucidating why it has
become the lingua franca for data
professionals.
Versatility: A Multifaceted Toolbox
NAS.IO/ARTIFICIALINTELLIGENCE
Basic Data Types
1. Integer (int)
• Definition: Represents whole numbers.
Characteristics:
No decimal points, can be positive or
negative.
Size is only limited by available memory.
Typical Operations: Arithmetic operations,
bitwise operations. Methods: bit_length(),
to_bytes().
Use Cases: Counting, indexing, mathematical
calculations in discrete mathematics.
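A minimal sketch of the integer operations and methods listed above. The specific values are chosen only for illustration.

```python
# Python integers have arbitrary precision: this value does not fit in 64 bits.
big = 2 ** 100

# bit_length() reports how many bits are needed to represent the value.
bits = big.bit_length()

# to_bytes() packs an integer into a fixed number of bytes (big-endian here).
raw = (1024).to_bytes(2, "big")

# Bitwise AND keeps only the bits that are set in both operands.
flags = 0b1100 & 0b1010

print(bits, raw, flags)
```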
2. Array Creation
Creating NumPy arrays from lists and with initial placeholders:
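A short sketch of both approaches, assuming NumPy is installed and imported under its conventional alias `np`. The variable names are illustrative.

```python
import numpy as np

# Arrays built from Python lists
from_list = np.array([1, 2, 3])
from_nested = np.array([[1, 2], [3, 4]])   # nested list gives a 2-D array

# Arrays built with initial placeholder values
zeros = np.zeros((2, 3))                   # 2x3 array of 0.0
ones = np.ones(4)                          # length-4 array of 1.0
evens = np.arange(0, 10, 2)                # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)            # 5 evenly spaced points, endpoints included

print(from_list, from_nested, zeros, ones, evens, grid, sep="\n")
```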
3. Array Attributes
Getting an array's shape and data type:
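For example, on a small illustrative array:

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(arr.shape)   # size along each dimension
print(arr.dtype)   # element data type
print(arr.ndim)    # number of dimensions
print(arr.size)    # total number of elements
```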
5. Array Manipulation
Various ways to manipulate arrays such as reshaping, stacking, and
splitting:
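A minimal sketch of reshaping, stacking, and splitting. The array contents are arbitrary.

```python
import numpy as np

a = np.arange(6)                               # [0 1 2 3 4 5]

reshaped = a.reshape(2, 3)                     # same data viewed as 2 rows x 3 columns
stacked_v = np.vstack([reshaped, reshaped])    # stack row-wise
stacked_h = np.hstack([reshaped, reshaped])    # stack column-wise
left, right = np.split(a, 2)                   # split into 2 equal parts

print(reshaped, stacked_v.shape, stacked_h.shape, left, right, sep="\n")
```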
7. Statistical Operations
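A few common statistical operations, sketched on an illustrative 2x3 array. The `axis` argument selects whether the reduction runs down the columns (`axis=0`) or across the rows (`axis=1`).

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

overall_mean = data.mean()          # mean of all elements
overall_median = np.median(data)    # median of all elements
spread = data.std()                 # population standard deviation
col_sums = data.sum(axis=0)         # one sum per column
row_max = data.max(axis=1)          # one maximum per row

print(overall_mean, overall_median, spread, col_sums, row_max)
```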
PANDAS
Cheat Sheet
1. Basic Commands
Pandas is a software library for Python that provides tools for data
manipulation and analysis. It's important to ensure that the correct
version of pandas is installed for
compatibility with your code.
- Importing Pandas:
2. Dataframe Creation
DataFrames are two-dimensional labeled data structures with
columns potentially of different types.
You can think of a DataFrame as a spreadsheet or SQL table.
- From a list:
- From a Dictionary:
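Both constructions can be sketched as follows. The names and ages are made-up sample values.

```python
import pandas as pd

# From a list of rows: column names are supplied separately
rows = [["Asha", 28], ["Ben", 35]]
df_from_list = pd.DataFrame(rows, columns=["name", "age"])

# From a dictionary: keys become column names, values become columns
df_from_dict = pd.DataFrame({"name": ["Asha", "Ben"], "age": [28, 35]})

print(df_from_list)
print(df_from_dict)
```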
3. Data Selection
Pandas provides different methods for data selection.
- Selecting a column:
- Selecting multiple columns:
- Selecting rows:
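A sketch of the three selections above on a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "age": [28, 35, 41],
    "city": ["Pune", "Delhi", "Goa"],
})

ages = df["age"]                 # one column, returned as a Series
subset = df[["name", "city"]]    # several columns, returned as a DataFrame
first_two = df[0:2]              # rows by position slice
over_30 = df[df["age"] > 30]     # rows by boolean condition

print(ages, subset, first_two, over_30, sep="\n\n")
```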
4. Data Manipulation
Pandas provides various ways to manipulate a dataset.
- Adding a column:
- Deleting a column:
- Renaming columns:
- Applying a function to a column:
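The four operations listed in this section can be sketched as below. The column names and the threshold in the lambda are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"price": [2.5, 4.0], "qty": [3, 2]})

# Adding a column computed from existing ones
df["total"] = df["price"] * df["qty"]

# Deleting a column
df = df.drop(columns=["qty"])

# Renaming columns
df = df.rename(columns={"price": "unit_price"})

# Applying a function to every value in a column
df["size"] = df["total"].apply(lambda t: "high" if t > 7.8 else "low")

print(df)
```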
5. Data Cleaning
Data cleaning is detecting and correcting (or removing) corrupt or
inaccurate records from a dataset.
- Checking for null values:
- Filling null values:
- Replacing values:
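A minimal sketch of the three cleaning steps. Filling with the column mean is only one possible strategy and is an assumption here, as are the sample values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [90.0, np.nan, 70.0], "grade": ["A", "?", "C"]})

# Checking for null values: count of missing entries per column
null_counts = df.isnull().sum()

# Filling null values with the mean of the non-missing scores
df["score"] = df["score"].fillna(df["score"].mean())

# Replacing a placeholder value
df["grade"] = df["grade"].replace("?", "unknown")

print(null_counts)
print(df)
```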
- Aggregation:
- Merging:
- Joining:
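Aggregation, merging, and joining can be sketched on two small, made-up tables. `merge` matches on a column, while `join` matches on the index.

```python
import pandas as pd

sales = pd.DataFrame({"store": ["A", "A", "B"], "amount": [10, 20, 5]})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Pune", "Delhi"]})
managers = pd.DataFrame({"manager": ["Ria", "Sam"]}, index=["A", "B"])

# Aggregation: total amount per store
totals = sales.groupby("store")["amount"].sum()

# Merging: combine on a shared column, keeping every row of `sales`
merged = sales.merge(stores, on="store", how="left")

# Joining: combine on the index
joined = stores.set_index("store").join(managers)

print(totals, merged, joined, sep="\n\n")
```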
8. Working with Dates
Pandas provides powerful functionalities for working with dates.
- Convert to datetime:
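For example, converting a text column and then reading date parts through the `.dt` accessor (the dates are sample values):

```python
import pandas as pd

df = pd.DataFrame({"day": ["2023-01-15", "2023-02-01"]})

# Convert strings to datetime values
df["day"] = pd.to_datetime(df["day"])

# Once converted, date parts are available through .dt
df["month"] = df["day"].dt.month
df["weekday"] = df["day"].dt.day_name()

print(df)
```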
9. File I/O
Pandas can seamlessly read from and write to a variety of file
formats.
- Reading a CSV file:
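A self-contained sketch: an in-memory text buffer stands in for a file so the example runs without any data on disk. With a real file you would pass a path such as "data.csv" (a hypothetical name) instead.

```python
import io

import pandas as pd

csv_text = "name,age\nAsha,28\nBen,35\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# to_csv writes the frame back out; index=False omits the row labels
out = io.StringIO()
df.to_csv(out, index=False)

print(df)
print(out.getvalue())
```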
ILOC AND
LOC IN
PANDAS
Loc allows you to select data based
on the label or name of the row or
column, while iloc uses the integer
position of the row or column.
LOC
LOC IS A LABEL-BASED DATA SELECTION METHOD, WHICH MEANS THAT WE
HAVE TO PASS THE NAME OF THE ROW OR COLUMN WE WANT TO SELECT. IT
IS USED WITH SQUARE BRACKETS, FOR EXAMPLE DF.LOC[ROW, COLUMN].
ILOC
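A short sketch contrasting the two selectors on a made-up DataFrame. The index labels and column names are assumptions for illustration. Note that label slices with `loc` include the end label, while position slices with `iloc` exclude the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [28, 35, 41], "city": ["Pune", "Delhi", "Goa"]},
    index=["asha", "ben", "chen"],
)

by_label = df.loc["ben", "city"]       # row label, column label
by_position = df.iloc[1, 1]            # row position, column position

label_slice = df.loc["asha":"ben"]     # end label is included
position_slice = df.iloc[0:1]          # end position is excluded

print(by_label, by_position)
print(label_slice)
print(position_slice)
```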
Data
Visualization
with Seaborn
Introduction to Seaborn, its various
functionalities, and sample graphs using
the provided dataset.
2. Distribution Plots
Description:
Distribution plots are used to visualize the distribution of a
dataset. Common distribution plots in Seaborn include the
histogram, KDE (Kernel Density Estimation), and the rug plot.
Code Snippet:
3. Categorical Plots
Description:
Categorical plots are used to visualize categorical data.
Examples include bar plots, box plots, and violin plots.
Code Snippet:
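A sketch of the three plot types side by side, using synthetic sales figures in place of the book's dataset (column names are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data: 20 observations for each of 3 days
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "day": np.repeat(["Mon", "Tue", "Wed"], 20),
    "sales": rng.normal(100, 15, 60),
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
sns.barplot(data=df, x="day", y="sales", ax=axes[0])     # mean per category
sns.boxplot(data=df, x="day", y="sales", ax=axes[1])     # quartiles and outliers
sns.violinplot(data=df, x="day", y="sales", ax=axes[2])  # density per category

plt.close("all")
```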
4. Matrix Plots
Description:
Matrix plots are used to display data in a matrix format.
The heatmap is a common matrix plot used to represent data in a
color-encoded matrix format.
Code Snippet:
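A common use of the heatmap is a correlation matrix. The sketch below assumes synthetic data with three made-up columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])

# Pairwise correlations between the columns
corr = df.corr()

# annot=True writes each value into its cell; the color scale spans -1 to 1
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)

plt.close("all")
```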
5. Pair Plots
Description:
Pair plots are used to visualize relationships between multiple
variables in a dataset.
It plots pairwise relationships in a dataset.
Code Snippet:
6. Regression Plots
Description:
Regression plots are used to visualize the relationship between two
variables and fit a regression line.
Code Snippet:
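A sketch using `regplot` on synthetic data with a roughly linear relationship. The slope of 2 and the noise level are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
x = np.arange(20)
df = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(0, 3, 20)})

# Scatter plot plus a fitted regression line with a confidence band
ax = sns.regplot(data=df, x="x", y="y")

plt.close("all")
```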
7. Styling and Themes
Description:
Seaborn allows for the customization of plots using various styles
and themes.
This ensures that plots are both informative and aesthetically
pleasing.
Code Snippet:
MATPLOTLIB
Cheat Sheet
1. Basic Commands
Matplotlib is a plotting library for the Python programming language
and its numerical mathematics extension NumPy.
- Importing Matplotlib:
2. Basic Plotting
Matplotlib provides functionalities for various types of plots.
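A sketch of the usual import convention and four basic plot types. The data values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt  # conventional alias

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.figure()
(line,) = plt.plot(x, y)                  # line plot

plt.figure()
points = plt.scatter(x, y)                # scatter plot

plt.figure()
bars = plt.bar(x, y)                      # bar chart

plt.figure()
counts, edges, _ = plt.hist(y, bins=4)    # histogram of the y values

plt.close("all")
```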
3. Figure and Axes
A figure in Matplotlib is the whole window in the user
interface. An Axes is the region of the figure where data is
plotted; each Axes holds Axis objects, the number-line-like
objects that set the graph limits and generate the ticks.
- Creating Figure and Axes:
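For example, `plt.subplots()` creates a Figure and a single Axes in one call. The labels and limits below are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))   # one Figure containing one Axes

ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Figure and Axes")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_xlim(0, 2)                        # limits are set on the Axes

plt.close(fig)
```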
4. Customizing Plots
Matplotlib allows you to customize various aspects of your
plots.
- Changing Line Style and Color:
- Adding Grid:
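Both customizations can be sketched as follows. The particular style, color, and marker are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Changing line style and color through keyword arguments
(line,) = ax.plot([1, 2, 3], [2, 4, 3],
                  linestyle="--", color="red", linewidth=2, marker="o")

# Adding a grid behind the data
ax.grid(True)

plt.close(fig)
```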
5. Multiple Plots
Matplotlib provides functionalities to create multiple plots in a single
figure.
- Subplots:
- Sharing Axis:
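A sketch with two stacked subplots that share their x axis, so changing the x limits on one also changes them on the other:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# 2 rows x 1 column of subplots with a shared x axis
fig, (top, bottom) = plt.subplots(2, 1, sharex=True)

top.plot([0, 1, 2, 3], [0, 1, 4, 9])
bottom.plot([0, 1, 2, 3], [9, 4, 1, 0])

# Because x is shared, this also applies to `bottom`
top.set_xlim(0, 5)

plt.close(fig)
```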
- Adding Annotations:
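For example, `annotate` places a text label and can draw an arrow from the label to a data point. The coordinates below are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 2])

# xy is the point being annotated; xytext is where the label sits
note = ax.annotate("peak", xy=(2, 4), xytext=(0.5, 3.5),
                   arrowprops=dict(arrowstyle="->"))

plt.close(fig)
```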
7. Saving Figures
Matplotlib provides the savefig() function to save the
current figure to a file.
- Saving Figures as PNG, PDF, SVG, and more:
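A sketch that saves the same figure in three formats. The output format is inferred from the file extension. A temporary directory and the file name "figure" are used here only so the example cleans up easily.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

out_dir = tempfile.mkdtemp()
for ext in ("png", "pdf", "svg"):
    # dpi matters for raster formats; bbox_inches="tight" trims extra whitespace
    fig.savefig(os.path.join(out_dir, f"figure.{ext}"), dpi=150, bbox_inches="tight")

saved = sorted(os.listdir(out_dir))
print(saved)
plt.close(fig)
```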
REGRESSION
ANALYSIS
Agenda
Introduction to Regression Analysis
–What is Regression Analysis
–Why do we need Regression
Analysis in Business
– Introduction to Modeling
Introduction to OLS Regression
Introduction to Modeling Process
What is Regression Analysis?
Regression Analysis captures the relationship between one or more response
variables (dependent/predicted variables, denoted by Y) and their predictor
variables (independent/explanatory variables, denoted by X) using historical
observations of both.
Types of Regression Analysis
There are various kinds of regression, based on the nature of:
• the functional form of the relationship
• the residual
• the dependent variable
• the independent variables
Types of Linear Regression
What is Modeling?
Modeling is based on Regression Analysis. It can be used
for the following two distinct but related purposes:
1. Predict certain events
2. Identify the drivers of certain events based
on some explanatory variables
Modeling isolates the effect of each driver and quantifies
the magnitude of its impact on the dependent variable.
It is required because:
1. Knowledge of Y is crucial for decision
making but is not deterministic
2. X is available at the time of decision
making and is related to Y
Introduction to Ordinary Least Squares
[Table: types of regression by dependent variable type and residual distribution]
OLS Model Assumptions
1. Linearity - The model is linear in parameters:
   Yi = a + b1X1i + b2X2i + … + bpXpi + ei
2. Spherical Errors - The error distribution is Normal with mean 0 and
   constant variance:
   ei ~ Normal(0, σ²)
3. Zero Expected Error - The expected value (or mean) of the errors is
   always zero:
   E(ei) = 0 for all i
4. Homoskedasticity - The errors have constant variance:
   Variance(ei) = constant for all i
5. Non-Autocorrelation - The errors are statistically independent from one
   another. This implies the data is a random sample of the population:
   corr(ei, ej) = 0 for all i≠j
6. Non-Multicollinearity - The independent variables are not collinear:
   Covariance(Xi, Xj) = 0
Steps in OLS Regression
Assume all OLS assumptions hold
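A minimal sketch of fitting Yi = a + b1X1i + b2X2i + ei by ordinary least squares with NumPy. The data are simulated so that the assumptions hold by construction; the true coefficients (1, 2, -3), the noise level, and the sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulated predictors and normally distributed errors with constant variance
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)

# Response generated from a model that is linear in its parameters
y = 1.0 + 2.0 * x1 - 3.0 * x2 + e

# Design matrix: a column of ones for the intercept, then the predictors
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares solution minimizes the sum of squared residuals
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

print("estimated a, b1, b2:", coef)
print("mean residual:", residuals.mean())
```

With an intercept in the model, the fitted residuals average to zero up to floating-point error, which mirrors the zero expected error assumption.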
Logistic
Regression
Explained
Logistic Regression: Introduction
In Linear regression, the outcome variable is continuous and the
predictor variables can be a mix of numeric and categorical. But often
there are situations where we wish to evaluate the effects of multiple
explanatory variables on a binary outcome variable.
Process Flow
1. Data in Python
2. Data preparation/cleaning
3. Identification of variables
4. Factor analysis or correlation
5. Creation of modeling dataset
6. Logistic regression in Python
7. Validate output
8. KS statistic output
Python code
Step 1: Importing the dataset
Step 2: Splitting the dataset into the Training set and Test set
Step 5: Predicting a new result
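A hedged sketch of the named steps using scikit-learn. This is not the author's notebook: a synthetic dataset stands in for the imported one, and since Steps 3 and 4 are not shown here, feature scaling and model fitting are assumed for them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1 (stand-in): a synthetic binary-outcome dataset instead of an imported file
X, y = make_classification(n_samples=400, n_features=4, random_state=0)

# Step 2: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed intermediate steps: scale the features, then fit the model
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Step 5: predict a new result (here, the first test observation)
new_point = scaler.transform(X_test[:1])
pred = model.predict(new_point)          # predicted class, 0 or 1
proba = model.predict_proba(new_point)   # probability of each class

accuracy = model.score(scaler.transform(X_test), y_test)
print(pred, proba, accuracy)
```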
Practice
The code and dataset are available at:
https://github.com/arshad831/Modelling-Exercise/blob/main/logistic_regression.ipynb
INTRODUCTION
TO
DECISION TREE
Decision Trees
A decision tree is a decision support tool that uses a tree-like model
of decisions and their possible consequences, including chance
event outcomes, resource costs, and utility. It is one way to display
an algorithm that only contains conditional control statements.
Importance of decision tree:
Decision trees are a popular method for various classification and
regression tasks. For example, in medical diagnosis, decision trees
have been used to classify diseases based on symptoms. In credit
scoring, decision trees are used to predict the probability of default.
Decision Trees
Suppose we are creating a decision tree to predict whether or not someone
is likely to go to the beach.
Predictors: Sky, Weekend, Wind Speed
Outcome: Does Arshad go to the beach?
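A minimal sketch of this setup with scikit-learn. The observations below are hypothetical and invented only to make the example runnable; they are not the book's data. The column names and the depth limit are likewise assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical observations (1 = yes, 0 = no)
data = pd.DataFrame({
    "sky": ["sunny", "sunny", "cloudy", "rainy", "sunny", "cloudy", "rainy", "sunny"],
    "weekend": [1, 0, 1, 1, 1, 0, 0, 1],
    "wind_speed": [5, 8, 20, 10, 30, 6, 12, 7],
    "beach": [1, 0, 0, 0, 0, 0, 0, 1],
})

# One-hot encode the categorical predictor so the tree receives numeric inputs
X = pd.get_dummies(data[["sky", "weekend", "wind_speed"]], columns=["sky"], dtype=int)
y = data["beach"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned rules as nested if/else conditions
print(export_text(tree, feature_names=list(X.columns)))
```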
Data
Advantages of decision trees:
1. Decision trees are easy to interpret and explain.
2. They can handle both numeric and categorical data.
3. Their tendency to overfit can be controlled by pruning or limiting tree depth.
4. They can be used for feature selection.
5. They are non-parametric, meaning they make no
assumptions about the underlying data distribution.
K-Means
Clustering
Detailed
Steps
with Code
Introduction to K-Means Clustering
Definition and Description of K-Means Clustering: K-Means is a type
of partitioning clustering that separates the data into K
non-overlapping subsets (or clusters) without any cluster-internal
structure.
Overview of Unsupervised Learning: In unsupervised learning, the
goal is to identify useful patterns and structure from the input data.
K-Means is an unsupervised learning algorithm as it forms clusters
based on the input data without referring to known, or labelled,
outcomes.
Use Cases for K-Means Clustering: Applications in various fields like
market segmentation, image segmentation, anomaly detection, etc.
Python setup and data preparation
Required Python Libraries: Detail libraries such as pandas for data
manipulation, numpy for numerical operations, matplotlib and
seaborn for visualization, and scikit-learn for the K-Means algorithm.
Data Preparation: Discuss the importance of data cleaning,
normalization, and dealing with missing values. Include code
examples of these tasks using pandas and scikit-learn.
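A short sketch of these preparation tasks. The toy values, the choice of median imputation, and the use of `StandardScaler` for normalization are assumptions; other strategies may suit a real dataset better.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data with missing values and one duplicated row
raw = pd.DataFrame({
    "income": [30, 45, np.nan, 80, 30],
    "spend": [5, 7, 6, np.nan, 5],
})

# Cleaning: drop exact duplicate rows
clean = raw.drop_duplicates()

# Missing values: fill each column with its median
clean = clean.fillna(clean.median())

# Normalization: rescale each column to mean 0 and unit variance, so that
# no single feature dominates the distance calculations in K-Means
scaled = StandardScaler().fit_transform(clean)

print(clean)
print(scaled)
```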
Implementing K-Means Clustering in Python
Detailed Code Example: Provide a step-by-step walkthrough of a
Python implementation of the K-Means algorithm using scikit-learn.
Discuss each step in detail, including the importance of
setting the random seed for reproducibility.
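A minimal walkthrough along these lines, using synthetic blobs as stand-in data. The number of clusters, `n_init`, and the seed value are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Step 1: synthetic data with three well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Step 2: configure the model. random_state fixes the centroid initialization
# so repeated runs give the same clusters; n_init restarts the algorithm
# from several initializations and keeps the best result.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# Step 3: fit the model and assign each point to a cluster
labels = kmeans.fit_predict(X)

# Step 4: inspect the learned cluster centers
centroids = kmeans.cluster_centers_
print(centroids)
```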
Evaluating K-Means Clustering
Evaluation Metrics: Discuss how to evaluate the clustering result
using metrics like Within Cluster Sum of Squares (WCSS), between
cluster sum of squares (BCSS), and silhouette score.
The Elbow Method: Explain and provide a code snippet to
demonstrate the Elbow Method, a visual tool to estimate the optimal
number of clusters by plotting the explained variation as a function
of the number of clusters.
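A sketch of the Elbow Method on synthetic data. Here scikit-learn's `inertia_` attribute is used as the WCSS, and the silhouette score is computed alongside it for k of 2 or more. The range of k values is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

ks = list(range(1, 9))
wcss = []          # within-cluster sum of squares for each k
silhouette = {}    # silhouette score for each k >= 2

for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)
    if k > 1:  # the silhouette score needs at least two clusters
        silhouette[k] = silhouette_score(X, km.labels_)

# The "elbow" is the k after which WCSS stops dropping sharply
fig, ax = plt.subplots()
ax.plot(ks, wcss, marker="o")
ax.set_xlabel("Number of clusters (k)")
ax.set_ylabel("WCSS")
plt.close(fig)

print(wcss)
print(silhouette)
```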
Visualizing K-Means Clustering
Visualization Techniques: Discuss and provide Python code
examples for visualizing K-Means Clustering results. This could
include scatter plots of the data points colored by cluster and
indicating the centroids, as well as pair plots for multi-dimensional
data.
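A sketch of the scatter plot described above, on two-dimensional synthetic data. The marker style and colormap are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
centroids = kmeans.cluster_centers_

fig, ax = plt.subplots()

# Data points colored by their assigned cluster
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=15)

# Centroids drawn on top as large red crosses
ax.scatter(centroids[:, 0], centroids[:, 1],
           c="red", marker="X", s=200, label="centroids")
ax.legend()

plt.close(fig)
```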
Hypothesis
Testing
Why Significance Testing?
Hypothesis Testing
[Figure: medicine experiment; sample of 14 men and 36 women]
Step 1a) Design Hypothesis
Scenarios for hypothesis testing
A recent national study found that the average
American between the ages of 18 and 24 checks their phone 74
times per day. A mobile service provider questions these results.
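One way to frame this scenario is a two-sided one-sample t-test of H0: mean = 74 against H1: mean ≠ 74. The sketch below uses only the standard library. The sample values are hypothetical, and the 5% significance level is an assumption; 2.262 is the two-sided 5% critical value of the t distribution with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample: daily phone checks reported by 10 customers aged 18 to 24
sample = [70, 68, 75, 80, 66, 72, 74, 69, 71, 65]
mu0 = 74                      # value claimed by the national study

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)           # sample standard deviation (n - 1 denominator)

# t statistic: distance of the sample mean from mu0 in standard-error units
t_stat = (mean - mu0) / (sd / math.sqrt(n))

t_crit = 2.262                # two-sided, alpha = 0.05, n - 1 = 9 degrees of freedom
reject_h0 = abs(t_stat) > t_crit

print(f"mean={mean:.1f}, t={t_stat:.2f}, reject H0: {reject_h0}")
```

With this particular made-up sample the test does not reject H0, so the data would not contradict the study's figure at the 5% level.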
Data
Science
Road Map
SQL
OPERATORS
SQL OPERATORS
Operators are used to specify conditions in an SQL statement
and to serve as conjunctions for multiple conditions in a
statement.
OPERATORS
ARITHMETIC
LOGICAL
ARITHMETIC OPERATORS
LOGICAL OPERATORS
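A sketch of both operator families, run against an in-memory SQLite database through Python's built-in `sqlite3` module. The `items` table and its rows are made up for illustration, and operator support can vary slightly between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("pen", 2.0, 10), ("book", 12.5, 3), ("bag", 30.0, 1)],
)

# Arithmetic operators: + - * / %
totals = conn.execute(
    "SELECT name, price * qty, qty % 2 FROM items ORDER BY name"
).fetchall()

# Logical operators: AND, OR, NOT combine conditions in a WHERE clause
matches = conn.execute(
    "SELECT name FROM items "
    "WHERE (price < 5 AND qty > 5) OR NOT qty > 1 ORDER BY name"
).fetchall()

# BETWEEN tests whether a value lies in an inclusive range
mid_priced = conn.execute(
    "SELECT name FROM items WHERE price BETWEEN 10 AND 20"
).fetchall()

print(totals, matches, mid_priced, sep="\n")
conn.close()
```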
SQL
IMPORTANT
INTERVIEW
QUESTIONS
Answers are taken from an online portal, for educational purposes.
Not for commercial use.
SQL INTERVIEW QUESTIONS
3. What are the differences between DDL, DML and DCL in SQL?
Ans: Following are some details of the three.
DDL stands for Data Definition Language. SQL queries like CREATE, ALTER,
DROP and RENAME come under this.
DML stands for Data Manipulation Language. SQL queries like SELECT, INSERT
and UPDATE come under this.
DCL stands for Data Control Language. SQL queries like GRANT and REVOKE
come under this.
4. What is the difference between the HAVING and WHERE clauses?
Ans: HAVING is used to specify a condition for a group or an aggregate function
used in a SELECT statement. The WHERE clause selects rows before grouping. The
HAVING clause selects rows after grouping. Unlike the HAVING clause, the WHERE
clause cannot contain aggregate functions.
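The difference can be sketched with Python's built-in `sqlite3` module and a made-up `orders` table: WHERE removes individual rows first, then GROUP BY forms the groups, then HAVING filters the groups by an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50), ("ann", 70), ("bob", 20), ("bob", 15), ("cy", 200)],
)

rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 18            -- row filter, applied before grouping
    GROUP BY customer
    HAVING SUM(amount) > 100     -- group filter, applied after grouping
    ORDER BY customer
    """
).fetchall()

print(rows)
conn.close()
```

Here bob's order of 15 is dropped by WHERE, and bob's remaining group (total 20) is then dropped by HAVING.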
5.What is Join?
Ans: An SQL Join is used to combine data from two or more tables, based on a
common
field between them. For example, consider the following two tables.
Student Table
EnrollNo StudentName Address
1000 geek1 geeksquiz1
1001 geek2 geeksquiz2
1002 geek3 geeksquiz3
StudentCourse Table
CourseID EnrollNo
1 1000
2 1000
3 1000
1 1002
2 1003
CourseID StudentName
1 geek1
1 geek2
2 geek1
2 geek3
3 geek1
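A sketch that loads the Student and StudentCourse tables above into an in-memory SQLite database and joins them. The query that produced the result table is not shown in the text, so an inner join on EnrollNo is assumed here, and the rows it returns depend on that assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (EnrollNo INTEGER, StudentName TEXT, Address TEXT);
    CREATE TABLE StudentCourse (CourseID INTEGER, EnrollNo INTEGER);
    INSERT INTO Student VALUES
        (1000, 'geek1', 'geeksquiz1'),
        (1001, 'geek2', 'geeksquiz2'),
        (1002, 'geek3', 'geeksquiz3');
    INSERT INTO StudentCourse VALUES
        (1, 1000), (2, 1000), (3, 1000), (1, 1002), (2, 1003);
    """
)

# Inner join: only rows with a matching EnrollNo in both tables are returned
rows = conn.execute(
    """
    SELECT sc.CourseID, s.StudentName
    FROM StudentCourse AS sc
    INNER JOIN Student AS s ON sc.EnrollNo = s.EnrollNo
    ORDER BY sc.CourseID, s.StudentName
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Under an inner join, a student with no course rows and a course row whose EnrollNo has no matching student both drop out of the result.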
14. What is Identity?
Ans: Identity (or AutoNumber) is a column that automatically generates numeric
values. A start and increment value can be set, but most DBAs leave these at 1. A
GUID column also generates unique values, but these values cannot be controlled.
Identity/GUID columns do not need to be indexed.
16. What are the uses of view?
1. Views can represent a subset of the data contained in a table; consequently, a
view can limit the degree of exposure of the underlying tables to the outer world: a
given user may have permission to query the view, while denied access to the rest
of the base table.
2. Views can join and simplify multiple tables into a single virtual table
3. Views can act as aggregated tables, where the database engine aggregates data
(sum, average etc.) and presents the calculated results as part of the data
4. Views can hide the complexity of data; for example, a view could appear as
Sales2000 or Sales2001, transparently partitioning the actual underlying table.
5. Depending on the SQL engine used, views can provide extra security.
21.What are indexes?
Ans: A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and the use of more
storage space to maintain the extra copy of data.
Data can be stored in only one order on disk. To support faster access according to
different values, a faster search (such as binary search) on those values is desired. For
this purpose, indexes are created on tables. These indexes need extra space on disk, but
they allow faster search according to different frequently searched values.
24. What is SQL?
Ans: Structured Query Language (SQL) is a language designed specifically for
communicating with databases. SQL is an ANSI (American National Standards
Institute) standard.
25. What are the different types of SQL or different commands in SQL?
Ans:
1. DDL - Data Definition Language. DDL is used to define the structure that holds the data.
2. DML - Data Manipulation Language. DML is used for the manipulation of the data itself.
Typical operations are inserting, deleting, updating, and retrieving data from a table.
3. DCL - Data Control Language. DCL is used to control the visibility of data, such as
granting database access and setting privileges to create tables.
4. TCL - Transaction Control Language.
26. What are the Advantages of SQL?
1. SQL is not a proprietary language used by specific database vendors. Almost every
major DBMS supports SQL, so learning this one language will enable programmers to
interact with any database like ORACLE, SQL, MYSQL, etc.
2. SQL is easy to learn. The statements are all made up of descriptive English words, and
there aren't that many of them.
3. SQL is actually a very powerful language and by using its language elements you
can perform very complex and sophisticated database operations.
32. What is a Database Lock?
Database lock tells a transaction if the data item in question is currently being
used by other transactions.
33. What are the types of locks?
1. Shared Lock
When a shared lock is applied on a data item, other transactions can only read the
item, but can't write into it.
2. Exclusive Lock
When an exclusive lock is applied on a data item, other transactions can't read or
write into the data item.
36. What is a Composite Key?
A Composite primary key is a type of candidate key, which represents a set of columns
whose values uniquely identify every row in a table.
For example, if "Employee_ID" and "Employee Name" in a table are combined to
uniquely identify a row, the combination is called a Composite Key.
41. Define the SQL UPDATE statement.
SQL UPDATE is used to update data in a row or set of rows specified by the filter
condition. The basic format of an SQL UPDATE statement is the UPDATE keyword
followed by the table to be updated, then the SET keyword followed by column names
and their new values, then a filter condition (WHERE) that determines which rows
should be updated.
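The format described above can be sketched with Python's built-in `sqlite3` module. The `employees` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "ann", 5000), (2, "bob", 4000)],
)

# UPDATE <table> SET <column> = <new value> WHERE <filter condition>
cursor = conn.execute("UPDATE employees SET salary = 5500 WHERE id = 1")
updated_rows = cursor.rowcount   # number of rows matched by the WHERE clause

salaries = conn.execute("SELECT id, salary FROM employees ORDER BY id").fetchall()
print(updated_rows, salaries)
conn.close()
```

Without the WHERE clause, the same statement would update every row in the table.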
45. What is Self-Join?
A self-join is a query used to join a table to itself. Aliases should be used to
distinguish the two references to the same table.
46. What is Cross Join?
Cross Join will return all records where each row from the first table is combined with each
row from the second table.
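A minimal sketch with two made-up tables of two rows each. The cross join returns every combination, so 2 x 2 = 4 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M');
    INSERT INTO colors VALUES ('red'), ('blue');
    """
)

# No join condition: each row of sizes is paired with each row of colors
rows = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors ORDER BY size, color"
).fetchall()

print(rows)
conn.close()
```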
SQL Interview Questions and answers on Database Views
52. What is a trigger?
Database triggers are sets of commands that are executed when an event (Before Insert,
After Insert, On Update, On Delete of a row) occurs on a table or view.
53. Explain the difference between the DELETE, TRUNCATE and DROP commands.
Once a DELETE operation is performed, COMMIT and ROLLBACK can be used to confirm
the change or retrieve the data.
Once a TRUNCATE statement is executed, it generally cannot be undone with ROLLBACK,
although the exact behaviour depends on the database engine. A WHERE condition can be
used with a DELETE statement but not with a TRUNCATE statement.
The DROP command is used to drop a table, or keys such as primary and foreign keys
from a table.
54. What is the difference between a Clustered and a Non-clustered Index?
A clustered index reorders the way records in the table are physically stored. There can
be only one clustered index per table. It makes data retrieval faster.
A non-clustered index does not alter the way the records are stored but creates a
completely separate object within the table. As a result insert and update commands
will be faster.
Committed, Repeatable Read, Serializable. See SQL Server books online for an
explanation of the isolation levels. Be sure to read about SET TRANSACTION ISOLATION
LEVEL, which lets you customize the isolation level at the connection level.
DELETE TABLE is a logged operation, so the deletion of each row gets logged in the
transaction log, which makes it slow. TRUNCATE TABLE also deletes all the rows in a table,
but it won't log the deletion of each row, instead, it logs the de-allocation of the data
pages of the table, which makes it faster. Of course, the TRUNCATE TABLE can be rolled
back.
71. What are the new features introduced in SQL Server 2000 (or the latest release of
SQL Server at the time of your interview)?
What changed between the previous version of SQL Server and the current version? This
question is generally asked to see how current is your knowledge. Generally there is a
section in the beginning of the books online titled "What's New", which has all such
information. Of course, reading just that is not enough, you should have tried those things to
better answer the questions. Also check out the section titled "Backward Compatibility" in
books online which talks about the changes that have taken place in the new version.
75. What are the steps you will take to improve performance of a poor performing
query?
This is a very open ended question and there could be a lot of reasons behind the
poor performance of a query. But some general issues that you could talk about
would be: No indexes, table scans, missing or out of date statistics, blocking,
excess recompilations of stored procedures, procedures and triggers without SET
NOCOUNT ON, poorly written query with unnecessarily complicated joins, too
much normalization, excess usage of cursors and temporary tables.
Some of the tools/ways that help you troubleshooting performance problems are: SET
SHOWPLAN_ALL ON, SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler,
Windows NT /2000 Performance monitor, Graphical execution plan in Query Analyzer.
76. What are the steps you will take, if you are tasked with securing an SQL Server?
Again this is another open ended question. Here are some things you could talk about:
Preferring NT authentication, using server, database and application roles to control access
to the data, securing the physical database files using NTFS permissions, using
an unguessable SA password, restricting physical access to the SQL Server, renaming
the Administrator account on the SQL Server computer, disabling the Guest account, enabling
auditing, using multiprotocol encryption, setting up SSL, setting up firewalls, isolating SQL
Server from the web server etc.
77. What is a deadlock and what is a live lock? How will you go about resolving deadlocks?
Deadlock is a situation when two processes, each having a lock on one piece of data, attempt
to acquire a lock on the other's piece. Each process would wait indefinitely for the other to
release the lock, unless one of the user processes is terminated. SQL Server
detects deadlocks and terminates one user's process.
A live lock is one, where a request for an exclusive lock is repeatedly denied because a series
of overlapping shared locks keeps interfering. SQL Server detects the situation after four
denials and refuses further shared locks. A live lock also occurs when read transactions
monopolize a table or page, forcing a write transaction to wait indefinitely.
78. What is blocking and how would you troubleshoot it?
Blocking happens when one connection from an application holds a lock and a second
connection requires a conflicting lock type. This forces the second connection to wait,
blocked on the first.
80. How to restart SQL Server in single user mode? How to start SQL Server in minimal
configuration mode?
SQL Server can be started from command line, using the SQLSERVR.EXE. This EXE has
some very important parameters with which a DBA should be familiar. -m is used
for starting SQL Server in single user mode and -f is used to start the SQL Server in
minimal configuration mode. Check out SQL Server books online for more parameters
and their explanations.
81. As a part of your job, what are the DBCC commands that you commonly use for
database maintenance?
DBCC CHECKDB, DBCC CHECKTABLE, DBCC CHECKCATALOG, DBCC CHECKALLOC, DBCC
SHOWCONTIG, DBCC SHRINKDATABASE, DBCC SHRINKFILE etc. But there are a whole
load of DBCC commands which are very useful for DBAs. Check out SQL Server books
online for more information.
82. What are statistics, under what circumstances they go out of date, how do you
update them?
Statistics determine the selectivity of the indexes. If an indexed column has unique
values then the selectivity of that index is more, as opposed to an index with non-unique
values. The query optimizer uses these statistics in determining whether to choose an index or
not while executing a query.
Some situations under which you should update statistics:
1) If there is significant change in the key values in the index
2) If a large amount of data in an indexed column has been added, changed, or removed
(that is, if the distribution of key values has changed), or the table has been truncated
using the TRUNCATE TABLE statement and then repopulated
3) Database is upgraded from a previous version
108. Define constraints.
Constraints enforce integrity of the database. Constraints can be of the following types:
Not Null
Check
Unique
Primary key
Foreign key
110. Define Trigger.
Triggers are similar to stored procedures, except that they are executed automatically
when certain operations occur on the table.
113. What is Normalization?
Database normalization is a data design and organization process applied to data
structures based on rules that help build relational databases. In relational
database design, the process of organizing data to minimize redundancy is called
normalization. Normalization usually involves dividing a database into two or
more tables and defining relationships between the tables. The objective is to
isolate data so that additions, deletions, and modifications of a field can be made
in just one table and then propagated through the rest of the database via the
defined relationships.
7. ONF: Optimal Normal Form A model limited to only simple (elemental) facts, as
expressed in Object Role Model notation.
8. DKNF: Domain-Key Normal Form A model free from all modification anomalies is
said to be in DKNF.
Remember, these normalization guidelines are cumulative. For a database to be in 3NF,
it must first fulfill all the criteria of a 2NF and 1NF database.
119. What is View?
A simple view can be thought of as a subset of a table. It can be used for retrieving data, as
well as updating or deleting rows. Rows updated or deleted in the view are updated or
deleted in the table the view was created with. It should also be noted that as data in the
original table changes, so does data in the view, as views are the way to look at part of the
original table. The results of using a view are not permanently stored in the database. The
data accessed through a view is actually constructed using standard T-SQL select command
and can come from one to many different base tables or even other views.
123. What is Collation?
Collation refers to a set of rules that determine how data is sorted and compared.
Character data is sorted using rules that define the correct character sequence, with
options for specifying case sensitivity, accent marks, kana character types and character
width.
3. Outer Join: A join that includes rows even if they do not have related rows in the joined
table is an Outer Join. You can create three different outer joins to specify the
unmatched rows to be included:
1. Left Outer Join: In Left Outer Join all rows in the first-named table i.e. "left"
table, which appears leftmost in the JOIN clause are included. Unmatched rows
in the right table do not appear.
2. Right Outer Join: In Right Outer Join all rows in the second-named table i.e.
"right" table, which appears rightmost in the JOIN clause are included.
Unmatched rows in the left table are not included.
3. Full Outer Join: In Full Outer Join all rows in all joined tables are included,
whether they are matched or not.
4. Self Join This is a particular case when one table joins to itself, with one or two
aliases to avoid confusion. A self join can be of any type, as long as the joined
tables are the same. A self join is rather unique in that it involves a relationship
with only one table. The common example is when a company has a hierarchical
reporting structure whereby one member of staff reports to another. Self Join
can be Outer Join or Inner Join.
128. What is User Defined Functions? What kind of User-Defined Functions can be
created?
User-Defined Functions allow you to define your own T-SQL functions that can accept 0 or more
parameters and return a single scalar data value or a table data type. Different Kinds of
User-Defined Functions created are:
1. Scalar User-Defined Function A Scalar user-defined function returns one of the
scalar data types. Text, image and timestamp data types are not supported. These are
the type of user-defined functions that most developers are used to in other
programming languages. You pass in 0 to many parameters and you get a return value.
2. Inline Table-Value User-Defined Function An Inline Table-Value user-defined function
returns a table data type and is an exceptional alternative to a view as the user-defined
function can pass parameters into a T-SQL select command and in essence provide us
with a parameterized, non-updateable view of the underlying tables.
3. Multi-statement Table-Value User-Defined Function: A Multi-Statement Table-Value
user-defined function returns a table and is also an exceptional alternative to a view as
the function can support multiple T-SQL statements to build the final result where the
view is limited to a single SELECT statement. Also, the ability to pass parameters into a
T-SQL select command or a group of them gives us the capability to in essence create a
parameterized, non-updateable view of the data in the underlying tables. Within the
create function command you must define the table structure that is being returned. After
creating this type of user-defined function, It can be used in the FROM clause of a T-SQL
command unlike the behavior found when using a stored procedure which can also
return record sets.
130. Which TCP/IP port does SQL Server run on? How can it be changed?
SQL Server runs on port 1433. It can be changed from the Network Utility TCP/IP
properties.
131. What is the difference between a clustered and a non-clustered index?
1. A clustered index is a special type of index that reorders the way records in the
table are physically stored. Therefore table can have only one clustered index. The
leaf nodes of a clustered index contain the data pages.
2. A non clustered index is a special type of index in which the logical order of the
index does not match the physical stored order of the rows on disk. The leaf node of a
non clustered index does not consist of the data pages. Instead, the leaf nodes
contain index rows.
132. What are the different index configurations a table can have?
A table can have one of the following index configurations:
1. No indexes
2. A clustered index
3. A clustered index and many nonclustered indexes
4. A non clustered index
5. Many non clustered indexes
134. What are different types of Collation Sensitivity?
1. Case sensitivity - A and a, B and b, etc.
2. Accent sensitivity
3. Kana Sensitivity - When Japanese kana characters Hiragana and Katakana are
treated differently, it is called Kana sensitive.
4. Width sensitivity - When a single-byte character (half-width) and the same character
represented as a double-byte character (full-width) are treated differently, it is
called width sensitive.
136. What's the difference between a primary key and a unique key?
Both a primary key and a unique key enforce uniqueness of the column on which they are
defined. By default, however, a primary key creates a clustered index on the column,
whereas a unique key creates a non-clustered index. Another major difference is that a
primary key doesn't allow NULLs, but a unique key allows one NULL only.
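5. TRUNCATE is often said not to support rollback; in SQL Server it can be rolled back
inside an explicit transaction (see the note below).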
6. TRUNCATE is DDL Command.
7. TRUNCATE Resets identity of the table
2. DELETE:
1. DELETE removes rows one at a time and records an entry in the transaction
log for each deleted row.
2. If you want to retain the identity counter, use DELETE instead. If you want to
remove table definition and its data, use the DROP TABLE statement.
3. DELETE Can be used with or without a WHERE clause
4. DELETE Activates Triggers.
5. DELETE can be rolled back.
6. DELETE is DML Command.
7. DELETE does not reset identity of the table.
Note: In SQL Server, DELETE and TRUNCATE can both be rolled back when they are run inside
an explicit transaction that has not yet been committed. Once the transaction has been
committed, neither statement can be rolled back.
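The rollback behaviour in the note can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite has no TRUNCATE statement, so it shows DELETE alone, and the `emp` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "KING"), (2, "SCOTT"), (3, "FORD")],
)
conn.commit()  # make the three rows permanent

# sqlite3 implicitly opens a transaction before the DELETE runs.
conn.execute("DELETE FROM emp")
rows_inside_txn = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

conn.rollback()  # undo the uncommitted DELETE
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

print(rows_inside_txn, rows_after_rollback)
```

The table looks empty while the transaction is open, and the rows return after the rollback.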
139. What is the difference between a HAVING clause and a WHERE clause?
Both specify a search condition, but they apply at different stages. The WHERE clause is
applied to each row before the rows become part of a group, whereas the HAVING clause
specifies a search condition for a group or an aggregate and is applied after grouping.
HAVING can be used only with the SELECT statement and is typically used together with a
GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.
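To see the two clauses acting at different stages, here is a small sketch using Python's built-in sqlite3 module. The `emp` table, its columns and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("A", 10, 1000), ("B", 10, 3000), ("C", 20, 2000), ("D", 20, 500), ("E", 30, 4000)],
)

# WHERE filters individual rows before grouping;
# HAVING filters the groups produced by GROUP BY.
query = """
    SELECT deptno, SUM(sal) AS total
    FROM emp
    WHERE sal >= 1000          -- drops employee D before grouping
    GROUP BY deptno
    HAVING SUM(sal) > 2000     -- drops department 20 (its total is 2000)
    ORDER BY deptno
"""
result = conn.execute(query).fetchall()
print(result)
```

Department 20 disappears only at the HAVING stage, after employee D has already been removed by WHERE.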
2. Types of Sub-Query
1. Single-row sub-query, where the sub-query returns only one row.
2. Multiple-row sub-query, where the sub-query returns multiple rows.
3. Multiple-column sub-query, where the sub-query returns multiple columns.
142. What are the authentication modes in SQL Server? How can they be changed?
There are two modes: Windows mode and Mixed Mode (SQL Server and Windows). To change the
authentication mode, click Start, Programs, Microsoft SQL Server and click SQL Enterprise
Manager to run it from the Microsoft SQL Server program group. Select the server, then
from the Tools menu select SQL Server Configuration Properties and choose the Security
page. In later versions, the same setting is on the Security page of the Server
Properties dialog in SQL Server Management Studio.
143. Which command using Query Analyzer will give you the version of SQL Server
and the operating system?
SELECT SERVERPROPERTY('productversion'), SERVERPROPERTY('productlevel'),
SERVERPROPERTY('edition');
SELECT @@VERSION also returns the SQL Server version together with operating system
information.
145. Can a stored procedure call itself (a recursive stored procedure)? How many
levels of SP nesting are possible?
Yes. Because Transact-SQL supports recursion, you can write stored procedures
that call themselves. Recursion can be defined as a method of problem solving
wherein the solution is arrived at by repetitively applying it to subsets of the
problem. A common application of recursive logic is to perform numeric
computations that lend themselves to repetitive evaluation by the same processing
steps. Stored procedures are nested when one stored procedure calls another or
executes managed code by referencing a CLR routine, type, or aggregate. You can
nest stored procedures and managed code references up to 32 levels.
146. What is Log Shipping?
Log shipping is the process of automating the backup of database and transaction
log files on a production SQL Server and then restoring them onto a standby server.
In older versions such as SQL Server 2000, log shipping was limited to Enterprise
Edition; later versions support it in other editions as well. In log shipping, the
transaction log file from one server is automatically applied to the backup database
on the other server. If one server fails, the other server has the same database and
can be used as part of the disaster recovery plan. The key feature of log shipping is
that it automatically backs up transaction logs throughout the day and automatically
restores them on the standby server at a defined interval.
147. Name 3 ways to get an accurate count of the number of records in a table.
SELECT * FROM table1 (and read the number of rows returned)
SELECT COUNT(*) FROM table1
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table1') AND indid < 2
148. What does it mean to have QUOTED_IDENTIFIER ON? What are the
implications of having it OFF?
When SET QUOTED_IDENTIFIER is ON, identifiers can be delimited by double
quotation marks, and literals must be delimited by single quotation marks. When
SET QUOTED_IDENTIFIER is OFF, identifiers cannot be quoted and must follow all
Transact-SQL rules for identifiers.
149. What is the difference between a Local and a Global temporary table?
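1. A local temporary table exists only for the duration of a connection or, if defined
inside a compound statement, for the duration of the compound statement.
2. A global temporary table remains in the database permanently, but the rows exist only
within a given connection. When the connection is closed, the data in the global
temporary table disappears. However, the table definition remains with the database for
access when the database is opened next time. (This describes database systems that keep
global temporary table definitions permanently. In SQL Server, a global temporary table
is created with a ## prefix, is visible to all sessions, and is dropped once the session
that created it ends and no other session is still using it.)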
150. What is the STUFF function and how does it differ from the REPLACE function?
The STUFF function overwrites existing characters at a given position. In the syntax
STUFF(string_expression, start, length, replacement_characters), string_expression is
the string that will have characters substituted, start is the starting position, length
is the number of characters in the string that are substituted, and
replacement_characters are the new characters inserted into the string. The REPLACE
function replaces all occurrences of a given substring. In the syntax
REPLACE(string_expression, search_string, replacement_string), every occurrence of
search_string found in string_expression is replaced with replacement_string.
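The difference is easy to see when both behaviours are imitated in Python. The `stuff` helper below is a hypothetical stand-in written for this sketch, not a T-SQL or Python built-in, and it ignores edge cases such as the NULL that STUFF returns for an invalid start position. Python's own `str.replace` plays the role of REPLACE.

```python
def stuff(s: str, start: int, length: int, replacement: str) -> str:
    """Mimic T-SQL STUFF: delete `length` characters beginning at the
    1-based position `start`, then insert `replacement` at that position."""
    i = start - 1  # convert to Python's 0-based index
    return s[:i] + replacement + s[i + length:]


stuffed = stuff("abcdef", 2, 3, "XYZ12")  # overwrites "bcd" at one position only
replaced = "abcabc".replace("bc", "-")    # replaces every occurrence of "bc"
print(stuffed, replaced)
```

STUFF works by position and touches one span; REPLACE works by content and touches every match.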
155. What is NOT NULL Constraint?
A NOT NULL constraint enforces that the column will not accept null values. NOT NULL
constraints are used to enforce domain integrity, as are CHECK constraints.
159. What is a table called if it has neither a clustered nor a non-clustered index?
What is it used for?
An unindexed table, or heap. Microsoft Press books and Books Online (BOL) refer to it as
a heap. A heap is a table that does not have a clustered index and, therefore, the pages
are not linked by pointers. The IAM pages are the only structures that link the pages in
a table together. Unindexed tables are good for fast storing of data. It is often better
to drop all indexes from a table, perform the bulk inserts, and then restore the
indexes afterwards.
160. Can SQL Server be linked to other servers such as Oracle?
SQL Server can be linked to any server for which an OLE DB provider is available. For
example, Microsoft provides an OLE DB provider for Oracle that allows an Oracle server
to be added as a linked server to the SQL Server group.
162. What are the basic functions for master, msdb, model, tempdb and resource
databases?
1. The master database holds information for all databases located on the SQL Server
instance and is the glue that holds the engine together. Because SQL Server cannot start
without a functioning master database, you must administer this database with care.
2. The msdb database stores information regarding database backups, SQL Agent
information, DTS packages, SQL Server jobs, and some replication information such
as for log shipping.
3. The tempdb holds temporary objects such as global and local temporary tables and
stored procedures.
4. The model is essentially a template database used in the creation of any new user
database created in the instance.
5. The Resource database is a read-only database that contains all the system objects
that are included with SQL Server. SQL Server system objects, such as sys.objects,
are physically persisted in the Resource database, but they logically appear in the
sys schema of every database. The Resource database does not contain user data
or user metadata.
164. Where are SQL Server user names and passwords stored in SQL Server?
They are stored in the system catalog views sys.server_principals and sys.sql_logins.
170. What is the MERGE statement?
MERGE, introduced in SQL Server 2008, provides an efficient way to perform multiple DML
operations. In previous versions of SQL Server, we had to write separate statements to
INSERT, UPDATE, or DELETE data based on certain conditions. With MERGE, the logic of
such data modifications goes into one statement: when a row is matched it is updated,
and when it is unmatched it is inserted. One of the most important advantages of a
MERGE statement is that all the data is read and processed only once.
172. Which are new data types introduced in SQL SERVER 2008?
1. The GEOMETRY Type: The GEOMETRY data type is a system .NET common language
runtime (CLR) data type in SQL Server. This type represents data in a
two-dimensional Euclidean coordinate system.
2. The GEOGRAPHY Type: The GEOGRAPHY datatype’s functions are the same as with
GEOMETRY. The difference between the two is that when you specify GEOGRAPHY,
you are usually specifying points in terms of latitude and longitude.
3. New Date and Time Datatypes: SQL Server 2008 introduces four new datatypes
related to date and time: DATE, TIME, DATETIMEOFFSET, and DATETIME2.
1. DATE: The new DATE type stores just the date itself. It is based on the
Gregorian calendar and handles years from 1 to 9999.
2. TIME: The new TIME(n) type stores time with a range of 00:00:00.0000000
through 23:59:59.9999999, so it supports fractions of a second down to 100
nanoseconds. The n in TIME(n) defines this level of fractional-second precision,
from 0 to 7 digits.
3. DATETIMEOFFSET: DATETIMEOFFSET(n) is the time-zone-aware version of a datetime
datatype. The name appears less odd when you consider what it really is: a date + a
time + a time-zone offset. The offset is based on how far behind or ahead you are
from Coordinated Universal Time (UTC).
4. DATETIME2: This is an extension of the datetime type in earlier versions of SQL
Server. It covers dates from January 1 of year 1 through December 31 of year 9999,
a definite improvement over the 1753 lower boundary of the datetime datatype.
DATETIME2 not only has the larger date range but also has a time component with the
same fractional precision that the TIME type provides.
177. What is the difference between UNION and UNION ALL?
1. UNION: The UNION command combines the result sets of two queries into one. Unlike a
JOIN, which combines columns from two tables, UNION stacks rows, so both queries must
select the same number of columns with compatible data types. With UNION, only distinct
rows are returned.
2. UNION ALL: The UNION ALL command is equal to the UNION command, except that
UNION ALL returns all rows.
In other words, UNION ALL does not eliminate duplicate rows; it simply pulls all rows
that fit your query from all tables and combines them into one result.
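A quick way to compare the two is to run both against the same pair of tables. The sketch below uses Python's built-in sqlite3 module; the tables `t1` and `t2` and their city names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (city TEXT)")
conn.execute("CREATE TABLE t2 (city TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("Delhi",), ("Mumbai",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("Mumbai",), ("Pune",)])

# UNION removes the duplicate "Mumbai"; UNION ALL keeps both copies.
union_rows = conn.execute(
    "SELECT city FROM t1 UNION SELECT city FROM t2 ORDER BY city"
).fetchall()
union_all_rows = conn.execute(
    "SELECT city FROM t1 UNION ALL SELECT city FROM t2 ORDER BY city"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

Because UNION ALL skips the duplicate check, it is usually the cheaper choice when duplicates are impossible or acceptable.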
180. What does it mean when we say that a relation is in Boyce-Codd Normal Form
(BCNF)?
A relation is in BCNF when every determinant in the relation is a candidate key. This
means that any possible primary key can determine all other attributes in the relation.
Attributes may not be determined by non-candidate key attributes or part of a composite
candidate key. Thus it is said "I swear to construct my tables so that all nonkey columns
are dependent on the key, the whole key and nothing but the key, so help me Codd!"
181. You have been given a set of tables with data and asked to create a new database
to store them. When you examine the data values in the tables, what are you looking
for?
(1) Multivalued dependencies, (2) Functional dependencies, (3) Candidate keys, (4)
Primary keys and (5) Foreign keys.
184. What are stored procedures, and how do they differ from triggers?
A stored procedure is a program that is stored within the database and is compiled when
used. They can receive input parameters and they can return results. Unlike triggers,
their scope is database-wide; they can be used by any process that has permission to
use the database stored procedure.
187. Explain the differences between structured data and unstructured data.
Structured data are facts concerning objects and events. The most important structured
data are numeric, character, and dates. Structured data are stored in tabular form.
Unstructured data are multimedia data such as documents, photographs, maps, images,
sound, and video clips. Unstructured data are most commonly found on Web servers and
Web-enabled databases.
188. What are dimension tables and fact tables?
These are among the most commonly asked database interview questions. Fact tables are
the central tables of a data warehouse and hold the measurements being analyzed, while
dimension tables describe the attributes of the facts. Both kinds of table play an
important role in a data warehouse.
MOST FREQUENTLY ASKED SQL QUERIES
1. SQL Query to find the second highest salary of an Employee.
Answer: There are many ways to find the second highest salary of an Employee in SQL;
you can use either a SQL join or a subquery to solve this problem. Here is a SQL query
using a subquery:
select MAX(Salary) from Employee WHERE Salary NOT IN (select MAX(Salary) from
Employee);
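The subquery approach can be checked with Python's built-in sqlite3 module. This is a minimal sketch: the employee names and salaries are made up, and two employees deliberately share the top salary to show that the query still finds the next distinct value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpName TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Asha", 9000), ("Ravi", 7000), ("Meena", 9000), ("John", 5000)],
)

# The inner query finds the highest salary; the outer query takes the
# maximum of everything that is not equal to it.
second_highest = conn.execute(
    "SELECT MAX(Salary) FROM Employee "
    "WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)"
).fetchone()[0]
print(second_highest)
```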
2. SQL Query to find the max salary from each department.
Answer: SELECT DeptID, MAX(Salary) FROM Employee GROUP BY DeptID;
3. Write an SQL Query to display the current date.
Answer: SQL Server has a built-in function called GetDate() which returns the current
timestamp.
SELECT GetDate();
4. Write an SQL Query to check whether the date passed to the query is a valid date
in the given format.
Answer: SQL Server has the IsDate() function, which checks whether the passed value is
a valid date; it returns 1 (true) or 0 (false) accordingly.
6. Write an SQL Query to find the number of employees by gender whose DOB is
between 01/01/1960 and 31/12/1975.
Answer: SELECT COUNT(*), sex FROM Employees WHERE DOB BETWEEN '01/01/1960'
AND '31/12/1975' GROUP BY sex;
7.Write an SQL Query to find employee whose Salary is equal or greater than 10000.
Answer : SELECT EmpName FROM Employees WHERE Salary>=10000;
8. Write an SQL Query to find the employees whose name starts with 'M'.
Answer: SELECT * FROM Employees WHERE EmpName LIKE 'M%';
11. To fetch ALTERNATE records from a table (EVEN NUMBERED).
select * from emp where rowid in (select decode(mod(rownum,2),0,rowid, null) from emp);
12. To select ALTERNATE records from a table (ODD NUMBERED).
select * from emp where rowid in (select decode(mod(rownum,2),0,null ,rowid) from emp);
13. Find the 3rd MAX salary in the emp table.
select distinct sal from emp e1 where 3 = (select count(distinct sal) from emp e2
where e1.sal <= e2.sal);
14. Find the 3rd MIN salary in the emp table.
select distinct sal from emp e1 where 3 = (select count(distinct sal) from emp e2
where e1.sal >= e2.sal);
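The correlated-subquery trick in queries 13 and 14 works by counting, for each salary, how many distinct salaries sit at or above it (or at or below it). The sketch below runs the same pattern with Python's built-in sqlite3 module; the `nth_salary` helper and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("A", 5000), ("B", 3000), ("C", 3000), ("D", 2000), ("E", 1000)],
)


def nth_salary(n: int, highest: bool = True) -> list:
    # For each salary, count the distinct salaries at or above it (or at or
    # below it for the minimum); the nth value is the one whose count is n.
    op = "<=" if highest else ">="
    sql = (
        "SELECT DISTINCT sal FROM emp e1 WHERE ? = "
        f"(SELECT COUNT(DISTINCT sal) FROM emp e2 WHERE e1.sal {op} e2.sal)"
    )
    return [row[0] for row in conn.execute(sql, (n,))]


third_max = nth_salary(3)
third_min = nth_salary(3, highest=False)
print(third_max, third_min)
```

Using COUNT(DISTINCT ...) means the duplicate 3000 salaries count once, so ties do not shift the ranking.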
15. Select the FIRST n records from a table.
select * from emp where rownum <= &n;
16. Select the LAST n records from a table.
select * from emp minus select * from emp where rownum <= (select count(*) - &n
from emp);
17. List dept no. and dept name for all the departments in which there are no
employees.
select * from dept where deptno not in (select deptno from emp);
Alternate solution: select * from dept a where not exists (select * from emp b where
a.deptno = b.deptno);
Alternate solution: select empno,ename,b.deptno,dname from emp a, dept b where
a.deptno(+) = b.deptno and empno is null;
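The NOT IN and NOT EXISTS forms are equivalent only while emp.deptno contains no NULLs. The sketch below, using Python's built-in sqlite3 module and made-up rows, adds one employee with no department to show how the two can diverge; the Oracle-specific (+) outer-join form is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (40, "OPERATIONS")],
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "KING", 10), (2, "SCOTT", 20), (3, "TEMP", None)],  # TEMP has no department
)

# With a NULL in the subquery result, NOT IN can never evaluate to true.
not_in = conn.execute(
    "SELECT dname FROM dept WHERE deptno NOT IN (SELECT deptno FROM emp)"
).fetchall()
# NOT EXISTS compares row by row and is unaffected by the NULL.
not_exists = conn.execute(
    "SELECT dname FROM dept a WHERE NOT EXISTS "
    "(SELECT * FROM emp b WHERE a.deptno = b.deptno)"
).fetchall()
print(not_in, not_exists)
```

If the column can hold NULLs, NOT EXISTS (or a NOT IN subquery with `WHERE deptno IS NOT NULL`) is the safer form.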
18. How to get the 3 max salaries?
select distinct sal from emp a where 3 >= (select count(distinct sal) from emp b
where a.sal <= b.sal) order by a.sal desc;
22. How to delete duplicate rows in a table?
delete from emp a where rowid != (select max(rowid) from emp b where
a.empno=b.empno);
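SQLite also exposes a `rowid`, so the same idea can be tried with Python's built-in sqlite3 module. This sketch uses a variation of the query above (NOT IN with GROUP BY rather than a correlated subquery) and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "KING"), (2, "SCOTT"), (1, "KING"), (2, "SCOTT"), (3, "FORD")],
)

# Keep only the row with the highest rowid for each empno.
conn.execute(
    "DELETE FROM emp WHERE rowid NOT IN "
    "(SELECT MAX(rowid) FROM emp GROUP BY empno)"
)
remaining = conn.execute("SELECT empno, ename FROM emp ORDER BY empno").fetchall()
print(remaining)
```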
24. Suppose annual salary information is provided by the emp table. How to fetch
the monthly salary of each and every employee?
select ename,sal/12 as monthlysal from emp;
25. Select all records from the emp table where deptno = 10 or 40.
select * from emp where deptno=10 or deptno=40;
26.Select all record from emp table where deptno=30 and sal>1500.
select * from emp where deptno=30 and sal>1500;
27.Select all record from emp where job not in SALESMAN or CLERK.
select * from emp where job not in ('SALESMAN','CLERK');
28. Select all records where ename is one of JONES, BLAKE, SCOTT, KING or FORD.
select * from emp where ename in('JONES','BLAKE','SCOTT','KING','FORD');
29. Select all records where ename starts with 'S' and its length is 6 characters.
select * from emp where ename like 'S_____';
30. Select all records where ename may be any number of characters but should end
with 'R'.
select * from emp where ename like '%R';
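31. Count MGR and their salary in emp table.
select count(mgr), count(sal) from emp;
32. Add comm and sal as the total salary of each employee in the emp table.
select ename,(sal+nvl(comm,0)) as totalsal from emp;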
33. Select any salary <3000 from emp table.
select * from emp where sal> any(select sal from emp where sal<3000);
35. Select all the employee group by deptno and sal in descending order.
select ename,deptno,sal from emp order by deptno,sal desc;
36. How can I create an empty table emp1 with same structure as emp?
Create table emp1 as select * from emp where 1=2;
38.Select all records where dept no of both emp and dept table matches.
select * from emp where exists(select * from dept where
emp.deptno=dept.deptno)
39. If there are two tables emp1 and emp2, and both have common records, how can I
fetch all the records but the common records only once?
(Select * from emp1) Union (Select * from emp2)
40. How to fetch only common records from two tables emp and emp1?
(Select * from emp) Intersect (Select * from emp1)
41. How can I retrieve all records of emp1 that are not present in emp2?
(Select * from emp1) Minus (Select * from emp2)
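The three set operators can be compared side by side with Python's built-in sqlite3 module. This is a sketch under assumptions: the tables hold a single made-up `ename` column, and because SQLite follows the SQL standard it spells Oracle's MINUS as EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT)")
conn.execute("CREATE TABLE emp1 (ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?)", [("KING",), ("SCOTT",), ("FORD",)])
conn.executemany("INSERT INTO emp1 VALUES (?)", [("FORD",), ("ADAMS",)])


def names(sql: str) -> list:
    # Run a query and return the first column as a sorted list.
    return sorted(row[0] for row in conn.execute(sql))


all_once = names("SELECT ename FROM emp UNION SELECT ename FROM emp1")
common = names("SELECT ename FROM emp INTERSECT SELECT ename FROM emp1")
# EXCEPT is the standard-SQL spelling of Oracle's MINUS.
only_emp = names("SELECT ename FROM emp EXCEPT SELECT ename FROM emp1")
print(all_once, common, only_emp)
```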
42. Count the total salary deptno wise where more than 2 employees exist.
SELECT deptno, sum(sal) As totalsal
FROM emp
GROUP BY deptno
HAVING COUNT(empno) > 2
43. Display the names of employees who have been working in the company for the
past 5 years.
select ename from emp where sysdate-hiredate>5*365;
44. Display the list of employees who joined the company before 30th June 90
or after 31st Dec 90.
select * from emp where hiredate < '30-jun-1990' or hiredate > '31-dec-1990';
47. Display employee names for employees whose name ends with the letter 'S'.
select ename from emp where ename like '%S';
1. What is the difference between “Stored Procedure” and “Function”?
1. A procedure can have both input and output parameters, but a function can only
have input parameters.
2. Inside a procedure we can use DML (INSERT/UPDATE/DELETE) statements. But
inside a function we can't use DML statements.
3. We can't utilize a Stored Procedure in a Select statement. But we can use a function
in a Select statement.
4. We can use a Try-Catch Block in a Stored Procedure but inside a function we can't
use a Try-Catch block.
5. We can use transaction management in a procedure but we can't in a function.
6. We can't join a Stored Procedure but we can join functions.
7. Stored Procedures cannot be used in the SQL statements anywhere in the
WHERE/HAVING/SELECT section. But we can use a function anywhere.
8. A procedure can return 0 or n values (max 1024). But a function can return only 1
value that is mandatory.
9. A procedure can't be called from a function but we can call a function from a
procedure.
2. A Clustered Index requires no storage separate from the table storage. It forces the
rows to be stored sorted on the index key, whereas a non-clustered index requires
storage separate from the table storage to hold the index information.
3. A table with a Clustered Index is called a Clustered Table. Its rows are stored
sorted in a B-tree structure, whereas a table without any clustered index is called a
non-clustered table. Its rows are stored unsorted in a heap structure.
4. The default index is created as part of the primary key column as a Clustered Index.
5. In a Clustered Index, the leaf node contains the actual data whereas in a
nonclustered index, the leaf node contains the pointer to the data rows of the table.
6. A Clustered Index always has an Index Id of 1 whereas non-clustered indexes have
Index Ids > 1.
7. A table can have only 1 Clustered Index. Prior to SQL Server 2008, up to 249
non-clustered indexes could be created on a table; with SQL Server 2008 and above,
up to 999 non-clustered indexes can be created.
8. A Primary Key constraint creates a Clustered Index by default, whereas a Unique Key
constraint creates a non-clustered index by default.
7. What are super, primary, candidate and foreign keys?
Ans: A super key is a set of attributes of a relation schema upon which all attributes of
the schema are functionally dependent. No two rows can have the same value of super
key attributes.
A candidate key is a minimal super key, i.e., no proper subset of the candidate key
attributes can be a super key.
A primary key is one of the candidate keys. One of the candidate keys is selected as
most important and becomes the primary key. There cannot be more than one primary
key in a table.
A foreign key is a field (or collection of fields) in one table that uniquely
identifies a row of another table.
Introduction to DAX in PowerBI
Overview: Data Analysis Expressions (DAX) is a formula language primarily used in Power
Pivot, Analysis Services, and Power BI. It enables users to harness the power of
dynamic calculations and deep data insights.
DAX Fundamentals
- Syntax Distinctions: While bearing similarities to Excel, DAX offers
a more powerful and dynamic formula structure.
- Data Types: Beyond basic types like Decimal and Integer, DAX
introduces complex types that support intricate data operations.
1. **SUM**
- The `SUM` function calculates the sum of a column's values.
- Example:
- Given a table `Sales` with a column `Revenue`.
- DAX Formula: `Total Revenue = SUM(Sales[Revenue])` (the measure name is illustrative)
2. **AVERAGE**
- The `AVERAGE` function returns the average (arithmetic mean) of a column's values.
- Example:
- Given the same `Sales` table with a column `Revenue`.
- DAX Formula: `Average Revenue = AVERAGE(Sales[Revenue])` (the measure name is illustrative)
3. **COUNT**
- The `COUNT` function counts the number of rows in a table where the values in the specified
column are not blank.
- Example:
- Using a table `Orders` with a column `OrderID`.
- DAX Formula: `Total Orders = COUNT(Orders[OrderID])`
- This formula will count the total number of orders.
5. **COUNTA**
- The `COUNTA` function counts the number of rows in a table where the values in the specified
column are not blank, and it works on non-numeric data as well.
- Example:
- Using a table `Customers` with a column `Name`.
- DAX Formula: `Total Customers = COUNTA(Customers[Name])`
- This formula will count the total number of customers with a valid name.
2. **NOW**
- The `NOW` function returns the current date and time.
- Example:
- DAX Formula: `Current DateTime = NOW()`
- This formula will show the current date and time at the moment of execution.
3. **TODAY**
- The `TODAY` function returns the current date.
- Example:
- DAX Formula: `Current Date = TODAY()`
- This formula will return the current date without the time component.
4. **MONTH**
- The `MONTH` function returns the month as a number (1 for January, 2 for February, etc.) from a
date.
- Example:
- Given a date column `Sales[Date]`.
- DAX Formula: `Sale Month = MONTH(Sales[Date])`
- This formula will extract the month number from the sale date.
5. **YEAR**
- The `YEAR` function extracts the year from a date.
- Example:
- Given a date column `Sales[Date]`.
- DAX Formula: `Sale Year = YEAR(Sales[Date])`
- This formula will provide the year of the sale date.
6. **DATEDIFF**
- The `DATEDIFF` function returns the difference between two dates, based on a specified
interval (day, month, year, etc.).
- Example:
- Given two date columns `Orders[StartDate]` and `Orders[EndDate]`.
- DAX Formula: `Duration in Days = DATEDIFF(Orders[StartDate], Orders[EndDate], DAY)`
- This formula will calculate the number of days between the start and end dates.
Advanced DAX Concepts
- Dynamic Context: Functions like CALCULATE
and CALCULATETABLE give users the power to modify existing
contexts and create new ones.
- Time Intelligence: Explore historical data trends with
functions like DATESYTD and TOTALYTD.
1. **Dynamic Context**
- Dynamic context in DAX enables the modification of existing filter contexts or the creation of new
ones.
- **CALCULATE**
- This function evaluates an expression in a modified filter context.
- Example:
- Given a `Sales` table and a need to calculate total sales only for a specific
product category, say "Electronics".
- DAX Formula: `Electronics Sales = CALCULATE(SUM(Sales[Revenue]), Sales[Category] = "Electronics")`
- This formula calculates the total revenue for only the "Electronics" category.
- **CALCULATETABLE**
- This function evaluates a table expression in a modified filter context.
- Example:
- If you want to produce a table that only includes sales data for the year 2023.
- DAX Formula: `Sales 2023 = CALCULATETABLE(Sales, YEAR(Sales[Date]) = 2023)`
- This formula provides a table containing sales data exclusively from the year 2023.
2. **Time Intelligence**
- Time intelligence functions in DAX are critical for analyzing data over time, especially for
understanding historical trends and making future projections.
- **DATESYTD**
- Returns dates from the beginning of the year to a specified date.
- Example:
- If you're analyzing sales data and need to consider dates from the start of the year to the current
date.
- DAX Formula: `Year to Date = DATESYTD(Sales[Date])`
- This formula gives a list of dates from the start of the year to the present date.
- **TOTALYTD**
- Calculates the cumulative total for a measure from the beginning of the year to a specified date.
- Example:
- To compute the cumulative sales from the start of the year to the current date.
- DAX Formula: `Cumulative Sales = TOTALYTD(SUM(Sales[Revenue]), Sales[Date])`
- This formula calculates the total revenue accumulated from the start of the year to
the present date.
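The year-to-date logic behind `TOTALYTD` can be pictured in plain Python. This is only an analogy, not how the DAX engine evaluates the measure: the `total_ytd` helper and the sample rows are hypothetical stand-ins for `Sales[Date]` and `Sales[Revenue]`.

```python
from datetime import date

# Hypothetical (date, revenue) rows standing in for Sales[Date] and Sales[Revenue].
sales = [
    (date(2022, 12, 30), 50),
    (date(2023, 1, 5), 100),
    (date(2023, 2, 10), 200),
    (date(2023, 3, 15), 300),
]


def total_ytd(rows, as_of: date) -> int:
    """Sum revenue from 1 January of as_of's year up to and including as_of."""
    start = date(as_of.year, 1, 1)
    return sum(revenue for day, revenue in rows if start <= day <= as_of)


ytd_feb = total_ytd(sales, date(2023, 2, 28))
ytd_mar = total_ytd(sales, date(2023, 3, 31))
print(ytd_feb, ytd_mar)
```

The December 2022 row never contributes, because the running total restarts at the beginning of each year.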
DAX in Action
- Calculated Columns: Enhance your data models by defining new
columns using DAX.
Calculated columns allow you to add new data to your tables in Power BI. The formula for the
column is calculated for each row of data.
1. **Profit Column**
- Let's say you have a sales table with columns "Revenue" and "Cost." You can create a new
calculated column called "Profit" using DAX, for example `Profit = Sales[Revenue] - Sales[Cost]`
(assuming the table is named `Sales`).
- Measures: These custom aggregations, defined using DAX, provide
more profound insights into datasets.
Measures perform calculations on data based on user interactions in reports. They aggregate data as
users interact with visuals.
- KPIs: Monitor business health and trajectory using DAX-
defined Key Performance Indicators.
Performance and Optimization
- Efficiency Tips: Writing optimal DAX formulas ensures faster report
rendering.
- Monitoring Techniques: Using query plans and server timings can
help pinpoint and resolve performance bottlenecks.
Roadmap to Mastering Generative AI
To develop proficiency in Generative AI, the following 5 skills are
essential for a comprehensive understanding and practical
application of the field.
Natural Language Processing (NLP):
Learn the basics of NLP, including tokenization, text preprocessing,
and language modeling.
Computer Vision:
Develop a solid understanding of computer vision concepts, including
image classification, object detection, and image segmentation.
Generative Adversarial Networks (GANs):
Study the fundamentals of GANs, including their architecture and
training process.
NLP Cheat Sheet
1. Tokenization
Tokenization is the process of breaking up text into words, phrases,
symbols, or other meaningful elements, which are called tokens.
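As a minimal, library-free sketch of the idea, the snippet below splits text with a regular expression. It is not the NLTK or spaCy tokenizer, and the `tokenize` helper is a name made up for this example; real tokenizers handle contractions, URLs and many other cases that this pattern does not.

```python
import re


def tokenize(text: str) -> list:
    # \w+ matches runs of letters, digits and underscores;
    # [^\w\s] matches any single punctuation mark or symbol.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("Hello, world! NLP is fun.")
print(tokens)
```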
- NLTK Word Tokenization:
- Spacy Lemmatization:
4. Named Entity Recognition (NER)
NER is the process of locating named entities in text and
classifying them into predefined categories.
- NLTK NER:
- Spacy NER:
5. Stopword Removal
Stopwords are the most common words in a language that are to be
filtered out before processing the text data.
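The filtering step itself is simple enough to sketch without any library. The stopword set and the `remove_stopwords` helper below are assumptions made for illustration; the lists shipped with NLTK or spaCy are far longer.

```python
# A tiny hand-picked stopword set for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}


def remove_stopwords(tokens: list) -> list:
    # Compare in lower case so that "The" and "the" are both removed.
    return [t for t in tokens if t.lower() not in STOPWORDS]


filtered = remove_stopwords(["The", "cat", "is", "in", "the", "garden"])
print(filtered)
```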
- NLTK Stopword Removal:
- Spacy Stopword Removal:
6. Sentiment Analysis
Sentiment Analysis is the process of determining the sentiment or emotion of a piece
of text.
7. Topic Modeling
Topic Modeling is the process of identifying topics in a set of documents.
Parameters of OpenAI GPT Models
One
Imagine you’re playing a game where you have to come up with
words that start with the letter ‘A’. You can think of many words, like
‘apple’, ‘ant’, ‘alligator’, and so on. But, you want to make the game
more interesting, so you add some rules.
Temperature:
This rule decides how creative or unusual the words you come up
with can be. If the temperature is low, you’ll mostly come up with
common words like ‘apple’ or ‘ant’. But if the temperature is high,
you might come up with more unusual words like ‘abacus’ or
‘aardvark’.
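One common way to picture temperature is as a divisor applied to the model's raw word scores before they are turned into probabilities. The sketch below assumes that formulation; the `softmax_with_temperature` helper and the scores for the four words are invented for the example.

```python
import math


def softmax_with_temperature(scores: dict, temperature: float) -> dict:
    """Turn raw scores into probabilities; dividing by the temperature
    sharpens (T < 1) or flattens (T > 1) the distribution."""
    scaled = {w: s / temperature for w, s in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(v - top) for w, v in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Made-up scores: higher means the model finds the word more likely.
scores = {"apple": 3.0, "ant": 2.0, "abacus": 0.5, "aardvark": 0.0}
cold = softmax_with_temperature(scores, 0.5)
hot = softmax_with_temperature(scores, 2.0)
print(round(cold["apple"], 2), round(hot["apple"], 2))
```

At the low temperature almost all the probability sits on 'apple'; at the high temperature the unusual words get a noticeably larger share.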
Two
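Top_P
This rule decides how large a pool of the most likely words you can choose from. Top_p
is a number between 0 and 1: the words are ranked from most to least likely, and you
may only pick from the smallest group at the top whose probabilities add up to top_p.
If top_p is 0.1, you can only choose from the few words that make up the top 10% of
the probability. If top_p is 0.9, many more words are allowed. The more choice you
have, the greater the chance of randomness.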
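In the OpenAI API, top_p is a probability threshold between 0 and 1 rather than a count of words: sampling is limited to the most likely words whose probabilities together reach top_p. The sketch below shows that filtering step in plain Python; the `top_p_filter` helper and the word probabilities are assumptions for illustration.

```python
def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely words until their combined probability
    reaches top_p, then renormalise the survivors."""
    kept, cumulative = {}, 0.0
    for word, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[word] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


probs = {"apple": 0.5, "ant": 0.3, "abacus": 0.15, "aardvark": 0.05}
narrow = top_p_filter(probs, 0.5)  # only "apple" survives
wide = top_p_filter(probs, 0.9)    # "apple", "ant" and "abacus" survive
print(list(narrow), list(wide))
```

A small top_p leaves only the safest choices, while a value close to 1 lets nearly every word stay in play.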
Three
Frequency Penalty:
This rule decides how often you can use the same word. If the
frequency penalty is high, then you can’t use the same word too
many times. So if you’ve already used the word ‘apple’ a few times,
it’ll become less likely for you to use it again.
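A frequency penalty can be pictured as subtracting a little from a word's score for every time it has already been used. The sketch below assumes that simple formulation; the `apply_frequency_penalty` helper, the scores and the penalty value are made up for the example.

```python
from collections import Counter


def apply_frequency_penalty(scores: dict, generated: list, penalty: float) -> dict:
    """Lower each word's score by penalty * (times the word was already used)."""
    counts = Counter(generated)
    return {w: s - penalty * counts[w] for w, s in scores.items()}


scores = {"apple": 3.0, "ant": 2.0, "alligator": 1.5}
history = ["apple", "apple", "ant"]  # words produced so far
adjusted = apply_frequency_penalty(scores, history, penalty=0.8)
best = max(adjusted, key=adjusted.get)
print(best)
```

After two uses, 'apple' has lost enough score that the so far unused 'alligator' becomes the top candidate.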
Four
Presence Penalty (topic)
AIGuild Premium Community
✅ webinars
FOR ACTION TAKERS: FIRST 5 USERS USE YEARLY20 TO GET 20% OFF
FOR 6.4$ / month ( if Paid yearly 77$ upfront)
Delivery Hero SE This document is strictly confidential and may not be copied, used, made available or be disclosed to thirdparties without prior written permission.
Scan to Access
2023