Question Bank Data Science & Its Applications


Module 5: (Natural Language Processing)

1. What are the applications of a word cloud generator? Which Python packages are prerequisites
for generating a word cloud? Create a word cloud in Python.
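
A minimal sketch follows. It assumes the third-party `wordcloud` package is used for rendering (`pip install wordcloud`, which depends on `numpy` and `Pillow`; `matplotlib` is commonly added for display). The counting step is the core of any word cloud generator and needs only the standard library. The stopword list is a small hypothetical one.

```python
import re
from collections import Counter

# A small, hypothetical stopword list; real projects usually use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on"}

def word_frequencies(text, stopwords=STOPWORDS, top_n=None):
    """Count lower-cased word tokens, skipping stopwords (most frequent first)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return dict(counts.most_common(top_n))

def render_word_cloud(freqs, path="wordcloud.png"):
    """Draw the cloud to an image file; requires the `wordcloud` package."""
    from wordcloud import WordCloud  # imported lazily so counting works without it
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freqs)  # word size grows with its frequency
    cloud.to_file(path)
    return path

if __name__ == "__main__":
    print(word_frequencies("Data science uses data to answer questions about data."))
```

Separating counting from rendering keeps the frequency logic testable. It also lets the same counts feed other displays, such as a bar chart.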

2. Discuss the advantages and limitations of bigram, trigram and n-gram models. How are they
implemented? Write a Python function.
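
One possible Python function is sketched below. It extracts n-grams and estimates the maximum-likelihood probability P(w_n | w_1 .. w_{n-1}) by dividing each n-gram count by the count of its history. The sample sentence is hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_model(tokens, n):
    """MLE estimate of P(last word | previous n-1 words) for every observed n-gram.

    Histories are counted as prefixes of the observed n-grams, so the
    probabilities for each history sum to 1.
    """
    grams = Counter(ngrams(tokens, n))
    histories = Counter()
    for gram, count in grams.items():
        histories[gram[:-1]] += count
    return {gram: count / histories[gram[:-1]] for gram, count in grams.items()}

tokens = "the cat sat on the mat".split()
bigram_probs = ngram_model(tokens, 2)
print(bigram_probs[("the", "cat")])  # P(cat | the) = 1/2
```

This illustrates one limitation directly: any n-gram not seen in training gets probability zero. The problem worsens as n grows because the data gets sparser. Smoothing, such as add-one, is the usual remedy.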

3. Compare Gibbs sampling with traditional sampling and bagging methods, and explain its
advantages and applications.
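
Gibbs sampling draws each variable in turn from its conditional distribution given the current values of the others. The sketch below samples a standard bivariate normal with correlation rho. It uses the closed-form conditionals x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2). The target distribution and settings are illustrative assumptions.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=20000, burn_in=1000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x = y = 0.0
    samples = np.empty((n_samples, 2))
    for i in range(n_samples + burn_in):
        x = rng.normal(rho * y, sd)  # draw x given the current y
        y = rng.normal(rho * x, sd)  # draw y given the new x
        if i >= burn_in:  # discard early draws taken before the chain settles
            samples[i - burn_in] = (x, y)
    return samples

samples = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(samples.T)[0, 1])  # should be close to 0.8
```

Gibbs sampling differs from both alternatives. Unlike direct sampling, it never needs the joint distribution in closed form. Unlike bagging, which draws independent bootstrap resamples of a training set to build an ensemble, it produces a Markov chain. Successive draws are therefore correlated, which is why a burn-in period is discarded.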

4. Discuss Latent Dirichlet Allocation (LDA), a technique commonly used to identify common
topics in a set of documents. Write a Python function that assigns a topic to each document and
finds the most common words per topic.
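
Below is a compact sketch of LDA fitted with collapsed Gibbs sampling in pure Python. The hyperparameters `alpha` and `beta`, the iteration count and the toy documents are assumptions. Assigning each document its single most frequent topic is a simplification, since LDA actually gives every document a mixture of topics.

```python
import random
from collections import Counter

def lda_gibbs(documents, K, iters=200, alpha=0.1, beta=0.1, seed=0):
    """Collapsed Gibbs sampling for LDA; documents are lists of tokens."""
    rng = random.Random(seed)
    W = len({w for doc in documents for w in doc})  # vocabulary size
    doc_topic = [Counter() for _ in documents]      # topic counts per document
    topic_word = [Counter() for _ in range(K)]      # word counts per topic
    topic_total = [0] * K
    z = [[rng.randrange(K) for _ in doc] for doc in documents]  # random start
    for d, doc in enumerate(documents):
        for w, k in zip(doc, z[d]):
            doc_topic[d][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(documents):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove this token's current assignment from the counts
                doc_topic[d][k] -= 1; topic_word[k][w] -= 1; topic_total[k] -= 1
                # P(topic t | all other assignments), up to a constant
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + W * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    return doc_topic, topic_word

def document_topics(doc_topic):
    """Assign each document its most frequent topic."""
    return [counts.most_common(1)[0][0] for counts in doc_topic]

def top_words(topic_word, n=3):
    """Most common words per topic."""
    return [[w for w, _ in counts.most_common(n)] for counts in topic_word]

docs = [["python", "pandas", "numpy", "python"], ["football", "goal", "match", "goal"],
        ["numpy", "python", "pandas"], ["match", "football", "goal"]]
doc_topic, topic_word = lda_gibbs(docs, K=2)
print(document_topics(doc_topic), top_words(topic_word))
```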

5. What is a centrality measure? Discuss betweenness centrality and PageRank centrality. Using
an undirected graph, define eigenvector centrality, in which a node's centrality depends on both
the number of its neighbors and the centralities of those neighbors.
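
The sketch below computes the three measures on a small hypothetical undirected graph stored as adjacency lists. Eigenvector centrality satisfies x_i = (1/λ) Σ_{j ∈ N(i)} x_j, so a node scores highly when it has many neighbors or well-connected ones. Betweenness uses Brandes' algorithm. PageRank uses damping d = 0.85, a common default assumed here, and requires every node to have at least one neighbor.

```python
from collections import deque
import numpy as np

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (adjacency lists)."""
    n = len(adj)
    score = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma = [0] * n; sigma[s] = 1          # number of shortest paths from s
        dist = [-1] * n; dist[s] = 0
        queue = deque([s])
        while queue:                            # BFS from s
            v = queue.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):               # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return [x / 2 for x in score]               # each unordered pair was counted twice

def eigenvector_centrality(adj, iters=200):
    """Power iteration on the adjacency matrix."""
    n = len(adj)
    A = np.zeros((n, n))
    for i, nbrs in enumerate(adj):
        A[i, nbrs] = 1
    M = A + np.eye(n)  # same eigenvectors as A; avoids oscillation on bipartite graphs
    x = np.ones(n)
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x

def pagerank(adj, d=0.85, iters=100):
    """Each node shares its rank equally among its neighbours."""
    n = len(adj)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = np.full(n, (1 - d) / n)
        for v, nbrs in enumerate(adj):
            for w in nbrs:
                new[w] += d * pr[v] / len(nbrs)
        pr = new
    return pr

# Hypothetical graph: node 0 is a hub linked to 1-4, plus an extra edge 3-4
graph = [[1, 2, 3, 4], [0], [0], [0, 4], [0, 3]]
print(betweenness(graph), eigenvector_centrality(graph).round(3), pagerank(graph).round(3))
```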

6. How does a recommendation system use natural language processing? Explain the concept
of matrix factorization in recommendation systems. Explain the concept of content-based
filtering and discuss its applications and limitations in the context of recommending items to
users. Provide examples of real-world applications and analyze the benefits they provide to
users.
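
The sketch below shows where natural language processing enters content-based filtering. Item descriptions become word-count vectors, a user profile is the sum of the liked items' vectors, and unseen items are ranked by cosine similarity to that profile. The catalogue is hypothetical, and plain term counts stand in for the TF-IDF weighting that is often used instead.

```python
import math
from collections import Counter

# Hypothetical item catalogue: each item is described by a short text.
ITEMS = {
    "Intro to Python": "python programming beginners",
    "Pandas Cookbook": "python pandas data analysis",
    "Deep Learning Basics": "neural networks deep learning python",
    "Football Tactics": "football strategy coaching",
}

def vectorize(text):
    """Bag-of-words term counts for one item description."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(liked, items=ITEMS, k=2):
    """Build a profile from liked items and rank the remaining items by similarity."""
    profile = Counter()
    for title in liked:
        profile.update(vectorize(items[title]))
    candidates = [t for t in items if t not in liked]
    ranked = sorted(candidates, key=lambda t: cosine(profile, vectorize(items[t])), reverse=True)
    return ranked[:k]

print(recommend(["Intro to Python"]))
```

The example also exposes a known limitation. The recommender can only surface items whose descriptions resemble what the user already liked (over-specialization), and a new user with no liked items has no profile at all.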

7. Analyze the concept of matrix factorization and its application in recommendation
systems. Discuss the advantages and limitations of matrix factorization compared to other
techniques and provide examples of real-world applications.
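
As a minimal sketch, matrix factorization approximates a user-item rating matrix R by P Qᵀ. P holds k latent factors per user and Q holds k factors per item. The factors are fitted by stochastic gradient descent on the observed ratings only. Here 0 marks a missing rating, and the matrix, k, learning rate and regularization strength are all illustrative assumptions.

```python
import numpy as np

def factorize(R, k=2, lr=0.01, reg=0.02, epochs=3000, seed=0):
    """Approximate R ~ P @ Q.T from observed entries (0 = missing) with L2-regularized SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, k))
    Q = rng.random((n_items, k))
    observed = list(zip(*np.nonzero(R)))
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]  # error on one known rating
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]], dtype=float)
P, Q = factorize(R)
pred = P @ Q.T
print(np.round(pred, 2))  # missing cells are now filled with predicted ratings
```

Compared with neighborhood methods, the learned factors compress sparse rating data into a few dimensions. The trade-offs are that the factors are hard to interpret and that new users or items without ratings still cannot be placed.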

8. Explain how collaborative filtering works, and discuss the challenges and potential
solutions for improving its accuracy.
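
A minimal sketch of user-based collaborative filtering follows. A missing rating is predicted as the similarity-weighted average of the ratings that the k most similar users gave the item. Similarity is cosine similarity on co-rated items. The rating matrix and k are hypothetical.

```python
import numpy as np

def predict_rating(R, user, item, k=2):
    """User-based collaborative filtering on a rating matrix (0 = unrated)."""
    sims = []
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue  # skip the user themself and users who have not rated the item
        both = (R[user] > 0) & (R[other] > 0)  # items rated by both users
        if not both.any():
            continue
        a, b = R[user, both], R[other, both]
        sims.append((a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), other))
    top = sorted(sims, reverse=True)[:k]  # k most similar neighbours
    total = sum(s for s, _ in top)
    if total == 0:
        return None  # cold start: no comparable user has rated this item
    return sum(s * R[o, item] for s, o in top) / total

R = np.array([[5, 4, 0, 1], [5, 5, 4, 1], [1, 1, 2, 5], [0, 1, 1, 4]], dtype=float)
print(predict_rating(R, user=0, item=2))
```

The `None` branch illustrates the cold-start and sparsity challenges. Common accuracy improvements include mean-centering ratings before computing similarity (Pearson correlation), falling back to matrix factorization, and hybrid models that add content features.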

Module 4:
1. a. Consider the following dataset. Write a program to demonstrate the working of the
decision tree based ID3 algorithm.
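
The question's dataset is not reproduced here, so this ID3 sketch runs on a small hypothetical table instead. At each node it picks the attribute with the highest information gain, Gain(S, A) = H(S) − Σ_v |S_v|/|S| · H(S_v), and recurses on each attribute value.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure node: return the class label
        return labels[0]
    if not attrs:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], remaining, target)
                   for v in sorted({r[best] for r in rows})}}

def classify(tree, row):
    """Walk the tree until a leaf label is reached (unseen values raise KeyError)."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

# Hypothetical training data
ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
tree = id3(ROWS, ["outlook", "windy"], "play")
print(tree)
```
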
b. Consider the dataset spiral.txt. The first two columns in the dataset correspond to the
coordinates of each data point. The third column corresponds to the actual cluster label.
Compute the Rand index for the following methods:
- K-means clustering
- Single-link hierarchical clustering
- Complete-link hierarchical clustering
Also visualize the dataset and determine which algorithm is able to recover the true clusters.
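
The Rand index is the fraction of point pairs on which two labelings agree, counting pairs placed together in both or apart in both. The sketch computes it directly from that definition. It then applies it to the three methods via scikit-learn. Two assumptions apply: spiral.txt is whitespace-separated, and the number of clusters equals the number of distinct true labels. Recent scikit-learn versions also provide `sklearn.metrics.rand_score`.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree (same vs different cluster)."""
    agree = total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred
        total += 1
    return agree / total

def evaluate_spiral(path="spiral.txt", plot=False):
    """Cluster the file three ways and return the Rand index of each method."""
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans
    data = np.loadtxt(path)                  # assumed columns: x, y, true label
    X, y = data[:, :2], data[:, 2].astype(int)
    k = len(set(y))                          # number of true clusters
    models = {
        "k-means": KMeans(n_clusters=k, n_init=10, random_state=0),
        "single link": AgglomerativeClustering(n_clusters=k, linkage="single"),
        "complete link": AgglomerativeClustering(n_clusters=k, linkage="complete"),
    }
    preds = {name: m.fit_predict(X) for name, m in models.items()}
    if plot:
        import matplotlib.pyplot as plt
        fig, axes = plt.subplots(1, 4, figsize=(16, 4))
        for ax, (name, labels) in zip(axes, [("true labels", y)] + list(preds.items())):
            ax.scatter(X[:, 0], X[:, 1], c=labels, s=5)
            ax.set_title(name)
        plt.show()
    return {name: rand_index(y.tolist(), p.tolist()) for name, p in preds.items()}
```

Note that cluster IDs need not match label values: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 because only pairings matter.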

2. Analyze the importance of training data in the performance of machine learning models.
Discuss the different types of training data, the considerations to take into account when
collecting and preprocessing the training data and provide examples of real-world
applications where training data plays a critical role.
3. Examine the concept of Mean Absolute Error (MAE) and its application in the evaluation
of regression models. Discuss the advantages and disadvantages of MAE compared to other
evaluation metrics and provide examples of how MAE can be used to improve model
performance.
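
The comparison below uses hypothetical numbers. MAE = (1/n) Σ |y_i − ŷ_i| weights every error linearly, while RMSE squares errors before averaging. One large miss therefore moves RMSE much more than MAE.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, for comparison."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [3.0, 5.0, 2.5, 7.0]
small_errors = [2.5, 5.0, 3.0, 8.0]  # errors spread across all points
one_outlier = [3.0, 5.0, 2.5, 11.0]  # a single large error
for name, pred in [("small errors", small_errors), ("one outlier", one_outlier)]:
    print(name, "MAE:", mae(y_true, pred), "RMSE:", round(rmse(y_true, pred), 3))
```

This robustness to outliers is MAE's main advantage. Its drawback is that it does not emphasize large errors when those matter. A related fact is that the constant prediction minimizing MAE is the median, while the constant minimizing squared error is the mean.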
4. What is an artificial neural network (ANN), and what are its applications? How do neural
networks work? Develop a perceptron training algorithm. Explain how a backpropagation
network learns to solve the XOR problem.
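
A minimal sketch of the perceptron training rule w ← w + η(t − o)x follows. Here t is the target, o is the thresholded output, and a learning rate of 1 is assumed. It learns AND, which is linearly separable. On XOR it must still misclassify at least one input, because no single line separates XOR's classes. This is why XOR needs a hidden layer trained by backpropagation.

```python
def predict(w, b, x):
    """Step activation: output 1 if the weighted sum plus bias exceeds 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(X, y, lr=1.0, epochs=100):
    """Perceptron learning rule: w <- w + lr * (t - o) * x, applied per example."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            err = t - predict(w, b, x)
            if err:
                mistakes += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if mistakes == 0:  # every example classified correctly: stop
            break
    return w, b

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]
w, b = train_perceptron(X, AND)
print("AND:", [predict(w, b, x) for x in X])
w_x, b_x = train_perceptron(X, XOR)
print("XOR:", [predict(w_x, b_x, x) for x in X])  # cannot match [0, 1, 1, 0]
```
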
5. Define a perceptron. Write a Python program to create a multi-layer perceptron for the XOR
function.
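
One way to build such a network is sketched below with hand-chosen weights; nothing is learned. It uses two layers of threshold perceptrons and the identity XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)). The specific weights and biases are one valid choice among many.

```python
def step(z):
    return 1 if z > 0 else 0

def perceptron(weights, bias, inputs):
    """A single threshold unit: step(w . x + b)."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_mlp(x1, x2):
    """Two-layer perceptron network computing XOR."""
    h_or = perceptron([1, 1], -0.5, [x1, x2])        # hidden unit 1: OR
    h_nand = perceptron([-1, -1], 1.5, [x1, x2])     # hidden unit 2: NAND
    return perceptron([1, 1], -1.5, [h_or, h_nand])  # output unit: AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```
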
6. What are tensors? How do you approach feature selection for a deep learning model? What
methods have you used in the past for hyperparameter tuning?
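
A tensor is a multi-dimensional array, and its rank is its number of axes. In this sketch NumPy arrays stand in for framework tensors, such as those in PyTorch or TensorFlow, which behave similarly. The image-batch shape is a hypothetical channels-last example.

```python
import numpy as np

scalar = np.array(3.0)                 # rank-0 tensor
vector = np.array([1.0, 2.0, 3.0])     # rank-1 tensor
matrix = np.ones((2, 3))               # rank-2 tensor
batch = np.zeros((32, 28, 28, 3))      # rank-4: (batch, height, width, channels)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("batch", batch)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)

# Broadcasting: the (3,) vector is added to each row of the (2, 3) matrix
print(matrix + vector)
```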

7. What are some common challenges found in deep learning models, and how would you
overcome them?

8. Describe your approach to handling overfitting and underfitting in a deep learning model
using regularization techniques.
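
The sketch below illustrates two common regularizers. For the L2 penalty (weight decay), a linear model with a closed-form solution keeps the effect easy to see; the same penalty is added to a deep network's loss. Inverted dropout is written as it would be applied to a layer's activations. The data and penalty values are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty lam * ||w||^2 (weight decay)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during training and
    scale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1 - p)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))          # few samples, many features: easy to overfit
y = X[:, 0] + 0.1 * rng.normal(size=20)
for lam in (0.0, 1.0, 10.0):
    print(lam, np.linalg.norm(ridge_fit(X, y, lam)))  # larger penalty -> smaller weights
```

The penalty strength must be tuned on validation data. Too little regularization leaves the model overfitting, and too much shrinks the weights until it underfits.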

9. Perform hierarchical clustering and explain how classification differs from
clustering.
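
A naive agglomerative sketch follows. Every point starts as its own cluster, and the two closest clusters are merged repeatedly, giving the merge history that a dendrogram draws. Single linkage measures cluster distance by the closest pair of points and complete linkage by the farthest. The 1-D points are hypothetical.

```python
import math

def agglomerative(points, linkage="single"):
    """Return the merge history as (sorted member indices, merge distance)."""
    agg = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = agg(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges

points = [(0,), (1,), (5,), (6,), (20,)]
for members, dist in agglomerative(points):
    print(members, dist)
```

No labels are used anywhere in this code, which is the key contrast with classification. Clustering discovers groups in unlabeled data, while a classifier learns a mapping from labeled examples to known classes.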
10. Derive the backpropagation rule, covering the training rule for output unit weights
and the training rule for hidden unit weights, with an example.
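
The sketch below applies the stochastic-gradient backpropagation rules for sigmoid units and squared error E = ½ Σ_k (t_k − o_k)². For an output unit k, δ_k = o_k(1 − o_k)(t_k − o_k). For a hidden unit h, δ_h = o_h(1 − o_h) Σ_k w_kh δ_k. Every weight is updated as w_ji ← w_ji + η δ_j x_ji. The network size, η = 0.5, the epoch count and the random initialization are illustrative assumptions, and XOR serves as the worked example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, n_hidden=8, eta=0.5, epochs=10000, seed=0):
    """Stochastic-gradient backpropagation for one hidden layer of sigmoid units."""
    rng = np.random.default_rng(seed)
    W_h = rng.normal(0, 1, (X.shape[1] + 1, n_hidden))  # +1 row for the bias input
    W_o = rng.normal(0, 1, (n_hidden + 1, T.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, T):
            x1 = np.append(x, 1.0)                      # inputs plus bias
            h = sigmoid(x1 @ W_h)                       # hidden outputs o_h
            h1 = np.append(h, 1.0)
            o = sigmoid(h1 @ W_o)                       # network outputs o_k
            delta_o = o * (1 - o) * (t - o)             # output-unit rule
            delta_h = h * (1 - h) * (W_o[:-1] @ delta_o)  # hidden-unit rule (old W_o)
            W_o += eta * np.outer(h1, delta_o)          # w_kh += eta * delta_k * o_h
            W_h += eta * np.outer(x1, delta_h)          # w_hi += eta * delta_h * x_i
    return W_h, W_o

def forward(W_h, W_o, X):
    H = sigmoid(np.c_[X, np.ones(len(X))] @ W_h)
    return sigmoid(np.c_[H, np.ones(len(H))] @ W_o)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets
W_h, W_o = train_backprop(X, T)
print(np.round(forward(W_h, W_o, X), 3))
```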

11. What is the XOR problem, and how can it be solved using an artificial neural network?

Module 3:
1. Train a regularized logistic regression classifier on the iris dataset using sklearn. Train the
model with the hyperparameter C = 1e4 and report the best classification accuracy.
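
The sketch assumes a stratified 70/30 train/test split with `random_state=42`; the question does not specify a split. In scikit-learn, `C` is the inverse of the regularization strength, so C = 1e4 applies only a very weak L2 penalty.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Weak regularization can slow convergence, so allow more solver iterations.
clf = LogisticRegression(C=1e4, max_iter=5000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy with C=1e4: {accuracy:.3f}")
```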

2. Train an SVM classifier on the iris dataset using sklearn. Try different kernels and the
associated hyperparameters. Train the model with the following set of hyperparameters: RBF
kernel, gamma = 0.5, one-vs-rest classifier, no feature normalization. Also try
C = 0.01, 1 and 10. For the above set of hyperparameters, find the best classification
accuracy along with the total number of support vectors on the test data.
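
The sketch below uses the same assumed 70/30 split. scikit-learn's `SVC` trains one-vs-one classifiers internally, so the model is wrapped in `OneVsRestClassifier` to get true one-vs-rest training. Support vectors are selected from the training data. "Total" is taken here to mean the sum across the three binary SVMs, which is an interpretation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # raw features, no normalization
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

results = {}
for C in (0.01, 1, 10):
    clf = OneVsRestClassifier(SVC(kernel="rbf", gamma=0.5, C=C))
    clf.fit(X_train, y_train)
    n_sv = sum(len(est.support_) for est in clf.estimators_)  # summed over binary SVMs
    results[C] = (clf.score(X_test, y_test), n_sv)
    print(f"C={C}: test accuracy={results[C][0]:.3f}, support vectors={n_sv}")

best_C = max(results, key=lambda c: results[c][0])
print("best C:", best_C, "accuracy:", results[best_C][0])
```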

3. Using code snippets, outline the concepts involved in (i) measuring accuracy using
cross-validation, (ii) the confusion matrix and (iii) precision and recall.
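
The snippets below cover all three concepts on an assumed binary task, "is this flower Iris virginica?", with a logistic regression model and 5-fold cross-validation. The choices of task, model and fold count are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = load_iris(return_X_y=True)
y_virginica = (y == 2)  # binary target
clf = LogisticRegression(max_iter=1000)

# (i) Each fold is scored by a model trained on the other four folds
scores = cross_val_score(clf, X, y_virginica, cv=5, scoring="accuracy")
print("fold accuracies:", scores, "mean:", scores.mean())

# (ii) Confusion matrix of out-of-fold predictions (rows = actual, columns = predicted)
y_pred = cross_val_predict(clf, X, y_virginica, cv=5)
cm = confusion_matrix(y_virginica, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)

# (iii) precision = TP / (TP + FP), recall = TP / (TP + FN)
print("precision:", precision_score(y_virginica, y_pred), "=", tp / (tp + fp))
print("recall:", recall_score(y_virginica, y_pred), "=", tp / (tp + fn))
```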

4. What is the gradient descent algorithm? Discuss its various types.
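
One function covers the three common variants, sketched below for linear regression under mean squared error. The batch size selects the variant: the whole dataset gives batch gradient descent, 1 gives stochastic gradient descent, and anything in between gives mini-batch. The data are synthetic and noiseless (intercept 1, slope 2), and the learning rate and epoch count are assumptions.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    """Minimize MSE for linear regression; batch_size selects batch/SGD/mini-batch."""
    rng = np.random.default_rng(seed)
    Xb = np.c_[np.ones(len(X)), X]  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(Xb))  # reshuffle every epoch
        for start in range(0, len(Xb), batch_size):
            idx = order[start:start + batch_size]
            err = Xb[idx] @ w - y[idx]
            grad = 2 * Xb[idx].T @ err / len(idx)  # gradient of the batch MSE
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = 1 + 2 * x
for name, bs in [("batch", 100), ("stochastic", 1), ("mini-batch", 10)]:
    print(name, gradient_descent(x, y, batch_size=bs))
```

The trade-off among the variants is cost versus noise. Batch steps are exact but expensive. Stochastic steps are cheap but noisy, and mini-batches sit in between.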


5. Define machine learning. Explain with a specific example.
6. Explain the Naïve Bayes classifier with a suitable example.
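
A minimal multinomial Naïve Bayes sketch with add-one (Laplace) smoothing follows, trained on hypothetical spam/ham messages. It picks the class c that maximizes log P(c) + Σ_w log P(w | c), treating words as conditionally independent given the class.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Collect class priors and per-class word counts."""
    priors = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab, alpha

def predict_nb(model, doc):
    priors, word_counts, vocab, alpha = model
    n_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c] / n_docs)
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + alpha) / denom)
        if score > best_score:
            best, best_score = c, score
    return best

docs = ["win money now", "cheap money offer", "meeting schedule today", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "money offer now"))
```
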
7. Describe the k-nearest neighbour (KNN) algorithm for a continuous-valued target function.
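
For a real-valued target, KNN returns the mean of the k nearest neighbours' values. The distance-weighted variant instead returns Σ w_i f(x_i) / Σ w_i with w_i = 1/d(x_q, x_i)². The sketch below implements both on hypothetical data. In the weighted case, a query that coincides with a training point returns that point's value.

```python
import math

def knn_regress(train_X, train_y, query, k=3, weighted=False):
    """Predict a real-valued target from the k nearest training points."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(query, p[0]))[:k]
    if not weighted:
        return sum(t for _, t in neighbours) / len(neighbours)
    num = den = 0.0
    for x, t in neighbours:
        d = math.dist(query, x)
        if d == 0:
            return t  # exact match: avoid division by zero
        w = 1.0 / d ** 2
        num += w * t
        den += w
    return num / den

train_X = [(1,), (2,), (3,), (10,)]
train_y = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(train_X, train_y, (2.1,), k=3))
print(knn_regress(train_X, train_y, (2.1,), k=3, weighted=True))
```
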
8. Explain linear regression with gradient descent and maximum likelihood
estimation (MLE).
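
The sketch below assumes the model y = Xw + ε with Gaussian noise ε. Under that assumption, maximizing the likelihood is equivalent to minimizing squared error, so the MLE of w is the least-squares solution. The MLE of the noise variance is the mean squared residual. Gradient descent on the MSE converges to the same weights. The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3 + 2 * x + rng.normal(0, 0.5, 200)   # synthetic data with Gaussian noise
X = np.c_[np.ones_like(x), x]             # design matrix with intercept column

# MLE: solve the normal equations X^T X w = X^T y
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_mle = np.mean((y - X @ w_mle) ** 2)  # MLE of noise variance (divides by n)

# Gradient descent on the mean squared error
w_gd = np.zeros(2)
for _ in range(2000):
    w_gd -= 0.1 * 2 * X.T @ (X @ w_gd - y) / len(y)

print("MLE:", w_mle, "GD:", w_gd, "sigma^2:", sigma2_mle)
```
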
9. What is a multiple regression model? Fit the model and interpret it using goodness-of-fit
measures.
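
A minimal sketch of the fit follows, on synthetic data with two hypothetical predictors. It reports the coefficients, R² = 1 − SS_res/SS_tot, and adjusted R², which penalizes extra predictors: 1 − (1 − R²)(n − 1)/(n − p − 1).

```python
import numpy as np

def fit_multiple_regression(X, y):
    """OLS fit of y = b0 + b1*x1 + ... + bp*xp; returns coefficients, R^2, adjusted R^2."""
    n, p = X.shape
    Xd = np.c_[np.ones(n), X]                      # add intercept column
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    residuals = y - Xd @ coef
    ss_res = residuals @ residuals                 # unexplained variation
    ss_tot = ((y - y.mean()) ** 2).sum()           # total variation
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 1.0, 100)
coef, r2, adj_r2 = fit_multiple_regression(X, y)
print("coefficients:", coef.round(2), "R^2:", round(r2, 3), "adjusted R^2:", round(adj_r2, 3))
```

Each coefficient estimates the change in y for a one-unit change in that predictor, holding the others fixed. R² is the share of the variance in y that the model explains.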
10. What is logistic regression? Explain the logistic function and its role when applying the model.
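
The logistic function σ(z) = 1 / (1 + e^(−z)) maps the linear score z = w·x + b to a probability in (0, 1). The sketch fits w by gradient descent on the average cross-entropy, which is the negative log-likelihood. It then classifies an input as 1 when σ(z) ≥ 0.5, i.e. when z ≥ 0. The 1-D data, learning rate and epoch count are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Gradient descent on the average cross-entropy loss."""
    Xb = np.c_[np.ones(len(X)), X]        # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)               # predicted P(y = 1 | x)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the cross-entropy
    return w

X = np.array([[-2.0], [-1.0], [1.0], [2.0], [0.5], [-0.5]])
y = np.array([0, 0, 1, 1, 1, 0])
w = fit_logistic(X, y)
probs = sigmoid(np.c_[np.ones(len(X)), X] @ w)
print(probs.round(3), (probs >= 0.5).astype(int))
```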
