Exam C1000 - 059 IBM AI Enterprise Workflow V1 Data Scientist Specialist

(Sample Questions)

1. To reduce the overall time to complete a data ingestion job, what two actions should be taken?

A. Assemble the data pipeline into a series of immutable transformations, which can be combined after the processing.
B. Partition the data within each pipeline to take advantage of parallel
processing (multiple server cores, processors, etc.).
C. Look for outliers in the data, missing values, and skewness of the
data.
D. Build a dedicated pipeline for each dataset to ensure that all of them
can be processed independently and concurrently.
E. Apply a chi-squared statistical test to rank the impact of each feature
on the concept label and discard the less impactful features before
model training.
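Options B and D both shorten wall-clock time by letting independent pieces of work run at the same time. The following is a minimal sketch of option B, assuming records are plain strings and using a hypothetical `transform_partition` cleaning step: the data is split into partitions, each partition is processed concurrently, and the results are concatenated in their original order. It uses threads from `concurrent.futures` so that it runs anywhere. For CPU-bound pure-Python work, `ProcessPoolExecutor` is the usual swap that actually spreads the work across cores.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts contiguous chunks whose sizes differ by at most 1."""
    size, rem = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        parts.append(records[start:end])
        start = end
    return parts


def transform_partition(chunk):
    """Hypothetical per-partition cleaning step: trim whitespace and lowercase."""
    return [r.strip().lower() for r in chunk]


def ingest(records, n_workers=4):
    """Process partitions concurrently, then combine the results in input order."""
    chunks = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the same order as `chunks`
        results = list(pool.map(transform_partition, chunks))
    return [row for chunk in results for row in chunk]


raw = ["  Alpha", "BETA ", " Gamma ", "delta", "EPSILON"]
print(ingest(raw, n_workers=2))
```

Because each partition is transformed independently, the same structure scales from threads to processes or to separate machines without changing the transformation itself.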

2. A design thinking project at a large corporation is in progress, and most of the project activities involve conducting interviews and creating and reviewing photo journals. Which phase of the design thinking process is currently being executed?

A. Empathize
B. Define
C. Ideate
D. Prototype

3. A client requests a general artificial intelligence (AI) tool that they can plug into their data warehouse. What is the best response to this request?

A. There is currently no general AI tool that works universally.
B. Apply neural networks to your data.
C. IBM Watson is the tool you are looking for.
D. AI can create value without any human intervention.

4. What is a key advantage of a machine learning system over a rule-based system for making business decisions?

A. Machine learning systems can be implemented by business users.
B. Machine learning systems generalize better than a rule-based system.
C. Machine learning systems are always more accurate than rule-based systems.
D. Rule-based systems can only deal with nominal and ordinal categorical data, whereas machine learning systems can deal with all types of data.

5. What is a class of machine learning problems where the algorithm builds a mathematical model from a small amount of labeled data combined with a large amount of unlabeled data?

A. semi-supervised learning
B. partially labeled learning
C. nearest-neighbor clustering
D. imperfect knowledge clustering
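Semi-supervised learning lets the unlabeled points shape the model instead of discarding them. One simple member of the family is self-training. The sketch below is a toy version that assumes 2-D points and a 1-nearest-neighbor base learner (both illustrative choices that the question does not specify): in each round, the unlabeled point closest to the labeled set receives its nearest neighbor's label as a pseudo-label and joins the labeled set.

```python
import math


def self_train_1nn(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbor base learner.

    labeled:   list of (point, label) pairs (the small labeled set)
    unlabeled: list of points (the large unlabeled set)
    Each round, the unlabeled point nearest to any labeled point takes
    that neighbor's label and is moved into the labeled set.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        best = None  # (distance, index in pool, label)
        for i, x in enumerate(pool):
            for p, y in labeled:
                d = math.dist(x, p)
                if best is None or d < best[0]:
                    best = (d, i, y)
        _, i, y = best
        labeled.append((pool.pop(i), y))
    return labeled


def predict_1nn(labeled, x):
    """Classify x with the label of its nearest (pseudo-)labeled point."""
    return min(labeled, key=lambda pair: math.dist(x, pair[0]))[1]


# Two labeled seeds and a chain of unlabeled points between them
seeds = [((0.0, 0.0), "A"), ((10.0, 0.0), "B")]
unlabeled = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (7.0, 0.0), (8.0, 0.0), (9.0, 0.0)]
model = self_train_1nn(seeds, unlabeled)
print(predict_1nn(model, (4.0, 0.0)), predict_1nn(model, (6.0, 0.0)))
```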

6. What should be the first step to begin the task of collecting initial data?

A. Copy data from several sources to a central repository to review the data
B. Determine if a poll is required to collect data
C. Verify the technical skills that are required to collect data
D. Understand the business requirement to find out what relevant data is needed

7. What are two common ways to handle missing values when cleaning data?

A. delete records
B. replace with '1'
C. replace with mean
D. replace with '100'
E. replace with standard deviation
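The following is a minimal sketch of options A and C on a list of dictionaries, where `None` marks a missing value and the `temperature_c` column is hypothetical. In pandas, the same two operations are usually written with `DataFrame.dropna()` and `DataFrame.fillna()`.

```python
from statistics import mean


def drop_missing(rows):
    """Option A: delete every record that contains a missing (None) value."""
    return [r for r in rows if None not in r.values()]


def impute_mean(rows, column):
    """Option C: replace missing values in `column` with the column mean.

    The mean is computed from the observed (non-missing) values only,
    and new dictionaries are returned so the input is left unchanged.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


rows = [
    {"id": 1, "temperature_c": 20.0},
    {"id": 2, "temperature_c": None},
    {"id": 3, "temperature_c": 24.0},
]
print(drop_missing(rows))
print(impute_mean(rows, "temperature_c"))
```

Deleting records is simplest but shrinks the dataset; mean imputation keeps every record but pulls filled values toward the center of the distribution.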

8. A client, a tomato grower, provides a dataset of measurements of tomato plants and environmental data. A data scientist suspects that the features contain a significant amount of redundancy and decides to apply dimensionality reduction to them.

Which three techniques are examples of dimensionality reduction?

A. k-means clustering
B. batch normalization
C. combinatorial optimization
D. autoencoder neural network
E. principal component analysis (PCA)
F. t-distributed stochastic neighbor embedding (t-SNE)
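PCA is the easiest of the three listed techniques to show in a few lines. The sketch below assumes small numeric feature vectors and a hypothetical pair of redundant tomato-plant features. It finds the first principal component as the top eigenvector of the covariance matrix using power iteration, then projects each two-feature row onto it, keeping one coordinate instead of two. Library implementations such as scikit-learn's `PCA` use an SVD instead. Power iteration is used here only to keep the example free of dependencies.

```python
import math
import random


def pca_first_component(X, n_iter=200, seed=0):
    """Return (mean, unit direction) of the first principal component of X.

    X is a list of equal-length feature vectors. The direction is the top
    eigenvector of the sample covariance matrix, found by power iteration.
    """
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    # Sample covariance matrix C = Xc^T Xc / (n - 1)
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive random start vector
    for _ in range(n_iter):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v


def project(X, mu, v):
    """Reduce each d-dimensional row to one coordinate along v."""
    return [sum((row[j] - mu[j]) * v[j] for j in range(len(v))) for row in X]


# Hypothetical, deliberately redundant features: [plant_height, leaf_width]
X = [[10.0, 5.1], [20.0, 9.9], [30.0, 15.2], [40.0, 19.8]]
mu, v = pca_first_component(X)
print([round(c, 3) for c in v])
print([round(z, 2) for z in project(X, mu, v)])
```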

9. Which is an accurate statement regarding logistic regression?

A. Logistic regression is a non-linear classifier.
B. Logistic regression can be used for unsupervised learning.
C. Logistic regression can be used for binary classification.
D. The logistic function f(x) = 1/(1 + exp(-(wx + b))) can take values between [0, inf].
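A short derivation makes the range of the logistic function explicit, using the same f(x), w, and b as option D. Because the exponential is always positive, the denominator is always greater than 1, so the output stays strictly between 0 and 1 and can be read as the probability of the positive class. Thresholding at 1/2 gives the binary decision, and the resulting boundary wx + b = 0 is linear in x.

```latex
\begin{aligned}
f(x) &= \frac{1}{1 + e^{-(wx + b)}}, \qquad e^{-(wx + b)} > 0 \;\Longrightarrow\; 1 + e^{-(wx + b)} > 1 \;\Longrightarrow\; 0 < f(x) < 1 \\
\lim_{wx + b \to +\infty} f(x) &= 1, \qquad \lim_{wx + b \to -\infty} f(x) = 0 \\
\hat{y} = 1 &\iff f(x) \ge \tfrac{1}{2} \iff wx + b \ge 0
\end{aligned}
```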

10. What are three hyperparameters that are used when building a
simple decision tree model?

A. kernel
B. learning rate
C. maximum depth
D. split criterion
E. number of nearest neighbors
F. minimum number of samples in a leaf node
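Maximum depth, split criterion, and minimum samples per leaf are exposed, for example, as `max_depth`, `criterion`, and `min_samples_leaf` on scikit-learn's `DecisionTreeClassifier`. Kernel, learning rate, and number of nearest neighbors belong to other model families (SVMs, gradient-based learners, and k-NN). The toy tree below is a dependency-free sketch, not scikit-learn's algorithm, and shows where each of the three settings acts while the tree grows.

```python
import math
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


CRITERIA = {"gini": gini, "entropy": entropy}


def build_tree(X, y, max_depth=3, criterion="gini", min_samples_leaf=1, depth=0):
    """Grow a binary classification tree on numeric features.

    max_depth:        stop splitting once this depth is reached
    criterion:        impurity function used to score candidate splits
    min_samples_leaf: skip any split that leaves fewer samples in a child
    A leaf is the majority label; an internal node is (feature, threshold, left, right).
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    impurity = CRITERIA[criterion]
    min_leaf = max(1, min_samples_leaf)  # never allow an empty child
    best = None  # (weighted impurity, feature, threshold, left idx, right idx)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * impurity([y[i] for i in left])
                     + len(right) * impurity([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no split satisfies min_samples_leaf
        return majority
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          max_depth, criterion, min_samples_leaf, depth + 1)

    return (f, t, grow(left), grow(right))


def predict(tree, row):
    """Walk from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
y = ["low", "low", "low", "high", "high", "high", "high"]
tree = build_tree(X, y, max_depth=2, criterion="entropy", min_samples_leaf=1)
print(tree)
print(build_tree(X, y, min_samples_leaf=4))  # no valid split: majority leaf
```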

11. What is used to update coefficients in logistic regression?

A. number of features
B. kernel
C. slope
D. gradient descent

12. Which two statements are true in the context of evaluating machine learning models?

A. Accuracy of 95% is always a good result.
B. Random guessing can be used as a baseline.
C. The F2-score puts equal weight on precision and recall.
D. F-score is the harmonic mean between precision and recall.
E. Evaluation metrics on training data are more important than on test data.
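The sketch below uses a hypothetical dataset with 5 positives and 95 negatives to illustrate these statements. An always-negative model reaches 95% accuracy with zero recall, which shows why option A fails. A seeded random guesser gives a baseline that any useful model should beat. `f_beta` shows that F1 is the harmonic mean of precision and recall, while F2 weights recall more heavily.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """F1 (beta=1) is the harmonic mean of precision and recall;
    beta > 1 (e.g. F2) weights recall more than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
print(accuracy(y_true, always_negative), precision_recall(y_true, always_negative))

rng = random.Random(0)
random_guess = [rng.randint(0, 1) for _ in y_true]
print(accuracy(y_true, random_guess))

print(f_beta(0.5, 1.0), f_beta(0.5, 1.0, beta=2), f_beta(1.0, 0.5, beta=2))
```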

13. What is the main benefit of adjusted R-squared compared to R-squared?

A. all samples are considered in the formula
B. the number of features is considered in the formula
C. the average R-squared is calculated
D. train and test split is respected
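Adjusted R-squared rescales the unexplained variance by (n - 1) / (n - p - 1), where n is the number of samples and p is the number of features. In the minimal sketch below, with an R-squared value and sample counts chosen arbitrarily, the same raw R-squared is penalized more when more features were used to reach it.

```python
def r_squared(y, y_hat):
    """Plain R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    The (n - 1) / (n - p - 1) factor grows with the number of features p,
    so adding features that do not improve the fit lowers the score.
    """
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Same R^2 = 0.80 on n = 50 samples, reached with 2 versus 20 features
print(round(adjusted_r_squared(0.80, 50, 2), 4))   # 0.7915
print(round(adjusted_r_squared(0.80, 50, 20), 4))  # 0.6621
```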

14. Which model evaluation metric is best suited for imbalanced data sets?

A. precision-recall curve
B. ROC curve
C. misclassification curve
D. lift curve
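When positives are rare, a ROC curve can look strong because the false-positive rate is divided by the large number of negatives. Precision, by contrast, directly exposes false positives among the predicted positives. The minimal sketch below assumes binary labels and model scores where a higher score means "more likely positive", and computes one precision-recall point per distinct score threshold on a hypothetical sample of 2 positives and 8 negatives.

```python
def precision_recall_curve(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score threshold.

    A sample is predicted positive when its score >= threshold, so every
    threshold selects at least one sample and precision is always defined.
    """
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / n_pos))
    return points


y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
for t, p, r in precision_recall_curve(y_true, scores):
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

At the lowest threshold, precision falls to the positive rate (0.2 here), which is the baseline a useful model must beat on an imbalanced set.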

15. Which IBM offering enables data scientists to deploy their trained machine learning models to production in a scalable environment?

A. Watson Machine Learning
B. Watson Studio
C. Watson Knowledge Catalog
D. Watson OpenScale

16. Which Python function would allow a data analyst to convert strings of dates (such as "10 June 1964") into struct_time objects to be used for further data cleansing?

A. import datetime.strptime()
B. import timobj.str2obj()
C. import calendar.object()
D. import time.toString()
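Two standard-library functions fit this task, and they return different types. `time.strptime()` returns a `time.struct_time`, while `datetime.datetime.strptime()` returns a `datetime` object that can be converted with `.timetuple()`. Option A's `import datetime.strptime()` is shorthand for the function rather than valid import syntax. The format string below assumes a day, a full English month name, and a four-digit year, parsed under the default C locale.

```python
import time
from datetime import datetime

s = "10 June 1964"
fmt = "%d %B %Y"  # day of month, full month name, 4-digit year

# time.strptime returns a time.struct_time directly
st = time.strptime(s, fmt)

# datetime.strptime (a classmethod of datetime.datetime) returns a datetime;
# .timetuple() converts it to a time.struct_time
dt = datetime.strptime(s, fmt)
st2 = dt.timetuple()

print(st.tm_year, st.tm_mon, st.tm_mday)
print(dt.date())
```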

17. How is the "aperture problem" in machine vision best defined?

A. Identifying a whole object or scene based on seeing only a small part of that object or scene
B. generating "snakes" of active contours based on boundary curves
C. pattern matching based on an undertrained model
D. over-fitting a model based on close-up images

18. What is an example of a relation type that can be detected with Watson Natural Language Understanding?

A. partOf
B. describedBy
C. assistant
D. during

Answer Key:

1. BD
2. A
3. A
4. B
5. A
6. D
7. AC
8. DEF
9. C
10. CDF
11. D
12. BD
13. B
14. A
15. A
16. A
17. A
18. A
