
Newly Released Databricks Databricks-Machine-Learning-Associate Exam PDF

Page: 2 / 5
Question 8

A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.

Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

Options:

A.

A holdout set is not necessary when using a train-validation split

B.

Reproducibility is achievable when using a train-validation split

C.

Fewer hyperparameter values need to be tested when using a train-validation split

D.

Bias is avoidable when using a train-validation split

E.

Fewer models need to be trained when using a train-validation split
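The cost difference behind option E can be made concrete by counting model fits. The plain-Python sketch below builds one train/validation pair for a train-validation split and k pairs for k-fold cross-validation. During tuning, every hyperparameter combination is fit once per pair. In Spark ML these two strategies correspond to `TrainValidationSplit` and `CrossValidator` in `pyspark.ml.tuning`, but this sketch does not use those classes. The helper names, the 80/20 ratio, the seed, and the grid of 12 combinations are hypothetical choices for illustration only.

```python
import random
from typing import List, Tuple

# (training row indices, validation row indices)
Split = Tuple[List[int], List[int]]


def train_validation_split(n_rows: int, train_ratio: float = 0.8, seed: int = 42) -> List[Split]:
    """Hypothetical helper: shuffle once and cut into a single train/validation pair."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = int(n_rows * train_ratio)
    return [(idx[:cut], idx[cut:])]


def k_fold_splits(n_rows: int, k: int = 5, seed: int = 42) -> List[Split]:
    """Hypothetical helper: each of the k folds serves as the validation set exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # every row lands in exactly one fold
    splits = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        splits.append((train, folds[i]))
    return splits


def models_trained(splits: List[Split], n_param_combos: int) -> int:
    """Each hyperparameter combination is fit once per (train, validation) pair."""
    return len(splits) * n_param_combos


n_combos = 12  # hypothetical hyperparameter grid size
print(models_trained(train_validation_split(1000), n_combos))  # 12
print(models_trained(k_fold_splits(1000, k=5), n_combos))      # 60
```

With the same grid, 5-fold cross-validation trains five times as many models. In exchange, each combination's validation score is averaged over five different held-out folds rather than a single one.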

Question 9

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

Options:

A.

pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

B.

pandas API on Spark DataFrames are more performant than Spark DataFrames

C.

pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

D.

pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

Question 10

A health organization is developing a classification model to determine whether or not a patient currently has a specific type of infection. The organization's leaders want to maximize the number of positive cases identified by the model.

Which of the following classification metrics should be used to evaluate the model?

Options:

A.

RMSE

B.

Precision

C.

Area under the residual operating curve

D.

Accuracy

E.

Recall
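Recall is TP / (TP + FN), the fraction of actual positive cases that the model identifies. Precision is TP / (TP + FP), the fraction of predicted positives that are truly positive. The sketch below computes both metrics from scratch on a small hypothetical set of labels. It shows that a model which flags more patients catches every infection (recall of 1.0) even though some of its flags are false alarms. The label lists and function names are illustrative assumptions, not data from the question.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def precision(y_true: List[int], y_pred: List[int]) -> float:
    tp, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true: List[int], y_pred: List[int]) -> float:
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels: 1 = patient has the infection (4 of 10 patients).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]    # flags only one patient
aggressive = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # flags six patients

print(f"cautious:   precision={precision(y_true, cautious):.2f} recall={recall(y_true, cautious):.2f}")
# cautious:   precision=1.00 recall=0.25
print(f"aggressive: precision={precision(y_true, aggressive):.2f} recall={recall(y_true, aggressive):.2f}")
# aggressive: precision=0.67 recall=1.00
```

The cautious model has perfect precision but misses three of the four infections. When the goal is to identify as many positive cases as possible, recall is the metric that rewards that behavior.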

Question 11

A data scientist has written a feature engineering notebook that utilizes the pandas library. As the size of the data processed by the notebook increases, the notebook's runtime increases drastically.

Which of the following tools can the data scientist use to spend the least amount of time refactoring their notebook to scale with big data?

Options:

A.

PySpark DataFrame API

B.

pandas API on Spark

C.

Spark SQL

D.

Feature Store

Exam Name: Databricks Certified Machine Learning Associate Exam
Last Update: Oct 17, 2024
Questions: 74