Databricks-Machine-Learning-Professional Exam Dumps - Databricks ML Data Scientist Questions and Answers

Question # 14

A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.

Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?

Options:

A. df = fs.get_missing_features(spark_df, model_uri)
   fs.score_model(model_uri, df)

B. fs.score_model(model_uri, spark_df)

C. df = fs.get_missing_features(spark_df, model_uri)
   fs.score_batch(model_uri, df)

D. df = fs.get_missing_features(spark_df)
   fs.score_batch(model_uri, df)

E. fs.score_batch(model_uri, spark_df)

Answer:

Explanation:

To compute predictions for spark_df when the missing feature values can be found in the Feature Store by customer_id, a single call to fs.score_batch is sufficient (option E):

Python

# score_batch looks up the missing features by customer_id, then applies the model

predictions_df = fs.score_batch(model_uri, spark_df)

The fs.score_batch method takes a model URI and a Spark DataFrame as arguments. A model logged through the FeatureStoreClient is packaged with metadata describing which features it was trained on, so score_batch retrieves any required features that are absent from the DataFrame by joining it with the feature tables on the lookup key. The DataFrame must therefore contain the lookup key column, here customer_id, which matches the primary key of the feature tables. The method then applies the model and returns a new Spark DataFrame that contains the original columns, the retrieved features, and a prediction column.

The other options are incorrect because:

Option A: neither fs.get_missing_features nor fs.score_model is a FeatureStoreClient method. The batch scoring method is fs.score_batch.
Option B: fs.score_model is not a FeatureStoreClient method.
Option C: fs.get_missing_features is not a FeatureStoreClient method, and no separate lookup step is needed because score_batch performs the lookup itself.
Option D: as with option C, fs.get_missing_features does not exist, with or without the model URI argument.

References: Score batch
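
As a rough mental model of the lookup that score_batch performs, the plain-Python sketch below joins stored feature values onto each input row by customer_id and then applies a model. It is an illustration only, not the Databricks implementation: feature_table, toy_model, and score_with_lookup are hypothetical names, and dictionaries stand in for Spark DataFrames and feature tables.

```python
# Toy stand-in for a feature table keyed by customer_id (hypothetical data).
feature_table = {
    1: {"tenure_years": 4.0, "num_products": 2},
    2: {"tenure_years": 0.5, "num_products": 1},
}


def toy_model(row):
    """Hypothetical model: a fixed linear score over three features."""
    return 0.1 * row["tenure_years"] + 0.5 * row["num_products"] + 0.01 * row["basket_size"]


def score_with_lookup(rows, features, model, key="customer_id"):
    """Fill in features missing from each row by primary key, then predict."""
    scored = []
    for row in rows:
        looked_up = features[row[key]]
        # In this toy, columns already present in the input row win over looked-up values.
        full_row = {**looked_up, **row}
        scored.append({**full_row, "prediction": model(full_row)})
    return scored


# Input rows carry the key and one feature; the static features are missing.
batch = [
    {"customer_id": 1, "basket_size": 30},
    {"customer_id": 2, "basket_size": 10},
]
predictions = score_with_lookup(batch, feature_table, toy_model)
```

Each output row keeps its input columns and gains the looked-up features plus a prediction, which mirrors the shape of the result described above.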

Question # 15

A machine learning engineer needs to select a deployment strategy for a new machine learning application. The feature values are not available until the time of delivery, and results are needed exceedingly fast for one record at a time.

Which of the following deployment strategies can be used to meet these requirements?

Options:

Edge/on-device

Streaming

None of these strategies will meet the requirements.

Batch

Real-time

Question # 16

A machine learning engineer is migrating a machine learning pipeline to use Databricks Machine Learning. They have programmatically identified the best run from an MLflow Experiment and stored its URI in the model_uri variable and its Run ID in the run_id variable. They have also determined that the model was logged with the name "model". Now, the machine learning engineer wants to register that model in the MLflow Model Registry with the name "best_model".

Which of the following lines of code can they use to register the model to the MLflow Model Registry?

Options:

mlflow.register_model(model_uri, "best_model")

mlflow.register_model(run_id, "best_model")

mlflow.register_model(f"runs:/{run_id}/best_model", "model")

mlflow.register_model(model_uri, "model")

mlflow.register_model(f"runs:/{run_id}/model")
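
For orientation, mlflow.register_model takes the URI of a logged model as its first argument and the registry name as its second. The sketch below shows how a run-relative URI is formed and passed on; build_run_model_uri and register_run_model are hypothetical helper names, and MLflow is imported lazily so the URI helper can be tried without it installed.

```python
def build_run_model_uri(run_id, artifact_path="model"):
    """Build the runs:/ URI for a model logged under artifact_path in a run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_run_model(run_id, registered_name, artifact_path="model"):
    """Register a run's logged model under registered_name in the Model Registry."""
    import mlflow  # imported here so build_run_model_uri works without MLflow

    model_uri = build_run_model_uri(run_id, artifact_path)
    # First argument: where the model lives. Second argument: its registry name.
    return mlflow.register_model(model_uri, registered_name)


# Example URI for a made-up run ID; calling register_run_model needs a tracking server.
example_uri = build_run_model_uri("abc123")
```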

Question # 17

A machine learning engineer is using the following code block as part of a batch deployment pipeline:

Which of the following changes needs to be made so this code block will work when the inference table is a stream source?

Options:

A. Replace "inference" with the path to the location of the Delta table

B. Replace schema(schema) with option("maxFilesPerTrigger", 1)

C. Replace spark.read with spark.readStream

D. Replace format("delta") with format("stream")

E. Replace predict with a stream-friendly prediction function

Answer:

Explanation:

To read a table as a stream source, spark.readStream should be used instead of spark.read. The spark.readStream method returns a streaming DataFrame that represents the unbounded input data stream. It supports many of the same formats and options as spark.read, such as delta, csv, and json, alongside streaming-only sources such as Kafka, socket, and rate. In particular, it can read from a Delta table as a stream source by specifying format("delta") and the path or table name of the Delta table [1][2][3].

The other options are incorrect because:

A. Replacing "inference" with the path to the location of the Delta table still leaves spark.read in place, which performs a batch read rather than a streaming one. With spark.readStream, either the path or the table name of the Delta table can be used.
B. Replacing schema(schema) with option("maxFilesPerTrigger", 1) still leaves spark.read in place. The maxFilesPerTrigger option is an optional setting that limits the number of files processed in each trigger for file-based stream sources such as delta, csv, and json. It does not turn a batch read into a streaming read [4].
D. Replacing format("delta") with format("stream") still leaves spark.read in place, and "stream" is not a valid format. Supported streaming formats include delta, kafka, socket, and rate [1].
E. Replacing predict with a stream-friendly prediction function still leaves spark.read in place. The predict function does not need to change, as long as it can be applied to a streaming DataFrame and return a column of predictions [5].

References:

1. Input Sources - Structured Streaming Programming Guide - Spark 3.2.0 Documentation
2. Structured Streaming + Delta Lake - Databricks
3. Structured Streaming Programming Guide - Spark 3.2.0 Documentation
4. Configuration - Structured Streaming Programming Guide - Spark 3.2.0 Documentation
5. Machine Learning with Structured Streaming - Databricks
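
A minimal sketch of the streaming read, assuming an active SparkSession is passed in as spark and the Delta table is registered under the name inference. The helper name read_inference_stream is hypothetical, and this is not a reconstruction of the code block from the question.

```python
def read_inference_stream(spark, table_name="inference"):
    """Return a streaming DataFrame over a Delta table.

    spark is expected to be an active SparkSession. Using readStream
    instead of read is the only difference from a batch read.
    """
    return spark.readStream.format("delta").table(table_name)


# Usage, inside a Spark environment:
#   stream_df = read_inference_stream(spark)
#   stream_df.isStreaming  # True for a streaming DataFrame
```

A prediction UDF can then be applied with withColumn in the same way as on a batch DataFrame, and the result written out with writeStream.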

Question # 18

Which of the following lists all of the model stages that are available in the MLflow Model Registry?

Options:

Development, Staging, Production

None, Staging, Production

Staging, Production, Archived

None, Staging, Production, Archived

Development, Staging, Production, Archived

Question # 19

Which of the following is a reason for using Jensen-Shannon (JS) distance over a Kolmogorov-Smirnov (KS) test for numeric feature drift detection?

Options:

All of these reasons

JS is not normalized or smoothed

None of these reasons

JS is more robust when working with large datasets

JS does not require any manual threshold or cutoff determinations
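
For reference when comparing the two approaches, the sketch below computes the JS distance between two binned distributions using only the standard library. It assumes the numeric feature has already been bucketed into the same bins for both samples; js_distance is a hypothetical helper name. With base-2 logarithms the result lies between 0 (identical distributions) and 1 (no overlap), whereas a KS test yields a test statistic and a p-value.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between two binned distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Normalize raw bin counts to probabilities.
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    # Mixture distribution that both inputs are compared against.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; bins with zero mass in a contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # The distance is the square root of the divergence.
    return math.sqrt(max(divergence, 0.0))


same = js_distance([5, 5], [5, 5])
disjoint = js_distance([10, 0], [0, 10])
```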

Question # 20

A machine learning engineer wants to programmatically create a new Databricks Job whose schedule depends on the result of some automated tests in a machine learning pipeline.

Which of the following Databricks tools can be used to programmatically create the Job?

Options:

MLflow APIs

AutoML APIs

MLflow Client

Jobs cannot be created programmatically

Databricks REST APIs
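
To make the REST option concrete, the sketch below builds (but does not send) a request to the Jobs API 2.1 create endpoint with a cron schedule. The workspace host, token placeholder, cluster ID, notebook path, and the helper name build_create_job_request are all hypothetical values chosen for illustration.

```python
import json
import urllib.request


def build_create_job_request(host, token, job_name, notebook_path, cluster_id, cron, timezone="UTC"):
    """Build a POST request that creates a scheduled single-task notebook job."""
    payload = {
        "name": job_name,
        "tasks": [
            {
                "task_key": "main",
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
        # The schedule could be chosen at runtime, e.g. from test results.
        "schedule": {"quartz_cron_expression": cron, "timezone_id": timezone},
    }
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_job_request(
    host="https://example.cloud.databricks.com",  # hypothetical workspace URL
    token="<personal-access-token>",  # placeholder, not a real token
    job_name="nightly-scoring",
    notebook_path="/Repos/ml/score",  # hypothetical notebook path
    cluster_id="0000-000000-example",  # hypothetical cluster ID
    cron="0 0 6 * * ?",  # Quartz syntax: every day at 06:00
)
# urllib.request.urlopen(request) would submit it; omitted so the sketch runs offline.
```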

Question # 21

Which of the following describes the purpose of the context parameter in the predict method of Python models for MLflow?

Options:

The context parameter allows the user to specify which version of the registered MLflow Model should be used based on the given application's current scenario

The context parameter allows the user to document the performance of a model after it has been deployed

The context parameter allows the user to include relevant details of the business case to allow downstream users to understand the purpose of the model

The context parameter allows the user to provide the model with completely custom if-else logic for the given application's current scenario

The context parameter allows the user to provide the model access to objects like preprocessing models or custom configuration files
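
As background, a custom MLflow Python model receives a context object in load_context and predict, and context.artifacts maps artifact names to local file paths. The sketch below reads a configuration file through it. ScaledModel, the config artifact, and its scale field are hypothetical, a SimpleNamespace stands in for the context MLflow would normally supply, and the fallback base class exists only so the sketch runs without MLflow installed.

```python
import json
import os
import tempfile
from types import SimpleNamespace

try:
    from mlflow.pyfunc import PythonModel
except ImportError:  # fallback so the sketch runs without MLflow installed
    PythonModel = object


class ScaledModel(PythonModel):
    """Hypothetical model that multiplies inputs by a factor read from a config artifact."""

    def load_context(self, context):
        # context.artifacts maps artifact names to local file paths.
        with open(context.artifacts["config"]) as f:
            self.scale = json.load(f)["scale"]

    def predict(self, context, model_input):
        return [self.scale * x for x in model_input]


# Stand-in for the context object that MLflow passes in at load and predict time.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.json")
    with open(config_path, "w") as f:
        json.dump({"scale": 2.0}, f)
    fake_context = SimpleNamespace(artifacts={"config": config_path})
    model = ScaledModel()
    model.load_context(fake_context)
    outputs = model.predict(fake_context, [1.0, 2.5])
```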

Exam Code: Databricks-Machine-Learning-Professional

Exam Name: Databricks Certified Machine Learning Professional

Last Update: Apr 3, 2025

Questions: 60
