
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Exam Dumps - Databricks Certification Questions and Answers

Question # 14

Which of the following describes the role of the cluster manager?

Options:

A.

The cluster manager schedules tasks on the cluster in client mode.

B.

The cluster manager schedules tasks on the cluster in local mode.

C.

The cluster manager allocates resources to Spark applications and maintains the executor processes in client mode.

D.

The cluster manager allocates resources to Spark applications and maintains the executor processes in remote mode.

E.

The cluster manager allocates resources to the DataFrame manager.

Question # 15

Which of the following code blocks selects all rows from DataFrame transactionsDf in which column productId is either zero or smaller, or equal to 3?

Options:

A.

transactionsDf.filter(productId==3 or productId<1)

B.

transactionsDf.filter((col("productId")==3) or (col("productId")<1))

C.

transactionsDf.filter(col("productId")==3 | col("productId")<1)

D.

transactionsDf.where("productId"=3).or("productId"<1))

E.

transactionsDf.filter((col("productId")==3) | (col("productId")<1))
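
For reference, a minimal sketch, with hypothetical data, of how PySpark column predicates are combined inside filter(): Column objects must be joined with the bitwise | operator rather than Python's or keyword, and each comparison needs its own parentheses because | binds more tightly than == and <.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical data, only to make the sketch runnable.
transactionsDf = spark.createDataFrame(
    [(1, 3), (2, 0), (3, 5)], ["transactionId", "productId"]
)

# | combines Column predicates; Python's `or` raises an error on Column objects.
result = transactionsDf.filter(
    (F.col("productId") == 3) | (F.col("productId") < 1)
)
result.show()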

Question # 16

The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column storeId as key for partitioning. Find the error.

Code block:

transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split")

Options:

A.

The format("parquet") expression is inappropriate to use here, "parquet" should be passed as first argument to the save() operator and "/FileStore/transactions_split" as the second argument.

B.

Partitioning data by storeId is possible with the partitionBy expression, so partitionOn should be replaced by partitionBy.

C.

Partitioning data by storeId is possible with the bucketBy expression, so partitionOn should be replaced by bucketBy.

D.

partitionOn("storeId") should be called before the write operation.

E.

The format("parquet") expression should be removed and instead, the information should be added to the write expression like so: write("parquet").

Question # 17

The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively. Find the error.

Code block:

transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

Options:

A.

The "outer" argument should be eliminated, since "outer" is the default join type.

B.

The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.

C.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.

D.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").

E.

The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Question # 18

Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

Options:

A.

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B.

itemsDf.withColumnRenamed("attributes", "feature0")
itemsDf.withColumnRenamed("supplier", "feature1")

C.

itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D.

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E.

itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")

Question # 19

Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?

Options:

A.

transactionsDf.repartition(24, boost=True)

B.

transactionsDf.repartition()

C.

transactionsDf.repartition("itemId", 24)

D.

transactionsDf.coalesce(24)

E.

transactionsDf.repartition(24)
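
A minimal sketch of the distinction this question turns on, assuming transactionsDf currently has 12 partitions: repartition(n) shuffles the data and can both raise and lower the partition count, while coalesce(n) avoids a shuffle but can only reduce it.

repartitioned = transactionsDf.repartition(24)
print(repartitioned.rdd.getNumPartitions())  # 24

# coalesce(24) would leave the DataFrame at 12 partitions, since
# coalesce can only decrease the number of partitions.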

Question # 20

Which of the following is not a feature of Adaptive Query Execution?

Options:

A.

Replace a sort merge join with a broadcast join, where appropriate.

B.

Coalesce partitions to accelerate data processing.

C.

Split skewed partitions into smaller partitions to avoid differences in partition processing time.

D.

Reroute a query in case of an executor failure.

E.

Collect runtime statistics during query execution.
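
For context, a minimal sketch of how Adaptive Query Execution is switched on in Spark 3.x, assuming a live SparkSession named spark; these configuration keys cover the join-strategy switching, partition coalescing, and skew handling listed in the options.

# Enable AQE and its sub-features via Spark SQL configuration.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")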

Question # 21

Which of the following is a characteristic of the cluster manager?

Options:

A.

Each cluster manager works on a single partition of data.

B.

The cluster manager receives input from the driver through the SparkContext.

C.

The cluster manager does not exist in standalone mode.

D.

The cluster manager transforms jobs into DAGs.

E.

In client mode, the cluster manager runs on the edge node.

Question # 22

Which of the following statements about executors is correct?

Options:

A.

Executors are launched by the driver.

B.

Executors stop upon application completion by default.

C.

Each node hosts a single executor.

D.

Executors store data in memory only.

E.

An executor can serve multiple applications.

Question # 23

The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(itemsDf, __2__).__3__(__4__)

Options:

A.

1. join

2. transactionsDf.productId==itemsDf.itemId, how="inner"

3. select

4. "transactionId", "supplier"

B.

1. select

2. "transactionId", "supplier"

3. join

4. [transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId]

C.

1. join

2. [transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId]

3. select

4. "transactionId", "supplier"

D.

1. filter

2. "transactionId", "supplier"

3. join

4. "transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId"

E.

1. join

2. transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId

3. filter

4. "transactionId", "supplier"

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: Feb 23, 2025
Questions: 180