Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Exam Dumps - Databricks Certification Questions and Answers

Question # 24

Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame transactionsDf?

Options:

transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()

transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()

transactionsDf.select(corr(predError, value).alias("corr"))

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

(Correct)

transactionsDf.select(corr("predError", "value"))


Answer:

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

Explanation:

This question is more difficult than what you can expect from the exam. What it wants to teach you, however, is to pay attention to the useful details included in the documentation.

pyspark.sql.functions.corr is not a very common function, but it deals with Spark's data structure in an interesting way. It takes two columns over multiple rows and returns a single row, similar to an aggregation function. When examining the documentation (linked below), you will find this code example:

from pyspark.sql.functions import corr

a = range(20)
b = [2 * x for x in range(20)]
df = spark.createDataFrame(zip(a, b), ["a", "b"])
df.agg(corr("a", "b").alias('c')).collect()
# Output: [Row(c=1.0)]

See how corr returns just a single row? Once you understand this, you should be suspicious of answers that include first(), since there is no need to select a single row from a result that already has only one. A further reason to eliminate those answers is that DataFrame.first() returns an object of type Row, not a DataFrame, as requested in the question.
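To make the "many rows in, one value out" behaviour concrete, here is a minimal plain-Python sketch of the Pearson coefficient applied to the same a and b as in the documentation example. The helper name pearson is hypothetical and this is not how Spark computes the value internally; it only illustrates why the result collapses to a single number.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Same data as the documentation example: b is exactly twice a.
a = list(range(20))
b = [2 * x for x in range(20)]

r = pearson(a, b)
print(r)
```

Twenty pairs of values go in and one coefficient comes out, which is the same shape of result that corr produces inside select or agg.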

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

Correct! After calculating the Pearson correlation coefficient, the resulting column is correctly renamed to corr.

transactionsDf.select(corr(predError, value).alias("corr"))

No. In this answer, Python will interpret column names predError and value as variable names.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()

Incorrect. first() returns a row, not a DataFrame (see above and linked documentation below).

transactionsDf.select(corr("predError", "value"))

Wrong. While this statement returns a DataFrame in the desired shape, the column will have the name corr(predError, value) and not corr.

transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()

False. In addition to first() returning a row, this code block also uses the wrong call structure for corr, which takes two arguments (the two columns to correlate) and not a single list.

More info:

- pyspark.sql.functions.corr — PySpark 3.1.2 documentation

- pyspark.sql.DataFrame.first — PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, question 53 (Databricks import instructions)

Question # 25

Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?

Options:

print(itemsDf.columns)
print(itemsDf.types)

itemsDf.printSchema()

spark.schema(itemsDf)

itemsDf.rdd.printSchema()

itemsDf.print.schema()


Question # 26

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

Options:

transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))

transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))

transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))

transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))

transactionsDf.withColumn("predErrorSquared", "predError"**2)


Question # 27

Which of the following code blocks returns a DataFrame with a single column in which all items in column attributes of DataFrame itemsDf are listed that contain the letter i?

Sample of DataFrame itemsDf:

[Table contents not recoverable.]

Options:

itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(attributes_exploded.contains("i"))

itemsDf.explode(attributes).alias("attributes_exploded").filter(col("attributes_exploded").contains("i"))

itemsDf.select(explode("attributes")).filter("attributes_exploded".contains("i"))

itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(col("attributes_exploded").contains("i"))

itemsDf.select(col("attributes").explode().alias("attributes_exploded")).filter(col("attributes_exploded").contains("i"))


Question # 28

The code block shown below should return a single-column DataFrame with a column named consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

DataFrame itemsDf:

[Table contents not recoverable.]

Code block:

itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))

Options:

1. length
2. regexp_extract
3. upper
4. col("itemName")
5. as

1. size
2. regexp_replace
3. lower
4. "itemName"
5. alias

1. lower
2. regexp_replace
3. length
4. "itemName"
5. alias

1. length
2. regexp_replace
3. lower
4. col("itemName")
5. alias

1. size
2. regexp_extract
3. lower
4. col("itemName")
5. alias


Question # 29

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

The column names should be listed directly as arguments to the operator and not as a list.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

The select operator should be replaced by a drop operator.

The column names should be listed directly as arguments to the operator and not as a list and, following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.


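Answer:

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

Explanation: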

Correct code block: transactionsDf.drop("productId", "f")

This question requires a lot of thinking to get right. To solve it, you may take advantage of the digital notepad that is provided to you during the test. You have probably seen that the code block includes multiple errors. In the test, you are usually confronted with a code block that contains only a single error. However, since you are practicing here, this challenging multi-error question will make it easier for you to deal with single-error questions in the real exam.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

Correct! Here, you need to figure out the many things that are wrong with the initial code block. While the question could be solved by using a select statement, a drop statement is the correct one given the answer options. Then, you can read in the documentation that drop does not take a list as an argument, but just the column names that should be dropped. Finally, the column names should be expressed as strings and not as Python variable names as in the original code block.

The column names should be listed directly as arguments to the operator and not as a list.

Incorrect. While this is a good first step and part of the correct solution (see above), this modification is insufficient to solve the question.

The column names should be listed directly as arguments to the operator and not as a list and, following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

Wrong. If you use the same pattern as in the original code block (col(productId), col(f)), you are still making a mistake. col(productId) will trigger Python to search for the content of a variable named productId instead of telling Spark to use the column productId. For that, you need to express the column name as a string.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

No. This still leaves you with Python trying to interpret the column names as Python variables (see above).

The select operator should be replaced by a drop operator.

Wrong, this is not enough to solve the question. If you do this, you will still face problems since you are passing a Python list to drop and the column names are still interpreted as Python variables (see above).
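The point about unquoted column names can be reproduced without Spark at all. The following plain-Python sketch assumes, as in the question, that no variables named productId or f have been defined; the helper name build_column_list is hypothetical.

```python
def build_column_list():
    # Unquoted names are ordinary Python identifiers. Neither productId nor f
    # is defined anywhere, so evaluating this list fails before Spark is involved.
    return [productId, f]  # noqa: F821 - intentionally undefined names


try:
    build_column_list()
    error_message = None
except NameError as exc:
    error_message = str(exc)

# Quoted names are plain strings. They can be passed to drop as separate
# arguments, for example transactionsDf.drop(*columns_to_drop).
columns_to_drop = ["productId", "f"]

print(error_message)
print(columns_to_drop)
```

Python raises the NameError while it is still building the argument list, which is why removing col() alone does not repair the original code block.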

More info: pyspark.sql.DataFrame.drop — PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, question 30 (Databricks import instructions)

Question # 30

The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.

spark.sql.shuffle.partitions

__1__.__2__.__3__(__4__, 100)

Options:

1. spark
2. conf
3. set
4. "spark.sql.shuffle.partitions"

1. pyspark
2. config
3. set
4. spark.shuffle.partitions

1. spark
2. conf
3. get
4. "spark.sql.shuffle.partitions"

1. pyspark
2. config
3. set
4. "spark.sql.shuffle.partitions"

1. spark
2. conf
3. set
4. "spark.sql.aggregate.partitions"


Exam Code: Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Last Update: Jul 2, 2025

Questions: 180
