Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.
Code block:
transactionsDf.withColumn("transactionNumber", "transactionId")
The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)
Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?
The code block shown below should return a column of boolean values indicating whether rows in DataFrame transactionsDf have values greater than or equal to 20 and less than or equal to 30 in column storeId and have the value 2 in column productId. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__((__2__.__3__) __4__ (__5__))
The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))
Which of the following code blocks returns only rows from DataFrame transactionsDf in which values in column productId are unique?
The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.
Excerpt of DataFrame transactionsDf:
transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))
The code block shown below should add column transactionDateForm to DataFrame transactionsDf. The column should express the unix-format timestamps in column transactionDate as strings formatted like Apr 26 (Sunday). Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, from_unixtime(__3__, __4__))