Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?
A. array_remove(transactionsDf, "*")
B. transactionsDf.unpersist()
C. del transactionsDf
D. transactionsDf.persist()
E. transactionsDf.clearCache()
Correct answer: B
Question 2:
Which of the following describes Spark's Adaptive Query Execution?
A. Adaptive Query Execution reoptimizes queries at execution points.
B. Adaptive Query Execution is enabled in Spark by default.
C. Adaptive Query Execution features are dynamically switching join strategies and dynamically optimizing skew joins.
D. Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.
E. Adaptive Query Execution applies to all kinds of queries.
Correct answer: C
Question 3:
Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
A. spark.read().schema(fileSchema).parquet(filePath)
B. spark.read().schema(fileSchema).format(parquet).load(filePath)
C. spark.read.schema(fileSchema).format("parquet").load(filePath)
D. spark.read.schema(fileSchema).open(filePath)
E. spark.read.schema("fileSchema").format("parquet").load(filePath)
Correct answer: C
Question 4:
Which of the following code blocks returns approximately 1000 rows, some of them potentially being duplicates, from the 2000-row DataFrame transactionsDf that only has unique rows?
A. transactionsDf.sample(False, 0.5)
B. transactionsDf.sample(True, 0.5, force=True)
C. transactionsDf.take(1000)
D. transactionsDf.sample(True, 0.5)
E. transactionsDf.take(1000).distinct()
Correct answer: D
Question 5:
Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?
A. Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions
B. Increase values for the properties spark.sql.parallelism and spark.sql.partitions
C. Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
D. Decrease values for the properties spark.default.parallelism and spark.sql.partitions
E. Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
Correct answer: A
Question 6:
Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?
A. transactionsDf.withColumnRenamed("productNumber", "productId")
B. transactionsDf.withColumnRenamed(col(productId), col(productNumber))
C. transactionsDf.withColumnRenamed("productId", "productNumber")
D. transactionsDf.withColumn("productId", "productNumber")
E. transactionsDf.withColumnRenamed(productId, productNumber)
Correct answer: C