Question 1:
Which of the following are valid execution modes?
A. Kubernetes, Local, Client
B. Client, Cluster, Local
C. Cluster, Server, Local
D. Standalone, Client, Cluster
E. Server, Standalone, Client
Correct answer: B
Explanation: (Only visible to Pass4Test members)
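As a reminder of what the three valid modes mean (a hedged sketch; the master URL, host name, and application file are placeholders, and running these requires a Spark installation): in client mode the driver runs on the submitting machine, in cluster mode the driver runs on a node inside the cluster, and local mode runs driver and executors together in a single JVM.

```shell
# Local mode: driver and executors share one JVM on this machine
spark-submit --master "local[*]" my_app.py

# Client mode: driver runs on the machine that invoked spark-submit
spark-submit --master spark://master-host:7077 --deploy-mode client my_app.py

# Cluster mode: driver runs on a node inside the cluster
spark-submit --master spark://master-host:7077 --deploy-mode cluster my_app.py
```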
Question 2:
The code block displayed below contains an error. The code block should return a DataFrame in which column predErrorAdded contains the results of Python function add_2_if_geq_3 as applied to numeric and nullable column predError in DataFrame transactionsDf. Find the error.
Code block:
def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x+2
    return x

add_2_if_geq_3_udf = udf(add_2_if_geq_3)

transactionsDf.withColumnRenamed("predErrorAdded", add_2_if_geq_3_udf(col("predError")))
A. The udf() method does not declare a return type.
B. Instead of col("predError"), the actual DataFrame with the column needs to be passed, like so transactionsDf.predError.
C. UDFs are only available through the SQL API, but not in the Python API as shown in the code block.
D. The Python function is unable to handle null values, resulting in the code block crashing on execution.
E. The operator used for adding the column does not add column predErrorAdded to the DataFrame.
Correct answer: E
Explanation: (Only visible to Pass4Test members)
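For reference, a minimal sketch of the fix: withColumnRenamed only renames an existing column, so with these arguments it returns the DataFrame unchanged; withColumn is the method that adds a column. The plain Python function below runs as-is; the corrected DataFrame call is shown in comments because it assumes an active SparkSession and the transactionsDf DataFrame from the question.

```python
def add_2_if_geq_3(x):
    # Returns None unchanged (so the UDF is null-safe),
    # adds 2 when x >= 3, and otherwise returns x as-is.
    if x is None:
        return x
    elif x >= 3:
        return x + 2
    return x

# The plain Python function already handles nulls, ruling out answer D:
print(add_2_if_geq_3(None))  # None
print(add_2_if_geq_3(2))     # 2
print(add_2_if_geq_3(3))     # 5

# With a SparkSession available, the corrected call would use withColumn
# (which adds a column) instead of withColumnRenamed (which only renames one):
#
#   from pyspark.sql.functions import udf, col
#   add_2_if_geq_3_udf = udf(add_2_if_geq_3)
#   transactionsDf.withColumn("predErrorAdded", add_2_if_geq_3_udf(col("predError")))
```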
Question 3:
Which of the following statements about RDDs is incorrect?
A. RDDs are immutable.
B. RDD stands for Resilient Distributed Dataset.
C. An RDD consists of a single partition.
D. RDDs are great for precisely instructing Spark on how to do a query.
E. The high-level DataFrame API is built on top of the low-level RDD API.
Correct answer: C
Explanation: (Only visible to Pass4Test members)
Question 4:
Which of the following describes characteristics of the Spark driver?
A. If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.
B. The Spark driver processes partitions in an optimized, distributed fashion.
C. The Spark driver requests the transformation of operations into DAG computations from the worker nodes.
D. The Spark driver's responsibility includes scheduling queries for execution on worker nodes.
E. In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.
Correct answer: E
Explanation: (Only visible to Pass4Test members)
Question 5:
Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
A. spark.read().schema(fileSchema).parquet(filePath)
B. spark.read().schema(fileSchema).format(parquet).load(filePath)
C. spark.read.schema(fileSchema).format("parquet").load(filePath)
D. spark.read.schema(fileSchema).open(filePath)
E. spark.read.schema("fileSchema").format("parquet").load(filePath)
Correct answer: C
Explanation: (Only visible to Pass4Test members)