An MLOps engineer is building a Pandas UDF that applies a language model to translate English strings into Spanish. The initial code loads the model on every call to the UDF, which hurts the performance of the data pipeline.
The initial code is:

import pandas as pd
from pyspark.sql import functions as sf
from pyspark.sql.types import StringType

def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')  # the model is loaded on every UDF call
    return df.apply(model)
in_spanish = sf.pandas_udf(in_spanish_inner, StringType())
How can the MLOps engineer change this code to reduce how many times the language model is loaded?
A. Convert the Pandas UDF from a Series → Series UDF to a Series → Scalar UDF
B. Convert the Pandas UDF from a Series → Series UDF to an Iterator[Series] → Iterator[Series] UDF
C. Convert the Pandas UDF to a PySpark UDF
D. Run the in_spanish_inner() function in a mapInPandas() function call
Correct answer: B
Explanation: (visible only to Pass4Test members)
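A minimal sketch of the Iterator[Series] → Iterator[Series] form (assuming the same get_translation_model() helper from the question is available); the model is loaded once per UDF call over an entire partition of batches rather than once per batch:

from typing import Iterator
import pandas as pd
from pyspark.sql import functions as sf
from pyspark.sql.types import StringType

@sf.pandas_udf(StringType())
def in_spanish(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the translation model once, then reuse it for every batch in the partition.
    model = get_translation_model(target_lang='es')  # helper assumed from the question
    for batch in batches:
        yield batch.apply(model)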
Question 2:
A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task.
Which combination of Apache Spark modules should the data scientist use in this scenario?
Options:
A. Spark SQL, Pandas API on Spark, and Structured Streaming
B. Spark DataFrames, Spark SQL, and MLlib
C. Spark DataFrames, Structured Streaming, and GraphX
D. Spark Streaming, GraphX, and Pandas API on Spark
Correct answer: B
Explanation: (visible only to Pass4Test members)
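A minimal sketch of how the three modules in option B fit together (the file path and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("events.parquet")                  # structured data as a DataFrame
df.createOrReplaceTempView("events")
training = spark.sql("SELECT f1, f2, label FROM events")   # SQL query via Spark SQL

features = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(training)
model = LogisticRegression(featuresCol="features", labelCol="label").fit(features)   # MLlib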
Question 3:
Given:
spark.sparkContext.setLogLevel("<LOG_LEVEL>")
Which set contains only valid LOG_LEVEL settings for the Spark driver?
A. FATAL, NONE, INFO, DEBUG
B. WARN, NONE, ERROR, FATAL
C. ERROR, WARN, TRACE, OFF
D. ALL, DEBUG, FAIL, INFO
Correct answer: C
Explanation: (visible only to Pass4Test members)
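For reference, setLogLevel() accepts the log4j levels ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE and WARN, so option C is the only set made up entirely of valid values:

spark.sparkContext.setLogLevel("WARN")   # NONE and FAIL are not valid levels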
Question 4:
A data engineer is running a batch processing job on a Spark cluster with the following configuration:
10 worker nodes
16 CPU cores per worker node
64 GB RAM per node
The data engineer wants to allocate four executors per node, each executor using four cores.
What is the total number of CPU cores used by the application?
A. 80
B. 64
C. 160
D. 40
Correct answer: C
Explanation: (visible only to Pass4Test members)
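Worked arithmetic for the stated configuration: 4 executors per node × 4 cores per executor = 16 cores per node (all 16 available cores), and 16 cores × 10 worker nodes = 160 cores in total.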
Question 5:
What is the difference between df.cache() and df.persist() on a Spark DataFrame?
A. Both functions perform the same operation. The persist() function provides improved performance as its default storage level is DISK_ONLY.
B. Both cache() and persist() can be used to set the default storage level (MEMORY_AND_DISK_SER).
C. cache() persists the DataFrame with the default storage level (MEMORY_AND_DISK), and persist() can be used to set different storage levels to persist the contents of the DataFrame.
D. persist() persists the DataFrame with the default storage level (MEMORY_AND_DISK_SER), and cache() can be used to set different storage levels to persist the contents of the DataFrame.
Correct answer: C
Explanation: (visible only to Pass4Test members)
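A minimal sketch of the difference (df1 and df2 stand in for arbitrary DataFrames):

from pyspark import StorageLevel

df1.cache()                           # always uses the default storage level (MEMORY_AND_DISK)
df2.persist(StorageLevel.DISK_ONLY)   # persist() accepts an explicit storage level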
Question 6:
A Data Analyst needs to retrieve employees with 5 or more years of tenure.
Which code snippet filters the DataFrame and shows the result?
A. employees_df.filter(employees_df.tenure >= 5).collect()
B. filter(employees_df.tenure >= 5)
C. employees_df.where(employees_df.tenure >= 5)
D. employees_df.filter(employees_df.tenure >= 5).show()
Correct answer: D
Explanation: (visible only to Pass4Test members)
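A minimal sketch with hypothetical sample data; filter() narrows the rows and show() displays them:

employees_df = spark.createDataFrame([("Ana", 7), ("Ben", 3)], ["name", "tenure"])
employees_df.filter(employees_df.tenure >= 5).show()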
Question 7:
Which Spark configuration controls the number of tasks that can run in parallel on the executor?
Options:
A. spark.executor.memory
B. spark.executor.cores
C. spark.driver.cores
D. spark.task.maxFailures
Correct answer: B
Explanation: (visible only to Pass4Test members)
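For example, the setting can be supplied when the session is built (the values shown are illustrative):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.executor.cores", "4")    # up to 4 tasks run concurrently per executor
         .config("spark.executor.memory", "8g")
         .getOrCreate())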
Question 8:
Given the code:

from pyspark.sql.functions import col, split, lit

df = spark.read.csv("large_dataset.csv")
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"),
                               lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")
reduced_df.count()
reduced_df.show()

At which point will Spark actually begin processing the data?
A. When the filter transformation is applied
B. When the show action is applied
C. When the groupBy transformation is applied
D. When the count action is applied
Correct answer: D
Explanation: (visible only to Pass4Test members)
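In short: filter(), select() and groupBy() are lazy transformations that only build up the logical plan, so no data is processed until the first action, reduced_df.count(), triggers a job; the later show() call starts a second job.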