Which of the following Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?
A. TrainValidationSplit
B. TrainValidationSplitModel
C. DataFrame.where
D. DataFrame.randomSplit
E. CrossValidator
Correct answer: D
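A minimal sketch of option D, assuming a PySpark environment: DataFrame.randomSplit takes a list of weights plus an optional seed and returns one DataFrame per weight. The helper name split_train_test, the 80/20 default and the seed value are illustrative assumptions, not part of the question.

```python
def split_train_test(df, train_fraction=0.8, seed=42):
    """Randomly split a Spark DataFrame into (train_df, test_df).

    `df` is assumed to be a pyspark.sql.DataFrame. The helper name, the
    default 80/20 split and the seed are illustrative assumptions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # randomSplit returns a list with one DataFrame per weight.
    train_df, test_df = df.randomSplit(
        [train_fraction, 1.0 - train_fraction], seed=seed
    )
    return train_df, test_df


def demo():
    """Usage sketch; requires pyspark and is not run on import."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.range(1000)  # single-column DataFrame named "id"
    train_df, test_df = split_train_test(df)
    # The split is random, so the sizes are only approximately 800 / 200.
    print(train_df.count(), test_df.count())
```

Because rows are assigned at random, the resulting sizes only approximate the requested fractions; fixing the seed makes the split repeatable for the same input DataFrame and partitioning.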
Question 2:
Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?
A. Spark ML
B. Autoscaling clusters
C. Autoscaling clusters
D. MLflow Experiment Tracking
E. Delta Lake
Correct answer: A
Question 3:
The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.
Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?
A. Iterative optimization
B. Spark ML cannot distribute linear regression training
C. Singular value decomposition
D. Least-squares method
E. Logistic regression
Correct answer: A
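To illustrate why iterative optimization distributes well, here is a pure-Python sketch, not Spark ML's own solver: each "partition" computes a partial gradient on its rows, and only a few numbers per partition are combined before the coefficients are updated. Plain gradient descent, the toy data, the learning rate and all function names are assumptions chosen for illustration.

```python
def partition_gradient(rows, w, b):
    """Sum of squared-error gradients over one partition of (x, y) rows."""
    gw, gb = 0.0, 0.0
    for x, y in rows:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(rows)


def fit_linear(partitions, lr=0.1, n_iter=1000):
    """Fit y ~ w*x + b by gradient descent over partitioned data."""
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        # "Map": each partition computes its partial gradient independently.
        partials = [partition_gradient(rows, w, b) for rows in partitions]
        # "Reduce": only three numbers per partition need to be combined.
        gw = sum(p[0] for p in partials)
        gb = sum(p[1] for p in partials)
        n = sum(p[2] for p in partials)
        # Step along the negative gradient of the mean squared error (halved).
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


# Toy data generated from y = 2x + 1, split into two "partitions".
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
w, b = fit_linear(partitions)
print(round(w, 3), round(b, 3))
```

A matrix-decomposition solve needs the whole Gram matrix in one place, whereas each iteration here only exchanges per-partition sums, which is the property that lets this style of training scale out.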
Question 5:
Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?
A. pandas
B. Spark ML
C. PyTorch
D. Scikit-learn
E. Keras
Correct answer: B
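As a hedged sketch of what option B looks like in practice, the following assumes PySpark and chains two built-in Spark ML feature transformers in a Pipeline, so the work runs on the cluster without any UDF. The helper name build_feature_pipeline, the example column names and the output column names are hypothetical.

```python
def build_feature_pipeline(categorical_col, numeric_cols):
    """Return an unfitted Spark ML Pipeline that builds a `features` vector.

    Column names are supplied by the caller; the helper name and the
    output column names are illustrative assumptions.
    """
    # Imported lazily so this module can be loaded without a Spark install.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    indexed_col = categorical_col + "_index"
    # Map each category string to a numeric index; unseen values are kept.
    indexer = StringIndexer(
        inputCol=categorical_col, outputCol=indexed_col, handleInvalid="keep"
    )
    # Combine the indexed and numeric columns into one vector column.
    assembler = VectorAssembler(
        inputCols=[indexed_col] + list(numeric_cols), outputCol="features"
    )
    return Pipeline(stages=[indexer, assembler])


def demo():
    """Usage sketch; requires pyspark and is not run on import."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [("red", 1.0, 10.0), ("blue", 2.0, 20.0)],
        ["color", "width", "height"],
    )
    pipeline = build_feature_pipeline("color", ["width", "height"])
    pipeline.fit(df).transform(df).select("features").show(truncate=False)
```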
Question 6:
A machine learning engineer wants to parallelize the training of group-specific models using the pandas Function API. They have developed the train_model function, and they want to apply it to each group of DataFrame df.
They have written the following incomplete code block:

Which of the following pieces of code can be used to fill in the above blank to complete the task?
A. train_model
B. groupedApplyIn
C. predict
D. applyInPandas
E. mapInPandas
Correct answer: D
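The original code block is not reproduced above, so the following is only a sketch of the general pattern, assuming PySpark: a grouped DataFrame exposes applyInPandas, which hands all rows of one group to the function as a single pandas DataFrame. By contrast, mapInPandas is a method on the DataFrame itself and receives an iterator of batches with no per-group guarantee. The helper name train_per_group, the column names and the stand-in train_model body are hypothetical.

```python
def train_per_group(df, group_col, train_fn, schema):
    """Run `train_fn` once per group of `df` and collect the results.

    `df` is assumed to be a pyspark.sql.DataFrame. `train_fn` receives all
    rows of one group as a pandas DataFrame and must return a pandas
    DataFrame matching `schema`. The helper name is illustrative.
    """
    return df.groupBy(group_col).applyInPandas(train_fn, schema=schema)


def demo():
    """Usage sketch; needs pyspark, pandas and pyarrow, not run on import."""
    import pandas as pd
    from pyspark.sql import SparkSession

    def train_model(pdf: pd.DataFrame) -> pd.DataFrame:
        # Stand-in "model": a one-variable least-squares slope per group.
        slope = pdf["x"].cov(pdf["y"]) / pdf["x"].var()
        return pd.DataFrame({"group": [pdf["group"].iloc[0]], "slope": [slope]})

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [("a", 1.0, 2.0), ("a", 2.0, 4.1), ("b", 1.0, 3.0), ("b", 2.0, 5.9)],
        ["group", "x", "y"],
    )
    train_per_group(df, "group", train_model, "group string, slope double").show()
```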