Which of the following machine learning algorithms typically uses bagging?
A. Random forest
B. K-means
C. Gradient boosted trees
D. Decision tree
Correct answer: A
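Bagging (bootstrap aggregating) trains each base learner on a sample drawn with replacement and combines their predictions by voting, which is the idea a random forest builds on. The following is a minimal plain-Python sketch of that idea, assuming a one-feature toy data set and decision stumps as base learners. The names fit_stump, bagged_stumps and predict are hypothetical, and this is not how any particular library implements random forests (which also subsample features at each split).

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-feature threshold classifier (a decision stump) on (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        # Predict 1 when x >= threshold, else 0; count training errors.
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagged_stumps(data, n_estimators=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_estimators)]

def predict(thresholds, x):
    """Aggregate the ensemble by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy data: the label is 1 when x >= 5.
data = [(x, int(x >= 5)) for x in range(10)]
ensemble = bagged_stumps(data)
```

Calling predict(ensemble, x) then returns the class most stumps agree on for a new value x.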
Question 2:
A data scientist has developed a machine learning pipeline with a static input data set using Spark ML, but the pipeline is taking too long to process. They increase the number of workers in the cluster to get the pipeline to run more efficiently. They notice that the number of rows in the training set after reconfiguring the cluster is different from the number of rows in the training set prior to reconfiguring the cluster.
Which of the following approaches will guarantee a reproducible training and test set for each model?
A. Manually partition the input data
B. Manually configure the cluster
C. Write out the split data sets to persistent storage
D. Set a seed in the data splitting operation
Correct answer: C
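A seed alone does not guarantee the same split here, because a random split in Spark also depends on how the data is partitioned, and the partitioning can change when the cluster is reconfigured. Writing the split out once and reading it back removes the randomness from later runs. The following is a plain-Python analogy of that approach rather than Spark code: the helper names and the CSV layout are assumptions made for illustration. In Spark, the equivalent would be writing the DataFrames returned by randomSplit to storage and loading them for each model.

```python
import csv
import random
import tempfile
from pathlib import Path

def split_and_persist(rows, out_dir, train_fraction=0.8, seed=42):
    """Split once, then write both sets to disk so later runs reuse them."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    for name, part in (("train", train), ("test", test)):
        with open(Path(out_dir) / f"{name}.csv", "w", newline="") as f:
            csv.writer(f).writerows(part)

def load_split(out_dir, name):
    """Read a previously persisted split; no randomness is involved."""
    with open(Path(out_dir) / f"{name}.csv", newline="") as f:
        return [tuple(r) for r in csv.reader(f)]

# Hypothetical input rows: (id, feature) pairs.
rows = [(str(i), f"feature_{i}") for i in range(100)]
out_dir = tempfile.mkdtemp()
split_and_persist(rows, out_dir)

# Every later load returns exactly the same rows, whatever the cluster looks like.
train_first = load_split(out_dir, "train")
train_again = load_split(out_dir, "train")
test_set = load_split(out_dir, "test")
```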
Question 3:
A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a single-node model:

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

Which of the following lines of code can be used to complete the code block to successfully complete the task?
A. predict(Iterator(spark_df))
B. predict(*spark_df.columns)
C. mapInPandas(predict)
D. mapInPandas(predict(spark_df.columns))
E. predict(spark_df.columns)
Correct answer: C
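The original code blocks for this question are not included in this copy, so the following is only a sketch of the general shape such a function takes, not the exam's code. A function passed to mapInPandas receives an iterator of pandas DataFrames and yields pandas DataFrames. The MeanModel class and the column names are hypothetical stand-ins, and the function is exercised locally with pandas instead of on a cluster. Note that the actual PySpark method also takes a schema argument describing the output.

```python
from typing import Iterator

import pandas as pd

class MeanModel:
    """Hypothetical stand-in for a single-node model with a predict method."""

    def predict(self, features: pd.DataFrame) -> pd.Series:
        return features.mean(axis=1)

model = MeanModel()

def predict(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Score each pandas batch and yield one prediction per input row."""
    for batch in batches:
        yield pd.DataFrame({"prediction": model.predict(batch)})

# With Spark, this function would be applied as:
#   spark_df.mapInPandas(predict, schema="prediction double")
# Here it is run locally on two small batches instead.
batches = [
    pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]}),
    pd.DataFrame({"x1": [5.0], "x2": [7.0]}),
]
scored = pd.concat(list(predict(iter(batches))), ignore_index=True)
```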
Question 4:
A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.
They have developed this code block to accomplish this task:

The code block is returning an error.
Which of the following adjustments does the data scientist need to make to accomplish this task?
A. They need to use StringIndexer prior to one-hot encoding the features.
B. They need to use VectorAssembler prior to one-hot encoding the features.
C. They need to remove the line with the fit operation.
D. They need to specify the method parameter to the OneHotEncoder.
Correct answer: A
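Spark ML's OneHotEncoder operates on numeric category indices, so string columns are first mapped to indices with StringIndexer. The sketch below mimics those two stages in plain Python to show why the order matters. The functions string_index and one_hot are hypothetical illustrations, not Spark APIs, and they assume the Spark defaults of ordering labels by descending frequency and dropping the last category.

```python
from collections import Counter

def string_index(values):
    """Map each string to an integer index, most frequent label first."""
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically to break ties.
    labels = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(labels)}

def one_hot(index, size, drop_last=True):
    """Turn a category index into a 0/1 vector, optionally dropping the last slot."""
    width = size - 1 if drop_last else size
    return [1 if i == index else 0 for i in range(width)]

colors = ["red", "blue", "red", "green", "red", "blue"]

# Stage 1: strings to indices. Stage 2: indices to one-hot vectors.
mapping = string_index(colors)
encoded = [one_hot(mapping[c], len(mapping)) for c in colors]
```

The encoder never sees the raw strings: it only receives the integer indices produced by the first stage.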
Question 5:
An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?
A. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.
B. One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.
C. One-hot encoding is not a common strategy for representing categorical feature variables numerically.
D. One-hot encoding is dependent on the target variable's values which differ for each application.
Correct answer: A
Question 6:
A data scientist is utilizing MLflow Autologging to automatically track their machine learning experiments. After completing a series of runs for the experiment experiment_id, the data scientist wants to identify the run_id of the run with the best root-mean-square error (RMSE).
Which of the following lines of code can be used to identify the run_id of the run with the best RMSE in experiment_id?
A.

B.

C.

D.

Correct answer: B
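The option bodies are missing from this copy, so the following does not reproduce option B. It is a hedged sketch of one common approach: mlflow.search_runs returns a pandas DataFrame with a run_id column and metric columns named like metrics.rmse, and the best run is the row with the lowest RMSE. The DataFrame below is a hypothetical stand-in for that result, with invented run IDs and values, so that the example runs without an MLflow tracking server.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by
# mlflow.search_runs(experiment_ids=[experiment_id]).
runs = pd.DataFrame({
    "run_id": ["run_a", "run_b", "run_c"],
    "metrics.rmse": [0.92, 0.71, 0.85],
})

# Lower RMSE is better, so sort ascending and take the first row.
best_run_id = runs.sort_values("metrics.rmse", ascending=True).iloc[0]["run_id"]
```

With MLflow itself, the sorting can also be pushed into the query by passing order_by=["metrics.rmse ASC"] to search_runs.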