Assuming that the Databricks CLI has been installed and configured correctly, which Databricks CLI command can be used to upload a custom Python wheel to object storage mounted with DBFS for use with a production job?
A. libraries
B. jobs
C. configure
D. workspace
E. fs
Correct answer: E
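For context, the fs command group is the part of the CLI that copies local files into DBFS-backed storage; a typical (hypothetical) invocation is databricks fs cp dist/my_lib-0.1.0-py3-none-any.whl dbfs:/FileStore/wheels/, after which the wheel can be referenced as a library in the job configuration. The libraries, jobs, configure, and workspace command groups manage cluster libraries, job definitions, CLI authentication, and workspace notebooks respectively; none of them uploads local files to DBFS.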
Question 2:
The following table consists of items found in user carts within an e-commerce website.

The following MERGE statement is used to update this table using an updates view, with schema evolution enabled on this table.

How would the following update be handled?
A. The new restored field is added to the target schema, and dynamically read as NULL for existing unmatched records.
B. The new nested field is added to the target schema, and files underlying existing records are updated to include NULL values for the new field.
C. The update is moved to a separate "restored" column because it is missing a column expected in the target schema.
D. The update throws an error because changes to existing columns in the target schema are not supported.
Correct answer: B
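For reference (the original table and MERGE statement are not reproduced above), automatic schema evolution for MERGE is controlled by a Spark session setting; a minimal Python sketch, with hypothetical table and view names, looks like this:

# Allow MERGE to add source columns that are missing from the target schema.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

spark.sql("""
  MERGE INTO cart_items AS target      -- hypothetical target table
  USING cart_updates AS source         -- hypothetical updates view
  ON target.cart_id = source.cart_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")

With this setting enabled, columns (including nested struct fields) that exist in the source but not in the target are added to the target table's schema as part of the merge.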
Question 3:
A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.
The proposed directory structure is displayed below:

Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?
A. Yes; Delta Lake supports infinite concurrent writers.
B. Yes; both of the streams can share a single checkpoint directory.
C. No; each of the streams needs to have its own checkpoint directory.
D. No; only one stream can write to a Delta Lake table.
E. No; Delta Lake manages streaming checkpoints in the transaction log.
Correct answer: C
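Each Structured Streaming query tracks its own offsets and state in its checkpoint location, so two queries writing to the same bronze Delta table must each use a distinct checkpoint path. A minimal Python sketch (topic names, broker address, paths, and the table name are hypothetical):

def start_bronze_stream(topic, checkpoint_path):
    # One independent streaming query per Kafka topic; each gets its own checkpoint directory.
    return (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", topic)
        .load()
        .writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .toTable("bronze"))

start_bronze_stream("topic_a", "/mnt/checkpoints/bronze/topic_a")
start_bronze_stream("topic_b", "/mnt/checkpoints/bronze/topic_b")

Delta Lake does support multiple concurrent writers to one table; it is the shared checkpoint directory, not the concurrent writes, that makes the proposed layout invalid.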
Question 4:
The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that invalid latitude and longitude values in the activity_details table have been breaking their ability to use other geolocation processes.
A junior engineer has written the following code to add CHECK constraints to the Delta Lake table:

A senior engineer has confirmed the above logic is correct and the valid ranges for latitude and longitude are provided, but the code fails when executed.
Which statement explains the cause of this failure?
A. The activity details table already exists; CHECK constraints can only be added during initial table creation.
B. The activity details table already contains records that violate the constraints; all existing data must pass CHECK constraints in order to add them to an existing table.
C. The activity details table already contains records; CHECK constraints can only be added prior to inserting values into a table.
D. The current table schema does not contain the field valid coordinates; schema evolution will need to be enabled before altering the table to add a constraint.
E. Because another team uses this table to support a frequently running application, two-phase locking is preventing the operation from committing.
Correct answer: B
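For reference, CHECK constraints are added to an existing Delta table with ALTER TABLE, and the ADD CONSTRAINT command validates all existing rows before the constraint is accepted. A minimal Python sketch (the constraint names are hypothetical; the ranges are the standard latitude/longitude bounds):

# Each statement scans activity_details and fails if any existing row violates the predicate;
# offending records must be corrected or removed before the constraint can be added.
spark.sql("""
  ALTER TABLE activity_details
  ADD CONSTRAINT valid_latitude CHECK (latitude >= -90 AND latitude <= 90)
""")
spark.sql("""
  ALTER TABLE activity_details
  ADD CONSTRAINT valid_longitude CHECK (longitude >= -180 AND longitude <= 180)
""")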
Question 5:
A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create.

Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?
A. The logic defined in the referenced notebook will be executed three times on the referenced existing all-purpose cluster.
B. One new job named "Ingest new data" will be defined in the workspace, but it will not be executed.
C. Three new jobs named "Ingest new data" will be defined in the workspace, but no jobs will be executed.
D. The logic defined in the referenced notebook will be executed three times on new clusters with the configurations of the provided cluster ID.
E. Three new jobs named "Ingest new data" will be defined in the workspace, and they will each run once daily.
Correct answer: C
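For reference, 2.0/jobs/create only registers a job definition and returns a job_id; it never triggers a run, and calling it repeatedly registers additional jobs with the same name. Runs are started separately, for example through 2.0/jobs/run-now or a schedule attached to the job. A minimal Python sketch (workspace URL, token, cluster ID, and notebook path are hypothetical placeholders):

import requests

host = "https://my-workspace.cloud.databricks.com"   # hypothetical workspace URL
headers = {"Authorization": "Bearer dapiXXXX"}        # hypothetical personal access token
job_spec = {
    "name": "Ingest new data",
    "existing_cluster_id": "1234-567890-abcde123",            # hypothetical cluster ID
    "notebook_task": {"notebook_path": "/Repos/prod/ingest"}, # hypothetical notebook
}

# Each POST to jobs/create registers a brand-new job and returns its job_id; nothing runs yet.
job_id = requests.post(f"{host}/api/2.0/jobs/create", headers=headers, json=job_spec).json()["job_id"]

# A run only happens when the job is explicitly triggered (or when its schedule fires).
requests.post(f"{host}/api/2.0/jobs/run-now", headers=headers, json={"job_id": job_id})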
Question 6:
The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.
The following logic is used to process these records.
MERGE INTO customers
USING (
  SELECT updates.customer_id AS merge_key, updates.*
  FROM updates
  UNION ALL
  SELECT NULL AS merge_key, updates.*
  FROM updates
  JOIN customers ON updates.customer_id = customers.customer_id
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customer_id = staged_updates.merge_key
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, end_date = staged_updates.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, address, current, effective_date, end_date)
  VALUES (staged_updates.customer_id, staged_updates.address, true, staged_updates.effective_date, null)
Which statement describes this implementation?
A. The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.
B. The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.
C. The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.
D. The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.
Correct answer: C
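The UNION ALL inside staged_updates is what makes this a Type 2 pattern: a customer whose address has changed appears twice, once with the real customer_id as merge_key (this row matches the existing record and closes it out by setting current = false and end_date), and once with a NULL merge_key (this row matches nothing, falls into the WHEN NOT MATCHED clause, and is inserted as the new current record). History is preserved rather than overwritten. A quick way to inspect the result, as a minimal Python sketch with a hypothetical customer_id:

spark.sql("""
  SELECT customer_id, address, current, effective_date, end_date
  FROM customers
  WHERE customer_id = 42   -- hypothetical customer
  ORDER BY effective_date
""").show()
# Expect one row per historical address, with exactly one row where current = true.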
Mizutani -
If you follow these materials exactly, you will definitely be able to pass. The Databricks-Certified-Professional-Data-Engineer exam itself is very easy, so good luck.