A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.
Which consideration will impact the decisions made by the engineer while migrating this workload?
A. Databricks supports Spark SQL and JDBC; all logic can be directly migrated from the source system without refactoring.
B. Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake's upsert functionality.
C. Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.
D. All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.
E. Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly-parallel writes.
Correct answer: D
Explanation: (Visible to Pass4Test members only)
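Answer D is the consideration to design around: each Delta Lake transaction is ACID against a single table, and key constraints in Databricks are informational only. A minimal sketch of the practical consequence, assuming Unity Catalog's informational constraints and hypothetical table names (dim_customer, sales_fact, sales_fact_staging):

```python
# Unity Catalog accepts PK/FK declarations, but they are informational only --
# Databricks never validates them on write.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_id BIGINT NOT NULL,
        CONSTRAINT pk_customer PRIMARY KEY (customer_id)
    )
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_fact (
        sale_id BIGINT,
        customer_id BIGINT,
        CONSTRAINT fk_customer FOREIGN KEY (customer_id)
            REFERENCES dim_customer (customer_id)
    )
""")

# Because each transaction spans only one table, referential checks the source
# system performed on write must move into the pipeline, e.g. an anti-join gate:
orphans = (spark.table("sales_fact_staging")
           .join(spark.table("dim_customer"), "customer_id", "left_anti"))
assert orphans.count() == 0, "staging rows reference missing customers"
```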
Question 2:
A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create.

Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?
A. The logic defined in the referenced notebook will be executed three times on the referenced existing all-purpose cluster.
B. One new job named "Ingest new data" will be defined in the workspace, but it will not be executed.
C. Three new jobs named "Ingest new data" will be defined in the workspace, but no jobs will be executed.
D. The logic defined in the referenced notebook will be executed three times on new clusters with the configurations of the provided cluster ID.
E. Three new jobs named "Ingest new data" will be defined in the workspace, and they will each run once daily.
Correct answer: C
Explanation: (Visible to Pass4Test members only)
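The payload in the question is not reproduced above, but the behavior hinges on what jobs/create does: it registers a new job definition on every call and never triggers a run. An illustrative sketch only, with a hypothetical host, token, and payload shape:

```python
import requests

HOST = "https://<databricks-instance>"       # hypothetical
TOKEN = "<personal-access-token>"            # hypothetical

payload = {
    "name": "Ingest new data",
    "existing_cluster_id": "<cluster-id>",   # hypothetical
    "notebook_task": {"notebook_path": "/Repos/prod/ingest"},
}

# Each POST to jobs/create registers a distinct new job; nothing executes.
for _ in range(3):
    resp = requests.post(
        f"{HOST}/api/2.0/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    print(resp.json())  # three different job_id values, zero runs

# Actually running a job would take a separate call such as POST /api/2.0/jobs/run-now.
```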
Question 3:
A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.
Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales.

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?
A. Both commands will succeed. Executing SHOW TABLES will show that countries_af and sales_af have been registered as views.
B. Both commands will fail. No new variables, tables, or views will be created.
C. Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af; if this entity exists, Cmd 2 will succeed.
D. Cmd 1 will succeed and Cmd 2 will fail; countries_af will be a Python variable containing a list of strings.
E. Cmd 1 will succeed and Cmd 2 will fail; countries_af will be a Python variable representing a PySpark DataFrame.
Correct answer: D
Explanation: (Visible to Pass4Test members only)
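The cells themselves are not reproduced above, but answer D describes the usual trap: a value created in a %python cell lives only in the Python interpreter, not in the metastore, so a later SQL cell cannot resolve it as a table or view. A sketch of the pattern, assuming geo_lookup has country and continent columns:

```python
# Cmd 1 (%python) -- succeeds: countries_af is a plain Python list of strings.
countries_af = [row.country
                for row in spark.table("geo_lookup")
                                .filter("continent = 'AF'")
                                .collect()]

# Cmd 2 (%sql) -- fails: SQL resolves countries_af against tables/views,
# and no such table or view exists.
#   CREATE VIEW sales_af AS SELECT * FROM sales WHERE country IN countries_af
```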
Question 4:
Which statement describes integration testing?
A. Validates interactions between subsystems of your application
B. Requires manual intervention
C. Requires an automated testing framework
D. Validates an application use case
E. Validates behavior of individual elements of your application
Correct answer: A
Explanation: (Visible to Pass4Test members only)
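A minimal, self-contained illustration of the distinction (the functions are invented for this sketch): a unit test would validate parse_row or load_rows in isolation, while the test below validates the interaction between the two subsystems:

```python
def parse_row(line: str) -> dict:
    # subsystem A: parse one CSV line
    user, ltv = line.split(",")
    return {"user": user, "ltv": int(ltv)}

def load_rows(lines: list[str]) -> list[dict]:
    # subsystem B: load a batch by delegating to subsystem A
    return [parse_row(line) for line in lines]

def test_parse_and_load_together():
    # integration test: B consuming A's output, end to end
    assert load_rows(["alice,10", "bob,20"]) == [
        {"user": "alice", "ltv": 10},
        {"user": "bob", "ltv": 20},
    ]
```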
Question 5:
A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are configured into groups, which are used for setting up data access using ACLs.
The user_ltv table has the following schema:
email STRING, age INT, ltv INT
The following view definition is executed:

An analyst who is not a member of the marketing group executes the following query:
SELECT * FROM email_ltv
Which statement describes the results returned by this query?
A. The email, age, and ltv columns will be returned with the values in user_ltv.
B. Only the email and ltv columns will be returned; the email column will contain all null values.
C. The email and ltv columns will be returned with the values in user_ltv.
D. Only the email and ltv columns will be returned; the email column will contain the string "REDACTED" in each row.
E. Three columns will be returned, but one column will be named "redacted" and contain only null values.
Correct answer: D
Explanation: (Visible to Pass4Test members only)
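The view definition itself is not reproduced above, but a dynamic view consistent with answer D would look roughly like the sketch below (the CASE logic is assumed; is_member is Databricks SQL's group-membership function):

```python
spark.sql("""
    CREATE OR REPLACE VIEW email_ltv AS
    SELECT
        CASE WHEN is_member('marketing') THEN email
             ELSE 'REDACTED'
        END AS email,
        ltv
    FROM user_ltv
""")
# Non-members of the marketing group still get two columns, but every
# email value is replaced by the literal string 'REDACTED'.
```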
Question 6:
The data science team has requested assistance in accelerating queries on free-form text from user reviews. The data is currently stored in Parquet with the below schema:
item_id INT, user_id INT, review_id INT, rating FLOAT, review STRING
The review column contains the full text of the review left by the user. Specifically, the data science team is looking to identify if any of 30 key words exist in this field.
A junior data engineer suggests converting this data to Delta Lake will improve query performance.
Which response to the junior data engineer's suggestion is correct?
A. The Delta log creates a term matrix for free text fields to support selective filtering.
B. Text data cannot be stored with Delta Lake.
C. ZORDER ON review will need to be run to see performance gains.
D. Delta Lake statistics are only collected on the first 4 columns in a table.
E. Delta Lake statistics are not optimized for free text fields with high cardinality.
Correct answer: E
Explanation: (Visible to Pass4Test members only)
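Delta collects per-file min/max statistics, which enable data skipping for range-like predicates but not for substring matches against a high-cardinality free-text column. A sketch of the query shape at issue (table name and keywords are hypothetical):

```python
keywords = ["refund", "broken", "excellent"]  # 3 of the 30 terms, invented here
predicate = " OR ".join(f"review LIKE '%{kw}%'" for kw in keywords)

# Min/max stats on `review` cannot rule any file out for a '%...%' match,
# so every file is still scanned -- with or without ZORDER ON review.
hits = spark.table("reviews").where(predicate)
```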
Question 7:
Which REST API call can be used to review the notebooks configured to run as tasks in a multi-task job?
A. /jobs/runs/get-output
B. /jobs/get
C. /jobs/list
D. /jobs/runs/list
E. /jobs/runs/get
Correct answer: B
Explanation: (Visible to Pass4Test members only)
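A sketch of the call (hypothetical host, token, and job_id; shown against the Jobs 2.1 API, which models multi-task jobs as a tasks array): /jobs/get returns the job's stored settings, including each task's notebook path, whereas the runs endpoints describe executions rather than configuration.

```python
import requests

resp = requests.get(
    "https://<databricks-instance>/api/2.1/jobs/get",   # hypothetical host
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"job_id": 123},                             # hypothetical job_id
)

# Walk the configured tasks and print each notebook path.
for task in resp.json().get("settings", {}).get("tasks", []):
    notebook = task.get("notebook_task", {})
    print(task.get("task_key"), notebook.get("notebook_path"))
```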
Question 8:
The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.
The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.
Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?
A. Because Delta Lake time travel provides full access to the entire history of a table, deleted records can always be recreated by users with full admin privileges.
B. Because the default data retention threshold is 24 hours, data files containing deleted records will be retained until the vacuum job is run the following day.
C. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the vacuum job is run 8 days later.
D. Because the vacuum command permanently deletes all files containing deleted records, deleted records may be accessible with time travel for around 24 hours.
E. Because Delta Lake's delete statements have ACID guarantees, deleted records will be permanently purged from all storage systems as soon as a delete job completes.
Correct answer: C
Explanation: (Visible to Pass4Test members only)
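With default table settings, Delta retains removed data files for 7 days (delta.deletedFileRetentionDuration defaults to "interval 7 days"), so the Monday 3am VACUUM cannot purge files deleted Sunday 1am; those files remain reachable via time travel until the vacuum run after the threshold passes, roughly 8 days after deletion. A sketch with a hypothetical table name:

```python
# Runs Monday 3am: only files past the 7-day retention threshold are removed.
spark.sql("VACUUM customer_data")                    # default retention
spark.sql("VACUUM customer_data RETAIN 168 HOURS")   # the explicit equivalent
```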
Koike -
Buying the Databricks-Certified-Data-Engineer-Professional question set really was the right decision. Highly recommended. Even though Databricks-Certified-Data-Engineer-Professional is a weak area for me, it was easy to understand.