A distributed team of data analysts shares computing resources on an interactive cluster with autoscaling configured. To better manage costs and query throughput, the workspace administrator wants to evaluate whether cluster upscaling is driven by many concurrent users or by resource-intensive queries.
In which location can one review the timeline for cluster resizing events?
A. Executor's log file
B. Ganglia
C. Workspace audit logs
D. Driver's log file
E. Cluster Event Log
Correct Answer: E
Explanation: (Visible only to Pass4Test members)
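The Cluster Event Log in the cluster UI shows this resize timeline directly; for automation, the same events can also be pulled via the Clusters Events API. A rough sketch, assuming the POST /api/2.0/clusters/events endpoint, with placeholder workspace URL, token, and cluster ID:

import requests

# Sketch: list resize-related events for a cluster via the Clusters
# Events API. HOST, TOKEN, and the cluster ID are placeholders.
HOST = "https://<workspace-url>"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": "<cluster-id>",
        "event_types": ["RESIZING", "UPSIZING_COMPLETED"],
        "limit": 50,
    },
)
for event in resp.json().get("events", []):
    print(event["timestamp"], event["type"], event.get("details", {}))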
Question 2:
A Delta Lake table was created with the below query:

Consider the following query:
DROP TABLE prod.sales_by_store
If this statement is executed by a workspace admin, which result will occur?
A. An error will occur because Delta Lake prevents the deletion of production data.
B. The table will be removed from the catalog but the data will remain in storage.
C. The table will be removed from the catalog and the data will be deleted.
D. Nothing will occur until a COMMIT command is executed.
E. Data will be marked as deleted but still recoverable with Time Travel.
Correct Answer: C
Explanation: (Visible only to Pass4Test members)
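The behavior hinges on whether the table is managed. A minimal sketch, assuming prod.sales_by_store was created as a managed Delta table (the original CREATE statement is not reproduced above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sketch assuming prod.sales_by_store is a managed Delta table (created
# without a LOCATION clause), so the metastore owns metadata and data.
spark.sql("CREATE SCHEMA IF NOT EXISTS prod")
spark.sql(
    "CREATE TABLE IF NOT EXISTS prod.sales_by_store "
    "(store_id INT, total DOUBLE) USING DELTA"
)

# Dropping a managed table removes the catalog entry and deletes the
# underlying data files; only an external table (one created with a
# LOCATION clause) would leave the files in storage.
spark.sql("DROP TABLE prod.sales_by_store")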
Question 3:
Which statement characterizes the general programming model used by Spark Structured Streaming?
A. Structured Streaming models new data arriving in a data stream as new rows appended to an unbounded table.
B. Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.
C. Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.
D. Structured Streaming uses specialized hardware and I/O streams to achieve sub-second latency for data transfer.
E. Structured Streaming relies on a distributed network of nodes that hold incremental state values for cached stages.
Correct Answer: A
Explanation: (Visible only to Pass4Test members)
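A minimal runnable sketch of this model, using the built-in rate source: each trigger's new rows are treated as appends to an unbounded input table.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The "rate" source emits one row per second; Structured Streaming models
# these arrivals as new rows appended to an unbounded input table.
stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

# Each trigger appends only the newly arrived rows to the console sink.
query = (stream.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination(10)  # run briefly for demonstration, then stop
query.stop()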
Question 4:
A data engineer needs to capture the settings of an existing pipeline in the workspace and use them to create and version a JSON file for creating a new pipeline. Which command should the data engineer enter in a web terminal configured with the Databricks CLI?
A. Stop the existing pipeline; use the returned settings in a reset command
B. Use the clone command to create a copy of the existing pipeline; use the get JSON command to retrieve the pipeline definition; save this to Git
C. Use list pipelines to get the specs for all pipelines; parse the pipeline spec from the returned results and use it to create a pipeline
D. Use the get command to capture the settings for the existing pipeline; remove the pipeline_id and rename the pipeline; use this in a create command
Correct Answer: D
Explanation: (Visible only to Pass4Test members)
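A rough sketch of option D, assuming the Databricks CLI's pipelines get and pipelines create subcommands (exact flags vary by CLI version; the pipeline ID, new name, and file name are placeholders):

import json
import subprocess

# Sketch of option D; the CLI subcommands and flags are assumptions that
# may differ across Databricks CLI versions.
raw = subprocess.run(
    ["databricks", "pipelines", "get", "<pipeline-id>"],
    capture_output=True, text=True, check=True,
).stdout

settings = json.loads(raw).get("spec", {})
settings.pop("id", None)                 # remove the existing pipeline's ID
settings["name"] = "sales_pipeline_v2"   # rename before re-creating

# Save the JSON definition so it can be versioned (e.g. committed to Git).
with open("new_pipeline.json", "w") as f:
    json.dump(settings, f, indent=2)

subprocess.run(
    ["databricks", "pipelines", "create", "--json", "@new_pipeline.json"],
    check=True,
)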
Question 5:
The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.
The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.
Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?
A. Because Delta Lake time travel provides full access to the entire history of a table, deleted records can always be recreated by users with full admin privileges.
B. Because the default data retention threshold is 24 hours, data files containing deleted records will be retained until the vacuum job is run the following day.
C. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the vacuum job is run 8 days later.
D. Because the vacuum command permanently deletes all files containing deleted records, deleted records may be accessible with time travel for around 24 hours.
E. Because Delta Lake's delete statements have ACID guarantees, deleted records will be permanently purged from all storage systems as soon as a delete job completes.
Correct Answer: C
Explanation: (Visible only to Pass4Test members)
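To make the timing concrete, a minimal sketch (the table name is hypothetical): with default settings, VACUUM only removes files older than the 7-day retention threshold, so files rewritten by Sunday's delete job survive Monday's VACUUM and are only removed by the run 8 days after the deletion.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sunday 1am: the delete job rewrites data files so the deleted records
# no longer appear in the current table version (table name is a placeholder).
spark.sql("DELETE FROM prod.user_data WHERE user_id = 'forget-me'")

# Monday 3am: with default table settings, VACUUM only removes files older
# than the 7-day retention threshold, so the files rewritten ~26 hours ago
# are kept (and still reachable via time travel) until the following week's run.
spark.sql("VACUUM prod.user_data")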
Question 6:
Which statement describes the default execution mode for Databricks Auto Loader?
A. New files are identified by listing the input directory; the target table is materialized by directly querying all valid files in the source directory.
B. Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and idempotently loaded into the target Delta Lake table.
C. Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; the target table is materialized by directly querying all valid files in the source directory.
D. New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.
E. A webhook triggers a Databricks job to run anytime new data arrives in a source directory; new data is automatically merged into target tables using rules inferred from the data.
Correct Answer: D
Explanation: (Visible only to Pass4Test members)
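A minimal sketch of this default (directory listing) mode; the input path, checkpoint path, and table name are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader's default mode discovers new files by listing the input
# directory; the checkpoint lets it load each file exactly once, so the
# ingestion is incremental and idempotent. Paths are placeholders.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events/")
          .load("/mnt/raw/events/"))

(stream.writeStream
 .option("checkpointLocation", "/mnt/checkpoints/events/")
 .trigger(availableNow=True)
 .toTable("bronze.events"))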
Question 7:
A table is registered with the following code:

Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders?
A. All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query finishes.
B. All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.
C. Results will be computed and cached when the table is defined; these cached results will incrementally update as new records are inserted into source tables.
D. The versions of each source table will be stored in the table transaction log; query results will be saved to DBFS with each query.
E. All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query began.
Correct Answer: B
Explanation: (Visible only to Pass4Test members)
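The original screenshot is not reproduced above, but answer B implies recent_orders was registered with CREATE TABLE AS SELECT rather than CREATE VIEW. A rough sketch of the distinction (column names are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A CTAS executes the join once, when the table is defined, and persists
# the result; later queries read that stored data, not the live sources.
spark.sql("""
    CREATE TABLE recent_orders AS
    SELECT o.*, u.email
    FROM orders o JOIN users u ON o.user_id = u.user_id
""")

# A view, by contrast, would re-execute the join logic at query time
# (the behavior described by answers A and E):
# CREATE VIEW recent_orders_v AS SELECT ...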
Question 8:
A data ingestion task requires a one-TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto-Optimize & Auto-Compaction cannot be used.
Which strategy will yield the best performance without shuffling data?
A. Set spark.sql.shuffle.partitions to 2,048 partitions (1TB*1024*1024/512), ingest the data, execute the narrow transformations, optimize the data by sorting it (which automatically repartitions the data), and then write to parquet.
B. Set spark.sql.shuffle.partitions to 512, ingest the data, execute the narrow transformations, and then write to parquet.
C. Set spark.sql.adaptive.advisoryPartitionSizeInBytes to 512 MB, ingest the data, execute the narrow transformations, coalesce to 2,048 partitions (1TB*1024*1024/512), and then write to parquet.
D. Set spark.sql.files.maxPartitionBytes to 512 MB, ingest the data, execute the narrow transformations, and then write to parquet.
E. Ingest the data, execute the narrow transformations, repartition to 2,048 partitions (1TB*1024*1024/512), and then write to parquet.
Correct Answer: D
Explanation: (Visible only to Pass4Test members)
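D is the only option that hits the target size without a shuffle: sorting (A) and repartitioning (E) both shuffle, shuffle.partitions (B) is irrelevant when no shuffle occurs, and coalesce (C) cannot increase the partition count. Capping input partitions at 512 MB yields roughly 2,048 read partitions, and narrow transformations preserve partitioning, so each task writes approximately one 512 MB part-file. A minimal sketch (paths and column name are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Cap each input partition at 512 MB; 1 TB / 512 MB = 2,048 partitions.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

df = spark.read.json("/mnt/raw/one_tb_dataset/")  # placeholder path
cleaned = df.filter("value IS NOT NULL")          # narrow transformation

# No repartition/sort, so no shuffle; each of the ~2,048 tasks writes
# roughly one 512 MB part-file (before Parquet compression).
cleaned.write.mode("overwrite").parquet("/mnt/out/parquet_512mb/")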
Question 9:
A Delta Lake table was created with the below query:

Realizing that the original query had a typographical error, the below code was executed:
ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store
Which result will occur after running the second command?
A. A new Delta transaction log is created for the renamed table.
B. The table reference in the metastore is updated and no data is changed.
C. All related files and metadata are dropped and recreated in a single ACID transaction.
D. The table reference in the metastore is updated and all data files are moved.
E. The table name change is recorded in the Delta transaction log.
Correct Answer: B
Explanation: (Visible only to Pass4Test members)
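A minimal sketch confirming this: the rename updates only the metastore reference, so the table's Delta history carries over unchanged.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Renaming updates the table reference in the metastore; the Delta
# transaction log and data files are left untouched.
spark.sql("ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store")

# The existing history is still visible under the new name:
spark.sql("DESCRIBE HISTORY prod.sales_by_store").show(truncate=False)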
Tooyama -
It really works; I passed the actual Databricks-Certified-Data-Engineer-Professional exam with it. It was also considerably cheaper than what I had purchased before.