During a security review, a company identified a vulnerability in an AWS Glue job. The company discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script.
A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must securely store the credentials.
Which combination of steps should the data engineer take to meet these requirements? (Choose two.)
A. Store the credentials in a configuration file that is in an Amazon S3 bucket.
B. Store the credentials in AWS Secrets Manager.
C. Store the credentials in the AWS Glue job parameters.
D. Access the credentials from a configuration file that is in an Amazon S3 bucket by using the AWS Glue job.
E. Grant the AWS Glue job IAM role access to the stored credentials.
Correct Answer: B, E
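The Python sketch below illustrates how a Glue job script could pull the Redshift credentials from AWS Secrets Manager at run time instead of hardcoding them. The secret name is an assumption for this example, and the job's IAM role must be allowed to call secretsmanager:GetSecretValue on that secret (answer E).

```python
import json

import boto3

# Hypothetical secret name; replace with the secret that holds the Redshift credentials.
SECRET_NAME = "redshift/etl-credentials"


def get_redshift_credentials(secret_name: str = SECRET_NAME) -> dict:
    """Fetch the Redshift credentials from AWS Secrets Manager (answer B).

    The Glue job's IAM role needs secretsmanager:GetSecretValue on this
    secret (answer E), so no credentials ever appear in the job script.
    """
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    # The secret is assumed to be stored as JSON, e.g. {"username": "...", "password": "..."}.
    return json.loads(response["SecretString"])
```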
Question 2:
A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data must be regularly propagated to the S3 bucket. The data includes files that are in multiple formats.
The data engineer needs to automate the transfer process and must schedule the process to run periodically.
Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?
A. AWS Glue
B. AWS Direct Connect
C. AWS DataSync
D. Amazon S3 Transfer Acceleration
Correct Answer: C
Explanation: (Available only to Pass4Test members)
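As a rough illustration of answer C, the boto3 sketch below creates a scheduled AWS DataSync task. The location ARNs, task name, and cron expression are placeholders, and the on-premises source location and S3 destination location are assumed to have been created beforehand.

```python
import boto3

datasync = boto3.client("datasync")

# Placeholder ARNs for an existing on-premises (NFS/SMB) source location
# and an existing S3 destination location.
SOURCE_LOCATION_ARN = "arn:aws:datasync:us-east-1:123456789012:location/loc-source"
DEST_LOCATION_ARN = "arn:aws:datasync:us-east-1:123456789012:location/loc-s3"

# DataSync copies only changed files on each run, so the daily ~5% delta is
# transferred incrementally on the schedule below.
response = datasync.create_task(
    SourceLocationArn=SOURCE_LOCATION_ARN,
    DestinationLocationArn=DEST_LOCATION_ARN,
    Name="onprem-to-s3-daily-sync",
    Schedule={"ScheduleExpression": "cron(0 2 * * ? *)"},  # run daily at 02:00 UTC
)
print(response["TaskArn"])
```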
Question 3:
A company has five offices in different AWS Regions. Each office has its own human resources (HR) department that uses a unique IAM role. The company stores employee records in a data lake that is based on Amazon S3 storage.
A data engineering team needs to limit access to the records. Each HR department should be able to access records for only employees who are within the HR department's Region.
Which combination of steps should the data engineering team take to meet this requirement with the LEAST operational overhead? (Choose two.)
A. Use data filters for each Region to register the S3 paths as data locations.
B. Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region.
C. Register the S3 path as an AWS Lake Formation location.
D. Enable fine-grained access control in AWS Lake Formation. Add a data filter for each Region.
E. Modify the IAM roles of the HR departments to add a data filter for each department's Region.
Correct Answer: C, D
Explanation: (Available only to Pass4Test members)
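A minimal boto3 sketch of answers C and D follows: it registers the data lake's S3 path with AWS Lake Formation and then adds one row-level data filter for a Region. The bucket name, database, table, column name, filter name, and account ID are all illustrative; one filter would be created per Region and granted to that Region's HR role.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Register the data lake's S3 path with Lake Formation (answer C).
lakeformation.register_resource(
    ResourceArn="arn:aws:s3:::example-hr-data-lake",  # placeholder bucket
    UseServiceLinkedRole=True,
)

# Add a row-level data filter for one Region (answer D); repeat per Region.
lakeformation.create_data_cells_filter(
    TableData={
        "TableCatalogId": "123456789012",          # placeholder account ID
        "DatabaseName": "hr",
        "TableName": "employee_records",
        "Name": "us_east_1_employees",
        "RowFilter": {"FilterExpression": "region = 'us-east-1'"},
        "ColumnWildcard": {},                      # expose all columns
    }
)
```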
Question 4:
Which of the following best describes the type of data found in traditional relational databases?
A. Free-form data
B. Structured data
C. Semi-structured data
D. Unstructured data
Correct Answer: B
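To make the distinction concrete, here is a small, purely illustrative Python sketch using the standard sqlite3 module: structured data conforms to a fixed, typed schema defined up front, which is exactly what traditional relational databases store.

```python
import sqlite3

# Structured data: a fixed schema with typed columns that every row must
# follow, as in a traditional relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
conn.execute("INSERT INTO employees VALUES (1, 'Ana', 'us-east-1')")
for row in conn.execute("SELECT id, name, region FROM employees"):
    print(row)  # (1, 'Ana', 'us-east-1')
conn.close()
```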
Question 5:
A data engineer must build an extract, transform, and load (ETL) pipeline to process and load data from 10 source systems into 10 tables that are in an Amazon Redshift database. All the source systems generate .csv, JSON, or Apache Parquet files every 15 minutes. The source systems all deliver files into one Amazon S3 bucket. The file sizes range from 10 MB to 20 GB.
The ETL pipeline must function correctly despite changes to the data schema.
Which data pipeline solutions will meet these requirements? (Choose two.)
A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
B. Configure an AWS Lambda function to invoke an AWS Glue workflow when a file is loaded into the S3 bucket. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
C. Configure an AWS Lambda function to invoke an AWS Glue job when a file is loaded into the S3 bucket. Configure the AWS Glue job to read the files from the S3 bucket into an Apache Spark DataFrame. Configure the AWS Glue job to also put smaller partitions of the DataFrame into an Amazon Kinesis Data Firehose delivery stream. Configure the delivery stream to load data into the Amazon Redshift tables.
D. Use an Amazon EventBridge rule to invoke an AWS Glue workflow every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.
E. Configure an AWS Lambda function to invoke an AWS Glue crawler when a file is loaded into the S3 bucket. Configure an AWS Glue job to process and load the data into the Amazon Redshift tables. Create a second Lambda function to run the AWS Glue job. Create an Amazon EventBridge rule to invoke the second Lambda function when the AWS Glue crawler finishes running successfully.
Correct Answer: B, D
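The boto3 sketch below outlines the Glue workflow wiring that answers B and D share: an on-demand trigger starts a crawler (so schema changes are picked up), and a conditional trigger runs the load job only after the crawler succeeds. The workflow, crawler, and job names are assumptions, and the workflow itself is assumed to be started by the EventBridge rule (answer D) or the Lambda function (answer B).

```python
import boto3

glue = boto3.client("glue")

# Illustrative names; the workflow, crawler, and job are assumed to exist.
WORKFLOW = "etl-ingest-workflow"
CRAWLER = "source-files-crawler"
JOB = "load-to-redshift-job"

# On-demand trigger that starts the crawler when the workflow is invoked.
glue.create_trigger(
    Name="start-crawler",
    WorkflowName=WORKFLOW,
    Type="ON_DEMAND",
    Actions=[{"CrawlerName": CRAWLER}],
)

# Conditional trigger that runs the Glue job only after the crawler succeeds,
# so the latest schema is in the Data Catalog before the load into Redshift.
glue.create_trigger(
    Name="run-load-job",
    WorkflowName=WORKFLOW,
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": CRAWLER,
                "CrawlState": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": JOB}],
)
```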
Question 6:
An online retail company stores Application Load Balancer (ALB) access logs in an Amazon S3 bucket. The company wants to use Amazon Athena to query the logs to analyze traffic patterns.
A data engineer creates an unpartitioned table in Athena. As the amount of data gradually increases, the response time for queries also increases. The data engineer wants to improve the query performance in Athena.
Which solution will meet these requirements with the LEAST operational effort?
A. Use Apache Hive to create bucketed tables. Use an AWS Lambda function to transform all ALB access logs.
B. Create an AWS Lambda function to transform all ALB access logs. Save the results to Amazon S3 in Apache Parquet format. Partition the metadata. Use Athena to query the transformed data.
C. Create an AWS Glue job that determines the schema of all ALB access logs and writes the partition metadata to the AWS Glue Data Catalog.
D. Create an AWS Glue crawler that includes a classifier that determines the schema of all ALB access logs and writes the partition metadata to the AWS Glue Data Catalog.
Correct Answer: D
Explanation: (Available only to Pass4Test members)
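As a sketch of answer D, the call below creates an AWS Glue crawler that scans the ALB access logs and writes schema and partition metadata to the Data Catalog. The IAM role, database, custom classifier name, S3 path, and schedule are assumptions; the custom classifier for the ALB log format would be defined separately.

```python
import boto3

glue = boto3.client("glue")

# Crawler that keeps the Data Catalog's partitions in sync with the ALB logs,
# so Athena can prune partitions instead of scanning the whole dataset.
glue.create_crawler(
    Name="alb-access-logs-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",      # placeholder role
    DatabaseName="alb_logs",
    Classifiers=["alb-access-log-classifier"],                   # custom classifier for the ALB log format
    Targets={"S3Targets": [{"Path": "s3://example-alb-logs/AWSLogs/"}]},
    Schedule="cron(0 * * * ? *)",                                # hourly, to pick up new log partitions
)
```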
Question 7:
A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.
Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)
A. Use AWS Lambda functions to schedule and run the ETL jobs every hour.
B. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
C. Configure AWS Glue triggers to run the ETL jobs every hour.
D. Use AWS Glue DataBrew to clean and prepare the data for analytics.
E. Use the Redshift Data API to load transformed data into Amazon Redshift.
Correct Answer: B, C
Explanation: (Available only to Pass4Test members)
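A brief boto3 sketch of answers B and C: a Glue connection to the Amazon Redshift target and a scheduled Glue trigger that runs the ETL job every hour. The connection name, JDBC URL, secret reference, and job name are placeholders; similar connections would be defined for the Amazon RDS and MongoDB sources.

```python
import boto3

glue = boto3.client("glue")

# Glue connection to the Redshift target (answer B); credentials are read
# from a Secrets Manager secret rather than stored in the connection.
glue.create_connection(
    ConnectionInput={
        "Name": "redshift-analytics-connection",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:redshift://example-cluster:5439/analytics",
            "SECRET_ID": "redshift/etl-credentials",
        },
    }
)

# Scheduled trigger (answer C) that starts the ETL job at the top of every hour.
glue.create_trigger(
    Name="hourly-etl-trigger",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",
    Actions=[{"JobName": "rds-mongodb-to-redshift-etl"}],
    StartOnCreation=True,
)
```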
Question 8:
A company uses a data lake that is based on an Amazon S3 bucket. To comply with regulations, the company must apply two layers of server-side encryption to files that are uploaded to the S3 bucket. The company wants to use an AWS Lambda function to apply the necessary encryption.
Which solution will meet these requirements?
A. Use both server-side encryption with AWS KMS keys (SSE-KMS) and the Amazon S3 Encryption Client.
B. Use dual-layer server-side encryption with AWS KMS keys (DSSE-KMS).
C. Use server-side encryption with AWS KMS keys (SSE-KMS).
D. Use server-side encryption with customer-provided keys (SSE-C) before files are uploaded.
Correct Answer: B
Explanation: (Available only to Pass4Test members)
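The Lambda handler sketch below applies answer B by re-copying each uploaded object in place with dual-layer server-side encryption (DSSE-KMS), which applies two layers of encryption through a single setting. The KMS key ARN is a placeholder, and the function is assumed to be wired to S3 object-created notifications.

```python
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

# Hypothetical KMS key; replace with the key used for dual-layer encryption.
KMS_KEY_ID = "arn:aws:kms:us-east-1:123456789012:key/example-key-id"


def handler(event, context):
    """Re-encrypt newly uploaded objects with DSSE-KMS (two layers of SSE)."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # S3 event keys are URL-encoded
        s3.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            ServerSideEncryption="aws:kms:dsse",  # dual-layer server-side encryption
            SSEKMSKeyId=KMS_KEY_ID,
            MetadataDirective="COPY",
        )
```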
我那** -
It takes a fairly serious effort, but I think you can pass if you thoroughly work through this one DEA-C01 book.