A marketing company wants to improve its reporting and business intelligence capabilities. During the planning phase, the company interviewed the relevant stakeholders and discovered that:
* The operations team reports are run hourly for the current month's data.
* The sales team wants to use multiple Amazon QuickSight dashboards to show a rolling view of the last 30 days based on several categories.
* The sales team also wants to view the data as soon as it reaches the reporting backend.
* The finance team's reports are run daily for last month's data and once a month for the last 24 months of data.
Currently, there is 400 TB of data in the system with an expected additional 100 TB added every month. The company is looking for a solution that is as cost-effective as possible.
Which solution meets the company's requirements?
A. Store the last 2 months of data in Amazon Redshift and the rest of the months in Amazon S3. Use a long-running Amazon EMR with Apache Spark cluster to query the data as needed. Configure Amazon QuickSight with Amazon EMR as the data source.
B. Store the last 24 months of data in Amazon S3 and query it using Amazon Redshift Spectrum. Configure Amazon QuickSight with Amazon Redshift Spectrum as the data source.
C. Store the last 2 months of data in Amazon Redshift and the rest of the months in Amazon S3. Set up an external schema and table for Amazon Redshift Spectrum. Configure Amazon QuickSight with Amazon Redshift as the data source.
D. Store the last 24 months of data in Amazon Redshift. Configure Amazon QuickSight with Amazon Redshift as the data source.
Correct answer: C
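The cost argument behind option C can be checked with simple arithmetic. The sketch below assumes the stated growth of 100 TB per month is steady and approximates "the last N months" as N times that monthly volume; the function name is illustrative only.

```python
# Rough sizing sketch. Assumption: data arrives at a steady 100 TB per month,
# so "the last N months" is approximated as N * 100 TB.
MONTHLY_GROWTH_TB = 100


def hot_storage_tb(months_in_redshift: int, monthly_tb: int = MONTHLY_GROWTH_TB) -> int:
    """Approximate volume kept on Amazon Redshift cluster storage."""
    return months_in_redshift * monthly_tb


two_months = hot_storage_tb(2)    # options A and C
all_months = hot_storage_tb(24)   # option D

print(f"2 months in Redshift:  {two_months} TB")
print(f"24 months in Redshift: {all_months} TB")
print(f"Ratio: {all_months // two_months}x")
```

Under that assumption, option D keeps twelve times as much data on cluster storage as option C, while the hourly and daily reports only touch the current and previous month.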
Question 2:
A transportation company uses IoT sensors attached to trucks to collect vehicle data for its global delivery fleet. The company currently sends the sensor data in small .csv files to Amazon S3. The files are then loaded into a 10-node Amazon Redshift cluster with two slices per node and queried using both Amazon Athena and Amazon Redshift. The company wants to optimize the files to reduce the cost of querying and also improve the speed of data loading into the Amazon Redshift cluster.
Which solution meets these requirements?
A. Use AWS Glue to convert all the files from .csv to a single large Apache Parquet file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
B. Use AWS Glue to convert the files from .csv to a single large Apache ORC file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
C. Use AWS Glue to convert the files from .csv to Apache Parquet to create 20 Parquet files. COPY the files into Amazon Redshift and query the files with Athena from Amazon S3.
D. Use Amazon EMR to convert each .csv file to Apache Avro. COPY the files into Amazon Redshift and query the file with Athena from Amazon S3.
Correct answer: C
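The figure of 20 files in option C matches the number of slices in the cluster. A minimal sketch of that calculation follows, assuming the common guidance that the number of load files should be a multiple of the slice count so that each slice receives an equal share of the COPY work. The helper name is hypothetical.

```python
def load_file_count(nodes: int, slices_per_node: int, files_per_slice: int = 1) -> int:
    """Return a file count that is a multiple of the cluster's slice count."""
    total_slices = nodes * slices_per_node
    return total_slices * files_per_slice


# Cluster from the question: 10 nodes with two slices per node.
print(load_file_count(nodes=10, slices_per_node=2))  # 20
```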
Question 3:
A company plans to store quarterly financial statements in a dedicated Amazon S3 bucket. The financial statements must not be modified or deleted after they are saved to the S3 bucket.
Which solution will meet these requirements?
A. Create the S3 bucket with S3 Object Lock in governance mode.
B. Create the S3 bucket with S3 Object Lock in compliance mode.
C. Create the S3 bucket with MFA delete enabled.
D. Create S3 buckets in two AWS Regions. Use S3 Cross-Region Replication (CRR) between the buckets.
Correct answer: B
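Options A and B differ only in the Object Lock retention mode. In governance mode, users with the `s3:BypassGovernanceRetention` permission can still overwrite or delete a protected object version; in compliance mode, no user can do so until the retention period expires. The sketch below builds the dictionary shape that boto3's `put_object_lock_configuration` accepts for a default retention rule. The seven-year period and the function name are assumptions for illustration; the question does not state a retention period.

```python
def object_lock_config(mode: str, years: int) -> dict:
    """Build an Object Lock configuration with a default retention rule."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unsupported retention mode: {mode}")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Years": years}},
    }


# Assumed retention period of 7 years (not given in the question).
config = object_lock_config("COMPLIANCE", years=7)
print(config)
```

The resulting dictionary would be passed as the `ObjectLockConfiguration` argument for a bucket that has Object Lock enabled.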
Question 4:
A hospital uses an electronic health records (EHR) system to collect two types of data:
* Patient information, which includes a patient's name and address
* Diagnostic tests conducted and the results of these tests
Patient information is expected to change periodically. Existing diagnostic test data never changes, and only new records are added. The hospital runs an Amazon Redshift cluster with four dc2.large nodes and wants to automate the ingestion of the patient information and diagnostic test data into the respective Amazon Redshift tables for analysis. The EHR system exports data as CSV files to an Amazon S3 bucket on a daily basis. Two sets of CSV files are generated: one set is for patient information with updates, deletes, and inserts; the other set is for new diagnostic test data only.
What is the MOST cost-effective solution to meet these requirements?
A. Use an AWS Lambda function to run a COPY command that appends new diagnostic test data to the diagnostic tests table. Run another COPY command to load the patient information data into the staging tables. Use a stored procedure to handle create, update, and delete operations for the patient information table.
B. Use AWS Database Migration Service (AWS DMS) to collect and process change data capture (CDC) records. Use the COPY command to load patient information data into the staging tables. Use a stored procedure to handle create, update, and delete operations for the patient information table.
C. Use an AWS Glue crawler to catalog the data in Amazon S3. Use Amazon Redshift Spectrum to perform scheduled queries of the data in Amazon S3 and ingest the data into the patient information table and the diagnostic tests table.
D. Use Amazon EMR with Apache Hudi. Run daily ETL jobs using Apache Spark and the Amazon Redshift JDBC driver.
Correct answer: A
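The staging-table pattern named in options A and B amounts to applying a batch of inserts, updates, and deletes to the target table. A minimal in-memory sketch of that merge logic follows; it stands in for the stored procedure and does not touch Amazon Redshift. The record layout (a `patient_id` key and an `op` field with the values `I`, `U`, and `D`) is an assumption, since the question does not describe the CSV format.

```python
from typing import Dict, List


def apply_staged_changes(target: Dict[int, dict], staged: List[dict]) -> Dict[int, dict]:
    """Apply staged rows to a copy of the target, keyed by patient_id."""
    result = dict(target)
    for row in staged:
        key = row["patient_id"]
        if row["op"] == "D":
            result.pop(key, None)  # delete the row if it exists
        else:
            # Inserts and updates are treated alike: the staged row wins.
            result[key] = {"name": row["name"], "address": row["address"]}
    return result


patients = {
    1: {"name": "Ana", "address": "1 Main St"},
    2: {"name": "Bo", "address": "2 Oak Ave"},
}
staged_rows = [
    {"op": "U", "patient_id": 1, "name": "Ana", "address": "9 Elm Rd"},
    {"op": "D", "patient_id": 2, "name": None, "address": None},
    {"op": "I", "patient_id": 3, "name": "Cy", "address": "3 Pine Ln"},
]
merged = apply_staged_changes(patients, staged_rows)
print(merged)
```

The diagnostic test data needs none of this logic, because it is append-only and a plain COPY is enough.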
Question 5:
A company collects and transforms data files from third-party providers by using an on-premises SFTP server. The company uses a Python script to transform the data.
The company wants to reduce the overhead of maintaining the SFTP server and storing large amounts of data on premises. However, the company does not want to change the existing upload process for the third-party providers.
Which solution will meet these requirements with the LEAST development effort?
A. Create an Amazon S3 bucket that includes a separate prefix for each provider. Provide the S3 URL to each provider for its respective prefix. Instruct the providers to use the S3 COPY command to upload data. Configure an AWS Lambda function that transforms the data when new files are uploaded.
B. Use AWS Transfer Family to create an SFTP server that includes a publicly accessible endpoint. Configure the new server to use Amazon S3 storage. Change the server name to match the name of the on-premises SFTP server. Schedule a Python shell job in AWS Glue to use the existing Python script to run periodically and transform the uploaded files.
C. Deploy the Python script on an Amazon EC2 instance. Install a third-party SFTP server on the EC2 instance. Schedule the script to run periodically on the EC2 instance to perform a data transformation on new files. Copy the transformed files to Amazon S3.
D. Use AWS Transfer Family to create an SFTP server that includes a publicly accessible endpoint. Configure the new server to use Amazon S3 storage. Change the server name to match the name of the on-premises SFTP server. Use AWS Data Pipeline to schedule a transient Amazon EMR cluster with an Apache Spark step to periodically transform the files.
Correct answer: B
Question 6:
An education provider's learning management system (LMS) is hosted in a 100 TB data lake that is built on Amazon S3. The provider's LMS supports hundreds of schools. The provider wants to build an advanced analytics reporting platform using Amazon Redshift to handle complex queries with optimal performance. System users will query the most recent 4 months of data 95% of the time while 5% of the queries will leverage data from the previous 12 months.
Which solution meets these requirements in the MOST cost-effective way?
A. Leverage DS2 nodes for the Amazon Redshift cluster. Migrate all data from Amazon S3 to Amazon Redshift. Decommission the data lake.
B. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift federated queries to join cluster data with the data lake to reduce costs. Ensure the S3 Standard storage class is in use with objects in the data lake.
C. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Ensure the S3 Standard storage class is in use with objects in the data lake.
D. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Use S3 lifecycle management rules to store data from the previous 12 months in Amazon S3 Glacier storage.
Correct answer: C
Question 7:
A financial company hosts a data lake in Amazon S3 and a data warehouse on an Amazon Redshift cluster. The company uses Amazon QuickSight to build dashboards and wants to secure access from its on-premises Active Directory to Amazon QuickSight.
How should the data be secured?
A. Establish a secure connection by creating an S3 endpoint to connect Amazon QuickSight and a VPC endpoint to connect to Amazon Redshift.
B. Use a VPC endpoint to connect to Amazon S3 from Amazon QuickSight and an IAM role to authenticate Amazon Redshift.
C. Use an Active Directory connector and single sign-on (SSO) in a corporate network environment.
D. Place Amazon QuickSight and Amazon Redshift in the security group and use an Amazon S3 endpoint to connect Amazon QuickSight to Amazon S3.
Correct answer: C
Question 8:
A company has a mobile app that has millions of users. The company wants to enhance the mobile app by including interactive data visualizations that show user trends.
The data for visualization is stored in a large data lake with 50 million rows. Data that is used in the visualization should be no more than two hours old.
Which solution will meet these requirements with the LEAST operational overhead?
A. Run Amazon Athena queries behind an Amazon API Gateway API. Render the data by using the D3.js JavaScript library.
B. Embed an Amazon QuickSight Enterprise edition dashboard into the mobile app by using the QuickSight Embedding SDK. Refresh data in SPICE hourly.
C. Run an hourly batch process that renders user-specific data visualizations as static images that are stored in Amazon S3.
D. Precompute aggregated data hourly. Store the data in Amazon DynamoDB. Render the data by using the D3.js JavaScript library.
Correct answer: B
Question 9:
A financial services company needs to aggregate daily stock trade data from the exchanges into a data store. The company requires that data be streamed directly into the data store, but also occasionally allows data to be modified using SQL. The solution should integrate complex, analytic queries running with minimal latency. The solution must provide a business intelligence dashboard that enables viewing of the top contributors to anomalies in stock prices.
Which solution meets the company's requirements?
A. Use Amazon Kinesis Data Streams to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.
B. Use Amazon Kinesis Data Streams to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
C. Use Amazon Kinesis Data Firehose to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.
D. Use Amazon Kinesis Data Firehose to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
Correct answer: D