You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?
A. Ingest your data into Cloud SQL, convert your PySpark commands into SparkSQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
B. Convert your PySpark commands into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
C. Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.
D. Use Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery.
Correct Answer: B
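The stated answer has the pipeline run Spark SQL on Dataproc and land the results in BigQuery. Below is a minimal sketch of that pattern using the google-cloud-dataproc client; the project ID, region, cluster name, and query are hypothetical placeholders, not values from the question.

```python
# Hedged sketch: submit a Spark SQL job to an existing Dataproc cluster.
# All identifiers (project, region, cluster, table names) are hypothetical.
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
job = {
    "placement": {"cluster_name": "etl-cluster"},  # assumed to already exist
    "spark_sql_job": {
        "query_list": {
            "queries": [
                # Transformation formerly written in PySpark, now as Spark SQL.
                "CREATE TABLE curated.events AS "
                "SELECT user_id, CAST(ts AS TIMESTAMP) AS ts FROM raw.events"
            ]
        }
    },
}
operation = client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
)
print(operation.result().reference.job_id)
```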
Question 2:
What is the recommended action to switch between SSD and HDD storage for your Google Cloud Bigtable instance?
A. Create a third instance and sync the data from the two storage types via batch jobs.
B. Export the data from the existing instance and import the data into a new instance.
C. Run parallel instances where one is HDD and the other is SSD.
D. The selection is final and you must continue using the same storage type.
Correct Answer: B
Explanation: (Available only to Pass4Test members)
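A cluster's storage type is fixed at creation time, so switching means creating a new instance with the target type and migrating the data into it. A minimal sketch of the first step with the google-cloud-bigtable admin client, using hypothetical instance, cluster, and zone names:

```python
# Hedged sketch: create the replacement instance with HDD storage.
# Instance/cluster IDs and zone are hypothetical.
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)

new_instance = client.instance(
    "analytics-hdd",
    display_name="Analytics (HDD)",
    instance_type=enums.Instance.Type.PRODUCTION,
)
cluster = new_instance.cluster(
    "analytics-hdd-c1",
    location_id="us-central1-b",
    serve_nodes=3,
    default_storage_type=enums.StorageType.HDD,  # the new storage type
)
new_instance.create(clusters=[cluster])
# The data migration itself is done separately, e.g. with the Dataflow
# Bigtable-to-Avro export template and the matching import template.
```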
Question 3:
Your organization has two Google Cloud projects, project A and project B. In project A, you have a Pub/Sub topic that receives data from confidential sources. Only the resources in project A should be able to access the data in that topic. You want to ensure that project B and any future project cannot access data in the project A topic. What should you do?
A. Use Identity and Access Management conditions to ensure that only users and service accounts in project A can access resources in project A.
B. Add firewall rules in project A so only traffic from the VPC in project A is permitted.
C. Configure VPC Service Controls in the organization with a perimeter around the VPC of project A.
D. Configure VPC Service Controls in the organization with a perimeter around project A.
Correct Answer: D
Explanation: (Available only to Pass4Test members)
Question 4:
You are loading CSV files from Cloud Storage to BigQuery. The files have known data quality issues, including mismatched data types, such as STRINGs and INT64s in the same column, and inconsistent formatting of values such as phone numbers or addresses. You need to create the data pipeline to maintain data quality and perform the required cleansing and transformation. What should you do?
A. Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.
B. Use Data Fusion to convert the CSV files to a self-describing data format, such as Avro, before loading the data to BigQuery.
C. Use Data Fusion to transform the data before loading it into BigQuery.
D. Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.
Correct Answer: C
Explanation: (Available only to Pass4Test members)
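Data Fusion itself is a visual tool, so for contrast the staging-table pattern described in option D can be sketched with the google-cloud-bigquery client. All dataset, table, and bucket names below are hypothetical:

```python
# Hedged sketch of option D's staging-table pattern: load raw CSVs as strings,
# then cleanse and cast with SQL into the final table. Names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Load everything as STRING so rows with mixed types do not fail the load.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    schema=[
        bigquery.SchemaField("phone", "STRING"),
        bigquery.SchemaField("amount", "STRING"),
    ],
)
client.load_table_from_uri(
    "gs://my-bucket/raw/*.csv", "my-project.staging.orders", job_config=load_config
).result()

# Cleanse and cast in SQL, writing the results to the destination table.
query_config = bigquery.QueryJobConfig(
    destination="my-project.curated.orders",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(
    "SELECT REGEXP_REPLACE(phone, r'[^0-9]', '') AS phone, "
    "SAFE_CAST(amount AS INT64) AS amount "
    "FROM `my-project.staging.orders`",
    job_config=query_config,
).result()
```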
Question 5:
Your company currently runs a large on-premises cluster using Spark, Hive, and Hadoop Distributed File System (HDFS) in a colocation facility. The cluster is designed to support peak usage on the system; however, many jobs are batch in nature, and usage of the cluster fluctuates quite dramatically.
Your company is eager to move to the cloud to reduce the overhead associated with on-premises infrastructure and maintenance and to benefit from the cost savings. They are also hoping to modernize their existing infrastructure to use more serverless offerings in order to take advantage of the cloud. Because of the timing of their contract renewal with the colocation facility, they have only 2 months for their initial migration. How should you recommend they approach the upcoming migration strategy so they can maximize their cost savings in the cloud while still executing the migration in time?
A. Migrate the workloads to Dataproc plus Cloud Storage, and modernize later.
B. Migrate the workloads to Dataproc plus HDFS, and modernize later.
C. Modernize the Spark workload for Dataflow and the Hive workload for BigQuery.
D. Migrate the Spark workload to Dataproc plus HDFS, and modernize the Hive workload for BigQuery.
Correct Answer: C
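Whichever option is chosen, a recurring detail in Hadoop-to-Google-Cloud migrations is that Dataproc ships with the Cloud Storage connector, so the lift-and-shift path in option A usually amounts to swapping hdfs:// paths for gs:// paths. A minimal PySpark sketch with hypothetical bucket and path names:

```python
# Hedged sketch: on Dataproc the Cloud Storage connector lets Spark read and
# write gs:// paths directly, so migrating off HDFS is often just a path change.
# Bucket and directory names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-gcs-lift").getOrCreate()

# Before (on-premises): spark.read.parquet("hdfs:///data/events")
events = spark.read.parquet("gs://my-bucket/data/events")

daily = events.groupBy("event_date").count()
daily.write.mode("overwrite").parquet("gs://my-bucket/reports/daily_counts")
```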
Question 6:
You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO.
You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to a connection failure. You verified that VPC Service Controls and Shared VPC are not in use in these projects.
You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?
A. Turn off the external IP addresses on the Dataflow worker. Enable Cloud NAT in Project A.
B. Set up VPC Network Peering between Project A and Project B. Add a firewall rule to allow the peered subnet range to access all instances on the network.
C. Add the external IP addresses of the Dataflow worker as authorized networks in the Cloud SQL instance.
D. Set up VPC Network Peering between Project A and Project B. Create a Compute Engine instance without external IP address in Project B on the peered subnet to serve as a proxy server to the Cloud SQL database.
Correct Answer: D
Explanation: (Available only to Pass4Test members)
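With the peering and proxy VM from the stated answer in place, the pipeline connects to the proxy's internal IP and the workers run without external IPs. A sketch using the Beam Python ReadFromJdbc cross-language transform; the project, subnet, IP, table, and credentials are all hypothetical:

```python
# Hedged sketch: read through a proxy VM fronting the private Cloud SQL
# instance, with Dataflow workers kept off the public internet.
import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="project-a",
    region="us-central1",
    subnetwork="regions/us-central1/subnetworks/peered-subnet",
    use_public_ips=False,  # workers use internal IPs only
)

with beam.Pipeline(options=options) as p:
    rows = p | ReadFromJdbc(
        table_name="orders",
        driver_class_name="com.mysql.cj.jdbc.Driver",
        jdbc_url="jdbc:mysql://10.8.0.5:3306/sales",  # proxy VM's internal IP
        username="pipeline",
        password="REDACTED",  # placeholder; use a secret manager in practice
    )
```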
Question 7:
Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encrypted, and compliant with major data protection standards. In addition to using Cloud Data Loss Prevention (Cloud DLP), you want to follow Google-recommended practices and use service accounts to control access to PII. What should you do?
A. Assign the required Identity and Access Management (IAM) roles to every employee, and create a single service account to access protected resources.
B. Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group.
C. Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users.
D. Use one service account to access a Cloud SQL database, and use separate service accounts for each human user.
Correct Answer: B
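The stated answer pairs Cloud Storage with per-group service accounts. As a sketch of that pattern, each group's service account gets only the bucket role it needs; the bucket name, service account emails, and role choices below are hypothetical:

```python
# Hedged sketch: grant each group's service account least-privilege access
# to a Cloud Storage bucket holding PII. All names are hypothetical.
from google.cloud import storage

client = storage.Client(project="my-project")
bucket = client.bucket("pii-bucket")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",  # read-only for the analyst group
        "members": {"serviceAccount:analysts-sa@my-project.iam.gserviceaccount.com"},
    }
)
policy.bindings.append(
    {
        "role": "roles/storage.objectAdmin",  # write access for the ETL group
        "members": {"serviceAccount:etl-sa@my-project.iam.gserviceaccount.com"},
    }
)
bucket.set_iam_policy(policy)
```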
Question 8:
You recently deployed several data processing jobs into your Cloud Composer 2 environment. You notice that some tasks are failing in Apache Airflow. On the monitoring dashboard, you see an increase in the total workers' memory usage, and there have been worker pod evictions. You need to resolve these errors. What should you do?
Choose 2 answers
A. Increase the memory available to the Airflow workers.
B. Increase the Cloud Composer 2 environment size from medium to large.
C. Increase the directed acyclic graph (DAG) file parsing interval.
D. Increase the maximum number of workers and reduce worker concurrency.
E. Increase the memory available to the Airflow triggerer.
Correct Answer: A, D
Explanation: (Available only to Pass4Test members)
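Answer A (more memory per worker) and answer D (more workers, each doing less concurrent work) both map onto the environment's workloads configuration in Cloud Composer 2. A sketch using the google-cloud-orchestration-airflow client; the environment path and resource sizes are hypothetical, and the worker-concurrency reduction itself is an Airflow config override set separately:

```python
# Hedged sketch: raise worker memory and the worker count ceiling on a
# Cloud Composer 2 environment. Path and sizes are hypothetical.
from google.cloud.orchestration.airflow import service_v1

client = service_v1.EnvironmentsClient()

environment = service_v1.Environment(
    name="projects/my-project/locations/us-central1/environments/my-composer",
    config=service_v1.EnvironmentConfig(
        workloads_config=service_v1.WorkloadsConfig(
            worker=service_v1.WorkloadsConfig.WorkerResource(
                memory_gb=8,   # more memory per worker (answer A)
                min_count=2,
                max_count=6,   # allow more workers overall (answer D)
            )
        )
    ),
)
operation = client.update_environment(
    name=environment.name,
    environment=environment,
    update_mask={"paths": ["config.workloads_config.worker"]},
)
operation.result()
# Lowering per-worker concurrency is done via the Airflow override
# celery.worker_concurrency, configured separately on the environment.
```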
Question 9:
You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?
A. Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.
B. Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.
C. Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.
D. Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.
Correct Answer: D
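The stated answer puts calibration at the front of every transform job so it can never be skipped. A minimal Python sketch of that idea, with a hypothetical linear calibration model and field names:

```python
# Hedged sketch: each transform job calibrates the raw reading first, then
# applies its original logic. The calibration model is hypothetical.
def calibrate(reading, sensor):
    # Simple linear correction per sensor: gain and offset from a lookup table.
    return reading * sensor["gain"] + sensor["offset"]

def transform_mapper(record, sensors):
    """First step of every transform job: calibrate, then transform."""
    sensor = sensors[record["sensor_id"]]
    record["value"] = calibrate(record["value"], sensor)
    # ...the job's original transformation logic follows here...
    return record

# Example: one seismic sample through the calibrated mapper.
sensors = {"s1": {"gain": 1.02, "offset": -0.5}}
print(transform_mapper({"sensor_id": "s1", "value": 10.0}, sensors))
# value: 10.0 * 1.02 - 0.5 = 9.7
```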
伴都** -
For settling in to study properly, the app format is easier to absorb, so I'm glad it's available. My chances of passing look good.