A data engineer is tasked with processing a large dataset of customer orders using Snowpark Python. The dataset contains an order timestamp column stored as a string in 'YYYY-MM-DD HH:MI:SS' format. They need to create a new DataFrame containing only the orders placed in January 2023. Which of the following code snippets achieves this most efficiently, considering the data volume and query performance?
A.

B.

C.

D.

E.

Correct answer: B
Explanation: (Only available to Pass4Test members)
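Since the answer snippets are not reproduced above, the following is only an illustrative sketch, assuming an existing Snowpark session named session and an ORDERS table with a string column ORDER_TS (both names are assumptions): parse the string once with TO_TIMESTAMP and filter on a half-open timestamp range rather than extracting year and month per row.

from snowflake.snowpark.functions import col, lit, to_timestamp

# Table and column names (ORDERS, ORDER_TS) are assumed for illustration.
orders = session.table("ORDERS")

jan_2023_orders = (
    orders
    # Parse the string column once into a proper timestamp.
    .with_column("ORDER_TIMESTAMP", to_timestamp(col("ORDER_TS"), lit("YYYY-MM-DD HH:MI:SS")))
    # Half-open range keeps the predicate a simple range comparison.
    .filter(
        (col("ORDER_TIMESTAMP") >= "2023-01-01 00:00:00")
        & (col("ORDER_TIMESTAMP") < "2023-02-01 00:00:00")
    )
)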
Question 2:
You need to create a UDF in Snowflake to perform complex data validation. This UDF must access an external API to retrieve validation rules based on the input data. You want to ensure that sensitive API keys are not exposed within the UDF's code and that the external API call is made securely. Which of the following approaches is the MOST secure and appropriate for this scenario?
A. Hardcode the API key directly into the UDF's JavaScript code, obfuscating it with base64 encoding.
B. Use a Snowflake Secret to securely store the API key. Retrieve the secret within the UDF using the SYSTEM$GET_SECRET function, and use SECURITY INVOKER with caution or define the UDF as SECURITY DEFINER with appropriate role-based access controls.
C. Store the API key as an environment variable within the UDF's JavaScript code. Snowflake automatically encrypts environment variables for security.
D. Store the API key in a Snowflake table with strict access controls, and retrieve it within the UDF using a SELECT statement. Use SECURITY INVOKER to ensure the UDF uses the caller's privileges when accessing the table.
E. Pass the API key as an argument to the UDF when it is called. Rely on the caller to provide the correct key and keep it secure.
Correct answer: B
Explanation: (Only available to Pass4Test members)
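For context, a hedged sketch of the general idea behind option B (keep the key in a Snowflake Secret and resolve it at run time instead of embedding it in the UDF body), shown here with a Python handler because the secret API is exposed to Python handlers via the _snowflake module. It assumes an admin has already created a network rule, an external access integration named rules_api_integration, and a secret named rules_api_key; the function name and URL are also assumptions.

session.sql("""
CREATE OR REPLACE FUNCTION validate_order(payload STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
HANDLER = 'validate'
PACKAGES = ('requests')
EXTERNAL_ACCESS_INTEGRATIONS = (rules_api_integration)
SECRETS = ('api_key' = rules_api_key)
AS
$$
import _snowflake
import requests

def validate(payload):
    # The key never appears in the UDF source; it is resolved from the SECRET object.
    key = _snowflake.get_generic_secret_string('api_key')
    resp = requests.get(
        'https://rules.example.com/v1/rules',
        headers={'Authorization': f'Bearer {key}'},
        timeout=10,
    )
    return resp.text
$$
""").collect()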
Question 3:
You have implemented a Snowpipe using auto-ingest to load data from an AWS S3 bucket. The pipe is configured to load data into a table with a DATE column (TRANSACTION_DATE). The data files in S3 contain a date field in the format 'YYYYMMDD'. Occasionally, you observe data loading failures in Snowpipe with an error message indicating an issue converting the string to a date. The FILE FORMAT definition includes DATE_FORMAT = 'YYYYMMDD'. Furthermore, you also notice that after a while, some files are not being ingested even though they are present in the S3 bucket. How can you effectively diagnose and resolve these issues?
A. The issue may arise if the time zone of the Snowflake account does not match the time zone of your data in AWS S3. Try setting the TIMEZONE parameter in the FILE FORMAT definition. For files that are not being ingested, manually refresh the Snowpipe with 'ALTER PIPE ... REFRESH'.
B. The error could be due to invalid characters in the source data files. Implement data cleansing steps to remove invalid characters from the date fields before uploading to S3. For files not being ingested, check S3 event notifications for missing or failed events.
C. The DATE_FORMAT parameter is case-sensitive. Ensure it matches the case of the incoming data. Also, check the VALIDATION_MODE and ON_ERROR parameters to ensure error handling is appropriately configured for files with date format errors. For the files that are not ingested, use SYSTEM$PIPE_STATUS to find the cause of the issue.
D. Snowflake's auto-ingest feature has limitations and may not be suitable for inconsistent data formats. Consider using the Snowpipe REST API to implement custom error handling and data validation logic. Monitor the Snowflake event queue to ensure events are being received.
E. Verify that the DATE_FORMAT is correct and that all files consistently adhere to this format. Check for corrupted files in S3 that may be preventing Snowpipe from processing subsequent files. Additionally, review the Snowpipe error notifications in Snowflake to identify the root cause of ingestion failures. Use SYSTEM$PIPE_STATUS to troubleshoot the files that were not ingested.
Correct answer: C, E
Explanation: (Only available to Pass4Test members)
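A minimal sketch of the troubleshooting calls mentioned in C and E, assuming a Snowpark session and assumed object names (ORDERS_PIPE, ORDERS): SYSTEM$PIPE_STATUS reports whether the pipe is still running and receiving S3 event notifications, and the INFORMATION_SCHEMA.COPY_HISTORY table function surfaces per-file load errors such as the date-conversion message.

# 1. Is the pipe running and still receiving S3 event notifications?
session.sql("SELECT SYSTEM$PIPE_STATUS('MY_DB.MY_SCHEMA.ORDERS_PIPE')").show()

# 2. Which files failed, and with what error message?
session.sql("""
    SELECT file_name, status, first_error_message, last_load_time
    FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
        TABLE_NAME => 'ORDERS',
        START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())))
    WHERE status != 'Loaded'
    ORDER BY last_load_time DESC
""").show()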
Question 4:
A data engineer is facing performance issues with a complex analytical query in Snowflake. The query joins several large tables and uses multiple window functions. The query profile indicates that a significant amount of time is spent in the 'Remote Spill' stage. This means the data from one of the query stages is spilling to the remote disk. What are the possible root causes for 'Remote Spill' and what steps can be taken to mitigate this issue? Select two options.
A. The data being queried is stored in a non-Snowflake database, making it difficult to optimize the join.
B. The window functions are operating on large partitions of data, exceeding the available memory on the compute nodes. Try to reduce the partition size by pre-aggregating the data or using filtering before applying the window functions.
C. The 'Remote Spill' indicates network latency issues between compute nodes. There is nothing the data engineer can do to fix this; it is an infrastructure issue.
D. The virtual warehouse is not appropriately sized for the volume of data and complexity of the query. Increasing the virtual warehouse size might provide sufficient memory to avoid spilling.
E. The query is using a non-optimal join strategy. Review the query profile and consider using join hints to force a different join order or algorithm.
Correct answer: B, D
Explanation: (Only available to Pass4Test members)
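A short Snowpark Python sketch of the two chosen mitigations, with assumed table, column, and warehouse names (ORDERS, ORDER_DATE, CUSTOMER_ID, ANALYTICS_WH): filter before the window function so each partition fits in memory (option B), and resize the warehouse if that alone is not enough (option D).

from snowflake.snowpark import Window
from snowflake.snowpark.functions import col, row_number

orders = session.table("ORDERS")  # assumed table

# Option B: reduce the data fed into the window function first ...
recent = orders.filter(col("ORDER_DATE") >= "2023-01-01")

# ... then rank within the (now smaller) partitions.
w = Window.partition_by(col("CUSTOMER_ID")).order_by(col("ORDER_DATE").desc())
ranked = recent.with_column("ORDER_RANK", row_number().over(w))

# Option D: give the query more memory per node (warehouse name assumed).
session.sql("ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'LARGE'").collect()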
Question 5:
A data warehousing team is experiencing inconsistent query performance on a large fact table (SALES_FACT) that is updated daily. Some queries involving complex joins and aggregations take significantly longer to execute than others, even when run with the same virtual warehouse size. You suspect that the query result cache is not being effectively utilized due to variations in query syntax and the dynamic nature of the data. Which of the following strategies could you implement to maximize the effectiveness of the query result cache and improve query performance consistency? Assume the virtual warehouse size is large and the data is skewed across days.
A. Create a separate virtual warehouse specifically for running these queries. This will isolate the cache and prevent it from being invalidated by other queries.
B. Implement query tagging to standardize query syntax. By applying consistent tags to queries, you can ensure that similar queries are recognized as identical and reuse cached results.
C. Implement a data masking policy on the 'SALES_FACT table. Data masking will reduce the size of the data that needs to be cached, improving cache utilization.
D. Optimize the 'SALES_FACT table by clustering it on the most frequently used filter columns and enabling automatic clustering. This will improve data locality and reduce the amount of data that needs to be scanned.
E. Use stored procedures with parameters to encapsulate the queries. This will ensure that the query syntax is consistent, regardless of the specific parameters used.
Correct answer: D, E
Explanation: (Only available to Pass4Test members)
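As a sketch of option D, assuming SALE_DATE and REGION are the most frequently used filter columns (an assumption, not given in the question): define the clustering key, make sure Automatic Clustering is running, and check clustering health.

# Assumed filter columns: SALE_DATE, REGION.
session.sql("ALTER TABLE SALES_FACT CLUSTER BY (SALE_DATE, REGION)").collect()

# Automatic Clustering is enabled once a clustering key is defined, but it can be
# resumed explicitly if it was previously suspended.
session.sql("ALTER TABLE SALES_FACT RESUME RECLUSTER").collect()

# Check how well the table is clustered on those columns.
session.sql(
    "SELECT SYSTEM$CLUSTERING_INFORMATION('SALES_FACT', '(SALE_DATE, REGION)')"
).show()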
Question 6:
You are using the Snowflake Spark connector to update records in a Snowflake table based on data from a Spark DataFrame. The Snowflake table 'CUSTOMER' has columns 'CUSTOMER_ID' (primary key), 'NAME', and 'ADDRESS'. You have a Spark DataFrame with updated 'NAME' and 'ADDRESS' values for some customers. To optimize performance and minimize data transfer, which of the following strategies can you combine with a temporary staging table to perform an efficient update?
A. Broadcast the Spark DataFrame to all executor nodes, then use a UDF to execute the 'UPDATE' statement for each row directly from Spark.
B. Use Spark's foreachPartition to batch UPDATE statements and execute them per partition. This helps with efficient data transfer and avoids single-row updates.
C. Write the Spark DataFrame to a temporary staging table in Snowflake, then run a MERGE statement whose WHEN MATCHED clause updates the target table from the staging table, and finally drop the staging table.
D. Iterate through each row in the Spark DataFrame and execute an individual UPDATE statement against the 'CUSTOMER' table in Snowflake, using the 'CUSTOMER_ID' in the WHERE clause.
E. Write the Spark DataFrame to a temporary table in Snowflake. Then, execute an UPDATE statement in Snowflake joining the temporary table with the 'CUSTOMER' table on 'CUSTOMER_ID' to update the 'NAME' and 'ADDRESS' columns. Finally, drop the temporary table.
Correct answer: C, E
Explanation: (Only available to Pass4Test members)
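A hedged sketch of the staging-table pattern from C/E: the Spark connector does one bulk write into a staging table, and the set-based MERGE then runs inside Snowflake (here issued through the Snowflake Python connector, one of several ways to submit the SQL). updates_df, the staging table name CUSTOMER_UPDATES_STG, and all connection values are placeholders.

import snowflake.connector

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>", "sfPassword": "<password>",
    "sfDatabase": "SALES", "sfSchema": "PUBLIC", "sfWarehouse": "ETL_WH",
}

# 1. Bulk-write the updated rows from Spark into a staging table
#    (a single bulk load instead of row-by-row UPDATEs).
(updates_df.write.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "CUSTOMER_UPDATES_STG")
    .mode("overwrite")
    .save())

# 2. Apply all updates inside Snowflake in a single MERGE, then clean up.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    database="SALES", schema="PUBLIC", warehouse="ETL_WH",
)
conn.cursor().execute("""
    MERGE INTO CUSTOMER AS t
    USING CUSTOMER_UPDATES_STG AS s
      ON t.CUSTOMER_ID = s.CUSTOMER_ID
    WHEN MATCHED THEN UPDATE SET t.NAME = s.NAME, t.ADDRESS = s.ADDRESS
""")
conn.cursor().execute("DROP TABLE IF EXISTS CUSTOMER_UPDATES_STG")
conn.close()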
Question 7:
You are responsible for monitoring the performance of a Snowflake data pipeline that loads data from S3 into a Snowflake table named 'SALES_DATA'. You notice that the COPY INTO command consistently takes longer than expected. You want to implement telemetry to proactively identify the root cause of the performance degradation. Which of the following methods, used together, provide the MOST comprehensive telemetry data for troubleshooting the COPY INTO performance?
A. Query the COPY_HISTORY view and the corresponding view in ACCOUNT_USAGE. Also, check the S3 bucket for throttling errors.
B. Query the LOAD_HISTORY function and monitor the network latency between S3 and Snowflake using an external monitoring tool.
C. Use Snowflake's partner connect integrations to monitor the virtual warehouse resource consumption and query the VALIDATE function to ensure data quality before loading.
D. Query the COPY_HISTORY view in the INFORMATION_SCHEMA and monitor CPU utilization of the virtual warehouse using the Snowflake web UI.
E. Query the COPY_HISTORY view in the INFORMATION_SCHEMA and enable Snowflake's query profiling for the COPY INTO statement.
Correct answer: A, E
Explanation: (Only available to Pass4Test members)
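A sketch of the history queries behind option A, assuming the target table is SALES_DATA and a Snowpark session is available: ACCOUNT_USAGE.COPY_HISTORY gives per-file load outcomes and sizes over time, and ACCOUNT_USAGE.QUERY_HISTORY shows how long each COPY INTO statement ran and on which warehouse.

# Per-file load history for the table (ACCOUNT_USAGE views can lag by up to ~2 hours).
session.sql("""
    SELECT file_name, file_size, row_count, status, first_error_message, last_load_time
    FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
    WHERE table_name = 'SALES_DATA'
      AND last_load_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY last_load_time DESC
""").show()

# Duration and warehouse for each COPY INTO statement against the table.
session.sql("""
    SELECT query_id, warehouse_name, total_elapsed_time, bytes_scanned, start_time
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE query_text ILIKE 'COPY INTO SALES_DATA%'
      AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY total_elapsed_time DESC
""").show()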
Question 8:
You are developing a JavaScript UDF in Snowflake to perform complex data validation on incoming data. The UDF needs to validate multiple fields against different criteria, including checking for null values, data type validation, and range checks. Furthermore, you need to return a JSON object containing the validation results for each field, indicating whether each field is valid or not and providing an error message if invalid. Which approach is the MOST efficient and maintainable way to structure your JavaScript UDF to achieve this?
A. Define a JavaScript object containing validation rules and corresponding validation functions. Iterate through the object and apply the rules to the input data, collecting the validation results in a JSON object. This object is returned as a string.
B. Utilize a JavaScript library like Lodash or Underscore.js within the UDF to perform data manipulation and validation. Return a JSON string containing the validation results.
C. Use a single, monolithic JavaScript function with nested if-else statements to handle all validation logic. Return a JSON string containing the validation results.
D. Create separate JavaScript functions for each validation check (e.g., 'isNull', 'isValidType', 'isWithinRange'). Call these functions from the main UDF and aggregate the results into a JSON object.
E. Directly embed SQL queries within the JavaScript UDF to perform data validation checks using Snowflake's built-in functions. Return a JSON string containing the validation results.
Correct answer: A
Explanation: (Only available to Pass4Test members)
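The chosen answer (A) describes a data-driven rules object. The same pattern is shown below as a Python sketch rather than JavaScript, purely for illustration; the field names and rules are invented. The rules live in one mapping, a single loop applies them, and the result is returned as a JSON string.

import json

# Validation rules: each field maps to (check, error message). Illustrative only.
RULES = {
    "customer_id": (lambda v: v is not None, "customer_id is required"),
    "quantity":    (lambda v: isinstance(v, int) and 1 <= v <= 10_000,
                    "quantity must be an integer between 1 and 10000"),
    "email":       (lambda v: isinstance(v, str) and "@" in v,
                    "email must contain '@'"),
}

def validate(record: dict) -> str:
    """Apply every rule and return a JSON document of per-field results."""
    results = {}
    for field, (check, message) in RULES.items():
        ok = bool(check(record.get(field)))
        results[field] = {"valid": ok, "error": None if ok else message}
    return json.dumps(results)

# Example: validate({"customer_id": 7, "quantity": 0, "email": "a@b.com"})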
Question 9:
You are configuring cross-cloud replication for a Snowflake database named 'SALES_DB' from an AWS (us-east-1) account to an Azure (eastus) account. You have already set up the necessary network policies and security integrations. However, replication is failing with the following error: 'Replication of database SALES_DB failed due to insufficient privileges on object SALES_DB.PUBLIC.ORDERS.' What is the MOST LIKELY cause of this issue, and how would you resolve it? (Assume the replication group and target database exist.)
A. The target Azure account does not have sufficient storage capacity. Increase the storage quota for the Azure account.
B. The network policy is blocking access to the ORDERS table. Update the network policy to allow access to the ORDERS table.
C. The replication group is missing the 'ORDERS' table. Alter the replication group to include the 'ORDERS' table: 'ALTER REPLICATION GROUP ADD DATABASE SALES_DB;'
D. The replication group does not have the necessary permissions to access the 'ORDERS' table in the AWS account. Grant the OWNERSHIP privilege on the 'ORDERS' table to the replication group: 'GRANT OWNERSHIP ON TABLE SALES_DB.PUBLIC.ORDERS TO REPLICATION GROUP'
E. The user account performing the replication does not have the 'ACCOUNTADMIN' role in the AWS account. Grant the 'ACCOUNTADMIN' role to the user.
Correct answer: D
Explanation: (Only available to Pass4Test members)