Professional-Data-Engineer Exam Dumps - Google Cloud Certified Questions and Answers

Question # 44

You want to optimize your queries for cost and performance. How should you structure your data?

Options:

Partition table data by create_date, location_id and device_version

Partition table data by create_date cluster table data by location_Id and device_version

Cluster table data by create_date location_id and device_version

Cluster table data by create_date partition by locationed and device_version

Buy Now

Question # 45

You work for a manufacturing company that sources up to 750 different components, each from a different supplier. You’ve collected a labeled dataset that has on average 1000 examples for each unique component. Your team wants to implement an app to help warehouse workers recognize incoming components based on a photo of the component. You want to implement the first working version of this app (as Proof-Of-Concept) within a few working days. What should you do?

Options:

Use Cloud Vision AutoML with the existing dataset.

Use Cloud Vision AutoML, but reduce your dataset twice.

Use Cloud Vision API by providing custom labels as recognition hints.

Train your own image recognition model leveraging transfer learning techniques.

Buy Now

Question # 46

You are a BigQuery admin supporting a team of data consumers who run ad hoc queries and downstream reporting in tools such as Looker. All data and users are combined under a single organizational project. You recently noticed some slowness in query results and want to troubleshoot where the slowdowns are occurring. You think that there might be some job queuing or slot contention occurring as users run jobs, which slows down access to results. You need to investigate the query job information and determine where performance is being affected. What should you do?

Options:

Use Cloud Monitoring to view BigQuery metrics and set up alerts that let you know when a certain percentage of slots were used.

Use slot reservations for your project to ensure that you have enough query processing capacity and are able to allocate available slots to the slower queries.

Use Cloud Logging to determine if any users or downstream consumers are changing or deleting access grants on tagged resources.

Use available administrative resource charts to determine how slots are being used and how jobs are performing over time. Run a query on the INFORMATION_SCHEMA to review query performance.

Buy Now

Answer:

Explanation:

To troubleshoot query performance issues related to job queuing or slot contention in BigQuery, using administrative resource charts along with querying the INFORMATION_SCHEMA is the best approach. Here’s why option D is the best choice:

Administrative Resource Charts:

BigQuery provides detailed resource charts that show slot usage and job performance over time. These charts help identify patterns of slot contention and peak usage times.

INFORMATION_SCHEMA Queries:

The INFORMATION_SCHEMA tables in BigQuery provide detailed metadata about query jobs, including execution times, slots consumed, and other performance metrics.

Running queries on INFORMATION_SCHEMA allows you to pinpoint specific jobs causing contention and analyze their performance characteristics.

Comprehensive Analysis:

Combining administrative resource charts with detailed queries on INFORMATION_SCHEMA provides a holistic view of the system’s performance.

This approach enables you to identify and address the root causes of performance issues, whether they are due to slot contention, inefficient queries, or other factors.

Steps to Implement:

Access Administrative Resource Charts:

Use the Google Cloud Console to view BigQuery’s administrative resource charts. These charts provide insights into slot utilization and job performance metrics over time.

Run INFORMATION_SCHEMA Queries:

Execute queries on BigQuery’s INFORMATION_SCHEMA to gather detailed information about job performance. For example:

SELECT

creation_time,

job_id,

user_email,

query,

total_slot_ms / 1000 AS slot_seconds,

total_bytes_processed / (1024 * 1024 * 1024) AS processed_gb,

total_bytes_billed / (1024 * 1024 * 1024) AS billed_gb

FROM

`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT

WHERE

creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)

AND state = 'DONE'

ORDER BY

slot_seconds DESC

LIMIT 100;

Analyze and Optimize:

Use the information gathered to identify bottlenecks, optimize queries, and adjust resource allocations as needed to improve performance.

Reference Links:

Monitoring BigQuery Slots

BigQuery INFORMATION_SCHEMA

BigQuery Performance Best Practices

Question # 47

You are deploying MariaDB SQL databases on GCE VM Instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk IO and replication status from MariaDB with minimal development effort and use StackDriver for dashboards and alerts.

What should you do?

Options:

Install the OpenCensus Agent and create a custom metric collection application with a StackDriver exporter.

Place the MariaDB instances in an Instance Group with a Health Check.

Install the StackDriver Logging Agent and configure fluentd in_tail plugin to read MariaDB logs.

Install the StackDriver Agent and configure the MySQL plugin.

Buy Now

Question # 48

You have a table that contains millions of rows of sales data, partitioned by date Various applications and users query this data many times a minute. The query requires aggregating values by using avg. max. and sum, and does not require joining to other tables. The required aggregations are only computed over the past year of data, though you need to retain full historical data in the base tables You want to ensure that the query results always include the latest data from the tables, while also reducing computation cost, maintenance overhead, and duration. What should you do?

Options:

Create a materialized view to aggregate the base table data Configure a partition expiration on the base table to retain only the last one year of partitions.

Create a materialized view to aggregate the base table data include a filter clause to specify the last one year of partitions.

Create a new table that aggregates the base table data include a filter clause to specify the last year of partitions. Set up a scheduled query to recreate the new table every hour.

Create a view to aggregate the base table data Include a filter clause to specify the last year of partitions.

Buy Now

Question # 49

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products in their projects. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

Options:

Enable data access logs in each Data Analyst’s project. Restrict access to Stackdriver Logging via Cloud IAM roles.

Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts’ projects. Restrict access to the Cloud Storage bucket.

Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created projects for audit logs. Restrict access to the project with the exported logs.

Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.

Buy Now

Question # 50

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub

streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during their campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for the all streaming insert. What is the most likely cause of this problem?

Options:

They have not assigned the timestamp, which causes the job to fail

They have not set the triggers to accommodate the data coming in late, which causes the job to fail

They have not applied a global windowing function, which causes the job to fail when the pipeline is

created

They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created

Buy Now

Question # 51

You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?

Options:

Modify the transformMapReduce jobs to apply sensor calibration before they do anything else.

Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.

Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.

Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.

Buy Now

Question # 52

Your car factory is pushing machine measurements as messages into a Pub/Sub topic in your Google Cloud project. A Dataflow streaming job. that you wrote with the Apache Beam SDK, reads these messages, sends acknowledgment lo Pub/Sub. applies some custom business logic in a Doffs instance, and writes the result to BigQuery. You want to ensure that if your business logic fails on a message, the message will be sent to a Pub/Sub topic that you want to monitor for alerting purposes. What should you do?

Options:

Use an exception handling block in your Data Flow’s Doffs code to push the messages that failed to be transformed through a side output

and to a new Pub/Sub topic. Use Cloud Monitoring to monitor the topic/num_jnacked_messages_by_region metric on this new topic.

Enable retaining of acknowledged messages in your Pub/Sub pull subscription. Use Cloud Monitoring to monitor the

subscription/num_retained_acked_messages metric on this subscription.

Enable dead lettering in your Pub/Sub pull subscription, and specify a new Pub/Sub topic as the dead letter topic. Use Cloud Monitoring to

monitor the subscription/dead_letter_message_count metric on your pull subscription.

Create a snapshot of your Pub/Sub pull subscription. Use Cloud Monitoring to monitor the snapshot/numessages metric on this

snapshot.

Buy Now

Question # 53

You are testing a Dataflow pipeline to ingest and transform text files. The files are compressed gzip, errors are written to a dead-letter queue, and you are using Sidelnputs to join data You noticed that the pipeline is taking longer to complete than expected, what should you do to expedite the Dataflow job?

Options:

Switch to compressed Avro files

Reduce the batch size

Retry records that throw an error

Use CoGroupByKey instead of the Sidelnput

Buy Now

Exam Code: Professional-Data-Engineer

Exam Name: Google Professional Data Engineer Exam

Last Update: Apr 25, 2025

Questions: 374

Professional-Data-Engineer PDF

$34 ~~$84.99~~

Add to Cart

Professional-Data-Engineer Testing Engine

$38 ~~$94.99~~

Add to Cart

Professional-Data-Engineer PDF + Testing Engine

$54 ~~$134.99~~

Add to Cart

Summer Limited Time 60% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: dealsixty

certsboard certification exams

Navigation:

Professional-Data-Engineer Exam Dumps - Google Cloud Certified Questions and Answers

Options:

Answer:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Professional-Data-Engineer PDF

Professional-Data-Engineer Testing Engine

Professional-Data-Engineer PDF + Testing Engine

Quick Links

Recently New Released Certification Exams

Site Secure