
Professional-Data-Engineer Exam Dumps - Google Cloud Certified Questions and Answers

Question # 44

You want to optimize your queries for cost and performance. How should you structure your data?

Options:

A.

Partition table data by create_date, location_id, and device_version

B.

Partition table data by create_date; cluster table data by location_id and device_version

C.

Cluster table data by create_date, location_id, and device_version

D.

Cluster table data by create_date; partition by location_id and device_version
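For reference, the partition-plus-cluster layout these options describe can be expressed directly in BigQuery DDL. A minimal sketch using the BigQuery Python client; the dataset, table, and column types are assumptions for illustration:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Hypothetical table: partition on the date column, then cluster on the
    # columns most often used in filters so partition and block pruning apply.
    ddl = """
    CREATE TABLE IF NOT EXISTS mydataset.device_events (
      create_date    DATE,
      location_id    STRING,
      device_version STRING,
      reading        FLOAT64
    )
    PARTITION BY create_date
    CLUSTER BY location_id, device_version
    """
    client.query(ddl).result()

Queries filtering on create_date then scan only matching partitions, and filters on location_id and device_version benefit from clustering within each partition.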

Question # 45

You work for a manufacturing company that sources up to 750 different components, each from a different supplier. You've collected a labeled dataset that has on average 1,000 examples for each unique component. Your team wants to implement an app to help warehouse workers recognize incoming components based on a photo of the component. You want to implement the first working version of this app (as a proof of concept) within a few working days. What should you do?

Options:

A.

Use Cloud Vision AutoML with the existing dataset.

B.

Use Cloud Vision AutoML, but reduce the size of your dataset by half.

C.

Use Cloud Vision API by providing custom labels as recognition hints.

D.

Train your own image recognition model leveraging transfer learning techniques.
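As background for the AutoML options, training an image classification model on an already-imported labeled dataset takes only a few client calls. A hedged sketch with the legacy AutoML Vision Python client (google-cloud-automl); the project, location, dataset ID, and training budget are assumptions:

    from google.cloud import automl

    client = automl.AutoMlClient()
    parent = "projects/my-project/locations/us-central1"  # hypothetical project

    model = automl.Model(
        display_name="component_classifier",   # hypothetical model name
        dataset_id="ICN1234567890",            # hypothetical dataset ID
        image_classification_model_metadata=automl.ImageClassificationModelMetadata(
            train_budget_milli_node_hours=8000,  # roughly 8 node hours
        ),
    )
    operation = client.create_model(parent=parent, model=model)
    print("Training operation:", operation.operation.name)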

Question # 46

You are a BigQuery admin supporting a team of data consumers who run ad hoc queries and downstream reporting in tools such as Looker. All data and users are combined under a single organizational project. You recently noticed some slowness in query results and want to troubleshoot where the slowdowns are occurring. You think that there might be some job queuing or slot contention occurring as users run jobs, which slows down access to results. You need to investigate the query job information and determine where performance is being affected. What should you do?

Options:

A.

Use Cloud Monitoring to view BigQuery metrics and set up alerts that let you know when a certain percentage of slots is used.

B.

Use slot reservations for your project to ensure that you have enough query processing capacity and are able to allocate available slots to the slower queries.

C.

Use Cloud Logging to determine if any users or downstream consumers are changing or deleting access grants on tagged resources.

D.

Use available administrative resource charts to determine how slots are being used and how jobs are performing over time. Run a query on the INFORMATION_SCHEMA to review query performance.
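To make the INFORMATION_SCHEMA approach in option D concrete, a query like the sketch below surfaces slot consumption and queuing delay per job (the region qualifier and lookback window are assumptions; column names follow the documented JOBS_BY_PROJECT schema):

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
    SELECT
      job_id,
      user_email,
      total_slot_ms,
      -- time a job sat queued before it actually started running
      TIMESTAMP_DIFF(start_time, creation_time, MILLISECOND) AS queued_ms
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      AND job_type = 'QUERY'
    ORDER BY total_slot_ms DESC
    LIMIT 20
    """
    for row in client.query(sql).result():
        print(row.job_id, row.user_email, row.total_slot_ms, row.queued_ms)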

Question # 47

You are deploying MariaDB SQL databases on GCE VM instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk I/O, and replication status from MariaDB with minimal development effort, and use Stackdriver for dashboards and alerts.

What should you do?

Options:

A.

Install the OpenCensus Agent and create a custom metric collection application with a Stackdriver exporter.

B.

Place the MariaDB instances in an Instance Group with a Health Check.

C.

Install the Stackdriver Logging agent and configure the fluentd in_tail plugin to read MariaDB logs.

D.

Install the Stackdriver agent and configure the MySQL plugin.
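For context on option D: the legacy Stackdriver monitoring agent is collectd-based, and enabling its MySQL plugin (which also works against MariaDB, since MariaDB speaks the MySQL protocol) is a matter of dropping a config file and restarting the agent. A hedged sketch that does this from Python; the config path, service name, and credentials are assumptions to be checked against the agent documentation:

    import pathlib
    import subprocess

    # Hypothetical monitoring user; grant it only the privileges the plugin needs.
    MYSQL_CONF = """
    LoadPlugin mysql
    <Plugin "mysql">
      <Database "maria_primary">
        Host "localhost"
        Port 3306
        User "monitoring"
        Password "REDACTED"
        MasterStats true
      </Database>
    </Plugin>
    """

    # Assumed default config directory for the collectd-based agent.
    conf = pathlib.Path("/opt/stackdriver/collectd/etc/collectd.d/mysql.conf")
    conf.write_text(MYSQL_CONF)
    subprocess.run(["service", "stackdriver-agent", "restart"], check=True)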

Question # 48

You have a table that contains millions of rows of sales data, partitioned by date. Various applications and users query this data many times a minute. The query requires aggregating values by using avg, max, and sum, and does not require joining to other tables. The required aggregations are only computed over the past year of data, though you need to retain full historical data in the base tables. You want to ensure that the query results always include the latest data from the tables, while also reducing computation cost, maintenance overhead, and duration. What should you do?

Options:

A.

Create a materialized view to aggregate the base table data. Configure a partition expiration on the base table to retain only the last year of partitions.

B.

Create a materialized view to aggregate the base table data. Include a filter clause to specify the last year of partitions.

C.

Create a new table that aggregates the base table data. Include a filter clause to specify the last year of partitions. Set up a scheduled query to recreate the new table every hour.

D.

Create a view to aggregate the base table data. Include a filter clause to specify the last year of partitions.
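A sketch of the materialized-view pattern these options discuss is below. One practical wrinkle: BigQuery materialized view definitions disallow non-deterministic functions such as CURRENT_DATE(), so a "last year" filter inside the view must use a literal boundary. Table and column names are assumptions:

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
    CREATE MATERIALIZED VIEW mydataset.sales_agg AS
    SELECT
      sale_date,
      AVG(amount) AS avg_amount,
      MAX(amount) AS max_amount,
      SUM(amount) AS sum_amount
    FROM mydataset.sales
    -- Literal date: CURRENT_DATE() is not allowed in a materialized view.
    WHERE sale_date >= DATE '2024-01-01'
    GROUP BY sale_date
    """
    client.query(sql).result()

BigQuery keeps the view incrementally up to date against the base table, so query results stay fresh without a scheduled rebuild.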

Question # 49

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects so that they can work with multiple GCP products. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

Options:

A.

Enable data access logs in each Data Analyst’s project. Restrict access to Stackdriver Logging via Cloud IAM roles.

B.

Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts’ projects. Restrict access to the Cloud Storage bucket.

C.

Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project with the exported logs.

D.

Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.
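The aggregated sink in option D can be sketched with the Cloud Logging Python client. An organization-level sink with include_children=True captures logs from every project underneath it; the filter narrows the export to BigQuery data access audit logs. The organization ID, bucket name, and exact filter string are assumptions:

    from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
    from google.cloud.logging_v2.types import LogSink

    client = ConfigServiceV2Client()
    sink = LogSink(
        name="bq-data-access-audit",
        destination="storage.googleapis.com/audit-logs-bucket",  # bucket in the audit project
        filter=(
            'logName:"cloudaudit.googleapis.com%2Fdata_access" '
            'AND protoPayload.serviceName="bigquery.googleapis.com"'
        ),
        include_children=True,  # this is what makes the sink "aggregated"
    )
    response = client.create_sink(
        request={"parent": "organizations/123456789", "sink": sink}  # hypothetical org ID
    )
    print("Grant this identity write access on the bucket:", response.writer_identity)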

Question # 50

Your company is currently setting up data pipelines for its campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during the campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all streaming inserts. What is the most likely cause of this problem?

Options:

A.

They have not assigned timestamps, which causes the job to fail

B.

They have not set the triggers to accommodate the data coming in late, which causes the job to fail

C.

They have not applied a global windowing function, which causes the job to fail when the pipeline is created

D.

They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
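For background on options C and D: in Apache Beam, a grouping or combining transform over an unbounded source such as Pub/Sub is rejected when the pipeline is constructed unless a non-global window (or a trigger) has been applied first. A minimal Beam Python sketch; the topic path is hypothetical:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/campaign")
            # Without this WindowInto, the CombinePerKey below sits in the default
            # global window with no trigger, and pipeline construction fails.
            | "Window" >> beam.WindowInto(window.FixedWindows(60))  # 60-second windows
            | "KV" >> beam.Map(lambda msg: (msg, 1))
            | "Count" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )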

Question # 51

You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?

Options:

A.

Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.

B.

Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.

C.

Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.

D.

Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.
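Option B amounts to making calibration the first stage and having every downstream job consume its output. A simple driver sketch; jar names, classes, and HDFS paths are hypothetical, and in practice the same ordering could be enforced by an orchestrator such as Oozie or Airflow:

    import subprocess

    def run_mr(jar, main_class, inp, out):
        """Run one MapReduce job and fail fast if it errors."""
        subprocess.run(["hadoop", "jar", jar, main_class, inp, out], check=True)

    # Calibration runs first; all later jobs read from its output directory.
    run_mr("etl.jar", "com.example.CalibrateSensors", "/raw/seismic", "/calibrated")
    run_mr("etl.jar", "com.example.TransformStep1", "/calibrated", "/stage1")
    run_mr("etl.jar", "com.example.TransformStep2", "/stage1", "/stage2")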

Question # 52

Your car factory is pushing machine measurements as messages into a Pub/Sub topic in your Google Cloud project. A Dataflow streaming job that you wrote with the Apache Beam SDK reads these messages, sends an acknowledgment to Pub/Sub, applies some custom business logic in a DoFn instance, and writes the result to BigQuery. You want to ensure that if your business logic fails on a message, the message will be sent to a Pub/Sub topic that you want to monitor for alerting purposes. What should you do?

Options:

A.

Use an exception handling block in your Dataflow DoFn code to push the messages that failed to be transformed through a side output and to a new Pub/Sub topic. Use Cloud Monitoring to monitor the topic/num_unacked_messages_by_region metric on this new topic.

B.

Enable retaining of acknowledged messages in your Pub/Sub pull subscription. Use Cloud Monitoring to monitor the subscription/num_retained_acked_messages metric on this subscription.

C.

Enable dead lettering in your Pub/Sub pull subscription, and specify a new Pub/Sub topic as the dead letter topic. Use Cloud Monitoring to monitor the subscription/dead_letter_message_count metric on your pull subscription.

D.

Create a snapshot of your Pub/Sub pull subscription. Use Cloud Monitoring to monitor the snapshot/num_messages metric on this snapshot.
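For reference, the dead lettering in option C is configured on the subscription rather than in the Beam code: Pub/Sub forwards any message that exceeds max_delivery_attempts to the dead letter topic, and the subscription's dead_letter_message_count metric can then drive an alert. A sketch with the Pub/Sub Python client; project, topic, and subscription names are assumptions:

    from google.cloud import pubsub_v1

    project = "my-project"  # hypothetical project ID
    subscriber = pubsub_v1.SubscriberClient()

    dead_letter_policy = pubsub_v1.types.DeadLetterPolicy(
        dead_letter_topic=f"projects/{project}/topics/failed-measurements",
        max_delivery_attempts=5,  # forward after 5 failed delivery attempts
    )
    subscription = subscriber.create_subscription(
        request={
            "name": f"projects/{project}/subscriptions/measurements-sub",
            "topic": f"projects/{project}/topics/machine-measurements",
            "dead_letter_policy": dead_letter_policy,
        }
    )
    print("Created:", subscription.name)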

Question # 53

You are testing a Dataflow pipeline to ingest and transform text files. The files are gzip-compressed, errors are written to a dead-letter queue, and you are using SideInputs to join data. You notice that the pipeline is taking longer to complete than expected. What should you do to expedite the Dataflow job?

Options:

A.

Switch to compressed Avro files

B.

Reduce the batch size

C.

Retry records that throw an error

D.

Use CoGroupByKey instead of the SideInput
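To illustrate option D: a side input is materialized and broadcast to every worker, which can stall the pipeline when the joined dataset is large, whereas CoGroupByKey shuffles both keyed collections once. A minimal Beam Python sketch with in-memory data standing in for the real sources:

    import apache_beam as beam

    with beam.Pipeline() as p:
        orders = p | "Orders" >> beam.Create([("u1", "order-a"), ("u2", "order-b")])
        users = p | "Users" >> beam.Create([("u1", "Alice"), ("u2", "Bob")])

        (
            {"orders": orders, "users": users}
            | beam.CoGroupByKey()  # one shuffle joins both collections by key
            | beam.Map(print)      # ('u1', {'orders': ['order-a'], 'users': ['Alice']})
        )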

Exam Name: Google Professional Data Engineer Exam
Last Update: Apr 25, 2025
Questions: 374