Google Cloud Certified Professional-Data-Engineer Full Course Free

Page: 7 / 14
Question 28

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

Options:

A. Use federated data sources, and check data in the SQL query.

B. Enable BigQuery monitoring in Google Stackdriver and create an alert.

C. Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.

D. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.
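
The dead-letter pattern named in option D can be illustrated without any cloud services. The sketch below is plain Python, not a Dataflow pipeline: it parses CSV rows, keeps the valid ones, and routes malformed rows to a separate list together with the reason they failed. The three-column schema (`id`, `name`, `amount`) and the helper names `parse_row` and `split_rows` are assumptions made for illustration only.

```python
import csv
import io

EXPECTED_COLUMNS = 3  # hypothetical schema: id, name, amount


def parse_row(fields):
    """Convert one CSV record to a dict, raising ValueError if it is malformed."""
    if len(fields) != EXPECTED_COLUMNS:
        raise ValueError(f"expected {EXPECTED_COLUMNS} fields, got {len(fields)}")
    return {"id": int(fields[0]), "name": fields[1], "amount": float(fields[2])}


def split_rows(csv_text):
    """Return (good_rows, dead_letter_rows) for the given CSV text."""
    good, dead_letter = [], []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, fields in enumerate(reader, start=1):
        try:
            good.append(parse_row(fields))
        except ValueError as err:
            # Keep the raw record and the error so it can be analyzed later.
            dead_letter.append({"line": line_no, "raw": fields, "error": str(err)})
    return good, dead_letter


# Line 2 is missing a field; line 3 has a non-numeric id.
sample = "1,alice,10.50\n2,bob\nx,carol,3.00\n4,dave,7.25\n"
good_rows, bad_rows = split_rows(sample)
print(len(good_rows), "valid rows;", len(bad_rows), "dead-letter rows")
```

In a real pipeline the two lists would typically correspond to two separate output tables, so that bad records never block the load of good ones.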

Question 29

Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?

Options:

A. Use Google Stackdriver Audit Logs to review data access.

B. Get the identity and access management (IAM) policy of each table.

C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.

D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.

Question 30

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

Options:

A. Supervised learning to determine which transactions are most likely to be fraudulent.

B. Unsupervised learning to determine which transactions are most likely to be fraudulent.

C. Clustering to divide the transactions into N categories based on feature similarity.

D. Supervised learning to predict the location of a transaction.

E. Reinforcement learning to predict the location of a transaction.

F. Unsupervised learning to predict the location of a transaction.

Question 31

Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

Options:

A. Redefine the schema by evenly distributing reads and writes across the row space of the table.

B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.

C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.

D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.
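
To make the contrast between evenly distributed keys (option A) and sequentially increasing numeric IDs (option D) concrete, here is a toy model in plain Python. It assumes the key space is split into five contiguous ranges based on the leading digit of an 8-digit key; this is a deliberate simplification and not how Bigtable actually places tablet boundaries. The functions `tablet_for`, `sequential_key`, and `reversed_key` are hypothetical names. Reversing the digits of the ID is used here as one simple way to spread consecutive IDs across the key space.

```python
from collections import Counter

NUM_TABLETS = 5  # hypothetical: key space split into 5 contiguous ranges


def tablet_for(row_key: str) -> int:
    """Toy model: leading digits 0-1 map to tablet 0, 2-3 to tablet 1, and so on."""
    return int(row_key[0]) // 2


def sequential_key(user_id: int) -> str:
    """Zero-padded ID, so consecutive IDs are lexicographic neighbors."""
    return f"{user_id:08d}"


def reversed_key(user_id: int) -> str:
    """Same digits reversed, so the fastest-changing digit leads the key."""
    return f"{user_id:08d}"[::-1]


def load_per_tablet(keys):
    """Count how many keys land in each tablet of the toy model."""
    counts = Counter(tablet_for(k) for k in keys)
    return [counts.get(t, 0) for t in range(NUM_TABLETS)]


new_ids = range(1000, 1100)  # 100 consecutive IDs written in a short window
sequential_load = load_per_tablet(sequential_key(i) for i in new_ids)
reversed_load = load_per_tablet(reversed_key(i) for i in new_ids)

print("sequential:", sequential_load)  # [100, 0, 0, 0, 0]
print("reversed:  ", reversed_load)    # [20, 20, 20, 20, 20]
```

In this model every sequential key falls into the same range, while the reversed keys are shared equally among all five. The trade-off is that reversed keys no longer sort in ID order, which matters if range scans over consecutive IDs are needed.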

Exam Name: Google Professional Data Engineer Exam
Last Update: Nov 21, 2024
Questions: 330