
DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER Exam Questions & Answers

Exam Code: DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER

Exam Name: Databricks Certified Professional Data Engineer Exam

Updated: May 03, 2024

Q&As: 120

At Passcerty.com, we pride ourselves on the comprehensive nature of our DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER exam dumps, designed meticulously to encompass all key topics and nuances you might encounter during the real examination. Regular updates are a cornerstone of our service, ensuring that our dedicated users always have their hands on the most recent and relevant Q&A dumps. Behind every meticulously curated question and answer lies the hard work of our seasoned team of experts, who bring years of experience and knowledge into crafting these premium materials. And while we are invested in offering top-notch content, we also believe in empowering our community. As a token of our commitment to your success, we're delighted to offer a substantial portion of our resources for free practice. We invite you to make the most of the following content, and wish you every success in your endeavors.



Free Databricks DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER Dumps

Practice These Free Questions and Answers to Pass the Databricks Certification Exam

Questions 1

To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.

The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.

Which solution addresses the situation while minimally interrupting other teams in the organization and without increasing the number of tables that need to be managed?

A. Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.

B. Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.

C. Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.

D. Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.

E. Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.

Show Answer
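For illustration only, here is a minimal sketch of the view-aliasing pattern described in option B. The table and column names (agg_orders, agg_orders_v2, account_id, total_amount) are hypothetical, and a Databricks or Delta-enabled Spark session is assumed:

```python
from pyspark.sql import SparkSession

# Hypothetical names throughout; assumes Databricks or a Delta-enabled Spark session.
spark = SparkSession.builder.getOrCreate()

# 1. Build the new aggregate table with renamed and additional fields for the
#    customer-facing application.
spark.sql("""
    CREATE OR REPLACE TABLE agg_orders_v2 AS
    SELECT
        order_id,
        account_id          AS customer_account_id,   -- renamed field
        total_amount        AS order_total,           -- renamed field
        current_timestamp() AS refreshed_at           -- newly added field
    FROM agg_orders
""")

# 2. Replace the original table with a view of the same name that aliases the new
#    columns back to their original names, so existing consumers are not interrupted.
spark.sql("DROP TABLE agg_orders")
spark.sql("""
    CREATE VIEW agg_orders AS
    SELECT
        order_id,
        customer_account_id AS account_id,
        order_total         AS total_amount
    FROM agg_orders_v2
""")
```

In this sketch the customer-facing application reads agg_orders_v2 directly, while every other team keeps querying agg_orders unchanged and no additional managed table is introduced.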
Questions 2

An hourly batch job is configured to ingest data files from a cloud object storage container, where each batch represents all records produced by the source system in a given hour. The batch job that processes these records into the Lakehouse is sufficiently delayed to ensure no late-arriving data is missed. The user_id field represents a unique key for the data, which has the following schema:

user_id BIGINT, username STRING, user_utc STRING, user_region STRING, last_login BIGINT, auto_pay BOOLEAN, last_updated BIGINT

New records are all ingested into a table named account_history, which maintains a full record of all data in the same schema as the source. The next table in the system is named account_current and is implemented as a Type 1 table representing the most recent value for each unique user_id.

Assuming there are millions of user accounts and tens of thousands of records processed hourly, which implementation can be used to efficiently update the described account_current table as part of each hourly batch job?

A. Use Auto Loader to subscribe to new files in the account_history directory; configure a Structured Streaming trigger-once job to batch update newly detected files into the account_current table.

B. Overwrite the account_current table with each batch using the results of a query against the account_history table, grouping by user_id and filtering for the max value of last_updated.

C. Filter records in account_history using the last_updated field and the most recent hour processed, as well as the max last_login by user_id; write a merge statement to update or insert the most recent value for each user_id.

D. Use Delta Lake version history to get the difference between the latest version of account_history and one version prior, then write these records to account_current.

E. Filter records in account_history using the last_updated field and the most recent hour processed, making sure to deduplicate on username; write a merge statement to update or insert the most recent value for each username.

Show Answer
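As a rough sketch only (the hourly-window cutoff and the deduplication logic are assumptions), a Type 1 upsert of the latest record per user_id from account_history into account_current could look like this:

```python
from pyspark.sql import SparkSession

# Assumes Databricks or a Delta-enabled Spark session; batch_start is a hypothetical
# epoch-seconds cutoff marking the start of the most recent hourly batch.
spark = SparkSession.builder.getOrCreate()
batch_start = 1714700000

spark.sql(f"""
    MERGE INTO account_current AS tgt
    USING (
        SELECT user_id, username, user_utc, user_region, last_login, auto_pay, last_updated
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY last_updated DESC) AS rn
            FROM account_history
            WHERE last_updated >= {batch_start}
        ) ranked
        WHERE rn = 1
    ) AS src
    ON tgt.user_id = src.user_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

Restricting the merge source to the newest batch window keeps the work proportional to the tens of thousands of hourly records rather than the millions of existing accounts.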
Questions 3

Each configuration below is identical to the extent that each cluster has 400 GB of RAM in total, 160 total cores, and only one Executor per VM.

Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

A. Total VMs: 1; 400 GB per Executor; 160 Cores per Executor

B. Total VMs: 8; 50 GB per Executor; 20 Cores per Executor

C. Total VMs: 4; 100 GB per Executor; 40 Cores per Executor

D. Total VMs: 2; 200 GB per Executor; 80 Cores per Executor

Show Answer
Questions 4

The data engineering team maintains the following code (the code exhibit is not reproduced here):

Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and validated, which statement describes what will occur when this code is executed?

A. A batch job will update the enriched_itemized_orders_by_account table, replacing only those rows that have different values than the current version of the table, using accountID as the primary key.

B. The enriched_itemized_orders_by_account table will be overwritten using the current valid version of data in each of the three tables referenced in the join logic.

C. An incremental job will leverage information in the state store to identify unjoined rows in the source tables and write these rows to the enriched_itemized_orders_by_account table.

D. An incremental job will detect if new rows have been written to any of the source tables; if new rows are detected, all results will be recalculated and used to overwrite the enriched_itemized_orders_by_account table.

E. No computation will occur until enriched_itemized_orders_by_account is queried; upon query materialization, results will be calculated using the current valid version of data in each of the three tables referenced in the join logic.

Show Answer
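The code exhibit for this question is not reproduced above. Purely as a hypothetical stand-in, not the exam's actual code, a non-streaming batch job that overwrites the target from a three-table join might look like this (the source table names are invented; only the target name comes from the question):

```python
from pyspark.sql import SparkSession

# Hypothetical illustration only; source table names are assumptions, not the exam's code.
spark = SparkSession.builder.getOrCreate()

enriched = (
    spark.table("orders")
    .join(spark.table("order_items"), "order_id")
    .join(spark.table("accounts"), "account_id")
)

# A plain overwrite recomputes the target from the current valid versions of the source
# tables every time the job runs; no state store or incremental change detection is involved.
(
    enriched.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("enriched_itemized_orders_by_account")
)
```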
Questions 5

Which of the following is true of Delta Lake and the Lakehouse?

A. Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.

B. Delta Lake automatically collects statistics on the first 32 columns of each table which are leveraged in data skipping based on query filters.

C. Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.

D. Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.

E. Z-order can only be applied to numeric values stored in Delta Lake tables.

Show Answer
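For background on the statistics behavior mentioned in option B, here is a small sketch showing how the number of statistics-indexed columns can be tuned and how Z-ordering supports data skipping. The table name and columns (sales_fact, customer_id, order_date) are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes Databricks or a Delta-enabled Spark session; sales_fact and its columns are hypothetical.
spark = SparkSession.builder.getOrCreate()

# Delta Lake collects min/max/null-count statistics on the first 32 columns by default;
# this table property changes how many leading columns are indexed for data skipping.
spark.sql("""
    ALTER TABLE sales_fact
    SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '40')
""")

# Z-ordering co-locates related values so that file-level statistics can prune more
# files when queries filter on these columns.
spark.sql("OPTIMIZE sales_fact ZORDER BY (customer_id, order_date)")
```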
