Which of the following describes the storage organization of a Delta table?
Options:
A.
Delta tables are stored in a single file that contains data, history, metadata, and other attributes.
B.
Delta tables store their data in a single file and all metadata in a collection of files in a separate location.
C.
Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.
D.
Delta tables are stored in a collection of files that contain only the data stored within the table.
E.
Delta tables are stored in a single file that contains only the data stored within the table.
Answer:
C
Explanation:
Delta Lake is the optimized storage layer that provides the foundation for storing data and tables in the Databricks lakehouse. It is open source software that extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling [1]. A Delta table is stored as a collection of files in a directory on a cloud storage system, such as AWS S3 or Azure Data Lake Storage [2]: Parquet files hold the data, and a _delta_log subdirectory holds the transaction log. The transaction log records the history of operations performed on the table, such as inserts, updates, deletes, and merges, and it also stores the table's schema and partitioning information [2]. The log is what enables Delta Lake to provide ACID guarantees, time travel, schema enforcement, and other features [1]. Because data, history, and metadata all live in this collection of files, option C is the correct answer.
References:
1. What is Delta Lake? | Databricks on AWS
2. Quickstart — Delta Lake Documentation
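The "collection of files" idea can be made concrete with a small sketch that walks a table directory and groups its files by role. It uses only the Python standard library and does not read any Delta metadata. The function name summarize_delta_dir is hypothetical, and the sketch assumes the conventional on-disk layout: Parquet data files under the table root and zero-padded JSON commit files under _delta_log.

```python
import re
from pathlib import Path

# Assumption: commit files in the transaction log are named with a
# 20-digit, zero-padded version number, e.g. 00000000000000000000.json.
COMMIT_NAME = re.compile(r"^\d{20}\.json$")


def summarize_delta_dir(table_path):
    """Group the files under a Delta table directory by role.

    Hypothetical helper for illustration only; it inspects file names
    and never parses the log or the Parquet files.
    """
    root = Path(table_path)
    summary = {"data_files": [], "log_commits": [], "other": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if rel.parts[0] == "_delta_log":
            # History and metadata (schema, partitioning) live in the log.
            if COMMIT_NAME.match(path.name):
                summary["log_commits"].append(rel.as_posix())
            else:
                summary["other"].append(rel.as_posix())
        elif path.suffix == ".parquet":
            # Table rows live in Parquet files, possibly in partition folders.
            summary["data_files"].append(rel.as_posix())
        else:
            summary["other"].append(rel.as_posix())
    return summary
```

Calling summarize_delta_dir on a table directory returns three lists. For a real table, both data_files and log_commits would normally be non-empty, which is the point of option C: the table is the data files plus the log, not a single file.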
Question 29
Which of the following describes the relationship between Bronze tables and raw data?
Options:
A.
Bronze tables contain less data than raw data files.
B.
Bronze tables contain more truthful data than raw data.
C.
Bronze tables contain aggregates while raw data is unaggregated.
D.
Bronze tables contain a less refined view of data than raw data.
E.
Bronze tables contain raw data with a schema applied.
Answer:
E
Explanation:
Bronze tables are the first layer of a medallion architecture, a data design pattern used to organize data in a lakehouse. Bronze tables contain raw data ingested from various sources, such as RDBMS data, JSON files, and IoT data. The table structures in this layer correspond to the source system table structures "as-is", along with any additional metadata columns that capture details such as the load date/time and process ID. The only transformation applied to the raw data in this layer is applying a schema, which defines the column names and data types of the table. The schema can be inferred from the data source or specified explicitly. Applying a schema to the raw data makes it possible to query and analyze the data with SQL and other structured query languages. Therefore, option E is the correct answer.
References:
What is a Medallion Architecture?
Raw Data Ingestion into Delta Lake Bronze tables using Azure Synapse Mapping Data Flow
Apache Spark + Delta Lake concepts
Delta Lake Architecture & Azure Databricks Workspace
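The schema-application step described in the explanation above can be sketched in plain Python. This is only an illustration of the idea, not how Databricks or Delta Lake ingests data: in practice the schema would be applied by the ingestion engine. The schema, the column names, the metadata column names (_loaded_at, _load_id), and the function to_bronze_rows are all hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical schema: column name -> Python type used to cast the raw string.
SCHEMA = {"device_id": str, "temperature": float, "reading_count": int}


def to_bronze_rows(raw_text, schema, load_id):
    """Apply column names and types to raw CSV text, adding load metadata.

    No filtering, deduplication, or aggregation happens here; the values
    stay as they arrived, which mirrors the Bronze layer's role.
    """
    loaded_at = datetime.now(timezone.utc).isoformat()
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        if len(record) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {len(record)}")
        # Pair each raw value with its column name and cast it to the declared type.
        row = {name: cast(value) for (name, cast), value in zip(schema.items(), record)}
        # Metadata columns that capture when and by which load the row arrived.
        row["_loaded_at"] = loaded_at
        row["_load_id"] = load_id
        rows.append(row)
    return rows
```

For example, to_bronze_rows("a1,21.5,3\n", SCHEMA, "load-001") yields one dictionary whose temperature is the float 21.5 and whose reading_count is the integer 3, plus the two metadata fields. The raw values are unchanged apart from gaining names and types.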