Creating your first catalog, schema and tables in Databricks

This content originally appeared on DEV Community and was authored by Jordan Smith

When working in Databricks, it is key to build a foundational understanding of catalogs, schemas, and tables before moving on to advanced AI and ML use cases. Databricks makes the traditional database work of setting up a data environment far more scalable, but we still cannot skip these steps.

Catalog Overview and Default Catalogs

A catalog is the primary unit of data organization in the Databricks Unity Catalog data governance model. Catalogs are the first layer in Unity Catalog's three-level namespace (for example, catalog.schema.table). A catalog can only contain schemas, but schemas can in turn contain several different types of data objects.

When you design your data governance model, you should give careful thought to the catalogs that you create. As the highest level in your organization’s data governance model, each catalog should represent a logical unit of data isolation and a logical category of data access, allowing an efficient hierarchy of grants to flow down to schemas and the data objects that they contain.

A default catalog is configured for each workspace that is enabled for Unity Catalog. The default catalog lets you perform data operations without specifying a catalog. If you omit the top-level catalog name when you perform data operations, the default catalog is assumed.

If your workspace was enabled for Unity Catalog automatically, the pre-provisioned workspace catalog is specified as the default catalog. A workspace admin can change the default catalog as needed.
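
To see which catalog is currently assumed when a name omits the catalog level, a quick check like the following can help. This is a minimal sketch, assuming a notebook attached to Unity Catalog-enabled compute; the result depends on how your workspace is configured.

```sql
%sql
-- Returns the catalog that is used when a name omits the catalog level
SELECT current_catalog();
```

Note that this reports the current catalog for your session. Changing the workspace-level default is a separate admin setting.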

Even though most of the work described in this blog can be completed in the Databricks UI, it is important to understand the code behind the workflows. To create a new catalog, you can use the following SQL code in a Databricks notebook:

%sql
-- Find the below URL by going to Catalog >> Create New Catalog >> Storage Location
CREATE CATALOG IF NOT EXISTS first_catalog
MANAGED LOCATION 'abfss://unity-catalog-storage@dbstoragewe2nak3uyjbts.dfs.core.windows.net/3297083325245759'

There are several additional arguments that can be added when creating a catalog, which can be reviewed in the Databricks documentation. The only argument we will discuss here is MANAGED LOCATION, which is required if your Databricks account does not have a metastore-level storage location specified. Demo and trial users of Databricks might not have metastore-level storage set up. We can work around this by finding the storage URL for our account's Unity Catalog: navigate to Catalog on the left-hand sidebar, select Create New Catalog, and copy the default storage location.
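
Once the catalog has been created, it can be worth confirming that it exists and checking its metadata. A minimal sketch, assuming the first_catalog name used above and that you have permission to view the catalog:

```sql
%sql
-- Show metadata for the catalog created above (owner, comment, and so on)
DESCRIBE CATALOG EXTENDED first_catalog;
```

The exact fields returned can vary with your workspace setup and Databricks runtime version.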

Schema Overview and Code

In Unity Catalog, a schema is a child of a catalog and can contain tables, views, volumes, models, and functions. A schema organizes data and AI assets into logical categories that are more granular than catalogs. Typically a schema represents a single use case, project, or team sandbox. Regardless of category type, schemas are a useful tool for managing data access control and improving data discoverability.

We can create a schema within the first catalog that we set up earlier in this blog. Notice that two of the three components of the catalog.schema.table namespace are used in the command below.

%sql
CREATE SCHEMA IF NOT EXISTS first_catalog.first_schema
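
Because catalogs and schemas are units of access control, grants made at these levels flow down to the objects they contain. The sketch below illustrates the idea. The group name data_readers is hypothetical and not part of the original walkthrough, so replace it with a principal that exists in your account.

```sql
%sql
-- `data_readers` is a hypothetical account group used only for illustration
-- Allow the group to see and use the catalog
GRANT USE CATALOG ON CATALOG first_catalog TO `data_readers`;

-- Allow the group to use the schema and read the tables inside it
GRANT USE SCHEMA, SELECT ON SCHEMA first_catalog.first_schema TO `data_readers`;
```

If your notebook does not accept several statements in one cell, run each GRANT in its own cell.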

Volumes and Tables

While there are several objects that can sit below Schemas in Databricks, Volumes and Tables are the key objects for beginners to understand.

While tables provide governance over tabular datasets, volumes add governance over non-tabular datasets. You can use volumes to store and access files in any format, including structured, semi-structured, and unstructured data. Another way to understand this is that volumes are the precursor to tables: we might import bronze-level data into a volume and then perform transformation and ETL steps (former Excel users, think Power Query). To create a volume, use the following code:

%sql
CREATE VOLUME IF NOT EXISTS first_catalog.first_schema.first_volume
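
To round out the three-level namespace, the sketch below inspects the new volume and creates an empty managed table next to it. The table name first_table and its two columns are assumptions made for illustration only; they are not part of the original walkthrough.

```sql
%sql
-- Inspect the volume; files stored in it are addressed under
-- /Volumes/first_catalog/first_schema/first_volume/
DESCRIBE VOLUME first_catalog.first_schema.first_volume;

-- Hypothetical table that uses all three levels of the namespace
CREATE TABLE IF NOT EXISTS first_catalog.first_schema.first_table (
  id INT,
  name STRING
);
```

As with the other examples, run each statement in its own cell if your notebook does not accept several statements at once.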

This has served as an introduction to setting up a preliminary data environment in Databricks. Check out the next blog for an overview of ingesting raw data from the internet into the volume you created, and transforming the volume data into a table that we can perform AI and ML on.

