What Are DBT Sources
dbt Cloud is the fastest and most reliable way to deploy dbt. Develop, test, schedule,
document, and investigate data models all in one browser-based UI.
In addition to providing a hosted architecture for running dbt across your organization, dbt
Cloud comes equipped with turnkey support for scheduling jobs, CI/CD, hosting
documentation, monitoring and alerting, an integrated development environment (IDE), and
the ability to develop and run dbt commands from your local command line interface (CLI)
or code editor.
dbt Cloud's flexible plans and features make it well-suited for data teams of any size — sign
up for your free 14-day trial!
Use the dbt Cloud CLI to develop, test, run, and version control dbt projects and commands
in your dbt Cloud development environment. Collaborate with team members, directly from
the command line.
The IDE is the easiest and most efficient way to develop dbt models, allowing you to build,
test, run, and version control your dbt projects directly from your browser.
Manage environments
Set up and manage separate production and development environments in dbt Cloud to
help engineers develop and test code more efficiently, without impacting users or data.
Create custom schedules to run your production jobs. Schedule jobs by day of the week,
time of day, or a recurring interval. Decrease operating costs by using webhooks to trigger
CI jobs and the API to start jobs.
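For example, a job run can be kicked off with a single API call. The sketch below follows the dbt Cloud Administrative API v2 at the time of writing; the account ID, job ID, and token are placeholders, so confirm the endpoint and auth scheme against the API reference:
$ curl -X POST "https://cloud.getdbt.com/api/v2/accounts/ACCOUNT_ID/jobs/JOB_ID/run/" \
    -H "Authorization: Token $DBT_CLOUD_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"cause": "Triggered via API"}'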
Notifications
Set up and customize job notifications in dbt Cloud to receive email or Slack alerts when a
job run succeeds, fails, or is cancelled. Notifications alert the right people when something
goes wrong instead of waiting for a user to report it.
Run visibility
View the history of your runs and the model timing dashboard to help identify where
improvements can be made to the scheduled jobs.
dbt Cloud hosts and authorizes access to dbt project documentation, allowing you to
generate data documentation on a schedule for your project. Invite teammates to dbt Cloud
to collaborate and share your project's documentation.
Seamlessly connect your git account to dbt Cloud and provide another layer of security to
dbt Cloud. Import new repositories, trigger continuous integration, clone repos using
HTTPS, and more!
Configure dbt Cloud to run your dbt projects in a temporary schema when new commits are
pushed to open pull requests. This build-on-PR functionality is a great way to catch bugs
before deploying to production, and an essential tool in any analyst's belt.
Security
Manage risk with SOC-2 compliance, CI/CD deployment, RBAC, and ELT architecture.
Use the dbt Semantic Layer to define metrics alongside your dbt models and query them
from any integrated analytics tool. Get the same answers everywhere, every time.
Discovery API
Enhance your workflow and run ad-hoc queries, browse schema, or query the dbt Semantic
Layer. dbt Cloud serves a GraphQL API, which supports arbitrary queries.
dbt Explorer
Learn about dbt Explorer and how to interact with it to understand, improve, and leverage
your data pipelines.
Defer is a powerful feature that allows developers to build, run, and test only the models
they've edited, without having to first build all of the models that come before them
(upstream parents). dbt powers this by using a production manifest for comparison, and
resolves the {{ ref() }} function with upstream production artifacts.
Both the dbt Cloud IDE and the dbt Cloud CLI enable users to natively defer to production
metadata directly in their development workflows.
dbt uses the production locations of parent models to resolve {{ ref() }} functions,
based on metadata from the production environment.
If a development version of a deferred model exists, dbt preferentially uses the
development database location when resolving the reference.
Passing the --favor-state flag overrides the default behavior and always resolves refs
using production metadata, regardless of the presence of a development relation (see the command sketch below).
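Outside of dbt Cloud's automatic handling, a deferred invocation using these flags looks roughly like the following sketch (the artifact path is a placeholder for wherever a production manifest.json lives):
# Build only modified models and their children, resolving unchanged parents from production artifacts
$ dbt run --select state:modified+ --defer --state prod-run-artifacts/
# Add --favor-state to always resolve refs from production metadata,
# even if a development relation exists
$ dbt run --select state:modified+ --defer --state prod-run-artifacts/ --favor-state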
For a clean slate, it's a good practice to drop the development schema at the start and end
of your development cycle.
Required setup
You must select the Production environment checkbox in the Environment
Settings page.
o This can be set for one deployment environment per dbt Cloud project.
You must have a successful job run first.
When using defer, dbt compares artifacts from the most recent successful production job,
excluding CI jobs.
To enable defer in the dbt Cloud IDE, toggle the Defer to production button on the
command bar. Once enabled, dbt Cloud will:
1. Pull down the most recent manifest from the Production environment for comparison
2. Pass the --defer flag to the command (for any command that accepts the flag)
For example, if you were to start developing on a new branch with nothing in your
development schema, edit a single model, and run dbt build -s state:modified — only the
edited model would run. Any {{ ref() }} functions will point to the production location of the
referenced models.
Select the 'Defer to production' toggle on the bottom right of the command bar to enable defer in the dbt Cloud IDE.
Defer in dbt Cloud CLI
One key difference between using --defer in the dbt Cloud CLI and the dbt Cloud IDE is
that --defer is automatically enabled for all dbt Cloud CLI invocations, comparing against
production artifacts. You can disable it with the --no-defer flag.
The dbt Cloud CLI offers additional flexibility by letting you choose the source environment
for deferral artifacts. You can set a defer-env-id key in either
your dbt_project.yml or dbt_cloud.yml file. If you do not provide a defer-env-id setting, the
dbt Cloud CLI will use artifacts from your dbt Cloud environment marked "Production".
dbt_cloud.yml
defer-env-id: '123456'
dbt_project.yml
dbt_cloud:
  defer-env-id: '123456'
Install dbt Cloud CLI
dbt commands are run against dbt Cloud's infrastructure and benefit from:
Prerequisites
The dbt Cloud CLI is available in all deployment regions and for both multi-tenant and
single-tenant accounts (Azure single-tenant not supported at this time).
Ensure you are using dbt version 1.5 or higher. Refer to dbt Cloud versions to
upgrade.
Note that SSH tunneling for Postgres and Redshift connections doesn't support the
dbt Cloud CLI yet.
Before you begin, make sure you have Homebrew installed and available from your code editor or command
line terminal. Refer to the FAQs if your operating system runs into path conflicts.
1. Verify whether dbt Core is already installed:
which dbt
o If you see dbt not found, you're good to go. If the dbt help text appears,
use pip uninstall dbt to remove dbt Core from your system.
2. Install the dbt Cloud CLI with Homebrew:
o First, remove the dbt-labs tap, the separate repository for packages, from
Homebrew. This prevents Homebrew from installing packages from that
repository:
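The commands involved look roughly like this (the tap names reflect the dbt Labs Homebrew taps at the time of writing; confirm them against the current install docs):
$ brew untap dbt-labs/dbt
$ brew tap dbt-labs/dbt-cli
$ brew install dbt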
3. Verify your installation by running dbt --help in the command line. If you see the
following output, your installation is correct:
The dbt Cloud CLI - an ELT tool for running SQL transformations and data models in
dbt Cloud...
If you don't see this output, check that you've deactivated pyenv or venv and don't
have a global dbt version installed.
o Note that you no longer need to run the dbt deps command when your
environment starts. This step was previously required during initialization.
However, you should still run dbt deps if you make any changes to
your packages.yml file.
4. Clone your repository to your local computer using git clone. For example, to clone a
GitHub repo using HTTPS format, run git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.
5. After cloning your repo, configure the dbt Cloud CLI for your dbt Cloud project. This
lets you run dbt commands like dbt environment show to view your dbt Cloud
configuration or dbt compile to compile your project and validate models and tests.
You can also add, edit, and synchronize files with your repo.
During the public preview period, we recommend updating before filing a bug report. This is
because the API is subject to breaking changes.
To update the dbt Cloud CLI on macOS (installed with Homebrew), run brew update and then brew
upgrade dbt. Windows and Linux installations use a standalone executable instead of Homebrew.
Using VS Code extensions
Visual Studio (VS) Code extensions enhance command line tools by adding extra
functionality. The dbt Cloud CLI is fully compatible with dbt Core; however, it doesn't
support some dbt Core APIs required by certain tools, for example, VS Code extensions.
You can use extensions like dbt-power-user with the dbt Cloud CLI by following these
steps:
This setup allows dbt-power-user to continue to work with dbt Core in the background,
alongside the dbt Cloud CLI. For more, check the dbt Power User documentation.
FAQs
What's the difference between the dbt Cloud CLI and dbt Core?
How do I run both the dbt Cloud CLI and dbt Core?
How to create an alias?
Why am I receiving a `Session occupied` error?
Configure and use the dbt Cloud CLI
2. Download your credentials from dbt Cloud by clicking on the Try the dbt Cloud
CLI banner on the dbt Cloud homepage. Alternatively, if you're in dbt Cloud, you can
download the credentials from the links provided based on your region:
version: "1"
context:
active-project: "<project id from the list below>"
active-host: "<active host from the list>"
defer-env-id: "<optional defer environment id>"
projects:
- project-id: "<project-id>"
account-host: "<account-host>"
api-key: "<user-api-key>"
- project-id: "<project-id>"
account-host: "<account-host>"
api-key: "<user-api-key>"
4. After downloading the config file, navigate to a dbt project in your terminal:
cd ~/dbt-projects/jaffle_shop
5. In your dbt_project.yml file, ensure you have or include a dbt-cloud section with
a project-id field. The project-id field contains the dbt Cloud project ID you want to
use.
# dbt_project.yml
name:
version:
# Your project configs...

dbt-cloud:
  project-id: PROJECT_ID
o To find your project ID, select Develop in the dbt Cloud navigation menu. You
can use the URL to find the project ID. For example,
in https://cloud.getdbt.com/develop/26228/projects/123456, the project ID
is 123456.
6. You should now be able to use the dbt Cloud CLI and run dbt commands like dbt
environment show to view your dbt Cloud configuration details or dbt compile to
compile models in your dbt project.
With your repo recloned, you can add, edit, and sync files with your repo.
To set environment variables in the dbt Cloud CLI for your dbt project:
The dbt Cloud integrated development environment (IDE) is a single web-based interface
for building, testing, running, and version-controlling dbt projects. It compiles dbt code into
SQL and executes it directly on your database.
The dbt Cloud IDE offers several keyboard shortcuts and editing features for faster and
more efficient data platform development and governance:
Syntax highlighting for SQL: Makes it easy to distinguish different parts of your code,
reducing syntax errors and enhancing readability.
Auto-completion: Suggests table names, arguments, and column names as you
type, saving time and reducing typos.
Code formatting and linting: Help standardize and fix your SQL code effortlessly.
Navigation tools: Easily move around your code, jump to specific lines, find and
replace text, and navigate between project files.
Version control: Manage code versions with a few clicks.
These features create a powerful editing environment for efficient SQL coding, suitable for
both experienced and beginner developers.
The dbt Cloud IDE includes version control, files/folders, an editor, a command/console, and more.
Enable dark mode for a great viewing experience in low-light environments.
DISABLE AD BLOCKERS
To improve your experience using dbt Cloud, we suggest that you turn off ad blockers. This
is because some project file names, such as google_adwords.sql, might resemble ad traffic
and trigger ad blockers.
Prerequisites
A dbt Cloud account and Developer seat license
A git repository set up, with write access enabled on your git provider.
See Connecting your GitHub Account or Importing a project by git URL for detailed
setup instructions
A dbt project connected to a data platform
A development environment and development credentials set up
The environment must be on dbt version 1.0 or higher
To understand how to navigate the IDE and its user interface elements, refer to the IDE
user interface page.
Feature — Info
Keyboard shortcuts — You can access a variety of commands and actions in the IDE by choosing the appropriate keyboard shortcut. Use the shortcuts for common tasks like building modified models or resuming builds from the last failure.
File state indicators — Ability to see when changes or actions have been made to the file. The indicators M, D, A, and • appear to the right of your file or folder name and indicate the actions performed.
IDE version control — The IDE version control section and git button allow you to apply the concept of version control to your project directly in the IDE.
Project documentation — Generate and view your project documentation for your dbt project in real time. You can inspect and verify what your project's documentation will look like before you deploy your changes to production.
Preview and Compile button — You can compile or preview code, a snippet of dbt code, or one of your dbt models after editing and saving.
Build, test, and run button — Build, test, and run your project with a button click or by using the Cloud IDE command bar.
Command bar — You can enter and run commands from the command bar at the bottom of the IDE. Use the rich model selection syntax to execute dbt commands directly within dbt Cloud (see the selection sketch after this list). You can also view the history, status, and logs of previous runs by clicking History on the left of the bar.
Drag and drop — Drag and drop files located in the file explorer, and use the file breadcrumb on the top of the IDE for quick, linear navigation. Access adjacent files in the same folder by right-clicking on the breadcrumb file.
Organize tabs and files — Move your tabs around to reorganize your work in the IDE; right-click on a tab to view and select a list of actions, including duplicating files; close multiple unsaved tabs to batch save your work; double-click files to rename them.
Find and replace — Press Command-F or Control-F to open the find-and-replace bar in the upper right corner of the current file in the IDE. The IDE highlights your search results in the current file and code outline. Use the up and down arrows to move between matches when there are multiple, and the left arrow to replace the text with something else.
Multiple selections — You can make multiple selections for small and simultaneous edits, inserting cursors below or above with ease: press Option and click on an area, or press Ctrl-Alt and click on an area.
Lint and Format — Lint and format your files with a click of a button, powered by SQLFluff, sqlfmt, Prettier, and Black.
Git diff view — Ability to see what has been changed in a file before you make a pull request.
DAG in the IDE — You can see how models are used as building blocks from left to right to transform your data from raw sources into cleaned-up modular derived pieces and final outputs on the far right of the DAG. The default view is 2+model+2 (displaying two nodes away in each direction); you can change it to +model+ (full DAG). Note that the --exclude flag isn't supported.
Status bar — This area provides you with useful information about your IDE and project status. You also have additional options like enabling light or dark mode, restarting the IDE, or recloning your repo.
Dark mode — From the status bar in the Cloud IDE, enable dark mode for a great viewing experience in low-light environments.
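For example, here are a few illustrative commands you could type into the command bar using dbt's node selection syntax (the model name is a placeholder):
$ dbt build --select my_model                # a single model
$ dbt build --select +my_model               # the model plus its upstream parents
$ dbt build --select my_model+               # the model plus its downstream children
$ dbt run --select state:modified+           # modified models and everything downstream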
Start-up process
There are three start-up states when using or launching the Cloud IDE:
Creation start — This is the state where you are starting the IDE for the first time.
You can also view this as a cold start (see below), and you can expect this state to
take longer because the git repository is being cloned.
Cold start — This is the process of starting a new develop session, which will be
available to you for three hours. The environment automatically turns off three hours
after the last activity; activity includes compiling, previewing, or any dbt invocation, but
not editing and saving a file.
Hot start — This is the state of resuming an existing or active develop session within
three hours of the last activity.
Work retention
The Cloud IDE needs explicit action to save your changes. There are three ways your work
is stored:
Unsaved, local code — The browser stores your code only in its local storage. In
this state, you might need to commit any unsaved changes in order to switch
branches or browsers. If you have saved and committed changes, you can access
the "Change branch" option even if there are unsaved changes. If you attempt to
switch branches without saving your changes, a warning message will appear,
notifying you that you will lose any unsaved changes.
Saved but uncommitted code — When you save a file, the data gets stored in
durable, long-term storage, but isn't synced back to git. To switch branches using
the Change branch option, you must "Commit and sync" or "Revert" changes.
Changing branches isn't available for saved-but-uncommitted code. This is to ensure
your uncommitted changes don't get lost.
Committed code — This is stored in the branch with your git provider and you can
check out other (remote) branches.
The IDE uses developer credentials to connect to your data platform. These developer
credentials should be specific to your user and they should not be super user credentials or
the same credentials that you use for your production deployment of dbt.
1. Navigate to your Credentials under Your Profile settings, which you can access
at https://YOUR_ACCESS_URL/settings/profile#credentials,
replacing YOUR_ACCESS_URL with the appropriate Access URL for your region
and plan.
2. Select the relevant project in the list.
3. Click Edit on the bottom right of the page.
4. Enter the details under Development Credentials.
5. Click Save.
Configure developer credentials in your Profile.
6. Access the Cloud IDE by clicking Develop at the top of the page.
7. Initialize your project and familiarize yourself with the IDE and its delightful features.
If a model or test fails, dbt Cloud makes it easy for you to view and download the run logs
for your dbt invocations to fix the issue.
Use dbt's rich model selection syntax to run dbt commands directly within dbt Cloud.
Preview, compile, or build your dbt project. Use the lineage tab to see your DAG.
Build and view your project's docs
The dbt Cloud IDE makes it possible to build and view documentation for your dbt project
while your code is still in development. With this workflow, you can inspect and verify what
your project's generated documentation will look like before your changes are released to
production.
Related docs
How we style our dbt projects
User interface
Version control basics
dbt Commands
Related questions
How can I fix my .gitignore file?
A .gitignore file specifies which files git should intentionally ignore or 'untrack'. dbt Cloud
indicates untracked files in the project file explorer pane by putting the file or folder name
in italics.
If you encounter issues like problems reverting changes, checking out or creating a new
branch, or not being prompted to open a pull request after a commit in the dbt Cloud
IDE — this usually indicates a problem with the .gitignore file. The file may be missing or
lacks the required entries for dbt Cloud to work correctly.
When resolving issues with your gitignore file, note that adding the correct entries won't
automatically remove (or 'untrack') files or folders that git is already tracking. The
updated gitignore will only prevent new files or folders from being tracked. So you'll need to
first fix the gitignore file, then perform some additional git operations to untrack any
incorrect files or folders.
1. Launch the Cloud IDE into the project that is being fixed, by selecting Develop on
the menu bar.
2. In your File Explorer, check to see if a .gitignore file exists at the root of your dbt
project folder. If it doesn't exist, create a new file.
3. Open the new or existing gitignore file, and add the following:
# ✅ Correct
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
Note — You can place these lines anywhere in the file, as long as they're on
separate lines. The entries above match all nested files and folders; avoid adding a
trailing '*', such as target/*.
Restart the IDE by clicking the three dots on the lower right or by clicking on the Status bar.
6. Once the IDE restarts, go to the File Explorer to delete the following files or folders
(if they exist). No data will be lost:
9. Once the IDE restarts, use the Create a pull request (PR) button under the Version
Control menu to start the process of integrating the changes.
10. When the git provider's website opens to a page with the new PR, follow the
necessary steps to complete and merge the PR into the main branch of that
repository.
o Note — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or
something else depending on the organizational naming conventions. The
goal is to merge these changes into the root branch that all other development
branches are created from.
11. Return to the dbt Cloud IDE and use the Change Branch button to switch to the
main branch of the project.
12. Once the branch has changed, click the Pull from remote button to pull in all the
changes.
13. Verify the changes by making sure the files/folders in the .gitignore file are in italics.
A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).
Fix in the git provider
Sometimes it's necessary to use the git provider's web interface to fix a
broken .gitignore file. Although the specific steps may vary across providers, the general
process remains the same.
There are two options for this approach: editing the main branch directly if allowed, or
creating a pull request to implement the changes if required:
When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of
your repo. Follow these steps:
1. Go to your repository's web interface.
2. Switch to the main branch and the root directory of your dbt project.
3. Find the .gitignore file. Create a blank one if it doesn't exist.
4. Edit the file in the web interface, adding the following entries:
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
12. Great job 🎉! You've configured the .gitignore correctly and can continue with your
development!
For more info, refer to this detailed video for additional guidance.
Is there a cost to using the Cloud IDE?
Not at all! You can use dbt Cloud when you sign up for the Free Developer plan, which
comes with one developer seat. If you’d like to access more features or have more
developer seats, you can upgrade your account to the Team or Enterprise plan.
dbt Cloud CLI: The dbt Cloud CLI allows you to run dbt commands against your dbt
Cloud development environment from your local command line or code editor. It
supports cross-project ref, speedier, lower-cost builds, automatic deferral of build
artifacts, and more.
dbt Core: dbt Core is open-source software that's freely available. You can build
your dbt project in a code editor and run dbt commands from the command line.
The dbt Cloud IDE is a tool for developers to effortlessly build, test, run, and version-control
their dbt projects, and enhance data governance — all from the convenience of your
browser. Use the Cloud IDE to compile dbt code into SQL and run it against your database
directly -- no command line required!
This page offers comprehensive definitions and terminology of user interface elements,
allowing you to navigate the IDE landscape with ease.
The Cloud IDE layout includes version control on the upper left, files/folders on the left, editor on the right, and command/console at the bottom.
Basic layout
The IDE streamlines your workflow, and features a popular user interface layout with files
and folders on the left, editor on the right, and command and console information at the
bottom.
The Git repo link, documentation site button,
Version Control menu, and File Explorer
1. Git repository link — Clicking the Git repository link, located on the upper left of the
IDE, takes you to your repository on the same active branch.
o Note: This feature is only available for GitHub or GitLab repositories on multi-
tenant dbt Cloud accounts.
2. Documentation site button — Clicking the Documentation site book icon, located
next to the Git repository link, leads to the dbt Documentation site. The site is
powered by the latest dbt artifacts generated in the IDE using the dbt docs
generate command from the Command bar.
3. Version Control — The IDE's powerful Version Control section contains all git-
related elements, including the Git actions button and the Changes section.
4. File Explorer — The File Explorer shows the filetree of your repository. You can:
o Click on any file in the filetree to open the file in the File Editor.
o Click and drag files between directories to move files.
o Right-click a file to access the sub-menu options like duplicate file, copy file
name, copy as ref, rename, delete.
o Note: To perform these actions, the user must not be in read-only mode,
which generally happens when the user is viewing the default branch.
o Use file indicators, located to the right of your files or folder name, to see
when changes or actions were made:
Unsaved (•) — The IDE detects unsaved changes to your file/folder
Modification (M) — The IDE detects a modification of existing
files/folders
Added (A) — The IDE detects added files
Deleted (D) — The IDE detects deleted files.
Use the Command bar to write dbt commands, toggle 'Defer', and view the current IDE status.
5. Command bar — The Command bar, located in the lower left of the IDE, is used to
invoke dbt commands. When a command is invoked, the associated logs are shown
in the Invocation History Drawer.
7. Status button — The IDE Status button, located on the lower right of the IDE,
displays the current IDE status. If there is an error in the status or in the dbt code
that stops the project from parsing, the button will turn red and display "Error". If
there aren't any errors, the button will display a green "Ready" status. To access
the IDE Status modal, simply click on this button.
Editing features
The IDE features some delightful tools and layouts to make it easier for you to write dbt
code and collaborate with teammates.
Use the file editor, version control section, and save button during your development workflow.
1. File Editor — The File Editor is where users edit code. Tabs break out the region for
each opened file, and unsaved files are marked with a blue dot icon in the tab view.
o Use intuitive keyboard shortcuts to make development easier for you and your team.
2. Save button — The editor has a Save button that saves editable files. Pressing the
button or using the Command-S or Control-S shortcut saves the file contents. You
don't need to save to preview code results in the Console section, but it's necessary
before changes appear in a dbt invocation. The File Editor tab shows a blue icon for
unsaved changes.
3. Version Control — This menu contains all git-related elements, including the Git
actions button. The button updates relevant actions based on your editor's state,
such as prompting to pull remote changes, commit and sync when reverted commit
changes are present, or creating a merge/pull request when appropriate.
o The dropdown menu on the Git actions button allows users to revert changes,
refresh Git state, create merge/pull requests, and change branches.
Keep in mind that although you can't delete local branches in the IDE
using this menu, you can reclone your repository, which deletes your
local branches and refreshes with the current remote branches,
effectively removing the deleted ones.
o You can also resolve merge conflicts. For more info on git, refer to Version
control basics.
o Version Control Options menu — The Changes section, under the Git
actions button, lists all file changes since the last commit. You can click on a
change to open the Git Diff View to see the inline changes. You can also right-
click any file and use the file-specific options in the Version Control Options
menu.
Right-click edited files to access the Version Control Options menu.
Additional editing features
Minimap — A Minimap (code outline) gives you a high-level overview of your source
code, which is useful for quick navigation and code understanding. A file's minimap
is displayed on the upper-right side of the editor. To quickly jump to different sections
of your file, click the shaded area.
Use the Minimap for quick navigation and code understanding
dbt Editor Command Palette — The dbt Editor Command Palette displays text
editing actions and their associated keyboard shortcuts. This can be accessed by
pressing F1 or right-clicking in the text editing area and selecting Command Palette.
Click F1 to access the dbt Editor Command Palette menu for editor shortcuts
Git Diff View — Clicking on a file in the Changes section of the Version Control
Menu will open the changed file with Git Diff view. The editor will show the previous
version on the left and the in-line changes made on the right.
The Git Diff View displays the previous version on the left and the changes made on
the right of the Editor
Markdown Preview console tab — The Markdown Preview console tab shows a
preview of your .md file's markdown code in your repository and updates it
automatically as you edit your code.
The Markdown Preview console tab renders markdown code below the Editor tab.
CSV Preview console tab — The CSV Preview console tab displays the data from
your CSV file in a table, which updates automatically as you edit the file in your seed
directory.
View csv code in the CSV Preview console tab below the Editor tab.
Console section
The console section, located below the File editor, includes various console tabs and
buttons to help you with tasks such as previewing, compiling, building, and viewing
the DAG. Refer to the following sub-bullets for more details on the console tabs and
buttons.
The Console section is located below the File editor and has various tabs and buttons to help execute tasks.
1. Preview button — When you click on the Preview button, it runs the SQL in the
active file editor regardless of whether you have saved it or not and sends the results
to the Results console tab. You can preview a selected portion of saved or unsaved
code by highlighting it and then clicking the Preview button.
Starting from dbt v1.6 or higher, when you save changes to a model, you can compile its
code with the model's specific context. This context is similar to what you'd have when
building the model and involves useful context variables
like {{ this }} or {{ is_incremental() }} (see the incremental model sketch at the end of this section).
3. Build button — The build button allows users to quickly access dbt commands
related to the active model in the File Editor. The available commands include dbt
build, dbt test, and dbt run, with options to include only the current resource, the
resource and its upstream dependencies, the resource and its downstream
dependencies, or the resource with all dependencies. This menu is available for all
executable nodes.
4. Format button — The editor has a Format button that can reformat the contents of
your files. For SQL files, it uses either sqlfmt or sqlfluff, and for Python files, it
uses black.
5. Results tab — The Results console tab displays the most recent Preview results in
tabular format.
6. Compiled Code tab — The Compile button triggers a compile invocation that
generates compiled code, which is displayed in the Compiled Code tab.
7. Lineage tab — The Lineage tab in the File Editor displays the active model's lineage
or DAG. By default, it shows two degrees of lineage in both directions
(2+model_name+2), however, you can change it to +model+ (full DAG).
View resource lineage in the Lineage tab.
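To make the {{ this }} and {{ is_incremental() }} context mentioned above concrete, here's a rough sketch of an incremental model that uses both (the model, source, and column names are illustrative):
models/orders_incremental.sql
{{ config(materialized='incremental', unique_key='order_id') }}

select
    id as order_id,
    status,
    updated_at
from {{ source('jaffle_shop', 'orders') }}

{% if is_incremental() %}
-- {{ this }} resolves to the table this model has already built
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}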
Invocation history
The Invocation History Drawer stores information on dbt invocations in the IDE. When you
invoke a command, like executing a dbt command such as dbt run, the associated logs are
displayed in the Invocation History Drawer.
You can open the drawer by:
- Clicking the ^ icon next to the Command bar on the lower left of the page
- Typing a dbt command and pressing enter
- Pressing Control-backtick (or Ctrl + `)
The Invocation History Drawer returns a log and detail of all your dbt Cloud invocations.
1. Invocation History list — The left-hand panel of the Invocation History Drawer
displays a list of previous invocations in the IDE, including the command, branch
name, command status, and elapsed time.
4. Command Control button — Use the Command Control button, located on the
right side, to control your invocation and cancel or rerun a selected run.
The Invocation History list displays a list of previous invocations in the IDE.
5. Node Summary tab — Clicking on the Results Status Tabs will filter the Node
Status List based on their corresponding status. The available statuses are Pass
(successful invocation of a node), Warn (test executed with a warning), Error
(database error or test failure), Skip (nodes not run due to upstream error), and
Queued (nodes that have not executed yet).
6. Node result toggle — After running a dbt command, information about each
executed node can be found in a Node Result toggle, which includes a summary and
debug logs. The Node Results List lists every node that was invoked during the
command.
7. Node result list — The Node result list shows all the Node Results used in the dbt
run, and you can filter it by clicking on a Result Status tab.
Editor tab menu — To interact with open editor tabs, right-click any tab to access
the helpful options in the file tab menu.
Right-click a tab to view the Editor tab menu options
File Search — You can easily search for and navigate between files using the File
Navigation menu, which can be accessed by pressing Command-O or Control-O.
IDE Status modal — The IDE Status modal shows the current error message and
debug logs for the server. This also contains an option to restart the IDE. Open this
by clicking on the IDE Status button.
Commit Changes modal — The Commit Changes modal is accessible via the Git
Actions button to commit all changes or via the Version Control Options menu to
commit individual changes. Once you enter a commit message, you can use the
modal to commit and sync the selected changes.
The Commit Changes modal is how users commit changes to their branch.
Change Branch modal — The Change Branch modal allows users to switch git
branches in the IDE. It can be accessed through the Change Branch link or the Git
Actions button in the Version Control menu.
IDE Options menu — The IDE Options menu can be accessed by clicking on the
three-dot menu located at the bottom right corner of the IDE. This menu contains
global options such as:
Access the IDE Options menu to switch to dark or light mode, restart the IDE, reclone your repo, or view the IDE status.
Enhance your development workflow by integrating with popular linters and formatters
like SQLFluff, sqlfmt, Black, and Prettier. Leverage these powerful tools directly in the dbt
Cloud IDE without interrupting your development flow.
What are linters and formatters?
In the dbt Cloud IDE, you can perform linting, auto-fix, and formatting on five different file
types:
SQL — Lint and fix with SQLFluff, and format with sqlfmt
YAML, Markdown, and JSON — Format with Prettier
Python — Format with Black
Each file type has its own unique linting and formatting rules. You can customize the linting
process to add more flexibility and enhance problem and style detection.
By default, the IDE uses sqlfmt rules to format your code, making it convenient to use right
away. However, if you have a file named .sqlfluff in the root directory of your dbt project, the
IDE will default to SQLFluff rules instead.
Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab.
Use sqlfmt to format your SQL code.
Format YAML, Markdown, and JSON files using Prettier.
Use the Config button to select your tool.
Customize linting by configuring your own linting code rules, including dbtonic linting/styling.
Lint
With the dbt Cloud IDE, you can seamlessly use SQLFluff, a configurable SQL linter, to
warn you of complex functions, syntax, formatting, and compilation errors. This integration
allows you to run checks, fix, and display any code errors directly within the Cloud IDE:
1. To enable linting, make sure you're on a development branch. Linting isn't available
on main or read-only branches.
2. Open a .sql file and click the Code Quality tab.
3. Click on the </> Config button on the bottom right side of the console section, below
the File editor.
4. In the code quality tool config pop-up, you have the option to
select sqlfluff or sqlfmt.
5. To lint your code, select the sqlfluff radio button. (Use sqlfmt to format your code)
6. Once you've selected the sqlfluff radio button, go back to the console section (below
the File editor) to select the Lint or Fix dropdown button:
o Lint button — Displays linting issues in the IDE as wavy underlines in the File
editor. You can hover over an underlined issue to display the details and
actions, including a Quick Fix option to fix all or specific issues. After linting,
you'll see a message confirming the outcome. Linting doesn't rerun after
saving. Click Lint again to rerun linting.
o Fix button — Automatically fixes linting errors in the File editor. When fixing
is complete, you'll see a message confirming the outcome.
o Use the Code Quality tab to view and debug any code errors.
Use the Lint or Fix button in the console section to lint or auto-fix your code.
Customize linting
SQLFluff is a configurable SQL linter, which means you can configure your own linting rules
instead of using the default linting settings in the IDE. You can exclude files and directories
by using a standard .sqlfluffignore file. Learn more about the syntax in the .sqlfluffignore
syntax docs.
Customize linting by configuring your own linting code rules, including dbtonic linting/styling.
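For example, a minimal SQLFluff configuration and ignore file at the root of your project might look like the following sketch (the dialect, line length, and ignored paths are illustrative; adjust them for your warehouse and project layout):
.sqlfluff
[sqlfluff]
templater = jinja
dialect = snowflake
max_line_length = 100

.sqlfluffignore
target/
dbt_packages/
macros/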
Format
In the dbt Cloud IDE, you can format your code to match style guides with a click of a
button. The IDE integrates with formatters like sqlfmt, Prettier, and Black to automatically
format code on five different file types — SQL, YAML, Markdown, Python, and JSON:
SQL — Format with sqlfmt, which provides one way to format your dbt SQL and
Jinja.
YAML, Markdown, and JSON — Format with Prettier.
Python — Format with Black.
The Cloud IDE formatting integrations take care of manual tasks like code formatting,
enabling you to focus on creating quality data models, collaborating, and driving impactful
results.
Format SQL
To format your SQL code, dbt Cloud integrates with sqlfmt, which is an uncompromising
SQL query formatter that provides one way to format the SQL query and Jinja.
By default, the IDE uses sqlfmt rules to format your code, making the Format button
available and convenient to use immediately. However, if you have a file named .sqlfluff in
the root directory of your dbt project, the IDE will default to SQLFluff rules instead.
To enable sqlfmt:
Use sqlfmt to format your SQL code.
Format YAML, Markdown, JSON
To format your YAML, Markdown, or JSON code, dbt Cloud integrates with Prettier, which
is an opinionated code formatter.
For more info on the order of precedence and how to configure files, refer to Prettier's
documentation. Please note, .prettierrc.json5, .prettierrc.js, and .prettierrc.toml files aren't
currently supported.
Format Python
To format your Python code, dbt Cloud integrates with Black, which is an uncompromising
Python code formatter.
A dbt project informs dbt about the context of your project and how to transform your data
(build your data sets). By design, dbt enforces the top-level structure of a dbt project such
as the dbt_project.yml file, the models directory, the snapshots directory, and so on. Within
the directories of the top-level, you can organize your project in any way that meets the
needs of your organization and data pipeline.
At a minimum, all a project needs is the dbt_project.yml project configuration file. dbt
supports a number of different resources, so a project may also include:
Resource — Description
models — Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
snapshots — A way to capture the state of your mutable tables so you can refer to it later.
seeds — CSV files with static data that you can load into your data platform with dbt.
data tests — SQL queries that you can write to test the models and resources in your project.
sources — A way to name and describe the data loaded into your warehouse by your Extract and Load tools.
analysis — A way to organize analytical SQL queries in your project such as the general ledger from your QuickBooks.
When building out the structure of your project, you should consider these impacts on your
organization's workflow:
Project configuration
Every dbt project includes a project configuration file called dbt_project.yml. It defines the
directory of the dbt project and other project configurations.
require-dbt-version — Restrict your project to only work with a range of dbt Core versions.
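For context, a minimal dbt_project.yml using some of these settings might look like this sketch (the project and profile names are illustrative):
dbt_project.yml
name: jaffle_shop
version: '1.0.0'
profile: jaffle_shop

model-paths: ["models"]
snapshot-paths: ["snapshots"]
test-paths: ["tests"]

require-dbt-version: [">=1.5.0", "<2.0.0"]

models:
  jaffle_shop:
    +materialized: view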
Project subdirectories
You can use the Project subdirectory option in dbt Cloud to specify a subdirectory in your
git repository that dbt should use as the root directory for your project. This is helpful when
you have multiple dbt projects in one repository or when you want to organize your dbt
project files into subdirectories for easier management.
To use the Project subdirectory option in dbt Cloud, follow these steps:
1. Click on the cog icon on the upper right side of the page and click on Account
Settings.
2. Under Projects, select the project you want to configure as a project subdirectory.
4. In the Project subdirectory field, add the name of the subdirectory. For example, if
your dbt project files are located in a subdirectory called <repository>/finance, you
would enter finance as the subdirectory.
o You can also reference nested subdirectories. For example, if your dbt project
files are located in <repository>/teams/finance, you would
enter teams/finance as the subdirectory. Note: You do not need a leading or
trailing / in the Project subdirectory field.
5. Click Save when you've finished.
After configuring the Project subdirectory option, dbt Cloud will use it as the root directory
for your dbt project. This means that dbt commands, such as dbt run or dbt test, will operate
on files within the specified subdirectory. If there is no dbt_project.yml file in the Project
subdirectory, you will be prompted to initialize the dbt project.
New projects
You can create new projects and share them with other people by making them available
on a hosted git repository like GitHub, GitLab, or Bitbucket.
After you set up a connection with your data platform, you can initialize your new project in
dbt Cloud and start developing. Or, run dbt init from the command line to set up your new
project.
During project initialization, dbt creates sample model files in your project directory to help
you start developing quickly.
Sample projects
If you want to explore dbt projects more in-depth, you can clone dbt Labs' jaffle_shop project on
GitHub. It's a runnable project that contains sample configurations and helpful notes.
If you want to see what a mature, production project looks like, check out the GitLab Data
Team public repo.
Models are where your developers spend most of their time within a dbt environment.
Models are primarily written as a select statement and saved as a .sql file. While the
definition is straightforward, the complexity of the execution will vary from environment to
environment. Models will be written and rewritten as needs evolve and your organization
finds new ways to maximize efficiency.
SQL is the language most dbt users will utilize, but it is not the only one for building models.
Starting in version 1.3, dbt Core and dbt Cloud support Python models. Python models are
useful for training or deploying data science models, complex transformations, or where a
specific Python package meets a need — such as using the dateutil library to parse dates.
The top level of a dbt workflow is the project. A project is a directory containing a .yml file (the
project configuration) and .sql or .py files (the models). The project file tells dbt the
project context, and the models let dbt know how to build a specific data set. For more
details on projects, refer to About dbt projects.
Your organization may need only a few models, but more likely you'll need a complex
structure of nested models to transform the required data. A model is a single file containing
a final select statement, a project can have multiple models, and models can even
reference each other. Add numerous projects on top of that, and the level of effort required
to transform complex data sets can drop drastically compared to older methods.
Learn more about models on the SQL models and Python models pages. If you'd like to begin
with a bit of practice, visit our Getting Started Guide for instructions on setting up the
jaffle_shop sample data so you can get hands-on with the power of dbt.
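As a quick sketch of what models look like, here are two illustrative files, one selecting from a source and one referencing another model (the source, model, and column names are made up for the example):
models/staging/stg_orders.sql
select
    id as order_id,
    status,
    updated_at
from {{ source('jaffle_shop', 'orders') }}

models/marts/shipped_orders.sql
select *
from {{ ref('stg_orders') }}
where status = 'shipped'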
Related documentation
Snapshot configurations
Snapshot properties
snapshot command
Analysts often need to "look back in time" at previous data states in their mutable tables.
While some source data systems are built in a way that makes accessing historical data
possible, this is not always the case. dbt provides a mechanism, snapshots, which records
changes to a mutable table over time.
Snapshots implement type-2 Slowly Changing Dimensions over mutable source tables.
These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over
time. Imagine you have an orders table where the status field can be overwritten as the
order is processed.
id | status | updated_at
1 | pending | 2019-01-01

Now, imagine that the order goes from "pending" to "shipped". That same record will now
look like:

id | status | updated_at
1 | shipped | 2019-01-02
This order is now in the "shipped" state, but we've lost the information about when the order
was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it
took for an order to ship. dbt can "snapshot" these changes to help you understand how
values in a row change over time. Here's an example of a snapshot table for the previous
example:
id | status | updated_at | dbt_valid_from | dbt_valid_to
1 | pending | 2019-01-01 | 2019-01-01 | 2019-01-02
1 | shipped | 2019-01-02 | 2019-01-02 | null
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
On the first run: dbt will create the initial snapshot table — this will be the result set
of your select statement, with additional columns
including dbt_valid_from and dbt_valid_to. All records will have a dbt_valid_to = null.
On subsequent runs: dbt will check which records have changed or if any new
records have been created:
o The dbt_valid_to column will be updated for any existing records that have
changed
o The updated record and any new records will be inserted into the snapshot
table. These records will now have dbt_valid_to = null
Snapshots can be referenced in downstream models the same way as referencing models
— by using the ref function.
Example
To add a snapshot to your project:
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}
{% endsnapshot %}
3. Write a select statement within the snapshot block (tips for writing a good snapshot
query are below). This select statement defines the results that you want to snapshot
over time. You can use sources and refs here.
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}
{% endsnapshot %}
4. Check whether the result set of your query includes a reliable timestamp column that
indicates when a record was last updated. For our example, the updated_at column
reliably indicates record changes, so we can use the timestamp strategy. If your
query result set does not have a reliable timestamp, you'll need to instead use
the check strategy — more details on this below.
5. Add configurations to your snapshot using a config block (more details below). You
can also configure your snapshot from your dbt_project.yml file (docs).
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
6. Run the dbt snapshot command — for our example, a new table will be created
at analytics.snapshots.orders_snapshot. The target_database configuration,
the target_schema configuration, and the name of the snapshot (as defined
in {% snapshot .. %}) determine how dbt names this table.
$ dbt snapshot
Running with dbt=0.16.0
Completed successfully
7. Inspect the results by selecting from the table dbt created. After the first run, you
should see the results of your query, plus the snapshot meta fields as described
below.
8. Run the snapshot command again, and inspect the results. If any records have been
updated, the snapshot should reflect this.
9. Select from the snapshot in downstream models using the ref function.
models/changed_orders.sql
select * from {{ ref('orders_snapshot') }}
10. Schedule the snapshot command to run regularly — snapshots are only useful if you
run them frequently.
Timestamp strategy
The timestamp strategy uses an updated_at field to determine if a row has changed. If the
configured updated_at column for a row is more recent than the last time the snapshot ran,
then dbt will invalidate the old record and record the new one. If the timestamps are
unchanged, then dbt will not take any action.
Config | Description | Example
updated_at | A column which represents when the source row was last updated | updated_at
Example usage:
snapshots/orders_snapshot_timestamp.sql
{% snapshot orders_snapshot_timestamp %}

{{
    config(
      target_schema='snapshots',
      strategy='timestamp',
      unique_key='id',
      updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
Check strategy
The check strategy is useful for tables which do not have a reliable updated_at column.
This strategy works by comparing a list of columns between their current and historical
values. If any of these columns have changed, then dbt will invalidate the old record and
record the new one. If the column values are identical, then dbt will not take any action.
Config | Description | Example
check_cols | A list of columns to check for changes, or all to check all columns | ["name", "email"]

check_cols = 'all'
The check snapshot strategy can be configured to track changes to all columns by
supplying check_cols = 'all'. It is better to explicitly enumerate the columns that you want to
check. Consider using a surrogate key to condense many columns into a single column (see
the sketch after the example below).
Example Usage
snapshots/orders_snapshot_check.sql
{% snapshot orders_snapshot_check %}
{{
config(
target_schema='snapshots',
strategy='check',
unique_key='id',
check_cols=['status', 'is_cancelled'],
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
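One way to apply the surrogate-key suggestion above is a sketch like the following, which assumes the dbt-utils package (version 1.0 or later) is installed and condenses several columns into a single check column; the snapshot name and column list are illustrative:
snapshots/orders_snapshot_surrogate.sql
{% snapshot orders_snapshot_surrogate %}

{{
    config(
      target_schema='snapshots',
      strategy='check',
      unique_key='id',
      check_cols=['row_fingerprint'],
    )
}}

select
    *,
    -- hash several columns into one value so the check strategy only compares this column
    {{ dbt_utils.generate_surrogate_key(['status', 'is_cancelled', 'amount']) }} as row_fingerprint
from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}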
Hard deletes (opt-in)
Rows that are deleted from the source query are not invalidated by default. With the config
option invalidate_hard_deletes, dbt can track rows that no longer exist. This is done by left
joining the snapshot table with the source table, and filtering the rows that are still valid at
that point but can no longer be found in the source table. dbt_valid_to will be set to the
current snapshot time.
This configuration is not a different strategy as described above, but is an additional opt-in
feature. It is not enabled by default since it alters the previous behavior.
Example Usage
snapshots/orders_snapshot_hard_delete.sql
{% snapshot orders_snapshot_hard_delete %}
{{
config(
target_schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
invalidate_hard_deletes=True,
)
}}
{% endsnapshot %}
Configuring snapshots
Snapshot configurations
Config | Description | Required? | Example
check_cols | If using the check strategy, then the columns to check | Only if using the check strategy | ["status"]
Snapshots can be configured from both your dbt_project.yml file and a config block, check
out the configuration docs for more information.
Basically – keep your query as simple as possible! Some reasonable exceptions to these
recommendations include:
Snapshot meta-fields
Snapshot tables will be created as a clone of your source dataset, plus some additional
meta-fields*.
Field | Meaning | Notes
dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record.
dbt_valid_to | The timestamp when this row became invalidated. | The most recent snapshot record will have dbt_valid_to set to null.
dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt.
dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt.
*The timestamps used for each column are subtly different depending on the strategy you
use:
For the timestamp strategy, the configured updated_at column is used to populate
the dbt_valid_from, dbt_valid_to and dbt_updated_at columns.
For the check strategy, the current timestamp is used to populate each column. If
configured, the check strategy uses the updated_at column instead, as with the timestamp
strategy.
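Since the most recent version of each record has dbt_valid_to set to null, a downstream model can select only the current rows; for example (the model name is illustrative):
models/current_orders.sql
select *
from {{ ref('orders_snapshot') }}
where dbt_valid_to is null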
By default, dbt looks for snapshots in the snapshots directory. To change this, update the snapshot-paths configuration in your dbt_project.yml file, like so:
dbt_project.yml
snapshot-paths: ["snapshots"]
Note that you cannot co-locate snapshots and models in the same directory.
Debug "Snapshot target is not a snapshot table" errors
Overview
Data tests are assertions you make about your models and other resources in your dbt
project (e.g. sources, seeds and snapshots). When you run dbt test, dbt will tell you if each
test in your project passes or fails.
You can use data tests to improve the integrity of the SQL in each model by making
assertions about the results generated. Out of the box, you can test whether a specified
column in a model only contains non-null values, unique values, values that have a
corresponding value in another model (for example, a customer_id for
an order corresponds to an id in the customers model), or values from a specified list. You
can extend data tests to suit business logic specific to your organization – any assertion
that you can make about your model in the form of a select query can be turned into a data
test.
Data tests return a set of failing records. Generic data tests (f.k.a. schema tests) are
defined using test blocks.
Like almost everything in dbt, data tests are SQL queries. In particular, they
are select statements that seek to grab "failing" records, ones that disprove your assertion.
If you assert that a column is unique in a model, the test query selects for duplicates; if you
assert that a column is never null, the test seeks after nulls. If the data test returns zero
failing rows, it passes, and your assertion has been validated.
A singular data test is testing in its simplest form: If you can write a SQL query that
returns failing rows, you can save that query in a .sql file within your test directory.
It's now a data test, and it will be executed by the dbt test command.
A generic data test is a parameterized query that accepts arguments. The test query
is defined in a special test block (like a macro). Once defined, you can reference the
generic test by name throughout your .yml files—define it on models, columns,
sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we
think you should use them!
Defining data tests is a great way to confirm that your outputs and inputs are as expected,
and helps prevent regressions when your code changes. Because you can use them over
and over again, making similar assertions with minor variations, generic data tests tend to
be much more common—they should make up the bulk of your dbt data testing suite. That
said, both ways of defining data tests have their time and place.
These tests are defined in .sql files, typically in your tests directory (as defined by your test-
paths config). You can use Jinja (including ref and source) in the test definition, just like you
can when creating models. Each .sql file contains one select statement, and it defines one
data test:
tests/assert_total_payment_amount_is_positive.sql
-- Refunds have a negative amount, so the total amount should always be >= 0.
-- Therefore return records where this isn't true to make the test fail
select
order_id,
sum(amount) as total_amount
from {{ ref('fct_payments' )}}
group by 1
having not(total_amount >= 0)
Singular data tests are easy to write—so easy that you may find yourself writing the same
basic structure over and over, only changing the name of a column or model. By that point,
the test isn't so singular! In that case, we recommend writing a generic data test instead.
A generic data test is defined in a test block, like so:
{% test not_null(model, column_name) %}
select *
from {{ model }}
where {{ column_name }} is null
{% endtest %}
You'll notice that there are two arguments, model and column_name, which are then
templated into the query. This is what makes the test "generic": it can be defined on as
many columns as you like, across as many models as you like, and dbt will pass the values
of model and column_name accordingly. Once that generic test has been defined, it can be
added as a property on any existing model (or source, seed, or snapshot). These properties
are added in .yml files in the same directory as your resource.
INFO
If this is your first time working with adding properties to a resource, check out the docs
on declaring properties.
Out of the box, dbt ships with four generic data tests already
defined: unique, not_null, accepted_values and relationships. Here's a full example using
those tests on an orders model:
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
Behind the scenes, dbt constructs a select query for each data test, using the parametrized
query from the generic test block. These queries return the rows where your assertion
is not true; if the test returns zero rows, your assertion passes.
You can find more information about these data tests, and additional configurations
(including severity and tags) in the reference section.
Those four tests are enough to get you started. You'll quickly find you want to use a wider
variety of tests—a good thing! You can also install generic data tests from a package, or
write your own, to use (and reuse) across your dbt project. Check out the guide on custom
generic tests for more information.
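As a rough sketch of what a custom generic test can look like (the test name is_positive and the file location are illustrative), you define a test block with model and column_name arguments, just like the built-in tests:
tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}
-- fail any row where the column is negative
select *
from {{ model }}
where {{ column_name }} < 0
{% endtest %}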
INFO
There are generic tests defined in some open source packages, such as dbt-utils and dbt-
expectations — skip ahead to the docs on packages to learn more!
Example
1. Add a .yml file to your models directory, e.g. models/schema.yml, with the following
content (you may need to adjust the name: values for an existing model)
models/schema.yml
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
2. Run the dbt test command:
$ dbt test
Found 3 models, 2 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 0
sources
Completed successfully
Unique test
Compiled SQL:
select *
from (
select
order_id
from analytics.orders
where order_id is not null
group by order_id
having count(*) > 1
) validation_errors
Not null test
Compiled SQL:
select *
from analytics.orders
where order_id is null
This workflow allows you to query and examine failing records much more quickly in
development:
Store test failures in the database for faster development-time debugging.
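You can opt in for a whole run from the command line:
dbt test --store-failures
or per test with the store_failures config; a minimal sketch (model and column names are illustrative):
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                store_failures: true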
Note that, if you elect to store test failures:
FAQs
How do I test one model at a time?
One of my tests failed, how can I debug it?
What tests should I add to my project?
When should I run my tests?
Can I store my tests in a directory other than the `tests` directory in my project?
How do I run tests on just my sources?
Can I set test failure thresholds?
As of v0.20.0, you can use the error_if and warn_if configs to set custom failure thresholds
in your tests. See the reference section for more information.
For dbt v0.19.0 and earlier, you could try these possible solutions:
Consider an orders table that contains records from multiple countries, and the combination
of ID and country code is unique:
order_id  country_code
1         AU
2         AU
...       ...
1         US
2         US
...       ...
Here are some approaches:
1. Create a unique key in the model by concatenating the columns:
select
country_code || '-' || order_id as surrogate_key,
...
models/orders.yml
version: 2
models:
  - name: orders
    columns:
      - name: surrogate_key
        tests:
          - unique
2. Test an expression
models/orders.yml
version: 2
models:
  - name: orders
    tests:
      - unique:
          column_name: "(country_code || '-' || order_id)"
3. Use the dbt_utils.unique_combination_of_columns test from the dbt_utils package:
models/orders.yml
version: 2
models:
  - name: orders
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - country_code
            - order_id
Overview
In dbt, you can combine SQL with Jinja, a templating language.
Using Jinja turns your dbt project into a programming environment for SQL, giving you the
ability to do things that aren't normally possible in SQL. For example, with Jinja you can:
In fact, if you've used the {{ ref() }} function, you're already using Jinja!
Jinja can be used in any SQL in a dbt project, including models, analyses, tests, and
even hooks.
/models/order_payment_method_amounts.sql
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}
select
order_id,
{% for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as
{{payment_method}}_amount,
{% endfor %}
sum(amount) as total_amount
from app_data.payments
group by 1
/models/order_payment_method_amounts.sql
select
order_id,
sum(case when payment_method = 'bank_transfer' then amount end) as
bank_transfer_amount,
sum(case when payment_method = 'credit_card' then amount end) as
credit_card_amount,
sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount,
sum(amount) as total_amount
from app_data.payments
group by 1
You can recognize Jinja based on the delimiters the language uses, which we refer to as
"curlies":
Expressions {{ ... }}: Expressions are used when you want to output a string. You
can use expressions to reference variables and call macros.
Statements {% ... %}: Statements don't output a string. They are used for control
flow, for example, to set up for loops and if statements, to set or modify variables, or
to define macros.
Comments {# ... #}: Jinja comments are used to prevent the text within the comment
from executing or outputting a string.
When used in a dbt model, your Jinja needs to compile to a valid query. To check what
SQL your Jinja compiles to:
Using dbt Cloud: Click the compile button to see the compiled SQL in the Compiled
SQL pane
Using dbt Core: Run dbt compile from the command line. Then open the compiled
SQL file in the target/compiled/{project name}/ directory. Use a split screen in your
code editor to keep both files open at once.
Macros
Macros in Jinja are pieces of code that can be reused multiple times – they are analogous
to "functions" in other programming languages, and are extremely useful if you find yourself
repeating code across multiple models. Macros are defined in .sql files, typically in
your macros directory (docs).
macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}
models/stg_payments.sql
select
id as payment_id,
{{ cents_to_dollars('amount') }} as amount_usd,
...
from app_data.payments
target/compiled/models/stg_payments.sql
select
id as payment_id,
(amount / 100)::numeric(16, 2) as amount_usd,
...
from app_data.payments
A number of useful macros have also been grouped together into packages — our most
popular package is dbt-utils.
After installing a package into your project, you can use any of the macros in your own
project — make sure you qualify the macro by prefixing it with the package name:
select
field_1,
field_2,
field_3,
field_4,
field_5,
count(*)
from my_table
{{ dbt_utils.group_by(5) }}
You can also qualify a macro in your own project by prefixing it with your package
name (this is mainly useful for package authors).
FAQs
What parts of Jinja are dbt-specific?
Which docs should I use when writing Jinja or creating a macro?
Why do I need to quote column names in Jinja?
My compiled SQL has a lot of spaces and new lines, how can I get rid of it?
How do I debug my Jinja?
How do I document macros?
Why does my dbt output have so many macros in it?
dbtonic Jinja
Just like well-written python is pythonic, well-written dbt code is dbtonic.
Once you learn the power of Jinja, it's common to want to abstract every repeated line into
a macro! Remember that using Jinja can make your models harder for other users to
interpret — we recommend favoring readability when mixing Jinja with SQL, even if it
means repeating some lines of SQL in a few places. If all your models are macros, it might
be worth re-assessing.
Writing a macro for the first time? Check whether we've open sourced one in dbt-utils that
you can use, and save yourself some time!
{% set ... %} can be used to create a new variable, or update an existing one. We
recommend setting variables at the top of a model, rather than hardcoding them inline. This is a
practice borrowed from many other coding languages, since it helps with readability, and
comes in handy if you need to reference the variable in two places:
-- 🙅 This works, but can be hard to maintain as your code grows
{% for payment_method in ["bank_transfer", "credit_card", "gift_card"] %}
...
{% endfor %}
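A sketch of the recommended alternative, setting the list once at the top of the model so it can be changed and referenced in one place:
-- ✅ Set the variable once at the top of the model
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

{% for payment_method in payment_methods %}
...
{% endfor %}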
Using sources
Sources make it possible to name and describe the data loaded into your warehouse by
your Extract and Load tools. By declaring these tables as sources in dbt, you can then
select from source tables in your models using the {{ source() }} function, helping
define the lineage of your data
test your assumptions about your source data
calculate the freshness of your source data
Declaring a source
models/<filename>.yml
version: 2
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers
  - name: stripe
    tables:
      - name: payments
*By default, schema will be the same as name. Add schema only if you want to use a
source name that differs from the existing schema.
If you're not already familiar with these files, be sure to check out the documentation on
schema.yml files before proceeding.
Once a source has been defined, it can be referenced from a model using
the {{ source()}} function.
models/orders.sql
select
...
from {{ source('jaffle_shop', 'orders') }}
target/compiled/jaffle_shop/models/my_model.sql
select
...
from raw.jaffle_shop.orders
Using the {{ source() }} function also creates a dependency between the model and the source table.
The source function tells dbt a model is dependent on a source
Testing and documenting sources
These should be familiar concepts if you've already added tests and descriptions to your
models (if not check out the guides on testing and documentation).
models/<filename>.yml
version: 2
sources:
  - name: jaffle_shop
    description: This is a replica of the Postgres database used by our app
    tables:
      - name: orders
        description: >
          One record per order. Includes cancelled and deleted orders.
        columns:
          - name: id
            description: Primary key of the orders table
            tests:
              - unique
              - not_null
          - name: status
            description: Note that the status can change over time
      - name: ...
      - name: ...
You can find more details on the available properties for sources in the reference section.
FAQs
models/<filename>.yml
version: 2
sources:
  - name: jaffle_shop
    database: raw
    freshness: # default freshness
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _etl_loaded_at
    tables:
      - name: orders
        freshness: # make this a little more strict
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
      - name: product_skus
        freshness: null # do not check freshness for this table
In the freshness block, one or both of warn_after and error_after can be provided. If neither
is provided, then dbt will not calculate freshness snapshots for the tables in this source.
These configs are applied hierarchically, so freshness and loaded_at_field values specified
for a source will flow through to all of the tables defined in that source. This is useful when
all of the tables in a source have the same loaded_at_field, as the config can just be
specified once in the top-level source definition.
To snapshot freshness information for your sources, use the dbt source
freshness command (reference docs):
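For example (selecting a single source is optional, and exact flags can vary slightly by dbt version):
dbt source freshness
dbt source freshness --select source:jaffle_shop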
Behind the scenes, dbt uses the freshness properties to construct a select query, shown
below. You can find this query in the query logs.
select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
The results of this query are used to determine whether the source is fresh or not:
Uh oh! Not everything is as fresh as we'd like!
Filter
Some databases have tables where a filter over certain columns is required in order to
prevent a full scan of the table, which could be costly. To do a freshness check on
such tables, a filter argument can be added to the configuration, e.g. filter: _etl_loaded_at
>= date_sub(current_date(), interval 1 day). For the example above, the resulting query
would look like:
select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
where _etl_loaded_at >= date_sub(current_date(), interval 1 day)
FAQs
The dbt source freshness command will output a pass/warning/error status for
each table selected in the freshness snapshot.
Additionally, dbt will write the freshness results to a file in the target/ directory
called sources.json by default. You can override this destination by passing the -o flag to
the dbt source freshness command.
After enabling source freshness within a job, configure Artifacts in your Project
Details page, which you can find by clicking the gear icon and then selecting Account
settings. You can see the current status for source freshness by clicking View Sources in
the job page.
Add Exposures to your DAG
Exposures make it possible to define and describe a downstream use of your dbt project,
such as in a dashboard, application, or data science pipeline. By defining exposures, you
can then:
run, test, and list resources that feed into your exposure
populate a dedicated page in the auto-generated documentation site with context
relevant to data consumers
Declaring an exposure
models/<filename>.yml
version: 2
exposures:
  - name: weekly_jaffle_metrics
    label: Jaffles by the Week
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
Available properties
Required:
Expected:
depends_on: list of refable nodes, including ref, source, and metric (While possible,
it is highly unlikely you will ever need an exposure to depend on a source directly)
Optional:
description
tags
meta
Referencing exposures
Once an exposure is defined, you can run commands that reference it:
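For example, assuming the weekly_jaffle_metrics exposure declared above, you can build or test everything upstream of it:
dbt run --select +exposure:weekly_jaffle_metrics
dbt test --select +exposure:weekly_jaffle_metrics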
When we generate our documentation site, you'll see the exposure appear:
Dedicated page in dbt-docs for each exposure
Add groups to your DAG
A group is a collection of nodes within a dbt DAG. Groups are named, and every group has
an owner. They enable intentional collaboration within and across teams by
restricting access to private models.
Group members may include models, tests, seeds, snapshots, analyses, and metrics. (Not
included: sources and exposures.) Each node may belong to only one group.
Declaring a group
models/marts/finance/finance.yml
groups:
  - name: finance
    owner:
      # 'name' or 'email' is required; additional properties allowed
      email: [email protected]
      slack: finance-data
      github: finance-data-team
Project-level
Model-level
In-file
dbt_project.yml
models:
  marts:
    finance:
      +group: finance
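For the in-file option, a model can also declare its group directly in its config block; a minimal sketch (the model file name is illustrative):
-- models/marts/finance/fct_invoices.sql (hypothetical model)
{{ config(group='finance') }}

select ...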
By default, all models within a group have the protected access modifier. This means they
can be referenced by downstream resources in any group in the same project, using
the ref function. If a grouped model's access property is set to private, only resources within
its group can reference it.
models/schema.yml
models:
  - name: finance_private_model
    access: private
    config:
      group: finance
  # in a different group!
  - name: marketing_model
    config:
      group: marketing
models/marketing_model.sql
select * from {{ ref('finance_private_model') }}
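-- Building marketing_model will fail: finance_private_model has private access,
-- so it can only be referenced by models in the 'finance' group.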
Related docs
Analyses
Overview
dbt's notion of models makes it easy for data teams to version control and collaborate on
data transformations. Sometimes though, a certain SQL statement doesn't quite fit into the
mold of a dbt model. These more "analytical" SQL files can be versioned inside of your dbt
project using the analysis functionality of dbt.
Any .sql files found in the analyses/ directory of a dbt project will be compiled, but not
executed. This means that analysts can use dbt functionality like {{ ref(...) }} to select from
models in an environment-agnostic way.
In practice, an analysis file might look like this (via the open source Quickbooks models):
analyses/running_total_by_account.sql
-- analyses/running_total_by_account.sql
with journal_entries as (
select *
from {{ ref('quickbooks_adjusted_journal_entries') }}
), accounts as (
select *
from {{ ref('quickbooks_accounts_transformed') }}
)
select
txn_date,
account_id,
adjusted_amount,
description,
account_name,
sum(adjusted_amount) over (partition by account_id order by id rows unbounded
preceding)
from journal_entries
order by account_id, id
dbt compile
Data Build Tool (DBT) is a popular open-source tool used in the data analytics and data
engineering fields. DBT helps data professionals transform, model, and prepare data for
analysis. If you’re preparing for an interview related to DBT, it’s important to be well-versed
in its concepts and functionalities. To help you prepare, here’s a list of common interview
1. What is DBT?
Answer: DBT, short for Data Build Tool, is an open-source data transformation and modeling
tool. It helps analysts and data engineers manage the transformation and preparation of data in the warehouse.
Answer: DBT is primarily used for data transformation, modeling, and preparing data for
analysis and reporting. It is commonly used in data warehouses to create and maintain data
pipelines.
Answer: Unlike traditional ETL tools, DBT focuses on transforming and modeling data within
the data warehouse itself, making it more suitable for ELT (Extract, Load, Transform)
workflows. DBT leverages the power and scalability of modern data warehouses and allows transformations to be written in plain SQL.
Answer: A DBT model is a SQL file that defines a transformation or a table within the data
warehouse. Models can be simple SQL queries or complex transformations that create
derived datasets.
Answer: Sources are the raw tables loaded into the data warehouse. Models are the transformed and structured datasets created using DBT to support analytics.
Answer: A DBT project is a directory containing all the files and configurations necessary to
define data models, tests, and documentation. It is the primary unit of organization for DBT.
Answer: DAG stands for Directed Acyclic Graph, and in the context of DBT, it represents the
dependencies between models. DBT uses a DAG to determine the order in which models
are built.
Answer: To write a DBT model, you create a `.sql` file in the appropriate project directory and write a SQL SELECT statement that defines the transformation.
9. What are DBT macros, and how are they useful in transformations?
Answer: DBT macros are reusable SQL code snippets that can simplify and standardize repeated transformation logic across models and columns.
10. How can you perform testing and validation of DBT models?
Answer: You can perform testing in DBT by writing custom SQL tests to validate your data
models. These tests can check for data quality, consistency, and other criteria to ensure that your transformations behave as expected.
Answer: Deploying DBT models to production typically involves using DBT Cloud, CI/CD
pipelines, or other orchestration tools. You'll need to compile and build the models and then schedule them to run against your production environment.
Answer: DBT integrates with version control systems like Git, allowing teams to collaborate
on DBT projects and track changes to models over time. It provides a clear history of changes and makes code review straightforward.
13. What are some common performance optimization techniques for DBT models?
Answer: Performance optimization in DBT can be achieved by using techniques like
materialized views, optimizing SQL queries, and using caching to reduce query execution
times.
Answer: DBT provides logs and diagnostics to help monitor and troubleshoot issues. You
can also use data warehouse-specific monitoring tools to identify and address performance
problems.
15. Can DBT work with different data sources and data warehouses?
Answer: Yes, DBT supports integration with a variety of data sources and data warehouses,
including Snowflake, BigQuery, Redshift, and more. It’s adaptable to different cloud and on-
premises environments.
16. How does DBT handle incremental loading of data from source systems?
Answer: DBT can handle incremental loading by using source freshness checks and
managing data updates from source systems. It can be configured to only transform new or
changed data.
17. What security measures does DBT support for data access and transformation?
Answer: DBT supports the security features provided by your data warehouse, such as row-
level security and access control policies. It's important to implement proper access controls in the warehouse that dbt connects to.
Answer: Sensitive data in DBT models should be handled according to your organization’s
data security policies. This can involve encryption, tokenization, or other data protection
measures.
1)View (Default):
Purpose: Views are virtual tables that are not materialized. They are essentially saved queries that are executed each time the view is referenced.
Use Case: Useful for simple transformations or when you want to reference a SQL query in
multiple models.
{{ config(
materialized='view'
) }}
SELECT
...
FROM ...
2)Table:
Purpose: Materializes the result of a SQL query as a physical table in your data warehouse.
Use Case: Suitable for intermediate or final tables that you want to persist in your data
warehouse.
{{ config(
materialized='table'
) }}
SELECT
...
FROM ...
3)Incremental:
Purpose: Materializes the result of a SQL query as a physical table, but is designed to be built incrementally, processing only new or changed data on each run.
Use Case: Ideal for situations where you want to update your table with only the new or changed records instead of rebuilding it from scratch.
{{ config(
materialized='incremental'
) }}
SELECT
...
FROM ...
4)Incremental with a unique key:
Purpose: Similar to the incremental materialization, but specifies a unique key that dbt can use to match and update existing rows.
Use Case: Useful when dbt needs a way to identify changes in the data.
{{ config(
materialized='incremental',
unique_key='id'
) }}
SELECT
...
FROM ...
5)Snapshot:
Purpose: Materializes a table in a way that retains a version history of the data, allowing you to see how records change over time.
Use Case: Useful for slowly changing dimensions or situations where historical data is
important.
{{ config(
materialized='snapshot'
) }}
SELECT
...
FROM ...
Answer: dbt provides several types of tests that you can use to validate your data. Here are some of the most common:
version: 2
models:
  - name: my_model
    columns:
      - name: id
        tests:
          - unique
models:
  - name: my_model
    columns:
      - name: name
        tests:
          - not_null
      - name: age
        tests:
          - not_null
version: 2
models:
  - name: my_model
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']
Verifies that the values in a foreign key column match primary key values in the referenced
table.
version: 2
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
Checks that foreign key relationships are maintained between two tables. (Note: referential_integrity is not a test that ships with dbt; in practice this check is written as a custom generic test or handled with the built-in relationships test shown above.)
version: 2
models:
  - name: orders
    tests:
      - referential_integrity:
          to: ref('customers')
          field: customer_id
version: 2
models:
  - name: my_model
    tests:
      - custom_sql: "column_name > 0"
(custom_sql is likewise not a built-in dbt test; arbitrary SQL assertions are usually written as singular tests in the tests/ directory or with a package test such as dbt_utils.expression_is_true.)
21. What is a seed?
Answer: A "seed" is a dbt resource for loading static or reference data into your warehouse: a CSV file in your project that dbt loads into a table. Seeds are typically used to store data that doesn't change often and doesn't require transformation during the ETL (Extract, Transform, Load) process.
1. Static Data: Seeds are used for static or reference data that doesn't change frequently. Examples include lookup tables, reference data, or any data that rarely needs updating.
2. Initial Data Load: Seeds are often used to load initial data into a data warehouse or data mart. This data is typically loaded once and then used as a stable reference.
3. Configuration: In dbt, a seed is a CSV file stored in your project's seed directory and loaded into the warehouse with the dbt seed command. How the data should be loaded (target schema, column types, and so on) is configured in dbt_project.yml, for example:
dbt_project.yml
seeds:
  my_project:
    my_seed_table:
      +column_types:
        id: varchar(32)
Answer: Pre-hooks and Post-hooks are mechanisms to execute SQL commands or scripts before and after the execution of dbt models, respectively.
1)Pre-hooks:
A pre-hook is a SQL statement that runs immediately before dbt builds one or more models.
It allows you to perform setup tasks or run additional SQL commands before the main dbt modeling process.
Common use cases for pre-hooks include tasks such as creating temporary tables, loading data into staging tables, or performing any other necessary setup steps.
Example of a pre-hook :
-- models/my_model.sql
{{ config(
pre_hook = "CREATE TEMP TABLE my_temp_table AS SELECT * FROM my_source_table"
) }}
SELECT
column1,
column2
FROM
my_temp_table
2)Post-hooks:
A post-hook is a SQL statement that runs immediately after dbt builds one or more models.
It allows you to perform cleanup tasks, log information, or execute additional SQL commands after the main modeling process.
Common use cases include updating audit tables, logging information about the run, or deleting temporary tables created by pre-hooks.
Example of a post-hook :
-- models/my_model.sql
SELECT
column1,
column2
FROM
my_source_table
{{ config(
post_hook = "UPDATE metadata_table SET last_run_timestamp = CURRENT_TIMESTAMP"
) }}
23.what is snapshots?
Answer: “snapshots” refer to a type of dbt model that is used to track changes over time in a
table or view. Snapshots are particularly useful for building historical reporting or analytics,
where you want to analyze how data has changed over different points in time.
2. Unique Identifiers: To track changes over time, dbt relies on unique identifiers
(primary keys) in the underlying data. These identifiers are used to determine
which rows have changed, and dbt creates new records in the snapshot table
accordingly.
3. Validity Timestamps: dbt adds columns (dbt_valid_from and dbt_valid_to) that indicate when each historical version of a record was valid. This allows you to query the data as it existed at any point in time.
4. Configuration: Snapshots are configured by creating a separate SQL file for each snapshot table. This file defines the base table or view you're snapshotting, the primary key, and any other necessary configurations.
Here’s a simplified example:
-- snapshots/customer_snapshot.sql
{% snapshot customer_snapshot %}
{{ config(
target_database='analytics',
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
) }}
SELECT
customer_id,
name,
email,
address,
updated_at
FROM
source.customer
{% endsnapshot %}
24.What is macros?
Answer: macros refer to reusable blocks of SQL code that can be defined and invoked within
dbt models. dbt macros are similar to functions or procedures in other programming
languages, allowing you to encapsulate and reuse SQL logic across multiple queries.
1. Definition: A macro is a named block of SQL code that can take parameters, making it flexible and reusable.
-- my_macro.sql
{% macro my_macro(parameter1, parameter2) %}
SELECT
column1,
column2
FROM
my_table
WHERE
condition1 = {{ parameter1 }}
AND condition2 = '{{ parameter2 }}'
{% endmacro %}
2. Invocation: You can then use the macro in your dbt models by referencing it.
-- my_model.sql
{{ my_macro(parameter1=1, parameter2='value') }}
When you run the dbt project, dbt replaces the macro invocation with the actual SQL code defined in the macro.
3. Parameters: Macros can accept parameters, making them dynamic and reusable for different scenarios. In the example above, parameter1 and parameter2 are parameters that can take different values at each invocation.
4. Code Organization: Macros help in organizing and modularizing your SQL code. They are particularly useful when you have common patterns or calculations that need to be reused across multiple models:
-- my_model.sql
{{ my_macro(parameter1=1, parameter2='value') }}
-- another_model.sql
{{ my_macro(parameter1=2, parameter2='another_value') }}
Answer: A project structure refers to the organization and layout of files and directories within
a dbt project. dbt is a command-line tool that enables data analysts and engineers to
transform data in their warehouse more effectively. The project structure in dbt is designed
to be modular and organized, allowing users to manage and version control their analytics
code easily.
1. Models Directory:
This is where you store your SQL files containing dbt models. Each model represents a
logical transformation or aggregation of your raw data. Models are defined using SQL syntax
and are typically organized into subdirectories based on the data source or business logic.
2. Data Directory:
The data directory is used to store any data files that are required for your dbt transformations. This might include lookup tables, reference data, or any other supplemental data files (in recent dbt versions this directory is named seeds).
3. Analyses Directory:
This directory contains SQL files that are used for ad-hoc querying or exploratory analysis. These files are separate from the main models and are not intended to be part of the core transformation DAG.
4. Tests Directory:
dbt allows you to write tests to ensure the quality of your data transformations.
The tests directory is where you store YAML files defining the tests for your models. Tests
can include checks on the data types, uniqueness, and other criteria.
5. Snapshots Directory:
Snapshots are used for slowly changing dimensions or historical tracking of data changes.
The snapshots directory is where you store SQL files defining the logic for these snapshots.
6. Macros Directory:
Macros in dbt are reusable pieces of SQL code. The macros directory is where you store
these macros, and they can be included in your models for better modularity and
maintainability.
7. Docs Directory:
This directory is used for storing documentation for your dbt project. Documentation is
crucial for understanding the purpose and logic behind each model and transformation.
8. dbt_project.yml:
This YAML file is the configuration file for your dbt project. It includes settings such as the project name, file paths, and model configurations.
9. profiles.yml:
This file contains the connection details for your data warehouse. It specifies how to connect
to your database, including the type of database, host, username, and password.
You may have additional directories for custom scripts, notebooks, or other artifacts related to your project.
Having a well-organized project structure makes it easier to collaborate with team members, maintain code, and manage version control. It also ensures that your analytics code is reproducible and easy to maintain.
Answer: “data refresh” typically refers to the process of updating or reloading data in your
data warehouse. Dbt is a command-line tool that enables data analysts and engineers to
transform data in their warehouse more effectively. It allows you to write modular SQL
Here’s a brief overview of the typical workflow involving data refresh in dbt:
1. Write Models: Analysts write SQL queries to transform raw data into analysis-ready models.
2. Run dbt: Analysts run dbt to execute the SQL queries and create or update the
tables in the data warehouse. This process is often referred to as a dbt run.
3. Data Refresh: After the initial run, you may need to refresh your data regularly to keep it up to date. One common approach is to use incremental models. These models only transform and refresh the data that has changed since the last run, rather than reprocessing the entire dataset. This is particularly useful for large datasets where a full refresh may be time-consuming.
4. Dependencies: dbt builds a DAG of your project, so if one model depends on another model, dbt ensures that the dependencies are run in the correct order.
In short, a data refresh in dbt means re-running your models to turn raw data into a clean, structured format for analysis. This approach promotes repeatability, modularity, and collaboration.
Change the materialization that a model uses – a materialization determines the SQL that
dbt uses to create the model in your warehouse.
26. Can I store my models in a directory other than the models directory in my project?
By default, dbt expects your model files to be located in the models subdirectory of your project.
To change this, update the model-paths configuration (called source-paths in older dbt versions) in your dbt_project.yml file, like so:
dbt_project.yml
model-paths: ["transformations"]
27. Can I connect my dbt project to two databases?
It depends on the warehouse used in your tech stack.
dbt projects connecting to warehouses like Snowflake or BigQuery—these empower one set
of credentials to draw from all datasets or ‘projects’ available to an account—are sometimes
said to connect to more than one database.
dbt projects connecting to warehouses like Redshift and Postgres—these tie one set of
credentials to one database—are said to connect to one database only.
dbt (Data Build Tool) Overview: What is dbt and What Can It Do for My Data Pipeline?
There are many tools on the market to help your organization transform data and make it
accessible for business users. One that we recommend and use often—dbt (data build tool)
—focuses solely on making the process of transforming data simpler and faster. In this blog
we will discuss what dbt is, how it can transform the way your organization curates its data
for decision making, and how you can get started with using dbt (data build tool).
Data plays an instrumental role in decision making for organizations. As the volume of data
increases, so does the need to make it accessible to everyone within your organization to
use. However, because there is a shortage of data engineers in the marketplace, for most
organizations there isn’t enough time or resources available to curate data and make data
analytics ready.
Disjointed sources, data quality issues, and inconsistent definitions for metrics and
business attributes lead to confusion, redundant efforts, and poor information being
distributed for decision making. Transforming your data allows you to integrate, clean, de-
duplicate, restructure, filter, aggregate, and join your data—enabling your organization to
develop valuable, trustworthy insights through analytics and reporting. There are many
tools on the market to help you do this, but one in particular—dbt (data build tool)—
simplifies and speeds up the process of transforming data and building data pipelines.
What is dbt?
dbt (data build tool) makes data engineering activities accessible to people with data
analyst skills to transform the data in the warehouse using simple select statements,
effectively creating your entire transformation process with code. You can write custom
business logic using SQL, automate data quality testing, deploy the code, and deliver
trusted data with data documentation side-by-side with the code. This is more important
today than ever due to the shortage of data engineering professionals in the marketplace.
Anyone who knows SQL can now build production-grade data pipelines, reducing the
barrier to entry that previously limited staffing capabilities for legacy technologies.
In short, dbt (data build tool) turns your data analysts into engineers and allows them to
own the entire analytics engineering workflow.
Hear why dbt is the iFit engineering team’s favorite tool and how it helped them drive triple-
digit growth for the company:
dbt’s ELT methodology brings increased agility and speed to iFit’s data pipeline. What
would have taken months with traditional ETL tools, now takes weeks or days.
With dbt, data analysts take ownership of the entire analytics engineering workflow from
writing data transformation code all the way through to deployment and documentation—as
well as to becoming better able to promote a data-driven culture within the organization.
They can:
1. Quickly and easily provide clean, transformed data ready for analysis:
The dbt Cloud UI offers an attractive interface for individuals of all ranges of experience to
comfortably develop in.
2. Apply software engineering practices, like continuous integration, to their analytics code:
Continuous integration means less time testing and quicker time to development, especially
with dbt Cloud. You don’t need to push an entire repository when there are necessary
changes to deploy, but rather just the components that change. You can test all the
changes that have been made before deploying your code into production. dbt Cloud also
has integration with GitHub for automation of your continuous integration pipelines, so you
won’t need to manage your own orchestration, which simplifies the process.
While configuring a continuous integration job in the dbt Cloud UI, you can take advantage
of dbt's Slim CI feature and even use webhooks to run jobs automatically when a pull
request is open.
3. Build reusable and modular code using Jinja and macros:
dbt (data build tool) allows you to establish macros and integrate other functions outside of
SQL’s capabilities for advanced use cases. Macros in Jinja are pieces of code that can be
used multiple times. Instead of starting at the raw data with every analysis, analysts instead
build up reusable data models that can be referenced in subsequent work.
Instead of repeating code to create a hashed surrogate key, create a dynamic macro with
Jinja and SQL to consolidate the logic in one spot using dbt.
4. Maintain data documentation and definitions within dbt as they build and develop
lineage graphs:
Data documentation is accessible, easily updated, and allows you to deliver trusted data
across the organization. dbt (data build tool) automatically generates documentation around
descriptions, model dependencies, model SQL, sources, and tests. dbt creates lineage
graphs of the data pipeline, providing transparency and visibility into what the data is
describing, how it was produced, as well as how it maps to business logic.
Lineage is automatically generated for all your models in dbt. This has saved teams
numerous hours in manual documentation time.
There is no need to host an orchestration tool when using dbt Cloud. It includes a feature
that provides full autonomy with scheduling production refreshes at whatever cadence the
business wants.
Scheduling is simplified in the dbt Cloud UI. Just give it directions on what time you want a
production job to run, and it will take it from there.
dbt (data build tool) comes prebuilt with unique, not null, referential integrity, and accepted
value testing. Additionally, you can write your own custom tests using a combination of Jinja
and SQL. To apply any test on a given column, you simply reference it under the same
YAML file used for documentation for a given table or schema. This makes testing data
integrity an almost effortless process.
Simple example of applying tests on the primary key for a table in a project.
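As a minimal sketch of what that YAML can look like (model and column names are illustrative):
version: 2
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null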
Before learning dbt (data build tool), there are three pre-requisites that we recommend:
1. SQL: Since dbt uses SQL as its core language to perform transformations, you must
be proficient in using SQL SELECT statements. There are plenty of courses online
available if you don’t have this experience, so make sure to find one that gives you
the necessary foundation to begin learning dbt.
2. Modeling: Like any other data transformation tool, you should have
some strategy when it comes to data modeling. This will be critical for re-usability of
code, drilling down, and performance optimization. Don’t just adopt the model of your
data sources, we recommend transforming data into the language and structure of
the business. Modeling will be essential to structure your project and find lasting
success.
3. Git: If you are interested in learning how to use dbt Core, you will need to be
proficient in Git. We recommend finding any course that covers the Git Workflow, Git
Branching, and using Git in a team setting. There are lots of great options available
online, so explore and find one that you like.
1. The dbt Labs Free dbt Fundamentals Course: This course is a great starting point
for any individual interested in learning the basics of using dbt (data build tool).
This covers many critical concepts like setting up dbt, creating models and tests,
generating documentation, deploying your project, and much more.
2. The “Getting Started Tutorial” from dbt Labs: Although there is some overlap
with concepts from the fundamentals course above, the “getting started tutorial” is a
comprehensive hands-on way to learn as you go. There are video series offered for
both using dbt Core and dbt Cloud. If you really want to dive in, you can find a
sample dataset from online to model out as you go through the videos. This is a
great way to learn how to use dbt (data build tool) in a way that will directly reflect
how you would build out a project for your organization.
3. Join the dbt Slack Community: This is an active community of thousands of
members that range from beginner to advanced. There are channels like #learn-on-
demand and #advice-dbt-for-beginners that will be very helpful for a beginner to ask
questions as they go through the above resources.
dbt (data build tool) simplifies and speeds up the process of transforming data and building
data pipelines. Now is the time to dive in and learn how to use it to help your organization
curate its data for better decision making.
A data model organizes different data elements and standardizes how they relate to one
another and real-world entity properties. So logically then, data modeling is the process of
creating those data models.
Data models are composed of entities, and entities are the objects and concepts whose
data we want to track. They, in turn, become tables found in a database. Customers,
products, manufacturers, and sellers are potential entities.
Each entity has attributes—details that the users want to track. For instance, a customer’s
name is an attribute.
With that out of the way, let’s check out those data modeling interview questions!
1. What Are the Three Types of Data Models?
Physical data model - This is where the framework or schema describes how data is physically stored in the database.
Conceptual data model - This model focuses on the high-level, user's view of the data in question.
Logical data models - They straddle between physical and conceptual data models, allowing the logical representation of data to exist apart from the physical storage.
2. What is a Table?
A table consists of data stored in rows and columns. Columns, also known as fields, show
data in vertical alignment. Rows also called a record or tuple, represent data’s horizontal
alignment.
3. What is Normalization?
Database normalization is the process of designing the database in such a way that it
reduces data redundancy without sacrificing integrity.
Normalization also helps ensure sensible relationships between the tables, in addition to the data residing in the tables.
ERD stands for Entity Relationship Diagram and is a logical entity representation, defining
the relationships between the entities. Entities reside in boxes, and arrows symbolize
relationships.
A surrogate key is an artificial, system-generated key with numerical attributes that is used in place of the natural primary key. This surrogate
key replaces natural keys. Instead of having primary or composite primary keys, data
modelers create the surrogate key, which is a valuable tool for identifying records,
building SQL queries, and enhancing performance.
8. What Are the Critical Relationship Types Found in a Data Model? Describe Them.
The main relationship types are:
Identifying. A relationship line normally connects parent and child tables. But if a
child table’s reference column is part of the table’s primary key, the tables are
connected by a thick line, signifying an identifying relationship.
This is a data model that consists of all the entries required by an enterprise.
10. What Are the Most Common Errors You Can Potentially Face in Data Modeling?
These are the errors most likely encountered during data modeling.
Building overly broad data models: If the number of tables runs higher than 200, the data
model becomes increasingly complex, increasing the likelihood of failure
Unnecessary surrogate keys: Surrogate keys must only be used when the natural
key cannot fulfill the role of a primary key
The purpose is missing: Situations may arise where the user has no clue about
the business’s mission or goal. It’s difficult, if not impossible, to create a specific
business model if the data modeler doesn’t have a workable understanding of the
company’s business model
The two design schemas are called the Star schema and the Snowflake schema. The Star schema
has a fact table centered with multiple dimension tables surrounding it. A Snowflake
schema is similar, except that the level of normalization is higher, which results in the
schema looking like a snowflake.
A data mart is the most straightforward form of data warehousing and is used to focus on one
functional area of any given business. Data marts are a subset of data warehouses oriented
to a specific line of business or functional area of an organization (e.g., marketing, finance,
sales). Data enters data marts by an assortment of transactional systems, other data
warehouses, or even external sources.
Data sparsity defines how much data we have for a model’s specified dimension or entity. If
there is insufficient information stored in the dimensions, then more space is needed to
store these aggregations, resulting in an oversized, cumbersome database.
Entities can be broken down into several sub-entities or grouped by specific features. Each
sub-entity has relevant attributes and is called a subtype entity. Attributes common to every
entity are placed in a higher or super level entity, which is why they are called supertype
entities.
Metadata is defined as “data about data.” In the context of data modeling, it’s the data that
covers what types of data are in the system, what it’s used for, and who uses it.
No, it’s not an absolute requirement. However, denormalized databases are easily
accessible, easier to maintain, and less redundant.
19. What's the Difference Between Forward and Reverse Engineering, in the Context of Data Models?
Forward engineering is a process where Data Definition Language (DDL) scripts are
generated from the data model itself. DDL scripts can be used to create databases.
Reverse Engineering creates data models from a database or scripts. Some data modeling
tools have options that connect with the database, allowing the user to engineer a database
into a data model.
20. What Are Recursive Relationships, and How Do You Rectify Them?
Recursive relationships happen when a relationship exists between an entity and itself. For
instance, a doctor could be in a health center’s database as a care provider, but if the
doctor is sick and goes in as a patient, this results in a recursive relationship. You would
need to add a foreign key to the health center’s number in each patient’s record.
22. Why Are NoSQL Databases More Useful than Relational Databases?
They have a dynamic schema, which means they can evolve and change as
quickly as needed
NoSQL databases have sharding, the process of splitting up and distributing data
to smaller databases for faster access
They offer failover and better recovery options thanks to the replication
23. What Is a Junk Dimension?
This is a grouping of low-cardinality attributes like indicators and flags, removed from other
tables, and subsequently “junked” into an abstract dimension table. They are often used to
initiate Rapidly Changing Dimensions within data warehouses.
24. If a Unique Constraint Gets Applied to a Column, Will It Generate an Error If You
Attempt to Place Two Nulls in It?
No, it won't, because null values are never equal. You can put in numerous null values
in a column and not generate an error.
Do You Want Data Modeling Training?
I hope these Data modeling interview questions have given you an idea of the kind of
questions that can be asked in an interview. So, if you're intrigued by what you've read about
data modeling and want to know how to become a data modeler, then you will want to
check the article that shows you how to become one.
But if you’re ready to accelerate your career in data science, then sign up for
Simplilearn’s Data Scientist Course. You will gain hands-on exposure to key technologies,
including R, SAS, Python, Tableau, Hadoop, and Spark. Experience world-class training by
an industry leader on the most in-demand Data Science and Machine learning skills.
The program boasts a half dozen courses, over 30 in-demand skills and tools, and more
than 15 real-life projects. So check out Simplilearn’s resources and get that new data
modeling career off to a great start!
In the world of data analytics, where information reigns supreme, businesses rely on robust
tools to manage and analyze their data effectively.
One such tool that has gained remarkable traction is dbt, or Data Build Tool. With its ability
to transform and analyze data efficiently, dbt has become a game-changer in the field of
data engineering and analysis.
To harness the power of dbt, organizations need skilled professionals who can navigate its
intricacies and unleash its capabilities.
As a result, dbt-related job interviews have become increasingly critical for both employers
and candidates.
If you're preparing for a dbt-related job interview or seeking to evaluate candidates' dbt
skills, it's important to ask the right questions.
To help you with that, we have compiled a list of essential dbt interview questions for every
level. These questions cover a range of topics and will assess the candidate's knowledge
and understanding of dbt's core concepts, features, and best practices.
Question 1.1: What is dbt, and how does it differ from traditional ETL/ELT tools?
dbt stands for Data Build Tool and is designed to transform, test, and document
data. Unlike traditional ETL/ELT tools, dbt focuses on transforming data within a data
warehouse, utilizing SQL and version control systems.
Answer:
dbt (data build tool) is an open-source tool that enables analysts and data engineers to
transform, test, and manage data in their data warehouses. It uses SQL and YAML
configuration files to define transformations, models, and tests, making it easy to build and
maintain data pipelines.
To install and set up dbt (data build tool), follow these steps:
1). Install Python: Ensure Python is installed on your system. dbt requires Python 3.6 or
later.
2). Install dbt: Open your command line interface (CLI) and run the following command to
install dbt using pip, which is the Python package installer:
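For example (you also need the adapter package for your warehouse; dbt-postgres below is just an illustration):
pip install dbt-core dbt-postgres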
4). Initialize the project: Run the following command to initialize your dbt project:
dbt init
5). Configure your project: Open the dbt_project.yml file in your project directory and modify
it according to your project needs. This file contains project-level configurations such as the
target database, connection information, and plugins.
6). Set up your database connection: Open the profiles.yml file in your project directory and
configure your database connection details, including the database type, host, port,
username, password, and database name.
7). Test the setup: Run the following command to test your dbt installation and project
setup:
dbt debug
If everything is set up correctly, you should see debug information about your dbt project
and database connection.
With this, you have now installed and set up dbt. You can start using dbt to build, test, and
deploy your data models.
Question 1.3: What is the purpose of dbt models?
Models in dbt are SQL scripts that define transformations or aggregations on the
data. They can be used to create new tables, views, or materialized views, and they
serve as building blocks for data analysis.
Question 1.4: Explain the concept of "sources" and "seeds" in dbt.
Sources refer to external data tables that are used as inputs to dbt models. Seeds,
on the other hand, are a way to define static or reference data that can be used
within the dbt project.
2). Intermediate Level.
dbt allows for easy schema migrations by using the concept of "ref" and "source" in
model definitions. It tracks changes to models and supports incremental changes to
the data warehouse schema.
Question 2.2: What are the different types of dbt hooks, and when would you use
them?
dbt hooks are SQL statements that are executed at specific points during the dbt
lifecycle. They can be pre-hooks (before a model is built), post-hooks (after a model
is built), or on-run-start / on-run-end hooks (run at the start and end of a dbt invocation).
Candidates should explain use cases for each hook type.
Question 2.3: How do you handle incremental or time-based data loads in dbt?
Incremental data loads can be handled with dbt's incremental materialization, which (depending on the configured incremental strategy, such as merge) compares source data with the target table to perform inserts, updates, or upserts based on specific columns.
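A minimal sketch of an incremental model using is_incremental() (the source, column, and key names are illustrative):
-- models/stg_events.sql (hypothetical model)
{{ config(materialized='incremental', unique_key='id') }}

select *
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what is already loaded
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}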
Question 2.4: Can you explain how dbt macros work?
Macros in dbt are reusable pieces of SQL code that can be shared across multiple
models. They help in simplifying complex logic, promoting code reusability, and
adhering to best practices.
3).Advanced Level.
Question 3.1: How can you optimize dbt performance?
Optimizing dbt performance is crucial for efficient data transformation. Here are a few
strategies to improve dbt's performance:
Incremental models: Utilize incremental models to only process and transform new
or changed data. This reduces unnecessary processing and improves overall
performance.
Caching: Configure dbt's caching feature to store the results of previously executed
models. This helps avoid repetitive computations and speeds up subsequent runs.
Materialized views: Leverage materialized views to precompute and store the results
of complex or frequently used queries. Materialized views provide faster access to
aggregated or derived data.
Query optimization: Analyze and optimize the SQL queries used in dbt models.
Consider indexing columns used for joins and filtering conditions, optimizing
subqueries, and using appropriate query techniques based on the underlying
database.
By implementing these performance optimization techniques, you can significantly enhance
the speed and efficiency of dbt transformations.
Question 3.2: What is the importance of testing in dbt, and how would you write tests
for dbt models?
The importance of testing in dbt lies in ensuring the accuracy, reliability, and quality
of data transformations. Testing helps validate data integrity, compliance with
business rules, and prevention of regressions. To write tests for dbt models, you can
use the built-in testing framework provided by dbt, utilizing the test macro to define
tests based on specific requirements such as column presence, data types,
relationships, or values.
Question 3.3:Can you describe the process of integrating dbt with a version control
system?
Integrating dbt with a version control system (VCS) allows for effective collaboration, code
management, and tracking of changes in your dbt project.
Set up a version control repository: Choose a VCS platform (e.g., Git, GitHub,
GitLab) and create a new repository to store your dbt project's code.
Initialize dbt as a Git repository: Navigate to your dbt project's root directory in your
command-line interface or terminal.
Run the following commands:
git init
git add .
git commit -m "Initial commit"
Connect your local repository to the remote repository: Link your local Git repository
to the remote repository you created on the VCS platform.
Run the following command, replacing <remote-repository-url> with the URL of your remote repository:
git remote add origin <remote-repository-url>
Push your local repository to the remote repository: Upload your local dbt project
code to the remote repository using the following command:
git push -u origin master
Collaborate and manage changes: With the integration complete, you can now
collaborate with your team on the dbt project. Each team member can clone the
repository, make changes in their local environment, and use Git commands (git add,
git commit, git push) to push their changes to the remote repository.
Branching and pull requests: Utilize Git branching strategies to work on separate
features or experiments. When ready to merge changes, team members can create
pull requests on the VCS platform, allowing for code review and seamless integration
of changes into the main branch.
By integrating dbt with a version control system, you establish a structured and
collaborative development environment, enabling effective teamwork, change tracking, and
the ability to roll back changes if necessary.
Question 3.4: Have you worked with dbt packages? Explain their purpose and how to
use them.
dbt packages are reusable collections of dbt code, such as models, macros, and
tests, that can be shared and used across projects. Candidates should discuss how
to install, use, and create dbt packages.
More Practice Questions.
1). What are the benefits of using dbt?
2). What are the different types of dbt models?
3). How do you write a dbt model?
4). How do you run dbt?
5). How do you use dbt to handle data quality issues?
6). How do you use dbt to manage data lineage?
7). How do you use dbt to deploy changes to production?
8). How do you use dbt to test your data pipelines?
9). How do you use dbt to collaborate with other data engineers?
10). How do you use dbt to create custom macros?
11). How do you use dbt to integrate with other data tools?
12). How do you use dbt to automate your data workflow?
13). How do you use dbt to scale your data engineering efforts?
14). How do you use dbt to create a data-driven culture?
These are just a few examples of essential dbt interview questions. The specific questions
you will be asked will depend on the role you are interviewing for and the experience level
of the interviewer. However, these questions should give you a good starting point for
preparing for your interview.
In addition to these technical questions, you may also be asked behavioral questions about
your experience with dbt.
These questions will assess your skills and abilities in areas such as collaboration,
communication, and problem-solving. Be sure to practice answering these types of
questions as well.
If you are getting started with dbt, here are some resources you might find helpful: