What Are DBT Sources
dbt Cloud is the fastest and most reliable way to deploy dbt. Develop, test, schedule,
document, and investigate data models all in one browser-based UI.
In addition to providing a hosted architecture for running dbt across your organization, dbt
Cloud comes equipped with turnkey support for scheduling jobs, CI/CD, hosting
documentation, monitoring and alerting, an integrated development environment (IDE), and
the ability to develop and run dbt commands from your local command line interface (CLI)
or code editor.
dbt Cloud's flexible plans and features make it well-suited for data teams of any size — sign
up for your free 14-day trial!
Use the dbt Cloud CLI to develop, test, run, and version control dbt projects and commands
in your dbt Cloud development environment. Collaborate with team members, directly from
the command line.
The IDE is the easiest and most efficient way to develop dbt models, allowing you to build,
test, run, and version control your dbt projects directly from your browser.
Manage environments
Set up and manage separate production and development environments in dbt Cloud to
help engineers develop and test code more efficiently, without impacting users or data.
Create custom schedules to run your production jobs. Schedule jobs by day of the week,
time of day, or a recurring interval. Decrease operating costs by using webhooks to trigger
CI jobs and the API to start jobs.
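For example, a job run can be kicked off with a single API call. The sketch below follows the dbt Cloud Administrative API v2 at the time of writing; the account ID, job ID, and token are placeholders, so confirm the endpoint and auth scheme against the API reference:
$ curl -X POST "https://cloud.getdbt.com/api/v2/accounts/ACCOUNT_ID/jobs/JOB_ID/run/" \
    -H "Authorization: Token $DBT_CLOUD_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"cause": "Triggered via API"}'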
Notifications
Set up and customize job notifications in dbt Cloud to receive email or Slack alerts when a
job run succeeds, fails, or is cancelled. Notifications alert the right people when something
goes wrong instead of waiting for a user to report it.
Run visibility
View the history of your runs and the model timing dashboard to help identify where
improvements can be made to the scheduled jobs.
dbt Cloud hosts and authorizes access to dbt project documentation, allowing you to
generate data documentation on a schedule for your project. Invite teammates to dbt Cloud
to collaborate and share your project's documentation.
Seamlessly connect your git account to dbt Cloud and provide another layer of security to
dbt Cloud. Import new repositories, trigger continuous integration, clone repos using
HTTPS, and more!
Configure dbt Cloud to run your dbt projects in a temporary schema when new commits are
pushed to open pull requests. This build-on-PR functionality is a great way to catch bugs
before deploying to production, and an essential tool in any analyst's belt.
Security
Manage risk with SOC-2 compliance, CI/CD deployment, RBAC, and ELT architecture.
Use the dbt Semantic Layer to define metrics alongside your dbt models and query them
from any integrated analytics tool. Get the same answers everywhere, every time.
Discovery API
Enhance your workflow and run ad-hoc queries, browse schema, or query the dbt Semantic
Layer. dbt Cloud serves a GraphQL API, which supports arbitrary queries.
dbt Explorer
Learn about dbt Explorer and how to interact with it to understand, improve, and leverage
your data pipelines.
Defer is a powerful feature that allows developers to build, run, and test only the models
they've edited, without having to first build all of the models that come before them
(upstream parents). dbt powers this by using a production manifest for comparison, and
resolves the {{ ref() }} function with upstream production artifacts.
Both the dbt Cloud IDE and the dbt Cloud CLI enable users to natively defer to production
metadata directly in their development workflows.
dbt uses the production locations of parent models to resolve {{ ref() }} functions,
based on metadata from the production environment.
If a development version of a deferred model exists, dbt preferentially uses the
development database location when resolving the reference.
Passing the --favor-state flag overrides the default behavior and always resolves refs
using production metadata, regardless of the presence of a development relation (see the command sketch below).
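Outside of dbt Cloud's automatic handling, a deferred invocation using these flags looks roughly like the following sketch (the artifact path is a placeholder for wherever a production manifest.json lives):
# Build only modified models and their children, resolving unchanged parents from production artifacts
$ dbt run --select state:modified+ --defer --state prod-run-artifacts/
# Add --favor-state to always resolve refs from production metadata,
# even if a development relation exists
$ dbt run --select state:modified+ --defer --state prod-run-artifacts/ --favor-state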
For a clean slate, it's a good practice to drop the development schema at the start and end
of your development cycle.
Required setup
You must select the Production environment checkbox in the Environment
Settings page.
o This can be set for one deployment environment per dbt Cloud project.
You must have a successful job run first.
When using defer, dbt compares artifacts from the most recent successful production job,
excluding CI jobs.
To enable defer in the dbt Cloud IDE, toggle the Defer to production button on the
command bar. Once enabled, dbt Cloud will:
1. Pull down the most recent manifest from the Production environment for comparison
2. Pass the --defer flag to the command (for any command that accepts the flag)
For example, if you were to start developing on a new branch with nothing in your
development schema, edit a single model, and run dbt build -s state:modified — only the
edited model would run. Any {{ ref() }} functions will point to the production location of the
referenced models.
Select the 'Defer to production' toggle on the bottom right of the command bar to enable defer in the dbt Cloud IDE.
Defer in dbt Cloud CLI
One key difference between using --defer in the dbt Cloud CLI and the dbt Cloud IDE is
that --defer is automatically enabled for all dbt Cloud CLI invocations, comparing against
production artifacts. You can disable it with the --no-defer flag.
The dbt Cloud CLI offers additional flexibility by letting you choose the source environment
for deferral artifacts. You can set a defer-env-id key in either
your dbt_project.yml or dbt_cloud.yml file. If you do not provide a defer-env-id setting, the
dbt Cloud CLI will use artifacts from your dbt Cloud environment marked "Production".
dbt_cloud.yml
defer-env-id: '123456'
dbt_project.yml
dbt_cloud:
  defer-env-id: '123456'
Install dbt Cloud CLI
dbt commands are run against dbt Cloud's infrastructure and benefit from:
Prerequisites
The dbt Cloud CLI is available in all deployment regions and for both multi-tenant and
single-tenant accounts (Azure single-tenant not supported at this time).
Ensure you are using dbt version 1.5 or higher. Refer to dbt Cloud versions to
upgrade.
Note that SSH tunneling for Postgres and Redshift connections doesn't support the
dbt Cloud CLI yet.
Before you begin, make sure you have Homebrew installed and available from your code editor or command
line terminal. Refer to the FAQs if your operating system runs into path conflicts.
1. Verify whether dbt Core is already installed:
which dbt
o If you see dbt not found, you're good to go. If the dbt help text appears,
use pip uninstall dbt to remove dbt Core from your system.
2. Install the dbt Cloud CLI with Homebrew:
o First, remove the dbt-labs tap, the separate repository for packages, from
Homebrew. This prevents Homebrew from installing packages from that
repository:
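The commands involved look roughly like this (the tap names reflect the dbt Labs Homebrew taps at the time of writing; confirm them against the current install docs):
$ brew untap dbt-labs/dbt
$ brew tap dbt-labs/dbt-cli
$ brew install dbt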
3. Verify your installation by running dbt --help in the command line. If you see the
following output, your installation is correct:
The dbt Cloud CLI - an ELT tool for running SQL transformations and data models in
dbt Cloud...
If you don't see this output, check that you've deactivated pyenv or venv and don't
have a global dbt version installed.
o Note that you no longer need to run the dbt deps command when your
environment starts. This step was previously required during initialization.
However, you should still run dbt deps if you make any changes to
your packages.yml file.
4. Clone your repository to your local computer using git clone. For example, to clone a
GitHub repo using HTTPS format, run git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.
5. After cloning your repo, configure the dbt Cloud CLI for your dbt Cloud project. This
lets you run dbt commands like dbt environment show to view your dbt Cloud
configuration or dbt compile to compile your project and validate models and tests.
You can also add, edit, and synchronize files with your repo.
During the public preview period, we recommend updating before filing a bug report. This is
because the API is subject to breaking changes.
To update the dbt Cloud CLI on macOS (installed with Homebrew), run brew update and then brew
upgrade dbt. Windows and Linux installations use a standalone executable instead of Homebrew.
Using VS Code extensions
Visual Studio (VS) Code extensions enhance command line tools by adding extra
functionality. The dbt Cloud CLI is fully compatible with dbt Core; however, it doesn't
support some dbt Core APIs required by certain tools, for example, VS Code extensions.
You can use extensions like dbt-power-user with the dbt Cloud CLI by following these
steps:
This setup allows dbt-power-user to continue to work with dbt Core in the background,
alongside the dbt Cloud CLI. For more, check the dbt Power User documentation.
FAQs
What's the difference between the dbt Cloud CLI and dbt Core?
How do I run both the dbt Cloud CLI and dbt Core?
How to create an alias?
Why am I receiving a `Session occupied` error?
Configure and use the dbt Cloud CLI
2. Download your credentials from dbt Cloud by clicking on the Try the dbt Cloud
CLI banner on the dbt Cloud homepage. Alternatively, if you're in dbt Cloud, you can
download the credentials from the links provided based on your region:
version: "1"
context:
active-project: "<project id from the list below>"
active-host: "<active host from the list>"
defer-env-id: "<optional defer environment id>"
projects:
- project-id: "<project-id>"
account-host: "<account-host>"
api-key: "<user-api-key>"
- project-id: "<project-id>"
account-host: "<account-host>"
api-key: "<user-api-key>"
4. After downloading the config file, navigate to a dbt project in your terminal:
cd ~/dbt-projects/jaffle_shop
5. In your dbt_project.yml file, ensure you have or include a dbt-cloud section with
a project-id field. The project-id field contains the dbt Cloud project ID you want to
use.
# dbt_project.yml
name:
version:
# Your project configs...

dbt-cloud:
  project-id: PROJECT_ID
o To find your project ID, select Develop in the dbt Cloud navigation menu. You
can use the URL to find the project ID. For example,
in https://cloud.getdbt.com/develop/26228/projects/123456, the project ID
is 123456.
6. You should now be able to use the dbt Cloud CLI and run dbt commands like dbt
environment show to view your dbt Cloud configuration details or dbt compile to
compile models in your dbt project.
With your repo recloned, you can add, edit, and sync files with your repo.
To set environment variables in the dbt Cloud CLI for your dbt project:
The dbt Cloud integrated development environment (IDE) is a single web-based interface
for building, testing, running, and version-controlling dbt projects. It compiles dbt code into
SQL and executes it directly on your database.
The dbt Cloud IDE offers several keyboard shortcuts and editing features for faster and
more efficient data platform development and governance:
Syntax highlighting for SQL: Makes it easy to distinguish different parts of your code,
reducing syntax errors and enhancing readability.
Auto-completion: Suggests table names, arguments, and column names as you
type, saving time and reducing typos.
Code formatting and linting: Help standardize and fix your SQL code effortlessly.
Navigation tools: Easily move around your code, jump to specific lines, find and
replace text, and navigate between project files.
Version control: Manage code versions with a few clicks.
These features create a powerful editing environment for efficient SQL coding, suitable for
both experienced and beginner developers.
The dbt Cloud IDE includes version control, files/folders, an editor, a command/console, and more.
Enable dark mode for a great viewing experience in low-light environments.
DISABLE AD BLOCKERS
To improve your experience using dbt Cloud, we suggest that you turn off ad blockers. This
is because some project file names, such as google_adwords.sql, might resemble ad traffic
and trigger ad blockers.
Prerequisites
A dbt Cloud account and Developer seat license
A git repository set up, with write access enabled on your git provider.
See Connecting your GitHub Account or Importing a project by git URL for detailed
setup instructions
A dbt project connected to a data platform
A development environment and development credentials set up
The environment must be on dbt version 1.0 or higher
To understand how to navigate the IDE and its user interface elements, refer to the IDE
user interface page.
Feature — Info
Keyboard shortcuts — You can access a variety of commands and actions in the IDE by choosing the appropriate keyboard shortcut. Use the shortcuts for common tasks like building modified models or resuming builds from the last failure.
File state indicators — Ability to see when changes or actions have been made to the file. The indicators M, D, A, and • appear to the right of your file or folder name and indicate the actions performed.
IDE version control — The IDE version control section and git button allow you to apply the concept of version control to your project directly in the IDE.
Project documentation — Generate and view your project documentation for your dbt project in real time. You can inspect and verify what your project's documentation will look like before you deploy your changes to production.
Preview and Compile button — You can compile or preview code, a snippet of dbt code, or one of your dbt models after editing and saving.
Build, test, and run button — Build, test, and run your project with a button click or by using the Cloud IDE command bar.
Command bar — You can enter and run commands from the command bar at the bottom of the IDE. Use the rich model selection syntax to execute dbt commands directly within dbt Cloud (see the selection sketch after this list). You can also view the history, status, and logs of previous runs by clicking History on the left of the bar.
Drag and drop — Drag and drop files located in the file explorer, and use the file breadcrumb on the top of the IDE for quick, linear navigation. Access adjacent files in the same folder by right-clicking on the breadcrumb file.
Organize tabs and files — Move your tabs around to reorganize your work in the IDE; right-click on a tab to view and select a list of actions, including duplicating files; close multiple unsaved tabs to batch save your work; double-click files to rename them.
Find and replace — Press Command-F or Control-F to open the find-and-replace bar in the upper right corner of the current file in the IDE. The IDE highlights your search results in the current file and code outline. Use the up and down arrows to move between matches when there are multiple, and the left arrow to replace the text with something else.
Multiple selections — You can make multiple selections for small and simultaneous edits, inserting cursors below or above with ease: press Option and click on an area, or press Ctrl-Alt and click on an area.
Lint and Format — Lint and format your files with a click of a button, powered by SQLFluff, sqlfmt, Prettier, and Black.
Git diff view — Ability to see what has been changed in a file before you make a pull request.
DAG in the IDE — You can see how models are used as building blocks from left to right to transform your data from raw sources into cleaned-up modular derived pieces and final outputs on the far right of the DAG. The default view is 2+model+2 (displaying two nodes away in each direction); you can change it to +model+ (full DAG). Note that the --exclude flag isn't supported.
Status bar — This area provides you with useful information about your IDE and project status. You also have additional options like enabling light or dark mode, restarting the IDE, or recloning your repo.
Dark mode — From the status bar in the Cloud IDE, enable dark mode for a great viewing experience in low-light environments.
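For example, here are a few illustrative commands you could type into the command bar using dbt's node selection syntax (the model name is a placeholder):
$ dbt build --select my_model                # a single model
$ dbt build --select +my_model               # the model plus its upstream parents
$ dbt build --select my_model+               # the model plus its downstream children
$ dbt run --select state:modified+           # modified models and everything downstream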
Start-up process
There are three start-up states when using or launching the Cloud IDE:
Creation start — This is the state where you are starting the IDE for the first time.
You can also view this as a cold start (see below), and you can expect this state to
take longer because the git repository is being cloned.
Cold start — This is the process of starting a new develop session, which will be
available to you for three hours. The environment automatically turns off three hours
after the last activity; activity includes compiling, previewing, or any dbt invocation, but
not editing and saving a file.
Hot start — This is the state of resuming an existing or active develop session within
three hours of the last activity.
Work retention
The Cloud IDE needs explicit action to save your changes. There are three ways your work
is stored:
Unsaved, local code — The browser stores your code only in its local storage. In
this state, you might need to commit any unsaved changes in order to switch
branches or browsers. If you have saved and committed changes, you can access
the "Change branch" option even if there are unsaved changes. If you attempt to
switch branches without saving your changes, a warning message will appear,
notifying you that you will lose any unsaved changes.
Saved but uncommitted code — When you save a file, the data gets stored in
durable, long-term storage, but isn't synced back to git. To switch branches using
the Change branch option, you must "Commit and sync" or "Revert" changes.
Changing branches isn't available for saved-but-uncommitted code. This is to ensure
your uncommitted changes don't get lost.
Committed code — This is stored in the branch with your git provider and you can
check out other (remote) branches.
The IDE uses developer credentials to connect to your data platform. These developer
credentials should be specific to your user and they should not be super user credentials or
the same credentials that you use for your production deployment of dbt.
1. Navigate to your Credentials under Your Profile settings, which you can access
at https://YOUR_ACCESS_URL/settings/profile#credentials,
replacing YOUR_ACCESS_URL with the appropriate Access URL for your region
and plan.
2. Select the relevant project in the list.
3. Click Edit on the bottom right of the page.
4. Enter the details under Development Credentials.
5. Click Save.
Configure developer credentials in your Profile.
6. Access the Cloud IDE by clicking Develop at the top of the page.
7. Initialize your project and familiarize yourself with the IDE and its delightful features.
If a model or test fails, dbt Cloud makes it easy for you to view and download the run logs
for your dbt invocations to fix the issue.
Use dbt's rich model selection syntax to run dbt commands directly within dbt Cloud.
Preview, compile, or build your dbt project. Use the lineage tab to see your DAG.
Build and view your project's docs
The dbt Cloud IDE makes it possible to build and view documentation for your dbt project
while your code is still in development. With this workflow, you can inspect and verify what
your project's generated documentation will look like before your changes are released to
production.
Related docs
How we style our dbt projects
User interface
Version control basics
dbt Commands
Related questions
How can I fix my .gitignore file?
A .gitignore file specifies which files git should intentionally ignore or 'untrack'. dbt Cloud
indicates untracked files in the project file explorer pane by putting the file or folder name
in italics.
If you encounter issues like problems reverting changes, checking out or creating a new
branch, or not being prompted to open a pull request after a commit in the dbt Cloud
IDE — this usually indicates a problem with the .gitignore file. The file may be missing or
lacks the required entries for dbt Cloud to work correctly.
When resolving issues with your gitignore file, note that adding the correct entries won't
automatically remove (or 'untrack') files or folders that git is already tracking. The
updated gitignore will only prevent new files or folders from being tracked. So you'll need to
first fix the gitignore file, then perform some additional git operations to untrack any
incorrect files or folders.
1. Launch the Cloud IDE into the project that is being fixed, by selecting Develop on
the menu bar.
2. In your File Explorer, check to see if a .gitignore file exists at the root of your dbt
project folder. If it doesn't exist, create a new file.
3. Open the new or existing gitignore file, and add the following:
# ✅ Correct
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
Note — You can place these lines anywhere in the file, as long as they're on
separate lines. The entries above match all nested files and folders; avoid adding a
trailing '*', such as target/*.
Restart the IDE by clicking the three dots on the lower right or by clicking on the Status bar.
6. Once the IDE restarts, go to the File Explorer to delete the following files or folders
(if they exist). No data will be lost:
9. Once the IDE restarts, use the Create a pull request (PR) button under the Version
Control menu to start the process of integrating the changes.
10. When the git provider's website opens to a page with the new PR, follow the
necessary steps to complete and merge the PR into the main branch of that
repository.
o Note — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or
something else depending on the organizational naming conventions. The
goal is to merge these changes into the root branch that all other development
branches are created from.
11. Return to the dbt Cloud IDE and use the Change Branch button to switch to the
main branch of the project.
12. Once the branch has changed, click the Pull from remote button to pull in all the
changes.
13. Verify the changes by making sure the files/folders in the .gitignore file are in italics.
A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).
Fix in the git provider
Sometimes it's necessary to use the git provider's web interface to fix a
broken .gitignore file. Although the specific steps may vary across providers, the general
process remains the same.
There are two options for this approach: editing the main branch directly if allowed, or
creating a pull request to implement the changes if required:
When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of
your repo. Follow these steps:
1. Go to your repository's web interface.
2. Switch to the main branch and the root directory of your dbt project.
3. Find the .gitignore file. Create a blank one if it doesn't exist.
4. Edit the file in the web interface, adding the following entries:
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
12. Great job 🎉! You've configured the .gitignore correctly and can continue with your
development!
For more info, refer to this detailed video for additional guidance.
Is there a cost to using the Cloud IDE?
Not at all! You can use dbt Cloud when you sign up for the Free Developer plan, which
comes with one developer seat. If you’d like to access more features or have more
developer seats, you can upgrade your account to the Team or Enterprise plan.
dbt Cloud CLI: The dbt Cloud CLI allows you to run dbt commands against your dbt
Cloud development environment from your local command line or code editor. It
supports cross-project ref, speedier, lower-cost builds, automatic deferral of build
artifacts, and more.
dbt Core: dbt Core is open-source software that's freely available. You can build
your dbt project in a code editor and run dbt commands from the command line.
The dbt Cloud IDE is a tool for developers to effortlessly build, test, run, and version-control
their dbt projects, and enhance data governance — all from the convenience of your
browser. Use the Cloud IDE to compile dbt code into SQL and run it against your database
directly -- no command line required!
This page offers comprehensive definitions and terminology of user interface elements,
allowing you to navigate the IDE landscape with ease.
The Cloud IDE layout includes version control on the upper left, files/folders on the left, editor on the right, and command/console at the bottom.
Basic layout
The IDE streamlines your workflow, and features a popular user interface layout with files
and folders on the left, editor on the right, and command and console information at the
bottom.
The Git repo link, documentation site button,
Version Control menu, and File Explorer
1. Git repository link — Clicking the Git repository link, located on the upper left of the
IDE, takes you to your repository on the same active branch.
o Note: This feature is only available for GitHub or GitLab repositories on multi-
tenant dbt Cloud accounts.
2. Documentation site button — Clicking the Documentation site book icon, located
next to the Git repository link, leads to the dbt Documentation site. The site is
powered by the latest dbt artifacts generated in the IDE using the dbt docs
generate command from the Command bar.
3. Version Control — The IDE's powerful Version Control section contains all git-
related elements, including the Git actions button and the Changes section.
4. File Explorer — The File Explorer shows the filetree of your repository. You can:
o Click on any file in the filetree to open the file in the File Editor.
o Click and drag files between directories to move files.
o Right-click a file to access the sub-menu options like duplicate file, copy file
name, copy as ref, rename, delete.
o Note: To perform these actions, the user must not be in read-only mode,
which generally happens when the user is viewing the default branch.
o Use file indicators, located to the right of your files or folder name, to see
when changes or actions were made:
Unsaved (•) — The IDE detects unsaved changes to your file/folder
Modification (M) — The IDE detects a modification of existing
files/folders
Added (A) — The IDE detects added files
Deleted (D) — The IDE detects deleted files.
Use the Command bar to write dbt commands, toggle 'Defer', and view the current IDE status.
5. Command bar — The Command bar, located in the lower left of the IDE, is used to
invoke dbt commands. When a command is invoked, the associated logs are shown
in the Invocation History Drawer.
7. Status button — The IDE Status button, located on the lower right of the IDE,
displays the current IDE status. If there is an error in the status or in the dbt code
that stops the project from parsing, the button will turn red and display "Error". If
there aren't any errors, the button will display a green "Ready" status. To access
the IDE Status modal, simply click on this button.
Editing features
The IDE features some delightful tools and layouts to make it easier for you to write dbt
code and collaborate with teammates.
Use the file editor, version control section, and save button during your development workflow.
1. File Editor — The File Editor is where users edit code. Tabs break out the region for
each opened file, and unsaved files are marked with a blue dot icon in the tab view.
o Use intuitive keyboard shortcuts to make development easier for you and your team.
2. Save button — The editor has a Save button that saves editable files. Pressing the
button or using the Command-S or Control-S shortcut saves the file contents. You
don't need to save to preview code results in the Console section, but it's necessary
before changes appear in a dbt invocation. The File Editor tab shows a blue icon for
unsaved changes.
3. Version Control — This menu contains all git-related elements, including the Git
actions button. The button updates relevant actions based on your editor's state,
such as prompting to pull remote changes, commit and sync when reverted commit
changes are present, or creating a merge/pull request when appropriate.
o The dropdown menu on the Git actions button allows users to revert changes,
refresh Git state, create merge/pull requests, and change branches.
Keep in mind that although you can't delete local branches in the IDE
using this menu, you can reclone your repository, which deletes your
local branches and refreshes with the current remote branches,
effectively removing the deleted ones.
o You can also resolve merge conflicts. For more info on git, refer to Version
control basics.
o Version Control Options menu — The Changes section, under the Git
actions button, lists all file changes since the last commit. You can click on a
change to open the Git Diff View to see the inline changes. You can also right-
click any file and use the file-specific options in the Version Control Options
menu.
Right-click edited files to access the Version Control Options menu.
Additional editing features
Minimap — A Minimap (code outline) gives you a high-level overview of your source
code, which is useful for quick navigation and code understanding. A file's minimap
is displayed on the upper-right side of the editor. To quickly jump to different sections
of your file, click the shaded area.
Use the Minimap for quick navigation and code understanding
dbt Editor Command Palette — The dbt Editor Command Palette displays text
editing actions and their associated keyboard shortcuts. This can be accessed by
pressing F1 or right-clicking in the text editing area and selecting Command Palette.
Click F1 to access the dbt Editor Command Palette menu for editor shortcuts
Git Diff View — Clicking on a file in the Changes section of the Version Control
Menu will open the changed file with Git Diff view. The editor will show the previous
version on the left and the in-line changes made on the right.
The Git Diff View displays the previous version on the left and the changes made on
the right of the Editor
Markdown Preview console tab — The Markdown Preview console tab shows a
preview of your .md file's markdown code in your repository and updates it
automatically as you edit your code.
The Markdown Preview console tab renders markdown code below the Editor tab.
CSV Preview console tab — The CSV Preview console tab displays the data from
your CSV file in a table, which updates automatically as you edit the file in your seed
directory.
View csv code in the CSV Preview console tab below the Editor tab.
Console section
The console section, located below the File editor, includes various console tabs and
buttons to help you with tasks such as previewing, compiling, building, and viewing
the DAG. Refer to the following sub-bullets for more details on the console tabs and
buttons.
The Console section is located below the File editor and has various tabs and buttons to help execute tasks.
1. Preview button — When you click on the Preview button, it runs the SQL in the
active file editor regardless of whether you have saved it or not and sends the results
to the Results console tab. You can preview a selected portion of saved or unsaved
code by highlighting it and then clicking the Preview button.
Starting from dbt v1.6 or higher, when you save changes to a model, you can compile its
code with the model's specific context. This context is similar to what you'd have when
building the model and involves useful context variables
like {{ this }} or {{ is_incremental() }} (see the incremental model sketch at the end of this section).
3. Build button — The build button allows users to quickly access dbt commands
related to the active model in the File Editor. The available commands include dbt
build, dbt test, and dbt run, with options to include only the current resource, the
resource and its upstream dependencies, the resource and its downstream
dependencies, or the resource with all dependencies. This menu is available for all
executable nodes.
4. Format button — The editor has a Format button that can reformat the contents of
your files. For SQL files, it uses either sqlfmt or sqlfluff, and for Python files, it
uses black.
5. Results tab — The Results console tab displays the most recent Preview results in
tabular format.
6. Compiled Code tab — The Compile button triggers a compile invocation that
generates compiled code, which is displayed in the Compiled Code tab.
7. Lineage tab — The Lineage tab in the File Editor displays the active model's lineage
or DAG. By default, it shows two degrees of lineage in both directions
(2+model_name+2), however, you can change it to +model+ (full DAG).
View resource lineage in the Lineage tab.
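To make the {{ this }} and {{ is_incremental() }} context mentioned above concrete, here's a rough sketch of an incremental model that uses both (the model, source, and column names are illustrative):
models/orders_incremental.sql
{{ config(materialized='incremental', unique_key='order_id') }}

select
    id as order_id,
    status,
    updated_at
from {{ source('jaffle_shop', 'orders') }}

{% if is_incremental() %}
-- {{ this }} resolves to the table this model has already built
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}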
Invocation history
The Invocation History Drawer stores information on dbt invocations in the IDE. When you
invoke a command, like executing a dbt command such as dbt run, the associated logs are
displayed in the Invocation History Drawer.
You can open the drawer by:
- Clicking the ^ icon next to the Command bar on the lower left of the page
- Typing a dbt command and pressing enter
- Pressing Control-backtick (or Ctrl + `)
The Invocation History Drawer returns a log and detail of all your dbt Cloud invocations.
1. Invocation History list — The left-hand panel of the Invocation History Drawer
displays a list of previous invocations in the IDE, including the command, branch
name, command status, and elapsed time.
4. Command Control button — Use the Command Control button, located on the
right side, to control your invocation and cancel or rerun a selected run.
The Invocation History list displays a list of previous invocations in the IDE.
5. Node Summary tab — Clicking on the Results Status Tabs will filter the Node
Status List based on their corresponding status. The available statuses are Pass
(successful invocation of a node), Warn (test executed with a warning), Error
(database error or test failure), Skip (nodes not run due to upstream error), and
Queued (nodes that have not executed yet).
6. Node result toggle — After running a dbt command, information about each
executed node can be found in a Node Result toggle, which includes a summary and
debug logs. The Node Results List lists every node that was invoked during the
command.
7. Node result list — The Node result list shows all the Node Results used in the dbt
run, and you can filter it by clicking on a Result Status tab.
Editor tab menu — To interact with open editor tabs, right-click any tab to access
the helpful options in the file tab menu.
Right-click a tab to view the Editor tab menu options
File Search — You can easily search for and navigate between files using the File
Navigation menu, which can be accessed by pressing Command-O or Control-O.
IDE Status modal — The IDE Status modal shows the current error message and
debug logs for the server. This also contains an option to restart the IDE. Open this
by clicking on the IDE Status button.
Commit Changes modal — The Commit Changes modal is accessible via the Git
Actions button to commit all changes or via the Version Control Options menu to
commit individual changes. Once you enter a commit message, you can use the
modal to commit and sync the selected changes.
The Commit Changes modal is how users commit changes to their branch.
Change Branch modal — The Change Branch modal allows users to switch git
branches in the IDE. It can be accessed through the Change Branch link or the Git
Actions button in the Version Control menu.
IDE Options menu — The IDE Options menu can be accessed by clicking on the
three-dot menu located at the bottom right corner of the IDE. This menu contains
global options such as:
Access the IDE Options menu to switch to dark or light mode, restart the IDE, reclone your repo, or view the IDE status.
Enhance your development workflow by integrating with popular linters and formatters
like SQLFluff, sqlfmt, Black, and Prettier. Leverage these powerful tools directly in the dbt
Cloud IDE without interrupting your development flow.
What are linters and formatters?
In the dbt Cloud IDE, you can perform linting, auto-fix, and formatting on five different file
types:
SQL — Lint and fix with SQLFluff, and format with sqlfmt
YAML, Markdown, and JSON — Format with Prettier
Python — Format with Black
Each file type has its own unique linting and formatting rules. You can customize the linting
process to add more flexibility and enhance problem and style detection.
By default, the IDE uses sqlfmt rules to format your code, making it convenient to use right
away. However, if you have a file named .sqlfluff in the root directory of your dbt project, the
IDE will default to SQLFluff rules instead.
Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab.
Use sqlfmt to format your SQL code.
Format YAML, Markdown, and JSON files using Prettier.
Use the Config button to select your tool.
Customize linting by configuring your own linting code rules, including dbtonic linting/styling.
Lint
With the dbt Cloud IDE, you can seamlessly use SQLFluff, a configurable SQL linter, to
warn you of complex functions, syntax, formatting, and compilation errors. This integration
allows you to run checks, fix, and display any code errors directly within the Cloud IDE:
1. To enable linting, make sure you're on a development branch. Linting isn't available
on main or read-only branches.
2. Open a .sql file and click the Code Quality tab.
3. Click on the </> Config button on the bottom right side of the console section, below
the File editor.
4. In the code quality tool config pop-up, you have the option to
select sqlfluff or sqlfmt.
5. To lint your code, select the sqlfluff radio button. (Use sqlfmt to format your code)
6. Once you've selected the sqlfluff radio button, go back to the console section (below
the File editor) to select the Lint or Fix dropdown button:
o Lint button — Displays linting issues in the IDE as wavy underlines in the File
editor. You can hover over an underlined issue to display the details and
actions, including a Quick Fix option to fix all or specific issues. After linting,
you'll see a message confirming the outcome. Linting doesn't rerun after
saving. Click Lint again to rerun linting.
o Fix button — Automatically fixes linting errors in the File editor. When fixing
is complete, you'll see a message confirming the outcome.
o Use the Code Quality tab to view and debug any code errors.
Use the Lint or Fix button in the console section to lint or auto-fix your code.
Customize linting
SQLFluff is a configurable SQL linter, which means you can configure your own linting rules
instead of using the default linting settings in the IDE. You can exclude files and directories
by using a standard .sqlfluffignore file. Learn more about the syntax in the .sqlfluffignore
syntax docs.
Customize linting by configuring your own linting code rules, including dbtonic linting/styling.
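For example, a minimal SQLFluff configuration and ignore file at the root of your project might look like the following sketch (the dialect, line length, and ignored paths are illustrative; adjust them for your warehouse and project layout):
.sqlfluff
[sqlfluff]
templater = jinja
dialect = snowflake
max_line_length = 100

.sqlfluffignore
target/
dbt_packages/
macros/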
Format
In the dbt Cloud IDE, you can format your code to match style guides with a click of a
button. The IDE integrates with formatters like sqlfmt, Prettier, and Black to automatically
format code on five different file types — SQL, YAML, Markdown, Python, and JSON:
SQL — Format with sqlfmt, which provides one way to format your dbt SQL and
Jinja.
YAML, Markdown, and JSON — Format with Prettier.
Python — Format with Black.
The Cloud IDE formatting integrations take care of manual tasks like code formatting,
enabling you to focus on creating quality data models, collaborating, and driving impactful
results.
Format SQL
To format your SQL code, dbt Cloud integrates with sqlfmt, which is an uncompromising
SQL query formatter that provides one way to format the SQL query and Jinja.
By default, the IDE uses sqlfmt rules to format your code, making the Format button
available and convenient to use immediately. However, if you have a file named .sqlfluff in
the root directory of your dbt project, the IDE will default to SQLFluff rules instead.
To enable sqlfmt:
Use sqlfmt to format your SQL code.
Format YAML, Markdown, JSON
To format your YAML, Markdown, or JSON code, dbt Cloud integrates with Prettier, which
is an opinionated code formatter.
For more info on the order of precedence and how to configure files, refer to Prettier's
documentation. Please note, .prettierrc.json5, .prettierrc.js, and .prettierrc.toml files aren't
currently supported.
Format Python
To format your Python code, dbt Cloud integrates with Black, which is an uncompromising
Python code formatter.
A dbt project informs dbt about the context of your project and how to transform your data
(build your data sets). By design, dbt enforces the top-level structure of a dbt project such
as the dbt_project.yml file, the models directory, the snapshots directory, and so on. Within
the directories of the top-level, you can organize your project in any way that meets the
needs of your organization and data pipeline.
At a minimum, all a project needs is the dbt_project.yml project configuration file. dbt
supports a number of different resources, so a project may also include:
Resource — Description
models — Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
snapshots — A way to capture the state of your mutable tables so you can refer to it later.
seeds — CSV files with static data that you can load into your data platform with dbt.
data tests — SQL queries that you can write to test the models and resources in your project.
sources — A way to name and describe the data loaded into your warehouse by your Extract and Load tools.
analysis — A way to organize analytical SQL queries in your project such as the general ledger from your QuickBooks.
When building out the structure of your project, you should consider these impacts on your
organization's workflow:
Project configuration
Every dbt project includes a project configuration file called dbt_project.yml. It defines the
directory of the dbt project and other project configurations.
require-dbt-version — Restrict your project to only work with a range of dbt Core versions.
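For context, a minimal dbt_project.yml using some of these settings might look like this sketch (the project and profile names are illustrative):
dbt_project.yml
name: jaffle_shop
version: '1.0.0'
profile: jaffle_shop

model-paths: ["models"]
snapshot-paths: ["snapshots"]
test-paths: ["tests"]

require-dbt-version: [">=1.5.0", "<2.0.0"]

models:
  jaffle_shop:
    +materialized: view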
Project subdirectories
You can use the Project subdirectory option in dbt Cloud to specify a subdirectory in your
git repository that dbt should use as the root directory for your project. This is helpful when
you have multiple dbt projects in one repository or when you want to organize your dbt
project files into subdirectories for easier management.
To use the Project subdirectory option in dbt Cloud, follow these steps:
1. Click on the cog icon on the upper right side of the page and click on Account
Settings.
2. Under Projects, select the project you want to configure as a project subdirectory.
4. In the Project subdirectory field, add the name of the subdirectory. For example, if
your dbt project files are located in a subdirectory called <repository>/finance, you
would enter finance as the subdirectory.
o You can also reference nested subdirectories. For example, if your dbt project
files are located in <repository>/teams/finance, you would
enter teams/finance as the subdirectory. Note: You do not need a leading or
trailing / in the Project subdirectory field.
5. Click Save when you've finished.
After configuring the Project subdirectory option, dbt Cloud will use it as the root directory
for your dbt project. This means that dbt commands, such as dbt run or dbt test, will operate
on files within the specified subdirectory. If there is no dbt_project.yml file in the Project
subdirectory, you will be prompted to initialize the dbt project.
New projects
You can create new projects and share them with other people by making them available
on a hosted git repository like GitHub, GitLab, or Bitbucket.
After you set up a connection with your data platform, you can initialize your new project in
dbt Cloud and start developing. Or, run dbt init from the command line to set up your new
project.
During project initialization, dbt creates sample model files in your project directory to help
you start developing quickly.
Sample projects
If you want to explore dbt projects more in-depth, you can clone dbt Labs' jaffle_shop project on
GitHub. It's a runnable project that contains sample configurations and helpful notes.
If you want to see what a mature, production project looks like, check out the GitLab Data
Team public repo.
Models are where your developers spend most of their time within a dbt environment.
Models are primarily written as a select statement and saved as a .sql file. While the
definition is straightforward, the complexity of the execution will vary from environment to
environment. Models will be written and rewritten as needs evolve and your organization
finds new ways to maximize efficiency.
SQL is the language most dbt users will utilize, but it is not the only one for building models.
Starting in version 1.3, dbt Core and dbt Cloud support Python models. Python models are
useful for training or deploying data science models, complex transformations, or where a
specific Python package meets a need — such as using the dateutil library to parse dates.
The top level of a dbt workflow is the project. A project is a directory containing a .yml file (the
project configuration) and .sql or .py files (the models). The project file tells dbt the
project context, and the models let dbt know how to build a specific data set. For more
details on projects, refer to About dbt projects.
Your organization may need only a few models, but more likely you'll need a complex
structure of nested models to transform the required data. A model is a single file containing
a final select statement, a project can have multiple models, and models can even
reference each other. Add numerous projects on top of that, and the level of effort required
to transform complex data sets can drop drastically compared to older methods.
Learn more about models on the SQL models and Python models pages. If you'd like to begin
with a bit of practice, visit our Getting Started Guide for instructions on setting up the
jaffle_shop sample data so you can get hands-on with the power of dbt.
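As a quick sketch of what models look like, here are two illustrative files, one selecting from a source and one referencing another model (the source, model, and column names are made up for the example):
models/staging/stg_orders.sql
select
    id as order_id,
    status,
    updated_at
from {{ source('jaffle_shop', 'orders') }}

models/marts/shipped_orders.sql
select *
from {{ ref('stg_orders') }}
where status = 'shipped'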
Related documentation
Snapshot configurations
Snapshot properties
snapshot command
Analysts often need to "look back in time" at previous data states in their mutable tables.
While some source data systems are built in a way that makes accessing historical data
possible, this is not always the case. dbt provides a mechanism, snapshots, which records
changes to a mutable table over time.
Snapshots implement type-2 Slowly Changing Dimensions over mutable source tables.
These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over
time. Imagine you have an orders table where the status field can be overwritten as the
order is processed.
id | status | updated_at
1 | pending | 2019-01-01

Now, imagine that the order goes from "pending" to "shipped". That same record will now
look like:

id | status | updated_at
1 | shipped | 2019-01-02
This order is now in the "shipped" state, but we've lost the information about when the order
was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it
took for an order to ship. dbt can "snapshot" these changes to help you understand how
values in a row change over time. Here's an example of a snapshot table for the previous
example:
id | status | updated_at | dbt_valid_from | dbt_valid_to
1 | pending | 2019-01-01 | 2019-01-01 | 2019-01-02
1 | shipped | 2019-01-02 | 2019-01-02 | null
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
On the first run: dbt will create the initial snapshot table — this will be the result set
of your select statement, with additional columns
including dbt_valid_from and dbt_valid_to. All records will have a dbt_valid_to = null.
On subsequent runs: dbt will check which records have changed or if any new
records have been created:
o The dbt_valid_to column will be updated for any existing records that have
changed
o The updated record and any new records will be inserted into the snapshot
table. These records will now have dbt_valid_to = null
Snapshots can be referenced in downstream models the same way as referencing models
— by using the ref function.
Example
To add a snapshot to your project:
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}
{% endsnapshot %}
3. Write a select statement within the snapshot block (tips for writing a good snapshot
query are below). This select statement defines the results that you want to snapshot
over time. You can use sources and refs here.
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}
{% endsnapshot %}
4. Check whether the result set of your query includes a reliable timestamp column that
indicates when a record was last updated. For our example, the updated_at column
reliably indicates record changes, so we can use the timestamp strategy. If your
query result set does not have a reliable timestamp, you'll need to instead use
the check strategy — more details on this below.
5. Add configurations to your snapshot using a config block (more details below). You
can also configure your snapshot from your dbt_project.yml file (docs).
snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
6. Run the dbt snapshot command — for our example, a new table will be created
at analytics.snapshots.orders_snapshot. The target_database configuration,
the target_schema configuration, and the name of the snapshot (as defined
in {% snapshot .. %}) determine how dbt names this table.
$ dbt snapshot
Running with dbt=0.16.0
Completed successfully
7. Inspect the results by selecting from the table dbt created. After the first run, you
should see the results of your query, plus the snapshot meta fields as described
below.
8. Run the snapshot command again, and inspect the results. If any records have been
updated, the snapshot should reflect this.
9. Select from the snapshot in downstream models using the ref function.
models/changed_orders.sql
select * from {{ ref('orders_snapshot') }}
10. Schedule the snapshot command to run regularly — snapshots are only useful if you
run them frequently.
Timestamp strategy
The timestamp strategy uses an updated_at field to determine if a row has changed. If the
configured updated_at column for a row is more recent than the last time the snapshot ran,
then dbt will invalidate the old record and record the new one. If the timestamps are
unchanged, then dbt will not take any action.
Config | Description | Example
updated_at | A column which represents when the source row was last updated | updated_at
Example usage:
snapshots/orders_snapshot_timestamp.sql
{% snapshot orders_snapshot_timestamp %}

{{
    config(
      target_schema='snapshots',
      strategy='timestamp',
      unique_key='id',
      updated_at='updated_at',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
Check strategy
The check strategy is useful for tables which do not have a reliable updated_at column.
This strategy works by comparing a list of columns between their current and historical
values. If any of these columns have changed, then dbt will invalidate the old record and
record the new one. If the column values are identical, then dbt will not take any action.
Config | Description | Example
check_cols | A list of columns to check for changes, or all to check all columns | ["name", "email"]

check_cols = 'all'
The check snapshot strategy can be configured to track changes to all columns by
supplying check_cols = 'all'. It is better to explicitly enumerate the columns that you want to
check. Consider using a surrogate key to condense many columns into a single column (see
the sketch after the example below).
Example Usage
snapshots/orders_snapshot_check.sql
{% snapshot orders_snapshot_check %}
{{
config(
target_schema='snapshots',
strategy='check',
unique_key='id',
check_cols=['status', 'is_cancelled'],
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
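One way to apply the surrogate-key suggestion above is a sketch like the following, which assumes the dbt-utils package (version 1.0 or later) is installed and condenses several columns into a single check column; the snapshot name and column list are illustrative:
snapshots/orders_snapshot_surrogate.sql
{% snapshot orders_snapshot_surrogate %}

{{
    config(
      target_schema='snapshots',
      strategy='check',
      unique_key='id',
      check_cols=['row_fingerprint'],
    )
}}

select
    *,
    -- hash several columns into one value so the check strategy only compares this column
    {{ dbt_utils.generate_surrogate_key(['status', 'is_cancelled', 'amount']) }} as row_fingerprint
from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}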
Hard deletes (opt-in)
Rows that are deleted from the source query are not invalidated by default. With the config
option invalidate_hard_deletes, dbt can track rows that no longer exist. This is done by left
joining the snapshot table with the source table, and filtering the rows that are still valid at
that point but can no longer be found in the source table. dbt_valid_to will be set to the
current snapshot time.
This configuration is not a different strategy as described above, but is an additional opt-in
feature. It is not enabled by default since it alters the previous behavior.
Example Usage
snapshots/orders_snapshot_hard_delete.sql
{% snapshot orders_snapshot_hard_delete %}
{{
config(
target_schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
invalidate_hard_deletes=True,
)
}}
{% endsnapshot %}
Configuring snapshots
Snapshot configurations
Config | Description | Required? | Example
check_cols | If using the check strategy, then the columns to check | Only if using the check strategy | ["status"]
Snapshots can be configured from both your dbt_project.yml file and a config block, check
out the configuration docs for more information.
Basically – keep your query as simple as possible! Some reasonable exceptions to these
recommendations include:
Snapshot meta-fields
Snapshot tables will be created as a clone of your source dataset, plus some additional
meta-fields*.
Field | Meaning | Notes
dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record.
dbt_valid_to | The timestamp when this row became invalidated. | The most recent snapshot record will have dbt_valid_to set to null.
dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt.
dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt.
*The timestamps used for each column are subtly different depending on the strategy you
use:
For the timestamp strategy, the configured updated_at column is used to populate
the dbt_valid_from, dbt_valid_to and dbt_updated_at columns.
For the check strategy, the current timestamp is used to populate each column. If
configured, the check strategy uses the updated_at column instead, as with the timestamp
strategy.
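Since the most recent version of each record has dbt_valid_to set to null, a downstream model can select only the current rows; for example (the model name is illustrative):
models/current_orders.sql
select *
from {{ ref('orders_snapshot') }}
where dbt_valid_to is null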
By default, dbt looks for snapshots in the snapshots directory. To change this, update the snapshot-paths configuration in your dbt_project.yml file, like so:
dbt_project.yml
snapshot-paths: ["snapshots"]
Note that you cannot co-locate snapshots and models in the same directory.
Debug "Snapshot target is not a snapshot table" errors
Overview
Data tests are assertions you make about your models and other resources in your dbt
project (e.g. sources, seeds and snapshots). When you run dbt test, dbt will tell you if each
test in your project passes or fails.
You can use data tests to improve the integrity of the SQL in each model by making
assertions about the results generated. Out of the box, you can test whether a specified
column in a model only contains non-null values, unique values, values that have a
corresponding value in another model (for example, a customer_id for
an order corresponds to an id in the customers model), or values from a specified list. You
can extend data tests to suit business logic specific to your organization – any assertion
that you can make about your model in the form of a select query can be turned into a data
test.
Data tests return a set of failing records. Generic data tests (f.k.a. schema tests) are
defined using test blocks.
Like almost everything in dbt, data tests are SQL queries. In particular, they
are select statements that seek to grab "failing" records, ones that disprove your assertion.
If you assert that a column is unique in a model, the test query selects for duplicates; if you
assert that a column is never null, the test seeks after nulls. If the data test returns zero
failing rows, it passes, and your assertion has been validated.
A singular data test is testing in its simplest form: If you can write a SQL query that
returns failing rows, you can save that query in a .sql file within your test directory.
It's now a data test, and it will be executed by the dbt test command.
A generic data test is a parameterized query that accepts arguments. The test query
is defined in a special test block (like a macro). Once defined, you can reference the
generic test by name throughout your .yml files—define it on models, columns,
sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we
think you should use them!
Defining data tests is a great way to confirm that your outputs and inputs are as expected,
and helps prevent regressions when your code changes. Because you can use them over
and over again, making similar assertions with minor variations, generic data tests tend to
be much more common—they should make up the bulk of your dbt data testing suite. That
said, both ways of defining data tests have their time and place.
These tests are defined in .sql files, typically in your tests directory (as defined by your test-
paths config). You can use Jinja (including ref and source) in the test definition, just like you
can when creating models. Each .sql file contains one select statement, and it defines one
data test:
tests/assert_total_payment_amount_is_positive.sql
-- Refunds have a negative amount, so the total amount should always be >= 0.
-- Therefore return records where this isn't true to make the test fail
select
order_id,
sum(amount) as total_amount
from {{ ref('fct_payments' )}}
group by 1
having not(total_amount >= 0)
Singular data tests are easy to write—so easy that you may find yourself writing the same
basic structure over and over, only changing the name of a column or model. By that point,
the test isn't so singular! In that case, we recommend writing a generic data test instead.
A generic data test is defined in a test block, like so:
{% test not_null(model, column_name) %}
select *
from {{ model }}
where {{ column_name }} is null
{% endtest %}
You'll notice that there are two arguments, model and column_name, which are then
templated into the query. This is what makes the test "generic": it can be defined on as
many columns as you like, across as many models as you like, and dbt will pass the values
of model and column_name accordingly. Once that generic test has been defined, it can be
added as a property on any existing model (or source, seed, or snapshot). These properties
are added in .yml files in the same directory as your resource.
INFO
If this is your first time working with adding properties to a resource, check out the docs
on declaring properties.
Out of the box, dbt ships with four generic data tests already
defined: unique, not_null, accepted_values and relationships. Here's a full example using
those tests on an orders model:
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
Behind the scenes, dbt constructs a select query for each data test, using the parametrized
query from the generic test block. These queries return the rows where your assertion
is not true; if the test returns zero rows, your assertion passes.
You can find more information about these data tests, and additional configurations
(including severity and tags) in the reference section.
Those four tests are enough to get you started. You'll quickly find you want to use a wider
variety of tests—a good thing! You can also install generic data tests from a package, or
write your own, to use (and reuse) across your dbt project. Check out the guide on custom
generic tests for more information.
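As a rough sketch of what a custom generic test can look like (the test name is_positive and the file location are illustrative), you define a test block with model and column_name arguments, just like the built-in tests:
tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}
-- fail any row where the column is negative
select *
from {{ model }}
where {{ column_name }} < 0
{% endtest %}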
INFO
There are generic tests defined in some open source packages, such as dbt-utils and dbt-
expectations — skip ahead to the docs on packages to learn more!
Example
1. Add a .yml file to your models directory, e.g. models/schema.yml, with the following
content (you may need to adjust the name: values for an existing model)
models/schema.yml
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
2. Run the dbt test command:
$ dbt test
Found 3 models, 2 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 0
sources
Completed successfully
Unique test
Compiled SQL:
select *
from (
select
order_id
from analytics.orders
where order_id is not null
group by order_id
having count(*) > 1
) validation_errors
Not null test
Compiled SQL:
select *
from analytics.orders
where order_id is null
This workflow allows you to query and examine failing records much more quickly in
development:
Store test failures in the database for faster development-time debugging.
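You can opt in for a whole run from the command line:
dbt test --store-failures
or per test with the store_failures config; a minimal sketch (model and column names are illustrative):
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                store_failures: true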
Note that, if you elect to store test failures:
FAQs
How do I test one model at a time?
One of my tests failed, how can I debug it?
What tests should I add to my project?
When should I run my tests?
Can I store my tests in a directory other than the `tests` directory in my project?
How do I run tests on just my sources?
Can I set test failure thresholds?
As of v0.20.0, you can use the error_if and warn_if configs to set custom failure thresholds
in your tests. See the reference section for more information.
For dbt v0.19.0 and earlier, you could try these possible solutions:
Consider an orders table that contains records from multiple countries, and the combination
of ID and country code is unique:
order_id  country_code
1         AU
2         AU
...       ...
1         US
2         US
...       ...
Here are some approaches:
1. Create a unique key in the model by concatenating the columns:
select
country_code || '-' || order_id as surrogate_key,
...
models/orders.yml
version: 2
models:
  - name: orders
    columns:
      - name: surrogate_key
        tests:
          - unique
2. Test an expression
models/orders.yml
version: 2
models:
  - name: orders
    tests:
      - unique:
          column_name: "(country_code || '-' || order_id)"
3. Use the dbt_utils.unique_combination_of_columns test from the dbt_utils package:
models/orders.yml
version: 2
models:
  - name: orders
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - country_code
            - order_id
Overview
In dbt, you can combine SQL with Jinja, a templating language.
Using Jinja turns your dbt project into a programming environment for SQL, giving you the
ability to do things that aren't normally possible in SQL. For example, with Jinja you can:
In fact, if you've used the {{ ref() }} function, you're already using Jinja!
Jinja can be used in any SQL in a dbt project, including models, analyses, tests, and
even hooks.
/models/order_payment_method_amounts.sql
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}
select
order_id,
{% for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as
{{payment_method}}_amount,
{% endfor %}
sum(amount) as total_amount
from app_data.payments
group by 1
/models/order_payment_method_amounts.sql
select
order_id,
sum(case when payment_method = 'bank_transfer' then amount end) as
bank_transfer_amount,
sum(case when payment_method = 'credit_card' then amount end) as
credit_card_amount,
sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount,
sum(amount) as total_amount
from app_data.payments
group by 1
You can recognize Jinja based on the delimiters the language uses, which we refer to as
"curlies":
Expressions {{ ... }}: Expressions are used when you want to output a string. You
can use expressions to reference variables and call macros.
Statements {% ... %}: Statements don't output a string. They are used for control
flow, for example, to set up for loops and if statements, to set or modify variables, or
to define macros.
Comments {# ... #}: Jinja comments are used to prevent the text within the comment
from executing or outputting a string.
When used in a dbt model, your Jinja needs to compile to a valid query. To check what
SQL your Jinja compiles to:
Using dbt Cloud: Click the compile button to see the compiled SQL in the Compiled
SQL pane
Using dbt Core: Run dbt compile from the command line. Then open the compiled
SQL file in the target/compiled/{project name}/ directory. Use a split screen in your
code editor to keep both files open at once.
Macros
Macros in Jinja are pieces of code that can be reused multiple times – they are analogous
to "functions" in other programming languages, and are extremely useful if you find yourself
repeating code across multiple models. Macros are defined in .sql files, typically in
your macros directory (docs).
macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}
models/stg_payments.sql
select
id as payment_id,
{{ cents_to_dollars('amount') }} as amount_usd,
...
from app_data.payments
target/compiled/models/stg_payments.sql
select
id as payment_id,
(amount / 100)::numeric(16, 2) as amount_usd,
...
from app_data.payments
A number of useful macros have also been grouped together into packages — our most
popular package is dbt-utils.
After installing a package into your project, you can use any of the macros in your own
project — make sure you qualify the macro by prefixing it with the package name:
select
field_1,
field_2,
field_3,
field_4,
field_5,
count(*)
from my_table
{{ dbt_utils.group_by(5) }}
You can also qualify a macro in your own project by prefixing it with your package
name (this is mainly useful for package authors).
FAQs
What parts of Jinja are dbt-specific?
Which docs should I use when writing Jinja or creating a macro?
Why do I need to quote column names in Jinja?
My compiled SQL has a lot of spaces and new lines, how can I get rid of it?
How do I debug my Jinja?
How do I document macros?
Why does my dbt output have so many macros in it?
dbtonic Jinja
Just like well-written python is pythonic, well-written dbt code is dbtonic.
Once you learn the power of Jinja, it's common to want to abstract every repeated line into
a macro! Remember that using Jinja can make your models harder for other users to
interpret — we recommend favoring readability when mixing Jinja with SQL, even if it
means repeating some lines of SQL in a few places. If all your models are macros, it might
be worth re-assessing.
Writing a macro for the first time? Check whether we've open sourced one in dbt-utils that
you can use, and save yourself some time!
{% set ... %} can be used to create a new variable, or update an existing one. We
recommend setting variables at the top of a model, rather than hardcoding them inline. This is a
practice borrowed from many other coding languages, since it helps with readability, and
comes in handy if you need to reference the variable in two places:
-- 🙅 This works, but can be hard to maintain as your code grows
{% for payment_method in ["bank_transfer", "credit_card", "gift_card"] %}
...
{% endfor %}
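A sketch of the recommended alternative, setting the list once at the top of the model so it can be changed and referenced in one place:
-- ✅ Set the variable once at the top of the model
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

{% for payment_method in payment_methods %}
...
{% endfor %}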
Using sources
Sources make it possible to name and describe the data loaded into your warehouse by
your Extract and Load tools. By declaring these tables as sources in dbt, you can then
select from source tables in your models using the {{ source() }} function, helping
define the lineage of your data
test your assumptions about your source data
calculate the freshness of your source data
Declaring a source
models/<filename>.yml
version: 2
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers
  - name: stripe
    tables:
      - name: payments
*By default, schema will be the same as name. Add schema only if you want to use a
source name that differs from the existing schema.
If you're not already familiar with these files, be sure to check out the documentation on
schema.yml files before proceeding.
Once a source has been defined, it can be referenced from a model using
the {{ source()}} function.
models/orders.sql
select
...
from {{ source('jaffle_shop', 'orders') }}
target/compiled/jaffle_shop/models/my_model.sql
select
...
from raw.jaffle_shop.orders
Using the {{ source() }} function also creates a dependency between the model and the source table.
The source function tells dbt a model is dependent on a source
Testing and documenting sources
These should be familiar concepts if you've already added tests and descriptions to your
models (if not check out the guides on testing and documentation).
models/<filename>.yml
version: 2
sources:
  - name: jaffle_shop
    description: This is a replica of the Postgres database used by our app
    tables:
      - name: orders
        description: >
          One record per order. Includes cancelled and deleted orders.
        columns:
          - name: id
            description: Primary key of the orders table
            tests:
              - unique
              - not_null
          - name: status
            description: Note that the status can change over time
      - name: ...
      - name: ...
You can find more details on the available properties for sources in the reference section.
FAQs
models/<filename>.yml
version: 2
sources:
  - name: jaffle_shop
    database: raw
    freshness: # default freshness
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _etl_loaded_at
    tables:
      - name: orders
        freshness: # make this a little more strict
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
      - name: product_skus
        freshness: null # do not check freshness for this table
In the freshness block, one or both of warn_after and error_after can be provided. If neither
is provided, then dbt will not calculate freshness snapshots for the tables in this source.
These configs are applied hierarchically, so freshness and loaded_at_field values specified
for a source will flow through to all of the tables defined in that source. This is useful when
all of the tables in a source have the same loaded_at_field, as the config can just be
specified once in the top-level source definition.
To snapshot freshness information for your sources, use the dbt source
freshness command (reference docs):
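For example (selecting a single source is optional, and exact flags can vary slightly by dbt version):
dbt source freshness
dbt source freshness --select source:jaffle_shop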
Behind the scenes, dbt uses the freshness properties to construct a select query, shown
below. You can find this query in the query logs.
select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
The results of this query are used to determine whether the source is fresh or not:
Uh oh! Not everything is as fresh as we'd like!
Filter
Some databases have tables where a filter over certain columns is required in order to
prevent a full scan of the table, which could be costly. To do a freshness check on
such tables, a filter argument can be added to the configuration, e.g. filter: _etl_loaded_at
>= date_sub(current_date(), interval 1 day). For the example above, the resulting query
would look like:
select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
where _etl_loaded_at >= date_sub(current_date(), interval 1 day)
FAQs
The dbt source freshness command will output a pass/warning/error status for
each table selected in the freshness snapshot.
Additionally, dbt will write the freshness results to a file in the target/ directory
called sources.json by default. You can override this destination by passing the -o flag to
the dbt source freshness command.
After enabling source freshness within a job, configure Artifacts in your Project
Details page, which you can find by clicking the gear icon and then selecting Account
settings. You can see the current status for source freshness by clicking View Sources in
the job page.
Add Exposures to your DAG
Exposures make it possible to define and describe a downstream use of your dbt project,
such as in a dashboard, application, or data science pipeline. By defining exposures, you
can then:
run, test, and list resources that feed into your exposure
populate a dedicated page in the auto-generated documentation site with context
relevant to data consumers
Declaring an exposure
models/<filename>.yml
version: 2
exposures:
  - name: weekly_jaffle_metrics
    label: Jaffles by the Week
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: [email protected]
Available properties
Required:
Expected:
depends_on: list of refable nodes, including ref, source, and metric (While possible,
it is highly unlikely you will ever need an exposure to depend on a source directly)
Optional:
description
tags
meta
Referencing exposures
Once an exposure is defined, you can run commands that reference it:
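For example, assuming the weekly_jaffle_metrics exposure declared above, you can build or test everything upstream of it:
dbt run --select +exposure:weekly_jaffle_metrics
dbt test --select +exposure:weekly_jaffle_metrics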
When we generate our documentation site, you'll see the exposure appear:
Dedicated page in dbt-docs for each exposure
Add groups to your DAG
A group is a collection of nodes within a dbt DAG. Groups are named, and every group has
an owner. They enable intentional collaboration within and across teams by
restricting access to private models.
Group members may include models, tests, seeds, snapshots, analyses, and metrics. (Not
included: sources and exposures.) Each node may belong to only one group.
Declaring a group
models/marts/finance/finance.yml
groups:
  - name: finance
    owner:
      # 'name' or 'email' is required; additional properties allowed
      email: [email protected]
      slack: finance-data
      github: finance-data-team
Project-level
Model-level
In-file
dbt_project.yml
models:
  marts:
    finance:
      +group: finance
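For the in-file option, a model can also declare its group directly in its config block; a minimal sketch (the model file name is illustrative):
-- models/marts/finance/fct_invoices.sql (hypothetical model)
{{ config(group='finance') }}

select ...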
By default, all models within a group have the protected access modifier. This means they
can be referenced by downstream resources in any group in the same project, using
the ref function. If a grouped model's access property is set to private, only resources within
its group can reference it.
models/schema.yml
models:
  - name: finance_private_model
    access: private
    config:
      group: finance
  # in a different group!
  - name: marketing_model
    config:
      group: marketing
models/marketing_model.sql
select * from {{ ref('finance_private_model') }}
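-- Building marketing_model will fail: finance_private_model has private access,
-- so it can only be referenced by models in the 'finance' group.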
Related docs
Analyses
Overview
dbt's notion of models makes it easy for data teams to version control and collaborate on
data transformations. Sometimes though, a certain SQL statement doesn't quite fit into the
mold of a dbt model. These more "analytical" SQL files can be versioned inside of your dbt
project using the analysis functionality of dbt.
Any .sql files found in the analyses/ directory of a dbt project will be compiled, but not
executed. This means that analysts can use dbt functionality like {{ ref(...) }} to select from
models in an environment-agnostic way.
In practice, an analysis file might look like this (via the open source Quickbooks models):
analyses/running_total_by_account.sql
-- analyses/running_total_by_account.sql
with journal_entries as (
select *
from {{ ref('quickbooks_adjusted_journal_entries') }}
), accounts as (
select *
from {{ ref('quickbooks_accounts_transformed') }}
)
select
txn_date,
account_id,
adjusted_amount,
description,
account_name,
sum(adjusted_amount) over (partition by account_id order by id rows unbounded
preceding)
from journal_entries
order by account_id, id
dbt compile
Data Build Tool (DBT) is a popular open-source tool used in the data analytics and data
engineering fields. DBT helps data professionals transform, model, and prepare data for
analysis. If you’re preparing for an interview related to DBT, it’s important to be well-versed
in its concepts and functionalities. To help you prepare, here’s a list of common interview
1. What is DBT?
Answer: DBT, short for Data Build Tool, is an open-source data transformation and modeling
tool. It helps analysts and data engineers manage the transformation and preparation of data in the warehouse.
Answer: DBT is primarily used for data transformation, modeling, and preparing data for
analysis and reporting. It is commonly used in data warehouses to create and maintain data
pipelines.
Answer: Unlike traditional ETL tools, DBT focuses on transforming and modeling data within
the data warehouse itself, making it more suitable for ELT (Extract, Load, Transform)
workflows. DBT leverages the power and scalability of modern data warehouses and allows transformations to be written in plain SQL.
Answer: A DBT model is a SQL file that defines a transformation or a table within the data
warehouse. Models can be simple SQL queries or complex transformations that create
derived datasets.
Answer: Sources are the raw tables loaded into the data warehouse. Models are the transformed and structured datasets created using DBT to support analytics.
Answer: A DBT project is a directory containing all the files and configurations necessary to
define data models, tests, and documentation. It is the primary unit of organization for DBT.
Answer: DAG stands for Directed Acyclic Graph, and in the context of DBT, it represents the
dependencies between models. DBT uses a DAG to determine the order in which models
are built.
Answer: To write a DBT model, you create a `.sql` file in the appropriate project directory and write a SQL SELECT statement that defines the transformation.
9. What are DBT macros, and how are they useful in transformations?
Answer: DBT macros are reusable SQL code snippets that can simplify and standardize repeated transformation logic across models and columns.
10. How can you perform testing and validation of DBT models?
Answer: You can perform testing in DBT by writing custom SQL tests to validate your data
models. These tests can check for data quality, consistency, and other criteria to ensure that your transformations behave as expected.
Answer: Deploying DBT models to production typically involves using DBT Cloud, CI/CD
pipelines, or other orchestration tools. You'll need to compile and build the models and then schedule them to run against your production environment.
Answer: DBT integrates with version control systems like Git, allowing teams to collaborate
on DBT projects and track changes to models over time. It provides a clear history of changes and makes code review straightforward.
13. What are some common performance optimization techniques for DBT models?
Answer: Performance optimization in DBT can be achieved by using techniques like
materialized views, optimizing SQL queries, and using caching to reduce query execution
times.
Answer: DBT provides logs and diagnostics to help monitor and troubleshoot issues. You
can also use data warehouse-specific monitoring tools to identify and address performance
problems.
15. Can DBT work with different data sources and data warehouses?
Answer: Yes, DBT supports integration with a variety of data sources and data warehouses,
including Snowflake, BigQuery, Redshift, and more. It’s adaptable to different cloud and on-
premises environments.
16. How does DBT handle incremental loading of data from source systems?
Answer: DBT can handle incremental loading by using source freshness checks and
managing data updates from source systems. It can be configured to only transform new or
changed data.
17. What security measures does DBT support for data access and transformation?
Answer: DBT supports the security features provided by your data warehouse, such as row-
level security and access control policies. It's important to implement proper access controls in the warehouse that dbt connects to.
Answer: Sensitive data in DBT models should be handled according to your organization’s
data security policies. This can involve encryption, tokenization, or other data protection
measures.
1)View (Default):
Purpose: Views are virtual tables that are not materialized. They are essentially saved queries that are executed each time the view is referenced.
Use Case: Useful for simple transformations or when you want to reference a SQL query in
multiple models.
{{ config(
materialized='view'
) }}
SELECT
...
FROM ...
2)Table:
Purpose: Materializes the result of a SQL query as a physical table in your data warehouse.
Use Case: Suitable for intermediate or final tables that you want to persist in your data
warehouse.
{{ config(
materialized='table'
) }}
SELECT
...
FROM ...
3)Incremental:
Purpose: Materializes the result of a SQL query as a physical table, but is designed to be built incrementally, processing only new or changed data on each run.
Use Case: Ideal for situations where you want to update your table with only the new or changed records instead of rebuilding it from scratch.
{{ config(
materialized='incremental'
) }}
SELECT
...
FROM ...
4)Incremental with a unique key:
Purpose: Similar to the incremental materialization, but specifies a unique key that dbt can use to match and update existing rows.
Use Case: Useful when dbt needs a way to identify changes in the data.
{{ config(
materialized='incremental',
unique_key='id'
) }}
SELECT
...
FROM ...
5)Snapshot:
Purpose: Materializes a table in a way that retains a version history of the data, allowing you to see how records change over time.
Use Case: Useful for slowly changing dimensions or situations where historical data is
important.
{{ config(
materialized='snapshot'
) }}
SELECT
...
FROM ...
Answer: dbt provides several types of tests that you can use to validate your data. Here are some of the most common:
version: 2
models:
  - name: my_model
    columns:
      - name: id
        tests:
          - unique
models:
  - name: my_model
    columns:
      - name: name
        tests:
          - not_null
      - name: age
        tests:
          - not_null
version: 2
models:
  - name: my_model
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']
Verifies that the values in a foreign key column match primary key values in the referenced
table.
version: 2
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
Checks that foreign key relationships are maintained between two tables. (Note: referential_integrity is not a test that ships with dbt; in practice this check is written as a custom generic test or handled with the built-in relationships test shown above.)
version: 2
models:
  - name: orders
    tests:
      - referential_integrity:
          to: ref('customers')
          field: customer_id
version: 2
models:
  - name: my_model
    tests:
      - custom_sql: "column_name > 0"
(custom_sql is likewise not a built-in dbt test; arbitrary SQL assertions are usually written as singular tests in the tests/ directory or with a package test such as dbt_utils.expression_is_true.)
21. What is a seed?
Answer: A "seed" is a dbt resource for loading static or reference data into your warehouse: a CSV file in your project that dbt loads into a table. Seeds are typically used to store data that doesn't change often and doesn't require transformation during the ETL (Extract, Transform, Load) process.
1. Static Data: Seeds are used for static or reference data that doesn't change frequently. Examples include lookup tables, reference data, or any data that rarely needs updating.
2. Initial Data Load: Seeds are often used to load initial data into a data warehouse or data mart. This data is typically loaded once and then used as a stable reference.
3. Configuration: In dbt, a seed is a CSV file stored in your project's seed directory and loaded into the warehouse with the dbt seed command. How the data should be loaded (target schema, column types, and so on) is configured in dbt_project.yml, for example:
dbt_project.yml
seeds:
  my_project:
    my_seed_table:
      +column_types:
        id: varchar(32)
Answer: Pre-hooks and Post-hooks are mechanisms to execute SQL commands or scripts before and after the execution of dbt models, respectively.
1)Pre-hooks:
A pre-hook is a SQL statement that runs immediately before dbt builds one or more models.
It allows you to perform setup tasks or run additional SQL commands before the main dbt modeling process.
Common use cases for pre-hooks include tasks such as creating temporary tables, loading data into staging tables, or performing any other necessary setup steps.
Example of a pre-hook :
-- models/my_model.sql
{{ config(
pre_hook = "CREATE TEMP TABLE my_temp_table AS SELECT * FROM my_source_table"
) }}
SELECT
column1,
column2
FROM
my_temp_table
2)Post-hooks:
A post-hook is a SQL statement that runs immediately after dbt builds one or more models.
It allows you to perform cleanup tasks, log information, or execute additional SQL commands after the main modeling process.
Common use cases include updating audit tables, logging information about the run, or deleting temporary tables created by pre-hooks.
Example of a post-hook :
-- models/my_model.sql
SELECT
column1,
column2
FROM
my_source_table
{{ config(
post_hook = "UPDATE metadata_table SET last_run_timestamp = CURRENT_TIMESTAMP"
) }}
23.what is snapshots?
Answer: “snapshots” refer to a type of dbt model that is used to track changes over time in a
table or view. Snapshots are particularly useful for building historical reporting or analytics,
where you want to analyze how data has changed over different points in time.
2. Unique Identifiers: To track changes over time, dbt relies on unique identifiers
(primary keys) in the underlying data. These identifiers are used to determine
which rows have changed, and dbt creates new records in the snapshot table
accordingly.
3. Validity Timestamps: dbt adds columns (dbt_valid_from and dbt_valid_to) that indicate when each historical version of a record was valid. This allows you to query the data as it existed at any point in time.
4. Configuration: Snapshots are configured by creating a separate SQL file for each snapshot table. This file defines the base table or view you're snapshotting, the primary key, and any other necessary configurations.
Here’s a simplified example:
-- snapshots/customer_snapshot.sql
{% snapshot customer_snapshot %}
{{ config(
target_database='analytics',
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
) }}
SELECT
customer_id,
name,
email,
address,
updated_at
FROM
source.customer
{% endsnapshot %}
24.What is macros?
Answer: macros refer to reusable blocks of SQL code that can be defined and invoked within
dbt models. dbt macros are similar to functions or procedures in other programming
languages, allowing you to encapsulate and reuse SQL logic across multiple queries.
1. Definition: A macro is a named block of SQL code that can take parameters, making it flexible and reusable.
-- my_macro.sql
{% macro my_macro(parameter1, parameter2) %}
SELECT
column1,
column2
FROM
my_table
WHERE
condition1 = {{ parameter1 }}
AND condition2 = '{{ parameter2 }}'
{% endmacro %}
2. Invocation: You can then use the macro in your dbt models by referencing it.
-- my_model.sql
{{ my_macro(parameter1=1, parameter2='value') }}
When you run the dbt project, dbt replaces the macro invocation with the actual SQL code defined in the macro.
3. Parameters: Macros can accept parameters, making them dynamic and reusable for different scenarios. In the example above, parameter1 and parameter2 are parameters that can take different values at each invocation.
4. Code Organization: Macros help in organizing and modularizing your SQL code. They are particularly useful when you have common patterns or calculations that need to be reused across multiple models:
-- my_model.sql
{{ my_macro(parameter1=1, parameter2='value') }}
-- another_model.sql
{{ my_macro(parameter1=2, parameter2='another_value') }}
Answer: A project structure refers to the organization and layout of files and directories within
a dbt project. dbt is a command-line tool that enables data analysts and engineers to
transform data in their warehouse more effectively. The project structure in dbt is designed
to be modular and organized, allowing users to manage and version control their analytics
code easily.
1. Models Directory:
This is where you store your SQL files containing dbt models. Each model represents a
logical transformation or aggregation of your raw data. Models are defined using SQL syntax
and are typically organized into subdirectories based on the data source or business logic.
2. Data Directory:
The data directory is used to store any data files that are required for your dbt transformations. This might include lookup tables, reference data, or any other supplemental data files (in recent dbt versions this directory is named seeds).
3. Analyses Directory:
This directory contains SQL files that are used for ad-hoc querying or exploratory analysis. These files are separate from the main models and are not intended to be part of the core transformation DAG.
4. Tests Directory:
dbt allows you to write tests to ensure the quality of your data transformations.
The tests directory is where you store YAML files defining the tests for your models. Tests
can include checks on the data types, uniqueness, and other criteria.
5. Snapshots Directory:
Snapshots are used for slowly changing dimensions or historical tracking of data changes.
The snapshots directory is where you store SQL files defining the logic for these snapshots.
6. Macros Directory:
Macros in dbt are reusable pieces of SQL code. The macros directory is where you store
these macros, and they can be included in your models for better modularity and
maintainability.
7. Docs Directory:
This directory is used for storing documentation for your dbt project. Documentation is
crucial for understanding the purpose and logic behind each model and transformation.
8. dbt_project.yml:
This YAML file is the configuration file for your dbt project. It includes settings such as the project name, file paths, and model configurations.
9. profiles.yml:
This file contains the connection details for your data warehouse. It specifies how to connect
to your database, including the type of database, host, username, and password.
You may have additional directories for custom scripts, notebooks, or other artifacts related to your project.
Having a well-organized project structure makes it easier to collaborate with team members, maintain code, and manage version control. It also ensures that your analytics code is reproducible and easy to maintain.
Answer: “data refresh” typically refers to the process of updating or reloading data in your
data warehouse. Dbt is a command-line tool that enables data analysts and engineers to
transform data in their warehouse more effectively. It allows you to write modular SQL
Here’s a brief overview of the typical workflow involving data refresh in dbt:
1. Write Models: Analysts write SQL queries to transform raw data into analysis-ready models.
2. Run dbt: Analysts run dbt to execute the SQL queries and create or update the
tables in the data warehouse. This process is often referred to as a dbt run.
3. Data Refresh: After the initial run, you may need to refresh your data regularly to keep it up to date. One common approach is to use incremental models. These models only transform and refresh the data that has changed since the last run, rather than reprocessing the entire dataset. This is particularly useful for large datasets where a full refresh may be time-consuming.
4. Dependencies: dbt builds a DAG of your project, so if one model depends on another model, dbt ensures that the dependencies are run in the correct order.
In short, a data refresh in dbt means re-running your models to turn raw data into a clean, structured format for analysis. This approach promotes repeatability, modularity, and collaboration.
Change the materialization that a model uses – a materialization determines the SQL that
dbt uses to create the model in your warehouse.
26. Can I store my models in a directory other than the models directory in my project?
By default, dbt expects your model files to be located in the models subdirectory of your project.
To change this, update the model-paths configuration (called source-paths in older dbt versions) in your dbt_project.yml file, like so:
dbt_project.yml
model-paths: ["transformations"]
27. Can I connect my dbt project to two databases?
It depends on the warehouse used in your tech stack.
dbt projects connecting to warehouses like Snowflake or BigQuery—these empower one set
of credentials to draw from all datasets or ‘projects’ available to an account—are sometimes
said to connect to more than one database.
dbt projects connecting to warehouses like Redshift and Postgres—these tie one set of
credentials to one database—are said to connect to one database only.
dbt (Data Build Tool) Overview: What is dbt and What Can It Do for My Data Pipeline?
There are many tools on the market to help your organization transform data and make it
accessible for business users. One that we recommend and use often—dbt (data build tool)
—focuses solely on making the process of transforming data simpler and faster. In this blog
we will discuss what dbt is, how it can transform the way your organization curates its data
for decision making, and how you can get started with using dbt (data build tool).
Data plays an instrumental role in decision making for organizations. As the volume of data
increases, so does the need to make it accessible to everyone within your organization to
use. However, because there is a shortage of data engineers in the marketplace, for most
organizations there isn’t enough time or resources available to curate data and make data
analytics ready.
Disjointed sources, data quality issues, and inconsistent definitions for metrics and
business attributes lead to confusion, redundant efforts, and poor information being
distributed for decision making. Transforming your data allows you to integrate, clean, de-
duplicate, restructure, filter, aggregate, and join your data—enabling your organization to
develop valuable, trustworthy insights through analytics and reporting. There are many
tools on the market to help you do this, but one in particular—dbt (data build tool)—
simplifies and speeds up the process of transforming data and building data pipelines.
What is dbt?
dbt (data build tool) makes data engineering activities accessible to people with data
analyst skills to transform the data in the warehouse using simple select statements,
effectively creating your entire transformation process with code. You can write custom
business logic using SQL, automate data quality testing, deploy the code, and deliver
trusted data with data documentation side-by-side with the code. This is more important
today than ever due to the shortage of data engineering professionals in the marketplace.
Anyone who knows SQL can now build production-grade data pipelines, reducing the
barrier to entry that previously limited staffing capabilities for legacy technologies.
In short, dbt (data build tool) turns your data analysts into engineers and allows them to
own the entire analytics engineering workflow.
Hear why dbt is the iFit engineering team’s favorite tool and how it helped them drive triple-
digit growth for the company:
dbt’s ELT methodology brings increased agility and speed to iFit’s data pipeline. What
would have taken months with traditional ETL tools, now takes weeks or days.
With dbt, data analysts take ownership of the entire analytics engineering workflow from
writing data transformation code all the way through to deployment and documentation—as
well as to becoming better able to promote a data-driven culture within the organization.
They can:
1. Quickly and easily provide clean, transformed data ready for analysis:
The dbt Cloud UI offers an attractive interface for individuals of all ranges of experience to
comfortably develop in.
2. Apply software engineering practices, like continuous integration, to their analytics code:
Continuous integration means less time testing and quicker time to development, especially
with dbt Cloud. You don’t need to push an entire repository when there are necessary
changes to deploy, but rather just the components that change. You can test all the
changes that have been made before deploying your code into production. dbt Cloud also
has integration with GitHub for automation of your continuous integration pipelines, so you
won’t need to manage your own orchestration, which simplifies the process.
While configuring a continuous integration job in the dbt Cloud UI, you can take advantage
of dbt's Slim CI feature and even use webhooks to run jobs automatically when a pull
request is open.
3. Build reusable and modular code using Jinja and macros:
dbt (data build tool) allows you to establish macros and integrate other functions outside of
SQL’s capabilities for advanced use cases. Macros in Jinja are pieces of code that can be
used multiple times. Instead of starting at the raw data with every analysis, analysts instead
build up reusable data models that can be referenced in subsequent work.
Instead of repeating code to create a hashed surrogate key, create a dynamic macro with
Jinja and SQL to consolidate the logic in one spot using dbt.
4. Maintain data documentation and definitions within dbt as they build and develop
lineage graphs:
Data documentation is accessible, easily updated, and allows you to deliver trusted data
across the organization. dbt (data build tool) automatically generates documentation around
descriptions, model dependencies, model SQL, sources, and tests. dbt creates lineage
graphs of the data pipeline, providing transparency and visibility into what the data is
describing, how it was produced, as well as how it maps to business logic.
Lineage is automatically generated for all your models in dbt. This has saved teams
numerous hours in manual documentation time.
There is no need to host an orchestration tool when using dbt Cloud. It includes a feature
that provides full autonomy with scheduling production refreshes at whatever cadence the
business wants.
Scheduling is simplified in the dbt Cloud UI. Just give it directions on what time you want a
production job to run, and it will take it from there.
dbt (data build tool) comes prebuilt with unique, not null, referential integrity, and accepted
value testing. Additionally, you can write your own custom tests using a combination of Jinja
and SQL. To apply any test on a given column, you simply reference it under the same
YAML file used for documentation for a given table or schema. This makes testing data
integrity an almost effortless process.
Simple example of applying tests on the primary key for a table in a project.
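As a minimal sketch of what that YAML can look like (model and column names are illustrative):
version: 2
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null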
Before learning dbt (data build tool), there are three pre-requisites that we recommend:
1. SQL: Since dbt uses SQL as its core language to perform transformations, you must
be proficient in using SQL SELECT statements. There are plenty of courses online
available if you don’t have this experience, so make sure to find one that gives you
the necessary foundation to begin learning dbt.
2. Modeling: Like any other data transformation tool, you should have
some strategy when it comes to data modeling. This will be critical for re-usability of
code, drilling down, and performance optimization. Don’t just adopt the model of your
data sources, we recommend transforming data into the language and structure of
the business. Modeling will be essential to structure your project and find lasting
success.
3. Git: If you are interested in learning how to use dbt Core, you will need to be
proficient in Git. We recommend finding any course that covers the Git Workflow, Git
Branching, and using Git in a team setting. There are lots of great options available
online, so explore and find one that you like.
1. The dbt Labs Free dbt Fundamentals Course: This course is a great starting point
for any individual interested in learning the basics of using dbt (data build tool).
This covers many critical concepts like setting up dbt, creating models and tests,
generating documentation, deploying your project, and much more.
2. The “Getting Started Tutorial” from dbt Labs: Although there is some overlap
with concepts from the fundamentals course above, the “getting started tutorial” is a
comprehensive hands-on way to learn as you go. There are video series offered for
both using dbt Core and dbt Cloud. If you really want to dive in, you can find a
sample dataset from online to model out as you go through the videos. This is a
great way to learn how to use dbt (data build tool) in a way that will directly reflect
how you would build out a project for your organization.
3. Join the dbt Slack Community: This is an active community of thousands of
members that range from beginner to advanced. There are channels like #learn-on-
demand and #advice-dbt-for-beginners that will be very helpful for a beginner to ask
questions as they go through the above resources.
dbt (data build tool) simplifies and speeds up the process of transforming data and building
data pipelines. Now is the time to dive in and learn how to use it to help your organization
curate its data for better decision making.
A data model organizes different data elements and standardizes how they relate to one
another and real-world entity properties. So logically then, data modeling is the process of
creating those data models.
Data models are composed of entities, and entities are the objects and concepts whose
data we want to track. They, in turn, become tables found in a database. Customers,
products, manufacturers, and sellers are potential entities.
Each entity has attributes—details that the users want to track. For instance, a customer’s
name is an attribute.
With that out of the way, let’s check out those data modeling interview questions!
1. What Are the Three Types of Data Models?
Physical data model - This is where the framework or schema describes how data is physically stored in the database.
Conceptual data model - This model focuses on the high-level, user's view of the data in question.
Logical data models - They straddle between physical and conceptual data models, allowing the logical representation of data to exist apart from the physical storage.
2. What is a Table?
A table consists of data stored in rows and columns. Columns, also known as fields, show
data in vertical alignment. Rows also called a record or tuple, represent data’s horizontal
alignment.
3. What is Normalization?
Database normalization is the process of designing the database in such a way that it
reduces data redundancy without sacrificing integrity.
Normalization also helps ensure sensible relationships between the tables, in addition to the data residing in the tables.
ERD stands for Entity Relationship Diagram and is a logical entity representation, defining
the relationships between the entities. Entities reside in boxes, and arrows symbolize
relationships.
A surrogate key is an artificial, system-generated key with numerical attributes that is used in place of the natural primary key. This surrogate
key replaces natural keys. Instead of having primary or composite primary keys, data
modelers create the surrogate key, which is a valuable tool for identifying records,
building SQL queries, and enhancing performance.
8. What Are the Critical Relationship Types Found in a Data Model? Describe Them.
The main relationship types are:
Identifying. A relationship line normally connects parent and child tables. But if a
child table’s reference column is part of the table’s primary key, the tables are
connected by a thick line, signifying an identifying relationship.
This is a data model that consists of all the entries required by an enterprise.
10. What Are the Most Common Errors You Can Potentially Face in Data Modeling?
These are the errors most likely encountered during data modeling.
Building overly broad data models: If the number of tables runs higher than 200, the data
model becomes increasingly complex, increasing the likelihood of failure
Unnecessary surrogate keys: Surrogate keys must only be used when the natural
key cannot fulfill the role of a primary key
The purpose is missing: Situations may arise where the user has no clue about
the business’s mission or goal. It’s difficult, if not impossible, to create a specific
business model if the data modeler doesn’t have a workable understanding of the
company’s business model
The two design schemas are called the Star schema and the Snowflake schema. The Star schema
has a fact table centered with multiple dimension tables surrounding it. A Snowflake
schema is similar, except that the level of normalization is higher, which results in the
schema looking like a snowflake.
A data mart is the most straightforward form of data warehousing and is used to focus on one
functional area of any given business. Data marts are a subset of data warehouses oriented
to a specific line of business or functional area of an organization (e.g., marketing, finance,
sales). Data enters data marts by an assortment of transactional systems, other data
warehouses, or even external sources.
Data sparsity defines how much data we have for a model’s specified dimension or entity. If
there is insufficient information stored in the dimensions, then more space is needed to
store these aggregations, resulting in an oversized, cumbersome database.
Entities can be broken down into several sub-entities or grouped by specific features. Each
sub-entity has relevant attributes and is called a subtype entity. Attributes common to every
entity are placed in a higher or super level entity, which is why they are called supertype
entities.
Metadata is defined as “data about data.” In the context of data modeling, it’s the data that
covers what types of data are in the system, what it’s used for, and who uses it.
No, it’s not an absolute requirement. However, denormalized databases are easily
accessible, easier to maintain, and less redundant.
19. What's the Difference Between Forward and Reverse Engineering, in the Context of Data Models?
Forward engineering is a process where Data Definition Language (DDL) scripts are
generated from the data model itself. DDL scripts can be used to create databases.
Reverse Engineering creates data models from a database or scripts. Some data modeling
tools have options that connect with the database, allowing the user to engineer a database
into a data model.
20. What Are Recursive Relationships, and How Do You Rectify Them?
Recursive relationships happen when a relationship exists between an entity and itself. For
instance, a doctor could be in a health center’s database as a care provider, but if the
doctor is sick and goes in as a patient, this results in a recursive relationship. You would
need to add a foreign key to the health center’s number in each patient’s record.
22. Why Are NoSQL Databases More Useful than Relational Databases?
They have a dynamic schema, which means they can evolve and change as
quickly as needed
NoSQL databases have sharding, the process of splitting up and distributing data
to smaller databases for faster access
They offer failover and better recovery options thanks to the replication
23. What Is a Junk Dimension?
This is a grouping of low-cardinality attributes like indicators and flags, removed from other
tables, and subsequently “junked” into an abstract dimension table. They are often used to
initiate Rapidly Changing Dimensions within data warehouses.
24. If a Unique Constraint Gets Applied to a Column, Will It Generate an Error If You
Attempt to Place Two Nulls in It?
No, it won't, because null values are never equal. You can put in numerous null values
in a column and not generate an error.
Do You Want Data Modeling Training?
I hope these Data modeling interview questions have given you an idea of the kind of
questions that can be asked in an interview. So, if you're intrigued by what you've read about
data modeling and want to know how to become a data modeler, then you will want to
check the article that shows you how to become one.
But if you’re ready to accelerate your career in data science, then sign up for
Simplilearn’s Data Scientist Course. You will gain hands-on exposure to key technologies,
including R, SAS, Python, Tableau, Hadoop, and Spark. Experience world-class training by
an industry leader on the most in-demand Data Science and Machine learning skills.
The program boasts a half dozen courses, over 30 in-demand skills and tools, and more
than 15 real-life projects. So check out Simplilearn’s resources and get that new data
modeling career off to a great start!
In the world of data analytics, where information reigns supreme, businesses rely on robust
tools to manage and analyze their data effectively.
One such tool that has gained remarkable traction is dbt, or Data Build Tool. With its ability
to transform and analyze data efficiently, dbt has become a game-changer in the field of
data engineering and analysis.
To harness the power of dbt, organizations need skilled professionals who can navigate its
intricacies and unleash its capabilities.
As a result, dbt-related job interviews have become increasingly critical for both employers
and candidates.
If you're preparing for a dbt-related job interview or seeking to evaluate candidates' dbt
skills, it's important to ask the right questions.
To help you with that, we have compiled a list of essential dbt interview questions for every
level. These questions cover a range of topics and will assess the candidate's knowledge
and understanding of dbt's core concepts, features, and best practices.
Question 1.1: What is dbt, and how does it differ from traditional ETL/ELT tools?
dbt stands for Data Build Tool and is designed to transform, test, and document
data. Unlike traditional ETL/ELT tools, dbt focuses on transforming data within a data
warehouse, utilizing SQL and version control systems.
Answer:
dbt (data build tool) is an open-source tool that enables analysts and data engineers to
transform, test, and manage data in their data warehouses. It uses SQL and YAML
configuration files to define transformations, models, and tests, making it easy to build and
maintain data pipelines.
To install and set up dbt (data build tool), follow these steps:
1). Install Python: Ensure Python is installed on your system. dbt requires Python 3.6 or
later.
2). Install dbt: Open your command line interface (CLI) and run the following command to
install dbt using pip, which is the Python package installer:
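For example (you also need the adapter package for your warehouse; dbt-postgres below is just an illustration):
pip install dbt-core dbt-postgres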
4). Initialize the project: Run the following command to initialize your dbt project:
dbt init
5). Configure your project: Open the dbt_project.yml file in your project directory and modify
it according to your project needs. This file contains project-level configurations such as the
target database, connection information, and plugins.
6). Set up your database connection: Open the profiles.yml file in your project directory and
configure your database connection details, including the database type, host, port,
username, password, and database name.
7). Test the setup: Run the following command to test your dbt installation and project
setup:
dbt debug
If everything is set up correctly, you should see debug information about your dbt project
and database connection.
With this, you have now installed and set up dbt. You can start using dbt to build, test, and
deploy your data models.
Question 1.3: What is the purpose of dbt models?
Models in dbt are SQL scripts that define transformations or aggregations on the
data. They can be used to create new tables, views, or materialized views, and they
serve as building blocks for data analysis.
Question 1.4: Explain the concept of "sources" and "seeds" in dbt.
Sources refer to external data tables that are used as inputs to dbt models. Seeds,
on the other hand, are a way to define static or reference data that can be used
within the dbt project.
2). Intermediate Level.
dbt allows for easy schema migrations by using the concept of "ref" and "source" in
model definitions. It tracks changes to models and supports incremental changes to
the data warehouse schema.
Question 2.2: What are the different types of dbt hooks, and when would you use
them?
dbt hooks are SQL statements that are executed at specific points during the dbt
lifecycle. They can be pre-hooks (before a model is built), post-hooks (after a model
is built), or on-run-start / on-run-end hooks (run at the start and end of a dbt invocation).
Candidates should explain use cases for each hook type.
Question 2.3: How do you handle incremental or time-based data loads in dbt?
Incremental data loads can be handled with dbt's incremental materialization, which (depending on the configured incremental strategy, such as merge) compares source data with the target table to perform inserts, updates, or upserts based on specific columns.
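A minimal sketch of an incremental model using is_incremental() (the source, column, and key names are illustrative):
-- models/stg_events.sql (hypothetical model)
{{ config(materialized='incremental', unique_key='id') }}

select *
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what is already loaded
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}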
Question 2.4: Can you explain how dbt macros work?
Macros in dbt are reusable pieces of SQL code that can be shared across multiple
models. They help in simplifying complex logic, promoting code reusability, and
adhering to best practices.
3).Advanced Level.
Question 3.1: How can you optimize dbt performance?
Optimizing dbt performance is crucial for efficient data transformation. Here are a few
strategies to improve dbt's performance:
Incremental models: Utilize incremental models to only process and transform new
or changed data. This reduces unnecessary processing and improves overall
performance.
Caching: Configure dbt's caching feature to store the results of previously executed
models. This helps avoid repetitive computations and speeds up subsequent runs.
Materialized views: Leverage materialized views to precompute and store the results
of complex or frequently used queries. Materialized views provide faster access to
aggregated or derived data.
Query optimization: Analyze and optimize the SQL queries used in dbt models.
Consider indexing columns used for joins and filtering conditions, optimizing
subqueries, and using appropriate query techniques based on the underlying
database.
By implementing these performance optimization techniques, you can significantly enhance
the speed and efficiency of dbt transformations.
Question 3.2: What is the importance of testing in dbt, and how would you write tests
for dbt models?
The importance of testing in dbt lies in ensuring the accuracy, reliability, and quality
of data transformations. Testing helps validate data integrity, compliance with
business rules, and prevention of regressions. To write tests for dbt models, you can
use the built-in testing framework provided by dbt, utilizing the test macro to define
tests based on specific requirements such as column presence, data types,
relationships, or values.
Question 3.3:Can you describe the process of integrating dbt with a version control
system?
Integrating dbt with a version control system (VCS) allows for effective collaboration, code
management, and tracking of changes in your dbt project.
Set up a version control repository: Choose a VCS platform (e.g., Git, GitHub,
GitLab) and create a new repository to store your dbt project's code.
Initialize dbt as a Git repository: Navigate to your dbt project's root directory in your
command-line interface or terminal.
Run the following commands:
git init
git add .
git commit -m "Initial commit"
Connect your local repository to the remote repository: Link your local Git repository
to the remote repository you created on the VCS platform.
Run the following command, replacing <remote-repository-url> with the URL of your remote repository:
git remote add origin <remote-repository-url>
Push your local repository to the remote repository: Upload your local dbt project
code to the remote repository using the following command:
git push -u origin master
Collaborate and manage changes: With the integration complete, you can now
collaborate with your team on the dbt project. Each team member can clone the
repository, make changes in their local environment, and use Git commands (git add,
git commit, git push) to push their changes to the remote repository.
Branching and pull requests: Utilize Git branching strategies to work on separate
features or experiments. When ready to merge changes, team members can create
pull requests on the VCS platform, allowing for code review and seamless integration
of changes into the main branch.
By integrating dbt with a version control system, you establish a structured and
collaborative development environment, enabling effective teamwork, change tracking, and
the ability to roll back changes if necessary.
Question 3.4: Have you worked with dbt packages? Explain their purpose and how to
use them.
dbt packages are reusable collections of dbt code, such as models, macros, and
tests, that can be shared and used across projects. Candidates should discuss how
to install, use, and create dbt packages.
More Practice Questions.
1). What are the benefits of using dbt?
2). What are the different types of dbt models?
3). How do you write a dbt model?
4). How do you run dbt?
5). How do you use dbt to handle data quality issues?
6). How do you use dbt to manage data lineage?
7). How do you use dbt to deploy changes to production?
8). How do you use dbt to test your data pipelines?
9). How do you use dbt to collaborate with other data engineers?
10). How do you use dbt to create custom macros?
11). How do you use dbt to integrate with other data tools?
12). How do you use dbt to automate your data workflow?
13). How do you use dbt to scale your data engineering efforts?
14). How do you use dbt to create a data-driven culture?
These are just a few examples of essential dbt interview questions. The specific questions
you will be asked will depend on the role you are interviewing for and the experience level
of the interviewer. However, these questions should give you a good starting point for
preparing for your interview.
In addition to these technical questions, you may also be asked behavioral questions about
your experience with dbt.
These questions will assess your skills and abilities in areas such as collaboration,
communication, and problem-solving. Be sure to practice answering these types of
questions as well.
If you are getting started with dbt, here are some resources you might find helpful: