What Is Analytics Engineering


The Analytics Engineering Guide



What is analytics engineering?

CLAIRE CARROLL
Originally published on 2019-10-16

Analytics engineers provide clean data sets to end users, modeling data in a way that
empowers end users to answer their own questions. Here are the market trends that
gave rise to the newest role on modern data teams.

A year ago, I was preparing a presentation for an event and the title slide asked me to fill
in my role. I had been hired as a “Data Analyst”, and when I started the role, I spent my
time doing normal data analyst things. I pulled data for finance and marketing, analyzed
trends and generated insights, and spent lots of time in Excel and Looker.

But my role had been changing dramatically. Finance and marketing were able to run
their own reports. So a normal day for me involved preparing data for analysis by writing
transformation and testing code, and writing really good documentation. My tools were
no longer Excel and Looker; they were iTerm, GitHub, and Atom.

Was I still a data analyst?

I left the slide blank for the moment, and just before the event, I filled in: “Claire Carroll –
Data Something.”

Since then, the industry has begun to adopt a title for what I was attempting to describe
– analytics engineer.

What is an analytics engineer?


Analytics engineers provide clean data sets to end users, modeling data in a way that
empowers end users to answer their own questions. While a data analyst spends their
time analyzing data, an analytics engineer spends their time transforming, testing,
deploying, and documenting data. Analytics engineers apply software engineering best
practices like version control and continuous integration to the analytics code base.
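To make "transforming" concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The raw_orders table, its columns, and the sample rows are all hypothetical, invented for illustration; a real team would typically write the SELECT statement as a model in a tool like dbt rather than wrapping it in Python.

```python
import sqlite3


def load_raw_orders(conn: sqlite3.Connection) -> None:
    """Load hypothetical raw data, messy in the way source systems often are."""
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, status TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            (1, " Completed", 1250),
            (1, " Completed", 1250),  # exact duplicate row
            (2, "RETURNED", 800),
        ],
    )


def build_orders_model(conn: sqlite3.Connection) -> None:
    """Create a cleaned `orders` view on top of the raw table."""
    conn.execute(
        """
        CREATE VIEW orders AS
        SELECT DISTINCT                          -- drop exact duplicates
            id                   AS order_id,    -- clearer column name
            LOWER(TRIM(status))  AS status,      -- normalize casing and spaces
            amount_cents / 100.0 AS amount_usd   -- cents to dollars
        FROM raw_orders
        """
    )


def demo() -> list:
    """Build the model in memory and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    load_raw_orders(conn)
    build_orders_model(conn)
    return conn.execute(
        "SELECT order_id, status, amount_usd FROM orders ORDER BY order_id"
    ).fetchall()
```

Calling demo() returns two rows instead of three, with lower-case statuses and dollar amounts. The point of the sketch is that the cleanup lives in one version-controlled definition instead of being repeated in every report.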

When did analytics engineering become a thing?

The traditional data team


If you were on a “traditional data team” pre-2012, your first data hire was probably a data
engineer. You needed this person to build your infrastructure: extract data from the
Postgres database and SaaS tools that ran your business, transform that data, and then
load it into your data warehouse.
You would then hire a data analyst to build dashboards and reports on top of this data.
Analysts, like me, would maintain a mess of SQL files with names like
monthly_revenue_final.sql, or maybe just bookmark their queries in a SQL web editor.
Often we would need to supplement data in the warehouse with fancy Excel work.

The people consuming the data–CEOs, Marketing VPs, CFOs–would receive monthly
reports, request special analysis as needed, and send analysts a never-ending stream of
requests to “segment by this” or “cut by that” or “hey, we’ve updated our definition of
‘account’”.

Being a data analyst was a hard and thankless job, and it didn’t have a ton of
leverage. Because of this, it was often a junior role, one where you “did your time” and
then moved on to something else.

What happened to the traditional data team?


Since 2012, there have been huge changes in the data tooling landscape:

Cloud-based data warehouses (Redshift, followed by BigQuery and Snowflake) made
data storage and processing affordable and fast.
Data pipeline services (e.g., Stitch, Fivetran) turned data extraction into work that took
only a few clicks.
Business intelligence (BI) tools (e.g., Looker, Mode, Chartio) increased stakeholders’
ability to self-serve.

By 2016, it had never been easier to get data into a warehouse in a raw form, and for
stakeholders to build reports on top of the data.

As data tools changed, so did the people who used them. People who weren’t on data
teams began developing data literacy. This was good: business users wanted to self-
serve and be data-driven. The downside was that these people often knew just enough
SQL to be dangerous. If you’ve ever been to a meeting where two executives have
different numbers for the same metric, you’ve experienced the result of this.
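Here is a small sketch of how that meeting happens, again using Python's sqlite3 module in place of a warehouse. The orders table, the two team queries, and the revenue view are hypothetical examples, not taken from any real company. Both queries are reasonable SQL; they simply encode different definitions of "revenue".

```python
import sqlite3

# Hypothetical orders: two completed, one returned.
ORDERS = [
    (1, "completed", 100.0),
    (2, "completed", 50.0),
    (3, "returned", 30.0),
]


def make_conn() -> sqlite3.Connection:
    """Create an in-memory database holding the sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, status TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    return conn


def marketing_revenue(conn: sqlite3.Connection) -> float:
    """One ad hoc definition: every order counts toward revenue."""
    return conn.execute("SELECT SUM(amount_usd) FROM orders").fetchone()[0]


def finance_revenue(conn: sqlite3.Connection) -> float:
    """Another ad hoc definition: returned orders are excluded."""
    return conn.execute(
        "SELECT SUM(amount_usd) FROM orders WHERE status = 'completed'"
    ).fetchone()[0]


def define_revenue_model(conn: sqlite3.Connection) -> None:
    """Write the agreed definition down once, as a shared view."""
    conn.execute(
        """
        CREATE VIEW revenue AS
        SELECT SUM(amount_usd) AS revenue_usd
        FROM orders
        WHERE status = 'completed'
        """
    )


def shared_revenue(conn: sqlite3.Connection) -> float:
    """What every team queries once the shared view exists."""
    return conn.execute("SELECT revenue_usd FROM revenue").fetchone()[0]
```

With the sample rows, the first query reports 180.0 and the second 150.0. Once the definition lives in a single view that everyone selects from, the disagreement has nowhere to come from.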

The solution: transform the raw data into a shape that’s ready for analytics. At the
time, there were only two widely used options:

Use Looker’s Persistent Derived Tables
Get a data engineer involved

The first was easy enough for anyone with SQL skills and a Looker license to manage, but
created a host of maintenance issues. The second meant waiting in a data engineering
queue that could take…a long time.

This is when dbt entered the market.

The modern data team


dbt is the transformation layer built for modern data warehousing and ingestion tools.
Built around SQL, dbt puts the transformation layer firmly within the domain of data
analysts.

Today, if you’re a “modern data team”, your first data hire will be someone who
ends up owning the entire data stack. This person can set up Stitch or Fivetran to start
ingesting data, keep a data warehouse tidy, write complex data transformations in SQL
using dbt, and build reports on top of a clean data layer in Looker, Mode, Redash, etc.

This job is neither data engineering, nor analysis. It’s somewhere in the middle, and it
needed a new title. Starting in 2018, we and a few of our friends in the Locally Optimistic
community started calling this role the analytics engineer.

Analytics engineers deliver well-defined, transformed, tested, documented, and
code-reviewed data sets. Because of the high quality of this data and the associated
documentation, business users are able to use BI tools to do their own analysis while
getting reliable, consistent answers.

It turns out, your company can get pretty far with a single analytics engineer working as
a data team of one supporting a whole business. But for those companies that need a
larger data team, how does this team structure scale? Do you simply hire another
analytics engineer? Or do you diversify?

In our experience, we see team members start to become more specialized, with roles
that align more closely with those that we started with. Depending on your needs, your
next hire may be a data engineer or a data analyst.
Here’s how I think about the different roles on modern data teams in larger
organizations:

The lines between these roles are blurry – some analytics engineers might spend time
doing analyst work like deep dives, while others might be comfortable writing
production-level Python code but realize doing so often isn’t the highest leverage use of
their time.

The term “analytics engineer” is pretty new, and a lot of people doing analytics
engineering work don’t go by this title (I didn’t a year ago!). So how do you know if you’re
an analytics engineer?

On the surface, you can often spot an analytics engineer by the set of technologies they
are using (dbt, Snowflake/BigQuery/Redshift, Stitch/Fivetran). But deeper down, you’ll
notice they are fascinated by solving a different class of problems than the other
members of the data team. Analytics engineers care about problems like:

Is it possible to build a single table that allows us to answer this entire set of business
questions?
What is the clearest possible naming convention for tables in our warehouse?
What if I could be notified of a problem in the data before a business user finds a
broken chart in Looker?
What do analysts or other business users need to understand about this table to be
able to quickly use it?
How can I improve the quality of my data as it’s produced, rather than cleaning it
downstream?
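The "notify me before a chart breaks" question can be sketched as a handful of SQL checks, each written so that it selects the rows that violate an expectation: a check passes when its query returns nothing. This mirrors the general idea behind data tests in tools like dbt, but the code below is only an illustration with Python's sqlite3 module, and the table, the check names, and the sample rows are hypothetical.

```python
import sqlite3

# Each check is a query that returns the rows breaking an expectation.
CHECKS = {
    "order_id is never null": (
        "SELECT * FROM orders WHERE order_id IS NULL"
    ),
    "order_id is unique": (
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
    ),
    "status has an accepted value": (
        "SELECT * FROM orders WHERE status NOT IN ('completed', 'returned')"
    ),
}


def make_sample_conn() -> sqlite3.Connection:
    """Hypothetical data with a duplicated id and an unexpected status."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "completed"), (2, "returned"), (2, "shipped")],
    )
    return conn


def run_checks(conn: sqlite3.Connection) -> list:
    """Return the names of the checks that found at least one failing row."""
    return [
        name
        for name, query in CHECKS.items()
        if conn.execute(query).fetchall()
    ]
```

Running run_checks(make_sample_conn()) flags the uniqueness and accepted-value checks while the not-null check passes. Scheduled after each data load, a list like this is something the data team can be alerted on before anyone opens a dashboard.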

Where is this headed?


At a recent NYC meetup where 100 data professionals gathered to talk about analytics
engineering, one speaker compared the analytics engineer to a librarian—the person
who curates an organization’s data and acts as a resource for those who want to make use of it. I
like this metaphor: the analytics engineer is a steward of organizational knowledge, not a
researcher answering a specific question. The analytics engineer curates the catalog
so that the researchers can do their work more effectively.

The tooling, the practice, and the organizational role of the analytics engineer are very
much evolving in real time. This title didn’t exist a year ago. Today, when we made this
topic the subject of a meetup, over 100 attendees turned up, and we’re seeing more
and more job postings for this title every month. So: there’s a ton of traction in the
industry for this idea and this role, but we’re all very much figuring this out together in
real time.

While I may not have had the right words to describe my role a year ago, I knew dozens
of other individuals within the dbt Community whose roles aligned with mine, and who
had incredibly intelligent opinions on the space. That’s why the dbt community is so
valuable to me, personally, and to all of its members. All of us, together, are inventing a
new thing.

