Cloud Data is Just Data (in the Cloud)
The Tri-Domain+ Model from Cloud Data Warehousing, Volume I

One key message from “Cloud Data Warehousing—Volume I: Architecting Data Warehouse, Lakehouse, Mesh, and Fabric” (available now) is that the data and information managed and processed in any cloud data warehousing pattern (cloud data warehouse, data lakehouse, data mesh, or data fabric) have fundamentally the same characteristics as the data and information you have known and loved over the past half century. I introduced the different data/information types a decade ago and have seen no need to change or expand their definitions as we’ve moved ever faster to the cloud. There are three (and a half) different domains or types of data/information. More on that in a moment.

First, as hinted at in my previous article, a word on the difference between data and information. Many in our industry think that data is the source from which information is distilled, in order to gain insights, make decisions, and take action. In fact, that is only half the story, and the emergence of the phrase “data driven” has hugely worsened the problem. Life starts with information.

Information is what people create, either directly (by typing, speaking, or making a video, for example) or indirectly (such as by friending someone on Facebook or researching products on Amazon). It is their means of expressing themselves and of communicating with and relating to others. Data, on the other hand, or better, naked data, is a subset of information from which context has been stripped to the maximum extent. In the simplest terms:

Information - Context = Naked Data

I’m adding “naked” here to distinguish it from the word “data” on its own, which is often used to mean different things, sometimes even to stand for information. As I often do myself…

You see from the equation that starting with information and stripping away the context (and hopefully storing it somewhere accessible and usable) gets you to data. The process is commonplace and reasonable. It’s called modeling and it creates what’s needed for calculations and computing.
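The equation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Information`, `strip_context`, and `restore`, and the sample order record, are hypothetical and do not come from the book. It shows only the core idea: modeling separates the bare values from their context, and the context is kept rather than thrown away.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Information:
    """Hypothetical container: values together with the context that gives them meaning."""
    values: Dict[str, object]   # what was recorded
    context: Dict[str, str]     # what each value means, who created it, and so on


def strip_context(info: Information) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Information - Context = Naked Data.

    Returns the naked data and, separately, the stripped context,
    so that the context can be stored somewhere accessible and usable.
    """
    return dict(info.values), dict(info.context)


def restore(naked_data: Dict[str, object], context: Dict[str, str]) -> Information:
    """Naked Data + Context = Information (the equation read in reverse)."""
    return Information(values=dict(naked_data), context=dict(context))


# Illustrative sample record (invented for this sketch).
order = Information(
    values={"amount": 42.5, "quantity": 3},
    context={
        "amount": "order total in EUR",
        "quantity": "number of items ordered",
        "source": "entered by a customer on a web form",
    },
)

naked_data, context = strip_context(order)
print(naked_data)
print(restore(naked_data, context) == order)
```

On its own, `naked_data` is well suited to calculation but says nothing about currency, meaning, or origin. Only when it is recombined with the stored context does it become information again.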

As seen in the naming in the above figure, information is sourced from people, always. The context that’s stripped away is also informational in nature: context-setting information. I like to call it CSI because it is indeed a bit like crime scene investigation, where the underlying circumstances and context are revealed. What it should not be called (although, unfortunately, it often is) is metadata. It’s not data, nor is it only about data.

So, what about the data? Process-mediated data and machine-generated data, as the names show, come from two very different sources, and have many distinct characteristics. The former comes from business processes, designed and built to run and manage the business. It is often complex in structure and demands careful management and governance; after all, it is the legal foundation of the business. We have been creating it since the earliest days of computing in operational systems and copying it into data warehouses since the 1980s. The latter type of data is generated by machines. It is much simpler in structure and both fast and voluminous. Manufacturing, telecoms, and others have been creating it for decades, but it has acquired a whole new level of fame with the Internet of Things.

Find out more in my on-line seminar covering the material of this book and volume II over three mornings, 21-23 June. Book at Adept Events.
