Faster Insights From Faster Data
Technologies and practices for better speed to insight
BEST PRACTICES REPORT, Q1 2020
By David Stodder
Contents
Executive Summary
Speed to Insight: About More than Just “Fast”
Data Literacy and the Analytics Culture
Recommendations
Research Co-sponsor: Denodo

© 2020 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Email requests or feedback to [email protected].
By banding together, sponsors can validate a new market niche and educate organizations
about alternative solutions to critical problems or issues. To suggest a topic that meets these
requirements, please contact TDWI senior research directors Fern Halper ([email protected]),
Philip Russom ([email protected]), and David Stodder ([email protected]).
Acknowledgments
TDWI would like to thank many people who contributed to this report. First, we appreciate the
many professionals who responded to our survey, especially those who agreed to our requests
for phone interviews. Second, our report sponsors, who diligently reviewed outlines, survey
questions, and report drafts. Finally, we would like to recognize TDWI’s production team: James
Powell, Peter Considine, Lindsay Stares, and Rod Gosser.
Sponsors
Ascend.io, Denodo, Matillion, SAS, and Wyn Enterprise by GrapeCity sponsored the research and
writing of this report.
Research Methodology and Demographics

Survey Methodology. In September 2019, TDWI sent an invitation via email to business and IT professionals in our database, asking them to complete an internet-based survey. The invitation was also posted online and in publications from TDWI and other firms. The survey collected responses from 147 respondents. Not all respondents completed every single question; however, all responses are valuable and so are included in this report’s sample. This explains why the number of respondents varies per question.

Research Methods. In addition to the survey, TDWI conducted telephone interviews with IT and business executives and managers, technical users, and BI, analytics, AI, and data management experts. TDWI also received briefings from vendors that offer products related to the topics addressed in this report.

Survey Demographics. Just over one-quarter of survey respondents are business or IT executives and VPs (27%). The second-largest percentages are business or data analysts and data scientists (23%) and developers and data, application, or enterprise architects (23%). Line-of-business (LOB) managers and business sponsors account for 12% of the respondent population. “IT-other” titles account for the same percentage (12%).

Financial services is the largest industry group (15%), followed by consulting and professional services (13%), healthcare (9%), manufacturing (noncomputers) (8%), and retail, wholesale, and distribution (7%). Respondents from education and software industries each accounted for 6%, and government respondents for 4%, the same as from transportation and logistics (4%).

Industry
Financial services: 15%
Consulting/professional services: 13%
Healthcare: 9%
Manufacturing (noncomputers): 8%
Retail/Wholesale/Distribution: 7%
Education: 6%
Software: 6%
Government: 4%
Transportation/logistics: 4%
Other: 28%
(“Other” consists of multiple industries, each represented by 3% of respondents or less.)

Geography
United States: 65%
Europe: 14%
Canada: 6%
Central or South America: 5%
Africa: 3%
Australia/New Zealand: 3%
Asia/Pacific Islands: 2%
South Asia (India and Pakistan): 1%
Middle East: 1%
Executive Summary
This TDWI Best Practices Report shows strong interest in achieving faster insights to seize business opportunities, but points out where organizations need to address technology and practice weaknesses.

Organizations today place a high priority on fact-based, data-driven decision making. This makes speed to insight a competitive advantage. If a retailer can analyze data to uncover a trend in customer preferences before other retailers in the marketplace, it can gain an edge, potentially delivering higher market share, customer loyalty, and profitability. If an insurer or government healthcare agency can use predictive models to detect a fraud scheme before it has a chance to do significant damage, it can save costs and avoid public embarrassment.

There are many more cases where faster insights can be beneficial. However, achieving faster insights can only happen if delays and bottlenecks that exist throughout data life cycles are addressed using better practices and modern technologies. This TDWI Best Practices Report examines where organizations are coming up against barriers: to getting relevant data from sources into the right condition for analytics, to developing artificial intelligence (AI) programs such as machine learning to discover insights, and to delivering insights to the array of users who need them in time to solve business problems.
Some of the challenges relate to how organizations put together project development teams and
determine deliverables. Setting project objectives is a challenge; TDWI research finds that just
9% of organizations surveyed regard themselves as very successful in identifying value measures
and quantifiable objectives (see Figure 4 in this report). As projects move toward deliverables,
less than half of those surveyed regard their organizations as either good or excellent at testing
prototypes and developing proofs of concept. To address these weaknesses, organizations are
implementing agile, DataOps, and other methods to help them better organize projects and move
faster to create value.
Technology advances are also key. This report discusses how organizations can reduce
latency in data preparation, transformation, and development of data pipelines. It details how
organizations could be using data catalogs, metadata repositories, and data virtualization
more effectively, including for governance. With expense and scalability identified by research
participants as two main issues they face, it is not surprising that cloud-based data management,
integration, transformation, and development are popular. This report discusses how moving to
the cloud solves some problems but spotlights other issues, such as governance and finding the
right balance between centralization and self-service environments.
Data itself is getting faster as organizations begin to analyze new sources including streaming
data coming from sensors, websites, mobile devices, geolocations, and more. The report finds
that some organizations use streaming, real-time analytics and AI to automate decisions and
deliver actionable recommendations to users. TDWI recommends that organizations focus on
well-defined objectives and also devote attention to their big-picture strategy to avoid letting
complexity slow innovation.
Speed to Insight: About More than Just “Fast”
Nearly everyone agrees that if business executives, managers, and frontline personnel do
not have to wait for the most current insights—or be forced to use yesterday’s or last month’s
information when they really need the very latest—they could make more timely decisions, seize
fleeting opportunities, and serve customers and partners more effectively. Yet, the rub is that
faster is not better if the data and information are not accurate, complete, or fit for the purpose.
To be sure, for some use cases such as data science, exploratory analytics, and AI, the faster
that users and AI programs can get access to raw, live data that’s just been recorded or streamed
in real time, the better. However, for most users and applications, our research finds that data
quality, accuracy, completeness, and relevance are more important than just pure speed. Users
want to know if they can trust the data; they often need to know where it came from, how it
relates to other data, and how it has been transformed and aggregated. Before the data can flow,
organizations also need to ensure that it is secure and governed in accord with regulations.
This TDWI Best Practices Report examines experiences, practices, and technology trends
that focus on identifying bottlenecks and latencies in the data’s life cycle, from sourcing and
collection to delivery to users, applications, and AI programs for analysis, visualization, and
sharing. Reducing time to insight depends on applying technologies and appropriate practices
that improve matters at each phase in the life cycle. Organizations must take into account the
type of user and their context; “fast” and “real time” can have different meanings depending
on these factors. For some, just getting the data or insights at the time that they need it is the
equivalent of real time (what some used to call “right time”); for others, real time means reducing
latency to the smallest possible interval between the data’s creation and its availability for
analysis and visualization.
Through AI-driven augmentation, BI and analytics solutions are evolving to offer recommendations
about data sets users might explore, visualizations to use, and, ultimately, decisions or actions to
take. Such recommendations can result in faster insights, which can translate into more responsive
and proactive engagement with customers and partners as well as strategies for success in the
marketplace, supply chains, and other business contexts. With the appropriate stack of technologies
to support it, users can experience right-time or actual real-time dashboards in contexts that focus
attention on particular key performance indicators and other metrics. These can be supplemented
with prescriptive, AI-driven recommendations based on the data.
AI and advanced analytics on top of streamed, real-time data feeds enable organizations to spot
trends and patterns, apply predictive insights, and potentially automate responses to situations,
particularly those in fraud detection, securities trading, e-commerce, emergency healthcare, and
population health where instantaneous response is critical. Solutions that employ AI techniques
such as machine learning can help organizations profile, transform, and enrich real-time data
as it flows through pipelines so these steps are faster and more efficient than with traditional
extract, transform, and load (ETL) technologies.
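As a simple illustration of what an automated profiling step in a pipeline does, the following Python sketch (field names invented, not from the report) computes basic quality statistics on records as they pass through; ML-assisted tools build on this kind of bookkeeping to flag anomalies and suggest transformations.

    from statistics import mean

    def profile(records, numeric_field="value"):
        """Compute simple quality statistics for a batch of pipeline records."""
        values = [r.get(numeric_field) for r in records]
        present = [v for v in values if v is not None]
        return {
            "rows": len(records),
            "null_rate": (len(records) - len(present)) / max(len(records), 1),
            "mean": mean(present) if present else None,
        }

    # Example: profile a micro-batch before it moves to the next pipeline step.
    print(profile([{"value": 10.0}, {"value": None}, {"value": 14.0}]))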
AI techniques are being combined with other technologies to accelerate data processing, access,
and interaction. AI is becoming useful for governance by enabling organizations to learn more about new big data faster and by improving tracking to ensure that data use adheres to governance rules and policies. AI’s contribution to data lineage tracking can help with other data
management requirements such as performance and concurrency that demand monitoring of
what data is being used and shared in preparation and pipeline processes.
Data Literacy and the Analytics Culture

Analytics cultures depend on raising the data literacy of individuals in the organization. Users
may have great BI and analytics tools, but if they struggle to understand data, visualizations,
and analytics and cannot effectively interact with and share data and insights, the tools will not
be enough. As part of our research for this report, TDWI asked participants how they would rate
the overall data literacy of users in their organizations (see Figure 1). The highest percentage
say their organizations are “about average” (42%) in terms of users’ ability to consume,
analyze, interact with, share, and discuss data in the course of carrying out their roles and
responsibilities. Nearly the same percentages rate their organizations’ data literacy as “somewhat
high” (22%) and “somewhat low” (24%).
Figure 1. How would you rate the overall data literacy of users in your organization: that is, their ability to consume, analyze, interact with, share, and discuss data in the course of carrying out their roles and responsibilities?

Very high: 7%
Somewhat high: 22%
About average: 42%
Somewhat low: 24%
Very low: 3%
Don’t know: 1%
Leadership, Organizational, and Project Challenges
Growth in data literacy can increase an organization’s overall speed to insight and accelerate
innovation with data. Personnel will be better prepared to contribute to projects for development
of analytics, data-driven applications, and AI. This leads us into the first section of research
results, which focuses on how organizations put together leadership, teams, and project
development processes. Along with data literacy, these issues can either propel or thwart an
organization’s progress toward building value from data.
For this TDWI Best Practices Report, we asked research participants to rate how well stakeholders
in their organizations collaborate to accomplish key steps in the life cycles of BI, analytics, AI,
and data integration and management projects (Figure 2). Only a small percentage gave their
organizations “excellent” ratings for any of the steps. Organizations surveyed appear strongest
in identifying relevant data sources (65% combined excellent and good ratings) and identifying
opportunities to achieve business benefits (61% combined). This suggests a relationship between
the two: that is, stakeholders believe that if they can access and analyze certain data sources
properly, it will lead to better business outcomes.
How would you rate your organization’s collaboration between business stakeholders, IT, developers, data scientists, and analysts to accomplish the following steps in the life cycle of projects for BI, analytics, AI, and data integration and management? (Values shown: excellent / good / fair / poor / don’t know-NA.)

Identify relevant data sources: 15% / 50% / 22% / 11% / 2%
Identify opportunities to achieve business benefits: 12% / 49% / 27% / 12%
Connect analytics to product/service innovation: 11% / 36% / 29% / 21% / 3%

Figure 2. Based on answers from 147 respondents. Ordered by highest combined “excellent” and “good” responses.
Significant but smaller percentages of respondents gave their organizations good or excellent
ratings for providing sustained leadership (47% combined good and excellent) and articulating
strategy to attract project sponsorship (44% combined). Concerted leadership is key to keeping
projects going and dealing with both short-term needs and long-term objectives. Project teams
need to include personnel who can interpret business needs and articulate strategic vision so
leaders sustain sponsorship of projects and provide appropriate funding. Leadership often asks
teams to quantify anticipated return on investment (ROI) or business outcome; however, only
33% of research participants view their organizations as either good or excellent in doing this.
The majority of organizations struggle with getting projects to the point of operationalization.
Less than half see collaboration to drive adoption and operationalization as a strength, with only
10% checking “excellent” and 32% calling it “good.”
Length of development before getting to actionable results varies. In 2017, TDWI asked
research participants about the length of their development, testing, and deployment cycles. We
asked the same question for this report to see if cycles were any faster or slower. The results are
fairly consistent between then and now; the fastest 15% say their cycles are two weeks or less,
which is the same percentage as in 2017; in both, the slowest 20% say their cycles take more than
four months. Both then and now, 17% of research participants say their cycles take 15–30 days.
For cycles that take beyond 30 days, the comparison shows that organizations are becoming a
little faster, with more in the 31–60 day range than previously (18% for this survey compared to
15% in 2017). Overall, however, it appears that the time it takes for development, deployment,
and testing cycles to finish has not appreciably changed.
One factor that may help reduce cycle times is the use of agile, DevOps, and DataOps
methodologies, which will be discussed in greater length in a moment. When we filter the results
to look only at organizations that are using these methodologies, we see that development cycle
times are less varied across the spectrum. The bulk of respondents’ cycles are under 90 days, with
34% of this selected group completing cycles in 30 days or less.
Compared with traditional application development, requirements for BI, analytics, and AI projects typically have less clarity; data scientists, analysts, and business subject matter experts need to
explore data, try different variables and combinations of data sources, and so on. Organizations
are finding that agility is an important attribute of data-intensive development projects—and so,
not surprisingly, agile methods have become popular.
We asked research participants what methodologies, if any, their organizations are using for
BI, analytics, AI, and/or data integration and management projects (Figure 3). Agile methods
are clearly the most popular, with 78% of organizations noting that they use them compared to
waterfall methods (46%).
Agile methods such as kanban have been part of software development culture for nearly 20
years and build on a long history of software engineering efforts to streamline development,
increase collaboration among stakeholders, and improve quality and ROI. About one-quarter
(24%) are using lean, a method related to agile that is inspired by the Toyota Production System
methodology. About half as many (13%) are using process models such as Capability Maturity
Model Integration (CMMI) and ISO 9000 quality management systems standards.
Is your organization using any of the following methodologies for BI, analytics, AI, and/or data
integration and management projects?
Agile 78%
Waterfall 46%
DevOps 40%
Lean 24%
Design thinking 18%
DataOps 15%
Process models (e.g., CMMI, ISO 9000) 13%
Data vault 7%
Other 7%
Figure 3. Based on answers from 136 respondents. Respondents could select all answers that apply.
A considerable percentage of organizations surveyed use DevOps practices (40%), which extend
agile methods to increase automation of repeatable software development tasks and overall
enable faster and more reliable releases. DevOps, like agile, aims at facilitating collaboration
between stakeholders—in this case, between developers (“Dev”) and operations (“Ops”)
personnel such as IT systems engineers, database administrators (DBAs), and network and
security personnel. DevOps and agile practices bring stakeholders together to work in self-
organizing (rather than strictly IT-led) teams that meet continuously to test iterations and
produce deliverables regularly instead of waiting until the end of a waterfall cycle.
Design thinking is helpful for capturing human factors. Almost one-fifth (18%) are using
design thinking, a discipline that builds on methods that have been used by organizations to
learn human factors in customer experiences and spur innovation to improve experiences.
Organizations are applying design thinking to bring greater fulfillment to internal customers,
i.e., users of BI and analytics applications.
Rather than focus on traditional requirements gathering for data access, query needs, and
other technical aspects, design thinking helps development teams understand human and
emotional factors that often have the biggest effect on the success or failure of applications.
Design thinking’s five phases—Empathizing, Defining, Ideating, Prototyping, and Testing—can
complement agile and DevOps methods.1
DataOps for increasing collaboration and agility. DataOps, as the term suggests, picks up
ideas from DevOps to apply them to how an organization’s developers, IT, business, and other
stakeholders can improve collaboration throughout data life cycles. DataOps is a relatively recent
term; Figure 3 shows that currently just 15% of organizations surveyed are currently using this
method. A central problem it addresses is the delays and inefficiency that exist due to poorly
integrated development, transformation (ETL and ELT), data quality, and enrichment steps as
well as governance and other data life cycle phases.
With organizations needing to integrate these complex and interdependent phases into data
pipelines that serve analytics and AI development, eliminating inefficiency and applying
automation is critical. DataOps can offer a framework that enables organizations to get the big
picture of multiple and diverse data life cycles and set in motion continuous improvement cycles
for how they source, profile, transform, and ultimately deliver data for end purposes. DataOps
can also facilitate better stakeholder collaboration and teamwork to achieve outcomes. With
shared principles and approaches, DataOps can complement use of agile methods and the use of
design thinking practices.
• Organizations still have progress to make in setting project objectives, measuring progress,
and organizing themselves to achieve deliverables. Only 9% of organizations surveyed are
very successful in identifying value measures and quantifiable objectives, while 26% are
somewhat successful. Only about one-quarter (23%) are either very or somewhat successful
in scheduling clear-cut deliverables from easiest to most difficult. The majority also fall
short in defining strategy and following it in alignment with corporate objectives; only 38%
are successful at this objective.
1 For more, see TDWI Checklist Report: Using Design Thinking to Unleash Creativity in BI and Analytics Development, online at tdwi.org/checklists.
How successful is your organization in achieving the following objectives in its BI, analytics, AI, data integration, and data management projects? (Values shown: very successful / somewhat successful / average / somewhat unsuccessful / not successful / don’t know-NA.)

Deliver data insights that provide business value: 11% / 40% / 30% / 14% / 4% / 1%
Automate repetitive tasks: 9% / 38% / 26% / 16% / 10% / 1%
Adapt existing analytics workflows: 9% / 34% / 35% / 14% / 7% / 1%
Learn from mistakes and continuously improve: 9% / 31% / 34% / 15% / 9% / 2%
Encourage reuse of queries, models, and workflows: 11% / 28% / 39% / 14% / 7% / 1%
Define and follow strategy that aligns corporate objectives: 9% / 29% / 37% / 14% / 9% / 2%
Collaborate and communicate to share knowledge and feedback: 9% / 26% / 39% / 16% / 9% / 1%
Identify value measures and quantifiable objectives: 9% / 26% / 34% / 22% / 7% / 2%
Know what decisions will be based on the analytics: 6% / 27% / 40% / 19% / 6% / 2%
Version control, test, and monitor quality of analytics models and code: 4% / 29% / 32% / 20% / 12% / 3%
Monitor data collection, analytics modeling, and insight delivery processes: 7% / 25% / 31% / 24% / 11% / 2%
End-to-end process efficiency from data collection to delivery: 4% / 22% / 41% / 21% / 12%
Schedule clear-cut deliverables from easiest to most difficult: 4% / 19% / 40% / 19% / 14% / 4%

Figure 4. Based on answers from 140 respondents. Ordered by highest combined “very successful” and “somewhat successful” responses.
• Almost half of organizations surveyed are successful at automating repetitive tasks (47%
combined very and somewhat successful). Fewer are succeeding at encouraging reuse of
queries, models, and workflows (39% combined). About one-third successfully monitor data
collection, analytics modeling, and insight delivery processes (32% combined). Automation,
reuse, and monitoring processes are factors that can lead to better quality and efficiency.
Only about one-quarter regard their organizations as successful with end-to-end process
efficiency from data collection to delivery (26% combined “very” and “somewhat” successful).
• Adaptability is important in analytics workflows because users may want to add new data
sources or make iterative adjustments as they explore data and evaluate models. About
one-third of respondents (34%) say their organizations are somewhat successful with
adaptability and just 9% are very successful; 35% say they are average.
Technologies and Cloud Services in Use
To get a sense of the state of technology implementation, we asked research participants how
reliant their organizations’ decision makers are currently on various tools, platforms, and cloud-
based services. After spreadsheets, which are always the most commonly used tools, participants
are most reliant on data warehouses on premises; 33% are very reliant and 31% are somewhat
reliant on them (figure not shown).
Organizations are using a variety of data integration and transformation technologies to meet equally varied data timeliness and delivery needs.

About the same percentages rely on supporting ETL, ELT, change data capture (CDC), or data integration systems. About half (51%) rely on data pipelines and data preparation tools. Slightly more than a quarter (28%) say their organizations’ decision makers rely on data virtualization layers or virtual databases. These technologies enable organizations to access data from multiple sources without the delays involved in physically moving and consolidating the data. Data virtualization solutions do not store data; they federate queries for execution at the sources. In contrast to ETL and CDC technologies, which require time for data movement, transformation, and determination of the changed data, virtualization offers real-time data access.
Data orchestration also highlights where automation could improve the efficiency and effectiveness of these data integration and transformation activities and could organize their scheduling to fit business objectives. This is critical as organizations seek to reduce latency and want to get value from near or true real-time data sources.
Our research sees reliance on cloud-based data warehouses as currently less common, perhaps
showing the relative immaturity of this option and reflecting the challenges of migrating to the
cloud. Just 12% of respondents say their decision makers are very reliant on cloud-based data
warehouses and 23% are somewhat reliant on them. Similar percentages of respondents say their
organizations’ decision makers are either very reliant (11%) or somewhat reliant (27%) on cloud-
based BI, online analytical processing (OLAP), or analytics services (provided via software-as-a-
service, i.e., SaaS).
Both IT-centric and departmental BI and analytics remain alive and healthy in organizations
surveyed. Almost two-thirds of participants (63%) say their decision makers are either very
or somewhat reliant on BI reporting and/or an OLAP system managed by central IT. Over half
(54%) say they rely on departmentally managed versions of these systems. About half (49%)
of organizations surveyed say users rely on embedded dashboards or analytics in business
applications, such as CRM, SFA, or ERP.
It is likely that this variety of IT-centric, departmental, and embedded BI and analytics tools
will continue to be part of organizations’ data interaction environments, even as SaaS and
other cloud services are adopted. Organizations should adopt data architectures that balance
centralization requirements for governance and data quality with demand for user and
departmental control of self-service capabilities.
More organizations are using or plan to use data catalogs, glossaries, and metadata
repositories. In a 2016 TDWI Best Practices Report survey, only 17% of research participants said
their organizations used these systems, which can provide efficient and often better centralized
ways of gathering metadata, documenting data definitions, and providing other descriptive,
location, and origination information about the data.2 In the survey for this report, we find that
about double the percentage of organizations surveyed (36%) are using these systems.
A slightly larger percentage (39%) are using master data management (MDM), which is both a
process and technology system for gathering definitions and knowledge about data resources that
are related to higher-level entities such as customers or products. If they are up-to-date, accurate,
and comprehensive, data catalogs, glossaries, metadata repositories, and MDM can speed insight by
making it easier for all types of users to more easily find related data, integrate it, and analyze it.
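A hedged sketch of the underlying idea: a data catalog associates each dataset with searchable metadata (definition, lineage source, ownership, tags) so users can find and trust related data faster. The entries and helper function below are invented for illustration.

    # Minimal, illustrative catalog: each entry records definition, lineage
    # source, owner, and tags. Real catalogs harvest this metadata automatically.
    CATALOG = {
        "sales.orders": {
            "definition": "One row per confirmed customer order",
            "source": "erp.orders",
            "owner": "finance",
            "tags": ["sales", "revenue"],
        },
    }

    def search(term):
        """Return dataset names whose metadata mentions the search term."""
        term = term.lower()
        return [name for name, meta in CATALOG.items()
                if term in name
                or term in meta["definition"].lower()
                or term in meta["tags"]]

    print(search("revenue"))   # ['sales.orders']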
Real-time alerting and analytics are valuable for operational use cases; just over one-third of organizations surveyed say decision makers rely on these technologies.

One-third are using real-time alerting and analytics. Alerting and notification are valuable, particularly in operational use cases where managers and frontline personnel monitoring a process need to know immediately about changes in the data or when situations arise that demand immediate attention. Alerts and notifications may be delivered in the context of metrics, key performance indicators, predictive analytics, or as embedded functionality in business applications and processes to make it easier for personnel to determine what action to take. One of the key challenges with alerts and notifications is making sure they are important, timely, and relevant; otherwise, personnel can suffer “alert fatigue” and stop paying attention to them because it is unclear whether or why they matter.
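One common way to keep alerts important, timely, and relevant is to notify only on statistically unusual values rather than on every change. The following Python sketch (window size and threshold are invented placeholders, not from the report) scores each incoming metric value against a rolling baseline and alerts only on large deviations.

    from collections import deque
    from statistics import mean, stdev

    WINDOW = 60        # recent observations to keep (illustrative)
    THRESHOLD = 3.0    # z-score beyond which to alert (illustrative)
    history = deque(maxlen=WINDOW)

    def process_reading(value, notify):
        """Alert only when a value deviates sharply from recent history."""
        if len(history) >= 10:   # wait for a minimal baseline
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > THRESHOLD:
                notify(f"Metric deviated: {value:.2f} vs recent mean {mu:.2f}")
        history.append(value)

    # Wire process_reading into a stream consumer; pass a real alert sink
    # (email, chat, dashboard notification) instead of print.
    process_reading(42.0, notify=print)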
Real-time analytics, which will be discussed later in this report, can be critical to developing
smarter alerts and notifications so that “faster” does not just result in too much information
for personnel. An objective of real-time analytics is to analyze data as it is received to find
significant patterns, anomalies, and trends. Alerts and notifications can pick up these insights
and inform personnel who need to know and are accountable for taking action. Analytics insights
should be presented within the context of the receiver’s responsibilities. Our research finds that
just over one-third (36%) of research participants say decision makers in their organizations rely
on real-time alerting and/or analytics, although only 6% are “very” reliant.
To find the right balance, it is helpful to identify personas: the roles, data and analytics needs,
common likes and dislikes, data access and sharing authorizations, and daily decision-making
challenges of different types of users. Then, development teams can have a better understanding of
the requirements that different types of users have. Organizations can bring stakeholders together
in a center of excellence (CoE) or governance committee to make sure that the defined personas are
accurate, assess whether new definitions are needed (such as for external partners and customers),
and plan how to make improvements to the personas’ data and analytics experiences.
We asked research participants about the satisfaction levels of different types of personnel with
their ability to access the data and information they need, when they need it, for analytics,
visualization, or other data consumption (Figure 5). Business and data analysts are the most
satisfied, with 11% of respondents saying these personas are very satisfied and 55% saying they are
somewhat satisfied.
2 TDWI Best Practices Report: Improving Data Preparation for Business Analytics, Q3 2016, page 10, online at tdwi.org/bpreports.
Thinking of your organization’s most recent projects, how satisfied are the following types of
personnel with their ability to access the data and information they need when they need it
for analytics, visualization, or other data consumption?
(Values shown: very satisfied / somewhat satisfied / somewhat unsatisfied / not satisfied / don’t know-NA.)

Business and data analysts: 11% / 55% / 22% / 10% / 2%
Business users: 13% / 49% / 25% / 12% / 1%
Internal application developers: 10% / 47% / 22% / 7% / 14%
Data engineers: 12% / 34% / 27% / 13% / 14%

Figure 5. Based on answers from 136 respondents. Ordered by highest combined “very satisfied” and “somewhat satisfied” responses.
These personas, of course, can be highly varied. Some business and data analysts are experienced
in using BI and analytics tools and working closely with the data while others are not technical and
do not have the time or interest to acquire programming, data integration, transformation, and
other skills. Typically, they are responsible for providing trusted and relevant subject matter data,
visualizations, and analytics to business executives and managers. In other words, easier, more
automated access to trusted data is essential for business and data analyst personas.
Data scientists at organizations surveyed are less satisfied than business and data analysts (42%
combined “very” and “somewhat” satisfied). They also need tools and practices that enable
them to prepare data faster so they can focus on analytics, machine learning development, and
providing insights to business leaders. However, along with ease of use they need flexible access to
many types and sources of data as well as readily available compute and processing power to test
analytics models and algorithms.
Research participants surveyed indicate that business users in their organizations are also
reasonably satisfied, with 13% very satisfied and 49% somewhat satisfied. Business users are
generally data and analytics consumers, not developers; ease of use and data relevance, including
timeliness, are critical. Data engineers are at the other end of the spectrum; they possess technical
skills and knowledge about the data. A little under half of research participants say data engineers
in their firms are satisfied (46% combined “very” and “somewhat” satisfied). They require
technologies and practices for preparing, integrating, and processing data faster, orchestrating and
operationalizing BI and analytics, and improving data quality.
Importance of Self-Service and Embedded Analytics

Self-service and embedded analytics continue to be high priorities as organizations seek to empower decision makers with relevant and timely data insights.

TDWI research finds that organizations continue to regard achieving higher levels of user self-reliance as a very high priority; 53% of research participants say it is very important and 36% say it is somewhat important. (No figure shown.) Self-service capabilities enable analysts and users closest to business questions that need answering to shape their data interaction and visualization. Modern self-service technologies use automation, AI-driven recommendations, notifications, and
intuitive interfaces to relieve decision makers of either knowing the intricacies of how to set up
data interaction themselves or going to IT developers for every need. They can then move faster to
develop relevant data insights.
Self-service analytics and visualization are becoming more mainstream in embedded reporting and
data interaction functionality. Embedded (sometimes called “inline”) BI and analytics functionality
is critical for users who lack the skills, time, and interest to work with dedicated tools and are more
comfortable staying within their business application such as CRM, SFA, ERP, a mobile app, or
specialized, vertical industry software solution.
Self-service technologies now enable not just third-party solution developers but also
organizations’ developers themselves to embed analytics in cloud-based services for customers
and business partners as part of strategies to monetize data and analytics. Organizations may also
develop such services for internal employees, sometimes supplementing access to internal data
with access to external syndicated data about customers, suppliers, or other subjects of interest.
We asked research participants if their organizations currently embed analytics into any of the
applications or systems listed in Figure 6. We can see that the largest percentage embeds analytics
in dashboards or reports (72%), followed by performance management KPIs or scorecards (51%).
Does your organization currently embed analytics into any of the following types of
applications or systems?
In dashboards or reports 72%
In performance management KPIs or scorecards 51%
In operational systems 28%
In CRM, SFA, or marketing management 23%
In mobile applications 22%
In data pipeline development 21%
In business process management 21%
In externally facing websites and portals 19%
In data catalogs or metadata repositories 18%
In SaaS or other cloud-based systems 17%
In streaming data 13%
In point applications 12%
In devices (e.g., IoT sensors or machines) 9%
None of the above 16%
Figure 6. Based on answers from 134 respondents. Respondents could select all answers that apply.
Tightening integration between analytics and metrics can enable those accountable for the metrics
to ask questions about the data and look at trends, patterns, and predictive insights to determine
the right course of action sooner. The research shows that some organizations are embedding
analytics for externally facing websites and portals (19%), potentially as part of data monetization
strategies. Leading-edge organizations are beginning to embed analytics in collaborative,
workflow, and instant messaging systems such as Slack.
In addition, although not the majority of research participants, some organizations surveyed
are also embedding analytics in data pipeline development (21%) and data catalogs or metadata
repositories (18%). These uses of analytics, as well as AI and machine learning, can help
organizations profile data as it moves from sources through pipeline steps for transformation,
enrichment, and delivery to users. They can use analytics to look for patterns and anomalies in raw
source data, learn the data’s quality, and in general speed up preparation steps. AI and machine
learning enable organizations to deal with higher volume, velocity, and variety of data as well.
Some embed analytics in operational, streaming data, and Internet of Things (IoT)
systems. Just over one-quarter (28%) embed analytics in operational systems; 21% embed
analytics in business process management. Along with potentially reducing the latency between
analytics and action, embedding analytics in these systems enables users to view data insights
within the context of their operations and processes. Enabling users to interact with data in
context is important to increasing an organization’s overall data literacy.
Embedding analytics can become necessary for organizations that want immediate notifications
or predictive insights from streaming data, such as IoT data, for operational monitoring and
management. Whether fully embedded or not, having analytics more deeply integrated with
applications or business process management systems can reduce decision latency; personnel
do not have to move to a different tool or interface to consume the analytics. Just 13% of
participants are currently embedding analytics in streaming data and 9% are doing so in devices
such as IoT sensors or machines.
Figure 7. Based on answers from 132 respondents. Ordered by highest combined “very satisfied” and “somewhat satisfied” responses.
Performing visual analysis goes beyond just consumption of visual presentation of data;
depending on the use case, it could require functionality for filtering, drawing comparisons,
examining correlations, visual pivoting and summarization, and drill-down data exploration.
Half of research participants (50%) are either very or somewhat satisfied with their data discovery and exploration, which leaves the other half less than satisfied. This suggests that many users are either bumping up against the functionality limits of their tools or may need additional training to make full use of the functionality at their disposal.
It could also mean that many users are not happy with the data sets available to them.
Organizations surveyed indicate that there is significant room for improvement regarding
data integration, blending, and preparation. Only 5% are very satisfied and 35% are somewhat
satisfied; 47% are unsatisfied, with 15% answering “not satisfied.”
Moderate satisfaction seen for access to freshest data and data streams. More than half
of participants say users in their organizations are either very or somewhat satisfied with their
access to the freshest (e.g., live) data (58% combined). Access to this data is important to users in
finance, risk management, and operations, for example, who need to monitor current business
transactions and potentially fraudulent events and receive updates about situations in near
real time. Rather than wait for users to query the data, some systems can push notifications to
a dashboard on either a desktop or mobile device. When notified, typically users will want to
compare the latest data with historical data in a data warehouse; some may want to perform
deeper analytics across sources to answer business questions prompted by the notifications.
Organizations that need more than access to live data and notification systems are advancing
beyond standard tools to position analytics directly on real-time data streams. Technologies
built with open source frameworks such as Apache Spark, Kafka, and Flink are enabling more
organizations to attempt such projects. Streaming data sets can be sourced from event-streaming
platforms such as Apache Kafka, for example, or directly from sources such as IoT sensors,
equipment monitors, pricing systems, fraud detection systems, and online customer behavior.
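For a concrete sense of how streaming data sourced from a platform like Apache Kafka gets analyzed as it arrives, here is a minimal Python sketch using the kafka-python client (the topic name, broker address, and message fields are assumptions for illustration): it counts events per device over a one-minute tumbling window.

    import json
    import time
    from collections import Counter
    from kafka import KafkaConsumer   # kafka-python package

    # Hypothetical topic and broker; adjust to your environment.
    consumer = KafkaConsumer(
        "iot-sensor-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    window_start = time.time()
    counts = Counter()

    for message in consumer:
        counts[message.value.get("device_id")] += 1
        if time.time() - window_start >= 60:   # one-minute tumbling window
            print("events per device in the last minute:", dict(counts))
            counts.clear()
            window_start = time.time()

Frameworks such as Spark Structured Streaming or Flink provide the same windowing logic with fault tolerance and scale-out execution.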
Although some platforms enable access to real-time data through traditional Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) APIs to BI dashboards, many
organizations find they need to augment data management with technologies that are specialized
for managing real-time data streams. In Figure 7, we can see that although not the majority, a
significant number of organizations are succeeding with analytics on real-time data streams; 41%
of research participants say users are satisfied with their ability to do this, although of those only
8% are very satisfied.
Most users are satisfied with ad hoc query and reporting but less so with advanced
analytics. Just over half of research participants say users in their organizations are satisfied
with ad hoc, “on the fly” query and reporting (12% very satisfied plus 42% somewhat satisfied). BI
and analytics tools typically offer easy-to-use capabilities that enable users to deal with dynamic
needs on their own, often by choosing visualizations, filtering data, and other personalized
attributes; some tools enable on-demand data integration and transformation. Although the
intention of ad hoc functionality is to meet immediate needs, often ad hoc activities can turn into new requirements for standard reports and analysis. Organizations should monitor ad hoc querying and reporting to determine if user demands could be met more effectively—and developers’ talents and system resources could be used more effectively—with standardization.

Only 31% of research participants say their organizations’ users are satisfied with advanced analytics, most likely indicating continuing dependence on highly skilled data scientists.

Research participants show less user satisfaction with advanced analytics such as data mining and predictive modeling, with 7% very satisfied and 24% somewhat satisfied. Most users are likely still dependent on highly skilled data scientists for analytics beyond what they can do with
self-service tools. Data scientists are capable of exploring and blending broader types of data and
can build homegrown models and AI and machine learning routines. They can also push beyond
limited types of advanced analytics supported by tools and perform their own statistical and
mathematical analysis.
Long-term technology trends show incorporation of more advanced capabilities in leading toolsets
in the coming years to enable “citizen data scientists”—advanced business and data analysts as
well as power users who want to go beyond standard BI and OLAP capabilities—to do more on
their own. However, the research results indicate that users are not yet highly satisfied with the
advanced analytics functionality at their disposal.
Overcoming Barriers to Faster Value from Data

Data quality is the biggest barrier to gaining faster value from data, according to research participants; disconnected data silos is the second-most common barrier cited.

The pace of BI, analytics, and AI projects typically slows down considerably when users and analysts confront too much dirty data and not enough lineage information to understand and fix anomalies. By taking steps to improve data quality and establish data lineage, organizations can enable projects to move faster, with fewer mistakes and disputes about whether to trust the data.

Data quality problems often go hand in hand with the problem of too many disconnected data silos; 51% of research participants cite this as one of their biggest challenges. To fix the silo problem as well as improve data quality and consistency, organizations often will seek to consolidate and
integrate selected data into a central physical location such as a data warehouse, which today
could be on premises or in the cloud. However, as the number and volume of data sources and silos
grow, rather than try to consolidate all data, many organizations will create a data architecture
that combines consolidation with a greater role for data virtualization and data catalogs. A
virtualization layer provides an alternative to relying on heavy data movement and consolidation
by enabling federated querying and virtual, single views of multisource data. Centralized data
catalogs bring together metadata to make it easier to find data and keep track of its lineage.
Almost half of research participants (46%) say data governance and regulatory concerns are one of their biggest challenges in enabling data assets to be used effectively. In interview research,
TDWI finds that analytics and AI projects often run into barriers when IT lacks confidence that it
can govern how the data will be used and shared. For users, governance is an important element
of trust; if users are uncertain about how they can use the data or whether they can gain regular
access to it, they will not trust it. Governance and data privacy regulatory adherence should be
priorities at every stage of a project’s evolution.
Problems exist with data pipelines, transformation, and preprocessing. About a third of
respondents say data transformations that are slow and difficult to manage can impede their
realization of value from data (34%). Traditionally, ETL development has involved significant
manual coding and has been time-consuming for IT to manage as the number and complexity of
ETL routines increase. ETL routines often slow down or have to be restarted when data formatting
and other data consistency issues across sources cannot be resolved without human intervention.
Modern solutions are offering greater automation, which can help organizations streamline
transformation and alleviate some manual work.
Organizations also have the option of using data virtualization alongside ETL processes to provide
faster exploration of new data. They can then transform the data as needed rather than as a
condition of it being loaded into a target database such as a data warehouse.
As organizations move beyond classic BI and OLAP requirements to address new, analytics-
oriented transformation needs and bigger data volumes, our research shows that organizations
are having difficulties with data pipelines and transformation. More than a quarter of research
participants (28%) regard difficulties in updating, maintaining, or iterating data pipelines as
one of their big challenges. It appears that organizations are faring somewhat better with data
preprocessing, such as for OLAP and reporting; only 17% of research participants say slowness
and lack of scalability for data preprocessing is a challenge.
APIs and pipelines offer an alternative to traditional data integration. The growing
popularity of application programming interfaces (APIs) is also putting pressure on traditional
ETL and electronic data interchange (EDI) systems, often forming a reason why organizations
choose to adopt data pipelines. APIs enable applications to expose data that the application is
allowing to be shared with other applications. APIs make it easy to establish data connections,
request data, review documentation about its structure, and then with permission, automatically
populate an application with data from another application. Data pipelines can build data security,
preparation, transformation, versioning, deduplication, and other required activities into the
interchange of data through APIs.
APIs, along with data pipelines, can create simpler connections that allow data to flow easily
between applications. Their use does not require the knowledge about data sources needed
for traditional ETL processes, including knowledge about any changes in the data sources and
structures that might require ETL processes to be rewritten. The API/pipeline style fits with
the cloud computing paradigm, which favors easier and simpler data connectivity. However,
this ease and simplicity depends on adoption of open, standard APIs where possible; otherwise,
APIs can begin to resemble old-fashioned, point-to-point integration that requires specialized
knowledge about each API. Big companies as well as industries such as fintech are working to
establish open APIs.
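A hedged sketch of the API style of data interchange (the endpoint, parameters, and token below are invented; real APIs differ in authentication and paging): an application requests only the records changed since a given time, and a pipeline step can then validate, deduplicate, and load them.

    import requests   # widely used Python HTTP client

    BASE_URL = "https://api.example.com/v1/orders"   # hypothetical endpoint

    def extract_orders(since, token):
        """Request recently updated records from a source application’s API."""
        response = requests.get(
            BASE_URL,
            params={"updated_since": since},
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        response.raise_for_status()   # surface connection or auth problems early
        return response.json()        # records ready for downstream pipeline steps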
Lack of skilled personnel and investment are significant obstacles. Over half of research
participants say not having enough skilled personnel is one of their organizations’ biggest issues
(55%). A shortage of skilled personnel is often a strong motivator for organizations to look for
solutions that can automate steps in making data assets valuable. It is also a driver behind the
trend toward cloud-based services for data integration and management that can obviate the
need for adding skilled personnel onsite who can work with on-premises systems. A significant
percentage also say overall investment is inadequate to the challenges they face (41%).
When we asked this question in 2012, 15% said new data could be added within this time frame, so
the percentage of organizations that can add new data at a fairly fast rate has increased slightly.
In this current study, the largest share of respondents (28%) say it takes between one and three
months to add new data. In 2012, a slightly larger percentage (31%) said it took this amount of
time; the results again indicate that organizations are getting a little quicker about adding new
data to their data warehouses. Data warehouse automation tools have become more prevalent in
use since 2012, which may be helping organizations add new data and set up and populate tables
and columns sooner.
Our research indicates that dashboard and reporting requirements are fairly stable. About half of
research participants (49%) say less than a quarter of users’ requirements change monthly if not
more frequently (figure not shown).
Comparatively, about half as many (25%) say between one-quarter and half of dashboard or
reporting requirements change that often. Just 16% of respondents say more than half of users’
dashboard or reporting requirements change at least monthly (10% don’t know). Our interview
research finds that the features most subject to change are how the data is represented visually,
options for filtering the data, and adjustments needed to KPIs. Users also want to see their
dashboards and reporting upgraded with the latest technologies for drag-and-drop interfaces,
search, and pop-up data and visualization recommendations.
Organizations increasingly need to support data demands of AI/machine learning and on-demand analytics. Fortunately, current requirements appear to be fairly stable for these projects.

Requirements for machine learning and/or on-demand analytics are fairly stable. Moving beyond dashboards and reporting, the latest set of requirements that many organizations must address are those for machine learning and for business-driven, on-demand analytics. These two areas may come together if AI techniques such as machine learning are embedded in on-demand analytics, which are typically packaged solutions or cloud-native services designed to address specific business requirements such as customer segmentation, sales opportunity analysis, pricing, or a vertical industry need.
We asked research participants what percentage of their organizations’ machine learning models
and/or on-demand analytics requirements change at least monthly, if not more frequently. The
largest percentage (39%) say only one-quarter or fewer of their requirements change that frequently
(figure not shown). Just 15% of respondents say between one-quarter and half of requirements
for these systems change monthly, if not more frequently, and only 8% said more than half of
requirements change that often. However, 38% of respondents either don’t know or find the
question not applicable, which indicates that machine learning and on-demand analytics are not yet
mainstream. As the use of these technologies and solutions grows and they begin to serve a greater
variety of use cases, we may see less stability in the rate at which requirements change.
Data preparation—the set of steps for making data from one or multiple sources usable for users’ and applications’ requirements—can be notoriously slow.
Data profiling, quality improvement, transformation, enrichment, and other steps that are part
of data preparation procedures in data pipelines can take up the majority of users’ time, not to
mention IT specialists who work on bigger jobs for preparing data and building pipelines for data
scientists, analysts, and other users.
Thus, it is not surprising that organizations surveyed show high interest in making
improvements. The majority of research participants (74%) say it is either “extremely” or “very”
important for their organizations to reduce the amount of time and resources spent on data
preparation, transformation, and pipeline processes (figure not shown).
We asked research participants what percentage of the total time spent on recent BI and analytics
projects was devoted to preparing the data compared to the time spent performing analysis and data
interaction (see Figure 8). The survey results are fairly consistent with what we saw in 2016 when
we asked this question. In this report, however, research participants indicate that an even higher
percentage of users’ time is spent preparing the data than it was in 2016. Almost half of respondents
(49%) say 61% or more of users’ time is spent on data preparation; in 2016, 45% said this amount
was being spent. Thus, it appears that organizations may be losing rather than gaining ground on
objectives for reducing the time spent on preparation, transformation, and pipelines.
Figure 8. Thinking of your organization’s most recent BI and analytics projects, what percentage of the
total time was spent preparing the data compared to the time spent performing analysis and
data interaction?
81%–100% 10%
61%–80% 39%
41%–60% 24%
21%–40% 10%
0%–20% 7%
Don’t know 10%
If errors and inconsistencies become embedded in the data used for analytics and visual
reporting, it can take weeks or months for organizations to uncover them, which can have many
negative repercussions, not the least of which is a lack of confidence in the data. It is important
for organizations to invest in solutions that can handle data preparation, including diverse
transformation requirements, especially as they begin to work with fast, continuous, and high-
volume streams of data for both alerting and analytics. Such solutions can reduce bottlenecks and
streamline the flow of data from ingestion through preparation to support a variety of use cases.
Many data pipelines perform transformations, which makes them similar to ETL processes.
However, a data pipeline is considered to be a broader idea than traditional ETL because pipelines
can involve a greater variety of data, including data streams, and the destinations for the data
can vary from AI and machine learning algorithms to predictive analytics models, standard BI
dashboards, and business applications.
Running largely in slower, traditional batch modes, ETL focuses on extraction of structured data
from known sources to a staging area (such as a specialized ETL server) for transformation and
then loading into a data warehouse or data mart. For pipelines that have a data warehouse, data
mart, or BI/OLAP system as their destination, transformations will be the expected centerpiece.
Organizations may choose to use change data capture (CDC) technology instead of ETL to reduce
delays because ETL processes are often time-consuming.
Another option is to alter the ETL sequence to extract, load, and transform (ELT). Organizations
will use ELT if the target destination (such as a data lake, data warehouse, or analytics appliance)
has a powerful data engine that uses massively parallel processing (MPP) and/or clustering and
can "push down" transformations to be processed in the database itself rather than on an
intermediate server specialized for ETL.
The choice of ETL or ELT often depends on the amount of data and the complexity of
transformations. Some modern data transformation and pipeline systems are able to
automatically determine which approach is optimal so administrators do not have to manually
decide for each workload. ELT is typically the choice for organizations that need faster ingestion
for analytics and machine learning programs. ELT processes can land the data, and from there
analytics and AI programs may apply custom transformations as part of processes for exploring
the data for patterns, trends, and other insights.
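To make the two sequences concrete, here is a minimal sketch in Python using SQLite as a stand-in for the target platform; the file, table, and column names are hypothetical. In the ETL function the transformation runs in an intermediate step before loading, while in the ELT function the raw data lands first and the target engine performs the transformation.

import csv
import sqlite3

warehouse = sqlite3.connect("warehouse.db")  # stand-in for a cloud data warehouse

def run_etl(csv_path):
    """ETL: transform rows in an intermediate step, then load the finished result."""
    with open(csv_path, newline="") as f:
        rows = [
            # transformation happens outside the target (the staging step)
            (r["order_id"], r["customer"].strip().upper(), float(r["amount"]))
            for r in csv.DictReader(f)
        ]
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS orders_etl (order_id TEXT, customer TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", rows)
    warehouse.commit()

def run_elt(csv_path):
    """ELT: land the raw data first, then push the transformation down to the engine."""
    with open(csv_path, newline="") as f:
        raw = [(r["order_id"], r["customer"], r["amount"]) for r in csv.DictReader(f)]
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw)
    # the target engine itself performs the transformation ("push down")
    warehouse.execute("""
        CREATE TABLE IF NOT EXISTS orders_elt AS
        SELECT order_id, UPPER(TRIM(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_orders""")
    warehouse.commit()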
Data virtualization is an additional option. Data virtualization leaves the data in place and does
not require moving the data from sources to ETL staging areas and then to data warehouses or
other target zones. Data virtualization can offer greater agility than traditional ETL because
it does not force data coming from multiple sources to comply with a single data model and
transformation plan. The data virtualization layer provides single views of logically integrated,
multisourced data and supports federated queries to those sources. Virtualization layers can
optimize query processing at the sources to take advantage of local processing power.
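As a rough illustration of the logical-view idea (not any product's actual API), the sketch below attaches two separate SQLite databases and exposes one federated view across both without copying the data; the database, table, and column names are hypothetical.

import sqlite3

# Build two independent source databases (stand-ins for, say, CRM and ERP systems).
crm = sqlite3.connect("crm.db")
crm.execute("CREATE TABLE IF NOT EXISTS customers (id TEXT, name TEXT)")
crm.execute("INSERT INTO customers VALUES ('C1', 'Acme')")
crm.commit(); crm.close()

erp = sqlite3.connect("erp.db")
erp.execute("CREATE TABLE IF NOT EXISTS orders (customer_id TEXT, amount REAL)")
erp.execute("INSERT INTO orders VALUES ('C1', 4200.0)")
erp.commit(); erp.close()

# The "virtualization layer": one connection that attaches both sources and
# defines a logical view; the data itself is never copied or moved.
layer = sqlite3.connect(":memory:")
layer.execute("ATTACH DATABASE 'crm.db' AS crm")
layer.execute("ATTACH DATABASE 'erp.db' AS erp")
layer.execute("""
    CREATE TEMP VIEW customer_360 AS
    SELECT c.id, c.name, SUM(o.amount) AS total_spend
    FROM crm.customers c JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name""")

# Users query the single logical view without knowing where the data lives.
for row in layer.execute("SELECT * FROM customer_360"):
    print(row)

A data virtualization platform plays the same role across heterogeneous, distributed sources and, as noted above, pushes query processing down to each one.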
Data virtualization, ETL, and ELT are not mutually exclusive choices. Organizations will often
use a combination of these options in their preparation and pipeline processes to fit different use
cases, levels of real-time data requirements, data volumes, transformation complexity, and more.
In addition, organizations should evaluate large-scale data processing tools and frameworks
such as Apache Spark to scale beyond ELT. Technology solutions are available that can help
organizations overcome skills gaps that may make them hesitant to try Apache Spark frameworks
and libraries as their data processing needs outgrow traditional technologies.
Expense and scalability stand out as challenges. We asked research participants which of six
issues that TDWI frequently sees as major challenges regarding ETL, ELT, and data pipelines
are the most important for their organizations to address (see Figure 9). Participants indicate
that all are important, but at the top of the list are expense and scalability, with 88% and 87% of
participants, respectively, rating these issues either "very" or "somewhat" important.
The potential to reduce expenses is, of course, a driving reason why many organizations want to
migrate all types of systems and applications to the cloud, including those for data integration
and transformation. Cloud platforms can relieve organizations of having to dedicate budget
and resources to storing and processing data on premises, but often they must still develop
and execute data preparation routines and pipelines. Utilizing software automation along with
cloud options can help reduce expenses generated by intensive manual work by developers and
administrators and enable them to focus on more value-adding activities.
Figure 9. Based on answers from 129 respondents. Ordered by highest combined “very important”
and “somewhat important” responses.
Software solutions and cloud-based services can help organizations address scalability challenges
with ETL, ELT, and data pipelines. Leading solutions today use AI techniques such as machine
learning to scale data exploration and analysis of high-volume, high-velocity, and highly varied
big data. Slow performance, which 83% of participants said was an important challenge, is often
related to scalability issues, although it could be due to other factors such as mistakes and quality
errors in the data, queries, and programs. Having several technology options is important for
addressing scalability challenges. Organizations can then match the right on-premises system
or cloud-native service with the right workload instead of trying to funnel all workloads through
one approach such as a traditional data warehouse and ETL process.
However, to innovate with data and analytics in the cloud and take advantage of its other
benefits, most organizations need to move and migrate data to the cloud. These phases can be
slow and costly, which can affect how quickly organizations can respond to business needs.
Organizations must therefore focus on improving how well they can load, move, and replicate
data from on-premises sources to cloud-based platforms; potentially use virtualization to reduce
data movement and provide views of data in place; and perform operations such as
transformations and updates for data used in visualizations, analytics, and AI.
In addition, for most organizations, “the cloud” does not consist of just one platform; our
research finds that many organizations use the services of multiple cloud data providers for data
storage, data lakes, and data warehouses. The danger is that each one can become a silo, which
only exacerbates existing disparate data problems and leads to slower and less complete data
access. Thus, as organizations set up data platforms with multiple cloud providers, they should
examine what new technologies they will need to move, replicate, and otherwise manage data
across them to produce integrated views and access for users.
In Figure 10, we can see research participants' levels of satisfaction with how well their
organizations can accomplish a variety of factors having to do with getting data ready in the
cloud for users, analytics, and applications. There is room for improvement; for none of these
factors do we see high "very satisfied" percentages.
How satisfied is your organization with the following factors regarding loading, replicating,
transforming, and updating data from data sources such as on-premises systems to cloud-
based data platforms, such as a data warehouse?
(Very satisfied / Somewhat satisfied / Somewhat unsatisfied / Not satisfied / Don't know/NA)
Amount of data you can load: 13% / 47% / 25% / 8% / 7%
Ability to do incremental loads: 9% / 51% / 22% / 10% / 8%
Change data capture: 10% / 35% / 31% / 10% / 14%
Time it takes to load data: 5% / 40% / 34% / 14% / 7%
Time it takes to load into OLAP cubes: 2% / 20% / 23% / 13% / 42%
Figure 10. Based on answers from 131 respondents. Ordered by highest combined "very satisfied" and "somewhat satisfied" responses.
Organizations show the most satisfaction with their ability to handle the amount of data
they need to load (60% combined “very” and “somewhat” satisfied) and their ability to
handle incremental loads (also 60% combined). This suggests that there is reasonable but
not overwhelming satisfaction with how well organizations surveyed can manage full and
incremental loads to cloud data warehouses or data marts and analytics sandboxes. We can also
see in the chart that, among organizations loading data into cloud-based OLAP cubes,
satisfaction is middling regarding the time it takes to load data into the cubes and how much
data they can load into them.
As we saw earlier with cost being the biggest ETL and data pipeline challenge, we find in
Figure 10 that the second-highest percentage of dissatisfaction is with costs associated with
data loading; 46% are either somewhat unsatisfied or not satisfied. The highest level of overall
dissatisfaction is for the time it takes to load data to the cloud (48%), and the third-highest
dissatisfaction levels are with support for data streaming to the cloud, with 26% somewhat
unsatisfied and 18% not satisfied (note that 30% either don’t know or find this attribute not
applicable). Instead of loading all data into the cloud in batch, organizations could choose
to stream the data into cloud data warehouses or other data platforms continuously. Then,
organizations can begin running some analytics or AI programs on this data sooner. However,
this research shows that organizations are not yet happy with this option.
Data virtualization can provide an alternative to loading data into a separate, central store from
hybrid, multicloud sources. Data virtualization offers transparent access so that users do not
need to know how and where to access the data. Data virtualization solutions create a logical
view (or “logical data warehouse” as it is sometimes called). Users can then query this view
from within their chosen front-end tools. Data virtualization uses metadata extensively, which
spotlights the importance of data catalogs and repositories for documenting data knowledge and
enabling access to multiple data stores.
Data catalogs, metadata repositories, business glossaries, and emerging semantic integration can
help organizations meet many goals. We asked which goals are most important to organizations.
The one selected by most research participants is making it easier for users to search for and find
data (79%; figure not shown). This result is consistent with the response to a similar question
that we asked in 2018.3 The goal with the second-highest percentage is also the same as in 2018:
improving governance, security, and regulatory adherence (70%). For governance, monitoring
access to sensitive data, and data lineage, it is essential to know where the data is, its life cycle in
the organization, and how it is being shared.
To adhere to regulations, organizations typically need to audit data management and be able
to demonstrate that they are protecting the data. Modern solutions’ capabilities for automated
tagging and data lineage tracking can be key to making governance effective, easier, and less
expensive. Over half of respondents (55%) see the ability to centrally monitor data usage and
lineage as a key goal. About one-third of organizations surveyed regard consolidating multiple
smaller data catalogs and glossaries as an important goal. Where consolidation is too difficult,
slow, and costly, data virtualization integrated with data cataloging can provide an alternative.
A sizeable percentage of research participants are seeking to coordinate data meaning across
sources (59%). This is important to resolving debates among users about what data means,
whether calculations match up, and—if there are discrepancies among multiple data sources—
determining which one is correct. Master data management (MDM) and semantic integration are
technologies that build from metadata definitions to bring higher-level concepts into focus and
make it easier to find and manage data related to each concept. Coordinating data meaning can
ultimately make it faster and easier for users to find and interact with relevant data.
Finally, more than half of organizations surveyed want to use data catalogs and metadata
management to improve data preparation and data pipelines (54%). Integration of these
technologies plus data governance can help organizations ultimately deliver more trusted and
complete data to users, AI programs, and applications.
3 TDWI Best Practices Report: BI and Analytics in the Age of AI and Big Data, Q4 2018, online at tdwi.org/bpreports.
Closing in on Real Time: Options for Faster Data
TDWI research explored which technologies organizations are using to make data (including data
streams) available sooner for BI, analytics, and AI/machine learning (see Figure 11). As when
we asked this question in our 2018 Best Practices Report, a data warehouse and/or data mart is
again the most common technology used (61%). This is followed by BI/analytics access to live
data (48%), which typically means that users can access data as it is being recorded in business
applications or transaction processing systems.
Which technologies are in use by your organization to make data (including data streams)
available sooner for BI, analytics, and AI/machine learning?
Data warehouse and/or data mart 61%
BI/analytics access to live data 48%
Data lake 44%
AI/machine learning 38%
In-memory database 31%
Data pipelines 30%
In-memory analytics 30%
Preprocessing (e.g., OLAP cubes) 29%
Operational data store 27%
Change data capture (CDC) 26%
In-database processing 26%
Columnar database 25%
Apache Kafka 23%
Data virtualization or federation 23%
Apache Spark Streaming 19%
Other Apache open source (e.g., Apex, Flume, or Storm) 10%
Commercial streaming or message-oriented middleware 7%
Complex event processing 7%
Other (please specify) 4%
Figure 11. Based on answers from 124 respondents. Respondents could select all answers that apply.
To support near-real-time data warehousing, organizations will often use either CDC to update
changed data in the warehouse or data virtualization to speed access to sources without having
to move or replicate the data to a central physical store. Figure 11 shows that 26% are using
CDC, which is about the same percentage as in 2018. More research participants say their
organizations are using data virtualization or federation now (23%) than said they were using
this technology in 2018 (18%).
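To illustrate the incremental idea behind CDC: production tools usually capture changes from the source database's transaction log, but the simpler high-water-mark pattern sketched below conveys it; only rows changed since the last sync are applied to the warehouse. The SQLite stand-ins, table names, and columns are hypothetical.

import sqlite3

source = sqlite3.connect("orders_oltp.db")    # operational system (stand-in)
warehouse = sqlite3.connect("warehouse.db")   # analytics target (stand-in)
source.execute("""CREATE TABLE IF NOT EXISTS orders (
    order_id TEXT PRIMARY KEY, status TEXT, updated_at TEXT)""")
warehouse.execute("""CREATE TABLE IF NOT EXISTS orders (
    order_id TEXT PRIMARY KEY, status TEXT, updated_at TEXT)""")

def sync_changes(last_sync_ts):
    """Apply only rows modified since the last sync instead of reloading everything."""
    changed = source.execute(
        "SELECT order_id, status, updated_at FROM orders WHERE updated_at > ?",
        (last_sync_ts,),
    ).fetchall()
    warehouse.executemany(
        # upsert so updated rows replace their earlier versions in the warehouse
        """INSERT INTO orders (order_id, status, updated_at) VALUES (?, ?, ?)
           ON CONFLICT(order_id) DO UPDATE SET
               status = excluded.status, updated_at = excluded.updated_at""",
        changed,
    )
    warehouse.commit()
    # advance the high-water mark to the newest change applied
    return max((row[2] for row in changed), default=last_sync_ts)

# called on a schedule, e.g., every minute, carrying the mark forward
mark = sync_changes("2020-01-01T00:00:00")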
Just over a quarter of organizations surveyed (27%) are using an operational data store (ODS)—a
system that typically complements a data warehouse by providing users with access to a selected,
trusted set of integrated near-real-time data, usually for operational reporting and notification.
This is a somewhat smaller percentage than we saw in the 2018 report, which was 36%.
Data lakes garner the third-highest percentage of respondents in this report (44%). Data
lakes, increasingly stored in the cloud, are flexible platforms that can contain any type of
data. Organizations use them to consolidate data from multiple sources, which could include
operational, time-series, and near-real-time data. However, unlike an ODS, data lakes are
typically set up for exploratory analytics and AI/machine learning to look for patterns and other
insights. Some organizations create operational data lakes or set up portions of their data lake
for fast SQL queries (using, for example, SQL-on-Apache Hadoop query technologies) on big data.
Organizations can also develop templates and preconfigured views of selected operational data
for consistent and repeatable reports or for developing OLAP cubes.
In-memory analytics and databases are becoming more common. In 2018, we reported that
21% of organizations we surveyed were using in-memory analytics; this year’s report shows
that a higher percentage of respondents are using this technology (30%). The use of in-memory
database technology has also risen, from 17% to 31% of respondents saying their organizations
are using this option.
In-memory platforms, by reducing the need to read data stored on disk, can enable faster access
to data for visualization, data exploration, and testing models. As larger random access memory
(RAM) caches become available, organizations are able to keep more “hot” data available for
computation. Technologies are evolving to make it possible to store entire data warehouses, data
marts, or OLAP cubes in-memory. Commercial as well as open source solutions using Apache
Spark or the more recent Apache Ignite can support in-memory analytics and database systems.
They can also support streaming workloads.
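As a small sketch of why in-memory processing speeds interactive work, the snippet below uses Apache Spark (which the report mentions) to cache a dataset in RAM so that repeated exploratory queries avoid rereading storage; the file path and column names are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-demo").getOrCreate()

# The first action reads from disk; cache() keeps the "hot" data in RAM afterward.
sales = spark.read.parquet("/data/sales.parquet").cache()

# Subsequent interactive queries are served from memory, not storage.
sales.groupBy("region").sum("amount").show()
sales.filter(sales.amount > 1000).count()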
Organizations are implementing streaming and event processing technologies. Figure 11
shows that about a quarter of organizations surveyed use Apache Kafka. Now an established
platform for distributed streaming of large numbers of events, Kafka began as a messaging
system with optimized performance. Organizations are using Kafka to build streaming data
pipelines for automated applications that must react to real-time data or must make the data
available for analytics and machine learning. Kafka can be a source for Apache Spark Streaming,
which 19% of organizations surveyed are using. This module allows organizations to integrate
a variety of workloads, including streaming, on the same processing platform, which can reduce
programming and modeling complexity.
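A minimal sketch of that pipeline pattern, using Spark's Structured Streaming API with Kafka as the source; the broker address, topic name, and message schema are assumptions for illustration.

from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the hypothetical 'sensor-events' topic as an unbounded table.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "sensor-events")
       .load())

# Kafka delivers bytes; parse the JSON payload into typed columns.
events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

# One-minute tumbling-window average per device, for dashboards or alerting.
agg = (events
       .withWatermark("event_time", "2 minutes")
       .groupBy(window("event_time", "1 minute"), "device_id")
       .agg(avg("reading").alias("avg_reading")))

query = agg.writeStream.outputMode("update").format("console").start()
query.awaitTermination()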
Only a small percentage say their organizations are using complex event processing (7%),
which is an older technology for processing and analyzing real-time event streams. The same
percentage of respondents indicate that their organizations are using commercial message-
oriented middleware (7%).
Clearly, there is no single technology approach to managing and analyzing near or true real-time
data, including data streams. Organizations need to define their requirements for data freshness
and the scale of data flows, speed, and volume. Organizations should look at current in-house
skill sets and see where they need to hire more experts in data management, data engineering,
and data science. They should also evaluate the potential of data management automation and
cloud and SaaS options. Organizations should begin with proofs of concept (POCs) and test
applications with smaller, well-defined projects.
Figure 12. Based on answers from 126 respondents. Respondents could select all that apply.
Performance management, frequently one of the main drivers behind dashboard development,
is also an area where many organizations surveyed (42%) would expect to see improvement with
faster data and analytics. Business strategy, which organizations implementing performance
management seek to communicate via dashboards, KPIs, and other metrics, would be a focus for
39% of organizations surveyed.
The results suggest that if and when organizations invest in technologies and services for faster
data and analytics, the leading objective is most likely to improve information for managers
tasked with increasing operational efficiency and effectiveness. About one-quarter (24%) say
process execution would improve with investment in faster data and analytics, which again
shows that some organizations see the value of not only faster but smarter business processes.
A significant percentage (39%) say situational awareness and alerting are outcomes that their
organizations would want to see from investments—objectives that are often key goals behind
deployment of streaming data management and real-time analytics.
Use cases for streaming technologies draw interest. Streaming data can give organizations
new insights into how to solve problems. In Figure 12, we can see that 35% of research
participants say their organizations would focus investment in faster data and analytics on
improving predictive maintenance; 29% note IT systems and network management would be
a priority for improvement. With many organizations now able to tap IoT sensor data from
machines and other equipment, they need analytics that can explore this data sooner for trends
and patterns that could indicate an imminent failure.
This modernized analysis can result in smarter maintenance. Rather than use traditional fixed-
schedule maintenance, which can either overlook serious problems or apply maintenance when it
is not needed, organizations can monitor conditions to see when maintenance is actually needed.
They can develop predictive models based on all relevant data rather than assumptions based on
a smaller selection of historical data and other records. Similar efficiency could be brought to risk
management and fraud detection, which 33% of research participants cite as key areas.
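As a toy illustration of condition-based monitoring, the sketch below flags a machine when a reading drifts far from its recent baseline rather than waiting for a fixed service date; the window size, threshold, and readings are illustrative, and a real deployment would use trained predictive models.

from collections import deque
from statistics import mean, stdev

class ConditionMonitor:
    """Rolling z-score check over the most recent sensor readings."""

    def __init__(self, window=100, threshold=3.0):
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value):
        """Return True if this reading deviates enough to warrant maintenance."""
        alert = False
        if len(self.readings) >= 30:  # need a baseline before judging
            mu, sigma = mean(self.readings), stdev(self.readings)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                alert = True
        self.readings.append(value)
        return alert

# e.g., vibration readings streaming in from an IoT sensor
monitor = ConditionMonitor()
for reading in [0.51, 0.49, 0.50] * 20 + [1.40]:
    if monitor.update(reading):
        print("maintenance recommended: abnormal reading", reading)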
These and other use cases require integrated analysis of real-time, streaming data and historical
data. Organizations should evaluate solutions such as data virtualization that can provide
combined views of streaming and historical data. Some data virtualization solutions, for
example, can read data as it is streaming from edge devices through pipelines for comparison
with historical data rather than having to wait for this data to be loaded into a target database.
Faster data drives analytics innovations for business benefits. Predictive maintenance
based on IoT sensor data is a growing use case for analytics in operations, manufacturing, IT,
and logistics. However, perhaps an even bigger trend is the use of streaming data to improve
customer engagement and personalization, which 37% of research participants indicate is an
objective for improvement. Figure 12 also shows that 34% would like to see improvement in
pattern detection in customer data. If organizations can analyze near or real-time data, they
can respond to customers' interests and concerns in the timeliest manner possible, which is a
competitive advantage. Organizations can also gain insights into patterns that they would not
find when analyzing only historical data.
AI for Faster Insights and Automated Decisions
AI can help organizations churn through volumes of data to help understand why something is
happening, what could happen next, and what to do about it. In some cases, AI techniques are
adding the “smarts” to drive fast, automated decisions; in others, AI algorithms are surfacing
data insights to augment information humans are using to make decisions.
As we did in 2018, we asked research participants to identify the most important ways in which
their organizations currently use (or plan to use) AI such as machine learning to augment
BI and analytics and to support data integration and management. The most prevalent choice is to
automate discovery of actionable insights (52%; figure not shown); this was also the most common
selection in 2018. This could indicate that organizations intend to set up algorithms and models
that do not require regular human intervention, but with the end purpose of supplying personnel
with insights that can improve daily decisions. Just under half of organizations surveyed (44%)
want to use AI to enable faster analytics on large data volumes, which shows that organizations
see AI as a solution for scaling up discovery and analytics and growing big data sources.
To augment human decisions, AI-derived insights can be delivered to decision makers in the form
of recommendations; 41% of research participants say their organizations want to augment user
decision making by giving them recommendations, about the same percentage as in 2018. Some
decision makers, however, do not necessarily want recommendations; they just want faster and
more comprehensive data search and exploration. Our research finds that 35% of organizations
surveyed want AI to help users find, select, and use data for analysis.
Because of emerging requirements such as these, organizations should make a solution's use of
AI a key point when evaluating data integration technologies. They should examine how AI is
applied for faster location, access, and viewing of new data. Some solutions can apply AI
programs that learn from changes in the data and its formats and adjust integration and
preparation steps accordingly. This reduces the need for the manual reconfiguration of target
databases or logical views that slows down access and analysis. AI programs can also learn from
users' search, access, and viewing patterns to recommend related data sets.
Organizations see a role for AI in data governance, cataloging, and preparation. Along
with helping users locate relevant data and data relationships, research participants see AI
helping their organizations better govern, integrate, and manage the data. Survey results show
organizations seeking improvements in the following areas:
• Automating data classification for governance and security. About a third of research
participants (32%) anticipate that AI can help reduce the manual effort and inconsistency
that plague data classification and make it hard to locate data for governance and security;
a simple sketch of this idea appears after this list. Some organizations see AI addressing the
general problem of taxonomy development; 20% of research participants regard AI as
important for this task.
• Developing and updating the data catalog or metadata repository. As noted earlier,
collecting and consolidating knowledge about the data and its location and origins is
frequently manual and incomplete. Almost a third of research participants (30%) regard AI's
role in enabling their organizations to build such a catalog or repository as important; this is
up from 27% in 2018.
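The following deliberately simple sketch shows the spirit of automated data classification: scoring columns for likely sensitive content with pattern rules, standing in for the machine learning classifiers that commercial tools apply; the patterns, labels, and thresholds are illustrative only.

import re

# Illustrative detectors for common sensitive-data types
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?[\d\-\s()]{7,15}$"),
}

def classify_column(values, min_match=0.8):
    """Tag a column with any sensitive-data label matching most of its values."""
    tags = []
    sample = [v for v in values if v][:100]  # sample non-empty values
    for label, pattern in PATTERNS.items():
        hits = sum(bool(pattern.match(v)) for v in sample)
        if sample and hits / len(sample) >= min_match:
            tags.append(label)
    return tags

# e.g., profiling one column pulled from a source table
print(classify_column(["ann@example.com", "bob@example.com", "cara@example.com"]))  # -> ['email']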
We asked research participants about the importance of seven different steps that involve
integrating analytics to reduce delays and automate decisions (see Figure 13). Integrating visual
analytics with business process management topped the list; 77% regard this as either very or
somewhat important. With better integration, organizations can bring data insights to bear
directly on business processes to improve efficiency and effectiveness. Managers accountable for
specific processes can tailor analytics based on their context and knowledge of the data.
How important to your organization’s efforts to reduce delays and automate decisions are
the following steps for integrating analytics with business applications, processes, and
workflows?
(Very important / Somewhat important / Not too important / Not at all important / Don't know/NA)
Integrate visual analytics with business process management: 23% / 54% / 14% / 5% / 4%
Automate decisions in operational or process systems: 32% / 37% / 15% / 8% / 7%
Enable analytics to guide real-time process optimization: 20% / 47% / 19% / 8% / 6%
Develop predictive machine learning workflows: 19% / 44% / 15% / 12% / 10%
Develop simulations and scenarios for decision optimization: 18% / 43% / 22% / 7% / 10%
Evaluate which decisions to optimize using algorithms: 13% / 46% / 19% / 12% / 10%
Run predictive models and/or AI on streaming data: 22% / 30% / 24% / 12% / 12%
Figure 13. Based on answers from 125 respondents. Ordered by highest combined “very important”
and “somewhat important” responses.
Demand for tighter integration between analytics and business processes is also driving interest
in smarter process automation and optimization. As we can see in Figure 13, more than two-thirds
of research participants (69%) say their organizations find it either very or somewhat important
to automate decisions in operational or process systems. Nearly the same percentage see it as a
priority to enable analytics to guide real-time process optimization (67%).
This level of optimization typically demands continuous data such as data streams to assess
the performance of manufacturing systems, for example, and make adjustments automatically.
Analytics and AI can help organizations calibrate proper levels based on numerous variables
associated with costs, resources, energy used, demand, and other factors. Almost two-thirds
want to integrate analytics to enable development of simulations and scenarios for decision
optimization (61% combined very and somewhat important responses).
We can see that the activities involving streaming data in Figure 13 are not as commonly regarded
as important as those discussed above. However, significant percentages of organizations
surveyed do see the steps as important to reducing delays and automating decisions. Just over half
regard running predictive models and/or AI on streaming data (52%) as either very or somewhat
important. A larger percentage of participants say their organizations want to develop predictive
machine learning workflows (63% combined very and somewhat important responses). These
workflows could involve many types of data, including real-time streaming data in machine
learning development, testing, and operationalization cycles.
Recommendations
To conclude this report, here are 10 recommendations for developing strategies to reduce delays in
data integration and management and increase business value through faster BI and analytics.
Deliver more timely data and recommendations to users. Organizations need to modernize
the paradigm for BI, visual analytics, and dashboards, particularly where they are deployed to
operational managers and frontline personnel. These users often need very timely data, including
in some cases real-time data views and analytics within the context of their responsibilities. They
also need applications that are less passive and can supply users with recommendations about
data sets that might be relevant, visualizations, analytics, and ultimately prescribed actions to
take. Operational managers and frontline users would then be in a better position to make good
decisions based on fresher, more contextual, and richer information.
Find the right balance between agility and centralized management. An imbalance
here leaves neither users nor IT happy and can lead to bottlenecks that thwart faster decision
making. Users want agility, and our research shows that organizations are pursuing self-service
technologies to give users more freedom in how they personalize workspaces and access and
analyze data. However, ungoverned self-service can lead to too many data silos, duplication, and
workloads that haphazardly compete for computation and processing. IT’s perspective is to ensure
good governance, performance, and quality, especially for priority workloads. Yet, clamping down
unnecessarily on users will make it harder to move forward and drive users to set up their own
data silos in the cloud. Data virtualization solutions could be helpful in reducing dependence on
traditional physical data consolidation, which tends to require rigid, preset integration processes.
Users and IT need to form committees or a center of excellence to discuss how to balance self-
service with centralization.
Explore how agile, DataOps, and related methods could help projects deliver value
sooner. Too often, organizations are mired in chaotic, inconsistent, and often redundant work
in projects for developing BI, analytics, applications, and AI. This can result in delays, inefficient
use of data and processing resources, and dissatisfaction among users, who need capabilities
as soon as possible. Our research finds that many organizations are using agile (or agile-like)
methods—as well as DevOps, DataOps, and design thinking—and are having positive experiences.
Organizations that are not using them should try these methods for one or a small number of
projects that have clear deliverables to assess whether they are helpful and iron out difficulties
before trying them on a larger number of more complex projects.
Get the big picture and orchestrate parts into a whole. As projects grow more numerous and
workloads become more complex, it’s easy for organizations to get bogged down; then, despite
having the latest technologies, it can seem impossible to deliver data faster to support faster
analytics. Methods such as DataOps can help organizations get a holistic picture and gain an end-
to-end understanding of interrelated steps in projects, stakeholder responsibilities, and where
impasses in data flows, updates, and transformations need to be corrected. Organizations need
tools that can complement use of DataOps and other methods. Organizations should evaluate
tools that not only improve data life cycles but also help them orchestrate what happens in
multiple data pipelines and observe the big picture.
Take advantage of opportunities for automation and reuse. With an increasing number of
analytics and AI workloads needed to meet diverse business demands, it’s essential to exploit
the potential for smarter automation and reuse in software solutions and cloud services.
Organizations heavily dependent on manual coding, monitoring, data preparation, and
integration will struggle to scale as more users and applications need to interact with the data and
test models and algorithms. “Smarter” is a key word: automating processes will increase the speed
and efficiency of routine chores, but organizations should evaluate how AI and analytics can
contribute to automating decisions or to providing users with recommendations for action.
Create repositories of knowledge about the data and improve access to them. Research
in this report suggests an upswing in the number of organizations that are developing and
managing data catalogs, metadata repositories, and business glossaries, all of which are helpful in
bringing together “data about the data” as well as other useful information about its lineage. Such
resources are valuable to users, administrators, data scientists, and applications; they can shorten
paths to finding and interacting with all data relevant to a subject of interest. Data catalogs
and metadata repositories are essential for data governance as well. Technologies are making
it easier to develop and use these systems. Organizations should make it a priority to invest in
these technologies so they have useful resources of knowledge about their data that users and
applications can easily apply to produce more complete views of and access to the data.
Improve trust in data, analytics, and AI. It doesn’t matter how fast the data, analytics, and
algorithms are if no one can trust the data. TDWI finds that a number of issues that impact data
trust—most prominently, problems with data quality—are key challenges stalling progress in
building strong analytics cultures and accelerating decision processes. Data trust is essential to
collaboration on decisions and acceptance of analytics insights. Organizations need to invest in
data quality, shared data catalogs, data lineage, and other technologies and practices that give
decision makers transparency into the data and confidence in insights drawn from the data.
Use appropriate technologies to streamline governance. With goals for faster data and
analytics, it’s never been more important for organizations to set up rules and policies to protect
sensitive data and reduce confusion about where the data is, where it came from, who is accessing
it, and what’s being done with it. Semantic data integration, which builds on a foundation of
central data catalogs and metadata management, can help organizations answer these questions.
Organizations should examine options for how a data virtualization layer could help protect
access to sensitive data distributed across hybrid, multicloud platforms.
Evaluate the potential of data streaming and real-time analytics. With a greater selection
of open source programs and frameworks as well as the latest generation of commercial tools to
choose from, data streaming and real-time analytics are poised to become mainstream, possibly
displacing older technologies and practices. TDWI research finds that significant percentages
of organizations are using technologies to manage and analyze IoT sensor data, web logs and
customer behavior data, mobile and geolocation information, and more. Organizations should
develop a strategy for how to augment existing data management and analytics with data
streaming and real-time analytics and which business objectives would benefit.
Research Co-sponsor: Denodo
denodo.com
Denodo is a leader in data virtualization providing agile, high-performance data integration, data
abstraction, and real-time data services across the broadest range of enterprise, cloud, big data,
and unstructured data sources at half the cost of traditional approaches. Denodo’s customers
across every major industry have gained significant business agility and ROI by enabling faster
and easier access to unified business information for agile BI, big data analytics, web and cloud
integration, single-view applications, and enterprise data services.
The Denodo Platform offers the broadest access to structured and unstructured data residing in
enterprise, big data, and cloud sources, in both batch and real-time, exceeding the performance
needs of data-intensive organizations for both analytical and operational use cases, delivered in a
much shorter time frame than traditional data integration tools.
The Denodo Platform drives agility, faster time to market, and increased customer engagement
by delivering a single view of the customer and operational efficiency from real-time business
intelligence and self-service capabilities.
Founded in 1999, Denodo is privately held, with main offices in Palo Alto (CA), Madrid (Spain),
Munich (Germany), and London (UK).
For more information visit www.denodo.com, follow Denodo on Twitter @denodo, or contact us to
request an evaluation copy at [email protected].
TDWI Research provides research and advice for data
professionals worldwide. TDWI Research focuses
exclusively on data management and analytics issues and
teams up with industry thought leaders and practitioners
to deliver both broad and deep understanding of the
business and technical challenges surrounding the
deployment and use of data management and analytics
solutions. TDWI Research offers in-depth research reports,
commentary, inquiry services, and topical conferences
as well as strategic planning services to user and vendor
organizations.
555 S. Renton Village Place, Ste. 700
Renton, WA 98057-3295
T 425.277.9126
F 425.687.2842
E [email protected]
tdwi.org