GC 600 SUP Sup Genai Adv Whitepaper Final V2 en
advantage:
A founder’s guide to using your data as a differentiator
Table of contents
OVERVIEW
A founder’s guide to using your data as a differentiator
GENERATIVE AI
A brief primer
Put data at the center of your generative AI approach
SECTION 1
Turn data into a differentiator for generative AI applications
SECTION 2
Establish a data foundation for generative AI
SECTION 3
Think beyond technology to create a competitive advantage
CONCLUSION
Tap into your data in a new way to deliver more value
GLOSSARY
OVERVIEW
But the true strength of generative AI goes beyond general-knowledge chatbots. Startups
across industries are just beginning to peel back the layers and uncover ways generative AI
can help them innovate. They are eager to embrace these possibilities—and for good reason.
According to Goldman Sachs, generative AI could raise the global gross domestic product
(GDP) by almost $7 trillion and increase productivity growth by 1.5 percentage points over a
10-year period. At the same time, the technology is also attracting investors, with generative
AI startups raising $27 billion in 2023, according to PitchBook.
As the technology matures, a new wave of innovative startups and technology companies are
fast-tracking the rollout of new generative AI capabilities to meet the growing demand from
both customers and businesses. This constant innovation is leading to an increase in available
generative AI use cases.
Given these findings, it is no surprise that startup founders and data leaders want to move
quickly with their own generative AI applications. They want to know not only how to take
the next best step, but also how to gain a competitive advantage in this emerging space and
attract investors. The key to unlocking the potential of generative AI is your startup’s own data.
A brief primer
At a high level, generative AI can be defined as a type of AI used to produce new content and
ideas. Generative AI applications can, for example, write stories, generate code, and design
digital images. These applications also make it possible to automate cumbersome tasks, such
as taking lengthy documents and condensing them into brief summaries.
[Figure: generative models shown as a subset of neural networks, within machine learning, within artificial intelligence.]
GENERATIVE AI
Like all AI, generative AI is powered by machine learning (ML) models—very large models
that are pretrained on massive amounts of data. These models are commonly referred to as
foundation models (FMs).
At its core, an FM uses the latest advances in ML. One class of FMs, which includes the
generative pretrained transformer (GPT) models, is commonly referred to as large language
models (LLMs). LLMs focus on language-based tasks, such as summarization, text generation,
and open-ended question answering. They contain a large number of parameters, which makes
them capable of learning advanced concepts.
Generative AI can help move these numbers in the right direction. Generative AI presents
an opportunity for you to tap into your data in a new way and get more value from it. The
technology enables you to innovate on top of your data more quickly, use this data in new types
of applications, and unlock the value of data that has traditionally been hard to work with, like
unstructured data.
We’re already seeing some of our customers combine data with generative AI to improve
business outcomes and customer experiences. For instance, Intuit built Intuit Assist, a new
generative AI–powered assistant that uses relevant contextual datasets spanning small business,
consumer finance, and tax to deliver personalized financial insights to customers.
These applications are exciting and represent perhaps just a fraction of what generative AI can
deliver for startups and their customers. The ways in which generative AI will change our world
are still coming into focus. You may be grappling with how to capture this enormous potential
when it feels as though we’re entering uncharted territory. In reality, the path to realizing
business value with generative AI is not much different than it is with any other technology. It
comes down to the strength of your data strategy and how you use your data as a differentiator
within that strategy.
You may already have a data strategy in place, or you may be just starting to build one. In either case,
there’s never been a better time than now to bring this strategy together in a way that allows
you to turbocharge your business value with generative AI. You have an unprecedented
opportunity to gain a sustainable competitive advantage by differentiating with your data.
In this whitepaper, we provide founders and their teams with insights and next steps for using
data to create generative AI applications that are unique to their startup. To innovate and
compete in this arena, you need to develop a broad data strategy that includes technology,
in addition to your business priorities and use cases, your employees, and your governance
guardrails. Taken together, this strategy represents a modern view of data that ensures you can
realize business value from your generative AI applications.
We’ll focus on three areas to help you create this modern data strategy:
1. Turning your data into a differentiator for your generative AI applications
2. Establishing the right data foundation to unlock the value of your current data through generative AI
3. Thinking beyond technology to create a competitive advantage with generative AI
SECTION 1
Your data is the difference between having a generic application and one that knows your
startup and customers deeply. Because of this, you’ll have to determine how to best use your
data to capture and showcase your business’s uniqueness. For most startups, the starting point
for deploying generative AI applications is an out-of-the-box foundation model (FM). A small
number of startups will choose to build their own FMs to power their generative AI applications,
but that requires extensive computing resources and highly specialized staff.
While FMs are powerful out of the box, they’re generalized by design. They are—as the name
implies—foundational. This means they are not tuned to your business needs because they can’t
access your most recent startup data or perform domain-specific tasks to fulfill user requests.
Your data is key to aligning generative AI applications to your customer experience, internal
knowledge, brand voice, and ethical parameters.
For instance, if you are an online travel agency that wants to provide better travel
recommendations to your customers through a generative AI application, you would likely want
to use data that is specific to your individual customers, such as past trips, web histories, and
travel preferences. You would also want to access aggregate data on similar traveler patterns
and trip inventories to create a better recommendation. By using your data, you create a
personalized and unique customer experience.
Also, out-of-the-box FMs are widely available, so customizing them with your own data allows
you to differentiate your generative AI applications. Let’s say you are also using an out-of-
the-box FM to draft marketing copy for your online travel agency. Your competitors may be
using the same model to the same effect. These models largely pull from the same general
knowledge pool. So, without customization, you could end up creating content that is nearly
identical to theirs, and vice versa.
Customization creates a sustainable competitive advantage. You can customize your FM using a
few methods, including fine-tuning and in-context learning.
Customer spotlight
INRIX, a global provider of transportation data and analytics, is building a new solution
centered on Amazon Bedrock. This will deliver up-to-date, real-time information to help
traffic and safety engineers understand where, when, and why something is happening on our
streets—and what to do about it.
Its new Amazon Bedrock solution uses Retrieval Augmented Generation, or RAG, to augment
prompts to the underlying FM with historic data like history of speeding incidents and crashes,
and recent data like congestion status and current weather conditions.
By using its own data to augment its FM in Amazon Bedrock, INRIX can provide its customers
with fast answers to complex questions like how roads should be changed to alleviate
congestion and minimize accidents, how to determine the ideal location of a new retail store,
or even how to mitigate traffic and parking issues for the next concert.
Fine-tuning
With an out-of-the-box FM, you must use your data to customize the model for your unique
business needs. Fine-tuning is a good option for domain-intensive applications, such as
technical support agents or content creation unique to your business. With Amazon Bedrock,
you can securely customize an FM with your data and use other built-in tools to build
applications that know your business, your data, and your customers.
Imagine a content marketing manager who works at a leading ecommerce startup and needs to
develop fresh, targeted ad and campaign copy for an upcoming new line of handbags. To do this,
they provide Amazon Bedrock a few labeled examples housed in a data lake on Amazon Simple
Storage Service (Amazon S3) of their best-performing taglines from past campaigns, along
with the associated product descriptions. Amazon Bedrock makes a separate copy of the base
model that is accessible only to the customer for model training. After training, Amazon Bedrock
automatically generates effective social media, display ads, and web copy for the new handbags.
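As a rough illustration of what "a few labeled examples" could look like on disk, here is a minimal Python sketch that turns product descriptions and past taglines into JSON Lines records. The `prompt` and `completion` field names, the sample records, and the `to_jsonl` helper are assumptions made for this example; check the documentation of the model you plan to customize for the exact format it expects.

```python
import json

# Hypothetical examples: product descriptions paired with past taglines.
EXAMPLES = [
    {"description": "Leather tote with brass hardware", "tagline": "Carry the day in style."},
    {"description": "Compact crossbody in recycled canvas", "tagline": "Small bag. Big plans."},
]

def to_jsonl(examples):
    """Serialize each example as one JSON object per line (JSON Lines)."""
    lines = []
    for example in examples:
        record = {
            # Assumed field names; the real schema depends on the model.
            "prompt": f"Write a tagline for: {example['description']}",
            "completion": example["tagline"],
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

training_data = to_jsonl(EXAMPLES)
```

In practice the resulting text would be written to a file in your data lake before a customization job reads it; that upload step is left out here.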
In-context learning
Foundation models are trained at a moment in time and it is not practical to fine-tune them
every time a dataset changes. Once they are trained, they’re no longer ingesting new knowledge
or data. They are also unable to locate and access real-time information if they need additional
context to solve a problem.
To make responses more relevant and contextual, you can ground your FM with data through
in-context learning, a technique in which the FM is guided to domain-specific, contextual data
either through prompt engineering or Retrieval Augmented Generation (RAG). Many businesses
will use RAG as their main method to perform in-context learning. RAG enables your FM to
access your startup’s most recent data to provide a more accurate, more relevant response.
Often, RAG uses vector embeddings, or numerical representations of words, phrases, or images.
Embeddings encode the semantic meaning of the source text or images so that FMs can more
easily find relationships between similar vectors and improve responses to prompts.
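To make the idea of comparing embeddings concrete, here is a minimal Python sketch. The three-dimensional vectors and the document names are invented for illustration; real embeddings come from an embedding model and usually have hundreds or thousands of dimensions. Cosine similarity is one common way to compare vectors and is an assumption here, not a method named in this whitepaper.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written "embeddings" (assumption: a real system would
# obtain these from an embedding model, not write them by hand).
DOCUMENT_VECTORS = {
    "beach_resorts": [0.9, 0.1, 0.0],
    "ski_lodges": [0.1, 0.9, 0.1],
    "city_breaks": [0.2, 0.2, 0.9],
}

def most_similar(query_vector, vectors):
    """Return the key whose vector is closest to the query by cosine similarity."""
    return max(vectors, key=lambda name: cosine_similarity(query_vector, vectors[name]))

query = [0.8, 0.2, 0.1]  # hypothetical embedding of "warm seaside holiday"
best_match = most_similar(query, DOCUMENT_VECTORS)
```

A vector database performs the same kind of nearest-neighbor lookup, but over many more vectors and with indexing so that it does not have to compare the query against every stored item.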
While you can use each of these techniques separately, the combination of fine-tuning and RAG
can help you turn your data into a differentiator for your generative AI applications.
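[Figure: RAG workflow. (1) The user submits a prompt and query. (2) The query is used to search knowledge sources. (3) Relevant information for enhanced context is returned. (4) The prompt, query, and enhanced context are sent to the large language model endpoint. (5) The endpoint returns a generated text response.]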
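The RAG flow described in this section (retrieve relevant information, add it to the prompt, then call the model) can be sketched in a few lines of Python. This is an illustration under stated assumptions: retrieval uses simple word overlap as a stand-in for vector search, `fake_llm` is a hypothetical placeholder for a real model endpoint, and the travel documents are invented.

```python
def retrieve(query, documents, top_k=1):
    """Rank documents by shared words with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(instructions, query, context_docs):
    """Combine the instructions, the retrieved context, and the user query."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {query}"

def fake_llm(prompt):
    """Hypothetical model endpoint: a real application would call an FM here."""
    return f"[model response to a {len(prompt)}-character prompt]"

# Invented knowledge-store contents for an online travel agency.
DOCUMENTS = [
    "Flights to Lisbon are discounted in March.",
    "Our loyalty program awards points on hotel bookings.",
    "Ski passes must be purchased before arrival.",
]

query = "Are flights to Lisbon discounted?"
context_docs = retrieve(query, DOCUMENTS)
prompt = build_prompt("Answer using only the context.", query, context_docs)
response = fake_llm(prompt)
```

The point of the design is that the model itself is unchanged; only the prompt is enriched, so updating the documents updates the answers without retraining.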
SECTION 2
Your data is key for creating value with generative AI applications. It is therefore crucial to customize
your models with high-quality, relevant, readily accessible, and available-to-use data. You meet these
benchmarks by first having a strong data foundation. This foundation includes a comprehensive,
integrated set of data services for all workloads, use cases, and types of data, in addition to tools
to govern that data. The following is a high-level overview of what this data foundation looks like:
Comprehensive
For generative AI, you need to store various types of data—including unstructured, structured,
streaming, and vector data—that can be used for building and customizing models and adding
context to prompts with or without RAG. A comprehensive set of data services makes it possible
to store all of this data and to query and analyze it at scale.
Typically, a comprehensive set of data services for generative AI includes a highly durable
and scalable data lake. This data lake stores the domain-specific data you need to build and
customize your FMs. Amazon Web Services (AWS) has been helping customers build a strong
foundation for data lakes to store structured and unstructured data with services like
Amazon S3, AWS Glue, and AWS Lake Formation for years. Customers use Amazon S3 to
create hundreds of thousands of data lakes.
A data foundation for generative AI also includes high-performance knowledge stores for RAG.
AWS provides several options depending on your use case. For instance, NoSQL databases
store conversation state and history, so a chatbot can remember prior responses. Transactional
databases store context and customer information to create more personalized responses.
You can also use a knowledge store like Amazon Kendra that connects to multiple structured
and unstructured content repositories, providing a document-based knowledge source for
your FMs. Or you can use databases that have vector search capabilities, specifically designed
to be efficient at storing and retrieving embeddings. Being able to use vector search within
the databases you already use has several advantages. For instance, it eliminates the steep
learning curve for new programming tools, APIs, and SDKs. You can also feel confident knowing
that your existing databases are proven in production and meet requirements for scalability,
availability, storage, and compute. And when your vectors and business data are stored in the
same place, your applications can run faster—and there’s no data sync or data movement to
worry about. AWS offers vector search capabilities for many of our popular data stores to give
customers even more flexibility as they build their generative AI applications.
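As one illustration of the knowledge stores mentioned above, the following Python sketch shows how conversation state might be kept per session so a chatbot can recall prior turns. The `ConversationStore` class is a hypothetical in-memory stand-in for a NoSQL table, not an AWS API; the session IDs and messages are invented.

```python
from collections import defaultdict

class ConversationStore:
    """In-memory stand-in for a NoSQL table keyed by session ID (illustrative only)."""

    def __init__(self, max_turns=10):
        # max_turns is assumed to be a positive integer.
        self.max_turns = max_turns
        self._turns = defaultdict(list)

    def append(self, session_id, role, text):
        """Record one turn and keep only the most recent max_turns turns."""
        turns = self._turns[session_id]
        turns.append({"role": role, "text": text})
        del turns[:-self.max_turns]

    def history(self, session_id):
        """Return the stored turns for a session, oldest first."""
        return list(self._turns.get(session_id, []))

store = ConversationStore(max_turns=2)
store.append("s1", "user", "Hi")
store.append("s1", "assistant", "Hello")
store.append("s1", "user", "Any beach deals?")
history = store.history("s1")
```

Capping the history is a deliberate choice in this sketch: prompts have limited room for context, so only the most recent turns are passed back to the model.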
A comprehensive data foundation accounts for both data analytics and data storage. You can
use data warehouses to customize your FMs and to support use cases that require up-to-date
operational data, such as building an application on an FM or other LLM that provides insights
on business data through natural language queries. Amazon Redshift is a fast, petabyte-scale
data warehouse delivering up to six times better price performance than other cloud data
warehouses [1]. Amazon Redshift integrates with many of your data sources, including Amazon S3
and Amazon Aurora, so you can get a more complete look at your data.
What every CEO should know about generative AI, McKinsey Digital, 2023
Integrated
Data integration gives you a complete view of your business and ensures your data is readily
accessible for your generative AI applications. With direct integrations between AWS services,
we’re reducing and eliminating the extract, transform, and load (ETL) process for common use
cases so your team can move faster. You can also use AWS Glue, our serverless and scalable
ETL and data integration service that makes it easier to discover, prepare, move, and integrate
data from multiple sources for analytics and machine learning. AWS connects to hundreds
of data sources, including software as a service, on premises, and other clouds, in addition to
third-party data from more than 300 data providers.
[1] Amazon Redshift Price Performance
With Amazon Bedrock, your data is encrypted at rest, and you can use your own AWS Key
Management Service (AWS KMS) keys, which give you full control and visibility into how you
store and access your data and custom models. With AWS PrivateLink, you can pass your data on
AWS to Amazon Bedrock exclusively through the AWS network and never over the public internet.
You can privately customize an FM from within your own virtual private cloud (VPC), so your data
stays private, and none of it is used to train or customize a model that would be available to other startups.
Amazon Titan FMs are built to detect and remove harmful content in the data you want to use
for customization, reject inappropriate content in the user input, and filter the model’s outputs
that contain inappropriate content, such as profanity or hate speech.
Earlier in this whitepaper, we also discussed the importance of customizing your FM to reflect
your specific brand and customer experience and to root out inaccurate or irrelevant content.
While customization is important to address these challenges, so is data governance in the form
of human oversight and feedback. You will need people to ensure your output reflects how you
want the world to perceive you. For example, reinforcement learning from human feedback
(RLHF) enables you to train an FM to make decisions and act while receiving guidance from
human experts. These experts look for potential complications—such as bias in the data, data
quality issues, and data gaps—and help align your FM to your brand voice, corporate guidelines,
ethics, and policies.
SECTION 3
Technology is a fantastic enabler for generative AI, but it’s only one component of a strategy to
embed data as a strategic asset in your startup.
At AWS, we take a modern view of data strategy that is more expansive in nature. An end-to-end
data strategy is one that encompasses technology in addition to mindset, people, and processes.
This combination paves the way for your team to become data-driven by weaving data into every
aspect of your business and operations.
Mindset
Mindset refers to the way a company thinks about and treats data. A startup’s mindset is reflected
in the beliefs, values, and behaviors that create its data-driven culture with aligned use cases.
Traditionally, founders have built their data strategies with the mindset that data is a platform
and a means to build solutions. This viewpoint consistently results in a mismatch between IT
investment and improved business outcomes.
As interest in generative AI grows, we’re seeing more of this type of thinking. This trend is
understandable as founders want to use generative AI and ensure they can remain competitive.
However, they must first explore how these applications can help them solve a problem
or differentiate their business. This requires approaching generative AI as an evolving data
product that adds real value. In doing so, startups can focus on customers, not solutions, and
close the critical gap between data initiatives and business results.
The flywheel speeds transformation and creates incremental value. We summarize it as “think
big, start small, and scale fast” and have illustrated it briefly in the following example:
[Figure: data flywheel (steps 1 and 2 not recoverable). Step 3: Build experience activating data products while implementing a cloud-based data foundation. Step 4: Scale with a 9–12 month road map of priority projects. Outcome: Deliver new business value.]
As an example, online beauty retailer BEAUTY BAY strives to offer an excellent customer
experience and keep its young audience current with the latest trends. It worked with
AWS Partner BJSS on a digital transformation that affected many aspects of its business.
Together, they conducted an AWS Well-Architected review of the IT estate, revamped the
order management system, and built a cloud-native data platform on AWS.
BJSS built the data platform using an everything-as-code approach and the AWS CDK, an
open-source software development framework that defines cloud application resources using
familiar programming languages. This made the platform more flexible, easier and cheaper to maintain,
and quicker to deploy, improving the IT team’s ability to innovate.
For example, Marketing Evolution built an innovative measurement and attribution solution
on AWS. To reduce time-consuming manual processes, Marketing Evolution began using
AWS Glue, a serverless data integration service that makes it simpler to discover, prepare,
migrate, and integrate data from multiple sources for analytics, ML, and application
development. Marketing Evolution cut costs and increased the efficiency of its solution,
accelerating results and increasing return on investment for customers.
Protium used Amazon Aurora to develop Turiya, its in-house lending and risk management
stack at the core of the startup’s business. Turiya gives Protium the ability to extend and
customize credit offerings to a larger user base, thanks to its AWS-powered platform. With a
workflow-based model, the team can expedite product launches and services on its platform.
Additionally, Protium can now integrate and scale across channels, all while maintaining high
availability, performance, and compliance in a cost-effective manner.
Data curator: Responsible for sourcing, collecting, and organizing high-quality datasets
that are diverse, representative, and properly labeled
Generative AI artist: Explores the creative possibilities of generative AI technologies to
produce unique and innovative art pieces, music compositions, or visual designs
AI policy and regulation specialist: Shapes policies, guidelines, and regulations that
govern the responsible development, deployment, and use of generative AI systems
Generative AI can help make it possible for more employees to innovate with data. To prevent
roadblocks and bottlenecks, you must make it easier for them to discover, consume, share,
and manage data—with appropriate policies. Traditionally, the IT department has held almost
complete control over the dissemination of data in a startup. An end-to-end data strategy pushes
this responsibility to the edges, giving it to the teams that produce and consume data. We often
refer to this model as a modern data community and see customers using it to empower more
employees to make data-driven decisions that affect their specific business function.
Currently, however, the talent pool for AI, ML, and cloud expertise is limited. Startups often
lack the skilled and diverse workforce to fully adopt their data strategies. To ensure you keep
moving ahead, you need to invest in upskilling your current workforce, including those outside
of the IT department in teams such as finance or marketing. Training these teams to interact
with data removes bottlenecks by giving them access to the right data products at the right
time to make business decisions.
Upskilling employees is an important part of your data strategy. These exercises are an
investment, but aren’t the heavy lift some founders may imagine. Again, thanks to AI/ML, you
can more easily equip employees who have all levels of technical skill with the tools to analyze
data, uncover insights, and construct narratives. For example, Amazon Q, our new generative AI
assistant, helps you in Amazon QuickSight to author dashboards and create compelling visual
stories from your dashboard data using natural language. We also announced that Amazon Q
can help you create data integration pipelines using natural language. For example, you can
ask Amazon Q to “read JSON files from S3, join on ‘accountid,’ and load into DynamoDB,” and Amazon
Q will return an end-to-end data integration job to perform this action. With Amazon Q, data
analysts, scientists, and engineers can also be more productive using generative AI text-to-SQL
functionality in Amazon Redshift to query data in your data warehouse.
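To show what a natural-language request like the one quoted above is asking for, here is a sketch of the join step alone in plain Python. It works on in-memory JSON strings rather than Amazon S3 or Amazon DynamoDB, and it is not the job Amazon Q would generate; the `inner_join` helper and every field other than `accountid` are invented for this example.

```python
import json

# Invented sample inputs standing in for JSON files read from storage.
ACCOUNTS_JSON = '[{"accountid": "a1", "name": "Acme"}, {"accountid": "a2", "name": "Globex"}]'
ORDERS_JSON = (
    '[{"accountid": "a1", "total": 40}, '
    '{"accountid": "a1", "total": 15}, '
    '{"accountid": "a3", "total": 9}]'
)

def inner_join(left, right, key):
    """Join two lists of dicts on a shared key, keeping only matching rows."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined

accounts = json.loads(ACCOUNTS_JSON)
orders = json.loads(ORDERS_JSON)
items = inner_join(accounts, orders, "accountid")  # rows ready to load into a table
```

Orders for account `a3` are dropped because no matching account exists, which is the usual behavior of an inner join; a generated job might instead use a different join type depending on how the request is phrased.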
In the previous section, we discussed some of the tools that help mitigate the common risks
associated with generative AI. But, as startup founders know, data governance is more about
strategy than it is about tool selection. Data governance needs to be architected to support your wider
data and AI strategy, but implemented incrementally based on use cases and business priorities.
Founders often tell us they find it challenging to show business value from their data
governance initiatives. When they look more broadly at their data strategy and align it with
business priorities, however, they move closer to showing this value.
A broad data strategy is important because it ensures data governance policies are in step with
organizational structures. To innovate and move quickly with generative AI, your team needs
ready access to data. They also need to operate within the confines of security, sharing, and
privacy policies. You need to strike a balance between the nonnegotiables and autonomy and
speed. You can achieve this balance when you view data governance as an enabler of your
broader data strategy.
CONCLUSION
We know that nearly every founder wants to explore the potential of generative AI, and we also
know that your next steps will be critical. Rather than use a generic application, you should focus
on what makes your business unique and use it to guide your decision-making. That uniqueness
exists within your data. To fully use your data as a strategic asset, you need to create a data strategy
that encompasses not just your technology but also your mindset, people, and process. Each layer
of this strategy firmly cements your data as a differentiator for how you build and customize your
generative AI applications and how you empower your team to innovate.
Learn how AWS makes it easy to build, scale, and realize the business
value of generative AI.
GLOSSARY
Generative AI is a type of AI that can create new content like text, code, images,
and video using patterns it has learned by training on extensive public data with
ML techniques.
Foundation models (FMs) are deep learning models trained on vast quantities
of unstructured, unlabeled data that can be used for a wide range of tasks out
of the box or adapted to specific tasks through customization.
Large language models (LLMs) make up a class of foundation models that can
process massive amounts of unstructured text and learn the relationships between
words or portions of words. This enables LLMs to generate natural language text,
performing tasks such as summarization or knowledge extraction.
©️ 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.