CONTENTS AT A GLANCE

INTRODUCTION

CHAPTER 1 Introduction
CHAPTER 2 Tuning Techniques for Cost Optimization
CHAPTER 3 Inference Techniques for Cost Optimization
CHAPTER 4 Model Selection and Alternatives
CHAPTER 5 Infrastructure and Deployment Tuning Strategies

CONCLUSION
INDEX
Large Language Model–Based Solutions
HOW TO DELIVER VALUE WITH COST-EFFECTIVE
GENERATIVE AI APPLICATIONS

Shreyas Subramanian
Copyright © 2024 by John Wiley & Sons Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.


Published simultaneously in Canada and the United Kingdom.

ISBNs: 9781394240722 (Paperback), 9781394240746 (ePDF), 9781394240739 (ePub)

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means,
electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of
the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through
payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923,
(978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission
should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201)
748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its
affiliates in the United States and other countries and may not be used without written permission. All other trademarks are
the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in
this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this
book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book
and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be
created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not
be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware
that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not
limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care
Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

If you believe you’ve found a mistake in this book, please bring it to our attention by emailing our reader support team at
[email protected] with the subject line “Possible Book Errata Submission.”
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in
electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging in Publication data available on request.

Cover image: © CSA-Printstock/Getty Images


Cover design: Wiley
To my wife, Divya Prabhakar, for her infinite patience, love,
and support. For being the spark, the melody, and the joy in
every chapter, and for always making me the hero of our story.
ABOUT THE AUTHOR

Dr. Shreyas Subramanian has been at the forefront of driving revolutionary advancements in machine
learning (ML) and artificial intelligence that resonate with businesses and researchers alike. With a PhD in
aerospace engineering from Purdue University, Dr. Subramanian currently serves as a principal data scientist at
Amazon, a position held by few people worldwide. His prolific research record includes 26 academic papers
and six patents, with significant citations to date. His two previous books in the field of AI have sold thousands
of copies, with his latest book, Applied Machine Learning and High-Performance Computing, being one of the
top 50 books covering AI sold on Amazon and one of the only books bridging the gap between HPC and AI. His
earlier AWS AI certification guide was ranked the ninth bestseller in the AI category worldwide.
With a rich and extensive career, Dr. Subramanian has championed the development and application of AI/
ML models while carving a distinct leadership path within Amazon. His achievements range from implementing
AI/ML solutions for use cases in core verticals, including manufacturing, aerospace, automotive, financial services,
and healthcare, to fundamental Artificial Intelligence research. Particularly noteworthy is his role as the creator of
the open-source ML package ezsmdeploy, which simplifies the deployment of models on the cloud to a single-line
API call and has garnered more than 50,000 downloads so far. Most recently, Dr. Subramanian has been involved
in helping train generative large language models like ChatGPT for customers of Amazon Web Services in a
cost-efficient way. This speaks volumes about his influence in democratizing ML and fostering a community of
practitioners.
Dr. Subramanian’s PhD dissertation focused on developing algorithms for complex aerospace systems
design problems. Since then, he has published several seminal papers on topics such as evolutionary algorithms,
surrogate modeling, distributed optimization, deep learning, and language modeling. Dr. Subramanian’s compre-
hensive expertise extends to academia and industry, where he has served as a reviewer for prominent journals
and conferences, contributing to the academic community. Recently, Dr. Subramanian won the Best Presentation
Award at the Pattern Recognition and Machine Learning 2023 conference for his work on a novel scheduler for
faster language model training. He has also been an invited judge and session chair for major conferences such as
IEEE, INFORMS, and AIAA.
Dr. Subramanian’s research has attracted significant interest from government funding agencies. He was
invited to serve on five NSF review panels on artificial intelligence to evaluate proposals worth up to $2 million
in Small Business Innovation Research grants for startups and small businesses. One of Dr. Subramanian’s signifi-
cant contributions lies in his ability to secure funding for pioneering projects in topics related to applied machine
learning. His skill in proposal writing secured more than $4.6 million in funding from NASA while he was the
director of research at a NASA subcontractor, where he helped identify and solve problems related to aviation
safety using AI/ML tools on the cloud. Dr. Subramanian exemplifies leadership in the AI research community with
elite academic credentials and impactful real-world contributions. He was recently nominated and selected to be
an IEEE senior member, a distinction held by only 8% of IEEE’s 400,000+ members worldwide.
In his current role as a principal data scientist at Amazon, Dr. Subramanian’s contributions have led to sub-
stantial cost savings for numerous businesses. His efforts in architecting, building, and scaling large ML models
have resulted in remarkable annual savings of hundreds of thousands of dollars for clients. Moreover, his guid-
ance has led to the success of end-to-end advanced driver assistance systems (ADASs) and self-driving car pro-
jects, underpinning the vital intersection of ML and automotive technology, which is currently considered a key
milestone in the field of AI. At Amazon, Dr. Subramanian leads a team of machine learning solutions architects
and researchers across several projects. Internally at Amazon, several of his ideas have been incorporated into
new product features for Amazon’s machine learning services. By identifying areas of cost optimization within
machine learning operations, Dr. Subramanian has collectively saved millions of dollars for clients. For example,
he reduced production costs by 8% per quarter for one of the world’s largest contract manufacturers, saving
millions of dollars. In another instance, Dr. Subramanian reduced the cost of tuning a large number of models
for a customer by more than 99%, from hundreds of thousands of dollars per year to just dozens.

This extreme interest in applying cost optimization principles to “do more with less” has led to this book on
optimizing performance with cost in the era of large language models.
Dr. Subramanian continues publishing cutting-­edge papers in the field of AI, filing high-­value patents, writing
books with a unique viewpoint, and speaking at major AI conferences.

ABOUT THE TECHNICAL EDITOR

Rabi Jay is a renowned expert in digital transformation and enterprise AI, boasting more than 15 years of rich
experience in guiding businesses through the complexities of technology-­driven change. His expertise encom-
passes a wide range of areas, including AI-driven martech innovation, platform modernization, enterprise asset
consolidation, and efficiency enhancement through automated workflows. Jay’s proficiency is further reinforced
by an impressive array of certifications spanning AWS, Azure, SAP, ITIL, TOGAF, and SAFe Agile, demonstrating
his comprehensive understanding of both the technical and strategic aspects of digital transformation.
Beyond his technical acumen, Jay has demonstrated exceptional leadership and global strategic insight as a
global alliance manager with Deloitte. He skillfully leads large-scale, multinational projects across diverse sectors
such as retail, food, consumer products, aerospace, and software technology. As a VP of digital transformation,
he championed an integrated practice using human-centered design, AI platforms, and change management built
upon the principles of design thinking and process reengineering. An accomplished author and speaker, Jay has
contributed significantly to the thought leadership on AI and cloud technologies, with notable books including
SAP NetWeaver Portal Technology: The Complete Reference (McGraw-Hill, 2008) and Enterprise AI in the
Cloud: A Practical Guide to Deploying End-to-End Machine Learning and ChatGPT Solutions (Wiley, 2024).
His LinkedIn newsletter, “Enterprise AI Transformation: Playbook for Professionals and Businesses to Implement
AI,” is a testament to his passion for sharing knowledge and best practices in generative AI, cloud adoption, and
AI implementation. Outside his professional pursuits, Jay is an avid traveler, golfer, ping-pong enthusiast, and
dedicated self-­development coach with a keen interest in yoga and meditation.
CONTENTS

INTRODUCTION

CHAPTER 1: INTRODUCTION

Overview of GenAI Applications and Large Language Models
The Rise of Large Language Models
Neural Networks, Transformers, and Beyond
GenAI vs. LLMs: What’s the Difference?
The Three-Layer GenAI Application Stack
The Infrastructure Layer
The Model Layer
The Application Layer
Paths to Productionizing GenAI Applications
Sample LLM-Powered Chat Application
The Importance of Cost Optimization
Cost Assessment of the Model Inference Component
Cost Assessment of the Vector Database Component
Benchmarking Setup and Results
Other Factors to Consider
Cost Assessment of the Large Language Model Component
Summary

CHAPTER 2: TUNING TECHNIQUES FOR COST OPTIMIZATION

Fine-Tuning and Customizability
Basic Scaling Laws You Should Know
Parameter-Efficient Fine-Tuning Methods
Adapters Under the Hood
Prompt Tuning
Prefix Tuning
P-tuning
IA3
Low-Rank Adaptation
Cost and Performance Implications of PEFT Methods
Summary

CHAPTER 3: INFERENCE TECHNIQUES FOR COST OPTIMIZATION

Introduction to Inference Techniques
Prompt Engineering
Impact of Prompt Engineering on Cost
Estimating Costs for Other Models
Clear and Direct Prompts
Adding Qualifying Words for Brief Responses
Breaking Down the Request
Example of Using Claude for PII Removal
Conclusion
Providing Context
Examples of Providing Context
RAG and Long Context Models
Recent Work Comparing RAG with Long Context Models
Conclusion
Context and Model Limitations
Indicating a Desired Format
Example of Formatted Extraction with Claude
Trade-Off Between Verbosity and Clarity
Caching with Vector Stores
What Is a Vector Store?
How to Implement Caching Using Vector Stores
Conclusion
Chains for Long Documents
What Is Chaining?
Implementing Chains
Example Use Case
Common Components
Tools That Implement Chains
Comparing Results
Conclusion
Summarization
Summarization in the Context of Cost and Performance
Efficiency in Data Processing
Cost-Effective Storage
Enhanced Downstream Applications
Improved Cache Utilization
Summarization as a Preprocessing Step
Enhanced User Experience
Conclusion
Batch Prompting for Efficient Inference
Batch Inference
Experimental Results
Using the accelerate Library
Using the DeepSpeed Library
Batch Prompting
Example of Using Batch Prompting
Model Optimization Methods
Quantization
Code Example
Recent Advancements: GPTQ
Parameter-Efficient Fine-Tuning Methods
Recap of PEFT Methods
Code Example
Cost and Performance Implications
Summary
References

CHAPTER 4: MODEL SELECTION AND ALTERNATIVES

Introduction to Model Selection
Motivating Example: The Tale of Two Models
The Role of Compact and Nimble Models
Examples of Successful Smaller Models
Quantization for Powerful but Smaller Models
Text Generation with Mistral 7B
Zephyr 7B and Aligned Smaller Models
CogVLM for Language-Vision Multimodality
Prometheus for Fine-Grained Text Evaluation
Orca 2 and Teaching Smaller Models to Reason
Breaking Traditional Scaling Laws with Gemini and Phi
Phi 1, 1.5, and 2B Models
Gemini Models
Domain-Specific Models
Step 1 - Training Your Own Tokenizer
Step 2 - Training Your Own Domain-Specific Model
More References for Fine-Tuning
Evaluating Domain-Specific Models vs. Generic Models
The Power of Prompting with General-Purpose Models
Summary

CHAPTER 5: INFRASTRUCTURE AND DEPLOYMENT TUNING STRATEGIES

Introduction to Tuning Strategies
Hardware Utilization and Batch Tuning
Memory Occupancy
Strategies to Fit Larger Models in Memory
KV Caching
PagedAttention
How Does PagedAttention Work?
Comparisons, Limitations, and Cost Considerations
AlphaServe
How Does AlphaServe Work?
Impact of Batching
Cost and Performance Considerations
S3: Scheduling Sequences with Speculation
How Does S3 Work?
Performance and Cost
Streaming LLMs with Attention Sinks
Fixed to Sliding Window Attention
Extending the Context Length
Working with Infinite Length Context
How Does StreamingLLM Work?
Performance and Results
Cost Considerations
Batch Size Tuning
Frameworks for Deployment Configuration Testing
Cloud-Native Inference Frameworks
Deep Dive into Serving Stack Choices
Batching Options
Options in DJL Serving
High-Level Guidance for Selecting Serving Parameters
Automatically Finding Good Inference Configurations
Creating a Generic Template
Defining an HPO Space
Searching the Space for Optimal Configurations
Results of Inference HPO
Inference Acceleration Tools
TensorRT and GPU Acceleration Tools
CPU Acceleration Tools
Monitoring and Observability
LLMOps and Monitoring
Why Is Monitoring Important for LLMs?
Monitoring and Updating Guardrails
Summary

CONCLUSION
INDEX
Introduction

WHAT’S IN THIS CHAPTER?

➤ GenAI Applications and Large Language Models
➤ Importance of Cost Optimization
➤ Micro Case Studies
➤ Who Is This Book For?

GenAI APPLICATIONS AND LARGE LANGUAGE MODELS


Large language models (LLMs) have evolved to become a cornerstone in the domain of text-based content
generation. They can produce coherent and contextually relevant text for a variety of applications, making them
invaluable assets in today’s digital landscape. One notable example is OpenAI’s GPT-4, which reportedly ranked
in the 90th percentile of human test takers on the Uniform Bar Examination, showcasing its advanced language
understanding and generation capabilities. Generative AI tools (like ChatGPT, for example) may use LLMs, but
also other kinds of large models (e.g., foundational vision models). These models serve as the backbone for many
modern applications, facilitating a multitude of tasks that would otherwise require substantial human effort for
building bespoke, application-specific models. The capabilities of these models to understand, interpret, and gen-
erate human-like text are not only pushing the boundaries of what’s achievable with AI but also unlocking new
avenues for innovation across different sectors. To reemphasize what’s already obvious, Figure 1 shows Google
Trends interest over time for the term Generative AI worldwide.

FIGURE 1: Google Trends chart of interest over time for the term Generative AI worldwide
Introduction

Generative AI (GenAI) and LLMs represent two interlinked domains within artificial intelligence, both focusing
on content generation but from slightly different angles. GenAI encompasses a broader category of AI technolo-
gies aimed at creating original content. While LLMs excel at text processing and production, GenAI places a
broader emphasis on creativity and content generation across different mediums. Understanding the distinctions
and potential synergies between these two areas is crucial to fully harness the benefits of AI in various applica-
tions, ranging from automated customer service and content creation to more complex tasks such as code genera-
tion and debugging. This field has seen rapid advancements, enabling enterprises to automate intelligence across
multiple domains and significantly accelerate innovation in AI development. On the other hand, LLMs, being a
subset of GenAI, are specialized in processing and generating text. They have demonstrated remarkable capabili-
ties, notably in natural language processing tasks and beyond, with a substantial influx of research contributions
propelling their success.
The proliferation of LLMs and GenAI applications has been fueled by both competitive advancements and col-
laborative efforts within the AI community, with various stakeholders including tech giants, academic institutions,
and individual researchers contributing to the rapid progress witnessed in recent years. In the following sections,
we will talk about the importance of cost optimization in this era of LLMs, explore a few case studies of success-
ful companies in this area, and describe the scope of the rest of the book.

IMPORTANCE OF COST OPTIMIZATION


The importance of cost optimization in the development and operation of GenAI applications and LLMs cannot
be overstated. Cost can ultimately make or break the progress toward a company’s adoption of GenAI. This
necessity stems from various aspects of these technologically advanced models. GenAI and LLMs are resource-
intensive by nature, necessitating substantial computational resources to perform complex tasks. Training state-
of-the-art LLMs such as OpenAI’s GPT-3 can involve weeks or even months of high-performance computing. This
extensive computational demand translates into increased costs for organizations leveraging cloud infrastructure
and operating models.
The financial burden of developing GenAI models is considerable. For instance, McKinsey estimates that develop-
ing a single generative AI model costs up to $200 million, with up to $10 million required to customize an exist-
ing model with internal data and up to $2 million needed for deployment. Moreover, the cost per token generated
during inference for newer models like GPT-4 is estimated to be 30 times more than that of GPT-3.5, showing a
trend of rising costs with advancements in model capabilities. The daily operational cost for running large models
like ChatGPT is significant as well, with OpenAI reported to spend $700,000 daily to maintain the model’s
operations.
GenAI models require high utilization of specialized hardware like graphics processing units (GPUs) and tensor
processing units (TPUs) to accelerate model training and inference. These specialized hardware units come at a
premium cost in cloud infrastructure, further driving up the expenses. Companies trying to do this on-premises,
without the help of cloud providers, may need a significant, up-front capital investment.
Beyond compute requirements, large-scale, high-performance data storage is imperative for training and fine-
tuning GenAI models, with the storage and management of extensive datasets incurring additional cloud storage
costs. As AI models evolve and adapt to ever-increasing stores of data (like the Internet), ongoing storage require-
ments further contribute to overall expenses. This is why scalability poses a significant challenge in cost optimiza-
tion. Rapid scaling to accommodate the resource demands of GenAI applications can lead to cost inefficiencies if
not managed effectively. Overscaling can result in underutilized resources and unnecessary expenditure, whereas
underscaling may hinder model performance and productivity.
Strategies to optimize costs while scaling GenAI in large organizations include prioritizing education across
all teams, creating spaces for innovation, and reviewing internal processes to adapt for faster innovation
where possible.
Pre-training a large language model to perform fundamental tasks serves as a foundation for an AI system, which
can then be fine-tuned at a lower cost to perform a wide range of specific tasks. This approach aids in cost opti-
mization while retaining model effectiveness for specific tasks.

Conducting a thorough cost-value assessment to rank and prioritize GenAI implementations based on potential
impact, cost, and complexity can lead to better financial management and realization of ROI in GenAI initiatives.
Lastly, the most common pattern seen today is for “model providers” to shoulder the up-front spend and recoup
their costs by providing an API, and for “model consumers” to heavily optimize their costs by using GenAI model
APIs without the need for any up-front investment or even data.

Challenges and Opportunities


The pathway to cost optimization in GenAI applications with large language models is laden with both challenges
and opportunities. These arise from the inherent complexities of the models and the evolving landscape of AI
technologies. The following are the principal challenges and the accompanying opportunities in this domain:

Computational demands: LLMs like GPT-3 or BERT require substantial computational resources for training
and inference. The high computational demands translate to increased operational costs and energy consumption,
which may create barriers, especially for small to medium-sized enterprises (SMEs) with limited resources.
Opportunity: The challenge of computational demands opens the door for innovation in developing more effi-
cient algorithms, hardware accelerators, and cloud-based solutions that can reduce the cost and energy footprint
of operating LLMs.
Model complexity: The complexity of LLMs, both in terms of architecture and the amount of training data
required, presents challenges in achieving cost optimization. The model’s size often correlates with its perfor-
mance, with larger models generally delivering better results at the expense of increased costs.
Opportunity: This challenge catalyzes the exploration and adoption of techniques such as model prun-
ing, quantization, and knowledge distillation that aim to reduce model size while retaining or even enhancing
performance.
Data privacy and security: Handling sensitive data securely is a paramount concern, especially in sectors such as
healthcare and finance. The cost of ensuring data privacy and security while training and deploying LLMs can be
significant.
Opportunity: The necessity for robust data privacy and security solutions fosters innovation in privacy-pre-
serving techniques, such as federated learning, differential privacy, and encrypted computation.
Scalability: Scaling GenAI applications to accommodate growing data and user demands without a proportional
increase in costs is a formidable challenge.
Opportunity: This challenge drives the advancement of scalable architectures and technologies that allow for
efficient scaling, such as microservices, container orchestration, and serverless computing.
Model generalizability and domain adaptation: Achieving high performance on domain-specific tasks often
requires fine-tuning LLMs with additional data, which can be cost-intensive.
Opportunity: This creates a niche for developing techniques and frameworks that facilitate efficient domain
adaptation and transfer learning, enabling cost-effective customization of LLMs for various domain-specific
applications.
Evolving regulatory landscape: The regulatory landscape surrounding AI and data usage is continually evolving,
potentially incurring compliance costs.
Opportunity: The dynamic regulatory environment stimulates the development of adaptable AI systems and
compliance monitoring tools that can mitigate the risks and costs associated with regulatory compliance.
Each of these challenges, while posing hurdles, concurrently lays the groundwork for innovation and advance-
ments that can significantly contribute to cost optimization in GenAI applications with large foundational mod-
els. The confluence of these challenges is an important factor in propelling the field of GenAI forward, fostering
the development of cost-effective, efficient, and robust GenAI packages, software, and solutions. The myriad
of factors contributing to the high costs in the development, deployment, and operation of GenAI and LLMs
necessitates a structured approach toward cost optimization to ensure the sustainable adoption and scalability of
these transformative technologies. This book dives into the details of what makes GenAI applications powerful
but costly and highlights several aspects of balancing performance with cost to ensure the success of organiza-
tions that make use of large foundational models. Next, we will look at a few case studies as motivation for the
rest of the book.

MICRO CASE STUDIES


This section focuses on three different companies that have “walked the walk” in terms of putting large models
in production. What “in production” means is different for different companies, as you will see in the following
studies. The case studies should provide a glimpse into the kind of effort and investment required to be involved
in the deployment and production usage of foundational models like LLMs in the form of GenAI applications.

OpenAI: Leading the Way


Founded in 2015, OpenAI embarked on a mission to ensure that artificial general intelligence (AGI) benefits
all of humanity. Initially operating as a nonprofit, it pledged to collaborate freely with other institutions and
researchers, making its patents and research public. The early years saw the launch of OpenAI Gym and Universe,
platforms dedicated to reinforcing learning research and measuring AI’s general intelligence across a spec-
trum of tasks.
As AI technology advanced, OpenAI rolled out GPT-1 in 2018, marking its venture into robust language models.
GPT-1, with 117 million parameters, showcased the potential of generating coherent language from prompts,
although it had its limitations such as generating repetitive text. Addressing these challenges, OpenAI unveiled
GPT-2 in 2019 with 1.5 billion parameters, offering improved text generation capabilities. In 2020, the release of
GPT-3, a behemoth with 175 billion parameters, set a new standard in the NLP realm. GPT-3’s ability to generate
sophisticated responses across a variety of tasks and create novel content such as computer code and art show-
cased a significant leap in AI capabilities.
By late 2022, OpenAI transitioned ChatGPT to GPT-3.5 and eventually introduced GPT-4 in March 2023,
further enhancing the system’s multimodal capabilities and user engagement with a subscription model, ChatGPT
Plus. OpenAI’s trajectory has been significantly bolstered by robust financial backing, amassing a total of $11.3
billion in funding over 10 rounds until August 2023. Noteworthy is the $13 billion investment from Microsoft,
which has provided not only a substantial financial runway but also strategic partnerships in various ventures.
OpenAI operates on a pricing model hinging on cost per request and monthly quotas, providing a straightforward
and flexible pricing structure for its users. The pricing varies with the type of model, with distinct models like
OpenAI Ada and OpenAI Babbage priced differently for different use cases. The revenue landscape of OpenAI is
on an upswing, with projections indicating a surge from $10 million in 2022 to $200 million in 2023, and a stag-
gering $1 billion by 2024.
OpenAI’s CEO, Sam Altman, revealed a revenue pace crossing a $1.3 billion annualized rate, demonstrating a
significant revenue potential with the growing user base and subscription services. The launch of ChatGPT saw
a rapid user base expansion, reaching 100 million monthly active users within just two months post-launch.
Moreover, the introduction of a paid subscription service, ChatGPT Plus, didn’t deter the growth, indicating a
strong user willingness to pay for enhanced services. The substantial user engagement, especially from large rev-
enue companies, correlates directly with the rising revenue trajectory.
OpenAI’s journey elucidates a nuanced navigation through technological advancements, financial fortification,
and a user-centric operational model. The continual investment in cutting-edge AI models, coupled with a grow-
ing user base and strategic financial backing, underscores OpenAI’s substantial impact in the AI domain and its
potential for further revenue generation and technological innovation.

Hugging Face: Open-Source Community Building


Founded in 2016, Hugging Face pioneered an open ecosystem for natural language processing (NLP) based on
sharing pre-trained models. By 2022, its website hosted more than 100,000 daily active users accessing a broad
range of AI capabilities. However, the emergence of LLMs—AI systems with billions of parameters—threatened
Hugging Face’s ability to support user growth economically. This case examines how Hugging Face adapted its
platform architecture and operations to scale out and serve massive user demand while keeping costs contained
even as model sizes exploded.
In recent years, AI models have grown exponentially larger. For example, OpenAI’s GPT-3 contained 175 billion
parameters in 2020. The trend accelerated in 2021 and 2022, with models reaching trillions of parameters. Practi-
cally, we see that this vertical scaling to larger and larger models may not be sustainable, so several companies are
considering hosting a collection of large models (as opposed to one very large model). These LLMs demonstrated new
NLP capabilities but required massive compute resources for training and inference. For Hugging Face, LLMs
presented a dilemma. Users expected access to cutting-edge models like GPT-3, but running them required costly
cloud computing resources. As a small startup, Hugging Face had limited ability to absorb these costs, especially
as user counts approached six figures. Providing LLMs through their existing infrastructure would force Hugging
Face to either restrict access, pass costs to users, or operate at a loss. A new approach was needed to economically
scale out AI: optimizing model hosting. Hugging Face’s first initiative focused on optimizing their model hosting
architecture. In their original setup, models were stored together with code in a monolithic GitHub repository.
This might have worked initially but did not allow computational separation of storage and inference. Engineers
redesigned the architecture as microservices, splitting storage and compute. Models were moved to scalable cloud
object storage like S3, while compute happened in isolated containers on demand. This allowed independently
scaling storage and compute to match user demand. Large models could be affordably stored while compute
scaled elastically with usage.
Next, Hugging Face optimized inference itself. Out-of-the-box PyTorch and TensorFlow were flexible but slow.
So, engineers created optimized model servers that reduced overhead. For example, request batching allowed
amortizing costs over multiple inferences. Execution was also streamlined by eliminating excess framework code.
Together, these optimizations reduced compute requirements by up to 3x. Additional savings came from aggres-
sively right-sizing instances. Usage patterns and models were analyzed to select ideal CPU/GPU configurations.
The result was inference costs cut by nearly 80% compared to off-the-shelf solutions.
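
The request-batching idea can be sketched in a few lines of Python; the queue protocol (each request is a dict with an "input" payload and a "reply" callback) and the run_model callable below are simplified stand-ins of our own, not Hugging Face's actual serving code:

import queue
import time

def serve(requests, run_model, max_batch=8, max_wait=0.05):
    """Collect requests briefly, then amortize one model pass over the batch."""
    while True:
        batch = [requests.get()]                 # block until a request arrives
        deadline = time.monotonic() + max_wait
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = run_model([r["input"] for r in batch])  # one batched inference
        for r, out in zip(batch, outputs):
            r["reply"](out)                      # hand each caller its own result
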
Democratizing access came next. Despite the optimizations above, LLMs still carried high compute costs. To further
reduce expenses, Hugging Face deployed aggressive caching: once a model produced an output for a given input,
the result was cached. Subsequent identical requests reused the cached output rather than rerunning inference.
Popular models saw cache hit rates above 90%, greatly reducing compute needs. This worked thanks to Hugging
Face’s scale; similar inputs recurred frequently across the large user base. Caching allowed democratizing access
to expensive LLMs that would otherwise be available to only a few users. The cache layer also added monitoring
capabilities for usage insights.
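
A minimal version of such an inference cache might look like the following sketch; the in-memory dict stands in for a shared store such as Redis, and the approach assumes deterministic decoding so that identical requests yield identical outputs:

import hashlib
import json

cache = {}  # in production, a shared key-value store rather than a local dict

def cached_generate(model_id, prompt, params, generate_fn):
    """Return a cached output for an identical (model, prompt, params) request."""
    key = hashlib.sha256(
        json.dumps([model_id, prompt, params], sort_keys=True).encode()
    ).hexdigest()
    if key in cache:
        return cache[key]                      # cache hit: no inference cost
    output = generate_fn(prompt, **params)     # cache miss: run the model once
    cache[key] = output
    return output
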
As usage grew, Hugging Face needed further scalability. Its final strategy was pooling community resources via a
federated compute network. Users could volunteer spare computing power in return for platform credit. Requests
were dynamically routed to volunteer resources based on load, geographic proximity, and costs. This federated
architecture achieved almost unlimited scale at low costs by tapping underutilized capacity. Volunteers benefited
by earning credits for their own platform usage. The network was unified through a blockchain-based coordina-
tion layer for secure decentralized orchestration. Hugging Face’s architectural optimizations and federated model
enabled scaling to serve more than 100,000 daily users at just $0.001 inference cost per request. Despite expo-
nential LLM growth, costs remained contained through efficiency gains. Platform contributions also increased as
volunteers shared resources in exchange for credits.
This scalable, open-source oriented approach unlocked AI for the entire community. By innovatively pooling
collective capacity, Hugging Face democratized access to capabilities once available only to tech giants. This story
provides lessons for sustainably scaling out AI alongside the relentless growth in model size and complexity.

Bloomberg GPT: LLMs in Large Commercial Institutions


Bloomberg, known worldwide for its financial data and analytics, took a big step by developing its large language
model called Bloomberg GPT. This was driven by the growing need for better NLP capabilities in finance to help
with decision-making and customer interactions.
Bloomberg’s venture into the realm of LLMs represents a forward-thinking endeavor to harness the potential of
AI in financial analytics and services. With an ambitious goal, Bloomberg aimed to develop a model capable of
understanding and generating human-like text, tailored to the financial sector’s nuanced needs. The project was
not only a technological endeavor but also a strategic move to stay ahead in the highly competitive financial
information services arena.
The model, boasting 50 billion parameters, is a testament to Bloomberg’s commitment to cutting-edge innovation.
This extensive model size necessitated a significant investment in computational resources. The training phase
consumed a staggering 1.3 million hours of GPU time, showcasing the intensive computational demand that large
language models entail. Yet, it was a necessary venture to develop a model with a deep understanding of financial
lexicon and concepts.
Bloomberg’s approach was unique. The company engaged in reinforcement learning from human feedback
(RLHF), a method that utilized human feedback to fine-tune the model iteratively. This approach enabled the
model to better understand and generate financial text, improving its performance significantly over several itera-
tions. The in-house development allowed for a tailored approach, ensuring the model’s alignment with Bloomb-
erg’s specific requirements in financial analytics and reporting.
The financial commitment to this project was substantial, reflecting Bloomberg’s strategic investment in AI as
a long-term asset. While the exact figures remain undisclosed, industry estimates place the development of such
models in the range of tens to hundreds of millions of dollars. The investment extends beyond the model itself
to a robust infrastructure capable of supporting the model’s computational demands and the talent required to
develop and maintain such a sophisticated AI system.
The ability to provide insightful financial analytics and generate human-like text proved to be a valuable asset,
offering a competitive advantage in the fast-paced financial services sector. Several months after the publication of
the model, no other organization of the same scale has publicly announced a competitive foundational model for
finance. The model’s success also demonstrates the significant potential and value that large language models hold
in specialized domains.
As of this writing, Bloomberg plans to commercialize this technology by integrating it into its existing suite of
financial analytics tools. The model will power new features, providing more in-depth insights and analytics to
Bloomberg’s clientele. Additionally, the model serves as a foundation for future internal and external-facing AI
projects, showcasing the company’s capability and commitment to leveraging AI for better financial analysis and
decision-making.
The Bloomberg GPT project underscores the substantial financial and computational investments required to
develop specialized large language models. It also illustrates the strategic importance of AI in the financial sector,
not only as a tool for better analytics but as a competitive differentiator in a market where timely and accurate
information is crucial.

WHO IS THIS BOOK FOR?


This book was crafted with a broad spectrum of readers in mind, encompassing a range of individuals who are
either enthralled by the promise of GenAI or actively engaged in its exploration and application. Whether you are
a budding enthusiast, a citizen data scientist, a seasoned researcher, a rockstar engineer, or a visionary decision-
maker, this book has insights that can help you along the pathway to cost-effective GenAI applications.

AI practitioners: For those immersed in the day-to-day endeavor of building, tuning, and deploying AI models,
this book offers a collection of strategies and techniques for cost optimization, helping to maximize the value and
impact of your work while minimizing expenditure.
Researchers: Academics and researchers delving into the frontiers of GenAI and large language models will find
a structured discourse on the economic aspects that underpin the practical deployment of research findings. This
book aims to bridge the chasm between academic exploration and real-world application, shedding light on cost-
effectiveness as a critical vector.
Engineers: Engineers standing at the confluence of software, hardware, and AI will discover a wealth of knowl-
edge on how to architect, implement, and optimize systems for cost efficiency while harnessing the potential of
large language models.
Educators and students: Educators aiming to equip students with a holistic understanding of GenAI will find this
book a valuable resource. Similarly, students aspiring to delve into this exciting domain will garner a pragmatic
understanding of the cost dynamics involved.
Tech enthusiasts: If you are captivated by the unfolding narrative of AI and its potential to shape the future, this
book offers a lens through which you can appreciate the economic dimensions that are integral to making this
promise a reality.
Policy makers: Those engaged in shaping the policy framework around AI and data utilization will find insightful
discussions on the cost considerations that are imperative for fostering a sustainable and inclusive AI ecosystem.
Decision-makers: For decision-makers steering the strategic direction of organizations, this book provides a lucid
understanding of the economic landscape of GenAI applications. It elucidates the cost implications, risks, and
opportunities that accompany the journey toward leveraging GenAI for business advantage.
In essence, this book caters to a large and diverse readership, aiming to engender a nuanced understanding of cost
optimization in the realm of GenAI and large language models. Through a blend of technical exposition, real-
world case studies, and strategic insights, it seeks to foster an informed dialogue and pragmatic action toward
cost-effective and responsible AI deployment.

SUMMARY
This chapter introduced the world of GenAI and LLMs and highlighted the importance of cost optimization.
It presented three micro case studies to help you further understand what it takes for even large, well-funded
organizations to achieve scale while controlling costs.

CHAPTER 1: Introduction
WHAT’S IN THIS CHAPTER?

➤ Overview of GenAI Applications and Large Language Models
➤ Paths to Productionizing GenAI Applications
➤ The Importance of Cost Optimization

OVERVIEW OF GenAI APPLICATIONS AND LARGE LANGUAGE MODELS
In this section, we introduce GenAI applications and large language models.

The Rise of Large Language Models


Large language models (LLMs) have become a cornerstone of artificial intelligence (AI) research and appli-
cations, transforming the way we interact with technology and enabling breakthroughs in natural language
processing (NLP). These models have evolved rapidly, with their origins dating back to the 1950s and
1960s, when researchers at IBM and Georgetown University developed a system to automatically translate
a collection of phrases from Russian to English. The early pioneers were optimistic that human-level intel-
ligence would soon be within reach. However, building thinking machines akin to the human mind proved
more challenging than anticipated. In the initial decades, research in AI was focused on symbolic reasoning
and logic-based systems. But these early AI systems were quite brittle and limited in their capabilities. They
struggled with commonsense knowledge and making inferences in the real world.
By the 1980s, AI researchers realized that rule-based programming alone could not replicate the versatil-
ity and robustness of human intelligence. This led to the emergence of machine learning techniques, where
algorithms are trained on large amounts of data to pick up statistical patterns. Instead of hard-coding
complex rules, the key idea was to have systems automatically learn from experience and improve their
performance. Machine learning enabled progress in specialized domains such as computer vision and
speech recognition. But the overarching goal of achieving artificial general intelligence remained distant.
The limitations of earlier approaches led scientists to look at AI through a new lens. Rather than explicit
programming, perhaps deep learning neural networks could be the answer. Neural networks are com-
puting systems inspired by the biological neural networks in the human brain. They consist of layers of
interconnected nodes that transmit signals between input and output. By training on huge amounts of data, these
multilayered networks could potentially learn representations and patterns too complex for humans to hard-code
using rules.

NOTE Language is a complex and intricate system of human expressions governed by
grammatical rules. It therefore poses a significant challenge to develop capable AI algo-
rithms for comprehending and grasping a language. Language modeling is one of the major
approaches to advancing machine language intelligence. In general, language modeling aims
to model the generative likelihood of word sequences, so as to predict the probabilities of
future (or missing) tokens. Language modeling research has received extensive attention in
the literature, which can be divided into four major development stages: statistical language
models (SLMs), neural language models (NLMs), pre-trained language models (PLMs), and
large language models.
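
In standard textbook notation (not taken from this book's own material), language modeling factorizes the likelihood of a token sequence w_1, ..., w_T by the chain rule and trains the model to assign high probability to each next token given its prefix:

p(w_1, \ldots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \ldots, w_{t-1})

Training then minimizes the corresponding negative log-likelihood, \mathcal{L} = -\sum_{t=1}^{T} \log p(w_t \mid w_{<t}).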

In the 2010s, deep learning finally enabled a breakthrough in AI capabilities. With sufficient data and comput-
ing power, deep neural networks achieved remarkable accuracy in perception tasks such as image classification
and speech recognition. However, these systems were narrow in scope, focused on pattern recognition in specific
domains. Another challenge was that they required massive labeled datasets for supervised training. Obtaining
such rich annotation at scale for complex cognitive tasks proved infeasible.
This is where self-supervised generative modeling opened new possibilities. By training massive neural network
models to generate representations from unlabeled data itself, systems could learn powerful feature representa-
tions. Self-supervised learning could scale more easily by utilizing the abundant digital data available on the
Internet and elsewhere. Language modeling emerged as a promising approach, where neural networks are trained
to predict the next word in a sequence of text.
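To make the next-word objective concrete, here is a minimal PyTorch sketch (a toy model of our own construction, not code from any system discussed here); the vocabulary size, dimensions, and random tokens are placeholders:

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)             # logits: (batch, seq_len, vocab_size)

model = TinyLM()
tokens = torch.randint(0, vocab_size, (8, 16))   # stand-in for real token IDs

logits = model(tokens[:, :-1])               # predictions for positions 1..15
targets = tokens[:, 1:]                      # the "next word" at each position
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                              # gradients for one training step
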

Neural Networks, Transformers, and Beyond


Language modeling has been studied for decades using statistical methods like n-gram models. But neural net-
work architectures were found to be much more effective, leading to the field of neural language modeling. Word
vectors trained with language modeling formed useful representations that could be leveraged for various natural
language processing tasks.
Around 2013, an unsupervised learning approach called word2vec became popular. It allowed efficiently training
shallow neural networks to generate word embeddings from unlabeled text data. The word2vec embeddings were
useful for downstream NLP tasks when used as input features. This demonstrated the power of pre-training word
representations on large textual data.
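As an illustration, a word2vec model can be trained in a few lines with the gensim library; the tiny corpus and hyperparameters below are purely illustrative, whereas real embeddings of this kind were trained on billions of tokens:

from gensim.models import Word2Vec

sentences = [
    ["language", "models", "predict", "the", "next", "word"],
    ["word", "embeddings", "capture", "meaning"],
    ["neural", "networks", "learn", "patterns", "from", "text"],
]

# sg=1 selects the skip-gram variant; vector_size is the embedding width.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vector = model.wv["word"]                  # a 50-dimensional embedding
neighbors = model.wv.most_similar("word")  # nearest words by cosine similarity
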
The next major development was the proposal of ELMo by Allen Institute researchers in 2018. ELMo introduced
deep contextualized word representations using pre-trained bidirectional long short-term memory (LSTM). The
internal states of the bidirectional LSTM (BiLSTM) over a sentence were used as powerful context-based word
embeddings. ELMo embeddings led to big performance gains in question answering and other language under-
standing tasks.
Later in 2018, Google AI proposed the revolutionary Bidirectional Encoder Representations from Transformers (BERT) model.
BERT is a novel self-attention neural architecture. BERT introduced a new pre-training approach called masked
language modeling on unlabeled text. The pre-trained BERT model achieved huge performance gains across
diverse NLP tasks by merely fine-tuning on task datasets.
The immense success of BERT established the “pre-train and fine-tune” paradigm in NLP. Many more
transformer-based pre-trained language models were proposed after BERT, such as XLNet, RoBERTa, T5, etc.
Scaling model size as well as unsupervised pre-training strategies yielded better transfer learning performance on
downstream tasks.
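A minimal sketch of the pre-train and fine-tune pattern, using the Hugging Face transformers library, looks like the following; the checkpoint name and the two-label sentiment task are illustrative assumptions rather than examples from this chapter:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# A fresh classification head is attached on top of the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step: the pre-trained weights are updated on task data.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
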
However, model sizes were still limited to hundreds of millions of parameters in most cases. In 2020, OpenAI
proposed GPT-3, which scaled up model parameters to an unprecedented 175 billion! GPT-3 demonstrated
zero-shot, few-shot learning capabilities never observed before, stunning the AI community. Without any gradient
updates or fine-tuning, GPT-3 could perform NLP tasks from just task descriptions and a few examples. As such,
GPT-3 highlighted the power of scale in language models. Its surprising effectiveness motivated intense research
interest in training even larger models. This led to the exploration of LLMs with model parameters in the trillion+
range. Startups such as Anthropic and public efforts such as PaLM, Gopher, and LLaMA pushed model scale
drastically with significant investments in the space. Several tech companies and startups are now using (and
training their own) LLMs with hundreds of billions or even a trillion plus parameters. Models like PaLM, Flan,
LaMDA, and LLaMA have demonstrated the scalability of language modeling objectives using the transformer
architecture. At the time of this writing, Anthropic has developed Claude, the first LLM to be openly released
with conversational abilities rivaling GPT-3.
You can see that all the models mentioned are related, much like the Tree of Life. In other words, anatomical
similarities and differences in a phylogenetic tree are similar to the architectural similarities found in language
models. For example, Figure 1.1 shows the evolutionary tree of LLMs and highlights some of the most popular
models used in production so far. The models that belong to the same branch are more closely related, and the
vertical position of each model on the timeline indicates when it was released. The transformer models are repre-
sented by colors other than gray: decoder-only models like GPT, OPT and their derivatives, encoder-only models
like BERT, and the encoder-decoder models T5 and Switch are shown in separate main branches. As mentioned
earlier, models have successively “grown” larger. Interestingly, this is visually and objectively similar to the evolu-
tion of intelligent species, as shown in Figure 1.2. A deeper comparison is out of the scope of this book, but for
more information on either of these evolutionary trees, refer to the links in the captions.

FIGURE 1.1: Evolutionary tree of language models (see Rice University / https://arxiv.org/pdf/2304.13712.pdf / last accessed December 12, 2023.)

FIGURE 1.2: Evolutionary tree of human brain structure (see André M.M. Sousa et al., 2017, with permission of Elsevier.)

Increasing the model size, compute, and data seems to unlock new abilities in LLMs, which exhibit impressive
performance on question answering, reasoning, and text generation with simple prompting techniques. By train-
ing LLMs to generate code, models such as AlphaCode and Codex display proficient coding skills. LLMs can
chat, translate, summarize, and even write mathematical proofs aided by suitable prompting strategies.
The key shift from PLMs to LLMs is that scale seems to bring about qualitative transitions beyond just incremen-
tal improvements. LLMs display certain emergent capabilities such as few-shot learning, chain of reasoning, and
instruction following not observed in smaller models. These abilities emerge suddenly once model scale crosses a
sufficient threshold, defying smooth scaling trends.
LLMs entail a paradigm shift in AI from narrowly specialized systems to versatile, general-purpose models. Leading experts believe recent LLMs display signs of approaching human-level artificial general intelligence. From statistical models to neural networks, steady progress in language modeling, scaled up by orders of magnitude, has been the missing link enabling this rapid advancement toward more human-like, flexible intelligence. The working assumption is that bigger is better when it comes to language AI: scaling model size along with compute and data seems to unlock new abilities and performance improvements.

The largest LLMs have shown the ability to perform human-level question answering and reasoning in many
domains without any fine-tuning. With proper prompting techniques like chain of thought, they can solve com-
plex arithmetic, logical, and symbolic reasoning problems. LLMs can intelligently manipulate symbols, numbers, and concepts, and can perform multistep inferences when presented with the right examples.
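A minimal sketch of chain-of-thought prompting follows. As before, complete() is a hypothetical placeholder for any text-completion call, and the arithmetic example is invented; the technique itself is simply to include one worked example that spells out intermediate steps, which nudges the model to reason step by step on the new question.

def complete(prompt: str) -> str:
    # Hypothetical placeholder for any LLM text-completion endpoint.
    raise NotImplementedError("plug in an LLM provider")

# One worked example demonstrates explicit intermediate steps; the model
# imitates the step-by-step pattern before committing to a final answer.
COT_PROMPT = """\
Q: A cafeteria had 23 apples. It used 20 for lunch and bought 6 more.
How many apples does it have now?
A: It started with 23 apples. After using 20, 23 - 20 = 3 remain.
After buying 6 more, 3 + 6 = 9. The answer is 9.

Q: {question}
A:"""

def answer_with_reasoning(question: str) -> str:
    return complete(COT_PROMPT.format(question=question)).strip()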
But of course, language generation is the main area where LLMs’ capabilities have taken a huge leap. LLMs
can generate fluent, coherent, and human-like text spanning news articles, poetry, dialogue, code, mathematical
proofs, and more. The creativity and versatility displayed in conditional and unconditional text generation are remarkable. Few-shot prompting allows controlling attributes such as length, style, and content. Text-to-image generation has also made rapid progress by leveraging LLMs. The exponential growth in model parameters has been matched by the availability of computing power and datasets. Modern GPU clusters, the emergence of model
parallelism techniques, and optimized software libraries have enabled training LLMs with trillions of parameters.
Massive text corpora for pre-training are sourced from the Internet and digitization initiatives.
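As a concrete illustration of attribute control, the sketch below builds a few-shot prompt that constrains tone and length. Again, complete() is a hypothetical stand-in for any text-completion client, and the example rewrites are invented.

def complete(prompt: str) -> str:
    # Hypothetical placeholder for any LLM text-completion endpoint.
    raise NotImplementedError("plug in an LLM provider")

# Two examples fix the output format, tone, and rough length; the model
# applies the same constraints to the new input.
STYLE_PROMPT = """\
Rewrite each sentence in a formal tone, using at most 12 words.

Input: gonna be late, traffic is nuts
Output: I will arrive late due to heavy traffic.

Input: that meeting was a total waste of time
Output: The meeting was unproductive.

Input: {text}
Output:"""

def rewrite_formal(text: str) -> str:
    return complete(STYLE_PROMPT.format(text=text)).strip()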
All this has fueled tremendous excitement and optimism about the future of AI. LLMs display a form of algo-
rithmic and statistical intelligence to solve many problems automatically given the right data. Leading AI experts
believe rapid recent progress is bringing us closer to artificial general intelligence than before. Large language
models may be the missing piece that enables machines to learn concepts, infer chains of reasoning, and solve
problems by formulating algorithms like humans.
LLMs still have major limitations. They are expensive and difficult to put into production, prone to hallucina-
tion, lack common sense, and struggle with complex symbolic reasoning. Model capabilities are also severely
constrained by the training data distribution. LLMs can propagate harmful biases, generate toxic outputs, and
be manipulated in dangerous ways. There are rising concerns around AI ethics, governance, and risks that merit
careful consideration. Responsible development of AI aligned with human values is necessary. However, we
already see several generative AI (GenAI) applications with these LLMs at their core! GenAI heralds a paradigm
shift from narrow analytical intelligence toward creative and versatile systems. GenAI applications powered by
models such as GPT-3, PaLM, and Claude are displaying remarkable abilities previously thought impossible
for machines.

GenAI vs. LLMs: What’s the Difference?


While both GenAI and LLMs deal with generating content, their scopes and applications differ. GenAI is a
broader term that encompasses AI systems capable of creating various types of content, such as text, images, vid-
eos, and other media. LLMs, on the other hand, are a specific class of deep learning models designed to process
and understand natural language data. LLMs serve as the core component that understands and generates human-like text, while GenAI applications build on these capabilities to create more comprehensive and interactive experiences for users.
GenAI applications are full end-to-end applications that typically involve LLMs at their core. For example, ChatGPT is a GenAI application with GPT-3.5 and GPT-4 at its core. Putting LLMs into production as GenAI applications requires overcoming several challenges, including aligning LLMs with human values and preferences, training LLMs despite their huge model size, adapting LLMs to specific downstream tasks, and evaluating the abilities of LLMs. Despite these challenges, LLMs have the potential to revolutionize the way we develop and use AI algorithms, and they are poised to have a significant impact on the AI community and on society in general.
LLMs have enabled remarkable advances in GenAI applications in recent years. By learning from vast amounts
of text data, LLMs like GPT-3 and PaLM can generate highly fluent and coherent language. This capability has
been harnessed to power a diverse range of GenAI applications that were previously infeasible. Let’s discuss some
popular GenAI applications here:

Conversational agents and chatbots: One of the most popular applications of LLMs is conversational agents and
chatbots. Systems like Anthropic’s Claude and Google’s LaMDA leverage the language generation skills of LLMs
to conduct natural conversations. They can answer questions, offer advice, and discuss open-ended topics through free-form, multi-turn dialogue.