
Fast-Track to Generative AI With NVIDIA

January 2024
Anne Hecht, Sr Director Enterprise Products, NVIDIA

Tony Paikeday, Sr Director AI Systems, NVIDIA


How to Use the Console
What We'll Cover
✓ How Enterprises are Using Generative AI

✓ Model Customization Best Practices

✓ Putting Generative AI into Production

✓ Getting Started
Generative AI is Transforming Business
Generative AI’s impact on productivity could add up to $4.4 trillion annually to the global economy.¹

Finance: Fraud Detection | Personalized Banking | Investment Insights
Healthcare: Molecule Simulation | Drug Discovery | Clinical Trial Data Analysis
Retail: Personalized Shopping | Automated Catalog Descriptions | Automatic Price Optimization
Telecommunications: AI Virtual Assistants | Network Performance Tuning | Remote Support Capabilities
Media & Entertainment: Character Development | Style Augmentation | Video Editing & Image Creation
Manufacturing: Factory Simulation | Product Design | Predictive Maintenance
Federal: Document Summarization | Audit Compliance | AI Virtual Assistants
Energy: Predictive Maintenance | Customer Service | Knowledge Base Q&A

Source: 1. The economic potential of generative AI: The next productivity frontier, McKinsey, 2023
Addressing the need for custom-built AI
Enterprise apps built on industrial-grade models

Built on your brand/vocabulary | Oceans of training data | Trustworthiness | Intellectual property


Building Generative AI for the Enterprise
NVIDIA AI Foundation Models – NeMo with NVIDIA AI Enterprise – DGX & DGX Cloud

• NVIDIA AI Foundation Models & community models
• Customize in an AI Foundry: NeMo on DGX / DGX Cloud, starting from a foundation model
• Augment with enterprise data (RAG: LLM Prompts → Agent → Vector Store → LLM)
• Deploy on NVIDIA AI Enterprise
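The retrieval-augmented generation (RAG) path in this flow (prompt → agent → vector store → LLM) can be sketched in a few lines. The sketch below is not NeMo Retriever code: it uses a toy word-overlap score in place of a real embedding model and vector database, and the documents, scoring function, and generate() stub are assumptions for illustration.

```python
# Minimal RAG sketch: retrieve the most relevant enterprise document, then
# ground the LLM prompt in it. Illustrative only; not NVIDIA NeMo Retriever code.

documents = [
    "Company employees have unlimited storage in their Google Drive.",
    "The corporate VPN must be used when accessing internal systems remotely.",
]

def score(query: str, doc: str) -> float:
    # Toy relevance score (word overlap); a real system would use an embedding
    # model and a vector database instead.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def retrieve(query: str) -> str:
    # Return the single best-matching document from the "vector store".
    return max(documents, key=lambda doc: score(query, doc))

def generate(prompt: str) -> str:
    # Placeholder for a call to an LLM (hosted endpoint or local model).
    return f"[LLM response grounded in provided context]\n{prompt}"

query = "How much storage do I have in my Google Drive?"
context = retrieve(query)
print(generate(f"Context: {context}\nQuestion: {query}"))
```

The same pattern underlies the "augment with enterprise data" step: retrieval supplies in-domain context so the model answers from your data rather than from generic pretraining.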
The AI Foundry for Model Customization

Customize in an AI Foundry: NeMo running on DGX / DGX Cloud, starting from a foundation model
Enterprises Need Custom Models to Power Their Business
Businesses need to turn “off-the-shelf” models into proprietary models

Prompt: "How much storage do I have in my Google Drive?"

Leading Foundation Model → General Response: "Every Google account comes with 15 GB of storage."
Enterprise Customized Model → Specific Response: "Company employees have unlimited storage in their Google Drive."
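One common way to turn an off-the-shelf model into a proprietary one is parameter-efficient fine-tuning on enterprise data. The slides describe doing this with NeMo on DGX / DGX Cloud; as a generic, hedged illustration, the sketch below uses the open-source Hugging Face transformers and peft libraries instead, and the base model name and training data are assumptions.

```python
# Illustrative LoRA fine-tuning setup (not NeMo-specific; Hugging Face peft is
# used as a generic stand-in, with an assumed base model).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach small trainable LoRA adapters instead of updating all model weights.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of parameters train

# ...then train on curated enterprise Q&A pairs (e.g., internal IT policies)
# with a standard training loop or the transformers Trainer.
```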


NVIDIA AI Foundation Models and Endpoints
Fast-track custom generative AI models for enterprise applications

NEMOTRON-3 8B-QA | NEMOTRON-3 8B-Chat-SteerLM | NEMOTRON-3 8B-Chat-RLHF | LLAMA 2 | STABLE DIFFUSION XL | CODE LLAMA

Enterprise-ready, performance-optimized models from NVIDIA and the community
Experience foundation models running on the NVIDIA AI stack via API endpoints
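Hosted endpoints like these are typically exercised from a simple REST client. The sketch below assumes an OpenAI-compatible API; the base URL, model identifier, and credential variable are assumptions for illustration rather than details taken from the slides.

```python
# Hypothetical call to a hosted foundation-model endpoint (OpenAI-compatible API assumed).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint URL
    api_key=os.environ["NVIDIA_API_KEY"],            # assumed credential variable
)

completion = client.chat.completions.create(
    model="meta/llama2-70b",  # assumed model identifier
    messages=[{"role": "user", "content": "Draft a product description for a trail-running shoe."}],
    max_tokens=200,
)
print(completion.choices[0].message.content)
```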
Building Generative AI Applications for the Enterprise
Build, customize and deploy generative AI models with NVIDIA NeMo

• Data Curation: NeMo Curator
• Distributed Training: Megatron Core
• Model Customization: NeMo Aligner
• Accelerated Inference: Triton & TensorRT-LLM
• Retrieval Augmented Generation: NeMo Retriever
• Guardrails: NeMo Guardrails

In-domain queries → in-domain, secure, cited responses

Model Development: NVIDIA NeMo | Enterprise Application Deployment: NVIDIA AI Enterprise
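To give a flavor of the final stage in this pipeline, the sketch below uses the open-source NeMo Guardrails package; the config directory, the rails it would define, and the question are assumptions for illustration.

```python
# Minimal NeMo Guardrails sketch: wrap an LLM with rails defined in a config
# directory (a config.yml plus Colang rail definitions). The path and the
# question are illustrative.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./guardrails_config")  # assumed local config directory
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "How much storage do employees get in Google Drive?"}
])
print(response["content"])
```

The rails decide which topics the model may answer, check inputs and outputs, and can route in-domain queries to retrieval so responses stay secure and cited.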
Why Are AI Initiatives Costing More Than Anticipated?
The cost of AI developers expending effort on non-development work

Where AI developers are spending their time:
• Waiting on cluster provisioning
• Stack engineering
• Model optimization
• Waiting on resource allocation
• Job launch prep
• Job monitoring

Which of these factors are impeding your developers?
"Hidden" IaaS challenges that drive up OpEx and add "effort":
• Delays in infrastructure provisioning
• Effort expended on AI code modification / adaptation
• Training job troubleshooting / resource utilization inefficiency

(Chart: estimated days lost per activity, from 1 to 20 days each, across cluster provisioning, stack engineering, model optimization, resource allocation, job launch prep, and job monitoring.)

An AI developer could lose over 30 days on non-value-add work:
• $30K–$50K in lost productivity per developer
• Potentially $1M+ across an AI team of 30 developers
• Unaccounted for when observing purely the cost of infrastructure
NVIDIA DGX Cloud
Build Your Models Faster with Serverless AI on NVIDIA DGX Cloud

• AI PLATFORM THAT PUTS DEVELOPERS FIRST: easy-to-use, powerful tools for delivering production-ready models sooner
• YOUR OWN SERVERLESS AI FACTORY: dedicated platform for multi-node training, optimized for generative AI
• GET UNSTUCK: NVIDIA AI experts are ready to help you get better results, faster
• FASTEST ROI FOR YOUR AI ENDEAVORS: superior ROI with maximized utilization efficiency
DGX Cloud Solves the Challenges of Scaling AI
Delivering enterprise-scale data science effectiveness and efficiency

• Wait on cluster provision → Multi-node clusters are ready and waiting for your developers now, not a month from now
• Stack engineering → Full-stack containers ensure compatibility and performance across layers
• Model optimization → Includes pre-trained models that are optimized and ready to use
• Job launch prep → No refactoring of code to use service APIs
• Job monitoring & resource allocation → Advanced telemetry and automated resource management across all jobs
NVIDIA DGX Cloud for Custom LLMs
Delivering the Premium AI Training Service for the Era of Generative AI

Traditional AI development vs. NVIDIA DGX Cloud, and what organizations doubling down on generative AI need:

• DIY tools plus open-source job scheduling and orchestration → NVIDIA AI Enterprise with NeMo, BioNeMo, and Picasso: software that unleashes developer productivity
• Inconsistent access to multi-node scale across regions → DGX Cloud instance ((8) A100/H100 80GB GPUs with multi-node scale, 10TB storage per instance, 10TB egress): multi-node training performance plus easy scale
• Community forums and "sweat equity" → NVIDIA AI Expert: AI practitioners who help maximize performance
• Escalating costs and add-on fees for reserved instances, storage, data egress, etc. → Predictable pricing that includes everything
• Traditional AI clouds → Hosted in leading clouds
Deploy on NVIDIA AI Enterprise

RAG pipeline (LLM Prompts → Agent → Vector Store → LLM) running on NVIDIA AI Enterprise (runtime)
The Challenges of the Enterprise Developer
65,000 public generative AI projects created on GitHub in 2023 – a 248% YoY growth

• ACCESS TO ACCELERATED INFRASTRUCTURE
• STAYING CURRENT ON LATEST AI DEVELOPMENT SKILLS
• SUCCESSFUL TRANSITION FROM PILOT TO PRODUCTION
Designed for Enterprises that Run their Business on AI
NVIDIA AI Enterprise: Production-Grade Software for AI

• Accelerated computing increases productivity while lowering TCO
• Cloud native and certified to run everywhere
• Enterprise-grade security, stability, manageability & support

Spanning workloads (generative AI, ETL/Spark, inference) and accelerators (RTX 6000 Ada, H100, DGX), with CVE patching, API stability, end-to-end manageability, and SLAs with NVIDIA Support.


NVIDIA AI Enterprise
An Enterprise-Grade Software Platform for Your AI Runtimes

MLOps and AI applications sit on top of NVIDIA AI Enterprise, which comprises:

Infrastructure Management
• Cloud Native Management and Orchestration: GPU Operator, Network Operator
• Cluster Management: Base Command Manager Essentials
• Infra Acceleration Libraries: Magnum IO, vGPU, CUDA

AI Development
• Data Science / Prep: RAPIDS, RAPIDS Accelerator for Apache Spark
• Model Training and Customization: NeMo, TAO, PyTorch, TensorFlow
• Deploy at Scale: Triton Inference Server
• Optimize for Inference: TensorRT, TensorRT-LLM

Application Frameworks
• LLM: NeMo | Speech AI: Riva | Cybersecurity: Morpheus | Medical Imaging: Clara | More

Runs on: Cloud | Data Center | Workstations | Edge
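For the "Deploy at Scale" layer, a client typically sends inference requests to Triton Inference Server over HTTP or gRPC. The sketch below uses the open-source tritonclient package; the server address, model name, and tensor names are assumptions for illustration.

```python
# Minimal Triton Inference Server client sketch (model and tensor names are hypothetical).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed Triton HTTP endpoint

# Build a request for a hypothetical model that maps INPUT0 -> OUTPUT0.
data = np.random.rand(1, 4).astype(np.float32)
inputs = [httpclient.InferInput("INPUT0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT0")]

result = client.infer(model_name="my_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT0"))
```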


Flexible Software Branches for All AI Deployments
Security, API Stability, and Peace of Mind for All Your AI Investments

Feature Branch
• Top-of-tree software optimizations
• Monthly release cadence
• CVE patches and bug fixes delivered in roll-forward releases

Production Branch
• API stability
• Monthly CVE patches and bug fixes
• 2 branches per year, each with a 9-month lifetime
• 3-month overlap between consecutive production branches

Long-Term Support Branch
• For highly regulated industries
• Quarterly CVE patches and bug fixes
• Up to three years of support
• 6-month overlap period
Accelerated AI Improves Productivity While Lowering Total Cost
Segment Anything Model (SAM) – TensorRT Optimized

24x throughput (images/min): 1.4 on CPU vs. 31.8 with NVIDIA AI
5% of the cost ($ per 1M images): $13,137 on CPU vs. $662 with NVIDIA AI
Learn More
www.nvidia.com/ai-foundation-models

Recap: customize NVIDIA AI Foundation Models or community models in an AI Foundry (NeMo on DGX / DGX Cloud), augment with enterprise data via RAG, and deploy on NVIDIA AI Enterprise.
Enterprises Adopting NVIDIA AI Foundry
Improve Spear Phishing Detection with AI
Learn How Generative AI Can Be Used to Detect Spear Phishing Emails Faster

Tuesday, January 30, 9:00 am PT, or Wednesday, January 31, 10:00 am CET
Register >

Join this webinar to learn how NVIDIA’s AI technologies, along with ecosystem partners, can help organizations build powerful solutions to defend against cyber threats.

Move your spear phishing detection AI solution from pilot to production with confidence using NVIDIA AI Enterprise.
The In-Person GTC Experience Is Back
Come to GTC—the conference for the era of AI—to connect with a
dream team of industry luminaries, developers, researchers, and
business experts shaping what’s next in AI and accelerated computing.

From the highly anticipated keynote by NVIDIA CEO Jensen Huang to over 600 inspiring sessions, 200+ exhibits, and tons of networking events, GTC delivers something for every technical level and interest area.

Be sure to save your spot for this transformative event. You can even
take advantage of early-bird pricing when you register by February 7.

March 18-21, 2024 | www.nvidia.com/gtc

Q&A
