
Fast-Track to Generative AI With NVIDIA

January 2024
Anne Hecht, Sr Director Enterprise Products, NVIDIA

Tony Paikeday, Sr Director AI Systems, NVIDIA


How to Use the Console
What We'll Cover
✓ How Enterprises are Using Generative AI

✓ Model Customization Best Practices

✓ Putting Generative AI into Production

✓ Getting Started
Generative AI is Transforming Business
Generative AI’s impact on productivity could add up to $4.4 trillion annually to the global economy.¹

Finance: Fraud Detection | Personalized Banking | Investment Insights
Healthcare: Molecule Simulation | Drug Discovery | Clinical Trial Data Analysis
Retail: Personalized Shopping | Automated Catalog Descriptions | Automatic Price Optimization
Telecommunications: AI Virtual Assistants | Network Performance Tuning | Remote Support Capabilities
Media & Entertainment: Character Development | Style Augmentation | Video Editing & Image Creation
Manufacturing: Factory Simulation | Product Design | Predictive Maintenance
Federal: Document Summarization | Audit Compliance | AI Virtual Assistants
Energy: Predictive Maintenance | Customer Service | Knowledge Base Q&A

Source: 1. The economic potential of generative AI: The next productivity frontier, McKinsey, 2023
Addressing the need for custom-built AI
Enterprise apps built on industrial-grade models

Built on your brand/vocabulary | Oceans of training data | Trustworthiness | Intellectual property


Building Generative AI for the Enterprise
NVIDIA AI Foundation Models – NeMo with NVIDIA AI Enterprise – DGX & DGX Cloud

• NVIDIA AI Foundation Models & community models
• Customize in an AI Foundry: NeMo on DGX / DGX Cloud, starting from a foundation model
• Augment with enterprise data (RAG: LLM Prompts → Agent → Vector Store → LLM)
• Deploy on NVIDIA AI Enterprise
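The retrieval-augmented generation (RAG) path in this flow (prompt → agent → vector store → LLM) can be sketched in a few lines. The sketch below is not NeMo Retriever code: it uses a toy word-overlap score in place of a real embedding model and vector database, and the documents, scoring function, and generate() stub are assumptions for illustration.

```python
# Minimal RAG sketch: retrieve the most relevant enterprise document, then
# ground the LLM prompt in it. Illustrative only; not NVIDIA NeMo Retriever code.

documents = [
    "Company employees have unlimited storage in their Google Drive.",
    "The corporate VPN must be used when accessing internal systems remotely.",
]

def score(query: str, doc: str) -> float:
    # Toy relevance score (word overlap); a real system would use an embedding
    # model and a vector database instead.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def retrieve(query: str) -> str:
    # Return the single best-matching document from the "vector store".
    return max(documents, key=lambda doc: score(query, doc))

def generate(prompt: str) -> str:
    # Placeholder for a call to an LLM (hosted endpoint or local model).
    return f"[LLM response grounded in provided context]\n{prompt}"

query = "How much storage do I have in my Google Drive?"
context = retrieve(query)
print(generate(f"Context: {context}\nQuestion: {query}"))
```

The same pattern underlies the "augment with enterprise data" step: retrieval supplies in-domain context so the model answers from your data rather than from generic pretraining.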
The AI Foundry for Model Customization

Customize in an AI Foundry: NeMo running on DGX / DGX Cloud, starting from a foundation model
Enterprises Need Custom Models to Power Their Business
Businesses need to turn “off-the-shelf” models into proprietary models

Prompt: "How much storage do I have in my Google Drive?"

Leading Foundation Model → General Response: "Every Google account comes with 15 GB of storage."
Enterprise Customized Model → Specific Response: "Company employees have unlimited storage in their Google Drive."
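One common way to turn an off-the-shelf model into a proprietary one is parameter-efficient fine-tuning on enterprise data. The slides describe doing this with NeMo on DGX / DGX Cloud; as a generic, hedged illustration, the sketch below uses the open-source Hugging Face transformers and peft libraries instead, and the base model name and training data are assumptions.

```python
# Illustrative LoRA fine-tuning setup (not NeMo-specific; Hugging Face peft is
# used as a generic stand-in, with an assumed base model).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach small trainable LoRA adapters instead of updating all model weights.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of parameters train

# ...then train on curated enterprise Q&A pairs (e.g., internal IT policies)
# with a standard training loop or the transformers Trainer.
```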


NVIDIA AI Foundation Models and Endpoints
Fast-track custom generative AI models for enterprise applications

NEMOTRON-3 8B-QA | NEMOTRON-3 8B-Chat-SteerLM | NEMOTRON-3 8B-Chat-RLHF | LLAMA 2 | STABLE DIFFUSION XL | CODE LLAMA

Enterprise-ready, performance-optimized models from NVIDIA and the community
Experience foundation models running on the NVIDIA AI stack via API endpoints
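Hosted endpoints like these are typically exercised from a simple REST client. The sketch below assumes an OpenAI-compatible API; the base URL, model identifier, and credential variable are assumptions for illustration rather than details taken from the slides.

```python
# Hypothetical call to a hosted foundation-model endpoint (OpenAI-compatible API assumed).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint URL
    api_key=os.environ["NVIDIA_API_KEY"],            # assumed credential variable
)

completion = client.chat.completions.create(
    model="meta/llama2-70b",  # assumed model identifier
    messages=[{"role": "user", "content": "Draft a product description for a trail-running shoe."}],
    max_tokens=200,
)
print(completion.choices[0].message.content)
```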
Building Generative AI Applications for the Enterprise
Build, customize and deploy generative AI models with NVIDIA NeMo

• Data Curation: NeMo Curator
• Distributed Training: Megatron Core
• Model Customization: NeMo Aligner
• Accelerated Inference: Triton & TensorRT-LLM
• Retrieval Augmented Generation: NeMo Retriever
• Guardrails: NeMo Guardrails

In-domain queries → in-domain, secure, cited responses

Model Development: NVIDIA NeMo | Enterprise Application Deployment: NVIDIA AI Enterprise
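To give a flavor of the final stage in this pipeline, the sketch below uses the open-source NeMo Guardrails package; the config directory, the rails it would define, and the question are assumptions for illustration.

```python
# Minimal NeMo Guardrails sketch: wrap an LLM with rails defined in a config
# directory (a config.yml plus Colang rail definitions). The path and the
# question are illustrative.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./guardrails_config")  # assumed local config directory
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "How much storage do employees get in Google Drive?"}
])
print(response["content"])
```

The rails decide which topics the model may answer, check inputs and outputs, and can route in-domain queries to retrieval so responses stay secure and cited.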
Why Are AI Initiatives Costing More Than Anticipated?
The cost of AI developers expending effort on non-development work

Where AI developers are spending their time:
• Waiting on cluster provisioning
• Stack engineering
• Model optimization
• Waiting on resource allocation
• Job launch prep
• Job monitoring

Which of these factors are impeding your developers?
"Hidden" IaaS challenges that drive up OpEx and add "effort":
• Delays in infrastructure provisioning
• Effort expended on AI code modification / adaptation
• Training job troubleshooting / resource utilization inefficiency

(Chart: estimated days lost per activity, from 1 to 20 days each, across cluster provisioning, stack engineering, model optimization, resource allocation, job launch prep, and job monitoring.)

An AI developer could lose over 30 days on non-value-add work:
• $30K–$50K in lost productivity per developer
• Potentially $1M+ across an AI team of 30 developers
• Unaccounted for when observing purely the cost of infrastructure
NVIDIA DGX Cloud
Build Your Models Faster with Serverless AI on NVIDIA DGX Cloud

• AI PLATFORM THAT PUTS DEVELOPERS FIRST: easy-to-use, powerful tools for delivering production-ready models sooner
• YOUR OWN SERVERLESS AI FACTORY: dedicated platform for multi-node training, optimized for generative AI
• GET UNSTUCK: NVIDIA AI experts are ready to help you get better results, faster
• FASTEST ROI FOR YOUR AI ENDEAVORS: superior ROI with maximized utilization efficiency
DGX Cloud Solves the Challenges of Scaling AI
Delivering enterprise-scale data science effectiveness and efficiency

• Wait on cluster provision → Multi-node clusters are ready and waiting for your developers now, not a month from now
• Stack engineering → Full-stack containers ensure compatibility and performance across layers
• Model optimization → Includes pre-trained models that are optimized and ready to use
• Job launch prep → No refactoring of code to use service APIs
• Job monitoring & resource allocation → Advanced telemetry and automated resource management across all jobs
NVIDIA DGX Cloud for Custom LLMs
Delivering the Premium AI Training Service for the Era of Generative AI

Traditional AI development vs. NVIDIA DGX Cloud, and what organizations doubling down on generative AI need:

• DIY tools plus open-source job scheduling and orchestration → NVIDIA AI Enterprise with NeMo, BioNeMo, and Picasso: software that unleashes developer productivity
• Inconsistent access to multi-node scale across regions → DGX Cloud instance ((8) A100/H100 80GB GPUs with multi-node scale, 10TB storage per instance, 10TB egress): multi-node training performance plus easy scale
• Community forums and "sweat equity" → NVIDIA AI Expert: AI practitioners who help maximize performance
• Escalating costs and add-on fees for reserved instances, storage, data egress, etc. → Predictable pricing that includes everything
• Traditional AI clouds → Hosted in leading clouds
Deploy on NVIDIA AI Enterprise

RAG pipeline (LLM Prompts → Agent → Vector Store → LLM) running on NVIDIA AI Enterprise (runtime)
The Challenges of the Enterprise Developer
65,000 public generative AI projects created on GitHub in 2023 – a 248% YoY growth

• ACCESS TO ACCELERATED INFRASTRUCTURE
• STAYING CURRENT ON LATEST AI DEVELOPMENT SKILLS
• SUCCESSFUL TRANSITION FROM PILOT TO PRODUCTION
Designed for Enterprises that Run their Business on AI
NVIDIA AI Enterprise: Production-Grade Software for AI

• Accelerated computing increases productivity while lowering TCO
• Cloud native and certified to run everywhere
• Enterprise-grade security, stability, manageability & support

Spanning workloads (generative AI, ETL/Spark, inference) and accelerators (RTX 6000 Ada, H100, DGX), with CVE patching, API stability, end-to-end manageability, and SLAs with NVIDIA Support.


NVIDIA AI Enterprise
An Enterprise-Grade Software Platform for Your AI Runtimes

MLOps and AI applications sit on top of NVIDIA AI Enterprise, which comprises:

Infrastructure Management
• Cloud Native Management and Orchestration: GPU Operator, Network Operator
• Cluster Management: Base Command Manager Essentials
• Infra Acceleration Libraries: Magnum IO, vGPU, CUDA

AI Development
• Data Science / Prep: RAPIDS, RAPIDS Accelerator for Apache Spark
• Model Training and Customization: NeMo, TAO, PyTorch, TensorFlow
• Deploy at Scale: Triton Inference Server
• Optimize for Inference: TensorRT, TensorRT-LLM

Application Frameworks
• LLM: NeMo | Speech AI: Riva | Cybersecurity: Morpheus | Medical Imaging: Clara | More

Runs on: Cloud | Data Center | Workstations | Edge
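For the "Deploy at Scale" layer, a client typically sends inference requests to Triton Inference Server over HTTP or gRPC. The sketch below uses the open-source tritonclient package; the server address, model name, and tensor names are assumptions for illustration.

```python
# Minimal Triton Inference Server client sketch (model and tensor names are hypothetical).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed Triton HTTP endpoint

# Build a request for a hypothetical model that maps INPUT0 -> OUTPUT0.
data = np.random.rand(1, 4).astype(np.float32)
inputs = [httpclient.InferInput("INPUT0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT0")]

result = client.infer(model_name="my_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT0"))
```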


Flexible Software Branches for All AI Deployments
Security, API Stability, and Peace of Mind for All Your AI Investments

Feature Branch
• Top-of-tree software optimizations
• Monthly release cadence
• CVE patches and bug fixes delivered in roll-forward releases

Production Branch
• API stability
• Monthly CVE patches and bug fixes
• 2 branches per year, each with a 9-month lifetime
• 3-month overlap between consecutive production branches

Long-Term Support Branch
• For highly regulated industries
• Quarterly CVE patches and bug fixes
• Up to three years of support
• 6-month overlap period
Accelerated AI Improves Productivity While Lowering Total Cost
Segment Anything Model (SAM) – TensorRT Optimized

24x throughput (images/min): 1.4 on CPU vs. 31.8 with NVIDIA AI
5% of the cost ($ per 1M images): $13,137 on CPU vs. $662 with NVIDIA AI
Learn More
www.nvidia.com/ai-foundation-models

Recap: customize NVIDIA AI Foundation Models or community models in an AI Foundry (NeMo on DGX / DGX Cloud), augment with enterprise data via RAG, and deploy on NVIDIA AI Enterprise.
Enterprises Adopting NVIDIA AI Foundry
Improve Spear Phishing Detection with AI
Learn How Generative AI Can Be Used to Detect Spear Phishing Emails Faster

Tuesday, January 30, 9:00 am PT, or Wednesday, January 31, 10:00 am CET
Register >

Join this webinar to learn how NVIDIA’s AI technologies, along with ecosystem partners, can help organizations build powerful solutions to defend against cyber threats.

Move your spear phishing detection AI solution from pilot to production with confidence using NVIDIA AI Enterprise.
The In-Person GTC Experience Is Back
Come to GTC—the conference for the era of AI—to connect with a
dream team of industry luminaries, developers, researchers, and
business experts shaping what’s next in AI and accelerated computing.

From the highly anticipated keynote by NVIDIA CEO Jensen Huang to over 600 inspiring sessions, 200+ exhibits, and tons of networking events, GTC delivers something for every technical level and interest area.

Be sure to save your spot for this transformative event. You can even
take advantage of early-bird pricing when you register by February 7.

March 18-21, 2024 | www.nvidia.com/gtc

Q&A
