Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Tutorials
The best way to get started with NeMo is to work through one of the tutorials. They cover several domains, range from introductory to advanced topics, and are designed to help you understand and use the NeMo toolkit effectively.
Running Tutorials on Colab
Most NeMo tutorials can be run on Google Colab.
To run a tutorial:
1. In the tables below, click the Colab link for the tutorial you are interested in.
2. Once in Colab, connect to an instance with a GPU by clicking Runtime > Change runtime type and selecting GPU as the hardware accelerator.
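After changing the runtime type, it can help to confirm that the notebook actually sees a GPU before running a tutorial. The following is a minimal sketch, not part of the NeMo tutorials: the helper name `gpu_visible` is hypothetical, and it assumes an NVIDIA GPU whose driver exposes the `nvidia-smi` tool on the PATH. It uses only the Python standard library, so it can run in a fresh notebook cell before anything is installed.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on the PATH and lists at least one GPU.

    Hypothetical helper for illustration; assumes an NVIDIA GPU runtime.
    """
    smi = shutil.which("nvidia-smi")
    if smi is None:
        # The tool is not installed, so no NVIDIA GPU runtime is attached.
        return False
    try:
        result = subprocess.run(
            [smi, "--list-gpus"],  # prints one line per detected GPU
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```

If this reports that no GPU is visible, revisit Runtime > Change runtime type and reconnect before continuing.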
Tutorial Overview
Domain | Title | GitHub URL |
---|---|---|
General | Getting Started: NeMo Fundamentals | |
General | Getting Started: Audio translator example | |
General | Getting Started: Voice swap example | |
General | Getting Started: NeMo Models | |
General | Getting Started: NeMo Adapters | |
General | Getting Started: NeMo Models on Hugging Face Hub | |

Domain | Title | GitHub URL |
---|---|---|
Multimodal | Preparations and Advanced Applications: Multimodal Data Preparation | |
Multimodal | Preparations and Advanced Applications: NeVA (LLaVA) Tutorial | |
Multimodal | Preparations and Advanced Applications: Stable Diffusion Tutorial | |
Multimodal | Preparations and Advanced Applications: DreamBooth Tutorial | |
Multimodal | Preparations and Advanced Applications: Stable Diffusion XL Quantization Tutorial | |

Domain | Title | GitHub URL |
---|---|---|
ASR | ASR with NeMo | |
ASR | ASR with Subword Tokenization | |
ASR | Offline ASR | |
ASR | Online ASR Microphone Cache Aware Streaming | |
ASR | Online ASR Microphone Buffered Streaming | |
ASR | ASR CTC Language Fine-Tuning | |
ASR | Intro to Transducers | |
ASR | ASR with Transducers | |
ASR | ASR with Adapters | |
ASR | Speech Commands | |
ASR | Online Offline Microphone Speech Commands | |
ASR | Voice Activity Detection | |
ASR | Online Offline Microphone VAD | |
ASR | Speaker Recognition and Verification | |
ASR | Speaker Diarization Inference | |
ASR | ASR with Speaker Diarization | |
ASR | Online Noise Augmentation | |
ASR | ASR for Telephony Speech | |
ASR | Streaming inference | |
ASR | Buffered Transducer inference | |
ASR | Buffered Transducer inference with LCS Merge | |
ASR | Offline ASR with VAD for CTC models | |
ASR | Self-supervised Pre-training for ASR | |
ASR | Multi-lingual ASR | |
ASR | Hybrid ASR-TTS Models | |
ASR | ASR Confidence Estimation | |
ASR | Confidence-based Ensembles | |

Domain | Title | GitHub URL |
---|---|---|
TTS | Basic and Advanced: NeMo TTS Primer | |
TTS | Basic and Advanced: TTS Speech/Text Aligner Inference | |
TTS | Basic and Advanced: FastPitch and MixerTTS Model Training | |
TTS | Basic and Advanced: FastPitch Finetuning | |
TTS | Basic and Advanced: FastPitch and HiFiGAN Model Training for German | |
TTS | Basic and Advanced: Tacotron2 Model Training | |
TTS | Basic and Advanced: FastPitch Duration and Pitch Control | |
TTS | Basic and Advanced: FastPitch Speaker Interpolation | |
TTS | Basic and Advanced: TTS Inference and Model Selection | |
TTS | Basic and Advanced: TTS Pronunciation Customization | |

Domain | Title | GitHub URL |
---|---|---|
Utility Tools | Utility Tools for Speech and Text: NeMo Forced Aligner | |
Utility Tools | Utility Tools for Speech and Text: Speech Data Explorer | |
Utility Tools | Utility Tools for Speech and Text: CTC Segmentation | |

Domain | Title | GitHub URL |
---|---|---|
Text Processing | Text Normalization Techniques: Text Normalization | |
Text Processing | Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | |
Text Processing | Text Normalization Techniques: WFST Tutorial | |