Implementing NanoGPT

Introduction

This example implements a NanoGPT model using Tripy:

  1. model.py defines the model as an nvtripy.Module.
  2. weight_loader.py loads weights from a HuggingFace checkpoint.
  3. example.py runs inference in float16 on input text and displays the output.
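Conceptually, example.py performs an autoregressive generation loop: feed the token IDs so far through the model, pick a next token from the output logits, append it, and repeat. The sketch below illustrates that loop with a stub scoring function in place of the Tripy model (the function names and the stub are hypothetical, not the repository's actual API):

```python
import numpy as np

def generate(logits_fn, prompt_ids, max_new_tokens):
    """Greedily extend `prompt_ids` one token at a time."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)             # scores over the vocabulary
        ids.append(int(np.argmax(logits)))  # take the most likely next token
    return ids

# Stub "model": always prefers token (last_id + 1) % VOCAB.
VOCAB = 8
def stub_logits(ids):
    scores = np.zeros(VOCAB)
    scores[(ids[-1] + 1) % VOCAB] = 1.0
    return scores

print(generate(stub_logits, [3], 4))  # -> [3, 4, 5, 6, 7]
```

The real script replaces `stub_logits` with a forward pass through the float16 NanoGPT model and decodes the resulting IDs back into text.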

Running The Example

  1. Install prerequisites:

    python3 -m pip install -r requirements.txt
  2. Run the example:

    python3 example.py --input-text "What is the answer to life, the universe, and everything?"
  3. [Optional] Use a fixed seed for predictable outputs:

    python3 example.py --input-text "What is the answer to life, the universe, and everything?" --seed=0
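A fixed seed gives predictable outputs because sampling the next token is random; seeding the generator makes the draws repeatable. The toy sketch below demonstrates the idea with NumPy standing in for the framework's RNG (the logits and helper are illustrative, not from the example):

```python
import numpy as np

def sample_tokens(seed, n=5):
    """Draw n token IDs from a fixed softmax distribution, seeded."""
    rng = np.random.default_rng(seed)
    logits = np.array([0.1, 2.0, 0.5, 1.2])
    probs = np.exp(logits) / np.exp(logits).sum()
    return [int(rng.choice(len(probs), p=probs)) for _ in range(n)]

# Same seed, same token sequence every run:
assert sample_tokens(0) == sample_tokens(0)
print(sample_tokens(0))
```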

Running with Quantization

quantization.py uses the NVIDIA TensorRT Model Optimizer to quantize the PyTorch model.

load_quant_weights_from_hf in weight_loader.py converts the quantization parameters to scales and loads them into the Tripy model.
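One common form such scales take is per-output-channel absmax scaling: each weight row gets a scale that maps it into the int8 range. The sketch below shows that scheme in NumPy as an illustration only; the actual parameters produced by Model Optimizer and consumed by `load_quant_weights_from_hf` may differ:

```python
import numpy as np

def quantize_int8_per_channel(w):
    """Per-row absmax quantization: each row maps into [-127, 127]."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

w = np.random.default_rng(0).standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int8_per_channel(w)
err = np.abs(dequantize(q, s) - w).max()  # round-trip error stays small
```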

Use --quant-mode in example.py to enable quantization. Supported modes:

  • Weight-only int8 quantization:

    python3 example.py --input-text "What is the answer to life, the universe, and everything?" --seed=0 --quant-mode int8-weight-only

  • Weight-only int4 quantization:

    python3 example.py --input-text "What is the answer to life, the universe, and everything?" --seed=0 --quant-mode int4-weight-only

Warning

For this model, int4 quantization may result in poor accuracy. We include it only to demonstrate the workflow.
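The accuracy gap between int8 and int4 comes from the number of quantization levels: fewer bits mean a coarser grid and a larger round-trip error. A rough NumPy illustration (not the tooling's actual quantizer):

```python
import numpy as np

def quant_error(w, bits):
    """Max round-trip error of symmetric absmax quantization at `bits`."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return np.abs(q * scale - w).max()

w = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
print(quant_error(w, 8), quant_error(w, 4))  # int4 error is far larger
```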