Stars
Interactively inspect module inputs, outputs, parameters, and gradients.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A CLI tool to convert your codebase into a single LLM prompt with source tree, prompt templating, and token counting.
A tool for working with stacked PRs on GitHub.
Helpful tools and examples for working with flex-attention (see the usage sketch after this list).
Experimental PyTorch-native float8 training UX.
Master programming by recreating your favorite technologies from scratch.
🔍 A Hex Editor for Reverse Engineers, Programmers and people who value their retinas when working at 3 AM.
Collaborative collection of C++ best practices, part of Jason Turner's C++ Best Practices resources.
Improved build system generator for CPython C, C++, Cython and Fortran extensions
Distribute and run LLMs with a single file.
This is the Rust course used by the Android team at Google. It provides you with the material to teach Rust quickly.
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
A native PyTorch library for large model training.
Ring attention implementation with flash attention
Annotated version of the Mamba paper
Marp for VS Code: create slide decks written in Marp Markdown in VS Code.
Official Code for Stable Cascade
Large World Model -- modeling text and video with millions of tokens of context.
FlashInfer: Kernel Library for LLM Serving
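For the flex-attention entry above, here is a minimal sketch of the score_mod/block_mask pattern that API revolves around. It assumes PyTorch 2.5 or newer and a CUDA device, and the ALiBi-style bias function is purely illustrative, not taken from any of the listed repositories:

```python
# Minimal flex-attention sketch (assumes PyTorch >= 2.5 and a CUDA device).
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 4, 256, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.float16) for _ in range(3))

# mask_mod: decides which (query, key) pairs may attend -- here, standard causal masking.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Precompute a sparse block mask; B=None / H=None broadcast over batch and heads.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

# score_mod: rewrites each attention score -- here, an illustrative ALiBi-like
# linear bias that grows with query-key distance and head index.
def alibi_bias(score, b, h, q_idx, kv_idx):
    return score - (h + 1) * (q_idx - kv_idx)

# In practice flex_attention is usually wrapped in torch.compile for performance.
out = flex_attention(q, k, v, score_mod=alibi_bias, block_mask=block_mask)
print(out.shape)  # torch.Size([2, 4, 256, 64])
```

The same pattern covers variants such as sliding-window, prefix-LM, or per-document masking by swapping in different mask_mod and score_mod functions instead of writing new kernels.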