- Discord Office Hours - Come ask us anything! We hold office hours most days (9am - 12pm PST).
- Documentation - Learn BAML
- Documentation - BAML Syntax Reference
- Documentation - Prompt engineering tips
- Boundary Studio - Observability and more
Calling LLMs in your code is frustrating:
- your code uses types everywhere: classes, enums, and arrays
- but LLMs speak English, not types
BAML makes calling LLMs easy by taking a type-first approach that lives fully in your codebase:
- Define types for your inputs and outputs in BAML files
- Define prompt templates using these types in BAML
- Define retries and fallbacks in BAML
- Use a generated Python/TypeScript client to call LLMs with those types and templates
We were inspired by similar patterns for type safety: protobuf and OpenAPI for RPCs, Prisma and SQLAlchemy for databases.
BAML guarantees type safety for LLMs and comes with tools to give you a great developer experience:
Jump to the BAML code, or see how Flexible Parsing works without additional LLM calls.
| BAML Tooling | Capabilities |
|---|---|
| BAML Compiler (install) | Transpiles BAML code to a native Python / TypeScript library (you only need it for development, never for releases). Works on Mac, Windows, Linux. |
| VSCode Extension (install) | Syntax highlighting for BAML files. Real-time prompt preview. Testing UI. |
| Boundary Studio (open; not open source) | Type-safe observability. Labeling. |
- ✅ Type-safe guarantee: the LLM will return your data model, or we'll raise an exception. We use Flexible Parsing.
- ✅ Fast iteration loops (code): compare multiple prompts / LLM providers in VSCode.
- Streaming partial JSONs (code) (✅ Python, 🚧 TypeScript): BAML fills in incomplete JSONs as they come in.
- ✅ LLM robustness for production (code): retry policies, fallback strategies, round-robin selection.
- ✅ Many LLM providers: OpenAI, Azure, and Anthropic out of the box. Reach out to get beta access for Mistral and more.
- ✅ Comments in prompts (code): your future self will thank you.
| Use Cases | Prompt Examples |
|---|---|
| ✅ Function calling (code) | Using tools |
| ✅ Chain of thought (code) | Using techniques like reasoning |
| ✅ Classification (code) | Getting intent from a customer message |
| ✅ Multi-shot (code) | Adding examples to the prompt |
| ✅ Extraction (code) | Extracting a Resume data model from unstructured text |
| ✅ Symbol tuning (code) | Using symbolic names for data types |
| ✅ Agents | Orchestrating multiple prompts to achieve a goal |
| ✅ Chat roles (code) | Use system / assistant / whatever you want. We standardize it for all models |
| 🚧 Images | Coming soon |
Mac:
brew install boundaryml/baml/baml
Linux:
curl -fsSL https://raw.githubusercontent.com/BoundaryML/homebrew-baml/main/install-baml.sh | bash
Windows (with Scoop):
scoop bucket add baml-bucket https://github.com/boundaryml/homebrew-baml
scoop install baml
Search for "BAML" or Click here
If you are using Python, enable type checking in VSCode’s settings.json:
"python.analysis.typecheckingMode": "basic"
cd my_project
baml init
For now this README uses Rust syntax highlighting, but once we have 200 repos using BAML, GitHub will support BAML!
// extract_resume.baml
// 1. Define the type for the output
class Resume {
name string
// Use an array to get multiple education histories
education Education[]
}
// A nested class
class Education {
university string
start_year int
// @description injects context into the prompt about this field
end_year int? @description("unset if still in school")
}
// 2. Define the function signature.
// This function takes a single parameter
// and outputs a Resume type.
function ExtractResume {
input (resume_text: string)
output Resume
}
// 3. Use an LLM to implement ExtractResume.
// We'll name this impl 'version1'.
impl<llm, ExtractResume> version1 {
client GPT4Client
// BAML will automatically dedent and strip the
// prompt for you. You can see the prompt fully
// in the VSCode preview (including whitespace).
// We provide some macros like {#input} and {#print_type}
// to use the types you defined above.
prompt #"
Extract the resume from:
###
{#input.resume_text}
###
Output JSON Schema:
{#print_type(output)}
"#
}
// Define a reusable client for an LLM
// that can be used in any impl
client<llm> GPT4Client {
provider "baml-openai-chat"
options {
model "gpt-4"
// Use an API key safely!
api_key env.OPENAI_API_KEY
// temperature 0.5 // Is 0 by default
}
}
# baml_client is autogenerated
from baml_client import baml as b
# BAML also auto generates types for all your data models
from baml_client.baml_types import Resume
async def get_resume(resume_url: str) -> Resume:
    resume_text = await load_resume(resume_url)
    # Call the generated BAML function (this uses 'version1' by default)
    resume = await b.ExtractResume(resume_text=resume_text)
    # Not required, BAML already guarantees this!
    assert isinstance(resume, Resume)
    return resume
// baml_client is autogenerated
import { baml as b } from "@/baml_client";
// BAML also auto-generates types for all your data models
import { Resume } from "@/baml_client/types";

async function getResume(resumeUrl: string): Promise<Resume> {
  const resumeText = await loadResume(resumeUrl);
  // Call the BAML function (this uses 'version1' by default).
  // This will throw an exception if we don't get back a Resume type.
  return b.ExtractResume({ resumeText });
}
// This will be available as an enum in your Python and Typescript code.
enum Category {
Refund
CancelOrder
TechnicalSupport
AccountIssue
Question
}
function ClassifyMessage {
input string
// Could have used Category[] for multi-label
output Category
}
impl<llm, ClassifyMessage> version1 {
client GPT4
// BAML also provides a {#print_enum} macro in addition to
// {#input} or {#print_type}.
prompt #"
Classify the following INPUT into ONE
of the following Intents:
{#print_enum(Category)}
INPUT: {#input}
Response:
"#
}
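The generated client then exposes ClassifyMessage as a regular typed function. A minimal usage sketch in Python (the routing strings are illustrative; the call pattern mirrors the agents example later in this README):

```python
from baml_client import baml as b
from baml_client.baml_types import Category

async def route_message(msg: str) -> str:
    # The return value is a Category enum member, never a raw string.
    category = await b.ClassifyMessage(msg)
    if category == Category.Refund:
        return "Routing to the refunds team"
    if category == Category.TechnicalSupport:
        return "Routing to technical support"
    return "Routing to general support"
```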
With BAML, this is just a simple extraction task!
class GithubCreateReleaseParams {
owner string
repo string
tag_name string
target_commitish string
name string
body string
draft bool
prerelease bool
}
function BuildGithubApiCall {
input string
output GithubCreateReleaseParams
}
impl<llm, BuildGithubApiCall> v1 {
client GPT35
prompt #"
Given the user query, extract the right details:
Instructions
{#input}
Output JSON Schema:
{#print_type(output)}
JSON:
"#
}
If you want to call multiple functions, you can combine enums with classes, with almost no changes to the prompt.
+ enum Intent {
+ CreateRelease
+ CreatePullRequest
+ }
+
+ class GithubCreatePullRequestParams {
+ ...
+ }
+
+ class Action {
+ tag Intent
+ data GithubCreateReleaseParams | GithubCreatePullRequestParams
+ }
function BuildGithubApiCall {
input string
- output GithubCreateReleaseParams
+ output Action
}
impl<llm, BuildGithubApiCall> v1 {
client GPT35
prompt #"
Given the user query, extract the right details:
Instructions
{#input}
+ {#print_enum(Intent)}
Output JSON Schema:
{#print_type(output)}
JSON:
"#
}
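On the Python side, the generated function now returns an Action. A hedged sketch of a possible call site, assuming the parameter classes are exported from baml_client.baml_types like the other generated types (the create_release / create_pull_request helpers are hypothetical):

```python
from baml_client import baml as b
from baml_client.baml_types import (
    Intent,
    GithubCreateReleaseParams,
    GithubCreatePullRequestParams,
)

async def handle_query(query: str) -> None:
    action = await b.BuildGithubApiCall(query)
    # The tag tells us which API call to make; data holds the typed params.
    if action.tag == Intent.CreateRelease:
        assert isinstance(action.data, GithubCreateReleaseParams)
        await create_release(action.data)        # hypothetical helper
    elif action.tag == Intent.CreatePullRequest:
        assert isinstance(action.data, GithubCreatePullRequestParams)
        await create_pull_request(action.data)   # hypothetical helper
```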
With BAML you combine AI functions with regular code to create powerful agents. That means you can do everything purely in Python or TypeScript!
from baml_client import baml as b
from baml_client.baml_types import Intent
async def handle_message(msg: str) -> str:
    # Determine what the user is trying to do
    intent = await b.ClassifyMessage(msg)
    if intent == Intent.AccountBalance:
        # Get the balance, requires no more AI functions
        balance = await some_remote_api(...)
        return f'Your balance is: {balance}'
    if intent == Intent.ShowPlot:
        # Call another AI function with some additional context
        plot_query = await b.GetPlotQuery(
            table_defs=load_table_definitions(), query=msg)
        # Run the query, then return a plot
        response = await some_query_executor(plot_query)
        return response.to_plot_html()
    ...
Supporting BAML code
enum Intent {
AccountBalance
ShowPlot
Other
}
function ClassifyMessage {
input string
output Intent
}
function GetPlotQuery {
input (table_defs: string, query: string)
output string
}
// Impls of prompts pending
We make it easy to add retry policies to your LLM calls (and also provide other resilience strategies). Your Python / TypeScript interfaces for functions using the GPT4Client are completely unaffected.
client<llm> GPT4Client {
provider "baml-openai-chat"
+ retry_policy SimpleRetryPolicy
options {
model "gpt-4"
api_key env.OPENAI_API_KEY
}
}
+ retry_policy SimpleRetryPolicy {
+ max_retries 5
+ strategy {
+ type exponential_backoff
+ delay_ms 300
+ multiplier 1.5
+ }
+ }
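Because the policy is attached to the client, nothing changes at the call site; the generated function retries internally before surfacing an error. A sketch of the (unchanged) Python call:

```python
from baml_client import baml as b

async def extract(resume_text: str):
    # Same call as before; retries with exponential backoff happen
    # inside the generated client, per SimpleRetryPolicy.
    return await b.ExtractResume(resume_text=resume_text)
```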
To do planning with BAML, just tell the LLM what planning steps to take. Thanks to Flexible Parsing, BAML will automatically find your data objects in the response and convert them for you.
impl<llm, GetOrderInfo> version1 {
client GPT4
prompt #"
Given the email below:
Email Subject: {#input.subject}
Email Body: {#input.body}
Extract this info from the email in JSON format:
{#print_type(output)}
Schema definitions:
{#print_enum(OrderStatus)}
+ Before you output the JSON, please explain your
+ reasoning step-by-step. Here is an example on how to do this:
+ 'If we think step by step we can see that ...
+ therefore the output JSON is:
+ {
+ ... the json schema ...
+ }'
"#
}
To add examples to your prompt with BAML, you can use multiple parameters:
function DoSomething {
+ input (my_data: int, examples: string)
output string
}
impl<llm, DoSomething> v1 {
client GPT4
prompt #"
Given DATA do something cool!
DATA: {#input.my_data}
+ Examples:
+ {#input.examples}
"#
}
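The examples are then just another argument, assembled in plain Python and passed by keyword like any other input parameter. A minimal sketch (the example strings are placeholders):

```python
from baml_client import baml as b

EXAMPLES = """\
DATA: 1 -> did something cool with 1
DATA: 2 -> did something cool with 2
"""

async def run(my_data: int) -> str:
    # The examples land wherever {#input.examples} appears in the prompt.
    return await b.DoSomething(my_data=my_data, examples=EXAMPLES)
```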
Sometimes using abstract names as "symbols" (e.g. k1, k2, k3…) allows the LLM to focus on your rules better.
- research paper
- Also used by OpenAI for their content moderation.
// Enums will still be available as Category.Refund
// BAML will auto convert k1 --> Category.Refund for you :)
enum Category {
Refund @alias("k1")
@description("Customer wants to refund a product")
CancelOrder @alias("k2")
@description("Customer wants to cancel an order")
TechnicalSupport @alias("k3")
@description("Customer needs help with a technical issue unrelated to account creation or login")
AccountIssue @alias("k4")
@description("Specifically relates to account-login or account-creation")
Question @alias("k5")
@description("Customer has a question")
// Skip this category for the LLM
Bug @skip
}
// whenever you now use:
// {#print_enum(Category)}
// BAML will substitute in the alias and description automatically
// and parse anything the LLM returns into the appropriate type
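From the application side the aliases are invisible: the model only ever sees k1 through k5 (plus the descriptions), while your code keeps working with the real enum. A small sketch, reusing ClassifyMessage from the classification example:

```python
from baml_client import baml as b
from baml_client.baml_types import Category

async def classify(msg: str) -> Category:
    # If the model answers "k1", the parser returns Category.Refund.
    category = await b.ClassifyMessage(msg)
    if category == Category.Refund:
        print("refund request detected")
    return category
```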
In BAML, you compare prompts or LLM providers by declaring multiple impls of the same function. The VSCode extension also lets you run the tests side by side.
// Same signature as above
function ExtractResume {
input (resume_text: string)
output Resume
+ // Declare which impl is the default one my python/ts code calls
+ default_impl version1
}
// My original impl
impl<llm, ExtractResume> version1 {
...
}
+ // My new and super improved impl
+ impl<llm, ExtractResume> with_claude {
+ // Since resumes are fairly easy, I'll try Claude here
+ client ClaudeClient
+ prompt #"
+ {#chat(system)}
+ You are an expert tech recruiter.
+ Extract the resume from TEXT.
+
+ {#chat(user)}
+ TEXT
+ ###
+ {#input.resume_text}
+ ###
+
+ {#chat(assistant)}
+ Output JSON Schema:
+ {#print_type(output)}
+ "#
+ }
+
+ // another client definition
+ client<llm> ClaudeClient {
+ provider "baml-anthropic-chat"
+ options {
+ model "claude-3-haiku-20240307"
+ api_key env.ANTHROPIC_API_KEY
+ }
+ }
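Application code that calls ExtractResume keeps getting version1, because it is declared as default_impl. To benchmark with_claude directly from Python you can address the impl by name; the get_impl / run accessor shown below is an assumption about the generated client, so treat it as a sketch rather than the definitive API:

```python
from baml_client import baml as b

async def compare(resume_text: str):
    # Default impl, as declared by default_impl in the BAML function.
    baseline = await b.ExtractResume(resume_text=resume_text)

    # Hypothetical per-impl accessor; the real generated API may differ.
    candidate = await b.ExtractResume.get_impl("with_claude").run(
        resume_text=resume_text
    )
    return baseline, candidate
```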
Instead of using arrays of {"role": string, "content": string}, BAML uses a {#chat(role)} macro that auto-converts prompts into multiple messages. Everything after a {#chat(role)} marker, up to the next {#chat(role)} or the end of the prompt, becomes that message's content.
VSCode will give you a live preview of exactly how it will be split.
impl<llm, BuildGithubApiCall> v1 {
client GPT35
prompt #"
+ {#chat(system)}
Given the user query, extract the right details:
+ {#chat(user)}
{#input}
+ {#chat(assistant)}
{#print_enum(Intent)}
Output JSON Schema:
{#print_type(output)}
JSON:
"#
}
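Conceptually, the prompt above is split into role-tagged messages before being sent to the provider. A rough Python illustration of the resulting shape (not generated code, just the split the macros imply):

```python
# Everything after each {#chat(role)} marker, up to the next marker,
# becomes that message's content.
messages = [
    {"role": "system", "content": "Given the user query, extract the right details:"},
    {"role": "user", "content": "<the rendered {#input} text>"},
    {"role": "assistant", "content": "<enum listing, output JSON schema, 'JSON:'>"},
]
```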
Once we finish adding loops and conditionals, you'll be able to add {#chat(role)} dynamically based on inputs and clients as well!
BAML offers streaming for partial JSONs out of the box. No changes to your BAML files; just call a different Python function (TypeScript support coming soon).
async def main():
    async with b.ExtractResume.stream(resume_text="...") as stream:
        async for output in stream.parsed_stream:
            if output.is_parseable:
                # name is typed as str | None (in case we haven't received the name field yet)
                name = output.parsed.name
                print(f"streaming: {output.parsed.model_dump_json()}")
            # You can also get the current delta. This will always be present.
            print(f"streaming: {output.delta}")

        # Resume type
        final_output = await stream.get_final_response()
        if final_output.has_value:
            print(f"final response: {final_output.value}")
        else:
            # A parsing error likely occurred.
            print("final response didn't have a value")
Analyze, label, and trace each request in Boundary Studio.
You can add comments in your prompts by wrapping them in {// comment //}.
#"
Hello world.
{// this won't show up in the prompt! //}
Please {// 'please' works best, don't ask.. //} enter your name:
"#
Comments can be multiline
#"
{//
some giant
comment
//}
"#
"be conservative in what you send, be liberal in what you accept". The principle is also known as Postel's law, after Jon Postel, who used the wording in an early specification of TCP.
In other words, programs that send messages to other machines (or to other programs on the same machine) should conform completely to the specifications, but programs that receive messages should accept non-conformant input as long as the meaning is clear. [1]
LLMs are prone to producing non-conformant outputs. Instead of wasting tokens and time getting the prompt exactly right, we built a parser that handles many of these scenarios for you. The parser uses no LLMs; instead, it relies on the types you define in BAML.
BAML Data model
class Quote {
author string @alias("name")
quote string[] @description("in lower case")
}
Raw LLM Output
The principle you were talking about is Postel's Law by Jon Postel. Your answer in the schema you requested is: ```json { "name": "Jon Postel", "quote": "be conservative in what you send, be liberal in what you accept" } ```
What BAML parsed as
{
"author": "Jon Postel",
"quote": ["be conservative in what you send, be liberal in what you accept"]
}
What the parser did
- Stripped all the prefix and suffix text around the object
{
"name": "Jon Postel",
"quote": "be conservative in what you send, be liberal in what you accept"
}
- Replaced the alias name --> author
{
"author": "Jon Postel",
"quote": "be conservative in what you send, be liberal in what you accept"
}
- Converted quote from string -> string[]
{
"author": "Jon Postel",
"quote": ["be conservative in what you send, be liberal in what you accept"]
}
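Because this all happens before the result reaches your code, a function whose output is Quote hands you the corrected, typed object directly. A small sketch, assuming a BAML function named GetQuote with output Quote (the function name is hypothetical):

```python
from baml_client import baml as b
from baml_client.baml_types import Quote

async def fetch_quote(question: str) -> Quote:
    # The parser strips the surrounding prose, maps the "name" alias
    # back to `author`, and wraps the single quote string into a list.
    quote = await b.GetQuote(question)   # hypothetical function
    assert isinstance(quote, Quote)
    return quote
```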
We started building SDKs for TypeScript and Python (and even experimented with YAML), and it worked, but they weren't fun. Writing software should be fun, not frustrating. We set out to build the best developer experience for AI. No boilerplate, dead-simple syntax, great errors, auto-complete.
Here’s our detailed comparison vs Pydantic and other frameworks. TL;DR: BAML is more than just a data-modeling library like Pydantic.
- Everything is typesafe
- The prompt is also never hidden from you
- It comes with an integrated playground
- It can support any model
No, BAML uses a custom-built compiler. Takes just a few milliseconds!
Basically, A Made-up Language
Rust 🦀
BAML files are only used to generate Python or TypeScript code. You don’t need to install the BAML compiler on your production servers. Just commit the generated code as you would any other Python code, and you're good to go.
Your BAML-generated code never talks to our servers. We don’t proxy LLM APIs -- you call them directly from your machine. We only publish traces to our servers if you enable Boundary Studio explicitly.
BAML and the VSCode extension will always be 100% free and open-source.
Our paid capabilities only start if you use Boundary Studio, which focuses on Monitoring, Collecting Feedback, and Improving your AI pipelines. Contact us for pricing details at [email protected].
Please do not file GitHub issues or post on our public forum for security vulnerabilities, as they are public!
Boundary takes security issues very seriously. If you have any concerns about BAML or believe you have uncovered a vulnerability, please get in touch via the e-mail address [email protected]. In the message, try to provide a description of the issue and ideally a way of reproducing it. The security team will get back to you as soon as possible.
Note that this security address should be used only for undisclosed vulnerabilities. Please report any security problems to us before disclosing them publicly.
Made with ❤️ by Boundary
HQ in Seattle, WA
P.S. We're hiring software engineers. Email us or reach out on Discord!