Unified AI client for all projects. One package, Vercel AI Gateway by default, direct provider escape hatches, provider-aware model tiers, and normalized generation settings.
```ts
import { createAI } from "@howells/ai";
import { generateText, Output, streamText, embed } from "ai";

const ai = createAI({
  app: { name: "MyApp", url: "https://myapp.com" },
});

// Pick a model by tier
const { text } = await generateText({
  model: ai.model("fast"),
  prompt: "Classify this ingredient",
});

// Add capabilities per tier
const { text: analysis } = await generateText({
  model: ai.model("powerful", {
    agent: "taste-analysis",
    tools: true,
    vision: true,
  }),
  prompt: "Analyze this design",
});

// Structured output
const { output } = await generateText({
  model: ai.model("standard", { agent: "search" }),
  output: Output.object({ schema: myZodSchema }),
  prompt: "Extract entities from this text",
});
```

Use `ai.generationOptions(...)` for the settings that vary across providers: reasoning budget, verbosity, structured-output provider behavior, tool policy, response length, sampling, prompt cache, user attribution, and service tier.
```ts
const provider = "openai";

const { text } = await generateText({
  model: ai.model("powerful", { provider, tools: true }),
  prompt: "Plan the migration",
  tools: migrationTools,
  ...ai.generationOptions({
    provider,
    reasoning: "high",
    verbosity: "medium",
    structured: "strict",
    tools: "auto",
    maxToolSteps: 5,
    outputLength: "long",
    creativity: "focused",
    user: "migration-agent",
  }),
});
```

For Gateway calls, pass the canonical model ID when you want provider-specific options inferred as well as Gateway attribution:
```ts
const modelId = "openai/gpt-5.4";

await streamText({
  model: ai.modelById(modelId),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId,
    reasoning: "medium",
    verbosity: "high",
  }),
});
```

| Normalized Option | AI SDK / Provider Mapping |
|---|---|
| `reasoning` | OpenAI `reasoningEffort`, Anthropic `thinking`, Google `thinkingConfig`, OpenRouter `reasoning`. Accepts a preset (`"high"`) or `{ effort, maxTokens }`. |
| `verbosity` | OpenAI `textVerbosity` |
| `structured` | OpenAI strict JSON schema, Anthropic structured output mode, Google structured outputs |
| `tools` | AI SDK `toolChoice` |
| `maxToolSteps` | AI SDK `stopWhen: stepCountIs(n)` |
| `parallelTools` | OpenAI/OpenRouter parallel tool calls, Anthropic inverse disable flag |
| `outputLength` | AI SDK `maxOutputTokens` preset |
| `creativity` | AI SDK `temperature` preset |
| `cache` | Anthropic `cacheControl`, OpenRouter `cache_control`. Pass `"ephemeral"` or `{ ttl: "5m" \| "1h" }`. |
| `serviceTier` | OpenAI/Google service tier where supported |
| `routing` | Gateway `sort`/`only`/`order`/`zeroDataRetention`/..., OpenRouter `provider.{sort, only, ignore, order, allow_fallbacks, max_price, quantizations, zdr, data_collection}` |
| `fallbackModels` | Gateway `models`, OpenRouter `models` (model fallback chain) |
| `tags` | Gateway `tags` (spend reporting). Ignored elsewhere. |
| `webSearch` | OpenRouter `plugins: [{ id: "web", ... }]`. For Gateway, wire `gateway.tools.parallelSearch()` / `perplexitySearch()` via AI SDK tools. |
| `responseHealing` | OpenRouter `plugins: [{ id: "response-healing" }]` (auto-repair JSON for `generateObject`). |
| `includeCost` | OpenRouter `usage: { include: true }`. Gateway returns cost automatically. |
| `logprobs` / `logitBias` | OpenRouter only (`logprobs` + `top_logprobs`, `logit_bias`). |
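As a concrete sketch of the `cache` option above, the object below follows the documented shape (`"ephemeral"` or `{ ttl }`); the surrounding field values are illustrative examples, not defaults.

```typescript
// Illustrative options object for ai.generationOptions (values are examples).
// cache follows the table above: "ephemeral" or { ttl: "5m" | "1h" }.
const cachedCallOptions = {
  provider: "anthropic", // direct Anthropic route
  cache: { ttl: "1h" },  // maps to Anthropic cacheControl
  reasoning: "medium",   // maps to Anthropic thinking
  outputLength: "long",  // maps to AI SDK maxOutputTokens preset
};
```

At the call site you would spread `ai.generationOptions(cachedCallOptions)` into `generateText`, as in the earlier examples.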
```ts
// Cheapest provider, ZDR-only, with a fallback model and spend tags
await generateText({
  model: ai.modelById("anthropic/claude-sonnet-4.6", { provider: "gateway" }),
  prompt: "...",
  ...ai.generationOptions({
    provider: "gateway",
    modelId: "anthropic/claude-sonnet-4.6",
    routing: {
      prefer: "cheapest",
      privacy: ["no-retention", "no-training"],
      allow: ["anthropic", "amazon-bedrock"],
    },
    fallbackModels: ["anthropic/claude-haiku-4.5"],
    tags: ["feature:checkout"],
  }),
});
```

`routing.prefer` accepts `"auto"`, `"cheapest"`, `"fastest"`, or `"highest-throughput"`.
`routing.privacy` accepts any combination of `"no-retention"`, `"no-training"`, and `"hipaa"`.
`routing.maxCost` (OpenRouter only) takes USD-per-million-token ceilings: `{ promptPerMillion, completionPerMillion, requestUsd }`.
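A hedged sketch of an OpenRouter price ceiling (the dollar figures are arbitrary examples, not recommendations):

```typescript
// Example routing object with an OpenRouter-only maxCost ceiling.
// The shape follows the prose above: USD-per-million-token limits plus a per-request cap.
const routing = {
  prefer: "cheapest",
  privacy: ["no-retention"],
  maxCost: {
    promptPerMillion: 1.5,   // at most $1.50 per 1M prompt tokens
    completionPerMillion: 6, // at most $6.00 per 1M completion tokens
    requestUsd: 0.02,        // at most $0.02 per request
  },
};
```

This would be spread in via `ai.generationOptions({ provider: "openrouter", routing, ... })` alongside an OpenRouter-pinned model.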
When the Gateway provider is configured, `ai.gateway` exposes the control-plane APIs:

```ts
const ai = createAI();

if (ai.gateway) {
  const { balance } = await ai.gateway.credits();
  const { models } = await ai.gateway.listModels();
  const spend = await ai.gateway.spend({
    startDate: "2026-04-01",
    endDate: "2026-04-30",
    groupBy: "model",
  });
  const info = await ai.gateway.generationInfo("gen_01H...");
}
```

Normal tests are deterministic and do not call providers:
```sh
pnpm test
pnpm check-types
pnpm build
```

Live tests are opt-in because they use real API keys and spend provider quota. They load keys from `.env`, `.env.local`, or `apps/benchmark/.env.local`, then verify every configured provider/model route plus the normalized config option matrix:

```sh
pnpm test:live
```

The package ships a small CLI as both `ai` and `howells-ai`:
```sh
ai models
ai providers
ai doctor
ai doctor --live
ai test --provider openai
ai models --task coding
ai bench --provider gateway --task coding --tier fast --prompt "Reply in one sentence."
```

Use `--json` on `models`, `providers`, `doctor`, `test`, and `bench` for scriptable output. The CLI loads local keys from `.env`, `.env.local`, and `apps/benchmark/.env.local`, and never prints secret values.
Language models are selected by tier, then capability flags. Structured input/output is a baseline requirement for every default language model.
| Tier | Text Default | Tools Default | Vision / Vision Tools Default | Use When |
|---|---|---|---|---|
| `nano` | `xiaomi/mimo-v2-flash` | `xiaomi/mimo-v2-flash` | `google/gemini-3.1-flash-lite-preview` | Cheap structured output and light vision work |
| `fast` | `x-ai/grok-4.1-fast` | `x-ai/grok-4.1-fast` | `x-ai/grok-4.1-fast` | Low-latency tool calls, chat, image reads, long context |
| `standard` | `google/gemini-3-flash-preview` | `google/gemini-3-flash-preview` | `google/gemini-3-flash-preview` | Everyday tasks, chat, coding, vision, 1M context |
| `powerful` | `x-ai/grok-4.3` | `x-ai/grok-4.3` | `x-ai/grok-4.3` | High-quality synthesis with strong speed/cost balance |
| `reasoning` | `anthropic/claude-opus-4.7` | `anthropic/claude-opus-4.7` | `anthropic/claude-opus-4.7` | Frontier quality and deep multi-step reasoning |
```ts
ai.model("fast"); // fast text
ai.model("fast", { tools: true }); // fast tool calling
ai.model("fast", { vision: true }); // fast image understanding
ai.model("fast", { tools: true, vision: true }); // fast image + tools
```

Pass `task` when the best model depends on the job more than the generic tier. `general` preserves the base matrix; other tasks layer RouterBase-informed picks over the same tier/capability shape.
```ts
ai.model("fast", { task: "coding", tools: true }); // MiniMax M2.5
ai.model("standard", { task: "coding" }); // GLM 5
ai.model("fast", { task: "agentic", tools: true }); // Grok 4.1 Fast
ai.model("standard", { task: "vision", vision: true }); // Gemini 3 Flash Preview
ai.model("standard", { task: "longContext" }); // Grok 4.1 Fast
```

Available tasks: `general`, `coding`, `agentic`, `chat`, `bulk`, `vision`, `reasoning`, `longContext`, and `creative`.
When you pin a provider, task selection stays inside that provider wherever the provider has coverage. For example, `provider: "openai", task: "coding"` routes to OpenAI's Codex line, while `provider: "zai", task: "vision"` routes to GLM's vision model instead of falling back to the global winner from another provider.

If a requested capability is incompatible with the resolved model, selection throws before any provider call. For example, `provider: "deepseek", vision: true` fails locally because DeepSeek's selected models are not vision-capable.
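The fail-fast rule can be pictured with a small standalone check. This is an illustration of the behavior, not the package's internal code; the `Capabilities` shape matches what `ai.modelCapabilities` returns.

```typescript
// Standalone illustration: reject a capability request before any provider call.
type Capabilities = { structured: boolean; tools: boolean; vision: boolean };

function assertCapabilities(
  modelId: string,
  caps: Capabilities,
  requested: Partial<Capabilities>,
): void {
  for (const flag of Object.keys(requested) as (keyof Capabilities)[]) {
    if (requested[flag] && !caps[flag]) {
      throw new Error(`${modelId} does not support ${flag}`);
    }
  }
}

// DeepSeek's selected models report vision: false, so requesting vision throws locally:
const deepseekCaps: Capabilities = { structured: true, tools: true, vision: false };
// assertCapabilities("deepseek/deepseek-v3.2", deepseekCaps, { vision: true }); // throws
```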
| Slot | Voyage Default | Gemini Default | Use When |
|---|---|---|---|
| `embed` | `voyage-3` | `gemini-embedding-2-preview` | Text embeddings |
| `multimodalEmbed` | `voyage-multimodal-3.5` | `gemini-embedding-2-preview` | Text + image embeddings |
| `rerank` | `rerank-2.5` | n/a | Search result reranking |
Override any tier variant or retrieval model per project:

```ts
import {
  ANTHROPIC_MODELS,
  createAI,
  GOOGLE_EMBED_MODELS,
  VOYAGE_MODELS,
} from "@howells/ai";

const ai = createAI({
  app: { name: "Sorrel", url: "https://sorrel.app" },
  models: {
    standard: {
      text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
      tools: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
    },
    tasks: {
      coding: {
        standard: {
          text: ANTHROPIC_MODELS.CLAUDE_SONNET_4_6,
        },
      },
    },
    embed: { voyage: VOYAGE_MODELS.VOYAGE_3_LITE },
    rerank: VOYAGE_MODELS.RERANK_2_5_LITE,
  },
});
```

Embedding slots are provider-aware. Configure `embed` and `multimodalEmbed` once, then select the provider at the call site:
```ts
const ai = createAI({
  models: {
    embed: {
      voyage: VOYAGE_MODELS.VOYAGE_3,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
    multimodalEmbed: {
      voyage: VOYAGE_MODELS.MULTIMODAL_3_5,
      gemini: GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2,
    },
  },
});
```

```ts
import { embed, embedMany } from "ai";

// Provider-neutral text embeddings
const { embedding } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "voyage" }),
  value: "some text",
});

// Provider-neutral image or image+text embeddings.
// Switch to { provider: "gemini" } without changing the call site shape.
const imageModel = ai.embeddingModel({ input: "image", provider: "voyage" });

// Google Gemini text embeddings (for benchmarking)
const { embedding: g } = await embed({
  model: ai.embeddingModel({ input: "text", provider: "gemini" }),
  value: "some text",
});

// Google Gemini image+text embeddings
const { embedding: imageEmbedding } = await embed({
  model: ai.embeddingModel({ input: "image", provider: "gemini" }),
  value: "green woven upholstery",
  providerOptions: {
    google: {
      content: [
        [{ inlineData: { mimeType: "image/png", data: "<base64>" } }],
      ],
    },
  },
});

// Batch
const { embeddings } = await embedMany({
  model: ai.embeddingModel({ provider: "voyage" }),
  values: ["text one", "text two", "text three"],
});

const reranker = ai.rerankModel();
```

Some frameworks accept config objects instead of AI SDK models:
```ts
const model = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "materials-agent",
});
// { provider, id, service, capabilities, apiKey, serviceApiKey, baseURL, headers, user }
```

The `capabilities` field describes which config fields the selected provider can consume, so callers can forward the useful fields generically instead of hard-coding per-provider branches.
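One way a caller might consume that, sketched with a hypothetical helper. The field names come from the config object shown above; the `buildHeaders` function and the narrowed types are not part of the package.

```typescript
// Build request headers from a modelConfig-shaped object, forwarding extra
// headers only when the provider's capabilities say it can consume them.
type ProviderConfig = {
  apiKey: string;
  headers?: Record<string, string>;
  capabilities: { headers: boolean };
};

function buildHeaders(config: ProviderConfig): Record<string, string> {
  return {
    Authorization: `Bearer ${config.apiKey}`,
    "Content-Type": "application/json",
    // e.g. OpenRouter app-attribution headers; dropped for providers
    // whose capabilities.headers is false.
    ...(config.capabilities.headers ? config.headers : {}),
  };
}
```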
| Provider | API Key | Base URL | Headers | App Attribution | Agent Attribution |
|---|---|---|---|---|---|
| `gateway` | yes | no | no | no | no |
| `openrouter` | yes | yes | yes | yes | yes |
| `anthropic` | yes | no | no | no | no |
| `openai` | yes | no | no | no | no |
| `google` | yes | no | no | no | no |
| `deepseek` | yes | yes | no | no | no |
| `xai` | yes | yes | no | no | no |
| `qwen` | yes | yes | no | no | no |
| `zai` | yes | yes | no | no | no |
| `moonshotai` | yes | yes | no | no | no |
For OpenRouter direct HTTP clients, request an OpenRouter model config and pass `user` in the request body:

```ts
const config = ai.modelConfig("deepseek/deepseek-v3.2", {
  provider: "openrouter",
  agent: "nl-search",
});

await fetch(`${config.baseURL}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${config.apiKey}`,
    ...config.headers,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "deepseek/deepseek-v3.2",
    messages,
    user: config.user,
  }),
});
```

For models that don't fit any tier:
```ts
const { text } = await generateText({
  model: ai.modelById("openai/gpt-5-nano"),
  prompt: "...",
});
```

Route through OpenRouter or direct providers when needed:
```ts
ai.model("standard", { provider: "openrouter" });
ai.modelById("claude-sonnet-4-6", { provider: "anthropic" });
ai.modelById("x-ai/grok-4.3", { provider: "xai" });
ai.modelById("moonshotai/kimi-k2.6", { provider: "moonshotai" });
```

Constants use normalized package IDs. `createAI()` translates known provider mismatches at runtime, such as Anthropic's direct `4-6` IDs, OpenRouter and Google direct `google/gemini-3-flash-preview` IDs for Gemini 3 Flash, Gateway's `xai/grok-4.1-fast-non-reasoning`, and Alibaba-hosted Qwen IDs.

DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi are direct OpenAI-compatible routes when their keys are configured. Other catalog services such as MiniMax, StepFun, Xiaomi, Inception, and Nex AGI route through Gateway or OpenRouter.
Tag OpenRouter requests for per-agent cost tracking:

```ts
ai.model("fast", { agent: "search", provider: "openrouter" });
// Sends user tag when provider is "openrouter"
```

```ts
import {
  ANTHROPIC_MODELS,
  DEEPSEEK_MODELS,
  GLM_MODELS,
  GOOGLE_EMBED_MODELS,
  GOOGLE_MODELS,
  INCEPTION_MODELS,
  KIMI_MODELS,
  MINIMAX_MODELS,
  NEX_AGI_MODELS,
  OPENAI_MODELS,
  PROVIDER_TASK_DEFAULT_MODELS,
  QWEN_MODELS,
  STEPFUN_MODELS,
  VOYAGE_MODELS,
  XAI_MODELS,
  XIAOMI_MODELS,
} from "@howells/ai";

// Anthropic
ANTHROPIC_MODELS.CLAUDE_OPUS_4_7 // "anthropic/claude-opus-4.7"
ANTHROPIC_MODELS.CLAUDE_OPUS_4_6 // "anthropic/claude-opus-4.6"
ANTHROPIC_MODELS.CLAUDE_SONNET_4_6 // "anthropic/claude-sonnet-4.6"

// DeepSeek
DEEPSEEK_MODELS.DEEPSEEK_V3_2 // "deepseek/deepseek-v3.2"
DEEPSEEK_MODELS.DEEPSEEK_V4_FLASH // "deepseek/deepseek-v4-flash"

// GLM / Z.ai
GLM_MODELS.GLM_5 // "z-ai/glm-5"
GLM_MODELS.GLM_5V_TURBO // "z-ai/glm-5v-turbo"
GLM_MODELS.GLM_4_7 // "z-ai/glm-4.7"
GLM_MODELS.GLM_4_7_FLASH // "z-ai/glm-4.7-flash"
GLM_MODELS.GLM_4_6V // "z-ai/glm-4.6v"

// Kimi / Moonshot
KIMI_MODELS.KIMI_K2_6 // "moonshotai/kimi-k2.6"
KIMI_MODELS.KIMI_K2_5 // "moonshotai/kimi-k2.5"
KIMI_MODELS.KIMI_K2_THINKING // "moonshotai/kimi-k2-thinking"

// Google language models
GOOGLE_MODELS.GEMINI_3_FLASH_PREVIEW // "google/gemini-3-flash-preview"
GOOGLE_MODELS.GEMINI_3_1_PRO_PREVIEW // "google/gemini-3.1-pro-preview"
GOOGLE_MODELS.GEMINI_3_1_FLASH_LITE_PREVIEW

// OpenAI
OPENAI_MODELS.GPT_5_4_NANO // "openai/gpt-5.4-nano"
OPENAI_MODELS.GPT_5_4 // "openai/gpt-5.4"
OPENAI_MODELS.GPT_5_3_CODEX // "openai/gpt-5.3-codex"

// Qwen
QWEN_MODELS.QWEN_3_235B_A22B_2507 // "qwen/qwen3-235b-a22b-2507"
QWEN_MODELS.QWEN_3_NEXT_80B_A3B_INSTRUCT_FREE
QWEN_MODELS.QWEN_3_6_PLUS // "qwen/qwen3.6-plus"

// xAI
XAI_MODELS.GROK_4_1_FAST // "x-ai/grok-4.1-fast"
XAI_MODELS.GROK_4_3 // "x-ai/grok-4.3"

// Gateway/OpenRouter-only services
MINIMAX_MODELS.MINIMAX_M2_7 // "minimax/minimax-m2.7"
MINIMAX_MODELS.MINIMAX_M2_5 // "minimax/minimax-m2.5"
STEPFUN_MODELS.STEP_3_5_FLASH // "stepfun/step-3.5-flash"
XIAOMI_MODELS.MIMO_V2_FLASH // "xiaomi/mimo-v2-flash"
INCEPTION_MODELS.MERCURY_2 // "inception/mercury-2"
NEX_AGI_MODELS.DEEPSEEK_V3_1_NEX_N1 // "nex-agi/deepseek-v3.1-nex-n1"

// Provider-pinned task matrix
PROVIDER_TASK_DEFAULT_MODELS.openai?.coding?.standard?.text
// "openai/gpt-5.3-codex"

ai.modelCapabilities({ modelId: "deepseek/deepseek-v3.2" })
// { structured: true, tools: true, vision: false }

// Voyage
VOYAGE_MODELS.VOYAGE_3 // "voyage-3"
VOYAGE_MODELS.VOYAGE_3_LITE // "voyage-3-lite"
VOYAGE_MODELS.VOYAGE_3_5 // "voyage-3.5"
VOYAGE_MODELS.VOYAGE_3_5_LITE // "voyage-3.5-lite"
VOYAGE_MODELS.MULTIMODAL_3 // "voyage-multimodal-3"
VOYAGE_MODELS.MULTIMODAL_3_5 // "voyage-multimodal-3.5"
VOYAGE_MODELS.RERANK_2_5 // "rerank-2.5"
VOYAGE_MODELS.RERANK_2_5_LITE // "rerank-2.5-lite"

// Google
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_2 // "gemini-embedding-2-preview"
GOOGLE_EMBED_MODELS.GEMINI_EMBEDDING_1 // "gemini-embedding-001"
```

| Variable | Required | Used By |
|---|---|---|
| `AI_GATEWAY_API_KEY` | Yes locally for default language models | Vercel AI Gateway |
| `OPENROUTER_API_KEY` | Only if using `provider: "openrouter"` | OpenRouter provider |
| `ANTHROPIC_API_KEY` | Only if using `provider: "anthropic"` | Anthropic provider |
| `OPENAI_API_KEY` | Only if using `provider: "openai"` | OpenAI provider |
| `VOYAGE_API_KEY` | Yes (for embed/rerank) | Voyage provider |
| `GOOGLE_GEMINI_API_KEY` | Only if using Gemini embeddings or `provider: "google"` | Google provider |
| `DEEPSEEK_API_KEY` | Only if using `provider: "deepseek"` | DeepSeek direct provider |
| `XAI_API_KEY` | Only if using `provider: "xai"` | xAI direct provider |
| `QWEN_API_KEY` | Only if using `provider: "qwen"` | Qwen direct provider |
| `ZAI_API_KEY` | Only if using `provider: "zai"` | Z.ai / GLM direct provider |
| `MOONSHOT_API_KEY` | Only if using `provider: "moonshotai"` | Moonshot / Kimi direct provider |
Keys can also be passed directly to `createAI()`:

```ts
const ai = createAI({
  gatewayKey: "vck_...",
  openRouterKey: "sk-or-...",
  voyageKey: "pa-...",
  googleKey: "...",
  xaiKey: "...",
  moonshotKey: "...",
  serviceKeys: {
    zai: "...",
    qwen: "...",
  },
});
```

Service keys are exposed through `ai.availableServices` and `ai.modelConfig()` for runtimes that can use provider-specific credentials. The same keys also enable direct OpenAI-compatible AI SDK routes for DeepSeek, xAI, Qwen, Z.ai, and Moonshot/Kimi.
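A small sketch of how a runtime might act on this. It assumes `ai.availableServices` can be treated as a list of configured service names (the exact shape is not documented here), and the helper itself is hypothetical:

```typescript
// Prefer a direct provider route when its key is configured; otherwise
// fall back to the default Gateway route.
function pickProvider(availableServices: string[], preferred: string): string {
  return availableServices.includes(preferred) ? preferred : "gateway";
}

// e.g. with serviceKeys: { zai: "...", qwen: "..." } configured:
pickProvider(["zai", "qwen"], "zai");      // "zai"
pickProvider(["zai", "qwen"], "deepseek"); // "gateway"
```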
- Each `createAI()` returns an independent client (no shared module state)
- Providers are lazy-initialized on first use
- Safe for tests and multi-config scenarios
- Language models route through Vercel AI Gateway by default
- OpenRouter and direct provider routes are available per call
- Embeddings/reranking through Voyage AI or Google