Can I self-host Lazu?
Yes. Lazu is designed as an open-source AI gateway and console. Use the hosted service at lazu.ai, or deploy your own API, web console, docs, database, storage, and email configuration.
FAQ
Direct answers about model access, migration, pricing, privacy, and troubleshooting.
Yes. Lazu is designed as an open-source AI gateway and console. Use the hosted service at lazu.ai, or deploy your own API, web console, docs, database, storage, and email configuration.
Lazu exposes the live token-scoped model catalog from GET /api/models/catalog. Current first-class provider families are OpenAI, Anthropic, Google Gemini, and xAI.
Lazu supports OpenAI, Anthropic, Google Gemini, xAI, and custom OpenAI-compatible upstream channels. The console model list and changelog are the source of truth for newly added models.
Change two lines: set base_url from https://api.openai.com/v1 to https://api.lazu.ai/v1, then replace api_key with your Lazu key. Your application code can keep using the official OpenAI Python, JavaScript, or Go SDKs.
Yes. Use the Lazu endpoint at https://api.lazu.ai and connect directly from domestic IP addresses without a VPN or proxy. Production latency figures should be checked against the live monitoring dashboard.
Pricing follows upstream token usage: input tokens, output tokens, and cache tokens are accounted for separately where the provider exposes them. There is no monthly fee and no minimum spend.
Lazu supports online recharge through Stripe.
Lazu does not store request or response bodies. It keeps request metadata such as timestamp, token count, status code, and request_id for billing and troubleshooting. User data is not used to train models and is only forwarded to upstream providers as needed to fulfill the request.
Yes. Chat completion endpoints support SSE streaming with stream: true, matching OpenAI-compatible behavior.
Yes. In addition to OpenAI-compatible /v1/chat/completions, Lazu exposes /v1/messages for Anthropic-compatible requests using x-api-key and anthropic-version headers. Extended Thinking, Prompt Caching, and Tool Use are supported through that path.
Rate limits depend on the model, account level, and upstream channel capacity. You can inspect live usage and limits in the console. For higher RPM or TPM, contact the administrator or business support.
The console provides usage dashboards by day, week, and month, with breakdowns by model, endpoint, and token type. Every request includes an x-request-id that can be searched in the logs page.
Every response includes an x-request-id header. Search that ID in the console logs to see request time, selected upstream channel, upstream status, token billing details, and any available error stack.