Custom AI Agents Need Sandboxes, Not Scripts
Teams can prototype custom AI agents in a notebook or a single container in a day. The harder part starts when those agents need to run across teams, survive restarts, keep secrets separated, and preserve session state in production. That is why BerriAI's open-source LiteLLM Agent Platform matters: it focuses less on prompt logic and more on the infrastructure layer agents need once they leave the demo environment.
According to MarkTechPost's coverage of the release, BerriAI open-sourced the platform in May 2026 as a self-hosted way to run multiple agents with isolated sandboxes and persistent sessions. For enterprise teams in software, fintech, and healthcare, that shifts the discussion from model choice alone to AI integration architecture and day-two operations.
What are custom AI agents?
Custom AI agents are task-specific systems that combine a model, tools, memory, permissions, and runtime logic to complete work inside a business environment. In production, they need more than prompting: they need isolated execution, persistent state, and operational controls so they can run safely across teams and restarts.
Why do local scripts fail when custom AI agents move into production?
A local script is usually stateless enough to restart without much consequence. Production agents are different. They accumulate chat history, tool outputs, intermediate steps, and credentials over time. If that state lives only inside one container, a redeploy or pod crash can erase the work in progress.
That becomes more serious when multiple teams share infrastructure. A coding agent for engineering may need GitHub access, while a finance workflow agent may need a different toolchain and tighter scopes. Put both in one shared runtime and the trade-off is obvious: simpler setup, but weaker isolation.
This is the core problem LiteLLM Agent Platform is trying to solve. Its design centers on per-session sandboxes and session continuity rather than only agent prompts or UI polish. The official GitHub repository makes that intent clear in its architecture and quickstart materials.
Why do isolated sandboxes matter for AI agent development?
When teams talk about AI agent development, they often focus on frameworks, model selection, or tool calling. Isolation deserves equal attention. Sandboxes reduce the risk of one agent session seeing another session's files, tokens, or runtime dependencies.
In LiteLLM Agent Platform, those isolated runtimes are managed on Kubernetes through the agent-sandbox project from kubernetes-sigs. Locally, developers can use kind to run the cluster inside Docker. In production, the documented path points to AWS EKS for sandbox execution.
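The isolation idea can be sketched in a few lines of Python. This is illustrative only: the platform itself provisions Kubernetes pods through the agent-sandbox controller, not local directories, and the function and variable names here are hypothetical.

```python
import tempfile

def make_session_env(session_id: str, secrets: dict) -> dict:
    """Give one agent session its own working directory and its own
    credential set. Stand-in for a per-session Kubernetes sandbox."""
    workdir = tempfile.mkdtemp(prefix=f"session-{session_id}-")
    return {"SESSION_ID": session_id, "WORKDIR": workdir, **secrets}

# Two teams' sessions get separate directories and separate secrets.
eng = make_session_env("eng-1", {"GITHUB_TOKEN": "engineering-scope"})
fin = make_session_env("fin-1", {"LEDGER_TOKEN": "finance-scope"})
assert eng["WORKDIR"] != fin["WORKDIR"]
assert "GITHUB_TOKEN" not in fin
```

The point is the boundary: the finance session never sees the engineering token, and a dependency conflict in one working directory cannot corrupt the other.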
That architecture suits teams evaluating private AI solutions or on-premise AI patterns because the runtime boundary is explicit. It also reflects a practical operator lesson: most agent failures in production are not model failures first. They are environment, permissions, or lifecycle failures.
For teams moving from prototypes to deployed systems, this is where an implementation partner can help define the runtime boundary, persistence model, and service ownership. A similar pattern shows up in AI Integration Services for Real Estate, where the hard part is not only generating outputs but fitting AI safely into existing workflows and systems.
How does persistent session management keep custom AI agents reliable?
Persistent sessions are the difference between an agent that feels durable and one that forgets everything after an update window. The platform uses PostgreSQL as a backing store for session state, metadata, and agent configuration, with schema migration run before startup.
That matters because production systems restart for ordinary reasons: deployments, autoscaling, host maintenance, dependency updates, or failures. If the only copy of the agent state is inside RAM or a local filesystem, every restart becomes a business interruption.
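A minimal sketch of that pattern, using Python's built-in sqlite3 as a self-contained stand-in for PostgreSQL (the schema and function names are hypothetical, not the platform's actual API):

```python
import os
import sqlite3
import tempfile

def open_store(path: str) -> sqlite3.Connection:
    """Connect and run the schema migration before serving traffic,
    mirroring the platform's init-migration step."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, history TEXT)"
    )
    return db

def save(db: sqlite3.Connection, session_id: str, history: str) -> None:
    db.execute(
        "INSERT OR REPLACE INTO sessions VALUES (?, ?)", (session_id, history)
    )
    db.commit()

def load(db: sqlite3.Connection, session_id: str):
    row = db.execute(
        "SELECT history FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None

# State written before a "restart" survives a brand-new connection.
path = os.path.join(tempfile.mkdtemp(), "agents.db")
save(open_store(path), "s1", "step 1 done")
restored = load(open_store(path), "s1")
assert restored == "step 1 done"
```

Because the session record lives outside the process, a redeploy is a reconnect, not a data loss event.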
The source material describes a separated web process, a worker process, and a database layer. That split is important. The web app handles dashboard interactions. The worker handles asynchronous tasks. The database preserves continuity. In other words, the platform treats AI deployment as an operations problem, not just an interface problem.
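The web/worker split can be illustrated with an in-process queue. This is a conceptual sketch, not the platform's implementation: a thread stands in for the worker service, and the uppercase transform stands in for a long-running agent task.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results: dict = {}

def worker() -> None:
    """Drain the task queue, like the platform's worker process."""
    while True:
        task_id, payload = tasks.get()
        if task_id is None:  # sentinel: shut down
            break
        results[task_id] = payload.upper()  # stand-in for real agent work

t = threading.Thread(target=worker, daemon=True)
t.start()

def handle_request(task_id: str, payload: str) -> str:
    """The 'web' side only enqueues and returns immediately."""
    tasks.put((task_id, payload))
    return "accepted"

handle_request("t1", "summarize repo")
tasks.put((None, None))
t.join()
assert results["t1"] == "SUMMARIZE REPO"
```

The design benefit is that the dashboard stays responsive while long agent runs proceed in the background, and either side can be restarted or scaled independently.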
There is a trade-off here too. Persistent state adds complexity: more infrastructure, more migrations, and more debugging paths. But for enterprise AI integrations, that complexity is usually cheaper than losing session history or rerunning failed tasks after every deployment.
What does the LiteLLM Gateway handle versus the Agent Platform?
This distinction is easy to miss, but it matters for stack design. LiteLLM Gateway and LiteLLM Agent Platform solve different layers of the problem.
The LiteLLM documentation positions the gateway as the model access layer. It handles routing across many model providers in OpenAI-compatible format, cost tracking, rate limiting, and provider abstraction. That includes providers such as OpenAI and Anthropic.
The Agent Platform sits above that layer. It handles sandbox lifecycle, session continuity, dashboard management, and agent CRUD operations. Put simply: the gateway decides how model calls are made; the platform decides how agent runtimes are operated.
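That layering can be made concrete with a toy sketch. Everything here is hypothetical (the routing table, class, and method names are invented for illustration): the lower function owns provider routing, the class above it owns sessions and delegates every model call downward.

```python
# Gateway concern: map model names to providers (plus, in the real
# gateway, cost tracking and rate limiting).
MODEL_ROUTES = {
    "gpt-4o": "openai",
    "claude-sonnet": "anthropic",
}

def gateway_call(model: str, prompt: str) -> str:
    provider = MODEL_ROUTES[model]
    return f"[{provider}] response to: {prompt}"

# Platform concern: session lifecycle and continuity.
class AgentPlatform:
    def __init__(self) -> None:
        self.sessions: dict[str, list] = {}

    def run(self, session_id: str, model: str, prompt: str) -> str:
        history = self.sessions.setdefault(session_id, [])
        reply = gateway_call(model, prompt)  # model access is delegated
        history.append((prompt, reply))
        return reply

platform = AgentPlatform()
platform.run("s1", "gpt-4o", "list open PRs")
assert platform.sessions["s1"][0][1].startswith("[openai]")
```

Swapping a provider touches only the routing table; changing session behavior touches only the platform class. That is the ownership boundary in miniature.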
That separation is healthy for enterprise AI integrations because it prevents one service from trying to do everything. It also creates cleaner ownership boundaries for platform teams, security teams, and application teams.
How is the platform structured under the hood?
The released architecture is relatively straightforward:
- A Next.js web process on port 3000 serves the dashboard.
- A worker process handles asynchronous agent tasks.
- PostgreSQL stores persistent session and agent data.
- A Kubernetes sandbox cluster runs isolated execution environments.
- An init migration ensures the database schema is ready before app startup.
For local testing, the quickstart is simple: provision the kind cluster, then run Docker Compose. For production, the recommended setup separates concerns further: AWS EKS for the sandbox cluster and Render for the web and worker services.
One operational detail stands out. Environment variables prefixed with CONTAINER_ENV_ are passed into sandbox containers with the prefix removed. That is a clean approach for secret injection because teams can provide credentials to the session runtime without rebuilding images. It is also a reminder that AI agent platform design depends on boring but essential details like startup order, secret handling, and state recovery.
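The prefix convention described above is simple to sketch. Assuming the documented behavior (select variables carrying the CONTAINER_ENV_ prefix, strip it, forward the rest of the environment nowhere), a minimal version looks like this; the function name is hypothetical:

```python
PREFIX = "CONTAINER_ENV_"

def sandbox_env(host_env: dict) -> dict:
    """Select host variables destined for the sandbox and strip the
    CONTAINER_ENV_ prefix, as described in the platform docs."""
    return {
        key[len(PREFIX):]: value
        for key, value in host_env.items()
        if key.startswith(PREFIX)
    }

host = {
    "CONTAINER_ENV_GITHUB_TOKEN": "ghp_example",
    "DATABASE_URL": "postgres://internal",  # never forwarded to the sandbox
}
assert sandbox_env(host) == {"GITHUB_TOKEN": "ghp_example"}
```

The convention gives operators an allowlist by construction: anything without the prefix, such as the platform's own database URL, stays out of agent runtimes.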
How should enterprises evaluate custom AI agents after this release?
The release is a useful signal for buyers and builders alike. It suggests the market is maturing past single-agent demos and toward infrastructure that supports multiple teams, multiple contexts, and long-running work.
For enterprise teams, four evaluation questions matter:
- Where does agent state live when a pod restarts?
- How are secrets separated by team, role, and context?
- Which layer owns model routing versus runtime orchestration?
- Can the deployment model support both local development and production operations?
These questions shape AI integration architecture more than prompt templates do. They also help explain why many early agent pilots struggle when moved from experimentation to production. The issue is often not that the agent cannot reason. The issue is that the operating model was never built for persistence, isolation, or recovery.
FAQ
What is LiteLLM Agent Platform in simple terms?
LiteLLM Agent Platform is a self-hosted infrastructure layer for running multiple AI agents in production. It adds isolated sandboxes, session continuity, and a dashboard on top of a running LiteLLM Gateway so teams can manage agents more reliably.
How is this different from the LiteLLM Gateway?
The gateway handles model routing, provider access, cost tracking, and rate limits. The Agent Platform handles the runtime layer: sandbox lifecycle, session persistence, and operational management of agent workloads.
Why do production AI agents need isolated sandboxes?
Agents often need different tools, filesystems, secrets, and access scopes. If all sessions share one runtime, one mistake or dependency conflict can affect other workloads. Sandboxes reduce that blast radius.
Can custom AI agents survive pod restarts?
Yes, if their state is persisted outside the running container. That is one of the main goals of LiteLLM Agent Platform: preserving session continuity so work is not lost during redeployments or failures.
What do I need for the local quickstart?
The source documentation lists Docker Desktop, kind, kubectl, helm, and a running LiteLLM Gateway. Local setup does not require cloud credentials, which lowers the barrier for teams testing the architecture.
Key takeaways
- Custom AI agents need runtime isolation and persistent state once they move beyond prototypes.
- LiteLLM Agent Platform separates model routing from agent operations, which simplifies ownership across the stack.
- Kubernetes-native sandboxes are useful for multi-team environments with different tools, scopes, and secrets.
- Session continuity is not a nice-to-have in production; it is part of reliability.
- The biggest agent decision in 2026 may be infrastructure design, not model selection alone.
Martin Kuvandzhiev
CEO and Founder of Encorp.io with expertise in AI and business transformation