Computer-use integration guide
This guide explains how to opt into the computer_use server tool — the SDK feature that lets an agent drive a sandboxed browser on the user's behalf with an explicit consent UX, allowlist enforcement, audit logging, and a kill switch.
The feature is default-off. Nothing in this guide takes effect until you populate
governance.computerUse in your ChatClient config and configure a provider on your proxy. Until then, every computer_use invocation fails fast with COMPUTER_USE_NOT_ENABLED.
When to use this
- A user asks the agent to perform a task on a site that is not exposed via your APIs (e.g. "look up my warranty status on the manufacturer's portal").
- The task is bounded: one or two hostnames, a small number of clicks/forms, a clear success criterion.
- The host has reviewed the threat model in docs/reference/computer-use-threat-model.md and accepted it.
When not to use this
- Any flow where you can call an API directly. Computer-use is a last-resort capability.
- Any task that would require visiting more than a handful of hostnames.
- Any task that crosses payments, authentication, or PII boundaries without a separate human review step.
Architecture, at a glance
ChatClient (browser)
├── governance: { computerUse: { enabled, allowlist, ... } }
└── tools: [ computerUseTool({ endpoint: '/chat/tool-call' }) ]
│
│ HTTPS POST /chat/tool-call (Idempotency-Key, chat token)
▼
Customer proxy
├── createComputerUseHandler({ provider, policy, audit, ... })
│ ├── ComputerUseSession (per-call state machine + audit)
│ └── ComputerUseProvider (Mock | Browserbase | ...)
│ │
│ │ vendor REST/CDP
│ ▼
│ Hosted Chromium (Browserbase) — the security boundary
├── GET /chat/computer-use/:sessionId/stream (signed SSE)
└── POST /chat/computer-use/:sessionId/control (decisions + abort)
Three things to keep in mind:
- The iframe is a display sandbox, not a vendor sandbox. Browserbase runs Chromium in their isolated environment; our iframe shows PNG screenshots and a structured action log. No remote HTML/JS reaches the host page.
- The proxy is the policy enforcement point. The SDK-side check is fast-fail UX; the proxy re-applies every check independently. Never widen the allowlist at the SDK to "fix" an error.
- Approval is two-tier. The session-level approval reuses the existing ToolRegistry tool.awaiting_approval flow. The per-action high-risk approval (e.g. form submit) is a second channel inside the live session and surfaces via the surface's ConfirmDialog.
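The way session limits and the per-action approval tier interact can be pictured as a small decision function that the proxy would run before every action. This is only a sketch: gateAction, Decision, SessionLimits, and SessionState are hypothetical names that are not exported by the SDK, and only the field names maxDurationMs, maxActionsPerSession, and highRiskActions come from this guide.

```typescript
// Hypothetical sketch: none of these names are part of gecx-chat.
type Decision = 'allow' | 'needs_approval' | 'reject';

interface SessionLimits {
  maxDurationMs: number;
  maxActionsPerSession: number;
  highRiskActions: string[];
}

interface SessionState {
  startedAtMs: number;
  actionsTaken: number;
}

function gateAction(
  actionType: string,
  limits: SessionLimits,
  state: SessionState,
  nowMs: number,
): Decision {
  // Hard limits first: an expired or exhausted session rejects everything.
  if (nowMs - state.startedAtMs >= limits.maxDurationMs) return 'reject';
  if (state.actionsTaken >= limits.maxActionsPerSession) return 'reject';
  // High-risk actions pause for the second, per-action approval channel.
  if (limits.highRiskActions.includes(actionType)) return 'needs_approval';
  return 'allow';
}
```

Checking the hard limits before the high-risk list means a session that has run out of time or actions never prompts the user for an approval that could not be honored anyway.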
SDK config
import { createChatClient, computerUseTool } from 'gecx-chat';

const client = createChatClient({
  // ... your existing transport, auth, etc.
  governance: {
    computerUse: {
      enabled: true,
      allowlist: ['acme-orders.example.com'],
      maxDurationMs: 5 * 60_000,
      maxActionsPerSession: 30,
      highRiskActions: ['submit_form', 'download', 'navigate_external'],
      killSwitch: false,
    },
  },
  tools: [
    computerUseTool({
      endpoint: '/chat/tool-call',
      // Optional: hand the resolved policy to the tool so the generated
      // tool description tells the agent which hostnames are allowed.
      policy: {
        allowlist: ['acme-orders.example.com'],
        maxDurationMs: 5 * 60_000,
        maxActionsPerSession: 30,
      },
    }),
  ],
});
Allowlist matching
The allowlist matches hostnames exactly (case-insensitive, IDN-decoded). Subdomains are NOT implicit: acme.com does NOT match orders.acme.com. List each hostname explicitly. Schemes other than http: and https: are always rejected.
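The matching rules above can be approximated with the standard WHATWG URL parser. The following is a minimal sketch, not the SDK's actual matcher: isAllowedUrl and normalizeHost are hypothetical helper names. One deliberate simplification is that both sides are compared in their ASCII (punycode) form rather than in decoded Unicode form, which gives the same match result.

```typescript
// Hypothetical helpers: the SDK's real matcher is not shown in this guide.
function normalizeHost(host: string): string | null {
  try {
    // The WHATWG URL parser lowercases the hostname and converts IDNs to
    // their ASCII (punycode) form, so both sides share one canonical form.
    return new URL(`https://${host}`).hostname;
  } catch {
    return null; // malformed allowlist entries are ignored
  }
}

function isAllowedUrl(rawUrl: string, allowlist: string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is rejected
  }
  // Only http: and https: are ever eligible.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const allowed = new Set(
    allowlist.map(normalizeHost).filter((h): h is string => h !== null),
  );
  // Exact match only: 'acme.com' does not cover 'orders.acme.com'.
  return allowed.has(url.hostname);
}
```

For example, isAllowedUrl('https://orders.acme.com/status', ['acme.com']) returns false because the subdomain is not listed, while the same call with ['orders.acme.com'] returns true.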
Default values
| Field | Default |
|---|---|
| enabled | false |
| allowlist | [] |
| maxDurationMs | 300_000 (5 minutes) |
| maxActionsPerSession | 30 |
| highRiskActions | ['submit_form', 'download', 'navigate_external'] |
| killSwitch | false |
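To see how a partial config combines with these defaults, here is a small sketch. It assumes a shallow merge in which caller-supplied fields win; the ComputerUsePolicy interface and resolvePolicy function are hypothetical names used for illustration, not SDK exports.

```typescript
// Hypothetical types and helper; only the field names and default values
// are taken from the table above.
interface ComputerUsePolicy {
  enabled: boolean;
  allowlist: string[];
  maxDurationMs: number;
  maxActionsPerSession: number;
  highRiskActions: string[];
  killSwitch: boolean;
}

const DEFAULT_POLICY: ComputerUsePolicy = {
  enabled: false,
  allowlist: [],
  maxDurationMs: 300_000,
  maxActionsPerSession: 30,
  highRiskActions: ['submit_form', 'download', 'navigate_external'],
  killSwitch: false,
};

function resolvePolicy(partial: Partial<ComputerUsePolicy> = {}): ComputerUsePolicy {
  // Shallow merge: any field the caller sets replaces the default.
  // Note that an explicit `undefined` value would also replace a default.
  return { ...DEFAULT_POLICY, ...partial };
}
```

Calling resolvePolicy({ enabled: true, allowlist: ['acme-orders.example.com'] }) leaves the limits and the high-risk action list at their default values.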
Proxy config
Set these env vars on the proxy:
COMPUTER_USE_PROVIDER=mock | browserbase
COMPUTER_USE_ALLOWLIST=acme-orders.example.com,...
COMPUTER_USE_MAX_DURATION_MS=300000
COMPUTER_USE_MAX_ACTIONS=30
COMPUTER_USE_KILL_SWITCH=0
COMPUTER_USE_STREAM_KEY=<rotate per deployment>
# Only when COMPUTER_USE_PROVIDER=browserbase:
BROWSERBASE_API_KEY=...
BROWSERBASE_PROJECT_ID=...
The proxy's createComputerUseHandler is the authoritative enforcement point. Even if the SDK config is mis-set, the proxy refuses anything outside its own allowlist.
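As an illustration of how a proxy might read and validate these variables at startup, consider the sketch below. The loadComputerUseConfig function and its return shape are assumptions made for this example; the guide does not show how createComputerUseHandler consumes the environment. The fallback limits mirror the SDK defaults, which is also an assumption.

```typescript
// Hypothetical loader: names and return shape are illustrative only.
interface ProxyComputerUseConfig {
  provider: 'mock' | 'browserbase';
  allowlist: string[];
  maxDurationMs: number;
  maxActions: number;
  killSwitch: boolean;
}

type Env = Record<string, string | undefined>;

function readNumber(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value <= 0) throw new Error(`Invalid ${name}: ${raw}`);
  return value;
}

function loadComputerUseConfig(env: Env): ProxyComputerUseConfig {
  const provider = env.COMPUTER_USE_PROVIDER;
  if (provider !== 'mock' && provider !== 'browserbase') {
    throw new Error(`Unknown COMPUTER_USE_PROVIDER: ${String(provider)}`);
  }
  if (provider === 'browserbase' && (!env.BROWSERBASE_API_KEY || !env.BROWSERBASE_PROJECT_ID)) {
    throw new Error('browserbase needs BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID');
  }
  return {
    provider,
    // Comma-separated hostnames; blanks are dropped and case is normalized.
    allowlist: (env.COMPUTER_USE_ALLOWLIST ?? '')
      .split(',')
      .map((h) => h.trim().toLowerCase())
      .filter((h) => h.length > 0),
    maxDurationMs: readNumber(env.COMPUTER_USE_MAX_DURATION_MS, 300_000, 'COMPUTER_USE_MAX_DURATION_MS'),
    maxActions: readNumber(env.COMPUTER_USE_MAX_ACTIONS, 30, 'COMPUTER_USE_MAX_ACTIONS'),
    killSwitch: env.COMPUTER_USE_KILL_SWITCH === '1',
  };
}
```

Failing at startup on a missing provider or vendor credential keeps misconfiguration from showing up later as a confusing per-session error.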
Wiring a host-supplied automation hook
BrowserbaseProvider does not pull in Playwright as a dependency — the host wires its preferred automation library via the act hook so we don't impose a platform-binary toolchain on every customer:
import { chromium } from 'playwright';
import { BrowserbaseProvider } from 'gecx-chat/server';

const provider = new BrowserbaseProvider({
  apiKey: process.env.BROWSERBASE_API_KEY!,
  projectId: process.env.BROWSERBASE_PROJECT_ID!,
  act: async ({ connectUrl, action, signal }) => {
    const browser = await chromium.connectOverCDP(connectUrl);
    try {
      const context = browser.contexts()[0]!;
      const page = context.pages()[0] ?? await context.newPage();
      // Playwright's page methods do not accept an AbortSignal, so honor
      // the abort request before starting the action.
      signal?.throwIfAborted();
      switch (action.actionType) {
        case 'navigate':
          await page.goto(action.url!);
          break;
        case 'click':
          await page.click(action.target!);
          break;
        // ... etc.
      }
      return { summary: `executed ${action.actionType}`, currentUrl: page.url() };
    } finally {
      await browser.close();
    }
  },
  screenshot: async ({ connectUrl }) => {
    const browser = await chromium.connectOverCDP(connectUrl);
    try {
      const page = browser.contexts()[0]!.pages()[0]!;
      const buffer = await page.screenshot({ type: 'png' });
      // Fine for local testing; see the note below about production use.
      const url = `data:image/png;base64,${buffer.toString('base64')}`;
      return { url };
    } finally {
      await browser.close();
    }
  },
});
In production, host the screenshot bytes from a signed proxy URL rather than embedding base64 — large PNGs in data: URLs balloon the SSE frames and break some intermediaries.
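One way to produce such signed URLs is an HMAC over the screenshot id and an expiry timestamp. The sketch below rests on several assumptions: the /chat/computer-use/screenshots/ route, the exp and sig query parameters, and the function names are all hypothetical, and the secret could be a dedicated key or one derived from COMPUTER_USE_STREAM_KEY depending on your deployment.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical URL scheme: /chat/computer-use/screenshots/:id?exp=...&sig=...
function signPayload(secret: string, id: string, exp: number): string {
  return createHmac('sha256', secret).update(`${id}.${exp}`).digest('hex');
}

function signScreenshotUrl(secret: string, id: string, ttlMs: number, nowMs: number): string {
  const exp = nowMs + ttlMs;
  const sig = signPayload(secret, id, exp);
  return `/chat/computer-use/screenshots/${encodeURIComponent(id)}?exp=${exp}&sig=${sig}`;
}

function verifyScreenshotUrl(
  secret: string,
  id: string,
  exp: number,
  sig: string,
  nowMs: number,
): boolean {
  // Reject malformed or expired links before doing any crypto work.
  if (!Number.isFinite(exp) || nowMs > exp) return false;
  const expected = Buffer.from(signPayload(secret, id, exp), 'hex');
  const given = Buffer.from(sig, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The screenshot hook would then return { url: signScreenshotUrl(...) } in place of the data: URL, and the proxy route would call verifyScreenshotUrl before streaming the stored PNG bytes.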
Consent and abort UX
Mount <ComputerUseSurface> in your transcript or as a sibling of your <ChatPanel> when a computer_use tool call enters awaiting_approval or executing:
import { ComputerUseSurface } from 'gecx-chat/react';

<ComputerUseSurface
  computerUseSessionId={session.computerUseSessionId}
  goal={session.goal}
  allowlist={session.allowlist}
  streamUrl={session.streamUrl}
  controlUrl={session.controlUrl}
  maxDurationMs={300_000}
  maxActionsPerSession={30}
  onEvent={(event) => yourAuditSink(event)}
  onComplete={(result) => console.log('done', result)}
/>
You can pass renderConsent and renderApproval overrides to swap in your design system.
Audit and kill switch
Every step emits a governance.computer_use.* event through ChatGovernance's audit pipeline. Wire it to your SIEM the same way you wire voice_session_started:
governance: {
  audit: (event) => yourSiemClient.send(event),
  computerUse: { /* ... */ },
}
To engage the kill switch:
- SDK side (instant, in-process): chatClient.governance.triggerComputerUseKill(true, { reason: 'incident-2026-03-14' })
- Proxy side (instant, server-wide): POST /admin/computer-use/kill-switch { "on": true, "reason": "..." }
- Config: set COMPUTER_USE_KILL_SWITCH=1 and restart, or flip governance.computerUse.killSwitch in the SDK config (effective at next session boot).
While the kill switch is on, every active session emits governance.computer_use.killed and tears down, and every new computer_use call fails fast with COMPUTER_USE_PROVIDER_UNAVAILABLE.
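For incident tooling, the proxy-side call can be wrapped in a small helper that builds the request separately from sending it. This is a sketch under assumptions: buildKillSwitchRequest is a hypothetical name, and the bearer-token header is a guess at how the admin route might be protected, since this guide does not describe its authentication. Only the path and the JSON body shape come from the text above.

```typescript
// Hypothetical helper for scripting the proxy-side kill switch.
interface KillSwitchRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

function buildKillSwitchRequest(
  proxyBaseUrl: string,
  on: boolean,
  reason: string,
  adminToken: string,
): KillSwitchRequest {
  return {
    url: new URL('/admin/computer-use/kill-switch', proxyBaseUrl).toString(),
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Assumption: the admin route expects a bearer token.
        Authorization: `Bearer ${adminToken}`,
      },
      body: JSON.stringify({ on, reason }),
    },
  };
}
```

A script would pass the result to fetch(url, init) and treat any non-2xx response as a failure to engage the switch. Keeping request construction pure makes the helper easy to unit test without a running proxy.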
Error codes
See docs/reference/error-codes.md#computer-use for the full list. Each code has a user message, developer hint, and remediation step.
Verification runbook
- Run pnpm install && pnpm typecheck && pnpm test for the fast inner loop.
- Run COMPUTER_USE_PROVIDER=mock pnpm dev, then visit /computer-use in the showcase to drive the deterministic demo.
- Run BROWSERBASE_API_KEY=... BROWSERBASE_PROJECT_ID=... COMPUTER_USE_PROVIDER=browserbase pnpm dev to smoke-test against a real vendor session.
- Run pnpm e2e / pnpm e2e:applied. Playwright covers consent → screenshot stream → high-risk approval → completion.
Threat model and penetration test
See docs/reference/computer-use-threat-model.md.