Streaming#

Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.

Streaming#

Every session automatically starts a WebSocket stream server on an OS-assigned port. The server streams viewport frames and accepts input events (mouse, keyboard, touch).

To bind to a specific port, set AGENT_BROWSER_STREAM_PORT:

AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com

You can also manage streaming at runtime:

agent-browser stream status            # Show streaming state and bound port
agent-browser stream enable --port 9223  # Re-enable on a specific port
agent-browser stream disable           # Stop streaming for the session

stream status returns the enabled state, active port, browser connection state, and whether screencasting is active. stream disable tears the server down and removes the session's .stream metadata file.

Runtime status response#

agent-browser stream status --json returns data like:

{
  "enabled": true,
  "port": 9223,
  "connected": true,
  "screencasting": true
}

connected reports whether the daemon currently has a browser attached. screencasting reports whether frames are actively being produced for the stream server.

Relationship to screencast commands#

stream enable creates the WebSocket server and keeps it available for the session. WebSocket clients then trigger live frame delivery automatically.

The lower-level screencast_start and screencast_stop commands still control explicit CDP screencasts directly. Use them when you want a screencast without the WebSocket runtime server.

WebSocket protocol#

Connect to ws://localhost:9223 to receive frames and send input.

Frame messages#

The server sends frame messages with base64-encoded images:

{
  "type": "frame",
  "data": "<base64-encoded-jpeg>",
  "metadata": {
    "deviceWidth": 1280,
    "deviceHeight": 720,
    "pageScaleFactor": 1,
    "offsetTop": 0,
    "scrollOffsetX": 0,
    "scrollOffsetY": 0
  }
}

Status messages#

Connection and screencast status:

{
  "type": "status",
  "connected": true,
  "screencasting": true,
  "viewportWidth": 1280,
  "viewportHeight": 720
}

Input injection#

Send input events to control the browser remotely.

Mouse events#

// Click
{
  "type": "input_mouse",
  "eventType": "mousePressed",
  "x": 100,
  "y": 200,
  "button": "left",
  "clickCount": 1
}

// Release
{
  "type": "input_mouse",
  "eventType": "mouseReleased",
  "x": 100,
  "y": 200,
  "button": "left"
}

// Move
{
  "type": "input_mouse",
  "eventType": "mouseMoved",
  "x": 150,
  "y": 250
}

// Scroll
{
  "type": "input_mouse",
  "eventType": "mouseWheel",
  "x": 100,
  "y": 200,
  "deltaX": 0,
  "deltaY": 100
}

Keyboard events#

// Key down
{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "Enter",
  "code": "Enter"
}

// Key up
{
  "type": "input_keyboard",
  "eventType": "keyUp",
  "key": "Enter",
  "code": "Enter"
}

// Type character
{
  "type": "input_keyboard",
  "eventType": "char",
  "text": "a"
}

// With modifiers (1=Alt, 2=Ctrl, 4=Meta, 8=Shift)
{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "c",
  "code": "KeyC",
  "modifiers": 2
}

Touch events#

// Touch start
{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [{ "x": 100, "y": 200 }]
}

// Touch move
{
  "type": "input_touch",
  "eventType": "touchMove",
  "touchPoints": [{ "x": 150, "y": 250 }]
}

// Touch end
{
  "type": "input_touch",
  "eventType": "touchEnd",
  "touchPoints": []
}

// Multi-touch (pinch zoom)
{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [
    { "x": 100, "y": 200, "id": 0 },
    { "x": 200, "y": 200, "id": 1 }
  ]
}

Programmatic API#

For advanced use, control streaming directly via the TypeScript API:

import { BrowserManager } from 'agent-browser';

const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');

// Start screencast with callback
await browser.startScreencast((frame) => {
  console.log('Frame:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
  // frame.data is base64-encoded image
}, {
  format: 'jpeg',  // or 'png'
  quality: 80,     // 0-100, jpeg only
  maxWidth: 1280,
  maxHeight: 720,
  everyNthFrame: 1
});

// Inject mouse event
await browser.injectMouseEvent({
  type: 'mousePressed',
  x: 100,
  y: 200,
  button: 'left',
  clickCount: 1
});

// Inject keyboard event
await browser.injectKeyboardEvent({
  type: 'keyDown',
  key: 'Enter',
  code: 'Enter'
});

// Inject touch event
await browser.injectTouchEvent({
  type: 'touchStart',
  touchPoints: [{ x: 100, y: 200 }]
});

// Check if screencasting
console.log('Active:', browser.isScreencasting());

// Stop screencast
await browser.stopScreencast();

Use cases#

  • Pair browsing - Human watches and assists AI agent in real-time
  • Remote preview - View browser output in a separate UI
  • Recording - Capture frames for video generation
  • Mobile testing - Inject touch events for mobile emulation
  • Accessibility testing - Manual interaction during automated tests