This example is available on GitHub: examples/01_standalone_sdk/24_responses_streaming.py
Enable live token streaming when using the OpenAI Responses API path. This guide shows how to:
  • Subscribe to streaming deltas from the model
  • Log streamed chunks to a JSONL file
  • Optionally render the stream as a live view in the terminal or print deltas to stdout
Running the Example
export LLM_API_KEY="your-openai-compatible-api-key"
# Optional overrides
# export LLM_MODEL="openhands/gpt-5-codex"
# export LLM_BASE_URL="https://your-litellm-or-provider-base-url"

cd agent-sdk
uv run python examples/01_standalone_sdk/24_responses_streaming.py

How It Works

  • Pass a token callback to Conversation to receive streaming chunks as they arrive:
conversation = Conversation(
    agent=agent,
    workspace=os.getcwd(),
    token_callbacks=[on_token],
)
  • Each chunk contains a delta: text_delta for content tokens or arguments_delta for tool-call arguments. The example logs a serialized record per chunk to ./logs/stream/*.jsonl.
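A minimal callback might look like the sketch below. It assumes each chunk exposes optional text_delta and arguments_delta attributes (as described above) and uses a hypothetical log file name; the actual example's callback and record format may differ.

import json
import os
from datetime import datetime, timezone

# Hypothetical log file under the directory the example writes to.
LOG_PATH = "./logs/stream/run.jsonl"
os.makedirs(os.path.dirname(LOG_PATH), exist_ok=True)


def on_token(chunk) -> None:
    """Handle one streamed chunk: print content tokens and append a JSONL record."""
    text = getattr(chunk, "text_delta", None)
    args = getattr(chunk, "arguments_delta", None)

    if text:
        # Print content tokens inline as they arrive.
        print(text, end="", flush=True)

    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "text_delta": text,
        "arguments_delta": args,
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

Printing with end="" and flush=True keeps content tokens appearing as a continuous stream rather than one token per line.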
  • For a visual live view, use the built-in streaming visualizer:
from openhands.sdk.conversation.streaming_visualizer import create_streaming_visualizer

visualizer = create_streaming_visualizer()
conversation = Conversation(
    agent=agent,
    workspace=os.getcwd(),
    token_callbacks=[on_token],
    callbacks=[visualizer.on_event],
    visualize=False,  # disable the default event rendering; visualizer.on_event renders the stream instead
)
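
After a run, you can read the stream log back for inspection. A small sketch, assuming one JSON object per line in the files under ./logs/stream/ (the exact fields depend on how the example serializes each chunk):

import glob
import json

for path in sorted(glob.glob("./logs/stream/*.jsonl")):
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # Print only non-empty fields to keep the output readable.
            print({k: v for k, v in record.items() if v})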

Next Steps