Skip to content

Conversation

@ThomasK33
Copy link
Member

Refactors AIBridge integration tests to use a shared exported testutil harness and typed .txtar fixtures.

  • Adds testutil package: TXTARFixture, LLMFixture, UpstreamServer, MCPServer, BridgeServer, RecorderSpy, Inspector
  • Migrates bridge/metrics/trace integration tests to the new harness, removing legacy helpers and tuple-return setup
  • Splits injected-tools integration tests into their own file

Fixes #73.


📋 Implementation Plan

Refactor AIBridge tests: exported testutil harness + fixture structs

Goals

  • Remove copy/paste across integration tests by centralizing the common flow:
    1. load fixture → 2) start mock upstream (+ optional MCP) → 3) start AIBridge → 4) perform request → 5) assert via a single “inspector”.
  • Replace “helper functions with 4+ return values” with structs (harness objects) so new features don’t keep accreting more tuple returns.
  • Keep .txtar fixtures but make them typed (fixture structs with methods), so tests don’t manually txtar.Parse + filesMap + magic filenames.
  • Make helpers reusable outside this repo by shipping an exported github.com/coder/aibridge/testutil package.

Non-goals

  • No changes to production request/streaming behavior.
  • No fixture format migration (stay on .txtar).
  • No attempt to fully abstract provider SDK parsing (Anthropic/OpenAI streaming accumulation can stay in tests initially).

Proposed testutil package (exported)

Key design principles

  • Prefer composition over mega-helper functions: return one Harness/Server struct that exposes small methods.
  • Option pattern over extra return values: adding capabilities should be additive without breaking call sites.
  • Defensive-by-default: constructors validate fixtures early and fail fast with clear messages.

Package layout (suggested)

  • testutil/fixture_txtar.go: parse .txtar, expose a safe API.
  • testutil/fixture_llm.go: provider/upstream fixture (request + streaming/non-streaming responses + optional tool-call responses).
  • testutil/upstream_server.go: mock upstream server driven by LLMFixture.
  • testutil/http_reflector.go: “raw HTTP response fixture” server (current mockHTTPReflector pattern).
  • testutil/mcp_server.go: mock MCP server + call tracking + proxier creation.
  • testutil/recorder_spy.go: aibridge.Recorder implementation + query helpers.
  • testutil/bridge_server.go: start an AIBridge httptest.Server with actor context and configurable deps.
  • testutil/inspector.go: one place to access assertion data (recorder + MCP calls + upstream call count).
  • testutil/doc.go: package docs + “test-only / not API-stable” disclaimer + examples.

Core types & responsibilities

1) Fixtures

  • TXTARFixture
    • Owns .Comment and a map[string][]byte.
    • Provides File(name) / MustFile(name) / Has(name).
  • LLMFixture
    • Created from a TXTARFixture.
    • Encodes the known sections (request, streaming, non-streaming, optional tool-call variants).
    • Provides:
      • RequestBody(streaming bool) ([]byte, error) (e.g. sets stream=true safely)
      • Response(call int, streaming bool) ([]byte, error) (supports 2-turn injected tool flows)

Why: tests should stop knowing about fixtureStreamingToolResponse strings and “exactly 5 files”. The fixture struct should own that.

2) Mock upstream server

  • type UpstreamServer struct { URL string; Calls atomic; ... }
  • Constructed with LLMFixture and options:
    • WithResponseMutator(func(call int, resp []byte) []byte) (keeps the escape hatch used today)
    • WithStatusCode(int)
  • Behavior:
    • Reads request body once, determines streaming vs non-streaming (current logic).
    • Uses LLMFixture.Response(call, streaming) to select the correct response.
    • Provides RequireCallCount(t, n) / EventuallyCallCount(t, n, timeout).

3) Mock MCP server (fix PR72/63-style accretion)

Replace:

  • callAccumulator
  • createMockMCPSrv
  • setupMCPServerProxiesForTest (map + accumulator return)

With:

  • type MCPServer struct { Server *httptest.Server; calls map[...]... }
  • Methods:
    • CallsByTool(name string) []any
    • Proxiers(t, tracer, logger) map[string]mcp.ServerProxier (or Proxier() + Name())
    • Close() via t.Cleanup in constructor

4) Recorder + Inspector

  • RecorderSpy (exported) implements aibridge.Recorder
    • Thread-safe slices for interceptions/token/prompt/tool usage.
    • Convenience queries:
      • TokenUsages() []*aibridge.TokenUsageRecord
      • ToolUsages() []*aibridge.ToolUsageRecord
      • RequireAllInterceptionsEnded(t)
  • Inspector
    • Holds pointers to:
      • RecorderSpy
      • MCPServer (optional)
      • UpstreamServer (optional)
    • Provides higher-level helpers:
      • RequireToolCalled(t, tool string, wantArgs any) (checks both recorded tool usage and MCP receipt)
      • UpstreamCalls() int

5) AIBridge test server / harness

  • BridgeServer (or Harness)
    • Starts aibridge.NewRequestBridge(...) and wraps it in httptest.Server.
    • Ensures actor context is set (current BaseContext pattern).
    • Holds the *http.Client used by tests.
    • Provides:
      • Do(req *http.Request) (*http.Response, error)
      • NewProviderRequest(provider string, body []byte) (*http.Request, error) (replaces createOpenAIChatCompletionsReq / createAnthropicMessagesReq).
📎 API sketch (example usage)
fixture := testutil.MustParseTXTAR(t, antSingleInjectedTool)
llm := testutil.MustLLMFixture(fixture)

upstream := testutil.NewUpstreamServer(t, t.Context(), llm,
  testutil.WithInjectedToolSecondResponse(llm),
)

mcpSrv := testutil.NewMCPServer(t, testutil.CoderToolset())
rec := testutil.NewRecorderSpy()

bridge := testutil.NewBridgeServer(t, testutil.BridgeConfig{
  Ctx:      t.Context(),
  ActorID:  userID,
  Providers: []aibridge.Provider{aibridge.NewAnthropicProvider(anthropicCfg(upstream.URL(), apiKey), nil)},
  Recorder:  rec,
  MCPProxiers: mcpSrv.Proxiers(t, testTracer),
  Tracer:   testTracer,
})

reqBody := testutil.Must(llm.RequestBody(streaming))
req := bridge.MustNewProviderRequest(aibridge.ProviderAnthropic, reqBody)
resp := bridge.MustDo(req)

defer resp.Body.Close()

ins := testutil.NewInspector(rec, mcpSrv, upstream)
ins.RequireToolCalled(t, mockToolName, map[string]any{"owner": "admin"})
upstream.RequireCallCountEventually(t, 2)

Migration plan (mixed: refactor big file early, but keep incremental safety)

Phase 0 — Baseline + safety rails

  1. Add testutil package skeleton with docs (doc.go) and a clear “test-only / API may change” disclaimer.
  2. Add a small compile-time check somewhere in testutil to ensure it doesn’t import aibridge_test or rely on _test.go symbols.

Phase 1 — Move the lowest-risk helpers first

  1. Extract mockRecorderClient into testutil.RecorderSpy (keep method names close to today to simplify migration).
  2. Extract .txtar parsing helpers:
    • filesMaptestutil.MustParseTXTAR returning TXTARFixture.
    • setJSON wrapper (if currently local) → testutil.SetJSON / WithJSONField.
  3. Extract request builders:
    • createAnthropicMessagesReq / createOpenAIChatCompletionsReqBridgeServer.NewProviderRequest(...).

Phase 2 — Fix the tech-debt hotspots (PR72/PR63 patterns)

  1. Implement testutil.MCPServer and move callAccumulator inside it.
  2. Replace setupMCPServerProxiesForTest usages with mcpSrv.Proxiers(...).
  3. Update injected-tool tests to stop passing around map[string]mcp.ServerProxier and *callAccumulator.

Phase 3 — Rewrite bridge_integration_test.go early (reduce the 1900+ line “god file”)

  1. Split the file by concern (examples):
    • bridge_integration_anthropic_test.go
    • bridge_integration_openai_test.go
    • bridge_integration_injected_tools_test.go
    • bridge_integration_errors_test.go
    • bridge_integration_passthrough_test.go
  2. Introduce testutil.UpstreamServer and migrate the “standard flow” tests (simple/builtin-tool/injected-tool) to the new harness.
  3. Keep escape hatches:
    • For tests that need full raw HTTP response fixtures, use testutil.HTTPReflectorServer.

Phase 4 — Migrate metrics + trace integration tests

  1. Replace newTestSrv with testutil.NewBridgeServer (or reuse the same BridgeServer type).
  2. Keep span/metrics assertions local, but share:
    • server startup
    • recorder spy
    • request builders

Phase 5 — Cleanup + enforce the new pattern

  1. Delete old helpers once all call sites are migrated:
    • setupInjectedToolTest
    • setupMCPServerProxiesForTest
    • callAccumulator
    • provider-specific request builders
  2. Add a short “how to write a new test” section (either in testutil/doc.go or README.md), showing the new preferred pattern.

Acceptance criteria (what “done” looks like)

  • Tests no longer manually do txtar.Parse(...) + filesMap(...).
  • No helper returns a 4-tuple; complex setup returns a single struct.
  • Injected-tools tests don’t mention callAccumulator or raw proxies maps.
  • Adding a new provider fixture-driven test requires only:
    • pick fixture
    • choose streaming bool
    • configure bridge
    • assertions
  • bridge_integration_test.go is split and substantially smaller.
⚠️ Risks / mitigations
  • Exported helper API churn: keep the exported surface area small; hide most fields; use options for extension.
  • Cross-repo reuse (coder): avoid embedding local fixture paths in testutil; accept []byte fixtures or fs.FS.
  • Accidental prod import: clearly document “test-only” and keep package name testutil (discourages production use).

Generated with mux • Model: openai:gpt-5.2 • Thinking: xhigh

Add exported github.com/coder/aibridge/testutil helpers (txtar fixtures, upstream/mcp/bridge harnesses) and migrate integration tests to use them.

Fixes #73

Change-Id: I7cdf803a08643a6c6683044bb576fc3da4622d9a
Signed-off-by: Thomas Kosiewski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create proper testing package

1 participant