testutil: export harness + refactor integration tests #98

ThomasK33 · 2025-12-23T19:05:40Z

Refactors AIBridge integration tests to use a shared exported testutil harness and typed .txtar fixtures.

Adds testutil package: TXTARFixture, LLMFixture, UpstreamServer, MCPServer, BridgeServer, RecorderSpy, Inspector
Migrates bridge/metrics/trace integration tests to the new harness, removing legacy helpers and tuple-return setup
Splits injected-tools integration tests into their own file

Fixes #73.

📋 Implementation Plan

Refactor AIBridge tests: exported `testutil` harness + fixture structs

Goals

Remove copy/paste across integration tests by centralizing the common flow:
1. load fixture → 2) start mock upstream (+ optional MCP) → 3) start AIBridge → 4) perform request → 5) assert via a single “inspector”.
Replace “helper functions with 4+ return values” with structs (harness objects) so new features don’t keep accreting more tuple returns.
Keep .txtar fixtures but make them typed (fixture structs with methods), so tests don’t manually txtar.Parse + filesMap + magic filenames.
Make helpers reusable outside this repo by shipping an exported github.com/coder/aibridge/testutil package.

Non-goals

No changes to production request/streaming behavior.
No fixture format migration (stay on .txtar).
No attempt to fully abstract provider SDK parsing (Anthropic/OpenAI streaming accumulation can stay in tests initially).

Proposed `testutil` package (exported)

Key design principles

Prefer composition over mega-helper functions: return one Harness/Server struct that exposes small methods.
Option pattern over extra return values: adding capabilities should be additive without breaking call sites.
Defensive-by-default: constructors validate fixtures early and fail fast with clear messages.

Package layout (suggested)

testutil/fixture_txtar.go: parse .txtar, expose a safe API.
testutil/fixture_llm.go: provider/upstream fixture (request + streaming/non-streaming responses + optional tool-call responses).
testutil/upstream_server.go: mock upstream server driven by LLMFixture.
testutil/http_reflector.go: “raw HTTP response fixture” server (current mockHTTPReflector pattern).
testutil/mcp_server.go: mock MCP server + call tracking + proxier creation.
testutil/recorder_spy.go: aibridge.Recorder implementation + query helpers.
testutil/bridge_server.go: start an AIBridge httptest.Server with actor context and configurable deps.
testutil/inspector.go: one place to access assertion data (recorder + MCP calls + upstream call count).
testutil/doc.go: package docs + “test-only / not API-stable” disclaimer + examples.

Core types & responsibilities

1) Fixtures

TXTARFixture
- Owns .Comment and a map[string][]byte.
- Provides File(name) / MustFile(name) / Has(name).
LLMFixture
- Created from a TXTARFixture.
- Encodes the known sections (request, streaming, non-streaming, optional tool-call variants).
- Provides:
  - RequestBody(streaming bool) ([]byte, error) (e.g. sets stream=true safely)
  - Response(call int, streaming bool) ([]byte, error) (supports 2-turn injected tool flows)

Why: tests should stop knowing about fixtureStreamingToolResponse strings and “exactly 5 files”. The fixture struct should own that.

2) Mock upstream server

type UpstreamServer struct { URL string; Calls atomic; ... }
Constructed with LLMFixture and options:
- WithResponseMutator(func(call int, resp []byte) []byte) (keeps the escape hatch used today)
- WithStatusCode(int)
Behavior:
- Reads request body once, determines streaming vs non-streaming (current logic).
- Uses LLMFixture.Response(call, streaming) to select the correct response.
- Provides RequireCallCount(t, n) / EventuallyCallCount(t, n, timeout).

3) Mock MCP server (fix PR72/63-style accretion)

Replace:

callAccumulator
createMockMCPSrv
setupMCPServerProxiesForTest (map + accumulator return)

With:

type MCPServer struct { Server *httptest.Server; calls map[...]... }
Methods:
- CallsByTool(name string) []any
- Proxiers(t, tracer, logger) map[string]mcp.ServerProxier (or Proxier() + Name())
- Close() via t.Cleanup in constructor

4) Recorder + Inspector

RecorderSpy (exported) implements aibridge.Recorder
- Thread-safe slices for interceptions/token/prompt/tool usage.
- Convenience queries:
  - TokenUsages() []*aibridge.TokenUsageRecord
  - ToolUsages() []*aibridge.ToolUsageRecord
  - RequireAllInterceptionsEnded(t)
Inspector
- Holds pointers to:
  - RecorderSpy
  - MCPServer (optional)
  - UpstreamServer (optional)
- Provides higher-level helpers:
  - RequireToolCalled(t, tool string, wantArgs any) (checks both recorded tool usage and MCP receipt)
  - UpstreamCalls() int

5) AIBridge test server / harness

BridgeServer (or Harness)
- Starts aibridge.NewRequestBridge(...) and wraps it in httptest.Server.
- Ensures actor context is set (current BaseContext pattern).
- Holds the *http.Client used by tests.
- Provides:
  - Do(req *http.Request) (*http.Response, error)
  - NewProviderRequest(provider string, body []byte) (*http.Request, error) (replaces createOpenAIChatCompletionsReq / createAnthropicMessagesReq).

📎 API sketch (example usage)

fixture := testutil.MustParseTXTAR(t, antSingleInjectedTool)
llm := testutil.MustLLMFixture(fixture)

upstream := testutil.NewUpstreamServer(t, t.Context(), llm,
  testutil.WithInjectedToolSecondResponse(llm),
)

mcpSrv := testutil.NewMCPServer(t, testutil.CoderToolset())
rec := testutil.NewRecorderSpy()

bridge := testutil.NewBridgeServer(t, testutil.BridgeConfig{
  Ctx:      t.Context(),
  ActorID:  userID,
  Providers: []aibridge.Provider{aibridge.NewAnthropicProvider(anthropicCfg(upstream.URL(), apiKey), nil)},
  Recorder:  rec,
  MCPProxiers: mcpSrv.Proxiers(t, testTracer),
  Tracer:   testTracer,
})

reqBody := testutil.Must(llm.RequestBody(streaming))
req := bridge.MustNewProviderRequest(aibridge.ProviderAnthropic, reqBody)
resp := bridge.MustDo(req)

defer resp.Body.Close()

ins := testutil.NewInspector(rec, mcpSrv, upstream)
ins.RequireToolCalled(t, mockToolName, map[string]any{"owner": "admin"})
upstream.RequireCallCountEventually(t, 2)

Migration plan (mixed: refactor big file early, but keep incremental safety)

Phase 0 — Baseline + safety rails

Add testutil package skeleton with docs (doc.go) and a clear “test-only / API may change” disclaimer.
Add a small compile-time check somewhere in testutil to ensure it doesn’t import aibridge_test or rely on _test.go symbols.

Phase 1 — Move the lowest-risk helpers first

Extract mockRecorderClient into testutil.RecorderSpy (keep method names close to today to simplify migration).
Extract .txtar parsing helpers:
- filesMap → testutil.MustParseTXTAR returning TXTARFixture.
- setJSON wrapper (if currently local) → testutil.SetJSON / WithJSONField.
Extract request builders:
- createAnthropicMessagesReq / createOpenAIChatCompletionsReq → BridgeServer.NewProviderRequest(...).

Phase 2 — Fix the tech-debt hotspots (PR72/PR63 patterns)

Implement testutil.MCPServer and move callAccumulator inside it.
Replace setupMCPServerProxiesForTest usages with mcpSrv.Proxiers(...).
Update injected-tool tests to stop passing around map[string]mcp.ServerProxier and *callAccumulator.

Phase 3 — Rewrite `bridge_integration_test.go` early (reduce the 1900+ line “god file”)

Split the file by concern (examples):
- bridge_integration_anthropic_test.go
- bridge_integration_openai_test.go
- bridge_integration_injected_tools_test.go
- bridge_integration_errors_test.go
- bridge_integration_passthrough_test.go
Introduce testutil.UpstreamServer and migrate the “standard flow” tests (simple/builtin-tool/injected-tool) to the new harness.
Keep escape hatches:
- For tests that need full raw HTTP response fixtures, use testutil.HTTPReflectorServer.

Phase 4 — Migrate metrics + trace integration tests

Replace newTestSrv with testutil.NewBridgeServer (or reuse the same BridgeServer type).
Keep span/metrics assertions local, but share:
- server startup
- recorder spy
- request builders

Phase 5 — Cleanup + enforce the new pattern

Delete old helpers once all call sites are migrated:
- setupInjectedToolTest
- setupMCPServerProxiesForTest
- callAccumulator
- provider-specific request builders
Add a short “how to write a new test” section (either in testutil/doc.go or README.md), showing the new preferred pattern.

Acceptance criteria (what “done” looks like)

Tests no longer manually do txtar.Parse(...) + filesMap(...).
No helper returns a 4-tuple; complex setup returns a single struct.
Injected-tools tests don’t mention callAccumulator or raw proxies maps.
Adding a new provider fixture-driven test requires only:
- pick fixture
- choose streaming bool
- configure bridge
- assertions
bridge_integration_test.go is split and substantially smaller.

⚠️ Risks / mitigations

Exported helper API churn: keep the exported surface area small; hide most fields; use options for extension.
Cross-repo reuse (coder): avoid embedding local fixture paths in testutil; accept []byte fixtures or fs.FS.
Accidental prod import: clearly document “test-only” and keep package name testutil (discourages production use).

Generated with mux • Model: openai:gpt-5.2 • Thinking: xhigh

Add exported github.com/coder/aibridge/testutil helpers (txtar fixtures, upstream/mcp/bridge harnesses) and migrate integration tests to use them. Fixes #73 Change-Id: I7cdf803a08643a6c6683044bb576fc3da4622d9a Signed-off-by: Thomas Kosiewski <[email protected]>

ThomasK33 requested a review from pawbana December 23, 2025 20:18

dannykopping mentioned this pull request Dec 24, 2025

feat: add aibtest package for testing utilities #74

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

testutil: export harness + refactor integration tests #98

testutil: export harness + refactor integration tests #98

Uh oh!

ThomasK33 commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

testutil: export harness + refactor integration tests #98

Are you sure you want to change the base?

testutil: export harness + refactor integration tests #98

Uh oh!

Conversation

ThomasK33 commented Dec 23, 2025

Refactor AIBridge tests: exported testutil harness + fixture structs

Goals

Non-goals

Proposed testutil package (exported)

Key design principles

Package layout (suggested)

Core types & responsibilities

1) Fixtures

2) Mock upstream server

3) Mock MCP server (fix PR72/63-style accretion)

4) Recorder + Inspector

5) AIBridge test server / harness

Migration plan (mixed: refactor big file early, but keep incremental safety)

Phase 0 — Baseline + safety rails

Phase 1 — Move the lowest-risk helpers first

Phase 2 — Fix the tech-debt hotspots (PR72/PR63 patterns)

Phase 3 — Rewrite bridge_integration_test.go early (reduce the 1900+ line “god file”)

Phase 4 — Migrate metrics + trace integration tests

Phase 5 — Cleanup + enforce the new pattern

Acceptance criteria (what “done” looks like)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Refactor AIBridge tests: exported `testutil` harness + fixture structs

Proposed `testutil` package (exported)

Phase 3 — Rewrite `bridge_integration_test.go` early (reduce the 1900+ line “god file”)