Observability SDK
v0.1 · open source
FlightLog records what your browser and tool-calling agents did, scores it, and gates pull requests on regression suites — surfaced directly as GitHub Check Runs.
Log goals, plans, tool calls, observations, and decisions. Replay timelines with screenshots and DOM snapshots inline.
Score every run on goal completion, constraint violations, redundant steps, and human-approval requests.
Map suites to GitHub repos. PR webhooks execute the cases and post an aggregate Check Run plus per-case Check Runs, each linked to its trace.
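As a rough illustration of the four scoring dimensions (goal completion, constraint violations, redundant steps, human-approval requests), here is a minimal sketch of how a recorded run might be scored. The `AgentEvent`, `RunScore`, `Constraint`, and `scoreRun` names are hypothetical and are not the FlightLog schema or API. It also assumes that a "redundant step" means a tool call repeated with identical arguments.

```typescript
// Hypothetical event and score shapes, for illustration only (not the FlightLog schema).
type AgentEvent =
  | { kind: "plan"; text: string }
  | { kind: "tool"; name: string; args: Record<string, unknown> }
  | { kind: "observe" }
  | { kind: "decide"; text: string }
  | { kind: "approval_request" }
  | { kind: "finish"; status: "success" | "failure" };

interface RunScore {
  goalCompleted: boolean;
  constraintViolations: number;
  redundantSteps: number;
  approvalRequests: number;
}

// A constraint is modeled as a predicate that returns true when a tool call violates it.
type Constraint = (name: string, args: Record<string, unknown>) => boolean;

function scoreRun(events: AgentEvent[], constraints: Constraint[]): RunScore {
  const seenCalls = new Set<string>();
  const score: RunScore = {
    goalCompleted: false,
    constraintViolations: 0,
    redundantSteps: 0,
    approvalRequests: 0,
  };

  for (const ev of events) {
    switch (ev.kind) {
      case "tool": {
        // Assumption: a tool call repeated with identical arguments is redundant.
        const key = `${ev.name}:${JSON.stringify(ev.args)}`;
        if (seenCalls.has(key)) score.redundantSteps += 1;
        seenCalls.add(key);
        // Count every constraint this call violates.
        for (const violates of constraints) {
          if (violates(ev.name, ev.args)) score.constraintViolations += 1;
        }
        break;
      }
      case "approval_request":
        score.approvalRequests += 1;
        break;
      case "finish":
        score.goalCompleted = ev.status === "success";
        break;
    }
  }
  return score;
}

// Example: a "look but do not buy" constraint expressed as a predicate.
const noCheckout: Constraint = (name, args) =>
  name === "browser.goto" && String(args.url ?? "").includes("/checkout");

const exampleScore = scoreRun(
  [
    { kind: "tool", name: "browser.goto", args: { url: "https://acme.dev" } },
    { kind: "tool", name: "browser.goto", args: { url: "https://acme.dev" } },
    { kind: "tool", name: "browser.goto", args: { url: "https://acme.dev/checkout" } },
    { kind: "approval_request" },
    { kind: "finish", status: "success" },
  ],
  [noCheckout],
);
console.log(exampleScore);
```

Keeping the four dimensions as separate counters, instead of collapsing them into one number right away, leaves room for a suite to weight them differently per case.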
01 — Instrument
Drop the SDK into any agent runtime — native, AI SDK, LangChain, or custom. Emit plan, tool, observe, and decide events.
Traces render as a replayable timeline with screenshots and DOM snapshots aligned to each step.
import { FlightLog } from '@flightlog/sdk';
const log = new FlightLog({ apiKey: process.env.FLIGHTLOG_KEY });
const run = log.run({ goal: 'Find product pricing without buying anything' });
run.plan('Open vendor site, locate pricing page');
run.tool('browser.goto', { url: 'https://acme.dev' });
run.observe({ dom: snapshot }); // snapshot: DOM captured by your browser driver
run.decide('Pricing visible in nav, follow link');
run.finish({ status: 'success' });
02 — Gate
Define goal-based cases with score thresholds. PR webhooks execute the suite and report a GitHub Check Run with per-case results. Failed checks deep-link to the exact trace and the step that broke.
name: agent-regression
on: pull_request
jobs:
  flightlog:
    runs-on: ubuntu-latest  # assumed runner
    steps:
      - uses: flightlog/regression@v1
        with:
          suite: checkout-flow
          threshold: 0.85
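The thresholding step described above can be sketched as follows. This is a minimal illustration and not FlightLog's implementation: `CaseResult`, `CheckSummary`, and `gateSuite` are hypothetical names, the trace URLs are placeholders, and applying one suite-wide threshold to every case is an assumption. The only borrowed detail is that `success` and `failure` are valid conclusions for a GitHub Check Run.

```typescript
// Hypothetical result shapes, for illustration only (not the FlightLog API).
interface CaseResult {
  caseId: string;
  score: number; // assumed to be normalized to the range 0..1
  traceUrl: string; // deep link to the recorded trace
}

interface CheckSummary {
  conclusion: "success" | "failure";
  title: string;
  failed: CaseResult[];
}

function gateSuite(suite: string, results: CaseResult[], threshold: number): CheckSummary {
  // A case fails the gate when its score falls below the threshold.
  const failed = results.filter((r) => r.score < threshold);
  const passed = results.length - failed.length;
  return {
    // An empty suite is treated as a failure so a misconfigured gate cannot pass silently.
    conclusion: results.length > 0 && failed.length === 0 ? "success" : "failure",
    title: `${suite}: ${passed}/${results.length} cases at or above ${threshold}`,
    failed,
  };
}

const summary = gateSuite(
  "checkout-flow",
  [
    { caseId: "find-pricing", score: 0.92, traceUrl: "https://example.invalid/traces/1" },
    { caseId: "apply-coupon", score: 0.71, traceUrl: "https://example.invalid/traces/2" },
  ],
  0.85,
);
console.log(summary.conclusion, summary.title);
for (const f of summary.failed) console.log(`${f.caseId} failed, trace: ${f.traceUrl}`);
```

Returning the failed cases alongside the conclusion is what would let a per-case check link straight to the trace that broke.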
Scoring: completion, constraints, repetition, approvals
Runtimes: native, AI SDK, LangChain, custom
GitHub: Check Runs with deep links to traces
Deployment: self-host or use hosted