v0.1 · open source

Observability and CI regression
for autonomous agents.

FlightLog records what your browser and tool-calling agents did, scores it, and gates pull requests on regression suites — surfaced directly as GitHub Check Runs.

Observability SDK

Log goals, plans, tool calls, observations, and decisions. Replay timelines with screenshots and DOM snapshots inline.

Evaluations

Score every run on goal completion, constraint violations, redundant steps, and human-approval requests.
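As a rough illustration of how the four signals could be derived from a recorded run, here is a minimal sketch. The `StepEvent`, `RunRecord`, and `RunScore` shapes and the `scoreRun` function are hypothetical and are not FlightLog's actual types or scoring logic. It assumes constraints are expressed as a list of forbidden tool names and that a "redundant step" is a tool call repeated with identical arguments.

```typescript
// Hypothetical event and score shapes, for illustration only.
type StepEvent =
  | { kind: 'plan' | 'observe' | 'decide'; text?: string }
  | { kind: 'tool'; name: string; args: Record<string, unknown> }
  | { kind: 'approval_request' };

interface RunRecord {
  status: 'success' | 'failure';
  events: StepEvent[];
  // Assumption: constraints are modeled as tools the agent must not call.
  forbiddenTools: string[];
}

interface RunScore {
  goalCompleted: boolean;
  constraintViolations: number;
  redundantSteps: number;
  approvalRequests: number;
}

function scoreRun(run: RunRecord): RunScore {
  const seenCalls = new Set<string>();
  let constraintViolations = 0;
  let redundantSteps = 0;
  let approvalRequests = 0;

  for (const ev of run.events) {
    if (ev.kind === 'approval_request') {
      approvalRequests++;
    } else if (ev.kind === 'tool') {
      if (run.forbiddenTools.includes(ev.name)) constraintViolations++;
      // A tool call with the same name and arguments as an earlier one counts as redundant.
      const key = `${ev.name}:${JSON.stringify(ev.args)}`;
      if (seenCalls.has(key)) redundantSteps++;
      seenCalls.add(key);
    }
  }

  return {
    goalCompleted: run.status === 'success',
    constraintViolations,
    redundantSteps,
    approvalRequests,
  };
}
```

Keeping the four signals as separate fields, rather than folding them into one number, leaves the choice of weighting to whoever defines the pass threshold.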

CI Regression

Map suites to GitHub repos. PR webhooks execute cases and post aggregate + per-case Check Runs linked to traces.
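The suite-to-repo mapping can be pictured as a lookup keyed on the repository named in the webhook. The sketch below is an assumption about how such routing might work, not FlightLog's implementation: `SuiteMap`, `SuiteJob`, and `jobsForEvent` are hypothetical names. The payload fields used (`action`, `repository.full_name`, `pull_request.head.sha`) are part of GitHub's `pull_request` webhook event.

```typescript
// Minimal slice of GitHub's pull_request webhook payload used here.
interface PullRequestEvent {
  action: string;
  repository: { full_name: string };
  pull_request: { head: { sha: string } };
}

// Hypothetical mapping from "owner/repo" to the suites that should run for it.
type SuiteMap = Record<string, string[]>;

interface SuiteJob {
  suite: string;
  repo: string;
  headSha: string;
}

// Assumption: only these PR actions should trigger a regression run.
const TRIGGER_ACTIONS = new Set(['opened', 'synchronize', 'reopened']);

function jobsForEvent(event: PullRequestEvent, suites: SuiteMap): SuiteJob[] {
  if (!TRIGGER_ACTIONS.has(event.action)) return [];
  const repo = event.repository.full_name;
  const headSha = event.pull_request.head.sha;
  // Repos without a mapping produce no jobs.
  return (suites[repo] ?? []).map((suite) => ({ suite, repo, headSha }));
}
```

Each job carries the head commit SHA because Check Runs are attached to a commit, so results posted later land on the right revision of the pull request.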

01 — Instrument

One SDK. Every step of the loop.

Drop the SDK into any agent runtime — native, AI SDK, LangChain, or custom. Emit plan, tool, observe, and decide events. Traces render as a replayable timeline with screenshots and DOM snapshots aligned to each step.

import { FlightLog } from '@flightlog/sdk';

const log = new FlightLog({ apiKey: process.env.FLIGHTLOG_KEY });

const run = log.run({ goal: 'Find product pricing without buying anything' });

run.plan('Open vendor site, locate pricing page');
run.tool('browser.goto', { url: 'https://acme.dev' });
// `snapshot` is a DOM snapshot captured by your browser driver
run.observe({ dom: snapshot });
run.decide('Pricing visible in nav, follow link');
run.finish({ status: 'success' });

02 — Gate

Block regressions before they merge.

Define goal-based cases with score thresholds. PR webhooks execute the suite and report a GitHub Check Run with per-case results. Failed checks deep-link to the exact trace and the step that broke.

name: agent-regression
on: pull_request

jobs:
  flightlog:
    runs-on: ubuntu-latest
    steps:
      - uses: flightlog/regression@v1
        with:
          suite: checkout-flow
          threshold: 0.85  # minimum score required to pass
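To make the gating step concrete, the sketch below turns per-case scores into Check Run payloads. It is illustrative only: `CaseResult`, `caseCheck`, `aggregateCheck`, the `flightlog / …` check names, and the `#step-N` deep-link format are assumptions, and a score equal to the threshold is assumed to pass. The payload fields (`name`, `head_sha`, `status`, `conclusion`, `details_url`, `output`) follow GitHub's Check Runs API.

```typescript
// Hypothetical per-case result shape.
interface CaseResult {
  name: string;
  score: number;
  threshold: number;
  traceUrl: string;
  failedStep?: number; // index of the step that broke, if known
}

// Subset of the body accepted by GitHub's "create a check run" endpoint.
interface CheckRunPayload {
  name: string;
  head_sha: string;
  status: 'completed';
  conclusion: 'success' | 'failure';
  details_url: string;
  output: { title: string; summary: string };
}

const passes = (c: CaseResult): boolean => c.score >= c.threshold;

function caseCheck(c: CaseResult, headSha: string): CheckRunPayload {
  const ok = passes(c);
  // Assumption: a URL fragment such as "#step-3" selects a step within the trace view.
  const detailsUrl =
    !ok && c.failedStep !== undefined ? `${c.traceUrl}#step-${c.failedStep}` : c.traceUrl;
  return {
    name: `flightlog / ${c.name}`,
    head_sha: headSha,
    status: 'completed',
    conclusion: ok ? 'success' : 'failure',
    details_url: detailsUrl,
    output: {
      title: ok ? 'Passed' : 'Regression detected',
      summary: `score ${c.score} (threshold ${c.threshold})`,
    },
  };
}

function aggregateCheck(
  suite: string,
  cases: CaseResult[],
  headSha: string,
  suiteUrl: string,
): CheckRunPayload {
  const passed = cases.filter(passes).length;
  return {
    name: `flightlog / ${suite}`,
    head_sha: headSha,
    status: 'completed',
    // The suite fails if any single case falls below its threshold.
    conclusion: passed === cases.length ? 'success' : 'failure',
    details_url: suiteUrl,
    output: {
      title: `${passed} of ${cases.length} cases passed`,
      summary: cases.map((c) => `${passes(c) ? 'PASS' : 'FAIL'} ${c.name}: ${c.score}`).join('\n'),
    },
  };
}
```

Posting one payload per case plus one aggregate lets branch protection require only the aggregate check while reviewers still see which case regressed.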

Eval signals
4

completion, constraints, repetition, approvals

Runtimes
any

native, AI SDK, LangChain, custom

CI surface
GitHub

Check Runs with deep links to traces

License
MIT

self-host or use hosted