Trust Infrastructure for AI Workflows

1 month ago

What it is

Sombra is trust infrastructure for AI workflows: a substrate that tracks not just what your knowledge base contains, but what each claim is grounded in — and lets that grounding be interrogated, queried, and audited as your work evolves.

In practice, this means three things:

  • Every claim in a Sombra artifact or distilled context can carry a citation pointing to the source that supports it — internal (another artifact in your workspace) or external (a paper, a doc page, a code blob).
  • Citations are first-class temporal references: they carry the source location, the cited excerpt, and a timestamped record of when the grounding was established. They are not metadata in the export sense. They are queryable, auditable, and supersedable.
  • The substrate continuously checks whether citations have drifted — whether the source still says what the citation claims it says. Drift is surfaced, not hidden.

The user-facing version: you can trust an AI-written document because the trust is structural, not promised.

What problem it solves

Working with AI assistants on knowledge work has produced a new failure mode that knowledge workers across roles have started to recognise. A few representative voices, gathered from a single recent practitioner discussion:

  • "WIP feature is already surfacing tons of stuff that is either fully hallucinated, or has paper thin provenance."
  • "Honestly 30% of everything from Claude is hallucinations for me."
  • "Too much context and they drift quite a bit. I still find that when I do analysis runs I can easily get substantially different answers on multiple runs against the same document."
  • "Human review always."

These are not edge cases. The problem is structural: AI-assisted writing produces outputs that mix well-sourced claims with confidently-stated hallucinations, and the consumer of the output has no reliable way to tell which is which without re-doing all the underlying research themselves.

The workarounds practitioners are using today — running a verification pass, using another LLM as a judge, cross-model QC checks, manual SME review, instructing the agent not to deviate from approved source material — are all workflow workarounds for missing substrate. They are doing by hand what the substrate ought to do continuously. The cost is real: every verification pass starts from scratch; the disagreement between "Tuesday's analysis" and "Wednesday's analysis" cannot be resolved without going back to the sources; the SME's verification effort is not captured anywhere usable next time.

Sombra makes this work substrate-resident rather than workflow-resident. The grounding state of a claim is a property of the corpus, not a property of the most recent verification pass. Once you've grounded a claim, it stays grounded — and stays auditable — until the source changes.

What it does, concretely

Citations are typed, queryable, and time-aware

Every citation in Sombra has:

  • A claim end — a range in a citing artifact or context document, with its excerpt captured at the time the citation was committed.
  • A source end — a range in a source artifact (or a saved external page), with its excerpt captured at the same time.
  • A timestamp — when the grounding was established.
  • A drift state: coincident (the source still says what we cited), drifted (the source has changed but the claim is still recoverable nearby), or stranded (the cited text is no longer present in the source).

Citations can be added by you, by an AI agent via MCP, or retroactively against an existing corpus. They survive document edits — when the citing or cited content moves, the substrate re-anchors automatically and surfaces drift if it occurs.
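The four parts of a citation listed above can be pictured as a small record type. This is a minimal sketch, not Sombra's actual schema: the names `Anchor`, `Citation`, `DriftState`, `make_anchor`, and `commit_citation` are hypothetical, and character offsets are assumed as the way a "range" is expressed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class DriftState(Enum):
    COINCIDENT = "coincident"  # the source still says what was cited
    DRIFTED = "drifted"        # the source changed, text recoverable nearby
    STRANDED = "stranded"      # the cited text is no longer in the source


@dataclass(frozen=True)
class Anchor:
    """One end of a citation: a range plus the excerpt captured at commit time."""
    artifact_id: str
    start: int
    end: int
    excerpt: str


@dataclass(frozen=True)
class Citation:
    claim: Anchor              # range in the citing artifact
    source: Anchor             # range in the source artifact or saved page
    established_at: datetime   # when the grounding was established
    drift_state: DriftState = DriftState.COINCIDENT


def make_anchor(artifact_id: str, text: str, start: int, end: int) -> Anchor:
    # Capture the excerpt now, so later edits to `text` cannot change it.
    return Anchor(artifact_id, start, end, text[start:end])


def commit_citation(claim: Anchor, source: Anchor) -> Citation:
    # Both excerpts were captured already; stamp the commit time in UTC.
    return Citation(claim, source, datetime.now(timezone.utc))


# Usage with made-up documents (hypothetical ids and text).
draft = "Our cache cuts p99 latency by 40%."
report = "Benchmarks show the cache cuts p99 latency by 40% under load."
phrase = "cuts p99 latency by 40%"

c_start = draft.index(phrase)
s_start = report.index(phrase)
citation = commit_citation(
    make_anchor("draft-1", draft, c_start, c_start + len(phrase)),
    make_anchor("report-7", report, s_start, s_start + len(phrase)),
)
```

Storing the excerpt on both ends, rather than only the range, is what lets a later check compare what was cited against what the source says now.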

Three states of epistemic health

Every claim in a Sombra artifact is in one of three states:

  • Externally grounded — cited to an external source (paper, doc, page) that still says what we cited.
  • Internally grounded — cited to another artifact in your workspace that supports it. (Weaker than external grounding, because internal sources can themselves be ungrounded — the chain can be followed.)
  • Floating — no citation. Visibly ungrounded, not silently assumed true.

A fourth condition worth naming is paper thin: a citation exists, but the source doesn't quite bear the weight of the claim. It cuts across the two grounded states rather than adding a separate one. Sombra surfaces it via the confidence score returned during citation creation: if the cited excerpt only loosely matches the claim, the system flags it rather than committing a sketchy grounding.
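The states above, plus the paper-thin flag, can be sketched in a few lines. How Sombra computes its confidence score is not described here, so this sketch substitutes a crude character-level similarity from Python's standard `difflib` as a stand-in. The threshold of 0.5 and all function names are assumptions for illustration only.

```python
from difflib import SequenceMatcher
from enum import Enum
from typing import Optional


class Grounding(Enum):
    EXTERNAL = "externally grounded"
    INTERNAL = "internally grounded"
    FLOATING = "floating"


def classify(source_kind: Optional[str]) -> Grounding:
    """source_kind is 'external', 'internal', or None when there is no citation."""
    if source_kind is None:
        return Grounding.FLOATING
    if source_kind == "external":
        return Grounding.EXTERNAL
    return Grounding.INTERNAL


def match_confidence(claim: str, excerpt: str) -> float:
    # Stand-in score in [0, 1]; a real system would likely use something
    # semantic rather than character overlap.
    return SequenceMatcher(None, claim.lower(), excerpt.lower()).ratio()


def is_paper_thin(claim: str, excerpt: str, threshold: float = 0.5) -> bool:
    # Hypothetical threshold: below it, flag the citation instead of committing.
    return match_confidence(claim, excerpt) < threshold
```

A claim with no citation is classified as floating regardless of how plausible it sounds; the score only applies once a candidate excerpt exists.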

Drift detection, continuously

When you save a web page, write a context document, or create a citation, the source's excerpt and a content hash are stored alongside the citation. If the source changes — the page updates, the artifact gets edited — Sombra detects the drift and tells you. A citation that no longer matches its source is not a silent rot; it's a reportable state change.
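A minimal sketch of this check, assuming SHA-256 as the content hash and character offsets as the stored range (neither is confirmed by the text above). The mapping onto the three drift states is one plausible reading: an unchanged hash or an intact range is coincident, an excerpt found elsewhere in the source is drifted, and a missing excerpt is stranded. The function names are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def drift_state(stored_hash: str, excerpt: str, start: int, current_text: str) -> str:
    # Fast path: the whole source is byte-for-byte unchanged.
    if content_hash(current_text) == stored_hash:
        return "coincident"
    # The source changed somewhere, but the cited range still holds the excerpt.
    if current_text[start:start + len(excerpt)] == excerpt:
        return "coincident"
    # The excerpt survives, but at a different position.
    if excerpt in current_text:
        return "drifted"
    # The cited text is gone.
    return "stranded"


# Usage with a made-up source page.
original = "Water boils at 100 C at sea level."
excerpt = "boils at 100 C"
start = original.index(excerpt)
stored = content_hash(original)

edited = "Note: water boils at 100 C at sea level."
rewritten = "Water boils at lower temperatures at altitude."
```

Comparing the hash first keeps the common case cheap: only sources whose hash has changed need the excerpt-level comparison.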

Tools an agent can use over MCP

The MCP server exposes the citation substrate to any compatible client (Claude.ai, Claude Code, Cursor, ChatGPT-via-MCP, others as the protocol grows):

  • Cite a claim — given a passage in a citing artifact and a candidate source, find and commit the strongest matching citation.
  • Check citations — return drift state for every citation on a parent artifact or collection.
  • Find citations — reverse lookup: which artifacts cite this source?
  • Repair drift — re-anchor citations whose ranges have shifted due to edits.

This means agents can write grounded documents, not just confident ones. Claude generating a context summary in your workspace can cite every factual claim back into the source artifacts it drew from — and the resulting context document becomes an auditable record rather than a black-box summary.
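Two of the operations listed above, the reverse lookup and drift repair, reduce to simple data operations. The sketch below models them in memory. It is not the MCP server's API, and the function names and signatures are hypothetical. Re-anchoring is assumed to mean finding the stored excerpt at its new position, preferring the occurrence closest to the old one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def build_reverse_index(citations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each source id to the set of artifacts that cite it.

    `citations` yields (citing_artifact_id, source_id) pairs.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for citing_id, source_id in citations:
        index[source_id].add(citing_id)
    return index


def reanchor(excerpt: str, old_start: int, current_text: str) -> Optional[Tuple[int, int]]:
    """Return the new (start, end) of the excerpt, or None if it is stranded."""
    best: Optional[int] = None
    pos = current_text.find(excerpt)
    while pos != -1:
        # Keep the occurrence nearest to where the excerpt used to be.
        if best is None or abs(pos - old_start) < abs(best - old_start):
            best = pos
        pos = current_text.find(excerpt, pos + 1)
    if best is None:
        return None
    return best, best + len(excerpt)


# Usage with made-up ids and text.
index = build_reverse_index([
    ("doc-a", "paper-1"),
    ("doc-b", "paper-1"),
    ("doc-a", "page-2"),
])
text = "intro. alpha beta. outro alpha beta."
```

Choosing the nearest occurrence matters when the excerpt appears more than once: an edit that shifts text by a few characters should not move the citation to an unrelated repetition elsewhere in the document.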

Who it's for

The category of person who benefits most: anyone whose work product is shaped knowledge that other people will act on.

Concretely:

  • Senior engineers and architects writing design docs, post-mortems, or technical strategy that subordinates and successors will treat as canonical
  • Researchers and analysts producing briefs, literature reviews, or competitive intelligence that decisions get made on
  • Solo founders and consultants producing client-facing or investor-facing artifacts where a hallucinated fact has real cost
  • Writers, journalists, and academics who already maintain citation graphs and care about provenance
  • Strategy and ops people in functional roles where institutional memory is the actual deliverable
  • Compliance, legal, and regulated-industry teams for whom unverified claims carry liability
  • Educators and trainers whose work is reorganising knowledge for others to depend on

What these audiences have in common: they cannot afford "confidently wrong" output, they already feel the cost of unverified AI-assisted writing, and they are willing to invest small amounts of effort upfront for large amounts of trustworthiness downstream.

How it integrates with how you already work

Sombra's trust infrastructure sits underneath your agent surface rather than replacing it. You don't have to switch tools, change your editor, or learn a new prompting style. Whatever AI agent you're already using (Claude.ai, Claude Code, Cursor, ChatGPT), the substrate appears underneath via MCP. You write with the agent you're comfortable with; the grounding state shows up in your Sombra workspace; the citations are queryable from anywhere that speaks the protocol.

Two practical workflows it slots into:

Drafting against your sources. You save the relevant pages and papers into a Sombra collection. You write a context document — a distillation of what matters across those sources — and as you draft, citations are added pointing back into the source artifacts. The agent helping you write can read the sources, cite back to them automatically, and surface ungrounded claims you would have missed. The resulting context document is auditable: every factual claim points to a source that bears it.

Auditing existing work. You have a context document that was written before this infrastructure existed. You can run a grounding pass over it: every claim either gets cited to an existing source in your workspace, gets flagged as ungrounded, or gets noted as relying only on internal sources (which means weaker grounding than external citation). The output is a grounding report — not a rewrite, but a map of what's well-anchored and what isn't.
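A grounding pass of this kind can be sketched as a bucketing step plus a chain check for internal sources. Everything here is assumed for illustration: the `ext:` prefix for external source ids, the input shapes, and the function names. The real report format is not specified above.

```python
from typing import Dict, List, Optional, Set, Tuple


def grounding_report(claims: List[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Bucket (claim_text, source_id) pairs; source_id is None for uncited claims."""
    report: Dict[str, List[str]] = {"external": [], "internal_only": [], "ungrounded": []}
    for text, source_id in claims:
        if source_id is None:
            report["ungrounded"].append(text)
        elif source_id.startswith("ext:"):
            report["external"].append(text)
        else:
            report["internal_only"].append(text)
    return report


def reaches_external(artifact_id: str, cites: Dict[str, List[str]],
                     seen: Optional[Set[str]] = None) -> bool:
    """Follow internal citations until an external source is found (or not)."""
    seen = set() if seen is None else seen
    if artifact_id in seen:  # guard against citation cycles
        return False
    seen.add(artifact_id)
    for source_id in cites.get(artifact_id, []):
        if source_id.startswith("ext:") or reaches_external(source_id, cites, seen):
            return True
    return False


# Usage with a made-up workspace.
claims = [
    ("Latency fell by 40%.", "ext:benchmark-paper"),
    ("The team prefers option B.", "notes-1"),
    ("Adoption will double next year.", None),
]
cites = {"notes-1": ["notes-2"], "notes-2": ["ext:survey"], "loop-a": ["loop-b"], "loop-b": ["loop-a"]}
report = grounding_report(claims)
```

The report leaves the text untouched; it only records which bucket each claim falls into, and the chain check shows whether an internally grounded claim eventually rests on an external source.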

What it's not

  • Not a fact-checking service. Sombra tracks provenance, not correctness. A claim can be grounded in a source that is itself wrong; the substrate gives you the receipts so you can decide whether to trust the source, but doesn't decide for you.
  • Not a moderation tool. Trust infrastructure is for the writer and reader of knowledge work, not for content policy at scale.
  • Not a compliance product, despite being relevant to compliance. The substrate is genuinely useful in regulated industries because it produces auditable provenance, but Sombra is built for individual knowledge workers and small teams first; enterprise compliance shops can adopt the substrate, but the product is not architected around their specific procurement and audit workflows.
  • Not LLM-as-judge. We do not use a second model to grade the first. We track grounding as data.

Why this matters now

The cost of confidently-wrong AI-assisted writing has gone up by an order of magnitude since 2024. Hallucinations that used to live in a private chat window now propagate: an AI-written claim becomes a source for the next agent, a context document becomes an input to a strategy doc, an unsourced summary becomes the basis for a decision. The chain compounds, and the original lack of grounding gets harder to detect at every step.

Tooling has not caught up. Most knowledge management tools treat citations as export metadata for bibliographies. Most AI writing tools rely on similarity scores to gesture at sources without committing to them. Most "AI memory" features hide their provenance behind a black-box summary.

Sombra is built around the opposite premise: provenance is the substrate, not the export. When AI does the writing, the substrate keeps the receipts. The user gets to inspect, query, audit, and trust the work — structurally, not on faith.

Where to go from here

  • The MCP server is at https://sombra.so/mcp. Connect once with the command below, and your AI agent has access to the trust infrastructure from any compatible client:

      claude mcp add --transport http --scope user sombra https://sombra.so/mcp
  • The Chrome extension captures sources cleanly so they can be cited cleanly.
  • The web UI shows you grounding state at a glance — citations highlighted, drift surfaced, floating claims visible — so you can see the epistemic health of your corpus the same way you see file structure.

Trust the substrate, not the vibes.