What is Sombra? Your AI Agent's Research Library
Save web pages, organise them into collections, distil the important parts, and give your AI agent access to all of it through MCP.
You research things before you code them. You read API docs, migration guides, blog posts, architecture writeups. You open twenty tabs, compare approaches, form opinions. Then you open Claude Code or Cursor, and the agent knows none of it.
So you paste docs into prompts. You re-explain what you read yesterday. The agent hallucinates an API endpoint because it never saw the page you saw. You do it all again tomorrow. Worse, your teammates are each working from a completely different set of documents.
Sombra stops the cycle. Save web pages, organise them into collections, distil the important parts, and give your AI agent access to all of it through MCP. One connection. Every tool you use reads the same library.
Save
The Chrome extension saves any web page in one click. Sombra strips the ads, navigation, and layout chrome, and stores the content as clean markdown. If the page goes offline next month, your copy doesn't.
Saving also works mid-conversation. Tell your assistant "save that http-kit migration guide to my Pedestal collection" and it does, through MCP, without you switching windows.
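Behind the scenes, an MCP client turns a request like that into a JSON-RPC 2.0 `tools/call` message sent to the server. Below is a minimal Python sketch of the message shape. The `save_page` tool name, its `url` and `collection` arguments, and the example URL are hypothetical placeholders, because Sombra's actual tool schema isn't described here.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names: the real Sombra tool schema may differ.
payload = build_tool_call(
    1,
    "save_page",
    {"url": "https://example.com/http-kit-migration", "collection": "Pedestal"},
)
print(payload)
```

Over Streamable HTTP, the client POSTs a message like this to the server's MCP endpoint. The client discovers which tools exist through a separate `tools/list` call.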
Pages behind login walls (internal wikis, paywalled articles, authenticated dashboards) work too. The extension captures what your browser renders, client-side, along with a screenshot for visual reference. We do not capture JavaScript state, session variables, or any other private data: we explicitly filter forms, inputs, and other non-content elements out of the HTML before it is converted to Markdown. They're irrelevant to us, and we respect your security and privacy.
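You can picture the filtering step as a pass over the HTML that drops whole subtrees before conversion. Below is a minimal sketch using Python's standard-library `html.parser`. The real extension runs in the browser, so this only illustrates the logic. The `DROP_ELEMENTS` list is an assumption for illustration rather than Sombra's actual filter, and the sketch assumes reasonably well-formed HTML.

```python
from html import escape
from html.parser import HTMLParser

# Assumed list of non-content elements, for illustration only.
DROP_ELEMENTS = {"form", "input", "textarea", "select", "button", "script", "style"}
# HTML void elements never have an end tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ContentFilter(HTMLParser):
    """Re-emit HTML, dropping DROP_ELEMENTS together with everything nested inside them."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in DROP_ELEMENTS:
            if tag not in VOID_ELEMENTS:
                self.skip_depth += 1
            return
        attr_text = "".join(f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # void elements have nothing to close
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def strip_non_content(html: str) -> str:
    """Return the HTML with forms, inputs and similar elements removed."""
    parser = ContentFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = ('<article><h1>Guide</h1><p>Use <code>run-server</code>.</p>'
        '<form action="/login"><input name="password" value="hunter2">'
        '<button>Sign in</button></form></article>')
print(strip_non_content(page))  # <article><h1>Guide</h1><p>Use <code>run-server</code>.</p></article>
```

Dropping the whole subtree, not just the opening tag, keeps a pre-filled value like the password above out of the saved Markdown.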
To add a public URL quickly, just ask your assistant or use the app. The Chrome extension is there for the trickier cases: JS-heavy pages and content that scrapers cannot see.
Organise
Collections are named groups of saved content. "Tailwind migration docs." "Competitor pricing Q1." "EU e-invoicing requirements." One collection per project or task, not one giant archive.
Your AI agent creates, searches, and rearranges collections through MCP. You can do it in the web app too, but you probably won't bother once you've tried asking Claude to do it for you.
Distil
A collection full of saved pages is better than twenty open tabs, but it's still a pile of raw material. Your agent doesn't need forty pages of framework docs. It needs the five hundred tokens that capture the breaking changes, the key API differences, and the three configuration flags that will actually bite you.
Collection context is a distilled summary you write (or ask your AI to write) for any collection. It preserves code examples and CLI commands verbatim — because those are the tokens a coding agent actually uses — while cutting everything else down to signal. One distillation replaces a folder of raw docs.
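You can enforce the "keep code verbatim" rule mechanically: split a page into fenced code blocks and prose, and pass only the prose to the summariser. Below is a minimal Python sketch of that idea. It assumes pages are stored as markdown with fenced code blocks. `distil` and `summarise` are hypothetical names, not Sombra's implementation; `summarise` stands in for whatever model call does the compression.

```python
import re
from typing import Callable

# Built from a variable so no line of this example starts with a literal fence.
TICKS = "`" * 3
# A fenced block: an opening line of three backticks (optionally followed by a
# language tag), then everything up to the next line made only of three backticks.
FENCE = re.compile(rf"^{TICKS}.*?^{TICKS}[ \t]*$", re.MULTILINE | re.DOTALL)


def split_segments(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into ("prose", text) and ("code", text) segments, in order."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in FENCE.finditer(markdown):
        if match.start() > pos:
            segments.append(("prose", markdown[pos:match.start()]))
        segments.append(("code", match.group(0)))
        pos = match.end()
    if pos < len(markdown):
        segments.append(("prose", markdown[pos:]))
    return segments


def distil(markdown: str, summarise: Callable[[str], str]) -> str:
    """Compress prose with `summarise`; pass fenced code through unchanged."""
    return "".join(
        text if kind == "code" else summarise(text)
        for kind, text in split_segments(markdown)
    )


doc = f"Intro prose.\n{TICKS}shell\nlein run\n{TICKS}\nMore prose.\n"
# str.upper stands in for a real summariser so the effect is easy to see.
print(distil(doc, str.upper))
```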
This is where Sombra differs from bookmarking tools. Raindrop saves your links; Sombra turns them into dense, structured context that makes AI agents produce better code. We go into the why in much more detail in a later blog post, but the short version is worth knowing: when you curate, you use fewer tokens and get more accurate results.
Connect
claude mcp add --transport http sombra https://sombra.so/mcp
That's the whole setup in Claude Code. Claude.ai, Claude Desktop, Cursor, ChatGPT: anything that speaks MCP can read your library. Your research stops being trapped in whichever chat window you happened to use when you found it. Our MCP page has full setup instructions for each client.
Content can be saved to your Google Drive or Dropbox account — once you're connected, we sync all your saved notes, web clippings and contexts to your cloud drive. Your data is your data, and is portable by design.
Switch tools whenever you want. The context lives in Sombra or your cloud drive, not in a conversation thread that expires when you close the tab.
Let the agent do the research
You don't have to build collections by hand. Sombra is fully accessible through MCP, so your AI can do the whole workflow: find relevant pages, save them, organise them into a collection, write the distilled context. You describe what you need, the agent builds the collection, and you come back to a structured knowledge base with sources and a synthesis.
This works well for topics you know little about. Ask Claude to research a migration path, a new API, or a competitor's product. It saves as it goes, organises what it finds, and produces a distillation you can review and refine. The agent does the gathering; you do the judgement.
What people use it for
Framework migrations. Save the docs for both stacks. Distil the breaking changes. Your coding agent writes the migration against accurate, current documentation instead of its training data from eighteen months ago.
API integrations. Save the reference, the auth guide, a few working examples. Distil the endpoints, rate limits, and auth flow. The agent writes the integration without you pasting the same curl examples into every conversation.
Competitive research. Save competitor sites, pricing pages, changelogs, press. Distil positioning and recent moves. The brief updates as you add material — it doesn't go stale in a Google Doc nobody maintains.
Onboarding onto a codebase. Save the architecture docs, the contribution guide, the design decisions that aren't written down anywhere except a Slack thread from 2023. Distil the conventions and gotchas. New developers (or you, joining an unfamiliar repo) get the map without the six-hour archaeological dig.
Vulnerability response. A CVE drops. Advisories scattered across three vendor sites, two researcher blogs, and a mailing list. Save them all, distil the affected versions and mitigations into one coherent briefing. Share it as a public collection if your team needs it.
Sales prep. Have the agent build a collection on a prospect before a call — website, recent press, funding history, job listings. It distils the relevant details into a briefing. You walk in knowing their priorities without having spent an hour on LinkedIn.
Version history
Every change to every artifact is versioned. Edit a note, update a distillation, re-scrape a saved page — the previous state stays in the database. View the full timeline, compare any two versions, roll back if an AI agent rewrote something you preferred the old way.
This uses Datomic's immutable data model. Every transaction is recorded permanently. Querying the state of any entity at any point in time is a database operation, not a reconstructed event log. History isn't a feature we added; it's a property of how the data is stored.
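To make the idea concrete, here is a toy Python model of an immutable, time-indexed store. It is not Datomic or its API; in Datomic itself you obtain a past database value with `as-of`. The `History` class, entity ids, and attribute names are hypothetical. Each write appends a value tagged with its transaction id, and reading a past state is an index lookup rather than a replay of events.

```python
from bisect import bisect_right
from typing import Optional


class History:
    """Toy immutable store: every value ever written is kept, tagged with its transaction id."""

    def __init__(self) -> None:
        self.tx = 0
        # (entity, attribute) -> (tx ids in ascending order, values in the same order)
        self.index: dict[tuple[str, str], tuple[list[int], list[str]]] = {}

    def transact(self, entity: str, attribute: str, value: str) -> int:
        """Record a new value as a new transaction; earlier values are never overwritten."""
        self.tx += 1
        txs, values = self.index.setdefault((entity, attribute), ([], []))
        txs.append(self.tx)
        values.append(value)
        return self.tx

    def as_of(self, entity: str, attribute: str, tx: int) -> Optional[str]:
        """Value as it stood at transaction `tx`: a binary search, not an event replay."""
        txs, values = self.index.get((entity, attribute), ([], []))
        i = bisect_right(txs, tx)  # number of writes at or before tx
        return values[i - 1] if i else None


db = History()
t1 = db.transact("collection/pedestal", "context", "Carefully scoped distillation")
t2 = db.transact("collection/pedestal", "context", "Generic rewrite by an agent")
print(db.as_of("collection/pedestal", "context", t1))  # Carefully scoped distillation
# Rolling back is just another write of the old value; nothing is deleted.
t3 = db.transact("collection/pedestal", "context",
                 db.as_of("collection/pedestal", "context", t1))
```

Datomic applies the same principle across every entity and attribute, and the database itself maintains the indexes.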
It matters because AI agents are opinionated editors. They will rewrite your carefully scoped distillation into something more generic if you let them. With history, you can let them try, check what they changed, and undo it in two clicks. Undo is also available as a native MCP tool.
Sync to Dropbox and Google Drive
Your library, once you connect, syncs automatically to Dropbox and Google Drive as .md files. Notes, distilled context, clippings — all plain markdown in a dedicated folder.
Connect both services and it syncs to both. No export wizard, no proprietary format. If Sombra disappeared tomorrow (which it will not — we're carefully and sustainably built for the long haul), your research will always be sitting in your cloud storage as text files you can open in any editor.
Public sharing
Share any collection via a public URL. The shared page shows your distilled context, your notes, and your sources cited with title, link, and annotation. The full extracted page content stays private — what you're sharing is your synthesis, not someone else's article.
Share as a live link (updates as you work) or a snapshot (frozen at a specific point).
Some examples:
- Somatic Mosaicism & Clonal Evolution — built live during a research session that started with a 10-year-old's question about seeds
- About Sombra — Sombra's own product context, collated from primary sources
- Acronis CVE Breakdown — a cybersecurity briefing with cited sources
How it differs from everything else
Notion, Obsidian, Logseq: these are great tools, but fundamentally built for humans to browse. You search, follow links, traverse a graph. They assume a human is doing the retrieval, and MCP access or web capture, where it exists at all, is an add-on rather than the core workflow. Counter-intuitively, that rich graph is an antipattern in this case, as we discuss when we look at context engineering.
Sombra does the full loop: capture web content, organise by project, distil to dense context, serve to any MCP client. Hosted, persistent, multi-client. Connect once and every agent you use benefits.
Stack
Clojure and ClojureScript. Datomic. Pedestal with http-kit. MCP over Streamable HTTP with OAuth 2.1. Chrome extension for client-side capture. The right tools.
Built in Southern Portugal, by a small bootstrapped international team.
We'd love to hear from you.