Unwrapped

Teardown · exa

EXA

EXA

CategorySearch API for AILast round · $85M · 2025Site ↗
  • Benchmark
  • Lightspeed
  • NVIDIA
UX wrapper

Public web index + custom embedding search + API endpoints.

01

Public data / API layer

Common Crawl
Common CrawlPublic
GitHub public repos
GitHub public reposPublic
arXiv
arXivPublic
Wikipedia
WikipediaPublic
Publisher licensing deals
Publisher licensing dealsLicensed
Proprietary web crawl index
Proprietary web crawl indexScraped

Internal replication score

Easy
0.86

Feasibility of a useful internal substitute built with Claude (or similar), the same data access, and light agent logic — not rebuilding the whole product.

IRS = 0.30·D + 0.25·L + 0.20·O + 0.15·R + 0.10·Sthis record · 86%
  • D

    Data accessibility

    weight 0.300.95
    • 1.0mostly customer-owned / public / standard third-party sources
    • 0.5mixed accessibility
    • 0.0hard-to-access or proprietary source layer
  • L

    LLM substitutability

    weight 0.250.85
    • 1.0mostly retrieve / prompt / cite / summarize / classify / compare
    • 0.5mixed standard + custom behavior
    • 0.0strongly custom model behavior (fine-tunes on proprietary data, etc.)
  • O

    Output simplicity

    weight 0.200.90
    • 1.0straightforward internal work product (memo, list, reply, SQL query)
    • 0.5moderately specialized
    • 0.0highly specialized (e.g. FDA-graded clinical text)
  • R

    Review / risk tolerance

    weight 0.150.85
    • 1.0internal use with human review is acceptable
    • 0.5moderate risk
    • 0.0very low tolerance for error (e.g. external legal filings)
  • S

    Surface complexity

    weight 0.10inverse — higher means less surface dependence0.50
    • 1.0a simple internal shell is enough
    • 0.5polished workflow matters somewhat
    • 0.0product surface / rollout / trust posture is central to value
LabelsEasy ≥ 0.67Medium ≥ 0.34Hard < 0.34

Missing factor rows use heuristics from wrapper scores. Editorial heuristic, not investment advice.

Build it yourself

Recreate the workflow inside your org.

Internal build

Build it yourself

Same public web + embedding retrieval + frontier LLM extraction — lacks polish and agent integrations.

Internal use only. Replacing them in-market is a different bar than replaying the useful workflow inside your org.

01 · Connectors & flow

Common Crawl
Common Crawl
GitHub public repos
GitHub public repos
arXiv
arXiv
Wikipedia
Wikipedia
Publisher licensing deals
Publisher licensing deals
Proprietary web crawl index
Proprietary web crawl index

Internal build map

Data in

Connectors
Connectors

Agent layer

Planner
Tools + retrieval
Reasoning model

Logic

LLM API
embedding search
retrieve
highlight extract
summarize
structured output
not custom weights

Outputs

Internal search
Answer
Citations

02 · Claude / agent prompt

Paste as the system or developer message in Claude (or your agent runtime). Scroll to read; Copy grabs the full text.

Claude / agent prompt

// Web search assistant for internal use You are a web search assistant inside [YOUR_COMPANY]. You help [YOUR_TEAM] retrieve and synthesize information from the public web using ONLY accessible data sources: Common Crawl snapshots, GitHub public repos, arXiv papers, Wikipedia dumps, and standard web APIs. ## What you must do 1. Retrieve first: query the company's internal web index (built from Common Crawl + curated sources) using semantic search over embeddings. Return ranked URLs with relevance scores. 2. Extract efficiently: use highlights or token-efficient extraction to surface only the relevant excerpts from each page. Avoid returning full HTML. 3. Cite rigorously: every claim must link back to a specific URL and excerpt. Use structured citations with page title, URL, and quoted text. 4. Scope: handle queries about current events, technical documentation, company research, people/company metadata — anything findable on the public web. ## What you are not Not a replacement for subscription databases or licensed corpora. Internal use only — no external distribution of results. ## Refusal Refuse if the query requires paywalled content, non-public APIs, or data the company does not have crawl rights to. Ask for more context if the query is ambiguous. ## Safety Internal posture — results are for research and analysis, not external publication. Human review required before using extracted data in customer-facing products.

03 · Result

latest news about semiconductor trade policy
Common Crawl + news site index

Recent articles from Reuters, Bloomberg, CNBC — all publicly crawled and semantically ranked.