url2md

Any URL → clean markdown your LLM can actually read. No nav bars. No cookie banners. No junk.

$ curl "http://localhost:8000/api/convert?url=https://example.com/article"
{ "title": "How We Reduced Latency by 40%", "markdown": "# How We Reduced Latency by 40%\n\nLast Tuesday...", "char_count": 12847, "token_estimate": 3212, "line_count": 183 }

The problem you already know

You find a solid doc page or blog post. Select all, copy, paste into Claude. Half your context window is now a cookie consent banner, a nav menu, and six "related articles" you didn't ask for. You just burned 4K tokens on HTML garbage.

~33K avg chars per article
~8.5K avg tokens extracted
30/hr free API calls

Readability extraction

Same algorithm Firefox Reader View uses. Grabs just the article body, skips everything else.

Token counts

See exactly how many tokens you're about to burn. No more context window surprises in Cursor or Claude.

Free API, no signup

30 requests/hour per IP. Curl it, script it, wire it into your pipeline. No API key, no account.
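If you script against the API, it helps to stay under the limit on your side instead of finding out the hard way. Here is a minimal client-side sketch, assuming a sliding one-hour window. The page doesn't say whether the server counts a sliding or fixed window, and `HourlyThrottle` is a hypothetical helper, not part of url2md.

```python
import time
from collections import deque


class HourlyThrottle:
    """Client-side sliding-window limiter (hypothetical helper, not part of url2md)."""

    def __init__(self, max_calls=30, window_s=3600.0, clock=time.monotonic):
        self.max_calls = max_calls  # 30 requests, per the stated limit
        self.window_s = window_s    # one hour; sliding window is an assumption
        self.clock = clock          # injectable so the logic can be tested
        self.calls = deque()        # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # Window is full: wait until the oldest call expires.
        return self.window_s - (now - self.calls[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.calls.append(self.clock())
```

Before each request, sleep for `wait_time()` seconds, send the request, then call `record()`.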

Handles long pages

Up to 500K characters. Full changelogs, long docs, multi-part tutorials — it chews through all of them.
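One way the 500K cap could work, sketched under assumptions: a plain character cut, with a flag that mirrors the `truncated` field in the API response. Whether url2md cuts at an exact character count or at a paragraph boundary isn't stated, and `truncate_markdown` is a hypothetical name.

```python
MAX_CHARS = 500_000  # the limit stated above


def truncate_markdown(markdown: str, max_chars: int = MAX_CHARS) -> tuple[str, bool]:
    """Return (text, truncated). Hypothetical helper; a plain character cut is an assumption."""
    if len(markdown) <= max_chars:
        return markdown, False
    # Keep the first max_chars characters and report that content was dropped.
    return markdown[:max_chars], True
```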

How it works

1. Fetch: clean user-agent, no tracking headers
2. Extract: mozilla/readability grabs the article body
3. Convert: html2text turns HTML into clean markdown
4. Count: token estimate (chars ÷ 4) so you know before you paste
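The Count step is simple enough to sketch. This is a rough illustration, not the project's actual code: `count_stats` is a hypothetical name, and rounding to the nearest integer is an assumption (it matches the sample responses on this page, where 4521 chars gives 1130 tokens).

```python
def count_stats(markdown: str) -> dict:
    """Size stats for a markdown string (hypothetical helper, not url2md's code)."""
    char_count = len(markdown)
    return {
        "char_count": char_count,
        # Heuristic from the steps above: roughly 4 characters per token.
        # Rounding to nearest is an assumption.
        "token_estimate": round(char_count / 4),
        # splitlines() does not count an extra line for a trailing newline.
        "line_count": len(markdown.splitlines()),
    }
```

Chars ÷ 4 is a quick estimate, not a real tokenizer count, so treat the number as a ballpark.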

API

# Self-hosted (Docker)
curl "http://localhost:8000/api/convert?url=https://example.com/article"

# Same call as a POST with a JSON body
curl -X POST http://localhost:8000/api/convert \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Hosted API: coming soon at api.url2md.com

Response:
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "markdown": "# Article Title\n\nContent here...",
  "char_count": 4521,
  "token_estimate": 1130,
  "line_count": 87,
  "truncated": false
}
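
If you'd rather call it from a script than from curl, here is a minimal Python sketch using only the standard library. It assumes the self-hosted address from the curl examples; `build_convert_url` and `convert` are hypothetical helper names, and error handling is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # self-hosted address from the curl examples


def build_convert_url(page_url: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL, percent-encoding the target page URL."""
    return f"{base_url}/api/convert?{urlencode({'url': page_url})}"


def convert(page_url: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Call the convert endpoint and return the parsed JSON response."""
    with urlopen(build_convert_url(page_url, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With a local instance running, `convert("https://example.com/article")["token_estimate"]` gives you the token count before you paste anything.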

Rate limit: 30 requests/hour per IP. No API key needed.

Open source. MIT license. Self-host with Docker in 30 seconds.