url2md — Any URL to clean Markdown for LLMs

$ curl "http://localhost:8000/api/convert?url=https://example.com/article"

{ "title": "How We Reduced Latency by 40%", "markdown": "# How We Reduced Latency by 40%\n\nLast Tuesday...", "char_count": 12847, "token_estimate": 3212, "line_count": 183 }

The problem you already know

You find a solid doc page or blog post. Select all, copy, paste into Claude. Half your context window is now a cookie consent banner, a nav menu, and six "related articles" you didn't ask for. You just burned 4K tokens on HTML garbage.

Readability extraction

Same algorithm Firefox Reader View uses. Grabs just the article body, skips everything else.

Token counts

See exactly how many tokens you're about to burn. No more context window surprises in Cursor or Claude.

Free API, no signup

30 requests/hour per IP. Curl it, script it, wire it into your pipeline. No API key, no account.
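If you script against the API in a loop, pacing requests on the client keeps you under the limit. Here is a minimal sketch, assuming the 30 requests/hour budget behaves like a sliding window (the page does not say whether the server uses a sliding or a fixed window). `HourlyThrottle` is a hypothetical helper, not something url2md ships.

```python
import time
from collections import deque


class HourlyThrottle:
    """Client-side sliding-window limiter (hypothetical helper, not part of url2md)."""

    def __init__(self, max_requests=30, window_seconds=3600.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests still inside the window

    def seconds_until_allowed(self):
        """Return 0.0 if a request may be sent now, else the wait in seconds."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The oldest remembered request decides when the next slot opens.
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Call right after a request is sent."""
        self._sent.append(self.clock())
```

Before each call, sleep for `seconds_until_allowed()`, send the request, then call `record()`.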

Handles long pages

Up to 500K characters. Full changelogs, long docs, multi-part tutorials — it chews through all of them.

How it works

Fetch — clean user-agent, no tracking headers

Extract — mozilla/readability grabs the article body

Convert — html2text turns HTML into clean markdown

Count — token estimate (chars ÷ 4) so you know before you paste
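The Count step is simple enough to reproduce locally. Here is a rough sketch of the chars ÷ 4 estimate. Rounding to the nearest whole number is an assumption inferred from the sample responses on this page (12847 chars gives 3212, 4521 chars gives 1130), and `estimate_tokens` is an illustrative name, not the tool's actual function.

```python
def estimate_tokens(markdown: str) -> int:
    """Approximate the token count as characters divided by 4."""
    # Adding 2 before the integer division rounds to the nearest whole number.
    # The rounding mode is an assumption; the page only says "chars / 4".
    return (len(markdown) + 2) // 4


def summarize(markdown: str) -> dict:
    """Return the two size fields that follow directly from the text length."""
    return {
        "char_count": len(markdown),
        "token_estimate": estimate_tokens(markdown),
    }
```

This is a heuristic, not a tokenizer. Real counts vary by model and by content, so treat the number as a ballpark.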

API

# Self-hosted (Docker): GET with the page URL as a query parameter
curl "http://localhost:8000/api/convert?url=https://example.com/article"

# Self-hosted (Docker): POST with a JSON body
curl -X POST http://localhost:8000/api/convert \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Hosted API: coming soon at api.url2md.com

Response:
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "markdown": "# Article Title\n\nContent here...",
  "char_count": 4521,
  "token_estimate": 1130,
  "line_count": 87,
  "truncated": false
}

Rate limit: 30 requests/hour per IP. No API key needed.
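
For scripting, the GET endpoint can be called from Python with only the standard library. Here is a minimal sketch, assuming the self-hosted instance from the examples above and a server that decodes a percent-encoded `url` parameter in the usual way. The function names are illustrative. Encoding matters once the target URL has its own query string, because a raw `&` would otherwise be read as a separate parameter.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # self-hosted instance; adjust to your setup


def build_convert_url(page_url: str, base_url: str = BASE_URL) -> str:
    """Return the GET endpoint URL with the target page percent-encoded."""
    query = urllib.parse.urlencode({"url": page_url})
    return f"{base_url}/api/convert?{query}"


def convert(page_url: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Call the API and return the parsed JSON response as a dict."""
    endpoint = build_convert_url(page_url, base_url)
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        return json.load(resp)


# Example usage (needs a running instance):
#   result = convert("https://example.com/article")
#   if result.get("truncated"):
#       print("warning: the page was cut off")
#   print(result["title"], result["token_estimate"])
```

Checking `truncated` before pasting the markdown anywhere is a cheap way to notice that a long page was cut off.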