mcp-web-fetch

Token-efficient web reading and HTTP requests for MCP agents.

An MCP server with two tools: fetch_text strips web pages down to clean readable text — dramatically reducing token usage when an agent needs to read a URL. http_request is a full HTTP client for REST API calls, form submissions, and anything requiring raw control.

Built for Claude, Cursor, and any MCP-compatible agent. No browser required. Pure Node.js, single bundled file.

Why `fetch_text` matters for agents

A typical web page weighs 300–800 KB of raw HTML — scripts, styles, nav bars, footers. Most of it is noise. An agent reading that page burns thousands of tokens on markup it cannot use.

fetch_text scrapes the page and returns only the readable content:

google.com raw HTML   →  ~480 000 chars
google.com fetch_text →       177 chars

manifesto page HTML   →  ~42 000 chars  
manifesto fetch_text  →    5 800 chars  (~7× smaller)

This is a simple HTML scraper — not a full browser renderer. It does not execute JavaScript, handle SPAs, or bypass bot protection. That is the tradeoff for zero dependencies and minimal overhead. For static pages, documentation, articles, and llms.txt files it works excellently.

Tools

`fetch_text` — low-token web content

Fetches a URL and returns clean readable text. Skips all scripts, styles, navigation, and layout noise. Extracts <title> separately. Prefers <main> or <article> when available.

param	type	default	description
`url`	string	required	Any valid URL
`max_chars`	number	`20000`	Output character cap
`timeout_ms`	number	`10000`	Request timeout in ms

Response:

{
  "ok": true,
  "url": "https://example.com/article",
  "status": 200,
  "title": "Article title",
  "text": "Clean readable content without any HTML...",
  "char_count": 4821,
  "truncated": false,
  "elapsed_ms": 248
}

Examples:

# Read an article or documentation page
fetch_text("https://docs.example.com/guide")

# Read a manifesto or about page
fetch_text("https://unpredictablemachine.com/manifesto")

# Read llms.txt
fetch_text("https://example.com/llms.txt")

# Limit output for large pages
fetch_text("https://en.wikipedia.org/wiki/Node.js", max_chars=5000)

Limits:

Does not execute JavaScript — SPAs and dynamically rendered content may return empty or partial text
Does not handle bot protection or CAPTCHAs
Not a replacement for a headless browser

`http_request` — full HTTP client

Universal HTTP client with full control over method, headers, and body. Use for REST APIs, form posts, webhooks, localhost, and internal network addresses.

param	type	default	description
`url`	string	required	Any valid URL (https, http, localhost, internal IP)
`method`	string	`GET`	GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS
`headers`	object	`{}`	Custom request headers
`body`	string	—	Raw request body (XML, form-data, plain text)
`body_json`	object	—	Auto-serialized JSON + sets Content-Type: application/json
`timeout_ms`	number	`10000`	Request timeout in ms
`max_bytes`	number	`500000`	Response body size cap

body_json takes priority over body when both are provided.

Response:

{
  "ok": true,
  "url": "https://api.example.com/posts",
  "method": "POST",
  "status": 201,
  "status_text": "Created",
  "content_type": "application/json",
  "headers": { "content-type": "application/json" },
  "body": "{\"id\": 42}",
  "truncated": false,
  "elapsed_ms": 142
}

Examples:

# REST POST with JSON body
http_request("https://api.example.com/posts",
  method="POST",
  body_json={"title": "Hello", "published": true})

# PUT with Authorization header
http_request("https://api.example.com/users/1",
  method="PUT",
  headers={"Authorization": "Bearer TOKEN"},
  body_json={"name": "Pavel"})

# Raw XML payload
http_request("https://legacy.api/endpoint",
  method="POST",
  headers={"Content-Type": "application/xml"},
  body="<root><item>value</item></root>")

# DELETE
http_request("http://localhost:8080/api/posts/42", method="DELETE")

# Internal network
http_request("http://192.168.1.100:8080/api/status")

When to use which

situation	tool
Reading articles, docs, blog posts	`fetch_text`
Reading `llms.txt` or plain text files	`fetch_text`
REST API calls (POST / PUT / DELETE)	`http_request`
Raw response body or headers needed	`http_request`
Localhost or internal network	both work
JavaScript-rendered SPA	neither (use a browser)

Logging

All requests logged to .var/requests.log — one JSON line per request:

{"ts":"2026-06-10T08:20:00.000Z","tool":"fetch_text","method":"GET","url":"https://example.com","status":200,"ok":true,"elapsed_ms":248}

Rotates at ~1 MB → keeps one .1 backup. Configure or disable in src/Config.js:

LOG_FILE: '.var/requests.log',  // '' = disabled
LOG_MAX_BYTES: 1_000_000

Install & build

build.cmd

Installs dependencies, bundles to dist/mcp.js, runs tests. The dist/ folder is self-contained — no node_modules needed at runtime.

Claude Desktop config

{
  "mcpServers": {
    "mcp-web-fetch": {
      "command": "node",
      "args": ["D:/dev/ai/mcp-web-fetch/dist/mcp.js"]
    }
  }
}

Stack

Node.js 22+ (native fetch built-in, no extra HTTP dependency)
@modelcontextprotocol/sdk
node-html-parser — fast pure-JS HTML parser, no native bindings
zod + zod-to-json-schema
esbuild (build only)

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src		src
test		test
README.md		README.md
build.cmd		build.cmd
build.sh		build.sh
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mcp-web-fetch

Why `fetch_text` matters for agents

Tools

`fetch_text` — low-token web content

`http_request` — full HTTP client

When to use which

Logging

Install & build

Claude Desktop config

Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

mcp-web-fetch

Why fetch_text matters for agents

Tools

fetch_text — low-token web content

http_request — full HTTP client

When to use which

Logging

Install & build

Claude Desktop config

Stack

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Why `fetch_text` matters for agents

`fetch_text` — low-token web content

`http_request` — full HTTP client

Packages