HTTP Guard — Bot Blocking and Rate Limiting

Overview

The HTTP Guard is application-level middleware that protects ORLY from abusive traffic: automated scrapers, AI crawlers, and high-volume requesters. It runs inside ORLY's HTTP handler, before any routing, so it covers both REST API endpoints and WebSocket upgrade requests.

The guard is designed for deployments where the reverse proxy cannot be customized, particularly Cloudron, which runs nginx but does not allow users to modify its configuration. Behind a configurable reverse proxy (Caddy, nginx, HAProxy), you can keep the HTTP Guard as defense-in-depth or disable it in favor of proxy-level rules.

Configuration

| Variable | Default | Description |
|---|---|---|
| ORLY_HTTP_GUARD_ENABLED | true | Enable the HTTP guard middleware |
| ORLY_HTTP_GUARD_BOT_BLOCK | true | Block known scraper/bot User-Agents |
| ORLY_HTTP_GUARD_RPM | 120 | Max HTTP requests per minute per IP |
| ORLY_HTTP_GUARD_WS_PER_MIN | 10 | Max WebSocket upgrade requests per minute per IP |

The existing ORLY_IP_BLACKLIST variable is also respected — IPs matching any blacklist prefix are blocked with 403 before any other checks.
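
The prefix check can be pictured with the sketch below. It assumes the blacklist is available as a slice of string prefixes and that matching is a plain string-prefix comparison. `isBlacklisted` is a hypothetical helper, and how ORLY actually parses `ORLY_IP_BLACKLIST` is not shown here.

```go
package main

import "strings"

// isBlacklisted reports whether ip begins with any non-empty prefix.
// A trailing dot in a prefix (for example "203.0.113.") keeps it from
// also matching longer octets such as "203.0.1130...".
func isBlacklisted(ip string, prefixes []string) bool {
	for _, p := range prefixes {
		p = strings.TrimSpace(p)
		if p != "" && strings.HasPrefix(ip, p) {
			return true
		}
	}
	return false
}
```

A caller following the documented behavior would respond with HTTP 403 as soon as this returns true, before running the bot or rate checks.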

Bot Blocking

When ORLY_HTTP_GUARD_BOT_BLOCK=true, requests with User-Agent strings containing any of the following substrings (case-insensitive) are rejected with HTTP 403:

| Bot | Operator |
|---|---|
| SemrushBot | Semrush (SEO crawler) |
| AhrefsBot | Ahrefs (SEO crawler) |
| MJ12bot | Majestic (SEO crawler) |
| DotBot | Moz (SEO crawler) |
| PetalBot | Huawei/Aspiegel (search) |
| BLEXBot | WebMeUp (backlink checker) |
| DataForSeoBot | DataForSEO (SEO data) |
| Amazonbot | Amazon (product indexing) |
| meta-externalagent | Meta/Facebook (content scraper) |
| Bytespider | ByteDance/TikTok (crawler) |
| GPTBot | OpenAI (AI training crawler) |
| ClaudeBot | Anthropic (AI training crawler) |
| CCBot | Common Crawl (dataset crawler) |
| FacebookBot | Meta (social preview crawler) |

This list matches the scraper blocking rules from the relay.orly.dev Caddy configuration. Legitimate search engines (Googlebot, Bingbot) are not blocked.
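
The case-insensitive substring match described above could look roughly like this. The list is copied from the table, stored in lowercase. `blockedAgents` and `isBlockedBot` are hypothetical names for this sketch, which may differ from how ORLY implements the check.

```go
package main

import "strings"

// blockedAgents holds the documented substrings, lowercased once so that
// each request only needs a single ToLower on its User-Agent.
var blockedAgents = []string{
	"semrushbot", "ahrefsbot", "mj12bot", "dotbot", "petalbot",
	"blexbot", "dataforseobot", "amazonbot", "meta-externalagent",
	"bytespider", "gptbot", "claudebot", "ccbot", "facebookbot",
}

// isBlockedBot reports whether the User-Agent contains any blocked substring.
func isBlockedBot(userAgent string) bool {
	ua := strings.ToLower(userAgent)
	for _, s := range blockedAgents {
		if strings.Contains(ua, s) {
			return true
		}
	}
	return false
}
```

Because the match is on substrings, a User-Agent such as `Mozilla/5.0 (compatible; GPTBot/1.0)` is rejected, while a Googlebot or Bingbot User-Agent passes through.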

To disable bot blocking while keeping rate limiting active:

ORLY_HTTP_GUARD_BOT_BLOCK=false

Rate Limiting

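Each client IP gets two independent token buckets: one for HTTP requests, limited by ORLY_HTTP_GUARD_RPM (default 120 per minute), and one for WebSocket upgrade requests, limited by ORLY_HTTP_GUARD_WS_PER_MIN (default 10 per minute).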

When a bucket is exhausted, the request is rejected with HTTP 429 (Too Many Requests) and a Retry-After: 60 header.

Why Separate WebSocket Limits

A single WebSocket connection is far more expensive than an HTTP request — it holds a goroutine, consumes memory for subscription state, and generates continuous traffic for the lifetime of the connection. Rate limiting WebSocket upgrades separately prevents a single IP from opening hundreds of connections while still allowing normal HTTP API usage.

IP Extraction

The guard determines the client IP using this priority:

  1. X-Forwarded-For header (first IP in chain) — covers reverse proxy deployments
  2. X-Real-Ip header — alternative proxy header
  3. RemoteAddr from the connection — direct connections

In Cloudron, Cloudron's nginx sets the X-Forwarded-For header. In direct deployments, where no proxy sets these headers, the guard falls back to RemoteAddr.

Memory Management

Per-IP state is stored in a concurrent map. A background goroutine runs every 5 minutes and evicts entries for IPs that haven't been seen in the last 10 minutes. This prevents memory growth from drive-by scanners.

Interaction with Other Protections

The HTTP Guard runs before all other request processing in ServeHTTP:

HTTP request → HTTP Guard (bot + rate) → CORS → Blossom → WebSocket → API routing

It complements (does not replace) ORLY's other protection mechanisms:

| Layer | Scope | Mechanism |
|---|---|---|
| HTTP Guard | All HTTP + WS | Bot UA blocking, per-IP rate limiting |
| IP Blacklist (ORLY_IP_BLACKLIST) | All connections | Prefix-match IP blocking (also checked in Guard) |
| Per-IP Connection Limit (ORLY_MAX_CONN_PER_IP) | WebSocket only | Max concurrent WS connections per IP |
| Global Connection Limit (ORLY_MAX_GLOBAL_CONNECTIONS) | WebSocket only | Total WS connection cap |
| PID Rate Limiter (ORLY_RATE_LIMIT_*) | Database operations | Memory-pressure-adaptive throttling |
| Query Result Limit (ORLY_QUERY_RESULT_LIMIT) | Nostr REQ queries | Max events per filter response |

Cloudron Deployment

The HTTP Guard is enabled by default in the cloudron-orly deployment template. The environment variables are set in /app/data/orly.env:

export ORLY_HTTP_GUARD_ENABLED="true"
export ORLY_HTTP_GUARD_BOT_BLOCK="true"

No port changes or nginx configuration are needed.

Disabling

To disable the guard entirely:

ORLY_HTTP_GUARD_ENABLED=false

This is appropriate when running behind a reverse proxy that already handles bot blocking and rate limiting (e.g., Caddy with respond rules, Cloudflare, or nginx with limit_req).