How It Works

A Cloudflare Worker sits in front of your entire website. Every request passes through it before reaching your origin server. When an AI bot is detected by User-Agent, the Worker logs it to the traffic-ingest-service as a non-blocking background task (ctx.waitUntil), then passes the request through to your origin unchanged. Your real visitors never notice any difference.
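The flow described above can be sketched as a small function that runs outside Cloudflare. This is a simplified illustration rather than the Worker itself: `handleRequest`, `fakeRequest`, `fakeCtx`, the injected `fetchImpl`, and the `LOG_URL` value are hypothetical names chosen for local experimentation, and the bot list is shortened.

```javascript
// Simplified sketch of the detect -> log in background -> pass through flow.
// All names here (handleRequest, fetchImpl, LOG_URL) are illustrative only.
const BOT_PATTERNS = ["GPTBot", "ClaudeBot", "PerplexityBot"];
const LOG_URL = "https://ingest.example/log/cloudflare"; // placeholder URL

function isAiBot(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  return BOT_PATTERNS.some((bot) => ua.includes(bot.toLowerCase()));
}

function handleRequest(request, ctx, fetchImpl) {
  const userAgent = request.headers.get("user-agent") || "";
  if (isAiBot(userAgent)) {
    // The logging promise is handed to waitUntil and never awaited here,
    // so the origin fetch below is not delayed by it.
    ctx.waitUntil(
      fetchImpl(LOG_URL, {
        method: "POST",
        body: JSON.stringify({ bot_ua: userAgent, url: request.url }),
      }).catch(() => {})
    );
  }
  // Every request, bot or not, is forwarded to the origin.
  return fetchImpl(request);
}

// Minimal stand-ins for the Workers runtime objects.
function fakeRequest(url, userAgent) {
  return {
    url,
    method: "GET",
    headers: {
      get: (name) => (name.toLowerCase() === "user-agent" ? userAgent : null),
    },
  };
}

function fakeCtx() {
  const pending = [];
  return { pending, waitUntil: (promise) => pending.push(promise) };
}
```

Passing a stub as `fetchImpl` makes it possible to count how many outbound calls a given User-Agent triggers without touching the network.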

Setup

1. Install the Wrangler CLI

Wrangler is Cloudflare’s official CLI for deploying Workers.
npm install -g wrangler
Then log in to your Cloudflare account:
wrangler login
2. Create your Worker project

Create a new directory for the Worker (or use the cloudflare-worker folder from the downloaded files):
mkdir siftly-bot-tracker
cd siftly-bot-tracker
3. Create the Worker files

Copy the code below into worker.js:
const AI_BOT_PATTERNS = [
  "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
  "Googlebot", "bingbot", "CCBot", "anthropic-ai",
  "Bytespider", "FacebookBot", "Applebot", "PerplexityBot",
  "YouBot", "DuckDuckBot", "Baiduspider"
];

// 👇 Replace with your traffic-ingest-service URL
const LOG_ENDPOINT = "<TRAFFIC_INGEST_SERVICE_URL>/log/cloudflare";

// 👇 Replace with your Organisation ID from the Siftly dashboard
const ORGANIZATION_ID = "<YOUR_ORGANIZATION_ID>";

export default {
  async fetch(request, env, ctx) {
    const userAgent = request.headers.get("user-agent") || "";
    const isBot = AI_BOT_PATTERNS.some(bot =>
      userAgent.toLowerCase().includes(bot.toLowerCase())
    );

    if (isBot) {
      const payload = {
        timestamp: new Date().toISOString(),
        bot_ua: userAgent,
        url: request.url,
        method: request.method,
        ip: request.headers.get("cf-connecting-ip"),
        country: request.cf?.country,
        city: request.cf?.city,
        asn: request.cf?.asn,
        asOrganization: request.cf?.asOrganization,
      };

      // Fire-and-forget — does not block the actual request
      ctx.waitUntil(
        fetch(LOG_ENDPOINT, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            "Authorization": `Bearer ${ORGANIZATION_ID}`,
          },
          body: JSON.stringify(payload),
        }).catch(() => {}) // silently fail if the service is unreachable
      );
    }

    // Always pass the request through to the origin normally
    return fetch(request);
  },
};
Put the Worker configuration in wrangler.toml:
name = "siftly-bot-tracker"
main = "worker.js"
compatibility_date = "2024-01-01"

routes = [
  { pattern = "yourdomain.com/*", zone_name = "yourdomain.com" }
]
Replace yourdomain.com with your actual domain. The Worker will intercept all traffic matching this pattern.
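To build intuition for which URLs a route pattern covers, here is a rough approximation of wildcard route matching. It assumes that `*` matches any run of characters and that the pattern is compared against hostname plus path; Cloudflare's real matching has additional rules, so treat `routeMatches` (a hypothetical helper) as an illustration and check the Cloudflare routes documentation for the exact behaviour.

```javascript
// Rough approximation of route matching, for intuition only.
// Assumption: "*" matches any run of characters and the pattern is
// compared against hostname + path (scheme and query string ignored).
function routeMatches(pattern, url) {
  const { hostname, pathname } = new URL(url);
  const escaped = pattern
    .split("*")
    // Escape regex metacharacters in the literal parts of the pattern.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`).test(hostname + pathname);
}
```

Under this approximation, `yourdomain.com/*` covers `https://yourdomain.com/blog/post` but not `https://www.yourdomain.com/`; a pattern such as `*.yourdomain.com/*` would be needed for subdomains.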
4. Fill in your credentials

In worker.js, replace:
Placeholder: Value
<TRAFFIC_INGEST_SERVICE_URL>: The URL of the traffic-ingest-service (ask your Siftly admin)
<YOUR_ORGANIZATION_ID>: Your Organisation ID (a UUID) from the Siftly dashboard
In wrangler.toml, replace:
Placeholder: Value
yourdomain.com: Your actual domain name
5. Deploy the Worker

From the Worker directory, run:
wrangler deploy
Wrangler will upload the Worker to Cloudflare’s edge network and activate the route. You should see output like:
Uploaded siftly-bot-tracker (1.23 sec)
Published siftly-bot-tracker (0.42 sec)
  https://siftly-bot-tracker.your-account.workers.dev
  yourdomain.com/*

Verifying It Works

After deploying, simulate an AI bot request to confirm events flow through:
curl -A "GPTBot/1.0" https://yourdomain.com
Check the Traffic section of the Siftly dashboard within a few minutes. You should see an entry for the simulated GPTBot visit.

Updating the Worker

To change the endpoint URL or Organisation ID later:
  1. Edit worker.js with the new values.
  2. Run wrangler deploy again.
Changes propagate globally within seconds.
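If these values change often, one option is to read them from the Worker's `env` bindings instead of hard-coding them. Wrangler supports a `[vars]` table in wrangler.toml whose entries are exposed on the `env` argument of `fetch(request, env, ctx)`. The sketch below assumes variables named `LOG_ENDPOINT` and `ORGANIZATION_ID`; those names and the `resolveConfig` helper are hypothetical and not part of the worker.js shown earlier.

```javascript
// Hypothetical helper: prefer values from the Worker's env bindings and
// fall back to the constants hard-coded in worker.js.
const DEFAULT_LOG_ENDPOINT = "<TRAFFIC_INGEST_SERVICE_URL>/log/cloudflare";
const DEFAULT_ORGANIZATION_ID = "<YOUR_ORGANIZATION_ID>";

function resolveConfig(env) {
  const source = env || {};
  return {
    logEndpoint: source.LOG_ENDPOINT || DEFAULT_LOG_ENDPOINT,
    organizationId: source.ORGANIZATION_ID || DEFAULT_ORGANIZATION_ID,
  };
}
```

Inside the handler, `const { logEndpoint, organizationId } = resolveConfig(env);` would then replace direct use of the constants.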

Troubleshooting

  1. Confirm the route in wrangler.toml matches your domain exactly.
  2. Visit dash.cloudflare.com → Workers & Pages → your worker → Logs to see real-time request logs.
  3. Check the LOG_ENDPOINT URL is correct and the service is reachable.
  4. Test with curl -A "GPTBot/1.0" https://yourdomain.com from a terminal.
The Worker ends with return fetch(request), which proxies the request to your origin unchanged. If something looks wrong, check that you haven’t modified this line.

You can also control which paths the Worker runs on by adjusting the route pattern in wrangler.toml.
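As an alternative to route-level filtering, path exclusion could also be handled inside the Worker. The sketch below is one possible approach, not something the shipped worker.js does: `shouldLogPath` and the example prefixes are hypothetical.

```javascript
// Hypothetical in-Worker path filter; the prefixes are examples only.
const EXCLUDED_PREFIXES = ["/admin", "/api/internal"];

function shouldLogPath(requestUrl, excludedPrefixes = EXCLUDED_PREFIXES) {
  const { pathname } = new URL(requestUrl);
  // Exclude the prefix itself and anything nested beneath it,
  // but not unrelated paths that merely share leading characters.
  return !excludedPrefixes.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/")
  );
}
```

In worker.js the logging condition would then become `if (isBot && shouldLogPath(request.url))`, leaving the final `return fetch(request)` untouched so every request still reaches the origin.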
To track more than one domain, add multiple route entries to wrangler.toml:
routes = [
  { pattern = "site1.com/*", zone_name = "site1.com" },
  { pattern = "site2.com/*", zone_name = "site2.com" }
]
Each domain must be in your Cloudflare account.