AI Crawler Tracking

Besökskollen can show you which AI bots fetch your site (GPTBot, ClaudeBot, PerplexityBot and more) once you install a small snippet on your server.

Why track AI crawlers?

Most AI bots don't run JavaScript, so a JavaScript-based analytics tracker never sees them. Yet what they fetch shapes whether your pages can be cited by ChatGPT, Perplexity, Claude and others. Tracking them tells you:

  • Which AI models know about your site: each platform runs its own bot. Seeing PerplexityBot but no GPTBot? Then you know where you stand with each AI.
  • Which pages are fetched most: pages that AI bots fetch often are likely to be the ones cited most, which tells you where to focus your optimization work.
  • When indexing happens: daily GPTBot visits suggest OpenAI is regularly refreshing its view of your site. Long pauses can signal that your content isn't being treated as fresh.
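Each of the three questions above comes down to a small aggregation over crawl events. The sketch below assumes a hypothetical `CrawlEvent` record mirroring the fields Besökskollen says it stores; `visitsPerCrawler`, `topPages` and `longestGapDays` are illustrative helpers, not how the Besökskollen dashboard computes its numbers.

```typescript
// Hypothetical record mirroring the stored fields (crawler name, URL,
// User-Agent, timestamp); the field names are assumptions for this sketch.
interface CrawlEvent {
  crawler: string;
  pathname: string;
  userAgent: string;
  timestamp: Date;
}

// Count occurrences of a key derived from each event.
function countBy(events: CrawlEvent[], key: (e: CrawlEvent) => string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const k = key(e);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return counts;
}

// Which AI models know about your site: visits per crawler.
function visitsPerCrawler(events: CrawlEvent[]): Map<string, number> {
  return countBy(events, e => e.crawler);
}

// Which pages are fetched most: the n most-fetched paths, most frequent first.
function topPages(events: CrawlEvent[], n: number): Array<[string, number]> {
  return Array.from(countBy(events, e => e.pathname).entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

// When indexing happens: longest pause, in days, between two consecutive
// visits from one crawler (0 if it visited fewer than twice).
function longestGapDays(events: CrawlEvent[], crawler: string): number {
  const times = events
    .filter(e => e.crawler === crawler)
    .map(e => e.timestamp.getTime())
    .sort((a, b) => a - b);
  let maxGapMs = 0;
  for (let i = 1; i < times.length; i++) {
    maxGapMs = Math.max(maxGapMs, times[i] - times[i - 1]);
  }
  return maxGapMs / (24 * 60 * 60 * 1000);
}
```

A `longestGapDays` value that keeps growing for one crawler is the kind of long pause the last bullet warns about.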

How it works

The snippet checks the User-Agent header of every request to your server. When it matches a known AI crawler, it sends an async POST to https://besokskollen.se/api/ai-crawl with your site ID, the requested path and the User-Agent. Everything runs server-side, so your visitors' experience is unaffected.
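The two steps can be sketched in isolation. This is a simplified sketch, not the snippet itself: `matchCrawler` and `buildReport` are hypothetical names, the pattern list is an excerpt, and the payload shape is taken from the installation snippets, which send the site ID, path and raw User-Agent rather than the matched crawler name.

```typescript
// Excerpt only: the installation snippets contain the full pattern list.
const AI_CRAWLER_PATTERNS = ['GPTBot', 'ClaudeBot', 'PerplexityBot'];

// Body of the POST to https://besokskollen.se/api/ai-crawl, as sent by the snippets.
interface AiCrawlPayload {
  siteId: string;
  pathname: string;
  userAgent: string;
}

// Step 1: case-insensitive substring match against the known patterns.
function matchCrawler(userAgent: string): string | undefined {
  const ua = userAgent.toLowerCase();
  return AI_CRAWLER_PATTERNS.find(p => ua.includes(p.toLowerCase()));
}

// Step 2: build the report for a matching request.
// null means "regular visitor, send nothing".
function buildReport(siteId: string, pathname: string, userAgent: string): AiCrawlPayload | null {
  return matchCrawler(userAgent) ? { siteId, pathname, userAgent } : null;
}
```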

We store only the crawler name, URL, User-Agent and timestamp: no personal data and no cookies. Tracking applies only to bots; requests from real visitors are ignored.

Installation

Pick your platform and replace YOUR_SITE_ID with your site ID (the same one as in your regular tracking snippet). The snippet runs alongside our regular JS tracker; the two don't interfere.
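If you'd rather not hard-code the ID, one option in the Node.js-based variants is to read it from an environment variable. A minimal sketch, assuming a variable name of your own choosing (`BESOKSKOLLEN_SITE_ID` and `resolveSiteId` are hypothetical, not something Besökskollen reads or provides):

```typescript
// Hypothetical helper: read the site ID from an environment variable instead of
// editing the snippet. BESOKSKOLLEN_SITE_ID is a name chosen for this example.
function resolveSiteId(raw: string | undefined): string | null {
  const id = raw?.trim();
  if (!id || id === 'YOUR_SITE_ID') {
    // Warn instead of throwing: a tracking misconfiguration should never take the site down.
    console.warn('AI crawler tracking disabled: BESOKSKOLLEN_SITE_ID is not set');
    return null;
  }
  return id;
}

// Usage in the Next.js or Express snippet:
// const SITE_ID = resolveSiteId(process.env.BESOKSKOLLEN_SITE_ID);
```

Because the helper returns `null` instead of throwing, a missing ID disables tracking rather than breaking requests; guard the report with `SITE_ID !== null`.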

Next.js (App Router or Pages Router)

Add this to your existing middleware.ts (or create one). On Vercel, middleware runs at the edge, so the added latency is minimal.

```ts
// middleware.ts (or extend your existing one)
import { NextResponse } from 'next/server';
import type { NextFetchEvent, NextRequest } from 'next/server';

const SITE_ID = 'YOUR_SITE_ID';
const AI_API = 'https://besokskollen.se/api/ai-crawl';

const AI_CRAWLER_PATTERNS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
  'PerplexityBot', 'Perplexity-User',
  'ClaudeBot', 'Claude-Web', 'Claude-User', 'Claude-SearchBot', 'anthropic-ai',
  'Google-Extended', 'GoogleOther',
  'Applebot-Extended',
  'meta-externalagent', 'meta-externalfetcher',
  'Amazonbot', 'cohere-ai', 'Bytespider', 'CCBot',
  'DuckAssistBot', 'YouBot', 'Diffbot',
];

export function middleware(request: NextRequest, event: NextFetchEvent) {
  const ua = request.headers.get('user-agent') || '';
  const matched = AI_CRAWLER_PATTERNS.find(p =>
    ua.toLowerCase().includes(p.toLowerCase())
  );

  if (matched) {
    // Report in the background without blocking the response. waitUntil keeps
    // the work alive after the response is returned; a bare, un-awaited
    // promise may be cancelled when middleware finishes.
    event.waitUntil(
      fetch(AI_API, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          siteId: SITE_ID,
          pathname: request.nextUrl.pathname,
          userAgent: ua,
        }),
      }).catch(() => {}) // tracking errors must never affect the page
    );
  }

  return NextResponse.next();
}

export const config = {
  // Run on every path except Next.js internals and the favicon
  matcher: ['/((?!_next|favicon.ico).*)'],
};
```

WordPress

Add this to your theme's functions.php or create an mu-plugin. wp_remote_post with 'blocking' => false sends the request without waiting for a response, so it doesn't hold up page load.

```php
// In functions.php (in your active theme) or as an mu-plugin

add_action('template_redirect', function() {
    $site_id = 'YOUR_SITE_ID';
    $api = 'https://besokskollen.se/api/ai-crawl';

    $patterns = [
        'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
        'PerplexityBot', 'Perplexity-User',
        'ClaudeBot', 'Claude-Web', 'Claude-User', 'Claude-SearchBot', 'anthropic-ai',
        'Google-Extended', 'GoogleOther',
        'Applebot-Extended',
        'meta-externalagent', 'meta-externalfetcher',
        'Amazonbot', 'cohere-ai', 'Bytespider', 'CCBot',
        'DuckAssistBot', 'YouBot', 'Diffbot',
    ];

    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
    foreach ($patterns as $p) {
        // Case-insensitive substring match, like the other variants
        if (stripos($ua, $p) !== false) {
            wp_remote_post($api, [
                'blocking' => false, // Asynchronous, doesn't block page load
                'timeout' => 1,
                'headers' => ['Content-Type' => 'application/json'],
                'body' => wp_json_encode([
                    'siteId' => $site_id,
                    // Path only, without the query string (matches the other variants)
                    'pathname' => wp_parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/',
                    'userAgent' => $ua,
                ]),
            ]);
            break; // one report per request
        }
    }
});
```

Node.js / Express

Middleware for Express; the same approach works in Fastify or similar frameworks through their own request hooks. It uses the built-in fetch (Node 18+), so no extra dependency is needed.

```js
// Express middleware: register it before your routes.
// `app` is assumed to be your existing Express application.
const SITE_ID = 'YOUR_SITE_ID';
const AI_API = 'https://besokskollen.se/api/ai-crawl';

const AI_CRAWLER_PATTERNS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
  'PerplexityBot', 'Perplexity-User',
  'ClaudeBot', 'Claude-Web', 'Claude-User', 'Claude-SearchBot', 'anthropic-ai',
  'Google-Extended', 'GoogleOther',
  'Applebot-Extended',
  'meta-externalagent', 'meta-externalfetcher',
  'Amazonbot', 'cohere-ai', 'Bytespider', 'CCBot',
  'DuckAssistBot', 'YouBot', 'Diffbot',
];

app.use((req, res, next) => {
  const ua = req.get('user-agent') || '';
  const matched = AI_CRAWLER_PATTERNS.find(p =>
    ua.toLowerCase().includes(p.toLowerCase())
  );

  if (matched) {
    // Fire-and-forget: not awaited, so the response isn't delayed
    fetch(AI_API, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        siteId: SITE_ID,
        pathname: req.path,
        userAgent: ua,
      }),
    }).catch(() => {}); // tracking errors are ignored
  }

  next();
});
```
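Whichever variant you installed, you can check that it fires by requesting one of your own pages with a crawler-like User-Agent. A minimal sketch for Node 18+, assuming a hypothetical `sendTestCrawl` helper; the User-Agent string is a simplified example containing the `GPTBot` token, not OpenAI's exact header, and the fetch function is injectable so the helper can be tried with a stub:

```typescript
// Minimal shape of the fetch function used here, so a stub can be passed in.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

// Request one of your own pages while identifying as GPTBot, so the installed
// snippet should match it and send a report. Returns the HTTP status code.
async function sendTestCrawl(
  pageUrl: string,
  fetchImpl: FetchLike = fetch,
  userAgent = 'Mozilla/5.0 (compatible; GPTBot/1.1; test request)',
): Promise<number> {
  const res = await fetchImpl(pageUrl, { headers: { 'User-Agent': userAgent } });
  return res.status;
}
```

Call `sendTestCrawl('https://your-site.example/')` with a page on your own site, then check whether a GPTBot visit shows up in Besökskollen. Expect the test request to be counted like a real crawl; if a CDN or firewall blocks it, it won't reach the snippet either.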

Which AI bots are tracked?

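We currently recognize crawlers from the following operators. The list grows as new bots appear.

OpenAI: GPTBot, ChatGPT-User, OAI-SearchBot
Anthropic: ClaudeBot, Claude-Web, Claude-User, Claude-SearchBot, anthropic-ai
Perplexity: PerplexityBot, Perplexity-User
Google: Google-Extended, GoogleOther
Apple: Applebot-Extended
Meta: meta-externalagent, meta-externalfetcher
Amazon: Amazonbot
Cohere: cohere-ai
ByteDance: Bytespider
Common Crawl: CCBot
DuckDuckGo: DuckAssistBot
You.com: YouBot
Diffbot: Diffbot

Note that Google-Extended and Applebot-Extended are mainly robots.txt control tokens: Google and Apple document that they don't crawl under those names, so they are unlikely to appear in a User-Agent header.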
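If you keep your own copy of the pattern list, grouping it by operator makes it easier to maintain and lets you attribute a visit to a company. A sketch, assuming this grouping is your own convention (it is not necessarily how Besökskollen stores its list); the tokens are the 22 patterns from the installation snippets, and `CRAWLERS_BY_OPERATOR` and `operatorFor` are hypothetical names:

```typescript
// Bot tokens grouped by operator, in the same order as the list above.
const CRAWLERS_BY_OPERATOR: Record<string, string[]> = {
  OpenAI: ['GPTBot', 'ChatGPT-User', 'OAI-SearchBot'],
  Anthropic: ['ClaudeBot', 'Claude-Web', 'Claude-User', 'Claude-SearchBot', 'anthropic-ai'],
  Perplexity: ['PerplexityBot', 'Perplexity-User'],
  Google: ['Google-Extended', 'GoogleOther'],
  Apple: ['Applebot-Extended'],
  Meta: ['meta-externalagent', 'meta-externalfetcher'],
  Amazon: ['Amazonbot'],
  Cohere: ['cohere-ai'],
  ByteDance: ['Bytespider'],
  'Common Crawl': ['CCBot'],
  DuckDuckGo: ['DuckAssistBot'],
  'You.com': ['YouBot'],
  Diffbot: ['Diffbot'],
};

// Flat list in the shape the installation snippets expect.
const AI_CRAWLER_PATTERNS: string[] = Object.values(CRAWLERS_BY_OPERATOR).flat();

// Attribute a User-Agent to its operator (case-insensitive substring match).
function operatorFor(userAgent: string): string | undefined {
  const ua = userAgent.toLowerCase();
  for (const [operator, tokens] of Object.entries(CRAWLERS_BY_OPERATOR)) {
    if (tokens.some(t => ua.includes(t.toLowerCase()))) return operator;
  }
  return undefined;
}
```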

FAQ

Does the snippet affect my site's performance?

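Not noticeably. For regular visitors the snippet only does a quick string match on the User-Agent. For matched bots, every variant sends the report in the background (a fire-and-forget fetch or a non-blocking wp_remote_post) without waiting for a reply, so the response to the bot isn't held up.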
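The background request can also be bounded by a timeout, similar to the `'timeout' => 1` in the WordPress variant, so a slow or unreachable endpoint can't keep a connection open. A minimal sketch using the standard `AbortSignal.timeout` (available in Node 18+); `reportCrawl` is a hypothetical helper and the 1000 ms default is an assumption:

```typescript
// Hypothetical helper: POST the crawl report, giving up after timeoutMs.
// Resolves to true on a 2xx response, false on any error or timeout.
async function reportCrawl(
  endpoint: string,
  payload: { siteId: string; pathname: string; userAgent: string },
  timeoutMs = 1000,
): Promise<boolean> {
  try {
    const res = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(timeoutMs), // aborts the request when the timer fires
    });
    return res.ok;
  } catch {
    return false; // network error, invalid URL or timeout: never break the page
  }
}
```

Call it without `await`, the same way the snippets fire `fetch`, so the page response never waits for the report.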

What if a bot is blocked by my CDN/firewall?

Then the snippet never sees the request, and we never get a report. That's correct behavior. If you want to see ALL bots including blocked ones, you'd need to log at the edge/CDN level instead.

Can I report additional crawlers myself?

Right now the known-crawler list is maintained centrally, so only bots we recognize are stored. If you spot a new AI crawler we miss, let us know and we'll add it.
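Before reporting a crawler, it helps to find candidates in your own traffic. One rough heuristic, sketched here as an assumption rather than anything Besökskollen uses: flag User-Agents that look automated but match none of the known patterns. `isUnknownBot` and `KNOWN_PATTERNS` are hypothetical names, and the hint words will also catch ordinary search-engine and SEO bots:

```typescript
// Excerpt: paste the full pattern list from your snippet here.
const KNOWN_PATTERNS = ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'CCBot', 'Bytespider'];

// Words that commonly appear in automated User-Agents (a heuristic, not a standard).
const BOT_HINTS = /bot|crawler|spider/i;

// True if the User-Agent looks automated but matches none of the known patterns.
function isUnknownBot(userAgent: string, known: string[] = KNOWN_PATTERNS): boolean {
  const ua = userAgent.toLowerCase();
  const isKnown = known.some(p => ua.includes(p.toLowerCase()));
  return !isKnown && BOT_HINTS.test(userAgent);
}
```

For example, log the User-Agent whenever `isUnknownBot` returns true, review the log now and then, and send us the ones that look like AI crawlers.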