Bots Behave Alone. Fleets Betray Themselves.

How aggregate path topology exposes coordinated bot campaigns that look perfectly clean — one session at a time

1. The Problem With Session-Level Bot Detection

Most bot detection works like a security guard checking IDs at the door. You look at one visitor, decide if they seem suspicious, and either wave them through or flag them. It is a reasonable heuristic, and it catches a lot.

But sophisticated bot operators have spent years learning exactly what the guard is checking.

They throttle request rate. They rotate IPs and user agents. They spoof TLS fingerprints. They fake scroll events and inject mouse movement. They buy residential proxies. A well-tuned fleet of bots can walk through per-session detection looking indistinguishable from a normal reader — because any individual session can be made to look ordinary.

The signal you are missing is often not in any single session.

It is in what the sessions are collectively doing.

2. The Setup: A Real Site, Not a Fake Honeypot

To study this properly, you need a real public site that attracts real visitors and automated traffic in the same place. Artificial honeypots produce artificial behavior. Fake “canary” sites in lab environments attract only the traffic you deliberately send them.

I built and deployed MacroCanary (macrocanary.pages.dev) as a real public observation surface for this experiment. It is a dashboard that tracks selected macroeconomic indicators — US Sahm Rule, yield curve, jobs, India CPI, and a handful of others. It applies threshold rules and staleness checks, and produces a simple traffic-light summary.

That matters for the research setting. A real page attracts the full spectrum: curious readers, mobile skimmers, search crawlers, scrapers, automated monitoring scripts, and eventually coordinated reconnaissance fleets. All in the same observation window, without you curating who shows up.

3. A Canary Layer on Top

Layered over the dashboard are several canary routes:

Robots.txt-disallowed paths: paths explicitly listed as Disallow in robots.txt. Compliant crawlers should not request them. A hit is a strong positive label.
Unlinked token paths: paths that appear nowhere on the site: not in the page DOM, not in a sitemap, not in any internal link. They are reachable only by directory guessing, URL enumeration, or prior knowledge.
Entryless deep links: paths that are not surfaced through ordinary navigation. Hitting one cold suggests the visitor already knew it was there.

It is important to be clear about what the canary layer is and is not.

Canary hits are labels, not the detection method.

When a session hits /private-for-bots-only/, that tells you something about that session: it touched a path that ordinary compliant browsing should not reach. It says much less about campaign intent, coordination structure, or what the broader fleet was collectively doing.

The detection method is the path traversal topology — the aggregate structure of how sessions move through the site graph. That analysis runs on all sessions, canary-hitting or not. Canary hits serve one role: they provide positive-label anchors to check whether suspicious topology correlates with known bot-like behavior.

They do not drive the topology verdict by themselves.

No visitor is blocked, penalized, or served different content. The canary is observational.
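A minimal sketch of how canary hits could be turned into per-session labels. Only /private-for-bots-only/ comes from the text above; the other two routes, the label names, and the function name are hypothetical placeholders. Nothing here blocks or changes a response. It only annotates a session after the fact.

```python
# Hypothetical canary route table. Only /private-for-bots-only/ appears in the
# article; the other two paths are placeholders for illustration.
CANARY_ROUTES = {
    "/private-for-bots-only/": "robots_disallowed",
    "/tok-example/": "unlinked_token",
    "/deep/example-entry": "entryless_deep_link",
}


def canary_labels(paths):
    """Return the set of canary label types touched by one session's paths."""
    labels = set()
    for path in paths:
        for prefix, kind in CANARY_ROUTES.items():
            if path.startswith(prefix):
                labels.add(kind)
    return labels


# Labels are observational anchors only; they do not feed the topology verdict.
print(canary_labels(["/", "/private-for-bots-only/index.html"]))
```

An empty set means the session touched no canary route, which says nothing either way about whether it was automated.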

4. Three Layers of Detection

The system looks at traffic in three passes.

The first pass asks a simple question:

Does this one visit look unusual?

That catches obvious automation. For example, a session that moves too fast, never scrolls, jumps between strange pages, or behaves unlike normal readers can be scored as suspicious.

But careful bots can avoid that. They can slow down. They can scroll. They can visit only a few pages. One bot can pretend to be ordinary.

So the second pass asks a different question:

Do many ordinary-looking visits look related?

This is where fleet clustering comes in. The system groups sessions that visit similar pages, come from related network areas, or behave in similar ways. Technically, the current implementation uses DBSCAN with path overlap and ASN distance, plus additional gates for route similarity and behavioral consistency.

But the plain-English version is this:

A single visitor may look normal. A group of visitors moving in strangely similar ways may not be normal.
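To make the clustering idea concrete, here is a rough sketch of a pairwise session distance that blends path overlap with a crude ASN distance. The Jaccard formulation, the same-or-different ASN rule, the 0.7 / 0.3 weights, and the session dictionary shape are all assumptions for illustration, not the deployed configuration.

```python
def jaccard_distance(paths_a, paths_b):
    """1 - |A and B| / |A or B| over the sets of visited paths."""
    a, b = set(paths_a), set(paths_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def asn_distance(asn_a, asn_b):
    """Crude network distance (assumption): 0 for the same ASN, 1 otherwise."""
    return 0.0 if asn_a == asn_b else 1.0


def session_distance(a, b, w_path=0.7, w_asn=0.3):
    """Weighted blend of path and ASN distance; the weights are hypothetical."""
    return (w_path * jaccard_distance(a["paths"], b["paths"])
            + w_asn * asn_distance(a["asn"], b["asn"]))


def distance_matrix(sessions):
    """Full pairwise matrix, suitable as input to a density-based clusterer."""
    return [[session_distance(a, b) for b in sessions] for a in sessions]


sessions = [
    {"paths": ["/", "/a"], "asn": 64500},
    {"paths": ["/", "/a"], "asn": 64500},
    {"paths": ["/b"], "asn": 64501},
]
matrix = distance_matrix(sessions)
```

If scikit-learn is available, a matrix like this could be passed to DBSCAN with metric="precomputed". The eps and min_samples values would need tuning and are not given here.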

The third pass asks the most important question:

What was the group trying to do?

That is campaign intent classification.

Instead of asking whether one session is a bot, the system asks whether a cluster of sessions looks like it was mapping the site, extracting a target page, arriving in timed waves, or probing pages that normal visitors do not usually find.

5. The Four Aggregate Signals

The system looks for four patterns that only appear when you study a group of sessions together.

A single visit may look normal.

A group of visits may not.

1. Did they divide up the site?

Technical name: Coverage Efficiency

A normal group of readers overlaps a lot. Many people visit the homepage, the dashboard, the same popular pages, and then leave.

A mapping fleet behaves differently.

It divides the work.

One session visits one slice of the site. Another session visits a different slice. A third session covers something else. Each visit still looks ordinary on its own, but the group covers the site unusually efficiently.

That is coverage efficiency.

fleet_coverage_efficiency =
  unique_paths_visited_by_cluster
  /
  total_path_slots_available_to_cluster

If the value is high, the cluster may be doing site mapping.

Current threshold:

> 0.65 → candidate for site_mapping

The useful question is simple:

Did this group explore the site like coordinated scouts rather than ordinary readers?
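A small sketch of the ratio, under one loud assumption: a "path slot" is taken to mean one page visit by one session in the cluster, so the denominator is the total number of visits the cluster made. The function name and the sample paths are illustrative only.

```python
def fleet_coverage_efficiency(cluster_paths):
    """cluster_paths: one list of visited paths per session in the cluster.

    Assumption: total path slots = total page visits across the cluster.
    """
    total_slots = sum(len(paths) for paths in cluster_paths)
    if total_slots == 0:
        return 0.0
    unique_paths = {path for paths in cluster_paths for path in paths}
    return len(unique_paths) / total_slots


# Ordinary readers overlap heavily: 3 unique paths across 6 visits.
readers = [["/", "/dashboard"], ["/", "/dashboard"], ["/", "/methodology"]]

# Scouts divide the work: 5 unique paths across 6 visits.
scouts = [["/a", "/b"], ["/c", "/d"], ["/e", "/a"]]

reader_score = fleet_coverage_efficiency(readers)  # 0.5
scout_score = fleet_coverage_efficiency(scouts)    # about 0.83
```

Under this reading, the reader group stays below the 0.65 threshold and the scout group exceeds it. A tiny group of unrelated humans could also score high, which is one reason the fleet gates run first.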

2. Did they all end at the same place?

Technical name: Terminal Concentration

The terminal path is the last page a session visits.

Normal readers leave from different places. One person exits from the homepage. Another leaves from the dashboard. Someone else disappears from the methodology page because attention spans are now apparently a subscription service.

An extraction fleet may behave differently.

Sessions may take different routes, but many of them end at the same target page.

That shared destination matters.

terminal_concentration =
  most_common_final_page_count
  /
  cluster_size

If many sessions in the same cluster end on the same page, the cluster may be targeting that page.

Current threshold:

> 0.65 → candidate for targeted_extraction

The useful question is:

Were these sessions wandering randomly, or were they all being pulled toward the same destination?
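The formula above can be sketched in a few lines. The function name and the sample cluster are illustrative, and sessions with no recorded pages are assumed to count toward cluster size without contributing a terminal page.

```python
from collections import Counter


def terminal_concentration(cluster_paths):
    """Share of the cluster whose last visited page is the most common one."""
    if not cluster_paths:
        return 0.0
    finals = [paths[-1] for paths in cluster_paths if paths]
    if not finals:
        return 0.0
    _, top_count = Counter(finals).most_common(1)[0]
    return top_count / len(cluster_paths)


cluster = [
    ["/", "/a", "/methodology"],
    ["/dashboard", "/methodology"],
    ["/", "/methodology"],
    ["/", "/dashboard"],
]
score = terminal_concentration(cluster)  # 3 of 4 sessions end on /methodology
```

Here three of four sessions share a final page, so the score is 0.75 and the cluster would be a candidate for targeted_extraction.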

3. Did they arrive like a metronome?

Technical name: Wave Coordination Score

Human traffic is messy.

People arrive through search, links, bookmarks, chats, random curiosity, and whatever disaster the internet is serving that hour. The gaps between visits are irregular.

A scripted campaign may arrive more regularly.

For example, one session every few minutes, with small random delays added to avoid looking too mechanical.

Each visit can still look normal.

But the timing pattern across the group can reveal coordination.

wave_coordination_score =
  CoV(inter_session_entry_intervals)

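CoV means coefficient of variation: the standard deviation of the gaps between session starts divided by their mean. In plain English, it measures how regular or irregular those gaps are. A lower value means more regular arrivals.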

Current threshold:

< 0.40 → candidate for wave_campaign

The useful question is:

Did these sessions arrive like people, or like a schedule?
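A minimal sketch of the score, assuming entry times are given in seconds and using the population standard deviation. Whether the real pipeline uses the population or sample form is not stated, so treat that choice, and the function name, as assumptions.

```python
from statistics import mean, pstdev


def wave_coordination_score(entry_times):
    """CoV of the gaps between consecutive session start times.

    Returns None when there are too few gaps to say anything useful.
    """
    times = sorted(entry_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return None
    average_gap = mean(gaps)
    if average_gap == 0:
        return None
    return pstdev(gaps) / average_gap


# Roughly one session every five minutes, with small delays added.
scheduled = [0, 300, 610, 905, 1200, 1510]

# Bursty, irregular arrivals.
organic = [0, 20, 900, 950, 4000, 4100]

scheduled_cov = wave_coordination_score(scheduled)
organic_cov = wave_coordination_score(organic)
```

The scheduled example lands far below 0.40, while the irregular one lands well above it.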

4. Did they jump to pages normal visitors do not discover?

Technical name: Orphan Access Ratio

Some pages are naturally reached through links.

Some pages are not.

If a session jumps to a page that has no observed inbound navigation path, that page is treated as a topological orphan.

Entry pages are excluded because someone can legitimately arrive directly from search, bookmarks, shared links, or copied URLs.

The stronger signal is mid-session orphan access.

orphan_access_ratio =
  orphan_pages_visited
  /
  total_pages_visited

A high orphan access ratio suggests that the cluster may be enumerating URLs rather than browsing normally.

The useful question is:

Did this group follow the site, or did it probe the site?
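One way to sketch this, with two assumptions stated up front: a static link graph stands in for "observed inbound navigation", and both the numerator and the denominator count only mid-session visits, since entry pages are excluded. The graph, paths, and function names are hypothetical.

```python
def linked_pages(link_graph):
    """Pages that at least one other page links to."""
    return {target for targets in link_graph.values() for target in targets}


def orphan_access_ratio(cluster_paths, link_graph):
    """Share of mid-session visits that land on pages with no inbound link."""
    reachable = linked_pages(link_graph)
    orphan_visits = 0
    total_visits = 0
    for paths in cluster_paths:
        for page in paths[1:]:  # skip the entry page of each session
            total_visits += 1
            if page not in reachable:
                orphan_visits += 1
    return orphan_visits / total_visits if total_visits else 0.0


# Hypothetical site graph: page -> pages it links to.
link_graph = {
    "/": ["/dashboard", "/methodology"],
    "/dashboard": ["/methodology", "/"],
}
cluster = [
    ["/", "/dashboard", "/hidden-a"],
    ["/hidden-b", "/methodology", "/hidden-c"],
]
ratio = orphan_access_ratio(cluster, link_graph)  # 2 of 4 mid-session visits
```

Note that /hidden-b does not count against the second session because it was the entry page.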

6. Campaign Intent

The four signals are combined to describe what the cluster appears to be doing.

These labels are not accusations about individual visitors.

They describe the shape of group behavior.

site_mapping

The cluster covers many different paths efficiently.

This looks like automated indexing or reconnaissance.

Plain English:

The group divided up the site and mapped it.

targeted_extraction

Many sessions end at the same destination.

This looks like automated harvesting focused on a specific page or endpoint.

Plain English:

The group took different routes, but kept ending at the same target.

wave_campaign

Sessions arrive at unusually regular intervals.

This looks like programmatic access designed to avoid simple rate limits.

Plain English:

The group arrived like a schedule, not like people.

coordinated_automation

The cluster looks coordinated, but no specific intent pattern is strong enough.

Plain English:

The group looks related, but the exact purpose is unclear.

When signals overlap, the current priority order is:

wave_campaign
> site_mapping
> targeted_extraction
> coordinated_automation
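The priority order above can be sketched as a simple cascade. This assumes the cluster has already passed the fleet gates and uses the three thresholds stated earlier. The orphan access ratio is left out because no threshold is given for it, and the function and constant names are illustrative.

```python
COVERAGE_THRESHOLD = 0.65   # above this: candidate for site_mapping
TERMINAL_THRESHOLD = 0.65   # above this: candidate for targeted_extraction
WAVE_COV_THRESHOLD = 0.40   # below this: candidate for wave_campaign


def classify_intent(coverage, terminal, wave_cov):
    """Pick one label for a cluster that already passed the fleet gates.

    wave_cov may be None when there were too few sessions to compute it.
    """
    if wave_cov is not None and wave_cov < WAVE_COV_THRESHOLD:
        return "wave_campaign"
    if coverage > COVERAGE_THRESHOLD:
        return "site_mapping"
    if terminal > TERMINAL_THRESHOLD:
        return "targeted_extraction"
    return "coordinated_automation"


# Timing wins when several signals fire at once.
label = classify_intent(coverage=0.9, terminal=0.9, wave_cov=0.1)
```

Because the checks run in order, a cluster that trips both the wave and mapping thresholds is reported as wave_campaign.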

The key point is that these are structural claims.

The system is not saying:

This individual visitor is definitely a malicious bot.

It is saying:

This group of sessions produced a pattern consistent with coordinated automation.

That is a narrower claim, and a more honest one.

7. What the Synthetic Harness Validated

Before pointing this at live traffic, the full pipeline runs against a synthetic harness of 310 sessions across 13 archetypes. Three archetypes specifically exercise the Layer 3 signals.

Mapping fleet: Each session receives a shuffled random slice of the available path pool. Across the fleet, the site is covered efficiently. No individual session needs to be anomalous. Coverage efficiency is high, producing a site_mapping classification when the fleet gates pass.

Extraction fleet: Sessions wander via random intermediate paths but always terminate at /methodology. The routes are diverse, but the destination is concentrated. Terminal concentration is high, producing a targeted_extraction classification when the fleet gates pass.

Wave fleet: Sessions are generated at fixed intervals with small jitter. Each session follows a plausible route. Wave coordination CoV is low, producing a wave_campaign classification when the fleet gates pass.

The synthetic harness validates that the signal logic works before live data enters the picture.

That is a controlled-conditions claim, not a live-world proof.
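For illustration, the three fleet archetypes could be generated roughly like this. The path pool, session counts, slice size, interval, and jitter are all made-up values, and the generator names are hypothetical. Only the /methodology target and the general shape of each archetype come from the description above.

```python
import random

# Hypothetical path pool: three named pages plus nine placeholder pages.
PATH_POOL = ["/", "/dashboard", "/methodology"] + [f"/page-{i}" for i in range(1, 10)]


def mapping_fleet(rng, n_sessions=4, slice_size=3):
    """Shuffle the pool once, then deal each session its own slice."""
    pool = PATH_POOL[:]
    rng.shuffle(pool)
    return [
        [pool[(i * slice_size + j) % len(pool)] for j in range(slice_size)]
        for i in range(n_sessions)
    ]


def extraction_fleet(rng, n_sessions=10, target="/methodology"):
    """Random intermediate hops, always terminating at the target page."""
    candidates = [path for path in PATH_POOL if path != target]
    return [
        rng.sample(candidates, k=rng.randint(1, 3)) + [target]
        for _ in range(n_sessions)
    ]


def wave_fleet_entry_times(rng, n_sessions=10, interval=300.0, jitter=15.0):
    """Session start times at fixed intervals with small uniform jitter."""
    return [i * interval + rng.uniform(-jitter, jitter) for i in range(n_sessions)]


rng = random.Random(7)  # seeded so runs are repeatable
mapping = mapping_fleet(rng)
extraction = extraction_fleet(rng)
wave_times = wave_fleet_entry_times(rng)
```

With these defaults the mapping fleet covers all twelve pool paths with no overlap, every extraction session ends on /methodology, and consecutive wave arrivals stay within 30 seconds of the nominal 300-second gap.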

8. What Makes This Useful to Security Teams

Per-session bot detection at the WAF or CDN layer catches the obvious cases and is necessary. Fleet clustering adds a second layer that catches coordinated campaigns the per-session layer misses. Both are useful.

The Layer 3 contribution addresses a specific gap: the retrospective classification of campaign intent.

Why does intent matter?

Because the response to a site_mapping campaign is different from the response to targeted_extraction. A site indexing pass might be competitive intelligence — worth monitoring but not necessarily blocking. Targeted extraction of a specific high-value page is a different threat posture. A wave campaign using rate-limit-aware timing suggests that simple rate limits may not be enough.

The signals are only visible retrospectively, across sessions. No real-time system can compute coverage efficiency for sessions that have not happened yet. This is by design: the detection pipeline runs offline, writes verdicts to a database, and produces aggregate reports that a security analyst can read.

The output of the system is information for human judgment, not automated blocking.

9. The Privacy Model

Studying traffic behavior without extracting individual-level data requires some care.

MacroCanary’s collector never writes:

Raw IP addresses
Full user-agent strings
Raw dwell or scroll values
Full TLS fingerprints

Dwell and scroll values are stored as buckets. JA4 is stored only as a short prefix. Session IDs are random tokens, not names or account identifiers.
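A sketch of what this kind of coarsening might look like at the collector. The bucket boundaries, the quartile scheme for scroll depth, the 10-character prefix length, and all function names are assumptions. The JA4 string is an illustrative value only.

```python
import secrets

# Hypothetical dwell buckets: (upper bound in seconds, label).
DWELL_BUCKETS = [(2, "lt_2s"), (10, "2_10s"), (60, "10_60s")]


def bucket_dwell(seconds):
    """Map a raw dwell time to a coarse label; the raw value is discarded."""
    for upper, label in DWELL_BUCKETS:
        if seconds < upper:
            return label
    return "gte_60s"


def bucket_scroll(depth_fraction):
    """Map scroll depth in [0, 1] to a quartile index 0..3."""
    clamped = min(max(depth_fraction, 0.0), 1.0)
    return min(int(clamped * 4), 3)


def ja4_prefix(ja4, length=10):
    """Keep only a short leading slice of the fingerprint string."""
    return ja4[:length]


def new_session_id():
    """Random token with no link to a name or account."""
    return secrets.token_urlsafe(16)


record = {
    "session": new_session_id(),
    "dwell": bucket_dwell(7.3),
    "scroll": bucket_scroll(0.6),
    "ja4": ja4_prefix("t13d1516h2_8daaf6152771_b186095e22b6"),
}
```

The stored record then carries a bucket label, a quartile index, and a truncated fingerprint rather than any raw measurement.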

The system is pseudonymous, not anonymous. That distinction matters. Pseudonymous data can still describe behavior over time, so the system keeps collection narrow and reports aggregate findings.

Raw event data is retained briefly through an automatic cleanup path. The report output is aggregate. It does not need to publish per-session identifiers.

The evidence claim ladder mechanically limits what the output can say based on the input. On synthetic data, the report states: “Pipeline validated against controlled scenarios.” On live data below the minimum fleet gates, it cannot output a fleet verdict.

The system cannot produce a claim the data does not support.
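The ladder idea can be sketched as a function whose strongest possible output depends on its input. The gate values, parameter names, and the wording of the last two messages are hypothetical. Only the synthetic-data sentence is quoted from above.

```python
MIN_LIVE_SESSIONS = 500  # hypothetical minimum live-session volume
MIN_CLUSTER_SIZE = 5     # hypothetical minimum fleet size

INSUFFICIENT = "Insufficient evidence for a fleet verdict."


def strongest_claim(source, session_count, cluster_size, intent=None):
    """Return the strongest statement the report is allowed to make."""
    if source == "synthetic":
        # Synthetic runs can only ever validate the pipeline itself.
        return "Pipeline validated against controlled scenarios."
    if session_count < MIN_LIVE_SESSIONS or cluster_size < MIN_CLUSTER_SIZE:
        return INSUFFICIENT
    if intent is None:
        return INSUFFICIENT
    return f"Pattern consistent with coordinated automation ({intent})."


print(strongest_claim("synthetic", 310, 20, "site_mapping"))
print(strongest_claim("live", 120, 8, "site_mapping"))
```

Both calls stop short of a fleet verdict: the first because the data is synthetic, the second because the live volume is below the assumed gate.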

10. What This Does Not Prove

This is worth being explicit about.

It does not prove that any given session is a human or a bot. Individual session classification always carries uncertainty.
It does not beat or circumvent enterprise bot management products. It complements them.
It does not predict recessions. The macro dashboard is educational, not advisory.
It does not issue fleet verdicts until the minimum-volume gates pass. Below that threshold, the pipeline reports insufficient evidence.

The strongest available claim is:

Retrospective analysis of aggregate path topology can find patterns consistent with coordinated reconnaissance, across an observation window where individual sessions may appear normal.

The system is designed to be honest about the epistemic limit.

11. What’s Next

The pipeline is deployed. MacroCanary is live at macrocanary.pages.dev, telemetry is collecting, and the scheduled detection run fires weekly.

A reader visiting MacroCanary will see the public macro dashboard, not a live bot-detection console. That is intentional. The detection pipeline runs offline: it reads telemetry, applies the fleet-detection logic, and produces aggregate reports. Raw session data and per-session identifiers are not exposed publicly.

For now, the public evidence is the deployed site, the public methodology, and the synthetic harness results described here. The implementation code is not public yet. The pipeline remains in synthetic fallback mode until the minimum live-session volume is reached. After that, it switches to live D1 data and produces live fleet verdicts only when the fleet gates pass.

The core methodological claim is already validated under controlled conditions on the synthetic harness. The interesting question, which only live traffic can answer, is how often Layer 3 signals appear in the wild, and whether they correlate with canary hits that provide strong positive-label evidence.

A fleet that trips a canary is interesting.

A fleet whose traversal topology independently scores as site_mapping is also interesting.

A fleet that does both is a stronger claim than either signal alone.

That is the thing about waiting for a real site to get real traffic: the data has opinions.

MacroCanary is live at macrocanary.pages.dev. The detection methodology and synthetic validation approach are described here; the implementation repository is private for now.

Bots Behave Alone. Fleets Betray Themselves. was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.