Author: Berend Watchus. Independent non-profit AI & Cybersecurity Researcher. Publication for System Weakness online magazine.

May 25, 2026.

https://arxiv.org/pdf/2605.23448
https://arxiv.org/abs/2605.23448

Hunters Don’t Email the Prey: A Hypergame Reading of Zhang’s AI Security Publication Gap

A response to “AI Security Research Should Better Incentivize Defense Research” (Zhang, arXiv:2605.23448v1, May 2026)

Opening: the measurement is correct, the inference is not

Youqian Zhang’s recent paper makes an empirical claim that deserves to be taken seriously and a structural claim that does not survive careful examination.

The empirical claim is solid. Across 16 systematization-of-knowledge papers covering 1,162 attack/defense-classified citations, Zhang documents that attack papers outnumber defense papers by 1.24:1, with an average ratio of 2.15:1 across 21 additional survey papers. The imbalance is visible at venue level (security venues skew toward attack publication; AI venues skew slightly toward defense), at subfield level (some areas are extremely attack-skewed; Suya et al.’s SoK on black-box adversarial attacks is skewed 34.4× toward attack papers), and over time (the gap widened from 2017 to 2022). The measurement is careful, the methodology is defensible, and the numbers are what the numbers are.

The structural claim is the one I want to engage. Zhang infers from the measured imbalance that defensive research is “structurally lagging,” that the field exhibits a “structural lag in defensive progress,” and that AI security research needs to “better incentivize defense research” by raising the academic profile of defensive contributions, improving evaluation standards for defenses, and treating defense as a “first-class scientific contribution.” His prescription assumes that publication count is a reasonable proxy for defensive capacity and that the imbalance he measures reflects a genuine deficit in defensive work that academic incentive reform could meaningfully address.

This is where the inference breaks down. The publication imbalance Zhang documents is real, but the structural conclusion he draws from it requires assumptions that hypergame theory predicts are false. Specifically: the assumption that publication-based measurement is unbiased with respect to the actors and operations that constitute the field’s defensive capacity, and the assumption that the academic publication game is the dominant game being played by the field’s defenders. Both assumptions, on examination, turn out to be wrong in ways that systematically bias the measurement against detecting actual defensive activity.

The article that follows is not a refutation of Zhang. The count is what the count is, and Zhang’s measurement work is a useful contribution to understanding what’s visible in the academic literature. What I want to argue is that the visible literature is one layer of a multi-layer game whose structure hypergame theory predicts and the historical record of cryptography confirms, and that the imbalance Zhang measures is the expected signature of that deeper structure rather than evidence of the defensive deficit he infers.

The contribution I’m claiming is narrow: not that hypergame theory is new (it has decades of literature behind it from Bennett, Fraser, Hipel, and others), and not that the broader observation that defensive work goes unpublished is novel (Zhang himself concedes this in Section 4.6’s industry-academia gap). What I’m arguing is that the specific application of hypergame theory to Zhang’s publication-imbalance methodology, combined with the cryptography historical precedent, reveals that his framework systematically cannot see what it would need to see to support its structural conclusion. The imbalance is real; it is simply not evidence of the deficit Zhang infers from it.

The hypergame framework

Hypergame theory was developed in the 1970s and 80s by Peter Bennett, Niall Fraser, Keith Hipel, and others, originally to analyze strategic conflicts where classical game theory failed. Its foundational insight is that rational actors in adversarial settings often have different perceptions of what game is being played, and that this perceptual asymmetry is itself a strategic resource. Classical game theory assumes common knowledge of the game structure — both players know the rules, the players, and the payoffs. Hypergame theory drops this assumption and asks: what happens when one player’s model of the game is incorrect, and the other player knows it?

The answer, in adversarial domains, is that the player with the more accurate model of both games (theirs and the opponent’s misperception of theirs) systematically outperforms the player with the less accurate model, even when the second player has material advantages. This is a structural reason guerrilla forces can defeat conventional militaries that are materially superior to them, why insurgent campaigns can succeed against industrial states, and why intelligence services with smaller budgets can run successful operations against larger ones. The asymmetry isn’t in resources. It’s in perception of the game.
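The outperformance claim can be made concrete with a toy first-level hypergame, sketched here in Python. Everything in it is a hypothetical illustration, not a model taken from the hypergame literature or from Zhang: a defender chooses between a real system and a decoy, an attacker chooses whether to probe, and all payoff numbers are invented. The attacker’s model of the game simply omits the decoy option. The sketch compares that outcome with the mixed Nash equilibrium the same two players would reach if both knew the true game.

```python
from fractions import Fraction as F

# Hypothetical payoffs (defender, attacker), invented for illustration only.
PAYOFFS = {
    ("real", "probe"): (F(-3), F(3)),   # attacker compromises a real system
    ("real", "hold"): (F(0), F(0)),     # nothing happens
    ("decoy", "probe"): (F(2), F(-2)),  # defender gains intelligence, attacker is exposed
    ("decoy", "hold"): (F(-1), F(0)),   # decoy costs upkeep and catches nothing
}


def common_knowledge_equilibrium(p):
    """Fully mixed Nash equilibrium of the 2x2 game, from indifference conditions.

    Assumes the game has no pure-strategy equilibrium, which holds for PAYOFFS.
    Returns (P(decoy), P(probe), defender value, attacker value).
    """
    d = {k: v[0] for k, v in p.items()}
    a = {k: v[1] for k, v in p.items()}
    # q = P(defender plays decoy) that leaves the attacker indifferent.
    gain_real = a[("real", "probe")] - a[("real", "hold")]
    gain_decoy = a[("decoy", "probe")] - a[("decoy", "hold")]
    q = gain_real / (gain_real - gain_decoy)
    # r = P(attacker probes) that leaves the defender indifferent.
    edge_probe = d[("decoy", "probe")] - d[("real", "probe")]
    edge_hold = d[("decoy", "hold")] - d[("real", "hold")]
    r = -edge_hold / (edge_probe - edge_hold)
    # At equilibrium each player's value equals the value of either pure move.
    defender_value = r * d[("real", "probe")] + (1 - r) * d[("real", "hold")]
    attacker_value = (1 - q) * a[("real", "hold")] + q * a[("decoy", "hold")]
    return q, r, defender_value, attacker_value


def hypergame_outcome(p):
    """First-level hypergame: the attacker's model of the game omits the decoy."""
    # In the attacker's perceived game every target is real, so it best-responds to "real".
    attacker_move = max(("probe", "hold"), key=lambda m: p[("real", m)][1])
    # The defender models the attacker's perception and best-responds in the true game.
    defender_move = max(("real", "decoy"), key=lambda m: p[(m, attacker_move)][0])
    return defender_move, attacker_move, p[(defender_move, attacker_move)]


if __name__ == "__main__":
    q, r, dv, av = common_knowledge_equilibrium(PAYOFFS)
    print(f"common knowledge: P(decoy)={q}, P(probe)={r}, defender={dv}, attacker={av}")
    print("hypergame:", hypergame_outcome(PAYOFFS))
```

With these invented payoffs, the common-knowledge equilibrium has the defender deploying decoys with probability 3/5, the attacker probing with probability 1/6, and the defender expecting -1/2 per round. In the hypergame the attacker probes with certainty, the defender deploys the decoy, and the defender collects +2. Resources are identical in both runs; only the attacker’s perception changed.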

Applied to cybersecurity, hypergame theory predicts that mature adversarial domains develop multi-layered structures where different actors are playing different games at different levels of awareness about each other. The visible game — academic publication, public conferences, regulatory disclosure — is one layer. Below it sit other games whose players are deliberately invisible from the perspective of the visible layer. Hypergame theory says: this multi-layer structure is a necessary feature of any mature adversarial domain, not a contingent failure of transparency. It exists because perception asymmetry is strategically valuable, and rational actors invest in maintaining it.

This is the framework Zhang’s analysis is missing. His methodology measures the visible publication layer as if it were the entire field. Hypergame theory predicts that the visible layer is one slice of a deeper structure, that the deeper structure is largely defensive in character, and that it is invisible to publication-based measurement by design. The imbalance Zhang measures is real, but it’s a measurement of the visible layer, not of the field’s defensive capacity.

The hunter principle: hunters don’t email the prey

The first concrete consequence of the hypergame framework is what I’ll call the hunter principle. A hunter does not send out emails to the animals they will hunt later, explaining that they will be using fake mating calls. The reason is obvious once stated: the value of the mating call as a hunting tool depends entirely on the prey not knowing it’s fake. Disclosure destroys the operation. The hunter can publish the existence of the technique only after the technique is no longer in use, and even then disclosure costs nothing only because the operation is already complete.

This generalizes. Any defensive or offensive operation whose strategic value depends on the adversary’s false model of the environment has an absolute interest in preserving that false model. Publishing the method destroys the asymmetry that makes the method work. The technical artifact and the strategic position are different things; disclosure preserves the artifact but eliminates the position.

The principle applies across a wide range of operations: honeypots, intelligence dangles, controlled-leak operations, seduction operations, sting operations, false-flag operations, and a substantial class of defensive techniques in cybersecurity. The seduction operation only works if the target believes they are in the love game; the moment they update to “I am being targeted for kompromat,” the operation ends. The honeypot only works if the attacker believes they have penetrated a real system; the moment they realize it’s a trap, everything they do afterward is theater staged for the defender, and the intelligence value is gone. The attribution capability only works if the attacker doesn’t know they’re being attributed; the moment the methodology is published, sophisticated attackers begin laundering their signatures.

This is the structural reason a large class of valuable defensive work cannot be published. It’s not a cultural problem of academic incentives. It’s not a failure of publication norms. It’s a property of what the operations are. The hunter cannot email the prey, ever, regardless of how much academic incentive structure rewards publication. The technique exists only by virtue of the prey’s ignorance of it.

Zhang’s framework treats the publication imbalance as a fixable cultural problem. The hunter principle says: a portion of the imbalance is structurally fixed, not by culture but by the nature of the operations themselves. Whatever incentive reform Zhang’s framework proposes, it cannot reach perception-dependent defensive operations without destroying them. The class of defenses that can be published is necessarily smaller than the class of defenses that can exist.

The cryptography precedent

The hypergame framework’s predictions are not speculative. They have been confirmed historically in an adjacent field with similar actor structures: cryptography.

Differential cryptanalysis was published by Eli Biham and Adi Shamir in 1990 as a novel attack on block ciphers, treated by the academic community as a significant discovery. Four years later, Don Coppersmith — a member of the IBM team that designed DES in the 1970s — published a paper revealing that the IBM team had known about differential cryptanalysis (they called it the “T-attack”) and had specifically designed DES’s S-boxes to resist it. The NSA, involved in DES standardization, had asked IBM not to publish the technique. The capability had existed inside the classified world for nearly two decades before the academic world independently rediscovered it.

Public-key cryptography followed the same pattern. The Diffie-Hellman key exchange (1976) and what became RSA (1977) were treated as foundational discoveries of the academic cryptographic community. In 1997, GCHQ declassified work by James Ellis, Clifford Cocks, and Malcolm Williamson showing that the same techniques had been developed inside GCHQ in the early 1970s, classified, and shelved. The British government had possessed what became public-key cryptography years before its public discovery, and the academic community had no way to know.

These are not conspiracy theories. They are documented historical facts, confirmed by the actors who maintained the original secrecy. The NSA acknowledged its role in DES design indirectly through Coppersmith’s authorized publication. GCHQ formally declassified the Ellis-Cocks-Williamson work. The pattern is established: in at least one adjacent technical field with a similar actor structure, the visible academic timeline lagged the actual capability timeline by years to decades, and the academic community operated inside a perception of the threat landscape that was systematically incomplete.

Three properties of the cryptography case generalize directly to AI security:

The hidden-capability layer is real and large. Not a thin border of classified material around an otherwise public field, but a substantial body of work developed and maintained inside institutional contexts where publication was structurally inappropriate.

Suppression durations are decades, not months. The hypergame position was valuable enough to defend for twenty years across personnel changes, retirements, and political shifts. This calibrates the timescale on which the same dynamic should be expected to operate in AI security — not months or years, but potentially decades.

Revelation happens reactively, not voluntarily. The hidden layer reveals itself only after the public layer has caught up, and even then only partially and only when there is a triggering reason to acknowledge prior knowledge. Cryptographic capabilities developed inside classified contexts that the public field has not independently rediscovered presumably remain undisclosed, because there has been no triggering event. The visible history of the field is the subset of the actual history that the public eventually rediscovered.

There is no specific reason to expect AI security to behave differently than cryptography behaved. The actor structures are similar: states with intelligence interests, large technology companies with proprietary capabilities, an academic research community, and a population of independent researchers. The strategic stakes are comparable, possibly higher. The hypergame framework predicts the same multi-layer structure in both domains. The cryptography case confirms the structure historically. The reasonable expectation is that AI security has analogous hidden layers, and that publication-based measurement of the field cannot see them.

The actor asymmetry: who plays the publication game?

The publication imbalance Zhang measures is also biased by the populations of actors who actually play the publication game, and this turns out to do more analytical work than the perception-dependence argument alone.

Consider the asymmetry between an offensive researcher publishing an attack and a civilian company publishing a defense. A hacker who finds a clever attack technique can share the trick the way a footballer shares a step-over — perform it, get the applause, move on to the next one. The marginal cost of publication is essentially zero. The hacker has no fiduciary duty to anyone harmed by the disclosure, no regulatory exposure, no competitor positioned to use the disclosure against them commercially, no customer base requiring reassurance, no board or shareholders asking questions, no brand damaged by association with the topic, and a portfolio that benefits from one more demonstrated capability. The publication decision is free. There are even researchers for whom not publishing would be costly, because their professional standing depends on visible track records.

A multi-billion-dollar company cannot afford to be seen as stupidly vulnerable. The same disclosure that earns the hacker reputational capital costs the company reputational capital, often catastrophically. The company isn’t praised for finding the vulnerability — they’re indicted for having had it in the first place. The reputational physics are inverted: the hacker gains by displaying technical sophistication, the company loses by displaying that they needed it. Same technical content, opposite reputational valence, and the asymmetry follows from role assignments that no amount of cultural reform can change.

But the asymmetry runs deeper than reputational physics. To see it, it helps to enumerate the actual populations that hold defensive knowledge in the field, and to ask of each: what game are they playing, and does publication serve it?

State and national security actors. Signals intelligence agencies, defense ministries, and national cyber commands have been studying AI security since before it had a name. NSA, GCHQ, Unit 8200, the Chinese Ministry of State Security, French DGSE, Russian services, and their counterparts run substantial programs on adversarial machine learning, model extraction, training-data inference, attribution methodology, and defensive countermeasures. None of this surfaces as arXiv preprints. Some fraction eventually leaks (Snowden-style), gets declassified after decades (the GCHQ public-key crypto release of 1997 is the canonical example), or appears in sanitized form in unclassified contractor reports. Most stays inside the wall, indefinitely, because the strategic value of these capabilities depends entirely on adversary ignorance of them. The cryptography precedent says: expect this layer to be large, durable, and structurally invisible until specific declassification events force partial disclosure. Zhang’s framework cannot see any of this.

Frontier labs and large technology companies. Anthropic, OpenAI, DeepMind, Meta, xAI, and the major cloud providers (AWS, Azure, GCP) run internal red teams, model-safety teams, and threat-intelligence functions that continuously analyze AI security at a level of sophistication that academic researchers usually cannot match — partly because the labs have access to model internals, training data, deployment telemetry, and incident data that academics don’t, and partly because they have the resources to fund sustained analysis. Some of their work surfaces as published model cards, safety research papers, or responsible-disclosure write-ups. Most doesn’t. The internal threat models, detection methodologies, attribution capabilities, and operational defensive infrastructure remain proprietary by design. The economic incentive to publish is weak — revealing defenses teaches attackers, and revealing detection capabilities teaches them how to evade. The economic incentive to do the analysis is enormous because the cost of a breach or a high-profile misuse event is potentially existential.

Threat-intelligence firms and security vendors. Mandiant, CrowdStrike, Microsoft Threat Intelligence, Palo Alto Networks Unit 42, SentinelOne, Recorded Future, and analogous firms operate as commercial threat-intelligence operations whose business model is precisely the productization of defensive analysis. They publish a calibrated subset of their findings — APT campaign reports, indicator-of-compromise feeds, conference presentations — but the underlying methodology, the unpublished portion of their analysis, and the contents of their commercial threat-intel feeds remain inside customer relationships. The publication serves marketing and credibility; the actual product is what doesn’t get published.

ISACs and sectoral threat-sharing organizations. Information Sharing and Analysis Centers such as the Financial Services ISAC, Health-ISAC, Electricity-ISAC, Aviation-ISAC, Automotive-ISAC, and analogous bodies across roughly two dozen critical infrastructure sectors operate as closed sharing circles where member organizations exchange threat intelligence, defensive analysis, and incident details that none of them would share publicly. CISA’s joint advisory program in the U.S., NCSC’s equivalent in the U.K., and similar national-level coordination bodies sit on top of this layer. The volume of defensive analysis circulating through these channels is large; the visibility to outside observers is essentially zero. ISACs exist precisely because the default civilian non-cooperation equilibrium (“let our competitors bleed”) produces sector-level vulnerability, and the institutional response is to convert it into a closed cooperative equilibrium that doesn’t leak to the public.

Civilian companies with security as infrastructure. Banks, retailers, healthcare systems, energy companies, transportation operators, manufacturers, telecoms — the broad civilian economy that runs on defended IT systems. Most of these organizations do not produce publishable research, because their security function is operational infrastructure embedded in a non-security core business. Their defensive work exists as specific firewall rules on specific networks protecting specific data with specific access patterns. To turn this work into a paper would require additional abstraction effort the company has no commercial reason to fund. They are not suppressing publishable findings; their work exists in a form that was never paper-shaped to begin with. This is the largest population in the field by sheer count of organizations and by aggregate defensive surface area, and it is almost entirely outside Zhang’s measurement.

Underground and gray-hat communities. Bug-bounty hunters, independent red teamers, gray-hat researchers, and the various trust networks that connect them — Discord servers, private forums, Telegram groups, back-channels at DEF CON / Black Hat / CCC — read attack papers carefully and also do defensive inversions, but share findings within networks rather than in academic venues. Some of this work surfaces as conference talks; much of it stays in the network. This layer often anticipates academic findings by years and operates with its own internal reputation economy that does not map to academic citation metrics.

Criminal organizations. Sophisticated cybercrime groups do their own threat analysis, including defensive analysis of the systems they target (to evade detection) and counter-defensive analysis of the law enforcement and threat-intel apparatus that pursues them. This work is hidden by definition. Some of it surfaces in indictments, post-arrest forensics, and ransomware-group leaks (Conti’s internal communications, for instance, have offered occasional windows into this layer), but the operational understanding is largely unpublished.

Independent analysts and bridge-builders. A small population of independent researchers — including those who write for venues like the OSINT Team magazine where I have published — read both the public attack literature and what’s visible of the defensive landscape, and produce synthesis work that connects them. This layer is partially visible (the published pieces are public) but represents a fraction of total field analysis and operates on its own incentive structure separate from both academic publication and corporate suppression.

The point of enumerating these is to make concrete what would otherwise be abstract. When Zhang’s framework counts “defense papers” and finds them outnumbered, the population producing those papers is overwhelmingly the academic research community plus a small calibrated subset of security-vendor publications. Every other population in the list above either does not produce papers, produces them rarely and selectively, or produces analysis in forms (incident reports, threat-intel feeds, internal documentation, sectoral advisories, indictments, conference back-channels) that Zhang’s citation graph cannot register.

The visible defense literature is not a sample of the field’s defensive activity. It is a sample of the field’s defensive activity that comes from actors who happen to be playing the academic publication game. Academic researchers play that game by definition. Security vendors play it strategically when it serves their commercial position. Everyone else — the state apparatus, the frontier labs’ proprietary work, the threat-intel firms’ unpublished analysis, the ISAC sharing networks, the broad civilian sector, the underground communities, the criminal-side analysis, and most of the independent layer — mostly doesn’t, and not for reasons academic incentive reform can change.
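The sampling argument can be illustrated with a deliberately crude model. In the Python sketch below, the population labels come from the list above, but every number is invented: how much attack-side and defense-side work each population does, and what share of that work ever becomes a countable paper. Nothing here is calibrated to Zhang’s data or to any real measurement.

```python
# Hypothetical populations and numbers, invented for illustration only.
# name: (attack-side work units, defense-side work units, share of work that becomes a paper)
POPULATIONS = {
    "academic researchers": (120, 90, 0.9),
    "security vendors": (10, 60, 0.3),
    "frontier labs": (20, 150, 0.05),
    "state actors": (80, 120, 0.0),
    "civilian companies": (0, 400, 0.01),
    "independent / gray-hat": (60, 20, 0.5),
}


def ratios(populations):
    """Return (true attack:defense ratio, ratio visible in the publication record)."""
    true_attack = sum(a for a, _, _ in populations.values())
    true_defense = sum(d for _, d, _ in populations.values())
    # Only the published share of each population's work is countable.
    seen_attack = sum(a * rate for a, _, rate in populations.values())
    seen_defense = sum(d * rate for _, d, rate in populations.values())
    return true_attack / true_defense, seen_attack / seen_defense


true_ratio, visible_ratio = ratios(POPULATIONS)
print(f"true attack:defense    = {true_ratio:.2f}:1")
print(f"visible attack:defense = {visible_ratio:.2f}:1")
```

With these hypothetical inputs, defense-side work outnumbers attack-side work by almost three to one (a true ratio of about 0.35:1), yet the publication record shows attack papers ahead at about 1.18:1. The sketch understates the effect if anything, since it applies one publication rate to both kinds of work inside a population, whereas the hunter principle suggests defense-side rates should be lower still.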

There is one place where civilian companies do show up in something resembling publication: regulatorily mandated compliance documentation. This is the standards regime, which deserves its own analysis, because what civilian defensive work looks like when it is forced into public form turns out to confirm the hypergame prediction in a particularly sharp way. Before turning to that, though, one more aspect of the actor asymmetry is worth naming, which has to do with how civilian companies handle attacks they have already detected.

When a civilian company is attacked and reverse-engineers the attack — Nike, Adidas, Apple, Google, a bank, a hospital, a transport operator — they generally do not publish what they found. The reasons are partly reputational (disclosure makes them look stupidly vulnerable), partly legal (disclosure creates regulatory and litigation exposure), and partly competitive: if Adidas figures out a novel attack vector that hit them, and Nike is exposed to the same vector but hasn’t been hit yet, Adidas has no commercial incentive to tell Nike. Letting your competitor bleed from a wound you already healed is, in pure market-share terms, advantageous.

This is the classical competitive game: a flat board with common-knowledge rules, where market share is roughly zero-sum and the other party’s loss is straightforwardly your gain. The accounting is direct and symmetric. But security is not played on this board. Security operates on a hypergame board where a third party (the attacker population) is partially hidden, where information structures are asymmetric in ways neither competitor can fully map, and where outcomes that look zero-sum at the company level are negative-sum at the sectoral level. When Nike gets breached and Adidas doesn’t, the competitive board shows Adidas winning. The hypergame board shows the threat actor gaining sectoral mapping and refined capability that makes Adidas the next logical target. The “let your competitor bleed” calculation is rational on the first board and self-destructive on the second, and the companies playing it are usually not distinguishing between the two boards. ISACs are essentially institutional attempts to make the second board visible to actors whose organizational machinery is optimized for the first.
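The two boards can be written down as two ledgers for the same decision. In the Python sketch below, Adidas has already healed and must decide whether to warn Nike. The competitive ledger records only the market-share transfer; the hypergame ledger also records costs driven by the hidden third player, the attacker. All magnitudes and probabilities are hypothetical, chosen only to show how the sign of the answer can flip.

```python
# Hypothetical magnitudes in arbitrary loss units, invented for illustration only.
MARKET_SHIFT = 2.0         # share Adidas gains while Nike recovers from a breach
NIKE_BREACH_LOSS = 10.0    # Nike's direct cost of being breached
ADIDAS_BREACH_LOSS = 10.0  # Adidas's cost if the attacker's next campaign reaches it
P_HIT_BEFORE = 0.05        # Adidas's chance of being hit next, before the attacker maps the sector
P_HIT_AFTER = 0.40         # ... after the attacker has mapped the sector through Nike


def outcomes(share_with_nike, hypergame_board):
    """Payoffs to Nike and Adidas after Adidas decides whether to warn Nike.

    The competitive board records only the market-share transfer; the hypergame
    board also records costs driven by the hidden third player, the attacker.
    """
    if share_with_nike:
        # Nike closes the vector in time: no breach, no share shift,
        # and the attacker gains no new sectoral mapping.
        return {"Nike": 0.0, "Adidas": 0.0}
    nike, adidas = -MARKET_SHIFT, MARKET_SHIFT
    if hypergame_board:
        nike -= NIKE_BREACH_LOSS
        # The attacker's refined capability raises Adidas's expected future loss.
        adidas -= (P_HIT_AFTER - P_HIT_BEFORE) * ADIDAS_BREACH_LOSS
    return {"Nike": nike, "Adidas": adidas}


for board in (False, True):
    best = max((False, True), key=lambda share: outcomes(share, board)["Adidas"])
    silent = outcomes(False, board)
    label = "hypergame" if board else "competitive"
    print(f"{label:11} board: Adidas should share={best}, "
          f"sector total if silent={sum(silent.values()):+.1f}")
```

On the competitive ledger, staying silent is worth +2 to Adidas and the sector total is zero. On the hypergame ledger, with these invented numbers, silence costs Adidas about 1.5 in expected future losses net of the share gain, the sector total falls to -13.5, and warning Nike becomes the better move.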

There is also a deeper strategic position available to sophisticated victims: rather than patching the attacker out, leave the access channel open in a controlled way and monitor it. The attacker believes they still have undetected access; the victim has converted the attack into a surveillance position on the attacker. Now they don’t know that we know. In intelligence terminology this is called running the access. Mandiant’s APT1 report (2013) on PLA Unit 61398 was built substantially on this kind of monitoring across multiple victim networks. The detection-to-disclosure gap in many major APT disclosures (NOBELIUM/SolarWinds, HAFNIUM, various Russian and Chinese groups) often overlaps with this running-the-access window. This work cannot be published in operational detail without burning future running-the-access opportunities, which means the most sophisticated victim-side defensive analysis is exactly the analysis that cannot surface in academic publication.

All of this compounds Zhang’s measurement bias. The civilian-actor populations hold most of the operational defensive surface area in the field. They do not publish for structural reasons. Some of them produce analysis that they actively cannot publish without destroying its operational value. The visible defense literature is therefore not undercounting these populations by a small factor — it is largely failing to register them at all. The publication imbalance Zhang measures is the visible signature of an underlying population distribution in which most defensive analysis happens outside the publication game entirely.

Security vendors are different. CrowdStrike, Mandiant, Palo Alto Networks, and SentinelOne have security as their core business, and publication serves their commercial position. Large platform companies publish for a related reason: Google publishes Project Zero work because its developer ecosystem benefits from being seen as security-serious, Microsoft publishes threat intelligence because its enterprise software business benefits from security positioning, and Apple publishes selective security research because consumer trust rewards visible security investment. These are cases where secondary commercial alignment overcomes the default of non-publication. The exceptions prove the rule: civilian companies publish defensive work when there’s a specific commercial reason that makes publication serve the core business. Absent that reason, the default holds.

The standards: the floor is raised and the ceiling is documented

There is one place where civilian defensive work is forced into formal public documentation: regulatorily mandated security standards. PCI-DSS for payment processing, NIST SP 800-53 for federal systems, ISO 27001 for general information security management, HIPAA’s Security Rule for protected health information, the EU AI Act’s security provisions for high-risk AI systems, NIS2 for European critical infrastructure. These standards exist for genuinely good reasons and they do real work. They deserve to be treated honestly before they’re critiqued.

Consider a high-quality lock — a Medeco, an Abloy Protec, a Bowley. The lock is genuinely good security against the broad population of attackers who would otherwise enter through the door. Most burglars walk away when they encounter a serious lock. Most opportunistic intruders don’t know how to pick anything beyond a basic pin tumbler. The lock works. It works precisely because it raises the floor of competence required to defeat it, and most attackers don’t have that competence.

But the lock’s design is published. Locksport communities study every major lock model. Forensic locksmiths publish detailed analyses. Security researchers reverse-engineer mechanisms and document weaknesses. The same Medeco that defeats the overwhelming majority of burglars is, to the small population of highly skilled lockpickers, a known mechanism with documented attack methods. The lock is simultaneously very effective against the broad attacker population and not effective against the narrow population of attackers who have specifically studied it.

This is the trade-off the standards make, scaled up to the regulatory level. PCI-DSS v4.0 specifies in operational detail which encryption algorithms are acceptable, what key management procedures are required, how network segmentation must be structured, what authentication factors are required for which kinds of access, how often vulnerability scans must be performed, what logging must capture, what incident response procedures must be in place, and what testing methodologies must validate controls. The document runs about 360 pages. It is publicly available. Every payment processor at scale must comply with it. NIST SP 800-53 Rev. 5 catalogs more than a thousand controls and control enhancements across 20 families, in over 480 pages, freely downloadable. ISO 27001 specifies a management system and a catalog of controls, with detailed implementation guidance in the companion ISO/IEC 27002, sold to anyone who buys the standard. HIPAA’s Security Rule is in the U.S. Code of Federal Regulations. The EU AI Act is in the Official Journal of the European Union.

The standards do genuinely good work for the population they serve. Most organizations face mostly opportunistic threats. The standards stop most opportunistic threats. The floor-raising effect for that population is large and important, and a thoughtful critique has to acknowledge this before naming the cost.

The cost is real, though, and it deserves to be named honestly. By the strict criteria of hypergame strategy in adversarial domains, the publication of standardized defensive doctrine is a major violation: in a single act it surrenders information asymmetry, an unpredictable response, an ambiguous capability ceiling, and concealed blind spots. The defender’s strategic posture should be known only to the defender; the published standards make it known to anyone who downloads the PDF. The defender’s response function should not be derivable by the attacker; the published standards specify it in operational detail. The defender’s blind spots should not be identifiable from outside; the standards explicitly catalog what they cover and what they don’t, and academic critiques of the standards complete the gap analysis. The defender’s vendor and tooling choices should not be inferable; the standards constrain implementation enough that vendor markets converge on small numbers of solutions per sector.

The same documents the defender reads as “what we must implement” are read by sophisticated attackers as “what we must overcome.” An attacker reading PCI-DSS learns that cardholder data is segmented from the rest of the network, which means the attack path is to find the segmentation boundary and either cross it or operate through systems that legitimately bridge it. An attacker reading NIST SP 800-53 learns the control families and their typical under-implementation patterns. An attacker reading the EU AI Act learns what documentation high-risk AI providers must produce, and therefore which documents are known to exist and may be obtainable. The standards are not just leaking strategic information through patterns and predictability. They are formally constructed, regulatorily mandated, publicly distributed defensive doctrine. The hypergame violation is total, formal, and globally deployed.

And yet the trade-off is defensible. The population the standards serve mostly does not face sophisticated adversaries. Lockpickers exist, but a well-built lock still keeps out most would-be intruders. Likewise, the standards stop the broad mass of opportunistic threats that would otherwise penetrate non-compliant systems. The cost is that the standards also inform the smaller population of highly motivated, highly skilled attackers. That cost is real, but standardized defense was never going to be sufficient against that population anyway. The trade is rational for most defenders.

What is not rational is failing to recognize that the trade has been made. Calling for more standardization and more public defensive publication, which is what Zhang’s framework implicitly does when it argues for treating defense as a first-class scientific contribution, entrenches the trade rather than recognizing its limits. The existing standards regime has already largely raised the floor. The sophisticated AI security questions that Zhang’s framework is implicitly about live at the ceiling, and standardization-style publication makes the ceiling worse, not better.

The publication imbalance Zhang measures looks different in light of this. Frontier labs publishing less defensive detail than they discover, civilian companies suppressing analytical findings, state actors keeping capabilities classified — these are not failures of academic incentives. They are partly correct responses to the lock-analogy logic. Publication informs sophisticated attackers; sophisticated attackers are the only attackers who can meaningfully threaten these organizations’ core operations; so publication trades floor-raising the organizations don’t need for ceiling-lowering that would damage their actual defense. The asymmetric trade favors non-publication for these defenders, in a way that is invisible to Zhang’s count.
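The lock-analogy trade can be made concrete with toy arithmetic. The sketch below is a minimal illustration, not a model from Zhang’s paper or from the hypergame literature. It assumes a defender’s attack attempts split into an opportunistic share and a sophisticated share, that publishing standardized defenses lowers the opportunistic success rate by a `floor_gain`, and that the same publication raises the sophisticated success rate by a `ceiling_loss`. The class `ThreatProfile`, the function `net_publication_benefit`, and every number in the two profiles are hypothetical, chosen only to show the two regimes described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatProfile:
    """Hypothetical threat mix and publication effects for one defender.

    All numbers are illustrative assumptions, not measurements.
    """
    opportunistic_share: float  # fraction of attempts from opportunistic attackers
    floor_gain: float           # drop in opportunistic success rate from published, standardized defenses
    ceiling_loss: float         # rise in sophisticated success rate caused by the same publication

    @property
    def sophisticated_share(self) -> float:
        return 1.0 - self.opportunistic_share


def net_publication_benefit(p: ThreatProfile) -> float:
    """Expected reduction in successful attacks per attempt if the defender publishes.

    Positive: floor-raising dominates, so publication helps this defender.
    Negative: ceiling-lowering dominates, so publication hurts this defender.
    """
    gained = p.opportunistic_share * p.floor_gain
    lost = p.sophisticated_share * p.ceiling_loss
    return gained - lost


# Hypothetical profiles: a typical organization whose floor still needs raising,
# and a frontier lab whose floor is already high and whose threats are mostly sophisticated.
typical_org = ThreatProfile(opportunistic_share=0.95, floor_gain=0.60, ceiling_loss=0.20)
frontier_lab = ThreatProfile(opportunistic_share=0.30, floor_gain=0.05, ceiling_loss=0.20)

if __name__ == "__main__":
    for name, profile in [("typical_org", typical_org), ("frontier_lab", frontier_lab)]:
        print(f"{name}: {net_publication_benefit(profile):+.3f}")
```

With these made-up inputs the typical organization comes out ahead (about +0.56) and the frontier lab comes out behind (about -0.125). The same act of publication is a good trade for one defender and a bad trade for the other, and a publication count records neither outcome.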

Asymmetric warfare: attacking the categorization, not the defense

The standards-as-published-doctrine problem becomes most legible when read through the lens of asymmetric warfare. Conventional military doctrine is built around threat categorization. Soldiers are trained to recognize specific threat types and follow specific response protocols. The doctrine works because most threats fit into the categories the doctrine anticipates.

Sophisticated irregular forces deliberately attack the categorization itself. The body of a fallen comrade is, in conventional doctrine, a recovery objective with established procedures. Booby-trapping the body weaponizes the recovery procedure itself; the defender faces a choice between abandoning a categorically protected obligation or executing it under conditions the attacker has prepared for. Either choice is operationally costly, and the attack isn’t on the body or on the soldiers recovering it but on the category that says “bodies of comrades are objects to be recovered.” Once the category is unreliable, every future recovery operation must be conducted as if it might be weaponized, which permanently raises the cost of an entire class of operations.

The same logic applies to ambiguous civilian gatherings such as weddings. Is this a wedding, a wedding-shaped gathering of combatants, a wedding being used as cover, or a wedding the attacker hopes will be struck for political reasons? Conventional doctrine relies on distinguishing combatants from civilians. When the attacker deliberately blurs these categories, the doctrine’s basic discrimination function breaks down. The cognitive burden isn’t “is this dangerous” but “is this even the kind of thing my doctrine knows how to evaluate.”

The same logic applies to environmental psychological pressure. A teddy bear hanging from a tree is not itself a threat. It’s a signal designed to be interpreted by the conventional force — possibly as warning, possibly as marker, possibly as bait, possibly as nothing. The conventional force cannot ignore it because anomalies are training-conditioned alarm triggers, but cannot reliably interpret it either. The cognitive load consumes attention. If it’s bait, the threat materializes from elsewhere. If it’s warning, ignoring it could be fatal. If it’s pure psychological pressure, responding teaches the adversary how the conventional force reacts.

These are attacks on the OODA loop, Boyd’s observe-orient-decide-act cycle that underlies tactical decision-making. The irregular force’s deepest move isn’t striking targets directly. It’s attacking the categorization process by which the conventional force decides what is and isn’t a target. Once that process is unreliable, the conventional force’s material advantages become much less useful, because they can’t be reliably aimed.

The structural problem is that doctrine cannot incorporate these threats without losing its own coherence. If the doctrine says “all bodies are potential threats,” it has damaged the cultural and ethical fabric of the institution, created new categories the adversary can also attack (build up confidence that recoveries are safe, then reintroduce booby-trapped corpses at moments of operational consequence), slowed every recovery, and increased cognitive load across every soldier. If the doctrine says “all civilian gatherings may be combatant operations,” it has eliminated the civilian/combatant distinction, with legal, ethical, and operational consequences that produce the political backlash insurgents depend on for recruitment. The doctrine has to be selective about what it incorporates, and the irregulars exploit whatever is outside the selection.

This is what makes irregular warfare a structural rather than tactical problem, and it is exactly the structure civilian defenders find themselves in against sophisticated AI security threats. Compliance frameworks are conventional doctrine. Threats are categorized; responses are standardized; the universe of relevant threats is implicitly assumed to be enumerable and prepared for. Sophisticated attackers operate exactly like irregulars in the military analogy. They don’t attack the categories doctrine knows how to handle. They attack the categorization itself.

The trusted-channel attack (SolarWinds, 3CX, the various npm and PyPI poisoning campaigns) delivers its payload through channels the defender’s doctrine has categorically deemed safe. The attack isn’t on the defenses; it’s on the category of “trusted vendor.” The legitimate-credential attack uses stolen or socially engineered credentials to operate inside the “authenticated” category, which the defender’s doctrine treats as legitimate and is committed to protecting. The benign-anomaly attack leaves deliberate but ambiguous traces that consume defender attention while the real attack happens through a channel that doesn’t trigger evaluation. The doctrine-aware attack studies the compliance framework, identifies the gaps, operates during audit windows when attention is lower, and uses techniques the standards don’t categorize as threats.
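The point that these attacks target the category rather than the control can be sketched as a toy triage rule. This is a hypothetical illustration, not the logic of any real product or compliance framework: it assumes a doctrine that decides whether to inspect an event based only on the event’s category, and that skips inspection for categories it has deemed safe. The category names, the `Event` type, and the `triage` function are invented for the example, and inspection is assumed to be perfect so that only the category decision matters.

```python
from dataclasses import dataclass

# Hypothetical doctrine: whether an event is inspected depends only on its category.
INSPECT_CATEGORY = {
    "trusted_vendor_update": False,  # signed update from an approved supplier
    "authenticated_session": False,  # activity under valid credentials
    "unknown_binary": True,          # anything without a trust category
}


@dataclass(frozen=True)
class Event:
    category: str    # how the defender's doctrine classifies the event
    malicious: bool  # ground truth, invisible to the doctrine unless inspected


def triage(event: Event) -> str:
    """Return the doctrine's verdict for one event.

    Categories missing from the table default to inspection. Inspection is
    assumed perfect, so a malicious event can only pass by sitting in a
    category the doctrine never inspects.
    """
    if not INSPECT_CATEGORY.get(event.category, True):
        return "allowed (not inspected)"
    return "blocked" if event.malicious else "allowed (inspected)"


if __name__ == "__main__":
    events = [
        Event("unknown_binary", malicious=True),
        Event("trusted_vendor_update", malicious=True),  # trusted-channel attack
        Event("authenticated_session", malicious=True),  # legitimate-credential attack
    ]
    for e in events:
        print(f"{e.category:<22} -> {triage(e)}")
```

Even with perfect inspection, the malicious vendor update and the malicious authenticated session are never evaluated, because the decision that mattered was made by the category lookup. Inspecting every category would close that gap, but only by expanding the doctrine and taking on the operational cost that comes with it.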

In all of these, the attack is not on the defender’s specific defenses but on the categorization framework the defenses depend on. And the defender’s response options are exactly the conventional military’s response options against irregulars: expand the doctrine (which creates new attack surfaces and operational costs), abandon the doctrine (which loses the institutional benefits standardization provides), or maintain the doctrine and accept that sophisticated attackers will operate inside its gaps. Most civilian defenders rationally choose the third option, because the first two are worse.

This means most civilian defenders are structurally accepting a certain level of compromise against sophisticated attackers in exchange for institutional functionality against unsophisticated ones. The compliance-bound civilian defender is essentially in the position of the conventional military force fighting irregular opponents — materially competent, doctrinally coherent, institutionally capable, and structurally vulnerable to opponents who deliberately operate outside the doctrine’s categorization framework.

False flag: when the defender’s response becomes the attacker’s weapon

The deepest hypergame move in this domain is false flag. A false flag operation is an attack on the identification function in the defender’s decision process. The defender observes an attack bearing the signatures of Adversary A. Doctrine says: respond as planned to Adversary A. The defender executes the standardized response. But the attack was actually from Adversary B, who wanted exactly this response — to draw the defender into conflict with A, to expose the defender’s anti-A playbook, to consume the defender’s resources on the wrong target, to damage the political relationship between defender and A.

The defender’s doctrine has been weaponized against them. The more disciplined and standardized the doctrine, the more reliably the false flag works, because predictable response is what makes the operation strategically valuable. An undisciplined defender who responds erratically is paradoxically harder to manipulate through false flag than a defender with mature standardized response protocols.
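The claim that disciplined, standardized response makes false flag more reliable can be shown with a toy calculation. In this hypothetical sketch, a defender’s response policy maps an observed attack signature to a probability distribution over responses. The false-flag operator controls which signature the defender observes, so the operator’s success is simply the probability that the defender’s own policy assigns to the response the operator wants. The policy names, response labels, and probabilities are invented for illustration.

```python
from typing import Dict

# A response policy maps an observed attack signature to a probability
# distribution over responses. All names and numbers are hypothetical.
Policy = Dict[str, Dict[str, float]]


def false_flag_success(policy: Policy, forged_signature: str, desired_response: str) -> float:
    """Probability that forging `forged_signature` triggers `desired_response`.

    The operator chooses what the defender observes; the defender's own
    policy then decides how often the operator gets the response it wants.
    Signatures the policy does not cover trigger nothing (probability 0.0).
    """
    distribution = policy.get(forged_signature, {})
    return distribution.get(desired_response, 0.0)


# Disciplined defender: doctrine maps "signature of A" to exactly one playbook.
standardized: Policy = {"signature_A": {"anti_A_playbook": 1.0}}

# Less predictable defender: the same observation can lead to several responses.
mixed: Policy = {
    "signature_A": {
        "anti_A_playbook": 0.4,
        "hold_and_investigate": 0.4,
        "no_visible_response": 0.2,
    }
}

if __name__ == "__main__":
    for name, policy in [("standardized", standardized), ("mixed", mixed)]:
        p = false_flag_success(policy, "signature_A", "anti_A_playbook")
        print(f"{name}: adversary B gets the anti-A response with probability {p:.1f}")
```

The standardized defender hands adversary B the desired response every time; the mixed defender does so only 40% of the time. The cost runs both ways: the mixed policy also answers a genuine attack by A with the anti-A playbook only 40% of the time, which is the price a less predictable defender pays for being harder to manipulate.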

False flag is not exotic. In the Gleiwitz incident of 1939, SS operatives dressed as Polish soldiers staged an attack on a German radio station to provide a pretext for invading Poland. Operation Northwoods in 1962 was the U.S. Joint Chiefs’ proposal to stage attacks on American cities and attribute them to Cuba; it was rejected, but the fact that it reached formal proposal level shows that false flag was understood as a standard tool. Soviet active measures used it throughout the Cold War. Attribution debates around chemical weapons use in Syria have persisted for years. Modern hybrid warfare incorporates attribution confusion as a primary operational objective. In each case, the operation’s strategic value comes from the misattribution it creates, not from the direct effects of the attack itself.

The AI security mapping is particularly sharp. LLM-driven offensive operations have a property that makes false flag tractable: the signatures of who built the agent are increasingly separable from the signatures of who deployed it. An attribution capability that identifies “Claude-class agent behavior” or “GPT-class agent behavior” has identified the tool, not the operator. The operator can choose which tool to use specifically to shape attribution.

This produces several concrete false-flag dynamics. Tool-selection false flag: an attacker who wants the defender to think the attack came from a particular adversary uses tools associated with that adversary. Style mimicry false flag: sophisticated attackers shape attack patterns to match documented signatures of other actors, using publicly available attribution methodology as the engineering target. Compromise-mediated false flag: compromise the other actor’s infrastructure and conduct operations from it, so attribution lands where the attacker wants. Doctrine-aware false flag: engineer attacks that trigger specific standardized responses, where the response itself creates operational opportunity.
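The separation between tool and operator, and the tool-selection false flag built on it, can be sketched in a few lines. This is a hypothetical model, not a description of any real attribution system. It assumes behavioral attribution observes only the traces produced by the agent tool, and that the defender then maps each tool family to the operator it has historically been associated with. The labels `agent_family_1`, `operator_X`, and the rest are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    operator: str  # who actually ran the operation; not observable by the defender
    tool: str      # agent or model family whose behavior leaves the observable traces


def behavioral_attribution(incident: Incident) -> str:
    """Hypothetical attribution: it can only identify what the tool leaves behind."""
    return incident.tool


# Hypothetical defender knowledge: which operator each tool family is usually linked to.
# If this mapping is published, it doubles as a guide to staging misattribution.
PRESUMED_OPERATOR = {"agent_family_1": "operator_X", "agent_family_2": "operator_Z"}


def blamed_operator(incident: Incident) -> str:
    return PRESUMED_OPERATOR.get(behavioral_attribution(incident), "unknown")


if __name__ == "__main__":
    incidents = [
        Incident(operator="operator_X", tool="agent_family_1"),  # honest use of its usual tool
        Incident(operator="operator_Y", tool="agent_family_1"),  # same tool, different operator
        Incident(operator="operator_Y", tool="agent_family_2"),  # tool chosen to point at operator_Z
    ]
    for inc in incidents:
        print(f"actual={inc.operator:<11} tool={inc.tool:<15} blamed={blamed_operator(inc)}")
```

Nothing in `behavioral_attribution` can distinguish the second incident from the first, and the third lands on `operator_Z` exactly as `operator_Y` intended.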

The implication for published defensive methodologies is acute. Publishing an attack capability burns the specific attack but doesn’t necessarily empower attackers; defenders can patch. Publishing an attribution capability burns the specific attribution and gives sophisticated attackers a tool for false-flag operations against defenders who rely on it. Whatever the published methodology says, sophisticated attackers can engineer operations that look like whatever the methodology expects to see for any given attacker. The defenders depending on the methodology are now misattributing systematically, and the false-flag operators are setting the agenda of who gets blamed for what.

This is the most acute version of the hunter principle. The hunter doesn’t email the prey. The defender doesn’t publish the attribution methodology, because publication makes the methodology a false-flag instruction manual. Frontier labs, threat-intelligence firms, and government attribution shops publish about specific incidents after the fact, but the underlying attribution methodologies are largely opaque, and the unpublished portions are deliberately so. The reason is structural: published attribution methodologies are weapons in the wrong hands.

What this means for Zhang’s prescription

Pulling the threads together: Zhang’s framework treats the publication imbalance as a fixable problem of academic incentives. The hypergame analysis says the imbalance is the visible signature of a multi-layer structure whose deeper layers are invisible to publication-based measurement by design.

The hunter principle says a class of defenses cannot be published without destroying them. Perception-dependent operations — honeypots, hunt-the-hunter, attribution capabilities, detection methodologies — lose their function the moment they enter public discourse. No incentive reform can move these into the visible literature without ending them.

The cryptography precedent says hidden capability layers are real, large, and durable. Decades of suppression are normal, not exceptional. The visible history of the field is the subset of the actual history that the public eventually rediscovers.

The actor-population asymmetry says most defensive work in the field is held by actors who don’t play the publication game — state actors, frontier labs, threat-intel firms, ISACs, the broad civilian sector, underground communities, criminal-side analysts, and most independent researchers. Their defensive work isn’t being suppressed from publication; it exists in forms that were never paper-shaped to begin with, or that operate inside closed sharing circles, or that depend on opacity for their value.

The board-mismatch observation says civilian companies are simultaneously playing two structurally different games. The classical competitive game is zero-sum and well-modeled by classical game theory; the hypergame against sophisticated attackers is multi-actor, partially hidden, and negative-sum at the sectoral level. Rational play on the classical board produces self-destructive play on the hypergame board, and most companies’ organizational machinery is optimized for the first board.

The standards section says the one place civilian defensive work is forced into public documentation produces exactly what hypergame theory predicts publication produces: maximum legibility, maximum predictability, maximum exposure of defensive architecture to sophisticated study. The standards do real work for the floor-raising population they serve. They do not solve ceiling-defense, and calling for more of them does not address what they were never designed to address.

The asymmetric warfare analogy says doctrinaire defenders are structurally vulnerable to opponents who attack the categorization itself rather than specific defenses. Standardization, which is what makes institutional response reliable, is also what makes it gameable.

False flag completes the picture: published response capability is not just predictable for sophisticated attackers; it becomes a weapon those attackers can aim at targets of their choosing. Some categories of defensive knowledge are not just costly to publish; they are actively harmful to publish, because publication makes the field worse defended, not better.

This produces the honest version of Zhang’s argument. The publication imbalance he measures is real, and the academic publication game in AI security is attack-heavy in ways that may genuinely harm the academic field’s epistemics — students see more attack papers than defense papers, defensive evaluation methodology develops more slowly, the discipline’s intellectual center of gravity sits on the offensive side. These are real costs of the imbalance, and Zhang’s prescription to improve academic incentives for defense publication might reasonably address them within the academic publication game.

But conflating “academic publication game imbalance” with “defensive capacity deficit” is the move hypergame theory specifically identifies as a category error. The actors who can fix the second problem are largely not playing the publication game in the first place. Their reasons for not publishing are not academic-incentive reasons; they are structural reasons grounded in perception-dependence, civilian-actor economics, the reputational physics of the soccer-trick-vs-multi-billion-dollar-vulnerability asymmetry, the category mismatch between operational infrastructure and research artifacts, the board mismatch between competitive and hypergame structures, the hypergame disadvantage that publication confers against sophisticated adversaries, and the false-flag risk that published attribution methodologies introduce. None of these dissolve under reformed academic incentives.

A mature treatment of AI security needs to name what’s actually happening. The visible publication imbalance is the visible layer of a structured equilibrium that includes hidden knowledge, hidden roles, perception-dependent operations, structurally non-publishing actors, regulatorily mandated floor-raising standards that simultaneously expose defensive doctrine to sophisticated adversaries, board mismatches between the games civilian companies play, and active attacker exploitation of all of the above. Hypergame theory predicts this structure in advance; the cryptography historical record confirms the pattern; AI security is the next domain operating inside it. Zhang’s framework can see the visible layer accurately and cannot see the deeper structure at all.

Limits and concessions

This argument has limits, and a fair article has to name them.

Hypergame theory is a diagnostic framework, not a prescriptive one. It explains why publication-based measurement systematically biases toward visible offensive work and against hidden defensive work. It does not provide a method for measuring the hidden layer. It does not let us claim specific suppressed capabilities exist in AI security today, only that the structural conditions for them exist. It does not tell academic researchers what to do differently in their own work, except to recognize the limits of what publication-based measurement can establish.

The cryptography precedent is suggestive rather than proof. The actor structures and strategic stakes are comparable, but the analogy is an analogy, not a measurement. We know hidden cryptographic capabilities existed because they were eventually revealed. We do not have the equivalent revelations for AI security yet, and may not for years or decades. The framework predicts they will eventually surface, partially, when public research independently catches up. Until then, the hidden layer remains inferred rather than observed.

The actor-asymmetry argument applies most strongly to civilian sectors and most weakly to academic AI security research itself. Academic researchers are precisely the actors for whom the publication game is the dominant game, and Zhang’s incentive-reform prescription might genuinely help inside that population. The argument is not that academic defense research is unnecessary or unhelpful. The argument is that academic publication-count is a biased proxy for defensive capacity overall, because most defensive capacity sits outside the population the measurement can reach.

The standards critique is calibrated, not absolute. PCI-DSS, NIST SP 800-53, ISO 27001, HIPAA, and the EU AI Act do real work for most defenders. The lock-analogy trade-off is a defensible trade-off. The hypergame violation is real and worth naming, but naming it is not an argument for repealing the standards. It is an argument for recognizing what the standards do and don’t do, and for not treating “more standardization and more public defensive publication” as straightforwardly better when the trade-off is already balanced for the population the standards serve.

The false-flag analysis applies to a subset of defensive operations, not to all defenses. Patching, hardening, training-data hygiene, evaluation methodologies, certain classes of robustness improvements — these can be published with manageable hypergame costs, and academic publication of this work has clear value. The argument is not that no defense should be published. The argument is that the most strategically consequential defenses, the ones that operate at the ceiling rather than the floor, have publication costs that scale with the sophistication of the defense and with the sophistication of the adversary it’s meant to address.

The article is offering a framework correction, not a refutation. Zhang’s measurement stands. His prescription, applied narrowly to the academic publication game, has reasonable scope. What does not survive examination is the inference from publication-count imbalance to field-level defensive capacity deficit, and the implied prescription that incentive reform of the academic publication game would meaningfully address the underlying security questions the field’s stated mission is implicitly about.

Closing

The cyber community has been measuring whether AI systems can be attacked. The privacy and security community has been measuring whether AI systems can be defended through published research. These two conversations have been treating publication as if it were a transparent window onto the field’s actual state. Hypergame theory, and the cryptography historical record, say publication is one layer of a multi-layer structure whose deeper layers are designed to be invisible from outside.

Zhang’s paper makes a careful contribution to understanding what’s visible in the academic literature, and the imbalance he measures is real. The structural conclusion he draws from it is not what the imbalance is evidence of. The imbalance is the expected signature of a field operating inside hypergame conditions — with hidden knowledge layers, hidden roles, perception-dependent operations, structurally non-publishing actors, regulatorily mandated standards that simultaneously raise the floor and lower the ceiling, civilian companies playing the wrong game on the wrong board, and sophisticated attackers exploiting the resulting structure through both standard and unconventional means. None of this is fixable by academic incentive reform alone, because most of it is not happening inside the academic publication game.

A mature security field should be honest about what it can and cannot see. The publication-imbalance measurement is what’s visible. What’s not visible is most of the field’s actual defensive capacity, by design, for structural reasons hypergame theory predicts and historical precedent confirms. Calling for more of the kind of publication the field already has does not address the deeper structure. Recognizing the deeper structure exists is the first step toward thinking clearly about what defensive AI security actually requires.

The hunters are not going to email the animals. The civilian companies are not going to publish their two-layer defenses. The state actors are not going to declassify their attribution methodologies. None of this is a failure of the publication game. It is the publication game working exactly as the underlying hypergame conditions require it to work. The question is not how to fix the imbalance. The question is how to think about defense in a field whose deepest defensive activity is structurally invisible to the methodologies we use to measure it.

A standing invitation

I would rather be corrected in public than be wrong in private. The novelty I’m claiming above is narrow on purpose, because the broader space has substantial prior art.

Hypergame theory has decades of literature behind it. It has been applied to cybersecurity before, including by Wayne Vane and others working on cyber conflict analysis. The general observation that defensive work goes unpublished is not new; Zhang himself concedes the industry-academia gap in Section 4.6, and the broader STS literature on the sociology of scientific publication has long recognized that fields select for what gets published in ways that bias what we know about them. The cryptography historical record is well-documented and widely discussed.

What I am claiming is the specific application: that hypergame theory, applied to Zhang’s publication-count methodology, with the cryptography precedent as supporting historical evidence, reveals that his framework systematically cannot see what it would need to see to support its structural conclusion, and that the publication imbalance is therefore the expected signature of a hypergame-structured field rather than evidence of the defensive capacity deficit he infers.

If a reader believes this specific application has prior art I have missed, I am asking that counter-claims meet four conditions:

Cite a specific source. Paper title, author list, venue, year, DOI or arXiv ID, link. No “I think someone presented this at a conference” — if it was a talk, link the recording or slide deck.

Quote the relevant passage. Copy the sentence, paragraph, or section that you believe makes the same claim, and give the section heading or page number. The reader should be able to verify the match without re-reading the entire source.

Specify what part of the narrowed claim is anticipated. The argument has three components: the application of hypergame theory to Zhang’s publication-count methodology specifically, the cryptography precedent as supporting evidence for the multi-layer structure, and the synthesis showing that the publication imbalance is the expected signature of hypergame conditions rather than evidence of a defensive deficit. A source that establishes one is interesting prior art; a source that establishes the combination would refute the novelty claim.

Distinguish prior work from adjacent work. General hypergame theory, general cyber-conflict applications of game theory, general critiques of academic publication norms, and Zhang’s own paper are acknowledged adjacent work, not refutations. Prior art has to anticipate the specific narrowed claim.

If a comment meets all four conditions and the cited source genuinely anticipates the narrowed claim, I will update the post with a correction, credit the commenter inline, and note the priority. If multiple sources do, I will cite the earliest. If a comment fails one or more conditions but raises an interesting adjacent point, I will engage with it as a related-work addition rather than as a refutation.

Comments are open.

This piece is published as a contribution to bridging the AI security publication-measurement literature and the hypergame analysis of adversarial domains. The underlying measurement work is Zhang’s. The historical record on cryptography is in the public literature. The hypergame theoretical framework belongs to Bennett, Fraser, Hipel, and the broader hypergame analysis tradition. The argument that these together reveal a structural limit to Zhang’s framework’s inferences is mine. If you cite or build on the synthesis argument, cite the post; for the underlying measurements, cite Zhang directly; for the hypergame framework, cite the original hypergame literature; for the cryptography precedent, cite Coppersmith 1994 and the GCHQ declassification.

— — — — —

Relevant example: here I turned an attack paper into a defensive concept.

First publication: LLM Agents Leave Fingerprints: A Case for Behavioral Attribution in the Age of…


Hunters Don’t Email the Prey: A Hypergame Reading of Zhang’s AI Security Publication Gap was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.