The Authorization Gap, Closed: A Practitioner’s Blueprint for TMF 672, OpenFGA, and Claude as an AI-Native Authorization Layer

Fourth and final in a four-part series: ReBAC Meets BSS, A Practitioner’s Blueprint for AI-Native Role-Based Access in Telco

Four weeks ago I described a failure mode taking shape quietly inside enterprise AI deployments. Not a dramatic system outage. Not a security breach with a clear perimeter. A structural gap between when a party’s permission changes and when the authorization systems acting on behalf of that party reflect that change.

I argued that this gap is not new. It has always existed in enterprise authorization architectures. What is new is that AI agents inherit it and operate on it at speed and at scale, without the contextual judgment a human operator would apply to bridge it.

Over the past three weeks I have walked through the technical foundation of a response to that gap. This fourth and final article draws the architecture together, reflects on the design decisions that shaped it, is honest about where the build currently stands, and provides a working manual proof of concept that anyone can run today.

The Series in Four Sentences

Article 1 named three anti-patterns worth designing against: the over-permissioned agent, the under-permissioned agent, and the stale-permission agent. Article 2 introduced the two technical primitives that belong at the foundation of a solution: TMF 672 as a PermissionSet notification event source and OpenFGA as a relationship-based authorization graph. Article 3 showed what belongs between them: a hybrid orchestration layer with a deterministic fast path for known transitions and a Claude reasoning slow path for novel ones, running inside your AWS security boundary with party identity data never crossing to external infrastructure. This article closes the series with the implementation blueprint, the key design decisions, a manual proof of concept, and an honest account of where the build is.

Why TMF 672 and OpenFGA Belong at the Foundation

Before walking through the blueprint, I want to restate why these two specific primitives anchor the architecture rather than alternatives, and be precise about what TMF 672 actually is and what events it publishes.

TMF 672 as the Single Authoritative Source

TMF 672 is the User Role Permission Management API. Its primary function is managing the lifecycle of PermissionSets granted to Security Principals across a BSS landscape.

A few definitions from the v5.0.1 specification are worth stating precisely because they shape how the architecture reasons:

A PermissionSet is a set of permissions granted to a Security Principal. It may be granted explicitly by an authorized user or acquired implicitly through Party Role assignment. It has a validity period.

A PermissionSpecification is a definition of a permissible action on a function. Action is Read, Write, ReadWrite, or a domain-specific string such as Resell or Manage. Function is the entity class name, such as CustomerAccount, EnterpriseBroadband, or FaultManagement.

A SecurityPrincipal is either a human Individual or an autonomous software process defined as a Resource. AI agents are Security Principals in TMF 672 terms. This is not an extension of the standard. It is the standard’s own definition.

TMF 672 publishes three notification event types that this architecture acts on: PermissionSetCreateEvent when a new PermissionSet is granted, PermissionSetChangeEvent when an existing PermissionSet is modified, and PermissionSetDeleteEvent when a PermissionSet is terminated.
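
The definitions above can be sketched as a small typed model. This is a minimal Python illustration, not the TMF 672 schema: the dictionary keys (`eventType`, `principalId`, `principalKind`, `permissions`) and the class names are simplifying assumptions, and a real payload carries more structure than this.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    CREATE = "PermissionSetCreateEvent"
    CHANGE = "PermissionSetChangeEvent"
    DELETE = "PermissionSetDeleteEvent"


@dataclass(frozen=True)
class PermissionSpecification:
    function: str  # entity class name, e.g. "CustomerAccount"
    action: str    # "Read", "Write", "ReadWrite", or a domain string such as "Resell"


@dataclass(frozen=True)
class PermissionSetEvent:
    event_type: EventType
    principal_id: str    # hypothetical identifier for the Security Principal
    principal_kind: str  # "Individual" or "Resource" (AI agents are Resources)
    permissions: tuple[PermissionSpecification, ...]


def parse_event(raw: dict) -> PermissionSetEvent:
    """Build a typed event from a simplified dict (assumed key names).

    Raises ValueError when the event type is not one of the three handled types.
    """
    return PermissionSetEvent(
        event_type=EventType(raw["eventType"]),
        principal_id=raw["principalId"],
        principal_kind=raw["principalKind"],
        permissions=tuple(
            PermissionSpecification(p["function"], p["action"])
            for p in raw["permissions"]
        ),
    )
```

Rejecting unknown event types at the parsing boundary keeps everything downstream limited to the three cases the architecture handles.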

Manageable Assets as Event Sources

The critical architectural point is that TMF 672 PermissionSet events are not limited to direct administrative assignments. They are also triggered by state changes in Manageable Assets across the BSS landscape.

A Manageable Asset is the realisation of something that can be used and managed by users. The TMF 672 v5.0.1 specification defines these explicitly: resources created as part of a purchased product, service instances provisioned under a product, blocks of personal data, eCare system registrations, digital service platform accounts, IoT devices, and home gateways.

Readers of Article 2 may recall the hair dryer that won a weatherman’s bet. It did not need to understand meteorology. It just needed to be plugged in at the right moment. The IoT authorization problem is the same pattern made serious: a building controller whose operational state changed, a 5G device transitioning between zones with different regulatory constraints, or a smart meter reporting a tampered state. Each is a Manageable Asset. Each state change carries authorization intent. And when that state change affects the permissions of a Security Principal associated with that asset, TMF 672 publishes the resulting PermissionSet notification event.

The translation layer in this architecture does not need to understand what triggered the PermissionSet event upstream. Whether the event originated from a direct Party Role assignment, a consent asset being withdrawn, an account asset reaching a credit limit, or a contract asset expiring, the translation layer sees one thing: a TMF 672 PermissionSet notification event. The complexity of what triggered it is upstream and out of scope.

This single-source design has three direct architectural benefits.

First, it is cleaner. The translation layer has one event source, one schema, and three notification event types to handle. There is no aggregation layer required because TMF 672 has already normalised the permission implications of upstream asset state changes into standardised PermissionSet events.

Second, it is more auditable. Every PermissionSet event entering the translation layer carries the PermissionSpecification function and action fields that describe precisely what capability is being granted or revoked and on which Manageable Asset. The audit trail from asset state change to authorization graph update is contained within the TMF 672 event payload itself.

Third, it is more precisely aligned with how TM Forum ODA actually works in production. Operators who have adopted ODA already have TMF 672 in their BSS landscape, managing PermissionSets for parties across their estate. This architecture is not asking them to add new infrastructure. It is asking them to act on what TMF 672 is already publishing.

Why OpenFGA Is the Right Authorization Primitive

OpenFGA earns its place for a different reason. Traditional RBAC models authorization as a flat assignment: a user has a role, and the role has permissions. Telco authorization is not flat.

A reseller organisation manages enterprise customers who own product Manageable Assets available in markets governed by regulatory frameworks. An AI agent SecurityPrincipal holds a PermissionSet granted by an individual over specific Manageable Assets with a defined validity period. A service Manageable Asset in a restricted state carries different permission implications than the same asset in an active state. These are all graphs of relationships, not flat role assignments.

OpenFGA models that graph natively, expresses it with precision, and answers authorization queries by traversing the full relationship chain in real time. It is also writable via API, which is the property that makes a programmatic reasoning layer possible.
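
To make "traversing the relationship chain" concrete, here is a toy in-memory sketch in Python. It is not OpenFGA's implementation or API. The tuple store, the `check` function, and the identifiers (`org:reseller-a`, `customer:acme`, `asset:broadband-42`) are all hypothetical, chosen only to mirror the reseller example above.

```python
from collections import defaultdict

# (object, relation) -> set of subjects. A subject is either a plain entity
# such as "agent:assistant-1" or a userset reference such as "org:reseller-a#member".
tuples: dict[tuple[str, str], set[str]] = defaultdict(set)


def write(subject: str, relation: str, obj: str) -> None:
    tuples[(obj, relation)].add(subject)


def check(subject: str, relation: str, obj: str, _seen=None) -> bool:
    """Return True if subject holds relation on obj, following userset references."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:  # already explored: avoids infinite loops on cycles
        return False
    seen.add((obj, relation))
    for s in tuples.get((obj, relation), ()):
        if s == subject:
            return True
        if "#" in s:  # userset: anyone holding inner_rel on inner_obj
            inner_obj, inner_rel = s.split("#", 1)
            if check(subject, inner_rel, inner_obj, seen):
                return True
    return False


# The agent is a member of the reseller, the reseller's members manage the
# customer, and the customer's managers can manage the broadband asset.
write("agent:assistant-1", "member", "org:reseller-a")
write("org:reseller-a#member", "manager", "customer:acme")
write("customer:acme#manager", "can_manage", "asset:broadband-42")

print(check("agent:assistant-1", "can_manage", "asset:broadband-42"))
```

No tuple links the agent to the asset directly. The answer comes from walking three relationships, which is the property a flat role table cannot express.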

Together, TMF 672 and OpenFGA form a foundation that most Tier-1 operators already have in production on the input side and can adopt with minimal infrastructure overhead on the output side. The architecture is not asking operators to replace their BSS landscape. It is asking them to extend the output of the TMF 672 PermissionSet event stream into a dynamic, real-time authorization graph that AI agents can query before every action.

The Blueprint

Here is the complete architecture in plain language, before the diagram.

A TMF 672 PermissionSet notification event arrives. It may be a PermissionSetCreateEvent granting new permissions, a PermissionSetChangeEvent modifying existing permissions, or a PermissionSetDeleteEvent terminating a PermissionSet. It may have originated from a direct administrative assignment or from a Manageable Asset state change. From the translation layer’s perspective, the origin is irrelevant. TMF 672 is the single authoritative input.

The event is published to an event stream, either Apache Kafka or AWS EventBridge, depending on the operator’s existing infrastructure. The orchestration service consumes the event, validates it against the TMF 672 schema, and performs two enrichment steps before making any routing decision.

Enrichment step one: the orchestration service reads the current OpenFGA tuples held by the affected Security Principal. This gives the reasoning layer the context to understand what needs to change, not just what the new state should look like.

Enrichment step two: the orchestration service retrieves the market and regulatory flags applicable to this PermissionSet event.

Only after both enrichment steps are complete does the routing decision occur. The orchestration service constructs a TransitionKey from four fields derived from the enriched event: previous PermissionSet state, new PermissionSet state, market, and regulatory tier. It checks this key against a rules cache.
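
The routing decision can be sketched as a lookup on an immutable key. This is a minimal Python sketch under stated assumptions: the class names, the field names, and the dictionary standing in for the rules cache are illustrative, not taken from the implementation.

```python
from dataclasses import dataclass, field

PROMOTED_CONFIDENCE = 0.99  # the promoted confidence value described in this article


@dataclass(frozen=True)  # frozen makes the key hashable, so it can index a dict
class TransitionKey:
    previous_state: str
    new_state: str
    market: str
    regulatory_tier: str


@dataclass
class CachedRule:
    confidence: float
    mutation_templates: list = field(default_factory=list)


def route(key: TransitionKey, rules_cache: dict) -> str:
    """Return "fast" for a promoted, cached transition and "slow" otherwise."""
    rule = rules_cache.get(key)
    if rule is not None and rule.confidence >= PROMOTED_CONFIDENCE:
        return "fast"
    return "slow"
```

Because all four fields take part in equality and hashing, the same state transition in a different market or regulatory tier is a different key and goes to the slow path until it is promoted in its own right.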

In a fresh deployment, the cache is empty, and every event goes through the slow path. This is correct and expected. The cache populates through operational learning over time.

If the TransitionKey exists in the cache with a promoted confidence value of 0.99, the fast path fires. Pre-validated tuple mutations are served directly to the OpenFGA Write endpoint with the Security Principal ID substituted at runtime from the current event. The transaction completes in milliseconds. Claude is never invoked. The audit trail records the fast path execution.
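
Runtime substitution of the Security Principal ID might look like the following Python sketch. The `{principal}` placeholder, the mutation dictionary shape, and the function name are assumptions for illustration. The cached templates themselves are never modified.

```python
PLACEHOLDER = "{principal}"  # hypothetical marker stored in cached templates


def render_mutations(templates: list[dict], principal_id: str) -> list[dict]:
    """Return new mutation dicts with the placeholder replaced by the real principal."""
    return [
        {
            "op": t["op"],  # "write" or "delete"
            "user": t["user"].replace(PLACEHOLDER, principal_id),
            "relation": t["relation"],
            "object": t["object"],
        }
        for t in templates
    ]


cached_templates = [
    {"op": "delete", "user": "{principal}", "relation": "can_manage", "object": "asset:broadband-42"},
]
mutations = render_mutations(cached_templates, "agent:assistant-7")
```

Building fresh dictionaries rather than editing the templates in place means one event's principal can never leak into the cached rule that serves the next event.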

If the TransitionKey is novel or not yet in the cache, the slow path fires. The enriched event is sent to Claude on AWS Bedrock via a VPC private endpoint. Claude receives the PermissionSet notification event, including the PermissionSpecification function and action fields, the current OpenFGA relationship schema injected at runtime, and the market and regulatory flags. It reasons about which tuple mutations are required and returns structured output: the mutations, a plain language justification for each decision referencing the specific PermissionSpecification, a confidence score, and any regulatory flags raised.

A structural validation step confirms that every proposed tuple type exists in the OpenFGA schema before the output reaches the human review gate. The gate routes on two signals: Claude’s runtime confidence score and the structural validation result. High confidence with no flags proceeds to a lightweight review. Lower confidence or regulatory flags route to deeper review. Low confidence or high severity flags block pending senior assessment.
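
The validation step and the three-way gate can be sketched as two small functions. The thresholds below are assumptions for illustration: 0.70 matches the proof of concept's starting heuristic, while the 0.90 upper threshold, the severity labels, the choice to block on validation errors, and the schema dictionary shape are hypothetical.

```python
HIGH_CONFIDENCE = 0.90  # assumed upper threshold, not from the article
LOW_CONFIDENCE = 0.70   # the proof of concept's starting heuristic


def validate_mutations(mutations: list[dict], schema: dict[str, set[str]]) -> list[str]:
    """Return an error for every mutation whose type or relation is not in the schema."""
    errors = []
    for m in mutations:
        obj_type = m["object"].split(":", 1)[0]
        if obj_type not in schema:
            errors.append(f"unknown type: {obj_type}")
        elif m["relation"] not in schema[obj_type]:
            errors.append(f"unknown relation: {obj_type}#{m['relation']}")
    return errors


def review_route(confidence: float, flag_severities: list[str], validation_errors: list[str]) -> str:
    """Pick a review tier from the confidence score, raised flags, and validation result."""
    if validation_errors or confidence < LOW_CONFIDENCE or "high" in flag_severities:
        return "blocked"
    if confidence < HIGH_CONFIDENCE or flag_severities:
        return "deep_review"
    return "lightweight_review"
```

Note that no branch skips review entirely. Even the highest-confidence output with no flags still reaches a human, in line with the gate described above.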

Approved outputs write to OpenFGA atomically. The complete audit record (the original TMF 672 PermissionSet event, Claude’s reasoning, the mutations applied, the confidence score, and the reviewer’s approval) is logged to CloudWatch alongside the Bedrock inference record.

Approved slow path outputs for novel events increment an approval count for their TransitionKey. At the promotion threshold, the pattern moves to the fast path rules cache with a promoted confidence of 0.99. The system learns without retraining. The proportion of events requiring Claude inference decreases as the deployment matures.
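
The promotion mechanism can be sketched as a counter keyed by transition. This is a minimal Python illustration: the article does not state the promotion threshold, so the default of 3 here is an assumption, as are the class and method names.

```python
from collections import Counter

PROMOTED_CONFIDENCE = 0.99


class PromotionTracker:
    def __init__(self, threshold: int = 3):  # threshold value is an assumption
        self.threshold = threshold
        self.approvals = Counter()
        self.rules_cache = {}

    def record_approval(self, key, mutation_templates) -> bool:
        """Count one reviewer approval; return True once the key is on the fast path."""
        if key in self.rules_cache:
            return True
        self.approvals[key] += 1
        if self.approvals[key] >= self.threshold:
            self.rules_cache[key] = {
                "templates": mutation_templates,
                "confidence": PROMOTED_CONFIDENCE,
            }
            return True
        return False
```

A production version would probably also require that the approved mutations agree across approvals before promoting, since promoting a key whose slow path outputs differed from one event to the next would freeze an arbitrary one of them.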

AI agents query the OpenFGA Check endpoint before every action they take on behalf of a Security Principal. Not at session initialisation. Not once per day. Before every action. This is what closes the stale-permission anti-pattern. The agent operates on the live authorization graph, not a cached snapshot.
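
The check-before-every-action discipline can be sketched as a guard that wraps each action. In this Python sketch the `check` callable is a stand-in for a call to the OpenFGA Check endpoint, and the in-memory `live_tuples` set is a hypothetical substitute for the live authorization graph.

```python
class AuthorizationDenied(Exception):
    pass


def run_guarded(check, principal: str, relation: str, obj: str, action):
    """Run action only if the authorization check passes at this moment."""
    if not check(principal, relation, obj):
        raise AuthorizationDenied(f"{principal} lacks {relation} on {obj}")
    return action()


# Stand-in for the live authorization graph (hypothetical identifiers).
live_tuples = {("agent:assistant-1", "can_read", "account:1001")}


def check(principal: str, relation: str, obj: str) -> bool:
    return (principal, relation, obj) in live_tuples


result = run_guarded(check, "agent:assistant-1", "can_read", "account:1001", lambda: "balance fetched")
```

Because nothing is cached between calls, removing the tuple from `live_tuples` makes the very next `run_guarded` call raise `AuthorizationDenied`, even within the same session.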

The Key Design Decisions

Every architecture involves decisions that could have gone differently. Here are the four that shaped this one most significantly.

Decision 1: TMF 672 as a single source rather than direct Manageable Asset event consumption

The temptation when designing an event-driven authorization architecture is to consume directly from the upstream Manageable Asset state change events: the billing system, the consent platform, the contract management layer, each feeding the translation layer directly.

The single source approach through TMF 672 PermissionSet events is architecturally correct for three reasons. TMF 672 has already normalised the permission implications of upstream asset state changes into standardised PermissionSet events with PermissionSpecification function and action fields. Consuming directly from upstream systems would require the translation layer to replicate that normalisation logic. And TMF 672 as the single authoritative publication point means the audit chain is complete: the PermissionSpecification fields in the event describe precisely what capability changed and on which Manageable Asset, without requiring the translation layer to reach back to the upstream source.

Decision 2: Enrichment before routing, not routing on raw event reception

The routing decision cannot be made on the raw TMF 672 event alone. The TransitionKey requires four fields: previous PermissionSet state, new PermissionSet state, market, and regulatory tier. The market and regulatory tier fields come from the enrichment step, not the raw event. Routing before enrichment would produce an incomplete TransitionKey and unreliable cache lookups.

The enrichment step also populates Claude’s context on the slow path. Without knowing the current OpenFGA tuples held by the affected Security Principal, Claude cannot reason about what needs to change. It can only reason about what the new state should look like, which is insufficient for producing correct atomic delete and create mutation sets.
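
Why the current tuples matter can be shown with a set difference. In this minimal Python sketch, tuples are represented as `(user, relation, object)` triples and the function name is hypothetical. Given only the desired state there is nothing to subtract from, so the delete set cannot be computed.

```python
Tuple3 = tuple[str, str, str]  # (user, relation, object)


def plan_mutations(current: set[Tuple3], desired: set[Tuple3]) -> dict[str, list[Tuple3]]:
    """Compute the deletes and writes that move the graph from current to desired."""
    return {
        "deletes": sorted(current - desired),  # held now, but no longer permitted
        "writes": sorted(desired - current),   # newly permitted, not yet in the graph
    }


current = {
    ("agent:a1", "can_read", "account:1001"),
    ("agent:a1", "can_write", "account:1001"),
}
desired = {("agent:a1", "can_read", "account:1001")}
plan = plan_mutations(current, desired)
```

Here the plan contains a single delete for `can_write` and no writes, so the unchanged `can_read` tuple is never touched.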

Decision 3: Hybrid fast path and slow path rather than pure AI reasoning

The temptation when designing an AI-native architecture is to route everything through the reasoning layer. It is simpler to build and sidesteps the complexity of maintaining a rules cache.

The hybrid approach is harder to build but architecturally correct for three reasons. It is significantly cheaper at scale: the majority of PermissionSet events in a mature deployment will be served from the fast path at negligible inference cost. It is faster: the fast path completes in milliseconds, whereas the slow path involves inference latency. And it is more trustworthy: the fast path serves human-verified mutations rather than AI-generated ones, providing a stronger basis for authorization decisions that carry the highest compliance stakes.

Decision 4: Schema injection at runtime rather than fine-tuning

The most architecturally significant prompt engineering decision in this architecture is injecting the OpenFGA relationship schema into the Claude system prompt at runtime rather than attempting to fine-tune a model on a specific deployment’s authorization model.

Schema injection means the reasoning layer is immediately portable across different OpenFGA deployments with different relationship models. A different operator with a different schema gets the same architecture with a different system prompt. No retraining. No fine-tuning. No model management overhead. The reasoning layer adapts to the deployment it is operating in rather than requiring the deployment to conform to a pre-trained model. It also means Claude is constrained to produce only tuple types that exist in the schema, which is the primary defence against hallucinated tuple mutations.
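
Schema injection itself is a small amount of code. The Python sketch below is illustrative only: the prompt wording, the `<openfga_schema>` delimiter, and the example model are assumptions, not the system prompt published in the Gist.

```python
SYSTEM_TEMPLATE = (
    "You translate TMF 672 PermissionSet notification events into OpenFGA tuple mutations.\n"
    "Only propose tuples whose types and relations appear in the schema below.\n"
    "Return JSON with keys: mutations, justification, confidence, regulatory_flags.\n"
    "\n"
    "<openfga_schema>\n{schema}\n</openfga_schema>"
)


def build_system_prompt(schema_dsl: str) -> str:
    """Insert the deployment's authorization model into the prompt at runtime."""
    # str.replace is used instead of str.format so that braces or other
    # special characters inside the schema text cannot break the template.
    return SYSTEM_TEMPLATE.replace("{schema}", schema_dsl.strip())


# Hypothetical model for illustration; each deployment supplies its own.
example_schema = """
model
  schema 1.1
type user
type asset
  relations
    define can_manage: [user]
"""

prompt = build_system_prompt(example_schema)
```

Moving to a different operator means passing a different `schema_dsl` string. The template and the surrounding code stay the same.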

The Manual Proof of Concept

Rather than wait for the full code implementation to be production-quality before sharing anything, I have published a manual proof of concept as a GitHub Gist that anyone can run in approximately thirty-five minutes with no code, no infrastructure, and no cloud spend.

The Gist contains eight files: a README with TMF 672 v5.0.1 terminology reference and setup instructions, a Claude system prompt to copy directly into Claude.ai, four example TMF 672 PermissionSet notification event files, an OpenFGA authorization model for the Playground, and a step-by-step demo walkthrough with expected outputs.

The four event files use the correct TMF 672 v5.0.1 event types and payload structure: PermissionSetCreateEvent for the reseller grant scenario, PermissionSetDeleteEvent for the AI agent consent revocation scenario, PermissionSetChangeEvent for the credit limit restriction scenario, and PermissionSetDeleteEvent for the partner contract expiry scenario. Each event carries PermissionSpecification function and action fields. The AI agent scenario correctly models the agent as a SecurityPrincipal with referredType Resource, which is the precise TMF 672 v5.0.1 framing for autonomous software process actors.

The workflow is straightforward. Paste the system prompt into Claude.ai. Paste an event file as your next message. Claude returns the tuple mutations with reasoning that references the specific PermissionSpecification and a confidence score. Paste those mutations into the OpenFGA Playground at play.fga.dev. Run authorization checks before and after to observe how the authorization graph changes.

GitHub Gist

The Gist link [Readme.md] provides all manual demo resources.

This manual proof of concept demonstrates the slow path reasoning. What it defers to the full code implementation is the automated event ingestion, the Go orchestration layer with enrichment and TransitionKey construction, the fast path rules cache with Security Principal ID substitution at runtime, the AWS Bedrock integration, the Step Functions review gate, and the persistent OpenFGA store. The reasoning pattern, the system prompt design, and the tuple mutation outputs are identical between the manual proof of concept and the production architecture.

What the Full Production Implementation Requires

For practitioners considering this architecture in an enterprise context, here is an honest account of what the proof of concept defers and what a production implementation would need to address.

TMF 672 event stream integration: The proof of concept uses manual JSON inputs. A production implementation connects to the live TMF 672 PermissionSet event stream. The orchestration service subscribes to PermissionSetCreateEvent, PermissionSetChangeEvent, and PermissionSetDeleteEvent notifications. The event stream is fed by whatever upstream Manageable Asset state changes trigger PermissionSet mutations in the operator’s BSS landscape. The translation layer does not need to know or care about those upstream sources.

Event infrastructure: A production implementation replaces manual JSON inputs with a Kafka or AWS EventBridge consumer. The orchestration service and reasoning layer components are identical. The event ingestion mechanism changes.

Enrichment layer: The production orchestration service enriches each event before routing by reading the current OpenFGA tuples for the affected Security Principal and retrieving market and regulatory flags. The enrichment step is a precondition for TransitionKey construction and for meaningful Claude reasoning.

AWS Bedrock migration: The proof of concept uses Claude.ai directly. A production implementation uses AWS Bedrock with a VPC private endpoint so that Security Principal identity data never leaves your cloud security boundary. The system prompt and reasoning output structures are identical. Only the transport and authentication mechanism changes.

Persistent OpenFGA store: The proof of concept uses an in-memory OpenFGA instance that resets on container restart. A production implementation uses a PostgreSQL or MySQL backed store with appropriate backup and recovery procedures.

Confidence threshold calibration: The 0.70 confidence threshold in the proof of concept is a starting heuristic. A production implementation calibrates this empirically against a sample of real PermissionSet events, observing where Claude’s self-assessed confidence correlates with human reviewer decisions.
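
One simple way to approach that calibration is to measure, for each candidate threshold, how often reviewers approved the outputs at or above it. The Python sketch below uses made-up sample data purely to show the calculation. The function name and the `(confidence, approved)` sample format are assumptions.

```python
from typing import Optional


def approval_rate_at_or_above(samples: list[tuple[float, bool]], threshold: float) -> Optional[float]:
    """Fraction of reviewer approvals among outputs with confidence >= threshold.

    Returns None when no sample reaches the threshold.
    """
    selected = [approved for confidence, approved in samples if confidence >= threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Illustrative data only: (Claude's self-assessed confidence, reviewer approved?)
samples = [(0.95, True), (0.90, True), (0.75, False), (0.72, True), (0.60, False)]

for candidate in (0.70, 0.90):
    print(candidate, approval_rate_at_or_above(samples, candidate))
```

A threshold is worth trusting only where the approval rate above it stays high on a sample large enough to be meaningful. Five rows, as here, demonstrate the arithmetic and nothing more.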

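Batching for large mutation sets: OpenFGA caps the number of tuple operations accepted in a single write request. The limit is a server setting (100 by default in recent OpenFGA releases, and managed offerings may enforce a lower cap), so confirm the value for your deployment. Complex PermissionSet deletions that generate more mutations than the limit allows require batching with compensating transaction logic to handle partial failures.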
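
Batching with compensation can be sketched as follows. This is a minimal Python illustration under stated assumptions: `write_batch` is a stand-in for whatever client call performs one OpenFGA write request, `max_per_write` is whatever limit your deployment enforces, and the operation dictionary shape is hypothetical.

```python
def chunked(ops: list, size: int) -> list[list]:
    """Split ops into consecutive batches of at most size items."""
    return [ops[i:i + size] for i in range(0, len(ops), size)]


def invert(op: dict) -> dict:
    """Return the operation that undoes op (a write undoes a delete, and vice versa)."""
    return {"op": "delete" if op["op"] == "write" else "write", "tuple": op["tuple"]}


def apply_batched(ops: list[dict], write_batch, max_per_write: int) -> None:
    """Apply ops in batches; if a batch fails, undo the batches already applied."""
    applied = []
    try:
        for batch in chunked(ops, max_per_write):
            write_batch(batch)
            applied.append(batch)
    except Exception:
        # Compensate in reverse order, then re-raise so the caller sees the failure.
        for batch in reversed(applied):
            write_batch([invert(op) for op in reversed(batch)])
        raise
```

Each individual request remains atomic, but the sequence as a whole is not. A compensating write can itself fail, so a production version would need to persist the list of applied batches and retry or escalate rather than rely on in-memory state.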

Step Functions review gate: The proof of concept implements a simple confidence threshold gate. A production implementation replaces this with an AWS Step Functions state machine providing a proper approval workflow, timeout escalation, and audit logging.

What This Architecture Is and Is Not

I want to close the series with a precise statement of scope because the architecture has been described across four articles and precision matters.

This architecture is an authorization infrastructure pattern that uses AI reasoning for a specific, bounded function: translating TMF 672 PermissionSet notification events into OpenFGA tuple mutations when deterministic rules are insufficient. It is not a general-purpose AI agent deployment framework. It is not a replacement for existing IAM or identity management systems. It is not a claim that AI reasoning should be introduced into every authorization decision.

The AI layer earns its place in this architecture because it is doing something the system cannot do without it: reasoning about the authorization implications of novel PermissionSet events that no rules engine has anticipated, whether those events originated from a commercial contract expiry, a consent asset withdrawal, a billing threshold breach, or an IoT Manageable Asset changing operational state. Remove it and the system falls back to the manual processes described in Article 1. The same trouble tickets. The same propagation lag. The same authorization drift that AI agents inherit and amplify as SecurityPrincipals acting on stale PermissionSet state.

That is the precise scope. No more and no less.

The Broader Observation

The authorization gap described in this series is not a telco-specific problem. It is a telco-specific manifestation of a problem that exists in every enterprise deploying AI agents into complex, multi-role environments where SecurityPrincipal permissions change constantly and authorization state struggles to keep pace.

TMF 672’s PermissionSet model, a clean separation of what is permitted (PermissionSpecification), how permissions are grouped (PermissionSpecificationSet), and what has been granted to whom (PermissionSet), is an industry-curated answer to something genuinely complex. It is also a transferable pattern. Healthcare systems managing patient consent and clinician PermissionSets. Financial services platforms managing customer tier changes and advisor access grants. Public sector environments managing citizen service entitlements and case worker permission assignments.

Every industry has Manageable Assets whose state changes carry authorization intent. Every industry has Security Principals, both human and autonomous, whose PermissionSets need to reflect those changes in real time. Every industry has the same gap between when a permission changes and when the systems acting on that permission reflect the change.

The architecture described in this series closes that gap for telco. The pattern is transferable everywhere the gap exists.

The authorization gap is everywhere. The pattern for closing it is here.

A Final Question for the Series

Over the past four weeks the closing question in each article has generated the most substantive conversations in the comments. This is the last one.

If you are designing or evaluating authorization architectures for AI agent deployments in your own environment, what is the specific constraint making it hardest to solve? The data boundary question, the explainability requirement, the manual PermissionSet authoring bottleneck, Manageable Asset event integration complexity, or something else entirely?

I am building the full code implementation and the answers to that question will directly inform what I prioritise and what I document when I share it.

The comment section is open. So is my inbox.

Soumit Saha is a Digital Platform and Technical Architect with 25 years of experience in telco, cloud, and enterprise integration. He has led the adoption of TM Forum Open APIs across multiple markets and holds TOGAF 9, ODA Practitioner, and AWS Cloud Architect certifications. This series represents his personal architectural thinking and does not reflect the views or systems of any employer.

The build continues. The manual proof of concept is live now. The full code implementation follows. Follow along for updates.

Originally published at https://www.linkedin.com.


The Authorization Gap, Closed: A Practitioner’s Blueprint for TMF 672, OpenFGA, and Claude as an… was originally published in System Weakness on Medium.