Your catalog vendor says "we have lineage." Your data platform team says "we have dbt lineage." And your supervisors keep flagging gaps.
They're all telling the truth. The problem is that "lineage" means three different things, and most banks have only implemented one or two of them. Supervisors on both sides of the Atlantic are asking for all three, connected, at the field level, from source system to submitted figure. Until that chain is complete, every supervisory review will find the same hole.
What supervisors are actually asking for
The word "lineage" doesn't appear in the original BCBS 239 text. But the principles make it structurally necessary. Principle 2 requires architecture that "fully supports risk data aggregation capabilities" (p. 9). Principle 3 requires aggregation on a "largely automated basis so as to minimise the probability of errors" (p. 12). Principle 4 requires the ability to "capture and aggregate all material risk data across the banking group" (p. 13). Principle 7 requires that "risk management reports are reconciled and validated" (p. 16). You cannot satisfy any of those without being able to trace a value from where it was created to where it was reported.
US regulatory basis
US regulators don't use the word "lineage" either, but the infrastructure requirements are explicit.
The OCC Heightened Standards (12 CFR Part 30, Appendix D, §II.J.1) require covered banks to maintain "data architecture and information technology infrastructure that support the covered bank's risk aggregation and reporting needs during normal times and during times of stress." You cannot support risk aggregation without knowing where data comes from and how it transforms. That's lineage infrastructure, stated as a binding obligation.
The Federal Reserve's SR 11-7 on model risk management reinforces the same expectation from a different angle. Models require documented input lineage, transformation controls, and quality thresholds as part of validation. If you can't trace the inputs to a model back to their sources through every enrichment, join, and aggregation, you can't validate the model. SR 11-7 makes that a supervisory expectation for every model in the inventory.
What happens when that infrastructure doesn't exist? In March 2024, the OCC fined JPMorgan Chase $348.2 million for failing to surveil trading activity across more than 30 global venues. The core failure was aggregation: the bank couldn't consolidate and trace data flows across its trading systems. That's a lineage problem dressed up as a surveillance finding. When you can't connect data from point of origin to point of use across systems, you can't surveil, you can't aggregate, and you can't report.
US law requires the infrastructure. The ECB tells you exactly what it should look like. Together they define the standard.
The ECB's detailed articulation
The ECB has published the most detailed public articulation of what lineage should look like in practice, and these expectations inform supervisory thinking globally, including in the US. The ECB's 2024 RDARR supervisory guide stopped being implicit about it:
"A consolidated IT infrastructure is essential for consistent data aggregation, eliminating manual adjustments, and ensuring comprehensive data lineage." (p. 16)
"A complete and granular lineage needs to be ensured." (p. 16)
"A common data architecture supports the creation of end-to-end data lineage." (p. 17)
And the granularity expectation is specific: lineage should exist at "data attribute level" (ECB RDARR Guide, p. 16). Not table-to-table arrows. Not process diagrams. Individual fields, traceable through every transformation.
By the February 2025 supervisory newsletter, the ECB was cataloguing what they were still finding: "data lineage and data taxonomies insufficient," banks "too reliant on weakly controlled manual workarounds," and "many frameworks do not cover the entire data aggregation process from data capture to final reporting."
From data capture to final reporting is the scope statement. It doesn't start at the warehouse. It starts at the source system.
Industry analysis reinforces the trajectory. As SAVE Consulting Group summarized in their coverage of the January 2026 ECB update, "prudential compliance is no longer a downstream verification on reporting, but a significant assessment of the processes that generate regulatory information." Banks must "document the procedure that leads from the original data to the submitted figure."
That's the practical definition of end-to-end lineage in a BCBS 239 context. Whether you're responding to the OCC, the Fed, or the ECB, the question is the same: can you trace any reported value from where it was first created to where it was submitted? And that's why "we have lineage" keeps not being enough.
Three tiers of lineage
Lineage is often discussed as a single capability. In practice, it's three distinct tiers, each covering a different segment of the data path. They get conflated in vendor messaging constantly, which is how banks end up with coverage metrics that look strong and supervisory findings that say otherwise.
Tier 1: Table and storage lineage. This is metadata-based, table-to-table lineage, the kind most catalogs and warehouse tools provide out of the box. Dataset A feeds dataset B, inferred from ETL orchestration metadata or warehouse dependency graphs. It's useful for ownership, discovery, and high-level impact analysis. It is also coarse-grained. It doesn't answer field-level derivation questions, and it says nothing about what happened to a value before it arrived in the platform.
Tier 2: Database-level column lineage. This tier parses transformations inside the data platform: SQL statements, dbt models, Spark jobs. Atlan is a representative example. It traces which source columns feed which output columns, where filters and joins and aggregations were applied, and how reporting marts derive their values. This is genuinely valuable for understanding aggregation logic and transformation traceability within the platform.
But it starts after data enters the platform. It parses SQL, dbt, and Spark, the languages of the data warehouse and analytics layer. It cannot see into the application code of the source systems that generated the data in the first place. If a field was transformed, defaulted, enriched, or truncated before it landed in Kafka or a CDC stream or a staging table, database-level lineage sees only the post-ingestion state. The first leg of the journey, source system to platform boundary, is invisible.
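To make the tier concrete, here's a minimal sketch of what a column-lineage extractor produces: a map from each output column of a transformation to the qualified source columns it derives from. The model, table, and column names are hypothetical, and real tools parse full SQL or dbt models rather than bare expressions:

```python
import re

# Toy column-lineage extractor for a hypothetical reporting-mart model.
# Each output column maps to the expression that computes it; lineage is
# the set of qualified "table.column" references in that expression.
MODEL = {
    "exposure_usd": "trades.notional * fx_rates.usd_rate",
    "desk":         "trades.desk_code",
    "as_of_date":   "fx_rates.rate_date",
}

def column_lineage(model: dict[str, str]) -> dict[str, set[str]]:
    qualified = re.compile(r"\b([a-z_]+\.[a-z_]+)\b")
    return {out: set(qualified.findall(expr)) for out, expr in model.items()}

lineage = column_lineage(MODEL)
# "exposure_usd" traces to trades.notional and fx_rates.usd_rate:
print(lineage["exposure_usd"])
```

The point of the sketch is the output shape, not the parsing: attribute-level lineage within the platform is a map from output fields to source fields. What no amount of SQL parsing can add to that map is anything that happened to `trades.notional` before it reached the `trades` table.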
Tier 3: Source-code-level lineage. This tier parses the application source code that produces data at the point of origin: the Java services, Kotlin microservices, Go event producers, and Python applications running in core banking, origination, payments, and trading systems. It captures the first leg: how data is created, serialized, mapped, and published from producer logic to platform ingress. Because it's tied to version control and deployment workflows, it can detect breaking changes at the point a developer proposes them, before they reach production. Gable operates in this tier.
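A toy illustration of the difference: a source-code lineage tool inspects producer application code statically, before it ever runs, to learn which fields a service publishes. The `producer.send` API, the event payload, and the field names below are all hypothetical; real tools do this across Java, Kotlin, Go, and Python codebases:

```python
import ast

# Hypothetical producer code, as it might appear in a trading service.
# A source-code lineage tool reads this text from version control; it
# never needs to execute the service.
PRODUCER_SOURCE = '''
def publish_trade(producer, trade):
    payload = {
        "trade_id": trade.id,
        "notional": trade.notional,
        "desk_code": trade.desk,   # renamed from "desk" in this version
    }
    producer.send("trades.v2", payload)
'''

def published_fields(source: str) -> set[str]:
    """Statically extract the string keys of any dict literal the producer builds."""
    fields: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            fields |= {k.value for k in node.keys if isinstance(k, ast.Constant)}
    return fields

# The fields this service publishes, discovered without executing it:
print(published_fields(PRODUCER_SOURCE))
```

Because the extraction runs against source text, it can run at the pull request that introduces the `desk` to `desk_code` rename, which is exactly when database-level lineage still sees the old world.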
These tiers aren't competing approaches. They're layers in a stack. Table lineage gives a broad map. Database-level lineage explains in-platform field derivations. Source-code lineage explains where values originate and how producer-side changes alter downstream meaning. Together, they produce what supervisors, from the OCC to the ECB, are asking for: end-to-end, attribute-level lineage from original data to submitted figure. Missing any tier leaves a gap in the chain.
Why database-level lineage alone doesn't close the loop
If you've invested in database-level column lineage (and you should have), you've addressed the middle and downstream segments of the data path. That's significant. But supervisors are still finding gaps, because the upstream segment remains uncovered.
The core limitation is architectural, not a product shortcoming. SQL parsers, dbt model analyzers, and Spark lineage extractors work on the artifacts of the data platform. They answer the question "how did this warehouse column get computed?" They cannot answer "where did the input value actually come from in the source system, and has anything about its meaning changed?"
That matters in several concrete ways.
First, it means you can't fully trace to original data. When the ECB says lineage must run from "data capture to final reporting," they mean from the application that first recorded the value. When the OCC requires infrastructure that supports risk aggregation, they mean the full path, not just the warehouse segment. If the source system is a Java service that serializes a trade event into Kafka, the lineage needs to include that serialization logic. Database-level lineage picks up the story only after the event lands in a staging table.
Second, source-system schema changes are invisible until they've propagated. A developer renames a field, changes a type, or alters an enum in a Go service. Database-level lineage might detect breakage downstream after the change reaches the warehouse, but it can't catch it at the pull request where the change was introduced. By then, the damage is in production. The JPMorgan surveillance failure illustrates the systemic version of this problem: when data flows across dozens of systems without traceable connections, individual field-level changes compound into aggregate blind spots.
Third, producer-consumer contracts defined by application behavior aren't enforceable from the database layer. If the contract between a source service and the data platform is implicit (encoded in serialization logic, field mapping, and schema assumptions), SQL lineage has no mechanism to detect violations. The violations are introduced in service code, not in warehouse SQL.
Fourth, change-time controls are weaker. Principle 3 of BCBS 239 emphasizes automation to minimize error probability. Controls that run only after ingestion are later and noisier than controls embedded in CI/CD where the producing team proposes a change. Catching a breaking schema change in a pull request is fundamentally different from detecting downstream data quality anomalies days later.
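A minimal sketch of such a change-time gate, with hypothetical field names and contract format: the check runs in CI against the schema a pull request proposes, and any violation fails the build before the change can reach production:

```python
# Toy CI gate: compare the schema a pull request proposes against the
# fields downstream consumers depend on. Contract shape and field names
# are illustrative assumptions, not any specific product's format.
REGISTERED_CONTRACT = {"trade_id": "string", "notional": "decimal", "desk": "string"}

def check_change(proposed: dict[str, str]) -> list[str]:
    """Return violations of the registered contract; empty list means the gate passes."""
    violations = []
    for field, ftype in REGISTERED_CONTRACT.items():
        if field not in proposed:
            violations.append(f"removed contracted field: {field}")
        elif proposed[field] != ftype:
            violations.append(f"type change on {field}: {ftype} -> {proposed[field]}")
    return violations

# A developer renames "desk" to "desk_code" in a pull request:
errors = check_change({"trade_id": "string", "notional": "decimal", "desk_code": "string"})
print(errors)  # the gate reports the removed field and blocks the merge
```

The contrast with post-ingestion controls is timing: here the violation is attributed to a specific proposed change and a specific owning team, rather than surfacing later as an anomaly in a downstream quality check.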
Fifth, triage under stress depends on tribal knowledge. Principle 6 requires reports to be "adaptable to the needs of its recipients… including requests during stress/crisis situations" (BCBS 239, p. 15). When a supervisor asks for rapid tracing under stress, teams need visibility from the reporting output back through platform transformations and all the way to source capture logic. Without source visibility, that investigation depends on engineers who happen to know the system. That's exactly the kind of "weakly controlled manual workaround" the ECB keeps flagging.
None of this makes database-level lineage less important. It makes it insufficient on its own. The gap between "we have lineage in our data platform" and "we can trace any reported attribute back to its point of origin" is exactly where supervisory findings land.
What source-code lineage adds to compliance outcomes
Source-code lineage isn't a theoretical refinement. It directly supports the compliance metrics supervisors are measuring, whether you're responding to OCC Heightened Standards, Fed SR 11-7 model validation expectations, or ECB RDARR findings.
Contract discipline improves data quality indicators. When producer schemas and data contracts are validated at change time, in CI/CD before deployment, common DQI failures get prevented earlier: unexpected nulls, type drift, semantic field misuse, undocumented enum growth. Principle 7's reconciliation and validation outcomes improve because there's less unexplained variance between expected and observed values at the reporting layer.
Investigation cycles shorten. Linking a changed field to a specific commit, service, and owning team means "where did this number change?" becomes a lookup instead of a multi-day investigation. That directly supports Principle 6 adaptability, both for ad hoc supervisory requests and for stress-period reporting where hours matter. Under SR 11-7, it also means model input tracing becomes a queryable path instead of an archaeological expedition.
End-to-end coverage becomes auditable. Source-code lineage combined with database-level lineage produces traceable evidence across the full path: capture, transformation, aggregation, reporting. That's the chain the ECB describes and the infrastructure the OCC requires, and it's testable. Not a slide deck claim, but a queryable graph with version history.
Automation evidence gets concrete. Automated checks in CI/CD (policy-as-code gates, pull request validations, deployment logs) produce objective control artifacts that are straightforward to present during supervisory reviews. They demonstrate that controls are embedded in engineering workflows, not bolted on after the fact. For Principle 3, that's the difference between claiming automation and proving it.
This connects directly to the broader enforcement trajectory Claudia Buch and Elizabeth McCaul described when they wrote that "adequate RDARR capabilities… are still the exception." The banks that move from exception to standard will be the ones that can show the full chain, not just the middle of it.
Closing the gap: what to evaluate
If your institution has table lineage and database-level column lineage in place, the remaining gap is the first leg: source systems to platform boundary. Here's what matters when evaluating how to close it.
Application code parsing, not only SQL parsing. The solution needs to parse producer code in the languages your source systems actually use (Java, Kotlin, Go, Python), not just SQL, dbt, and Spark. Those are different tiers solving different problems.
First-leg coverage. Confirm it captures source system behavior through ingestion interfaces so lineage starts at data capture, not at warehouse landing. This is the specific gap the ECB's "data capture to final reporting" language describes, and the infrastructure gap the OCC Heightened Standards require you to close.
Version control and CI/CD integration. Controls should run where changes are proposed, in pull requests and deployment pipelines, so schema drift and contract violations are detected before release, not after production impact.
Data contract enforcement. Look for enforceable producer-consumer contracts with compatibility policies, break detection, and ownership routing. The implicit contracts encoded in serialization logic need to become explicit and testable.
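As a sketch of what an enforceable compatibility policy can look like (the policy rules and schema shape here are illustrative assumptions): removing or retyping a contracted field is breaking, while adding a new field is not:

```python
from enum import Enum

class Verdict(Enum):
    COMPATIBLE = "compatible"
    BREAKING = "breaking"

# Toy backward-compatibility policy: every field the old schema promised
# must survive with the same type; additions are allowed. Real policies
# also distinguish optional fields, widenings, and deprecation windows.
def classify(old: dict[str, str], new: dict[str, str]) -> Verdict:
    for field, ftype in old.items():
        if new.get(field) != ftype:
            return Verdict.BREAKING
    return Verdict.COMPATIBLE  # additions only

result = classify(
    {"trade_id": "string", "desk": "string"},
    {"trade_id": "string", "desk_code": "string"},  # a rename is a remove plus an add
)
print(result)  # Verdict.BREAKING
```

Once the policy is code, "is this change safe for consumers?" has a deterministic answer that can be attached to the pull request, which is what makes the contract explicit and testable rather than implicit in serialization logic.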
Interoperability with existing lineage. The goal is not to replace your database-level lineage investment. The source-code tier should complement it, connecting the origin story to the transformation story to produce a complete chain.
Attribute-level trace paths. Supervisors expect field-level granularity. The solution should provide individual field lineage from source through every transformation to reporting output.
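One way to picture that expectation, with hypothetical system and field names: if lineage is a graph whose nodes are individual fields, then tracing a reported figure back to its points of origin is a graph walk rather than an investigation:

```python
# Toy attribute-level lineage graph. Each node is "system.field"; edges
# point upstream to the fields it derives from. The bottom nodes come from
# the source-code tier, the middle ones from database-level lineage.
UPSTREAM = {
    "report.credit_exposure": ["mart.exposure_usd"],
    "mart.exposure_usd": ["staging.notional", "staging.usd_rate"],
    "staging.notional": ["trading_svc.Trade.notional"],  # source-code tier
    "staging.usd_rate": ["fx_svc.Rate.usd"],             # source-code tier
}

def trace_to_origin(attribute: str) -> set[str]:
    """Walk upstream edges until fields with no parents: the points of origin."""
    parents = UPSTREAM.get(attribute, [])
    if not parents:
        return {attribute}
    origins: set[str] = set()
    for parent in parents:
        origins |= trace_to_origin(parent)
    return origins

# A reported figure resolves to the application fields that created it:
print(trace_to_origin("report.credit_exposure"))
```

When the graph spans all three tiers, the supervisory question "where does this reported value come from?" becomes a query with a version-controlled answer.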
Audit-ready artifacts. Change history, policy decisions, lineage graphs, and contract validation logs should be exportable for internal audit and supervisory dialogue. The evidence needs to survive examination, not just satisfy an internal checklist.
The target architecture is straightforward: keep database-level lineage for transformation and reporting logic, add source-code lineage to close the origin gap, and connect them. That combined model aligns directly with what BCBS 239 principles require and what supervisors from Washington to Frankfurt are actively examining.