The Last Auditor: How Foundation Models Will Audit ESG From Orbit. The Climate Brief

The official theory of ESG measurement is that companies tell us what they emit, what land they hold, what supply chains they depend on, what climate risks they run, and a system of disclosure standards and third-party audit verifies the telling. Trust, sample-check, aggregate, score. CSRD, TCFD, SASB, the GHG Protocol, the Science Based Targets initiative: each is a different scaffolding around the same load-bearing assumption, which is that the company is the unit of measurement and the disclosure is the data.

The official theory is becoming an antique. Not in five years. Now.

Through 2024, 2025, and into 2026, a wave of geospatial foundation models has produced a measurement infrastructure for the physical world that does not depend on disclosure. NASA and IBM shipped Prithvi-EO-2.0 in December 2024. IBM and ESA shipped TerraMind on Earth Day 2025. Google DeepMind shipped AlphaEarth Foundations in July 2025, with 1.4 trillion 10-metre embeddings of the planet per year, sitting free on Google Earth Engine. The Allen Institute shipped OlmoEarth in November 2025. Before 2024 the release cadence was roughly one or two models per year. By 2026 it is closer to one per week.

These releases did not coordinate. They are about to be the same story. The story is that the auditor of last resort for ESG measurement is moving from a human profession to a satellite layer, and the firms that build against the new layer first will set the methodology that the rest of the industry adopts.

This piece is original analysis of public material. I read the model release announcements and technical papers from NASA, IBM, Google DeepMind, ESA, and the Allen Institute. I read the community benchmark debates, including the increasingly common observation that Earth observation embeddings are having their "pre-MTEB moment" where every model claims state of the art on its own preferred suite. I did not consult sector specialists on background. The wave is documented enough that the structural argument can be made from public material alone.

Here is the case for treating the wave as the new last auditor for ESG, in five moves.

1. The cadence shift is the story, not any single model.
2. The architectures differ; the output structure does not, and that is what matters.
3. ESG measurement gains three things historically denied to it: independence from disclosure, continuous global coverage, asset-level resolution.
4. What the wave does not yet solve: benchmark fragmentation, planetary-scale storage cost, and a regulatory framework that has not caught up.
5. The capital implication: satellite-derived verification is entering ESG analytical stacks faster than internal infrastructures expect, and the methodology that ships first will become the default.

We take them in turn.

Part I. The cadence is the story

When NASA, IBM, and Jülich shipped Prithvi-EO-2.0 in December 2024 at 600 million parameters, the announcement read as a normal release. Six times bigger than the 2023 version. An eight-point improvement on GEO-Bench. Open source on Hugging Face. A step in a steady technical maturation.

The casual reader of model announcements stopped there.

The casual reader missed that the following year saw Google DeepMind ship AlphaEarth Foundations with global annual embeddings from 2017 to 2024. The casual reader missed Clay, Satlas, SatCLIP, DOFA, Galileo, SatMAE, Presto, ScaleMAE, and SpectralGPT, each released through the same window. April 2025 brought TerraMind, an any-to-any multimodal model from IBM and ESA that outperformed twelve existing models on PANGAEA by eight per cent or more. November 2025 brought OlmoEarth from the Allen Institute, four open model sizes shipping with a platform. The casual reader, reading the announcements one at a time, saw a normal year for Earth observation research.

The aggregate is not normal. The aggregate is that the cadence of foundation-model release went from one or two per year before 2024, to monthly through 2025, to weekly across academic and commercial labs in 2026. A community tracker in May 2026 captured the moment with the observation that Earth observation embeddings are having their "pre-MTEB moment", a reference to the 2022 text embedding benchmark that finally let buyers compare text models head to head. The wave is real. The labs behind it know it is real. The downstream industry has not yet caught up.

For ESG, the cadence matters because each model that ships expands the surface of what the satellite layer can measure. A single high-quality global product, like AlphaEarth's annual embeddings or Prithvi's HLS-based time series, is enough to host a generation of downstream tools. Half a dozen is enough to make satellite-derived measurement a routine input to any analytical workflow.

That is the shift the casual reader is missing.

Part II. The output is what matters, not the architecture

The wave's models do not agree on much architecturally. AlphaEarth produces 64-dimensional embeddings as annual composites at 10-metre resolution. Prithvi ingests Harmonized Landsat-Sentinel time series at 30 metres. TerraMind generates across nine modalities and explicitly trains for any-to-any translation between them. TESSERA ships 128-dimensional embeddings; some research models reach 512.

What they agree on is the output. Each produces a dense per-pixel embedding of the Earth's surface that captures visual content and contextual relationships. Each is queryable for downstream tasks without retraining. Each removes the historical step of building a custom Earth observation pipeline for every question.

This is a coordination point that did not exist three years ago. Before this wave, a question like "which of the 400 facilities in this fund's portfolio is emitting methane at a rate inconsistent with its reported intensity" required either a custom contract with a remote-sensing firm or a partnership with one of a handful of specialist platforms. After this wave, the same question is a query against an embedding archive that NASA, IBM, Google, or the Allen Institute has already populated and made available. The cost curve has bent by an order of magnitude, possibly two.

A community survey of the embedding products in early 2026 catalogued the convergence: AlphaEarth, TESSERA, Clay, Major-TOM, and MOSAIKS each ship at a per-pixel grain, each work with k-nearest-neighbour or linear-probe queries, and each can be plugged into an analytical task in days rather than the years that traditional Earth observation pipelines required. The format fragmentation underneath (Cloud Optimized GeoTIFF, GeoParquet, raw NumPy) is real. The analytical convergence on top of it is more striking.
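The k-nearest-neighbour query pattern can be sketched in a few lines. This is a minimal illustration, not any product's API: the embeddings are invented 4-dimensional toy vectors (the products above ship 64 to 512 dimensions), and the names `cosine`, `knn_label`, and `LABELLED` and the two land-cover labels are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_label(query, labelled, k=3):
    """Label a query embedding by majority vote among its k most similar labelled pixels."""
    ranked = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled pixels as (embedding, label). All values are invented for illustration.
LABELLED = [
    ([0.9, 0.1, 0.0, 0.1], "forest"),
    ([0.8, 0.2, 0.1, 0.0], "forest"),
    ([0.7, 0.3, 0.0, 0.2], "forest"),
    ([0.1, 0.9, 0.2, 0.0], "cleared"),
    ([0.0, 0.8, 0.3, 0.1], "cleared"),
    ([0.2, 0.7, 0.1, 0.1], "cleared"),
]

# An unlabelled pixel is classified without training any model.
print(knn_label([0.85, 0.15, 0.05, 0.05], LABELLED, k=3))
```

The point of the sketch is that the query needs no training step: a handful of labelled pixels and a similarity function are enough to get a first answer, which is why the per-pixel embedding format matters more than the architecture that produced it.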

The MRV industry that grew up around carbon credits in the 2010s was notoriously bespoke. Every verifier built its own pipelines. The foundation-model wave makes the bespoke pipeline economically obsolete. The verifiers who keep building their own pipelines from scratch in 2027 will be losing to verifiers who run the same queries against the open archives in an afternoon.

That is the shift the auditor of last resort is going to make.

Part III. What the disclosure system cannot do, the satellite layer can

Three structural shifts matter for anyone who allocates capital against ESG signals, regulates the disclosure regime, or operates in a sector where physical-world climate signals are material.

Independence from disclosure. The historic ESG measurement chain is disclosure-led. Company A reports figure X. Auditor B verifies that A's reporting follows the protocol. Analyst C aggregates A's figures, as verified by B, into a portfolio signal. The chain has known weaknesses at every step. Disclosure is partial; methodology choice is discretionary; audit coverage is sample-based. Satellite-derived verification of physical signals (methane plumes, deforestation, land use change, surface temperature anomalies, water stress) does not depend on the company being willing or able to disclose. The signal is observed directly.

This is not novel in principle. Climate TRACE has been producing emissions inventories from satellite and sensor data since 2020. Carbon Mapper, MethaneSAT, and GHGSat have demonstrated the principle on methane at high cost per facility. What is new is the scale and the unit economics. A foundation-model pipeline can run a methane detection model across every facility visible from orbit, every week, without the per-facility cost structure that has historically made independent verification economical only for high-priority targets. The marginal facility is now close to free.

Continuous global coverage. Sample-based audit cannot give a fund manager a confident answer to the question "is any one of my 1,200 portfolio companies misstating Scope 1 emissions by more than 10 per cent". There is no resource model that audits 1,200 companies a year at that depth. A foundation-model-derived signal layer can. The unit of analysis stops being "the audited sample" and becomes "the population of physical facilities visible from orbit". This is the same structural shift that the move from end-of-day prices to intraday data brought to finance in the 1990s. The change in granularity is the change in the industry.
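The 10 per cent question reduces to a population-wide comparison of reported against observed figures. A minimal sketch follows, assuming an independent satellite-derived estimate already exists for each company; the function name `misstatement_flags`, the company names, and all numbers are hypothetical, and the sketch ignores the uncertainty that any real observed estimate would carry.

```python
def misstatement_flags(reported, observed, tolerance=0.10):
    """Return {company: relative gap} where observed differs from reported by more than tolerance."""
    flags = {}
    for company, rep in reported.items():
        obs = observed.get(company)
        if obs is None or rep <= 0:
            continue  # no independent estimate, or no usable reported baseline
        gap = (obs - rep) / rep
        if abs(gap) > tolerance:
            flags[company] = gap
    return flags


# Hypothetical figures in tonnes CO2e; invented for illustration only.
REPORTED = {"AlphaCement": 100.0, "BetaSteel": 200.0, "GammaPalm": 50.0}
OBSERVED = {"AlphaCement": 104.0, "BetaSteel": 260.0}  # no estimate for GammaPalm

FLAGS = misstatement_flags(REPORTED, OBSERVED)
for company, gap in FLAGS.items():
    print(f"{company}: observed differs from reported by {gap:+.0%}")
```

The loop runs over the whole population rather than a sample, which is the structural difference from sample-based audit; a production version would compare confidence intervals rather than point estimates.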

Asset-level resolution. ESG measurement has historically operated at the company level because the disclosure operates at the company level. The satellite layer operates at the asset level. This specific cement plant. This specific palm-oil concession. This specific tailings dam. A fund's exposure to climate risk is the sum of the asset-level signals at the facilities it owns, not the aggregated disclosure of the entity that owns them. Foundation models make asset-level aggregation tractable at portfolio scale for the first time.
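Asset-level aggregation can be sketched as a weighted sum over facilities. The weighting by ownership stake is an assumption of this sketch, not something the frameworks above prescribe; the `Asset` type, the `risk_signal` score, and every name and number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    owner: str
    risk_signal: float  # hypothetical satellite-derived score for this facility


def fund_exposure(assets, stakes):
    """Sum asset-level signals, each weighted by the fund's stake in the asset's owner."""
    return sum(stakes.get(a.owner, 0.0) * a.risk_signal for a in assets)


# Invented facilities and scores, for illustration only.
ASSETS = [
    Asset("cement plant north", "CemCo", 0.8),
    Asset("cement plant south", "CemCo", 0.2),
    Asset("palm-oil concession", "PalmCo", 0.5),
    Asset("tailings dam", "MineCo", 0.9),
]

# Fraction of each owner held by the fund (assumed). MineCo is not held.
STAKES = {"CemCo": 0.5, "PalmCo": 0.25}

EXPOSURE = fund_exposure(ASSETS, STAKES)
print(f"portfolio exposure: {EXPOSURE:.3f}")
```

The two CemCo plants contribute separately here, which a company-level disclosure would average away.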

Independent. Continuous. Asset-level. Three structural shifts, none of which the disclosure system was built to deliver, all of which the foundation-model wave now delivers. That is what the last auditor looks like.

Part IV. What the wave does not yet solve

The wave is not a settled system. Three caveats matter for anyone planning to build against it now.

Benchmark fragmentation. Each model release benchmarks against its own preferred task suite. AlphaEarth wins on the AlphaEarth suite. Prithvi wins on the Prithvi suite. TerraMind wins on PANGAEA. OlmoEarth wins on the Allen Institute's preferred set. GEO-Bench and PANGAEA are improvements on what came before, but the community-tracker observation that "anyone telling you their EO foundation model is state of the art is selling you a benchmark, not an embedding" remains correct in mid-2026. Until an MTEB-style shared scoreboard exists, model selection for any specific ESG task is empirical work, not a matter of reading the release announcement.

Compute and storage at planetary scale. Storing the TESSERA encoder embeddings globally at float32 takes roughly 3.1 petabytes, about US$847,000 at standard cloud rates. AlphaEarth's int8 quantisation cuts this by an order of magnitude. The compute economics work for individual queries against pre-computed embeddings. They do not work for any organisation that wants to materialise its own embedding archive at planetary scale. For the next few years, the major archives will sit with the public-good releasers (Google Earth Engine, Hugging Face, the Allen Institute), and the analytical layer will be the commercial opportunity.
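The storage arithmetic can be checked on the back of an envelope. The text does not state the rate or billing period behind the US$847,000 figure, so the sketch below assumes a list price of US$0.023 per GB-month and a one-year period; both are assumptions, as is the premise that the two products cover the same pixel grid. The dimensions (128 for TESSERA, 64 for AlphaEarth) are the ones quoted earlier in this piece.

```python
GB_PER_PB = 1_000_000  # decimal petabyte


def annual_storage_cost(petabytes, usd_per_gb_month):
    """Yearly cost of keeping `petabytes` in object storage at a flat per-GB monthly rate."""
    return petabytes * GB_PER_PB * usd_per_gb_month * 12


def bytes_per_pixel(dims, bytes_per_value):
    """Storage footprint of one per-pixel embedding."""
    return dims * bytes_per_value


ASSUMED_RATE = 0.023  # USD per GB-month; an assumption, not a quoted price

tessera_bytes = bytes_per_pixel(128, 4)    # 128 dims at float32
alphaearth_bytes = bytes_per_pixel(64, 1)  # 64 dims at int8
ratio = tessera_bytes / alphaearth_bytes

cost = annual_storage_cost(3.1, ASSUMED_RATE)
print(f"per-pixel footprint ratio: {ratio:.0f}x")
print(f"3.1 PB for one year at the assumed rate: US${cost:,.0f}")
```

Under these assumptions the yearly bill lands within about one per cent of the quoted figure, and the per-pixel footprint differs by a factor of eight, which is consistent with the "order of magnitude" description: a factor of four from quantisation and a factor of two from the smaller embedding dimension.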

Regulatory framework lag. The CSRD does not specify how satellite-derived verification fits into the assurance chain. The Science Based Targets initiative does not yet recognise satellite-derived emissions estimates as audit-grade inputs. The CDP scoring methodology weighs disclosure quality, not the divergence between disclosure and independent observation. Until the frameworks catch up, satellite-derived signals will sit alongside the disclosure-based chain rather than displacing it. The catch-up is starting. It is not yet visible in the published methodologies.

For an allocator building against the wave today, the practical implication is that satellite-derived signals are an analytical augmentation, not yet a compliance substitute. The substitution will come. It is two or three years out, not zero.

Part V. The capital implication

The auditor of last resort changes who reads the signal first.

For capital allocators, the implication is that satellite-derived signals are entering ESG analytical stacks faster than most internal infrastructures expect. The funds that build a foundation-model query layer into their ESG workflows in 2026 and 2027 will be reading the signals as their portfolios produce them. The funds that wait for the established ratings agencies to integrate the layer will be reading signals filtered through the ratings agencies' methodological choices, which will lag the underlying data by years. The methodology that ships first becomes the default, and the firm that ships the methodology becomes the citation.

For regulators, the question is not whether satellite-derived verification will become a meaningful input to disclosure regimes. It is whether the regulator integrates it as a first-class signal alongside corporate disclosure (the European instinct, visible in the Copernicus programme and in ESA's TerraMind co-development) or treats it as an external check on disclosure (the historic US instinct, less visible in the SEC's climate rule). The choice has implications for who controls the verification methodology over the next decade. Whoever ships the first regulatory-grade satellite-derived emissions verification standard will set the global benchmark by default.

For climate-tech operators, the wave is creating a flurry of derivative opportunities. The bespoke MRV businesses are commoditising. The new opportunity is the analytical layer on top. Tools that take the embedding archives and translate them into investment-decision-useful signals (asset-level climate risk, supply-chain physical exposure, transition-pathway compliance) are starting to ship as software products with venture-scale revenue trajectories. The velocity of competition at this layer is high enough that the firm with the best foundation-model-integration story in 2027 will not be the firm anyone is paying attention to in 2026.

Implication

The compromise that produced modern ESG was structural. The unit of measurement was the company because the data of measurement was the company's disclosure. The foundation-model wave of 2025 and 2026 changes the available data. The unit of measurement does not have to be the company any more. The data can be the planet itself, observed continuously, queried per asset, independent of what any company chooses to say.

That does not retire the disclosure system. Materiality framing, governance signals, forward-looking transition plans, Scope 3 supply chain pathways: these still require what the company knows about itself. The satellite layer does not displace them. It surrounds them. The disclosure stays. The verification becomes harder to game.

For a capital allocator, the right posture today is to assume that within the next two to four years, every serious ESG analytical workflow will integrate a satellite-derived independent signal layer alongside disclosure-based inputs. The foundation models that make this possible are open. The cost curve is bending in the right direction. The regulatory uptake is starting in Europe. The firms that build the integration first will define the methodology that the rest of the market follows.

The auditor of last resort for ESG measurement is moving from a human profession to a satellite layer. The casual reader of foundation-model announcements has not noticed yet. The market will notice soon enough.
