Reference

Methodology & sources

Every number on this site comes from an open federal dataset. This page lists the source for each chapter, what we did to it, the readiness-score formula, and the limitations a careful reader should keep in mind.

Sources

Dataset	Used in	Vintage	What we do with it
EIA-860 · annual	Retirement schedule, atlas, time machine	2022, 2023	Generator-level inventory. We split the "Retired and Canceled" sheet by EIA status code (RE → retired; CN / IP → canceled / postponed).
EIA-860M · monthly	Time machine, retirement schedule headline, atlas	Every month, Apr 2016 → latest	Same shape as EIA-860 but updated monthly. We use exact sheet-name routing (`Operating`, `Planned`, `Retired`, `Canceled or Postponed`) — the `_PR` proposed-retirement sheets are intentionally skipped because they duplicate rows that already exist on the base sheets with the retirement date inline.
EIA-923 · monthly, Page 1	Capacity factor on the atlas	Latest annual total, per plant	Annual net generation ÷ (nameplate MW × 8,760 h) for each plant.
EIA-930 · hourly	Generation clock	Trailing 12 months	Mean MWh by hour-of-day × fuel bucket, per balancing authority.
EPA CAMD CEMS	CO₂ intensity on the atlas	2024, 2025 daily files	Gross load + CO₂ per plant per year. We set CO₂ intensity to NaN when `gross_load_mwh < 1,000`, because small denominators produce physically impossible values. For reference, natural gas runs ≈ 0.4 t/MWh and coal ≈ 1.0 t/MWh; anything above ~2.5 t/MWh is a data artifact.
LBNL Queued Up	Queue reality check	Latest annual snapshot	Deduplicated by `queue_project_id` across vintages to avoid double-counting. `active` + `suspended` are included in the headline active-queue figure (matching LBNL's own convention); `operational` projects have already connected and are excluded.
HIFLD · substations, transmission lines	Atlas (readiness score)	Public refresh, 2025	Equirectangular-approximation distance from each plant to every substation and to every transmission-line midpoint; facilities at ≥230 kV are bucketed as "high voltage."
EIA gas pipelines · interstate + intrastate	Atlas (readiness score)	Latest available	Distance to the nearest interstate pipeline, plus interstate pipeline miles within 25 mi, per plant.
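A few of the per-row transformations in the table can be sketched in Python. This is a minimal illustration under stated assumptions, not the site's pipeline: the sheet names, the 8,760-hour denominator, and the 1,000 MWh cutoff come from the table, while the function names, the routing targets, and the 3,958.8-mile mean Earth radius are choices made here for the example.

```python
import math

# Exact sheet-name routing for EIA-860M workbooks. Any sheet not listed here,
# including the "_PR" proposed-retirement variants, is skipped.
# Target table names are hypothetical.
SHEET_ROUTES = {
    "Operating": "operating",
    "Planned": "planned",
    "Retired": "retired",
    "Canceled or Postponed": "canceled_postponed",
}

def route_sheet(sheet_name):
    """Return the target table for an exact sheet name, or None to skip it."""
    return SHEET_ROUTES.get(sheet_name)

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles (assumed constant)

def equirectangular_mi(lat1, lon1, lat2, lon2):
    """Approximate distance in miles: scale the longitude difference by the
    cosine of the mean latitude, then treat the patch as flat."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    x = math.radians(lon2 - lon1) * math.cos((phi1 + phi2) / 2)
    y = phi2 - phi1
    return EARTH_RADIUS_MI * math.hypot(x, y)

def capacity_factor(net_gen_mwh, nameplate_mw, hours=8760):
    """Annual net generation divided by (nameplate x hours in the year)."""
    if nameplate_mw <= 0:
        return float("nan")
    return net_gen_mwh / (nameplate_mw * hours)

MIN_GROSS_LOAD_MWH = 1_000  # below this, intensity is treated as unreliable

def co2_intensity(co2_tons, gross_load_mwh):
    """Tons of CO2 per MWh, or NaN when the denominator is too small to trust."""
    if gross_load_mwh < MIN_GROSS_LOAD_MWH:
        return float("nan")
    return co2_tons / gross_load_mwh
```

Over the tens of miles that matter for a screening score, the equirectangular shortcut stays close to a great-circle distance while being much cheaper to evaluate for every plant-to-substation pair.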

The readiness score

The atlas's readiness score is a 0–100 composite of four sub-scores, each on the same 0–100 scale, weighted as follows:

Transmission (35%) — 60% the distance to the nearest ≥230 kV substation (perfect at 2 mi, awful at 25 mi), 40% the highest line voltage within 10 mi (perfect at 500 kV, awful at 69 kV).
Gas access (25%) — 70% the distance to the nearest interstate pipeline (perfect at 2 mi, awful at 30 mi), 30% the interstate-pipeline miles within 25 mi (perfect at 200 mi, awful at 0).
Site size (20%) — retiring nameplate MW (perfect at 1,500 MW, awful at 50 MW).
Operational headroom (20%) — recent capacity factor inverted: a plant running at 20% CF scores 80, a plant running flat-out scores 0.

Each sub-score is visible on the detail card. A missing input scores a neutral 50 rather than zero, so sparse-data plants are not penalized for missing fields.
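The weighting above can be expressed as a short scoring function. This sketch assumes each sub-score ramps linearly between its "perfect" and "awful" anchors and is clamped to 0–100; the page does not state the curve shape, and the input field names (`sub_dist_mi`, `max_kv_10mi`, and so on) are hypothetical.

```python
NEUTRAL = 50.0  # score assigned when an input is missing

def ramp(value, perfect, awful):
    """Linear 0-100 score: 100 at `perfect`, 0 at `awful`, clamped outside.
    Works whether `perfect` is above or below `awful`."""
    if value is None:
        return NEUTRAL
    t = (value - awful) / (perfect - awful)
    return 100.0 * min(1.0, max(0.0, t))

def transmission_score(sub_dist_mi, max_kv_10mi):
    # 60% distance to nearest >=230 kV substation, 40% highest voltage within 10 mi
    return 0.6 * ramp(sub_dist_mi, 2, 25) + 0.4 * ramp(max_kv_10mi, 500, 69)

def gas_score(pipe_dist_mi, interstate_mi_25):
    # 70% distance to nearest interstate pipeline, 30% interstate miles within 25 mi
    return 0.7 * ramp(pipe_dist_mi, 2, 30) + 0.3 * ramp(interstate_mi_25, 200, 0)

def size_score(retiring_mw):
    return ramp(retiring_mw, 1500, 50)

def headroom_score(cf):
    # capacity factor as a fraction: 0.20 -> 80, 1.0 -> 0
    return ramp(cf, 0.0, 1.0)

def readiness(plant):
    """Weighted composite of the four sub-scores (weights from this page)."""
    return (
        0.35 * transmission_score(plant.get("sub_dist_mi"), plant.get("max_kv_10mi"))
        + 0.25 * gas_score(plant.get("pipe_dist_mi"), plant.get("interstate_mi_25"))
        + 0.20 * size_score(plant.get("retiring_mw"))
        + 0.20 * headroom_score(plant.get("cf"))
    )
```

Because every missing input falls back to 50, a plant with no data at all lands at exactly 50 rather than 0.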

Why the atlas only shows plants retiring 2024–2030

The atlas focuses on plants where existing interconnection service agreements (ISAs), transformers, gas taps, and on-site infrastructure may still be usable. That reuse window decays after retirement: ISA termination rules across PJM, MISO and ERCOT generally take effect within a few years of permanent retirement, and prime movers are typically scrapped on a similar timeline. Mapping plants that retired in 2015–2019 would highlight sites where this infrastructure is less likely to remain, so those plants are shown separately as historical context, not as current prospects. The "Ready now" preset narrows the map further to plants with capacity_factor ≤ 20%, a last unit retiring inside the near-term window (2024–2028), and readiness ≥ 70.
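The "Ready now" preset is a conjunction of three thresholds. A minimal sketch, assuming capacity factor is stored as a fraction and that a plant missing any of the three inputs is excluded (the page does not say how this preset treats missing values); the field names are hypothetical.

```python
def ready_now(plant):
    """Hypothetical filter mirroring the "Ready now" preset thresholds."""
    cf = plant.get("cf")                           # capacity factor, 0.0-1.0
    last_year = plant.get("last_unit_retire_year")
    score = plant.get("readiness")                 # 0-100 composite
    if cf is None or last_year is None or score is None:
        return False  # assumption: incomplete plants are not shown as ready
    return cf <= 0.20 and 2024 <= last_year <= 2028 and score >= 70
```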

Limitations & caveats

Why this is a screening tool, not a feasibility study

Substation thermal capacity isn't in this map. Proximity to a substation is necessary but not sufficient. The substation's thermal rating, current loading, and downstream transmission congestion all matter.
Interconnect rights don't transfer automatically. Recapture rules vary by RTO and utility. Some interconnect customers retain the rights at the bus; others have to re-apply. The atlas does not encode the regulatory path.
Water, fiber, land tenure, zoning, and ownership status are not included in this data. The atlas covers only a subset of the factors relevant to any site evaluation.

State- vs county-granularity mixing

OEWS labor data is state-only. The county-level scoring therefore assigns each state's employment numbers to every county in that state. The atlas and time machine don't use that signal.
Queue pressure is computed at the state level. Many queue rows carry a county FIPS code, but enough don't that we keep the calculation at the state level for now.
EIA-861 retail prices and YoY demand growth are currently only collected for eight states (IA, IL, IN, MI, MN, OH, PA, WI). The composite score re-weights when these inputs are missing, which means cross-state rankings are not directly comparable. The atlas does not depend on these signals.
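Re-weighting can be illustrated by renormalizing the weights of whichever inputs are present. This is a sketch that assumes proportional renormalization and uses hypothetical signal names and weights; the page does not publish the composite's actual weights. It shows why cross-state rankings drift: a county with only a queue signal is scored on that signal alone.

```python
def composite(signals, weights):
    """Weighted mean over the signals that are present (not None).
    Weights of missing signals are dropped and the rest are renormalized."""
    present = {k: v for k, v in signals.items() if v is not None and k in weights}
    total = sum(weights[k] for k in present)
    if total == 0:
        return None  # nothing to score
    return sum(weights[k] * v for k, v in present.items()) / total

# Hypothetical weights, for illustration only.
WEIGHTS = {"queue": 0.4, "retail_price": 0.3, "demand_growth": 0.3}
```

With these weights, a county scoring 80 on queue pressure gets 80 when price and growth are missing, but 56 when both are present at 40, so the two counties are not on the same footing.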

Time-machine scope notes

The retirement schedule is recomputed for each historical snapshot from the corresponding EIA-860M xlsx — this is the signal that changes most frequently. The non-860M signals (transmission, queue, prices, tariffs) are not back-cast — they're taken from today's values and held constant, since those signals change far more slowly than retirement filings.
The freed-MW figure for any given month is "what the world knew at that moment about retiring fossil capacity within the snapshot's forward window" — anchored to the snapshot date, not to today.
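The snapshot-anchored freed-MW figure reduces to a windowed sum over the rows of one EIA-860M file. A sketch, assuming a five-year forward window with an exclusive start and inclusive end; the page does not give the window length, and the fuel labels and row fields are hypothetical.

```python
from datetime import date

FOSSIL = {"coal", "natural_gas", "petroleum"}  # hypothetical fuel labels

def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def freed_mw(rows, snapshot, window_years=5):
    """Sum fossil MW whose planned retirement falls in (snapshot, snapshot + window].
    `rows` should come only from the 860M file published for this snapshot,
    so the result reflects what was known at that moment."""
    end = add_years(snapshot, window_years)
    return sum(
        r["mw"] for r in rows
        if r["fuel"] in FOSSIL
        and r["retire_date"] is not None
        and snapshot < r["retire_date"] <= end
    )
```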

Generation clock scope notes

EIA-930 reports generation at the balancing-authority level. Interchange between BAs isn't modeled — a CAISO load actually served by Northwest hydro imports looks "all CAISO" here.
BA timezone offsets are fixed (no DST handling), which smears the hour-of-day curve by an hour at the DST boundary. Acceptable for the screening signal, but noted.
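The hour-of-day averaging with a fixed offset can be sketched as follows, which also makes the DST caveat concrete. The record layout `(utc_timestamp, fuel, mwh)` is an assumption for illustration, not EIA-930's published schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def hourly_profile(records, utc_offset_hours):
    """Mean MWh by (local hour-of-day, fuel) using one fixed UTC offset.
    No DST handling: during daylight time, readings land one hour early."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts_utc, fuel, mwh in records:
        hour = ts_utc.astimezone(tz).hour  # ts_utc must be timezone-aware
        sums[(hour, fuel)] += mwh
        counts[(hour, fuel)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

For example, with a fixed offset of −8 hours (Pacific standard time), readings at 20:00 UTC in January and in July both land in local hour 12, even though the July reading was actually taken at 13:00 Pacific daylight time.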

Reproducing the numbers

Every JSON file this site loads ships alongside the HTML. The pipeline fetches raw federal data, normalizes it into Parquet tables and DuckDB views, scores each county, and exports the JSON files that drive each chart.

Why this exists

This started as a personal exercise: I wanted to understand the U.S. energy landscape better, and the most effective way I know to learn a domain is to build something with its data. The goal was to force myself to find the public datasets that exist, figure out what they actually measure, and see how they fit together.

I'm not an energy professional. I work in software, and some of what's here is certainly naive or oversimplified — the scoring weights are reasonable guesses, not calibrated models, and I'm sure an experienced grid planner would see gaps I don't. That's fine. This is a learning tool first.

What surprised me about the data

The amount of structured, freely available federal energy data is remarkable. EIA alone publishes plant-level inventories (860/860M), monthly generation (923), hourly grid-level fuel mix (930), and retail pricing (861) — all downloadable as bulk files. LBNL publishes interconnection queue data across every ISO. HIFLD publishes substation and transmission-line coordinates. None of it requires an account or approval; most of it has years of history.

The more interesting part was joining it. The time machine, for example, replays 120 individual monthly EIA-860M Excel files — one for every month since April 2016 — and reconstructs a county-level retirement schedule from each one. That kind of longitudinal view isn't published anywhere; you have to build it by stitching the archives together. The readiness score joins four separate spatial datasets (plant locations, substation coordinates, transmission lines, pipeline routes) into a single per-plant composite. Most of the pipeline effort was in cleaning and aligning these sources, not in the visualization.

How it was built

The entire pipeline and site were built with substantial help from Claude, Anthropic's AI assistant. I described what I wanted to see and worked through the implementation interactively: the data ingestion, normalization, scoring logic, and D3 charts all came together in conversation, iterating on data joins, scoring formulas, and chart designs as I went. Working this way let me move faster than I would have alone, and cover more of the data than I'd have had the patience to wrangle by hand.

Spotted something off? Reach out to mattstockton@gmail.com. Independent eyes have already caught one class of dedup bug in this data; assume there's another waiting to be found and be skeptical of anything that looks surprising.