emburden: Temporal Analysis of Household Energy Burden Using Net Energy Return Metrics
Eric Scheier
2025-12-15
Source: vignettes/jss-emburden.Rmd
Abstract
Energy burden—the proportion of household income spent on energy—is a critical metric for understanding energy poverty and inequity. However, traditional energy burden ratios present analytical challenges including difficulties with aggregation and visualization of extreme values. The emburden package for R implements Net Energy Return (Nh) methodology to address these limitations while enabling temporal analysis of household energy characteristics. This paper introduces the package’s design and demonstrates its application to comparing Low-Income Energy Affordability Data (LEAD) Tool vintages from 2018 and 2022 across geographic and demographic dimensions. The package provides functions for downloading, processing, and analyzing census tract-level energy burden data for all U.S. states, with particular attention to proper weighted aggregation and schema normalization across data vintages. We demonstrate the package’s capabilities through examples ranging from state-level summaries to fine-grained census tract comparisons, illustrating how policy-relevant insights can be extracted at multiple scales.
Introduction
Household energy affordability is a persistent challenge affecting millions of households in the United States. Low-income households face disproportionate energy burdens, often spending more than 6% of their income on energy costs compared to 2-3% for higher-income households (Ross, Drehobl, and Stickles 2018; Drehobl and Ross 2016). Understanding these disparities and tracking changes over time is essential for designing effective energy assistance programs and policies.
The traditional energy burden metric, the ratio of energy expenditures ($S$) to gross income ($G$), has several analytical limitations. As a ratio with income in the denominator, energy burden ($E_b$) approaches infinity for households with very low incomes, creating challenges for aggregation and visualization. Additionally, aggregating the metric correctly requires a harmonic mean rather than an arithmetic mean, a requirement that is not widely understood or consistently applied (Scheier and Kittner 2022).
Mathematical foundations
The emburden package for R addresses these challenges by implementing Net Energy Return (NER) methodology, adapted from macro-energy systems analysis (Hall, Lambert, and Balogh 2011; Brandt, Dale, and Barnhart 2013; Carbajales-Dale et al. 2014). Net energy analysis estimates the net energy return of a process as a relationship between gross resources extracted and embodied energy directed toward extraction:
$$\mathrm{EROI} = \frac{E_{\mathrm{gross}}}{E_{\mathrm{invested}}}, \qquad \mathrm{NER} = \frac{E_{\mathrm{gross}} - E_{\mathrm{invested}}}{E_{\mathrm{invested}}}$$
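For households extracting income $G$ from the economy while spending $S$ on energy, these ratios become:
$$\mathrm{EROI}_h = \frac{G}{S}, \qquad \mathrm{NER}_h = \frac{G - S}{S}$$
Net Energy Return represents the net earnings a household receives for every dollar of expenditure on secondary energy. For notational simplicity, we use $N_h$ to denote household Net Energy Return throughout this paper, where $N_h = \mathrm{NER}_h = (G - S)/S$.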
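As a minimal sketch of the household ratios, assuming the definitions EROI = G/S and N_h = (G - S)/S with G as gross income and S as energy spending, the base R helpers below evaluate both for one hypothetical household. The `*_sketch` names and the dollar figures are illustrative assumptions, not the package's exported functions or real data.

```r
# Illustrative helpers only; these are not the emburden implementations.
eroi_sketch <- function(gross_income, energy_spending) {
  gross_income / energy_spending
}

ner_sketch <- function(gross_income, energy_spending) {
  (gross_income - energy_spending) / energy_spending
}

# Hypothetical household: $50,000 gross income, $2,500 annual energy spending
g <- 50000
s <- 2500

household_eroi <- eroi_sketch(g, s)  # gross dollars earned per energy dollar
household_ner  <- ner_sketch(g, s)   # net dollars earned per energy dollar

print(c(eroi = household_eroi, ner = household_ner))
```

Under these definitions the two ratios always differ by exactly one, because the net ratio removes the energy dollar itself from the numerator.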
Comparison with energy burden
Energy burden, the traditional metric in energy poverty analysis, is defined as:
$$E_b = \frac{S}{G}$$
where $S$ is household energy spending and $G$ is gross household income. While energy burden is intuitive as a percentage, it has several mathematical limitations. The Net Energy Return transformation addresses these by preventing double-counting of energy expenditures (income in the numerator already includes the portion spent on energy) and enabling proper weighted mean aggregation:
$$\bar{N}_h = \frac{\sum_i n_i N_{h,i}}{\sum_i n_i}$$
where $n_i$ is the number of households in cohort $i$. In contrast, energy burden requires harmonic mean aggregation:
$$\bar{E}_b = \frac{\sum_i n_i}{\sum_i n_i / E_{b,i}}$$
The two metrics are mathematically related through the transformation $E_b = \frac{1}{N_h + 1}$, allowing seamless conversion between representations.
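The equivalence between the two aggregation routes can be checked numerically. The sketch below uses three hypothetical cohorts (all values invented for illustration) and base R only. It shows that the household-weighted arithmetic mean of Net Energy Return, converted back through E_b = 1/(N_h + 1), matches the household-weighted harmonic mean of energy burden, while a naive arithmetic mean of burdens does not.

```r
# Hypothetical cohorts: average gross income, average energy spending, households
g <- c(20000, 50000, 120000)
s <- c(2500, 2500, 3000)
n <- c(100, 300, 50)

e_b <- s / g         # energy burden per cohort
n_h <- (g - s) / s   # Net Energy Return per cohort

# Arithmetic weighted mean of N_h, using household counts as weights
n_h_mean <- weighted.mean(n_h, w = n)

# Weighted harmonic mean of energy burden with the same weights
e_b_harmonic <- sum(n) / sum(n / e_b)

# Naive arithmetic weighted mean of energy burden, shown for contrast
e_b_naive <- weighted.mean(e_b, w = n)

print(c(
  from_ner = 1 / (n_h_mean + 1),
  harmonic = e_b_harmonic,
  naive    = e_b_naive
))
```

The first two values coincide. The naive mean is larger because an arithmetic mean is never smaller than the harmonic mean of the same positive values.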
Energy poverty threshold
Energy poverty is commonly defined as spending greater than 10% of household income on energy (Bednar and Reames 2020):
$$E_b > 10\%$$
Translated to Net Energy Return, the energy poverty threshold becomes:
$$N_h < \frac{1}{0.10} - 1 = 9$$
This means a household earning less than $9 of income for every dollar spent on secondary energy is considered to be in energy poverty by the traditional energy burden accounting method. A Net Energy Return of 9 or lower is equivalent to an energy burden of 10% or higher. While this threshold is somewhat arbitrary and may not be suitable in all situations, it provides a useful benchmark for comparing results to the energy poverty literature.
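The threshold conversion can be sketched in a few lines of base R. The converter names and the three example households below are hypothetical, and the strict inequality follows the "less than $9" reading of the threshold.

```r
# Convert between energy burden and Net Energy Return (illustrative helpers)
burden_to_ner <- function(e_b) 1 / e_b - 1
ner_to_burden <- function(n_h) 1 / (n_h + 1)

# A 10% energy burden threshold expressed as Net Energy Return
ner_threshold <- burden_to_ner(0.10)

# Hypothetical households
households <- data.frame(
  gross_income    = c(12000, 30000, 80000),
  energy_spending = c(2400, 3500, 2000)
)
households$n_h <- with(households, (gross_income - energy_spending) / energy_spending)

# Flag households earning less than the threshold per energy dollar
households$energy_poor <- households$n_h < ner_threshold

print(households)
```

Applying `ner_to_burden()` to the threshold returns the original 10% figure, so the same cutoff can be reported in whichever representation suits the audience.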
The LEAD Tool and temporal analysis
The U.S. Department of Energy’s Low-Income Energy Affordability Data (LEAD) Tool (Ma et al. 2019) provides census tract-level estimates of household energy characteristics based on American Community Survey microdata. The tool uses iterative proportional fitting to allocate households to census tracts while calibrating to utility-reported sales and revenues.
Multiple vintages of LEAD Tool data have been released:
- 2018 Update: Based on 2018 5-year ACS data, released July 2020
- 2022 Update: Based on 2022 5-year ACS data, released August 2024
These vintages enable temporal analysis of energy burden trends, but require careful handling of schema differences and income bracket definitions.
Package design philosophy
The emburden package is designed around several key principles:
- Proper aggregation: Implements weighted mean aggregation using Net Energy Return, with household counts as weights
- Temporal consistency: Normalizes schema differences between LEAD Tool vintages to enable valid comparisons
- Flexible workflows: Supports both database and CSV-based data access with automatic fallback
- Geographic flexibility: Enables analysis from national level down to individual census tracts
Methodology
Data sources
The emburden package provides access to two primary datasets for household energy burden analysis:
LEAD Tool
The Low-Income Energy Affordability Data (LEAD) Tool (Ma et al. 2019) portrays average income, electricity expenditures, gas expenditures, and other fuel expenditures for cohorts of households segmented by location (census tract, county, state) and household characteristics (ownership status, building age, number of units, attachment status, primary heating fuel).
The dataset is assembled using iterative proportional fitting (IPF), a widely used spatial microsimulation method to allocate households to census tracts while calibrating characteristics to known quantities. The IPF algorithm processes cross-tabulations of household responses from the American Community Survey (ACS) Public Use Microdata Samples, scaling them to match aggregate annual values from utility sales and revenues reported in Energy Information Administration forms 861 (electricity) and 176 (natural gas).
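To make the fitting step concrete, the following is a minimal two-dimensional IPF sketch in base R. It is not the LEAD Tool's procedure: the function name `ipf_sketch()`, the seed table, and the target margins are all hypothetical, and the real dataset involves many more dimensions and constraints.

```r
# Minimal two-dimensional IPF (illustrative only)
ipf_sketch <- function(seed, row_targets, col_targets,
                       tol = 1e-8, max_iter = 1000) {
  fitted <- seed
  for (i in seq_len(max_iter)) {
    # Scale each row so that row sums match the row targets
    fitted <- fitted * (row_targets / rowSums(fitted))
    # Scale each column so that column sums match the column targets
    fitted <- sweep(fitted, 2, col_targets / colSums(fitted), `*`)
    # Columns now match exactly; stop once rows match as well
    if (max(abs(rowSums(fitted) - row_targets)) < tol) break
  }
  fitted
}

# Hypothetical seed cross-tabulation (for example, tenure by heating fuel)
seed <- matrix(c(10, 20, 30, 40), nrow = 2)

# Hypothetical known margins; both sum to the same total
row_targets <- c(60, 40)
col_targets <- c(45, 55)

fitted_table <- ipf_sketch(seed, row_targets, col_targets)
print(fitted_table)
```

Row and column scaling leave the cross-product ratio of the seed table unchanged, so the fitted table inherits its interaction structure from the initializing data while matching the supplied margins.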
Multiple vintages are available:
- 2018 Update: Based on 2018 5-year ACS data (2014-2018), released July 2020
- 2022 Update: Based on 2022 5-year ACS data (2018-2022), released August 2024
REPLICA dataset
The Renewable Energy Potential of Low-Income Communities in America (REPLICA) dataset (Sigrin and Mooney 2018) adds technical rooftop solar potential and additional techno-economic variables including demographics and electricity rates. The package can merge REPLICA data with LEAD data to enrich analyses with utility type, locale classification, and solar generation potential.
Schema normalization across vintages
A critical challenge in temporal analysis is handling schema differences between LEAD Tool vintages. The package implements automatic normalization through the following transformations:
Income bracket aggregation: The LEAD Tool provides income as a fraction of Area Median Income (AMI) or Federal Poverty Level (FPL). For AMI data, the package can aggregate detailed brackets into simplified categories matching the REPLICA schema:
- 0-30% AMI: Very Low Income
- 30-80% AMI: Low-to-Moderate Income
- 80%+ AMI: Middle-to-High Income
For FPL data, the aggregation follows poverty line definitions:
- 0-100% FPL: In Poverty
- 100%+ FPL: Not In Poverty
Building type simplification: Housing units are classified as:
- 1 Unit: Single-Family
- More than 1 Unit: Multi-Family
- Other Unit: Excluded from analysis
These normalizations enable valid temporal comparisons despite underlying schema evolution between vintages.
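The income bracket aggregation can be sketched as a simple lookup followed by a sum of household counts. The detailed bracket labels and counts below are assumptions made for illustration; actual labels differ between vintages, which is the reason normalization is needed, and the package's own mapping may be organized differently.

```r
# Hypothetical detailed AMI brackets with household counts
cohorts <- data.frame(
  ami_bracket = c("0-30%", "30-60%", "60-80%", "80-100%", "100%+"),
  households  = c(120, 200, 150, 180, 350)
)

# Lookup from detailed bracket to simplified category
bracket_map <- c(
  "0-30%"   = "Very Low Income",
  "30-60%"  = "Low-to-Moderate Income",
  "60-80%"  = "Low-to-Moderate Income",
  "80-100%" = "Middle-to-High Income",
  "100%+"   = "Middle-to-High Income"
)

# as.character() guards against factor columns being used as integer indices
cohorts$income_bracket <- unname(bracket_map[as.character(cohorts$ami_bracket)])

# Household counts per simplified category
simplified <- aggregate(households ~ income_bracket, data = cohorts, FUN = sum)
print(simplified)
```

Keeping the mapping in a single named vector per vintage makes it easy to audit which detailed brackets feed each simplified category.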
Data processing
The package processes raw LEAD Tool data through several stages:
Energy burden indicator calculation
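For each household cohort, the package calculates gross income $G$ and total energy spending $S$, the sum of electricity, gas, and other fuel expenditures:
$$S = S_{\mathrm{electricity}} + S_{\mathrm{gas}} + S_{\mathrm{other}}$$
From these base metrics, all energy burden indicators are derived using the formulas presented in the Mathematical foundations section.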
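A rough sketch of this per-cohort step in base R, assuming that total energy spending is the sum of the electricity, gas, and other fuel expenditures reported for each cohort. The column names and values are hypothetical and do not reflect the actual LEAD Tool schema.

```r
# Hypothetical cohort averages (annual dollars)
cohorts <- data.frame(
  income      = c(18000, 42000, 95000),  # average gross income
  electricity = c(1400, 1500, 1800),
  gas         = c(500, 600, 700),
  other_fuel  = c(100, 0, 0)
)

# Total energy spending per cohort
cohorts$energy_spending <- with(cohorts, electricity + gas + other_fuel)

# Derived indicators
cohorts$energy_burden <- with(cohorts, energy_spending / income)
cohorts$ner <- with(cohorts, (income - energy_spending) / energy_spending)

print(cohorts[, c("income", "energy_spending", "energy_burden", "ner")])
```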
Weighted aggregation
The package implements proper weighted aggregation using household counts as weights. For Net Energy Return:
calculate_weighted_metrics(
  data,
  group_columns = c("state", "income_bracket"),
  metric_name = "ner"
)
This function:
- Filters data to specified groups
- Calculates weighted means using household counts
- Computes poverty rates below specified thresholds
- Returns summary statistics including quantiles and standard deviations
The key insight is that Net Energy Return allows arithmetic weighted means, while energy burden would require harmonic mean aggregation—a distinction that significantly impacts the validity and interpretability of aggregate statistics.
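The grouped aggregation described above can be approximated in base R as follows. This is a sketch under stated assumptions rather than the package's implementation: the helper `summarise_group()`, the column names, and the cohort values are hypothetical, and only the weighted mean, a poverty rate, and a household total are computed.

```r
# Hypothetical cohort-level data
cohorts <- data.frame(
  state      = c("NC", "NC", "NC", "SC", "SC"),
  ner        = c(6, 15, 40, 8, 25),
  households = c(100, 300, 100, 50, 150)
)

# Summarise one group: weighted mean N_h and share of households below threshold
summarise_group <- function(d, poverty_ner = 9) {
  data.frame(
    state        = d$state[1],
    ner_mean     = weighted.mean(d$ner, w = d$households),
    poverty_rate = sum(d$households[d$ner < poverty_ner]) / sum(d$households),
    households   = sum(d$households)
  )
}

# Split by state, summarise each piece, and recombine
state_summary <- do.call(
  rbind,
  lapply(split(cohorts, cohorts$state), summarise_group)
)

# Convert the aggregated N_h back to an energy burden for reporting
state_summary$energy_burden <- 1 / (state_summary$ner_mean + 1)
print(state_summary)
```

Aggregating in Net Energy Return space and converting to energy burden only at the end avoids the harmonic mean computation while yielding the same aggregate burden.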
Data quality considerations
Iterative proportional fitting has limitations as an estimation procedure. The relationship between constraint variables tends toward the average of the initializing dataset, potentially depressing variations among otherwise similar regions. This may explain the large quantities of households estimated to have very low incomes. Validating these estimated data would require randomized surveys along the dimensions of interest.
Additionally, the “primary heating fuel” category derives from the ACS question “Which fuel is used most for heating this house, apartment, or mobile home?” The predictive power of this question for energy expenditures is not fully understood and warrants caution in interpretation.
Though REPLICA relies on a different LEAD vintage (2017) than recent analyses (2019, 2022), the package still enables useful cross-dataset analysis. However, inferring differences among annual estimates should account for the standard error of the data (Ma et al. 2019). Rigorous temporal analysis benefits from comparing identically-processed vintages.
Package architecture
The emburden package is organized into several functional modules:
Core functions
library(emburden)
# Energy metric calculations
energy_burden_func(gross_income, energy_spending)
ner_func(gross_income, energy_spending)   # Net Energy Return
eroi_func(gross_income, energy_spending)  # EROI
dear_func(gross_income, energy_spending)  # DEAR
# Statistical aggregation
calculate_weighted_metrics(
  graph_data,
  group_columns = "state",
  metric_name = "ner"
)
Data loading functions
The package provides automatic data downloading and caching:
# Load census tract data (auto-downloads if not available)
nc_tracts <- load_census_tract_data(states = "NC")
# Load cohort data by income bracket
nc_ami <- load_cohort_data(
  dataset = "ami",
  states = "NC",
  vintage = "2022"
)
# Compare vintages
comparison <- compare_energy_burden(
  dataset = "ami",
  states = "NC",
  group_by = "state"
)
Analysis examples
The emburden package’s primary contribution is enabling temporal analysis of energy burden through proper schema normalization and aggregation. This section demonstrates the package’s capabilities through progressively detailed examples.
Temporal comparison workflow
The compare_energy_burden() function provides the core temporal analysis functionality:
library(emburden)
# Compare North Carolina energy burden: 2018 vs 2022
nc_comparison <- compare_energy_burden(
  dataset = "ami",
  states = "NC",
  group_by = "income_bracket"
)
# View formatted comparison table
print(nc_comparison)
The function automatically:
- Downloads both vintages if not cached locally
- Normalizes schema differences between vintages
- Performs proper $N_h$-based weighted aggregation
- Calculates energy burden for both periods
- Computes changes in percentage points
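The last two steps can be illustrated with a short base R sketch that converts aggregated Net Energy Return values for two vintages into energy burdens and then into a change in percentage points. The two input values are hypothetical, and the calculation is an assumption about the arithmetic involved, not a transcript of what compare_energy_burden() does internally.

```r
# Hypothetical weighted-mean Net Energy Return for one group in each vintage
ner_2018 <- 24
ner_2022 <- 19

# Convert each aggregate to an energy burden (as a fraction of income)
burden_2018 <- 1 / (ner_2018 + 1)
burden_2022 <- 1 / (ner_2022 + 1)

# Absolute change, expressed in percentage points
change_pp <- (burden_2022 - burden_2018) * 100

# Relative change, expressed as a fraction of the 2018 burden
relative_change <- (burden_2022 - burden_2018) / burden_2018

cat(sprintf(
  "Burden: %.1f%% -> %.1f%% (%+.2f pp, %+.0f%% relative)\n",
  burden_2018 * 100, burden_2022 * 100, change_pp, relative_change * 100
))
```

Reporting both figures avoids a common ambiguity: a one percentage point rise from a 4% burden is a 25% relative increase.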
Understanding the output
The comparison object contains multiple metrics:
# Energy burden in 2018 and 2022
nc_comparison$neb_2018
nc_comparison$neb_2022
# Change in energy burden (percentage points)
nc_comparison$neb_change_pp
# Net Energy Return values
nc_comparison$ner_2018
nc_comparison$ner_2022
# Household counts
nc_comparison$households_2018
nc_comparison$households_2022
Example 1: State-level temporal analysis
To examine overall state changes without grouping by demographic characteristics:
# Overall state comparison
nc_state <- compare_energy_burden(
  dataset = "ami",
  states = "NC",
  group_by = "none"
)
# Extract key findings
cat(sprintf(
  "North Carolina energy burden changed from %.1f%% (2018) to %.1f%% (2022)\n",
  nc_state$neb_2018 * 100,
  nc_state$neb_2022 * 100
))
cat(sprintf(
  "Change: %+.2f percentage points\n",
  nc_state$neb_change_pp * 100
))
Example 2: Income bracket analysis
Disaggregating by income bracket reveals which populations experienced the largest changes:
# Compare by income bracket
nc_income <- compare_energy_burden(
  dataset = "ami",
  states = "NC",
  group_by = "income_bracket"
)
# Visualize changes
library(ggplot2)
ggplot(nc_income, aes(x = income_bracket, y = neb_change_pp * 100)) +
  geom_col(fill = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Change in Energy Burden by Income Bracket",
    subtitle = "North Carolina, 2018 to 2022",
    x = "Income Bracket (% of Area Median Income)",
    y = "Change in Energy Burden (percentage points)"
  ) +
  theme_minimal()
Typical findings show that very low-income households (0-30% AMI) experience the highest energy burdens and are most vulnerable to changes in energy costs or income levels.
Example 3: Multi-state comparison
Comparing multiple states reveals regional patterns and policy impacts:
# dplyr supplies %>%, arrange(), and select() used below
library(dplyr)
# Compare Southern states
southern_states <- compare_energy_burden(
  dataset = "ami",
  states = c("NC", "SC", "GA", "FL"),
  group_by = "state"
)
# Which states improved most?
southern_states %>%
  arrange(neb_change_pp) %>%
  select(state_abbr, neb_2018, neb_2022, neb_change_pp)
# Visualize state comparison
ggplot(southern_states, aes(x = reorder(state_abbr, neb_2022),
                            y = neb_2022 * 100)) +
  geom_col(fill = "darkgreen") +
  geom_point(aes(y = neb_2018 * 100), color = "red", size = 3) +
  labs(
    title = "Energy Burden by State: 2022 (bars) vs 2018 (points)",
    x = "State",
    y = "Energy Burden (%)"
  ) +
  theme_minimal()
Example 4: Housing tenure analysis
Energy burden often varies significantly between renters and homeowners:
# Compare by housing tenure
nc_tenure <- compare_energy_burden(
  dataset = "ami",
  states = "NC",
  group_by = "housing_tenure"
)
# Calculate the renter-owner gap
gap_2018 <- nc_tenure$neb_2018[nc_tenure$housing_tenure == "RENTER"] -
  nc_tenure$neb_2018[nc_tenure$housing_tenure == "OWNER"]
gap_2022 <- nc_tenure$neb_2022[nc_tenure$housing_tenure == "RENTER"] -
  nc_tenure$neb_2022[nc_tenure$housing_tenure == "OWNER"]
cat(sprintf(
  "Renter-Owner energy burden gap: %.2f pp (2018) → %.2f pp (2022)\n",
  gap_2018 * 100,
  gap_2022 * 100
))
Renters typically face higher energy burdens due to split-incentive problems where landlords make efficiency investment decisions but tenants pay energy bills.
Example 5: Federal Poverty Line analysis
For policy applications targeting households below the federal poverty line:
# Use FPL dataset instead of AMI
nc_fpl <- compare_energy_burden(
  dataset = "fpl",
  states = "NC",
  group_by = "income_bracket"
)
# Compare poverty vs non-poverty households
nc_fpl %>%
  filter(income_bracket %in% c("Below Federal Poverty Line",
                               "Above Federal Poverty Line")) %>%
  select(income_bracket, neb_2018, neb_2022, neb_change_pp)
This analysis is particularly relevant for programs like the Low-Income Home Energy Assistance Program (LIHEAP) which target households below specific poverty thresholds.
Example 6: Census tract-level analysis
For fine-grained spatial analysis, load tract-level data directly:
# Load 2022 census tract data
nc_tracts_2022 <- load_census_tract_data(
  states = "NC",
  vintage = "2022"
)
# Calculate county-level statistics
nc_counties <- calculate_weighted_metrics(
  nc_tracts_2022,
  group_columns = "county_name",
  metric_name = "ner"
)
# Identify counties with highest energy burden
nc_counties %>%
  mutate(energy_burden = 1 / (ner + 1)) %>%
  arrange(desc(energy_burden)) %>%
  head(10) %>%
  select(county_name, energy_burden, household_count)
Census tract data enables spatial analysis and mapping applications, revealing urban-rural disparities and identifying communities in need of targeted assistance.
Discussion
Policy implications
The ability to track energy burden changes over time has important policy implications. Programs like LIHEAP (Low-Income Home Energy Assistance Program) and WAP (Weatherization Assistance Program) target households experiencing energy insecurity, but evaluating their effectiveness requires robust temporal analysis.
The emburden package enables researchers and policymakers to:
- Track program impacts: Compare energy burden before and after policy interventions
- Identify vulnerable populations: Disaggregate trends by income, tenure, and geography
- Allocate resources effectively: Target communities with worsening energy affordability
- Benchmark across jurisdictions: Compare state and local policy outcomes
Split-incentive and principal-agent problems
A persistent challenge in energy equity is the split-incentive problem: landlords make energy efficiency investment decisions, but tenants pay the energy bills. This misalignment of incentives leads to underinvestment in efficiency improvements for rental properties.
The package’s ability to analyze energy burden by housing tenure reveals the magnitude of this problem:
# Quantify the renter-owner gap
tenure_comparison <- compare_energy_burden(
  dataset = "ami",
  states = "all",  # National analysis
  group_by = "housing_tenure"
)
# Calculate disparity
renter_burden <- tenure_comparison$neb_2022[
  tenure_comparison$housing_tenure == "RENTER"
]
owner_burden <- tenure_comparison$neb_2022[
  tenure_comparison$housing_tenure == "OWNER"
]
disparity_ratio <- renter_burden / owner_burden
Addressing this gap requires policy interventions such as:
- On-bill financing programs
- Landlord incentive programs
- Energy efficiency standards for rental properties
- Community-scale renewable energy projects
Data limitations and considerations
Users should be aware of several data limitations:
Iterative proportional fitting constraints
The LEAD Tool uses IPF to allocate households to census tracts, which has important implications:
- Regression toward the mean: IPF tends to depress variations among similar regions
- Estimation uncertainty: Standard errors are substantial, especially for small cohorts
- Temporal comparability: Different ACS vintages may have methodological differences
Income measurement challenges
Household income as reported in the ACS has known limitations:
- Underreporting: Particularly for benefits and informal income
- Timing: Income is annual but energy costs vary seasonally
- Household composition: Per-capita income may be more relevant for some analyses
Energy expenditure estimation
The “primary heating fuel” categorization derives from a single ACS question and may not fully capture:
- Mixed-fuel households
- Behavioral patterns
- Appliance efficiency variations
- Climate variations within states
Despite these limitations, the LEAD Tool represents the most comprehensive spatial dataset available for energy burden analysis in the United States.
Future research directions
Several extensions would enhance the package’s capabilities:
Additional vintages
As DOE releases new LEAD Tool vintages (potentially 2024, 2026, etc.), the package can incorporate them to enable longer-term trend analysis. This would support:
- Multi-year trend identification
- Correlation with economic cycles
- Climate change impact assessment
Additional metrics
The package currently implements Net Energy Return, EROI, and DEAR. Future versions could add:
- Disposable income ratios: Accounting for essential expenses beyond energy
- Energy poverty depth: How far below thresholds households fall
- Vulnerability indices: Combining burden with demographic risk factors
Comparison with existing tools
Several tools exist for energy burden analysis, each with different strengths:
- LEAD Tool web interface: Interactive but limited temporal comparison
- State energy office tools: Customized but not standardized across states
- Academic datasets: Rich but often one-time snapshots
- emburden: Focused on temporal analysis with proper aggregation methodology
The emburden package fills a gap by providing programmatic access to multiple vintages with automated schema normalization, enabling reproducible temporal analyses at scale.
Conclusion
The emburden package provides a robust framework for temporal analysis of household energy burden using proper Net Energy Return methodology. By automating data access, normalizing schema differences, and implementing correct aggregation methods, the package enables researchers and policymakers to track energy affordability trends across multiple scales.
Key contributions include:
- Mathematical foundations: Proper Net Energy Return aggregation avoiding double-counting
- Temporal consistency: Automated schema normalization across LEAD Tool vintages
- Flexible analysis: Functions supporting national, state, county, and tract-level analysis
- Policy relevance: Direct support for energy assistance program evaluation
The package is available from GitHub and is licensed under AGPL-3+. Documentation, vignettes, and issue tracking are available through the package website.