R package for analyzing household energy burden - the percentage of income spent on energy costs.

Overview

emburden provides tools for calculating and analyzing household energy burden across different geographic areas and demographic groups. The package helps you aggregate energy burden data accurately using the Net Energy Return (Nh) method described in Scheier & Kittner (2022).

Data coverage: All 50 US states plus DC (51 jurisdictions), with 2.3+ million household cohort records covering ~73,000 census tracts. Data downloads automatically from Zenodo/OpenEI on first use - no manual setup required!

Key Features

  • Energy metrics calculations: Energy burden, Net Energy Return (Nh), EROI, DEAR
  • Weighted statistical analysis: Proper aggregation using household weights
  • Flexible grouping: Analyze by utility, state, county, census tract, or custom categories
  • Publication-ready formatting: Functions for creating formatted tables in multiple output formats

Why Net Energy Return?

When calculating energy burden for a single household, it’s straightforward: divide energy spending by income. But when combining data from many households, simply averaging those percentages can introduce errors.

The challenge: Energy burden is a ratio (spending ÷ income), and ratios don’t behave well with simple averages. For example, if one household spends $100 of $1,000 income (10%) and another spends $50 of $10,000 income (0.5%), the simple average of 5.25% doesn’t accurately represent the combined situation.

The solution: The Net Energy Return (Nh) method transforms the data so you can use standard averaging techniques, then converts back to energy burden. Think of Nh as “how much money is left after energy costs, per dollar spent on energy.” This transformation makes aggregation more accurate and interpretable.

When it matters:

  • ✓ Combining data from many individual households
  • ✓ Calculating regional or demographic averages
  • For single households, both methods give identical results
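The two-household example above can be reproduced in base R. This is a minimal sketch, not the package's implementation: nh_sketch() is a hypothetical stand-in that uses the (G-S)/S definition listed under Core Functions, and both households are assumed to carry equal weight.

```r
# Two households from the example above
income   <- c(1000, 10000)
spending <- c(100, 50)

# Household-level energy burden (spending / income)
burden <- spending / income          # 0.100 and 0.005

# Simple average of the two percentages
simple_avg <- mean(burden)           # 0.0525

# Hypothetical stand-in for the Nh transformation: (G - S) / S
nh_sketch <- function(g, s) (g - s) / s

# Average in Nh space, then convert back to an energy burden
nh_mean      <- mean(nh_sketch(income, spending))   # mean of 9 and 199
nh_aggregate <- 1 / (1 + nh_mean)

print(c(simple_avg = simple_avg, nh_aggregate = nh_aggregate))
```

With equal weights, the Nh-based aggregate comes out near 0.95%, well below the 5.25% simple average.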

This methodology is detailed in Scheier & Kittner (2022) - see Citation below.

Installation

You can install the development version of emburden from GitHub:

# install.packages("devtools")
devtools::install_github("ericscheier/emburden")

Quick Start

library(emburden)
library(dplyr)

# Data downloads automatically on first use!
# Example 1: Single state (fast, good for learning)
nc_data <- load_cohort_data(dataset = "ami", states = "NC")

# Example 2: Multiple states (regional analysis)
southeast <- load_cohort_data(dataset = "ami", states = c("NC", "SC", "GA", "FL"))

# Example 3: Nationwide (all 51 states - no filter)
us_data <- load_cohort_data(dataset = "ami", vintage = "2022")  # No states filter = all states

# For the examples below, we'll use NC data (faster)
nc_ami <- nc_data

# === EXAMPLE 1: Single household calculation ===
gross_income <- 50000
energy_spending <- 3000

# Method 1: Direct energy burden
eb <- energy_burden_func(gross_income, energy_spending)  # 0.06

# Method 2: Via Net Energy Return (mathematically identical)
nh <- ner_func(gross_income, energy_spending)  # 15.67
neb <- 1 / (nh + 1)  # 0.06 (same as eb)

# === EXAMPLE 2: Individual household data aggregation ===
# Recommended: Use Nh method for accurate aggregation
incomes <- c(30000, 50000, 75000)
spendings <- c(3000, 3500, 4000)
households <- c(100, 150, 200)

nh <- ner_func(incomes, spendings)
nh_mean <- weighted.mean(nh, households)
neb_aggregate <- 1 / (1 + nh_mean)

# Note: Direct averaging of energy burden values can introduce errors
# neb_naive <- weighted.mean(energy_burden_func(incomes, spendings), households)

# === EXAMPLE 3: Cohort data aggregation ===
# For pre-aggregated totals, you can use the direct ratio
# (electricity only here; add total_gas_spend and total_other_spend for all energy costs)
neb_cohort <- sum(nc_ami$total_electricity_spend) / sum(nc_ami$total_income)

# === EXAMPLE 4: Grouped analysis ===
results <- calculate_weighted_metrics(
  graph_data = nc_ami,
  group_columns = "income_bracket",
  metric_name = "ner",
  metric_cutoff_level = 15.67,  # 6% energy burden threshold
  upper_quantile_view = 0.95,
  lower_quantile_view = 0.05
)

# Format results for publication
library(scales)
results$formatted_median <- to_percent(results$metric_median)

# === EXAMPLE 5: Temporal comparison ===
# Compare energy burden between 2018 and 2022

# Single state comparison
nc_comparison <- compare_energy_burden(
  dataset = "ami",
  states = "NC",
  group_by = "income_bracket"
)

# Multi-state comparison by state
southeast_comparison <- compare_energy_burden(
  dataset = "ami",
  states = c("NC", "SC", "GA", "FL"),
  group_by = "state"  # Options: "income_bracket", "state", "none", or custom columns
)

# Nationwide comparison by income bracket
us_comparison <- compare_energy_burden(
  dataset = "ami",
  group_by = "income_bracket"  # No states filter = all 51 states
)

# View results
print(nc_comparison)

# Access specific columns
us_comparison$neb_2018  # 2018 energy burden
us_comparison$neb_2022  # 2022 energy burden
us_comparison$neb_change_pp  # Change in percentage points

Sample Data (No Download Required!)

NEW in v0.3.0: The package includes Orange County, NC sample data for instant demos and testing.

# Load sample data (instant - no download!)
data(orange_county_sample)

# Available datasets
names(orange_county_sample)
# [1] "fpl_2018" "fpl_2022" "ami_2018" "ami_2022"

# Quick analysis
library(dplyr)
orange_county_sample$fpl_2022 %>%
  group_by(income_bracket) %>%
  summarise(
    households = sum(households),
    energy_burden = sum(total_electricity_spend + total_gas_spend + total_other_spend) /
                    sum(total_income)
  )
# Shows 16.3% energy burden for lowest income vs 1.0% for highest income

# Use in examples and vignettes
comparison <- compare_energy_burden(
  dataset = "fpl",
  states = "NC",  # Will use sample data if full NC data not available
  group_by = "income_bracket"
)

Sample data coverage:

  • Orange County, NC (Chapel Hill, Carrboro, Hillsborough)
  • 42 census tracts (2022 vintage)
  • Both FPL and AMI cohorts
  • Both 2018 and 2022 vintages
  • Only 94 KB - perfect for testing and demos!

Core Functions

Energy Metrics

Household-level calculations (all mathematically related):

  • energy_burden_func(g, s) - Energy Burden: S/G
  • neb_func(g, s) - Net Energy Burden: S/G (identical to EB, emphasizes proper aggregation)
  • ner_func(g, s) - Net Energy Return: (G-S)/S (use this for aggregation!)
  • eroi_func(g, s) - Energy Return on Investment: G/Se
  • dear_func(g, s) - Disposable Energy-Adjusted Resources: (G-S)/G

Key relationships:

  • At household level: neb_func() == energy_burden_func() (identical)
  • Transformation: neb = 1/(1+nh) and nh = (1/neb) - 1
  • 6% energy burden threshold ↔︎ Nh ≥ 15.67
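The formulas and relationships listed above can be checked with a few lines of base R. The functions below are hypothetical stand-ins written from those formulas, not the package's own code, and the EROI stand-in assumes that Se equals S.

```r
# Hypothetical base-R stand-ins for the formulas listed above
# (g = gross income, s = energy spending); not the package's own code
eb_sketch   <- function(g, s) s / g          # Energy Burden: S/G
ner_sketch  <- function(g, s) (g - s) / s    # Net Energy Return: (G-S)/S
eroi_sketch <- function(g, s) g / s          # EROI, assuming Se equals S here
dear_sketch <- function(g, s) (g - s) / g    # DEAR: (G-S)/G

g <- 50000
s <- 3000

eb   <- eb_sketch(g, s)     # 0.06
nh   <- ner_sketch(g, s)    # about 15.67
eroi <- eroi_sketch(g, s)   # about 16.67
dear <- dear_sketch(g, s)   # 0.94

# Round trip between energy burden and Nh
eb_from_nh <- 1 / (1 + nh)
nh_from_eb <- (1 / eb) - 1

# Identities that follow from the definitions (under the Se = S assumption)
stopifnot(
  isTRUE(all.equal(eb_from_nh, eb)),
  isTRUE(all.equal(nh_from_eb, nh)),
  isTRUE(all.equal(eroi, nh + 1)),
  isTRUE(all.equal(dear, 1 - eb))
)
```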

Aggregation guidance:

  • Individual household data: Calculate nh <- ner_func(income, spending), then neb_aggregate <- 1/(1 + weighted.mean(nh, weights))
  • Cohort data (pre-aggregated totals): Calculate neb <- sum(total_spending) / sum(total_income)
  • Note: Direct averaging of energy burden values (weighted.mean(neb_func(...))) can introduce errors; use the Nh method for individual household data
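One way to see how the two recipes relate: the ratio of totals is algebraically the same as averaging Nh with energy spending as the weight. The sketch below checks this with made-up cohort totals. The data frame, its column names, and ner_sketch() are hypothetical stand-ins rather than package objects.

```r
# Made-up cohort totals (hypothetical values and column names)
cohorts <- data.frame(
  total_income   = c(3.0e6, 7.5e6, 15.0e6),
  total_spending = c(3.0e5, 5.25e5, 8.0e5)
)

# Stand-in for the Nh formula: (G - S) / S
ner_sketch <- function(g, s) (g - s) / s

# Recipe for pre-aggregated totals: ratio of sums
neb_ratio <- sum(cohorts$total_spending) / sum(cohorts$total_income)

# Nh route, weighting each cohort by its energy spending
nh         <- ner_sketch(cohorts$total_income, cohorts$total_spending)
nh_mean    <- weighted.mean(nh, cohorts$total_spending)
neb_via_nh <- 1 / (1 + nh_mean)

print(c(ratio = neb_ratio, via_nh = neb_via_nh))
```

Both routes return the same value here because the spending-weighted mean of (G-S)/S reduces to (sum(G) - sum(S)) / sum(S).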

Statistical Analysis

  • calculate_weighted_metrics() - Weighted mean, median, quantiles with grouping
  • Automatically calculates poverty rates below specified thresholds
  • Handles missing data and small sample sizes
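To make "weighted" concrete, here is a rough base-R sketch of a household-weighted mean, median, and share below a cutoff. It illustrates the idea only and is not how calculate_weighted_metrics() is implemented; weighted_quantile_sketch() and the toy values are hypothetical.

```r
# Toy cohort data: Nh values and the number of households each row represents
nh         <- c(4, 9, 13, 18, 30)
households <- c(50, 100, 150, 200, 100)

# Simple weighted quantile: smallest value whose cumulative weight share
# reaches the requested probability (one of several possible definitions)
weighted_quantile_sketch <- function(x, w, prob) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= prob)[1]]
}

nh_mean   <- weighted.mean(nh, households)
nh_median <- weighted_quantile_sketch(nh, households, 0.5)

# Share of households below the Nh cutoff that matches a 6% energy burden
cutoff      <- 15.67
share_below <- sum(households[nh < cutoff]) / sum(households)

print(c(mean = nh_mean, median = nh_median, share_below = share_below))
```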

Temporal Comparison

  • compare_energy_burden() - Compare energy burden across data vintages (2018 vs 2022)
  • Automatically handles schema differences between vintages
  • Proper Nh-based aggregation built-in
  • Grouping options: "income_bracket", "state", or "none"
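A percentage-point change between vintages is the difference between two burden values expressed in percent. The sketch below uses made-up group-level burdens and borrows the column names neb_2018, neb_2022, and neb_change_pp from the Quick Start output. Whether compare_energy_burden() computes neb_change_pp exactly this way is an assumption.

```r
# Made-up burdens for two hypothetical income groups
comparison <- data.frame(
  income_bracket = c("low", "high"),
  neb_2018 = c(0.150, 0.012),
  neb_2022 = c(0.163, 0.010)
)

# Change in percentage points (assumed definition)
comparison$neb_change_pp <- (comparison$neb_2022 - comparison$neb_2018) * 100

print(comparison)
```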

Formatting

Project Structure

This repository contains both the R package and analysis code:

net_energy_equity/
├── R/                      # Package source code (exportable)
│   ├── energy_ratios.R     # Energy metric calculations
│   ├── metrics.R           # Weighted statistical functions
│   └── formatting.R        # Output formatting utilities
├── analysis/               # Analysis scripts and outputs (not in package)
│   ├── scripts/            # Example analysis scripts
│   └── outputs/            # Generated tables and results
├── DESCRIPTION             # Package metadata
├── NAMESPACE               # Package exports
└── README.md               # This file

The package is designed to be extractable to a separate repository while analysis scripts remain here and depend on the installed package.

Example Analysis

See analysis/scripts/ for complete examples:

  • nc_all_utilities_energy_burden.R: Analyze all NC electric utilities
  • nc_cooperatives_energy_burden.R: Focus on NC electric cooperatives
  • all_utilities_energy_burden.R: National-level analysis

Run from project root:

# Load package for development
devtools::load_all()

# Data downloads automatically on first use!
# Run analysis
source("analysis/scripts/nc_all_utilities_energy_burden.R")

# View outputs in analysis/outputs/

Data Requirements

Automatic Data Download

The package automatically downloads LEAD Tool data from Zenodo/OpenEI on first use and caches it locally for fast subsequent access. No manual data setup required!

Data coverage: All 50 US states plus DC (51 jurisdictions), with 2.3+ million cohort records covering ~73,000 census tracts.

Loading data (automatic database/CSV/Zenodo/OpenEI download fallback):

library(emburden)

# Check which data source is available
check_data_sources()

# Example 1: Single state (fast)
nc_data <- load_cohort_data(dataset = "ami", states = "NC")

# Example 2: Multiple states (regional)
southeast <- load_cohort_data(dataset = "ami", states = c("NC", "SC", "GA"))

# Example 3: Nationwide (all 51 states)
us_data <- load_cohort_data(dataset = "ami", vintage = "2022")  # No filter = all states

# Load specific vintage (2018 or 2022)
nc_2018 <- load_cohort_data(dataset = "ami", states = "NC", vintage = "2018")
nc_2022 <- load_cohort_data(dataset = "ami", states = "NC", vintage = "2022")

# Compare vintages for temporal analysis
nc_comparison <- compare_energy_burden(dataset = "ami", states = "NC", group_by = "income_bracket")

# Nationwide temporal comparison
us_comparison <- compare_energy_burden(dataset = "ami", group_by = "income_bracket")

Data Loading Workflow:

On first use, load_cohort_data() automatically:

  1. Tries local database for fast access
  2. Falls back to local CSV files if database unavailable
  3. Downloads from OpenEI (DOE LEAD dataset) if neither exists
  4. Imports to database automatically for subsequent fast access
  5. Returns the requested data - no manual steps required!
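The workflow above follows a common "try each source in order" pattern. The sketch below shows that pattern in base R with stand-in loaders. All names are hypothetical, and it does not reflect how load_cohort_data() is written internally.

```r
# Try each loader in order and return the first one that succeeds
# (generic fallback pattern; all names here are hypothetical)
load_with_fallback <- function(loaders) {
  for (source_name in names(loaders)) {
    result <- tryCatch(loaders[[source_name]](), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(source = source_name, data = result))
    }
  }
  stop("No data source available")
}

# Stand-in loaders: the first two fail, the third succeeds
loaders <- list(
  database = function() stop("database not found"),
  csv      = function() stop("CSV files not found"),
  download = function() data.frame(state = "NC", households = 100)
)

loaded <- load_with_fallback(loaders)
print(loaded$source)
```

Keeping the loaders in a named list makes the order of preference explicit and easy to change.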

Data sources:

  • OpenEI 2018: https://data.openei.org/submissions/573 (4 AMI brackets)
  • OpenEI 2022: https://data.openei.org/submissions/6219 (6 AMI brackets)

See data-raw/README.md for complete migration documentation and data-raw/LEAD_SCHEMA_COMPARISON.md for vintage differences.

Legacy CSV Files

For backward compatibility, the package still supports CSV files:

  • CensusTractData.csv - Census tract demographics and utility info
  • CohortData_AreaMedianIncome.csv - Energy burden by Area Median Income brackets
  • CohortData_FederalPovertyLine.csv - Energy burden by Federal Poverty Line brackets

These files are large (>100MB each) and not included in git. Contact maintainer for access or use the database (recommended).

Energy Poverty Threshold

The standard 6% energy burden threshold corresponds to:

  • Energy Burden: E_b ≤ 0.06
  • Net Energy Return: Nh ≥ 15.67
  • EROI: EROI ≥ 16.67

To calculate the Nh threshold for any other energy burden level, pass that level as s with g = 1. For example, ner_func(g = 1, s = 0.06) returns the threshold for a 6% energy burden.
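As a quick illustration, the conversion can be tabulated for several burden levels at once. burden_to_thresholds() is a hypothetical helper built from nh = (1/neb) - 1, with EROI taken as G/S. It is not part of the package.

```r
# Convert energy burden thresholds into the matching Nh and EROI thresholds
# (hypothetical helper built from nh = (1/neb) - 1; EROI taken as G/S)
burden_to_thresholds <- function(burden) {
  data.frame(
    energy_burden = burden,
    nh   = (1 / burden) - 1,
    eroi = 1 / burden
  )
}

thresholds <- burden_to_thresholds(c(0.03, 0.06, 0.10))
print(round(thresholds, 2))
```

A lower burden threshold maps to a higher Nh threshold, so the direction of the inequality flips when moving between the two scales.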

Citation

If you use this package or methodology in your research, please cite:

Scheier, E., & Kittner, N. (2022). A measurement strategy to address disparities across household energy burdens. Nature Communications, 13, 1717. https://doi.org/10.1038/s41467-021-27673-y

BibTeX:

@article{scheier2022measurement,
  title={A measurement strategy to address disparities across household energy burdens},
  author={Scheier, Eric and Kittner, Noah},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={1717},
  year={2022},
  publisher={Nature Publishing Group},
  doi={10.1038/s41467-021-27673-y}
}

License

GNU Affero General Public License v3.0 or later (AGPL-3+)

See LICENSE for full text.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests and documentation
  4. Submit a pull request

Issues

Report bugs or request features at: https://github.com/ericscheier/emburden/issues

Development

# Load package during development
devtools::load_all()

# Run tests
devtools::test()

# Check package
devtools::check()

# Build documentation
devtools::document()

# Install locally
devtools::install()

Database Integration

The package supports local SQLite database integration for enhanced analysis and improved performance:

What’s Available

Energy Burden Data (new in v0.2.0):

  • Census tract demographics - 72K tracts with utility service territory info
  • Household cohort energy burden - 2.4M cohorts by income/tenure/housing type
  • Area Median Income (AMI) brackets - Income relative to local median
  • Federal Poverty Line (FPL) brackets - Income relative to poverty threshold

Utility & Market Data:

  • Utility electricity rates by ZIP code
  • eGrid emissions subregions for environmental justice analysis
  • Geographic crosswalks (tract ↔︎ ZIP ↔︎ county)
  • State retail sales projections (1998-2050)
  • Renewable generator registry

Installation

# Install database packages
install.packages(c("DBI", "RSQLite"))

# Optional: Set environment variable to use existing database
# Sys.setenv(EMBURDEN_DB_PATH = "/path/to/your/database.sqlite")
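If you want to confirm that the environment variable points at a real file before loading data, a small check like the one below can help. resolve_db_path() is a hypothetical helper. The package may resolve the path differently, and the temporary file only stands in for a database.

```r
# Return the database path from the environment, or NULL if it is
# unset or the file does not exist (hypothetical helper)
resolve_db_path <- function(env_var = "EMBURDEN_DB_PATH") {
  path <- Sys.getenv(env_var, unset = "")
  if (nzchar(path) && file.exists(path)) path else NULL
}

# Demonstrate with a temporary file standing in for a database
tmp_db <- tempfile(fileext = ".sqlite")
file.create(tmp_db)
Sys.setenv(EMBURDEN_DB_PATH = tmp_db)

db_path <- resolve_db_path()
print(!is.null(db_path))

# Clean up the demo state
Sys.unsetenv("EMBURDEN_DB_PATH")
unlink(tmp_db)
```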

Usage

Energy Burden Data (automatic database/CSV/Zenodo/download fallback):

library(emburden)

# Single state analysis
nc_data <- load_cohort_data(dataset = "ami", states = "NC")

# Multi-state regional analysis
southeast <- load_cohort_data(
  dataset = "ami",
  states = c("NC", "SC", "GA", "FL"),
  income_brackets = c("0-30% AMI", "30-50% AMI")
)

# Nationwide analysis (all 51 states)
us_data <- load_cohort_data(dataset = "ami", vintage = "2022")

# Load integrated data (burden + utility rates + emissions)
nc_full <- load_burden_with_utilities(
  states = "NC",
  dataset = "ami",
  income_brackets = "0-30% AMI"
)

Utility Rate Data (requires database connection):

# Connect to the database (requires separate setup; replace the path with your own)
conn <- DBI::dbConnect(RSQLite::SQLite(), "/path/to/database.sqlite")

# Get utility rates for North Carolina
nc_rates <- get_utility_rates(conn, state = "NC")

# Get emissions regions
egrid <- get_egrid_regions(conn, zips = c(27701, 27705, 28052))

# Get retail sales projections
sales <- get_retail_sales_projections(conn, states = "NC", years = 2020:2030)

# Always disconnect when done
DBI::dbDisconnect(conn)

Example Analyses

See the vignette for complete examples:

vignette("integrating-utility-data", package = "emburden")

And the example script:

source("analysis/scripts/utility_rate_comparison.R")

Benefits:

  • Faster data access - Database queries 10-50x faster than CSV parsing
  • Integrated analysis - Join burden data with utility rates and emissions in single query
  • Flexible filtering - Query only needed states/income brackets instead of loading full CSVs
  • Environmental justice - Link high-burden areas with high-emissions grid regions
  • Policy modeling - Scenario analysis with utility rate structures and demand projections

  • Paper: “Net energy metrics reveal striking disparities across United States household energy burdens”
  • Data Source: DOE Low-Income Energy Affordability Data (LEAD) Tool via OpenEI
  • Methodology: See package vignettes