Load DOE LEAD Tool Cohort Data — load_cohort_data

Load household energy burden cohort data with automatic fallback:

1. Try the local database
2. Fall back to local CSV files
3. Auto-download from OpenEI if neither exists
4. Auto-import downloaded data to the database for future use
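
The fallback order above can be sketched in a few lines of base R. This is only an illustration of the pattern, not the package's implementation: `first_available()` and the three toy loaders are hypothetical names, and the final "import to database" step is left out.

```r
# Hypothetical helper: try each loader in order and return the first result.
# A loader signals "not available" by returning NULL or by throwing an error.
first_available <- function(loaders, verbose = TRUE) {
  for (name in names(loaders)) {
    result <- tryCatch(loaders[[name]](), error = function(e) NULL)
    if (!is.null(result)) {
      if (verbose) message("Loaded cohort data from: ", name)
      return(list(source = name, data = result))
    }
    if (verbose) message("Source unavailable: ", name)
  }
  stop("No data source could supply the cohort data.")
}

# Toy loaders standing in for the database, CSV, and download steps.
# The returned row is made-up data.
loaders <- list(
  database = function() NULL,                    # no database yet
  csv      = function() stop("file not found"),  # no local CSV either
  download = function() data.frame(geoid = "37135010700", households = 120)
)

loaded <- first_available(loaders, verbose = FALSE)
loaded$source
```

Because each source is wrapped in `tryCatch()`, a failing source does not abort the chain; the function only stops once every source has been tried.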

Usage

load_cohort_data(
  dataset = c("ami", "fpl"),
  states = NULL,
  counties = NULL,
  vintage = "2022",
  income_brackets = NULL,
  verbose = TRUE,
  ...
)

Arguments

dataset: Character, either "ami" (Area Median Income) or "fpl" (Federal Poverty Level)
states: Character vector of state abbreviations to filter by (optional)
counties: Character vector of county names or FIPS codes to filter by (optional). County names are matched case-insensitively. Requires states to be specified.
vintage: Character, data vintage: "2018" or "2022" (default "2022")
income_brackets: Character vector of income brackets to filter by (optional)
verbose: Logical, print status messages (default TRUE)
...: Additional filter expressions passed to dplyr::filter() for dynamic filtering. Allows filtering by any column in the dataset using tidyverse syntax. Example: households > 100, total_income > 50000
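
To make the filtering arguments concrete, here is a minimal sketch of how `states`, `counties`, and `...` could be applied to a data frame. It is not the package's code: `filter_cohort_sketch()` and the columns `state_abbr`, `county_name`, and `county_fips` are hypothetical. The sketch also places `...` directly after `data`, so unnamed expressions cannot be matched positionally to `states` or `counties`; that ordering is a choice of the sketch and differs from the signature documented above.

```r
library(dplyr)

# Toy stand-in for cohort data. `state_abbr`, `county_name`, and `county_fips`
# are hypothetical helper columns; household and income values are made up.
toy <- tibble(
  state_abbr   = c("NC", "NC", "NC", "SC"),
  county_name  = c("Orange", "Durham", "Wake", "York"),
  county_fips  = c("37135", "37063", "37183", "45091"),
  households   = c(150, 80, 300, 120),
  total_income = c(9.0e6, 4.0e6, 2.4e7, 6.0e6)
)

filter_cohort_sketch <- function(data, ..., states = NULL, counties = NULL) {
  if (!is.null(counties) && is.null(states)) {
    stop("`counties` requires `states` to be specified.")
  }
  if (!is.null(states)) {
    data <- filter(data, state_abbr %in% toupper(states))
  }
  if (!is.null(counties)) {
    # Accept either county names (case-insensitive) or 5-digit FIPS codes.
    data <- filter(
      data,
      tolower(county_name) %in% tolower(counties) | county_fips %in% counties
    )
  }
  # Forward any extra expressions to dplyr::filter() unevaluated.
  filter(data, ...)
}

by_name <- filter_cohort_sketch(toy, states = "NC", counties = c("orange", "WAKE"))
by_fips <- filter_cohort_sketch(toy, states = "NC", counties = "37063")
big     <- filter_cohort_sketch(toy, households > 100, states = "NC")
```

Multiple expressions passed through `...` are combined with AND, which is the standard behavior of `dplyr::filter()`.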

Value

A tibble with columns:

geoid: Census tract identifier
income_bracket: Income bracket label
households: Number of households
total_income: Total household income ($)
total_electricity_spend: Total electricity spending ($)
total_gas_spend: Total gas spending ($)
total_other_spend: Total other fuel spending ($)
TEN: Housing tenure category (1=Owned free/clear, 2=Owned with mortgage, 3=Rented, 4=Occupied without rent). Enables analysis of energy burden differences between renters and owners.
TEN-YBL6: Housing tenure crossed with year structure built (6 categories). Allows analysis of how building age and ownership status interact to affect energy burden (e.g., older rental units vs newer owner-occupied homes).
TEN-BLD: Housing tenure crossed with building type (e.g., single-family, multi-unit). Enables analysis of energy burden across different housing structures and ownership patterns.
TEN-HFL: Housing tenure crossed with primary heating fuel type (e.g., gas, electric, oil). Critical for analyzing how heating fuel choice and tenure status jointly influence energy costs and burden.
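
As an illustration of how these columns combine, the sketch below derives an energy burden (energy spending as a share of income) and a county FIPS code from toy rows. All values are made up. It assumes that `geoid` is stored as an 11-digit character string, in which case the first five characters are the state and county FIPS code.

```r
# Toy rows shaped like the documented output; all values are made up.
cohorts <- data.frame(
  geoid = c("37135010700", "37135010700", "37063000200"),
  income_bracket = c("0-30% AMI", "30-50% AMI", "0-30% AMI"),
  households = c(100, 50, 200),
  total_income = c(1500000, 1500000, 2400000),
  total_electricity_spend = c(120000, 50000, 240000),
  total_gas_spend = c(50000, 20000, 100000),
  total_other_spend = c(10000, 5000, 20000),
  stringsAsFactors = FALSE
)

# Total energy spending across the three fuel columns.
cohorts$total_energy_spend <- with(
  cohorts,
  total_electricity_spend + total_gas_spend + total_other_spend
)

# Energy burden: share of income spent on energy.
cohorts$energy_burden <- cohorts$total_energy_spend / cohorts$total_income

# Assumes an 11-digit tract GEOID: characters 1-5 are state + county FIPS.
cohorts$county_fips <- substr(cohorts$geoid, 1, 5)

cohorts[, c("geoid", "income_bracket", "energy_burden", "county_fips")]
```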

Examples

if (FALSE) { # \dontrun{
# Single state (fast, good for learning)
nc_ami <- load_cohort_data(dataset = "ami", states = "NC")

# Multiple states (regional analysis)
southeast <- load_cohort_data(dataset = "fpl", states = c("NC", "SC", "GA", "FL"))

# Nationwide (all 50 states plus DC - no filter)
us_data <- load_cohort_data(dataset = "ami", vintage = "2022")

# Load specific vintage
nc_2018 <- load_cohort_data(dataset = "ami", states = "NC", vintage = "2018")

# Filter to specific income brackets
low_income <- load_cohort_data(
  dataset = "ami",
  states = "NC",
  income_brackets = c("0-30% AMI", "30-50% AMI")
)

# Filter to specific counties within a state
triangle <- load_cohort_data(
  dataset = "fpl",
  states = "NC",
  counties = c("Orange", "Durham", "Wake")
)

# Or use county FIPS codes
orange <- load_cohort_data(
  dataset = "fpl",
  states = "NC",
  counties = "37135"
)

# Use dynamic filtering for custom criteria
high_burden <- load_cohort_data(
  dataset = "ami",
  states = "NC",
  households > 100,
  total_electricity_spend / total_income > 0.06
)

# Analyze energy burden by housing characteristics
# Compare renters vs owners by heating fuel type
nc_housing <- load_cohort_data(dataset = "ami", states = "NC")
library(dplyr)

# Group by tenure and heating fuel to analyze energy burden patterns
housing_analysis <- nc_housing %>%
  # Drop missing categories and zero-income cohorts (avoids division by zero)
  filter(!is.na(TEN), !is.na(`TEN-HFL`), total_income > 0) %>%
  group_by(TEN, `TEN-HFL`) %>%
  summarise(
    total_households = sum(households),
    avg_energy_burden = weighted.mean(
      (total_electricity_spend + total_gas_spend + total_other_spend) / total_income,
      w = households,
      na.rm = TRUE
    ),
    .groups = "drop"
  )
} # }
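
The grouped summary in the last example can be tried without downloading anything. The following sketch uses a made-up tibble with a single precomputed `total_energy_spend` column (a hypothetical convenience column, not a documented output). It also computes a pooled burden, total spending divided by total income, next to the household-weighted mean of cohort burdens, since the two summaries generally differ.

```r
library(dplyr)

# Made-up cohorts for two tenure codes; `total_energy_spend` is a hypothetical
# column standing in for electricity + gas + other fuel spending.
toy <- tibble(
  TEN = c(1, 1, 3, 3),
  households = c(100, 300, 200, 200),
  total_income = c(8.0e6, 1.2e7, 4.0e6, 6.0e6),
  total_energy_spend = c(2.4e5, 6.0e5, 4.0e5, 3.0e5)
)

by_tenure <- toy %>%
  filter(total_income > 0) %>%  # guard against division by zero
  group_by(TEN) %>%
  summarise(
    total_households = sum(households),
    # Household-weighted mean of each cohort's burden ratio
    weighted_mean_burden = weighted.mean(
      total_energy_spend / total_income,
      w = households
    ),
    # Pooled burden: group spending divided by group income
    pooled_burden = sum(total_energy_spend) / sum(total_income),
    .groups = "drop"
  )

by_tenure
```

The weighted mean gives every household's cohort ratio equal say, while the pooled ratio is dominated by higher-income cohorts; which one is appropriate depends on the question being asked.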