High resolution synthetic residential energy use profiles for the United States
www.nature.com/scientificdata
Data Descriptor
OPEN
Swapna Thorve1,2 ✉, Young Yun Baek1, Samarth Swarup1, Henning Mortveit1,3,
Achla Marathe1, Anil Vullikanti1,2 & Madhav Marathe1,2 ✉
Efficient energy consumption is crucial for achieving sustainable energy goals in the era of climate
change and grid modernization. Thus, it is vital to understand how energy is consumed at finer
resolutions such as household in order to plan demand-response events or analyze impacts of weather,
electricity prices, electric vehicles, solar, and occupancy schedules on energy consumption. However,
availability of and access to detailed energy-use data, which would enable such studies, have been
rare. In this paper, we release a unique, large-scale, digital-twin dataset of residential energy use
across the contiguous United States, covering millions of households. The data
comprise hourly energy use profiles for synthetic households, disaggregated into Thermostatically
Controlled Loads (TCL) and appliance use. The underlying framework is constructed using a bottom-up
approach. Diverse open-source surveys and first principles models are used for end-use modeling.
Extensive validation of the synthetic dataset has been conducted through comparisons with reported
energy-use data. We present a detailed, open, high resolution, residential energy-use dataset for the
United States.
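As a rough illustration of the bottom-up idea described above, the sketch below sums hypothetical hourly end-use profiles into a household total and splits it into TCL and appliance components. The end-use names, the grouping of end uses into TCL, and all numeric values are illustrative assumptions; they are not taken from the released dataset or its schema.

```python
# Illustrative only: end-use names and kWh values are made up for this sketch.
HOURS = 24

# Hypothetical hourly end-use profiles (kWh per hour) for one synthetic household.
end_uses = {
    "hvac": [0.5] * HOURS,
    "water_heater": [0.2] * HOURS,
    "refrigerator": [0.1] * HOURS,
    "lighting": [0.0] * 18 + [0.3] * 6,
    "dishwasher": [0.0] * 20 + [1.0] + [0.0] * 3,
}

# Assumed grouping of end uses into thermostatically controlled loads (TCL).
TCL_END_USES = {"hvac", "water_heater", "refrigerator"}


def aggregate(profiles, names):
    """Sum the hourly profiles of the selected end uses, hour by hour."""
    return [sum(profiles[n][h] for n in names) for h in range(HOURS)]


tcl = aggregate(end_uses, TCL_END_USES)
appliances = aggregate(end_uses, set(end_uses) - TCL_END_USES)
total = [t + a for t, a in zip(tcl, appliances)]

print(f"daily TCL use: {sum(tcl):.1f} kWh")
print(f"daily appliance use: {sum(appliances):.1f} kWh")
print(f"daily total: {sum(total):.1f} kWh")
```

Building the household total from individual end uses, rather than splitting a metered total, is what makes the disaggregated components available by construction.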
Background & Summary
Modernization of the U.S. electric grid is occurring at a noteworthy rate due to the installation of new technologies within the grid such as smart meters, which enable two-way communication between customers and
utilities, providing information and granular control of power usage for individual households1,2. The grid is
also witnessing rapid transformations due to increasing penetration of electric vehicles (EV) and distributed
energy resources (DER) such as rooftop photovoltaics (PV), community solar, and wind energy. While this wave
of modernization is beneficial, the electric grid is simultaneously facing a sharp increase in crisis situations as
a result of climate change phenomena3,4 such as extreme weather events and global warming. One example of
extreme weather is the February 2021 North American cold wave that caused a tremendous strain on the power
grid, especially in Texas, where millions lost power for days5. Another example is the impact of global warming on
household HVAC energy use. Although a rise of 1 to 2 °C in winter temperatures is expected to decrease
heating requirements, a similar rise in summer temperatures is expected to increase cooling needs significantly6.
In the face of these challenges, achieving sustainable energy goals has become paramount for maintaining a
healthy grid. To this end, the research community is faced with important questions regarding reduction of carbon footprints7–11, incentivizing DER adoption12, studying benefits of building energy retrofits9,13,14, integration
of electric vehicles15 and consumer behavior16 in the grid, and mechanisms for designing electricity pricing17,18
to create efficient residential consumption patterns. Answering many of these questions requires comprehensive
knowledge of energy-use patterns, building stock, the structure of distribution networks, consumer behaviors,
and so on. However, such exhaustive datasets are rarely freely available (or available at all) for research use,
making it hard for the research community to pursue these endeavours19. Reasons for unavailability of such data
range from privacy concerns to the lack of a system for making data available to researchers.
Most of the published energy use data are metered data, a result of longitudinal studies conducted by
researchers (Table 1) with relatively small samples of households that may not be representative of the wider
1Network Systems Science and Advanced Computing, Biocomplexity Institute and Initiative, University of Virginia,
Charlottesville, USA. 2Department of Computer Science, University of Virginia, Charlottesville, USA. 3Department of
Engineering Systems and Environment, University of Virginia, Charlottesville, USA. ✉e-mail: ;
Scientific Data |
(2023) 10:76 | https://doi.org/10.1038/s41597-022-01914-1
Authors/Dataset | Description
Klemanjak et al.26,75 | A synthetic energy demand dataset was released for 21 appliances in Austria in 2020. Data collected from two households were used to train models, and appropriate noise was then added to appliance start times and durations to mimic variations in actual consumption patterns.
Kolter et al.76,77 | The Reference Energy Disaggregation Data Set (REDD) is published by MIT. The dataset contains high-frequency current/voltage waveform data of the power mains in households along with labeled circuits in the house.
Makonin et al.78 | The Rainforest Automation Energy (RAE) dataset was published by Harvard in 2017. The dataset contains 1 Hz data (mains and sub-meters) from two residential houses.
Murray et al.79,80 | Load measurements from 20 UK households, collected in a two-year longitudinal study.
Pecan Street22,23 | Labeled circuit data for households across major cities in the U.S. This is said to be the most comprehensive disaggregated energy data available for the U.S.
Rashid et al.81,82 | The I-blend dataset records minute-level consumption of all the buildings at an academic institute in India over a period of 52 months.
Paige et al.83,84 | The flEECe dataset provides energy data at a 1 Hz sampling rate for four circuits in six net-zero energy senior housing units in Virginia, USA, over nine months.
Shin et al.85,86 | The first Korean dataset measuring appliance-level energy data was released in 2019 for 22 houses in Korea.
Kelly et al.20,87 | Power demand is recorded from five UK houses at two levels: whole house and individual appliances. This dataset is referred to as the UK-DALE dataset. Two versions of this dataset have been released.
Anderson et al.88,89 | Building-Level fUlly-labeled dataset for Electricity Disaggregation (BLUED) for one household in Pittsburgh, U.S., for one week. State transitions of appliances are labeled and time-stamped, providing the necessary ground truth for the evaluation of NILM algorithms.
Barker et al.90,91 | Electricity usage data are monitored every minute from nearly every plug load from 400 anonymous homes.
Beckel et al.92 | Electricity consumption is monitored via smart plugs for six households in Switzerland over a period of 8 months.
Pereira et al.93–95 | Power usage for 44 apartments and 6 homes in Portugal is collected for 264 days at 30-minute intervals. The advanced version of this dataset, 'SustDataED2', contains 96 days of aggregated and individual appliance consumption from one household in Portugal.
Monacchi et al.96,97 | Common household devices are monitored for power (...truncated)