prereise.gather.demanddata.eia package
Subpackages
- prereise.gather.demanddata.eia.tests package
- Submodules
- prereise.gather.demanddata.eia.tests.test_clean_data module
- prereise.gather.demanddata.eia.tests.test_get_eia_data module
- prereise.gather.demanddata.eia.tests.test_map_ba module
create_ba_to_region_dataframe()
create_loadzone_dataframe()
test_aggregate_ba_demand_sums_first_columns_pairs()
test_aggregate_ba_demand_sums_first_three_columns()
test_get_demand_in_loadzone_case()
test_get_demand_in_loadzone_has_equal_total_demand()
test_map_buses_to_ba()
test_map_buses_to_county()
- Module contents
Submodules
prereise.gather.demanddata.eia.clean_data module
- prereise.gather.demanddata.eia.clean_data.fill_ba_demand(df_ba, ba_name, day_map)
Replace missing data in BA demand and return the result.
- Parameters:
df_ba (pandas.DataFrame) – data frame for BA demand, shifted demand, and day of the week
ba_name (str) – name of the BA in the data frame.
day_map (dict) – mapping for replacing missing demand data with shifted demand.
- Returns:
(pandas.Series) – series of BA demand with missing values filled in.
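The filling step can be sketched as follows. This is a hypothetical illustration, not the package's implementation: it assumes df_ba has a "dayofweek" column (0 = Monday) and that day_map maps each weekday to a list of shifted-demand column names whose mean replaces a missing value. The function name and column names are illustrative.

```python
import numpy as np
import pandas as pd


def fill_ba_demand_sketch(df_ba, ba_name, day_map):
    """Hypothetical sketch: fill NaNs in df_ba[ba_name] with the mean of shifted columns.

    Assumes a "dayofweek" column and that day_map maps each weekday to a
    list of shifted-demand column names (both are assumptions).
    """
    filled = df_ba[ba_name].copy()
    for idx in filled.index[filled.isna()]:
        # Shifted-demand columns allowed for this day of the week
        shift_cols = day_map[df_ba.at[idx, "dayofweek"]]
        # Average of the available shifted values (mean skips NaNs)
        filled.at[idx] = df_ba.loc[idx, shift_cols].mean()
    return filled
```

For example, with columns "BA", "BA_m7" and "BA_p7" and day_map = {d: ["BA_m7", "BA_p7"] for d in range(7)}, a missing hour is replaced by the average of the demand one week earlier and one week later.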
- prereise.gather.demanddata.eia.clean_data.fix_dataframe_outliers(demand)
Make a data frame of demand with outliers replaced by values interpolated from the non-outlier edge points using slope_interpolate().
- Parameters:
demand (pandas.DataFrame) – demand data frame with UTC timestamps as indices and BA names as column names.
- Returns:
(pandas.DataFrame) – data frame with anomalous demand values replaced by interpolated values.
- prereise.gather.demanddata.eia.clean_data.replace_with_shifted_demand(demand, start, end)
Replace missing data within overall demand data frame with averages of nearby shifted demand.
- Parameters:
demand (pandas.DataFrame) – data frame with hourly demand where the columns are BA regions.
start (pandas.Timestamp/numpy.datetime64/datetime.datetime) – start of period of interest.
end (pandas.Timestamp/numpy.datetime64/datetime.datetime) – end of period of interest.
- Returns:
(pandas.DataFrame) – data frame with missing demand data filled in.
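A minimal sketch of "averages of nearby shifted demand", assuming an evenly spaced hourly index: each missing value inside [start, end] is replaced by the mean of the same column shifted by a few illustrative offsets (plus or minus one day and one week). The offsets and the function name are assumptions, not necessarily what the package uses.

```python
import numpy as np
import pandas as pd


def replace_with_shifted_demand_sketch(demand, start, end, shifts=(-24, 24, -168, 168)):
    """Hypothetical sketch: fill NaNs between start and end with the mean of shifted demand.

    Assumes an evenly spaced hourly index; ``shifts`` (in hours) is illustrative.
    """
    filled = demand.copy()
    window = (demand.index >= start) & (demand.index <= end)
    for col in demand.columns:
        # shift(h) moves values h rows later, so row t holds the demand observed
        # h hours earlier (or |h| hours later when h is negative)
        shifted = pd.concat([demand[col].shift(h) for h in shifts], axis=1)
        candidate = shifted.mean(axis=1)  # NaNs are skipped in the average
        to_fill = window & demand[col].isna().to_numpy()
        filled.loc[to_fill, col] = candidate.to_numpy()[to_fill]
    return filled
```

Averaging several offsets keeps a single missing neighbour from leaving a gap; a value stays NaN only when every shifted candidate is also missing.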
- prereise.gather.demanddata.eia.clean_data.slope_interpolate(ba_df)
Look for demand outliers by applying a z-score threshold to the demand slope. Loop through all the detected outliers, determine the non-outlier edge points, and then interpolate a line joining these two edge points. The value of the line at the timestamp of the outlier event is used to replace the anomalous value.
- Parameters:
ba_df (pandas.DataFrame) – demand data frame with UTC timestamps as indices and the BA name as column name.
- Returns:
(pandas.DataFrame) – data frame with anomalous demand values replaced by interpolated values.
Note
It is implicitly assumed that:
1. demand is correlated with temperature, and temperature rise is limited by heat capacity, which is finite and generally uniform across a region; hence, temperature-dependent spikes in the demand derivative are unphysical.
2. nothing anomalous actually happened to electricity usage in the relevant time range, so using a line to estimate the correct value is reasonable.
Todo
If there are more than a few hours (say > 4) of anomalous behavior, linear interpolation may give a bad estimate. Non-linear interpolation methods should be considered, and other information may be needed to interpolate properly, for example, the temperature data or other relevant profiles.
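The procedure described for slope_interpolate() can be sketched as below. This is an illustration under assumptions, not the package's code: the threshold of 3 is an illustrative choice, the index is assumed hourly and evenly spaced, and numpy.interp stands in for the explicit edge-point loop (it draws the same straight line between the nearest non-outlier points on either side).

```python
import numpy as np
import pandas as pd


def slope_interpolate_sketch(ba_df, threshold=3.0):
    """Hypothetical sketch: replace slope outliers in a single-column frame by linear interpolation.

    ``threshold`` is an illustrative z-score cutoff; the index is assumed
    to be evenly spaced (hourly).
    """
    col = ba_df.columns[0]
    values = ba_df[col].to_numpy(dtype=float).copy()
    # Hour-to-hour slope; the first entry is 0 because values[0] is prepended
    slope = np.diff(values, prepend=values[0])
    std = slope.std()
    if std == 0:
        return ba_df.copy()  # constant slope: nothing can be flagged
    z = (slope - slope.mean()) / std
    outlier = np.abs(z) > threshold
    x = np.arange(len(values))
    # Straight line between the nearest non-outlier points on either side of
    # each outlier (outliers at the very ends take the edge value instead)
    values[outlier] = np.interp(x[outlier], x[~outlier], values[~outlier])
    return pd.DataFrame({col: values}, index=ba_df.index)
```

A single spike produces two large slopes (up, then down), so the hour after the spike is usually flagged too; it is re-estimated on the same line, so a normal value there is left essentially unchanged.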
prereise.gather.demanddata.eia.get_eia_data module
- class prereise.gather.demanddata.eia.get_eia_data.EIAgov(token, series)
Bases: object
Copied from this link.
- Parameters:
token (str) – EIA token.
series (list) – id code(s) of the series to be downloaded.
- prereise.gather.demanddata.eia.get_eia_data.from_download(tok, start_date, end_date, offset_days, series_list)
Download and assemble dataset of demand data per balancing authority for desired date range.
- Parameters:
tok (str) – token obtained by registering with EIA.
start_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – start date.
end_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – end date.
offset_days (int) – number of business days for data to stabilize.
series_list (list) – list of demand series names provided by EIA, e.g., ['EBA.AVA-ALL.D.H', 'EBA.AZPS-ALL.D.H'].
- Returns:
(pandas.DataFrame) – data frame with UTC timestamps as indices and BA series names as column names.
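The shape of the returned data frame (UTC timestamps as rows, one column per series) can be illustrated with a hypothetical assembly helper. The download itself is not shown; assemble_series is an illustrative name, and it assumes each series has already been fetched into a pandas Series indexed by tz-aware UTC timestamps.

```python
import pandas as pd


def assemble_series(series_by_name, start_date, end_date):
    """Hypothetical helper: combine per-series demand into one data frame.

    ``series_by_name`` maps an EIA series name (e.g. "EBA.AVA-ALL.D.H") to a
    pandas Series indexed by UTC timestamps; fetching them is not shown.
    """
    # Dict keys become column names; the index is the union of all timestamps
    df = pd.concat(series_by_name, axis=1).sort_index()
    # Label-based slicing on a sorted DatetimeIndex includes both end points
    return df.loc[start_date:end_date]
```

Timestamps missing from one series show up as NaN in that column, which is the kind of gap the clean_data functions are meant to fill.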
- prereise.gather.demanddata.eia.get_eia_data.from_excel(directory, series_list, start_date, end_date)
Assemble EIA balancing authority (BA) data from pre-downloaded Excel spreadsheets. The spreadsheets contain data from July 2015 to present.
- Parameters:
directory (str) – location of Excel files.
series_list (list) – list of BA initials, e.g., ['PSE', 'BPAT', 'CISO'].
start_date (datetime.datetime) – desired start of dataset.
end_date (datetime.datetime) – desired end of dataset.
- Returns:
(pandas.DataFrame) – data frame with UTC timestamps as indices and BA series names as column names.
- prereise.gather.demanddata.eia.get_eia_data.get_ba_demand(ba_code_list, start_date, end_date, api_key)
Download the demand between two dates for a list of balancing authorities.
- Parameters:
ba_code_list (pandas.DataFrame) – list of BAs to download from EIA.
start_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – beginning bound for the demand data frame.
end_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – end bound for the demand data frame.
api_key (str) – API key used to fetch data.
- Returns:
(pandas.DataFrame) – data frame with columns of demand by BA.
prereise.gather.demanddata.eia.map_ba module
- prereise.gather.demanddata.eia.map_ba.aggregate_ba_demand(demand, mapping)
Aggregate demand in BAs into regions as defined in the mapping dictionary.
- Parameters:
demand (pandas.DataFrame) – demand profiles in BAs.
mapping (dict) – dictionary mapping of BA columns to regions.
- Returns:
(pandas.DataFrame) – aggregated demand profiles.
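Aggregation amounts to summing groups of BA columns. The sketch below assumes mapping uses region names as keys and lists of BA column names as values, e.g. {"West": ["AVA", "AZPS"]}; the direction of the mapping and the function name are assumptions for illustration.

```python
import pandas as pd


def aggregate_ba_demand_sketch(demand, mapping):
    """Hypothetical sketch: sum BA demand columns into regions.

    Assumes ``mapping`` has region names as keys and lists of BA column
    names as values.
    """
    # sum(axis=1) skips NaNs, so a missing BA value counts as zero here
    return pd.DataFrame(
        {region: demand[bas].sum(axis=1) for region, bas in mapping.items()},
        index=demand.index,
    )
```

For instance, {"R1": ["A", "B"], "R2": ["C"]} yields a region R1 equal to A + B and a region R2 equal to C.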
- prereise.gather.demanddata.eia.map_ba.get_demand_in_loadzone(agg_demand, bus_map)
Get demand in load zones from the aggregated demand of BA regions.
- Parameters:
agg_demand (pandas.DataFrame) – demand profiles as returned by aggregate_ba_demand().
bus_map (pandas.DataFrame) – data frame used to map BA regions to load zones using real power demand weighting.
- Returns:
(pandas.DataFrame) – data frame with demand columns according to load zone.
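Real power demand weighting can be sketched as follows: each BA region's demand is split among load zones in proportion to the total bus real power demand (Pd) that the region has in each zone, so total demand is conserved. The column names "BA", "zone_name" and "Pd" and the function name are hypothetical; the actual bus_map layout may differ.

```python
import pandas as pd


def get_demand_in_loadzone_sketch(agg_demand, bus_map):
    """Hypothetical sketch: split regional demand among load zones by Pd weighting.

    Assumes ``bus_map`` has columns "BA", "zone_name" and "Pd", and that every
    BA in ``bus_map`` is a column of ``agg_demand``.
    """
    pd_by_pair = bus_map.groupby(["BA", "zone_name"])["Pd"].sum()
    # Fraction of each BA's total Pd located in each zone
    share = pd_by_pair / pd_by_pair.groupby(level="BA").transform("sum")
    zones = sorted(bus_map["zone_name"].unique())
    zone_demand = pd.DataFrame(0.0, index=agg_demand.index, columns=zones)
    for (ba, zone), frac in share.items():
        zone_demand[zone] += agg_demand[ba] * frac
    return zone_demand
```

Because each BA's shares sum to one, the row sums of the result equal the row sums of agg_demand.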
- prereise.gather.demanddata.eia.map_ba.map_buses_to_ba(bus_df)
Find the balancing authority (BA) in U.S. territory that each query bus belongs to, based on GIS information.
- Parameters:
bus_df (pandas.DataFrame) – data frame containing the latitude and longitude of each bus.
- Returns:
(tuple) – the first element is the input data frame with two columns, “County” and “BA”, added for each bus; the second element is the list of indices of buses for which the Census API returns no matching county (such buses are assigned the county of their nearest neighbour).
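The nearest-neighbour fallback for unmatched buses can be sketched as below. This is an illustration, not the package's code: it assumes "lat"/"lon" columns in degrees and uses the haversine great-circle formula to pick the closest bus that did receive a county. The helper name and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_unmatched_by_nearest(bus_df, unmatched, col="County"):
    """Hypothetical helper: give each unmatched bus the ``col`` value of its nearest matched bus.

    Assumes "lat"/"lon" columns in degrees; distance is great-circle (haversine).
    """
    out = bus_df.copy()
    matched = out.index.difference(unmatched)
    lat_m = np.radians(out.loc[matched, "lat"].to_numpy())
    lon_m = np.radians(out.loc[matched, "lon"].to_numpy())
    for bus in unmatched:
        lat, lon = np.radians(out.at[bus, "lat"]), np.radians(out.at[bus, "lon"])
        # Haversine term; distance = 2 * arcsin(sqrt(a)) grows with a,
        # so the argmin of a is the nearest matched bus
        a = np.sin((lat_m - lat) / 2) ** 2 + np.cos(lat) * np.cos(lat_m) * np.sin((lon_m - lon) / 2) ** 2
        nearest = matched[np.argmin(a)]
        out.at[bus, col] = out.at[nearest, col]
    return out
```

Comparing the haversine term directly avoids computing arcsin and square roots, since only the ordering of distances matters.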
- prereise.gather.demanddata.eia.map_ba.map_buses_to_county(bus_county_map)
Find the county in the U.S. territory that each bus in the query grid belongs to.
- Parameters:
bus_county_map (pandas.DataFrame) – data frame containing the latitude and longitude of each bus.
- Returns:
(tuple) – the first element is a data frame of the counties in which the buses are located; the second element is a list of indices of buses that match no county.