prereise.gather.demanddata.eia package


Submodules

prereise.gather.demanddata.eia.clean_data module

prereise.gather.demanddata.eia.clean_data.fill_ba_demand(df_ba, ba_name, day_map)

Replace missing data in BA demand and return the result.

Parameters:
  • df_ba (pandas.DataFrame) – data frame for BA demand, shifted demand, and day of the week

  • ba_name (str) – name of the BA in data frame.

  • day_map (dict) – mapping for replacing missing demand data with shifted demand.

Returns:

(pandas.Series) – series of BA demand filled in
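The fill step can be pictured with the minimal sketch below. It is an illustration, not the package's implementation: it assumes `df_ba` carries a `dayofweek` column (0 = Monday) and that `day_map` maps each weekday to the names of the shifted-demand columns to average. The function name `fill_ba_demand_sketch` and the columns `PSE_m1d`/`PSE_p1d` are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_ba_demand_sketch(df_ba, ba_name, day_map):
    """Fill gaps in df_ba[ba_name] from shifted-demand columns.

    Hypothetical layout: a 'dayofweek' column and a day_map of the form
    {weekday: [shifted column names]}.
    """
    filled = df_ba[ba_name].copy()
    for idx in filled.index[filled.isna()]:
        shifted_cols = day_map[df_ba.at[idx, "dayofweek"]]
        # Average the shifted values available for this hour (NaNs are skipped)
        filled.at[idx] = df_ba.loc[idx, shifted_cols].mean()
    return filled


# Usage: one missing hour, filled with the mean of two shifted values
index = pd.date_range("2020-01-06", periods=3, freq="60min")
df = pd.DataFrame(
    {
        "PSE": [100.0, np.nan, 120.0],
        "PSE_m1d": [90.0, 110.0, 115.0],  # hypothetical: demand 24 h earlier
        "PSE_p1d": [95.0, 130.0, 118.0],  # hypothetical: demand 24 h later
    },
    index=index,
)
df["dayofweek"] = df.index.dayofweek
day_map = {day: ["PSE_m1d", "PSE_p1d"] for day in range(7)}
result = fill_ba_demand_sketch(df, "PSE", day_map)
print(result.tolist())  # [100.0, 120.0, 120.0]
```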

prereise.gather.demanddata.eia.clean_data.fix_dataframe_outliers(demand)

Make a data frame of demand with outliers replaced with values interpolated from the non-outlier edge points using slope_interpolate().

Parameters:

demand (pandas.DataFrame) – demand data frame with UTC timestamps as indices and BA names as column names.

Returns:

(pandas.DataFrame) – data frame with anomalous demand values replaced by interpolated values.

prereise.gather.demanddata.eia.clean_data.replace_with_shifted_demand(demand, start, end)

Replace missing data within overall demand data frame with averages of nearby shifted demand.

Parameters:
  • demand (pandas.DataFrame) – data frame with hourly demand where the columns are BA regions.

  • start (pandas.Timestamp/numpy.datetime64/datetime.datetime) – start of period of interest.

  • end (pandas.Timestamp/numpy.datetime64/datetime.datetime) – end of period of interest.

Returns:

(pandas.DataFrame) – data frame with missing demand data filled in.
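One way to read "averages of nearby shifted demand" is sketched below: each gap inside the period of interest is filled with the mean of the values at the same hour one week earlier and one week later. The one-week window and the helper name `replace_with_weekly_average` are assumptions for illustration, and the sketch relies on a regular, gap-free hourly index so that shifting by 168 rows equals shifting by one week.

```python
import numpy as np
import pandas as pd

HOURS_PER_WEEK = 168


def replace_with_weekly_average(demand, start, end):
    """Fill NaNs between start and end with the mean of the values one week
    before and one week after (same hour of the week)."""
    before = demand.shift(HOURS_PER_WEEK)  # value one week earlier
    after = demand.shift(-HOURS_PER_WEEK)  # value one week later
    # If only one neighbour exists, use it; if both exist, average them
    neighbour_mean = (before.fillna(after) + after.fillna(before)) / 2
    in_window = (demand.index >= start) & (demand.index <= end)
    filled = demand.copy()
    filled.loc[in_window] = demand.loc[in_window].fillna(neighbour_mean.loc[in_window])
    return filled


index = pd.date_range("2020-01-01", periods=3 * HOURS_PER_WEEK, freq="60min")
demand = pd.DataFrame({"CISO": np.arange(len(index), dtype=float)}, index=index)
demand.iloc[200, 0] = np.nan
filled = replace_with_weekly_average(demand, index[0], index[-1])
print(filled.iloc[200, 0])  # 200.0, the mean of 32.0 and 368.0
```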

prereise.gather.demanddata.eia.clean_data.slope_interpolate(ba_df)

Look for demand outliers by applying a z-score threshold to the demand slope. Loop through all the detected outliers, determine the non-outlier edge points, and then interpolate a line joining these two edge points. The value of the line at the timestamp of the outlier event replaces the anomalous value.

Parameters:

ba_df (pandas.DataFrame) – demand data frame with UTC timestamp as indices and BA name as column name.

Returns:

(pandas.DataFrame) – data frame with anomalous demand values replaced by interpolated values.

Note

It is implicitly assumed that:

1. demand is correlated with temperature, and temperature rise is limited by heat capacity, which is finite and generally uniform across a region; hence, temperature-dependent spikes in the demand derivative are unphysical.

2. nothing genuinely anomalous happened to electricity usage in the relevant time range, so using a line to estimate the correct value is reasonable.

Todo

If there are more than a few hours (say > 4) of anomalous behavior, linear interpolation may give a bad estimate. Non-linear interpolation methods should be considered, and other information may be needed to interpolate properly, for example, the temperature data or other relevant profiles.
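The slope-based procedure described above can be sketched as follows. This is a minimal sketch, not the package's code: it assumes a single-column frame, uses an illustrative threshold of 3, and relies on `numpy.interp` to draw the straight line between the nearest non-outlier points. A one-hour spike produces two large slopes (up into the spike, down out of it), so both hours are flagged and replaced. The name `slope_interpolate_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def slope_interpolate_sketch(ba_df, threshold=3.0):
    """Flag hours whose demand slope has |z-score| > threshold and replace
    them by a line joining the nearest non-outlier points."""
    column = ba_df.columns[0]
    values = ba_df[column].to_numpy(dtype=float).copy()
    slope = np.diff(values, prepend=values[0])  # hour-to-hour change
    if slope.std() == 0:
        return ba_df.copy()
    z_score = (slope - slope.mean()) / slope.std()
    outlier = np.abs(z_score) > threshold
    positions = np.arange(len(values))
    # Linear interpolation between the non-outlier edge points
    values[outlier] = np.interp(positions[outlier], positions[~outlier], values[~outlier])
    return pd.DataFrame({column: values}, index=ba_df.index)


hours = np.arange(48)
base = 100 + 10 * np.sin(2 * np.pi * hours / 24)  # smooth daily cycle
spiky = base.copy()
spiky[20] += 500  # an unphysical one-hour spike
index = pd.date_range("2020-01-01", periods=48, freq="60min", tz="UTC")
cleaned = slope_interpolate_sketch(pd.DataFrame({"PSE": spiky}, index=index))
print(round(cleaned["PSE"].iloc[20], 2))
```

Linear interpolation is adequate for the short gaps this example produces; as the Todo notes, longer anomalies may call for a different method.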

prereise.gather.demanddata.eia.get_eia_data module

class prereise.gather.demanddata.eia.get_eia_data.EIAgov(token, series)

Bases: object

Copied from this link.

Parameters:
  • token (str) – EIA token.

  • series (list) – id code(s) of the series to be downloaded.

get_data()

Convert JSON files into a data frame.

Returns:

(pandas.DataFrame) – data frame.

raw(ser)

Download JSON files from EIA.

Parameters:

ser (str) – list of file names.

Raises:

KeyError – if the URL or file is not found or is not valid.
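The JSON-to-data-frame conversion performed by get_data() can be illustrated offline on an in-memory payload. The sketch assumes a response shaped as `{"series": [{"series_id": ..., "data": [[timestamp, value], ...]}]}` with hourly timestamps such as `20200101T00Z`; that layout and the helper name `series_json_to_frame` are assumptions, not a statement of the EIA API. A payload without a `"series"` key raises a `KeyError` here.

```python
import json

import pandas as pd


def series_json_to_frame(payload):
    """Build a UTC-indexed frame with one column per series id."""
    columns = []
    for series in payload["series"]:  # KeyError if the payload is malformed
        timestamps, values = zip(*series["data"])
        index = pd.to_datetime(list(timestamps), format="%Y%m%dT%HZ", utc=True)
        columns.append(
            pd.Series(list(values), index=index, name=series["series_id"], dtype=float)
        )
    return pd.concat(columns, axis=1).sort_index()


payload = json.loads(
    '{"series": [{"series_id": "EBA.PSE-ALL.D.H",'
    ' "data": [["20200101T01Z", 3100], ["20200101T00Z", 3000]]}]}'
)
frame = series_json_to_frame(payload)
print(frame)
```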

prereise.gather.demanddata.eia.get_eia_data.from_download(tok, start_date, end_date, offset_days, series_list)

Download and assemble dataset of demand data per balancing authority for desired date range.

Parameters:
  • tok (str) – token obtained by registering with EIA.

  • start_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – start date.

  • end_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – end date.

  • offset_days (int) – number of business days for data to stabilize.

  • series_list (list) – list of demand series names provided by EIA, e.g., [‘EBA.AVA-ALL.D.H’, ‘EBA.AZPS-ALL.D.H’].

Returns:

(pandas.DataFrame) – data frame with UTC timestamp as indices and BA series name as column names.
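A plausible reading of offset_days is that the most recent business days, which may still be revised, are left out of the requested range. The sketch below shows that interpretation with `pandas.offsets.BDay`; the helper `stable_end_date` and its `today` argument are hypothetical and not part of the documented signature.

```python
import pandas as pd


def stable_end_date(end_date, offset_days, today):
    """Return end_date, clipped so that the last offset_days business days
    before today (assumed not yet stable) are excluded."""
    cutoff = pd.Timestamp(today) - pd.offsets.BDay(offset_days)
    return min(pd.Timestamp(end_date), cutoff)


clipped = stable_end_date("2020-06-30", 3, today="2020-06-15")  # a Monday
kept = stable_end_date("2020-06-01", 3, today="2020-06-15")
print(clipped, kept)  # 2020-06-10 00:00:00 2020-06-01 00:00:00
```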

prereise.gather.demanddata.eia.get_eia_data.from_excel(directory, series_list, start_date, end_date)

Assemble EIA balancing authority (BA) data from pre-downloaded Excel spreadsheets. The spreadsheets contain data from July 2015 to present.

Parameters:
  • directory (str) – location of Excel files.

  • series_list (list) – list of BA initials, e.g., [‘PSE’, ‘BPAT’, ‘CISO’].

  • start_date (datetime.datetime) – desired start of dataset.

  • end_date (datetime.datetime) – desired end of dataset.

Returns:

(pandas.DataFrame) – data frame with UTC timestamp as indices and BA series name as column names.

prereise.gather.demanddata.eia.get_eia_data.get_ba_demand(ba_code_list, start_date, end_date, api_key)

Download the demand between two dates for a list of balancing authorities.

Parameters:
  • ba_code_list (pandas.DataFrame) – list of BAs to download from EIA.

  • start_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – beginning bound for the demand data frame.

  • end_date (pandas.Timestamp/numpy.datetime64/datetime.datetime) – end bound for the demand data frame.

  • api_key (str) – API key used to fetch data.

Returns:

(pandas.DataFrame) – data frame with columns of demand by BA.
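The post-processing around such a download can be sketched offline: map BA codes to series ids, keep those columns, rename them to BA codes, and trim to the requested dates. The sketch assumes the `EBA.<BA>-ALL.D.H` id pattern shown for from_download() and takes a plain list of BA codes rather than a `pandas.DataFrame`; both helper names are hypothetical.

```python
import pandas as pd


def series_id(ba_code):
    """Hourly demand series id, following the 'EBA.AVA-ALL.D.H' pattern."""
    return f"EBA.{ba_code}-ALL.D.H"


def tidy_ba_demand(raw, ba_codes, start_date, end_date):
    """Keep the requested BAs, rename columns to BA codes, clip to the dates."""
    rename = {series_id(code): code for code in ba_codes}
    selected = raw[list(rename)].rename(columns=rename)
    return selected.loc[start_date:end_date]  # label slicing is inclusive


index = pd.date_range("2020-01-01", periods=5, freq="60min")
raw = pd.DataFrame(
    {series_id(code): range(5) for code in ["PSE", "CISO", "AVA"]},
    index=index,
    dtype=float,
)
demand = tidy_ba_demand(raw, ["PSE", "CISO"], "2020-01-01 01:00", "2020-01-01 03:00")
print(demand)
```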

prereise.gather.demanddata.eia.map_ba module

prereise.gather.demanddata.eia.map_ba.aggregate_ba_demand(demand, mapping)

Aggregate demand in BAs to regions as defined in the mapping dictionary.

Parameters:
  • demand (pandas.DataFrame) – demand profiles in BAs.

  • mapping (dict) – dictionary mapping of BA columns to regions.

Returns:

(pandas.DataFrame) – aggregated demand profiles
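A minimal sketch of the aggregation, assuming the mapping has the form `{region: [BA column, ...]}` (the exact shape of the dictionary is an assumption, as is the name `aggregate_ba_demand_sketch`). Note that `DataFrame.sum(axis=1)` skips missing values, so a NaN in one BA is treated as zero in the regional total.

```python
import pandas as pd


def aggregate_ba_demand_sketch(demand, mapping):
    """Sum BA columns into regions; mapping is assumed to be {region: [BA, ...]}."""
    return pd.DataFrame(
        {region: demand[bas].sum(axis=1) for region, bas in mapping.items()},
        index=demand.index,
    )


demand = pd.DataFrame({"PSE": [1.0, 2.0], "BPAT": [3.0, 4.0], "CISO": [5.0, 6.0]})
mapping = {"Northwest": ["PSE", "BPAT"], "California": ["CISO"]}
regions = aggregate_ba_demand_sketch(demand, mapping)
print(regions)
```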

prereise.gather.demanddata.eia.map_ba.get_demand_in_loadzone(agg_demand, bus_map)

Get demand in load zones from the aggregated demand of BA regions.

Parameters:
  • agg_demand (pandas.DataFrame) – demand profiles as returned by aggregate_ba_demand()

  • bus_map (pandas.DataFrame) – data frame used to map BA regions to load zones using real power demand weighting.

Returns:

(pandas.DataFrame) – data frame with demand columns according to load zone.
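Real power demand weighting can be sketched as follows: each BA region's demand is split among load zones in proportion to the total bus real power demand (Pd) that each zone has inside that region. The column names `BA`, `zone_name`, and `Pd` in `bus_map`, and the function name, are assumptions for illustration.

```python
import pandas as pd


def demand_in_loadzone_sketch(agg_demand, bus_map):
    """Split each BA region's demand among zones by their share of bus Pd."""
    pd_by_pair = bus_map.groupby(["BA", "zone_name"])["Pd"].sum()
    # Fraction of each BA region's total Pd that falls in each zone
    share = pd_by_pair / pd_by_pair.groupby(level="BA").transform("sum")
    zones = sorted(bus_map["zone_name"].unique())
    zone_demand = pd.DataFrame(0.0, index=agg_demand.index, columns=zones)
    for (ba, zone), weight in share.items():
        zone_demand[zone] = zone_demand[zone] + agg_demand[ba] * weight
    return zone_demand


agg_demand = pd.DataFrame({"North": [100.0, 200.0], "South": [50.0, 50.0]})
bus_map = pd.DataFrame(
    {"BA": ["North", "North", "South"], "zone_name": ["A", "B", "B"], "Pd": [30.0, 10.0, 5.0]}
)
zone_demand = demand_in_loadzone_sketch(agg_demand, bus_map)
print(zone_demand)  # A: 75, 150; B: 75, 100
```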

prereise.gather.demanddata.eia.map_ba.map_buses_to_ba(bus_df)

Find the balancing authority (BA) in U.S. territory that each queried bus belongs to, based on GIS information.

Parameters:

bus_df (pandas.DataFrame) – data frame containing the lat and long of each bus.

Returns:

(tuple) – the first element is the input data frame with two columns, “County” and “BA”, added for each bus; the second element is the list of indices of buses for which the Census API finds no matching county (the counties of such buses are assigned based on their nearest neighbour).
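The nearest-neighbour fallback for buses without a county match can be sketched as below. As a simplification, distance is Euclidean in latitude/longitude degrees; a haversine distance would be more accurate over long spans. The column names `lat` and `lon` and the function name are assumptions; only the “County” column name comes from the description above.

```python
import numpy as np
import pandas as pd


def fill_county_by_nearest(bus_df):
    """Copy the county of the nearest matched bus into unmatched buses.

    Returns the filled frame and the list of bus indices that were unmatched.
    """
    out = bus_df.copy()
    known = out[out["County"].notna()]
    unmatched = list(out.index[out["County"].isna()])
    known_xy = known[["lat", "lon"]].to_numpy(dtype=float)
    for idx in unmatched:
        xy = out.loc[idx, ["lat", "lon"]].to_numpy(dtype=float)
        distance = np.hypot(*(known_xy - xy).T)  # degree-space distance
        out.at[idx, "County"] = known["County"].iloc[int(np.argmin(distance))]
    return out, unmatched


buses = pd.DataFrame(
    {
        "lat": [47.6, 47.7, 34.0, 34.1],
        "lon": [-122.3, -122.2, -118.2, -118.3],
        "County": ["King", None, "Los Angeles", None],
    },
    index=[1, 2, 3, 4],
)
filled_buses, unmatched = fill_county_by_nearest(buses)
print(filled_buses["County"].tolist(), unmatched)
```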

prereise.gather.demanddata.eia.map_ba.map_buses_to_county(bus_county_map)

Find the county in the U.S. territory that each bus in the query grid belongs to.

Parameters:

bus_county_map (pandas.DataFrame) – data frame containing the lat and long of each bus.

Returns:

(tuple) – the first element is a data frame of the counties in which the buses are located; the second element is a list of indices of buses for which no county matches.

Module contents