autocomplete

Collection of functions to auto-complete missing data.

ecodynelec.preprocessing.autocomplete.add_specific_gaps(all_gaps, name, length, long_gaps)[source]
ecodynelec.preprocessing.autocomplete.autocomplete(data: dict, n_hours: int = 2, days_around: int = 7, daytype_only: bool = False, limit: float = 0.3, ignore: bool = False, is_verbose: bool = False)[source]

Main function to auto-complete the data. Works with generation and import.

Parameters:
  • data (dict) – the dict of data to auto-complete.

  • n_hours (int, default to 2) – maximum number of consecutive missing hours for a gap to be considered short and filled by linear interpolation.

  • days_around (int, default to 7) – number of days before and after a long gap to be used when creating an average day to complete the gap.

  • daytype_only (bool, default is False) – fills long gaps using an average day built only from days of similar type (weekday, Saturday, Sunday).

  • limit (float, default to 0.3) – max relative size of gap to allow an autocomplete. If a gap is longer than this fraction of the data, it will be filled with zeros.

  • ignore (bool, default is False) – the missing data is flagged but not auto-completed. Displays a report if is_verbose is set to True.

  • is_verbose (bool, default is False) – to display information during the process.

Returns:

  • dict – dict of data with autocompleted information

  • pandas.DataFrame – pandas DataFrame with resolutions
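The parameters above imply three ways of treating a gap. The following is only a minimal sketch of that decision, not the library's code: the helper `classify_gap`, its argument names, and the order of the two checks are assumptions made for illustration.

```python
def classify_gap(gap_steps, series_steps, steps_per_hour, n_hours=2, limit=0.3):
    """Hypothetical helper: decide how one gap would be treated.

    gap_steps      -- number of consecutive missing time steps
    series_steps   -- total number of time steps in the series
    steps_per_hour -- resolution of the series (e.g. 4 for 15-minute data)
    """
    # Assumption: the relative-size check takes precedence over the others.
    if gap_steps > limit * series_steps:
        return "excess"  # too large a share of the data: filled with zeros
    if gap_steps <= n_hours * steps_per_hour:
        return "short"   # at most n_hours in a row: linear interpolation
    return "long"        # anything in between: filled from an average day


print(classify_gap(2, 100, 1))   # short
print(classify_gap(5, 100, 1))   # long
print(classify_gap(40, 100, 1))  # excess
```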

ecodynelec.preprocessing.autocomplete.fill_all_excess(data: dict, period_indexes: dict)[source]

Fills the fields that were skipped with zeros.

ecodynelec.preprocessing.autocomplete.fill_all_periods(data: dict, period_indexes: ndarray, deltas: dict, daytype_only: bool = False, is_verbose: bool = False)[source]

Fills all long gaps.

Parameters:
  • data (dict) – collection of data, with structure being {country: { unit: pandas.Series } }

  • period_indexes (numpy.ndarray) – matrix indicating the location and length of long gaps

  • deltas (dict) – collection of number of time steps to create the average days around gaps. Structure is {country: {unit: {gap_id: int} } }.

  • daytype_only (bool) – uses an average day built only from days of similar type (weekday, Saturday, Sunday).

  • is_verbose (bool, default to False) – to display information.
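To illustrate the daytype_only option, the sketch below groups calendar days into the three types named above and keeps only the days matching the type of the day being filled. The helper names `day_type` and `similar_days` are hypothetical and are not part of the module.

```python
from datetime import date


def day_type(day):
    """Return 'weekday', 'Saturday' or 'Sunday' for a datetime.date."""
    wd = day.weekday()  # Monday is 0, Sunday is 6
    if wd == 5:
        return "Saturday"
    if wd == 6:
        return "Sunday"
    return "weekday"


def similar_days(candidates, target):
    """Keep only the candidate days that share the day type of `target`."""
    wanted = day_type(target)
    return [d for d in candidates if day_type(d) == wanted]


around = [date(2024, 1, 1), date(2024, 1, 6), date(2024, 1, 13)]
# 2024-01-20 is a Saturday, so only the two Saturdays are kept.
print(similar_days(around, date(2024, 1, 20)))
```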

ecodynelec.preprocessing.autocomplete.fill_occasional(data: dict)[source]

Fills short gaps of data with linear interpolation.
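As an illustration of the technique only, here is a pure-Python sketch of linear interpolation across runs of missing values. It works on a plain list with None for missing entries, which is an assumption for the example; the module itself operates on pandas objects, and `interpolate_gaps` is a hypothetical name.

```python
def interpolate_gaps(values):
    """Linearly interpolate interior runs of None in a list of numbers.

    Runs touching either end of the list have only one known neighbour
    and are left untouched.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1  # i ends on the first known value after the run
        if start > 0 and i < n:
            left, right = filled[start - 1], filled[i]
            span = i - start + 1  # number of intervals between the neighbours
            for k in range(start, i):
                filled[k] = left + (right - left) * (k - start + 1) / span
    return filled


print(interpolate_gaps([1.0, None, None, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
```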

ecodynelec.preprocessing.autocomplete.fill_one_period(avg_day, to_fill)[source]

Fills one single long gap using one average day.
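The idea of filling from an average day can be sketched as follows. This is a simplified illustration under stated assumptions (daily profiles as equally long lists, None for missing values, zero where no day has a value), not the function's actual implementation; `average_day` and `fill_from_average` are hypothetical names.

```python
def average_day(days):
    """Element-wise mean of equally long daily profiles, skipping None."""
    steps = len(days[0])
    profile = []
    for s in range(steps):
        known = [day[s] for day in days if day[s] is not None]
        # Assumption: a slot with no known value in any day falls back to 0.
        profile.append(sum(known) / len(known) if known else 0.0)
    return profile


def fill_from_average(avg_day, first_slot, length):
    """Values for a gap of `length` steps starting at slot `first_slot`.

    The average day is repeated cyclically, so a gap may span several days.
    """
    steps = len(avg_day)
    return [avg_day[(first_slot + k) % steps] for k in range(length)]


print(average_day([[1.0, 2.0], [3.0, None]]))          # [2.0, 2.0]
print(fill_from_average([10.0, 20.0, 30.0], 2, 4))     # [30.0, 10.0, 20.0, 30.0]
```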

ecodynelec.preprocessing.autocomplete.fill_one_series(data, period_indexes, delta, daytype_only=False)[source]

Fills all long gaps for one single series in one country

ecodynelec.preprocessing.autocomplete.find_missing(data: dict)[source]

Identifies the missing values for the entire set of data.

Parameters:

data (dict of pandas DataFrames) – the data to process

Returns:

dict (keys are countries) of dicts (keys are former columns) of matrices. Final matrix has one identified gap per row and three columns (length of gap, first…, and last index of gap)

ecodynelec.preprocessing.autocomplete.find_missing_one(series)[source]

Identifies all missing values for one single series.

Parameters:

series (pandas Series) – the data to process

Returns:

Matrix (n x 3). Final matrix has one identified gap per row (n rows) and three columns (length of gap, first…, and last index of gap)
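A gap matrix of this shape can be built with a single pass over the data. The sketch below is an illustration on a plain list, with None for missing values; it assumes the three columns are gap length, first index, and last index, and `find_gaps` is a hypothetical name rather than the module's function.

```python
def find_gaps(values):
    """Return one [length, first, last] row per run of None in `values`."""
    gaps = []
    start = None  # index where the current run of missing values began
    for i, value in enumerate(values):
        if value is None:
            if start is None:
                start = i
        elif start is not None:
            gaps.append([i - start, start, i - 1])
            start = None
    if start is not None:  # the series ends inside a gap
        gaps.append([len(values) - start, start, len(values) - 1])
    return gaps


print(find_gaps([1, None, None, 3, None]))  # [[2, 1, 2], [1, 4, 4]]
```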

ecodynelec.preprocessing.autocomplete.get_steps_per_hour(freq, dtype=<class 'int'>)[source]

Retrieve resolution for a specific country and field.

Parameters:
  • freq (str) – the base frequency of the time series

  • dtype (data-type, default to int) – the type of return. Default behavior returns an integer, i.e. zero when the frequency is lower than an hour. It may be convenient to sometimes return a fraction instead, using float.

Returns:

the number of time steps per hour to expect in a time series.

Return type:

dtype
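The effect of dtype can be seen in a small sketch. Unlike the documented function, which takes a frequency string, this hypothetical `steps_per_hour` takes a datetime.timedelta to stay within the standard library; it only illustrates the integer versus fractional result.

```python
from datetime import timedelta


def steps_per_hour(step, dtype=int):
    """Number of time steps of size `step` in one hour, cast to `dtype`."""
    # Dividing two timedeltas yields a float ratio.
    return dtype(timedelta(hours=1) / step)


print(steps_per_hour(timedelta(minutes=15)))     # 4
print(steps_per_hour(timedelta(days=1)))         # 0, truncated by int
print(steps_per_hour(timedelta(days=1), float))  # fraction of a step per hour
```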

ecodynelec.preprocessing.autocomplete.infer_one(obj)[source]

Infers the frequency of one single time series.

ecodynelec.preprocessing.autocomplete.infer_resolution(data: dict)[source]

Infers the resolution of all fields for all countries

ecodynelec.preprocessing.autocomplete.longs_into_days(gaps, indexes, n_hours=2)[source]
ecodynelec.preprocessing.autocomplete.reduce_to_daytype(data, weekday)[source]
ecodynelec.preprocessing.autocomplete.report_missing(gaps: dict, datasizes: dict)[source]

Counts and displays missing values.

ecodynelec.preprocessing.autocomplete.select_long_gaps(gaps, name, lower, upper, length)[source]

Identifies long gaps for one subcategory of one country with one unique threshold. Can make exceptions for some cases, e.g. solar at the extremes of the dataset.

ecodynelec.preprocessing.autocomplete.set_deltas(data: dict, resolution, days_around: int)[source]

Compute the deltas of each subcategory for each country for the creation of typical days.
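One plausible reading, given that days_around is expressed in days and the deltas in time steps, is a conversion through the series resolution. The sketch below shows that arithmetic only; the helper names and the clipping of the window to the series bounds are assumptions, not the module's code.

```python
def delta_steps(days_around, steps_per_hour):
    """Number of time steps covered by `days_around` days (assumed formula)."""
    return days_around * 24 * steps_per_hour


def window_around(first, last, delta, series_length):
    """Index range of the data surrounding a gap, clipped to the series."""
    return max(0, first - delta), min(series_length - 1, last + delta)


print(delta_steps(7, 4))                # 672 steps for 15-minute data
print(window_around(100, 120, 672, 2000))  # (0, 792)
```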

ecodynelec.preprocessing.autocomplete.set_lengths(data: dict)[source]

Compute the length of each subcategory for each country

ecodynelec.preprocessing.autocomplete.set_thresholds(data: dict, resolution, n_hours: int)[source]

Compute the thresholds of each subcategory for each country for the flagging of long gaps.

ecodynelec.preprocessing.autocomplete.sort_gaps(gaps: dict, lower: dict, lengths: dict, upper: dict | None = None)[source]

Identifies long gaps (above threshold). Needs the length of the data for specific processes.

ecodynelec.preprocessing.autocomplete.to_original_series(obj, freq)[source]

Scale data back to original resolution. Applicable to pandas Series only.