autocomplete

Collection of functions to auto-complete missing data.

ecodynelec.preprocessing.autocomplete.add_specific_gaps(all_gaps, name, length, long_gaps)[source]

ecodynelec.preprocessing.autocomplete.autocomplete(data: dict, n_hours: int = 2, days_around: int = 7, daytype_only: bool = False, limit: float = 0.3, ignore: bool = False, is_verbose: bool = False)[source]

Main function to auto-complete the data. Works with generation and import.

Parameters:

data (dict) – the dict of data to auto-complete.
n_hours (int, default is 2) – maximum number of consecutive missing hours for a gap to count as a short gap, which is filled by linear interpolation.
days_around (int, default is 7) – number of days before and after a long gap used to build the average day that fills the gap.
daytype_only (bool, default is False) – fills long gaps using an average day built only from days of a similar type (weekday, Saturday, Sunday).
limit (float, default is 0.3) – maximum relative size of a gap for which auto-completion is allowed. If a gap is longer than this fraction of the data, it is filled with zeros.
ignore (bool, default is False) – if True, the missing data is flagged but not auto-completed. A report is displayed if is_verbose is set to True.
is_verbose (bool, default is False) – displays information during the process.

Returns:

dict – dict of data with autocompleted information
pandas.DataFrame – pandas DataFrame with resolutions
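
The way n_hours and limit interact can be sketched as a small decision rule. This is only an illustration of the parameter descriptions above, not the library's implementation: the function name classify_gap_sketch, its arguments and the exact comparison operators are assumptions.

```python
def classify_gap_sketch(gap_length, series_length, steps_per_hour,
                        n_hours=2, limit=0.3):
    """Hypothetical helper: pick a filling strategy for one gap.

    gap_length and series_length are counted in time steps.
    """
    # Assumption: a gap larger than `limit` times the data is not auto-completed
    if gap_length > limit * series_length:
        return "zeros"
    # Assumption: a gap of at most `n_hours` hours counts as a short gap
    if gap_length <= n_hours * steps_per_hour:
        return "interpolate"
    # Everything in between is a long gap, filled from an average day
    return "average_day"


short_gap = classify_gap_sketch(2, 100, steps_per_hour=1)
long_gap = classify_gap_sketch(10, 100, steps_per_hour=1)
oversized_gap = classify_gap_sketch(40, 100, steps_per_hour=1)
```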

ecodynelec.preprocessing.autocomplete.fill_all_excess(data: dict, period_indexes: dict)[source]: Fills the fields that were skipped with zeros.

ecodynelec.preprocessing.autocomplete.fill_all_periods(data: dict, period_indexes: ndarray, deltas: dict, daytype_only: bool = False, is_verbose: bool = False)[source]

Fills all long gaps.

Parameters:

data (dict) – collection of data, with structure being {country: { unit: pandas.Series } }
period_indexes (numpy.ndarray) – matrix indicating the location and length of long gaps
deltas (dict) – number of time steps used to build the average days around gaps. Structure is {country: {unit: {gap_id: int} } }.
daytype_only (bool, default is False) – uses an average day built only from days of a similar type (weekday, Saturday, Sunday).
is_verbose (bool, default is False) – displays information during the process.

ecodynelec.preprocessing.autocomplete.fill_occasional(data: dict)[source]: Fills short gaps of data with linear interpolation.
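
As a rough illustration of what linear interpolation over a short gap looks like, the sketch below uses plain pandas on a made-up hourly series. It does not call fill_occasional, and the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with a two-step gap (not ecodynelec data)
index = pd.date_range("2021-01-01", periods=6, freq=pd.Timedelta(hours=1))
series = pd.Series([10.0, 12.0, np.nan, np.nan, 18.0, 20.0], index=index)

# Linear interpolation between the last valid value before the gap
# and the first valid value after it
filled = series.interpolate(method="linear")
```

The two missing points are replaced by evenly spaced values between 12.0 and 18.0.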

ecodynelec.preprocessing.autocomplete.fill_one_period(avg_day, to_fill)[source]: Fills one single long gap using one average day.
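
The idea of filling a long gap from an average day can be sketched with plain pandas. This is a simplified assumption of the approach: the average is taken over every valid day of a small made-up sample rather than over a window of days_around days, and none of the names below come from the library.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series over 5 days, with a different offset each day
index = pd.date_range("2021-01-04", periods=5 * 24, freq=pd.Timedelta(hours=1))
values = np.tile(np.arange(24, dtype=float), 5) + np.repeat([0.0, 2.0, 0.0, 4.0, 6.0], 24)
series = pd.Series(values, index=index)

# Blank out the whole third day to create a long gap
gap = series.index.normalize() == pd.Timestamp("2021-01-06")
series[gap] = np.nan

# Average day: mean of the valid values at each hour of the day
avg_day = series.groupby(series.index.hour).mean()

# Fill each missing time step with the average-day value for its hour
filled = series.copy()
filled[gap] = avg_day.loc[series.index.hour[gap]].to_numpy()
```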

ecodynelec.preprocessing.autocomplete.fill_one_series(data, period_indexes, delta, daytype_only=False)[source]: Fills all long gaps for one single series in one country

ecodynelec.preprocessing.autocomplete.find_missing(data: dict)[source]

Identifies the missing values for the entire set of data.

Parameters:

data (dict of pandas DataFrames) – the data to process

Returns:

dict (keys are countries) of dicts (keys are former columns) of matrices. Each matrix has one identified gap per row and three columns (length of gap, first…, and last index of gap).

ecodynelec.preprocessing.autocomplete.find_missing_one(series)[source]

Identifies all missing values for one single series.

Parameters:

series (pandas Series) – the data to process

Returns:

Matrix (n x 3). The matrix has one identified gap per row (n rows) and three columns (length of gap, first…, and last index of gap).
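
A gap matrix of this general shape can be produced with a few NumPy calls. The sketch below is an assumption for illustration only: the helper name find_gaps_sketch is hypothetical, and the column layout it uses (length, first position, last position, all as integer positions) is a choice made here, not a statement about what find_missing_one returns.

```python
import numpy as np
import pandas as pd


def find_gaps_sketch(series: pd.Series) -> np.ndarray:
    """Hypothetical helper: one row per run of NaN values, as
    (length, first position, last position)."""
    is_missing = series.isna().to_numpy()
    # Pad with False so that runs touching either end are detected
    padded = np.concatenate(([False], is_missing, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)       # first missing position of each run
    stops = np.flatnonzero(changes == -1) - 1   # last missing position of each run
    return np.column_stack((stops - starts + 1, starts, stops))


example = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])
gaps = find_gaps_sketch(example)
```

Here the example series has two gaps: one of length 2 at positions 1 to 2, and one of length 1 at position 4.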

ecodynelec.preprocessing.autocomplete.get_steps_per_hour(freq, dtype=<class 'int'>)[source]

Retrieve resolution for a specific country and field.

Parameters:

freq (str) – the base frequency of the time series
dtype (data-type, default is int) – the type of the returned value. The default behavior returns an integer, i.e. zero when the time step is longer than one hour. It may sometimes be convenient to return a fraction instead, using float.

Returns:

the number of time steps per hour to expect in a time series.

Return type:

dtype
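
The effect of dtype described above can be illustrated with a small stand-alone sketch. The helper steps_per_hour_sketch is hypothetical and only mirrors the documented behavior under the assumption that freq is a fixed-size pandas frequency string such as "15min" or "2h".

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def steps_per_hour_sketch(freq: str, dtype=int):
    """Hypothetical helper: number of time steps of size `freq` in one hour."""
    step = pd.Timedelta(to_offset(freq))
    # Dividing two Timedelta objects gives a float ratio
    return dtype(pd.Timedelta(hours=1) / step)


quarter_hourly = steps_per_hour_sketch("15min")
two_hourly_int = steps_per_hour_sketch("2h")
two_hourly_float = steps_per_hour_sketch("2h", dtype=float)
```

With a 2-hour step, the integer result is truncated to zero, while float keeps the fraction of a step per hour.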

ecodynelec.preprocessing.autocomplete.infer_one(obj)[source]: Infers the frequency of one single time series.
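
For a regularly spaced index, frequency inference is available directly in pandas. The sketch below shows pandas.infer_freq on a made-up series. Whether infer_one relies on this call is an assumption.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical series sampled every 15 minutes
index = pd.date_range("2021-01-01", periods=8, freq=pd.Timedelta(minutes=15))
series = pd.Series(range(8), index=index)

# infer_freq returns a frequency string; its exact spelling depends on the pandas version
inferred = pd.infer_freq(series.index)

# Converting to a Timedelta gives a version-independent step size
step = pd.Timedelta(to_offset(inferred))
```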

ecodynelec.preprocessing.autocomplete.infer_resolution(data: dict)[source]: Infers the resolution of all fields for all countries

ecodynelec.preprocessing.autocomplete.longs_into_days(gaps, indexes, n_hours=2)[source]

ecodynelec.preprocessing.autocomplete.reduce_to_daytype(data, weekday)[source]
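
The day types mentioned for daytype_only (weekday, Saturday, Sunday) suggest a filter along the following lines. This is a guess at the intent, since the function is undocumented here: the helper same_daytype_sketch, the meaning given to weekday (0 for Monday to 6 for Sunday) and the sample data are all assumptions.

```python
import pandas as pd


def same_daytype_sketch(series: pd.Series, weekday: int) -> pd.Series:
    """Hypothetical helper: keep only the days of the same type as `weekday`
    (0 = Monday ... 6 = Sunday)."""
    day_of_week = series.index.dayofweek
    if weekday < 5:
        # Monday to Friday are treated as one common type
        mask = day_of_week < 5
    else:
        # Saturday and Sunday each form their own type
        mask = day_of_week == weekday
    return series[mask]


# Two full weeks of daily values, starting on Monday 2021-01-04
index = pd.date_range("2021-01-04", periods=14, freq=pd.Timedelta(days=1))
series = pd.Series(range(14), index=index)

weekdays_only = same_daytype_sketch(series, weekday=2)
saturdays_only = same_daytype_sketch(series, weekday=5)
```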

ecodynelec.preprocessing.autocomplete.report_missing(gaps: dict, datasizes: dict)[source]: Counts and displays the missing values.

ecodynelec.preprocessing.autocomplete.select_long_gaps(gaps, name, lower, upper, length)[source]: Identifies long gaps for one subcategory of one country with one unique threshold. Can make exceptions for some cases, e.g. solar at the extremes of the dataset.

ecodynelec.preprocessing.autocomplete.set_deltas(data: dict, resolution, days_around: int)[source]: Compute the deltas of each subcategory for each country for the creation of typical days.

ecodynelec.preprocessing.autocomplete.set_lengths(data: dict)[source]: Compute the length of each subcategory for each country

ecodynelec.preprocessing.autocomplete.set_thresholds(data: dict, resolution, n_hours: int)[source]: Compute the thresholds of each subcategory for each country for the flagging of long gaps.

ecodynelec.preprocessing.autocomplete.sort_gaps(gaps: dict, lower: dict, lengths: dict, upper: dict | None = None)[source]: Identify long gaps (above threshold). Needs the length of data for specific processes

ecodynelec.preprocessing.autocomplete.to_original_series(obj, freq)[source]: Scale data back to original resolution. Applicable to pandas Series only.