autocomplete
Collection of functions to auto-complete missing data.
- ecodynelec.preprocessing.autocomplete.autocomplete(data: dict, n_hours: int = 2, days_around: int = 7, daytype_only: bool = False, limit: float = 0.3, ignore: bool = False, is_verbose: bool = False)[source]
Main function to auto-complete the data. Works with generation and import.
- Parameters:
data (dict) – the dict of data to auto-complete.
n_hours (int, defaults to 2) – maximum number of consecutive missing hours for a gap to be considered short and filled by linear interpolation.
days_around (int, default to 7) – number of days before and after a long gap to be used when creating an average day to complete the gap.
daytype_only (bool, default is False) – fills long gaps using an average day built only from days of similar type (weekday, Saturday, Sunday).
limit (float, default to 0.3) – max relative size of gap to allow an autocomplete. If a gap is longer than this fraction of the data, it will be filled with zeros.
ignore (bool, default is False) – if True, the missing data is flagged but not auto-completed. A report is displayed if is_verbose is set to True.
is_verbose (bool, default is False) – to display information during the process.
- Returns:
dict – dict of data with autocompleted information
pandas.DataFrame – pandas DataFrame with resolutions
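The way n_hours and limit interact can be sketched as a three-way classification of each gap. This is a minimal, dependency-free illustration and not the library's code: the helper name `classify_gap`, the order of the tests, and the exact boundary conditions (`>` versus `>=`) are assumptions.

```python
def classify_gap(gap_length, series_length, steps_per_hour, n_hours=2, limit=0.3):
    """Classify one gap; all lengths are counted in time steps.

    Hypothetical helper: boundaries and test order are assumed, not taken
    from ecodynelec.
    """
    if gap_length > limit * series_length:
        return "excess"  # too large a share of the data: filled with zeros
    if gap_length <= n_hours * steps_per_hour:
        return "short"   # filled by linear interpolation
    return "long"        # filled with an average day


# Hourly data over 10 days, i.e. 240 time steps.
print(classify_gap(2, 240, 1))    # short
print(classify_gap(12, 240, 1))   # long
print(classify_gap(100, 240, 1))  # excess
```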
- ecodynelec.preprocessing.autocomplete.fill_all_excess(data: dict, period_indexes: dict)[source]
Fills the fields that were skipped with zeros.
- ecodynelec.preprocessing.autocomplete.fill_all_periods(data: dict, period_indexes: ndarray, deltas: dict, daytype_only: bool = False, is_verbose: bool = False)[source]
Fills all long gaps.
- Parameters:
data (dict) – collection of data, with structure being {country: { unit: pandas.Series } }
period_indexes (numpy.ndarray) – matrix indicating the location and length of long gaps
deltas (dict) – collection of number of time steps to create the average days around gaps. Structure is {country: {unit: {gap_id: int} } }.
daytype_only (bool) – uses an average day built only from days of similar type (weekday, Saturday, Sunday).
is_verbose (bool, default to False) – to display information.
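The average-day idea can be illustrated on a plain list of values. The sketch below is not the library's implementation: the helper names `average_day` and `fill_gap` are hypothetical, the series is assumed to start at midnight so that `index % steps_per_day` gives the time-of-day slot, and the day-type filter is left out.

```python
import math


def average_day(values, first, last, delta, steps_per_day):
    """Mean value per time-of-day slot, using `delta` steps on each side of the gap."""
    lo = max(0, first - delta)
    hi = min(len(values), last + 1 + delta)
    sums = [0.0] * steps_per_day
    counts = [0] * steps_per_day
    for i in range(lo, hi):
        v = values[i]
        if first <= i <= last or math.isnan(v):
            continue  # skip the gap itself and any other missing value
        slot = i % steps_per_day  # assumes index 0 is midnight
        sums[slot] += v
        counts[slot] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def fill_gap(values, first, last, avg_day):
    """Return a copy of `values` where the gap is replaced by the average day."""
    out = list(values)
    for i in range(first, last + 1):
        out[i] = avg_day[i % len(avg_day)]
    return out


nan = math.nan
# Three "days" of four steps each; the gap covers indexes 5 and 6.
data = [1.0, 2.0, 3.0, 4.0,
        2.0, nan, nan, 5.0,
        3.0, 4.0, 5.0, 6.0]
avg = average_day(data, first=5, last=6, delta=4, steps_per_day=4)
filled = fill_gap(data, 5, 6, avg)
print(filled[4:8])  # [2.0, 3.0, 4.0, 5.0]
```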
- ecodynelec.preprocessing.autocomplete.fill_occasional(data: dict)[source]
Fills short gaps of data with linear interpolation.
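Linear interpolation of short gaps can be sketched without pandas as follows. The function name `interpolate_short_gaps` and its `max_gap` argument are hypothetical, and leaving gaps at the edges of the series untouched is an assumption made for this example.

```python
import math


def interpolate_short_gaps(values, max_gap):
    """Linearly interpolate interior runs of NaN that are at most `max_gap` long."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        start = i
        while i < n and math.isnan(out[i]):
            i += 1
        end = i  # index of the first valid value after the run, or n
        length = end - start
        if start == 0 or end == n or length > max_gap:
            continue  # edge gaps and long gaps are left as they are
        left, right = out[start - 1], out[end]
        step = (right - left) / (length + 1)
        for k in range(length):
            out[start + k] = left + step * (k + 1)
    return out


nan = math.nan
print(interpolate_short_gaps([1.0, nan, nan, 4.0], max_gap=2))  # [1.0, 2.0, 3.0, 4.0]
```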
- ecodynelec.preprocessing.autocomplete.fill_one_period(avg_day, to_fill)[source]
Fills one single long gap using one average day.
- ecodynelec.preprocessing.autocomplete.fill_one_series(data, period_indexes, delta, daytype_only=False)[source]
Fills all long gaps for one single series in one country
- ecodynelec.preprocessing.autocomplete.find_missing(data: dict)[source]
Identifies the missing values for the entire set of data.
- Parameters:
data (dict of pandas DataFrames) – the data to process
- Returns:
dict (keys are countries) of dicts (keys are former columns)
of matrices. Final matrix has one identified gap per row and
three columns (length of gap, first…, and last index of gap)
- ecodynelec.preprocessing.autocomplete.find_missing_one(series)[source]
Identifies all missing values for one single series.
- Parameters:
series (pandas Series) – the data to process
- Returns:
Matrix (n x 3). Final matrix has one identified gap
per row (n rows) and three columns (length of gap,
first…, and last index of gap)
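A gap matrix of this shape can be built with a single pass over the values. The sketch below uses a plain list rather than a pandas Series, the name `find_gaps` is hypothetical, and the column order (length, first index, last index) is an assumption based on the description above.

```python
import math


def find_gaps(values):
    """Return one (length, first_index, last_index) tuple per run of missing values."""
    gaps = []
    start = None
    for i, v in enumerate(values):
        missing = v is None or math.isnan(v)
        if missing and start is None:
            start = i  # a new gap opens here
        elif not missing and start is not None:
            gaps.append((i - start, start, i - 1))  # the gap closed at i - 1
            start = None
    if start is not None:  # the series ends inside a gap
        gaps.append((len(values) - start, start, len(values) - 1))
    return gaps


nan = math.nan
print(find_gaps([1.0, nan, nan, 4.0, nan]))  # [(2, 1, 2), (1, 4, 4)]
```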
- ecodynelec.preprocessing.autocomplete.get_steps_per_hour(freq, dtype=<class 'int'>)[source]
Retrieve resolution for a specific country and field.
- Parameters:
freq (str) – the base frequency of the time series
dtype (data-type, defaults to int) – the type of the returned value. The default behavior returns an integer, i.e. zero when the time step is longer than one hour. It may sometimes be convenient to return a fraction instead, using float.
- Returns:
the number of time steps per hour to expect in a time series.
- Return type:
dtype
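The effect of dtype can be illustrated with a small stand-alone parser. This is only a sketch: the name `steps_per_hour`, the regular expression, and the handful of supported frequency aliases are assumptions, and the library may well rely on pandas to interpret the frequency string instead.

```python
import re

# Minutes per unit for a few common frequency aliases (assumed subset).
_MINUTES = {"min": 1, "T": 1, "h": 60, "H": 60, "D": 1440}


def steps_per_hour(freq, dtype=int):
    """Number of time steps per hour for a frequency string such as '15min' or '2H'."""
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", freq)
    if match is None or match.group(2) not in _MINUTES:
        raise ValueError(f"unsupported frequency: {freq!r}")
    count = int(match.group(1) or 1)
    minutes = count * _MINUTES[match.group(2)]
    return dtype(60 / minutes)


print(steps_per_hour("15min"))            # 4
print(steps_per_hour("H"))                # 1
print(steps_per_hour("2H"))               # 0, truncated by int
print(steps_per_hour("2H", dtype=float))  # 0.5
```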
- ecodynelec.preprocessing.autocomplete.infer_one(obj)[source]
Infers the frequency of one single time series.
- ecodynelec.preprocessing.autocomplete.infer_resolution(data: dict)[source]
Infers the resolution of all fields for all countries
- ecodynelec.preprocessing.autocomplete.report_missing(gaps: dict, datasizes: dict)[source]
Counts and displays the missing values.
- ecodynelec.preprocessing.autocomplete.select_long_gaps(gaps, name, lower, upper, length)[source]
Identify long gaps for one subcategory of one country with one unique threshold. Can make exception with some cases, e.g. solar at the extremes of dataset.
- ecodynelec.preprocessing.autocomplete.set_deltas(data: dict, resolution, days_around: int)[source]
Compute the deltas of each subcategory for each country for the creation of typical days.
- ecodynelec.preprocessing.autocomplete.set_lengths(data: dict)[source]
Compute the length of each subcategory for each country
- ecodynelec.preprocessing.autocomplete.set_thresholds(data: dict, resolution, n_hours: int)[source]
Compute the thresholds of each subcategory for each country for the flagging of long gaps.
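One plausible reading of the thresholds is "n_hours expressed in time steps" for each series. The sketch below assumes that reading: the helper name `thresholds_from_steps` is hypothetical, its input is assumed to be a nested dict {country: {unit: steps per hour}}, and the country and unit names are invented for the example.

```python
def thresholds_from_steps(steps, n_hours=2):
    """Convert n_hours into a number of time steps for every country and unit."""
    return {
        country: {unit: n_hours * sph for unit, sph in units.items()}
        for country, units in steps.items()
    }


# Hypothetical resolutions: 15-minute data for one unit, hourly for another.
steps = {"CH": {"Solar": 4, "Wind": 1}}
print(thresholds_from_steps(steps, n_hours=2))  # {'CH': {'Solar': 8, 'Wind': 2}}
```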