sfoe_extracting

Module: OFEN PDF parsing & updating

Purpose: extract tables from OFEN PDFs (pages 47/48/50 depending on the year), reshape some columns, and aggregate them into a final DataFrame for use in EcoDynElec.

ecodynelec.preprocessing.sfoe_extracting.ofen_pdf_to_df(file, page)

Load a specific page from an OFEN PDF and return a pre-processed DataFrame.

Parameters:
  • file (str) – Path to the OFEN PDF (e.g. “…/2022.pdf”, “…/2024.pdf”).

  • page (int) – Page number to extract (47, 48, 49, 50).

Returns:

  • Page 47: Nuclear/Thermical/Wind/PV columns extracted

  • Page 48: Total/Conso_pompes_STEP columns extracted

  • Page 50: Conso_pompes_STEP/Prod_nette columns extracted

  • Others: raw table

Return type:

pd.DataFrame

ecodynelec.preprocessing.sfoe_extracting.split_col(df: DataFrame, page: int, names: list) -> DataFrame

Replace one source column in the DataFrame by multiple new columns, depending on the page and extraction logic.

Parameters:
  • df (pd.DataFrame) – Table extracted with tabula.read_pdf(…) for a given page.

  • page (int) – Logical page number.

  • names (list[str]) – Names of the new columns to insert.

Returns:

The same DataFrame, with the source column replaced by the columns listed in names.

Return type:

pd.DataFrame
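The column replacement described above can be illustrated with a minimal sketch. This is not the library's implementation: the helper name split_col_sketch, the source argument (used here instead of page), the "merged" column name and the assumption that each cell holds whitespace-separated values are all hypothetical.

```python
import pandas as pd


def split_col_sketch(df: pd.DataFrame, source: str, names: list) -> pd.DataFrame:
    """Replace column `source` by the columns in `names` (hypothetical helper).

    Assumes every cell of `source` holds len(names) whitespace-separated values.
    """
    # Split each cell on whitespace, one new column per piece.
    parts = df[source].astype(str).str.split(expand=True)
    parts.columns = names  # raises ValueError if the piece count does not match
    # Convert the text pieces to numbers; unparsable cells become NaN.
    parts = parts.apply(pd.to_numeric, errors="coerce")
    # Put the new columns where the source column used to be.
    pos = df.columns.get_loc(source)
    return pd.concat([df.iloc[:, :pos], parts, df.iloc[:, pos + 1:]], axis=1)


# Toy table in which two values were extracted into a single column.
raw = pd.DataFrame(
    {"annee": [2023, 2024], "merged": ["10 20", "30 40"], "Total": [30, 70]}
)
out = split_col_sketch(raw, "merged", ["Nuclear", "Thermical"])
print(out)
```

The real split_col chooses the source column from the page number; the sketch takes the column name directly to stay independent of any particular PDF layout.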

ecodynelec.preprocessing.sfoe_extracting.updating_ofen_data(file)

Full pipeline that turns an OFEN PDF into the SFOE_data DataFrame:

  • Load expected pages (depending on year),

  • Apply column splits and cleanup,

  • Concatenate tables,

  • Rename and reorder final columns,

  • Add the ‘mois’ (month) column and sort by year.

Warning: only works for the 2024 update.

Parameters:

file (str) – Path to the OFEN PDF (e.g. ‘…/2024.pdf’, ‘…/2022.pdf’).

Returns:

Final DataFrame with columns [“annee”, “mois”, “Hydro”, “Nuclear”, “Thermical”, “Conso_pompes_STEP”, “Prod_nette”, “Imports”, “Exports”, “Conso_CH”, “Pertes”, “Conso_Finale_CH”], sorted by year and month.

Return type:

pd.DataFrame
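The concatenate, reorder and sort steps of the pipeline can be sketched on toy data as follows. This is only an illustration under stated assumptions, not the function's actual code: assemble_sketch is a hypothetical name, the per-page tables are assumed to share the same row order, columns repeated across pages are assumed to be kept once, and the ‘mois’ value is assumed to follow from calendar-ordered monthly rows within each year.

```python
import pandas as pd

# Final column order, copied from the documented return value.
FINAL_COLUMNS = [
    "annee", "mois", "Hydro", "Nuclear", "Thermical", "Conso_pompes_STEP",
    "Prod_nette", "Imports", "Exports", "Conso_CH", "Pertes", "Conso_Finale_CH",
]


def assemble_sketch(tables: list) -> pd.DataFrame:
    """Combine per-page tables into one frame (hypothetical helper)."""
    # Concatenate the tables side by side; rows are assumed to line up.
    df = pd.concat(tables, axis=1)
    # Assumption: a column present on several pages is kept only once.
    df = df.loc[:, ~df.columns.duplicated()].copy()
    # Assumption: rows are monthly and in calendar order within each year.
    df["mois"] = df.groupby("annee").cumcount() + 1
    # Keep the documented columns that are present, in the documented order.
    df = df[[c for c in FINAL_COLUMNS if c in df.columns]]
    return df.sort_values(["annee", "mois"]).reset_index(drop=True)


# Toy stand-ins for two extracted pages (values are made up).
page_a = pd.DataFrame(
    {"annee": [2024, 2024, 2023, 2023],
     "Nuclear": [4, 5, 1, 2],
     "Conso_pompes_STEP": [7, 8, 9, 6]}
)
page_b = pd.DataFrame(
    {"Conso_pompes_STEP": [7, 8, 9, 6], "Prod_nette": [40, 50, 10, 20]}
)
final = assemble_sketch([page_a, page_b])
print(final)
```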

SFOE data downloading:

SFOE data

SFOE data can be found on the SFOE website under the “Electricity statistics” section. Only the PDFs from 2022 onward are needed.

The downloaded .pdf files should then be placed in a ‘support_files/ofen_data’ folder and renamed after their year as “20XX.pdf”. For example:

EcoDynElec/
├── ...
├── support_files/
│   ├── ofen_data/
│       ├── 2022.pdf
│       ├── 2023.pdf
│       └── ...
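To check that the folder follows this naming convention, a small standard-library sketch such as the one below can list the year-named PDFs. The helper list_ofen_pdfs is hypothetical and not part of EcoDynElec; the demo builds a throwaway folder with empty files so that it runs without any real download.

```python
import re
import tempfile
from pathlib import Path


def list_ofen_pdfs(folder, first_year=2022):
    """Return {year: path} for files named 'YYYY.pdf', from `first_year` on."""
    found = {}
    for path in Path(folder).glob("*.pdf"):
        # Keep only names made of exactly four digits, e.g. '2023.pdf'.
        if re.fullmatch(r"\d{4}", path.stem) and int(path.stem) >= first_year:
            found[int(path.stem)] = path
    return dict(sorted(found.items()))


# Demo on a temporary folder that mimics support_files/ofen_data.
with tempfile.TemporaryDirectory() as tmp:
    ofen_dir = Path(tmp) / "support_files" / "ofen_data"
    ofen_dir.mkdir(parents=True)
    for name in ["2021.pdf", "2022.pdf", "2023.pdf", "notes.pdf"]:
        (ofen_dir / name).touch()
    years = list(list_ofen_pdfs(ofen_dir))

print(years)
```

Files older than 2022 or not named after a year are skipped, which matches the requirement that only PDFs from 2022 onward are needed.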