sfoe_extracting
Module: OFEN PDF parsing & updating
Purpose: extract tables from OFEN PDFs (pages 47/48/50 depending on the year), reshape some columns, and aggregate them into a final DataFrame for use in EcoDynElec.
- ecodynelec.preprocessing.sfoe_extracting.ofen_pdf_to_df(file, page)
Load a specific page from an OFEN PDF and return a pre-processed DataFrame.
- Parameters:
file (str) – Path to the OFEN PDF (e.g. “…/2022.pdf”, “…/2024.pdf”).
page (int) – Page number to extract (47, 48, 49, 50).
- Returns:
Page 47: Nuclear/Thermical/Wind/PV columns extracted
Page 48: Total/Conso_pompes_STEP columns extracted
Page 50: Conso_pompes_STEP/Prod_nette columns extracted
Others: raw table
- Return type:
pd.DataFrame
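As a usage sketch, the page-to-columns mapping listed above can be written down and used to load several pages of one PDF at once. The names DOCUMENTED_COLUMNS, documented_columns and load_pages are hypothetical helpers introduced for illustration; only ofen_pdf_to_df(file, page) comes from the module, and it is imported lazily so that the snippet loads even without EcoDynElec installed.

```python
# Columns documented above for each pre-processed page.
# Any other page is documented as returning the raw table.
DOCUMENTED_COLUMNS = {
    47: ["Nuclear", "Thermical", "Wind", "PV"],
    48: ["Total", "Conso_pompes_STEP"],
    50: ["Conso_pompes_STEP", "Prod_nette"],
}


def documented_columns(page):
    """Return the columns documented for `page`, or [] for a raw table."""
    return list(DOCUMENTED_COLUMNS.get(page, []))


def load_pages(file, pages=(47, 48, 50)):
    """Load several pages of one OFEN PDF into a {page: DataFrame} dict.

    Hypothetical convenience wrapper, not part of EcoDynElec.
    """
    # Lazy import: EcoDynElec is only required when the function is called.
    from ecodynelec.preprocessing.sfoe_extracting import ofen_pdf_to_df

    return {page: ofen_pdf_to_df(file, page) for page in pages}
```

Calling load_pages with the path of a downloaded PDF would then give one DataFrame per requested page.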
- ecodynelec.preprocessing.sfoe_extracting.split_col(df: DataFrame, page: int, names: list) → DataFrame
Replace one source column of the DataFrame with several new columns, depending on the page and its extraction logic.
- Parameters:
df (pd.DataFrame) – Table extracted with tabula.read_pdf(…) for a given page.
page (int) – Logical page number.
names (list[str]) – Names of the new columns to insert.
- Returns:
Same DataFrame, with the source column replaced by the columns listed in names.
- Return type:
pd.DataFrame
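The page-specific logic of split_col is not detailed here. The general idea of replacing one merged column by several named columns can nevertheless be sketched with pandas. The function split_column_sketch below is a hypothetical illustration, not the module's implementation, and it assumes that the merged cells hold whitespace-separated numbers.

```python
import pandas as pd


def split_column_sketch(df, source, names):
    """Replace column `source` with the columns in `names`, keeping its position.

    Assumption: every cell of `source` holds len(names) values separated
    by whitespace, as can happen when a PDF table is extracted.
    """
    # Split each cell on whitespace, one new column per piece.
    parts = df[source].astype(str).str.split(expand=True)
    if parts.shape[1] != len(names):
        raise ValueError(
            f"expected {len(names)} values per cell, found {parts.shape[1]}"
        )
    parts.columns = names
    # Convert the text pieces to numbers; unparsable pieces become NaN.
    parts = parts.apply(pd.to_numeric, errors="coerce")

    # Rebuild the table: columns before `source`, the new ones, columns after.
    pos = df.columns.get_loc(source)
    return pd.concat([df.iloc[:, :pos], parts, df.iloc[:, pos + 1:]], axis=1)


# Toy table with one merged column (made-up values).
toy = pd.DataFrame(
    {"annee": [2022, 2023], "merged": ["10 20", "30 40"], "Imports": [1, 2]}
)
print(split_column_sketch(toy, "merged", ["Nuclear", "Thermical"]))
```

The new columns take the place of the source column, so the remaining columns keep their relative order.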
- ecodynelec.preprocessing.sfoe_extracting.updating_ofen_data(file)
Full pipeline turning an OFEN PDF into the SFOE_data DataFrame:
Load expected pages (depending on year),
Apply column splits and cleanup,
Concatenate tables,
Rename and reorder final columns,
Add ‘mois’ column and sort by year.
/!\ Only works for the 2024 update.
- Parameters:
file (str) – Path to the OFEN PDF (e.g. ‘…/2024.pdf’, ‘…/2022.pdf’).
- Returns:
Final DataFrame with columns: ["annee", "mois", "Hydro", "Nuclear", "Thermical", "Conso_pompes_STEP", "Prod_nette", "Imports", "Exports", "Conso_CH", "Pertes", "Conso_Finale_CH"], sorted by year and month.
- Return type:
pd.DataFrame
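A minimal usage sketch for this function, with a check that the result carries the documented columns. EXPECTED_COLUMNS, missing_columns and load_sfoe_data are hypothetical names introduced for illustration; only updating_ofen_data(file) comes from the module, and it is imported lazily so that the snippet loads without EcoDynElec installed.

```python
# Final columns as documented above.
EXPECTED_COLUMNS = [
    "annee", "mois", "Hydro", "Nuclear", "Thermical", "Conso_pompes_STEP",
    "Prod_nette", "Imports", "Exports", "Conso_CH", "Pertes", "Conso_Finale_CH",
]


def missing_columns(columns):
    """Return the documented columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_sfoe_data(file):
    """Run the pipeline on one OFEN PDF and verify the documented columns.

    Hypothetical wrapper, not part of EcoDynElec.
    """
    # Lazy import: EcoDynElec is only required when the function is called.
    from ecodynelec.preprocessing.sfoe_extracting import updating_ofen_data

    df = updating_ofen_data(file)
    missing = missing_columns(df.columns)
    if missing:
        raise ValueError(f"columns missing from the result: {missing}")
    return df
```

Failing early on missing columns makes a change in the PDF layout visible at load time rather than later in the computation.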
SFOE data downloading:
SFOE data
SFOE data can be found on the SFOE website under the “Electricity statistics” section. Only the PDFs from 2022 onwards are needed.
The downloaded .pdf files should then be placed in a ‘support_files/ofen_data’ folder and renamed after their year, as “20XX.pdf”. For example:
EcoDynElec/
├── ...
├── support_files/
│   ├── ofen_data/
│   │   ├── 2022.pdf
│   │   ├── 2023.pdf
│   │   └── ...
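The layout above can be checked before running the extraction. The sketch below lists the files of support_files/ofen_data that follow the “20XX.pdf” naming, starting from 2022. The function find_ofen_pdfs and its first_year parameter are hypothetical and not part of EcoDynElec.

```python
import re
from pathlib import Path

# File names of the form 20XX.pdf, e.g. 2022.pdf.
YEAR_PDF = re.compile(r"^(20\d{2})\.pdf$")


def find_ofen_pdfs(root, first_year=2022):
    """Return {year: path} for the correctly named PDFs, sorted by year.

    `root` is the EcoDynElec folder shown in the tree above.
    """
    folder = Path(root) / "support_files" / "ofen_data"
    found = {}
    for path in folder.glob("*.pdf"):
        match = YEAR_PDF.match(path.name)
        # Skip files that are not named after a year, or that are too old.
        if match and int(match.group(1)) >= first_year:
            found[int(match.group(1))] = path
    return dict(sorted(found.items()))
```

For instance, find_ofen_pdfs("EcoDynElec") would return the paths of 2022.pdf and 2023.pdf for the tree shown above, and an empty dict if the folder does not exist.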