Downloading ENTSO-E data
========================

Data from ENTSO-E is at the heart of ``ecodynelec``. ``ecodynelec`` can download the required data directly from the ENTSO-E servers, and it can also work with data that was downloaded manually. This tutorial details the different ways to download the ENTSO-E data, i.e. with and without ``ecodynelec``.

In any case, **an account must be created** on the `ENTSO-E website `__ to use the SFTP service and access the data.

Manual downloading
------------------

Manual retrieval of the data requires an SFTP client. We use `FileZilla `__ to illustrate the procedure. Provide the software with:

- Host: ``sftp://sftp-transparency.entsoe.eu``
- Port: 22
- Personal credentials

Navigate through the remote directory tree and download the files of interest. The generation files are located in ``/TP_export/AggregatedGenerationPerType_16.1.B_C/``. The exchange files are located in ``/TP_export/PhysicalFlows_12.1.G/``. *Figure 1* gives details on how to proceed with FileZilla.

| |FileZilla handling|
| *Figure 1: Download files using FileZilla*

.. |FileZilla handling| image:: https://github.com/LESBAT-HEIG-VD/EcoDynElec/blob/main/docs/examples/images/Filezilla.png?raw=true

Downloading via ``ecodynelec``
------------------------------

The data can be downloaded via ``ecodynelec``. First, the configuration must be adapted, either using a `spreadsheet `__ or using `python `__. Then the download can be triggered, either in a `standalone fashion `__ or as part of the whole computation pipeline. The global tutorials on how to use ``ecodynelec`` `fully in Python `__ and `with spreadsheet configuration `__ give more details on the latter.

Configuration via spreadsheet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The spreadsheet used in this tutorial can be downloaded from the `examples on GitHub `__.

| |Server tab Spreadsheet|
| *Figure 2: Spreadsheet for configuration: Server tab suited for download*

..
.. |Server tab Spreadsheet| image:: https://github.com/LESBAT-HEIG-VD/EcoDynElec/blob/main/docs/examples/images/ParameterExcel_ServerDownload.png?raw=true

Each field must be written exactly as shown, in lowercase. The fields are:

- **host**: the address of the SFTP server. By default, we use “*sftp-transparency.entsoe.eu*”.
- **port**: the port used to connect to the server. By default, the port is *22*.
- **username**: your username, as created for free on the `ENTSO-E website `__. It should be an email address. If the field is left blank in the spreadsheet, the username will be asked for when the download is launched.
- **password**: your password, as created for free on the `ENTSO-E website `__. For security reasons, we recommend leaving the field blank, which lets the ``downloading`` package ask for the password in a more secure manner.
- **use server**: **TRUE** if you want to download the data. Blank or **FALSE** will not download the data (default).
- **remove unused**: **TRUE** if you want the target directories (where the files are downloaded) to be emptied before downloading. Blank or **FALSE** leaves other files in the target directory untouched (default).

The files will be downloaded and saved in the directories given in the fields **path generation** and **path exchanges** of the *Filepath* tab of the spreadsheet (see *Figure 3*). Also make sure to set the dates accordingly (tab *Parameter*), so that the right files are selected for download. More information on the various configuration possibilities is available in the `input data section `__.

| |Filepath tab Spreadsheet|
| *Figure 3: Spreadsheet for configuration: Paths tab*

.. |Filepath tab Spreadsheet| image:: https://github.com/LESBAT-HEIG-VD/EcoDynElec/blob/main/docs/examples/images/ParameterExcel_PathsDownload.png?raw=true

Configuration in Python
~~~~~~~~~~~~~~~~~~~~~~~

..
.. code:: ipython3

    from ecodynelec.parameter import Parameter # Import the configuration management class

    # Initialize the parameter class
    my_config = Parameter()

The server connection requires a non-default configuration. Here is how to change it. **Note** that credentials can be specified directly in the configuration object, but this is not necessary. In this example, we leave them unset (explicitly ``None`` here, which is strictly equivalent), and they will be asked for when the download starts.

.. code:: ipython3

    ### Configure the server connection
    my_config.server.useServer = True # Specifically ask to download data
    my_config.server.host = "sftp-transparency.entsoe.eu" # This server is already set per default after initialization
    my_config.server.port = 22 # This port is already set per default after initialization

    ### Credentials
    my_config.server.username = None
    my_config.server.password = None

Each field is accessible and modifiable with the syntax ``my_config.server.field``. The fields are:

- ``host``: the address of the SFTP server. By default, we use “*sftp-transparency.entsoe.eu*”.
- ``port``: the port used to connect to the server. By default, the port is *22*.
- ``username``: your username, as created for free on the `ENTSO-E website `__. It should be an email address. If the field is left unspecified, the username will be asked for when the download is launched.
- ``password``: your password, as created for free on the `ENTSO-E website `__. For security reasons, we recommend not specifying it, which lets the ``downloading`` package ask for the password in a more secure manner.
- ``useServer``: ``True`` if you want to download the data. ``False`` will not download the data (default).
- ``removeUnused``: ``True`` if you want the target directories (where the files are downloaded) to be emptied before downloading. ``False`` leaves other files in the target directory untouched (default).
- ``_remoteGenerationDir``: where to find the generation data on the ENTSO-E server. This field should be left at its default, i.e. not specified. It is included for flexibility, in case the server changes its directory layout.
- ``_remoteExchangesDir``: where to find the exchange data on the ENTSO-E server. This field should be left at its default, i.e. not specified. It is included for flexibility, in case the server changes its directory layout.

The data will be downloaded to the locations given in the ``my_config.path`` section, so these fields must be specified. Note that these ``path`` settings are the same ones ``ecodynelec`` uses to find the local data files for the main computation.

.. code:: ipython3

    # Indicate where to save generation data
    my_config.path.generation = "./test_data/downloads/generations/"
    # Indicate where to save exchange data
    my_config.path.exchanges = "./test_data/downloads/exchanges/"

Finally, the ``start`` and ``end`` dates must be specified in the main section of the configuration object so that only the useful files are downloaded.

.. code:: ipython3

    ### Set the dates (to select files to download)
    my_config.start = '2017-02-01 05:00'
    my_config.end = '2017-02-01 13:00'

Standalone download
~~~~~~~~~~~~~~~~~~~

Once the configuration is set properly, the download can be triggered. This section demonstrates the standalone download and showcases the additional parameters that are not accessible otherwise. For more generic usage, the download feature is also integrated into the `whole computation pipeline `__ of ``ecodynelec``.

.. code:: ipython3

    from ecodynelec.preprocessing.downloading import download

Here all parameters are specified; however, only ``config`` is mandatory, and every other parameter uses its default value if not specified.
**Note** that the configuration used here relies on ``Spreadsheet_download.xlsx``, but as in the whole ``ecodynelec`` pipeline, the ``config=`` parameter can also be a ``Parameter`` object, such as the ``my_config`` built in the `above section `__.

.. code:: ipython3

    download(config="./Spreadsheet_download.xlsx", threshold_minutes=15, threshold_size=0.9, is_verbose=True)

.. parsed-literal::

    Connection...

.. parsed-literal::

    Username: ledee.public@gmail.com
    Password: ········

.. parsed-literal::

    [Generation 1/1] Transferred: 45.9 MB Out of: 118.1

.. parsed-literal::

    KeyboardInterrupt

Downloading can be time-consuming. The extra parameters used in the previous cell help decide whether or not to download a specific file from the server. They come in handy only when ``ecodynelec`` is used regularly; occasional or one-time usage is not affected by them.

- ``threshold_minutes``: if the last modification of a file on the server occurred *less than* ``threshold_minutes`` *after* the last download of that file (and the downloaded file still exists on the user’s computer), the remote file is not downloaded. **Default is 15 min**. The server “modifies” files regularly, either without changing the data (simple server maintenance) or with some data modifications (as new information comes in). This parameter allows skipping a file when the version on the server is not considered new enough.
- ``threshold_size``: if the local copy is smaller than ``threshold_size`` times the size of the file on the server (i.e. the downloaded file is *significantly* smaller than the remote one, typically after a partial download), the file is downloaded again. **Default is 90%**, i.e. a local copy smaller than 90% of the size of the remote file forces the download.

For whatever reason, a download may fail halfway (connection issue, manually stopping a too-long process, etc.).
In such a case, ``threshold_minutes`` may prevent the download from being retried. For this reason, ``threshold_size`` was added to force a download even when the file would otherwise be skipped by the ``threshold_minutes`` rule.
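
The interplay of the two thresholds can be sketched as a small decision function. This is only an illustration of the rule as described above, under the assumption that the two checks are combined in this order; it is not the code used by ``ecodynelec``. The function ``should_download`` and all of its arguments other than ``threshold_minutes`` and ``threshold_size`` are hypothetical names introduced for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_download(
    remote_mtime: datetime,
    remote_size: int,
    local_mtime: Optional[datetime] = None,
    local_size: Optional[int] = None,
    threshold_minutes: float = 15,
    threshold_size: float = 0.9,
) -> bool:
    """Hypothetical sketch of the skip/download rule described above."""
    # No local copy yet: nothing to compare against, so download.
    if local_mtime is None or local_size is None:
        return True
    # Local copy significantly smaller than the remote file: probably an
    # interrupted transfer, so download again regardless of the dates.
    if local_size < threshold_size * remote_size:
        return True
    # Otherwise download only if the remote file was modified at least
    # `threshold_minutes` after the local copy was last written.
    return remote_mtime - local_mtime >= timedelta(minutes=threshold_minutes)


last_download = datetime(2017, 2, 1, 12, 0)

# Remote file touched 5 minutes after a complete download: prints False (skipped).
print(should_download(last_download + timedelta(minutes=5), 1000, last_download, 1000))
# Same dates, but the local copy is only 40% of the remote size: prints True.
print(should_download(last_download + timedelta(minutes=5), 1000, last_download, 400))
```

In this sketch the size check runs before the date check, which is what lets an interrupted download be retried even though the partial local file looks recent.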