πŸš€ Get Started

This tutorial guides you through running experiments.

1. ⏬ Clone the Repository​

cd /path/to/your/project
git clone https://github.com/Ladbaby/PyOmniTS.git
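
The rest of this tutorial assumes commands are run from the cloned project root:

cd PyOmniTS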

2. πŸ’Ώ Prepare the Environment​

  1. Create a new Python virtual environment via the tool of your choice, and activate it. For example, using Miniconda/Anaconda:

    conda create -n pyomnits python=3.12
    conda activate pyomnits

    Python versions 3.10 through 3.12 have been tested.

  2. Install dependencies.

    Choose one of the options:

    • Option 1: Fuzzy (unpinned) package versions, the legacy way.

      pip install -r requirements.txt

      πŸ’‘ For faster installation, consider installing uv and running uv pip install -r requirements.txt instead.

    • Option 2: Exact package versions, the aggressive way.

      ⚠️ This option assumes your Linux server has CUDA version 12, which makes it less flexible than Option 1.

      Install uv (a minimal install sketch follows the note below), then:

      uv pip sync requirements.lock

    πŸ”₯Note: some packages are optional and only used by a few models/datasets. See the comments in requirements.txt.
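
    Both options above rely on uv being available. If it is not installed yet, a minimal way to get it (a sketch; uv also provides a standalone installer, see its documentation):

    pip install uv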

3. πŸ’Ύ Prepare Datasets​

3.1 Regular​

Download them from the [Google Drive] link provided by Time-Series-Library. The following datasets from it are used in this repository:

  • ECL (electricity)
  • ETTh1 (ETT-small)
  • ETTm1 (ETT-small)
  • ILI (illness)
  • Traffic (traffic)
  • Weather (weather)

Place them under the storage/datasets folder of this project (create the folder if it does not exist, or use a symbolic link via ln -s to point to existing dataset files).
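
A minimal sketch of that step, assuming an archive was extracted to ~/Downloads (both source paths are placeholders; adjust them to your setup):

mkdir -p storage/datasets
cp -r ~/Downloads/ETT-small storage/datasets/
# or symlink an existing copy instead of duplicating the files:
ln -s /data/shared/electricity storage/datasets/electricity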

You will get the following file structure under storage/datasets:

.
β”œβ”€β”€ electricity/
β”‚   └── electricity.csv
β”œβ”€β”€ ETT-small/
β”‚   β”œβ”€β”€ ETTh1.csv
β”‚   β”œβ”€β”€ ETTh2.csv
β”‚   β”œβ”€β”€ ETTm1.csv
β”‚   └── ETTm2.csv
β”œβ”€β”€ illness/
β”‚   └── national_illness.csv
β”œβ”€β”€ traffic/
β”‚   └── traffic.csv
└── weather/
    └── weather.csv

3.2 Irregular​

3.2.1 Human Activity​

No need to prepare it in advance. Our code will automatically download and then preprocess it if you want to train on it.

The following file structure will be found under storage/datasets after preprocessing finishes:

.
└── HumanActivity/
    β”œβ”€β”€ processed/
    β”‚   └── data.pt
    └── raw/
        └── ConfLongDemo_JSI.txt

3.2.2 MIMIC III​

Since MIMIC III requires credentialed access:

  • Request the raw data from here. Files can be placed wherever you like, and you don't have to extract the .csv.gz files to .csv.

  • Data preprocessing

    Choose one of the options:

    • Option 1: Use the revised scripts in PyOmniTS.
      • Create a new virtual environment (used only for data preprocessing, not for subsequent training) with Python 3.7, numpy 1.21.6, and pandas 1.3.5:

        conda create -n python37 python=3.7
        conda activate python37
        pip install numpy==1.21.6 pandas==1.3.5
      • Run: python data/dependencies/MIMIC_III/preprocess/0_run_all.py

    • Option 2: Use the original scripts in gru_ode_bayes.
      • Follow the processing scripts in gru_ode_bayes to get complete_tensor.csv.
      • Put the result under ~/.tsdm/rawdata/MIMIC_III_DeBrouwer2019/complete_tensor.csv (a sketch follows this list).
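
    A minimal sketch for Option 2's last step, assuming complete_tensor.csv is in the current directory:

    mkdir -p ~/.tsdm/rawdata/MIMIC_III_DeBrouwer2019
    mv complete_tensor.csv ~/.tsdm/rawdata/MIMIC_III_DeBrouwer2019/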

The following file structure will be found under ~/.tsdm after preprocessing finishes (Note: the .parquet files are generated automatically after training any model on this dataset):

.
β”œβ”€β”€ datasets/
β”‚   └── MIMIC_III_DeBrouwer2019/
β”‚       β”œβ”€β”€ metadata.parquet
β”‚       └── timeseries.parquet
└── rawdata/
    └── MIMIC_III_DeBrouwer2019/
        └── complete_tensor.csv

3.2.3 MIMIC IV

Since MIMIC IV requires credentialed access:

  • Request the raw data from here. Files can be placed wherever you like, and you don't have to extract the .csv.gz files to .csv.

  • Data preprocessing

    Choose one of the options:

    • Option 1: Use the revised scripts in PyOmniTS.
      • Create a new virtual environment (used only for data preprocessing, not for subsequent training) with Python 3.8, numpy 1.24.4, and pandas 2.0.3:

        conda create -n python38 python=3.8
        conda activate python38
        pip install numpy==1.24.4 pandas==2.0.3
      • Run: python data/dependencies/MIMIC_IV/preprocess/0_run_all.py

    • Option 2: Use the original scripts in NeuralFlows.
      • Follow the processing scripts in NeuralFlows to get full_dataset.csv.
      • Put the result under ~/.tsdm/rawdata/MIMIC_IV_Bilos2021/full_dataset.csv.

The following file structure will be found under ~/.tsdm after preprocessing finishes (Note: the .parquet files are generated automatically after training any model on this dataset):

.
β”œβ”€β”€ datasets/
β”‚   └── MIMIC_IV_Bilos2021/
β”‚       └── timeseries.parquet
└── rawdata/
    └── MIMIC_IV_Bilos2021/
        └── full_dataset.csv

3.2.4 PhysioNet'12

No need to prepare it in advance. Our code will automatically download and then preprocess it if you want to train on it.

The following file structure will be found under ~/.tsdm after preprocessing finishes:

.
β”œβ”€β”€ datasets/
β”‚   └── Physionet2012/
β”‚       β”œβ”€β”€ Physionet2012-set-A-sparse.tar
β”‚       β”œβ”€β”€ Physionet2012-set-B-sparse.tar
β”‚       └── Physionet2012-set-C-sparse.tar
└── rawdata/
    └── Physionet2012/
        β”œβ”€β”€ set-a.tar.gz
        β”œβ”€β”€ set-b.tar.gz
        └── set-c.tar.gz

3.2.5 USHCN

No need to prepare it in advance. Our code will automatically download and then preprocess it if you want to train on it.

The following file structure will be found under ~/.tsdm after preprocessing finishes:

.
β”œβ”€β”€ datasets/
β”‚   └── USHCN_DeBrouwer2019/
β”‚       └── USHCN_DeBrouwer2019.parquet
└── rawdata/
    └── USHCN_DeBrouwer2019/
        └── small_chunked_sporadic.csv

4. πŸ“‚ (Optional) Folder Structure​

You can optionally learn how PyOmniTS organizes its folder structure:

.
β”œβ”€β”€ configs/ # (Auto-generated) YAML configs for experiments. Only saved as references, not input parameters.
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ data_provider/
β”‚   β”‚   β”œβ”€β”€ datasets/ # Main classes of datasets. File names match the string provided in --dataset_name.
β”‚   β”‚   └── data_factory.py # Provides an interface to get torch.utils.data.Dataset and torch.utils.data.DataLoader.
β”‚   └── dependencies/ # Dependencies for dataset classes under data/data_provider/datasets/.
β”œβ”€β”€ docs/ # Documentation.
β”œβ”€β”€ exp/
β”‚   β”œβ”€β”€ exp_basic.py # Parent class for experiments.
β”‚   └── exp_main.py # Main class for experiments; inherits from the class in exp_basic.py.
β”œβ”€β”€ layers/ # Dependencies for model classes under models/.
β”œβ”€β”€ logs/ # (Auto-generated) Auto-rotated logs from running experiments.
β”œβ”€β”€ loss_fns/ # Main classes of loss functions. File names match the string provided in --loss.
β”œβ”€β”€ lr_schedulers/ # Main classes of some learning rate schedulers.
β”œβ”€β”€ models/ # Main classes of models. File names match the string provided in --model_name.
β”œβ”€β”€ scripts/ # Launch scripts for experiments.
β”œβ”€β”€ storage/ # (Auto-generated) General-purpose storage folder, not tracked by git.
β”‚   β”œβ”€β”€ datasets/ # Time series data for some datasets.
β”‚   └── results/ # Experiment results.
β”œβ”€β”€ tests/ # Unit tests, only used by PyOmniTS maintainers.
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ configs.py # Command line arguments accepted by main.py.
β”‚   β”œβ”€β”€ ExpConfigs.py # Dataclass wrapping utils/configs.py for typo checking. Passed to models, datasets, loss_fns, ... for their initialization.
β”‚   β”œβ”€β”€ globals.py # A few global variables (logger, accelerator, ...).
β”‚   β”œβ”€β”€ metrics.py # Calculates metrics (e.g., MSE) during testing.
β”‚   └── tools.py # Misc helper functions and classes.
β”œβ”€β”€ wandb/ # (Auto-generated) Weights & Biases logs when --wandb 1 or --sweep 1.
β”œβ”€β”€ .all-contributorsrc # Only used in README.md.
β”œβ”€β”€ .gitignore # Git ignore rules.
β”œβ”€β”€ .python-version # Recommended Python version, display only.
β”œβ”€β”€ LICENSE # MIT License.
β”œβ”€β”€ main.py # Main entrance for experiments.
β”œβ”€β”€ pyproject.toml # Standard configuration file for Python projects.
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.lock # Python package requirements (with pinned versions).
β”œβ”€β”€ requirements.txt # Python package requirements (without pinned versions).
β”œβ”€β”€ run_unittest.sh # Launch script for unit tests in tests/. Only used by PyOmniTS maintainers.
└── run.sh # Launch script for scripts/. Useful when launching multiple experiments at once.

Core logic when running experiments:

scripts/ β†’ main.py β†’ exp/exp_main.py

5. πŸ”₯ Training​

Training scripts are located in the scripts folder. For example, to train mTAN on the Human Activity dataset:

sh scripts/mTAN/HumanActivity.sh

Training results will be organized in storage/results/${DATASET_NAME}/${DATASET_ID}/${MODEL_NAME}/${MODEL_ID}/${SEQ_LEN}_${PRED_LEN}/%Y_%m%d_%H%M/iter0
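
Each script ultimately invokes main.py with command line arguments. A hypothetical minimal invocation using only flags mentioned in this guide (real scripts under scripts/ typically pass many more arguments):

python main.py --is_training 1 --model_name mTAN --dataset_name HumanActivity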

6. ❄️ Testing​

Testing is automatically conducted once training finishes. If you wish to run testing only, change the command line argument --is_training in the training script from 1 to 0 and rerun the script (a sketch follows).
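
A minimal sketch of that flag flip (assumes GNU sed and that the script passes --is_training 1 literally):

sed -i 's/--is_training 1/--is_training 0/' scripts/mTAN/HumanActivity.sh
sh scripts/mTAN/HumanActivity.sh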

The testing result metric.json will be saved in storage/results/${DATASET_NAME}/${DATASET_ID}/${MODEL_NAME}/${MODEL_ID}/${SEQ_LEN}_${PRED_LEN}/%Y_%m%d_%H%M/iter0/eval_%Y_%m%d_%H%M
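
To quickly inspect a saved result, you can pretty-print it with Python's built-in JSON tool (a sketch; any JSON viewer works):

find storage/results -name metric.json | head -n 1 | xargs python -m json.tool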