πŸš€ Get Started

This tutorial guides you through running experiments.

1. ⏬ Clone the Repository​

cd /path/to/your/project
git clone https://github.com/Ladbaby/PyOmniTS.git
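
The rest of this tutorial assumes commands are run from the cloned project root:

cd PyOmniTS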

2. πŸ’Ώ Prepare the Environment​

  1. Create a new Python virtual environment via the tool of your choice, and activate it. For example, using Miniconda/Anaconda:

    conda create -n pyomnits python=3.12
    conda activate pyomnits

    Python versions 3.10 through 3.12 have been tested.

  2. Install dependencies.

    Choose one of the options:

    • Option 1: Fuzzy (unpinned) package versions, the legacy way.

      pip install -r requirements.txt

      πŸ’‘ For faster installation, consider installing uv and running uv pip install -r requirements.txt instead.

    • Option 2: Exact package versions, the aggressive way.

      ⚠️ This option assumes your Linux server has CUDA version 12, which makes it less flexible than Option 1.

      Install uv (a minimal install sketch follows the note below), then:

      uv pip sync requirements.lock

    πŸ”₯Note: some packages are optional and only used by a few models/datasets. See the comments in requirements.txt.
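
    Both options above rely on uv being available. If it is not installed yet, a minimal way to get it (a sketch; uv also provides a standalone installer, see its documentation):

    pip install uv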

3. πŸ’Ύ Prepare Datasets​

3.1 Regular​

Download them from the [Google Drive] link provided by Time-Series-Library. The following datasets from it are used in this repository:

  • ECL (electricity)
  • ETTh1 (ETT-small)
  • ETTm1 (ETT-small)
  • ILI (illness)
  • Traffic (traffic)
  • Weather (weather)

Place them under the storage/datasets folder of this project (create the folder if it does not exist, or use a symbolic link via ln -s to point to existing dataset files).
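
A minimal sketch of that step, assuming an archive was extracted to ~/Downloads (both source paths are placeholders; adjust them to your setup):

mkdir -p storage/datasets
cp -r ~/Downloads/ETT-small storage/datasets/
# or symlink an existing copy instead of duplicating the files:
ln -s /data/shared/electricity storage/datasets/electricity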

You will get the following file structure under storage/datasets:

.
β”œβ”€β”€ electricity/
β”‚   └── electricity.csv
β”œβ”€β”€ ETT-small/
β”‚   β”œβ”€β”€ ETTh1.csv
β”‚   β”œβ”€β”€ ETTh2.csv
β”‚   β”œβ”€β”€ ETTm1.csv
β”‚   └── ETTm2.csv
β”œβ”€β”€ illness/
β”‚   └── national_illness.csv
β”œβ”€β”€ traffic/
β”‚   └── traffic.csv
└── weather/
    └── weather.csv

3.2 Irregular​

3.2.1 Human Activity​

No need to prepare it in advance. Our code will automatically download and then preprocess it if you want to train on it.

The following file structure will be found under storage/datasets after preprocessing finishes:

.
└── HumanActivity/
    β”œβ”€β”€ processed/
    β”‚   └── data.pt
    └── raw/
        └── ConfLongDemo_JSI.txt

3.2.2 MIMIC III​

Since MIMIC III requires credentialed access:

  • Request the raw data from here. Files can be placed wherever you like, and you don't have to extract the .csv.gz files to .csv.

  • Data preprocessing

    Choose one of the options:

    • Option 1: Use the revised scripts in PyOmniTS.
      • Create a new virtual environment (used only for data preprocessing, not for subsequent training) with Python 3.7, numpy 1.21.6, and pandas 1.3.5:

        conda create -n python37 python=3.7
        conda activate python37
        pip install numpy==1.21.6 pandas==1.3.5
      • Run: python data/dependencies/MIMIC_III/preprocess/0_run_all.py

    • Option 2: Use the original scripts in gru_ode_bayes.
      • Follow the processing scripts in gru_ode_bayes to get complete_tensor.csv.
      • Put the result under ~/.tsdm/rawdata/MIMIC_III_DeBrouwer2019/complete_tensor.csv (a sketch follows this list).
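
    A minimal sketch for Option 2's last step, assuming complete_tensor.csv is in the current directory:

    mkdir -p ~/.tsdm/rawdata/MIMIC_III_DeBrouwer2019
    mv complete_tensor.csv ~/.tsdm/rawdata/MIMIC_III_DeBrouwer2019/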

The following file structure will be found under ~/.tsdm after preprocessing finishes (Note: the .parquet files are generated automatically after training any model on this dataset):

.
β”œβ”€β”€ datasets/
β”‚   └── MIMIC_III_DeBrouwer2019/
β”‚       β”œβ”€β”€ metadata.parquet
β”‚       └── timeseries.parquet
└── rawdata/
    └── MIMIC_III_DeBrouwer2019/
        └── complete_tensor.csv

3.2.3 MIMIC IV

Since MIMIC IV requires credentialed access:

  • Request the raw data from here. Files can be placed wherever you like, and you don't have to extract the .csv.gz files to .csv.

  • Data preprocessing

    Choose one of the options:

    • Option 1: Use the revised scripts in PyOmniTS.
      • Create a new virtual environment (used only for data preprocessing, not for subsequent training) with Python 3.8, numpy 1.24.4, and pandas 2.0.3:

        conda create -n python38 python=3.8
        conda activate python38
        pip install numpy==1.24.4 pandas==2.0.3
      • Run: python data/dependencies/MIMIC_IV/preprocess/0_run_all.py

    • Option 2: Use the original scripts in NeuralFlows.
      • Follow the processing scripts in NeuralFlows to get full_dataset.csv.
      • Put the result under ~/.tsdm/rawdata/MIMIC_IV_Bilos2021/full_dataset.csv.

The following file structure will be found under ~/.tsdm after preprocessing finishes (Note: the .parquet files are generated automatically after training any model on this dataset):

.
β”œβ”€β”€ datasets/
β”‚   └── MIMIC_IV_Bilos2021/
β”‚       └── timeseries.parquet
└── rawdata/
    └── MIMIC_IV_Bilos2021/
        └── full_dataset.csv

3.2.4 PhysioNet'12

No need to prepare it in advance. Our code will automatically download and then preprocess it if you want to train on it.

The following file structure will be found under ~/.tsdm after preprocessing finishes:

.
β”œβ”€β”€ datasets/
β”‚   └── Physionet2012/
β”‚       β”œβ”€β”€ Physionet2012-set-A-sparse.tar
β”‚       β”œβ”€β”€ Physionet2012-set-B-sparse.tar
β”‚       └── Physionet2012-set-C-sparse.tar
└── rawdata/
    └── Physionet2012/
        β”œβ”€β”€ set-a.tar.gz
        β”œβ”€β”€ set-b.tar.gz
        └── set-c.tar.gz

3.2.5 USHCN

No need to prepare it in advance. Our code will automatically download and then preprocess it if you want to train on it.

The following file structure will be found under ~/.tsdm after preprocessing finishes:

.
β”œβ”€β”€ datasets/
β”‚   └── USHCN_DeBrouwer2019/
β”‚       └── USHCN_DeBrouwer2019.parquet
└── rawdata/
    └── USHCN_DeBrouwer2019/
        └── small_chunked_sporadic.csv

4. πŸ“‚ (Optional) Folder Structure​

You can optionally learn how PyOmniTS organizes its folder structure:

.
β”œβ”€β”€ configs/ # (Auto-generated) YAML configs for experiments. Only saved as references, not input parameters.
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ data_provider/
β”‚   β”‚   β”œβ”€β”€ datasets/ # Main classes of datasets. File names match the string provided in --dataset_name.
β”‚   β”‚   └── data_factory.py # Provides an interface to get torch.utils.data.Dataset and torch.utils.data.DataLoader.
β”‚   └── dependencies/ # Dependencies for dataset classes under data/data_provider/datasets/.
β”œβ”€β”€ docs/ # Documentation.
β”œβ”€β”€ exp/
β”‚   β”œβ”€β”€ exp_basic.py # Parent class for experiments.
β”‚   └── exp_main.py # Main class for experiments; inherits from the class in exp_basic.py.
β”œβ”€β”€ layers/ # Dependencies for model classes under models/.
β”œβ”€β”€ logs/ # (Auto-generated) Auto-rotated logs from running experiments.
β”œβ”€β”€ loss_fns/ # Main classes of loss functions. File names match the string provided in --loss.
β”œβ”€β”€ lr_schedulers/ # Main classes of some learning rate schedulers.
β”œβ”€β”€ models/ # Main classes of models. File names match the string provided in --model_name.
β”œβ”€β”€ scripts/ # Launch scripts for experiments.
β”œβ”€β”€ storage/ # (Auto-generated) General-purpose storage folder, not tracked by git.
β”‚   β”œβ”€β”€ datasets/ # Time series data for some datasets.
β”‚   └── results/ # Experiment results.
β”œβ”€β”€ tests/ # Unit tests, only used by PyOmniTS maintainers.
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ configs.py # Command line arguments accepted by main.py.
β”‚   β”œβ”€β”€ ExpConfigs.py # Dataclass wrapping utils/configs.py for typo checking. Passed to models, datasets, loss_fns, ... for their initialization.
β”‚   β”œβ”€β”€ globals.py # A few global variables (logger, accelerator, ...).
β”‚   β”œβ”€β”€ metrics.py # Calculates metrics (e.g., MSE) during testing.
β”‚   └── tools.py # Misc helper functions and classes.
β”œβ”€β”€ wandb/ # (Auto-generated) Weights & Biases logs when --wandb 1 or --sweep 1.
β”œβ”€β”€ .all-contributorsrc # Only used in README.md.
β”œβ”€β”€ .gitignore # Git ignore rules.
β”œβ”€β”€ .python-version # Recommended Python version, display only.
β”œβ”€β”€ LICENSE # MIT License.
β”œβ”€β”€ main.py # Main entrance for experiments.
β”œβ”€β”€ pyproject.toml # Standard configuration file for Python projects.
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.lock # Python package requirements (with pinned versions).
β”œβ”€β”€ requirements.txt # Python package requirements (without pinned versions).
β”œβ”€β”€ run_unittest.sh # Launch script for unit tests in tests/. Only used by PyOmniTS maintainers.
└── run.sh # Launch script for scripts/. Useful when launching multiple experiments at once.

Core logic when running experiments:

scripts/ β†’ main.py β†’ exp/exp_main.py

5. πŸ”₯ Training​

Training scripts are located in the scripts folder. For example, to train mTAN on the Human Activity dataset:

sh scripts/mTAN/HumanActivity.sh

Training results will be organized in storage/results/${DATASET_NAME}/${DATASET_ID}/${MODEL_NAME}/${MODEL_ID}/${SEQ_LEN}_${PRED_LEN}/%Y_%m%d_%H%M/iter0
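
Each script ultimately invokes main.py with command line arguments. A hypothetical minimal invocation using only flags mentioned in this guide (real scripts under scripts/ typically pass many more arguments):

python main.py --is_training 1 --model_name mTAN --dataset_name HumanActivity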

6. ❄️ Testing​

Testing is automatically conducted once training finishes. If you wish to run testing only, change the command line argument --is_training in the training script from 1 to 0 and rerun the script (a sketch follows).
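
A minimal sketch of that flag flip (assumes GNU sed and that the script passes --is_training 1 literally):

sed -i 's/--is_training 1/--is_training 0/' scripts/mTAN/HumanActivity.sh
sh scripts/mTAN/HumanActivity.sh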

The testing result metric.json will be saved in storage/results/${DATASET_NAME}/${DATASET_ID}/${MODEL_NAME}/${MODEL_ID}/${SEQ_LEN}_${PRED_LEN}/%Y_%m%d_%H%M/iter0/eval_%Y_%m%d_%H%M
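
To quickly inspect a saved result, you can pretty-print it with Python's built-in JSON tool (a sketch; any JSON viewer works):

find storage/results -name metric.json | head -n 1 | xargs python -m json.tool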