Get Started
This tutorial guides you through running experiments.
1. Clone the Repository
cd /path/to/your/project
git clone https://github.com/Ladbaby/PyOmniTS.git
2. Prepare the Environment
- Create a new Python virtual environment via the tool of your choice, and activate it. For example, using Miniconda/Anaconda:
  conda create -n pyomnits python=3.12
  conda activate pyomnits
  Python 3.10~3.12 have been tested.
- Install dependencies. Choose one of the options:
  - Option 1: Fuzzy package versions, the legacy way.
    pip install -r requirements.txt
    Tip: for faster installation, consider installing uv and running uv pip install -r requirements.txt instead.
  - Option 2: Exact package versions, the aggressive way.
    Warning: this assumes your Linux server has CUDA version 12, which can be less flexible than Option 1.
    Install uv, then:
    uv pip sync requirements.lock
  Note: some packages are only used by a few models/datasets and are optional. See comments in requirements.txt.
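After installing, a quick sanity check can confirm the environment is active and targeted correctly; a minimal sketch (run inside the activated environment):

```shell
# Optional sanity check, run inside the activated environment.
python --version        # should report a tested version (3.10~3.12)
python -m pip --version # confirms pip targets this environment's site-packages
```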
3. Prepare Datasets
3.1 Regular
Get them from the [Google Drive] provided by Time-Series-Library, which includes the following datasets used in this repository:
- ECL (electricity)
- ETTh1 (ETT-small)
- ETTm1 (ETT-small)
- ILI (illness)
- Traffic (traffic)
- Weather (weather)
Place them under the storage/datasets folder of this project (create the folder if it does not exist, or use a symbolic link, ln -s, to point to existing dataset files).
You will get the following file structure under storage/datasets:
.
├── electricity/
│   └── electricity.csv
├── ETT-small/
│   ├── ETTh1.csv
│   ├── ETTh2.csv
│   ├── ETTm1.csv
│   └── ETTm2.csv
├── illness/
│   └── national_illness.csv
├── traffic/
│   └── traffic.csv
└── weather/
    └── weather.csv
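If the downloaded archives are already extracted elsewhere on disk, the placement step above can be scripted with symbolic links instead of copies; a minimal sketch, where DOWNLOAD_DIR is a hypothetical path you should point at your own extracted data:

```shell
# Minimal sketch: link already-extracted dataset folders into storage/datasets
# instead of copying them. DOWNLOAD_DIR is hypothetical; point it at wherever
# you extracted the Google Drive archive.
DOWNLOAD_DIR=/data/downloads
mkdir -p storage/datasets
for d in electricity ETT-small illness traffic weather; do
    ln -sfn "$DOWNLOAD_DIR/$d" "storage/datasets/$d"   # -f/-n: replace stale links safely
done
```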
3.2 Irregular
3.2.1 Human Activity
No need to prepare in advance. Our code will automatically download and then preprocess it if you want to train on it.
The following file structure will be found under storage/datasets after the code finishes preprocessing:
.
└── HumanActivity/
    ├── processed/
    │   └── data.pt
    └── raw/
        └── ConfLongDemo_JSI.txt
3.2.2 MIMIC III
Since MIMIC III requires credentialed access:
- Request the raw data from here. Files can be put wherever you like, and you don't have to extract .csv.gz as .csv.
- Data preprocessing. Choose one of the options:
  - Option 1: Use the revised scripts in PyOmniTS.
    - Create a new virtual environment (only used for data preprocessing, not subsequent training) with Python 3.7, numpy 1.21.6, and pandas 1.3.5:
      conda create -n python37 python=3.7
      conda activate python37
      pip install numpy==1.21.6 pandas==1.3.5
    - python data/dependencies/MIMIC_III/preprocess/0_run_all.py
  - Option 2: Use the original scripts in gru_ode_bayes.
    - Follow the processing scripts in gru_ode_bayes to get complete_tensor.csv.
    - Put the result under ~/.tsdm/rawdata/MIMIC_III_DeBrouwer2019/complete_tensor.csv.
The following file structure will be found under ~/.tsdm after the code finishes preprocessing (Note: .parquet files will be generated automatically after training any model on this dataset):
.
├── datasets/
│   └── MIMIC_III_DeBrouwer2019/
│       ├── metadata.parquet
│       └── timeseries.parquet
└── rawdata/
    └── MIMIC_III_DeBrouwer2019/
        └── complete_tensor.csv
3.2.3 MIMIC IV
Since MIMIC IV requires credentialed access:
- Request the raw data from here. Files can be put wherever you like, and you don't have to extract .csv.gz as .csv.
- Data preprocessing. Choose one of the options:
  - Option 1: Use the revised scripts in PyOmniTS.
    - Create a new virtual environment (only used for data preprocessing, not subsequent training) with Python 3.8, numpy 1.24.4, and pandas 2.0.3:
      conda create -n python38 python=3.8
      conda activate python38
      pip install numpy==1.24.4 pandas==2.0.3
    - python data/dependencies/MIMIC_IV/preprocess/0_run_all.py
  - Option 2: Use the original scripts in NeuralFlows.
    - Follow the processing scripts in NeuralFlows to get full_dataset.csv.
    - Put the result under ~/.tsdm/rawdata/MIMIC_IV_Bilos2021/full_dataset.csv.
The following file structure will be found under ~/.tsdm after the code finishes preprocessing (Note: .parquet files will be generated automatically after training any model on this dataset):
.
├── datasets/
│   └── MIMIC_IV_Bilos2021/
│       └── timeseries.parquet
└── rawdata/
    └── MIMIC_IV_Bilos2021/
        └── full_dataset.csv
3.2.4 PhysioNet'12
No need to prepare in advance. Our code will automatically download and then preprocess it if you want to train on it.
The following file structure will be found under ~/.tsdm after the code finishes preprocessing:
.
├── datasets/
│   └── Physionet2012/
│       ├── Physionet2012-set-A-sparse.tar
│       ├── Physionet2012-set-B-sparse.tar
│       └── Physionet2012-set-C-sparse.tar
└── rawdata/
    └── Physionet2012/
        ├── set-a.tar.gz
        ├── set-b.tar.gz
        └── set-c.tar.gz
3.2.5 USHCN
No need to prepare in advance. Our code will automatically download and then preprocess it if you want to train on it.
The following file structure will be found under ~/.tsdm after the code finishes preprocessing:
.
├── datasets/
│   └── USHCN_DeBrouwer2019/
│       └── USHCN_DeBrouwer2019.parquet
└── rawdata/
    └── USHCN_DeBrouwer2019/
        └── small_chunked_sporadic.csv
4. (Optional) Folder Structure
You can optionally learn how PyOmniTS organizes its folder structure:
.
├── configs/              # (Auto-generated) YAML configs for experiments. Only saved as references, not input parameters.
├── data/
│   ├── data_provider/
│   │   ├── datasets/         # Main classes of datasets. File names match the string provided in --dataset_name.
│   │   └── data_factory.py   # Provides an interface to get torch.utils.data.Dataset and torch.utils.data.DataLoader.
│   └── dependencies/         # Dependencies for dataset classes under data/data_provider/datasets/.
├── docs/                 # Documentation.
├── exp/
│   ├── exp_basic.py          # Parent class for experiments.
│   └── exp_main.py           # Main class for experiments; inherits from the class in exp_basic.py.
├── layers/               # Dependencies for model classes under models/.
├── logs/                 # (Auto-generated) Auto-rotated logs from running experiments.
├── loss_fns/             # Main classes of loss functions. File names match the string provided in --loss.
├── lr_schedulers/        # Main classes of some learning rate schedulers.
├── models/               # Main classes of models. File names match the string provided in --model_name.
├── scripts/              # Launch scripts for experiments.
├── storage/              # (Auto-generated) General-purpose storage folder, not tracked by git.
│   ├── datasets/             # Time series data for some datasets.
│   └── results/              # Experiment results.
├── tests/                # Unit tests, only used by PyOmniTS maintainers.
├── utils/
│   ├── configs.py            # Command line arguments accepted by main.py.
│   ├── ExpConfigs.py         # Dataclass that wraps utils/configs.py for typo checking. Passed to models, datasets, loss_fns, ... for their initialization.
│   ├── globals.py            # A few global variables (logger, accelerator, ...).
│   ├── metrics.py            # Calculates metrics (e.g., MSE) during testing.
│   └── tools.py              # Misc helper functions and classes.
├── wandb/                # (Auto-generated) Weights & Biases logs when --wandb 1 or --sweep 1.
├── .all-contributorsrc   # Only used in README.md.
├── .gitignore            # Git ignore rules.
├── .python-version       # Recommended Python version, display only.
├── LICENSE               # MIT License.
├── main.py               # Main entrance for experiments.
├── pyproject.toml        # Standard configuration file for Python projects.
├── README.md
├── requirements.lock     # Python package requirements (with versions).
├── requirements.txt      # Python package requirements (without versions).
├── run_unittest.sh       # Launch script for unit tests in tests/. Only used by PyOmniTS maintainers.
└── run.sh                # Launch script for scripts/. Useful when launching multiple experiments at once.
Core logic when running experiments:
scripts/ → main.py → exp/exp_main.py
5. Training
Training scripts are located in the scripts folder.
For example, to train mTAN on dataset Human Activity:
sh scripts/mTAN/HumanActivity.sh
Training results will be organized in storage/results/${DATASET_NAME}/${DATASET_ID}/${MODEL_NAME}/${MODEL_ID}/${SEQ_LEN}_${PRED_LEN}/%Y_%m%d_%H%M/iter0
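The placeholders in that path expand per run; a quick sketch of how one concrete result directory looks, where every value below is illustrative (the real IDs and lengths come from your launch script, and the timestamp from the launch time):

```shell
# Example expansion of the results-path template. All values are illustrative.
DATASET_NAME=HumanActivity; DATASET_ID=HumanActivity
MODEL_NAME=mTAN;            MODEL_ID=mTAN
SEQ_LEN=96;                 PRED_LEN=24
STAMP=$(date +%Y_%m%d_%H%M)   # the %Y_%m%d_%H%M part of the template
RESULT_DIR="storage/results/${DATASET_NAME}/${DATASET_ID}/${MODEL_NAME}/${MODEL_ID}/${SEQ_LEN}_${PRED_LEN}/${STAMP}/iter0"
echo "$RESULT_DIR"
```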
6. Testing
Testing is automatically conducted once training finishes.
If you wish to run the test only, change the command line argument --is_training in the training script from 1 to 0 and run the script.
The testing result metric.json will be saved in storage/results/${DATASET_NAME}/${DATASET_ID}/${MODEL_NAME}/${MODEL_ID}/${SEQ_LEN}_${PRED_LEN}/%Y_%m%d_%H%M/iter0/eval_%Y_%m%d_%H%M
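Since the timestamped folder names depend on your run, the newest metric.json can be located with find; a sketch using a stand-in run folder so the commands are self-contained (the MSE/MAE keys are assumed examples of what the metrics module writes, not a guaranteed schema):

```shell
# Sketch: find the newest metric.json under storage/results and print it.
# A stand-in run folder is created first so the commands are self-contained;
# in a real checkout it already exists after testing finishes.
mkdir -p storage/results/demo/iter0/eval_demo
printf '{"MSE": 0.123, "MAE": 0.234}\n' > storage/results/demo/iter0/eval_demo/metric.json  # placeholder
METRIC_FILE=$(find storage/results -name metric.json | sort | tail -n 1)
cat "$METRIC_FILE"
```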