gezi2333 committed on
Commit 3589275 · verified · 1 Parent(s): e40b7dd

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50)
  1. .gitattributes +2 -0
  2. README.md +207 -0
  3. assets/WVO-WD-TFS.png +3 -0
  4. assets/orthoreg_loss.png +3 -0
  5. environment.yml +140 -0
  6. src/__init__.py +0 -0
  7. src/__pycache__/__init__.cpython-310.pyc +0 -0
  8. src/__pycache__/args.cpython-310.pyc +0 -0
  9. src/__pycache__/attention_only_finetune.cpython-310.pyc +0 -0
  10. src/__pycache__/distributed.cpython-310.pyc +0 -0
  11. src/__pycache__/eval.cpython-310.pyc +0 -0
  12. src/__pycache__/heads.cpython-310.pyc +0 -0
  13. src/__pycache__/linearize.cpython-310.pyc +0 -0
  14. src/__pycache__/modeling.cpython-310.pyc +0 -0
  15. src/__pycache__/task_vectors.cpython-310.pyc +0 -0
  16. src/__pycache__/utils.cpython-310.pyc +0 -0
  17. src/args.py +153 -0
  18. src/attention_only_finetune.py +116 -0
  19. src/datasets/__pycache__/cars.cpython-310.pyc +0 -0
  20. src/datasets/__pycache__/cifar10.cpython-310.pyc +0 -0
  21. src/datasets/__pycache__/cifar100.cpython-310.pyc +0 -0
  22. src/datasets/__pycache__/common.cpython-310.pyc +0 -0
  23. src/datasets/__pycache__/dtd.cpython-310.pyc +0 -0
  24. src/datasets/__pycache__/emnist.cpython-310.pyc +0 -0
  25. src/datasets/__pycache__/eurosat.cpython-310.pyc +0 -0
  26. src/datasets/__pycache__/gtsrb.cpython-310.pyc +0 -0
  27. src/datasets/__pycache__/imagenet.cpython-310.pyc +0 -0
  28. src/datasets/__pycache__/kmnist.cpython-310.pyc +0 -0
  29. src/datasets/__pycache__/mnist.cpython-310.pyc +0 -0
  30. src/datasets/__pycache__/oxfordpets.cpython-310.pyc +0 -0
  31. src/datasets/__pycache__/registry.cpython-310.pyc +0 -0
  32. src/datasets/__pycache__/resisc45.cpython-310.pyc +0 -0
  33. src/datasets/__pycache__/stl10.cpython-310.pyc +0 -0
  34. src/datasets/__pycache__/sun397.cpython-310.pyc +0 -0
  35. src/datasets/__pycache__/svhn.cpython-310.pyc +0 -0
  36. src/datasets/__pycache__/templates.cpython-310.pyc +0 -0
  37. src/datasets/cars.py +155 -0
  38. src/datasets/cifar10.py +56 -0
  39. src/datasets/cifar100.py +30 -0
  40. src/datasets/common.py +139 -0
  41. src/datasets/dtd.py +34 -0
  42. src/datasets/emnist.py +74 -0
  43. src/datasets/eurosat.py +75 -0
  44. src/datasets/gtsrb.py +205 -0
  45. src/datasets/imagenet.py +253 -0
  46. src/datasets/kmnist.py +39 -0
  47. src/datasets/mnist.py +41 -0
  48. src/datasets/oxfordpets.py +38 -0
  49. src/datasets/registry.py +103 -0
  50. src/datasets/resisc45.py +304 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/WVO-WD-TFS.png filter=lfs diff=lfs merge=lfs -text
+ assets/orthoreg_loss.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,207 @@
+ # Understanding and Enforcing Weight Disentanglement in Task Arithmetic
+
+ [CVPR 2026] Official code of the paper **"Understanding and Enforcing Weight Disentanglement in Task Arithmetic"**.
+
+ [[Paper](https://arxiv.org/abs/2604.17078)]   [[Checkpoints](#-checkpoints)]   [[Datasets](#-datasets)]
+
+ ---
+
+ ## 🎯 Abstract
+
+ Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet it lacks a fundamental theoretical explanation for its success. The existing concept of "weight disentanglement" describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, which intrinsic properties of the pre-trained model ($\theta_0$) or the task vectors ($\tau_t$) enable this disentanglement remains underexplored. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality. This positions TFS as the common cause of both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight behind our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. We therefore propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on the weight updates ($\Delta W$) that constitute $\tau_t$ during fine-tuning, and we prove theoretically that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods.
+
+ <p align="center">
+ <img src="assets/WVO-WD-TFS.png" width="500"/>
+ <br>
+ <em>TFS is the common cause connecting Weight Vector Orthogonality (WVO) with Weight Disentanglement (WD).</em>
+ </p>
+
+ ### ✨ Key Contributions
+
+ - 📐 **Theory**: We identify TFS as a sufficient condition for weight disentanglement, and WVO as its geometric consequence, providing the first principled explanation for task arithmetic.
+ - 🔧 **Method (OrthoReg)**: A simple regularization term added to the fine-tuning loss that enforces column-wise orthogonality on ΔW, for which we prove theoretical efficacy.
+ - 🔗 **Connection to TTA**: We show that OrthoReg and Tangent Task Arithmetic (TTA) share the same underlying mechanism (i.e., inter-task vector orthogonality), but OrthoReg achieves this more efficiently.
+ - 📊 **Experiments**: Consistent and significant improvements over Non-linear FT, TTA, ATT-FT, and LoRA-ATT across ViT-B-32, ViT-B-16, and ViT-L-14.
+
+ ---
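The WVO property above can be probed directly on a pair of checkpoints. The following is a minimal sketch (assuming plain PyTorch; the helper names are ours, not part of this repository): flatten each task vector $\tau_t = \theta_t - \theta_0$ and measure the cosine similarity between them, which should be near zero when WVO holds.

```python
import torch

def task_vector(finetuned_sd, pretrained_sd):
    """Flatten the weight delta (task vector) between two state dicts into one 1-D tensor."""
    return torch.cat([
        (finetuned_sd[k] - pretrained_sd[k]).flatten()
        for k in sorted(pretrained_sd)
        if k in finetuned_sd and finetuned_sd[k].dtype.is_floating_point
    ])

def wvo_cosine(tau_a: torch.Tensor, tau_b: torch.Tensor) -> float:
    """Cosine similarity between two flattened task vectors; values near 0 indicate WVO."""
    return torch.nn.functional.cosine_similarity(tau_a, tau_b, dim=0).item()
```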
+ ### The OrthoReg Loss
+
+ <p align="center">
+ <img src="assets/orthoreg_loss.png" width="560"/>
+ </p>
+
+ The total loss adds a regularization term to the standard task objective:
+
+ $$\mathcal{L} = \mathcal{L}_{\text{task}}(\theta_0 + \Delta\theta) + \lambda \cdot \mathcal{L}_{\text{ortho}}(\Delta\theta)$$
+
+ $$\mathcal{L}_{\text{ortho}}(\Delta\theta) = \sum_l \left\|(\Delta W^{(l)})^\top \Delta W^{(l)} - I\right\|_F^2$$
+
+ ---
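As a minimal PyTorch sketch of this penalty (function names are ours; the repository's actual implementation lives in the fine-tuning code and may differ in details, e.g. it may omit the square on the norm), the per-layer term and its sum over a state dict can be written as:

```python
import torch

def ortho_penalty(delta_w: torch.Tensor) -> torch.Tensor:
    """Squared Frobenius distance between the Gram matrix of delta_w and the identity.

    Uses the smaller of the two Gram matrices (delta_w @ delta_w.T vs.
    delta_w.T @ delta_w) so the identity target is well-sized for non-square layers.
    """
    rows, cols = delta_w.shape
    if rows < cols:
        gram = delta_w @ delta_w.T
    else:
        gram = delta_w.T @ delta_w
    eye = torch.eye(gram.shape[0], device=delta_w.device, dtype=delta_w.dtype)
    return torch.linalg.norm(gram - eye, ord="fro") ** 2

def orthoreg_loss(finetuned_sd, pretrained_sd) -> torch.Tensor:
    """Sum the penalty over all 2-D weight deltas shared by the two state dicts."""
    total = torch.zeros(())
    for name, w in finetuned_sd.items():
        if w.dim() == 2 and name in pretrained_sd:
            total = total + ortho_penalty(w - pretrained_sd[name])
    return total
```

An orthonormal-column ΔW drives the penalty to zero, which is the structure OrthoReg rewards during fine-tuning.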
+ ## 🛠️ Installation
+
+ This codebase is built on top of [Tangent Task Arithmetic (TTA)](https://github.com/gortizji/tangent_task_arithmetic). Environment setup follows theirs exactly.
+
+ To run the code, please install all its dependencies:
+ ```sh
+ conda env create
+ conda activate tangent-arithmetic
+ ```
+ and add the `src` directory to the `PYTHONPATH`:
+ ```sh
+ cd OrthoReg
+ export PYTHONPATH="$PYTHONPATH:$PWD"
+ ```
+
+ ---
+
+ ## 📦 Datasets
+
+ We evaluate on 8 image classification benchmarks following [Task Arithmetic](https://github.com/mlfoundations/task_vectors) and [TTA](https://github.com/gortizji/tangent_task_arithmetic):
+
+ **Cars · DTD · EuroSAT · GTSRB · MNIST · RESISC45 · SUN397 · SVHN**
+
+ For dataset download and preparation, please follow the instructions in the [TTA repository](https://github.com/gortizji/tangent_task_arithmetic#datasets).
+
+ We also provide a pre-packaged dataset archive for convenience:
+
+ > 📥 **Dataset Download:** `https://pan.baidu.com/s/1PgLyjUrAhsmgSAz4ms5mcQ?pwd=fwf5`
+
+ Set the root path via `--data-location /path/to/datasets/`.
+
+ ---
+
+ ## 🚀 Quick Start
+
+ All scripts are run from the `OrthoReg/` directory. This repository implements **6 finetuning modes**:
+
+ | `--finetuning-mode` | Description |
+ |---|---|
+ | `standard` | Non-linear full fine-tuning (baseline) |
+ | `standard_ortho` | Non-linear FT + OrthoReg |
+ | `linear` | TTA — tangent space fine-tuning (baseline) |
+ | `linear_ortho` | TTA + OrthoReg |
+ | `linear-2` | ATT-FT — attention-only fine-tuning (baseline) |
+ | `linear-2_ortho` | ATT-FT + OrthoReg |
+
+ > **Note on LoRA-ATT:** The LoRA-ATT and LoRA-ATT+OrthoReg results from the paper are implemented in a separate repository due to the complexity of patching OpenCLIP's fused QKV projection. Code will be released at: `https://github.com/lshangge/OrthoReg_lora`
+
+ ### Step 1 — Fine-tune
+
+ ```bash
+ python src/finetune.py \
+     --model ViT-B-32 \
+     --finetuning-mode standard_ortho \
+     --ortho-lambda 10 \
+     --lr 1e-5 \
+     --data-location /path/to/datasets/
+ ```
+
+ Switch between all six modes by changing `--finetuning-mode` and `--ortho-lambda`:
+
+ ```bash
+ --finetuning-mode standard        --ortho-lambda 0    # Non-linear FT
+ --finetuning-mode standard_ortho  --ortho-lambda xx   # Non-linear FT + OrthoReg
+ --finetuning-mode linear          --ortho-lambda 0    # TTA
+ --finetuning-mode linear_ortho    --ortho-lambda xx   # TTA + OrthoReg
+ --finetuning-mode linear-2        --ortho-lambda 0    # ATT-FT
+ --finetuning-mode linear-2_ortho  --ortho-lambda xx   # ATT-FT + OrthoReg
+ ```
+
+ Checkpoints are saved to:
+ - `checkpoints_{seed}/{mode}_{lr}_{model}/` — for baselines
+ - `checkpoints_{seed}/{mode}_{lr}_lambda{lambda}_{model}/` — for OrthoReg variants
+
+ ### Step 2 — Evaluate Single-Task Accuracy
+
+ ```bash
+ python src/eval_single_task.py \
+     --model ViT-B-32 \
+     --finetuning-mode standard_ortho \
+     --ortho-lambda 10 \
+     --lr 1e-5 \
+     --data-location /path/to/datasets/
+ ```
+
+ > Run `eval_single_task` with `--finetuning-mode none --ortho-lambda 0` first to generate `zeroshot_accuracies.json`, which is required as the reference for normalized accuracy in Steps 3–4.
+
+ ### Step 3 — Evaluate Task Addition
+
+ ```bash
+ python src/eval_task_addition.py \
+     --model ViT-B-32 \
+     --finetuning-mode standard_ortho \
+     --ortho-lambda 10 \
+     --lr 1e-5 \
+     --data-location /path/to/datasets/
+ ```
+
+ ### Step 4 — Evaluate Task Negation
+
+ ```bash
+ python src/eval_task_negation.py \
+     --model ViT-B-32 \
+     --finetuning-mode standard_ortho \
+     --ortho-lambda 10 \
+     --lr 1e-5 \
+     --data-location /path/to/datasets/
+ ```
+
+ ---
+
+ ## 🔧 Key Arguments
+
+ | Argument | Default | Description |
+ |---|:---:|---|
+ | `--model` | `ViT-B-32` | CLIP model architecture |
+ | `--finetuning-mode` | — | One of the 6 modes above |
+ | `--ortho-lambda` | `0.0` | OrthoReg strength λ; set to `0` for baselines |
+ | `--lr` | `1e-5` | Learning rate |
+ | `--seed` | `1993` | Random seed |
+ | `--world-size` | `1` | Number of GPUs (DDP) |
+ | `--data-location` | — | Dataset root directory |
+ | `--batch-size` | `128` | Batch size per GPU |
+
+ ---
+
+ ## 📁 Checkpoints
+
+ We release fine-tuned checkpoints for ViT-B-32, ViT-B-16, and ViT-L-14 on all 8 tasks, covering all 6 modes.
+
+ > 📥 **Checkpoint Download:** `https://huggingface.co/gezi2333/OrthoReg_checkpoints`
+
+ Unzip into `OrthoReg/checkpoints_{seed}/` and pass the corresponding `--seed`, `--lr`, and `--ortho-lambda` to the eval scripts to reproduce the paper's results directly.
+
+ ---
+
+ ## 📝 Citation
+
+ If you find this work useful, please cite:
+
+ ```bibtex
+ @inproceedings{liu2026orthoreg,
+   title     = {Understanding and Enforcing Weight Disentanglement in Task Arithmetic},
+   author    = {Liu, Shangge and Yin, Yuehan and Wang, Lei and Fan, Qi and
+                Shi, Yinghuan and Li, Wenbin and Gao, Yang and Tao, Dacheng},
+   booktitle = {CVPR},
+   year      = {2026}
+ }
+ ```
+
+ ---
+
+ ## 📞 Contact
+
+ For questions or issues, please:
+
+ - Open an issue on GitHub
+ - Contact the authors at [lshangge@smail.nju.edu.cn](mailto:lshangge@smail.nju.edu.cn)
+
+ ---
+
+ ## 📬 Acknowledgements
+
+ This codebase is built on top of [Task Arithmetic](https://github.com/mlfoundations/task_vectors), [Tangent Task Arithmetic](https://github.com/gortizji/tangent_task_arithmetic), and [Attention-Only Fine-tuning](https://github.com/kyrie-23/linear_task_arithmetic). We thank the authors for releasing their code.
+
assets/WVO-WD-TFS.png ADDED

Git LFS Details

  • SHA256: bc8a9efc76ecb495a5de03a98215606a8cbab5b38cdbb53ea5d2c2ed133e535a
  • Pointer size: 131 Bytes
  • Size of remote file: 150 kB
assets/orthoreg_loss.png ADDED

Git LFS Details

  • SHA256: d7974c4003a412cc9a5134f26b71cc58739f5cd1f48301ab0d3d428cb17ecb8e
  • Pointer size: 131 Bytes
  • Size of remote file: 158 kB
environment.yml ADDED
@@ -0,0 +1,140 @@
+ name: tangent-arithmetic
+ channels:
+ - pytorch
+ - nvidia
+ - defaults
+ dependencies:
+ - _libgcc_mutex=0.1
+ - _openmp_mutex=5.1
+ - blas=1.0
+ - brotlipy=0.7.0
+ - bzip2=1.0.8
+ - ca-certificates=2023.05.30
+ - certifi=2023.5.7
+ - cffi=1.15.1
+ - charset-normalizer=2.0.4
+ - cryptography=39.0.1
+ - cuda=11.6.1
+ - cuda-cccl=11.6.55
+ - cuda-command-line-tools=11.6.2
+ - cuda-compiler=11.6.2
+ - cuda-cudart=11.6.55
+ - cuda-cudart-dev=11.6.55
+ - cuda-cuobjdump=11.6.124
+ - cuda-cupti=11.6.124
+ - cuda-cuxxfilt=11.6.124
+ - cuda-driver-dev=11.6.55
+ - cuda-gdb=12.1.105
+ - cuda-libraries=11.6.1
+ - cuda-libraries-dev=11.6.1
+ - cuda-memcheck=11.8.86
+ - cuda-nsight=12.1.105
+ - cuda-nsight-compute=12.1.1
+ - cuda-nvcc=11.6.124
+ - cuda-nvdisasm=12.1.105
+ - cuda-nvml-dev=11.6.55
+ - cuda-nvprof=12.1.105
+ - cuda-nvprune=11.6.124
+ - cuda-nvrtc=11.6.124
+ - cuda-nvrtc-dev=11.6.124
+ - cuda-nvtx=11.6.124
+ - cuda-nvvp=12.1.105
+ - cuda-runtime=11.6.1
+ - cuda-samples=11.6.101
+ - cuda-sanitizer-api=12.1.105
+ - cuda-toolkit=11.6.1
+ - cuda-tools=11.6.1
+ - cuda-visual-tools=11.6.1
+ - ffmpeg=4.3
+ - freetype=2.12.1
+ - gds-tools=1.6.1.9
+ - giflib=5.2.1
+ - gmp=6.2.1
+ - gnutls=3.6.15
+ - idna=3.4
+ - intel-openmp=2023.1.0
+ - jpeg=9e
+ - lame=3.100
+ - lcms2=2.12
+ - ld_impl_linux-64=2.38
+ - lerc=3.0
+ - libcublas=11.9.2.110
+ - libcublas-dev=11.9.2.110
+ - libcufft=10.7.1.112
+ - libcufft-dev=10.7.1.112
+ - libcufile=1.6.1.9
+ - libcufile-dev=1.6.1.9
+ - libcurand=10.3.2.106
+ - libcurand-dev=10.3.2.106
+ - libcusolver=11.3.4.124
+ - libcusparse=11.7.2.124
+ - libcusparse-dev=11.7.2.124
+ - libdeflate=1.17
+ - libffi=3.4.4
+ - libgcc-ng=11.2.0
+ - libgomp=11.2.0
+ - libiconv=1.16
+ - libidn2=2.3.4
+ - libnpp=11.6.3.124
+ - libnpp-dev=11.6.3.124
+ - libnvjpeg=11.6.2.124
+ - libnvjpeg-dev=11.6.2.124
+ - libpng=1.6.39
+ - libstdcxx-ng=11.2.0
+ - libtasn1=4.19.0
+ - libtiff=4.5.0
+ - libunistring=0.9.10
+ - libuuid=1.41.5
+ - libwebp=1.2.4
+ - libwebp-base=1.2.4
+ - lz4-c=1.9.4
+ - mkl=2023.1.0
+ - mkl-service=2.4.0
+ - mkl_fft=1.3.6
+ - mkl_random=1.2.2
+ - ncurses=6.4
+ - nettle=3.7.3
+ - nsight-compute=2023.1.1.4
+ - numpy=1.24.3
+ - numpy-base=1.24.3
+ - openh264=2.1.1
+ - openssl=1.1.1t
+ - pillow=9.4.0
+ - pip=23.0.1
+ - pycparser=2.21
+ - pyopenssl=23.0.0
+ - pysocks=1.7.1
+ - python=3.10.11
+ - pytorch=1.13.1
+ - pytorch-cuda=11.6
+ - pytorch-mutex=1.0
+ - readline=8.2
+ - requests=2.29.0
+ - setuptools=67.8.0
+ - sqlite=3.41.2
+ - tbb=2021.8.0
+ - tk=8.6.12
+ - torchaudio=0.13.1
+ - torchvision=0.14.1
+ - typing_extensions=4.5.0
+ - tzdata=2023c
+ - urllib3=1.26.16
+ - wheel=0.38.4
+ - xz=5.4.2
+ - zlib=1.2.13
+ - zstd=1.5.5
+ - pip:
+   - filelock==3.12.0
+   - fsspec==2023.5.0
+   - ftfy==6.1.1
+   - huggingface-hub==0.15.1
+   - open-clip-torch==2.10.1
+   - packaging==23.1
+   - protobuf==3.20.3
+   - pyyaml==6.0
+   - regex==2023.6.3
+   - safetensors==0.3.1
+   - scipy==1.10.1
+   - sentencepiece==0.1.99
+   - timm==0.9.2
+   - wcwidth==0.2.6
src/__init__.py ADDED
File without changes
src/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (139 Bytes). View file
 
src/__pycache__/args.cpython-310.pyc ADDED
Binary file (3.4 kB). View file
 
src/__pycache__/attention_only_finetune.cpython-310.pyc ADDED
Binary file (3.6 kB). View file
 
src/__pycache__/distributed.cpython-310.pyc ADDED
Binary file (1.19 kB). View file
 
src/__pycache__/eval.cpython-310.pyc ADDED
Binary file (3.38 kB). View file
 
src/__pycache__/heads.cpython-310.pyc ADDED
Binary file (1.92 kB). View file
 
src/__pycache__/linearize.cpython-310.pyc ADDED
Binary file (6.29 kB). View file
 
src/__pycache__/modeling.cpython-310.pyc ADDED
Binary file (6.52 kB). View file
 
src/__pycache__/task_vectors.cpython-310.pyc ADDED
Binary file (7.75 kB). View file
 
src/__pycache__/utils.cpython-310.pyc ADDED
Binary file (5.95 kB). View file
 
src/args.py ADDED
@@ -0,0 +1,153 @@
+ import argparse
+ import os
+
+ import torch
+
+
+ def parse_arguments():
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         "--data-location",
+         type=str,
+         default=os.path.expanduser("/path/datasets/"),
+         help="The root directory for the datasets.",
+     )
+     parser.add_argument(
+         "--eval-datasets",
+         default=None,
+         type=lambda x: x.split(","),
+         help="Which datasets to use for evaluation. Split by comma, e.g. MNIST,EuroSAT.",
+     )
+     parser.add_argument(
+         "--train-dataset",
+         default=None,
+         type=lambda x: x.split(","),
+         help="Which dataset(s) to patch on.",
+     )
+     parser.add_argument(
+         "--exp_name",
+         type=str,
+         default=None,
+         help="Name of the experiment, for organization purposes only.",
+     )
+     parser.add_argument(
+         "--results-db",
+         type=str,
+         default=None,
+         help="Where to store the results; if None, results are not stored.",
+     )
+     parser.add_argument(
+         "--model",
+         type=str,
+         default="ViT-B-32",
+         help="The type of model (e.g. RN50, ViT-B-32).",
+     )
+     parser.add_argument(
+         "--batch-size",
+         type=int,
+         default=128,
+     )
+     parser.add_argument(
+         "--num-grad-accumulation",
+         type=int,
+         default=1,
+         help="Number of gradient accumulation steps.",
+     )
+     parser.add_argument("--lr", type=float, default=0.001, help="Learning rate.")
+     parser.add_argument("--wd", type=float, default=0.1, help="Weight decay.")
+     parser.add_argument("--ls", type=float, default=0.0, help="Label smoothing.")
+     parser.add_argument(
+         "--warmup_length",
+         type=int,
+         default=500,
+     )
+     parser.add_argument(
+         "--epochs",
+         type=int,
+         default=10,
+     )
+     parser.add_argument(
+         "--load",
+         type=lambda x: x.split(","),
+         default=None,
+         help="Optionally load _classifiers_, e.g. a zero-shot classifier or probe, or an ensemble of both.",
+     )
+     parser.add_argument(
+         "--save",
+         type=str,
+         default=None,
+         help="Optionally save a _classifier_, e.g. a zero-shot classifier or probe.",
+     )
+     parser.add_argument(
+         "--cache-dir",
+         type=str,
+         default=None,
+         help="Directory for caching features and encoder.",
+     )
+     parser.add_argument(
+         "--openclip-cachedir",
+         type=str,
+         default=os.path.expanduser("~/openclip-cachedir/open_clip"),
+         help="Directory for caching models from OpenCLIP.",
+     )
+     parser.add_argument(
+         "--world-size",
+         type=int,
+         default=1,
+         help="Number of processes for distributed training.",
+     )
+     parser.add_argument(
+         "--checkpoint-every",
+         type=int,
+         default=-1,
+         help="How often to checkpoint the model.",
+     )
+     parser.add_argument(
+         "--port",
+         type=int,
+         default=12355,
+         help="Port for distributed training.",
+     )
+     parser.add_argument(
+         "--seed",
+         type=int,
+         default=1993,
+         help="Random seed.",
+     )
+     parser.add_argument(
+         "--finetuning-mode",
+         choices=["standard", "standard_ortho", "linear", "linear_ortho", "linear-2", "linear-2_ortho"],
+         help="Finetuning mode: standard/linear/linear-2, with optional ortho regularization.",
+     )
+     parser.add_argument(
+         "--n-eval-points",
+         type=int,
+         default=21,
+         help="Number of evaluation points used to find the optimal coefficient in task arithmetic.",
+     )
+     parser.add_argument(
+         "--ortho-lambda",
+         type=float,
+         default=0.0,
+         help="Weight of the orthogonality regularization term. Default 0.0 means no regularization.",
+     )
+     parser.add_argument(
+         "--control_threshold",
+         type=float,
+         default=0.95,
+         help="Control dataset performance degradation threshold.",
+     )
+     parser.add_argument(
+         "--alpha",
+         type=float,
+         default=None,
+         help="Manually specify the scaling coefficient for task vectors. If None, it is optimized on the validation set.",
+     )
+
+     parsed_args = parser.parse_args()
+     parsed_args.device = "cuda" if torch.cuda.is_available() else "cpu"
+
+     if parsed_args.load is not None and len(parsed_args.load) == 1:
+         parsed_args.load = parsed_args.load[0]
+
+     return parsed_args
src/attention_only_finetune.py ADDED
+ import os
+
+ import torch
+ import torch.nn as nn
+
+ from src.modeling import ImageEncoder
+ from src.utils import DotDict
+
+
+ class AttentionOnlyFinetuneEncoder(ImageEncoder):
+     """
+     A specialized ImageEncoder that fine-tunes only the attention module weights in the ViT.
+     Corresponds to the method described in Jin et al. (2025).
+     """
+
+     def __init__(self, args, keep_lang=False):
+         # 1. Call the parent constructor to build the full model as usual
+         super().__init__(args, keep_lang=keep_lang)
+
+         self.args = args
+
+         # 2. Freeze all model parameters
+         for param in self.model.parameters():
+             param.requires_grad = False
+
+         # 3. Unfreeze only the attention module weights (Wq, Wk, Wv, Wo)
+         self._unfreeze_attention_weights(self.model.visual)
+
+         # 4. (Optional but recommended) Print trainable parameters for verification
+         # self._verify_trainable_params()
+
+     def _unfreeze_attention_weights(self, vit_model):
+         """Iterate over all Transformer blocks and unfreeze the attention projection weights."""
+         for block in vit_model.transformer.resblocks:
+             # Unfreeze the combined input projection weight for Q, K, V
+             block.attn.in_proj_weight.requires_grad = True
+             # Unfreeze the output projection weight
+             block.attn.out_proj.weight.requires_grad = True
+             # Per the paper's ablation study, not fine-tuning biases yields better
+             # results, so the biases stay frozen:
+             # block.attn.in_proj_bias.requires_grad = True
+             # block.attn.out_proj.bias.requires_grad = True
+
+     def _verify_trainable_params(self):
+         """Print all trainable parameters for debugging and verification."""
+         print("=" * 80)
+         print("Trainable parameters in AttentionOnlyFinetuneEncoder:")
+         trainable_params_count = 0
+         for name, param in self.model.named_parameters():
+             if param.requires_grad:
+                 print(f"  - {name}")
+                 trainable_params_count += param.numel()
+         print(f"Total trainable parameters: {trainable_params_count / 1e6:.2f}M")
+         print("=" * 80)
+
+     def forward(self, images, calculate_ortho_loss=False, pretrained_state_dict=None):
+         """
+         Extended forward method that optionally computes and returns the orthogonality loss.
+         Consistent with the logic implemented for standard_ortho.
+         """
+         # Original forward pass
+         features = self.model.encode_image(images)
+
+         # Return features directly if the orthogonality loss is not needed
+         if not calculate_ortho_loss:
+             return features
+
+         # --- Compute the orthogonality loss if requested ---
+         if pretrained_state_dict is None:
+             raise ValueError("pretrained_state_dict must be provided when calculate_ortho_loss is True")
+
+         ortho_loss = 0.0
+         # self.model is the open_clip model (e.g. a ViT); iterate over its parameters
+         for name, p_finetuned in self.model.named_parameters():
+             # Only penalize trainable 2-D weight matrices
+             if p_finetuned.requires_grad and p_finetuned.dim() == 2:
+                 if name in pretrained_state_dict:
+                     p_pretrained = pretrained_state_dict[name].to(p_finetuned.device)
+
+                     delta_W = p_finetuned - p_pretrained
+
+                     # Frobenius norm of the Gram matrix minus identity, using the
+                     # smaller Gram matrix for non-square layers
+                     rows, cols = delta_W.shape
+                     if rows < cols:
+                         mat = delta_W @ delta_W.T
+                         identity = torch.eye(rows, device=delta_W.device)
+                     else:
+                         mat = delta_W.T @ delta_W
+                         identity = torch.eye(cols, device=delta_W.device)
+
+                     ortho_loss += torch.norm(mat - identity, p="fro")
+
+         return features, ortho_loss
+
+     def __call__(self, inputs, calculate_ortho_loss=False, pretrained_state_dict=None):
+         # Ensure __call__ forwards all arguments
+         return self.forward(inputs, calculate_ortho_loss, pretrained_state_dict)
+
+     def save(self, filename):
+         """Save model weights."""
+         if os.path.dirname(filename):
+             os.makedirs(os.path.dirname(filename), exist_ok=True)
+         # Save only the state_dict; reconstruct the model on load
+         torch.save(self.state_dict(), filename)
+
+     @classmethod
+     def load(cls, filename, args):
+         """Load model from a state_dict."""
+         encoder = cls(args)  # Create a new instance
+         state_dict = torch.load(filename, map_location="cpu")
+         encoder.load_state_dict(state_dict)  # Load weights
+         return encoder
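A fine-tuning step that consumes this forward signature might look like the following sketch (the classification head, loss weighting, and function name are illustrative, not the repository's exact training loop):

```python
import torch
import torch.nn.functional as F

def training_step(encoder, head, images, labels, pretrained_state_dict, ortho_lambda):
    """One OrthoReg training step: task cross-entropy plus the weighted orthogonality loss.

    `encoder` is expected to return (features, ortho_loss) when called with
    calculate_ortho_loss=True, matching AttentionOnlyFinetuneEncoder.forward above.
    """
    features, ortho_loss = encoder(
        images,
        calculate_ortho_loss=True,
        pretrained_state_dict=pretrained_state_dict,
    )
    logits = head(features)
    # Total loss: L_task + lambda * L_ortho
    return F.cross_entropy(logits, labels) + ortho_lambda * ortho_loss
```

Setting `ortho_lambda` to 0 recovers the corresponding baseline objective.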
src/datasets/__pycache__/cars.cpython-310.pyc ADDED
Binary file (5.94 kB). View file
 
src/datasets/__pycache__/cifar10.cpython-310.pyc ADDED
Binary file (2.15 kB). View file
 
src/datasets/__pycache__/cifar100.cpython-310.pyc ADDED
Binary file (924 Bytes). View file
 
src/datasets/__pycache__/common.cpython-310.pyc ADDED
Binary file (5.22 kB). View file
 
src/datasets/__pycache__/dtd.cpython-310.pyc ADDED
Binary file (1.39 kB). View file
 
src/datasets/__pycache__/emnist.cpython-310.pyc ADDED
Binary file (1.46 kB). View file
 
src/datasets/__pycache__/eurosat.cpython-310.pyc ADDED
Binary file (3.02 kB). View file
 
src/datasets/__pycache__/gtsrb.cpython-310.pyc ADDED
Binary file (7.68 kB). View file
 
src/datasets/__pycache__/imagenet.cpython-310.pyc ADDED
Binary file (15.9 kB). View file
 
src/datasets/__pycache__/kmnist.cpython-310.pyc ADDED
Binary file (952 Bytes). View file
 
src/datasets/__pycache__/mnist.cpython-310.pyc ADDED
Binary file (947 Bytes). View file
 
src/datasets/__pycache__/oxfordpets.cpython-310.pyc ADDED
Binary file (986 Bytes). View file
 
src/datasets/__pycache__/registry.cpython-310.pyc ADDED
Binary file (3.07 kB). View file
 
src/datasets/__pycache__/resisc45.cpython-310.pyc ADDED
Binary file (9.24 kB). View file
 
src/datasets/__pycache__/stl10.cpython-310.pyc ADDED
Binary file (976 Bytes). View file
 
src/datasets/__pycache__/sun397.cpython-310.pyc ADDED
Binary file (1.41 kB). View file
 
src/datasets/__pycache__/svhn.cpython-310.pyc ADDED
Binary file (1.04 kB). View file
 
src/datasets/__pycache__/templates.cpython-310.pyc ADDED
Binary file (18 kB). View file
 
src/datasets/cars.py ADDED
@@ -0,0 +1,155 @@
+ import os
+ import pathlib
+ from typing import Any, Callable, Optional, Tuple
+
+ import torch
+ import torchvision.datasets as datasets
+ from PIL import Image
+ from torchvision.datasets.utils import download_and_extract_archive, download_url, verify_str_arg
+ from torchvision.datasets.vision import VisionDataset
+
+
+ class PytorchStanfordCars(VisionDataset):
+     """`Stanford Cars <https://ai.stanford.edu/~jkrause/cars/car_dataset.html>`_ Dataset
+
+     The Cars dataset contains 16,185 images of 196 classes of cars. The data is
+     split into 8,144 training images and 8,041 testing images, where each class
+     has been split roughly in a 50-50 split.
+
+     .. note::
+
+         This class needs `scipy <https://docs.scipy.org/doc/>`_ to load target files from `.mat` format.
+
+     Args:
+         root (string): Root directory of dataset
+         split (string, optional): The dataset split, supports ``"train"`` (default) or ``"test"``.
+         transform (callable, optional): A function/transform that takes in a PIL image
+             and returns a transformed version. E.g., ``transforms.RandomCrop``
+         target_transform (callable, optional): A function/transform that takes in the
+             target and transforms it.
+         download (bool, optional): If True, downloads the dataset from the internet and
+             puts it in the root directory. If the dataset is already downloaded, it is not
+             downloaded again."""
+
+     def __init__(
+         self,
+         root: str,
+         split: str = "train",
+         transform: Optional[Callable] = None,
+         target_transform: Optional[Callable] = None,
+         download: bool = False,
+     ) -> None:
+
+         try:
+             import scipy.io as sio
+         except ImportError:
+             raise RuntimeError("Scipy is not found. This dataset needs to have scipy installed: pip install scipy")
+
+         super().__init__(root, transform=transform, target_transform=target_transform)
+
+         self._split = verify_str_arg(split, "split", ("train", "test"))
+         self._base_folder = pathlib.Path(root) / "stanford_cars"
+         devkit = self._base_folder / "devkit"
+
+         if self._split == "train":
+             self._annotations_mat_path = devkit / "cars_train_annos.mat"
+             self._images_base_path = self._base_folder / "cars_train"
+         else:
+             self._annotations_mat_path = self._base_folder / "cars_test_annos_withlabels.mat"
+             self._images_base_path = self._base_folder / "cars_test"
+
+         if download:
+             self.download()
+
+         if not self._check_exists():
+             raise RuntimeError("Dataset not found. You can use download=True to download it")
+
+         self._samples = [
+             (
+                 str(self._images_base_path / annotation["fname"]),
+                 annotation["class"] - 1,  # Original target mapping starts from 1, hence -1
+             )
+             for annotation in sio.loadmat(self._annotations_mat_path, squeeze_me=True)["annotations"]
+         ]
+
+         self.classes = sio.loadmat(str(devkit / "cars_meta.mat"), squeeze_me=True)["class_names"].tolist()
+         self.class_to_idx = {cls: i for i, cls in enumerate(self.classes)}
+
+     def __len__(self) -> int:
+         return len(self._samples)
+
+     def __getitem__(self, idx: int) -> Tuple[Any, Any]:
+         """Returns pil_image and class_id for a given index"""
+         image_path, target = self._samples[idx]
+         pil_image = Image.open(image_path).convert("RGB")
+
+         if self.transform is not None:
+             pil_image = self.transform(pil_image)
+         if self.target_transform is not None:
+             target = self.target_transform(target)
+         return pil_image, target
+
+     def download(self) -> None:
+         if self._check_exists():
+             return
+
+         download_and_extract_archive(
+             url="https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz",
+             download_root=str(self._base_folder),
+             md5="c3b158d763b6e2245038c8ad08e45376",
+         )
+         if self._split == "train":
+             download_and_extract_archive(
+                 url="https://ai.stanford.edu/~jkrause/car196/cars_train.tgz",
+                 download_root=str(self._base_folder),
+                 md5="065e5b463ae28d29e77c1b4b166cfe61",
+             )
+         else:
+             download_and_extract_archive(
+                 url="https://ai.stanford.edu/~jkrause/car196/cars_test.tgz",
+                 download_root=str(self._base_folder),
+                 md5="4ce7ebf6a94d07f1952d94dd34c4d501",
+             )
+             download_url(
+                 url="https://ai.stanford.edu/~jkrause/car196/cars_test_annos_withlabels.mat",
119
+ root=str(self._base_folder),
120
+ md5="b0a2b23655a3edd16d84508592a98d10",
121
+ )
122
+
123
+ def _check_exists(self) -> bool:
124
+ if not (self._base_folder / "devkit").is_dir():
125
+ return False
126
+
127
+ return self._annotations_mat_path.exists() and self._images_base_path.is_dir()
128
+
129
+
130
+ class Cars:
131
+ def __init__(self,
132
+ preprocess,
133
+ location=os.path.expanduser('~/data'),
134
+ batch_size=32,
135
+ num_workers=16):
136
+ # Data loading code
137
+
138
+ self.train_dataset = PytorchStanfordCars(location, 'train', preprocess, download=False)
139
+ self.train_loader = torch.utils.data.DataLoader(
140
+ self.train_dataset,
141
+ shuffle=True,
142
+ batch_size=batch_size,
143
+ num_workers=num_workers,
144
+ )
145
+
146
+ self.test_dataset = PytorchStanfordCars(location, 'test', preprocess, download=False)
147
+ self.test_loader = torch.utils.data.DataLoader(
148
+ self.test_dataset,
149
+ batch_size=batch_size,
150
+ num_workers=num_workers
151
+ )
152
+ idx_to_class = dict((v, k)
153
+ for k, v in self.train_dataset.class_to_idx.items())
154
+ self.classnames = [idx_to_class[i].replace(
155
+ '_', ' ') for i in range(len(idx_to_class))]
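The dataset wrappers in this folder all derive human-readable classnames the same way: invert `class_to_idx` and replace underscores with spaces. A minimal self-contained sketch of that pattern (the mapping below is hypothetical, not the real Stanford Cars label set):

```python
# Hypothetical class_to_idx mapping, standing in for train_dataset.class_to_idx.
class_to_idx = {"AM_General_Hummer": 0, "Acura_RL": 1}

# Invert the mapping and strip underscores, exactly as the Cars wrapper does.
idx_to_class = {v: k for k, v in class_to_idx.items()}
classnames = [idx_to_class[i].replace("_", " ") for i in range(len(idx_to_class))]

print(classnames)  # ['AM General Hummer', 'Acura RL']
```

Iterating `range(len(idx_to_class))` rather than over the dict guarantees the classnames come out in label-index order.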
src/datasets/cifar10.py ADDED
@@ -0,0 +1,56 @@
+ import os
+ import PIL
+ import torch
+ import numpy as np
+ import torchvision
+ from torchvision import transforms
+ from torchvision.datasets import CIFAR10 as PyTorchCIFAR10
+ from torchvision.datasets import VisionDataset
+ 
+ cifar_classnames = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
+ 
+ class CIFAR10:
+     def __init__(self, preprocess,
+                  location=os.path.expanduser('~/data'),
+                  batch_size=128,
+                  num_workers=16):
+ 
+         self.train_dataset = PyTorchCIFAR10(
+             root=location, download=True, train=True, transform=preprocess
+         )
+ 
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers
+         )
+ 
+         self.test_dataset = PyTorchCIFAR10(
+             root=location, download=True, train=False, transform=preprocess
+         )
+ 
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers
+         )
+ 
+         self.classnames = self.test_dataset.classes
+ 
+ def convert(x):
+     if isinstance(x, np.ndarray):
+         return torchvision.transforms.functional.to_pil_image(x)
+     return x
+ 
+ class BasicVisionDataset(VisionDataset):
+     def __init__(self, images, targets, transform=None, target_transform=None):
+         if transform is not None:
+             transform.transforms.insert(0, convert)
+         super(BasicVisionDataset, self).__init__(root=None, transform=transform, target_transform=target_transform)
+         assert len(images) == len(targets)
+ 
+         self.images = images
+         self.targets = targets
+ 
+     def __getitem__(self, index):
+         return self.transform(self.images[index]), self.targets[index]
+ 
+     def __len__(self):
+         return len(self.targets)
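`BasicVisionDataset` prepends `convert` to the transform pipeline so numpy arrays become PIL images before the remaining transforms run. The move can be sketched with plain callables in place of torchvision transforms (everything below is a stand-in, not the real pipeline):

```python
# A "pipeline" is just an ordered list of callables, like Compose.transforms.
pipeline = [str.upper]  # stand-in for e.g. ToTensor / Normalize

def convert(x):
    # Stand-in for the numpy-array -> PIL-image conversion: normalize the input type.
    return x if isinstance(x, str) else str(x)

# Same move as transform.transforms.insert(0, convert): conversion runs first.
pipeline.insert(0, convert)

value = 42
for step in pipeline:
    value = step(value)
print(value)  # '42'
```

Inserting at index 0 matters: the downstream steps assume the converted type, so `convert` must run before any of them.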
src/datasets/cifar100.py ADDED
@@ -0,0 +1,30 @@
+ import os
+ import torch
+ from torchvision.datasets import CIFAR100 as PyTorchCIFAR100
+ 
+ class CIFAR100:
+     def __init__(self,
+                  preprocess,
+                  location=os.path.expanduser('~/data'),
+                  batch_size=128,
+                  num_workers=16):
+ 
+         self.train_dataset = PyTorchCIFAR100(
+             root=location, download=True, train=True, transform=preprocess
+         )
+ 
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers
+         )
+ 
+         self.test_dataset = PyTorchCIFAR100(
+             root=location, download=True, train=False, transform=preprocess
+         )
+ 
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers
+         )
+ 
+         self.classnames = self.test_dataset.classes
+ 
+ 
src/datasets/common.py ADDED
@@ -0,0 +1,140 @@
+ import os
+ import torch
+ import json
+ import glob
+ import collections
+ import random
+ 
+ import numpy as np
+ 
+ from tqdm import tqdm
+ 
+ import torchvision.datasets as datasets
+ from torch.utils.data import Dataset, DataLoader, Sampler
+ 
+ 
+ class SubsetSampler(Sampler):
+     def __init__(self, indices):
+         self.indices = indices
+ 
+     def __iter__(self):
+         return (i for i in self.indices)
+ 
+     def __len__(self):
+         return len(self.indices)
+ 
+ class ImageFolderWithPaths(datasets.ImageFolder):
+     def __init__(self, path, transform, flip_label_prob=0.0):
+         super().__init__(path, transform)
+         self.flip_label_prob = flip_label_prob
+         if self.flip_label_prob > 0:
+             print(f'Flipping labels with probability {self.flip_label_prob}')
+             num_classes = len(self.classes)
+             for i in range(len(self.samples)):
+                 if random.random() < self.flip_label_prob:
+                     new_label = random.randint(0, num_classes - 1)
+                     self.samples[i] = (
+                         self.samples[i][0],
+                         new_label
+                     )
+ 
+     def __getitem__(self, index):
+         image, label = super(ImageFolderWithPaths, self).__getitem__(index)
+         return {
+             'images': image,
+             'labels': label,
+             'image_paths': self.samples[index][0]
+         }
+ 
+ 
+ def maybe_dictionarize(batch):
+     if isinstance(batch, dict):
+         return batch
+ 
+     if len(batch) == 2:
+         batch = {'images': batch[0], 'labels': batch[1]}
+     elif len(batch) == 3:
+         batch = {'images': batch[0], 'labels': batch[1], 'metadata': batch[2]}
+     else:
+         raise ValueError(f'Unexpected number of elements: {len(batch)}')
+ 
+     return batch
+ 
+ 
+ def get_features_helper(image_encoder, dataloader, device):
+     all_data = collections.defaultdict(list)
+ 
+     image_encoder = image_encoder.to(device)
+     image_encoder = torch.nn.DataParallel(image_encoder, device_ids=[x for x in range(torch.cuda.device_count())])
+     image_encoder.eval()
+ 
+     with torch.no_grad():
+         for batch in tqdm(dataloader):
+             batch = maybe_dictionarize(batch)
+             features = image_encoder(batch['images'].cuda())
+ 
+             all_data['features'].append(features.cpu())
+ 
+             for key, val in batch.items():
+                 if key == 'images':
+                     continue
+                 if hasattr(val, 'cpu'):
+                     val = val.cpu()
+                     all_data[key].append(val)
+                 else:
+                     all_data[key].extend(val)
+ 
+     for key, val in all_data.items():
+         if torch.is_tensor(val[0]):
+             all_data[key] = torch.cat(val).numpy()
+ 
+     return all_data
+ 
+ 
+ def get_features(is_train, image_encoder, dataset, device):
+     split = 'train' if is_train else 'val'
+     dname = type(dataset).__name__
+     cache_dir = None  # keeps the messages below well-defined when no cache directory is set
+     if image_encoder.cache_dir is not None:
+         cache_dir = f'{image_encoder.cache_dir}/{dname}/{split}'
+         cached_files = glob.glob(f'{cache_dir}/*')
+     if image_encoder.cache_dir is not None and len(cached_files) > 0:
+         print(f'Getting features from {cache_dir}')
+         data = {}
+         for cached_file in cached_files:
+             name = os.path.splitext(os.path.basename(cached_file))[0]
+             data[name] = torch.load(cached_file)
+     else:
+         print(f'Did not find cached features at {cache_dir}. Building from scratch.')
+         loader = dataset.train_loader if is_train else dataset.test_loader
+         data = get_features_helper(image_encoder, loader, device)
+         if image_encoder.cache_dir is None:
+             print('Not caching because no cache directory was passed.')
+         else:
+             os.makedirs(cache_dir, exist_ok=True)
+             print(f'Caching data at {cache_dir}')
+             for name, val in data.items():
+                 torch.save(val, f'{cache_dir}/{name}.pt')
+     return data
+ 
+ 
+ class FeatureDataset(Dataset):
+     def __init__(self, is_train, image_encoder, dataset, device):
+         self.data = get_features(is_train, image_encoder, dataset, device)
+ 
+     def __len__(self):
+         return len(self.data['features'])
+ 
+     def __getitem__(self, idx):
+         data = {k: v[idx] for k, v in self.data.items()}
+         data['features'] = torch.from_numpy(data['features']).float()
+         return data
+ 
+ 
+ def get_dataloader(dataset, is_train, args, image_encoder=None):
+     if image_encoder is not None:
+         feature_dataset = FeatureDataset(is_train, image_encoder, dataset, args.device)
+         dataloader = DataLoader(feature_dataset, batch_size=args.batch_size, shuffle=is_train)
+     else:
+         dataloader = dataset.train_loader if is_train else dataset.test_loader
+     return dataloader
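The `maybe_dictionarize` helper above defines the batch contract the rest of the code relies on: tuples of length 2 or 3 are normalized to dicts, and dicts pass through untouched. Its logic can be exercised in isolation; the copy below repeats it so the example runs without torch:

```python
def maybe_dictionarize(batch):
    # Already a dict: pass through unchanged.
    if isinstance(batch, dict):
        return batch
    # (images, labels) or (images, labels, metadata) tuples become dicts.
    if len(batch) == 2:
        return {'images': batch[0], 'labels': batch[1]}
    if len(batch) == 3:
        return {'images': batch[0], 'labels': batch[1], 'metadata': batch[2]}
    raise ValueError(f'Unexpected number of elements: {len(batch)}')

print(maybe_dictionarize(('img', 0)))   # {'images': 'img', 'labels': 0}
print(maybe_dictionarize({'labels': 1}))  # {'labels': 1}
```

This is why `ImageFolderWithPaths.__getitem__` can return a dict while plain torchvision datasets return tuples: both shapes converge downstream.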
src/datasets/dtd.py ADDED
@@ -0,0 +1,34 @@
+ import os
+ import torch
+ import torchvision.datasets as datasets
+ 
+ 
+ class DTD:
+     def __init__(self,
+                  preprocess,
+                  location=os.path.expanduser('~/data'),
+                  batch_size=32,
+                  num_workers=16):
+         # Data loading code
+         traindir = os.path.join(location, 'dtd', 'train')
+         valdir = os.path.join(location, 'dtd', 'val')
+ 
+         self.train_dataset = datasets.ImageFolder(
+             traindir, transform=preprocess)
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             shuffle=True,
+             batch_size=batch_size,
+             num_workers=num_workers,
+         )
+ 
+         self.test_dataset = datasets.ImageFolder(valdir, transform=preprocess)
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=batch_size,
+             num_workers=num_workers
+         )
+         idx_to_class = dict((v, k)
+                             for k, v in self.train_dataset.class_to_idx.items())
+         self.classnames = [idx_to_class[i].replace(
+             '_', ' ') for i in range(len(idx_to_class))]
src/datasets/emnist.py ADDED
@@ -0,0 +1,74 @@
+ import os
+ 
+ import torch
+ 
+ import torchvision
+ import torchvision.datasets as datasets
+ 
+ 
+ def rotate_img(img):
+     return torchvision.transforms.functional.rotate(img, -90)
+ 
+ 
+ def flip_img(img):
+     return torchvision.transforms.functional.hflip(img)
+ 
+ 
+ def emnist_preprocess():
+     return torchvision.transforms.Compose(
+         [
+             rotate_img,
+             flip_img,
+         ]
+     )
+ 
+ 
+ class EMNIST:
+     def __init__(
+         self,
+         preprocess,
+         location,
+         batch_size=128,
+         num_workers=8,
+     ):
+         preprocess1 = emnist_preprocess()
+         preprocess = torchvision.transforms.Compose(
+             [
+                 preprocess,
+                 preprocess1,
+             ]
+         )
+         # if not os.path.exists(location):
+         #     os.makedirs(location, exist_ok=True)
+ 
+         self.train_dataset = datasets.EMNIST(
+             root=location,
+             download=True,
+             split="digits",
+             transform=preprocess,
+             train=True,
+         )
+ 
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             batch_size=batch_size,
+             shuffle=True,
+             num_workers=num_workers,
+         )
+ 
+         self.test_dataset = datasets.EMNIST(
+             root=location,
+             download=True,
+             split="digits",
+             transform=preprocess,
+             train=False,
+         )
+ 
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=32,
+             shuffle=False,
+             num_workers=num_workers,
+         )
+ 
+         self.classnames = self.train_dataset.classes
src/datasets/eurosat.py ADDED
@@ -0,0 +1,75 @@
+ import os
+ import torch
+ import torchvision.datasets as datasets
+ import re
+ 
+ def pretify_classname(classname):
+     l = re.findall(r'[A-Z](?:[a-z]+|[A-Z]*(?=[A-Z]|$))', classname)
+     l = [i.lower() for i in l]
+     out = ' '.join(l)
+     if out.endswith('al'):
+         return out + ' area'
+     return out
+ 
+ class EuroSATBase:
+     def __init__(self,
+                  preprocess,
+                  test_split,
+                  location='~/datasets',
+                  batch_size=32,
+                  num_workers=16):
+         # Data loading code
+         traindir = os.path.join(location, 'EuroSAT_splits', 'train')
+         testdir = os.path.join(location, 'EuroSAT_splits', test_split)
+ 
+         self.train_dataset = datasets.ImageFolder(traindir, transform=preprocess)
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             shuffle=True,
+             batch_size=batch_size,
+             num_workers=num_workers,
+         )
+ 
+         self.test_dataset = datasets.ImageFolder(testdir, transform=preprocess)
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=batch_size,
+             num_workers=num_workers
+         )
+         idx_to_class = dict((v, k)
+                             for k, v in self.train_dataset.class_to_idx.items())
+         self.classnames = [idx_to_class[i].replace('_', ' ') for i in range(len(idx_to_class))]
+         self.classnames = [pretify_classname(c) for c in self.classnames]
+         ours_to_open_ai = {
+             'annual crop': 'annual crop land',
+             'forest': 'forest',
+             'herbaceous vegetation': 'brushland or shrubland',
+             'highway': 'highway or road',
+             'industrial area': 'industrial buildings or commercial buildings',
+             'pasture': 'pasture land',
+             'permanent crop': 'permanent crop land',
+             'residential area': 'residential buildings or homes or apartments',
+             'river': 'river',
+             'sea lake': 'lake or sea',
+         }
+         for i in range(len(self.classnames)):
+             self.classnames[i] = ours_to_open_ai[self.classnames[i]]
+ 
+ 
+ class EuroSAT(EuroSATBase):
+     def __init__(self,
+                  preprocess,
+                  location='~/datasets',
+                  batch_size=32,
+                  num_workers=16):
+         super().__init__(preprocess, 'test', location, batch_size, num_workers)
+ 
+ 
+ class EuroSATVal(EuroSATBase):
+     def __init__(self,
+                  preprocess,
+                  location='~/datasets',
+                  batch_size=32,
+                  num_workers=16):
+         super().__init__(preprocess, 'val', location, batch_size, num_workers)
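The `pretify_classname` helper above splits CamelCase folder names into lower-cased words and appends "area" to names ending in "al" (so "Industrial" and "Residential" map onto the prompt vocabulary). A lightly cleaned, stand-alone copy of that logic:

```python
import re

# Same regex and suffix rule as pretify_classname in eurosat.py.
def pretify_classname(classname):
    parts = re.findall(r'[A-Z](?:[a-z]+|[A-Z]*(?=[A-Z]|$))', classname)
    out = ' '.join(p.lower() for p in parts)
    if out.endswith('al'):
        return out + ' area'
    return out

print(pretify_classname('AnnualCrop'))   # annual crop
print(pretify_classname('Industrial'))   # industrial area
print(pretify_classname('Residential'))  # residential area
```

The output strings are exactly the keys of `ours_to_open_ai`, which then maps them to the richer CLIP prompt phrases.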
src/datasets/gtsrb.py ADDED
@@ -0,0 +1,205 @@
+ import csv
+ import os
+ import pathlib
+ from typing import Any, Callable, Dict, List, Optional, Tuple
+ 
+ import numpy as np
+ import PIL
+ import torch
+ from torchvision.datasets.folder import make_dataset
+ from torchvision.datasets.utils import (download_and_extract_archive,
+                                         verify_str_arg)
+ from torchvision.datasets.vision import VisionDataset
+ 
+ def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
+     """Finds the class folders in a dataset.
+ 
+     See :class:`DatasetFolder` for details.
+     """
+     classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
+     if not classes:
+         raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
+ 
+     class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
+     return classes, class_to_idx
+ 
+ class PyTorchGTSRB(VisionDataset):
+     """`German Traffic Sign Recognition Benchmark (GTSRB) <https://benchmark.ini.rub.de/>`_ Dataset.
+ 
+     Modified from https://pytorch.org/vision/main/_modules/torchvision/datasets/gtsrb.html#GTSRB.
+ 
+     Args:
+         root (string): Root directory of the dataset.
+         split (string, optional): The dataset split, supports ``"train"`` (default), or ``"test"``.
+         transform (callable, optional): A function/transform that takes in a PIL image and returns a transformed
+             version. E.g., ``transforms.RandomCrop``.
+         target_transform (callable, optional): A function/transform that takes in the target and transforms it.
+         download (bool, optional): If True, downloads the dataset from the internet and
+             puts it in root directory. If dataset is already downloaded, it is not
+             downloaded again.
+     """
+ 
+     def __init__(
+         self,
+         root: str,
+         split: str = "train",
+         transform: Optional[Callable] = None,
+         target_transform: Optional[Callable] = None,
+         download: bool = False,
+     ) -> None:
+ 
+         super().__init__(root, transform=transform, target_transform=target_transform)
+ 
+         self._split = verify_str_arg(split, "split", ("train", "test"))
+         self._base_folder = pathlib.Path(root) / "gtsrb"
+         self._target_folder = (
+             self._base_folder / "GTSRB" / ("Training" if self._split == "train" else "Final_Test/Images")
+         )
+ 
+         if download:
+             self.download()
+ 
+         if not self._check_exists():
+             raise RuntimeError("Dataset not found. You can use download=True to download it")
+ 
+         if self._split == "train":
+             _, class_to_idx = find_classes(str(self._target_folder))
+             samples = make_dataset(str(self._target_folder), extensions=(".ppm",), class_to_idx=class_to_idx)
+         else:
+             with open(self._base_folder / "GT-final_test.csv") as csv_file:
+                 samples = [
+                     (str(self._target_folder / row["Filename"]), int(row["ClassId"]))
+                     for row in csv.DictReader(csv_file, delimiter=";", skipinitialspace=True)
+                 ]
+ 
+         self._samples = samples
+         self.transform = transform
+         self.target_transform = target_transform
+ 
+     def __len__(self) -> int:
+         return len(self._samples)
+ 
+     def __getitem__(self, index: int) -> Tuple[Any, Any]:
+ 
+         path, target = self._samples[index]
+         sample = PIL.Image.open(path).convert("RGB")
+ 
+         if self.transform is not None:
+             sample = self.transform(sample)
+ 
+         if self.target_transform is not None:
+             target = self.target_transform(target)
+ 
+         return sample, target
+ 
+     def _check_exists(self) -> bool:
+         return self._target_folder.is_dir()
+ 
+     def download(self) -> None:
+         if self._check_exists():
+             return
+ 
+         base_url = "https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/"
+ 
+         if self._split == "train":
+             download_and_extract_archive(
+                 f"{base_url}GTSRB-Training_fixed.zip",
+                 download_root=str(self._base_folder),
+                 md5="513f3c79a4c5141765e10e952eaa2478",
+             )
+         else:
+             download_and_extract_archive(
+                 f"{base_url}GTSRB_Final_Test_Images.zip",
+                 download_root=str(self._base_folder),
+                 md5="c7e4e6327067d32654124b0fe9e82185",
+             )
+             download_and_extract_archive(
+                 f"{base_url}GTSRB_Final_Test_GT.zip",
+                 download_root=str(self._base_folder),
+                 md5="fe31e9c9270bbcd7b84b7f21a9d9d9e5",
+             )
+ 
+ 
+ class GTSRB:
+     def __init__(self,
+                  preprocess,
+                  location=os.path.expanduser('~/data'),
+                  batch_size=128,
+                  num_workers=16):
+ 
+         # to fit with repo conventions for location
+         self.train_dataset = PyTorchGTSRB(
+             root=location,
+             download=True,
+             split='train',
+             transform=preprocess
+         )
+ 
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             batch_size=batch_size,
+             shuffle=True,
+             num_workers=num_workers
+         )
+ 
+         self.test_dataset = PyTorchGTSRB(
+             root=location,
+             download=True,
+             split='test',
+             transform=preprocess
+         )
+ 
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=batch_size,
+             shuffle=False,
+             num_workers=num_workers
+         )
+ 
+         # from https://github.com/openai/CLIP/blob/e184f608c5d5e58165682f7c332c3a8b4c1545f2/data/prompts.md
+         self.classnames = [
+             'red and white circle 20 kph speed limit',
+             'red and white circle 30 kph speed limit',
+             'red and white circle 50 kph speed limit',
+             'red and white circle 60 kph speed limit',
+             'red and white circle 70 kph speed limit',
+             'red and white circle 80 kph speed limit',
+             'end / de-restriction of 80 kph speed limit',
+             'red and white circle 100 kph speed limit',
+             'red and white circle 120 kph speed limit',
+             'red and white circle red car and black car no passing',
+             'red and white circle red truck and black car no passing',
+             'red and white triangle road intersection warning',
+             'white and yellow diamond priority road',
+             'red and white upside down triangle yield right-of-way',
+             'stop',
+             'empty red and white circle',
+             'red and white circle no truck entry',
+             'red circle with white horizonal stripe no entry',
+             'red and white triangle with exclamation mark warning',
+             'red and white triangle with black left curve approaching warning',
+             'red and white triangle with black right curve approaching warning',
+             'red and white triangle with black double curve approaching warning',
+             'red and white triangle rough / bumpy road warning',
+             'red and white triangle car skidding / slipping warning',
+             'red and white triangle with merging / narrow lanes warning',
+             'red and white triangle with person digging / construction / road work warning',
+             'red and white triangle with traffic light approaching warning',
+             'red and white triangle with person walking warning',
+             'red and white triangle with child and person walking warning',
+             'red and white triangle with bicyle warning',
+             'red and white triangle with snowflake / ice warning',
+             'red and white triangle with deer warning',
+             'white circle with gray strike bar no speed limit',
+             'blue circle with white right turn arrow mandatory',
+             'blue circle with white left turn arrow mandatory',
+             'blue circle with white forward arrow mandatory',
+             'blue circle with white forward or right turn arrow mandatory',
+             'blue circle with white forward or left turn arrow mandatory',
+             'blue circle with white keep right arrow mandatory',
+             'blue circle with white keep left arrow mandatory',
+             'blue circle with white arrows indicating a traffic circle',
+             'white circle with gray strike bar indicating no passing for cars has ended',
+             'white circle with gray strike bar indicating no passing for trucks has ended',
+         ]
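The GTSRB test split above is parsed from a semicolon-delimited `GT-final_test.csv` via `csv.DictReader`. That parsing step can be sketched on an in-memory example (the rows below are made up, not real GTSRB annotations):

```python
import csv
import io

# Hypothetical rows mimicking the GT-final_test.csv layout (semicolon-delimited).
fake_csv = "Filename;ClassId\n00000.ppm;16\n00001.ppm;1\n"

# Same comprehension shape as PyTorchGTSRB's test branch, minus the path prefix.
samples = [
    (row["Filename"], int(row["ClassId"]))
    for row in csv.DictReader(io.StringIO(fake_csv), delimiter=";", skipinitialspace=True)
]
print(samples)  # [('00000.ppm', 16), ('00001.ppm', 1)]
```

`DictReader` keys each row by the header line, so the code stays robust to extra columns in the real annotation file.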
src/datasets/imagenet.py ADDED
@@ -0,0 +1,253 @@
+ import os
+ import torch
+ 
+ from .common import ImageFolderWithPaths, SubsetSampler
+ import numpy as np
+ 
+ 
+ imagenet_classnames = [
+     "tench", "goldfish", "great white shark", "tiger shark", "hammerhead shark", "electric ray",
+     "stingray", "rooster", "hen", "ostrich", "brambling", "goldfinch", "house finch", "junco",
+     "indigo bunting", "American robin", "bulbul", "jay", "magpie", "chickadee", "American dipper",
+     "kite (bird of prey)", "bald eagle", "vulture", "great grey owl", "fire salamander",
+     "smooth newt", "newt", "spotted salamander", "axolotl", "American bullfrog", "tree frog",
+     "tailed frog", "loggerhead sea turtle", "leatherback sea turtle", "mud turtle", "terrapin",
+     "box turtle", "banded gecko", "green iguana", "Carolina anole",
+     "desert grassland whiptail lizard", "agama", "frilled-necked lizard", "alligator lizard",
+     "Gila monster", "European green lizard", "chameleon", "Komodo dragon", "Nile crocodile",
+     "American alligator", "triceratops", "worm snake", "ring-necked snake",
+     "eastern hog-nosed snake", "smooth green snake", "kingsnake", "garter snake", "water snake",
+     "vine snake", "night snake", "boa constrictor", "African rock python", "Indian cobra",
+     "green mamba", "sea snake", "Saharan horned viper", "eastern diamondback rattlesnake",
+     "sidewinder rattlesnake", "trilobite", "harvestman", "scorpion", "yellow garden spider",
+     "barn spider", "European garden spider", "southern black widow", "tarantula", "wolf spider",
+     "tick", "centipede", "black grouse", "ptarmigan", "ruffed grouse", "prairie grouse", "peafowl",
+     "quail", "partridge", "african grey parrot", "macaw", "sulphur-crested cockatoo", "lorikeet",
+     "coucal", "bee eater", "hornbill", "hummingbird", "jacamar", "toucan", "duck",
+     "red-breasted merganser", "goose", "black swan", "tusker", "echidna", "platypus", "wallaby",
+     "koala", "wombat", "jellyfish", "sea anemone", "brain coral", "flatworm", "nematode", "conch",
+     "snail", "slug", "sea slug", "chiton", "chambered nautilus", "Dungeness crab", "rock crab",
+     "fiddler crab", "red king crab", "American lobster", "spiny lobster", "crayfish", "hermit crab",
+     "isopod", "white stork", "black stork", "spoonbill", "flamingo", "little blue heron",
+     "great egret", "bittern bird", "crane bird", "limpkin", "common gallinule", "American coot",
+     "bustard", "ruddy turnstone", "dunlin", "common redshank", "dowitcher", "oystercatcher",
+     "pelican", "king penguin", "albatross", "grey whale", "killer whale", "dugong", "sea lion",
+     "Chihuahua", "Japanese Chin", "Maltese", "Pekingese", "Shih Tzu", "King Charles Spaniel",
+     "Papillon", "toy terrier", "Rhodesian Ridgeback", "Afghan Hound", "Basset Hound", "Beagle",
+     "Bloodhound", "Bluetick Coonhound", "Black and Tan Coonhound", "Treeing Walker Coonhound",
+     "English foxhound", "Redbone Coonhound", "borzoi", "Irish Wolfhound", "Italian Greyhound",
+     "Whippet", "Ibizan Hound", "Norwegian Elkhound", "Otterhound", "Saluki", "Scottish Deerhound",
+     "Weimaraner", "Staffordshire Bull Terrier", "American Staffordshire Terrier",
+     "Bedlington Terrier", "Border Terrier", "Kerry Blue Terrier", "Irish Terrier",
+     "Norfolk Terrier", "Norwich Terrier", "Yorkshire Terrier", "Wire Fox Terrier",
+     "Lakeland Terrier", "Sealyham Terrier", "Airedale Terrier", "Cairn Terrier",
+     "Australian Terrier", "Dandie Dinmont Terrier", "Boston Terrier", "Miniature Schnauzer",
+     "Giant Schnauzer", "Standard Schnauzer", "Scottish Terrier", "Tibetan Terrier",
+     "Australian Silky Terrier", "Soft-coated Wheaten Terrier", "West Highland White Terrier",
+     "Lhasa Apso", "Flat-Coated Retriever", "Curly-coated Retriever", "Golden Retriever",
+     "Labrador Retriever", "Chesapeake Bay Retriever", "German Shorthaired Pointer", "Vizsla",
+     "English Setter", "Irish Setter", "Gordon Setter", "Brittany dog", "Clumber Spaniel",
+     "English Springer Spaniel", "Welsh Springer Spaniel", "Cocker Spaniel", "Sussex Spaniel",
+     "Irish Water Spaniel", "Kuvasz", "Schipperke", "Groenendael dog", "Malinois", "Briard",
+     "Australian Kelpie", "Komondor", "Old English Sheepdog", "Shetland Sheepdog", "collie",
+     "Border Collie", "Bouvier des Flandres dog", "Rottweiler", "German Shepherd Dog", "Dobermann",
+     "Miniature Pinscher", "Greater Swiss Mountain Dog", "Bernese Mountain Dog",
+     "Appenzeller Sennenhund", "Entlebucher Sennenhund", "Boxer", "Bullmastiff", "Tibetan Mastiff",
+     "French Bulldog", "Great Dane", "St. Bernard", "husky", "Alaskan Malamute", "Siberian Husky",
+     "Dalmatian", "Affenpinscher", "Basenji", "pug", "Leonberger", "Newfoundland dog",
+     "Great Pyrenees dog", "Samoyed", "Pomeranian", "Chow Chow", "Keeshond", "brussels griffon",
+     "Pembroke Welsh Corgi", "Cardigan Welsh Corgi", "Toy Poodle", "Miniature Poodle",
+     "Standard Poodle", "Mexican hairless dog (xoloitzcuintli)", "grey wolf", "Alaskan tundra wolf",
+     "red wolf or maned wolf", "coyote", "dingo", "dhole", "African wild dog", "hyena", "red fox",
+     "kit fox", "Arctic fox", "grey fox", "tabby cat", "tiger cat", "Persian cat", "Siamese cat",
+     "Egyptian Mau", "cougar", "lynx", "leopard", "snow leopard", "jaguar", "lion", "tiger",
+     "cheetah", "brown bear", "American black bear", "polar bear", "sloth bear", "mongoose",
+     "meerkat", "tiger beetle", "ladybug", "ground beetle", "longhorn beetle", "leaf beetle",
+     "dung beetle", "rhinoceros beetle", "weevil", "fly", "bee", "ant", "grasshopper",
+     "cricket insect", "stick insect", "cockroach", "praying mantis", "cicada", "leafhopper",
+     "lacewing", "dragonfly", "damselfly", "red admiral butterfly", "ringlet butterfly",
+     "monarch butterfly", "small white butterfly", "sulphur butterfly", "gossamer-winged butterfly",
+     "starfish", "sea urchin", "sea cucumber", "cottontail rabbit", "hare", "Angora rabbit",
+     "hamster", "porcupine", "fox squirrel", "marmot", "beaver", "guinea pig", "common sorrel horse",
+     "zebra", "pig", "wild boar", "warthog", "hippopotamus", "ox", "water buffalo", "bison",
+     "ram (adult male sheep)", "bighorn sheep", "Alpine ibex", "hartebeest", "impala (antelope)",
+     "gazelle", "arabian camel", "llama", "weasel", "mink", "European polecat",
+     "black-footed ferret", "otter", "skunk", "badger", "armadillo", "three-toed sloth", "orangutan",
+     "gorilla", "chimpanzee", "gibbon", "siamang", "guenon", "patas monkey", "baboon", "macaque",
+     "langur", "black-and-white colobus", "proboscis monkey", "marmoset", "white-headed capuchin",
+     "howler monkey", "titi monkey", "Geoffroy's spider monkey", "common squirrel monkey",
+     "ring-tailed lemur", "indri", "Asian elephant", "African bush elephant", "red panda",
+     "giant panda", "snoek fish", "eel", "silver salmon", "rock beauty fish", "clownfish",
+     "sturgeon", "gar fish", "lionfish", "pufferfish", "abacus", "abaya", "academic gown",
+     "accordion", "acoustic guitar", "aircraft carrier", "airliner", "airship", "altar", "ambulance",
+     "amphibious vehicle", "analog clock", "apiary", "apron", "trash can", "assault rifle",
+     "backpack", "bakery", "balance beam", "balloon", "ballpoint pen", "Band-Aid", "banjo",
+     "baluster / handrail", "barbell", "barber chair", "barbershop", "barn", "barometer", "barrel",
+     "wheelbarrow", "baseball", "basketball", "bassinet", "bassoon", "swimming cap", "bath towel",
+     "bathtub", "station wagon", "lighthouse", "beaker", "military hat (bearskin or shako)",
+     "beer bottle", "beer glass", "bell tower", "baby bib", "tandem bicycle", "bikini",
+     "ring binder", "binoculars", "birdhouse", "boathouse", "bobsleigh", "bolo tie", "poke bonnet",
+     "bookcase", "bookstore", "bottle cap", "hunting bow", "bow tie", "brass memorial plaque", "bra",
+     "breakwater", "breastplate", "broom", "bucket", "buckle", "bulletproof vest",
+     "high-speed train", "butcher shop", "taxicab", "cauldron", "candle", "cannon", "canoe",
+     "can opener", "cardigan", "car mirror", "carousel", "tool kit", "cardboard box / carton",
+     "car wheel", "automated teller machine", "cassette", "cassette player", "castle", "catamaran",
+     "CD player", "cello", "mobile phone", "chain", "chain-link fence", "chain mail", "chainsaw",
+     "storage chest", "chiffonier", "bell or wind chime", "china cabinet", "Christmas stocking",
+     "church", "movie theater", "cleaver", "cliff dwelling", "cloak", "clogs", "cocktail shaker",
+     "coffee mug", "coffeemaker", "spiral or coil", "combination lock", "computer keyboard",
+     "candy store", "container ship", "convertible", "corkscrew", "cornet", "cowboy boot",
+     "cowboy hat", "cradle", "construction crane", "crash helmet", "crate", "infant bed",
101
+ "Crock Pot", "croquet ball", "crutch", "cuirass", "dam", "desk", "desktop computer",
102
+ "rotary dial telephone", "diaper", "digital clock", "digital watch", "dining table",
103
+ "dishcloth", "dishwasher", "disc brake", "dock", "dog sled", "dome", "doormat", "drilling rig",
104
+ "drum", "drumstick", "dumbbell", "Dutch oven", "electric fan", "electric guitar",
105
+ "electric locomotive", "entertainment center", "envelope", "espresso machine", "face powder",
106
+ "feather boa", "filing cabinet", "fireboat", "fire truck", "fire screen", "flagpole", "flute",
107
+ "folding chair", "football helmet", "forklift", "fountain", "fountain pen", "four-poster bed",
108
+ "freight car", "French horn", "frying pan", "fur coat", "garbage truck",
109
+ "gas mask or respirator", "gas pump", "goblet", "go-kart", "golf ball", "golf cart", "gondola",
110
+ "gong", "gown", "grand piano", "greenhouse", "radiator grille", "grocery store", "guillotine",
111
+ "hair clip", "hair spray", "half-track", "hammer", "hamper", "hair dryer", "hand-held computer",
112
+ "handkerchief", "hard disk drive", "harmonica", "harp", "combine harvester", "hatchet",
113
+ "holster", "home theater", "honeycomb", "hook", "hoop skirt", "gymnastic horizontal bar",
114
+ "horse-drawn vehicle", "hourglass", "iPod", "clothes iron", "carved pumpkin", "jeans", "jeep",
115
+ "T-shirt", "jigsaw puzzle", "rickshaw", "joystick", "kimono", "knee pad", "knot", "lab coat",
116
+ "ladle", "lampshade", "laptop computer", "lawn mower", "lens cap", "letter opener", "library",
117
+ "lifeboat", "lighter", "limousine", "ocean liner", "lipstick", "slip-on shoe", "lotion",
118
+ "music speaker", "loupe magnifying glass", "sawmill", "magnetic compass", "messenger bag",
119
+ "mailbox", "tights", "one-piece bathing suit", "manhole cover", "maraca", "marimba", "mask",
120
+ "matchstick", "maypole", "maze", "measuring cup", "medicine cabinet", "megalith", "microphone",
121
+ "microwave oven", "military uniform", "milk can", "minibus", "miniskirt", "minivan", "missile",
122
+ "mitten", "mixing bowl", "mobile home", "ford model t", "modem", "monastery", "monitor",
123
+ "moped", "mortar and pestle", "graduation cap", "mosque", "mosquito net", "vespa",
124
+ "mountain bike", "tent", "computer mouse", "mousetrap", "moving van", "muzzle", "metal nail",
125
+ "neck brace", "necklace", "baby pacifier", "notebook computer", "obelisk", "oboe", "ocarina",
126
+ "odometer", "oil filter", "pipe organ", "oscilloscope", "overskirt", "bullock cart",
127
+ "oxygen mask", "product packet / packaging", "paddle", "paddle wheel", "padlock", "paintbrush",
128
+ "pajamas", "palace", "pan flute", "paper towel", "parachute", "parallel bars", "park bench",
129
+ "parking meter", "railroad car", "patio", "payphone", "pedestal", "pencil case",
130
+ "pencil sharpener", "perfume", "Petri dish", "photocopier", "plectrum", "Pickelhaube",
131
+ "picket fence", "pickup truck", "pier", "piggy bank", "pill bottle", "pillow", "ping-pong ball",
132
+ "pinwheel", "pirate ship", "drink pitcher", "block plane", "planetarium", "plastic bag",
133
+ "plate rack", "farm plow", "plunger", "Polaroid camera", "pole", "police van", "poncho",
134
+ "pool table", "soda bottle", "plant pot", "potter's wheel", "power drill", "prayer rug",
135
+ "printer", "prison", "missile", "projector", "hockey puck", "punching bag", "purse", "quill",
136
+ "quilt", "race car", "racket", "radiator", "radio", "radio telescope", "rain barrel",
137
+ "recreational vehicle", "fishing casting reel", "reflex camera", "refrigerator",
138
+ "remote control", "restaurant", "revolver", "rifle", "rocking chair", "rotisserie", "eraser",
139
+ "rugby ball", "ruler measuring stick", "sneaker", "safe", "safety pin", "salt shaker", "sandal",
140
+ "sarong", "saxophone", "scabbard", "weighing scale", "school bus", "schooner", "scoreboard",
141
+ "CRT monitor", "screw", "screwdriver", "seat belt", "sewing machine", "shield", "shoe store",
142
+ "shoji screen / room divider", "shopping basket", "shopping cart", "shovel", "shower cap",
143
+ "shower curtain", "ski", "balaclava ski mask", "sleeping bag", "slide rule", "sliding door",
144
+ "slot machine", "snorkel", "snowmobile", "snowplow", "soap dispenser", "soccer ball", "sock",
145
+ "solar thermal collector", "sombrero", "soup bowl", "keyboard space bar", "space heater",
146
+ "space shuttle", "spatula", "motorboat", "spider web", "spindle", "sports car", "spotlight",
147
+ "stage", "steam locomotive", "through arch bridge", "steel drum", "stethoscope", "scarf",
148
+ "stone wall", "stopwatch", "stove", "strainer", "tram", "stretcher", "couch", "stupa",
149
+ "submarine", "suit", "sundial", "sunglasses", "sunglasses", "sunscreen", "suspension bridge",
150
+ "mop", "sweatshirt", "swim trunks / shorts", "swing", "electrical switch", "syringe",
151
+ "table lamp", "tank", "tape player", "teapot", "teddy bear", "television", "tennis ball",
152
+ "thatched roof", "front curtain", "thimble", "threshing machine", "throne", "tile roof",
153
+ "toaster", "tobacco shop", "toilet seat", "torch", "totem pole", "tow truck", "toy store",
154
+ "tractor", "semi-trailer truck", "tray", "trench coat", "tricycle", "trimaran", "tripod",
155
+ "triumphal arch", "trolleybus", "trombone", "hot tub", "turnstile", "typewriter keyboard",
156
+ "umbrella", "unicycle", "upright piano", "vacuum cleaner", "vase", "vaulted or arched ceiling",
157
+ "velvet fabric", "vending machine", "vestment", "viaduct", "violin", "volleyball",
158
+ "waffle iron", "wall clock", "wallet", "wardrobe", "military aircraft", "sink",
159
+ "washing machine", "water bottle", "water jug", "water tower", "whiskey jug", "whistle",
160
+ "hair wig", "window screen", "window shade", "Windsor tie", "wine bottle", "airplane wing",
161
+ "wok", "wooden spoon", "wool", "split-rail fence", "shipwreck", "sailboat", "yurt", "website",
162
+ "comic book", "crossword", "traffic or street sign", "traffic light", "dust jacket", "menu",
163
+ "plate", "guacamole", "consomme", "hot pot", "trifle", "ice cream", "popsicle", "baguette",
164
+ "bagel", "pretzel", "cheeseburger", "hot dog", "mashed potatoes", "cabbage", "broccoli",
165
+ "cauliflower", "zucchini", "spaghetti squash", "acorn squash", "butternut squash", "cucumber",
166
+ "artichoke", "bell pepper", "cardoon", "mushroom", "Granny Smith apple", "strawberry", "orange",
167
+ "lemon", "fig", "pineapple", "banana", "jackfruit", "cherimoya (custard apple)", "pomegranate",
168
+ "hay", "carbonara", "chocolate syrup", "dough", "meatloaf", "pizza", "pot pie", "burrito",
169
+ "red wine", "espresso", "tea cup", "eggnog", "mountain", "bubble", "cliff", "coral reef",
170
+ "geyser", "lakeshore", "promontory", "sandbar", "beach", "valley", "volcano", "baseball player",
171
+ "bridegroom", "scuba diver", "rapeseed", "daisy", "yellow lady's slipper", "corn", "acorn",
172
+ "rose hip", "horse chestnut seed", "coral fungus", "agaric", "gyromitra", "stinkhorn mushroom",
173
+ "earth star fungus", "hen of the woods mushroom", "bolete", "corn cob", "toilet paper"
174
+ ]
175
+
176
+ class ImageNet:
177
+ def __init__(self,
178
+ preprocess,
179
+ location='~/data',
180
+ batch_size=32,
181
+ num_workers=32):
182
+ self.preprocess = preprocess
183
+ self.location = '/path/ImageNet2012/' # TODO
184
+ self.batch_size = batch_size
185
+ self.num_workers = num_workers
186
+ self.classnames = imagenet_classnames
187
+
188
+ self.populate_train()
189
+ self.populate_test()
190
+
191
+ def populate_train(self):
192
+ traindir = os.path.join(self.location, 'train')
193
+ self.train_dataset = ImageFolderWithPaths(
194
+ traindir,
195
+ transform=self.preprocess)
196
+ sampler = self.get_train_sampler()
197
+ kwargs = {'shuffle' : True} if sampler is None else {}
198
+ self.train_loader = torch.utils.data.DataLoader(
199
+ self.train_dataset,
200
+ sampler=sampler,
201
+ batch_size=self.batch_size,
202
+ num_workers=self.num_workers,
203
+ **kwargs,
204
+ )
205
+
206
+ def populate_test(self):
207
+ self.test_dataset = self.get_test_dataset()
208
+ self.test_loader = torch.utils.data.DataLoader(
209
+ self.test_dataset,
210
+ batch_size=self.batch_size,
211
+ num_workers=self.num_workers,
212
+ sampler=self.get_test_sampler()
213
+ )
214
+
215
+ def get_test_path(self):
216
+ test_path = os.path.join(self.location, 'val_dir')
217
+ if not os.path.exists(test_path):
218
+ test_path = os.path.join(self.location,'val')
219
+ return test_path
220
+
221
+ def get_train_sampler(self):
222
+ return None
223
+
224
+ def get_test_sampler(self):
225
+ return None
226
+
227
+ def get_test_dataset(self):
228
+ return ImageFolderWithPaths(self.get_test_path(), transform=self.preprocess)
229
+
230
+ def name(self):
231
+ return 'imagenet'
232
+
233
+ class ImageNetTrain(ImageNet):
234
+
235
+ def get_test_dataset(self):
236
+ pass
237
+
238
+ class ImageNetK(ImageNet):
239
+
240
+ def get_train_sampler(self):
241
+ idxs = np.zeros(len(self.train_dataset.targets))
242
+ target_array = np.array(self.train_dataset.targets)
243
+ for c in range(1000):
244
+ m = target_array == c
245
+ n = len(idxs[m])
246
+ arr = np.zeros(n)
247
+ arr[:self.k()] = 1
248
+ np.random.shuffle(arr)
249
+ idxs[m] = arr
250
+
251
+ idxs = idxs.astype('int')
252
+ sampler = SubsetSampler(np.where(idxs)[0])
253
+ return sampler
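
`ImageNetK.get_train_sampler` builds a k-shot subset by, for each of the 1000 classes, marking `self.k()` entries of a 0/1 mask and shuffling it within that class. The same per-class subsampling can be sketched in pure Python (the `k_shot_indices` helper and the toy `targets` list below are illustrative, not part of this repo):

```python
import random

def k_shot_indices(targets, k, num_classes, seed=0):
    # Mirror the sampler's idea: for each class, randomly keep
    # at most k of its example indices.
    rng = random.Random(seed)
    chosen = []
    for c in range(num_classes):
        class_idxs = [i for i, t in enumerate(targets) if t == c]
        rng.shuffle(class_idxs)   # random k-shot subset within the class
        chosen.extend(class_idxs[:k])
    return sorted(chosen)

targets = [0, 1, 0, 1, 0, 2]      # 3 classes, unbalanced
subset = k_shot_indices(targets, k=2, num_classes=3)
print(subset)
```

In the real class the chosen indices are wrapped in a `SubsetSampler` and handed to the `DataLoader`, which is why `populate_train` only passes `shuffle=True` when no sampler is set.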
src/datasets/kmnist.py ADDED
@@ -0,0 +1,39 @@
+ import os
+
+ import torch
+ import torchvision.datasets as datasets
+
+
+ class KMNIST:
+     def __init__(
+         self,
+         preprocess,
+         location=os.path.expanduser("~/data"),
+         batch_size=128,
+         num_workers=6,
+     ):
+
+         location = os.path.join(location, "KMNIST")
+         self.train_dataset = datasets.KMNIST(
+             root=location, download=True, train=True, transform=preprocess
+         )
+
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             batch_size=batch_size,
+             shuffle=True,
+             num_workers=num_workers,
+         )
+
+         self.test_dataset = datasets.KMNIST(
+             root=location, download=True, train=False, transform=preprocess
+         )
+
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=batch_size,
+             shuffle=False,
+             num_workers=num_workers,
+         )
+
+         self.classnames = self.train_dataset.classes
src/datasets/mnist.py ADDED
@@ -0,0 +1,41 @@
+ import os
+ import torch
+ import torchvision.datasets as datasets
+
+ class MNIST:
+     def __init__(self,
+                  preprocess,
+                  location=os.path.expanduser('~/data'),
+                  batch_size=128,
+                  num_workers=16):
+
+
+         self.train_dataset = datasets.MNIST(
+             root=location,
+             download=True,
+             train=True,
+             transform=preprocess
+         )
+
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             batch_size=batch_size,
+             shuffle=True,
+             num_workers=num_workers
+         )
+
+         self.test_dataset = datasets.MNIST(
+             root=location,
+             download=True,
+             train=False,
+             transform=preprocess
+         )
+
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=batch_size,
+             shuffle=False,
+             num_workers=num_workers
+         )
+
+         self.classnames = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
src/datasets/oxfordpets.py ADDED
@@ -0,0 +1,38 @@
+ import os
+ import torch
+ import torchvision.datasets as datasets
+
+
+ class OxfordIIITPet:
+     def __init__(
+         self,
+         preprocess,
+         location=os.path.expanduser("~/data"),
+         batch_size=128,
+         num_workers=6,
+     ):
+
+         location = os.path.join(location, "OxfordIIITPet")
+         self.train_dataset = datasets.OxfordIIITPet(
+             root=location, download=True, split="trainval", transform=preprocess
+         )
+
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             batch_size=batch_size,
+             shuffle=True,
+             num_workers=num_workers,
+         )
+
+         self.test_dataset = datasets.OxfordIIITPet(
+             root=location, download=True, split="test", transform=preprocess
+         )
+
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=batch_size,
+             shuffle=False,
+             num_workers=num_workers,
+         )
+
+         self.classnames = self.train_dataset.classes
src/datasets/registry.py ADDED
@@ -0,0 +1,103 @@
+ import sys
+ import inspect
+ import random
+ import torch
+ import copy
+
+ from torch.utils.data.dataset import random_split
+
+ from src.datasets.cars import Cars
+ from src.datasets.cifar10 import CIFAR10
+ from src.datasets.cifar100 import CIFAR100
+ from src.datasets.dtd import DTD
+ from src.datasets.eurosat import EuroSAT, EuroSATVal
+ from src.datasets.gtsrb import GTSRB
+ from src.datasets.imagenet import ImageNet
+ from src.datasets.mnist import MNIST
+ from src.datasets.resisc45 import RESISC45
+ from src.datasets.stl10 import STL10
+ from src.datasets.svhn import SVHN
+ from src.datasets.sun397 import SUN397
+ from src.datasets.emnist import EMNIST
+ from src.datasets.kmnist import KMNIST
+ from src.datasets.oxfordpets import OxfordIIITPet
+
+ registry = {
+     name: obj for name, obj in inspect.getmembers(sys.modules[__name__], inspect.isclass)
+ }
+
+
+ class GenericDataset(object):
+     def __init__(self):
+         self.train_dataset = None
+         self.train_loader = None
+         self.test_dataset = None
+         self.test_loader = None
+         self.classnames = None
+
+
+ def split_train_into_train_val(dataset, new_dataset_class_name, batch_size, num_workers, val_fraction, max_val_samples=None, seed=0):
+     assert val_fraction > 0. and val_fraction < 1.
+     total_size = len(dataset.train_dataset)
+     val_size = int(total_size * val_fraction)
+     if max_val_samples is not None:
+         val_size = min(val_size, max_val_samples)
+     train_size = total_size - val_size
+
+     assert val_size > 0
+     assert train_size > 0
+
+     lengths = [train_size, val_size]
+
+     trainset, valset = random_split(
+         dataset.train_dataset,
+         lengths,
+         generator=torch.Generator().manual_seed(seed)
+     )
+     if new_dataset_class_name == 'MNISTVal':
+         assert trainset.indices[0] == 36044
+
+
+     new_dataset = None
+
+     new_dataset_class = type(new_dataset_class_name, (GenericDataset, ), {})
+     new_dataset = new_dataset_class()
+
+     new_dataset.train_dataset = trainset
+     new_dataset.train_loader = torch.utils.data.DataLoader(
+         new_dataset.train_dataset,
+         shuffle=True,
+         batch_size=batch_size,
+         num_workers=num_workers,
+     )
+
+     new_dataset.test_dataset = valset
+     new_dataset.test_loader = torch.utils.data.DataLoader(
+         new_dataset.test_dataset,
+         batch_size=batch_size,
+         num_workers=num_workers
+     )
+
+     new_dataset.classnames = copy.copy(dataset.classnames)
+
+     return new_dataset
+
+
+ def get_dataset(dataset_name, preprocess, location, batch_size=128, num_workers=16, val_fraction=0.1, max_val_samples=5000):
+     if dataset_name.endswith('Val'):
+         # Handle val splits
+         if dataset_name in registry:
+             dataset_class = registry[dataset_name]
+         else:
+             base_dataset_name = dataset_name.split('Val')[0]
+             base_dataset = get_dataset(base_dataset_name, preprocess, location, batch_size, num_workers)
+             dataset = split_train_into_train_val(
+                 base_dataset, dataset_name, batch_size, num_workers, val_fraction, max_val_samples)
+             return dataset
+     else:
+         assert dataset_name in registry, f'Unsupported dataset: {dataset_name}. Supported datasets: {list(registry.keys())}'
+     dataset_class = registry[dataset_name]
+     dataset = dataset_class(
+         preprocess, location=location, batch_size=batch_size, num_workers=num_workers
+     )
+     return dataset
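
`split_train_into_train_val` sizes the validation split as a fraction of the whole training set, capped by `max_val_samples`, and asserts both halves are non-empty. That sizing arithmetic can be checked in isolation (the `val_split_sizes` helper below is a hypothetical extraction for illustration, not a function in this repo):

```python
def val_split_sizes(total_size, val_fraction, max_val_samples=None):
    # Same sizing rule as split_train_into_train_val:
    # val is a fraction of the whole, optionally capped at max_val_samples.
    assert 0. < val_fraction < 1.
    val_size = int(total_size * val_fraction)
    if max_val_samples is not None:
        val_size = min(val_size, max_val_samples)
    train_size = total_size - val_size
    assert train_size > 0 and val_size > 0
    return train_size, val_size

# MNIST-sized example with the get_dataset defaults (0.1, cap 5000):
print(val_split_sizes(60000, 0.1, max_val_samples=5000))  # (55000, 5000)
```

With the defaults used by `get_dataset` (`val_fraction=0.1`, `max_val_samples=5000`), the cap binds for any training set larger than 50,000 examples, which is why MNIST's val split lands at exactly 5,000.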
src/datasets/resisc45.py ADDED
@@ -0,0 +1,304 @@
+ import os
+ import torch
+
+ import abc
+ import os
+ from typing import Any, Callable, Dict, Optional, Tuple
+
+ import numpy as np
+ import torch
+ from torch import Tensor
+ from torch.utils.data import Dataset
+ from torchvision.datasets import ImageFolder
+ from torchvision.datasets.folder import default_loader as pil_loader
+
+
+ # modified from: https://github.com/microsoft/torchgeo
+ class VisionDataset(Dataset[Dict[str, Any]], abc.ABC):
+     """Abstract base class for datasets lacking geospatial information.
+     This base class is designed for datasets with pre-defined image chips.
+     """
+
+     @abc.abstractmethod
+     def __getitem__(self, index: int) -> Dict[str, Any]:
+         """Return an index within the dataset.
+         Args:
+             index: index to return
+         Returns:
+             data and labels at that index
+         Raises:
+             IndexError: if index is out of range of the dataset
+         """
+
+     @abc.abstractmethod
+     def __len__(self) -> int:
+         """Return the length of the dataset.
+         Returns:
+             length of the dataset
+         """
+
+     def __str__(self) -> str:
+         """Return the informal string representation of the object.
+         Returns:
+             informal string representation
+         """
+         return f"""\
+ {self.__class__.__name__} Dataset
+     type: VisionDataset
+     size: {len(self)}"""
+
+
+ class VisionClassificationDataset(VisionDataset, ImageFolder):
+     """Abstract base class for classification datasets lacking geospatial information.
+     This base class is designed for datasets with pre-defined image chips which
+     are separated into separate folders per class.
+     """
+
+     def __init__(
+         self,
+         root: str,
+         transforms: Optional[Callable[[Dict[str, Tensor]], Dict[str, Tensor]]] = None,
+         loader: Optional[Callable[[str], Any]] = pil_loader,
+         is_valid_file: Optional[Callable[[str], bool]] = None,
+     ) -> None:
+         """Initialize a new VisionClassificationDataset instance.
+         Args:
+             root: root directory where dataset can be found
+             transforms: a function/transform that takes input sample and its target as
+                 entry and returns a transformed version
+             loader: a callable function which takes as input a path to an image and
+                 returns a PIL Image or numpy array
+             is_valid_file: A function that takes the path of an Image file and checks if
+                 the file is a valid file
+         """
+         # When transform & target_transform are None, ImageFolder.__getitem__(index)
+         # returns a PIL.Image and int for image and label, respectively
+         super().__init__(
+             root=root,
+             transform=None,
+             target_transform=None,
+             loader=loader,
+             is_valid_file=is_valid_file,
+         )
+
+         # Must be set after calling super().__init__()
+         self.transforms = transforms
+
+     def __getitem__(self, index: int) -> Dict[str, Tensor]:
+         """Return an index within the dataset.
+         Args:
+             index: index to return
+         Returns:
+             data and label at that index
+         """
+         image, label = self._load_image(index)
+
+         if self.transforms is not None:
+             return self.transforms(image), label
+
+         return image, label
+
+     def __len__(self) -> int:
+         """Return the number of data points in the dataset.
+         Returns:
+             length of the dataset
+         """
+         return len(self.imgs)
+
+     def _load_image(self, index: int) -> Tuple[Tensor, Tensor]:
+         """Load a single image and its class label.
+         Args:
+             index: index to return
+         Returns:
+             the image
+             the image class label
+         """
+         img, label = ImageFolder.__getitem__(self, index)
+         label = torch.tensor(label)
+         return img, label
+
+
+ class RESISC45Dataset(VisionClassificationDataset):
+     """RESISC45 dataset.
+     The `RESISC45 <http://www.escience.cn/people/JunweiHan/NWPU-RESISC45.html>`_
+     dataset is a dataset for remote sensing image scene classification.
+     Dataset features:
+     * 31,500 images with 0.2-30 m per pixel resolution (256x256 px)
+     * three spectral bands - RGB
+     * 45 scene classes, 700 images per class
+     * images extracted from Google Earth from over 100 countries
+     * image conditions with high variability (resolution, weather, illumination)
+     Dataset format:
+     * images are three-channel jpgs
+     Dataset classes:
+     0. airplane
+     1. airport
+     2. baseball_diamond
+     3. basketball_court
+     4. beach
+     5. bridge
+     6. chaparral
+     7. church
+     8. circular_farmland
+     9. cloud
+     10. commercial_area
+     11. dense_residential
+     12. desert
+     13. forest
+     14. freeway
+     15. golf_course
+     16. ground_track_field
+     17. harbor
+     18. industrial_area
+     19. intersection
+     20. island
+     21. lake
+     22. meadow
+     23. medium_residential
+     24. mobile_home_park
+     25. mountain
+     26. overpass
+     27. palace
+     28. parking_lot
+     29. railway
+     30. railway_station
+     31. rectangular_farmland
+     32. river
+     33. roundabout
+     34. runway
+     35. sea_ice
+     36. ship
+     37. snowberg
+     38. sparse_residential
+     39. stadium
+     40. storage_tank
+     41. tennis_court
+     42. terrace
+     43. thermal_power_station
+     44. wetland
+     This dataset uses the train/val/test splits defined in the "In-domain representation
+     learning for remote sensing" paper:
+     * https://arxiv.org/abs/1911.06721
+     If you use this dataset in your research, please cite the following paper:
+     * https://doi.org/10.1109/jproc.2017.2675998
+     """
+
+     # url = "https://drive.google.com/file/d/1DnPSU5nVSN7xv95bpZ3XQ0JhKXZOKgIv"
+     # md5 = "d824acb73957502b00efd559fc6cfbbb"
+     # filename = "NWPU-RESISC45.rar"
+     directory = "resisc45/NWPU-RESISC45"
+
+     splits = ["train", "val", "test"]
+     split_urls = {
+         "train": "https://storage.googleapis.com/remote_sensing_representations/resisc45-train.txt",  # noqa: E501
+         "val": "https://storage.googleapis.com/remote_sensing_representations/resisc45-val.txt",  # noqa: E501
+         "test": "https://storage.googleapis.com/remote_sensing_representations/resisc45-test.txt",  # noqa: E501
+     }
+     split_md5s = {
+         "train": "b5a4c05a37de15e4ca886696a85c403e",
+         "val": "a0770cee4c5ca20b8c32bbd61e114805",
+         "test": "3dda9e4988b47eb1de9f07993653eb08",
+     }
+     classes = [
+         "airplane",
+         "airport",
+         "baseball_diamond",
+         "basketball_court",
+         "beach",
+         "bridge",
+         "chaparral",
+         "church",
+         "circular_farmland",
+         "cloud",
+         "commercial_area",
+         "dense_residential",
+         "desert",
+         "forest",
+         "freeway",
+         "golf_course",
+         "ground_track_field",
+         "harbor",
+         "industrial_area",
+         "intersection",
+         "island",
+         "lake",
+         "meadow",
+         "medium_residential",
+         "mobile_home_park",
+         "mountain",
+         "overpass",
+         "palace",
+         "parking_lot",
+         "railway",
+         "railway_station",
+         "rectangular_farmland",
+         "river",
+         "roundabout",
+         "runway",
+         "sea_ice",
+         "ship",
+         "snowberg",
+         "sparse_residential",
+         "stadium",
+         "storage_tank",
+         "tennis_court",
+         "terrace",
+         "thermal_power_station",
+         "wetland",
+     ]
+
+     def __init__(
+         self,
+         root: str = "data",
+         split: str = "train",
+         transforms: Optional[Callable[[Dict[str, Tensor]], Dict[str, Tensor]]] = None,
+     ) -> None:
+         """Initialize a new RESISC45 dataset instance.
+         Args:
+             root: root directory where dataset can be found
+             split: one of "train", "val", or "test"
+             transforms: a function/transform that takes input sample and its target as
+                 entry and returns a transformed version
+         """
+         assert split in self.splits
+         self.root = root
+
+         valid_fns = set()
+         with open(os.path.join(self.root, "resisc45", f"resisc45-{split}.txt")) as f:
+             for fn in f:
+                 valid_fns.add(fn.strip())
+         is_in_split: Callable[[str], bool] = lambda x: os.path.basename(x) in valid_fns
+
+         super().__init__(
+             root=os.path.join(root, self.directory),
+             transforms=transforms,
+             is_valid_file=is_in_split,
+         )
+
+
+ class RESISC45:
+     def __init__(self,
+                  preprocess,
+                  location=os.path.expanduser('~/data'),
+                  batch_size=32,
+                  num_workers=16):
+
+         self.train_dataset = RESISC45Dataset(root=location, split='train', transforms=preprocess)
+         self.train_loader = torch.utils.data.DataLoader(
+             self.train_dataset,
+             shuffle=True,
+             batch_size=batch_size,
+             num_workers=num_workers,
+         )
+
+         self.test_dataset = RESISC45Dataset(root=location, split='test', transforms=preprocess)
+         self.test_loader = torch.utils.data.DataLoader(
+             self.test_dataset,
+             batch_size=batch_size,
+             num_workers=num_workers
+         )
+
+         # class names have _ so split on this for better zero-shot head
+         self.classnames = [' '.join(c.split('_')) for c in RESISC45Dataset.classes]
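
The last line of the RESISC45 wrapper replaces underscores with spaces so the class names read as natural phrases in a zero-shot text head's prompts. A minimal standalone illustration of that normalization (the three-element `classes` sample below is just a subset of the 45 for demonstration):

```python
# Underscore-separated folder names -> natural-language class names,
# as done for RESISC45's zero-shot classification head.
classes = ["baseball_diamond", "thermal_power_station", "airplane"]
classnames = [' '.join(c.split('_')) for c in classes]
print(classnames)  # ['baseball diamond', 'thermal power station', 'airplane']
```

Names without underscores (like "airplane") pass through unchanged, since `split('_')` returns the whole string as a single element.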