SkySense++

SkySense++ is a semantic-enhanced multi-modal remote sensing foundation model for Earth observation. It fuses high-resolution optical imagery (HR), Sentinel-2 (S2), and Sentinel-1 SAR (S1) through independent backbones, an optional modality-completion VAE, and a shared transformer fusion encoder.

Primary use: representation extraction. The pretrained backbones produce feature representations for downstream tasks (classification, segmentation, regression). Extract features_hr, features_s2, features_s1, or features_fusion and feed them to a task-specific head. Because the checkpoint contains backbone weights only, the head must be trained on your target dataset before the model is usable for prediction. See the main SkySensePlusPlus repository for pretraining, 1-shot, and fine-tuning workflows.

Model Metadata

Attribute	Value
Model type	Multi-modal segmentation (HR + S2 + S1)
Paper	SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation
Publication	Nature Machine Intelligence, 2025
License	Apache-2.0
Input modalities	High-resolution optical, Sentinel-2, Sentinel-1
Output	Semantic segmentation (65 classes)
Checkpoint contents	Backbone weights only; segmentation head not pretrained
HR input size	512×512
S2/S1 patch size	16×16

Model Variants

Variant	Path	Sources	Use Modal VAE	Description
full (default)	`.`	hr, s2, s1	Yes	All three modalities with VAE completion
hr	`hr/`	hr	No	High-resolution optical only
s2	`s2/`	s2	No	Sentinel-2 only
s1	`s1/`	s1	No	Sentinel-1 only
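
The variant table maps each name to a subfolder of the model directory. A minimal sketch of resolving that path is shown below; `VARIANT_SUBFOLDERS` and `variant_path` are hypothetical names used for illustration and are not part of the repository.

```python
from pathlib import Path

# Variant name -> subfolder, copied from the Model Variants table.
VARIANT_SUBFOLDERS = {"full": ".", "hr": "hr", "s2": "s2", "s1": "s1"}


def variant_path(model_root: str, variant: str = "full") -> Path:
    """Return the directory to pass to AutoModel.from_pretrained for a variant."""
    if variant not in VARIANT_SUBFOLDERS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANT_SUBFOLDERS)}"
        )
    # pathlib collapses the "." component, so "full" resolves to the root itself.
    return Path(model_root) / VARIANT_SUBFOLDERS[variant]
```

The returned path can be converted with `str(...)` and passed to `AutoModel.from_pretrained(..., trust_remote_code=True)` as in the usage examples.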

Repository structure (full variant, diffusers layout)

.
├── config.json
├── model.safetensors
├── modality_vae/                      # VAE subfolder (diffusers standard)
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── modeling_skysensepp.py
├── configuration_skysensepp.py
├── pipeline_skysensepp.py
├── sky_sensepp_impl/                  # ModalityCompletionVAE, ModalityCompletionVAEPipeline in necks/
└── hr/, s2/, s1/                      # Single-modality variants

The VAE loads automatically from the modality_vae/ subfolder. A legacy modality_vae.safetensors file at the repository root is also supported. To migrate a legacy checkpoint to the subfolder layout, run: python tools/split_vae_from_checkpoint.py --model-dir path/to/model --migrate
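
To see which VAE layout a local checkpoint uses before deciding whether to migrate, a sketch like the following inspects the directory. The function name `detect_vae_layout` is hypothetical, and the file names are taken from the repository structure above; the repository's own loader may resolve files differently.

```python
from pathlib import Path


def detect_vae_layout(model_dir: str) -> str:
    """Classify a local model directory as 'subfolder', 'legacy', or 'missing'."""
    root = Path(model_dir)
    subfolder = root / "modality_vae"
    # Diffusers layout: modality_vae/config.json plus the safetensors weights.
    if (subfolder / "config.json").is_file() and (
        subfolder / "diffusion_pytorch_model.safetensors"
    ).is_file():
        return "subfolder"
    # Legacy layout: a single modality_vae.safetensors at the repository root.
    if (root / "modality_vae.safetensors").is_file():
        return "legacy"
    return "missing"
```

A result of `"legacy"` suggests the checkpoint is a candidate for the migration command above.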

Installation

pip install transformers torch safetensors diffusers

The modality VAE is built on the diffusers VQModel class. Legacy checkpoints (ConvVQVAEv2) load through a backward-compatible fallback.

Usage

Diffusers-style loading and inference

The VAE follows the diffusers layout: its weights live in a modality_vae/ subfolder containing config.json and diffusion_pytorch_model.safetensors. Load the full model and run inference like this:

import torch
from transformers import AutoModel

# Load full model (VAE auto-loads from modality_vae/ subfolder, diffusers-style)
model = AutoModel.from_pretrained("path/to/SkySensepp", trust_remote_code=True)
model = model.eval().to("cuda")

# Prepare inputs
hr_img = torch.randn(1, 3, 512, 512, device="cuda")
s2_img = torch.randn(1, 10, 2, 256, 256, device="cuda")  # B, 10 bands, S steps, H, W
s1_img = torch.randn(1, 2, 2, 256, 256, device="cuda")    # B, 2 bands, S steps, H, W
modalities = torch.ones(1, 3, dtype=torch.bool, device="cuda")  # [hr, s2, s1] present

# Inference
with torch.no_grad():
    out = model(
        hr_img=hr_img,
        s2_img=s2_img,
        s1_img=s1_img,
        modality_flag_hr=modalities[:, :1],
        modality_flag_s2=modalities[:, 1:2],
        modality_flag_s1=modalities[:, 2:],
        return_features=True,
    )

features_fusion = out["features_fusion"]
logits_hr = out.get("logits_hr")

Load VAE component only (diffusers-style)

import torch
from sky_sensepp_impl.necks import ModalityCompletionVAE

# Load VAE from subfolder (same pattern as diffusers Stable Diffusion VAE)
vae = ModalityCompletionVAE.from_pretrained(
    "path/to/SkySensepp",
    subfolder="modality_vae",
)
vae = vae.eval().to("cuda")

# Run modality completion on backbone features (e.g. 2816-d, 16×16)
feat_hr = torch.randn(1, 2816, 16, 16, device="cuda")
feat_s2 = torch.randn(1, 2816, 16, 16, device="cuda")
feat_s1 = torch.randn(1, 2816, 16, 16, device="cuda")
modality_info = torch.ones(1, 3, dtype=torch.bool, device="cuda")

with torch.no_grad():
    out = vae(feat_hr, feat_s2, feat_s1, modality_info)

hr_out = out["hr_out"]
s2_out = out["s2_out"]
s1_out = out["s1_out"]

ModalityCompletionVAEPipeline (modular, diffusers-style)

from sky_sensepp_impl.necks import ModalityCompletionVAE, ModalityCompletionVAEPipeline

# Load pipeline (VAE from modality_vae/ subfolder)
pipe = ModalityCompletionVAEPipeline.from_pretrained(
    "path/to/SkySensepp",
    subfolder="modality_vae",
)
pipe = pipe.to("cuda")

# Inference on features (feat_hr, feat_s2, feat_s1, and modality_info
# are defined as in the VAE-only example above)
out = pipe(
    feat_hr=feat_hr,
    feat_s2=feat_s2,
    feat_s1=feat_s1,
    modality_info=modality_info,
)
hr_out, s2_out, s1_out = out["hr_out"], out["s2_out"], out["s1_out"]

# Modular: inject custom VAE
custom_vae = ModalityCompletionVAE.from_pretrained("path/to/custom_vae")
pipe = ModalityCompletionVAEPipeline.from_pretrained("path/to/SkySensepp", vae=custom_vae)

# Or swap components after load
pipe.register_components(vae=custom_vae)

Load model and attach VAE manually

from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/SkySensepp", trust_remote_code=True)
model.load_vae(
    pretrained_model_name_or_path="path/to/SkySensepp",
    subfolder="modality_vae",
)

Variants (single-modality, no VAE)

from transformers import AutoModel

# Each variant is a separate model; load only the one you need.
hr_model = AutoModel.from_pretrained("path/to/SkySensepp/hr", trust_remote_code=True)
s2_model = AutoModel.from_pretrained("path/to/SkySensepp/s2", trust_remote_code=True)
s1_model = AutoModel.from_pretrained("path/to/SkySensepp/s1", trust_remote_code=True)

Representation shapes (HR-only)

Output	Shape	Description
`features_hr[i]`	multi-scale	Backbone features at 4 scales (stage 0–3)
`features_fusion`	`(B, 1024, H, W)`	Fused spatial representation for downstream head
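
Since the checkpoint ships without a pretrained head, a downstream head has to be attached to these features. The following is a minimal sketch of a per-pixel linear classifier on `features_fusion`, not the head used in the paper. The class name `LinearSegHead` is hypothetical, the random tensor stands in for a real model output, and its 16×16 spatial size is an assumption; the 1024 channels and 65 classes come from the tables in this card.

```python
import torch
from torch import nn


class LinearSegHead(nn.Module):
    """Hypothetical per-pixel classifier on top of features_fusion."""

    def __init__(self, in_channels: int = 1024, num_classes: int = 65):
        super().__init__()
        # A 1x1 convolution is a linear classifier applied at every location.
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, features: torch.Tensor, out_size: tuple[int, int]) -> torch.Tensor:
        logits = self.classifier(features)
        # Upsample the coarse logits to the resolution of the labels.
        return nn.functional.interpolate(
            logits, size=out_size, mode="bilinear", align_corners=False
        )


# Stand-in for out["features_fusion"]; the spatial size is an assumption.
features_fusion = torch.randn(1, 1024, 16, 16)
head = LinearSegHead()
logits = head(features_fusion, out_size=(128, 128))
print(logits.shape)  # torch.Size([1, 65, 128, 128])
```

In a fine-tuning loop, `logits` would typically be compared against label maps with a cross-entropy loss; whether to freeze the backbone is a design choice this sketch leaves open.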

Input Formats

Modality	Shape	Description
hr_img	`(B, 3, H, W)`	RGB high-res, H=W=512 typical
s2_img	`(B, 10, S, H, W)`	Sentinel-2, 10 bands, S time steps
s1_img	`(B, 2, S, H, W)`	Sentinel-1 VV/VH, S time steps
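
Shape mistakes, such as a missing time axis on Sentinel inputs, are easy to make with these layouts. The sketch below checks shapes against the table before calling the model. `EXPECTED` and `check_input_shapes` are hypothetical helpers, not part of the repository, and the check covers only channel count, rank, and batch size.

```python
from typing import Mapping, Optional, Sequence

# Expected (channel count, number of dimensions) per input, from the table above.
EXPECTED = {
    "hr_img": (3, 4),   # (B, 3, H, W)
    "s2_img": (10, 5),  # (B, 10, S, H, W)
    "s1_img": (2, 5),   # (B, 2, S, H, W)
}


def check_input_shapes(shapes: Mapping[str, Optional[Sequence[int]]]) -> list[str]:
    """Return a list of problems; an empty list means the shapes look consistent."""
    problems = []
    batch_sizes = set()
    for name, shape in shapes.items():
        if name not in EXPECTED:
            problems.append(f"{name}: unknown input name")
            continue
        if shape is None:
            continue  # modality not provided
        channels, ndim = EXPECTED[name]
        if len(shape) != ndim:
            problems.append(f"{name}: expected {ndim} dims, got {len(shape)}")
            continue
        if shape[1] != channels:
            problems.append(f"{name}: expected {channels} channels, got {shape[1]}")
        batch_sizes.add(shape[0])
    if len(batch_sizes) > 1:
        problems.append(f"batch sizes differ: {sorted(batch_sizes)}")
    return problems
```

With PyTorch tensors, pass `tensor.shape` for each input, for example `check_input_shapes({"hr_img": hr_img.shape, "s2_img": s2_img.shape, "s1_img": s1_img.shape})`.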

Citation

@article{skysensepp2025,
  title={SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation},
  journal={Nature Machine Intelligence},
  year={2025},
  url={https://www.nature.com/articles/s42256-025-01078-8}
}
