
PTv3-DALES: Point Transformer V3 for Aerial LiDAR Semantic Segmentation

A Point Transformer V3 (PTv3) model trained from scratch on the DALES aerial LiDAR dataset for 9-class semantic segmentation of airborne point clouds.

Model Description

Existing pre-trained point cloud models (e.g., from Open3D-ML) were trained on indoor scenes (S3DIS, ScanNet) or autonomous driving data (SemanticKITTI, nuScenes). These models fail on aerial LiDAR due to fundamental domain mismatch: different viewpoint (top-down vs. street-level), scale (kilometres vs. rooms), and class semantics (ground/buildings/vegetation vs. walls/chairs). This model addresses the gap by training PTv3 directly on aerial LiDAR data.

  • Model architecture: Point Transformer V3 (m1_base variant) with U-Net style encoder-decoder
  • Parameters: ~46M
  • Input: XYZ coordinates + normalised return number (4 channels)
  • Output: Per-point class probabilities (9 classes)
  • Training framework: PyTorch + Pointcept

Training Details

Dataset

DALES (Dayton Aerial LiDAR Data Set) comprises 40 aerial LiDAR tiles (~500 m x 500 m each) from urban and suburban areas:

  • Train: 25 tiles (2,567 blocks after spatial splitting)
  • Val: 4 tiles (411 blocks, 15% held-out from train split)
  • Test: 11 tiles
  • Preprocessing: Tiles split into 50 m x 50 m non-overlapping blocks with mean-centred coordinates
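
The preprocessing above can be sketched as follows. This is an illustrative NumPy version, not the repository's preprocessing script; the function name and return layout are hypothetical.

```python
import numpy as np

def split_into_blocks(xyz, block_size=50.0):
    """Group points into non-overlapping 50 m x 50 m XY cells,
    returning each block with mean-centred coordinates."""
    cells = np.floor(xyz[:, :2] / block_size).astype(np.int64)
    blocks = {}
    for key in {tuple(c) for c in cells}:
        mask = np.all(cells == key, axis=1)
        pts = xyz[mask]
        blocks[key] = pts - pts.mean(axis=0)  # mean-centred coordinates
    return blocks

rng = np.random.default_rng(0)
tile = rng.uniform(0, 100, size=(1000, 3))  # toy 100 m x 100 m "tile"
blocks = split_into_blocks(tile)            # -> four 50 m x 50 m blocks
```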

Classes

| ID | Class       | Train distribution   |
|----|-------------|----------------------|
| 0  | Unknown     | Ignored (weight = 0) |
| 1  | Ground      | Dominant             |
| 2  | Vegetation  | Dominant             |
| 3  | Cars        | Rare                 |
| 4  | Trucks      | Very rare            |
| 5  | Power lines | Rare                 |
| 6  | Fences      | Rare                 |
| 7  | Poles       | Very rare            |
| 8  | Buildings   | Common               |

Hyperparameters

| Parameter             | Value                              |
|-----------------------|------------------------------------|
| Epochs                | 100                                |
| Optimizer             | AdamW                              |
| Learning rate         | 0.0005                             |
| Scheduler             | OneCycleLR (10% warmup)            |
| Effective batch size  | 16 (batch = 1 x accumulation = 16) |
| Loss                  | Cross-entropy + Lovasz softmax     |
| Gradient clipping     | max_norm = 1.0                     |
| Grid size (voxel)     | 0.15 m                             |
| Max points per sample | 40,000                             |
| Mixed precision       | Disabled (fp32 only)               |
| Class weighting       | Inverse-frequency                  |
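
The inverse-frequency weighting with the ignored "unknown" class can be sketched like this (an illustrative implementation; the function name and the normalisation around a mean of 1 are assumptions, not the repo's exact code):

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes=9, ignore_index=0):
    """Per-class loss weights proportional to 1/frequency;
    class 0 ("unknown") is forced to zero so the loss ignores it."""
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    freq = counts / counts.sum()
    weights = np.where(freq > 0, 1.0 / np.maximum(freq, 1e-12), 0.0)
    weights[ignore_index] = 0.0                       # unknown: weight = 0
    weights /= weights[weights > 0].mean()            # normalise around 1
    return weights

# Toy example: 90% ground (1), 10% buildings (8)
labels = np.array([1] * 90 + [8] * 10)
w = inverse_frequency_weights(labels)  # rare class gets the larger weight
```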

Architecture

| Component            | Configuration                      |
|----------------------|------------------------------------|
| Serialisation orders | z, z-trans, hilbert, hilbert-trans |
| Encoder depths       | (2, 2, 2, 6, 2)                    |
| Encoder channels     | (32, 64, 128, 256, 512)            |
| Encoder heads        | (2, 4, 8, 16, 32)                  |
| Decoder depths       | (2, 2, 2, 2)                       |
| Decoder channels     | (64, 64, 128, 256)                 |
| Decoder heads        | (4, 4, 8, 16)                      |
| Patch size           | 1024                               |
| MLP ratio            | 4                                  |
| Drop path            | 0.3                                |
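
As a rough sketch, the table above maps onto Pointcept-style PTv3 keyword arguments roughly as below. Argument names follow Pointcept's `PointTransformerV3` signature and should be treated as assumptions; check them against your installed Pointcept version.

```python
# Hedged sketch: PTv3 m1_base-style configuration mirroring the table above.
ptv3_kwargs = dict(
    in_channels=4,                      # XYZ + normalised return number
    order=("z", "z-trans", "hilbert", "hilbert-trans"),
    enc_depths=(2, 2, 2, 6, 2),
    enc_channels=(32, 64, 128, 256, 512),
    enc_num_head=(2, 4, 8, 16, 32),
    enc_patch_size=(1024,) * 5,         # patch size 1024 at every stage
    dec_depths=(2, 2, 2, 2),
    dec_channels=(64, 64, 128, 256),
    dec_num_head=(4, 4, 8, 16),
    dec_patch_size=(1024,) * 4,
    mlp_ratio=4,
    drop_path=0.3,
)
```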

Compute

  • GPU: NVIDIA Tesla V100-PCIE-32GB (single GPU)
  • Training time: 27.5 hours (~16.5 min/epoch)
  • Precision: fp32 (fp16 AMP causes NaN on V100 due to attention softmax overflow)
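
The fp16 overflow failure mode is easy to reproduce in isolation: float16 tops out near 65504, so `exp()` of moderately large attention logits overflows to inf, and a naive softmax then divides inf by inf. A minimal NumPy illustration (not the model's attention code):

```python
import numpy as np

logits = np.array([12.0, 12.0, 3.0], dtype=np.float16)

# Naive softmax: exp(12) ≈ 1.6e5 exceeds the float16 max (~65504) -> inf,
# and inf / inf -> nan. This mirrors the AMP failure described above.
naive = np.exp(logits) / np.exp(logits).sum()

# Max-subtracted softmax keeps every exponent <= 0, so nothing overflows.
shifted = logits - logits.max()
stable = np.exp(shifted) / np.exp(shifted).sum()
```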

Results

Test Set (26.2M points, 11 tiles)

| Class       | IoU    | Precision | Recall | Support    |
|-------------|--------|-----------|--------|------------|
| Ground      | 95.69% | 97.41%    | 98.18% | 12,065,396 |
| Vegetation  | 90.01% | 95.80%    | 93.71% | 8,691,162  |
| Buildings   | 93.73% | 97.57%    | 95.98% | 4,880,561  |
| Power lines | 88.63% | 96.11%    | 91.92% | 73,735     |
| Cars        | 71.40% | 86.05%    | 80.74% | 273,015    |
| Poles       | 57.48% | 66.48%    | 80.94% | 29,046     |
| Fences      | 37.76% | 41.06%    | 82.43% | 172,857    |
| Trucks      | 19.60% | 28.54%    | 38.49% | 38,279     |

| Metric           | Score  |
|------------------|--------|
| Overall Accuracy | 95.88% |
| mIoU (8 classes) | 69.29% |
| Best epoch       | 83     |
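
The headline mIoU is the unweighted mean of the eight per-class IoUs (class 0 "unknown" is excluded), which a quick check reproduces:

```python
# Per-class test-set IoUs from the table above (percent).
per_class_iou = {
    "Ground": 95.69, "Vegetation": 90.01, "Buildings": 93.73,
    "Power lines": 88.63, "Cars": 71.40, "Poles": 57.48,
    "Fences": 37.76, "Trucks": 19.60,
}
miou = sum(per_class_iou.values()) / len(per_class_iou)  # ≈ 69.29
```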

Validation Set (8.5M points, 4 tiles)

| Metric           | Score  |
|------------------|--------|
| Overall Accuracy | 92.14% |
| mIoU (8 classes) | 63.46% |

Usage

Evaluate on DALES

# Download the checkpoint (Python)
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="jayakumarpujar/Ptv3",
    filename="base_model_ptv3_dales_.pth",
)
print(ckpt_path)  # pass this path to the evaluation script below

# Evaluate (shell; set CKPT_PATH to the path printed above)
python scripts/train_ptv3_dales.py \
    --data_root data/dales_ptv3 \
    --eval_only \
    --checkpoint "$CKPT_PATH" \
    --no_amp

Fine-tune on custom aerial LiDAR data

# 1. Preprocess your LAS files (same 9-class scheme or remap labels)
python scripts/preprocess_dales_ptv3.py \
    --input_dir your_las_data \
    --output_dir data/your_dataset

# 2. Fine-tune from pre-trained checkpoint
python scripts/train_ptv3_dales.py \
    --data_root data/your_dataset \
    --resume $CKPT_PATH \
    --epochs 50 --lr 0.0001 \
    --no_amp
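
If your labels do not already follow the 9-class scheme, they need remapping during preprocessing. A minimal sketch (the source IDs below use the standard ASPRS LAS classification codes purely as an example; adapt the mapping to your data, and note that unmapped classes fall back to 0 = "unknown", which the loss ignores):

```python
# Hypothetical remap: ASPRS LAS classification codes -> DALES class IDs.
DALES_REMAP = {
    2: 1,   # ASPRS "ground"          -> ground
    5: 2,   # ASPRS "high vegetation" -> vegetation
    6: 8,   # ASPRS "building"        -> buildings
}

def remap_labels(labels):
    """Map source label IDs to DALES IDs; unknown sources -> 0 (ignored)."""
    return [DALES_REMAP.get(label, 0) for label in labels]
```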

Checkpoint Contents

The .pth file contains:

{
    "epoch": int,
    "model_state_dict": OrderedDict,     # Full model weights
    "optimizer_state_dict": OrderedDict,  # AdamW state (for resuming)
    "scheduler_state_dict": dict,         # OneCycleLR state
    "scaler_state_dict": dict,            # GradScaler state
    "best_miou": float,
    "best_epoch": int,
    "num_classes": 9,
    "class_names": {0: "unknown", 1: "ground", ...},
}
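
For inference or fine-tuning you typically only need the model weights out of that dict (e.g. to pass to `model.load_state_dict`). A small hypothetical helper, demonstrated on a stand-in dict rather than the real checkpoint:

```python
def extract_model_weights(ckpt):
    """Pull just the model weights from a checkpoint with the layout above,
    sanity-checking that it carries the 9-class DALES head."""
    assert ckpt["num_classes"] == 9, "expected the 9-class DALES scheme"
    return ckpt["model_state_dict"]

# Stand-in for torch.load(ckpt_path, map_location="cpu")
fake_ckpt = {
    "epoch": 83,
    "model_state_dict": {"head.weight": [0.0]},
    "num_classes": 9,
}
weights = extract_model_weights(fake_ckpt)
```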

Dependencies

  • PyTorch >= 2.0
  • Pointcept (with compiled CUDA pointops extension)
  • NumPy >= 1.24
  • laspy >= 2.5

Limitations

  • Trucks (19.6% IoU) and fences (37.8% IoU) have poor performance due to very low representation in the training data and geometric ambiguity with other classes.
  • Requires a CUDA GPU: Pointcept's serialisation-based attention relies on custom CUDA kernels.
  • V100 GPUs must use fp32 (--no_amp); fp16 mixed precision causes NaN in the attention softmax. Ampere+ GPUs (A100, H100) can use AMP and flash attention.
  • Trained on DALES (Canadian urban/suburban scenes). Performance on other geographic regions or landscapes may vary.

Citation

@inproceedings{wu2024ptv3,
    title={Point Transformer V3: Simpler, Faster, Stronger},
    author={Wu, Xiaoyang and Jiang, Li and Wang, Peng-Shuai and Liu, Zhijian and Liu, Xihui and Qiao, Yu and Ouyang, Wanli and He, Tong and Zhao, Hengshuang},
    booktitle={CVPR},
    year={2024}
}

@inproceedings{varney2020dales,
    title={DALES: A Large-scale Aerial LiDAR Data Set for Semantic Segmentation},
    author={Varney, Nina and Asari, Vijayan K. and Graehling, Quinn},
    booktitle={CVPRW},
    year={2020}
}
