LR-ASPP MobileNetV3-Large: LiteRT (on-device semantic segmentation, fully on GPU)

Lite R-ASPP with a MobileNetV3-Large backbone (torchvision lraspp_mobilenet_v3_large, COCO-VOC, 21 classes), converted to LiteRT and running entirely on the GPU through the CompiledModel API (ML Drift) on Android. It is a pure-CNN, real-time semantic segmentation model: it labels every pixel as one of 21 PASCAL VOC classes (person, dog, car, chair, …).

LR-ASPP: input | segmentation (on-device LiteRT GPU)

On-device (Pixel 8a, Tensor G3, verified)

nodes on GPU: 242 / 242 LITERT_CL (full residency)
inference: ~5 ms (512×512)
size: 6.7 MB (fp16)
accuracy: device-vs-PyTorch corr 0.99998, argmax agreement 99.85%
image[1,3,512,512] (ImageNet-normalized) → [GPU: MobileNetV3 + Lite R-ASPP] → logits[1,512,512,21]
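The input layout above can be prepared with a small helper. This is a minimal sketch, not the author's preprocessing code: the function name toNormalizedChw is hypothetical, the mean/std values are the standard ImageNet statistics used by torchvision's pretrained weights, and the image is assumed to be resized to 512×512 already and delivered as ARGB_8888 integers (for example from Android's Bitmap.getPixels).

```kotlin
// Standard ImageNet channel statistics (RGB order), as used by torchvision weights.
val IMAGENET_MEAN = floatArrayOf(0.485f, 0.456f, 0.406f)
val IMAGENET_STD = floatArrayOf(0.229f, 0.224f, 0.225f)

/**
 * Packs row-major ARGB_8888 pixels into a normalized NCHW float array of
 * shape [1, 3, height, width]. Hypothetical helper, for illustration only.
 */
fun toNormalizedChw(argb: IntArray, width: Int, height: Int): FloatArray {
    require(argb.size == width * height) { "pixel count does not match width * height" }
    val plane = width * height
    val chw = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = argb[i]
        // Extract 8-bit channels and scale them to [0, 1].
        val r = ((p shr 16) and 0xFF) / 255f
        val g = ((p shr 8) and 0xFF) / 255f
        val b = (p and 0xFF) / 255f
        // Channel-first layout: all R values, then all G, then all B.
        chw[i] = (r - IMAGENET_MEAN[0]) / IMAGENET_STD[0]
        chw[plane + i] = (g - IMAGENET_MEAN[1]) / IMAGENET_STD[1]
        chw[2 * plane + i] = (b - IMAGENET_MEAN[2]) / IMAGENET_STD[2]
    }
    return chw
}
```

For a 512×512 image the returned array has 3 × 512 × 512 = 786,432 floats, which matches the [1,3,512,512] input shape.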

Usage (Android, LiteRT CompiledModel)

import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

val model = CompiledModel.create(modelPath, CompiledModel.Options(Accelerator.GPU), null)
val input = model.createInputBuffers()
val output = model.createOutputBuffers()
input[0].writeFloat(chw)              // [1,3,512,512] ImageNet-normalized, NCHW
model.run(input, output)
val logits = output[0].readFloat()    // [1,512,512,21] NHWC; argmax per pixel for the class map
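The per-pixel argmax mentioned in the last comment can be sketched as follows. The function name argmaxClassMap is hypothetical. The label list is torchvision's standard VOC category order; that the converted model keeps this order is an assumption, not something the card states.

```kotlin
// torchvision's VOC category order (index 0 is background).
// Assumption: the converted model preserves this class ordering.
val VOC_LABELS = listOf(
    "__background__", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"
)

/**
 * Per-pixel argmax over NHWC logits of shape [1, height, width, numClasses].
 * Returns a row-major class map of size height * width.
 */
fun argmaxClassMap(
    logits: FloatArray,
    height: Int,
    width: Int,
    numClasses: Int = 21
): IntArray {
    require(logits.size == height * width * numClasses) { "unexpected logits size" }
    val classMap = IntArray(height * width)
    for (pixel in 0 until height * width) {
        // NHWC: the class scores of one pixel are contiguous in memory.
        val base = pixel * numClasses
        var best = 0
        for (c in 1 until numClasses) {
            if (logits[base + c] > logits[base + best]) best = c
        }
        classMap[pixel] = best
    }
    return classMap
}
```

With the output above, argmaxClassMap(logits, 512, 512) yields one class index per pixel, and VOC_LABELS[classMap[y * 512 + x]] gives the label name at (x, y) under the ordering assumption.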

How it converts (litert-torch)

Pure CNN, so a single re-authoring suffices: the MobileNetV3 Squeeze-Excite blocks and the Lite R-ASPP scale branch use AdaptiveAvgPool2d(1) (global average pool), and each is replaced with mean(3).mean(2), that is, two single-axis means, because a single multi-axis pool is mis-computed on the Mali delegate. Everything else is already GPU-clean (Hardswish/Hardsigmoid → native HARD_SWISH, align_corners=False). Result: banned ops NONE, all tensors ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 0.99998 (≈1.0).
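The reason the replacement is safe is that averaging over W and then over H gives the same value as averaging over both spatial axes at once. The sketch below only illustrates that arithmetic on a flat NCHW array; it is not the conversion code (which is PyTorch), and both function names are hypothetical.

```kotlin
/** Global average pool as two single-axis means: over W (axis 3), then over H (axis 2). */
fun globalAvgPoolTwoStep(x: FloatArray, n: Int, c: Int, h: Int, w: Int): FloatArray {
    require(x.size == n * c * h * w) { "unexpected tensor size" }
    // Step 1: mean over axis 3 (W), giving shape [n, c, h].
    val rowMeans = FloatArray(n * c * h)
    for (i in rowMeans.indices) {
        var sum = 0f
        for (j in 0 until w) sum += x[i * w + j]
        rowMeans[i] = sum / w
    }
    // Step 2: mean over axis 2 (H), giving shape [n, c].
    val out = FloatArray(n * c)
    for (i in out.indices) {
        var sum = 0f
        for (j in 0 until h) sum += rowMeans[i * h + j]
        out[i] = sum / h
    }
    return out
}

/** Reference: one mean over both spatial axes at once. */
fun globalAvgPoolDirect(x: FloatArray, n: Int, c: Int, h: Int, w: Int): FloatArray {
    require(x.size == n * c * h * w) { "unexpected tensor size" }
    val area = h * w
    return FloatArray(n * c) { i ->
        var sum = 0f
        for (j in 0 until area) sum += x[i * area + j]
        sum / area
    }
}
```

Both functions return one value per (batch, channel) pair and agree up to floating-point rounding, so the two-step form changes only how the reduction is expressed to the GPU delegate, not the result.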

License

BSD-3-Clause (torchvision). Upstream: pytorch/vision lraspp_mobilenet_v3_large.
