LR-ASPP MobileNetV3-Large — LiteRT (on-device semantic segmentation, fully-GPU)

Lite R-ASPP with a MobileNetV3-Large backbone (torchvision lraspp_mobilenet_v3_large, COCO-VOC 21 classes), converted to LiteRT and running entirely on the GPU through the CompiledModel API (ML Drift) on Android. It is a pure-CNN real-time semantic segmentation model that labels every pixel as one of 21 PASCAL-VOC classes: background plus 20 object categories (person, dog, car, chair, …).

On-device (Pixel 8a, Tensor G3 — verified)


Metric	Value
Nodes on GPU	242 / 242 LITERT_CL (full residency)
Inference	~5 ms (512×512 input)
Size	6.7 MB (fp16)
Accuracy	device-vs-PyTorch corr 0.99998, argmax agreement 99.85%

image[1,3,512,512] (ImageNet-normalized) →[GPU: MobileNetV3 + Lite R-ASPP]→ logits[1,512,512,21]
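
The input layout above can be produced from Android pixel data roughly as follows. This is a minimal sketch, not the author's code: toNormalizedChw, MEAN and STD are hypothetical names, the pixels are assumed to be packed ARGB_8888 ints (as returned by Bitmap.getPixels) from an image already resized to 512×512, and the statistics are assumed to be the standard ImageNet mean/std used by torchvision.

```kotlin
// Standard ImageNet statistics in RGB order (assumed to match the model's training setup).
val MEAN = floatArrayOf(0.485f, 0.456f, 0.406f)
val STD = floatArrayOf(0.229f, 0.224f, 0.225f)

/**
 * Packs row-major ARGB_8888 pixels into a normalized NCHW float buffer
 * of shape [1, 3, height, width]. Hypothetical helper for illustration.
 */
fun toNormalizedChw(argb: IntArray, width: Int, height: Int): FloatArray {
    require(argb.size == width * height) { "expected ${width * height} pixels, got ${argb.size}" }
    val plane = width * height
    val chw = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = argb[i]
        // Extract 8-bit channels and scale them to [0, 1].
        val r = ((p shr 16) and 0xFF) / 255f
        val g = ((p shr 8) and 0xFF) / 255f
        val b = (p and 0xFF) / 255f
        // Planar layout: all R values, then all G values, then all B values.
        chw[i] = (r - MEAN[0]) / STD[0]
        chw[plane + i] = (g - MEAN[1]) / STD[1]
        chw[2 * plane + i] = (b - MEAN[2]) / STD[2]
    }
    return chw
}
```

The returned array can be passed as the chw buffer written to the model's first input.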

Usage (Android, LiteRT CompiledModel)

import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Compile the model for the GPU accelerator.
val model = CompiledModel.create(modelPath, CompiledModel.Options(Accelerator.GPU), null)
val input = model.createInputBuffers()
val output = model.createOutputBuffers()
input[0].writeFloat(chw)              // chw: FloatArray, [1,3,512,512] ImageNet-normalized, NCHW
model.run(input, output)
val logits = output[0].readFloat()    // FloatArray, [1,512,512,21] NHWC; argmax per pixel for the class map
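
The per-pixel argmax mentioned in the last comment could look like the sketch below. argmaxClassMap is a hypothetical helper, not part of LiteRT; it assumes the logits arrive as one flat FloatArray in NHWC order with batch size 1, as described above.

```kotlin
/**
 * Converts flat NHWC logits of shape [1, height, width, numClasses] into a
 * row-major class map of length height * width. Hypothetical helper.
 */
fun argmaxClassMap(logits: FloatArray, height: Int, width: Int, numClasses: Int = 21): IntArray {
    require(logits.size == height * width * numClasses) { "unexpected logits size ${logits.size}" }
    val classMap = IntArray(height * width)
    for (pixel in 0 until height * width) {
        val base = pixel * numClasses   // in NHWC the class scores of one pixel are contiguous
        var best = 0
        for (c in 1 until numClasses) {
            if (logits[base + c] > logits[base + best]) best = c
        }
        classMap[pixel] = best
    }
    return classMap
}
```

For this model the call would be argmaxClassMap(logits, 512, 512). In torchvision's VOC label order, index 0 is background and index 15 is person.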

How it converts (litert-torch)

Pure CNN, with a single re-authoring step. The MobileNetV3 Squeeze-Excite blocks and the Lite R-ASPP scale branch use AdaptiveAvgPool2d(1) (global average pool); each one is replaced with mean(3).mean(2), two single-axis means, because a single multi-axis pool is mis-computed on the Mali delegate. Everything else is already GPU-clean: Hardswish/Hardsigmoid map to native HARD_SWISH, and the bilinear resizes use align_corners=False. Result: banned ops NONE, all tensors ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 1.0 (0.99998 unrounded).
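
Why the two single-axis means reproduce the global average pool can be written out as a short identity. This is a sketch for an NCHW activation x with spatial size H×W; the notation is introduced here for illustration and is not from the conversion script.

```latex
\operatorname{GAP}(x)_{n,c}
  = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{n,c,h,w}
  = \frac{1}{H} \sum_{h=1}^{H}
    \underbrace{\left( \frac{1}{W} \sum_{w=1}^{W} x_{n,c,h,w} \right)}_{\text{mean over axis 3 (W)}}
```

The outer average is the mean over axis 2 (H). The two forms agree exactly in real arithmetic and differ only by floating-point rounding. AdaptiveAvgPool2d(1) returns a [N,C,1,1] tensor, so the replacement presumably keeps the reduced dimensions (for example with keepdim=True) or reshapes afterwards; that detail is an assumption, since it is not stated above.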

License

BSD-3-Clause (torchvision). Upstream: pytorch/vision lraspp_mobilenet_v3_large.

