# Motion-SMD LoRA adapters

LoRA adapters for "Encoder-Free Human Motion Understanding via Structured Motion Descriptions", fine-tuned with PEFT over several LLM backbones on Structured Motion Description (SMD) inputs.
- 🌐 Project page: https://yaozhang182.github.io/motion-smd/
- 💻 Code: https://github.com/yaozhang182/motion-smd
- 🗂️ Dataset (SMD + preprocessed inputs): https://huggingface.co/datasets/zyyy12138/motion-smd-data
- 📄 Paper (arXiv): https://arxiv.org/abs/2604.21668
## Adapter index

| Adapter folder | Base model | Task | SMD variant | Paper table |
|---|---|---|---|---|
| `lora_qwen2.5-7b_qa_v5/` | Qwen2.5-7B-Instruct | Motion QA (BABEL-QA + HuMMan-QA) | All-26 joints + trajectory | Main QA row (66.7% / 90.1%) |
| `lora_qwen2.5-7b_caption_v5/` | Qwen2.5-7B-Instruct | Motion Captioning (HumanML3D) | All-26 joints + trajectory | Main caption row (R@1 = 0.584) |
| `lora_qwen2.5-7b_qa_v5_top3/` | Qwen2.5-7B-Instruct | Motion QA | Top-3 joints per body part | Attention visualization (§ Interpretability) |
| `lora_qwen2.5-7b_caption_v5_top3/` | Qwen2.5-7B-Instruct | Motion Captioning | Top-3 joints per body part | Attention visualization (§ Interpretability) |
| `lora_gemma3-4b_qa_top3/` | google/gemma-3-4b-it | Motion QA | Top-3 | Backbone portability |
| `lora_qwen3-8b_qa_top3/` | Qwen/Qwen3-8B | Motion QA | Top-3 | Backbone portability |
| `lora_llama3.1-8b_qa_top3/` | meta-llama/Llama-3.1-8B-Instruct | Motion QA | Top-3 | Backbone portability |
| `lora_glm4-9b_qa_top3/` | THUDM/glm-4-9b-chat | Motion QA | Top-3 | Backbone portability |
Each folder contains `adapter_config.json`, `adapter_model.safetensors`, and the tokenizer files of the corresponding base model (LoRA weights only, so each adapter is ≈ 65 MB).
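A typical adapter folder therefore looks like the following (illustrative listing; the exact tokenizer files vary by backbone):

```
lora_qwen2.5-7b_qa_v5/
├── adapter_config.json        # LoRA hyperparameters and target modules
├── adapter_model.safetensors  # LoRA weights only (≈ 65 MB)
└── tokenizer files of the base model (e.g. tokenizer_config.json, ...)
```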
## Training configuration

Shared across all adapters (from `scripts/finetune/train_lora_llm.py`):
- LoRA rank = 16, alpha = 32, dropout = 0.05
- Target modules: attention Q/K/V/O projections and MLP up/down projections
- Optimizer: AdamW, lr = 1e-4, cosine decay
- Max sequence length: 8192 tokens
- Epochs: 5
- Hardware: 1× H200 GPU (2–8 GPU-hours depending on backbone)
- Input: Structured Motion Description text + task-specific prompt
See `scripts/slurm/slurm_train_qa.sh`, `slurm_train_caption.sh`, and `slurm_train_backbone.sh` in the code repo for the exact commands.
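In PEFT terms, the shared hyperparameters above correspond roughly to the following settings (a sketch; the module names assume Qwen2-style layer naming and are not confirmed here, so treat `scripts/finetune/train_lora_llm.py` as authoritative):

```python
# LoRA hyperparameters shared across all adapters (mirrors the list above).
# The target-module names are an assumption based on Qwen2-style naming.
lora_hparams = {
    "r": 16,            # LoRA rank
    "lora_alpha": 32,   # scaling numerator
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention Q/K/V/O projections
        "up_proj", "down_proj",                  # MLP up/down projections
    ],
}

# Effective LoRA scaling applied to each adapted layer: alpha / r
scaling = lora_hparams["lora_alpha"] / lora_hparams["r"]  # -> 2.0
```

These values can be passed directly to a `peft.LoraConfig` when reproducing training.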
## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 1. Load the base LLM
base_id = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 2. Attach the LoRA adapter from this repo
model = PeftModel.from_pretrained(
    base,
    "zyyy12138/motion-smd-lora",
    subfolder="lora_qwen2.5-7b_qa_v5",
)
model.eval()

# 3. Use: build a prompt with SMD text + question
smd_text = """Motion: 3.2s (64 frames at 20 FPS)
Trajectory: displacement 0.12m, height change +0.01m, avg height 0.94m
Global Trajectory: ... (full SMD)
Joint Angles: ... (full SMD)"""
question = "Which body part is moving?"
options = ["Left arm", "Right arm", "Head", "Torso"]
prompt = f"""Motion description:
{smd_text}
Question: {question}
Options: {chr(10).join([f'{i}) {o}' for i, o in enumerate(options)])}
Answer:"""

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```
The exact prompt templates used in the paper live in `scripts/finetune/dataset_text_only.py` and `captioning/scripts/dataset_caption.py` in the code repo.
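For batch evaluation it can be convenient to wrap the inline template from the quick-start snippet in a small helper (a sketch; `build_prompt` is a hypothetical name, and the repo files just mentioned remain the authoritative templates):

```python
def build_prompt(smd_text: str, question: str, options: list[str]) -> str:
    """Assemble a QA prompt from SMD text, a question, and answer options.

    Mirrors the inline f-string template in the quick-start snippet above;
    options are numbered from 0 to match that snippet.
    """
    option_lines = "\n".join(f"{i}) {o}" for i, o in enumerate(options))
    return (
        f"Motion description:\n{smd_text}\n"
        f"Question: {question}\n"
        f"Options: {option_lines}\n"
        f"Answer:"
    )

prompt = build_prompt(
    "Motion: 3.2s (64 frames at 20 FPS)",
    "Which body part is moving?",
    ["Left arm", "Right arm"],
)
```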
## Evaluation results (from the paper)

| Adapter | Task | Benchmark | Metric | Value |
|---|---|---|---|---|
| `lora_qwen2.5-7b_qa_v5` | QA | BABEL-QA (test) | Accuracy | 66.7% |
| `lora_qwen2.5-7b_qa_v5` | QA | HuMMan-QA (test) | Accuracy | 90.1% |
| `lora_qwen2.5-7b_caption_v5` | Caption | HumanML3D (test) | R@1 | 0.584 |
| `lora_qwen2.5-7b_caption_v5` | Caption | HumanML3D (test) | CIDEr | 53.16 |
See the paper for the complete tables and the backbone-portability numbers.
## Base model licenses
LoRA adapters inherit the license of their base model. Before using an adapter, make sure you comply with the base model's terms:
- Qwen2.5-7B-Instruct — Tongyi Qianwen License
- Gemma-3-4B — Gemma Terms of Use
- Qwen3-8B — Tongyi Qianwen License
- Llama-3.1-8B-Instruct — Llama 3.1 Community License
- GLM-4-9B-Chat — GLM-4 License
## Adapter weight license

The LoRA adapter weights themselves (`adapter_model.safetensors`) are released under the Apache-2.0 license.
## Citation

```bibtex
@article{zhang2026smd,
  title   = {Encoder-Free Human Motion Understanding via Structured Motion Descriptions},
  author  = {Zhang, Yao and Liu, Zhuchenyang and Ploetz, Thomas and Xiao, Yu},
  journal = {arXiv preprint arXiv:2604.21668},
  year    = {2026}
}
```
## Contact
Yao Zhang — yao.1.zhang@aalto.fi