# Motion-SMD LoRA adapters

LoRA adapters for "Encoder-Free Human Motion Understanding via Structured Motion Descriptions", fine-tuned with PEFT over several LLM backbones on Structured Motion Description (SMD) inputs.
- 🌐 Project page: https://yaozhang182.github.io/motion-smd/
- 💻 Code: https://github.com/yaozhang182/motion-smd
- 🗂️ Dataset (SMD + preprocessed inputs): https://huggingface.co/datasets/zyyy12138/motion-smd-data
- 📄 Paper (arXiv): https://arxiv.org/abs/2604.21668
## Adapter index

| Adapter folder | Base model | Task | SMD variant | Paper table |
|---|---|---|---|---|
| `lora_qwen2.5-7b_qa_v5/` | Qwen2.5-7B-Instruct | Motion QA (BABEL-QA + HuMMan-QA) | All-26 joints + trajectory | Main QA row (66.7% / 90.1%) |
| `lora_qwen2.5-7b_caption_v5/` | Qwen2.5-7B-Instruct | Motion Captioning (HumanML3D) | All-26 joints + trajectory | Main caption row (R@1 = 0.584) |
| `lora_qwen2.5-7b_qa_v5_top3/` | Qwen2.5-7B-Instruct | Motion QA | Top-3 joints per body part | Attention visualization (§ Interpretability) |
| `lora_qwen2.5-7b_caption_v5_top3/` | Qwen2.5-7B-Instruct | Motion Captioning | Top-3 joints per body part | Attention visualization (§ Interpretability) |
| `lora_gemma3-4b_qa_top3/` | google/gemma-3-4b-it | Motion QA | Top-3 | Backbone portability |
| `lora_qwen3-8b_qa_top3/` | Qwen/Qwen3-8B | Motion QA | Top-3 | Backbone portability |
| `lora_llama3.1-8b_qa_top3/` | meta-llama/Llama-3.1-8B-Instruct | Motion QA | Top-3 | Backbone portability |
| `lora_glm4-9b_qa_top3/` | THUDM/glm-4-9b-chat | Motion QA | Top-3 | Backbone portability |
Each folder contains `adapter_config.json`, `adapter_model.safetensors`, and the tokenizer files of the corresponding base model (LoRA weights only, so each adapter is ≈ 65 MB).
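A typical adapter folder therefore looks like the following (illustrative listing; the exact tokenizer files vary by backbone):

```
lora_qwen2.5-7b_qa_v5/
├── adapter_config.json        # LoRA hyperparameters and target modules
├── adapter_model.safetensors  # LoRA weights only (≈ 65 MB)
└── tokenizer files of the base model (e.g. tokenizer_config.json, ...)
```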
## Training configuration

Shared across all adapters (from `scripts/finetune/train_lora_llm.py`):
- LoRA rank = 16, alpha = 32, dropout = 0.05
- Target modules: attention Q/K/V/O projections and MLP up/down projections
- Optimizer: AdamW, lr = 1e-4, cosine decay
- Max sequence length: 8192 tokens
- Epochs: 5
- Hardware: 1× H200 GPU (2–8 GPU-hours depending on backbone)
- Input: Structured Motion Description text + task-specific prompt
See `scripts/slurm/slurm_train_qa.sh`, `slurm_train_caption.sh`, and `slurm_train_backbone.sh` in the code repo for the exact commands.
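In PEFT terms, the shared hyperparameters above correspond roughly to the following settings (a sketch; the module names assume Qwen2-style layer naming and are not confirmed here, so treat `scripts/finetune/train_lora_llm.py` as authoritative):

```python
# LoRA hyperparameters shared across all adapters (mirrors the list above).
# The target-module names are an assumption based on Qwen2-style naming.
lora_hparams = {
    "r": 16,            # LoRA rank
    "lora_alpha": 32,   # scaling numerator
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention Q/K/V/O projections
        "up_proj", "down_proj",                  # MLP up/down projections
    ],
}

# Effective LoRA scaling applied to each adapted layer: alpha / r
scaling = lora_hparams["lora_alpha"] / lora_hparams["r"]  # -> 2.0
```

These values can be passed directly to a `peft.LoraConfig` when reproducing training.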
## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 1. Load the base LLM
base_id = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 2. Attach the LoRA adapter from this repo
model = PeftModel.from_pretrained(
    base,
    "zyyy12138/motion-smd-lora",
    subfolder="lora_qwen2.5-7b_qa_v5",
)
model.eval()

# 3. Use: build a prompt with SMD text + question
smd_text = """Motion: 3.2s (64 frames at 20 FPS)
Trajectory: displacement 0.12m, height change +0.01m, avg height 0.94m
Global Trajectory: ... (full SMD)
Joint Angles: ... (full SMD)"""
question = "Which body part is moving?"
options = ["Left arm", "Right arm", "Head", "Torso"]
prompt = f"""Motion description:
{smd_text}
Question: {question}
Options: {chr(10).join([f'{i}) {o}' for i, o in enumerate(options)])}
Answer:"""

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```
The exact prompt templates used in the paper live in `scripts/finetune/dataset_text_only.py` and `captioning/scripts/dataset_caption.py` in the code repo.
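For batch evaluation it can be convenient to wrap the inline template from the quick-start snippet in a small helper (a sketch; `build_prompt` is a hypothetical name, and the repo files just mentioned remain the authoritative templates):

```python
def build_prompt(smd_text: str, question: str, options: list[str]) -> str:
    """Assemble a QA prompt from SMD text, a question, and answer options.

    Mirrors the inline f-string template in the quick-start snippet above;
    options are numbered from 0 to match that snippet.
    """
    option_lines = "\n".join(f"{i}) {o}" for i, o in enumerate(options))
    return (
        f"Motion description:\n{smd_text}\n"
        f"Question: {question}\n"
        f"Options: {option_lines}\n"
        f"Answer:"
    )

prompt = build_prompt(
    "Motion: 3.2s (64 frames at 20 FPS)",
    "Which body part is moving?",
    ["Left arm", "Right arm"],
)
```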
## Evaluation results (from the paper)

| Adapter | Task | Benchmark | Metric | Value |
|---|---|---|---|---|
| `lora_qwen2.5-7b_qa_v5` | QA | BABEL-QA (test) | Accuracy | 66.7% |
| `lora_qwen2.5-7b_qa_v5` | QA | HuMMan-QA (test) | Accuracy | 90.1% |
| `lora_qwen2.5-7b_caption_v5` | Caption | HumanML3D (test) | R@1 | 0.584 |
| `lora_qwen2.5-7b_caption_v5` | Caption | HumanML3D (test) | CIDEr | 53.16 |
See the paper for the complete tables and the backbone-portability numbers.
## Base model licenses
LoRA adapters inherit the license of their base model. Before using an adapter, make sure you comply with the base model's terms:
- Qwen2.5-7B-Instruct — Tongyi Qianwen License
- Gemma-3-4B — Gemma Terms of Use
- Qwen3-8B — Tongyi Qianwen License
- Llama-3.1-8B-Instruct — Llama 3.1 Community License
- GLM-4-9B-Chat — GLM-4 License
## Adapter weight license

The LoRA adapter weights themselves (`adapter_model.safetensors`) are released under the Apache-2.0 license.
## Citation

```bibtex
@article{zhang2026smd,
  title   = {Encoder-Free Human Motion Understanding via Structured Motion Descriptions},
  author  = {Zhang, Yao and Liu, Zhuchenyang and Ploetz, Thomas and Xiao, Yu},
  journal = {arXiv preprint arXiv:2604.21668},
  year    = {2026}
}
```
## Contact
Yao Zhang — yao.1.zhang@aalto.fi