Motion-SMD LoRA adapters

LoRA adapters for "Encoder-Free Human Motion Understanding via Structured Motion Descriptions". Fine-tuned with PEFT over several LLM backbones on Structured Motion Description (SMD) inputs.


Adapter index

| Adapter folder | Base model | Task | SMD variant | Paper table |
|---|---|---|---|---|
| lora_qwen2.5-7b_qa_v5/ | Qwen2.5-7B-Instruct | Motion QA (BABEL-QA + HuMMan-QA) | All-26 joints + trajectory | Main QA row (66.7% / 90.1%) |
| lora_qwen2.5-7b_caption_v5/ | Qwen2.5-7B-Instruct | Motion Captioning (HumanML3D) | All-26 joints + trajectory | Main caption row (R@1 = 0.584) |
| lora_qwen2.5-7b_qa_v5_top3/ | Qwen2.5-7B-Instruct | Motion QA | Top-3 joints per body part | Attention visualization (§Interpretability) |
| lora_qwen2.5-7b_caption_v5_top3/ | Qwen2.5-7B-Instruct | Motion Captioning | Top-3 joints per body part | Attention visualization (§Interpretability) |
| lora_gemma3-4b_qa_top3/ | google/gemma-3-4b-it | Motion QA | Top-3 | Backbone portability |
| lora_qwen3-8b_qa_top3/ | Qwen/Qwen3-8B | Motion QA | Top-3 | Backbone portability |
| lora_llama3.1-8b_qa_top3/ | meta-llama/Llama-3.1-8B-Instruct | Motion QA | Top-3 | Backbone portability |
| lora_glm4-9b_qa_top3/ | THUDM/glm-4-9b-chat | Motion QA | Top-3 | Backbone portability |

Each folder contains adapter_config.json, adapter_model.safetensors, and the tokenizer files of its base model. Only the LoRA weights are stored, so each adapter is ≈ 65 MB.

Training configuration

Shared across all adapters (from scripts/finetune/train_lora_llm.py):

  • LoRA rank = 16, alpha = 32, dropout = 0.05
  • Target modules: attention Q/K/V/O projections and MLP up/down projections
  • Optimizer: AdamW, lr = 1e-4, cosine decay
  • Max sequence length: 8192 tokens
  • Epochs: 5
  • Hardware: 1× H200 GPU (2–8 GPU-hours depending on backbone)
  • Input: Structured Motion Description text + task-specific prompt

See scripts/slurm/slurm_train_qa.sh / slurm_train_caption.sh / slurm_train_backbone.sh in the code repo for exact commands.
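The shared hyperparameters above correspond to a PEFT-style LoRA configuration roughly like the sketch below. The target-module names (q_proj, k_proj, etc.) are an assumption based on Qwen/Llama-style blocks; the authoritative list is in each adapter's adapter_config.json.

```python
# Sketch of the shared LoRA setup, mirroring the fields found in a PEFT
# adapter_config.json. Module names are assumed (Qwen/Llama-style blocks);
# check adapter_config.json in each folder for the exact list.
lora_config = {
    "r": 16,                # LoRA rank
    "lora_alpha": 32,       # scaling numerator
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention Q/K/V/O projections
        "up_proj", "down_proj",                  # MLP up/down projections
    ],
}

# Effective scaling applied to each adapted weight delta: alpha / r
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```

With alpha = 2r, the adapter update is scaled by a constant factor of 2 regardless of rank, a common default when tuning rank independently.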

Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 1. Load the base LLM
base_id = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 2. Attach the LoRA adapter from this repo
model = PeftModel.from_pretrained(
    base,
    "zyyy12138/motion-smd-lora",
    subfolder="lora_qwen2.5-7b_qa_v5",
)
model.eval()

# 3. Build a prompt with SMD text + question
smd_text = """Motion: 3.2s (64 frames at 20 FPS)
Trajectory: displacement 0.12m, height change +0.01m, avg height 0.94m
Global Trajectory: ... (full SMD)
Joint Angles: ... (full SMD)"""

question = "Which body part is moving?"
options = ["Left arm", "Right arm", "Head", "Torso"]
prompt = f"""Motion description:
{smd_text}

Question: {question}
Options: {chr(10).join([f'{i}) {o}' for i, o in enumerate(options)])}

Answer:"""

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```

The exact prompt templates used in the paper live in scripts/finetune/dataset_text_only.py and captioning/scripts/dataset_caption.py in the code repo.
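For multiple-choice QA, the text the model generates after "Answer:" still has to be mapped back to an option index before computing accuracy. A minimal, hypothetical parser (not the paper's evaluation code, which lives in the repo scripts) might look like:

```python
import re

def parse_choice(generated, options):
    """Map generated text back to a 0-based option index.
    Hypothetical helper, not the paper's evaluation code."""
    # Keep only what follows the final "Answer:" marker, if present.
    answer = generated.rsplit("Answer:", 1)[-1].strip()
    # Case 1: the model echoed the option number, e.g. "1) Right arm".
    m = re.match(r"(\d+)", answer)
    if m and int(m.group(1)) < len(options):
        return int(m.group(1))
    # Case 2: the model wrote the option text itself.
    for i, opt in enumerate(options):
        if opt.lower() in answer.lower():
            return i
    return None  # unparseable -> count as incorrect

options = ["Left arm", "Right arm", "Head", "Torso"]
print(parse_choice("... Answer: 1) Right arm", options))  # 1
print(parse_choice("Answer: Torso", options))             # 3
```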

Evaluation results (from the paper)

| Adapter | Task | Benchmark | Metric | Value |
|---|---|---|---|---|
| lora_qwen2.5-7b_qa_v5 | QA | BABEL-QA (test) | Accuracy | 66.7% |
| lora_qwen2.5-7b_qa_v5 | QA | HuMMan-QA (test) | Accuracy | 90.1% |
| lora_qwen2.5-7b_caption_v5 | Caption | HumanML3D (test) | R@1 | 0.584 |
| lora_qwen2.5-7b_caption_v5 | Caption | HumanML3D (test) | CIDEr | 53.16 |

See the paper for the complete tables and the backbone-portability numbers.
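R@1 here is a retrieval-style metric; on HumanML3D it is conventionally computed in a joint text-motion embedding space, so the following is only a generic illustration of recall-at-k over a similarity matrix (sim[i][j] scores query i against candidate j, with the matching candidate at the same index), not the paper's evaluation code:

```python
def recall_at_k(sim, k=1):
    """Fraction of rows whose matching column (same index) ranks in the
    top-k most similar columns. Generic illustration only."""
    hits = 0
    for i, row in enumerate(sim):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(sim)

# Toy 3x3 similarity matrix: queries 0 and 2 rank their own match first,
# query 1 does not, so R@1 = 2/3.
sim = [
    [0.9, 0.1, 0.2],
    [0.4, 0.3, 0.8],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(sim, k=1))  # 0.666...
```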

Base model licenses

LoRA adapters inherit the license of their base model. Before using an adapter, make sure you comply with the terms of the corresponding base model listed in the adapter index above.

Adapter weight license

The LoRA adapter weights themselves (adapter_model.safetensors) are released under the Apache-2.0 license.

Citation

@article{zhang2026smd,
  title   = {Encoder-Free Human Motion Understanding via Structured Motion Descriptions},
  author  = {Zhang, Yao and Liu, Zhuchenyang and Ploetz, Thomas and Xiao, Yu},
  journal = {arXiv preprint arXiv:2604.21668},
  year    = {2026}
}

Contact

Yao Zhang — yao.1.zhang@aalto.fi
