pookiefoof committed on
Commit
79736ca
·
verified ·
1 Parent(s): 8a38c64

Release SkinTokens: TokenRig + FSQ-CVAE checkpoints

README.md ADDED
---
license: mit
language:
- en
library_name: pytorch
tags:
- rigging
- skinning
- skeleton
- autoregressive
- fsq
- vae
- 3d
- animation
- VAST
- Tripo
---

# SkinTokens

Pretrained checkpoints for **SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging**.

[![Project Page](https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white)](https://zjp-shadow.github.io/works/SkinTokens/)
[![arXiv](https://img.shields.io/badge/arXiv-2602.04805-b31b1b.svg)](https://arxiv.org/abs/2602.04805)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/VAST-AI-Research/SkinTokens)
[![Tripo](https://img.shields.io/badge/Tripo-3D_Studio-ff7a00)](https://www.tripo3d.ai)

This repository stores the model checkpoints used by the [SkinTokens codebase](https://github.com/VAST-AI-Research/SkinTokens), including:

- the **FSQ-CVAE** that learns the *SkinTokens* discrete representation of skinning weights, and
- the **TokenRig** autoregressive Transformer (Qwen3-0.6B architecture, GRPO-refined) that jointly generates skeletons and SkinTokens from a 3D mesh.

SkinTokens is the successor to [UniRig](https://github.com/VAST-AI-Research/UniRig) (SIGGRAPH '25). While UniRig treats skeleton and skinning as decoupled stages, SkinTokens unifies both into a single autoregressive sequence via learned discrete skin tokens, yielding a **98%–133%** improvement in skinning accuracy and a **17%–22%** improvement in bone prediction over state-of-the-art baselines.

## What Is Included

The repository is organized exactly like the `experiments/` folder expected by the main SkinTokens codebase:

```text
experiments/
├── articulation_xl_quantization_256_token_4/
│   └── grpo_1400.ckpt   # TokenRig autoregressive rigging model (GRPO-refined)
└── skin_vae_2_10_32768/
    └── last.ckpt        # FSQ-CVAE for SkinTokens (skin-weight tokenizer)
```

Total size: about **1.6 GB**.

> The training data (`ArticulationXL` splits and processed meshes) used to train these checkpoints will be released separately in a future update.

## Checkpoint Overview

### SkinTokens — FSQ-CVAE (skin-weight tokenizer)

**File:** `experiments/skin_vae_2_10_32768/last.ckpt`

Compresses sparse skinning weights into discrete *SkinTokens* using a Finite-Scalar-Quantized Conditional VAE with codebook levels `[8, 8, 8, 5, 5, 5]` (8 × 8 × 8 × 5 × 5 × 5 = 64,000 entries). It is used both to tokenize ground-truth weights during training and to decode TokenRig's output tokens back into per-vertex skinning weights at inference.

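With these levels, FSQ rounds each latent dimension to one of `levels[i]` uniformly spaced values, and the resulting digits pack into a single token id. The following is a minimal sketch of that idea, not the repository's API; the function and variable names are ours:

```python
import math

# Codebook levels used by this checkpoint (from the model card above).
LEVELS = [8, 8, 8, 5, 5, 5]

def fsq_quantize(z, levels=LEVELS):
    """Round each latent dim (assumed squashed into [-1, 1]) to one of
    `levels[i]` uniform values and pack the digits into one token id."""
    assert len(z) == len(levels)
    index = 0
    for value, n_levels in zip(z, levels):
        t = (value + 1.0) / 2.0                           # map [-1, 1] -> [0, 1]
        digit = min(n_levels - 1, max(0, round(t * (n_levels - 1))))
        index = index * n_levels + digit                  # mixed-radix packing
    return index

print(math.prod(LEVELS))  # 8 * 8 * 8 * 5 * 5 * 5 = 64000 distinct SkinTokens
```

Because the codebook is the product of small per-dimension grids rather than a learned embedding table, there are no unused codes to collapse, which is the usual motivation for FSQ over vanilla VQ.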
### TokenRig — autoregressive rigging model

**File:** `experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt`

A Qwen3-0.6B-based Transformer trained on a composite of **ArticulationXL 2.0 (70%)**, **VRoid Hub (20%)**, and **ModelsResource (10%)**, with a quantization level of 256 and 4 skin tokens per bone, then refined with GRPO for 1,400 steps. **This is the recommended checkpoint**: it generates the skeleton and the SkinTokens in a single unified sequence.

> Both checkpoints are required for end-to-end inference: TokenRig generates the rig as a token sequence, and the FSQ-CVAE decoder turns the SkinTokens back into dense per-vertex skinning weights.

## How To Use

The easiest way is to use the helper script in the main SkinTokens codebase, which downloads both checkpoints and the required Qwen3-0.6B config into the expected layout:

```bash
git clone https://github.com/VAST-AI-Research/SkinTokens.git
cd SkinTokens
python download.py --model
```

### Option 1 — Download with the `hf` CLI

```bash
hf download VAST-AI/SkinTokens \
  --repo-type model \
  --local-dir .
```

### Option 2 — Download with `huggingface_hub` (Python)

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="VAST-AI/SkinTokens",
    repo_type="model",
    local_dir=".",
    local_dir_use_symlinks=False,  # deprecated and ignored by recent huggingface_hub versions
)
```

### Option 3 — Download individual files

```python
from huggingface_hub import hf_hub_download

tokenrig_ckpt = hf_hub_download(
    repo_id="VAST-AI/SkinTokens",
    filename="experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt",
)
skin_vae_ckpt = hf_hub_download(
    repo_id="VAST-AI/SkinTokens",
    filename="experiments/skin_vae_2_10_32768/last.ckpt",
)
```

Note that `hf_hub_download` returns paths inside the local Hugging Face cache; pass `local_dir="."` if you want the files materialized under `experiments/` as shown below.

### Option 4 — Web UI

Browse the [Files and versions](https://huggingface.co/VAST-AI/SkinTokens/tree/main) tab and download the folders manually, keeping the `experiments/...` layout intact.

After downloading, you should have:

```text
experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt
experiments/skin_vae_2_10_32768/last.ckpt
```
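As a quick sanity check, an illustrative snippet (ours, not part of the SkinTokens tooling) that verifies this layout from Python:

```python
from pathlib import Path

# Checkpoint paths the SkinTokens code expects, relative to the repo root.
EXPECTED = [
    "experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt",
    "experiments/skin_vae_2_10_32768/last.ckpt",
]

def missing_checkpoints(root="."):
    """Return the expected checkpoint paths not present under `root`."""
    return [p for p in EXPECTED if not (Path(root) / p).is_file()]
```

An empty list from `missing_checkpoints()` means the layout matches; anything else names the files still to download.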
123
+
124
+ ## Run TokenRig With These Weights
125
+
126
+ Once the `experiments/` folder is in place (and the environment is installed per the [GitHub README](https://github.com/VAST-AI-Research/SkinTokens#installation)), you can run:
127
+
128
+ ```bash
129
+ python demo.py --input examples/giraffe.glb --output results/giraffe.glb --use_transfer
130
+ ```
131
+
132
+ Or launch the Gradio demo:
133
+
134
+ ```bash
135
+ python demo.py
136
+ ```
137
+
138
+ Then open `http://127.0.0.1:1024` in your browser.
139
+
140
+ ## Notes
141
+
142
+ - **Keep the directory names unchanged.** The SkinTokens code expects the exact `experiments/.../*.ckpt` layout shown above.
143
+ - **TokenRig requires both checkpoints.** `grpo_1400.ckpt` generates discrete tokens; the SkinTokens FSQ-CVAE (`last.ckpt`) is needed to decode them into per-vertex skinning weights.
144
+ - **Qwen3-0.6B architecture.** TokenRig adopts the Qwen3-0.6B architecture (GQA + RoPE) for its autoregressive backbone; the [Qwen3 config](https://huggingface.co/Qwen/Qwen3-0.6B) is fetched automatically by `download.py`.
145
+ - **Hardware.** An NVIDIA GPU with at least **14 GB** of memory is required for inference.
146
+ - **Training data.** The checkpoints were trained on a composite of ArticulationXL 2.0 (70%), VRoid Hub (20%), and ModelsResource (10%); the processed data splits will be released as a separate dataset repository later.

## Related Links

- Tripo, your 3D AI workspace: <https://www.tripo3d.ai>
- Project page: <https://zjp-shadow.github.io/works/SkinTokens/>
- Paper (arXiv): <https://arxiv.org/abs/2602.04805>
- Main code repository: <https://github.com/VAST-AI-Research/SkinTokens>
- Predecessor: [UniRig (SIGGRAPH '25)](https://github.com/VAST-AI-Research/UniRig)
- More from VAST-AI Research: <https://huggingface.co/VAST-AI>

## Acknowledgements

- [UniRig](https://github.com/VAST-AI-Research/UniRig) — the predecessor to this work.
- [Qwen3](https://github.com/QwenLM/Qwen3) — the LLM architecture used by the TokenRig autoregressive backbone.
- [3DShape2VecSet](https://github.com/1zb/3DShape2VecSet), [Michelangelo](https://github.com/NeuralCarver/Michelangelo) — the shape-encoder backbone used by the FSQ-CVAE.
- [FSQ](https://arxiv.org/abs/2309.15505) — Finite Scalar Quantization, the discretization scheme behind SkinTokens.
- [GRPO](https://arxiv.org/abs/2402.03300) — the policy-optimization method used for RL refinement.

## Citation

If you find this work helpful, please consider citing our paper:

```bibtex
@article{zhang2026skintokens,
  title   = {SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging},
  author  = {Zhang, Jia-Peng and Pu, Cheng-Feng and Guo, Meng-Hao and Cao, Yan-Pei and Hu, Shi-Min},
  journal = {arXiv preprint arXiv:2602.04805},
  year    = {2026}
}
```

experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:f4e4706a11cfb520cdde65156a0358545e4fbf8f36237aca01ea5e79d5cb5692
size 1131603979

experiments/skin_vae_2_10_32768/last.ckpt ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:4843f49e58afff88345806b94ca82e6cc9d8def6e7432e2853c677b154de0ed4
size 487311745
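
The two `.ckpt` files are stored via Git LFS, and the `oid sha256:` lines above record their expected digests. A small helper (ours, not part of the repository) to check a downloaded file against such a digest:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its hex SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# Usage: compare against the oid recorded in the LFS pointer, e.g.
# sha256_of("experiments/skin_vae_2_10_32768/last.ckpt") should equal
# "4843f49e58afff88345806b94ca82e6cc9d8def6e7432e2853c677b154de0ed4".
```

A mismatch usually indicates a truncated download; re-fetching the file is the fix.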