# Qwen3 BART Inversion Model
A Vec2Text-style embedding inversion model that uses BART-base (English-only) as the decoder to convert Qwen3-Embedding-8B (4096-dim) embeddings back into text.
## Model Details

- Base model: facebook/bart-base
- Embedding model: Qwen3-Embedding-8B (4096-dim)
- Training data: 1M sentences from the ClimbMix corpus
- Architecture: Vec2Text embedding inversion (embedding -> text)
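The inversion architecture projects one 4096-dim embedding into a fixed-length sequence of 16 pseudo-token embeddings that the BART decoder conditions on. A minimal shape check of that projection, with random weights and assuming BART-base's hidden size of 768 (the trained weights are loaded in the Usage section):

```python
import torch
import torch.nn as nn

HIDDEN = 768        # BART-base hidden size
NUM_TOKENS = 16     # pseudo-token sequence length
EMBED_DIM = 4096    # Qwen3-Embedding-8B dimension

# Same layer layout as in Usage below, randomly initialized here
embedding_transform = nn.Sequential(
    nn.Linear(EMBED_DIM, EMBED_DIM), nn.LayerNorm(EMBED_DIM),
    nn.Dropout(0.1), nn.GELU(),
    nn.Linear(EMBED_DIM, HIDDEN * NUM_TOKENS),
)

emb = torch.randn(1, EMBED_DIM)  # one embedding, batch size 1
proj = embedding_transform(emb).reshape(1, NUM_TOKENS, HIDDEN)
print(tuple(proj.shape))  # (1, 16, 768)
```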
## Evaluation Results (200 held-out samples)
| Metric | Score |
|---|---|
| Token F1 | 0.6837 (±0.1693) |
| BLEU-4 | 0.2797 (±0.2664) |
| ROUGE-L | 0.5701 (±0.2118) |
| Exact Match | 7/200 (3.5%) |
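Token F1 measures bag-of-tokens overlap between the original and reconstructed text. The exact tokenization behind the reported scores is not stated in this card; a common whitespace-tokenized variant looks like:

```python
from collections import Counter

def token_f1(reference: str, prediction: str) -> float:
    """Bag-of-tokens F1 over whitespace tokens (illustrative; the card's
    exact tokenizer is unspecified)."""
    ref, pred = reference.split(), prediction.split()
    overlap = sum((Counter(ref) & Counter(pred)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(token_f1("the cat sat", "the cat ran"), 3))  # 0.667
```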
## Example Reconstructions
| Original | Reconstructed | Token F1 |
|---|---|---|
| "This is important because childhood sets the stage for the robustness of the im... | "This is crucial because the childhood sets robust foundations for the immune sy... | 0.811 |
| Chapter 8: Made in the USA - A Statement About Quality and Pride<br>Have you ever ... | Chapter 8: Quality and Quality: The USA Made in America<br>Have you ever felt that... | 0.594 |
| Less than 30% of the 36 million cat owners in the U.S. know that the beautiful s... | Did you know that less than 36% of cat owners in the U.S. know that their cat ca... | 0.560 |
| The key ingredients of Turkish meals are meat, vegetables, and legumes.... | Key ingredients of Turkish meals include meat, vegetables, and legumes.... | 0.828 |
| The platform will then generate a list of available options with prices, travel ... | The platform will then list available options with a given price, amenities, tra... | 0.775 |
## Usage
```python
import torch
import torch.nn as nn
import transformers
from safetensors.torch import load_file

# Load the base decoder and tokenizer
model = transformers.AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/bart-base")
hidden = model.config.hidden_size

# Projection: one 4096-dim embedding -> 16 pseudo-token embeddings
embedding_transform = nn.Sequential(
    nn.Linear(4096, 4096), nn.LayerNorm(4096), nn.Dropout(0.1), nn.GELU(),
    nn.Linear(4096, hidden * 16),
)

# Load trained weights (download from this repo)
state = load_file("model.safetensors")  # or torch.load("bart_noisy.pt")
et_state = {k.replace("embedding_transform.", ""): v
            for k, v in state.items() if k.startswith("embedding_transform.")}
embedding_transform.load_state_dict(et_state)
ed_state = {k.replace("encoder_decoder.", ""): v
            for k, v in state.items() if k.startswith("encoder_decoder.")}
model.load_state_dict(ed_state, strict=False)

# Invert a Qwen3-Embedding-8B embedding (4096-dim)
device = torch.device("cuda")
model = model.to(device).eval()
embedding_transform = embedding_transform.to(device).eval()

with torch.no_grad():
    # your_embedding: a 4096-dim vector produced by Qwen3-Embedding-8B
    emb = torch.tensor(your_embedding, dtype=torch.float32).unsqueeze(0).to(device)
    proj = embedding_transform(emb).reshape(1, 16, hidden)
    out = model.generate(
        inputs_embeds=proj,
        attention_mask=torch.ones(1, 16, device=device),
        max_length=128, num_beams=4, early_stopping=True,
    )
    text = tokenizer.decode(out[0], skip_special_tokens=True)
```
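One way to obtain `your_embedding` is via the sentence-transformers loader for the Qwen3 embedding checkpoint. This is a sketch under the assumption that `Qwen/Qwen3-Embedding-8B` is loadable this way; any method that yields the raw 4096-dim vector works:

```python
# Sketch only: assumes the sentence-transformers package and the
# Qwen/Qwen3-Embedding-8B checkpoint (not part of this repo).
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-8B")
your_embedding = embedder.encode("The quick brown fox jumps over the lazy dog.")
print(your_embedding.shape)  # expected: (4096,)
```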