embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead
Image-Text-to-Text • 2B • Updated
• 603 • 3
# --- Option 1: Jetson Orin (edge) -------------------------------------------
# Serve the W4A16-quantized Edge2 build with the NVIDIA Jetson vLLM container.
# NOTE: replace hf_*** with a real Hugging Face token before running.
# --network host exposes vLLM's default port (8000) directly on the host.
docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2

# --- Option 2: datacenter GPU -----------------------------------------------
# Serve the NVFP4A16-quantized build with the NGC vLLM container.
# Both commands use --name=vllm-serve, so run only one at a time
# (or pick distinct container names).
docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  nvcr.io/nvidia/vllm:26.01-py3 \
  vllm serve "embedl/Cosmos-Reason2-2B-NVFP4A16" \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.9