
Ilyas Moutawwakil

IlyasMoutawwakil


AI & ML interests

Optimization, LLMs, Hardware, Backends, ..

Recent Activity

replied to their post 1 day ago

After 2 months of refinement, I'm happy to announce that a lot of Transformers' modeling code is now significantly more torch-compile & export-friendly 🔥

Why it had to be done 👇

PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single Dynamo-traced graph!

Transformers models are now easier to:
⚙️ Compile end-to-end with torch.compile backends
📦 Export reliably via torch.export and torch.onnx.export
🚀 Deploy to ONNX / ONNX Runtime, Intel Corporation's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's ExecuTorch and more hardware-specific runtimes.

This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators. We are doubling down on Transformers' commitment to be a first-class citizen of the PyTorch ecosystem: more exportable, more optimizable, and easier to deploy everywhere.

There are definitely some edge cases that we still haven't addressed, so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.

PR in the comments! More updates coming soon!


liked a Space 10 days ago

nvidia/kvpress-leaderboard



published an article 3 months ago

Article

Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face

+2

Oct 16, 2025

•

18

published an article 3 months ago

Article

Get your VLM running in 3 simple steps on Intel CPUs

+3

Oct 15, 2025

•

22

published an article 9 months ago

Article

Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

+7

Apr 29, 2025

•

43

published an article 10 months ago

Article

🚀 Accelerating LLM Inference with TGI on Intel Gaudi

+3

Mar 28, 2025

•

14

published an article about 1 year ago

Article

Benchmarking Language Model Performance on 5th Gen Xeon at GCP

+1

Dec 17, 2024

•

7

published an article about 2 years ago

Article

AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU

+4

Dec 5, 2023

•

4

published an article over 2 years ago

Article

Overview of natively supported quantization schemes in 🤗 Transformers

+3

Sep 12, 2023

•

13
