AI & ML interests

AGI, LLMs, Knowledge Graph, Palmyra, Domain Specific LLM

Articles

wassemgtk 
posted an update 10 months ago
I’ve been diving into the iRoPE architecture from Llama 4, a potential game-changer for long-context models! It interleaves local attention (with RoPE) for short contexts and global attention (with inference-time temperature scaling) for long-range reasoning, aiming for effectively infinite context. I’m going to try writing iRoPE; who wants to help?

Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
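The notebook has the real attempt; below is only a minimal PyTorch sketch of the interleaving idea as I read it: even layers do windowed local attention with RoPE, odd layers do global attention with no positional encoding and a temperature factor applied to the scores. The even/odd split, the window size, and the `temp_scale` knob are my assumptions, not Llama 4's actual configuration.

```python
import torch
import torch.nn.functional as F

def rotate_half(x):
    # Standard RoPE helper: rotate the two halves of the last dimension.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, positions, dim):
    # Classic rotary embedding applied to queries and keys.
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
    freqs = positions.float()[:, None] * inv_freq[None, :]
    emb = torch.cat((freqs, freqs), dim=-1)
    cos, sin = emb.cos(), emb.sin()
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

def attention(q, k, v, scale):
    scores = (q @ k.transpose(-2, -1)) * scale
    return F.softmax(scores, dim=-1) @ v

def irope_block(q, k, v, layer_idx, window=2048, temp_scale=1.0):
    """Toy iRoPE layer: even layers = local RoPE attention inside a window,
    odd layers = global attention without positional encoding (NoPE),
    with an inference-time temperature folded into the score scale."""
    seq_len, dim = q.shape
    base_scale = dim ** -0.5
    if layer_idx % 2 == 0:
        # Local attention: chunk the sequence and apply RoPE within each chunk.
        outs = []
        for start in range(0, seq_len, window):
            sl = slice(start, min(start + window, seq_len))
            pos = torch.arange(sl.stop - sl.start)
            qi, ki = apply_rope(q[sl], k[sl], pos, dim)
            outs.append(attention(qi, ki, v[sl], base_scale))
        return torch.cat(outs, dim=0)
    # Global attention: no positional encoding, temperature-scaled scores.
    return attention(q, k, v, base_scale * temp_scale)

# Toy usage: 8 tokens, 64-dim heads, layer 0 (local) and layer 1 (global).
q = k = v = torch.randn(8, 64)
print(irope_block(q, k, v, layer_idx=0, window=4).shape)
print(irope_block(q, k, v, layer_idx=1, temp_scale=1.2).shape)
```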
wassemgtk 
posted an update 10 months ago
For fun, a new project: SuperTokenizer! A byte-level BPE tokenizer trained on C4, aiming to beat GPT-4’s tokenizer. A100-powered and open-source. Messing around with tokens!
https://github.com/wassemgtk/SuperTokenizer
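Not the repo’s actual training script, just a rough sketch of the same recipe with the `tokenizers` and `datasets` libraries; the vocab size, sample count, and special token here are placeholder choices on my part.

```python
import os
from datasets import load_dataset
from tokenizers import ByteLevelBPETokenizer

def c4_texts(n_samples=100_000):
    # Stream C4 so the full corpus never has to be downloaded locally.
    ds = load_dataset("allenai/c4", "en", split="train", streaming=True)
    for i, example in enumerate(ds):
        if i >= n_samples:
            break
        yield example["text"]

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    c4_texts(),
    vocab_size=100_000,          # placeholder; roughly cl100k-scale
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)

os.makedirs("supertokenizer", exist_ok=True)
tokenizer.save_model("supertokenizer")  # writes vocab.json and merges.txt
```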
wassemgtk 
posted an update 11 months ago
# GESAL: Real-Time Adaptation for LLMs


We’re excited to unveil **Graph-Enhanced Singular Adaptive Learning (GESAL)**, a framework that lets LLMs like meta-llama/Llama-3.2-1B adapt in real time using user feedback. Check out the code and white paper on GitHub!

🔗 **Code**: [https://github.com/writer/AI-Adaptive-Learning-GESAL](https://github.com/writer/AI-Adaptive-Learning-GESAL)

---

## Why GESAL?

Static LLMs struggle to adapt without heavy retraining. GESAL solves this with:
- **SVF**: Adapts weights via \( W' = U (\Sigma \cdot z) V^T \), training only a small vector of parameters per layer (see the sketch after this list).
- **Graph Memory**: Stores adaptations in nodes for scalability.
- **RL**: Updates via \( J(z) = \mathbb{E}[\log \pi_z(y|x) r] \) based on feedback.
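
Here is a minimal sketch of the SVF idea from the formula above (my reading, not the repo’s actual implementation): factor a frozen weight once via SVD, then train only the per-singular-value scaling vector z.

```python
import torch
import torch.nn as nn

class SVFLinear(nn.Module):
    """Frozen SVD factors plus a trainable scaling vector z:
    W' = U (Sigma * z) V^T."""
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        # Frozen factors from the pretrained weight.
        self.register_buffer("U", U)
        self.register_buffer("S", S)
        self.register_buffer("Vh", Vh)
        # z is the only trainable tensor: one scalar per singular value.
        self.z = nn.Parameter(torch.ones_like(S))

    def forward(self, x):
        W_adapted = self.U @ torch.diag(self.S * self.z) @ self.Vh
        return x @ W_adapted.T

# Example with a random stand-in weight: only 2048 parameters train.
layer = SVFLinear(torch.randn(2048, 2048))
out = layer(torch.randn(4, 2048))
print(out.shape, sum(p.numel() for p in layer.parameters() if p.requires_grad))
```

Because z has just one scalar per singular value, the trainable footprint stays tiny relative to the frozen weight it adapts.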

---

## How It Works

Ask "How many R’s in ‘strawberry’?" If it says "2" and you say "no," GESAL learns to say "3" next time, avoiding repeats.

---

## Try It

Built with Hugging Face’s `transformers`:

    pip install transformers torch numpy
    python Adaptive_Learning_(GESAL).py

Needs a Hugging Face token for Llama-3.2-1B.

---

## Results

GESAL reaches 95% accuracy after 5 feedback interactions vs. LoRA’s 70%, while staying efficient (~0.5M trainable params) and scalable.
wassemgtk 
posted an update almost 2 years ago
The Writer team had the opportunity to run an eval for Mixtral-8x22b; the results were interesting.

| Benchmark | Score |
| --- | --- |
| #mmlu | 77.26 |
| #hellaswag | 88.81 |
| #truthfulqa | 52.05 |
| #arc_challenge | 70.31 |
| #winogrande | 84.93 |
| #gsm8k | 76.65 |
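
The post doesn’t say which harness produced these numbers; if you want to run the same task mix yourself, one common route is EleutherAI’s lm-evaluation-harness, roughly like this (the tooling and task names are my assumption, not necessarily the Writer team’s setup):

```python
# Sketch only: assumes lm-evaluation-harness (lm_eval >= 0.4) and enough GPU
# memory for Mixtral-8x22B; task names follow the harness's conventions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mixtral-8x22B-v0.1,dtype=bfloat16",
    tasks=["mmlu", "hellaswag", "truthfulqa_mc2",
           "arc_challenge", "winogrande", "gsm8k"],
    batch_size=8,
)
print(results["results"])
```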
wassemgtk 
posted an update almost 2 years ago
We are thrilled to announce the release of the OmniACT dataset! This dataset and benchmark pushes the limits of how virtual agents can automate everyday computer tasks. Imagine less clicking and typing, and more watching as your computer organizes schedules or makes travel arrangements on its own.

Check it out ➡️ [OmniACT Dataset on Hugging Face](https://huggingface.co/datasets/Writer/omniact)

For a deep dive, here’s the paper: [OmniACT Paper](https://arxiv.org/abs/2402.17553)
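
If you want to poke at the data, loading it through the `datasets` library should work along these lines (assuming the default config loads without extra arguments):

```python
from datasets import load_dataset

# Pull the OmniACT dataset from the Hub and take a quick look at its splits.
omniact = load_dataset("Writer/omniact")
print(omniact)
```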