Echo-DSRN-114M: A Constant-Memory O(1) Semantic Compressor
While traditional models target general conversational reasoning, Echo-DSRN is a specialized structural prototype: a dual-state recurrent neural network engineered strictly for low-latency semantic compression and continuous text streaming with a permanent O(1) memory footprint.
⚙️ Echo-DSRN (114M Parameters) manages context via continuous structural compression:
- 8 Layers | 512 Hidden Dim
- Transformer Fast State + DSRN/GRU Recurrent Slow State + Surprise Gating
Initial pre-training on a single AMD Instinct MI300X, followed by localized refinement across AMD Radeon PRO GPUs and an AMD Ryzen AI Max+ 395 (Strix Halo).
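To make the dual-state idea concrete, here is a minimal NumPy sketch of how a slow recurrent state could be blended with a fast (Transformer-side) summary under a surprise gate. All names (`slow_state_update`, `W_z`, `W_h`, `surprise`) are illustrative assumptions, not the actual Echo-DSRN internals:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def slow_state_update(slow, fast_summary, W_z, W_h, surprise):
    """One hypothetical slow-state step (not the real Echo-DSRN code).

    A GRU-style update gate decides how much of a candidate state,
    computed from the fast Transformer summary, to mix into the
    persistent slow state. The scalar `surprise` in [0, 1] scales the
    gate: routine tokens barely touch the slow state, surprising ones
    rewrite more of it. The state size is fixed, so memory stays O(1)
    no matter how long the stream runs.
    """
    joint = np.concatenate([slow, fast_summary])
    z = sigmoid(W_z @ joint)            # update gate, elementwise in (0, 1)
    candidate = np.tanh(W_h @ joint)    # proposed new state content
    gate = surprise * z                 # surprise throttles the whole gate
    return (1.0 - gate) * slow + gate * candidate
```

With `surprise=0.0` the slow state passes through unchanged, which is the property that keeps the compressed context stable between salient events.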
🖥️ A Hugging Face Space showcasing the architecture is currently running on the free shared CPU tier. It offers three demos:
- The Compressor: Ingest a long document and crush it into a fixed 2048-dimensional .npy state vector.
- Vector Similarity: Upload two compressed .npy states to instantly calculate cosine similarity for ultra-lightweight RAG pre-filtering.
- The CPU Streamer: Continuous, fluent text generation running on raw CPU compute.
⚠️ Disclaimer: This is a structural prototype. It has internalized formatting and conversational syntax, but it possesses zero world knowledge. It will confidently hallucinate. Use it for streaming transcription, style mimicry, and local semantic hashing, not for factual reasoning.
Try the CPU Demo: ethicalabs/Echo-DSRN-114M
Try the Model: ethicalabs/Echo-DSRN-114M