🦜 ARCHE3-Animal β€” A Biologically-Inspired Cognitive Architecture

Open Synapse Labs β€” Intelligence for a Better World

"First and foremost, we create intelligence, not neural networks. Because only this way can we achieve results that truly affect the reality we live in."

β€” Ilya Osovskoy, Founder of Open Synapse Labs

License: CC BY-NC-SA 4.0 Python 3.10+ PyTorch Steps GitHub


Overview

ARCHE3-Animal is a 1-billion-parameter neural architecture designed as a cognitive simulation of the African Grey Parrot (Psittacus erithacus) β€” one of the most cognitively sophisticated animals on Earth. This is a research project from Open Synapse Labs exploring what it means to build intelligence from the ground up: not by scaling transformers, but by designing systems that learn the way biological brains do.

The model is built around two core innovations: Hebbian learning for expert specialization and Dopamine System v2 that modulates the balance between general and specialized knowledge in real time. These are not retrofitted add-ons β€” they are the primary learning mechanisms. Backpropagation is used only for the Dense Core; the 7,200 expert zones learn exclusively through dopamine-modulated Hebbian updates, with no gradient flow through them whatsoever.

The model has been trained for 83,480 steps across all four cognitive hemispheres.

What is available:

  • βœ… Model weights (this repository) β€” free to use for research and non-commercial purposes
  • βœ… Reference architecture code on our GitHub: github.com/opensynapselabs

What is not available:

  • ❌ Training and initialization scripts (proprietary)
  • ❌ Dopamine System v2 source (proprietary)
  • ❌ Dataset generation scripts (proprietary)

The weights may not be sold or used for any commercial purpose. See license for full terms.

If you are a researcher and want to learn more, or if you want to join the team β€” write to: opensynapselabs@proton.me

Code was written with assistance from Claude (Anthropic) to accelerate development. All algorithms β€” Hebbian learning rules, Dopamine System v2, SmartRouter, and HiveStore β€” are original work by Open Synapse Labs.


Architecture

System Overview

Input (tokens or audio)
        ↓
Token Embedding  [vocab: 16,000 tokens, d_model: 1024]
        ↓
Dense Core Input  [3 transformer layers, 8 heads, GQA n_kv=2]
        ↓
SmartRouter β†’ selects top-3 experts from 7,200 zones
        ↓
HiveStore Expert Load [mmap from disk, only 3 experts active per step]
        ↓
Expert Projections  [rank: 128 β†’ d_ff: 2048, SiLU activation]
        ↓
Dopamine-Modulated Fusion [dynamic gain: 0.5×–10.0Γ—]
        ↓
Dense Core Fusion  [3 synthesis layers]
        ↓
Output logits [vocab_size: 16,000]

In parallel (after every step):
Dopamine System v2 β†’ Sparse Hebbian Update β†’ 3 active experts written back to HiveStore

Cognitive Hemispheres

The model is divided into four semantic hemispheres, each containing 1,800 expert zones:

Hemisphere Zones Specialization Token Share
AudioMimicry 1,800 Sound mimicry, acoustic patterns ~23%
AgencyExploration 1,800 Problem-solving, tool use, causality ~31%
SocialResonance 1,800 Empathy, social dynamics, attachment ~15%
EnvironmentMapping 1,800 Spatial navigation, mental maps, landmarks ~31%

Hemispheres are not isolated silos. During generation, the model switches between them 6–8 times per 10 tokens β€” a direct reflection of distributed processing observed in biological neural systems.

Audio Tokenization (VQ)

Audio input is processed through a discrete tokenization pipeline:

MP3/WAV [16kHz, mono]
    ↓
Mel-Spectrogram [80 bins, hop=160ms]
    ↓
VQ Encoder [256 β†’ 128 dim]
    ↓
VectorQuantizer [1024 codebook, commitment_cost=0.25]
    ↓
Discrete tokens <vq_N> [N ∈ 0..1023]
    ↓
Decoder [128 β†’ 1024 (d_model)]

78 unique VQ tokens were extracted from 11 MP3 files (~4 hours of parrot audio), producing 943,310 token instances. The tokenizer was expanded from 20 base tokens to 2,357 (2,279 text + 78 VQ).

HiveStore β€” Memory-Mapped Expert Storage

All 7,200 experts are stored in a single binary file animal_hive.bin and accessed via mmap. This is the mechanism that makes a 1B-parameter model trainable on consumer hardware:

Component On Disk RAM During Training
Dense Core (dense_core.bin) ~0.24 GB Fully loaded β€” updated via backprop every step
HiveStore (animal_hive.bin) ~1.76 GB Only 3 active experts (~1.5 MB) streamed per step

Dense Core must reside fully in RAM during training because it is updated via backpropagation on every step. Expert zones, by contrast, are updated one at a time through Hebbian learning and are streamed from disk on demand β€” the remaining 7,197 experts are never touched during a given step.

Before optimization:  10+ GB RAM (all experts as nn.Parameter)
After:                 0.66 GB RAM total

Hebbian Learning & Dopamine System v2

These two mechanisms are the core of what makes ARCHE3-Animal distinct from standard transformer architectures. They replace the optimizer for all expert weights.

Hebbian Learning

Expert zones do not use gradient descent. They are updated through a Sparse Hebbian rule, where only the 3 zones selected by the SmartRouter receive weight updates after each training step:

hive.read_expert(id) β†’ compute Hebbian delta β†’ hive.write_expert(id, updated_weights)

The magnitude of the update is modulated by the current dopamine signal β€” zones that contributed to high-reward outcomes are reinforced more strongly. Zones that were not selected during a step are untouched. This creates a form of continual learning where knowledge accumulates in specialist zones without catastrophic forgetting in the rest.

Expert weights are initialized using Xavier initialization (std β‰ˆ 0.0417), which ensures non-zero initial activations and stable Hebbian dynamics from the very first training step.

Dopamine System v2

The Dopamine System v2 solves a fundamental signal imbalance. Without intervention, the Dense Core output was approximately 10,000Γ— larger in magnitude than the expert (Hive) output, effectively drowning out all specialist knowledge in the fusion step.

Dopamine-Modulated Gain addresses this by dynamically scaling the expert contribution based on the current reward state:

Fusion_Input = Dense_Core_Output + Dopamine_Gain Γ— Hive_Output

Dopamine_Gain = gain_min + (gain_max βˆ’ gain_min) Γ— dopamine_level
dopamine_level ∈ [0, 1]   ← EMA of normalized reward (Ξ± = 0.9/0.1)
Dopamine Level Gain System Behavior
0.1 (low reward) 1.45Γ— Dense Core dominates β€” stability phase
0.5 (neutral) 5.25Γ— Balanced
0.9 (high reward) 9.05Γ— Experts dominate β€” specialization phase

The EMA smoothing (coefficient 0.9/0.1) prevents sharp oscillations and gives the system inertia β€” a single poor step does not collapse expert influence. This directly mirrors neuromodulation in biological brains, where dopamine gates how strongly specialized circuits influence downstream processing.

Additional Dopamine System v2 components:

  • Creation bonus β€” extra reward for activating rarely-used zones, encouraging exploration of underutilized expert knowledge
  • 4th sense (intuition) β€” an internal drive vector that modulates zone activation based on accumulated training signal
  • Novelty tracking β€” zones processing genuinely novel inputs receive boosted dopamine signals (Ο„ = 1,000 steps decay)

Research Findings

All results below were obtained through systematic analysis scripts run after training.

1. Routing Stability

Test: input "hello" repeated 20 times.

  • Hemisphere stability: 100% (all 20 β†’ Hemisphere 0 / AudioMimicry)
  • Zone reuse rate: 95% (same 3 zones out of 60 possible)
  • Routing: fully deterministic (softmax + argmax)

Identical inputs always activate identical zones. This is a prerequisite for efficient inference caching and predictable model behavior.

2. Generation Diversity

Test: 10 independent generations from prompt "hello".

1. "hello bird pretty whistle trill chirp"
2. "hello boy trill trill <sound> good"
3. "hello whistle pretty click cracker <mimic>"
4. "hello hello <bos> growl whistle growl"
5. "hello <mimic> cracker <eos> click chirp"

Diversity score: 1.0 (100% unique outputs)

With 95% routing stability, generation is still 100% diverse. The same zones produce genuinely different content β€” diversity emerges from sampling, not routing variation. Stable routing with stochastic sampling is the optimal operating point.

3. Zone Specialization

Category Unique Zones Activations Specialization Score
Greetings (hello Γ— 3 variants) 3 9 0.667 (HIGH)
Sounds (squawk, whistle, chirp) 9 9 0.000 (LOW)
Social (good boy, pretty bird) 6 6 0.000 (LOW)

All three "hello" variants activated exactly the same 3 zones: [97, 1373, 46]. Frequent patterns develop dedicated zone ensembles β€” a direct parallel to specialized neural assemblies in biological memory systems.

4. Memorization vs. Generalization

Test 1 β€” Training phrase recall: The model predicts out-of-vocabulary tokens with low confidence (0.05–0.07%). No mechanical memorization of training data.

Test 2 β€” Novel combinations:

"hello hello" β†’ "hello hello hello bird trill"    βœ… generalizes
"good good"   β†’ "good good boy <silence> <sound>" βœ… generalizes
"boy bird"    β†’ "boy bird click growl <bos>"       βœ… generalizes

Test 3 β€” Compositional hemisphere routing:

"hello" (H3) + "boy" (H0)   β†’ "hello boy"   routes to H3  βœ…
"good"  (H2) + "bird" (H2)  β†’ "good bird"   routes to H2  βœ…
"pretty" (H3) + "boy" (H0)  β†’ "pretty boy"  routes to H3  βœ…

Combined inputs inherit the routing of their dominant component. The model has learned compositional semantics, not surface patterns.

Test 4 β€” Zone activation overlap:

Repetition scaling ("hello", "hello hello", "hello hello hello"):

  • All activate the same 3 zones [339, 796, 1556]
  • Overlap: 100% β€” shared representations for repetition patterns

Sound category ("squawk", "whistle", "chirp"):

  • Activate completely different zones
  • Overlap: 0% β€” fully specialized representations per concept

5. Cross-Hemisphere Dynamics

Key discovery: The model switches hemispheres 6–8 times during 10-token generation.

"hello" β†’ AudioMimicry β†’ AgencyExploration β†’ AudioMimicry β†’ AudioMimicry β†’
           AgencyExploration β†’ AudioMimicry β†’ EnvironmentMapping β†’ AudioMimicry β†’
           AgencyExploration β†’ SocialResonance

Transition rate: 60–80%. No single hemisphere dominates generation. All four hemispheres participate in producing every output sequence.

Context-independent routing (surprising finding):

Testing "hello" embedded in four different contexts:

Context Hemisphere Probability Zones
"hello" alone AudioMimicry 0.304 [1045, 703, 1207]
"good hello" AudioMimicry 0.304 [1045, 703, 1207]
"squawk hello" AudioMimicry 0.304 [1045, 703, 1207]
"bird hello" AudioMimicry 0.304 [1045, 703, 1207]

Individual tokens route deterministically regardless of surrounding context. Context affects what gets generated next, not how the current token routes. This property is highly desirable for inference-time caching of routing decisions.

Close hemisphere competition:

Input Top Hemisphere Score 2nd Hemisphere Score Diff
"hello boy" EnvironmentMapping 0.340 SocialResonance 0.298 0.043 ⚠️
"squawk hello" AudioMimicry 0.304 EnvironmentMapping 0.273 0.031 ⚠️
"pretty whistle" AudioMimicry 0.613 SocialResonance 0.184 0.430 βœ…

Mixed-semantics inputs produce close competition between top hemispheres (diff < 0.15), suggesting both contribute. Aligned-semantics inputs produce a clear single winner (diff > 0.4).

6. Weight Statistics

Hemisphere Mean Std Dev Sparsity
AudioMimicry -0.000000 0.00276 0.0%
AgencyExploration -0.000001 0.01754 0.0%
SocialResonance 0.000006 0.02454 0.0%
EnvironmentMapping -0.000005 0.01676 0.0%

Zero sparsity across all hemispheres. No dead neurons. Hebbian learning without backpropagation produces healthy, fully-utilized weight distributions β€” no explicit sparsity regularization was required.


Technical Specification

Config:
  d_model         = 1024
  vocab_size      = 16,000
  n_layers_input  = 3
  n_layers_fusion = 3
  n_heads         = 8
  n_kv_heads      = 2          # Grouped Query Attention
  d_ff_dense      = 1536

  num_hemispheres      = 4
  zones_per_hemisphere = 1800
  total_zones          = 7200

  expert_rank = 128
  expert_d_ff = 2048
  k_max       = 3              # top-k sparse activation

  dopamine_gain_min    = 0.5
  dopamine_gain_max    = 10.0
  dopamine_alpha       = 0.4
  dopamine_beta        = 0.3
  novelty_tau          = 1000
  creation_weight      = 0.6

  sample_rate = 16000          # Hz
  n_mels      = 80
  hop_length  = 160            # 10ms frames

  VQ_codebook_size    = 1024
  VQ_dim              = 128
  VQ_commitment_cost  = 0.25

Total parameters: ~1B (1,024M)


Weight Storage

Expert weights and the Dense Core are stored as flat binary files, updated in-place after every training step:

  • dense_core.bin β€” Dense Core weights, updated via backprop (fully in RAM during training)
  • animal_hive.bin β€” 7,200 expert zones, updated via Hebbian learning (streamed via mmap)

No external save() calls. No checkpoint duplication. The model state is always current.


Roadmap

  • VQ audio tokenization implemented and tested
  • Dopamine-Modulated Gain (System v2)
  • HiveStore mmap architecture (RAM: 10+ GB β†’ 0.66 GB)
  • Hebbian learning without backpropagation for experts
  • Cross-hemisphere research and analysis
  • 83,480 training steps completed
  • Extended training on expanded audio and text datasets
  • Arche4-rift β€” next-generation model applying ARCHE3 findings to code generation

License

Model weights are released under CC BY-NC-SA 4.0.

  • βœ… Free to download and use for research
  • βœ… Free to share with attribution
  • ❌ Cannot be used commercially
  • ❌ Cannot be sold or monetized
  • ❌ Modifications and fine-tunes allowed, but must use the same CC BY-NC-SA 4.0 license

Citation

@misc{osovskoy2026arche3animal,
  title        = {ARCHE3-Animal: A Biologically-Inspired Sparse MoE Architecture
                  Simulating African Grey Parrot Cognition},
  author       = {Osovskoy, Ilya},
  year         = {2026},
  organization = {Open Synapse Labs},
  url          = {https://huggingface.co/opensynapselabs/arche3-animal},
  note         = {Code written with assistance from Claude (Anthropic).
                  All algorithms are original work by Open Synapse Labs.}
}

Open Synapse Labs

opensynapselabs@proton.me Β· github.com/opensynapselabs

"First and foremost, we create intelligence, not neural networks. Because only this way can we achieve results that truly affect the reality we live in."

β€” Ilya Osovskoy, Founder of Open Synapse Labs

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support