SymbolicLight V1 Open Package
This repository is the public release root for SymbolicLight V1. It combines two parts of the project narrative in one place:
- The 194M Dual-Path pre-training story, which provides the main controlled language-modeling evidence.
- The 0.8B scale-up release, which provides a public checkpoint, tokenizer, executable code paths, and artifact-based verification material.
The repository should therefore be read as a unified public package, not as two separate model lines. The architecture name used throughout the current paper, code, and public materials is SymbolicLight V1.
Package Layout
- LICENSE: Apache License, Version 2.0, covering the released code and public assets unless otherwise stated
- WEIGHTS_LICENSE.md: license scope for the cleaned model weights and tokenizer assets
- MODEL_CARD.md: model card for the released SymbolicLight V1 checkpoint
- NOTICE: copyright and release-boundary notice
- THIRD_PARTY_NOTICES.md: third-party dependency notices
- src/: public Python implementation, training loop, tokenizer tooling, and inference scripts
- tokenizer/: released tokenizer model, vocabulary, and configuration
- weights/pytorch/latest.pt: released weights-only 0.8B checkpoint
- paper/: unified English manuscript, Chinese companion manuscript, bibliography, and compiled PDFs
- artifacts/: public smoke-test logs and checkpoint metadata summaries
- docs/: project lineage, 194M training narrative, and release-facing documentation
- REPRODUCIBILITY.md: artifact-based reproducibility scope and verified commands
- train_runs_194m.json: registry for the four main 194M runs and selected historical comparison checkpoints
What This Package Conveys
The intended public narrative is:
- 194M SymbolicLight V1 is the main controlled study. It establishes that a spike-gated dual-path language model can train stably at high activation sparsity and remain competitive with dense baselines.
- 0.8B SymbolicLight V1 is scale-up evidence. It shows that the same overall architectural direction can be extended to a larger native pre-training run.
- The public release supports artifact inspection, checkpoint verification, inference verification, and smoke-test training. It does not provide full public reconstruction of the original private pre-training corpus.
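In practice, the checkpoint verification mentioned above usually amounts to hashing the released weights file and comparing the digest against a published value. A minimal sketch of that pattern follows; the helper name `sha256_of` and the demo file are illustrative, and the repository's actual verification commands are documented in REPRODUCIBILITY.md:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints use constant memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Demo on a throwaway file; against the real release you would point this
    # at weights/pytorch/latest.pt and compare with the published digest.
    demo = Path("demo.bin")
    demo.write_bytes(b"example checkpoint bytes")
    print(sha256_of(demo))
    demo.unlink()
```

The streaming read matters for a 0.8B-parameter checkpoint, which is far too large to hash comfortably as a single in-memory buffer.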
Recommended Reading Order
Public Boundary
The release includes code, tokenizer assets, cleaned model weights, and artifact-level documentation. Training and validation corpora are not distributed with this repository; only aggregate data categories, mixture proportions, and preprocessing rules are documented.
License
The Apache-2.0 license applies to the released code, tokenizer assets, cleaned model weights, and public documentation. It does not apply to training or validation corpora, which are not distributed with this repository. The public data disclosure is limited to aggregate domain categories, mixture proportions, and preprocessing rules.