| license: apache-2.0 | |
| base_model: Qwen/Qwen3-1.7B | |
| tags: | |
| - scaling-laws | |
| - neural-scaling | |
| - performance-prediction | |
| - configuration-to-performance | |
| - pytorch | |
| library_name: transformers | |
| # NCPL-intermediate: Neural Configuration to Performance Scaling Law | |
| This model predicts the performance of neural network configurations using scaling laws. It is trained on the Marin and StepLaw datasets to forecast performance metrics based on model configurations. | |
| ## Model Description | |
| **NCPL-intermediate** (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that: | |
| - Takes pretraining configurations as input | |
| - Predicts intermediate performance metrics using learned scaling law patterns | |
| - Combines text embeddings from a base transformer with numeric value processing through a dedicated MLP | |
| - Covers configurations drawn from multiple scaling law datasets (Marin, StepLaw) | |
| ### Architecture | |
| The model consists of: | |
| 1. **Base Model**: Qwen/Qwen3-1.7B | |
| - Provides contextual embeddings for text tokens | |
| 2. **Numeric MLP**: | |
| - Processes numeric values (performance metrics, configuration parameters) | |
| - Projects numeric inputs to the same hidden dimension as text embeddings | |
| - Architecture: Linear(1 → 2*hidden_size) → ReLU → Linear(2*hidden_size → hidden_size) | |
| 3. **Prediction Head**: | |
| - Linear layer mapping from hidden_size to scalar predictions | |
| - Outputs performance forecasts for each token position | |
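The three components above can be sketched in a few lines of PyTorch. This is only an illustration under stated assumptions, not the repository's implementation: the class name `NumericMLP`, the helper `merge_embeddings`, and the choice to substitute the MLP output for the text embedding at numeric positions are hypothetical, and the base transformer that would sit between the merge and the head is omitted so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


class NumericMLP(nn.Module):
    """Linear(1 -> 2*hidden) -> ReLU -> Linear(2*hidden -> hidden), as described above."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 2 * hidden_size),
            nn.ReLU(),
            nn.Linear(2 * hidden_size, hidden_size),
        )

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (batch, seq) -> (batch, seq, hidden_size)
        return self.net(values.unsqueeze(-1))


def merge_embeddings(text_emb, num_emb, is_number_mask):
    # Assumption: numeric positions take the MLP output, all others keep the text embedding.
    return torch.where(is_number_mask.unsqueeze(-1), num_emb, text_emb)


hidden_size = 16  # toy size for the sketch, not the real hidden size of the base model
numeric_mlp = NumericMLP(hidden_size)
prediction_head = nn.Linear(hidden_size, 1)

# Stand-in for the base model's token embeddings.
text_emb = torch.randn(2, 5, hidden_size)
is_number_mask = torch.tensor(
    [[False, True, False, True, False],
     [False, False, True, False, False]]
)
number_values_filled = torch.tensor(
    [[0.0, 3e-4, 0.0, 256.0, 0.0],
     [0.0, 0.0, 1.7, 0.0, 0.0]]
)

merged = merge_embeddings(text_emb, numeric_mlp(number_values_filled), is_number_mask)
# In the real model the merged embeddings would pass through the transformer here.
predictions = prediction_head(merged).squeeze(-1)  # one scalar per token position
```

With these shapes, `predictions` holds one forecast per token position, matching the description of the prediction head.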
| ## Training Data | |
| The model was trained on: | |
| - **Datasets**: Marin and StepLaw scaling law datasets | |
| - **Training configuration**: | |
| - Stage 1: 10 epochs with learning rate 5e-5 (frozen base model) | |
| - Stage 2: 400 epochs with learning rate 1e-5 (full fine-tuning) | |
| - Batch size: 480 (across 8 GPUs) | |
| - Weight decay: 0.01 | |
| - Loss: MSE (Mean Squared Error) | |
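The two-stage schedule listed above can be illustrated with a toy model. The sketch below is a hedged approximation: `TinyForecaster` and `make_optimizer` are hypothetical names, the optimizer (AdamW) is an assumption since the card does not name one, and only two steps per stage are run instead of the stated epoch counts. The per-GPU batch size also assumes an even split across devices with no gradient accumulation.

```python
import torch
import torch.nn as nn


class TinyForecaster(nn.Module):
    """Toy stand-in: `base` plays the role of the transformer, `head` the prediction head."""

    def __init__(self, hidden_size: int = 8):
        super().__init__()
        self.base = nn.Linear(hidden_size, hidden_size)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        return self.head(torch.relu(self.base(x))).squeeze(-1)


def make_optimizer(model, lr, freeze_base):
    # Stage 1 freezes the base model; stage 2 unfreezes it for full fine-tuning.
    for p in model.base.parameters():
        p.requires_grad_(not freeze_base)
    trainable = [p for p in model.parameters() if p.requires_grad]
    # Assumption: AdamW; the weight decay of 0.01 comes from the card.
    return torch.optim.AdamW(trainable, lr=lr, weight_decay=0.01)


stages = [
    {"epochs": 10, "lr": 5e-5, "freeze_base": True},    # Stage 1
    {"epochs": 400, "lr": 1e-5, "freeze_base": False},  # Stage 2
]
per_gpu_batch = 480 // 8  # assumes an even split and no gradient accumulation

model = TinyForecaster()
loss_fn = nn.MSELoss()
x, y = torch.randn(4, 8), torch.randn(4)

trainable_counts = []
for stage in stages:
    optimizer = make_optimizer(model, stage["lr"], stage["freeze_base"])
    trainable_counts.append(
        sum(p.numel() for p in model.parameters() if p.requires_grad)
    )
    for _ in range(2):  # toy loop; a real run would iterate stage["epochs"] epochs
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

Rebuilding the optimizer at each stage boundary keeps the frozen parameters out of the first stage's parameter groups entirely, which is one simple way to express "frozen base model" followed by "full fine-tuning".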
| ## Usage | |
| The `ScalingLawForecaster` class can be found in the [GitHub repository](https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law). | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer | |
| # Get ScalingLawForecaster from: https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law | |
| from model import ScalingLawForecaster | |
| # Load model | |
| model = ScalingLawForecaster( | |
|     base_model_name="Qwen/Qwen3-1.7B", | |
|     init_from_pretrained=True, | |
|     force_fp32=True, | |
| ) | |
| # Load checkpoint | |
| checkpoint = torch.load("pytorch_model.bin", map_location="cpu") | |
| model.load_state_dict(checkpoint["model_state_dict"]) | |
| model.eval() | |
| # Load tokenizer | |
| tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B") | |
| # Prepare inputs | |
| # input_ids: tokenized text sequence | |
| # is_number_mask: boolean mask indicating which tokens are numeric | |
| # number_values_filled: actual numeric values (0 for non-numeric tokens) | |
| with torch.no_grad(): | |
|     predictions = model( | |
|         input_ids=input_ids, | |
|         is_number_mask=is_number_mask, | |
|         number_values_filled=number_values_filled, | |
|         attention_mask=attention_mask, | |
|     ) | |
| ``` | |
| ## Input Format | |
| The model expects three key inputs, alongside the standard `attention_mask`: | |
| 1. **input_ids** (torch.LongTensor): Tokenized sequence with special numeric tokens | |
| 2. **is_number_mask** (torch.BoolTensor): Boolean mask marking numeric token positions | |
| 3. **number_values_filled** (torch.FloatTensor): Actual numeric values at marked positions | |
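To make the relationship between the three inputs concrete, here is a minimal pure-Python sketch of how a mask and a value list can be kept aligned with a sequence. It is an illustration only: the placeholder string `<num>`, the helper `split_numeric`, and the whitespace splitting are all hypothetical, and the actual special token and preprocessing are defined in the GitHub repository.

```python
NUM_PLACEHOLDER = "<num>"  # hypothetical placeholder; the real special token may differ


def split_numeric(pieces):
    """Replace numeric pieces with a placeholder and record their values.

    Returns three aligned lists: tokens, a boolean mask, and float values
    (0.0 at every non-numeric position).
    """
    tokens, mask, values = [], [], []
    for piece in pieces:
        try:
            value = float(piece)
        except ValueError:
            # Not a number: keep the text and fill the value slot with 0.0.
            tokens.append(piece)
            mask.append(False)
            values.append(0.0)
        else:
            # A number: hide the text behind the placeholder and keep the value.
            tokens.append(NUM_PLACEHOLDER)
            mask.append(True)
            values.append(value)
    return tokens, mask, values


pieces = "learning_rate = 0.001 , batch_size = 256".split()
tokens, is_number_mask, number_values_filled = split_numeric(pieces)
```

In a real pipeline the resulting lists would be converted to `torch.LongTensor`, `torch.BoolTensor`, and `torch.FloatTensor` objects of identical shape, after mapping the tokens to ids with the Qwen tokenizer.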
| ## Intended Use | |
| This model is designed for: | |
| - **Scaling law research**: Understanding how neural network performance scales with configuration | |
| - **Performance forecasting**: Predicting model performance before full training | |
| - **Configuration optimization**: Finding optimal hyperparameters based on scaling patterns | |
| - **Resource planning**: Estimating computational requirements for different model sizes | |
| ## Limitations | |
| - Trained specifically on the Marin and StepLaw datasets; generalization to other settings likely requires at least fine-tuning | |
| - Requires properly formatted inputs with numeric tokens replaced and masked | |
| ## Citation | |
| If you use this model in your research, please cite: | |
| ```bibtex | |
| @article{ncpl2026, | |
| title = {Neural Configuration to Performance Scaling Law}, | |
| author = {Huaqing Zhang and Kaiyue Wen and Tengyu Ma}, | |
| journal = {arXiv preprint arXiv:2602.10300}, | |
| year = {2026}, | |
| url = {https://www.arxiv.org/abs/2602.10300} | |
| } | |
| ``` | |