| --- |
| license: apache-2.0 |
| tags: |
| - devsecops |
| - llm |
| - sft |
| - lora |
| - tulu-3 |
| - kubernetes |
| - terraform |
| --- |
| |
| # DevSecOps Model Platform |
|
|
| > Train a secure model on the best data, then deploy it securely. |
|
|
| ## Start Here: Train Your Model |
|
|
| | Dataset | Size | What It Gives You | Command | |
| |---------|------|-------------------|---------| |
| | **tulu-3-sft-mixture** | 940K | Math, code, safety, chat (BEST) | python model/train_tulu3.py | |
| | **OpenThoughts-114k** | 114K | Reasoning, chain-of-thought | python model/train_openthoughts.py | |
|
|
| **allenai/tulu-3-sft-mixture** is from Allen AI Tulu 3 - current SOTA open instruction-tuned model. Proven on Llama-3.1-8B: MMLU 53.5, GSM8K 79.9, HumanEval 76.8. |
|
|
| LoRA config from LoRA Without Regret (Schulman 2025): r=256, alpha=16, all-linear = matches full fine-tuning at 67% compute. |
|
|
| ## Repository Structure |
|
|
| ``` |
| model/ THE MODEL - train, serve, enhance |
| train_tulu3.py Primary: 940K best data (zero preprocessing) |
| train_openthoughts.py Reasoning: 114K CoT traces |
| finetune_configurable.py Multi-dataset configurable trainer |
| rag_pipeline.py RAG for DevSecOps knowledge |
| DATASETS.md Why these datasets, proven recipes |
| |
| deployment/ SERVE IT - Kubernetes + Docker + vLLM |
| deployment.yaml ML inference K8s manifest |
| mlflow-deployment.yaml Experiment tracking |
| Dockerfile.ml-inference Hardened multi-stage image |
| |
| security/ PROTECT IT - scanning + policies |
| scanning/ Trivy, Semgrep, Checkov, SBOM |
| policies/ Kyverno, OPA Gatekeeper |
| |
| infrastructure/ RUN IT - Terraform + monitoring + CI/CD |
| terraform/ VPC, EKS, RDS, S3, IAM, KMS, GuardDuty, Macie |
| monitoring/ Prometheus, Alertmanager, OTEL, Grafana |
| ci-cd/ GitHub Actions DevSecOps pipeline |
| |
| compliance/ CERTIFY IT - SOC2, NIST, CIS |
| controls-mapping.yaml SOC2 Type II |
| nist-800-53-mapping.yaml NIST 800-53 Rev5 |
| cis-eks-k8s.yaml CIS Benchmarks |
| ``` |
|
|
| ## Quick Commands |
|
|
| ```bash |
| # Train on best data (A100, ~6h) |
| python model/train_tulu3.py |
| |
| # Quick test (any GPU) |
| python model/train_tulu3.py --max_steps 100 --no_push |
| |
| # Security scan |
| python security/scanning/security_audit.py |
| |
| # Deploy model to K8s |
| kubectl apply -f deployment/deployment.yaml |
| |
| # Infrastructure (Terraform) |
| cd infrastructure/terraform/environments/prod && terraform apply |
| ``` |
|
|