---
license: apache-2.0
tags:
- devsecops
- llm
- sft
- lora
- tulu-3
- kubernetes
- terraform
---

# DevSecOps Model Platform

> Train a secure model on the best data, then deploy it securely.

## Start Here: Train Your Model

| Dataset | Size | What It Gives You | Command |
|---------|------|-------------------|---------|
| **tulu-3-sft-mixture** | 940K | Math, code, safety, chat (BEST) | `python model/train_tulu3.py` |
| **OpenThoughts-114k** | 114K | Reasoning, chain-of-thought | `python model/train_openthoughts.py` |

**allenai/tulu-3-sft-mixture** is the SFT data mixture behind Allen AI's Tulu 3, a leading open instruction-tuned model family. Reported results on Llama-3.1-8B: MMLU 53.5, GSM8K 79.9, HumanEval 76.8.

The LoRA config follows *LoRA Without Regret* (Schulman 2025): r=256, alpha=16, adapters on all linear layers. This setup is reported to match full fine-tuning at 67% of the compute.
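
To make the LoRA numbers above concrete, here is a minimal arithmetic sketch of what r=256 and alpha=16 imply. The layer shapes and layer count are assumptions based on the commonly published Llama-3.1-8B architecture and are not taken from this repository. "All linear" is interpreted here as the attention and MLP projections only. The helper names are hypothetical.

```python
# Assumed Llama-3.1-8B projection shapes as (in_features, out_features).
# These values are NOT stated in this README; adjust them for your base model.
LLAMA_8B_LINEAR_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumed number of transformer blocks


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters for one adapted layer: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update (alpha / rank)."""
    return alpha / rank


def total_lora_params(shapes: dict, num_layers: int, rank: int) -> int:
    """Sum adapter parameters over every listed projection in every block."""
    per_layer = sum(lora_params(i, o, rank) for i, o in shapes.values())
    return per_layer * num_layers


if __name__ == "__main__":
    rank, alpha = 256, 16
    total = total_lora_params(LLAMA_8B_LINEAR_SHAPES, NUM_LAYERS, rank)
    print(f"scaling (alpha / r): {lora_scaling(alpha, rank)}")
    print(f"trainable LoRA parameters: {total:,}")
```

Under these assumptions the adapters add several hundred million trainable parameters, so r=256 is a high-rank configuration rather than a lightweight one. That matters when sizing GPU memory for the training run.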

## Repository Structure

```
model/                     THE MODEL - train, serve, enhance
  train_tulu3.py             Primary: 940K best data (zero preprocessing)
  train_openthoughts.py      Reasoning: 114K CoT traces
  finetune_configurable.py   Multi-dataset configurable trainer
  rag_pipeline.py             RAG for DevSecOps knowledge
  DATASETS.md                 Why these datasets, proven recipes

deployment/               SERVE IT - Kubernetes + Docker + vLLM
  deployment.yaml             ML inference K8s manifest
  mlflow-deployment.yaml      Experiment tracking
  Dockerfile.ml-inference     Hardened multi-stage image

security/                 PROTECT IT - scanning + policies
  scanning/                   Trivy, Semgrep, Checkov, SBOM
  policies/                   Kyverno, OPA Gatekeeper

infrastructure/           RUN IT - Terraform + monitoring + CI/CD
  terraform/                  VPC, EKS, RDS, S3, IAM, KMS, GuardDuty, Macie
  monitoring/                 Prometheus, Alertmanager, OTEL, Grafana
  ci-cd/                      GitHub Actions DevSecOps pipeline

compliance/               CERTIFY IT - SOC2, NIST, CIS
  controls-mapping.yaml       SOC2 Type II
  nist-800-53-mapping.yaml    NIST 800-53 Rev5
  cis-eks-k8s.yaml            CIS Benchmarks
```

## Quick Commands

```bash
# Train on best data (A100, ~6h)
python model/train_tulu3.py

# Quick test (any GPU)
python model/train_tulu3.py --max_steps 100 --no_push

# Security scan
python security/scanning/security_audit.py

# Deploy model to K8s
kubectl apply -f deployment/deployment.yaml

# Infrastructure (Terraform)
cd infrastructure/terraform/environments/prod && terraform apply
```
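
The quick-test command above passes `--max_steps` and `--no_push`. As a rough sketch of how a training script could expose such flags, the snippet below uses `argparse`. The `build_parser` function, the defaults, and the help texts are hypothetical and may not match what `model/train_tulu3.py` actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for a training script; flag semantics are assumptions."""
    parser = argparse.ArgumentParser(description="Training script CLI sketch")
    # Assumed convention: -1 means "no step cap, train for the full schedule".
    parser.add_argument("--max_steps", type=int, default=-1,
                        help="Stop after this many optimizer steps (-1 = no cap)")
    # Assumed meaning: skip uploading the trained model when the flag is set.
    parser.add_argument("--no_push", action="store_true",
                        help="Do not upload the resulting model after training")
    return parser


# Parse the same flags used in the quick-test command.
args = build_parser().parse_args(["--max_steps", "100", "--no_push"])
print(f"max_steps={args.max_steps}, push={'no' if args.no_push else 'yes'}")
```

A boolean `store_true` flag keeps the default run unchanged, so a full training run needs no extra arguments while a smoke test opts out of publishing explicitly.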