shaikhsalman committed on
Commit 82ebd41 · verified · 1 Parent(s): 59d9053

docs: OMEGA platform README with 10-dimension scorecard

Files changed (1)
  1. README.md +121 -131
README.md CHANGED
@@ -1,154 +1,144 @@
- # DevSecOps Platform: Production Reference Architecture
 
- > Enterprise-grade, security-first, automation-first platform covering the full DevOps, Cloud, Kubernetes, Security, AI/ML lifecycle.
 
- ## Architecture
-
- ```
- ┌──────────────────────────────────────────────────────────────────────┐
- │  AWS Cloud                                                           │
- │  ┌───────────┐  ┌───────────┐  ┌───────────┐                         │
- │  │   AZ-1a   │  │   AZ-1b   │  │   AZ-1c   │  Multi-AZ               │
- │  │ ┌───────┐ │  │ ┌───────┐ │  │ ┌───────┐ │                         │
- │  │ │  EKS  │ │  │ │  EKS  │ │  │ │  EKS  │ │  Kubernetes 1.29        │
- │  │ │ Node  │ │  │ │ Node  │ │  │ │ Node  │ │                         │
- │  │ └───────┘ │  │ └───────┘ │  │ └───────┘ │                         │
- │  │ ┌───────┐ │  │ ┌───────┐ │  │ ┌───────┐ │                         │
- │  │ │  RDS  │ │  │ │  RDS  │ │  │ │  RDS  │ │  PostgreSQL (Multi-AZ)  │
- │  │ │Replica│ │  │ │Primary│ │  │ │Replica│ │  + KMS Encryption       │
- │  │ └───────┘ │  │ └───────┘ │  │ └───────┘ │                         │
- │  └───────────┘  └───────────┘  └───────────┘                         │
- │                                                                      │
- │  VPC (10.0.0.0/16)                                                   │
- │  ├── Public Subnets  → ALB/NLB only                                  │
- │  ├── Private Subnets → EKS Nodes + NAT Gateway                       │
- │  └── DB Subnets      → RDS (no internet access)                      │
- │                                                                      │
- │  Security:      KMS │ WAF │ GuardDuty │ Macie │ IAM MFA              │
- │  Observability: CloudWatch │ VPC Flow Logs │ CloudTrail              │
- └──────────────────────────────────────────────────────────────────────┘
- ```
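The removed architecture diagram splits VPC 10.0.0.0/16 into three subnet tiers: public, private and DB. Each tier spans AZ-1a, AZ-1b and AZ-1c. The sketch below shows one way to carve that range with Python's standard `ipaddress` module. The equal /20 subnet size and the tier-by-AZ allocation order are assumptions for illustration. They are not values taken from the repository's Terraform modules.

```python
import ipaddress

# Assumed layout: 3 tiers x 3 AZs, each subnet an equal /20 carved from the VPC CIDR.
VPC_CIDR = "10.0.0.0/16"
TIERS = ["public", "private", "db"]   # tiers named in the diagram
AZS = ["1a", "1b", "1c"]              # AZ suffixes from the diagram
SUBNET_PREFIX = 20                    # hypothetical size: 4,096 addresses per subnet


def plan_subnets(vpc_cidr: str, tiers, azs, prefix: int) -> dict:
    """Map (tier, az) -> a non-overlapping subnet, allocated in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=prefix)  # iterator of consecutive /prefix blocks
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan


if __name__ == "__main__":
    for (tier, az), net in plan_subnets(VPC_CIDR, TIERS, AZS, SUBNET_PREFIX).items():
        print(f"{tier:<8} {az}  {net}")
```

Equal /20 blocks keep the arithmetic easy to check. A /16 holds sixteen /20s, so this plan uses nine and leaves seven free.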
 
- ## Kubernetes Platform Stack
 
  ```
- ┌──────────────────────────────────────────────┐
- │              Istio Service Mesh              │
- │           (mTLS STRICT + eBPF CNI)           │
- ├────────┬────────┬────────┬───────────────────┤
- │ ArgoCD │  Cert  │External│ Prometheus        │
- │ GitOps │Manager │Secrets │ Grafana           │
- │        │        │(AWS SM)│ Loki              │
- ├────────┴────────┴────────┴───────────────────┤
- │            Kyverno Policy Engine             │
- │     (Enforce: no root, no :latest, etc.)     │
- ├────────────────┬─────────┬───────────────────┤
- │ Trivy Operator │  Falco  │ OPA Gatekeeper    │
- │  (Image Scan)  │(Runtime)│ (Admission)       │
- └────────────────┴─────────┴───────────────────┘
  ```
 
- ## Directory Structure
-
- ```
- devsecops-platform/
- ├── terraform/          # Infrastructure as Code
- │   ├── modules/        # VPC, EKS, RDS, S3, IAM, KMS
- │   └── environments/   # dev, staging, prod configs
- ├── k8s/
- │   ├── base/           # Namespaces, RBAC, NetPols, Quotas
- │   ├── manifests/      # Platform services (ArgoCD, Istio, etc.)
- │   ├── helm-values/    # Helm chart overrides
- │   └── workloads/      # App deployments (frontend, backend, ml)
- ├── docker/
- │   ├── base-images/    # Multi-stage hardened Dockerfiles
- │   ├── scan-scripts/   # Trivy + Grype scanning
- │   ├── sign-scripts/   # Cosign image signing
- │   └── sbom-scripts/   # SPDX + CycloneDX SBOM generation
- ├── ci-cd/
- │   ├── github-actions/ # Full DevSecOps pipeline
- │   ├── jenkins/        # Jenkinsfile
- │   └── gitlab-ci/      # .gitlab-ci.yml
- ├── security/
- │   ├── checkov/        # IaC scanning config
- │   ├── semgrep/        # SAST custom rules
- │   ├── trivy/          # Container + secret scanning
- │   └── sbom/           # SBOM policies
- ├── monitoring/
- │   ├── prometheus/     # Alerting rules
- │   ├── grafana/        # Dashboards
- │   ├── alertmanager/   # Routing & escalation
- │   └── otel/           # OpenTelemetry collector
- ├── compliance/
- │   ├── soc2/           # SOC2 Type II controls mapping
- │   ├── nist/           # NIST 800-53 Rev5 mapping
- │   ├── cis-benchmarks/ # CIS EKS + K8s checks
- │   └── policies/       # OPA Gatekeeper policies
- ├── ai-ml/
- │   ├── rag-pipeline/   # LangChain + HF + ChromaDB
- │   ├── mlflow/         # MLflow tracking deployment
- │   └── hf-finetuning/  # SFT + LoRA fine-tuning
- └── scripts/
-     ├── python/         # Security audit automation
-     └── bash/           # Bootstrap + incident response
- ```
 
  ## Quick Start
 
  ```bash
- # 1. Bootstrap the platform
  ./scripts/bash/bootstrap.sh prod
 
- # 2. Run security audit
  python3 scripts/python/security_audit.py
 
- # 3. Incident response
- ./scripts/bash/incident-response.sh security
- ```
-
- ## Security Controls Summary
-
- | Control | Implementation | Enforcement |
- |---------|---------------|-------------|
- | **Zero Trust Network** | Default deny + selective allow NetPol | Kyverno |
- | **mTLS** | Istio STRICT mode | PeerAuthentication |
- | **No Root** | runAsNonRoot + distroless images | Kyverno Enforce |
- | **No :latest** | Version pinning required | Kyverno Enforce |
- | **Secret Encryption** | KMS + EKS encryption config | Terraform |
- | **Image Scanning** | Trivy Operator continuous | CI/CD gate |
- | **Runtime Detection** | Falco eBPF + custom rules | Alertmanager |
- | **SBOM** | SPDX + CycloneDX + Cosign attestation | CI/CD |
- | **Least Privilege IAM** | MFA + scoped roles + IRSA | Terraform |
 
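The **No :latest** row in the Security Controls table requires pinned image versions, and Kyverno enforces it. The check it encodes can be sketched in plain Python, as below. `is_pinned` is a hypothetical helper written for illustration and is not Kyverno's implementation. It treats an untagged image as implicitly `:latest`, because container runtimes resolve untagged references to the `latest` tag.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference is pinned to a digest or a non-'latest' tag.

    Hypothetical helper illustrating the 'No :latest' rule; not Kyverno's code.
    """
    if "@" in image:
        # Digest references (name@sha256:...) are immutable, so they count as pinned.
        return True
    # Only the final path component can carry a tag; an earlier colon belongs to a
    # registry port, as in "registry.local:5000/team/app".
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False  # untagged: the runtime resolves it to ':latest'
    tag = last_component.rsplit(":", 1)[1]
    return tag != "" and tag != "latest"


if __name__ == "__main__":
    for ref in ["nginx", "nginx:latest", "nginx:1.25.3",
                "registry.local:5000/team/app", "app@sha256:abc123"]:
        print(f"{ref:<32} pinned={is_pinned(ref)}")
```

Digest pinning is the stricter form of the rule, because a tag such as `1.25.3` can still be re-pushed to point at a different image.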
- ## Compliance Coverage
 
- | Framework | Controls | Status |
- |-----------|----------|--------|
- | SOC2 Type II | CC6.1–CC9.1 | ✅ Mapped |
- | NIST 800-53 Rev5 | AC-2, AU-2, SC-7, SI-4 | ✅ Mapped |
- | CIS EKS Benchmark | 1.1.1–5.3.2 | ✅ Automated |
- | PCI-DSS | Req 6, 8, 10, 11 | ✅ Partial |
 
- ## CI/CD Pipeline Stages
 
- ```
- SAST (Semgrep + Checkov + Trivy Secrets)
-   → Build (Multi-stage Docker + ECR Push)
-   → Scan (Trivy Image + SBOM Generation)
-   → Test (Integration + OWASP ZAP DAST)
-   → Sign (Cosign Keyless + SBOM Attest)
-   → Deploy Staging (ArgoCD GitOps Sync)
-   → Deploy Prod (Manual Approval + Smoke Test)
- ```
 
- ## Observability Stack
 
- - **Metrics**: Prometheus → Grafana dashboards
- - **Logs**: Loki + Promtail → Grafana LogQL
- - **Traces**: OpenTelemetry → Tempo → Grafana
- - **Alerts**: Prometheus rules → Alertmanager → Slack + PagerDuty
- - **Security**: Falco → Alertmanager → Slack #security-alerts
 
- ## License
 
- Internal use: Enterprise DevSecOps Reference Architecture
+ # DevSecOps Platform OMEGA: Enterprise AI Operating System
 
+ > Production-grade, security-first, automation-first platform covering the full DevOps, Cloud, Kubernetes, Security, AI/ML, FinOps, and Governance lifecycle.
 
+ **156 files | 182KB | 13 domains | All production-ready**
 
+ ## Architecture
 
  ```
+                      ENGINEERING COMMAND CENTER
+                                   |
+       +-------------+-------------+-------------+-------------+
+       |             |             |             |             |
+  RELIABILITY    SECURITY       FINOPS       PLATFORM        AI/ML
+   (SLO/PDB)    (GuardDuty)     (Cost)       (Golden       (RAG/SFT)
+       |             |             |          Path)            |
+       +-------------+-------------+-------------+-------------+
+                     |                           |
+                KUBERNETES                   TERRAFORM
+                (Kustomize)                (IaC Modules)
+                     |                           |
+                       AWS CLOUD INFRASTRUCTURE
  ```
 
+ ## OMEGA 10-Dimension Scorecard
+
+ | # | Dimension | Score | Assets |
+ |---|-----------|-------|--------|
+ | 1 | **Reliability** | 8/10 | PDBs, SLOs, HPA, multi-AZ, Istio |
+ | 2 | **Security** | 9/10 | GuardDuty, Macie, Falco, Kyverno, Trivy, mTLS |
+ | 3 | **Dev Velocity** | 7/10 | Golden paths, self-service envs, Kustomize |
+ | 4 | **Cost Efficiency** | 7/10 | FinOps scanner, spot instances, scheduling policy |
+ | 5 | **Governance** | 8/10 | SOC2, NIST 800-53, CIS, OPA, ADR template |
+ | 6 | **Automation** | 7/10 | Bootstrap, auto-remediation, GitOps (ArgoCD) |
+ | 7 | **Incident Recovery** | 8/10 | Runbook, postmortem template, war-room |
+ | 8 | **Standardization** | 8/10 | Kustomize overlays, golden path templates |
+ | 9 | **AI Enablement** | 8/10 | RAG, LoRA v2, MLflow, Trackio, GPU scheduling |
+ | 10 | **Engineering Excellence** | 7/10 | ADR template, checklists, SRE standards |
+
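The Reliability row lists SLOs. An availability SLO implies an error budget: the downtime a service may accumulate within a window, computed as budget = (1 - target) × window. A minimal sketch of that arithmetic follows. The targets and the 30-day window are example values, not figures from the repository's SLO definitions.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed unavailability for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1, e.g. 0.999")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget(slo_target, window)
    return 1.0 - downtime / budget


if __name__ == "__main__":
    window = timedelta(days=30)
    for target in (0.99, 0.999, 0.9999):
        minutes = error_budget(target, window).total_seconds() / 60
        print(f"{target:.2%} over 30d -> {minutes:.1f} min of budget")
```

For a 99.9% target over 30 days the budget is 43.2 minutes. `budget_remaining` reports how much of that budget is left after a given amount of downtime.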
+ ## Platform Modules
+
+ ### Infrastructure (Terraform)
+ | Module | Purpose | Key Feature |
+ |--------|---------|-------------|
+ | VPC | Network isolation | Flow logs, default deny SG/NACL |
+ | EKS | Kubernetes cluster | Private API, KMS encryption, IRSA |
+ | RDS | Database | Multi-AZ, encrypted, performance insights |
+ | S3 | Storage | SSE-KMS, versioning, lifecycle |
+ | IAM | Access control | MFA, least privilege, access analyzer |
+ | KMS | Key management | Auto-rotation, multi-key |
+ | GuardDuty | Threat detection | EBS malware scan, K8s audit, S3 |
+ | Macie | PII detection | Automated data classification |
+
+ ### Kubernetes
+ | Layer | Components |
+ |-------|-----------|
+ | **Base** | Namespaces, RBAC, NetPols, Quotas, Limits, PDBs, SLOs |
+ | **Platform** | ArgoCD, Istio (mTLS), ExternalSecrets, CertManager |
+ | **Security** | Trivy Operator, Falco (eBPF), Kyverno (7 policies), OPA |
+ | **Observability** | Prometheus, Grafana, Loki, Alertmanager, OTEL |
+ | **Workloads** | Frontend, Backend (HPA), ML Pipeline (GPU) |
+
+ ### FinOps Engine
+ | Asset | Purpose |
+ |-------|---------|
+ | finops-policy.yaml | 11 cost optimization rules |
+ | finops_scanner.py | Automated waste detection |
+ | cost-optimization.yaml | Spot instance strategy + KEDA |
+ | finops-cronjob.yaml | Daily cost scan CronJob |
+
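The FinOps table names `finops_scanner.py` for automated waste detection. Its rules (the 11 in `finops-policy.yaml`) are not reproduced in this README. For illustration only, the sketch below implements one widely used waste heuristic: flagging workloads whose CPU request far exceeds observed usage. The `Workload` record, the 30% utilization threshold, the 25% headroom and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-workload record; a real scanner would read these from metrics."""
    name: str
    cpu_request_millicores: int
    cpu_p95_usage_millicores: int


def find_overprovisioned(workloads, max_utilization: float = 0.30, headroom: float = 1.25):
    """Flag workloads whose p95 CPU usage is below max_utilization of their request.

    Returns (name, utilization, suggested_request_millicores) tuples. The threshold
    and headroom defaults are illustrative assumptions, not rules from finops-policy.yaml.
    """
    findings = []
    for w in workloads:
        if w.cpu_request_millicores <= 0:
            continue  # nothing requested, nothing to right-size
        utilization = w.cpu_p95_usage_millicores / w.cpu_request_millicores
        if utilization < max_utilization:
            # Suggest a request sized to p95 usage plus headroom, never below 1m.
            suggested = max(1, round(w.cpu_p95_usage_millicores * headroom))
            findings.append((w.name, utilization, suggested))
    return findings


if __name__ == "__main__":
    sample = [
        Workload("frontend", 1000, 120),
        Workload("backend", 2000, 1500),
        Workload("ml-pipeline", 4000, 800),
    ]
    for name, util, new_request in find_overprovisioned(sample):
        print(f"{name}: {util:.0%} of request used, consider ~{new_request}m")
```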
+ ### Platform Engineering
+ | Asset | Purpose |
+ |-------|---------|
+ | golden-paths/microservice/ | Production-ready service template + checklist |
+ | self-service/ | Ephemeral environment provisioning config |
+ | adr/template.md | Architecture Decision Record template |
+ | kustomize/ | Base + dev/staging/prod overlays |
+
+ ### Incident Response
+ | Asset | Purpose |
+ |-------|---------|
+ | auto-remediate.sh | OOM fix, pod restart, security escalation |
+ | postmortem/template.md | Full postmortem with 5 Whys + action items |
+ | incident-response.sh | Diagnostic runbook (5 incident types) |
+
+ ### AI/ML Hub
+ | Asset | Purpose |
+ |-------|---------|
+ | finetune.py | LoRA Without Regret (r=256, all-linear) |
+ | run_finetune.py | CLI entry point with dataset selection |
+ | TRAINING_RECIPE.md | v1→v2 upgrade documentation |
+ | rag_pipeline.py | LangChain + HF + ChromaDB RAG |
+ | mlflow/ | MLflow tracking deployment |
+
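The AI/ML Hub lists a LoRA recipe with rank r=256 applied to all linear layers. LoRA expresses the update to a weight matrix W (d_out × d_in) as the product B·A, where B is d_out × r and A is r × d_in. Each adapted layer therefore adds r·(d_in + d_out) trainable parameters. The small calculator below computes that count. The 4096 × 4096 layer shape in the usage lines is a hypothetical example, not the model that `finetune.py` trains.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def lora_fraction(shapes, r: int) -> float:
    """Adapter parameters as a fraction of the frozen weights for (d_in, d_out) layers."""
    adapter = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes)
    frozen = sum(d_in * d_out for d_in, d_out in shapes)
    return adapter / frozen


if __name__ == "__main__":
    # Hypothetical square projection, e.g. a 4096-wide attention matrix.
    print(lora_params(4096, 4096, r=256))        # -> 2097152
    print(lora_fraction([(4096, 4096)], r=256))  # -> 0.125
```

The count grows linearly in r, so r=256 adds 32 times as many adapter parameters per layer as r=8.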
+ ### Compliance
+ | Framework | Coverage |
+ |-----------|---------|
+ | SOC2 Type II | CC6-CC9 controls mapped |
+ | NIST 800-53 | 12 controls mapped |
+ | CIS Benchmarks | EKS + K8s automated |
+ | OPA Gatekeeper | Admission policies |
+
+ ### CI/CD Pipelines
+ | System | Features |
+ |--------|----------|
+ | GitHub Actions | 6-stage DevSecOps (SAST→Build→Scan→Test→Sign→Deploy) |
+ | Jenkins | Parallel SAST + production deployment |
+ | GitLab CI | Full scan + sign + deploy pipeline |
 
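The CI/CD table describes gated pipelines such as the 6-stage GitHub Actions flow (SAST→Build→Scan→Test→Sign→Deploy). The sketch assumes the usual fail-fast behavior: a failed stage blocks every later one, so an unscanned or unsigned image never reaches Deploy. `run_pipeline` and the lambda gates are hypothetical stand-ins for real CI jobs.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a gate that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], str]:
    """Run stages in order and stop at the first failure.

    Returns (names of stages that passed, final status). Hypothetical helper.
    """
    passed: List[str] = []
    for name, gate in stages:
        if not gate():
            return passed, f"failed at {name}"
        passed.append(name)
    return passed, "succeeded"


if __name__ == "__main__":
    # Stand-in gates; in CI each would be a job (Semgrep, docker build, Trivy, ...).
    stages: List[Stage] = [
        ("SAST", lambda: True),
        ("Build", lambda: True),
        ("Scan", lambda: False),  # e.g. the scanner reports a blocking finding
        ("Test", lambda: True),
        ("Sign", lambda: True),
        ("Deploy", lambda: True),
    ]
    print(run_pipeline(stages))
```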
  ## Quick Start
 
  ```bash
+ # Bootstrap full platform
  ./scripts/bash/bootstrap.sh prod
 
+ # Security audit
  python3 scripts/python/security_audit.py
 
+ # FinOps cost scan
+ python3 finops/finops_scanner.py
 
+ # Incident response
+ ./scripts/bash/incident-response.sh security
 
+ # Auto-remediate
+ ./incident-response/auto-remediation/auto-remediate.sh PodCrashLooping backend <pod-name>
+ ```
 
+ ## Self-Improvement Checklist
 
+ After every deployment, ask:
 
+ - [ ] Can this be automated?
+ - [ ] Can this be templated?
+ - [ ] Can this be secured further?
+ - [ ] Can this be cheaper?
+ - [ ] Can this scale better?
+ - [ ] Can this reduce human toil?
 
+ If any answer is yes, make the improvement and push it.
 
+ ## Hub
 
+ **[huggingface.co/shaikhsalman/devsecops-platform](https://huggingface.co/shaikhsalman/devsecops-platform)**