cardio-risk-rf
Production-grade tabular classifier for cardiovascular disease, trained on the sulianova Cardiovascular Disease Dataset (70 000 patients, 11 clinical features, binary target cardio balanced 50/50). The main artefact is cardio_risk_lgbm.joblib (LightGBM tuned with Optuna, with native NaN handling); the baseline is cardio_risk_rf.joblib (RandomForest with a median imputer).
Metrics (held-out test set, n = 10 501)
| Metric | LightGBM (main) | RandomForest (baseline) |
|---|---|---|
| ROC AUC | 79.8% | 79.5% |
| PR AUC | 78.1% | 77.9% |
| F1 | 71.9% | 70.8% |
| Brier score (lower is better) | 0.1824 | 0.1837 |

Positive rate on the test set: 50.0%. F1 is computed at a decision threshold of 0.5.
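
The four metrics in the table can be made concrete with a small NumPy sketch that takes true labels and predicted positive-class probabilities. It is not the project's evaluation code: the helper name classification_report_sketch is hypothetical, ROC AUC uses the rank (Mann-Whitney) formulation, "PR AUC" is assumed to mean non-interpolated average precision (the card does not name the estimator, and this version ignores tied scores), and F1 assumes that a probability of exactly 0.5 counts as positive.

```python
import numpy as np


def _average_ranks(values: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share their average rank."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def classification_report_sketch(y_true, p_pos, threshold=0.5):
    """Hypothetical helper: ROC AUC, average precision, F1 at threshold, Brier."""
    y = np.asarray(y_true, dtype=int)
    p = np.asarray(p_pos, dtype=float)
    n_pos, n_neg = y.sum(), (1 - y).sum()

    # ROC AUC: probability that a random positive outscores a random negative
    ranks = _average_ranks(p)
    roc_auc = (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

    # Assumed PR AUC estimator: average precision (mean precision at each positive)
    order = np.argsort(-p, kind="mergesort")
    hits = y[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(y) + 1)
    pr_auc = precision_at_k[hits == 1].mean()

    # F1 at a fixed threshold (assumption: p == threshold counts as positive)
    pred = (p >= threshold).astype(int)
    tp = int(((pred == 1) & (y == 1)).sum())
    fp = int(((pred == 1) & (y == 0)).sum())
    fn = int(((pred == 0) & (y == 1)).sum())
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    # Brier score: mean squared error between probabilities and 0/1 labels
    brier = float(np.mean((p - y) ** 2))
    return {"roc_auc": float(roc_auc), "pr_auc": float(pr_auc), "f1": f1, "brier": brier}
```

In practice, scikit-learn's roc_auc_score, average_precision_score, f1_score (on thresholded labels) and brier_score_loss compute the same quantities and handle edge cases such as tied scores; the hand-written version only spells out the definitions.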
Usage
```python
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download

# Download the main LightGBM artefact from the Hub and load it
path = hf_hub_download(repo_id="kiselyovd/cardio-risk-rf", filename="cardio_risk_lgbm.joblib")
model = joblib.load(path)
# One patient, columns in the documented feature order
x = pd.DataFrame([{"age": 56.0, "gender": 1.0, "height": 152.0, "weight": 72.0, "ap_hi": 160.0, "ap_lo": 90.0, "cholesterol": 3.0, "gluc": 1.0, "smoke": 0.0, "alco": 0.0, "active": 1.0}])
# Columns follow model.classes_, i.e. [P(cardio=0), P(cardio=1)]
print(model.predict_proba(x))
```
Feature order (11)
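1. age (years)
2. gender (1 = female, 2 = male)
3. height (cm)
4. weight (kg)
5. ap_hi (systolic blood pressure, mmHg)
6. ap_lo (diastolic blood pressure, mmHg)
7. cholesterol (1 = normal, 2 = above normal, 3 = well above normal)
8. gluc (glucose, same 1-3 scale as cholesterol)
9. smoke (0/1)
10. alco (alcohol intake, 0/1)
11. active (physical activity, 0/1)
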
Any field may be null: LightGBM handles NaN natively, and the RandomForest pipeline imputes missing values with the training-set median. Inputs are coerced to float at serve time.
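
The null-handling and float-coercion rules can be illustrated with a small pandas sketch. It assumes requests arrive as dicts keyed by the 11 feature names; FEATURES, to_model_frame and median_impute are hypothetical names, not part of the published artefacts, and the training medians are assumed to be available as a plain dict.

```python
import numpy as np
import pandas as pd

# Documented feature order (hypothetical constant name)
FEATURES = ["age", "gender", "height", "weight", "ap_hi", "ap_lo",
            "cholesterol", "gluc", "smoke", "alco", "active"]


def to_model_frame(record: dict) -> pd.DataFrame:
    """Hypothetical helper: one-row float frame in feature order."""
    values = []
    for name in FEATURES:
        raw = record.get(name)
        # Absent keys and explicit None both become NaN; everything else -> float
        values.append(np.nan if raw is None else float(raw))
    return pd.DataFrame([values], columns=FEATURES, dtype=float)


def median_impute(frame: pd.DataFrame, train_medians: dict) -> pd.DataFrame:
    """Hypothetical helper showing what a median imputer does to NaN cells."""
    # Columns missing from train_medians are left as NaN
    return frame.fillna(value=train_medians)
```

The resulting frame can be passed to model.predict_proba as in the usage example. The saved RandomForest pipeline already contains its own imputation step, so median_impute only shows what that step does rather than something callers must run.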
Top SHAP drivers (global, on the validation set)
1. ap_hi (systolic blood pressure), the dominant driver
2. age
3. cholesterol
4. weight
5. ap_lo

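
A global ranking like this one is commonly obtained by averaging absolute SHAP values per feature over an evaluation set. The sketch assumes that convention (the card does not state how the ranking was aggregated) and takes a precomputed SHAP matrix of shape (n_samples, n_features), for example from shap.TreeExplainer; rank_features_by_mean_abs_shap is a hypothetical helper name.

```python
import numpy as np


def rank_features_by_mean_abs_shap(shap_values, feature_names):
    """Hypothetical helper: (feature, mean |SHAP|) pairs, most important first."""
    values = np.asarray(shap_values, dtype=float)
    if values.ndim != 2 or values.shape[1] != len(feature_names):
        raise ValueError("expected a 2-D matrix with one SHAP column per feature")
    # Assumed aggregation: mean absolute contribution across all rows
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(-importance, kind="stable")
    return [(feature_names[i], float(importance[i])) for i in order]
```

Depending on the shap version, a binary LightGBM model may yield either a single array or a per-class list of arrays; in the per-class case the positive-class array is the one to pass in.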
Intended use
Educational artifact demonstrating a production-grade tabular ML pipeline (LightGBM + SHAP + FastAPI + Docker + HF Hub). Not a medical device. Do not use for clinical decisions. The cardio target is cross-sectional (presence at examination), not a prospective 10-year risk.
Evaluation results
- roc_auc on sulianova Cardiovascular Disease Dataset (self-reported): 79.8%
- pr_auc on sulianova Cardiovascular Disease Dataset (self-reported): 78.1%
- f1 on sulianova Cardiovascular Disease Dataset (self-reported): 71.9%
- brier on sulianova Cardiovascular Disease Dataset (self-reported): 0.182