cardio-risk-rf

Production-grade tabular classifier for cardiovascular disease, trained on the sulianova Cardiovascular Disease Dataset (70 000 patients, 11 clinical features, balanced 50/50 binary target cardio). The main artefact is cardio_risk_lgbm.joblib (LightGBM, tuned with Optuna, native NaN handling); the baseline is cardio_risk_rf.joblib (RandomForest with a median imputer).
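The baseline's two-stage design (median imputation, then a random forest) can be sketched with scikit-learn as below. The helper name `make_baseline` and the hyperparameters (`n_estimators`, `random_state`) are hypothetical; the card only states that the baseline is a RandomForest preceded by a median imputer. The demo fits on random synthetic data with roughly 10% missing values, purely to show that NaNs are replaced by per-column training medians before the forest sees them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline


def make_baseline(random_state: int = 0):
    """Hypothetical baseline: median imputation followed by a random forest."""
    return make_pipeline(
        # Replace each NaN with the median of its column, learned at fit time
        SimpleImputer(strategy="median"),
        # Illustrative hyperparameters, not the card's actual settings
        RandomForestClassifier(n_estimators=100, random_state=random_state),
    )


# Synthetic stand-in for the 11 features, with roughly 10% of values missing
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))
X[rng.random(X.shape) < 0.1] = np.nan
y = (np.nan_to_num(X[:, 4]) > 0).astype(int)  # toy target driven by column 4

baseline = make_baseline().fit(X, y)
proba = baseline.predict_proba(X[:5])  # columns: P(class 0), P(class 1)
```

The LightGBM main model needs no such step, since its trees learn a default split direction for missing values.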

Metrics (held-out test, n=10501)

Metric              Value
main_model          LightGBM
main_roc_auc        79.8%
main_pr_auc         78.1%
main_f1             71.9%
main_brier          0.1824
baseline_model      RandomForest
baseline_roc_auc    79.5%
baseline_pr_auc     77.9%
baseline_f1         70.8%
baseline_brier      0.1837
test_size           10501
positive_rate       50.0%
threshold           0.5
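For reference, here is a minimal, dependency-free sketch of how the four reported metrics can be computed from labels `y` and predicted probabilities `p`. It is not the author's evaluation code, and it rests on two assumptions the card does not state: PR AUC is taken to mean average precision, and a probability of exactly 0.5 counts as positive at the 0.5 threshold. The pairwise ROC AUC loop is O(n_pos x n_neg), fine for illustration but slow on the full test split; `sklearn.metrics.roc_auc_score`, `average_precision_score`, `f1_score` and `brier_score_loss` are the usual production choices.

```python
def roc_auc(y, p):
    """P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y, p):
    """Mean precision@k over the ranks k of the positives (ties broken by sort order)."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    hits, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def f1_at(y, p, threshold=0.5):
    """F1 of the hard labels obtained by thresholding (p >= threshold -> 1)."""
    pred = [1 if s >= threshold else 0 for s in p]
    tp = sum(1 for t, q in zip(y, pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y, pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y, pred) if t == 1 and q == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def brier(y, p):
    """Mean squared gap between predicted probability and the 0/1 label (lower is better)."""
    return sum((s - t) ** 2 for s, t in zip(p, y)) / len(y)


# Tiny illustrative example (made-up scores)
y_true = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]
```

Because ties are broken by sort order, `average_precision` can differ slightly from scikit-learn's when scores repeat.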

Usage

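from huggingface_hub import hf_hub_download
import joblib
import pandas as pd

# Download the main LightGBM artefact from the Hub and load it
path = hf_hub_download(repo_id="kiselyovd/cardio-risk-rf", filename="cardio_risk_lgbm.joblib")
model = joblib.load(path)
# One patient; column names follow the feature order listed below
x = pd.DataFrame([{"age": 56.0, "gender": 1.0, "height": 152.0, "weight": 72.0, "ap_hi": 160.0, "ap_lo": 90.0,
                   "cholesterol": 3.0, "gluc": 1.0, "smoke": 0.0, "alco": 0.0, "active": 1.0}])
# Shape (1, 2): [P(cardio=0), P(cardio=1)], assuming classes_ == [0, 1]
print(model.predict_proba(x))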

Feature order (11)

  1. age (years)
  2. gender (1 = female, 2 = male)
  3. height (cm)
  4. weight (kg)
  5. ap_hi (systolic blood pressure, mmHg)
  6. ap_lo (diastolic blood pressure, mmHg)
  7. cholesterol (1 = normal, 2 = above normal, 3 = well above normal)
  8. gluc (glucose, same 1-3 scale)
  9. smoke (0/1)
  10. alco (0/1)
  11. active (0/1)

Any field may be null: LightGBM handles NaN natively, while the RandomForest pipeline imputes missing values with the training-set median. At serve time, all inputs are coerced to float.
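The null handling and float coercion described above can be sketched in plain Python. The helper `to_row` and the list name `FEATURES` are hypothetical, not the repository's serving code; the sketch assumes one JSON-like dict per patient, maps missing keys and nulls to NaN, and keeps the 11-column order. Rejecting unknown keys is likewise an illustrative design choice, not something the card specifies.

```python
import math

# The 11 input features, in the documented order
FEATURES = ["age", "gender", "height", "weight", "ap_hi", "ap_lo",
            "cholesterol", "gluc", "smoke", "alco", "active"]


def to_row(record: dict) -> list:
    """Return one row in FEATURES order; absent or null fields become NaN."""
    unknown = set(record) - set(FEATURES)
    if unknown:
        # Hypothetical choice: fail loudly on misspelled keys instead of dropping them
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # float() also accepts numeric strings such as "1"
    return [math.nan if record.get(name) is None else float(record[name])
            for name in FEATURES]


row = to_row({"age": 56, "gender": "1", "ap_hi": 160, "ap_lo": None})
```

Rows built this way can be wrapped as `pd.DataFrame(rows, columns=FEATURES)` before calling `predict_proba`; the baseline's imputer then fills the NaNs, whereas LightGBM consumes them directly.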

Top SHAP drivers (global, on the validation set)

  1. ap_hi (systolic blood pressure), the dominant driver
  2. age
  3. cholesterol
  4. weight
  5. ap_lo
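A global ranking like the one above is commonly obtained by averaging each feature's absolute SHAP value over the evaluation rows; the card does not say which aggregation was used, so treat that as an assumption. The minimal sketch below takes an already computed SHAP matrix (rows x features, for example produced by the shap package's `TreeExplainer` on the validation set) and returns features sorted by mean |SHAP|. The function name and the toy numbers are illustrative only.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Sort features by mean absolute SHAP value across rows, largest first."""
    n_rows = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy SHAP matrix: 3 rows x 3 features (values are made up)
names = ["ap_hi", "age", "smoke"]
values = [[0.5, -0.2, 0.01],
          [-0.7, 0.3, -0.02],
          [0.6, 0.1, 0.0]]
ranking = rank_by_mean_abs_shap(values, names)
```

In the toy matrix, ap_hi pushes some rows up and others down; a signed mean would let those contributions cancel, which is why the absolute value is taken.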

Intended use

Educational artefact demonstrating a production-grade tabular ML pipeline (LightGBM + SHAP + FastAPI + Docker + HF Hub). Not a medical device; do not use it for clinical decisions. The cardio target records disease presence at the time of examination (cross-sectional); it is not a prospective 10-year risk estimate.

Evaluation results (self-reported, sulianova Cardiovascular Disease Dataset)

  • roc_auc: 79.8%
  • pr_auc: 78.1%
  • f1: 71.9%
  • brier: 0.182