Probability of coronary artery disease

Predicted likelihood that a coronary angiogram would show ≥ 50% blockage in at least one major artery, based on the inputs below.

Adjust the inputs below and the score updates live.

Low · Moderate · High · Very high

Clinical context: 10-yr general-CVD risk (D'Agostino 2008)

Population benchmark: adjust inputs to compute.

Headline model

Model agreement: awaiting input

All models how this works ›

Adjust inputs to see all four models live.

Top reasons for this score (raises / lowers)

Adjust the inputs below to see what's driving this estimate.

See all 13 reasons →

Patient History

Saved locally in this browser, never sent to a server. Click a row to reload that patient.

No saved patients yet. Click Save on the dashboard to add one.

What-If Simulator

Hold every other feature constant and vary one. See how risk changes across all four models.

Vary feature

Current value: –

Cross-Model Feature Importance

Mean |SHAP| per feature, evaluated on the held-out test set. Higher bar = more influential overall.
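As a rough illustration of how the bar heights described above could be computed, the sketch below averages absolute SHAP values per feature. The feature names and the small SHAP matrix are made-up placeholders, not Rheocor outputs, and `mean_abs_shap` is a hypothetical helper name.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Return {feature: mean |SHAP|}, ordered from most to least influential."""
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is dropped: only magnitude counts
    scores = {name: total / n for name, total in zip(feature_names, totals)}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical SHAP values: one row per test patient, one column per feature.
features = ["oldpeak", "age", "chol"]
shap_rows = [
    [0.30, -0.10, 0.02],
    [-0.20, 0.05, -0.04],
    [0.10, -0.15, 0.00],
]
importance = mean_abs_shap(shap_rows, features)
```

Because the absolute value is taken before averaging, a feature that pushes risk up for some patients and down for others still ranks as influential.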

5-Fold Cross-Validation Detail

Per-metric mean ± standard deviation across folds.
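A minimal sketch of the "mean ± standard deviation" summary, assuming the sample standard deviation (n − 1 denominator) is used; the page does not say which convention applies. The fold scores below are hypothetical, not Rheocor's results.

```python
from statistics import mean, stdev

def summarize_folds(fold_scores):
    """Return (mean, sample standard deviation) for one metric across folds."""
    return mean(fold_scores), stdev(fold_scores)

# Hypothetical per-fold AUC values, for illustration only.
auc_by_fold = [0.90, 0.88, 0.92, 0.86, 0.94]
avg, sd = summarize_folds(auc_by_fold)
summary = f"{avg:.3f} ± {sd:.3f}"
```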

Care near you

We can show nearest cardiology clinics, hospitals, and emergency facilities. Your location is requested only when you opt in and never sent to our server.

Region

Search a city below or use your location.

Type a city, postal code, or address above to load nearby cardiology, hospital, and urgent-care facilities from OpenStreetMap.

If symptoms feel urgent

Call your local emergency number

Chest pain, sudden shortness of breath, weakness on one side, or difficulty speaking can all be heart-attack or stroke warning signs. Don't wait; call now.

Call 911

Number adjusts to your detected region. UK: 999, EU: 112, US: 911. If unsure, dial your country's emergency number directly.

Trusted heart-health resources

Country-specific links surface based on your detected region.

Cardiovascular disease in your region

Population-level burden: context for what your individual risk score means.

Sources: WHO Global Health Observatory · CDC · ESC.

Lifestyle guides

Curated, evidence-based resources from major cardiovascular societies.

Heart-healthy diet

DASH, Mediterranean, and AHA guidelines for what to eat (and avoid).

Exercise & activity

Cardio, strength, and flexibility prescriptions backed by AHA recommendations.

Blood-pressure mastery

CDC's complete guide to monitoring, lowering, and tracking BP at home.

Cholesterol explained

LDL, HDL, triglycerides, statins: what really matters and what to ask your doctor.

Quit smoking: proven plans

Smoking is the single biggest reversible cardiovascular risk factor. Free programs that work.

Stress & sleep

Chronic stress and poor sleep raise cardiovascular risk. Evidence-based techniques to reset both.

Privacy-preserving training

Federated Learning Simulation

Three simulated hospitals each trained the model on their own data, then shared only the learned updates with a central server. No raw patient data was ever shared.

Centralized AUC

Federated (IID) AUC

Federated (non-IID) AUC

Communication rounds

Privacy payload. In federated mode, no raw patient records were shared. Only model weights were sent each round, unlike centralized training, where all training data would be pooled in one place.
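The weight-sharing step described above can be sketched as a size-weighted average of each site's parameters, which is the aggregation rule at the core of FedAvg. This is a simplified illustration, not Rheocor's code: the weight vectors and per-site row counts are hypothetical, and a real run would repeat local training plus this aggregation once per communication round.

```python
def fedavg(site_weights, site_sizes):
    """Average per-site weight vectors, weighting each site by its row count."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical weights from three simulated hospitals after one round.
site_weights = [[0.2, -1.0], [0.4, -0.8], [0.6, -0.6]]
site_sizes = [100, 100, 50]  # assumed number of training rows per site
global_weights = fedavg(site_weights, site_sizes)
```

Only the lists in `site_weights` would cross the network in this scheme; the rows that produced them stay at each site.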

Model Card

Rheocor v3.0: Cardiovascular Risk Ensemble

Released: June 2026 · License: Research and educational use · Author: Aryav Kaushik

A four-model ensemble (Logistic Regression, Random Forest, XGBoost, and a small MLP) trained on the UCI Cleveland Heart Disease dataset, with cross-validated confidence intervals, multi-method SHAP/LIME/Gradient explainability, federated-learning compatibility (FedAvg), and temporal risk projection.

1. Intended use

RheocorAI is intended as a research and educational tool to demonstrate explainable cardiovascular ML. It is not intended for clinical decision-making, and is not a regulated medical device.

Primary users. ML researchers, clinical-informatics students, public-health educators.
Out-of-scope uses. Triage, diagnosis, treatment planning, screening, or counselling of identified patients.

2. Dataset

Source. UCI Heart Disease dataset, Cleveland subset (Detrano et al., 1989; Janosi et al., 1988).
Size. 303 patients, 13 standardized clinical features. Class balance ≈ 46% positive (any disease) / 54% negative (no disease).
Features. Age, sex, chest-pain type, resting BP, cholesterol, fasting blood sugar > 120, resting ECG, peak heart rate, exercise angina, ST depression (oldpeak), ST slope, fluoroscopy vessels, thallium stress test.
Splits. 80/20 stratified train/test split (seed 42; test set touched once, at evaluation), plus 5-fold stratified CV over the full dataset for the headline metrics.
Pre-processing. Split first; the 6 missing cells imputed with training-set medians; StandardScaler fit on training rows only; binary target collapsed from the 0–4 severity scale. Cross-validation re-fits imputation and scaling inside every fold.
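The "split first, then fit on training rows only" order described above can be sketched for a single feature as follows. This is a stdlib-only illustration with hypothetical values, not the project's pipeline; it assumes the population standard deviation, which is what scikit-learn's StandardScaler uses.

```python
from statistics import mean, median, pstdev

def fit_preprocessor(train_column):
    """Learn the fill value and scaling stats from training rows only (None = missing)."""
    observed = [v for v in train_column if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in train_column]
    return {"fill": fill, "mean": mean(filled), "std": pstdev(filled)}

def transform(column, params):
    """Impute with the training median, then standardize with training stats."""
    filled = [params["fill"] if v is None else v for v in column]
    return [(v - params["mean"]) / params["std"] for v in filled]

# Hypothetical single feature, already split into train and test rows.
train = [1.0, 2.0, None, 3.0, 4.0]
test = [None, 5.0]
params = fit_preprocessor(train)       # the test rows are never seen here
train_scaled = transform(train, params)
test_scaled = transform(test, params)  # reuses training median, mean, and std
```

Inside cross-validation, the same two calls would be repeated per fold, with `fit_preprocessor` seeing only that fold's training rows.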

3. Model architectures

Logistic Regression. scikit-learn LR · max_iter=2000 · L2-regularized.

Random Forest. 300 trees · max_depth=6 · bootstrap.

XGBoost. 400 boosters · max_depth=4 · lr=0.05 · subsample 0.9.

Neural Network. PyTorch MLP · 13→32→16→1 · ReLU · 30/20% dropout · Adam.
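To give a sense of how small the 13→32→16→1 network is, the sketch below counts its trainable parameters. It assumes plain fully connected layers with bias terms and no normalization layers, which the card does not state explicitly; ReLU and dropout add no parameters.

```python
def mlp_param_count(layer_sizes, bias=True):
    """Count weights (and optionally biases) in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + (n_out if bias else 0)
    return total

# 13 inputs -> 32 -> 16 -> 1 output, as listed in the model card.
params = mlp_param_count([13, 32, 16, 1])
```

Under these assumptions the layers contribute 448, 528, and 17 parameters, or 993 in total.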

4. Performance: 5-fold cross-validation (full dataset)

Federated learning is a single-machine simulation (FedAvg, 3 simulated sites × 25 rounds × 5 local epochs): the aggregated model converges to within noise of the centrally trained baseline on the 61-row test split. With n=61, differences of ±0.01–0.02 AUC between the federated and centralized runs are not meaningful. The result demonstrates convergence, not superiority.
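One way to see why differences of that size are within noise is the Hanley and McNeil (1982) approximation for the standard error of an AUC. The inputs below are assumptions for illustration: an AUC of 0.90 (not a reported Rheocor result) and a 28/33 positive/negative split, which is roughly what a stratified 61-row test set with about 46% positives would contain.

```python
from math import sqrt

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    return sqrt(variance)

# Assumed values: AUC 0.90 on a test set of 28 positives and 33 negatives.
se = hanley_mcneil_se(auc=0.90, n_pos=28, n_neg=33)
```

Under these assumptions the standard error is roughly 0.04, several times larger than a 0.01–0.02 gap between two runs.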

5. Explainability methods

SHAP. Tree (RF, XGBoost), Linear (LR), Kernel (NN). Per-prediction and global mean |SHAP|.
LIME. Local linear surrogate at every prediction (600 perturbation samples).
Gradient × Input. Vanilla saliency for the neural net (sigmoid-output gradient).
Comparison. Spearman rank correlation between SHAP and LIME, top-3 feature overlap.
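The SHAP-versus-LIME comparison can be sketched with a small stdlib implementation of Spearman rank correlation and top-3 overlap. The attribution magnitudes below are hypothetical, and ranking by absolute magnitude is an assumption about how the comparison is set up.

```python
def average_ranks(values):
    """Rank values from 1 to n (ascending), giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def top_k_overlap(names, xs, ys, k=3):
    """Number of features that appear in the top k of both attribution lists."""
    def top(values):
        ranked = sorted(zip(names, values), key=lambda p: p[1], reverse=True)
        return {name for name, _ in ranked[:k]}
    return len(top(xs) & top(ys))

# Hypothetical absolute attributions for one prediction.
names = ["cp", "oldpeak", "ca", "thal", "age"]
shap_abs = [0.30, 0.25, 0.20, 0.10, 0.05]
lime_abs = [0.28, 0.15, 0.22, 0.12, 0.03]
rho = spearman(shap_abs, lime_abs)
overlap = top_k_overlap(names, shap_abs, lime_abs)
```

Here the two methods swap the second and third features, so the rank correlation is high but not perfect, while the top-3 sets still match.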

6. Known limitations & biases

Cohort bias. Cleveland subset is from a single US clinical site (Cleveland Clinic, 1980s). Performance on other populations is uncharacterized.
Sex imbalance. ~68% male / ~32% female; estimates for women are derived from a smaller sample.
Missing modalities. No HDL, LDL, smoking status, family history, or BMI, all of which are established cardiovascular risk factors. The Framingham comparison panel uses dataset proxies and documented assumptions.
Reverse causality. Some features (oldpeak, exercise angina) are themselves disease manifestations, so attributions partly reflect "disease present today" rather than long-term risk drivers.
Out-of-distribution risk. Sliders allow values outside the training range; predictions in those zones are extrapolations.
Temporal model. Cleveland labels are prevalent, not incident, so the temporal projection is age-conditional risk extrapolation, not true survival modeling.
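For the out-of-distribution point above, a simple guard is to compare each input against the minimum and maximum seen in training. The sketch below is an illustration only: `out_of_range_features` is a hypothetical helper, and the ranges are placeholder values rather than bounds taken from Rheocor's training split.

```python
def out_of_range_features(inputs, train_ranges):
    """Return the names of inputs that fall outside the training min/max."""
    flagged = []
    for name, value in inputs.items():
        low, high = train_ranges[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged

# Placeholder ranges; real bounds would be computed from the training rows.
train_ranges = {"age": (29, 77), "chol": (126, 564)}
flagged = out_of_range_features({"age": 85, "chol": 240}, train_ranges)
```

A non-empty result would indicate that the prediction is an extrapolation for the listed features.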

7. Citation

Generate a properly formatted citation in your preferred style.

Example (APA, 7th ed.):

Kaushik, A. (2026). Rheocor v3.0: An explainable cardiovascular risk
ensemble [Computer software]. https://github.com/aryavkaushik/rheocor

Companion paper

Kaushik, A. (2026). Rheocor v3.0: Technical Summary. Available at /static/rheocorpaper.pdf.

Research paper

Rheocor v3.0: Technical Summary

Author: Aryav Kaushik · Format: PDF · Updated: June 2026

Meet the developer

Built by Aryav Kaushik

Basis Independent Silicon Valley, Class of 2027 · Building what cardiology doesn't have yet.

This project started with a family diagnosis in India. It became something bigger.

The story

Cardiovascular disease is the leading cause of death worldwide. Every year, it takes nearly 20 million lives globally, including about 900,000 people in the United States. It affects people across race, ethnicity, gender, and background, yet early detection and intervention still fail too many people.

I built Rheocor because I wanted anyone, anywhere, to have a simple way to better understand their heart disease risk from home. The platform gives a risk score, explains which factors matter most, and makes the result easier to understand.

Over sixteen months, I cleaned and engineered the UCI Heart Disease Dataset, trained four machine learning models, and built explanation tools around them, including SHAP values, confidence intervals, and Framingham comparisons.

I make no money from Rheocor. I built it because I care about making cardiovascular risk easier to understand before it becomes an emergency. If this site helps even one person take their heart health more seriously, it matters. Please share Rheocor with anyone who could benefit from it.

Connect on LinkedIn

Technical philosophy

I built Rheocor because heart risk should be easier to understand. Too many tools give a score without explaining what caused it, which makes the result hard to trust.

Rheocor shows the factors behind each prediction, how much each one mattered, and how the result compares with the Framingham Risk Score. I also evaluated the models with 5-fold cross-validation, so the reported metrics do not hinge on a single train/test split.

The goal is to make the model's reasoning visible, so patients and clinicians can understand the prediction instead of just accepting a number.

Contact

For research inquiries or collaboration:

aryavkaushk2009@gmail.com

RheocorAI helps you understand your heart risk now.

I built Rheocor using four machine-learning models trained on the UCI Heart Disease dataset. The platform lets users explore each prediction, see which factors shaped the result, and understand the risk score more clearly.

4 ML models · 5× CV folds · 13 clinical features · Best CV AUC –

Keyboard shortcuts: 1 2 3 4 switch views · R reset · S save · P PDF · ? help

Why Rheocor uses four models

Heart risk is complex, so Rheocor looks at each patient through four different models. When the models agree, the result is more reassuring. When they disagree, the platform shows that clearly instead of hiding it.
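A minimal sketch of an agreement check, assuming each model's probability is turned into a yes/no call at the 50% threshold mentioned in the glossary. The probabilities are hypothetical, and treating exactly 0.5 as positive is an assumption.

```python
def model_agreement(probabilities, threshold=0.5):
    """Return (number of models calling the case positive, whether all agree)."""
    votes = {name: p >= threshold for name, p in probabilities.items()}
    positives = sum(votes.values())
    unanimous = positives in (0, len(votes))
    return positives, unanimous

# Hypothetical outputs from the four models for one patient.
probabilities = {
    "Logistic Regression": 0.62,
    "Random Forest": 0.58,
    "XGBoost": 0.47,
    "Neural Network": 0.66,
}
positives, unanimous = model_agreement(probabilities)
```

In this example three of the four models fall above the threshold, so the case would be shown as a split rather than as full agreement.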

Logistic Regression

This is the simplest model in the group. It looks at each health factor one by one and gives a steady, easy-to-understand baseline.

Random Forest

This model uses 300 decision trees that each look at the data in a slightly different way. Together, they help catch patterns that a simpler model might miss.

XGBoost

This model learns by correcting its own mistakes over time. It is especially useful for structured medical data like the UCI Heart Disease dataset.

Neural Network

This model adds a more flexible approach. It can pick up more complex patterns between features, while dropout helps reduce overfitting.

Methodology & validation

5-fold stratified cross-validation: every patient appears in a held-out fold exactly once, and each fold imputes and scales with its own training rows only, so the reported metrics aren't an artifact of one lucky split or of preprocessing leakage.
Per-prediction stability range: your patient is run through all five fold-models of each architecture; the min–max spread shows how much the prediction depends on the training sample. A wide range means treat with caution.
Bootstrap CIs and DeLong tests on the held-out test set (see results/metrics.csv and results/delong_tests.csv): the single-split AUC differences between the four models are not statistically significant.
Model agreement tells you when the four models reach the same conclusion (high trust) versus when they split (low trust).
SHAP explanations attribute the score to individual features so you can always answer "why did the model say that?"
Data-completeness indicator warns you when a sparse input set is degrading reliability: fewer inputs, less confidence.
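The fold assignment behind stratified cross-validation can be sketched as follows: shuffle the rows of each class separately, then deal them out to the folds in turn so every fold keeps roughly the same class balance and every row is held out exactly once. This is a stdlib illustration with toy labels, not the library routine the project uses; the seed value is an assumption.

```python
import random

def stratified_folds(labels, k=5, seed=42):
    """Assign each row index to one of k folds, balancing classes across folds."""
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % k  # deal rows of this class round-robin
    return fold_of

# Toy labels with roughly the 46/54 class balance described in the model card.
labels = [1] * 46 + [0] * 54
fold_of = stratified_folds(labels)
fold_sizes = [fold_of.count(f) for f in range(5)]
positives_per_fold = [
    sum(1 for i, f in enumerate(fold_of) if f == fold and labels[i] == 1)
    for fold in range(5)
]
```

For fold `f`, the rows with `fold_of[i] == f` form the held-out set, and imputation and scaling would be fit on the remaining rows only.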

The dataset

RheocorAI is trained on the Cleveland subset of the UCI Heart Disease dataset, with 13 standardized clinical features (age, blood pressure, cholesterol, exercise-induced ECG changes, and more). It's a benchmark used in cardiovascular ML research since 1988.

What you can do here

The What-If Simulator holds every other feature constant and sweeps one across its full range, showing how cholesterol, blood pressure, or max heart rate changes the score across all four models.
Patient History remembers cases you've saved (in your browser only, nothing leaves your device) so you can compare them side by side.
The Compare Models view stacks per-feature SHAP importance head-to-head across architectures and shows the full CV-metrics table.
Live updates: every slider movement re-runs all four models. SHAP recomputes the moment you stop adjusting.
One-click PDF generates a structured patient report: scores, CIs, agreement, top SHAP contributions, full feature list.
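The What-If sweep described in the list above can be sketched as a loop that copies the patient, overrides one feature, and re-scores. `toy_model` is a stand-in logistic function with made-up coefficients, not one of Rheocor's models, and the feature names and values are hypothetical.

```python
from math import exp

def what_if_sweep(model, patient, feature, values):
    """Vary one feature over `values`, holding the rest of `patient` fixed."""
    results = []
    for v in values:
        scenario = dict(patient, **{feature: v})  # copy; original is untouched
        results.append((v, model(scenario)))
    return results

def toy_model(p):
    """Stand-in logistic model with made-up coefficients."""
    z = -6.0 + 0.05 * p["age"] + 0.01 * p["chol"]
    return 1 / (1 + exp(-z))

patient = {"age": 55, "chol": 240}
curve = what_if_sweep(toy_model, patient, "chol", range(150, 351, 50))
```

Running the same sweep once per model would give one curve per architecture to plot side by side.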

How to read the score

0–24% Low: risk consistent with healthy controls in this dataset.

25–49% Moderate: borderline; consider lifestyle factors and routine follow-up.

50–74% High: the dashboard surfaces a "see a doctor" prompt at this point.

75–100% Very high: strong model evidence. The alert escalates to "seek medical advice promptly".

These bands are interpretive aids only. The model produces a continuous probability; the bands exist to make the result legible at a glance.
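The band boundaries above can be expressed as a small lookup. This sketch assumes the cut-offs sit at exactly 25%, 50%, and 75% of the continuous probability; how the dashboard handles values between whole percentages is not stated.

```python
def risk_band(probability):
    """Map a probability in [0, 1] to one of the four display bands."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.25:
        return "Low"
    if probability < 0.50:
        return "Moderate"
    if probability < 0.75:
        return "High"
    return "Very high"

bands = [risk_band(p) for p in (0.10, 0.25, 0.60, 0.75)]
```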

Glossary: what every term means

Skim this if any of the inputs or readouts feel cryptic.

Risk score: The probability (0–100%) that this patient has cardiovascular disease, according to the selected model.
Model stability range: The spread of predictions from five models trained on different 80% subsets of the data (the cross-validation folds). A wide range means the prediction depends heavily on which patients the model happened to see, so treat it with more caution. It is a stability indicator, not a formal confidence interval.
Model agreement: Whether the four models reach the same conclusion at a 50% threshold. Strong agreement is more trustworthy than a split.
SHAP value: How much one feature changed this prediction. Positive raises risk, negative lowers it. Bigger absolute value = bigger effect.
Cross-validation (CV): Splitting the data into 5 folds and training/testing 5 times so reported accuracy isn't an artifact of one lucky split.
ROC-AUC: A model's ability to rank patients with disease above patients without: 1.0 is perfect, 0.5 is random.
Angina: Chest pain or pressure caused by reduced blood flow to the heart muscle.
ECG / ST segment: An electrocardiogram records the heart's electrical activity. The "ST segment" is a specific part of each beat that changes when the heart muscle is short on oxygen.
Oldpeak: How far the ECG's ST segment drops during exercise compared to rest. Bigger drops are more concerning.
Fluoroscopy: An X-ray test where contrast dye is injected to see how blood flows through the coronary arteries.
Thallium stress test: A scan that uses a radioactive tracer to show which parts of the heart are getting good blood flow during exercise.
LV hypertrophy: Thickening of the heart's main pumping chamber (left ventricle), usually a sign of long-standing high blood pressure or strain.
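To make the ROC-AUC entry above concrete, the sketch below computes it directly from its ranking definition: the fraction of (positive, negative) patient pairs in which the positive patient gets the higher score, with ties counted as half. The labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: three patients with disease (1), three without (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Eight of the nine pairs are ordered correctly here, so the AUC is 8/9, about 0.89. The pairwise loop is quadratic in the number of patients, which is fine for a dataset of this size.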

Tech stack

Models
scikit-learn · XGBoost · PyTorch

Explainability
SHAP (Tree, Linear, Kernel)

Backend
Flask + NumPy

Frontend
Vanilla JS · Chart.js · jsPDF

Important disclaimer

RheocorAI is a research and educational tool.

Do not use RheocorAI to diagnose, treat, or counsel patients. If you have any concerns about your heart health, please consult a qualified healthcare provider. For chest pain, sudden shortness of breath, or weakness on one side, call your local emergency number immediately.