MediAI Pro - Healthcare Analytics
AI-powered real-time data analytics for predictive medicine, operational optimization, and patient insights
Data Upload Requirements
Before analyzing your healthcare data, please ensure it meets the following standards:
Supported Formats
CSV, JSON, Excel (XLSX), HL7 FHIR, DICOM metadata
Data Structure
Structured data with clear column headers. Each row should represent a unique patient/encounter.
Minimum Fields
Patient ID, Age, Gender, Diagnosis codes, Admission/Discharge dates
File Size
Up to 100MB for real-time analysis. Larger datasets may require batch processing.
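For datasets above the real-time limit, a chunked read keeps memory flat while still producing aggregate statistics. A minimal sketch with pandas, simulating the export file with an in-memory buffer (the column names are illustrative, not a required schema):

```python
import io
import pandas as pd

# Stand-in for a large patient_data.csv export; in practice pass the
# file path to pd.read_csv instead of this buffer.
csv_buffer = io.StringIO(
    "patient_id,age,readmission_flag\n"
    "P001,65,1\nP002,54,0\nP003,71,1\nP004,48,0\n"
)

total_rows = 0
readmissions = 0
# chunksize controls rows per batch; aggregate without loading the full file
for chunk in pd.read_csv(csv_buffer, chunksize=2):
    total_rows += len(chunk)
    readmissions += chunk["readmission_flag"].sum()

print(total_rows, readmissions)
```

The same pattern scales to any per-chunk computation (counts, sums, group-by partials) that can be merged at the end.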
Data Privacy
All data is encrypted in transit and at rest. PHI should be de-identified when possible.
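One common de-identification step is replacing raw patient IDs with salted one-way hashes before upload. A minimal sketch (the salt value and 16-character truncation are illustrative choices, not platform requirements):

```python
import hashlib

def pseudonymize(patient_id: str, salt: str) -> str:
    # One-way hash of the ID; keep the salt secret and never
    # include it in the uploaded dataset.
    digest = hashlib.sha256((salt + patient_id).encode("utf-8"))
    return digest.hexdigest()[:16]

token = pseudonymize("P001", salt="replace-with-secret")
print(token)
```

The mapping is repeatable with the same salt, so the same patient hashes to the same token across files, but the raw ID cannot be recovered from the token alone.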
Recommended Fields
Lab results, medications, vitals, prior admissions, social determinants
Patient Readmission Risk
Chronic Disease Patients
Preventive Care Adherence
Operational Efficiency
Data Analysis Center
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load healthcare data with proper encoding
df = pd.read_csv('patient_data.csv', encoding='utf-8')

# Handle missing data
df.fillna({
    'age': df['age'].median(),
    'blood_pressure': df['blood_pressure'].mean(),
    'cholesterol': df['cholesterol'].median()
}, inplace=True)

# Basic statistics with percentiles
print(df.describe(percentiles=[.25, .5, .75, .9]))

# Visualize readmission risk distribution
plt.figure(figsize=(10, 6))
plt.hist(df['readmission_risk'], bins=20, color='skyblue', edgecolor='black')
plt.title('Patient Readmission Risk Distribution', fontsize=14)
plt.xlabel('Risk Score', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.grid(axis='y', alpha=0.3)
plt.show()
```
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             classification_report, confusion_matrix)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Prepare data with feature selection
features = ['age', 'bmi', 'blood_pressure', 'cholesterol',
            'prior_admissions', 'medication_count']
target = 'readmission_flag'
X = df[features]
y = df[target]

# Split into train/test with stratification
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Create pipeline with scaling and model
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=200, max_depth=5,
                           class_weight='balanced', random_state=42)
)

# Train model
model.fit(X_train, y_train)

# Evaluate model
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"AUC Score: {roc_auc_score(y_test, y_proba):.3f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
```
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Prepare data for logistic regression
X = df[['age', 'bmi', 'blood_pressure', 'cholesterol',
        'prior_admissions', 'medication_count']]
X = sm.add_constant(X)  # Add intercept
y = df['readmission_flag']

# Check for multicollinearity
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                   for i in range(len(X.columns))]
print("VIF Analysis:\n", vif_data)

# Fit logistic regression model
model = sm.Logit(y, X).fit(disp=0)

# Print summary with odds ratios
print(model.summary())
print("\nOdds Ratios:")
print(np.exp(model.params))
```
```r
# Load healthcare data with proper NA handling
df <- read.csv("patient_data.csv", na.strings = c("NA", "", " "))

# Install and load necessary packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("caret")) install.packages("caret")
library(tidyverse)
library(caret)

# Data cleaning and feature engineering
df_clean <- df %>%
  mutate(
    age = ifelse(is.na(age), median(age, na.rm = TRUE), age),
    bmi = ifelse(is.na(bmi), median(bmi, na.rm = TRUE), bmi),
    risk_category = cut(readmission_risk,
                        breaks = c(0, 0.1, 0.3, 1),
                        labels = c("Low", "Medium", "High")),
    # caret's twoClassSummary requires a factor outcome with valid level names
    readmission_flag = factor(readmission_flag,
                              levels = c(0, 1), labels = c("No", "Yes"))
  )

# Descriptive statistics
summary(df_clean)
psych::describe(df_clean %>% select(age, bmi, blood_pressure, cholesterol))

# Visualize readmission risk with ggplot2
ggplot(df_clean, aes(x = readmission_risk, fill = risk_category)) +
  geom_histogram(bins = 30, alpha = 0.8, color = "white") +
  scale_fill_manual(values = c("#28a745", "#ffc107", "#dc3545")) +
  labs(title = "Patient Readmission Risk Distribution",
       x = "Risk Score", y = "Count") +
  theme_minimal()

# Predictive modeling with caret
set.seed(42)
train_index <- createDataPartition(df_clean$readmission_flag, p = 0.8, list = FALSE)
train <- df_clean[train_index, ]
test <- df_clean[-train_index, ]

# Train random forest with cross-validation
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)
model <- train(readmission_flag ~ age + bmi + blood_pressure + cholesterol +
                 prior_admissions + medication_count,
               data = train, method = "rf", trControl = ctrl,
               metric = "ROC", tuneLength = 3)

# Evaluate model
predictions <- predict(model, newdata = test)
confusionMatrix(predictions, test$readmission_flag)
```
```sql
-- Create optimized tables with proper indexing
CREATE TABLE patients (
    patient_id VARCHAR(36) PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    age INT,
    gender VARCHAR(10),
    department VARCHAR(50),
    admission_date TIMESTAMP,
    discharge_date TIMESTAMP,
    readmission_flag TINYINT,
    readmission_risk DECIMAL(5,4),
    INDEX idx_readmission_risk (readmission_risk),
    INDEX idx_discharge_date (discharge_date)
);

-- Query patient readmission rates with window functions
WITH readmission_stats AS (
    SELECT
        department,
        COUNT(*) AS total_patients,
        SUM(readmission_flag) AS readmissions,
        ROUND(AVG(readmission_risk), 3) AS avg_risk,
        ROUND(SUM(readmission_flag) * 100.0 / COUNT(*), 2) AS readmission_rate,
        RANK() OVER (ORDER BY SUM(readmission_flag) * 100.0 / COUNT(*) DESC) AS rate_rank
    FROM patients
    WHERE discharge_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 YEAR) AND NOW()
    GROUP BY department
)
SELECT department, total_patients, readmissions, avg_risk, readmission_rate
FROM readmission_stats
ORDER BY readmission_rate DESC
LIMIT 10;

-- Find high-risk patients with complex conditions
SELECT
    p.patient_id,
    CONCAT(p.first_name, ' ', p.last_name) AS patient_name,
    p.age,
    p.readmission_risk,
    COUNT(d.diagnosis_id) AS diagnosis_count,
    GROUP_CONCAT(d.diagnosis_code SEPARATOR ', ') AS diagnoses
FROM patients p
JOIN patient_diagnoses d ON p.patient_id = d.patient_id
WHERE p.readmission_risk > 0.3
  AND p.discharge_date BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW()
GROUP BY p.patient_id, p.first_name, p.last_name, p.age, p.readmission_risk
HAVING COUNT(d.diagnosis_id) > 2
ORDER BY p.readmission_risk DESC
LIMIT 20;
```
```javascript
// Load and preprocess data with TensorFlow.js
async function loadAndPreprocessData() {
  // Load CSV data
  const dataUrl = 'patient_data.csv';
  const rawData = tf.data.csv(dataUrl, {
    columnConfigs: {
      readmission_flag: {isLabel: true}
    }
  });

  // Define feature columns
  const featureColumns = [
    'age', 'bmi', 'blood_pressure', 'cholesterol',
    'prior_admissions', 'medication_count'
  ];

  // Process features into tensors and one-hot labels
  const processedData = rawData.map(({xs, ys}) => {
    const features = featureColumns.map(col => xs[col]);
    return {
      xs: tf.tensor1d(features),
      ys: tf.oneHot(tf.tensor1d([ys.readmission_flag], 'int32'), 2)
    };
  }).batch(32);

  return processedData;
}

// Define neural network architecture
function createModel() {
  const model = tf.sequential();

  // Input layer
  model.add(tf.layers.dense({
    units: 64,
    activation: 'relu',
    inputShape: [6]  // Number of features
  }));

  // Hidden layers with dropout for regularization
  model.add(tf.layers.dropout({rate: 0.2}));
  model.add(tf.layers.dense({
    units: 32,
    activation: 'relu',
    kernelRegularizer: tf.regularizers.l2({l2: 0.01})
  }));

  // Output layer
  model.add(tf.layers.dense({
    units: 2,
    activation: 'softmax'
  }));

  // Compile model
  model.compile({
    optimizer: tf.train.adam(0.001),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
  });

  return model;
}

// Train and evaluate model
async function trainModel() {
  const data = await loadAndPreprocessData();
  const model = createModel();

  // Split data (80% train, 20% test); take/skip operate on batches,
  // so convert the assumed ~1000 samples into batch counts
  const trainBatches = Math.floor(0.8 * (1000 / 32));
  const trainData = data.take(trainBatches);
  const testData = data.skip(trainBatches);

  // Train model, logging progress each epoch
  await model.fitDataset(trainData, {
    epochs: 50,
    validationData: testData,
    callbacks: {
      onEpochEnd: (epoch, logs) => {
        console.log(`Epoch ${epoch}: loss = ${logs.loss.toFixed(4)}, ` +
                    `acc = ${logs.acc.toFixed(4)}, val_loss = ${logs.val_loss.toFixed(4)}`);
      }
    }
  });

  // Make a prediction on new data
  const samplePatient = tf.tensor2d([[65, 28.5, 140, 200, 3, 5]]);
  const prediction = model.predict(samplePatient);
  const risk = prediction.dataSync()[1] * 100;
  console.log(`Predicted readmission risk: ${risk.toFixed(1)}%`);
}
```
Report Generation
Executive Summary
The current 30-day readmission rate is 12.4%, a 2.1-percentage-point improvement over last month. Our predictive models identify 42 high-risk patients (risk >30%) who account for 68% of expected readmissions. The cardiology department has the highest readmission rate at 18.7%.
Key Findings
- Patients with multiple chronic conditions have 3.2x higher readmission risk
- Medication non-adherence contributes to 42% of preventable readmissions
- Weekend discharges show 28% higher readmission rates
- Patients without follow-up within 7 days have 2.5x higher risk
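Multipliers like these are relative risks: the event rate in the exposed group divided by the rate in the unexposed group. A minimal sketch with illustrative counts (the numbers below are made-up, not the cohort behind the findings above):

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of event rates between an exposed and an unexposed group."""
    rate_exposed = events_exposed / n_exposed
    rate_unexposed = events_unexposed / n_unexposed
    return rate_exposed / rate_unexposed

# Hypothetical: 25/100 readmissions without timely follow-up
# vs. 10/100 with follow-up within 7 days
rr = relative_risk(25, 100, 10, 100)
print(round(rr, 1))  # 2.5
```

A relative risk of 2.5 reads as "2.5x higher risk" in the findings language above.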
Predictive Insights
The random forest model (AUC=0.87) identifies age, HbA1c levels, and prior admissions as top predictors. Patients scoring above 0.3 on our risk scale should be prioritized for care transition programs.
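Top predictors like these can be surfaced from a fitted random forest via its feature importances. A sketch using synthetic stand-in data and the same scaler-plus-forest pipeline shape as the Python example above (the data and signal structure here are simulated, not real patient records):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["age", "bmi", "blood_pressure", "cholesterol",
            "prior_admissions", "medication_count"]

# Synthetic stand-in: outcome driven mostly by age and prior_admissions
rng = np.random.default_rng(42)
X = rng.normal(size=(200, len(features)))
y = (X[:, 0] + X[:, 4] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42)
)
model.fit(X, y)

# make_pipeline names steps after their lowercased class names
importances = model.named_steps["randomforestclassifier"].feature_importances_
for name, imp in sorted(zip(features, importances), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```

Importances sum to 1, so they rank predictors by relative contribution rather than giving calibrated effect sizes; for directionality, pair them with the logistic-regression odds ratios shown earlier.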
Recommendations
- Implement real-time risk alerts for discharging physicians
- Expand post-discharge follow-up calls for high-risk patients
- Develop targeted education for patients with multiple medications
- Optimize weekend discharge protocols with enhanced support
Detailed Risk Stratification
Risk distribution across patient population showing opportunities for targeted interventions.
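Stratifying continuous risk scores into the Low/Medium/High bands used in the R example can be done the same way in pandas. A sketch with illustrative scores (the cut points 0.1 and 0.3 mirror the breaks used earlier):

```python
import pandas as pd

# Illustrative risk scores; real scores come from the fitted model
risk = pd.Series([0.05, 0.12, 0.25, 0.45, 0.08, 0.62])

# Right-closed bins: (0, 0.1] Low, (0.1, 0.3] Medium, (0.3, 1] High
category = pd.cut(risk, bins=[0, 0.1, 0.3, 1.0],
                  labels=["Low", "Medium", "High"])
print(category.tolist())
```

The resulting bands map directly onto intervention tiers, with the >0.3 High band matching the threshold recommended for care transition programs.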