[{"data":1,"prerenderedAt":945},["ShallowReactive",2],{"content-query-g3XKmJa8ix":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"heading":10,"prompt":11,"tags":15,"files":17,"nav":6,"gallery":18,"body":20,"_type":938,"_id":939,"_source":940,"_file":941,"_stem":942,"_extension":943,"sitemap":944},"/tools/roc-curve","tools",false,"","ROC Curve and AUC Calculator","Plot ROC curves and calculate AUC online from Excel or CSV data. Compare classifiers, optimize thresholds, and inspect sensitivity and specificity with AI.","ROC Curve and AUC",{"prefix":12,"label":13,"placeholder":14},"Plot ROC curve and calculate AUC","Describe the classifier evaluation you want to run","e.g. plot ROC curve for logistic regression predictions, calculate AUC with 95% CI, find optimal threshold by Youden's J, compare to random forest",[16],"statistics",true,[19],"/img/tools/roc-curve.png",{"type":21,"children":22,"toc":927},"root",[23,32,60,78,110,116,203,209,381,393,399,565,571,727,733,787,793,830,836,857,880,903],{"type":24,"tag":25,"props":26,"children":28},"element","h2",{"id":27},"what-is-a-roc-curve",[29],{"type":30,"value":31},"text","What Is a ROC Curve?",{"type":24,"tag":33,"props":34,"children":35},"p",{},[36,38,44,46,51,53,58],{"type":30,"value":37},"The ",{"type":24,"tag":39,"props":40,"children":41},"strong",{},[42],{"type":30,"value":43},"Receiver Operating Characteristic (ROC) curve",{"type":30,"value":45}," is a graphical plot of a binary classifier's performance across all possible classification thresholds. 
It plots the ",{"type":24,"tag":39,"props":47,"children":48},{},[49],{"type":30,"value":50},"True Positive Rate",{"type":30,"value":52}," (TPR = sensitivity = recall) on the y-axis against the ",{"type":24,"tag":39,"props":54,"children":55},{},[56],{"type":30,"value":57},"False Positive Rate",{"type":30,"value":59}," (FPR = 1 − specificity) on the x-axis as the threshold varies from 1 (predict everything negative) to 0 (predict everything positive). Each point on the curve represents a (FPR, TPR) pair at a specific threshold. A perfect classifier passes through the top-left corner (FPR = 0, TPR = 1); a random classifier follows the diagonal from (0, 0) to (1, 1). The ROC curve is threshold-independent, making it a principled way to compare the discriminative ability of classifiers without committing to a specific operating point.",{"type":24,"tag":33,"props":61,"children":62},{},[63,64,69,71,76],{"type":30,"value":37},{"type":24,"tag":39,"props":65,"children":66},{},[67],{"type":30,"value":68},"Area Under the ROC Curve (AUC)",{"type":30,"value":70},", also written ",{"type":24,"tag":39,"props":72,"children":73},{},[74],{"type":30,"value":75},"AUROC",{"type":30,"value":77},", summarizes the entire curve as a single number between 0 and 1. Probabilistic interpretation: AUC is the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative instance (the Wilcoxon-Mann-Whitney statistic). AUC = 0.5 means the classifier performs no better than chance; AUC = 0.70–0.80 is acceptable; 0.80–0.90 is excellent; > 0.90 is outstanding. 
AUC is robust to class imbalance — unlike accuracy, it does not change if you have 10× more negatives than positives — making it the preferred metric for imbalanced classification problems in medicine, fraud detection, and churn prediction.",{"type":24,"tag":33,"props":79,"children":80},{},[81,82,87,89,94,96,101,103,108],{"type":30,"value":37},{"type":24,"tag":39,"props":83,"children":84},{},[85],{"type":30,"value":86},"optimal threshold",{"type":30,"value":88}," is the classification probability cutoff that best balances sensitivity and specificity for your specific application. Common methods: ",{"type":24,"tag":39,"props":90,"children":91},{},[92],{"type":30,"value":93},"Youden's J",{"type":30,"value":95}," (maximizes sensitivity + specificity − 1, geometric optimum on the ROC curve); ",{"type":24,"tag":39,"props":97,"children":98},{},[99],{"type":30,"value":100},"minimum distance to corner",{"type":30,"value":102}," (closest point to the ideal top-left corner); ",{"type":24,"tag":39,"props":104,"children":105},{},[106],{"type":30,"value":107},"cost-based optimization",{"type":30,"value":109}," (minimizes misclassification cost when false positives and false negatives have different costs). 
A concrete example: a disease screening test with AUC = 0.88 might use a low threshold (high sensitivity) to minimize missed cases, accepting more false positives; a confirmatory diagnostic test with the same AUC might use a higher threshold (high specificity) to minimize unnecessary treatment.",{"type":24,"tag":25,"props":111,"children":113},{"id":112},"how-it-works",[114],{"type":30,"value":115},"How It Works",{"type":24,"tag":117,"props":118,"children":119},"ol",{},[120,152,168],{"type":24,"tag":121,"props":122,"children":123},"li",{},[124,129,131,136,138,143,145,150],{"type":24,"tag":39,"props":125,"children":126},{},[127],{"type":30,"value":128},"Upload your data",{"type":30,"value":130}," — provide a CSV or Excel file with a ",{"type":24,"tag":39,"props":132,"children":133},{},[134],{"type":30,"value":135},"true label",{"type":30,"value":137}," column (0/1 or binary categorical) and one or more ",{"type":24,"tag":39,"props":139,"children":140},{},[141],{"type":30,"value":142},"predicted probability",{"type":30,"value":144}," or ",{"type":24,"tag":39,"props":146,"children":147},{},[148],{"type":30,"value":149},"score",{"type":30,"value":151}," columns (one per model to compare). One row per observation.",{"type":24,"tag":121,"props":153,"children":154},{},[155,160,162],{"type":24,"tag":39,"props":156,"children":157},{},[158],{"type":30,"value":159},"Describe the analysis",{"type":30,"value":161}," — e.g. 
",{"type":24,"tag":163,"props":164,"children":165},"em",{},[166],{"type":30,"value":167},"\"plot ROC curve for the 'predicted_prob' column vs 'diagnosis'; calculate AUC with 95% DeLong CI; find optimal threshold by Youden's J; compare to a second model in 'rf_prob'\"",{"type":24,"tag":121,"props":169,"children":170},{},[171,176,178,185,187,193,195,201],{"type":24,"tag":39,"props":172,"children":173},{},[174],{"type":30,"value":175},"Get full results",{"type":30,"value":177}," — the AI writes Python code using ",{"type":24,"tag":179,"props":180,"children":182},"a",{"href":181},"https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html",[183],{"type":30,"value":184},"scikit-learn roc_curve",{"type":30,"value":186},", ",{"type":24,"tag":179,"props":188,"children":190},{"href":189},"https://docs.scipy.org/doc/scipy/reference/stats.html",[191],{"type":30,"value":192},"scipy.stats",{"type":30,"value":194},", and ",{"type":24,"tag":179,"props":196,"children":198},{"href":197},"https://plotly.com/python/",[199],{"type":30,"value":200},"Plotly",{"type":30,"value":202}," to plot ROC curves, compute AUC with confidence intervals, identify the optimal threshold, and generate a sensitivity/specificity vs threshold plot",{"type":24,"tag":25,"props":204,"children":206},{"id":205},"required-data-format",[207],{"type":30,"value":208},"Required Data 
Format",{"type":24,"tag":210,"props":211,"children":212},"table",{},[213,237],{"type":24,"tag":214,"props":215,"children":216},"thead",{},[217],{"type":24,"tag":218,"props":219,"children":220},"tr",{},[221,227,232],{"type":24,"tag":222,"props":223,"children":224},"th",{},[225],{"type":30,"value":226},"Column",{"type":24,"tag":222,"props":228,"children":229},{},[230],{"type":30,"value":231},"Description",{"type":24,"tag":222,"props":233,"children":234},{},[235],{"type":30,"value":236},"Example",{"type":24,"tag":238,"props":239,"children":240},"tbody",{},[241,292,341],{"type":24,"tag":218,"props":242,"children":243},{},[244,255,260],{"type":24,"tag":245,"props":246,"children":247},"td",{},[248],{"type":24,"tag":249,"props":250,"children":252},"code",{"className":251},[],[253],{"type":30,"value":254},"label",{"type":24,"tag":245,"props":256,"children":257},{},[258],{"type":30,"value":259},"True binary outcome",{"type":24,"tag":245,"props":261,"children":262},{},[263,269,271,277,279,285,286],{"type":24,"tag":249,"props":264,"children":266},{"className":265},[],[267],{"type":30,"value":268},"1",{"type":30,"value":270}," (positive), ",{"type":24,"tag":249,"props":272,"children":274},{"className":273},[],[275],{"type":30,"value":276},"0",{"type":30,"value":278}," (negative) or ",{"type":24,"tag":249,"props":280,"children":282},{"className":281},[],[283],{"type":30,"value":284},"'disease'",{"type":30,"value":186},{"type":24,"tag":249,"props":287,"children":289},{"className":288},[],[290],{"type":30,"value":291},"'healthy'",{"type":24,"tag":218,"props":293,"children":294},{},[295,303,308],{"type":24,"tag":245,"props":296,"children":297},{},[298],{"type":24,"tag":249,"props":299,"children":301},{"className":300},[],[302],{"type":30,"value":149},{"type":24,"tag":245,"props":304,"children":305},{},[306],{"type":30,"value":307},"Predicted probability or continuous 
score",{"type":24,"tag":245,"props":309,"children":310},{},[311,317,318,324,325,331,333,339],{"type":24,"tag":249,"props":312,"children":314},{"className":313},[],[315],{"type":30,"value":316},"0.82",{"type":30,"value":186},{"type":24,"tag":249,"props":319,"children":321},{"className":320},[],[322],{"type":30,"value":323},"0.34",{"type":30,"value":186},{"type":24,"tag":249,"props":326,"children":328},{"className":327},[],[329],{"type":30,"value":330},"0.91",{"type":30,"value":332}," (probabilities in ",{"type":24,"tag":334,"props":335,"children":336},"span",{},[337],{"type":30,"value":338},"[0, 1]",{"type":30,"value":340},")",{"type":24,"tag":218,"props":342,"children":343},{},[344,353,358],{"type":24,"tag":245,"props":345,"children":346},{},[347],{"type":24,"tag":249,"props":348,"children":350},{"className":349},[],[351],{"type":30,"value":352},"score_2",{"type":24,"tag":245,"props":354,"children":355},{},[356],{"type":30,"value":357},"Optional: second model's scores for comparison",{"type":24,"tag":245,"props":359,"children":360},{},[361,367,368,374,375],{"type":24,"tag":249,"props":362,"children":364},{"className":363},[],[365],{"type":30,"value":366},"0.75",{"type":30,"value":186},{"type":24,"tag":249,"props":369,"children":371},{"className":370},[],[372],{"type":30,"value":373},"0.41",{"type":30,"value":186},{"type":24,"tag":249,"props":376,"children":378},{"className":377},[],[379],{"type":30,"value":380},"0.88",{"type":24,"tag":33,"props":382,"children":383},{},[384,386,391],{"type":30,"value":385},"Any column names work — describe them in your prompt. Predicted probabilities must be for the positive class. 
If your model outputs scores on any scale (not ",{"type":24,"tag":334,"props":387,"children":388},{},[389],{"type":30,"value":390},"[0, 1]",{"type":30,"value":392},"), the ROC curve is still valid — the ranking is all that matters.",{"type":24,"tag":25,"props":394,"children":396},{"id":395},"interpreting-the-results",[397],{"type":30,"value":398},"Interpreting the Results",{"type":24,"tag":210,"props":400,"children":401},{},[402,418],{"type":24,"tag":214,"props":403,"children":404},{},[405],{"type":24,"tag":218,"props":406,"children":407},{},[408,413],{"type":24,"tag":222,"props":409,"children":410},{},[411],{"type":30,"value":412},"Output",{"type":24,"tag":222,"props":414,"children":415},{},[416],{"type":30,"value":417},"What it means",{"type":24,"tag":238,"props":419,"children":420},{},[421,437,453,469,485,501,517,533,549],{"type":24,"tag":218,"props":422,"children":423},{},[424,432],{"type":24,"tag":245,"props":425,"children":426},{},[427],{"type":24,"tag":39,"props":428,"children":429},{},[430],{"type":30,"value":431},"AUC",{"type":24,"tag":245,"props":433,"children":434},{},[435],{"type":30,"value":436},"Area under ROC curve — probability of correctly ranking a random positive above a random negative",{"type":24,"tag":218,"props":438,"children":439},{},[440,448],{"type":24,"tag":245,"props":441,"children":442},{},[443],{"type":24,"tag":39,"props":444,"children":445},{},[446],{"type":30,"value":447},"95% CI on AUC",{"type":24,"tag":245,"props":449,"children":450},{},[451],{"type":30,"value":452},"DeLong method confidence interval — if CI excludes 0.5, classifier is significantly better than random",{"type":24,"tag":218,"props":454,"children":455},{},[456,464],{"type":24,"tag":245,"props":457,"children":458},{},[459],{"type":24,"tag":39,"props":460,"children":461},{},[462],{"type":30,"value":463},"Sensitivity (TPR)",{"type":24,"tag":245,"props":465,"children":466},{},[467],{"type":30,"value":468},"True positive rate at the chosen threshold — fraction of positives 
correctly identified",{"type":24,"tag":218,"props":470,"children":471},{},[472,480],{"type":24,"tag":245,"props":473,"children":474},{},[475],{"type":24,"tag":39,"props":476,"children":477},{},[478],{"type":30,"value":479},"Specificity",{"type":24,"tag":245,"props":481,"children":482},{},[483],{"type":30,"value":484},"True negative rate at the chosen threshold = 1 − FPR",{"type":24,"tag":218,"props":486,"children":487},{},[488,496],{"type":24,"tag":245,"props":489,"children":490},{},[491],{"type":24,"tag":39,"props":492,"children":493},{},[494],{"type":30,"value":495},"Youden's J index",{"type":24,"tag":245,"props":497,"children":498},{},[499],{"type":30,"value":500},"Sensitivity + Specificity − 1 at each threshold — maximized at the optimal operating point",{"type":24,"tag":218,"props":502,"children":503},{},[504,512],{"type":24,"tag":245,"props":505,"children":506},{},[507],{"type":24,"tag":39,"props":508,"children":509},{},[510],{"type":30,"value":511},"Optimal threshold",{"type":24,"tag":245,"props":513,"children":514},{},[515],{"type":30,"value":516},"Score cutoff that maximizes Youden's J (or minimizes distance to top-left corner)",{"type":24,"tag":218,"props":518,"children":519},{},[520,528],{"type":24,"tag":245,"props":521,"children":522},{},[523],{"type":24,"tag":39,"props":524,"children":525},{},[526],{"type":30,"value":527},"PPV / NPV",{"type":24,"tag":245,"props":529,"children":530},{},[531],{"type":30,"value":532},"Positive / negative predictive value at the chosen threshold — depend on class prevalence",{"type":24,"tag":218,"props":534,"children":535},{},[536,544],{"type":24,"tag":245,"props":537,"children":538},{},[539],{"type":24,"tag":39,"props":540,"children":541},{},[542],{"type":30,"value":543},"DeLong test",{"type":24,"tag":245,"props":545,"children":546},{},[547],{"type":30,"value":548},"Compares AUC of two classifiers; p-value \u003C 0.05 means one discriminates significantly 
better",{"type":24,"tag":218,"props":550,"children":551},{},[552,560],{"type":24,"tag":245,"props":553,"children":554},{},[555],{"type":24,"tag":39,"props":556,"children":557},{},[558],{"type":30,"value":559},"Partial AUC",{"type":24,"tag":245,"props":561,"children":562},{},[563],{"type":30,"value":564},"AUC restricted to a FPR range (e.g. 0–0.1) — relevant when only high-specificity operating points matter",{"type":24,"tag":25,"props":566,"children":568},{"id":567},"example-prompts",[569],{"type":30,"value":570},"Example Prompts",{"type":24,"tag":210,"props":572,"children":573},{},[574,590],{"type":24,"tag":214,"props":575,"children":576},{},[577],{"type":24,"tag":218,"props":578,"children":579},{},[580,585],{"type":24,"tag":222,"props":581,"children":582},{},[583],{"type":30,"value":584},"Scenario",{"type":24,"tag":222,"props":586,"children":587},{},[588],{"type":30,"value":589},"What to type",{"type":24,"tag":238,"props":591,"children":592},{},[593,610,626,643,660,677,693,710],{"type":24,"tag":218,"props":594,"children":595},{},[596,601],{"type":24,"tag":245,"props":597,"children":598},{},[599],{"type":30,"value":600},"Basic ROC + AUC",{"type":24,"tag":245,"props":602,"children":603},{},[604],{"type":24,"tag":249,"props":605,"children":607},{"className":606},[],[608],{"type":30,"value":609},"plot ROC curve for predicted probabilities vs true labels; calculate AUC with 95% CI; annotate AUC on plot",{"type":24,"tag":218,"props":611,"children":612},{},[613,617],{"type":24,"tag":245,"props":614,"children":615},{},[616],{"type":30,"value":511},{"type":24,"tag":245,"props":618,"children":619},{},[620],{"type":24,"tag":249,"props":621,"children":623},{"className":622},[],[624],{"type":30,"value":625},"ROC curve; find optimal threshold by Youden's J; report sensitivity, specificity, PPV, NPV at that threshold",{"type":24,"tag":218,"props":627,"children":628},{},[629,634],{"type":24,"tag":245,"props":630,"children":631},{},[632],{"type":30,"value":633},"Multi-model 
comparison",{"type":24,"tag":245,"props":635,"children":636},{},[637],{"type":24,"tag":249,"props":638,"children":640},{"className":639},[],[641],{"type":30,"value":642},"compare ROC curves for logistic_prob and rf_prob; DeLong test for AUC difference; which model is significantly better?",{"type":24,"tag":218,"props":644,"children":645},{},[646,651],{"type":24,"tag":245,"props":647,"children":648},{},[649],{"type":30,"value":650},"Threshold plot",{"type":24,"tag":245,"props":652,"children":653},{},[654],{"type":24,"tag":249,"props":655,"children":657},{"className":656},[],[658],{"type":30,"value":659},"plot sensitivity, specificity, and Youden's J vs threshold; mark the optimal operating point",{"type":24,"tag":218,"props":661,"children":662},{},[663,668],{"type":24,"tag":245,"props":664,"children":665},{},[666],{"type":30,"value":667},"Class imbalance",{"type":24,"tag":245,"props":669,"children":670},{},[671],{"type":24,"tag":249,"props":672,"children":674},{"className":673},[],[675],{"type":30,"value":676},"ROC curve for imbalanced dataset (5% positive rate); also plot precision-recall curve; compare AUC and AUPRC",{"type":24,"tag":218,"props":678,"children":679},{},[680,684],{"type":24,"tag":245,"props":681,"children":682},{},[683],{"type":30,"value":559},{"type":24,"tag":245,"props":685,"children":686},{},[687],{"type":24,"tag":249,"props":688,"children":690},{"className":689},[],[691],{"type":30,"value":692},"compute partial AUC for FPR 0–0.1 (high specificity region); standardize to [0,1] scale",{"type":24,"tag":218,"props":694,"children":695},{},[696,701],{"type":24,"tag":245,"props":697,"children":698},{},[699],{"type":30,"value":700},"Cross-validated AUC",{"type":24,"tag":245,"props":702,"children":703},{},[704],{"type":24,"tag":249,"props":705,"children":707},{"className":706},[],[708],{"type":30,"value":709},"5-fold cross-validated ROC curve with mean ± SD band; report mean AUC and 95% 
CI",{"type":24,"tag":218,"props":711,"children":712},{},[713,718],{"type":24,"tag":245,"props":714,"children":715},{},[716],{"type":30,"value":717},"Cost-sensitive threshold",{"type":24,"tag":245,"props":719,"children":720},{},[721],{"type":24,"tag":249,"props":722,"children":724},{"className":723},[],[725],{"type":30,"value":726},"find threshold that minimizes total cost assuming false negative costs 5× more than false positive",{"type":24,"tag":25,"props":728,"children":730},{"id":729},"assumptions-to-check",[731],{"type":30,"value":732},"Assumptions to Check",{"type":24,"tag":734,"props":735,"children":736},"ul",{},[737,747,757,767,777],{"type":24,"tag":121,"props":738,"children":739},{},[740,745],{"type":24,"tag":39,"props":741,"children":742},{},[743],{"type":30,"value":744},"Binary outcome",{"type":30,"value":746}," — ROC curves apply to binary classifiers (positive/negative, disease/healthy, fraud/legitimate); for multi-class problems, compute one-vs-rest ROC curves for each class",{"type":24,"tag":121,"props":748,"children":749},{},[750,755],{"type":24,"tag":39,"props":751,"children":752},{},[753],{"type":30,"value":754},"Calibrated probabilities",{"type":30,"value":756}," — the ROC curve only requires correct ranking (not calibrated probabilities), so raw scores, log-odds, or decision function values all work; however, threshold interpretation requires well-calibrated probabilities — check with a calibration plot",{"type":24,"tag":121,"props":758,"children":759},{},[760,765],{"type":24,"tag":39,"props":761,"children":762},{},[763],{"type":30,"value":764},"Independent test set",{"type":30,"value":766}," — computing AUC on training data inflates the estimate; use a held-out test set, k-fold cross-validation, or bootstrap resampling",{"type":24,"tag":121,"props":768,"children":769},{},[770,775],{"type":24,"tag":39,"props":771,"children":772},{},[773],{"type":30,"value":774},"Sufficient positives",{"type":30,"value":776}," — AUC is unreliable with very few 
positives (\u003C 20–30); the CI will be wide and the optimal threshold estimate noisy",{"type":24,"tag":121,"props":778,"children":779},{},[780,785],{"type":24,"tag":39,"props":781,"children":782},{},[783],{"type":30,"value":784},"Class prevalence affects PPV/NPV",{"type":30,"value":786}," — sensitivity and specificity are prevalence-independent but PPV and NPV depend heavily on the positive rate in your population; a test with 95% sensitivity and 95% specificity applied at 1% disease prevalence has PPV ≈ 16% (most positives are false positives)",{"type":24,"tag":25,"props":788,"children":790},{"id":789},"related-tools",[791],{"type":30,"value":792},"Related Tools",{"type":24,"tag":33,"props":794,"children":795},{},[796,798,804,806,812,814,820,822,828],{"type":30,"value":797},"Use the ",{"type":24,"tag":179,"props":799,"children":801},{"href":800},"/tools/logistic-regression",[802],{"type":30,"value":803},"Logistic Regression",{"type":30,"value":805}," calculator to build a logistic regression model and generate the predicted probabilities needed for ROC analysis. Use the ",{"type":24,"tag":179,"props":807,"children":809},{"href":808},"/tools/chi-square-test",[810],{"type":30,"value":811},"Chi-Square Test Calculator",{"type":30,"value":813}," to evaluate a classifier at a fixed threshold using a 2×2 confusion matrix. Use the ",{"type":24,"tag":179,"props":815,"children":817},{"href":816},"/tools/power-analysis",[818],{"type":30,"value":819},"Power Analysis Calculator",{"type":30,"value":821}," to determine the sample size needed for a study aiming to achieve a specified AUC. 
Use the ",{"type":24,"tag":179,"props":823,"children":825},{"href":824},"/tools/volcano-plot",[826],{"type":30,"value":827},"Volcano Plot Generator",{"type":30,"value":829}," for visualizing feature importance scores from a classifier alongside significance.",{"type":24,"tag":25,"props":831,"children":833},{"id":832},"frequently-asked-questions",[834],{"type":30,"value":835},"Frequently Asked Questions",{"type":24,"tag":33,"props":837,"children":838},{},[839,844,849,851,855],{"type":24,"tag":39,"props":840,"children":841},{},[842],{"type":30,"value":843},"When should I use AUC vs accuracy?",{"type":24,"tag":39,"props":845,"children":846},{},[847],{"type":30,"value":848},"Accuracy",{"type":30,"value":850}," (fraction of correct predictions) is appropriate only when classes are balanced and false positives and false negatives have equal cost. In most real-world problems — disease diagnosis, fraud detection, churn prediction — the positive class is rare (class imbalance) and the two error types have very different costs. A classifier that predicts \"negative\" for all samples achieves 99% accuracy on a 1% positive-rate dataset while being completely useless. ",{"type":24,"tag":39,"props":852,"children":853},{},[854],{"type":30,"value":431},{"type":30,"value":856}," is unaffected by class imbalance because it evaluates ranking performance across all thresholds. 
Use AUC for model selection and comparison; report sensitivity, specificity, PPV, and NPV at your chosen operating threshold for clinical or operational use.",{"type":24,"tag":33,"props":858,"children":859},{},[860,865,867,871,873,878],{"type":24,"tag":39,"props":861,"children":862},{},[863],{"type":30,"value":864},"What is the DeLong method for comparing two AUC values?",{"type":30,"value":866},"\nThe ",{"type":24,"tag":39,"props":868,"children":869},{},[870],{"type":30,"value":543},{"type":30,"value":872}," (DeLong et al., 1988) computes the variance and covariance of two AUC estimates from the same test set, accounting for the fact that the same subjects were scored by both classifiers. It then produces a z-statistic and p-value for the hypothesis AUC₁ = AUC₂. Because the same subjects appear in both curves, a paired test is more powerful than independent comparison. The test is asymptotically normal and works well for n ≥ 30. A significant DeLong p-value (\u003C 0.05) means one model discriminates significantly better; a non-significant result means the data cannot distinguish the two models' performance (not that they are identical). Ask the AI to ",{"type":24,"tag":163,"props":874,"children":875},{},[876],{"type":30,"value":877},"\"compare AUC for logistic_prob and rf_prob using DeLong test; report Z-statistic, p-value, and 95% CI on the AUC difference\"",{"type":30,"value":879},".",{"type":24,"tag":33,"props":881,"children":882},{},[883,888,889,894,896,901],{"type":24,"tag":39,"props":884,"children":885},{},[886],{"type":30,"value":887},"What is the difference between ROC-AUC and Precision-Recall AUC?",{"type":30,"value":866},{"type":24,"tag":39,"props":890,"children":891},{},[892],{"type":30,"value":893},"ROC curve",{"type":30,"value":895}," measures the trade-off between true positive rate and false positive rate — it is symmetric with respect to class balance and stable when the negative class is large. 
The ",{"type":24,"tag":39,"props":897,"children":898},{},[899],{"type":30,"value":900},"precision-recall (PR) curve",{"type":30,"value":902}," measures the trade-off between precision (PPV) and recall (sensitivity) — it is highly sensitive to class imbalance and collapses to a near-zero baseline when positives are rare. For highly imbalanced datasets (\u003C 5% positives), the PR curve and its area (AUPRC) are more informative than ROC-AUC because they directly reflect the practical precision achieved at a given recall level. For balanced datasets, the two measures tend to agree on relative model rankings. Best practice: report both AUC and AUPRC for imbalanced classification problems.",{"type":24,"tag":33,"props":904,"children":905},{},[906,911,913,918,920,925],{"type":24,"tag":39,"props":907,"children":908},{},[909],{"type":30,"value":910},"Why do I get different AUC values from different tools?",{"type":30,"value":912},"\nAUC can be estimated in two ways: (1) ",{"type":24,"tag":39,"props":914,"children":915},{},[916],{"type":30,"value":917},"trapezoidal rule",{"type":30,"value":919}," on the empirical ROC curve (scikit-learn default) — exact but may interpolate poorly when the curve has few points; (2) ",{"type":24,"tag":39,"props":921,"children":922},{},[923],{"type":30,"value":924},"Wilcoxon-Mann-Whitney statistic",{"type":30,"value":926}," — exact rank-based calculation giving the same result as the trapezoidal rule for large n. Differences between tools arise from: (a) whether the AUC is computed on the training set vs test set; (b) whether cross-validation is used; (c) how ties in the score are handled; (d) whether the curve is micro-averaged, macro-averaged, or per-class for multi-class problems. 
The largest differences come from training vs test evaluation — always specify which you're computing.",{"title":7,"searchDepth":928,"depth":928,"links":929},2,[930,931,932,933,934,935,936,937],{"id":27,"depth":928,"text":31},{"id":112,"depth":928,"text":115},{"id":205,"depth":928,"text":208},{"id":395,"depth":928,"text":398},{"id":567,"depth":928,"text":570},{"id":729,"depth":928,"text":732},{"id":789,"depth":928,"text":792},{"id":832,"depth":928,"text":835},"markdown","content:tools:065.roc-curve.md","content","tools/065.roc-curve.md","tools/065.roc-curve","md",{"loc":4},1775502471940]