[{"data":1,"prerenderedAt":987},["ShallowReactive",2],{"content-query-i9TA6vQe6O":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"heading":10,"prompt":11,"tags":15,"files":17,"nav":6,"gallery":18,"body":20,"_type":980,"_id":981,"_source":982,"_file":983,"_stem":984,"_extension":985,"sitemap":986},"/tools/confusion-matrix","tools",false,"","Confusion Matrix Calculator with Sensitivity and Specificity","Create confusion matrices online from Excel, CSV, or cell counts. Calculate sensitivity, specificity, PPV, NPV, F1, MCC, and kappa with AI.","Confusion Matrix Calculator",{"prefix":12,"label":13,"placeholder":14},"Generate a confusion matrix","Describe the predictions you want to evaluate","e.g. confusion matrix from predicted vs actual columns; report sensitivity, specificity, PPV, NPV, F1 score, MCC, and Cohen's kappa; visualize with heatmap",[16],"statistics",true,[19],"/img/tools/confusion-matrix.png",{"type":21,"children":22,"toc":969},"root",[23,32,88,121,147,153,225,231,346,358,364,605,611,752,758,812,818,855,861,906,922,945],{"type":24,"tag":25,"props":26,"children":28},"element","h2",{"id":27},"what-is-a-confusion-matrix",[29],{"type":30,"value":31},"text","What Is a Confusion Matrix?",{"type":24,"tag":33,"props":34,"children":35},"p",{},[36,38,44,46,51,53,58,60,65,67,72,74,79,81,86],{"type":30,"value":37},"A ",{"type":24,"tag":39,"props":40,"children":41},"strong",{},[42],{"type":30,"value":43},"confusion matrix",{"type":30,"value":45}," is a table that summarizes the performance of a binary (or multi-class) classifier by cross-tabulating ",{"type":24,"tag":39,"props":47,"children":48},{},[49],{"type":30,"value":50},"predicted",{"type":30,"value":52}," versus ",{"type":24,"tag":39,"props":54,"children":55},{},[56],{"type":30,"value":57},"actual",{"type":30,"value":59}," class labels. 
For a binary classifier, the four cells are: ",{"type":24,"tag":39,"props":61,"children":62},{},[63],{"type":30,"value":64},"True Positives (TP)",{"type":30,"value":66}," — correctly predicted positives; ",{"type":24,"tag":39,"props":68,"children":69},{},[70],{"type":30,"value":71},"True Negatives (TN)",{"type":30,"value":73}," — correctly predicted negatives; ",{"type":24,"tag":39,"props":75,"children":76},{},[77],{"type":30,"value":78},"False Positives (FP)",{"type":30,"value":80}," — negatives incorrectly predicted as positive (Type I errors); and ",{"type":24,"tag":39,"props":82,"children":83},{},[84],{"type":30,"value":85},"False Negatives (FN)",{"type":30,"value":87}," — positives incorrectly predicted as negative (Type II errors). All other diagnostic performance metrics derive from these four numbers. The confusion matrix is the starting point for evaluating any binary classifier — a disease screening test, a fraud detection model, a quality control system, or a machine learning classifier.",{"type":24,"tag":33,"props":89,"children":90},{},[91,93,98,100,105,107,112,114,119],{"type":30,"value":92},"From the four cells, a complete set of diagnostic performance metrics can be derived. ",{"type":24,"tag":39,"props":94,"children":95},{},[96],{"type":30,"value":97},"Sensitivity",{"type":30,"value":99}," (= recall = true positive rate) = TP/(TP+FN) measures how well the test finds true positives — critical for screening tests where missing a disease case is dangerous. ",{"type":24,"tag":39,"props":101,"children":102},{},[103],{"type":30,"value":104},"Specificity",{"type":30,"value":106}," (= true negative rate) = TN/(TN+FP) measures how well the test avoids false alarms — critical when false positives are costly (unnecessary surgery, treatment side effects). 
",{"type":24,"tag":39,"props":108,"children":109},{},[110],{"type":30,"value":111},"Positive Predictive Value (PPV)",{"type":30,"value":113}," = TP/(TP+FP) is the probability that a positive test result is truly positive, which depends on disease prevalence. ",{"type":24,"tag":39,"props":115,"children":116},{},[117],{"type":30,"value":118},"Negative Predictive Value (NPV)",{"type":30,"value":120}," = TN/(TN+FN) is the probability that a negative test truly rules out the condition.",{"type":24,"tag":33,"props":122,"children":123},{},[124,126,131,133,138,140,145],{"type":30,"value":125},"Summary metrics that balance multiple aspects: ",{"type":24,"tag":39,"props":127,"children":128},{},[129],{"type":30,"value":130},"F1 score",{"type":30,"value":132}," = 2 × (PPV × sensitivity) / (PPV + sensitivity) — the harmonic mean of precision and recall, appropriate when both false positives and false negatives matter. ",{"type":24,"tag":39,"props":134,"children":135},{},[136],{"type":30,"value":137},"Matthews Correlation Coefficient (MCC)",{"type":30,"value":139}," = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) — a single balanced metric particularly recommended for imbalanced datasets, ranging from −1 (perfect disagreement) to +1 (perfect prediction). ",{"type":24,"tag":39,"props":141,"children":142},{},[143],{"type":30,"value":144},"Cohen's κ",{"type":30,"value":146}," measures agreement corrected for chance. 
None of these single numbers fully replaces the full confusion matrix — always report the four cell counts alongside derived metrics.",{"type":24,"tag":25,"props":148,"children":150},{"id":149},"how-it-works",[151],{"type":30,"value":152},"How It Works",{"type":24,"tag":154,"props":155,"children":156},"ol",{},[157,182,198],{"type":24,"tag":158,"props":159,"children":160},"li",{},[161,166,168,173,175,180],{"type":24,"tag":39,"props":162,"children":163},{},[164],{"type":30,"value":165},"Upload your data",{"type":30,"value":167}," — provide a CSV or Excel file with a ",{"type":24,"tag":39,"props":169,"children":170},{},[171],{"type":30,"value":172},"true label",{"type":30,"value":174}," column and a ",{"type":24,"tag":39,"props":176,"children":177},{},[178],{"type":30,"value":179},"predicted label",{"type":30,"value":181}," column (one row per observation), or simply describe the four cell counts directly in your prompt without uploading a file.",{"type":24,"tag":158,"props":183,"children":184},{},[185,190,192],{"type":24,"tag":39,"props":186,"children":187},{},[188],{"type":30,"value":189},"Describe the analysis",{"type":30,"value":191}," — e.g. 
",{"type":24,"tag":193,"props":194,"children":195},"em",{},[196],{"type":30,"value":197},"\"confusion matrix comparing 'actual' vs 'predicted' columns; report all metrics including sensitivity, specificity, PPV, NPV, F1, MCC, and kappa; heatmap visualization\"",{"type":24,"tag":158,"props":199,"children":200},{},[201,206,208,215,217,223],{"type":24,"tag":39,"props":202,"children":203},{},[204],{"type":30,"value":205},"Get full results",{"type":30,"value":207}," — the AI writes Python code using ",{"type":24,"tag":209,"props":210,"children":212},"a",{"href":211},"https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html",[213],{"type":30,"value":214},"scikit-learn confusion_matrix",{"type":30,"value":216}," and ",{"type":24,"tag":209,"props":218,"children":220},{"href":219},"https://plotly.com/python/",[221],{"type":30,"value":222},"Plotly",{"type":30,"value":224}," to produce the heatmap with annotated cell counts and percentages, and a complete metrics summary table",{"type":24,"tag":25,"props":226,"children":228},{"id":227},"required-data-format",[229],{"type":30,"value":230},"Required Data 
Format",{"type":24,"tag":232,"props":233,"children":234},"table",{},[235,259],{"type":24,"tag":236,"props":237,"children":238},"thead",{},[239],{"type":24,"tag":240,"props":241,"children":242},"tr",{},[243,249,254],{"type":24,"tag":244,"props":245,"children":246},"th",{},[247],{"type":30,"value":248},"Column",{"type":24,"tag":244,"props":250,"children":251},{},[252],{"type":30,"value":253},"Description",{"type":24,"tag":244,"props":255,"children":256},{},[257],{"type":30,"value":258},"Example",{"type":24,"tag":260,"props":261,"children":262},"tbody",{},[263,314],{"type":24,"tag":240,"props":264,"children":265},{},[266,276,281],{"type":24,"tag":267,"props":268,"children":269},"td",{},[270],{"type":24,"tag":271,"props":272,"children":274},"code",{"className":273},[],[275],{"type":30,"value":57},{"type":24,"tag":267,"props":277,"children":278},{},[279],{"type":30,"value":280},"True class labels",{"type":24,"tag":267,"props":282,"children":283},{},[284,290,292,298,300,306,308],{"type":24,"tag":271,"props":285,"children":287},{"className":286},[],[288],{"type":30,"value":289},"1",{"type":30,"value":291}," (positive), ",{"type":24,"tag":271,"props":293,"children":295},{"className":294},[],[296],{"type":30,"value":297},"0",{"type":30,"value":299}," (negative) or ",{"type":24,"tag":271,"props":301,"children":303},{"className":302},[],[304],{"type":30,"value":305},"'disease'",{"type":30,"value":307},", ",{"type":24,"tag":271,"props":309,"children":311},{"className":310},[],[312],{"type":30,"value":313},"'healthy'",{"type":24,"tag":240,"props":315,"children":316},{},[317,325,330],{"type":24,"tag":267,"props":318,"children":319},{},[320],{"type":24,"tag":271,"props":321,"children":323},{"className":322},[],[324],{"type":30,"value":50},{"type":24,"tag":267,"props":326,"children":327},{},[328],{"type":30,"value":329},"Predicted class 
labels",{"type":24,"tag":267,"props":331,"children":332},{},[333,338,339,344],{"type":24,"tag":271,"props":334,"children":336},{"className":335},[],[337],{"type":30,"value":289},{"type":30,"value":307},{"type":24,"tag":271,"props":340,"children":342},{"className":341},[],[343],{"type":30,"value":297},{"type":30,"value":345}," (same classes as actual)",{"type":24,"tag":33,"props":347,"children":348},{},[349,351,356],{"type":30,"value":350},"Any column names work — describe them in your prompt. For multi-class problems, provide all class labels in both columns. If you already have the 4 cell counts, describe them directly: ",{"type":24,"tag":193,"props":352,"children":353},{},[354],{"type":30,"value":355},"\"TP=85, FN=15, FP=22, TN=178\"",{"type":30,"value":357}," — no file upload needed.",{"type":24,"tag":25,"props":359,"children":361},{"id":360},"interpreting-the-results",[362],{"type":30,"value":363},"Interpreting the Results",{"type":24,"tag":232,"props":365,"children":366},{},[367,388],{"type":24,"tag":236,"props":368,"children":369},{},[370],{"type":24,"tag":240,"props":371,"children":372},{},[373,378,383],{"type":24,"tag":244,"props":374,"children":375},{},[376],{"type":30,"value":377},"Metric",{"type":24,"tag":244,"props":379,"children":380},{},[381],{"type":30,"value":382},"Formula",{"type":24,"tag":244,"props":384,"children":385},{},[386],{"type":30,"value":387},"What it means",{"type":24,"tag":260,"props":389,"children":390},{},[391,412,432,453,474,495,516,543,563,584],{"type":24,"tag":240,"props":392,"children":393},{},[394,402,407],{"type":24,"tag":267,"props":395,"children":396},{},[397],{"type":24,"tag":39,"props":398,"children":399},{},[400],{"type":30,"value":401},"Sensitivity (Recall)",{"type":24,"tag":267,"props":403,"children":404},{},[405],{"type":30,"value":406},"TP / (TP + FN)",{"type":24,"tag":267,"props":408,"children":409},{},[410],{"type":30,"value":411},"Fraction of true positives correctly identified — critical for 
screening",{"type":24,"tag":240,"props":413,"children":414},{},[415,422,427],{"type":24,"tag":267,"props":416,"children":417},{},[418],{"type":24,"tag":39,"props":419,"children":420},{},[421],{"type":30,"value":104},{"type":24,"tag":267,"props":423,"children":424},{},[425],{"type":30,"value":426},"TN / (TN + FP)",{"type":24,"tag":267,"props":428,"children":429},{},[430],{"type":30,"value":431},"Fraction of true negatives correctly identified — critical for confirmation",{"type":24,"tag":240,"props":433,"children":434},{},[435,443,448],{"type":24,"tag":267,"props":436,"children":437},{},[438],{"type":24,"tag":39,"props":439,"children":440},{},[441],{"type":30,"value":442},"PPV (Precision)",{"type":24,"tag":267,"props":444,"children":445},{},[446],{"type":30,"value":447},"TP / (TP + FP)",{"type":24,"tag":267,"props":449,"children":450},{},[451],{"type":30,"value":452},"Probability a positive result is truly positive — depends on prevalence",{"type":24,"tag":240,"props":454,"children":455},{},[456,464,469],{"type":24,"tag":267,"props":457,"children":458},{},[459],{"type":24,"tag":39,"props":460,"children":461},{},[462],{"type":30,"value":463},"NPV",{"type":24,"tag":267,"props":465,"children":466},{},[467],{"type":30,"value":468},"TN / (TN + FN)",{"type":24,"tag":267,"props":470,"children":471},{},[472],{"type":30,"value":473},"Probability a negative result is truly negative",{"type":24,"tag":240,"props":475,"children":476},{},[477,485,490],{"type":24,"tag":267,"props":478,"children":479},{},[480],{"type":24,"tag":39,"props":481,"children":482},{},[483],{"type":30,"value":484},"Accuracy",{"type":24,"tag":267,"props":486,"children":487},{},[488],{"type":30,"value":489},"(TP + TN) / n",{"type":24,"tag":267,"props":491,"children":492},{},[493],{"type":30,"value":494},"Overall fraction correct — misleading for imbalanced 
classes",{"type":24,"tag":240,"props":496,"children":497},{},[498,506,511],{"type":24,"tag":267,"props":499,"children":500},{},[501],{"type":24,"tag":39,"props":502,"children":503},{},[504],{"type":30,"value":505},"F1 Score",{"type":24,"tag":267,"props":507,"children":508},{},[509],{"type":30,"value":510},"2·PPV·Sensitivity / (PPV + Sensitivity)",{"type":24,"tag":267,"props":512,"children":513},{},[514],{"type":30,"value":515},"Harmonic mean of precision and recall — use for imbalanced data",{"type":24,"tag":240,"props":517,"children":518},{},[519,527,532],{"type":24,"tag":267,"props":520,"children":521},{},[522],{"type":24,"tag":39,"props":523,"children":524},{},[525],{"type":30,"value":526},"MCC",{"type":24,"tag":267,"props":528,"children":529},{},[530],{"type":30,"value":531},"(TP·TN − FP·FN) / √(...)",{"type":24,"tag":267,"props":533,"children":534},{},[535,537],{"type":30,"value":536},"Balanced single metric; best for imbalanced classes; range ",{"type":24,"tag":538,"props":539,"children":540},"span",{},[541],{"type":30,"value":542},"−1, +1",{"type":24,"tag":240,"props":544,"children":545},{},[546,553,558],{"type":24,"tag":267,"props":547,"children":548},{},[549],{"type":24,"tag":39,"props":550,"children":551},{},[552],{"type":30,"value":144},{"type":24,"tag":267,"props":554,"children":555},{},[556],{"type":30,"value":557},"(p_o − p_e) / (1 − p_e)",{"type":24,"tag":267,"props":559,"children":560},{},[561],{"type":30,"value":562},"Agreement corrected for chance; κ > 0.8 = strong agreement",{"type":24,"tag":240,"props":564,"children":565},{},[566,574,579],{"type":24,"tag":267,"props":567,"children":568},{},[569],{"type":24,"tag":39,"props":570,"children":571},{},[572],{"type":30,"value":573},"LR+",{"type":24,"tag":267,"props":575,"children":576},{},[577],{"type":30,"value":578},"Sensitivity / (1 − Specificity)",{"type":24,"tag":267,"props":580,"children":581},{},[582],{"type":30,"value":583},"Positive likelihood ratio — how much the test increases disease 
odds",{"type":24,"tag":240,"props":585,"children":586},{},[587,595,600],{"type":24,"tag":267,"props":588,"children":589},{},[590],{"type":24,"tag":39,"props":591,"children":592},{},[593],{"type":30,"value":594},"LR−",{"type":24,"tag":267,"props":596,"children":597},{},[598],{"type":30,"value":599},"(1 − Sensitivity) / Specificity",{"type":24,"tag":267,"props":601,"children":602},{},[603],{"type":30,"value":604},"Negative likelihood ratio — how much the test decreases disease odds",{"type":24,"tag":25,"props":606,"children":608},{"id":607},"example-prompts",[609],{"type":30,"value":610},"Example Prompts",{"type":24,"tag":232,"props":612,"children":613},{},[614,630],{"type":24,"tag":236,"props":615,"children":616},{},[617],{"type":24,"tag":240,"props":618,"children":619},{},[620,625],{"type":24,"tag":244,"props":621,"children":622},{},[623],{"type":30,"value":624},"Scenario",{"type":24,"tag":244,"props":626,"children":627},{},[628],{"type":30,"value":629},"What to type",{"type":24,"tag":260,"props":631,"children":632},{},[633,650,667,684,701,718,735],{"type":24,"tag":240,"props":634,"children":635},{},[636,641],{"type":24,"tag":267,"props":637,"children":638},{},[639],{"type":30,"value":640},"From data columns",{"type":24,"tag":267,"props":642,"children":643},{},[644],{"type":24,"tag":271,"props":645,"children":647},{"className":646},[],[648],{"type":30,"value":649},"confusion matrix from 'actual' vs 'predicted' columns; sensitivity, specificity, PPV, NPV, F1, MCC, and kappa",{"type":24,"tag":240,"props":651,"children":652},{},[653,658],{"type":24,"tag":267,"props":654,"children":655},{},[656],{"type":30,"value":657},"From cell counts",{"type":24,"tag":267,"props":659,"children":660},{},[661],{"type":24,"tag":271,"props":662,"children":664},{"className":663},[],[665],{"type":30,"value":666},"confusion matrix with TP=85, FN=15, FP=22, TN=178; compute all diagnostic metrics with 95% Wilson 
CI",{"type":24,"tag":240,"props":668,"children":669},{},[670,675],{"type":24,"tag":267,"props":671,"children":672},{},[673],{"type":30,"value":674},"Multi-class",{"type":24,"tag":267,"props":676,"children":677},{},[678],{"type":24,"tag":271,"props":679,"children":681},{"className":680},[],[682],{"type":30,"value":683},"multi-class confusion matrix for 4 classes (0,1,2,3); per-class precision, recall, F1; macro and weighted averages",{"type":24,"tag":240,"props":685,"children":686},{},[687,692],{"type":24,"tag":267,"props":688,"children":689},{},[690],{"type":30,"value":691},"Prevalence adjustment",{"type":24,"tag":267,"props":693,"children":694},{},[695],{"type":24,"tag":271,"props":696,"children":698},{"className":697},[],[699],{"type":30,"value":700},"test sensitivity=85%, specificity=90%; compute PPV and NPV at disease prevalence of 1%, 5%, 10%, and 20%",{"type":24,"tag":240,"props":702,"children":703},{},[704,709],{"type":24,"tag":267,"props":705,"children":706},{},[707],{"type":30,"value":708},"Threshold comparison",{"type":24,"tag":267,"props":710,"children":711},{},[712],{"type":24,"tag":271,"props":713,"children":715},{"className":714},[],[716],{"type":30,"value":717},"confusion matrices at thresholds 0.3, 0.5, 0.7; compare sensitivity/specificity tradeoff across thresholds",{"type":24,"tag":240,"props":719,"children":720},{},[721,726],{"type":24,"tag":267,"props":722,"children":723},{},[724],{"type":30,"value":725},"Confidence intervals",{"type":24,"tag":267,"props":727,"children":728},{},[729],{"type":24,"tag":271,"props":730,"children":732},{"className":731},[],[733],{"type":30,"value":734},"confusion matrix TP=85, FN=15, FP=22, TN=178; 95% Wilson confidence intervals for sensitivity and specificity",{"type":24,"tag":240,"props":736,"children":737},{},[738,743],{"type":24,"tag":267,"props":739,"children":740},{},[741],{"type":30,"value":742},"Normalized 
matrix",{"type":24,"tag":267,"props":744,"children":745},{},[746],{"type":24,"tag":271,"props":747,"children":749},{"className":748},[],[750],{"type":30,"value":751},"normalized confusion matrix (row-normalized) showing recall per class; annotate with counts and percentages",{"type":24,"tag":25,"props":753,"children":755},{"id":754},"assumptions-to-check",[756],{"type":30,"value":757},"Assumptions to Check",{"type":24,"tag":759,"props":760,"children":761},"ul",{},[762,772,782,792,802],{"type":24,"tag":158,"props":763,"children":764},{},[765,770],{"type":24,"tag":39,"props":766,"children":767},{},[768],{"type":30,"value":769},"Binary vs multi-class",{"type":30,"value":771}," — for multi-class problems, sensitivity and specificity are computed per class (one-vs-rest); report macro-averaged and weighted-averaged metrics alongside per-class values",{"type":24,"tag":158,"props":773,"children":774},{},[775,780],{"type":24,"tag":39,"props":776,"children":777},{},[778],{"type":30,"value":779},"Prevalence dependence of PPV/NPV",{"type":30,"value":781}," — PPV and NPV change with disease prevalence; a test with 90% sensitivity and 90% specificity has PPV ≈ 8% in a 1% prevalence population and PPV ≈ 90% in a 50% prevalence population; always specify the prevalence context when reporting PPV and NPV",{"type":24,"tag":158,"props":783,"children":784},{},[785,790],{"type":24,"tag":39,"props":786,"children":787},{},[788],{"type":30,"value":789},"Class imbalance",{"type":30,"value":791}," — accuracy is misleading when classes are imbalanced (a classifier that always predicts \"negative\" achieves 99% accuracy at 1% disease prevalence); use F1, MCC, or the ROC-AUC for imbalanced classification evaluation",{"type":24,"tag":158,"props":793,"children":794},{},[795,800],{"type":24,"tag":39,"props":796,"children":797},{},[798],{"type":30,"value":799},"Threshold sensitivity",{"type":30,"value":801}," — all confusion matrix metrics depend on the chosen classification threshold; report the 
threshold used and consider whether the ROC curve (threshold-free) better represents model performance",{"type":24,"tag":158,"props":803,"children":804},{},[805,810],{"type":24,"tag":39,"props":806,"children":807},{},[808],{"type":30,"value":809},"Representative test set",{"type":30,"value":811}," — the confusion matrix must be computed on a held-out test set (or cross-validated), never on training data; in-sample confusion matrices systematically overestimate performance",{"type":24,"tag":25,"props":813,"children":815},{"id":814},"related-tools",[816],{"type":30,"value":817},"Related Tools",{"type":24,"tag":33,"props":819,"children":820},{},[821,823,829,831,837,839,845,847,853],{"type":30,"value":822},"Use the ",{"type":24,"tag":209,"props":824,"children":826},{"href":825},"/tools/roc-curve",[827],{"type":30,"value":828},"ROC Curve and AUC Calculator",{"type":30,"value":830}," to evaluate the classifier across all possible thresholds — the confusion matrix is a single point on the ROC curve. Use the ",{"type":24,"tag":209,"props":832,"children":834},{"href":833},"/tools/fishers-exact-test",[835],{"type":30,"value":836},"Fisher's Exact Test Calculator",{"type":30,"value":838}," to test whether the association between predicted and actual labels is statistically significant (the confusion matrix is a 2×2 contingency table). Use the ",{"type":24,"tag":209,"props":840,"children":842},{"href":841},"/tools/chi-square-test",[843],{"type":30,"value":844},"Chi-Square Test Calculator",{"type":30,"value":846}," for larger contingency tables (multi-class confusion matrices with many classes). 
Use the ",{"type":24,"tag":209,"props":848,"children":850},{"href":849},"/tools/power-analysis",[851],{"type":30,"value":852},"Power Analysis Calculator",{"type":30,"value":854}," to determine sample size needed to achieve a target sensitivity and specificity with specified precision.",{"type":24,"tag":25,"props":856,"children":858},{"id":857},"frequently-asked-questions",[859],{"type":30,"value":860},"Frequently Asked Questions",{"type":24,"tag":33,"props":862,"children":863},{},[864,869,871,876,878,883,885,890,892,897,899,904],{"type":24,"tag":39,"props":865,"children":866},{},[867],{"type":30,"value":868},"Which metric should I use as the primary performance measure?",{"type":30,"value":870},"\nThe answer depends on what errors cost in your application. For ",{"type":24,"tag":39,"props":872,"children":873},{},[874],{"type":30,"value":875},"medical screening",{"type":30,"value":877}," (where missing a disease is dangerous), maximize sensitivity — you can tolerate false positives. For ",{"type":24,"tag":39,"props":879,"children":880},{},[881],{"type":30,"value":882},"confirmatory diagnosis",{"type":30,"value":884}," (where unnecessary treatment is harmful), maximize specificity or PPV. For ",{"type":24,"tag":39,"props":886,"children":887},{},[888],{"type":30,"value":889},"balanced binary classification",{"type":30,"value":891}," with equal class prevalence, F1 score is a good primary metric. For ",{"type":24,"tag":39,"props":893,"children":894},{},[895],{"type":30,"value":896},"imbalanced classification",{"type":30,"value":898}," (rare positives), MCC is the most informative single number — it accounts for all four cells of the confusion matrix and is not inflated by class imbalance. 
For ",{"type":24,"tag":39,"props":900,"children":901},{},[902],{"type":30,"value":903},"overall performance evaluation",{"type":30,"value":905},", report the full confusion matrix, sensitivity, specificity, F1, and MCC together — no single number tells the whole story.",{"type":24,"tag":33,"props":907,"children":908},{},[909,914,916,920],{"type":24,"tag":39,"props":910,"children":911},{},[912],{"type":30,"value":913},"What is the Matthews Correlation Coefficient and why is it better than F1 for imbalanced data?",{"type":30,"value":915},"\nThe ",{"type":24,"tag":39,"props":917,"children":918},{},[919],{"type":30,"value":526},{"type":30,"value":921}," is the correlation coefficient between the true labels and predicted labels, ranging from −1 (all predictions wrong) to +1 (perfect prediction) with 0 meaning performance no better than chance. Unlike F1 which uses only TP, FP, and FN (ignoring TN), MCC uses all four cells. For a highly imbalanced dataset where 99% of samples are negative, a classifier predicting \"always negative\" achieves F1 = 0 (correctly — it has zero sensitivity) but also a very high accuracy of 99%. The MCC of this trivial classifier is 0.0, correctly indicating chance-level discrimination. This makes MCC the recommended metric by several machine learning researchers (Chicco & Jurman, 2020) for imbalanced binary classification.",{"type":24,"tag":33,"props":923,"children":924},{},[925,930,931,936,938,943],{"type":24,"tag":39,"props":926,"children":927},{},[928],{"type":30,"value":929},"How do I compute confidence intervals for sensitivity and specificity?",{"type":30,"value":915},{"type":24,"tag":39,"props":932,"children":933},{},[934],{"type":30,"value":935},"Wilson score interval",{"type":30,"value":937}," is the recommended method for proportions with small-to-moderate sample sizes: CI = (p + z²/2n ± z√(p(1−p)/n + z²/4n²)) / (1 + z²/n), where p is the proportion (sensitivity or specificity) and z = 1.96 for 95% CI. 
For large samples (n > 100), the normal approximation CI = p ± 1.96 × √(p(1−p)/n) is adequate. The sensitivity CI uses n = TP + FN (total positives); the specificity CI uses n = TN + FP (total negatives). Wide CIs indicate insufficient sample size for precise performance estimates — a common problem in small validation studies. Ask the AI to ",{"type":24,"tag":193,"props":939,"children":940},{},[941],{"type":30,"value":942},"\"compute 95% Wilson CI for sensitivity, specificity, PPV, and NPV\"",{"type":30,"value":944},".",{"type":24,"tag":33,"props":946,"children":947},{},[948,953,955,960,962,967],{"type":24,"tag":39,"props":949,"children":950},{},[951],{"type":30,"value":952},"Why does PPV change with disease prevalence if sensitivity and specificity are fixed?",{"type":30,"value":954},"\nSensitivity and specificity are ",{"type":24,"tag":39,"props":956,"children":957},{},[958],{"type":30,"value":959},"intrinsic properties of the test",{"type":30,"value":961}," — they describe the test's ability to detect disease and rule out disease in diseased and healthy populations respectively. They do not change with prevalence. PPV and NPV are ",{"type":24,"tag":39,"props":963,"children":964},{},[965],{"type":30,"value":966},"extrinsic",{"type":30,"value":968}," — they describe what a test result means for a specific patient in a specific population with a specific prior probability (prevalence). Using Bayes' theorem: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1−specificity) × (1−prevalence)). In a population with 1% disease prevalence and a test with 90% sensitivity and 90% specificity: PPV = (0.9 × 0.01) / (0.9 × 0.01 + 0.1 × 0.99) = 0.009 / 0.108 ≈ 8.3%. 
Most positive tests are false positives in low-prevalence settings — an important consideration for population screening programs.",{"title":7,"searchDepth":970,"depth":970,"links":971},2,[972,973,974,975,976,977,978,979],{"id":27,"depth":970,"text":31},{"id":149,"depth":970,"text":152},{"id":227,"depth":970,"text":230},{"id":360,"depth":970,"text":363},{"id":607,"depth":970,"text":610},{"id":754,"depth":970,"text":757},{"id":814,"depth":970,"text":817},{"id":857,"depth":970,"text":860},"markdown","content:tools:070.confusion-matrix.md","content","tools/070.confusion-matrix.md","tools/070.confusion-matrix","md",{"loc":4},1775502471979]