[{"data":1,"prerenderedAt":840},["ShallowReactive",2],{"content-query-N7vYqLumfk":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"heading":10,"prompt":11,"tags":15,"files":17,"nav":6,"presets":18,"gallery":36,"body":38,"_type":833,"_id":834,"_source":835,"_file":836,"_stem":837,"_extension":838,"sitemap":839},"/tools/cohens-kappa","tools",false,"","Cohen's Kappa Calculator","Calculate Cohen's kappa online from Excel or CSV ratings. Measure inter-rater agreement for categorical labels with AI.","Cohen's Kappa",{"prefix":12,"label":13,"placeholder":14},"Calculate Cohen's kappa","Describe the raters and categories you want to assess","e.g. 2 raters classified 120 patients as low/medium/high severity; compute Cohen's kappa, 95% CI, agreement matrix heatmap, per-category kappa",[16],"statistics",true,[19,25,30],{"label":20,"prompt":21,"dataset_url":22,"dataset_title":23,"dataset_citation":24},"Clinical Diagnosis Agreement","Cohen's kappa for 2 clinicians classifying patients into 3 diagnostic categories; compute κ with 95% CI; agreement matrix heatmap; per-category kappa; classify agreement strength (Landis & Koch benchmarks)","https://data.cdc.gov/api/views/iuq5-y9ct/rows.csv?accessType=DOWNLOAD","NHANES Mental Health Assessment","CDC",{"label":26,"prompt":27,"dataset_url":28,"dataset_title":29,"dataset_citation":24},"Pathology Slide Classification","Weighted kappa (quadratic weights) for 2 pathologists grading tissue samples on a 4-point ordinal scale (0–3); compute weighted κ, 95% CI; agreement matrix; compare weighted vs unweighted kappa","https://data.cdc.gov/api/views/dppn-5tm3/rows.csv?accessType=DOWNLOAD","NCHS Health and Nutrition Examination Survey",{"label":31,"prompt":32,"dataset_url":33,"dataset_title":34,"dataset_citation":35},"Survey Response Agreement","Cohen's kappa for 2 coders categorizing open-ended survey responses into 5 themes; kappa with 95% CI; agreement matrix; identify categories with lowest pairwise agreement; Fleiss kappa if 3+ coders","https://ourworldindata.org/grapher/happiness-cantril-ladder.csv","Self-Reported Life Satisfaction Categories","Our World in Data",[37],"/img/tools/cohens-kappa.png",{"type":39,"children":40,"toc":822},"root",[41,50,76,88,93,99,157,163,317,322,328,477,483,640,646,697,703,740,746,777,800],{"type":42,"tag":43,"props":44,"children":46},"element","h2",{"id":45},"what-is-cohens-kappa",[47],{"type":48,"value":49},"text","What Is Cohen's Kappa?",{"type":42,"tag":51,"props":52,"children":53},"p",{},[54,60,62,67,69,74],{"type":42,"tag":55,"props":56,"children":57},"strong",{},[58],{"type":48,"value":59},"Cohen's kappa (κ)",{"type":48,"value":61}," is the standard measure of ",{"type":42,"tag":55,"props":63,"children":64},{},[65],{"type":48,"value":66},"inter-rater agreement for categorical data",{"type":48,"value":68}," — it quantifies how consistently two raters (judges, coders, classifiers) assign the same categories to the same subjects, corrected for the agreement expected by chance. Simple ",{"type":42,"tag":55,"props":70,"children":71},{},[72],{"type":48,"value":73},"percent agreement",{"type":48,"value":75}," is misleading because two raters randomly assigning categories will agree by chance at a rate proportional to category prevalences — if both raters assign \"positive\" 90% of the time, they agree 81% of the time by chance even with zero skill. Kappa corrects for this: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the expected chance agreement. κ = 0 means agreement no better than chance; κ = 1 means perfect agreement; negative κ means agreement worse than chance (systematic disagreement).",{"type":42,"tag":51,"props":77,"children":78},{},[79,81,86],{"type":48,"value":80},"The formula for chance-expected agreement is p_e = Σ (row_marginal_i × col_marginal_i), where the marginals represent each rater's category usage frequencies. For a binary classification (disease/no disease), if Rater 1 calls 70% positive and Rater 2 calls 65% positive, the expected chance agreement is 0.70×0.65 + 0.30×0.35 = 0.455 + 0.105 = 0.56 — so even perfect chance-level raters would agree 56% of the time. ",{"type":42,"tag":55,"props":82,"children":83},{},[84],{"type":48,"value":85},"Weighted kappa",{"type":48,"value":87}," extends the basic formula to ordinal categories, where partial credit is given for near-misses: a disagreement of one step (Low vs Medium) is penalized less than a large disagreement (Low vs High). Linear and quadratic weighting schemes are available; quadratic weighting (which penalizes large disagreements more heavily) is most common and equals the ICC(3,1) consistency for ordinal data.",{"type":42,"tag":51,"props":89,"children":90},{},[91],{"type":48,"value":92},"A practical example: two radiologists classify 200 chest X-rays as Normal, Suspicious, or Malignant. Observed agreement = 74%, but expected chance agreement = 46%, yielding κ = (0.74 − 0.46) / (1 − 0.46) = 0.52 (moderate agreement). The agreement matrix reveals that most disagreements are between Suspicious and Malignant — both raters agree well on Normal cases (per-category κ = 0.81) but struggle to distinguish the two abnormal categories (per-category κ = 0.45 and 0.41). This directs training efforts: the raters need calibration on the Suspicious-to-Malignant boundary, not on identifying normal scans.",{"type":42,"tag":43,"props":94,"children":96},{"id":95},"how-it-works",[97],{"type":48,"value":98},"How It Works",{"type":42,"tag":100,"props":101,"children":102},"ol",{},[103,114,130],{"type":42,"tag":104,"props":105,"children":106},"li",{},[107,112],{"type":42,"tag":55,"props":108,"children":109},{},[110],{"type":48,"value":111},"Upload your data",{"type":48,"value":113}," — provide a CSV or Excel file with one column per rater, one row per subject. Each cell should contain the category label (text or numeric code) assigned by that rater.",{"type":42,"tag":104,"props":115,"children":116},{},[117,122,124],{"type":42,"tag":55,"props":118,"children":119},{},[120],{"type":48,"value":121},"Describe the analysis",{"type":48,"value":123}," — e.g. ",{"type":42,"tag":125,"props":126,"children":127},"em",{},[128],{"type":48,"value":129},"\"2 raters, 3 categories (low/medium/high); compute Cohen's kappa with 95% CI; agreement matrix heatmap; per-category kappa; Landis & Koch interpretation\"",{"type":42,"tag":104,"props":131,"children":132},{},[133,138,140,147,149,155],{"type":42,"tag":55,"props":134,"children":135},{},[136],{"type":48,"value":137},"Get full results",{"type":48,"value":139}," — the AI writes Python code using ",{"type":42,"tag":141,"props":142,"children":144},"a",{"href":143},"https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html",[145],{"type":48,"value":146},"scikit-learn",{"type":48,"value":148}," and ",{"type":42,"tag":141,"props":150,"children":152},{"href":151},"https://plotly.com/python/",[153],{"type":48,"value":154},"Plotly",{"type":48,"value":156}," to compute kappa, weighted kappa, 95% CI, per-category kappa, the full agreement matrix, and produce the heatmap visualization",{"type":42,"tag":43,"props":158,"children":160},{"id":159},"required-data-format",[161],{"type":48,"value":162},"Required Data Format",{"type":42,"tag":164,"props":165,"children":166},"table",{},[167,191],{"type":42,"tag":168,"props":169,"children":170},"thead",{},[171],{"type":42,"tag":172,"props":173,"children":174},"tr",{},[175,181,186],{"type":42,"tag":176,"props":177,"children":178},"th",{},[179],{"type":48,"value":180},"Column",{"type":42,"tag":176,"props":182,"children":183},{},[184],{"type":48,"value":185},"Description",{"type":42,"tag":176,"props":187,"children":188},{},[189],{"type":48,"value":190},"Example",{"type":42,"tag":192,"props":193,"children":194},"tbody",{},[195,262,284],{"type":42,"tag":172,"props":196,"children":197},{},[198,209,214],{"type":42,"tag":199,"props":200,"children":201},"td",{},[202],{"type":42,"tag":203,"props":204,"children":206},"code",{"className":205},[],[207],{"type":48,"value":208},"rater1",{"type":42,"tag":199,"props":210,"children":211},{},[212],{"type":48,"value":213},"Categories assigned by rater 1",{"type":42,"tag":199,"props":215,"children":216},{},[217,223,225,231,232,238,240,246,247,253,254,260],{"type":42,"tag":203,"props":218,"children":220},{"className":219},[],[221],{"type":48,"value":222},"low",{"type":48,"value":224},", ",{"type":42,"tag":203,"props":226,"children":228},{"className":227},[],[229],{"type":48,"value":230},"medium",{"type":48,"value":224},{"type":42,"tag":203,"props":233,"children":235},{"className":234},[],[236],{"type":48,"value":237},"high",{"type":48,"value":239}," (or ",{"type":42,"tag":203,"props":241,"children":243},{"className":242},[],[244],{"type":48,"value":245},"1",{"type":48,"value":224},{"type":42,"tag":203,"props":248,"children":250},{"className":249},[],[251],{"type":48,"value":252},"2",{"type":48,"value":224},{"type":42,"tag":203,"props":255,"children":257},{"className":256},[],[258],{"type":48,"value":259},"3",{"type":48,"value":261},")",{"type":42,"tag":172,"props":263,"children":264},{},[265,274,279],{"type":42,"tag":199,"props":266,"children":267},{},[268],{"type":42,"tag":203,"props":269,"children":271},{"className":270},[],[272],{"type":48,"value":273},"rater2",{"type":42,"tag":199,"props":275,"children":276},{},[277],{"type":48,"value":278},"Categories assigned by rater 2",{"type":42,"tag":199,"props":280,"children":281},{},[282],{"type":48,"value":283},"Same category labels as rater 1",{"type":42,"tag":172,"props":285,"children":286},{},[287,296,301],{"type":42,"tag":199,"props":288,"children":289},{},[290],{"type":42,"tag":203,"props":291,"children":293},{"className":292},[],[294],{"type":48,"value":295},"subject",{"type":42,"tag":199,"props":297,"children":298},{},[299],{"type":48,"value":300},"Optional: subject identifier",{"type":42,"tag":199,"props":302,"children":303},{},[304,310,311],{"type":42,"tag":203,"props":305,"children":307},{"className":306},[],[308],{"type":48,"value":309},"P001",{"type":48,"value":224},{"type":42,"tag":203,"props":312,"children":314},{"className":313},[],[315],{"type":48,"value":316},"case_12",{"type":42,"tag":51,"props":318,"children":319},{},[320],{"type":48,"value":321},"Any column names work — describe them in your prompt. Both rater columns must use the same category labels. For Fleiss' kappa (3+ raters), provide one column per rater. Missing values (subject not rated by one rater) should be handled by excluding that subject or using pairwise deletion.",{"type":42,"tag":43,"props":323,"children":325},{"id":324},"interpreting-the-results",[326],{"type":48,"value":327},"Interpreting the Results",{"type":42,"tag":164,"props":329,"children":330},{},[331,347],{"type":42,"tag":168,"props":332,"children":333},{},[334],{"type":42,"tag":172,"props":335,"children":336},{},[337,342],{"type":42,"tag":176,"props":338,"children":339},{},[340],{"type":48,"value":341},"Output",{"type":42,"tag":176,"props":343,"children":344},{},[345],{"type":48,"value":346},"What it means",{"type":42,"tag":192,"props":348,"children":349},{},[350,366,382,398,414,430,445,461],{"type":42,"tag":172,"props":351,"children":352},{},[353,361],{"type":42,"tag":199,"props":354,"children":355},{},[356],{"type":42,"tag":55,"props":357,"children":358},{},[359],{"type":48,"value":360},"κ (kappa)",{"type":42,"tag":199,"props":362,"children":363},{},[364],{"type":48,"value":365},"Agreement corrected for chance — ranges from \u003C 0 (worse than chance) to 1 (perfect)",{"type":42,"tag":172,"props":367,"children":368},{},[369,377],{"type":42,"tag":199,"props":370,"children":371},{},[372],{"type":42,"tag":55,"props":373,"children":374},{},[375],{"type":48,"value":376},"95% CI",{"type":42,"tag":199,"props":378,"children":379},{},[380],{"type":48,"value":381},"Uncertainty in kappa — always report; wide CI with small n",{"type":42,"tag":172,"props":383,"children":384},{},[385,393],{"type":42,"tag":199,"props":386,"children":387},{},[388],{"type":42,"tag":55,"props":389,"children":390},{},[391],{"type":48,"value":392},"Strength classification",{"type":42,"tag":199,"props":394,"children":395},{},[396],{"type":48,"value":397},"Landis & Koch (1977): \u003C 0.20 slight; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 substantial; > 0.80 almost perfect",{"type":42,"tag":172,"props":399,"children":400},{},[401,409],{"type":42,"tag":199,"props":402,"children":403},{},[404],{"type":42,"tag":55,"props":405,"children":406},{},[407],{"type":48,"value":408},"Observed agreement (p_o)",{"type":42,"tag":199,"props":410,"children":411},{},[412],{"type":48,"value":413},"Raw % agreement — misleading without chance correction",{"type":42,"tag":172,"props":415,"children":416},{},[417,425],{"type":42,"tag":199,"props":418,"children":419},{},[420],{"type":42,"tag":55,"props":421,"children":422},{},[423],{"type":48,"value":424},"Expected agreement (p_e)",{"type":42,"tag":199,"props":426,"children":427},{},[428],{"type":48,"value":429},"Chance agreement based on marginal category frequencies",{"type":42,"tag":172,"props":431,"children":432},{},[433,440],{"type":42,"tag":199,"props":434,"children":435},{},[436],{"type":42,"tag":55,"props":437,"children":438},{},[439],{"type":48,"value":85},{"type":42,"tag":199,"props":441,"children":442},{},[443],{"type":48,"value":444},"Kappa giving partial credit for near-miss disagreements — appropriate for ordinal categories",{"type":42,"tag":172,"props":446,"children":447},{},[448,456],{"type":42,"tag":199,"props":449,"children":450},{},[451],{"type":42,"tag":55,"props":452,"children":453},{},[454],{"type":48,"value":455},"Per-category kappa",{"type":42,"tag":199,"props":457,"children":458},{},[459],{"type":48,"value":460},"Kappa treating each category as a binary (this category vs all others) — identifies which categories are hardest to agree on",{"type":42,"tag":172,"props":462,"children":463},{},[464,472],{"type":42,"tag":199,"props":465,"children":466},{},[467],{"type":42,"tag":55,"props":468,"children":469},{},[470],{"type":48,"value":471},"Agreement matrix",{"type":42,"tag":199,"props":473,"children":474},{},[475],{"type":48,"value":476},"Cross-tabulation of rater 1 vs rater 2 ratings — diagonal cells are agreements; off-diagonal are disagreements",{"type":42,"tag":43,"props":478,"children":480},{"id":479},"example-prompts",[481],{"type":48,"value":482},"Example Prompts",{"type":42,"tag":164,"props":484,"children":485},{},[486,502],{"type":42,"tag":168,"props":487,"children":488},{},[489],{"type":42,"tag":172,"props":490,"children":491},{},[492,497],{"type":42,"tag":176,"props":493,"children":494},{},[495],{"type":48,"value":496},"Scenario",{"type":42,"tag":176,"props":498,"children":499},{},[500],{"type":48,"value":501},"What to type",{"type":42,"tag":192,"props":503,"children":504},{},[505,522,538,555,572,589,606,623],{"type":42,"tag":172,"props":506,"children":507},{},[508,513],{"type":42,"tag":199,"props":509,"children":510},{},[511],{"type":48,"value":512},"Basic 2-rater kappa",{"type":42,"tag":199,"props":514,"children":515},{},[516],{"type":42,"tag":203,"props":517,"children":519},{"className":518},[],[520],{"type":48,"value":521},"2 raters, columns rater1 and rater2; compute Cohen's kappa, 95% CI; agreement matrix heatmap; classify agreement strength",{"type":42,"tag":172,"props":523,"children":524},{},[525,529],{"type":42,"tag":199,"props":526,"children":527},{},[528],{"type":48,"value":85},{"type":42,"tag":199,"props":530,"children":531},{},[532],{"type":42,"tag":203,"props":533,"children":535},{"className":534},[],[536],{"type":48,"value":537},"ordinal categories 1–5; weighted kappa with quadratic weights; compare to unweighted kappa; agreement matrix",{"type":42,"tag":172,"props":539,"children":540},{},[541,546],{"type":42,"tag":199,"props":542,"children":543},{},[544],{"type":48,"value":545},"Per-category analysis",{"type":42,"tag":199,"props":547,"children":548},{},[549],{"type":42,"tag":203,"props":550,"children":552},{"className":551},[],[553],{"type":48,"value":554},"compute per-category kappa for each of the 4 categories; identify which category has worst agreement",{"type":42,"tag":172,"props":556,"children":557},{},[558,563],{"type":42,"tag":199,"props":559,"children":560},{},[561],{"type":48,"value":562},"3+ raters",{"type":42,"tag":199,"props":564,"children":565},{},[566],{"type":42,"tag":203,"props":567,"children":569},{"className":568},[],[570],{"type":48,"value":571},"3 raters in columns r1, r2, r3; Fleiss' kappa for multiple raters; overall and per-category kappa",{"type":42,"tag":172,"props":573,"children":574},{},[575,580],{"type":42,"tag":199,"props":576,"children":577},{},[578],{"type":48,"value":579},"Binary classification",{"type":42,"tag":199,"props":581,"children":582},{},[583],{"type":42,"tag":203,"props":584,"children":586},{"className":585},[],[587],{"type":48,"value":588},"binary outcome (yes/no); Cohen's kappa, sensitivity, specificity, and percent agreement; 2×2 agreement table",{"type":42,"tag":172,"props":590,"children":591},{},[592,597],{"type":42,"tag":199,"props":593,"children":594},{},[595],{"type":48,"value":596},"Confidence intervals",{"type":42,"tag":199,"props":598,"children":599},{},[600],{"type":42,"tag":203,"props":601,"children":603},{"className":602},[],[604],{"type":48,"value":605},"compute kappa with 95% CI using both asymptotic formula and bootstrap (1000 samples); compare CI methods",{"type":42,"tag":172,"props":607,"children":608},{},[609,614],{"type":42,"tag":199,"props":610,"children":611},{},[612],{"type":48,"value":613},"Prevalence-adjusted",{"type":42,"tag":199,"props":615,"children":616},{},[617],{"type":42,"tag":203,"props":618,"children":620},{"className":619},[],[621],{"type":48,"value":622},"compute PABAK (prevalence-adjusted bias-adjusted kappa) alongside standard kappa to account for high prevalence imbalance",{"type":42,"tag":172,"props":624,"children":625},{},[626,631],{"type":42,"tag":199,"props":627,"children":628},{},[629],{"type":48,"value":630},"Minimum sample size",{"type":42,"tag":199,"props":632,"children":633},{},[634],{"type":42,"tag":203,"props":635,"children":637},{"className":636},[],[638],{"type":48,"value":639},"how many subjects needed to estimate kappa ≥ 0.70 with 95% CI width ≤ 0.15? compute for κ₀ = 0.70, 2 raters",{"type":42,"tag":43,"props":641,"children":643},{"id":642},"assumptions-to-check",[644],{"type":48,"value":645},"Assumptions to Check",{"type":42,"tag":647,"props":648,"children":649},"ul",{},[650,660,677,687],{"type":42,"tag":104,"props":651,"children":652},{},[653,658],{"type":42,"tag":55,"props":654,"children":655},{},[656],{"type":48,"value":657},"Independence of ratings",{"type":48,"value":659}," — each subject should be rated independently by each rater without knowledge of the other rater's assessment; if raters discuss cases before rating, agreement will be artificially inflated and kappa will overestimate reliability",{"type":42,"tag":104,"props":661,"children":662},{},[663,668,670,675],{"type":42,"tag":55,"props":664,"children":665},{},[666],{"type":48,"value":667},"Marginal homogeneity",{"type":48,"value":669}," — standard kappa assumes both raters use the categories at similar frequencies (similar marginal distributions); when one rater systematically uses a category more than the other (marginal heterogeneity), kappa can be paradoxically low even with high raw agreement; ",{"type":42,"tag":55,"props":671,"children":672},{},[673],{"type":48,"value":674},"PABAK",{"type":48,"value":676}," (prevalence-adjusted, bias-adjusted kappa) corrects for this",{"type":42,"tag":104,"props":678,"children":679},{},[680,685],{"type":42,"tag":55,"props":681,"children":682},{},[683],{"type":48,"value":684},"Landis & Koch benchmarks are arbitrary",{"type":48,"value":686}," — the commonly cited thresholds (0.41–0.60 = moderate, etc.) were proposed without empirical justification; acceptable kappa depends on the application context; a kappa of 0.60 may be excellent for a complex clinical judgment and inadequate for a simple binary classification; always discuss agreement in light of the specific decision stakes",{"type":42,"tag":104,"props":688,"children":689},{},[690,695],{"type":42,"tag":55,"props":691,"children":692},{},[693],{"type":48,"value":694},"Weighted kappa choice",{"type":48,"value":696}," — for ordinal data, the choice of weighting scheme (linear vs quadratic) changes the kappa value; quadratic weights penalize large disagreements more and equal ICC(3,1); linear weights give equal penalty per ordinal step; choose based on clinical relevance of the error magnitude, not to maximize kappa",{"type":42,"tag":43,"props":698,"children":700},{"id":699},"related-tools",[701],{"type":48,"value":702},"Related Tools",{"type":42,"tag":51,"props":704,"children":705},{},[706,708,714,716,722,724,730,732,738],{"type":48,"value":707},"Use the ",{"type":42,"tag":141,"props":709,"children":711},{"href":710},"/tools/icc-calculator",[712],{"type":48,"value":713},"Intraclass Correlation Coefficient (ICC) Calculator",{"type":48,"value":715}," for continuous measurements where reliability is assessed by agreement in numeric values — Cohen's kappa is for categorical ratings while ICC handles continuous and ordinal scores (with quadratic weighted kappa ≈ ICC(3,1) consistency). Use the ",{"type":42,"tag":141,"props":717,"children":719},{"href":718},"/tools/confusion-matrix",[720],{"type":48,"value":721},"Confusion Matrix & Sensitivity Specificity Calculator",{"type":48,"value":723}," when one rater is the gold standard (ground truth) and the other is a test — kappa treats both raters symmetrically while the confusion matrix treats one as the reference. Use the ",{"type":42,"tag":141,"props":725,"children":727},{"href":726},"/tools/fishers-exact-test",[728],{"type":48,"value":729},"Fisher's Exact Test Calculator",{"type":48,"value":731}," to test whether the association between two binary raters' classifications is statistically significant — the 2×2 agreement table is a contingency table and Fisher's test provides a p-value for association. Use the ",{"type":42,"tag":141,"props":733,"children":735},{"href":734},"/tools/cronbachs-alpha",[736],{"type":48,"value":737},"Cronbach's Alpha Calculator",{"type":48,"value":739}," for scale reliability — when the same construct is measured with multiple parallel items by a single rater, Cronbach's alpha is the appropriate reliability measure rather than kappa.",{"type":42,"tag":43,"props":741,"children":743},{"id":742},"frequently-asked-questions",[744],{"type":48,"value":745},"Frequently Asked Questions",{"type":42,"tag":51,"props":747,"children":748},{},[749,754,756,761,763,768,770,775],{"type":42,"tag":55,"props":750,"children":751},{},[752],{"type":48,"value":753},"When should I use weighted kappa instead of Cohen's kappa?",{"type":48,"value":755},"\nUse ",{"type":42,"tag":55,"props":757,"children":758},{},[759],{"type":48,"value":760},"weighted kappa",{"type":48,"value":762}," whenever the rating categories are ",{"type":42,"tag":55,"props":764,"children":765},{},[766],{"type":48,"value":767},"ordinal",{"type":48,"value":769}," — that is, they have a natural order where being \"one step off\" is less severe than being \"many steps off\". Examples: pain scale (none/mild/moderate/severe), tumor staging (Stage I–IV), Likert agreement (strongly disagree to strongly agree). Unweighted kappa treats all disagreements equally — calling Stage I when the answer is Stage IV is penalized the same as calling Stage I vs Stage II. Weighted kappa with quadratic weights gives partial credit for near-misses and is mathematically equivalent to ICC(3,1) consistency. Use ",{"type":42,"tag":55,"props":771,"children":772},{},[773],{"type":48,"value":774},"unweighted kappa",{"type":48,"value":776}," only when categories are truly nominal with no ordering (e.g., disease type: cardiac/pulmonary/neurological/other — there is no natural ordering between disease types).",{"type":42,"tag":51,"props":778,"children":779},{},[780,785,787,792,794,798],{"type":42,"tag":55,"props":781,"children":782},{},[783],{"type":48,"value":784},"Why can kappa be low even when percent agreement is high?",{"type":48,"value":786},"\nThis is the ",{"type":42,"tag":55,"props":788,"children":789},{},[790],{"type":48,"value":791},"kappa paradox",{"type":48,"value":793}," (Cicchetti & Feinstein, 1990): when one category has very high prevalence, both raters default to that category most of the time, inflating percent agreement while kappa remains low. Example: if 95% of patients are disease-free, two raters who always say \"no disease\" achieve 95% raw agreement but κ = 0 (no better than chance). Conversely, when a rare category is almost never used, the off-diagonal cells for that category are near-empty, deflating kappa. The ",{"type":42,"tag":55,"props":795,"children":796},{},[797],{"type":48,"value":674},{"type":48,"value":799}," (prevalence-adjusted, bias-adjusted kappa) accounts for both prevalence imbalance and systematic rater bias; it is defined as 2p_o − 1 and gives 1.0 only when p_o = 1. Report both kappa and percent agreement with category marginals so readers can assess whether the paradox may apply.",{"type":42,"tag":51,"props":801,"children":802},{},[803,808,813,815,820],{"type":42,"tag":55,"props":804,"children":805},{},[806],{"type":48,"value":807},"What is Fleiss' kappa and when should I use it?",{"type":42,"tag":55,"props":809,"children":810},{},[811],{"type":48,"value":812},"Cohen's kappa",{"type":48,"value":814}," is designed for exactly 2 raters. ",{"type":42,"tag":55,"props":816,"children":817},{},[818],{"type":48,"value":819},"Fleiss' kappa",{"type":48,"value":821}," generalizes the agreement correction to 3 or more raters, handling the case where each subject is rated by the same k raters. Fleiss' kappa is computed from the proportion of rater pairs who agree for each subject, averaged across subjects and corrected for chance. Note that Fleiss' kappa measures agreement among a fixed set of raters — it is not equivalent to computing all pairwise Cohen's kappas and averaging them (which is a valid but different approach that gives per-pair estimates). If raters are different for different subjects (e.g., each subject is rated by 2 raters chosen from a pool), use a generalized agreement coefficient (Gwet's AC1 or Krippendorff's alpha) instead.",{"title":7,"searchDepth":823,"depth":823,"links":824},2,[825,826,827,828,829,830,831,832],{"id":45,"depth":823,"text":49},{"id":95,"depth":823,"text":98},{"id":159,"depth":823,"text":162},{"id":324,"depth":823,"text":327},{"id":479,"depth":823,"text":482},{"id":642,"depth":823,"text":645},{"id":699,"depth":823,"text":702},{"id":742,"depth":823,"text":745},"markdown","content:tools:079.cohens-kappa.md","content","tools/079.cohens-kappa.md","tools/079.cohens-kappa","md",{"loc":4},1775502472625]