[{"data":1,"prerenderedAt":967},["ShallowReactive",2],{"content-query-n2c3F95Sln":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"heading":10,"prompt":11,"tags":15,"files":17,"nav":6,"presets":18,"gallery":36,"body":38,"_type":960,"_id":961,"_source":962,"_file":963,"_stem":964,"_extension":965,"sitemap":966},"/tools/icc-calculator","tools",false,"","Intraclass Correlation Coefficient Calculator","Calculate ICC online from Excel or CSV data. Assess inter-rater, intra-rater, and test-retest reliability with AI.","ICC Calculator",{"prefix":12,"label":13,"placeholder":14},"Calculate intraclass correlation coefficient","Describe the raters, subjects, and measurement design","e.g. 3 raters scored 30 patients on a pain scale; compute ICC(2,1) absolute agreement; 95% CI; profile plot of rater scores; interpret reliability level",[16],"statistics",true,[19,25,30],{"label":20,"prompt":21,"dataset_url":22,"dataset_title":23,"dataset_citation":24},"Clinical Rater Reliability Study","ICC analysis: 3 clinicians each rated 30 patients on a pain scale (0–10); compute ICC(2,1) absolute agreement and consistency; 95% CI; profile plot showing each rater's scores; classify reliability level (poor/moderate/good/excellent)","https://data.cdc.gov/api/views/dppn-5tm3/rows.csv?accessType=DOWNLOAD","NCHS Health and Nutrition Examination Survey","CDC",{"label":26,"prompt":27,"dataset_url":28,"dataset_title":29,"dataset_citation":24},"Test-Retest Reliability Study","ICC(3,1) consistency for test-retest reliability: same instrument administered twice to 40 subjects; compute ICC, 95% CI, standard error of measurement (SEM), and minimal detectable change (MDC); Bland-Altman plot of test-retest difference","https://data.cdc.gov/api/views/iuq5-y9ct/rows.csv?accessType=DOWNLOAD","NHANES Mental Health Assessment",{"label":31,"prompt":32,"dataset_url":33,"dataset_title":34,"dataset_citation":35},"Multi-Site Measurement Consistency","ICC(2,k) average measures for 
4 measurement sites evaluating the same specimens; absolute agreement; compute ICC, 95% CI; identify sites with systematic bias; variance components breakdown","https://ourworldindata.org/grapher/happiness-cantril-ladder.csv","Self-Reported Life Satisfaction Scores","Our World in Data",[37],"/img/tools/icc-calculator.png",{"type":39,"children":40,"toc":949},"root",[41,50,78,132,151,157,215,221,409,414,420,570,576,733,739,811,817,854,860,877,908,939],{"type":42,"tag":43,"props":44,"children":46},"element","h2",{"id":45},"what-is-the-intraclass-correlation-coefficient",[47],{"type":48,"value":49},"text","What Is the Intraclass Correlation Coefficient?",{"type":42,"tag":51,"props":52,"children":53},"p",{},[54,56,62,64,69,71,76],{"type":48,"value":55},"The ",{"type":42,"tag":57,"props":58,"children":59},"strong",{},[60],{"type":48,"value":61},"intraclass correlation coefficient (ICC)",{"type":48,"value":63}," is the standard measure of reliability for continuous measurements — it quantifies how consistent or interchangeable measurements are when made by different raters, instruments, or test occasions. Unlike Pearson's r (which requires exactly two variables and ignores systematic bias), ICC can handle any number of raters, accounts for both systematic and random sources of disagreement, and is defined as the ratio of ",{"type":42,"tag":57,"props":65,"children":66},{},[67],{"type":48,"value":68},"between-subject variance",{"type":48,"value":70}," to ",{"type":42,"tag":57,"props":72,"children":73},{},[74],{"type":48,"value":75},"total variance",{"type":48,"value":77}," (between-subject + within-subject + error). 
ICC ranges from 0 (no reliability — all variance is random error) to 1 (perfect reliability — all variance reflects true between-subject differences).",{"type":42,"tag":51,"props":79,"children":80},{},[81,83,88,90,95,97,102,104,109,111,116,118,123,125,130],{"type":48,"value":82},"The key conceptual distinction in ICC is between ",{"type":42,"tag":57,"props":84,"children":85},{},[86],{"type":48,"value":87},"consistency",{"type":48,"value":89}," and ",{"type":42,"tag":57,"props":91,"children":92},{},[93],{"type":48,"value":94},"absolute agreement",{"type":48,"value":96},". ",{"type":42,"tag":57,"props":98,"children":99},{},[100],{"type":48,"value":101},"Consistency",{"type":48,"value":103}," asks whether raters rank subjects in the same order — it ignores systematic bias (one rater always scoring 3 points higher than another). ",{"type":42,"tag":57,"props":105,"children":106},{},[107],{"type":48,"value":108},"Absolute agreement",{"type":48,"value":110}," additionally requires that raters give the same numerical values — it penalizes systematic differences between raters. For instrument interchangeability (e.g., can device A replace device B?), absolute agreement is appropriate. For assessing whether raters can discriminate between subjects (e.g., ranking pain severity), consistency is sufficient. 
Shrout and Fleiss (1979) and McGraw and Wong (1996) provide the definitive taxonomy: ",{"type":42,"tag":57,"props":112,"children":113},{},[114],{"type":48,"value":115},"ICC(1,1)",{"type":48,"value":117}," — one-way random, single measures; ",{"type":42,"tag":57,"props":119,"children":120},{},[121],{"type":48,"value":122},"ICC(2,1)",{"type":48,"value":124}," — two-way random, absolute agreement, single measures; ",{"type":42,"tag":57,"props":126,"children":127},{},[128],{"type":48,"value":129},"ICC(3,1)",{"type":48,"value":131}," — two-way mixed, consistency, single measures; and their average-measures counterparts ICC(1,k), ICC(2,k), ICC(3,k).",{"type":42,"tag":51,"props":133,"children":134},{},[135,137,142,144,149],{"type":48,"value":136},"A practical example: three physical therapists rate shoulder abduction range-of-motion in 30 patients. ICC(2,1) absolute agreement = 0.89 (95% CI: 0.82–0.94), classified as \"good\" reliability. The profile plot reveals that Rater 2 consistently scores 3° higher than the other two raters — a systematic bias that makes absolute agreement lower than consistency (ICC(2,1) consistency = 0.92). The ",{"type":42,"tag":57,"props":138,"children":139},{},[140],{"type":48,"value":141},"standard error of measurement (SEM = SD × √(1−ICC))",{"type":48,"value":143}," = 4.2° indicates the typical measurement error for a single rater. 
The ",{"type":42,"tag":57,"props":145,"children":146},{},[147],{"type":48,"value":148},"minimal detectable change (MDC₉₅ = 1.96 × √2 × SEM)",{"type":48,"value":150}," = 11.7° gives the threshold above which a change in a patient's score can be attributed to a real change rather than measurement error.",{"type":42,"tag":43,"props":152,"children":154},{"id":153},"how-it-works",[155],{"type":48,"value":156},"How It Works",{"type":42,"tag":158,"props":159,"children":160},"ol",{},[161,172,188],{"type":42,"tag":162,"props":163,"children":164},"li",{},[165,170],{"type":42,"tag":57,"props":166,"children":167},{},[168],{"type":48,"value":169},"Upload your data",{"type":48,"value":171}," — provide a CSV or Excel file in wide format: one row per subject, one column per rater/measurement occasion. Include a subject ID column if available.",{"type":42,"tag":162,"props":173,"children":174},{},[175,180,182],{"type":42,"tag":57,"props":176,"children":177},{},[178],{"type":48,"value":179},"Describe the design",{"type":48,"value":181}," — e.g. 
",{"type":42,"tag":183,"props":184,"children":185},"em",{},[186],{"type":48,"value":187},"\"3 raters, 30 subjects, pain scale 0–10; compute ICC(2,1) absolute agreement; 95% CI; profile plot; SEM and MDC; classify reliability\"",{"type":42,"tag":162,"props":189,"children":190},{},[191,196,198,205,207,213],{"type":42,"tag":57,"props":192,"children":193},{},[194],{"type":48,"value":195},"Get full results",{"type":48,"value":197}," — the AI writes Python code using ",{"type":42,"tag":199,"props":200,"children":202},"a",{"href":201},"https://pingouin-stats.org/",[203],{"type":48,"value":204},"pingouin",{"type":48,"value":206}," or ",{"type":42,"tag":199,"props":208,"children":210},{"href":209},"https://docs.scipy.org/doc/scipy/",[211],{"type":48,"value":212},"scipy",{"type":48,"value":214}," to compute the appropriate ICC, 95% CI via F-distribution, variance components, SEM, MDC, and produce the profile plot and ICC summary table",{"type":42,"tag":43,"props":216,"children":218},{"id":217},"required-data-format",[219],{"type":48,"value":220},"Required Data Format",{"type":42,"tag":222,"props":223,"children":224},"table",{},[225,249],{"type":42,"tag":226,"props":227,"children":228},"thead",{},[229],{"type":42,"tag":230,"props":231,"children":232},"tr",{},[233,239,244],{"type":42,"tag":234,"props":235,"children":236},"th",{},[237],{"type":48,"value":238},"Column",{"type":42,"tag":234,"props":240,"children":241},{},[242],{"type":48,"value":243},"Description",{"type":42,"tag":234,"props":245,"children":246},{},[247],{"type":48,"value":248},"Example",{"type":42,"tag":250,"props":251,"children":252},"tbody",{},[253,289,329,369],{"type":42,"tag":230,"props":254,"children":255},{},[256,267,272],{"type":42,"tag":257,"props":258,"children":259},"td",{},[260],{"type":42,"tag":261,"props":262,"children":264},"code",{"className":263},[],[265],{"type":48,"value":266},"subject",{"type":42,"tag":257,"props":268,"children":269},{},[270],{"type":48,"value":271},"Subject/item 
identifier",{"type":42,"tag":257,"props":273,"children":274},{},[275,281,283],{"type":42,"tag":261,"props":276,"children":278},{"className":277},[],[279],{"type":48,"value":280},"P001",{"type":48,"value":282},", ",{"type":42,"tag":261,"props":284,"children":286},{"className":285},[],[287],{"type":48,"value":288},"P002",{"type":42,"tag":230,"props":290,"children":291},{},[292,301,306],{"type":42,"tag":257,"props":293,"children":294},{},[295],{"type":42,"tag":261,"props":296,"children":298},{"className":297},[],[299],{"type":48,"value":300},"rater1",{"type":42,"tag":257,"props":302,"children":303},{},[304],{"type":48,"value":305},"Scores from rater 1",{"type":42,"tag":257,"props":307,"children":308},{},[309,315,316,322,323],{"type":42,"tag":261,"props":310,"children":312},{"className":311},[],[313],{"type":48,"value":314},"6.2",{"type":48,"value":282},{"type":42,"tag":261,"props":317,"children":319},{"className":318},[],[320],{"type":48,"value":321},"8.5",{"type":48,"value":282},{"type":42,"tag":261,"props":324,"children":326},{"className":325},[],[327],{"type":48,"value":328},"4.1",{"type":42,"tag":230,"props":330,"children":331},{},[332,341,346],{"type":42,"tag":257,"props":333,"children":334},{},[335],{"type":42,"tag":261,"props":336,"children":338},{"className":337},[],[339],{"type":48,"value":340},"rater2",{"type":42,"tag":257,"props":342,"children":343},{},[344],{"type":48,"value":345},"Scores from rater 
2",{"type":42,"tag":257,"props":347,"children":348},{},[349,355,356,362,363],{"type":42,"tag":261,"props":350,"children":352},{"className":351},[],[353],{"type":48,"value":354},"5.8",{"type":48,"value":282},{"type":42,"tag":261,"props":357,"children":359},{"className":358},[],[360],{"type":48,"value":361},"8.9",{"type":48,"value":282},{"type":42,"tag":261,"props":364,"children":366},{"className":365},[],[367],{"type":48,"value":368},"4.4",{"type":42,"tag":230,"props":370,"children":371},{},[372,381,386],{"type":42,"tag":257,"props":373,"children":374},{},[375],{"type":42,"tag":261,"props":376,"children":378},{"className":377},[],[379],{"type":48,"value":380},"rater3",{"type":42,"tag":257,"props":382,"children":383},{},[384],{"type":48,"value":385},"Optional: additional rater",{"type":42,"tag":257,"props":387,"children":388},{},[389,395,396,402,403],{"type":42,"tag":261,"props":390,"children":392},{"className":391},[],[393],{"type":48,"value":394},"6.0",{"type":48,"value":282},{"type":42,"tag":261,"props":397,"children":399},{"className":398},[],[400],{"type":48,"value":401},"8.3",{"type":48,"value":282},{"type":42,"tag":261,"props":404,"children":406},{"className":405},[],[407],{"type":48,"value":408},"4.6",{"type":42,"tag":51,"props":410,"children":411},{},[412],{"type":48,"value":413},"Wide format only (one row per subject, one column per rater). If data are in long format (one row per rating), ask the AI to pivot to wide format first. All measurements must be on the same numeric scale. 
Missing values exclude that subject from the analysis.",{"type":42,"tag":43,"props":415,"children":417},{"id":416},"interpreting-the-results",[418],{"type":48,"value":419},"Interpreting the Results",{"type":42,"tag":222,"props":421,"children":422},{},[423,439],{"type":42,"tag":226,"props":424,"children":425},{},[426],{"type":42,"tag":230,"props":427,"children":428},{},[429,434],{"type":42,"tag":234,"props":430,"children":431},{},[432],{"type":48,"value":433},"Output",{"type":42,"tag":234,"props":435,"children":436},{},[437],{"type":48,"value":438},"What it means",{"type":42,"tag":250,"props":440,"children":441},{},[442,458,474,490,506,522,538,554],{"type":42,"tag":230,"props":443,"children":444},{},[445,453],{"type":42,"tag":257,"props":446,"children":447},{},[448],{"type":42,"tag":57,"props":449,"children":450},{},[451],{"type":48,"value":452},"ICC point estimate",{"type":42,"tag":257,"props":454,"children":455},{},[456],{"type":48,"value":457},"Proportion of total variance due to true subject differences — higher = more reliable",{"type":42,"tag":230,"props":459,"children":460},{},[461,469],{"type":42,"tag":257,"props":462,"children":463},{},[464],{"type":42,"tag":57,"props":465,"children":466},{},[467],{"type":48,"value":468},"95% CI",{"type":42,"tag":257,"props":470,"children":471},{},[472],{"type":48,"value":473},"Uncertainty in the ICC estimate — always report; wide CI with small n is common",{"type":42,"tag":230,"props":475,"children":476},{},[477,485],{"type":42,"tag":257,"props":478,"children":479},{},[480],{"type":42,"tag":57,"props":481,"children":482},{},[483],{"type":48,"value":484},"ICC model",{"type":42,"tag":257,"props":486,"children":487},{},[488],{"type":48,"value":489},"Which of the 6 Shrout-Fleiss models was used — must match the study 
design",{"type":42,"tag":230,"props":491,"children":492},{},[493,501],{"type":42,"tag":257,"props":494,"children":495},{},[496],{"type":42,"tag":57,"props":497,"children":498},{},[499],{"type":48,"value":500},"Reliability classification",{"type":42,"tag":257,"props":502,"children":503},{},[504],{"type":48,"value":505},"Poor \u003C 0.50 · Moderate 0.50–0.75 · Good 0.75–0.90 · Excellent ≥ 0.90 (Koo & Li, 2016)",{"type":42,"tag":230,"props":507,"children":508},{},[509,517],{"type":42,"tag":257,"props":510,"children":511},{},[512],{"type":42,"tag":57,"props":513,"children":514},{},[515],{"type":48,"value":516},"SEM",{"type":42,"tag":257,"props":518,"children":519},{},[520],{"type":48,"value":521},"Standard Error of Measurement = SD × √(1−ICC) — absolute measurement precision in original units",{"type":42,"tag":230,"props":523,"children":524},{},[525,533],{"type":42,"tag":257,"props":526,"children":527},{},[528],{"type":42,"tag":57,"props":529,"children":530},{},[531],{"type":48,"value":532},"MDC₉₅",{"type":42,"tag":257,"props":534,"children":535},{},[536],{"type":48,"value":537},"Minimal Detectable Change = 1.96 × √2 × SEM — smallest real change detectable above measurement noise",{"type":42,"tag":230,"props":539,"children":540},{},[541,549],{"type":42,"tag":257,"props":542,"children":543},{},[544],{"type":42,"tag":57,"props":545,"children":546},{},[547],{"type":48,"value":548},"Variance components",{"type":42,"tag":257,"props":550,"children":551},{},[552],{"type":48,"value":553},"Between-subject, between-rater, and residual variance — identifies the primary source of unreliability",{"type":42,"tag":230,"props":555,"children":556},{},[557,565],{"type":42,"tag":257,"props":558,"children":559},{},[560],{"type":42,"tag":57,"props":561,"children":562},{},[563],{"type":48,"value":564},"Profile plot",{"type":42,"tag":257,"props":566,"children":567},{},[568],{"type":48,"value":569},"Rater scores per subject with group means — reveals systematic rater bias 
visually",{"type":42,"tag":43,"props":571,"children":573},{"id":572},"example-prompts",[574],{"type":48,"value":575},"Example Prompts",{"type":42,"tag":222,"props":577,"children":578},{},[579,595],{"type":42,"tag":226,"props":580,"children":581},{},[582],{"type":42,"tag":230,"props":583,"children":584},{},[585,590],{"type":42,"tag":234,"props":586,"children":587},{},[588],{"type":48,"value":589},"Scenario",{"type":42,"tag":234,"props":591,"children":592},{},[593],{"type":48,"value":594},"What to type",{"type":42,"tag":250,"props":596,"children":597},{},[598,615,632,649,665,682,699,716],{"type":42,"tag":230,"props":599,"children":600},{},[601,606],{"type":42,"tag":257,"props":602,"children":603},{},[604],{"type":48,"value":605},"Basic 2-rater ICC",{"type":42,"tag":257,"props":607,"children":608},{},[609],{"type":42,"tag":261,"props":610,"children":612},{"className":611},[],[613],{"type":48,"value":614},"2 raters, 25 subjects; ICC(2,1) absolute agreement; 95% CI; classify reliability; scatter plot rater1 vs rater2",{"type":42,"tag":230,"props":616,"children":617},{},[618,623],{"type":42,"tag":257,"props":619,"children":620},{},[621],{"type":48,"value":622},"Test-retest",{"type":42,"tag":257,"props":624,"children":625},{},[626],{"type":42,"tag":261,"props":627,"children":629},{"className":628},[],[630],{"type":48,"value":631},"test and retest scores for same instrument; ICC(3,1) consistency; SEM; MDC95; Bland-Altman plot of difference vs mean",{"type":42,"tag":230,"props":633,"children":634},{},[635,640],{"type":42,"tag":257,"props":636,"children":637},{},[638],{"type":48,"value":639},"3+ raters",{"type":42,"tag":257,"props":641,"children":642},{},[643],{"type":42,"tag":261,"props":644,"children":646},{"className":645},[],[647],{"type":48,"value":648},"4 raters, 30 subjects; ICC(2,1) and ICC(2,4) average measures; compare single vs average reliability; profile 
plot",{"type":42,"tag":230,"props":650,"children":651},{},[652,656],{"type":42,"tag":257,"props":653,"children":654},{},[655],{"type":48,"value":548},{"type":42,"tag":257,"props":657,"children":658},{},[659],{"type":42,"tag":261,"props":660,"children":662},{"className":661},[],[663],{"type":48,"value":664},"compute variance components: between-subject, between-rater, residual; pie chart; identify main source of unreliability",{"type":42,"tag":230,"props":666,"children":667},{},[668,673],{"type":42,"tag":257,"props":669,"children":670},{},[671],{"type":48,"value":672},"Absolute vs consistency",{"type":42,"tag":257,"props":674,"children":675},{},[676],{"type":42,"tag":261,"props":677,"children":679},{"className":678},[],[680],{"type":48,"value":681},"compute both ICC(2,1) absolute and ICC(2,1) consistency; compare; if they differ substantially, report rater bias",{"type":42,"tag":230,"props":683,"children":684},{},[685,690],{"type":42,"tag":257,"props":686,"children":687},{},[688],{"type":48,"value":689},"SEM and MDC",{"type":42,"tag":257,"props":691,"children":692},{},[693],{"type":42,"tag":261,"props":694,"children":696},{"className":695},[],[697],{"type":48,"value":698},"ICC(2,1); compute SEM and MDC95 in original units; interpret: what score change is clinically meaningful vs noise?",{"type":42,"tag":230,"props":700,"children":701},{},[702,707],{"type":42,"tag":257,"props":703,"children":704},{},[705],{"type":48,"value":706},"Two-way ANOVA table",{"type":42,"tag":257,"props":708,"children":709},{},[710],{"type":42,"tag":261,"props":711,"children":713},{"className":712},[],[714],{"type":48,"value":715},"full two-way ANOVA table underlying the ICC: SS, df, MS for subjects, raters, residual; F-tests",{"type":42,"tag":230,"props":717,"children":718},{},[719,724],{"type":42,"tag":257,"props":720,"children":721},{},[722],{"type":48,"value":723},"Minimum sample 
size",{"type":42,"tag":257,"props":725,"children":726},{},[727],{"type":42,"tag":261,"props":728,"children":730},{"className":729},[],[731],{"type":48,"value":732},"how many subjects needed to estimate ICC ≥ 0.75 with CI width ≤ 0.20? compute with 3 raters",{"type":42,"tag":43,"props":734,"children":736},{"id":735},"assumptions-to-check",[737],{"type":48,"value":738},"Assumptions to Check",{"type":42,"tag":740,"props":741,"children":742},"ul",{},[743,771,781,791,801],{"type":42,"tag":162,"props":744,"children":745},{},[746,751,753,757,759,763,765,769],{"type":42,"tag":57,"props":747,"children":748},{},[749],{"type":48,"value":750},"Correct ICC model for the design",{"type":48,"value":752}," — the most common error in ICC analysis is using the wrong model; use ",{"type":42,"tag":57,"props":754,"children":755},{},[756],{"type":48,"value":115},{"type":48,"value":758}," when raters are randomly sampled and each subject is rated by a different random subset of raters; use ",{"type":42,"tag":57,"props":760,"children":761},{},[762],{"type":48,"value":122},{"type":48,"value":764}," when all subjects are rated by the same raters AND raters are considered a random sample from a larger pool (e.g., any physical therapist); use ",{"type":42,"tag":57,"props":766,"children":767},{},[768],{"type":48,"value":129},{"type":48,"value":770}," when all subjects are rated by the same raters AND these are the only raters of interest (fixed raters); if you want to generalize to new raters, use ICC(2,1), not ICC(3,1)",{"type":42,"tag":162,"props":772,"children":773},{},[774,779],{"type":42,"tag":57,"props":775,"children":776},{},[777],{"type":48,"value":778},"Normal distribution of scores",{"type":48,"value":780}," — ICC is derived from ANOVA and assumes normally distributed subject scores; check with Q-Q plot; with large n (> 50 subjects) the ANOVA is robust to non-normality; for ordinal scales with few categories, consider weighted kappa 
instead",{"type":42,"tag":162,"props":782,"children":783},{},[784,789],{"type":42,"tag":57,"props":785,"children":786},{},[787],{"type":48,"value":788},"No systematic rater×subject interactions",{"type":48,"value":790}," — standard ICC assumes raters differ from each other by a constant bias (one rater always scores 2 points higher); if bias depends on the subject (e.g., raters disagree more for severe cases), the two-way ANOVA residuals will be large and ICC will be artificially deflated; inspect the profile plot for crossing lines",{"type":42,"tag":162,"props":792,"children":793},{},[794,799],{"type":42,"tag":57,"props":795,"children":796},{},[797],{"type":48,"value":798},"Homoscedasticity",{"type":48,"value":800}," — measurement error should be constant across the measurement range; if variability increases with the score magnitude (common in biological measurements), consider log-transforming the data before computing ICC",{"type":42,"tag":162,"props":802,"children":803},{},[804,809],{"type":42,"tag":57,"props":805,"children":806},{},[807],{"type":48,"value":808},"Sufficient sample size",{"type":48,"value":810}," — reliable ICC estimates require n ≥ 30 subjects; the 95% CI width for ICC ≈ 0.80 is approximately ±0.15 at n = 30 and narrows to ±0.08 at n = 100; for regulatory submissions (device validation, clinical test reliability), n ≥ 50–100 subjects is recommended",{"type":42,"tag":43,"props":812,"children":814},{"id":813},"related-tools",[815],{"type":48,"value":816},"Related Tools",{"type":42,"tag":51,"props":818,"children":819},{},[820,822,828,830,836,838,844,846,852],{"type":48,"value":821},"Use the ",{"type":42,"tag":199,"props":823,"children":825},{"href":824},"/tools/bland-altman-plot",[826],{"type":48,"value":827},"Bland-Altman Plot Generator",{"type":48,"value":829}," alongside ICC — ICC quantifies the proportion of variance due to subjects, while the Bland-Altman plot visualizes the actual magnitude of disagreement between two measurement methods in 
original units; for method comparison studies, both are required. Use the ",{"type":42,"tag":199,"props":831,"children":833},{"href":832},"/tools/cronbachs-alpha",[834],{"type":48,"value":835},"Cronbach's Alpha Calculator",{"type":48,"value":837}," when items are parallel indicators of a latent construct (psychometric reliability) rather than repeated measurements of the same physical quantity — Cronbach's alpha and ICC(3,k) consistency are mathematically equivalent in the two-way mixed model. Use the ",{"type":42,"tag":199,"props":839,"children":841},{"href":840},"/tools/mixed-effects-model",[842],{"type":48,"value":843},"Linear Mixed Effects Model Calculator",{"type":48,"value":845}," when ICC is a secondary output (the model's random intercept variance / total variance) in a longitudinal study where the primary goal is estimating fixed effects. Use the ",{"type":42,"tag":199,"props":847,"children":849},{"href":848},"/tools/power-analysis",[850],{"type":48,"value":851},"Power Analysis Calculator",{"type":48,"value":853}," to determine sample size needed to estimate ICC with a target CI width.",{"type":42,"tag":43,"props":855,"children":857},{"id":856},"frequently-asked-questions",[858],{"type":48,"value":859},"Frequently Asked Questions",{"type":42,"tag":51,"props":861,"children":862},{},[863,868,870,875],{"type":42,"tag":57,"props":864,"children":865},{},[866],{"type":48,"value":867},"Which ICC model should I use?",{"type":48,"value":869},"\nFollow the Koo and Li (2016) decision tree: (1) Are all subjects rated by the same raters? If NO → ICC(1,1). If YES → (2) Are the raters a random sample from a larger pool (you want to generalize to new raters)? If YES → ICC(2,1). If NO (these specific raters are the only ones of interest) → ICC(3,1). Then: (3) Is the goal to use a single rater's score in practice? → single measures ICC (the k=1 version). Is the goal to average multiple raters' scores? → average measures ICC(k). 
For clinical outcome measurement validation, ",{"type":42,"tag":57,"props":871,"children":872},{},[873],{"type":48,"value":874},"ICC(2,1) absolute agreement",{"type":48,"value":876}," is typically most appropriate because you want the measurement to be reliable across any trained clinician, not just the specific raters in the study.",{"type":42,"tag":51,"props":878,"children":879},{},[880,885,887,892,894,899,901,906],{"type":42,"tag":57,"props":881,"children":882},{},[883],{"type":48,"value":884},"What is the difference between SEM and MDC?",{"type":48,"value":886},"\nThe ",{"type":42,"tag":57,"props":888,"children":889},{},[890],{"type":48,"value":891},"Standard Error of Measurement (SEM)",{"type":48,"value":893}," = SD × √(1−ICC) quantifies the typical magnitude of measurement error for a single measurement — it is in the same units as the original scale. A patient's true score is estimated to be within ±1.96 × SEM of their observed score with 95% probability. The ",{"type":42,"tag":57,"props":895,"children":896},{},[897],{"type":48,"value":898},"Minimal Detectable Change (MDC₉₅)",{"type":48,"value":900}," = 1.96 × √2 × SEM is the smallest change in score between two measurement occasions that exceeds measurement error with 95% confidence — any change smaller than MDC₉₅ cannot be distinguished from random fluctuation. Example: if SEM = 3 points on a pain scale, MDC₉₅ = 1.96 × √2 × 3 = 8.3 points — a patient's pain must decrease by at least 8.3 points to confidently claim a real improvement. 
MDC should be contrasted with the ",{"type":42,"tag":57,"props":902,"children":903},{},[904],{"type":48,"value":905},"Minimal Clinically Important Difference (MCID)",{"type":48,"value":907},", which is anchored to patients' perception of meaningful change, not derived from measurement-error statistics.",{"type":42,"tag":51,"props":909,"children":910},{},[911,916,918,923,925,930,932,937],{"type":42,"tag":57,"props":912,"children":913},{},[914],{"type":48,"value":915},"My ICC is high (0.85) but raters clearly disagree — what went wrong?",{"type":48,"value":917},"\nHigh ICC with visible rater disagreement usually means: (1) ",{"type":42,"tag":57,"props":919,"children":920},{},[921],{"type":48,"value":922},"large between-subject variance",{"type":48,"value":924}," — if subjects vary enormously in their true scores, even large absolute rater differences produce a high ICC (ICC measures relative agreement, not absolute agreement); (2) ",{"type":42,"tag":57,"props":926,"children":927},{},[928],{"type":48,"value":929},"wrong ICC model",{"type":48,"value":931}," — if you used consistency ICC when absolute agreement was needed, systematic rater biases are ignored; (3) ",{"type":42,"tag":57,"props":933,"children":934},{},[935],{"type":48,"value":936},"small absolute error relative to range",{"type":48,"value":938}," — with a 0–100 scale and subjects spanning the full range, 5-point rater differences produce high ICC even though 5 points may be clinically meaningful. Always report both ICC and SEM/MDC together — ICC alone is insufficient for clinical decision-making.
Rough guidance for ICC ≈ 0.70–0.85: to achieve 95% CI width ≤ 0.20, you need approximately n = 30 subjects with k = 2 raters, or n = 20 with k = 3 raters. For CI width ≤ 0.10 (regulatory-grade precision): n ≈ 100 with k = 2, or n ≈ 60 with k = 3. Use a dedicated ICC sample size calculator (e.g., Bonett's method) for exact calculations with your target ICC and acceptable CI width.",{"title":7,"searchDepth":950,"depth":950,"links":951},2,[952,953,954,955,956,957,958,959],{"id":45,"depth":950,"text":49},{"id":153,"depth":950,"text":156},{"id":217,"depth":950,"text":220},{"id":416,"depth":950,"text":419},{"id":572,"depth":950,"text":575},{"id":735,"depth":950,"text":738},{"id":813,"depth":950,"text":816},{"id":856,"depth":950,"text":859},"markdown","content:tools:078.icc-calculator.md","content","tools/078.icc-calculator.md","tools/078.icc-calculator","md",{"loc":4},1775502472614]