[{"data":1,"prerenderedAt":879},["ShallowReactive",2],{"content-query-zXj74iUiQ6":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"heading":10,"prompt":11,"tags":15,"files":17,"nav":6,"presets":18,"gallery":36,"body":38,"_type":872,"_id":873,"_source":874,"_file":875,"_stem":876,"_extension":877,"sitemap":878},"/tools/item-analysis","tools",false,"","Item Analysis Calculator for Tests and Exams","Analyze exam and survey items online from Excel or CSV data. Calculate difficulty, discrimination, and weak-item flags with AI.","Item Analysis Calculator",{"prefix":12,"label":13,"placeholder":14},"Run item analysis","Describe the test items and scoring","e.g. 20 multiple-choice items, 200 students, binary (0/1) scoring; compute item difficulty, point-biserial discrimination, flag items with r \u003C 0.20 or p \u003C 0.20; difficulty vs discrimination scatter plot",[16],"statistics",true,[19,25,30],{"label":20,"prompt":21,"dataset_url":22,"dataset_title":23,"dataset_citation":24},"Classroom Exam Item Quality","Item analysis for a 25-item multiple-choice exam: compute item difficulty (p-value), point-biserial correlation (r_pb) for each item; flag items with r_pb \u003C 0.20 or difficulty outside 0.20–0.80; difficulty vs discrimination scatter; rank items by discrimination","https://data.cdc.gov/api/views/iuq5-y9ct/rows.csv?accessType=DOWNLOAD","NHANES Mental Health Assessment","CDC",{"label":26,"prompt":27,"dataset_url":28,"dataset_title":29,"dataset_citation":24},"Health Survey Response Analysis","Item analysis on binary survey items: item difficulty, point-biserial discrimination, Cronbach's alpha if item deleted; identify items that reduce reliability; bar chart of r_pb by item sorted descending","https://data.cdc.gov/api/views/dppn-5tm3/rows.csv?accessType=DOWNLOAD","NCHS Health and Nutrition Examination Survey",{"label":31,"prompt":32,"dataset_url":33,"dataset_title":34,"dataset_citation":35},"Standardized Test Bank Review","Item 
discrimination analysis: compute r_pb and biserial r for each item; upper/lower 27% discrimination index (D); flag items with D \u003C 0.20; distractor frequency table for each item; summary table of all indices","https://ourworldindata.org/grapher/pisa-test-score-mean-performance-on-the-reading-scale.csv","PISA Mean Reading Performance Scores","Our World in Data",[37],"/img/tools/item-analysis.png",{"type":39,"children":40,"toc":861},"root",[41,50,76,95,107,113,171,177,323,328,334,468,474,630,636,690,696,733,739,763,815,837],{"type":42,"tag":43,"props":44,"children":46},"element","h2",{"id":45},"what-is-item-analysis",[47],{"type":48,"value":49},"text","What Is Item Analysis?",{"type":42,"tag":51,"props":52,"children":53},"p",{},[54,60,62,67,69,74],{"type":42,"tag":55,"props":56,"children":57},"strong",{},[58],{"type":48,"value":59},"Item analysis",{"type":48,"value":61}," is the systematic evaluation of individual test or survey questions (items) to determine whether they function as intended — distinguishing high-ability from low-ability respondents, covering the intended difficulty range, and contributing to measurement reliability. Two core statistics underpin item analysis: ",{"type":42,"tag":55,"props":63,"children":64},{},[65],{"type":48,"value":66},"item difficulty (p-value)",{"type":48,"value":68},", the proportion of examinees who answered the item correctly (higher p = easier item), and ",{"type":42,"tag":55,"props":70,"children":71},{},[72],{"type":48,"value":73},"item discrimination",{"type":48,"value":75},", which measures how well the item differentiates between high- and low-scoring respondents. 
Together, these indices guide decisions about which items to retain, revise, or discard from a test bank.",{"type":42,"tag":51,"props":77,"children":78},{},[79,81,86,88,93],{"type":48,"value":80},"The ",{"type":42,"tag":55,"props":82,"children":83},{},[84],{"type":48,"value":85},"point-biserial correlation (r_pb)",{"type":48,"value":87}," is the standard discrimination index for binary items — it is the Pearson correlation between the binary item score (0/1) and the total test score. An r_pb of 0.40 means that students who answered the item correctly tended to score substantially higher overall; an item with r_pb near zero does not separate strong from weak examinees, and a negative r_pb means the item works against the total score. Classical guidelines flag items with r_pb \u003C 0.20 as poor discriminators. The ",{"type":42,"tag":55,"props":89,"children":90},{},[91],{"type":48,"value":92},"upper-lower discrimination index (D)",{"type":48,"value":94}," provides an intuitive alternative: compute the proportion correct in the top 27% of scorers minus the proportion in the bottom 27%; D ≥ 0.30 is considered adequate, D \u003C 0.20 poor. Items with p-values outside the 0.20–0.80 range (too easy or too hard) also warrant review, as they contribute little variance to the total score.",{"type":42,"tag":51,"props":96,"children":97},{},[98,100,105],{"type":48,"value":99},"As a practical example, consider a 30-item anatomy exam administered to 150 medical students. Item analysis reveals that item Q7 has p = 0.94 (nearly everyone answers correctly — the item is too easy, adding almost no discriminating information) and item Q22 has r_pb = −0.08 (students who got it right scored slightly lower overall — possible keying error or ambiguous wording). Item Q15 is optimal: p = 0.52 (moderate difficulty), r_pb = 0.47 (strong discrimination). 
The ",{"type":42,"tag":55,"props":101,"children":102},{},[103],{"type":48,"value":104},"difficulty vs discrimination scatter plot",{"type":48,"value":106}," visualizes all 30 items simultaneously, with the shaded green region indicating the optimal zone (p = 0.30–0.80, r_pb ≥ 0.30).",{"type":42,"tag":43,"props":108,"children":110},{"id":109},"how-it-works",[111],{"type":48,"value":112},"How It Works",{"type":42,"tag":114,"props":115,"children":116},"ol",{},[117,128,144],{"type":42,"tag":118,"props":119,"children":120},"li",{},[121,126],{"type":42,"tag":55,"props":122,"children":123},{},[124],{"type":48,"value":125},"Upload your data",{"type":48,"value":127}," — provide a CSV or Excel file with one row per examinee and one column per item. Items should be scored 0 (incorrect/disagree) and 1 (correct/agree). A total score column is optional — the AI can compute it.",{"type":42,"tag":118,"props":129,"children":130},{},[131,136,138],{"type":42,"tag":55,"props":132,"children":133},{},[134],{"type":48,"value":135},"Describe the analysis",{"type":48,"value":137}," — e.g. 
",{"type":42,"tag":139,"props":140,"children":141},"em",{},[142],{"type":48,"value":143},"\"20 binary items, columns Q1–Q20; compute item difficulty, point-biserial r for each item; flag items with r \u003C 0.20; difficulty vs discrimination scatter plot\"",{"type":42,"tag":118,"props":145,"children":146},{},[147,152,154,161,163,169],{"type":42,"tag":55,"props":148,"children":149},{},[150],{"type":48,"value":151},"Get full results",{"type":48,"value":153}," — the AI writes Python code using ",{"type":42,"tag":155,"props":156,"children":158},"a",{"href":157},"https://pandas.pydata.org/",[159],{"type":48,"value":160},"pandas",{"type":48,"value":162}," and ",{"type":42,"tag":155,"props":164,"children":166},{"href":165},"https://plotly.com/python/",[167],{"type":48,"value":168},"Plotly",{"type":48,"value":170}," to compute all item indices, flag poor-performing items, and produce the scatter plot and discrimination bar chart",{"type":42,"tag":43,"props":172,"children":174},{"id":173},"required-data-format",[175],{"type":48,"value":176},"Required Data Format",{"type":42,"tag":178,"props":179,"children":180},"table",{},[181,205],{"type":42,"tag":182,"props":183,"children":184},"thead",{},[185],{"type":42,"tag":186,"props":187,"children":188},"tr",{},[189,195,200],{"type":42,"tag":190,"props":191,"children":192},"th",{},[193],{"type":48,"value":194},"Column",{"type":42,"tag":190,"props":196,"children":197},{},[198],{"type":48,"value":199},"Description",{"type":42,"tag":190,"props":201,"children":202},{},[203],{"type":48,"value":204},"Example",{"type":42,"tag":206,"props":207,"children":208},"tbody",{},[209,257,290],{"type":42,"tag":186,"props":210,"children":211},{},[212,233,238],{"type":42,"tag":213,"props":214,"children":215},"td",{},[216,223,225,231],{"type":42,"tag":217,"props":218,"children":220},"code",{"className":219},[],[221],{"type":48,"value":222},"Q1",{"type":48,"value":224},", 
",{"type":42,"tag":217,"props":226,"children":228},{"className":227},[],[229],{"type":48,"value":230},"Q2",{"type":48,"value":232},", …",{"type":42,"tag":213,"props":234,"children":235},{},[236],{"type":48,"value":237},"Binary item score",{"type":42,"tag":213,"props":239,"children":240},{},[241,247,249,255],{"type":42,"tag":217,"props":242,"children":244},{"className":243},[],[245],{"type":48,"value":246},"1",{"type":48,"value":248}," (correct) or ",{"type":42,"tag":217,"props":250,"children":252},{"className":251},[],[253],{"type":48,"value":254},"0",{"type":48,"value":256}," (incorrect)",{"type":42,"tag":186,"props":258,"children":259},{},[260,269,274],{"type":42,"tag":213,"props":261,"children":262},{},[263],{"type":42,"tag":217,"props":264,"children":266},{"className":265},[],[267],{"type":48,"value":268},"student_id",{"type":42,"tag":213,"props":270,"children":271},{},[272],{"type":48,"value":273},"Optional: examinee identifier",{"type":42,"tag":213,"props":275,"children":276},{},[277,283,284],{"type":42,"tag":217,"props":278,"children":280},{"className":279},[],[281],{"type":48,"value":282},"S001",{"type":48,"value":224},{"type":42,"tag":217,"props":285,"children":287},{"className":286},[],[288],{"type":48,"value":289},"student_42",{"type":42,"tag":186,"props":291,"children":292},{},[293,302,307],{"type":42,"tag":213,"props":294,"children":295},{},[296],{"type":42,"tag":217,"props":297,"children":299},{"className":298},[],[300],{"type":48,"value":301},"total",{"type":42,"tag":213,"props":303,"children":304},{},[305],{"type":48,"value":306},"Optional: pre-computed total score",{"type":42,"tag":213,"props":308,"children":309},{},[310,316,317],{"type":42,"tag":217,"props":311,"children":313},{"className":312},[],[314],{"type":48,"value":315},"18",{"type":48,"value":224},{"type":42,"tag":217,"props":318,"children":320},{"className":319},[],[321],{"type":48,"value":322},"24",{"type":42,"tag":51,"props":324,"children":325},{},[326],{"type":48,"value":327},"Any 
column names work — describe them in your prompt. Items are assumed to be binary (0/1) by default. For polytomous items (e.g., a 0–4 Likert scale), mention that in the prompt so the AI uses the polyserial correlation instead. Missing responses should either be scored as 0 (incorrect) or excluded; specify your preference in the prompt.",{"type":42,"tag":43,"props":329,"children":331},{"id":330},"interpreting-the-results",[332],{"type":48,"value":333},"Interpreting the Results",{"type":42,"tag":178,"props":335,"children":336},{},[337,353],{"type":42,"tag":182,"props":338,"children":339},{},[340],{"type":42,"tag":186,"props":341,"children":342},{},[343,348],{"type":42,"tag":190,"props":344,"children":345},{},[346],{"type":48,"value":347},"Output",{"type":42,"tag":190,"props":349,"children":350},{},[351],{"type":48,"value":352},"What it means",{"type":42,"tag":206,"props":354,"children":355},{},[356,372,388,404,420,436,452],{"type":42,"tag":186,"props":357,"children":358},{},[359,367],{"type":42,"tag":213,"props":360,"children":361},{},[362],{"type":42,"tag":55,"props":363,"children":364},{},[365],{"type":48,"value":366},"p-value (difficulty)",{"type":42,"tag":213,"props":368,"children":369},{},[370],{"type":48,"value":371},"Proportion of examinees who answered correctly — values near 0.50 maximize item variance; \u003C 0.20 or > 0.80 warrants review",{"type":42,"tag":186,"props":373,"children":374},{},[375,383],{"type":42,"tag":213,"props":376,"children":377},{},[378],{"type":42,"tag":55,"props":379,"children":380},{},[381],{"type":48,"value":382},"Point-biserial r (r_pb)",{"type":42,"tag":213,"props":384,"children":385},{},[386],{"type":48,"value":387},"Correlation between item score and total score — measures discrimination; \u003C 0.20 is poor, ≥ 0.30 is good",{"type":42,"tag":186,"props":389,"children":390},{},[391,399],{"type":42,"tag":213,"props":392,"children":393},{},[394],{"type":42,"tag":55,"props":395,"children":396},{},[397],{"type":48,"value":398},"Biserial 
r",{"type":42,"tag":213,"props":400,"children":401},{},[402],{"type":48,"value":403},"Estimate of the item-total correlation assuming a normally distributed continuous trait underlies the binary item — always larger in magnitude than r_pb",{"type":42,"tag":186,"props":405,"children":406},{},[407,415],{"type":42,"tag":213,"props":408,"children":409},{},[410],{"type":42,"tag":55,"props":411,"children":412},{},[413],{"type":48,"value":414},"Discrimination index D",{"type":42,"tag":213,"props":416,"children":417},{},[418],{"type":48,"value":419},"(proportion correct in top 27%) − (proportion correct in bottom 27%) — ≥ 0.30 is adequate, \u003C 0.20 is poor",{"type":42,"tag":186,"props":421,"children":422},{},[423,431],{"type":42,"tag":213,"props":424,"children":425},{},[426],{"type":42,"tag":55,"props":427,"children":428},{},[429],{"type":48,"value":430},"Alpha if item deleted",{"type":42,"tag":213,"props":432,"children":433},{},[434],{"type":48,"value":435},"Cronbach's alpha of the remaining items if this item is removed — if higher than overall alpha, item is hurting reliability",{"type":42,"tag":186,"props":437,"children":438},{},[439,447],{"type":42,"tag":213,"props":440,"children":441},{},[442],{"type":42,"tag":55,"props":443,"children":444},{},[445],{"type":48,"value":446},"Difficulty vs discrimination plot",{"type":42,"tag":213,"props":448,"children":449},{},[450],{"type":48,"value":451},"Scatter of p-value vs r_pb — shaded zone shows optimal items; flagged items appear outside",{"type":42,"tag":186,"props":453,"children":454},{},[455,463],{"type":42,"tag":213,"props":456,"children":457},{},[458],{"type":42,"tag":55,"props":459,"children":460},{},[461],{"type":48,"value":462},"Distractor analysis",{"type":42,"tag":213,"props":464,"children":465},{},[466],{"type":48,"value":467},"For multiple-choice items, frequency of each answer option by performance group — distractors should attract more low scorers than high scorers",{"type":42,"tag":43,"props":469,"children":471},{"id":470},"example-prompts",[472],{"type":48,"value":473},"Example 
Prompts",{"type":42,"tag":178,"props":475,"children":476},{},[477,493],{"type":42,"tag":182,"props":478,"children":479},{},[480],{"type":42,"tag":186,"props":481,"children":482},{},[483,488],{"type":42,"tag":190,"props":484,"children":485},{},[486],{"type":48,"value":487},"Scenario",{"type":42,"tag":190,"props":489,"children":490},{},[491],{"type":48,"value":492},"What to type",{"type":42,"tag":206,"props":494,"children":495},{},[496,513,530,546,562,579,596,613],{"type":42,"tag":186,"props":497,"children":498},{},[499,504],{"type":42,"tag":213,"props":500,"children":501},{},[502],{"type":48,"value":503},"Basic item analysis",{"type":42,"tag":213,"props":505,"children":506},{},[507],{"type":42,"tag":217,"props":508,"children":510},{"className":509},[],[511],{"type":48,"value":512},"20 binary items Q1–Q20; item difficulty and point-biserial r; flag r \u003C 0.20; scatter plot; summary table",{"type":42,"tag":186,"props":514,"children":515},{},[516,521],{"type":42,"tag":213,"props":517,"children":518},{},[519],{"type":48,"value":520},"Alpha if deleted",{"type":42,"tag":213,"props":522,"children":523},{},[524],{"type":42,"tag":217,"props":525,"children":527},{"className":526},[],[528],{"type":48,"value":529},"item difficulty and r_pb; Cronbach's alpha if each item deleted; identify items that reduce reliability",{"type":42,"tag":186,"props":531,"children":532},{},[533,537],{"type":42,"tag":213,"props":534,"children":535},{},[536],{"type":48,"value":414},{"type":42,"tag":213,"props":538,"children":539},{},[540],{"type":42,"tag":217,"props":541,"children":543},{"className":542},[],[544],{"type":48,"value":545},"upper/lower 27% discrimination index D for each item; flag D \u003C 0.20; rank items by 
D",{"type":42,"tag":186,"props":547,"children":548},{},[549,553],{"type":42,"tag":213,"props":550,"children":551},{},[552],{"type":48,"value":462},{"type":42,"tag":213,"props":554,"children":555},{},[556],{"type":42,"tag":217,"props":557,"children":559},{"className":558},[],[560],{"type":48,"value":561},"5-option MCQ items; distractor frequency table for top/middle/bottom third of scorers; flag non-functioning distractors",{"type":42,"tag":186,"props":563,"children":564},{},[565,570],{"type":42,"tag":213,"props":566,"children":567},{},[568],{"type":48,"value":569},"Polytomous items",{"type":42,"tag":213,"props":571,"children":572},{},[573],{"type":42,"tag":217,"props":574,"children":576},{"className":575},[],[577],{"type":48,"value":578},"Likert items scored 1–5; polyserial correlation for each item; item-total correlation; flag correlations \u003C 0.30",{"type":42,"tag":186,"props":580,"children":581},{},[582,587],{"type":42,"tag":213,"props":583,"children":584},{},[585],{"type":48,"value":586},"Test revision",{"type":42,"tag":213,"props":588,"children":589},{},[590],{"type":42,"tag":217,"props":591,"children":593},{"className":592},[],[594],{"type":48,"value":595},"item analysis; identify items to discard (r \u003C 0.20 or p > 0.85); recompute alpha after removing flagged items",{"type":42,"tag":186,"props":597,"children":598},{},[599,604],{"type":42,"tag":213,"props":600,"children":601},{},[602],{"type":48,"value":603},"Difficulty targeting",{"type":42,"tag":213,"props":605,"children":606},{},[607],{"type":42,"tag":217,"props":608,"children":610},{"className":609},[],[611],{"type":48,"value":612},"sort items by p-value; plot distribution of difficulty; identify how many items fall in optimal zone 0.30–0.70",{"type":42,"tag":186,"props":614,"children":615},{},[616,621],{"type":42,"tag":213,"props":617,"children":618},{},[619],{"type":48,"value":620},"Item fit 
(IRT)",{"type":42,"tag":213,"props":622,"children":623},{},[624],{"type":42,"tag":217,"props":625,"children":627},{"className":626},[],[628],{"type":48,"value":629},"fit 1-parameter logistic (Rasch) model; item difficulty parameters; flag items with poor fit (χ² p \u003C 0.05)",{"type":42,"tag":43,"props":631,"children":633},{"id":632},"assumptions-to-check",[634],{"type":48,"value":635},"Assumptions to Check",{"type":42,"tag":637,"props":638,"children":639},"ul",{},[640,650,660,670,680],{"type":42,"tag":118,"props":641,"children":642},{},[643,648],{"type":42,"tag":55,"props":644,"children":645},{},[646],{"type":48,"value":647},"Sufficient sample size",{"type":48,"value":649}," — item statistics are unstable with small samples; p-values and r_pb need roughly 100 or more examinees for reliable estimation; with n \u003C 50, item statistics should be interpreted cautiously and confirmed with a new administration",{"type":42,"tag":118,"props":651,"children":652},{},[653,658],{"type":42,"tag":55,"props":654,"children":655},{},[656],{"type":48,"value":657},"Unidimensionality",{"type":48,"value":659}," — item discrimination indices assume a single dominant trait underlies all items; if the test measures multiple independent constructs, run separate item analyses by subscale rather than against the total score",{"type":42,"tag":118,"props":661,"children":662},{},[663,668],{"type":42,"tag":55,"props":664,"children":665},{},[666],{"type":48,"value":667},"Binary scoring",{"type":48,"value":669}," — classical item analysis statistics (p-value, point-biserial r) apply to items scored 0/1; for partial credit or Likert items, use polyserial correlation and mean inter-item correlation instead",{"type":42,"tag":118,"props":671,"children":672},{},[673,678],{"type":42,"tag":55,"props":674,"children":675},{},[676],{"type":48,"value":677},"Criterion-referenced vs norm-referenced",{"type":48,"value":679}," — optimal difficulty depends on the test purpose: norm-referenced tests (designed 
to spread students out) benefit most from p ≈ 0.50; criterion-referenced mastery tests (pass/fail) may legitimately have many easy items if the material is expected to be known",{"type":42,"tag":118,"props":681,"children":682},{},[683,688],{"type":42,"tag":55,"props":684,"children":685},{},[686],{"type":48,"value":687},"Item independence",{"type":48,"value":689}," — items that scaffold on each other (if Q5 is answered wrong, Q6 is forced wrong) violate the independence assumption; verify item independence by design before interpreting discrimination indices",{"type":42,"tag":43,"props":691,"children":693},{"id":692},"related-tools",[694],{"type":48,"value":695},"Related Tools",{"type":42,"tag":51,"props":697,"children":698},{},[699,701,707,709,715,717,723,725,731],{"type":48,"value":700},"Use the ",{"type":42,"tag":155,"props":702,"children":704},{"href":703},"/tools/cronbachs-alpha",[705],{"type":48,"value":706},"Cronbach's Alpha Calculator",{"type":48,"value":708}," to assess overall test reliability after completing item analysis — after removing flagged items, recompute alpha to confirm reliability improved. Use the ",{"type":42,"tag":155,"props":710,"children":712},{"href":711},"/tools/factor-analysis",[713],{"type":48,"value":714},"Factor Analysis Calculator",{"type":48,"value":716}," to examine the dimensional structure of the item set before running item analysis — if items load on multiple factors, item analysis should be conducted within each factor separately. Use the ",{"type":42,"tag":155,"props":718,"children":720},{"href":719},"/tools/cohens-kappa",[721],{"type":48,"value":722},"Cohen's Kappa Calculator",{"type":48,"value":724}," when evaluating inter-rater agreement on subjectively scored items (e.g., essay questions) before computing item-level discrimination. 
Use the ",{"type":42,"tag":155,"props":726,"children":728},{"href":727},"/tools/roc-curve",[729],{"type":48,"value":730},"ROC Curve and AUC Calculator",{"type":48,"value":732}," to evaluate item performance when the external criterion is binary (pass/fail, diagnosis/no diagnosis) — AUC for each item is equivalent to the probability that a randomly chosen examinee who passed scores higher on that item than a randomly chosen examinee who failed, with ties counted as one half.",{"type":42,"tag":43,"props":734,"children":736},{"id":735},"frequently-asked-questions",[737],{"type":48,"value":738},"Frequently Asked Questions",{"type":42,"tag":51,"props":740,"children":741},{},[742,747,749,754,756,761],{"type":42,"tag":55,"props":743,"children":744},{},[745],{"type":48,"value":746},"What is the optimal item difficulty?",{"type":48,"value":748},"\nFor ",{"type":42,"tag":55,"props":750,"children":751},{},[752],{"type":48,"value":753},"norm-referenced tests",{"type":48,"value":755}," designed to rank examinees, items at p = 0.50 contribute the most variance to total scores and thus maximize the test's discriminating power. However, perfectly calibrated items at p = 0.50 are rare in practice, and the recommended range is p = 0.30–0.70 (some guidelines use 0.20–0.80). The exact optimal p depends on the number of answer options: for a 4-option MCQ with a guessing probability of 0.25, the optimal p is the midpoint between chance and a perfect score, (1 + 0.25) / 2 = 0.625, not 0.50, because random guessing raises the floor. 
For ",{"type":42,"tag":55,"props":757,"children":758},{},[759],{"type":48,"value":760},"criterion-referenced mastery tests",{"type":48,"value":762}," (e.g., licensing exams), items should be calibrated near the passing standard, and many items with p > 0.80 may be appropriate if the passing standard is high.",{"type":42,"tag":51,"props":764,"children":765},{},[766,771,773,778,780,785,787,792,794,799,801,806,808,813],{"type":42,"tag":55,"props":767,"children":768},{},[769],{"type":48,"value":770},"When should I flag an item for revision?",{"type":48,"value":772},"\nFlag items meeting any of these criteria: (1) ",{"type":42,"tag":55,"props":774,"children":775},{},[776],{"type":48,"value":777},"r_pb \u003C 0.20",{"type":48,"value":779}," — poor discrimination; the item does not differentiate between high and low performers; (2) ",{"type":42,"tag":55,"props":781,"children":782},{},[783],{"type":48,"value":784},"p > 0.90",{"type":48,"value":786}," — nearly everyone answers correctly; the item adds almost no variance; (3) ",{"type":42,"tag":55,"props":788,"children":789},{},[790],{"type":48,"value":791},"p \u003C 0.15",{"type":48,"value":793}," — nearly everyone answers incorrectly; may indicate an error in the key or unreasonably difficult content; (4) ",{"type":42,"tag":55,"props":795,"children":796},{},[797],{"type":48,"value":798},"negative r_pb",{"type":48,"value":800}," — examinees with higher total scores are answering the item ",{"type":42,"tag":139,"props":802,"children":803},{},[804],{"type":48,"value":805},"incorrectly",{"type":48,"value":807}," at higher rates; this is a red flag for a keying error, ambiguous wording, or a trick question; (5) ",{"type":42,"tag":55,"props":809,"children":810},{},[811],{"type":48,"value":812},"non-functioning distractors",{"type":48,"value":814}," — for MCQ items, if one or more distractors are chosen by fewer than 5% of examinees (across all ability levels), they provide no useful information and should be revised. 
Before revising any item, review its content alongside the statistics — sometimes a statistically poor item is pedagogically important.",{"type":42,"tag":51,"props":816,"children":817},{},[818,823,828,830,835],{"type":42,"tag":55,"props":819,"children":820},{},[821],{"type":48,"value":822},"What is the difference between classical test theory (CTT) and item response theory (IRT)?",{"type":42,"tag":55,"props":824,"children":825},{},[826],{"type":48,"value":827},"Classical test theory (CTT)",{"type":48,"value":829}," — which underlies standard item analysis — characterizes each item by a small number of sample-dependent statistics (p-value, r_pb). These statistics depend on the ability distribution of the examinees: the same item appears easier (higher p) when administered to a high-ability group. ",{"type":42,"tag":55,"props":831,"children":832},{},[833],{"type":48,"value":834},"Item response theory (IRT)",{"type":48,"value":836}," models the probability that a respondent with a given ability level answers correctly, producing sample-invariant item parameters (difficulty b, discrimination a, guessing c in the 3PL model). IRT is more powerful for test equating, adaptive testing, and detecting item bias (differential item functioning), but requires larger samples (n ≥ 200 for 1PL, n ≥ 500 for 2PL, n ≥ 1000 for 3PL). For routine classroom or small-scale test development, CTT item analysis is sufficient and more interpretable.",{"type":42,"tag":51,"props":838,"children":839},{},[840,845,847,852,854,859],{"type":42,"tag":55,"props":841,"children":842},{},[843],{"type":48,"value":844},"My test has very few items (5–10) — does item analysis still work?",{"type":48,"value":846},"\nItem analysis is less stable with few items because removing or revising one item can dramatically change the total score (which is the criterion for r_pb). 
With 5–10 items, use the ",{"type":42,"tag":55,"props":848,"children":849},{},[850],{"type":48,"value":851},"corrected item-total correlation",{"type":48,"value":853}," (r_pb computed against the total score ",{"type":42,"tag":139,"props":855,"children":856},{},[857],{"type":48,"value":858},"excluding",{"type":48,"value":860}," the item itself — also called the \"item-rest correlation\") to avoid spurious inflation from the item contributing to its own criterion. The corrected r is lower than the uncorrected r, but it more accurately reflects the item's independent contribution to the scale. Cronbach's alpha is also hard to interpret with fewer than 10 items, because alpha rises with test length; report the average inter-item correlation instead.",{"title":7,"searchDepth":862,"depth":862,"links":863},2,[864,865,866,867,868,869,870,871],{"id":45,"depth":862,"text":49},{"id":109,"depth":862,"text":112},{"id":173,"depth":862,"text":176},{"id":330,"depth":862,"text":333},{"id":470,"depth":862,"text":473},{"id":632,"depth":862,"text":635},{"id":692,"depth":862,"text":695},{"id":735,"depth":862,"text":738},"markdown","content:tools:080.item-analysis.md","content","tools/080.item-analysis.md","tools/080.item-analysis","md",{"loc":4},1775502475387]