Rapid advances in the statistical measurement of judicial behavior have provided concise, meaningful, and intuitive vote-based summaries of differences between judges. Yet such scores remain poorly understood, widely misinterpreted, and commonly misused. We provide a guide to interpreting such measures, clarify major misconceptions, and argue that extant scores are merely a special case of a general model-based measurement approach to studying judicial behavior. When applied beyond aggregate merits votes, such measurement approaches make it possible to examine, collect data on, and incorporate legal doctrine in a meaningful way. We demonstrate how such measures, when augmented with jurisprudentially meaningful data, facilitate the study of substantive questions beyond the hackneyed “law vs. policy” debate, with case studies of the constitutional revolution of 1937, the dimensionality of the Supreme Court, the historical origins of the standing doctrine, statutory interpretation, and backlash.