Reliability in the Assessment of Learning

Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another and from one set of items to another. We have already given formulae for computing the reliability of a test. For internal consistency, for instance, we could use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
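As an illustration of the internal-consistency methods named above, the following minimal sketch computes KR-20, KR-21 and a split-half estimate for a small set of dichotomously scored items (1 = correct, 0 = incorrect). It uses the commonly cited textbook forms of these formulae, which may differ in detail from the ones given earlier. Several things here are assumptions made for the example: the function names, the sample data, the use of the population variance of total scores, and the odd-even split with the Spearman-Brown correction for the split-half estimate.

```python
from statistics import mean, pvariance


def kr20(scores):
    """KR-20 for a list of examinee rows of 0/1 item scores."""
    n = len(scores)          # number of examinees
    k = len(scores[0])       # number of items
    totals = [sum(row) for row in scores]
    var_total = pvariance(totals)   # assumption: population variance
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # proportion correct on item j
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


def kr21(scores):
    """KR-21: needs only the mean and variance of the total scores."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m = mean(totals)
    var_total = pvariance(totals)   # assumption: population variance
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(scores):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in scores]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]   # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)


# Hypothetical data: 5 examinees, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 3))        # 0.8
print(round(kr21(responses), 3))        # 0.667
print(round(split_half(responses), 3))  # 0.88
```

KR-21 assumes that all items are equally difficult, so it never exceeds KR-20 on the same data; here the item difficulties differ, and KR-21 is correspondingly lower.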

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes. As reliability improves, validity may improve (or it may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.

The following table presents a standard for interpreting reliability coefficients that is followed almost universally in educational tests and measurement.

Reliability      Interpretation
.90 and above    Excellent reliability; at the level of the best standardized tests.
.80 - .90        Very good for a classroom test.
.70 - .80        Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.
.60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 - .60        Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 or below     Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
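
To apply the table to a computed coefficient, a simple lookup such as the sketch below may help. The function name and the shortened labels are hypothetical. Because adjacent bands in the table share their end points, the handling of boundary values is an assumption: each lower bound is treated as inclusive, except that a coefficient of exactly .50 is placed in the lowest band, following the wording ".50 or below".

```python
def interpret_reliability(r):
    """Return a short interpretation for a reliability coefficient r."""
    if r >= 0.90:
        return "Excellent reliability"
    if r >= 0.80:
        return "Very good for a classroom test"
    if r >= 0.70:
        return "Good for a classroom test"
    if r >= 0.60:
        return "Somewhat low; supplement with other measures"
    if r > 0.50:   # assumption: exactly .50 falls in the lowest band
        return "Suggests need for revision of test"
    return "Questionable reliability; needs revision"


print(interpret_reliability(0.83))  # Very good for a classroom test
print(interpret_reliability(0.50))  # Questionable reliability; needs revision
```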
