Part 1: Reliability. Are all career assessments created equal? If not, are there ways to judge which ones are better than others? Career assessments vary widely in their quality, and yes, there are ways to differentiate good assessments from bad ones. This has nothing to do with how attractive a score report is, how well an instrument is marketed, or its popularity. Psychological assessment is principally about translating a theorized concept (like interests, or personality) into a measurable unit, and some assessments do this much better than others. Reliability and validity evidence serve as quality control criteria, and career assessments are only good if they succeed at reliably and validly measuring what they are intended to measure, and predicting the outcomes they are designed to predict. In this blog post—part one of two—we take a look at reliability.
The question of reliability is a critical one to ask of any assessment. Formally, reliability is “the degree to which scores are free from unsystematic error.” If scores on a test are free from unsystematic error, they will be consistent across related measurements. So an easy way to think about reliability is with the word “consistency.”
Note the word “unsystematic” in the definition. Scores on an assessment may contain error but still be reliable. For example, if a thermometer is always 5 degrees low, it will be reliable (meaning consistent) but reliably inaccurate. Or consider the case in which one professor is a harsh grader and another is a lenient grader. Each of them may be reliable, but one assigns grades that are systematically too low, and the other assigns grades that are systematically too high.
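To make the thermometer example concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the true temperature of 70 degrees, the noise levels, and the helper name `simulate_readings` are invented, not taken from any real instrument. Thermometer A is always about 5 degrees low but barely varies (systematic error only), while Thermometer B is right on average but scatters widely (unsystematic error).

```python
import random
import statistics

def simulate_readings(true_temp, bias, noise_sd, n, seed):
    """Return n simulated readings: true value + constant bias + random noise."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [true_temp + bias + rng.gauss(0, noise_sd) for _ in range(n)]

TRUE_TEMP = 70.0  # hypothetical true temperature

# Thermometer A: about 5 degrees low every time, with very little random noise.
biased_but_consistent = simulate_readings(TRUE_TEMP, bias=-5.0, noise_sd=0.1, n=1000, seed=1)

# Thermometer B: correct on average, but individual readings scatter widely.
unbiased_but_noisy = simulate_readings(TRUE_TEMP, bias=0.0, noise_sd=4.0, n=1000, seed=2)

for name, readings in [("A", biased_but_consistent), ("B", unbiased_but_noisy)]:
    mean_error = statistics.mean(readings) - TRUE_TEMP  # systematic part of the error
    spread = statistics.stdev(readings)                 # unsystematic part of the error
    print(f"Thermometer {name}: mean error {mean_error:+.2f}, spread {spread:.2f}")
```

In the sense used here, Thermometer A is the reliable one: its readings agree with each other, even though every one of them is wrong by about the same amount. Thermometer B is accurate on average, but any single reading cannot be trusted.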
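Types of Reliability. There are several types of reliability that can be used to evaluate an assessment’s scores, but two are relevant for most career assessments: test-retest reliability and internal consistency reliability. Test-retest reliability is the consistency of scores when the same people take the assessment on two occasions. Internal consistency reliability is the consistency of responses across the items that make up a single scale.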
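As a rough illustration of how these two kinds of evidence are often computed, here is a short Python sketch. It rests on assumptions: it uses Cronbach's alpha as the internal consistency coefficient and a Pearson correlation as the test-retest coefficient, which are common choices but not necessarily what any particular publisher reports. The response data are invented for the example, and the function names `cronbach_alpha` and `pearson_r` are our own.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item responses."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # regroup the data as one tuple per item
    item_variances = [statistics.variance(col) for col in items]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cross / (sx * sy)

# Invented data: 6 respondents answering a 4-item scale (1 to 5) at time 1.
time1_items = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]

# Invented scale totals for the same 6 respondents a few weeks later.
time2_totals = [17, 10, 19, 7, 12, 16]

time1_totals = [sum(row) for row in time1_items]

alpha = cronbach_alpha(time1_items)              # internal consistency
retest = pearson_r(time1_totals, time2_totals)   # test-retest reliability

print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Test-retest correlation: {retest:.2f}")
```

Both coefficients are typically at most 1, and values closer to 1 indicate more consistent scores. With real data you would want far more than six respondents before trusting either number.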
Do you know the reliability evidence of the career assessments you are currently using with students? The publishers of the instruments you use should make this evidence freely available. PathwayU does so on the resources page in its counselor portal. There, you can learn that the internal consistency reliabilities for our six interest scales, for example, are all in the mid-.70s or higher, a high degree of reliability given how short the scales are. (Longer scales are usually more reliable than shorter ones, but with an obvious trade-off: they take longer for users to complete.) Similarly, test-retest reliabilities for interests are high, reflecting the high degree of stability of vocational interests over time. Whatever instruments you use with students, make sure the scores have strong evidence of reliability.
Without that, there is no way they can be valid. What is validity? Stay tuned—this is the focus of Part 2 in this series.