Are your career assessments any good, scientifically speaking? Part 1

Part 1: Reliability. Are all career assessments created equal? If not, are there ways to judge which ones are better than others? Career assessments vary widely in their quality, and yes, there are ways to differentiate good assessments from bad ones. This has nothing to do with how attractive a score report is, how well an instrument is marketed, or how popular it is. Psychological assessment is principally about translating a theorized concept (like interests or personality) into a measurable unit, and some assessments do this much better than others. Reliability and validity evidence serve as quality-control criteria; career assessments are only good if they succeed at reliably and validly measuring what they are intended to measure, and at predicting the outcomes they are designed to predict. In this blog post, part one of two, we take a look at reliability.

The question of reliability is a critical one to ask of any assessment. Formally, reliability is “the degree to which scores are free from unsystematic error.” If scores on a test are free from such error, they will be consistent across repeated measurements. So, an easy way to think about reliability is with the word “consistency.”

Note the word “unsystematic” in the definition. Scores on an assessment may contain error but still be reliable. For example, if a thermometer always reads 5 degrees low, it will be reliable (meaning consistent) but reliably inaccurate. Or, consider the case in which one professor is a harsh grader and another is a lenient grader. Each of them may be reliable, but one assigns grades that are systematically too low and the other gives grades that are systematically too high.

Types of Reliability. There are several types of reliability that can be used to evaluate an assessment’s scores, but only two are usually relevant for most career assessments: test-retest reliability and internal consistency reliability.

  1. Test-retest reliability. This is a measure of stability, and it asks the question: How stable are scores on the assessment over time? Do you get the same results (or at least very close to the same) on two separate occasions? This is computed numerically using a correlation coefficient. The thing to keep in mind with test-retest reliability is that it only makes sense if the trait being measured is supposed to be stable. Most variables that career counselors want to assess (e.g., interests, values, personality, ability) are quite stable on average, once people reach early adulthood. Otherwise it would make little sense to measure these things and use them to inform career decisions that may have long-term implications.
  2. Internal consistency reliability. This is a measure of consistency within the test. How consistent are scores for the items on this test? Do all the items “fit together”? Do they all measure the same thing? The main type of internal consistency reliability is captured by coefficient alpha. Conceptually, you can think of it as splitting the measure in half many different times, in as many ways as possible, calculating the reliability coefficient for each pair of halves, and then averaging all of those split-half reliability coefficients. In practice, statistical software does this for us, so it is easy. Coefficient alpha can also be thought of as the degree to which each item contributes to the total score.

Do you know the reliability evidence of the career assessments you are currently using with students? The publishers of instruments you use should freely make this evidence available. PathwayU does so within the resources page in its counselor portal. There, you can learn that the internal consistency reliabilities for our six interest scales, for example, are all mid-.70s or higher, a high degree of reliability given how short the scales are. (Longer scales are usually more reliable than shorter ones, but there is a trade-off: they take longer for users to complete.) Similarly, test-retest reliabilities for interests are high, reflecting the high degree of stability of vocational interests over time. Whatever instruments you use with students, make sure the scores have strong evidence of reliability.

Without that, there is no way they can be valid. What is validity?  Stay tuned—this is the focus of Part 2 in this series.