Are your career assessments any good, scientifically speaking? Part 2

Part 2: Validity. In my last blog post, I introduced the idea of reliability (i.e., consistency) and noted that if scores on a career assessment are not reliable, they cannot be valid. But what does validity mean, and how can you tell whether an assessment provides valid scores or not?

Sometimes validity is referred to as addressing the question “does the assessment measure what it is designed to measure?” A broader and more appropriate way to think about validity addresses the question “does the assessment accomplish its intended purpose?” or “does the assessment meet the claims made for it?”

Types of Validity. There are three primary types of validity that are used to evaluate an assessment’s scores. (The list below has four types, but the first doesn’t really count.) They are as follows:

1. Face validity. This really isn’t validity in a scientific sense. If a measurement instrument has face validity, it just means it looks like it measures what it’s supposed to measure. If you have a measure of interests and the items look like they are tapping into a person’s interests, you have face validity. This can be good if it builds rapport with the user, but it doesn’t have any real scientific value. All things being equal, it’s good if you have it, but it doesn’t mean that the assessment is accomplishing its intended purpose.
2. Content validity. Content validity refers to how well an assessment covers all relevant aspects of the domain it is supposed to measure.

Conceptually, you could ensure content validity if you could write items to cover absolutely every detail about the construct you are setting out to measure. For example, if a test developer wanted to measure a particular style of leadership, that developer could write every single item she could possibly think of that would be relevant to that leadership style. Then she could take a random sample of those items and include them in the scale. Unfortunately, this is not only impractical, it is impossible.

Often, content validity is assessed by expert judgment. You could assess the content validity of an assessment by having an expert or experts examine the items and determine whether the items are a good representation of the entire universe of items. If the leadership style measure described above has low content validity, it would probably include a lot of items that aren’t relevant to that style, and it would probably leave out items that are very relevant to that style.

3. Criterion-related validity. This refers to how well an assessment correlates with performance or whatever other criterion you’re interested in. It answers the question: “Do scores on the measure allow us to infer information about performance on some criterion?” There are two types of criterion-related validity: concurrent and predictive.

a. Concurrent validity. This refers to how well the assessment scores correlate with some criterion when both measures are taken at the same time. For an interest inventory, for example, a good question is whether people who are, say, engineers score high on a scale designed to measure interest in engineering.
b. Predictive validity. This refers to how well the assessment scores correlate with a criterion measured in the future. For example, what percentage of people who score high on a scale designed to measure interest in a career field will end up working in that field down the road?
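To make the idea of "correlating scores with a criterion" concrete, here is a minimal sketch in Python. The data are invented purely for illustration, and the variable names (interest_scores, is_engineer_now, in_field_later) are hypothetical; real validity studies use much larger samples and report more than a single coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores on an "interest in engineering" scale for eight people.
interest_scores = [12, 18, 25, 31, 22, 35, 15, 28]

# Concurrent criterion (measured at the same time as the scale):
# 1 = currently works as an engineer, 0 = does not.
is_engineer_now = [0, 0, 1, 1, 0, 1, 0, 1]

# Predictive criterion (measured later, e.g. at a follow-up):
# 1 = ended up working in engineering, 0 = did not.
in_field_later = [0, 0, 1, 1, 1, 1, 0, 1]

concurrent_r = pearson_r(interest_scores, is_engineer_now)
predictive_r = pearson_r(interest_scores, in_field_later)

print(f"concurrent validity coefficient: {concurrent_r:.2f}")
print(f"predictive validity coefficient: {predictive_r:.2f}")
```

The only difference between the two coefficients is when the criterion was collected: at the same time as the assessment (concurrent) or later (predictive).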

4. Construct validity. Construct validity most directly addresses the question of “does the test measure what it’s designed to measure?” It refers to how well the test assesses the underlying construct that is theorized. To demonstrate evidence of construct validity, a test developer would show that scores on her measure have a strong relationship with certain variables (the ones that are very similar to what is being measured) and a weak relationship with other variables (those that are conceptualized as being dissimilar to the construct being measured). There are two types: convergent and discriminant.


a. Convergent validity. Convergent validity is the extent to which scores on a measure are related to scores on measures of the same or similar constructs. For example, let’s say your personality test has an extraversion scale. You might expect that the more extraverted a person is, the more likely they are to have high levels of sociability. If there is a strong positive relationship between scores on your extraversion measure and the scores on measures of sociability, then your scale’s scores have evidence of convergent validity.

b. Discriminant validity. Support for discriminant validity is demonstrated by showing that an assessment does not measure something it is not intended to measure. For example, if you have an extraversion scale in your measure, you might consider also administering a measure of emotional stability. We know that extraversion and emotional stability are two different things. If you ask people to take your scale and a measure of emotional stability and you find a small correlation between scores on these two scales, you have shown that your scale measures something other than emotional stability. Note that you haven’t shown what it does measure; you have only shown what it does not measure.
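The extraversion example can be sketched the same way. The Python snippet below uses made-up scores for eight people on three hypothetical scales and simply compares the two correlations. The numbers are illustrative assumptions, not results from any real instrument, and what counts as a "strong" or "small" correlation depends on the constructs and the research context.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scale scores for the same eight people.
extraversion = [10, 14, 18, 22, 26, 30, 34, 38]
sociability = [12, 15, 17, 24, 25, 29, 36, 37]          # similar construct
emotional_stability = [25, 18, 30, 21, 27, 19, 29, 23]  # different construct

# Convergent evidence: a strong relationship with a similar construct.
convergent_r = pearson_r(extraversion, sociability)

# Discriminant evidence: a weak relationship with a dissimilar construct.
discriminant_r = pearson_r(extraversion, emotional_stability)

print(f"extraversion vs. sociability:         r = {convergent_r:.2f}")
print(f"extraversion vs. emotional stability: r = {discriminant_r:.2f}")
```

In this toy data set the first correlation is high and the second is close to zero, which is the pattern a test developer would hope to see when gathering construct validity evidence.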

Remember too that an instrument’s scores can be reliable but not valid. However, if an instrument’s scores are not reliable, there is no way that they can be valid.

Why does this matter? If you want to know how “good” a career assessment is, ask this question: “What is the evidence of reliability and validity for how I plan to use this instrument?” A counselor should be able to answer this question, and a vendor or test developer should, upon request, provide some basic information (and links to more detailed information) showing that the assessment’s scores are reliable and valid. If no information about a particular assessment can be found, it is best to avoid that instrument, or at the very least to view scores generated by that assessment in only the most tentative way. For details on the evidence of validity for scores on PathwayU’s instruments, consult the resources page in our counselor portal, or contact us.
