
What can we learn from the Implicit Association Test? A Brains Blog Roundtable

Recently there has been a lot of discussion of the value of the Implicit Association Test (IAT) as a measure of implicit bias — discussion generated largely by a new paper by Calvin Lai, Patrick Forscher and their colleagues that presents the results of a meta-analysis of studies conducted using the IAT, plus a provocative article in New York magazine by Jesse Singal that discusses that paper and the methodological controversy it’s a part of. The title of Singal’s article? “Psychology’s Favorite Tool for Measuring Racism Isn’t Up to the Job: Almost two decades after its introduction, the implicit association test has failed to deliver on its lofty promises”. (Please bear in mind that headlines are usually written by someone other than the author.)

In light of this I invited several philosophers to share their views in a roundtable discussion of the value of the IAT and the general question of how to understand, and properly measure, implicit bias. (For other coverage, see this post at Daily Nous, as well as our series of posts last year from the authors of chapters in Michael Brownstein and Jennifer Saul’s Implicit Bias and Philosophy.) The participants in this roundtable are Michael Brownstein (John Jay College of Criminal Justice, CUNY), Nick Byrd (Florida State University), Keith Frankish (The Open University), Jules Holroyd (Sheffield), Neil Levy (Oxford / Melbourne), Edouard Machery (University of Pittsburgh), Alex Madva (Cal Poly Pomona), Shannon Spaulding (Oklahoma State University), and Chandra Sripada (University of Michigan).

You can read each contribution by clicking on the author’s name below. Many thanks to all those involved!


Michael Brownstein:

I won’t say much about the recent coverage of implicit attitude research in, for example, the Chronicle of Higher Education and New York Magazine. Both articles have their virtues and vices. The New York Magazine piece, while unusually detailed and careful compared with science reporting elsewhere, is one-sided. Several of the researchers quoted in it have complained on social media that the author included comments in which they described problems with implicit attitude research but excluded the rest of what they said (which was more positive).

I’ll focus instead on some take-aways from recent analyses of the implicit attitude literature (in particular, from Patrick Forscher, Calvin Lai, and colleagues’ meta-analysis of change in implicit attitudes (here) and a bit from Fredrick Oswald and colleagues’ meta-analysis of the predictive validity of the race-IAT (here)).

I think it is clear that general measures of implicit attitudes (e.g., as represented on a race-evaluation IAT) don’t predict specific individual behavior (e.g., biased grading) very well. However, this should be unsurprising, for several reasons.

First, predicting behavior is hard. Research on implicit social cognition arose out of the recognition that self-reported measures of attitudes don’t predict behavior very well. In some contexts, self-report measures outperform indirect measures; in other contexts, it is the opposite. Likewise, changing attitudes in a way that changes behavior is hard. It is not a special problem facing implicit attitude research to identify manipulations that do a good job of changing behavior by changing attitudes.

Second, general measures of preferences shouldn’t be expected to predict specific behaviors in specific contexts very well. The attitude-behavior link is highly context- and person-specific. Some models of implicit attitudes identify key moderators (e.g., Olson and Fazio’s MODE model), but more work in this vein is needed. Relatedly, Alex Madva and I have recently argued (here) that indirect measures may be improved by targeting the activation of specific associations in specific contexts with specific behavioral outcomes. Broadly speaking, I think this should be the take-away from the recent meta-analyses. Not that implicit attitude research is invalid, but that there is much room for improvement for measures of implicit attitudes (again, just as there is for measures of self-reported attitudes). Note also that a related point can be made in response to other psychometric critiques of implicit attitude research (e.g., low test-retest validity; low correlation between measures): what we need are theories that make principled predictions about the specific conditions under which test-retest correlations will be high or low and conditions under which various measures will or won’t correlate. For examples of what I have in mind, see here and here.

Third, while some researchers have been guilty of overselling the power of indirect measures, others have been recommending caution for years. For example, in 2012, Brian Nosek cautioned against using the IAT as a diagnostic tool for predicting individual behavior (here). The fact that the IAT is not suitable as a diagnostic tool, or, worse, as a tool for classifying kinds of people (e.g., “implicit racists”), does not mean that it cannot make predictions about human behavior that are both theoretically interesting and socially important. Greenwald, Banaji, and Nosek’s reply to the Oswald et al. meta-analysis discusses this issue (here).

The Forscher, Lai, et al. meta-analysis focuses specifically on change in implicit attitudes. I think there are important findings in this paper. (A note of cautious caution: the “multivariate network meta-analysis” that they use is new and, so far as I understand, relatively untested. So caution should be exercised in drawing conclusions from their analysis. I say this cautiously, however, and hope those with more statistical savvy than I have will weigh in.) Arguably the most striking finding is that changes in implicit attitudes don’t appear to cause changes in behavior. While related, this is not a claim about the predictive validity of indirect measures. However, I think we should be extra cautious about this claim. Of the 426 studies Forscher, Lai, et al. examined (427 in the pre-print was a typo; also, the final number will be closer to 500), only 22 included a longitudinal component. And of those, only a few included a behavioral outcome measure. In fact, only 15% of all the studies included in the meta-analysis included a behavioral outcome measure (and this includes measures of intentions to behave in such-and-such a way and measures that simply ask people how they would behave in a hypothetical situation). This means that the vast majority of the studies under consideration in this paper used a one-shot manipulation of implicit attitudes, and, of those that examined changes in attitudes over time, most didn’t examine changes in behavior, and those that did examine behavior sometimes took intentions and hypothetical predictions as proxies for actual behavior. A priori, one might think that the latter would be the crucial studies. In other words, to really know whether changes in implicit attitudes cause changes in behavior, one might want to look at studies that create durable attitude change and then examine the effects of those lasting changes on behavior.

It’s not that Forscher, Lai, and colleagues chose not to include such studies; rather, such studies almost entirely have not been done. (Calvin Lai’s earlier studies showing that the effects of manipulations of implicit attitudes don’t last long (here) use extremely minimal manipulations (e.g., lasting 5 minutes). Future research will hopefully look at more robust forms of implicit attitude change over time.)
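To make those proportions concrete, here is a back-of-the-envelope sketch that uses only the figures quoted above (426 studies, 22 with a longitudinal component, 15% with a behavioral outcome measure). The variable names are illustrative, and the rounded count of behavioral-outcome studies is an approximation derived from the 15% figure, not a number reported in the paper.

```python
# Figures quoted in the discussion above; names are illustrative only.
TOTAL_STUDIES = 426
LONGITUDINAL_STUDIES = 22
BEHAVIORAL_SHARE = 0.15  # share of studies with any behavioral outcome measure


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction."""
    return part / whole


longitudinal_share = share(LONGITUDINAL_STUDIES, TOTAL_STUDIES)
# Approximate count implied by the 15% figure (an estimate, not a reported value).
approx_behavioral_studies = round(BEHAVIORAL_SHARE * TOTAL_STUDIES)

print(f"Longitudinal studies: {longitudinal_share:.1%} of the total")
print(f"Studies with a behavioral outcome: roughly {approx_behavioral_studies}")
```

On these numbers, the studies best placed to test whether durable attitude change alters behavior are a small subset of an already small subset.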

In their paper, Forscher, Lai, and colleagues echo these sentiments, writing, “the present meta-analysis speaks more to the processes that produce short-term shifts in implicit bias than to the processes that produce lasting changes. Knowledge of how long-term change in implicit bias occurs is critical for developing a complete theoretical account of implicit bias. Insofar as some forms of problematic behavior are the result of automatic processes, understanding long-term change is also critical for developing interventions to resolve problems caused by this behavior. What processes determine whether a shift in implicit bias will be temporary or long-lasting? When will a shift in implicit bias translate into a permanent change in orientation? Theory and practice-oriented researchers alike would be well-served to contend with these questions.”

As Kate Ratliff put it on Facebook, it is a long jump from the very real challenges facing implicit attitude research to the “implicit bias isn’t real and the IAT is bogus” rhetoric found in recent press articles. (I would encourage readers to check out the discussion on the Psych Map Facebook group of these issues.)

Nick Byrd:

The implicit association test (IAT) is one way to measure implicitly biased behavior. In the IAT, “participants […] are asked to rapidly categorize two [kinds of stimuli] (black vs. white [faces]) [into one of] two attributes (‘good’ vs. ‘bad’). Differences in response latency (and sometimes differences in error-rates) are then treated as a measure of the association between the target [stimuli] and the target attribute” (Huebner 2016). Likewise, changes in response latencies and error-rates resulting from experimental interventions are treated as experimentally manipulated changes in associations.

As philosophers, we are in the business of arguments and their propositions, not associations. So we might wonder whether we can use arguments to intervene on our implicitly biased behavior. And it turns out that we can, even if the findings are not always significant and the effect sizes are often small. Some think that this effect of arguments on IAT performance falsifies the idea that implicitly biased behavior is realized by associations (Mandelbaum 2015). The idea is that propositions are fundamentally different from associations, so associations cannot be modified by propositions. So if an argument’s propositions can change participants’ implicitly biased behavior, as measured by the IAT, then implicit biases might “not [be] predicated on [associations] but [rather] unconscious propositionally structured beliefs” (Mandelbaum 2015, bracketed text and italics added). But there is some reason to think that such falsification relies on oversimplification. After all, there are many processes involved in our behavior, implicitly biased or otherwise. So there are many processes that need to be accounted for when trying to measure the effect of an intervention on our implicitly biased behavior: e.g., participants’ concern about discrimination, their motivation to respond without prejudice (Plant & Devine, 1998), and their personal awareness of bias. So what happens when we control for these variables? In many cases, we find that the effects of argument-like interventions on implicitly biased behavior, as measured by the IAT, are actually explained by changes in participants’ concern(s), motivation(s), and/or awareness, but not by changes in associations (Devine, Forscher, Austin, and Cox 2012; Conrey, Sherman, Gawronski, Hugenberg, and Groom 2005).


Conrey, F. R., Sherman, J. W., Gawronski, B., Hugenberg, K., & Groom, C. J. (2005). Separating Multiple Processes in Implicit Social Cognition: The Quad Model of Implicit Task Performance. Journal of Personality and Social Psychology, 89(4), 469–487.

Devine, P. G., Forscher, P. S., Austin, A. J., & Cox, W. T. L. (2012). Long-term reduction in implicit race bias: A prejudice habit-breaking intervention. Journal of Experimental Social Psychology, 48(6), 1267–1278.

Huebner, B. (2016). Implicit Bias, Reinforcement Learning, and Scaffolded Moral Cognition. In Implicit Bias and Philosophy, Vol 1.

Mandelbaum, E. (2016). Attitude, Inference, Association: On the Propositional Structure of Implicit Bias. Noûs, 50(3), 629–658.

Plant, E. A., & Devine, P. G. (1998). Internal and external motivation to respond without prejudice. Journal of Personality and Social Psychology, 75(3), 811–832.

Keith Frankish:

Implicit Bias and the IAT

To be implicitly biased is to display discriminatory behaviour that one does not consciously intend or endorse. One sincerely affirms (say) that black people are no less smart than white people, yet behaves as if they are. The IAT is widely thought to provide strong evidence for the existence of implicit bias, but I am sceptical. There are many methodological concerns about the test (summarized by Jesse Singal in his recent New York Magazine article), but the core problem is simpler and deeper. The IAT aims to measure associations between stimuli (typically words and images), and we cannot extrapolate from such associations to behaviour. I assume that intelligent behaviour is the product of practical reasoning, and if it is systematically biased, then this is because the agent holds biased beliefs (nonconscious ones, if the bias is implicit). Yet we cannot infer a person’s beliefs from their associations between stimuli. A given word-image association might be accompanied by a wide range of different beliefs about the relation between the represented objects, or by no specific belief at all. We shouldn’t expect an association test to tell us much about behaviour.

This doesn’t mean that I doubt the existence of implicit bias itself. In fact, I suspect that it is widespread. Much of our behaviour is under the control of nonconscious mental processes, and it wouldn’t be surprising if these do not always mirror our avowed attitudes. We know from everyday observation that people often fail to live up to their ideals and lack insight into their motivations, and the human propensity for self-deception has been a common theme in literature since ancient times. Indeed, it may be that belief in implicit bias has fostered interest in the IAT, rather than the other way round; the test seems to offer a scientific basis for something we find intuitively plausible.

So I believe in implicit bias while doubting that the IAT does much to confirm its existence. In fact, I’d go further. The IAT may offer too comforting a picture of implicit bias. It encourages us to think of our biases as peripheral factors — culturally acquired associations that interfere with our explicit egalitarian attitudes. But they may in fact be much more central to who we are. It may be that our behaviour is wholly the product of implicit, and often biased, mental states, and that our avowed views are merely window dressing. Perhaps we assert egalitarian views, not because we really believe them, but because we have a strong implicit desire to conceal our biases. Perhaps we are unwitting hypocrites, mistaking pragmatic self-presentation for sincere belief. This isn’t a view I endorse (for my considered view, see my 2016), but in treating implicit cognition as the default, I think it puts the emphasis in the right place.


Frankish, K. (2016). Playing double: Implicit bias, dual levels, and self-control. In M. Brownstein and J. Saul (eds.), Implicit Bias and Philosophy Volume I: Metaphysics and Epistemology (pp. 23–46). Oxford University Press.

Jules Holroyd:

We have plenty of evidence that people discriminate unintentionally. Critical race scholars have long discussed the testimony of victims and witnesses of such discrimination (see Baldwin, Lourdes, hooks, Yamato, and more recently Rankine, Coates inter alia). Research programs into implicit bias are valuable in bringing a richer understanding of the cognitive mechanisms underpinning such unintentional discrimination. But what of the recent challenges to these research programs (e.g. here and here)?

Does the IAT tell us that we are implicitly biased? No: it tells us that individuals – pervasively – show certain patterns of response, from which it is (often) inferred that they harbour certain kinds of associations. Whether these associations are implicit biases will depend on what implicit biases are, and this is a hotly contested philosophical issue.

Does the IAT tell us that we will discriminate? No. It tells us about the presence of a risk factor for discrimination – our very own cognitions – and one we are often ill-placed to mitigate or even detect.

If the IAT turns out not to be a reliable measure or valid predictor of behaviour, does that debunk the whole research program on implicit bias? No: many other indirect measures have tracked unreported cognitive associations, or unintended behavioural responses (shooter bias tasks, studies on microbehaviours, CV & hiring decision studies, monitoring of doctors’ patterns of prescription, and so on).

Does the fact that interventions to change implicit bias have been found largely ineffective in changing behaviour mean that “we cannot claim that implicit bias is a useful target of intervention” (Forscher, quoted in this article)? My view is that, whilst biases are malleable, changing individual cognition is not obviously the right starting place. It is no surprise that manipulating one aspect of our vastly complex and socially embedded cognitions does not result directly in a change in behaviour (note: changing our explicit beliefs is also often ineffective in bringing about changes in behaviour, but that can still have value and important downstream effects). Discussions of implicit bias have motivated many people to consider and enact changes to institutional structures and procedures that may make them more robust against the possibility of implicit bias. In our own discipline, this has included working to challenge under-representation on reading lists, in conference programs, and amongst our academic staff; the anonymous marking of undergraduate essays, evaluation of applications; more rigorous uniformity in interviewing processes… These sorts of interventions seem multiply justified: they may insulate procedures from bias; they may even, downstream, change biases; but more importantly they directly target the goals of tackling marginalisation and under-representation. To my mind, the research programs on implicit bias help us to motivate and think about how best to formulate these kinds of interventions, and it is here that our efforts are best placed.

Neil Levy:

The reception of an exciting idea often passes through three stages. First, it is embraced as the key to explaining something we care about. Then there is a backlash, and the idea is rejected as explaining nothing. Of course, sometimes the backlash is fully justified, but when the idea actually has something going for it, we often move to a third stage, in which it is accepted as a useful tool, explaining, perhaps, some of the variance in the phenomenon we’re interested in. After all the hype surrounding implicit bias, we were due for a correction, and we’re certainly getting one. While implicit bias has been overhyped, though, we seem in danger of rejecting it entirely, on grounds that are spurious.

Jesse Singal is certainly right in suggesting that there is ongoing controversy over what the IAT measures. Is the Black/Bad association driven by an implicit belief that blacks are bad, or that black people are treated badly (or by something else again)? But we need to distinguish questions about the content and the structure of implicit attitudes (are they mere associations, unconscious beliefs, patchy endorsements, or something else altogether?) from questions about their effects on behaviour. Quite different states can have similar effects in some conditions. Implicit processes play an essential role in all deliberation – winnowing options and automatically assigning weights to them – without which we would face an intractable problem of combinatorial explosion. My implicit belief that blacks are bad (potentially a condemnable state of mine) might cause me to prefer a white job applicant over a black one. But so might my (potentially praiseworthy) implicit belief that blacks are treated badly: having this laudable content may be consistent with the state leading me to prefer the white candidate. It may lead me to take the same range of options seriously.

It is true that implicit bias apparently explains only a small percentage of variance in behaviour. That fact does not make it unimportant. As Greenwald et al. note, the effect size estimated by Oswald et al. is higher than the effect size of a daily aspirin. But taking a daily aspirin would prevent more than 400,000 heart attacks in the United States annually. Conscious deliberation is extremely powerful: more powerful than many writers on implicit bias appreciate. Most of the time, our behaviour is very much better explained by our conscious attitudes than by our nonconscious ones. But in certain circumstances – when we respond under time pressure, stress or cognitive load, and when matters are very evenly balanced – we can expect implicit bias to make a difference. When things are evenly poised (as they routinely are in the context of jobs in philosophy, for example), a very small effect size can make a decisive difference.
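The two halves of this point can be put in rough numerical terms. The sketch below first converts a correlation into the share of variance it explains, then uses a toy model of a close decision. The model is an assumption made for illustration: two equally qualified candidates, a perceived quality gap that is normally distributed noise, and a small constant bias added to that gap. The function names and all numbers are hypothetical; they are not taken from Greenwald et al. or Oswald et al.

```python
import math


def variance_explained(r: float) -> float:
    """Share of variance accounted for by a correlation r (r squared)."""
    return r * r


def prob_prefer_favoured(bias: float, noise_sd: float) -> float:
    """P(favoured candidate is chosen) between two equally qualified candidates.

    Assumed toy model: the perceived quality gap is Normal(bias, noise_sd),
    and the favoured candidate is chosen whenever that gap is positive.
    """
    z = bias / noise_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


# Hypothetical values, chosen only to illustrate the shape of the argument.
r = 0.15
p = prob_prefer_favoured(bias=0.1, noise_sd=1.0)
extra_per_1000 = 1000 * (p - 0.5)

print(f"Variance explained by r = {r}: {variance_explained(r):.2%}")
print(f"P(favoured candidate chosen): {p:.3f}")
print(f"Extra decisions tipped per 1000 close calls: {extra_per_1000:.0f}")
```

Under these assumptions a correlation that explains only a few percent of the variance still shifts the outcome of evenly balanced decisions away from 50/50, and the shift accumulates over many such decisions.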

Edouard Machery:

Should We Throw the IAT on the Scrap Heap of Indirect Measures?

Social psychology emerged as a distinct field of psychology in part to measure people’s preferences or, as they are known in psychology, people’s attitudes. Being able to measure people’s attitudes accurately would of course be of great interest to many potential consumers, and, even more interesting, funders, of psychology, from politicians to advertisers to corporations. Social psychologists have long tried to develop measures (“indirect measures”) that go around the limitations that affect self-reports of attitudes, such as presentation concerns and limited awareness of one’s own attitudes. Sadly, it is fair to say that these efforts have been for naught: The history of attitude measurement in psychology is one of exuberant, irrational enthusiasm followed by disappointment when the shortcomings of the new indirect measures come to light.

The recent history of the implicit association test is just the most recent episode in this sad history of irrational exuberance followed by disappointment. We were told that the IAT measures a novel type of attitude (mental states that are both unconscious and beyond intentional control, which we’ve come to know as “implicit attitudes”) and that people’s explicit and implicit attitudes can diverge dramatically: As we’ve been told dozens of times, the racial egalitarian can be implicitly racist, and the gender egalitarian can implicitly be a sexist pig! And law enforcement agencies, deans and provosts at universities, pundits, and philosophers concerned with the sad gender and racial distribution of philosophy have swallowed this story.

But then we’ve learned that people aren’t really unaware of whatever it is that the IAT measures. So, whatever it is that the IAT measures isn’t really unconscious. And we’ve learned that the IAT predicts only a very small proportion of the variance in behavior. In particular, only a tiny proportion of biased behavior correlates with IAT scores. We have also learned that your IAT score today will be quite different from your IAT score tomorrow. And it is now clear that there is precious little, perhaps no, evidence that whatever it is that the IAT measures causes biased behavior. So, we have a measure of attitude that is not reliable, does not predict behavior well, may not measure anything causally relevant, and does not give us access to the unconscious causes of human behavior. It would be irresponsible to put much stock in it and to build theoretical castles on such quicksand.

Lesson: Those who ignore the history of psychology are bound to repeat its mistakes.

Alex Madva:

I love cheesesteaks. If I took an IAT comparing cheesesteaks to just about any other food (except maybe cheeseburgers, or my grandmother’s kibbeh), I bet that I’d more strongly associate images of cheesesteaks with words like “good,” “pleasant,” or “tasty.” However, I don’t eat cheesesteaks, for ethical reasons. So the range of behavior predicted by my abiding love of cheesesteaks is relatively small. It still predicts some behavior: for example, when you ask me if I like them, I’ll tell you. But suppose I inhabit a social world that says it’s immoral to eat or even to like cheesesteaks. Then maybe I wouldn’t openly admit my love of cheesesteaks (perhaps I’m embarrassed, or I have a bunch of conflicting feelings and I only report the ones that sound ethical… or maybe living in a world where it’s taboo to like or even talk about cheesesteaks makes self-knowledge about this topic unusually difficult). Then maybe indirect measures like the IAT would be the best way to acquire (partial, non-decisive) evidence about whether I like them. Of course, if my love of cheesesteaks is correlated with so little of my behavior, you might wonder why anybody would be interested in uncovering my cheesesteak preferences in the first place.

Well, let’s dwell a little longer in the nearby world where liking cheesesteaks is widely regarded as immoral. Suppose that in this world it’s nevertheless the case that a lot of cheesesteaks are getting eaten. Field studies find many people eating cheesesteaks despite insisting that they don’t like them. Lab experiments show that sometimes people nibble on cheesesteaks when they think nobody else is looking, or even gobble down whole cheesesteaks without realizing that they’ve done so! How bizarre! This requires explaining. We identify some conditions in which people can be coaxed into admitting that they like cheesesteaks (for example, when they see prominent politicians boast about loving cheesesteaks and thereby re-normalize unacceptable behavior), and we develop a bunch of powerful theories and evidence about what those conditions are. We also identify conditions where people deny liking cheesesteaks, but eat them nevertheless, and this behavior is predicted to some extent by cheesesteak IATs. We develop compelling theories and evidence about the conditions under which cheesesteak IATs are more and less likely to predict behavior. For example, if I’ve just eaten three cheesesteaks and I feel uncomfortably full, or if I’ve just watched a film about where cheesesteak ingredients come from, then I am temporarily less likely to have pro-cheesesteak IAT scores.

We even deliver principled reasons and empirical evidence to explain why cheesesteak IATs are often worse at predicting behavior than IATs about other topics are. What I see when I look at all this evidence is us gaining knowledge about ways to improve cheesesteak IATs, and us learning more about the precise conditions when cheesesteak IATs predict behavior and when they don’t. All these complexities are washed away, however, when someone comes along and does a meta-analysis.

Here’s an example (moving on from the belabored cheesesteak analogy, which sounds like a good name for a jam band). I recently came across a paper by Levinson, Smith, and Young (2014) which developed a sort of “do blacks matter?” IAT, and found that individuals tended to associate white faces with words like “merit” and “value” and black faces with words like “expendable” and “worthless.” This measure predicted several things. For example, mock jurors with stronger racial bias on this measure were comparatively more likely to sentence a black defendant to death rather than life in prison. That correlation provides some initial evidence that the race-value IAT really is tracking, at least to some extent, something like a disposition to devalue black lives. They also found that another IAT, which used words including “lazy” and “unemployed,” did not predict death sentencing. Now, it could be that the significant correlations found with the race-value IAT were flukes. We’d need more studies to know. But from the perspective of a meta-analysis, what we have is one IAT that predicted behavior and one that didn’t: more grist for the miller who says the IAT is an inconsistent predictor of behavior. But in hindsight it actually makes sense that one of these predicted this particular judgment and one didn’t. Also, both measures are exploratory. We are figuring out as we go which forms of the IAT better predict which behaviors in which contexts. Using either of these findings in a meta-analysis is, to my mind, inappropriate and misleading. Given that they are two distinct IATs, it’s not even clear to me why we are lumping them together in one meta-analysis.

Inferring from the fact that some IATs are bad predictors in some contexts to the general conclusion that IATs are bad predictors strikes me as pretty flawed reasoning. Suppose we want to know whether voting patterns are predicted by beliefs about cutting taxes for the rich. Suppose we find in several studies that there’s little to no correlation (e.g., because the Republican base does not support cutting taxes for the rich). Should we conclude that BELIEFS IN GENERAL don’t predict voting patterns? Or that MEASURES OF BELIEF IN GENERAL don’t predict voting patterns? Or even MEASURES OF BELIEFS ABOUT TAXES? Such inferences would be absurd. What about other beliefs that might predict voting patterns, other measures of belief with slightly different wording, etc.? Suppose we’ve been trying to figure out which beliefs predict voting behavior (and how) and we’ve been trying out a whole bunch of different measures of belief in an exploratory way, including bizarre measures and beliefs that a priori we wouldn’t expect to correlate with voting behavior. A meta-analysis on the relationship between “measures of belief” and voting behavior would then surely reveal low correlations.
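A small numerical sketch may help show how this washing-out can happen. It pools several correlations using Fisher's z-transform with weights of n - 3, a standard fixed-effect way of averaging correlations. This is not a reconstruction of the method used in any of the meta-analyses discussed here. The correlations and sample sizes are hypothetical, standing in for one well-targeted measure and several exploratory ones.

```python
import math


def pooled_correlation(studies: list[tuple[float, int]]) -> float:
    """Fixed-effect pooled correlation via Fisher's z-transform.

    Each study is (r, n). Weights are n - 3, the inverse of the
    sampling variance of the z-transformed correlation.
    """
    weighted_z = sum((n - 3) * math.atanh(r) for r, n in studies)
    total_weight = sum(n - 3 for _, n in studies)
    return math.tanh(weighted_z / total_weight)


# Hypothetical (correlation, sample size) pairs, for illustration only.
studies = [
    (0.40, 100),   # a measure well matched to the behavior in question
    (0.02, 100),   # exploratory measures with no principled link to it
    (0.00, 100),
    (-0.03, 100),
    (0.05, 100),
]

pooled = pooled_correlation(studies)
best = max(r for r, _ in studies)
print(f"Pooled r = {pooled:.2f}; best single measure r = {best:.2f}")
```

With these made-up inputs the pooled estimate sits far below the best single measure, which is the pattern the paragraph above describes: a low average can coexist with one measure that predicts the behavior reasonably well.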

There is clearly room for improving the IAT and other indirect measures. There’s already tons of theory and evidence about the limitations and possibilities for improvement, and there’s more coming out all the time. One recent example is Cooley and Payne’s finding that using images of groups of people, rather than isolated individuals, improved the AMP in various ways, including test-retest reliability. Maybe this change would also improve the IAT. Even if it does, we likely won’t see Project Implicit “switch over” to a new and improved version, for a variety of reasons, but mostly, it seems to me, because of institutional inertia. I think it’s understandable but unfortunate that the field basically settled on what they took to be a “good enough” measure.

All the improvement in the world will only take associative and attitudinal measures so far. Any given attitude could lead to radically different behavior depending on what else is going on in a person’s mind or context. For example, a working paper by Meier, Schmid, and Stutzer found that people were less likely to vote against the status quo if it was raining on election day. This tendency has evidently swayed several elections in Switzerland. So even if we develop a bunch of good evidence and theories to explain how attitudes predict voting behavior, there will always be further contextual monkey wrenches like this getting thrown into the outcome. (However, I also can’t help but wonder whether we could try to debias this very disposition—which could itself be construed as an attitude—perhaps by encouraging people to think “when it rains, I’ll vote for change!” or to form rain-change automatic associations.)

There’s also plenty of room for exploring alternative ways to get at people’s attitudes, beyond both associative measures and explicit measures like feeling thermometers. In a forthcoming paper, Guillermo del Pinal, Kevin Reuter, and I present some initial evidence that one of the notorious gender stereotypes plaguing fields like philosophy—that women have less innate brilliance or raw talent—has a different conceptual structure from a brute association. We offer principled reasons for thinking this stereotype won’t show up on associative measures like the IAT but will affect various judgments and behaviors. We also think this bias will be less susceptible to contextual variation. Now, we do not frame these findings as a criticism of the IAT or of research on associative biases in general. There is a whole lot of discrimination to explain, and the mind is populated with an abundance of biases to help explain it (and of course there will also be lots of structural factors as well!). We simply suggest it’s a mistake to think that all biases should be understood in associative terms.

On the specific meta-analysis by Forscher, Lai, et al., I would just point out that the possibility of interventions that change test scores without equally strong changes on the actual construct of interest is a ubiquitous problem, which affects everything from medicine (e.g., changing cholesterol without improving heart conditions) to education (e.g., teaching to the test rather than teaching real skills). I don’t personally care much about changing IAT scores. I care about changing the affective-cognitive-motivational-behavioral dispositions that the IAT scores are intended to track. If after going through some prejudice-reducing intervention, people’s IAT scores become less reliable but their behavior becomes less biased, so be it. Suitably improved indirect measures might still be useful for helping us identify our biases, even if they become less useful after debiasing interventions.

Shannon Spaulding:

The IAT is meant to measure implicit bias. The recent critiques of the IAT tend to focus on the robustness of IAT results, the relation between IAT scores and other measures of bias, and whether IAT scores predict discriminatory behavior. Although these critiques are legitimate in a way, the problem is not so much with the IAT as with the fact that we often reduce implicit bias to an IAT score. The IAT is just one way of measuring implicit bias. (Other measures include lexical decision tasks, sequential priming, word completion tasks, go/no-go association tasks, false memory tasks, etc.) In my view, the IAT tracks salient associations between categories (e.g., race or gender) and features (e.g., dangerous or family-oriented). These salient associations are real – i.e., the IAT really can detect biases in one’s associations – but they are highly unstable and can vary even with small changes in context. For example, a typical White American’s association between a visual representation of a Black face and a negative feature can (and does) break down when that Black face is presented in the background context of a church interior as opposed to an urban street corner (Wittenbrink, Judd, & Park, 2001). The negative association salient in the latter context is not salient in the former context. A similar thing happens when the faces presented are not just generic faces of White or Black men but are faces of well-known liked/disliked people, e.g., Adolf Hitler vs. Michael Jordan (Govan & Williams, 2004). This kind of instability in IAT results is similar to the instability seen in the results of priming studies of implicit bias. The fact that the salient associations are unstable does not mean they aren’t tapping into a real bias. They are. But it’s just not the kind of bias that really concerns us when we are talking about implicit bias. We are deeply concerned about the more stable, deep-seated biases, the kind of biases that predict discriminatory and prejudicial behavior.

One way of understanding the more deep-seated biases is in terms of how central a feature is to our concept, e.g., how central being family-oriented is to our concept of women or how central being dangerous is to our concept of Black men. The centrality of a feature for a concept determines its cross-contextual stability, i.e., whether the association will survive background changes. It is the kind of measure that predicts our inferences and behavior. The IAT is a limited tool, and it simply is not designed to detect and measure such biases. So, if we are interested in implicit biases that reliably predict how we think and behave, it would be useful to focus more on something like conceptual centrality. There are measures of conceptual centrality that could be deployed to study implicit biases, e.g., Carey, 2009; Johnson and Keil, 2000; Sloman, et al., 1998. This hasn’t been the focus of empirical or philosophical discussions of implicit bias, but going forward it should be.


Carey, S. (2009). The Origin of Concepts. Oxford University Press.

Govan, C. L., & Williams, K. D. (2004). Changing the affective valence of the stimulus items influences the IAT by re-defining the category labels. Journal of Experimental Social Psychology, 40(3), 357-365.

Johnson, C., & Keil, F. C. (2000). Explanatory understanding and conceptual combination. In F. C. Keil & R. A. Wilson (Eds.), Explanation and Cognition (pp. 328-359). Cambridge, MA: MIT Press.

Sloman, S. A., Love, B. C., & Ahn, W. K. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22(2), 189-228.

Wittenbrink, B., Judd, C. M., & Park, B. (2001). Spontaneous prejudice in context: variability in automatically activated attitudes. Journal of Personality and Social Psychology, 81(5), 815.

Chandra Sripada:

Putting “tiny” correlations between implicit attitudes and behavior in perspective

Race IAT scores account for just 1% or 2% of the variance in laboratory measures of discriminatory behavior. Many critics seize on this observation to dismiss the race IAT: the test, it is said, has inadequate predictive validity. I believe this criticism is misguided, and an example from baseball (inspired by Abelson 1985) helps to show why. There are one hundred batters who have differing levels of skill at hitting the ball, captured by their batting averages. The batting averages (and thus the players’ skill levels) are distributed normally with a mean of 0.2 and standard deviation of 0.05 (not too dissimilar from the distribution in Major League Baseball). Skill level here is clearly a powerful predictor of ball hitting behavior: a player in the 95th percentile of skill level hits well over a hundred more balls over the course of a 1,000 at-bat season than a player in the 5th percentile. But what is the correlation between skill level and getting a hit at a single at-bat? It is 0.12, and so skill level accounts for 1.4% of the variance in ball hitting. Clearly anyone citing this meager correlation to dismiss the predictive validity of batting skill is making a serious mistake.

The lesson here is that when the construct of interest is expected to influence behavior repeatedly across a multitude of occasions, we need a more nuanced understanding of predictive validity. One’s racial attitudes obviously fit this expectation. The tiny correlations referenced above for the race IAT are mostly based on observations of a single occasion of behavior. It is possible, then, that, as in the baseball example, “cumulative” prediction of behavior across multiple occasions remains strong.
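For readers who want to check the arithmetic behind the baseball example, here is a minimal sketch in Python. It rests on an assumption the example leaves implicit: each at-bat is treated as an independent coin flip whose probability of a hit equals the player's batting average. The function names (`skill_outcome_correlation`, `expected_hit_gap`, `simulate`) are made up for illustration. Under that assumption the single at-bat correlation works out to about 0.125, in line with the figure of roughly 0.12 quoted above, while the correlation between skill and hits summed over a 1,000 at-bat season is very high.

```python
import math
import random
from statistics import NormalDist

MEAN, SD = 0.2, 0.05  # batting-average distribution from the example


def skill_outcome_correlation(n_at_bats, mean=MEAN, sd=SD):
    """Analytic correlation between skill and total hits over n at-bats.

    Assumption (not stated in the text): given skill p, each at-bat is an
    independent Bernoulli(p) trial.
    """
    between = n_at_bats * sd ** 2          # variance due to skill differences
    within = mean * (1 - mean) - sd ** 2   # E[p(1-p)]: chance variation per at-bat
    return math.sqrt(between / (between + within))


def expected_hit_gap(at_bats=1000, mean=MEAN, sd=SD):
    """Expected extra hits for a 95th- vs a 5th-percentile batter."""
    dist = NormalDist(mean, sd)
    return at_bats * (dist.inv_cdf(0.95) - dist.inv_cdf(0.05))


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_players=100, at_bats=1000, seed=0):
    """Simulate a season; return (single at-bat r, whole-season r)."""
    rng = random.Random(seed)
    # Clip to [0, 1] so every drawn skill is a valid probability.
    skills = [min(max(rng.gauss(MEAN, SD), 0.0), 1.0) for _ in range(n_players)]
    skill_per_at_bat, hit_per_at_bat, season_totals = [], [], []
    for p in skills:
        hits = [1 if rng.random() < p else 0 for _ in range(at_bats)]
        skill_per_at_bat.extend([p] * at_bats)
        hit_per_at_bat.extend(hits)
        season_totals.append(sum(hits))
    return pearson(skill_per_at_bat, hit_per_at_bat), pearson(skills, season_totals)


if __name__ == "__main__":
    r_single = skill_outcome_correlation(1)
    r_season = skill_outcome_correlation(1000)
    print(f"analytic r, one at-bat:    {r_single:.3f} (r^2 = {r_single ** 2:.3%})")
    print(f"analytic r, 1000 at-bats:  {r_season:.3f}")
    print(f"expected hit gap, 95th vs 5th percentile: {expected_hit_gap():.0f}")
    sim_single, sim_season = simulate()
    print(f"simulated r, one at-bat:   {sim_single:.3f}")
    print(f"simulated r, whole season: {sim_season:.3f}")
```

The formula in `skill_outcome_correlation` makes the point of the example explicit: the chance variation in a single outcome stays fixed, while the share of variance due to stable skill differences grows with the number of occasions observed. Whether anything similar holds for the race IAT is an empirical question that this sketch does not address.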

That such a possibility exists says nothing about whether it is actual. Studies in the race IAT literature simply have not looked much beyond single occasion observations (though this is perhaps starting to change). So let me be clear: I am not here affirming the predictive credentials of the race IAT. My point, rather, is a narrow one: the charge of “tiny correlations with behavior” we hear repeated so often is itself based on a simplistic and blinkered understanding of predictive validity.

