N-back training is one of the most-studied and most-contested cognitive interventions of the past 20 years. It is a working-memory task in which the user identifies whether the current stimulus matches the one shown N positions back in a sequence. The dual n-back variant adds a second simultaneous stream, typically auditory and visual, and is meaningfully harder than it sounds the first time you try it.
The reason n-back training became famous is that a 2008 paper in PNAS by Jaeggi and colleagues claimed it could improve fluid intelligence, the kind of abstract problem-solving that IQ tests are built to measure. That claim was electrifying and turned out to be largely unreplicable in the form it was originally made. The story of what happened to the n-back literature between 2008 and now is a useful worked example of how a real but narrow finding gets oversold, then walked back, then settles into a quieter and more honest place.
The short answer: N-back training reliably improves performance on n-back. It produces modest improvement on similar working-memory tasks. It does not reliably improve fluid intelligence or general cognitive ability. Use it for the trained skill, not for IQ.
What is the n-back task, really?
The standard visual n-back works like this. A grid appears on the screen. A square highlights one cell, then another, then another, in a sequence of (typically) 20 to 30 trials. The user presses a key whenever the current position matches the position N steps back. At N=1, the comparison is to the previous trial; that is easy. At N=3 or N=4, working memory is heavily loaded, because the user has to maintain a constantly-updating buffer of the last few positions while comparing the current one to a target position several steps in the past.
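The matching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular app or study: the 3x3 grid with cells numbered 0 to 8, the function names, and the example responses are all assumptions made for the sketch.

```python
def nback_targets(sequence, n):
    """For each trial, is the item the same as the one n trials earlier?"""
    # The first n trials can never be matches: there is nothing n steps back.
    return [i >= n and sequence[i] == sequence[i - n] for i in range(len(sequence))]


def score(targets, responses):
    """Tally the four possible outcomes of a yes/no response on each trial."""
    pairs = list(zip(targets, responses))
    return {
        "hits": sum(t and r for t, r in pairs),
        "misses": sum(t and not r for t, r in pairs),
        "false_alarms": sum(r and not t for t, r in pairs),
        "correct_rejections": sum(not t and not r for t, r in pairs),
    }


# Hypothetical run on a 3x3 grid (cells 0-8) at N=2.
positions = [4, 1, 4, 7, 4, 7, 2, 7]
targets = nback_targets(positions, n=2)

# Hypothetical key presses: True means the user pressed "match" on that trial.
responses = [False, False, True, False, False, True, True, True]
result = score(targets, responses)
print(targets)
print(result)
```

Raising `n` does not change the rule, only how far back the comparison reaches, which is why the memory load climbs so quickly between N=1 and N=3.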
Dual n-back adds a parallel auditory stream. Letters are spoken alongside the visual grid, and the user must respond separately to visual matches and auditory matches. The task now requires both updating two buffers at once and inhibiting cross-modal interference. It places a heavy load on working memory, attentional control, and updating.
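To make the "two buffers" point concrete, here is a hedged sketch of how a dual n-back block could be scored, with each modality judged against its own stream. The function names, the example streams, and the choice of simple proportion-correct as the score are assumptions for illustration, not a description of how any published study scored the task.

```python
def nback_targets(stream, n):
    """For each trial, is the item the same as the one n trials earlier?"""
    return [i >= n and stream[i] == stream[i - n] for i in range(len(stream))]


def dual_nback_accuracy(positions, letters, pos_responses, letter_responses, n):
    """Score the visual and auditory streams independently.

    A single trial can be a match in one modality, both, or neither,
    so each stream keeps its own targets and its own responses.
    """
    streams = (
        ("visual", positions, pos_responses),
        ("auditory", letters, letter_responses),
    )
    accuracy = {}
    for name, stream, responses in streams:
        targets = nback_targets(stream, n)
        correct = sum(t == r for t, r in zip(targets, responses))
        accuracy[name] = correct / len(targets)
    return accuracy


# Hypothetical 6-trial block at N=2.
positions = [0, 3, 0, 5, 0, 5]
letters = ["K", "T", "K", "T", "Q", "T"]
pos_responses = [False, False, True, False, True, True]      # all correct
letter_responses = [False, False, True, False, False, True]  # one missed match
accuracy = dual_nback_accuracy(positions, letters, pos_responses, letter_responses, n=2)
print(accuracy)
```

In this made-up block, trial 4 is an auditory match but not a visual one, which is exactly the kind of trial where cross-modal interference shows up.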
Adaptive versions of n-back, the kind used in the research, increase N when the user gets too good and decrease it when the user struggles, keeping the task at the edge of capacity. This is the core feature that distinguishes laboratory n-back from the casual versions in some apps.
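The adaptive rule can be sketched as a simple staircase. The thresholds below (raise N at 90% accuracy or better, lower it below 70%) are hypothetical values chosen for the example; published protocols define their own criteria, often in terms of error counts per block.

```python
def next_level(n, accuracy, raise_at=0.90, lower_at=0.70, min_n=1):
    """Pick N for the next block from accuracy on the block just finished.

    raise_at and lower_at are illustrative thresholds, not values
    taken from any specific study.
    """
    if accuracy >= raise_at:
        return n + 1              # too easy: make it harder
    if accuracy < lower_at:
        return max(min_n, n - 1)  # too hard: back off, but never below min_n
    return n                      # in the target zone: stay put


# Hypothetical session: block-by-block accuracy, starting at N=2.
level = 2
history = []
for block_accuracy in [0.95, 0.92, 0.65, 0.80, 0.91]:
    level = next_level(level, block_accuracy)
    history.append(level)

print(history)  # [3, 4, 3, 3, 4]
```

A fixed-N app skips this step entirely, so a user who has mastered N=2 keeps practicing a task that no longer sits at the edge of capacity.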
What did the 2008 Jaeggi study claim?
Jaeggi, Buschkuehl, Jonides, and Perrig published "Improving fluid intelligence with training on working memory" in PNAS in 2008. They trained 70 young adults on dual n-back for 8, 12, 17, or 19 days, then tested fluid intelligence with matrix-reasoning tests (Raven's Advanced Progressive Matrices for the shortest-training group and the similar BOMAT for the others). The training group improved on the matrices test more than a passive control group, and the reported effect was dose-dependent (more training days, larger gain).
The paper was a sensation. It was widely cited (now over 4,000 citations), it inspired commercial products, it spawned dual-n-back communities online, and it became one of the most prominent claims in cognitive training history. It also had problems that became apparent over the following decade.
The control group was passive (no training), which means that any activity more engaging than doing nothing could have produced the apparent transfer. The matrices test was administered at pre and post, with no Solomon-style control for retest effects. The sample size per condition was small. Subsequent attempts at replication, with active controls and larger samples, did not consistently reproduce the far-transfer effect.
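The passive-control problem is easier to see with toy arithmetic. Every number below is invented for illustration and is not an estimate from the Jaeggi study or any other; the sketch also assumes, for simplicity, that gains add up and that an active control captures the full expectancy effect.

```python
# Invented components of a pre-to-post gain on a reasoning test (in points).
retest_gain = 1.0      # from simply taking the test a second time
expectancy_gain = 0.8  # from engagement, motivation, expecting to improve
true_transfer = 0.2    # actually caused by the working-memory training

training_group = retest_gain + expectancy_gain + true_transfer
passive_control = retest_gain                   # did nothing between tests
active_control = retest_gain + expectancy_gain  # did a different effortful task

# What each comparison would attribute to "training".
effect_vs_passive = training_group - passive_control
effect_vs_active = training_group - active_control

print(round(effect_vs_passive, 2))  # 1.0
print(round(effect_vs_active, 2))   # 0.2
```

Under these assumptions the passive comparison reports an effect five times the size of the real one, because expectancy is bundled in with transfer.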
"Working memory training programs produce short-term, specific training effects that do not generalize."
Melby-Lervåg, Redick, and Hulme, 2016, Perspectives on Psychological Science
What did the meta-analyses find?
Three large meta-analyses now anchor the evidence base.
Melby-Lervåg & Hulme, 2013, Developmental Psychology. Pooled 23 studies covering working-memory training in general. They found robust near transfer (improvement on tasks similar to those trained), no reliable far transfer to nonverbal ability, and no reliable transfer to verbal ability or arithmetic. The training-related gain on intelligence tests was small and not statistically distinguishable from zero once active controls were considered.
Melby-Lervåg, Redick, & Hulme, 2016, Perspectives on Psychological Science. A larger and more rigorous follow-up. Pooled 87 studies across age groups, including the n-back literature specifically. The conclusion: working memory training produces durable improvements on the trained task, weak and inconsistent improvements on similar tasks, and no reliable far transfer to academic outcomes, intelligence, or executive function broadly construed.
Soveri et al., 2017, Psychonomic Bulletin & Review. A focused meta-analysis of n-back specifically. Pooled 33 studies. Found a moderate effect on untrained n-back tasks (a near-transfer effect), a small effect on other working-memory tasks, and no reliable effect on fluid intelligence. The effect on fluid intelligence shrank further when only studies with active controls were included.
The pattern across all three is consistent and now well-established: n-back training improves n-back performance and produces some near transfer, and that is most of it. The original far-transfer claim has not held up.
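For readers unfamiliar with how a meta-analysis "pools" studies, the core idea is a weighted average in which precise studies count for more. Below is a minimal fixed-effect, inverse-variance sketch. It is a simplification (the published meta-analyses use more elaborate models), and the effect sizes and standard errors are invented for the example.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect inverse-variance pooling of study effect sizes.

    Each study is weighted by 1 / SE^2, so large, precise studies
    dominate small, noisy ones.
    """
    weights = [1 / se**2 for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return estimate, pooled_se


# Invented data: one small study with a big effect, two larger near-null studies.
effects = [0.60, 0.10, 0.05]
std_errors = [0.40, 0.10, 0.10]

estimate, pooled_se = pooled_effect(effects, std_errors)
print(round(estimate, 3), round(pooled_se, 3))
```

In this toy example the striking 0.60 result barely moves the pooled estimate, because its large standard error gives it a small weight.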
Why was the original effect so hard to replicate?
A few likely contributors, in roughly descending order of how much they explain:
- Passive controls inflate apparent effects. When the control group does nothing, any group that engages with effortful training shows test-retest improvement plus expectancy effects. Active controls (a different effortful task) wash most of this out.
- Small samples are noisy. The original Jaeggi paper had about 17 participants per condition. Effect sizes from small samples are unstable and often shrink in larger replications, a pattern the broader replication crisis has documented across psychology.
- Publication bias. Studies that find effects get published faster. Studies that find null results sit in file drawers. Soveri's meta-analysis found evidence of publication bias in the n-back literature, with smaller studies systematically reporting larger effects than larger ones.
- The task was novel and engaging. Particularly in 2008, dual n-back was an interesting, demanding, and somewhat enjoyable task. Some of the early effect was probably motivational, not memory-mechanistic.
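The small-sample point in the list above can be made concrete with the standard large-sample approximation for the standard error of Cohen's d. The sketch below assumes two equal groups and a hypothetical observed effect of d = 0.5; the group size of 17 is the figure quoted above, and 200 is an arbitrary comparison.

```python
import math


def cohens_d_se(d, n1, n2):
    """Approximate standard error of Cohen's d for two independent groups."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))


def ci95(d, n1, n2):
    """Approximate 95% confidence interval around an observed d."""
    half_width = 1.96 * cohens_d_se(d, n1, n2)
    return d - half_width, d + half_width


observed_d = 0.5  # hypothetical effect size, for illustration only

small_lo, small_hi = ci95(observed_d, 17, 17)
large_lo, large_hi = ci95(observed_d, 200, 200)

print(f"n=17 per group:  {small_lo:.2f} to {small_hi:.2f}")
print(f"n=200 per group: {large_lo:.2f} to {large_hi:.2f}")
```

With 17 per group, the interval around a medium-sized effect stretches from below zero to well above 1, so the same data are compatible with no effect at all and with a very large one.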
None of this means the original researchers were dishonest. It means the original finding was unstable in ways that became visible only after the field replicated it.
Should you do n-back training?
A few specific recommendations follow from the evidence:
- Do not do dual n-back to raise your IQ. The far-transfer claim does not replicate. If you want intelligence-test gains, the intervention with the most consistent evidence is education, not n-back.
- Use n-back if you enjoy it. Effortful, engaging mental practice is good for you. There is no specific harm to dual n-back; the issue is misallocated expectations, not damage.
- For evidence-backed cognitive training, prefer protocols with replicated transfer. The ACTIVE trial is the strongest evidence in the field, and its protocol was speed-of-processing training, not n-back. Posit Science's BrainHQ implements the ACTIVE-derived exercises.
- Distribute your practice. Whatever you train, the spacing effect applies. Brief daily sessions outperform long infrequent ones.
- Pair training with the lifestyle factors that have stronger evidence. Exercise, sleep, hearing care, and blood-pressure control are the load-bearing pieces of long-term cognitive health. Cognitive training is supplementary.
The honest version, which we think the literature supports: n-back is an interesting working-memory task with reliable near transfer and unreliable far transfer. Its place in your cognitive-training routine should be small. It should not be your routine.
What this means for the broader brain-training conversation
The n-back story is the cognitive-training literature's clearest worked example of what happens when a striking finding gets sold ahead of replication. The 2008 paper made a real contribution; n-back is a well-characterized task, and it does train working memory in measurable ways. The mistake was extrapolating from "improves performance on this task" to "improves general intelligence." It is the same mistake that led Lumosity to pay $2 million in 2016 to settle FTC charges of deceptive advertising. It is not specific to n-back.
For the broader picture, see our cognitive training guide and our take on whether brain training is a scam. The n-back literature is one of several pieces of evidence pointing toward the same conclusion: cognitive training works for what it trains. Generalizing further is where the field repeatedly stumbles.