Jun 5, 2026

Study Reveals Limitations of AI Models in Stroop Test, Highlighting Challenges for Artificial General Intelligence

AIs like ChatGPT fall apart in classic 'Stroop' psychological test — and that could stand in the way of achieving artificial general intelligence

Techradar

6 min read · artificial-intelligence · psychology · cognitive-science

💡 In a Nutshell

A recent study published in PNAS Nexus reveals that AI models like ChatGPT and Claude struggle with the Stroop test, demonstrating significant limitations in attention control compared to humans. This raises concerns about their capability to achieve artificial general intelligence (AGI), emphasizing the need for improved executive control mechanisms in AI development.

🔑 Key Points

01 The study tested AI models GPT-4o and Claude 3.5 Sonnet on the Stroop test, revealing a sharp decline in accuracy with longer lists of incongruent words.
02 Humans maintain about 95% accuracy on the Stroop test, while GPT-4o dropped to 22% and Claude 3.5 Sonnet to 24% with 40-word lists.
03 Despite criticisms regarding the use of outdated AI models, the researchers argue that the findings reflect fundamental limitations inherent to transformer-based architectures.
04 Recent tests on newer models like GPT-5 and Claude Opus 4.1 showed only slight improvements, indicating ongoing executive attention deficiencies.
05 The authors suggest that future AI development should focus on integrating sophisticated executive control systems to enhance decision-making and cognitive flexibility.

📝 Full Summary

A study published in the journal PNAS Nexus has highlighted significant limitations in AI models such as ChatGPT and Claude on the Stroop psychological test, which measures attention control. The Stroop effect describes how people struggle to name the color a word is printed in when that color conflicts with the word's meaning. The study tested GPT-4o and Claude 3.5 Sonnet, which showed high accuracy in word reading but poor performance in color naming, especially under incongruent conditions. For instance, GPT-4o's accuracy plummeted from 91% with five words to just 22% with 40 words. Critics pointed out that the study used outdated models, but the researchers maintained that the findings remain relevant because they point to architectural constraints inherent in transformer-based AI. Follow-up tests on newer models like GPT-5 showed only minor improvements, reinforcing the case that AI needs more advanced executive control mechanisms to achieve artificial general intelligence (AGI). The authors conclude that enhancing AI's cognitive flexibility is crucial for future development.
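To make the task concrete, here is a minimal Python sketch of an incongruent Stroop list and a color-naming accuracy score. It is an illustration only: the four-color palette, the (word, ink) pair representation, and the function names `make_incongruent_list` and `color_naming_accuracy` are assumptions, not the stimuli or scoring code used in the study.

```python
import random

# Assumed palette for illustration; the study's actual colors are not given here.
COLORS = ["red", "green", "blue", "yellow"]


def make_incongruent_list(n_items, rng):
    """Build n_items (word, ink) pairs where the ink never matches the word."""
    items = []
    for _ in range(n_items):
        word = rng.choice(COLORS)
        ink = rng.choice([c for c in COLORS if c != word])
        items.append((word, ink))
    return items


def color_naming_accuracy(items, responses):
    """Fraction of responses that name the ink color rather than the word."""
    if not items or len(items) != len(responses):
        raise ValueError("need a non-empty list with one response per item")
    correct = sum(
        1
        for (_, ink), answer in zip(items, responses)
        if answer.strip().lower() == ink
    )
    return correct / len(items)


rng = random.Random(0)  # fixed seed so the list is reproducible
trial = make_incongruent_list(40, rng)

# Two idealized responders: one always names the ink, one always reads the word.
ink_namer = [ink for _, ink in trial]
word_reader = [word for word, _ in trial]

print(color_naming_accuracy(trial, ink_namer))
print(color_naming_accuracy(trial, word_reader))
```

Because every item is incongruent, a responder that reads the word instead of naming the ink scores zero, which is the kind of interference error the test is designed to expose.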

#️⃣ Key Figures

22%: Accuracy of GPT-4o in the 40-word incongruent Stroop test

95%: Human accuracy in the Stroop test

57%: Accuracy of GPT-4o in a 10-word incongruent Stroop test

76%: Accuracy of Claude 3.5 Sonnet in a 20-word incongruent Stroop test
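As a rough way to compare these numbers, the short Python sketch below computes each reported model accuracy's shortfall from the human figure, in percentage points. The values are the ones quoted in this summary; treating the roughly 95% human accuracy as a single baseline for every list length is an assumption made here for illustration.

```python
# Human accuracy quoted in the summary ("about 95%"); assumed constant across list lengths.
HUMAN_ACCURACY = 95

# (model, list length in words) -> reported accuracy in percent, as quoted in this summary
reported = {
    ("GPT-4o", 5): 91,
    ("GPT-4o", 10): 57,
    ("GPT-4o", 40): 22,
    ("Claude 3.5 Sonnet", 20): 76,
    ("Claude 3.5 Sonnet", 40): 24,
}

# Shortfall from the human baseline, in percentage points
gaps = {key: HUMAN_ACCURACY - accuracy for key, accuracy in reported.items()}

# Print from smallest to largest shortfall
for (model, n_words), gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{model:18s} {n_words:2d} words: {gap} points below the human baseline")
```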

❓ FAQ

The Stroop test measures cognitive interference by asking participants to name the color of the ink used to write a word, which can conflict with the word's meaning.

The study suggests that improving executive control mechanisms in AI is essential for achieving artificial general intelligence.
