Study Reveals Limitations of AI Models in Stroop Test, Highlighting Challenges for Artificial General Intelligence
AIs like ChatGPT fall apart in classic 'Stroop' psychological test — and that could stand in the way of achieving artificial general intelligence
Techradar
A recent study published in PNAS Nexus reveals that AI models like ChatGPT and Claude struggle with the Stroop test, demonstrating significant limitations in attention control compared to humans. The result raises doubts about whether such models can achieve artificial general intelligence (AGI) and underscores the need for improved executive control mechanisms in AI development.
- The study tested AI models GPT-4o and Claude 3.5 Sonnet on the Stroop test, revealing a sharp decline in accuracy with longer lists of incongruent words.
- Humans maintain about 95% accuracy on the Stroop test, while GPT-4o dropped to 22% and Claude 3.5 Sonnet to 24% with 40-word lists.
- Despite criticisms regarding the use of outdated AI models, the researchers argue that the findings reflect fundamental limitations inherent to transformer-based architectures.
- Recent tests on newer models like GPT-5 and Claude Opus 4.1 showed only slight improvements, indicating ongoing executive attention deficiencies.
- The authors suggest that future AI development should focus on integrating sophisticated executive control systems to enhance decision-making and cognitive flexibility.
A study published in the journal PNAS Nexus has highlighted significant limitations in AI models such as ChatGPT and Claude on the Stroop psychological test, which measures attention control. The Stroop effect describes how people struggle to name the color a word is printed in when that color conflicts with the word's meaning.
In the study, GPT-4o and Claude 3.5 Sonnet showed high accuracy in word reading but poor performance in color naming, especially under incongruent conditions. For instance, GPT-4o's accuracy plummeted from 91% with five words to just 22% with 20 words.
Critics pointed out that the study used outdated models, but the researchers maintained that the findings remain relevant because they indicate architectural constraints inherent to transformer-based AI. Follow-up tests on newer models such as GPT-5 showed only minor improvements, reinforcing the case that AI needs more advanced executive control mechanisms to achieve artificial general intelligence (AGI). The authors conclude that enhancing AI's cognitive flexibility is crucial for future development.
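To make the task concrete, here is a minimal Python sketch of an incongruent Stroop list and a color-naming accuracy score. It is an illustration only, not the researchers' code: the four-color palette, the function names `make_incongruent_list` and `color_naming_accuracy`, and the representation of each item as a (word, ink color) pair are all assumptions, and this summary does not say how the study presented ink colors to the models.

```python
import random

# Assumed palette for illustration; the study's actual stimuli are not given here.
COLORS = ["red", "green", "blue", "yellow"]

def make_incongruent_list(n, rng):
    """Return n (word, ink) pairs in which the ink color never matches the word."""
    items = []
    for _ in range(n):
        word = rng.choice(COLORS)
        ink = rng.choice([c for c in COLORS if c != word])
        items.append((word, ink))
    return items

def color_naming_accuracy(items, responses):
    """Fraction of responses that name the ink color rather than the word."""
    if not items or len(items) != len(responses):
        raise ValueError("need a non-empty list with one response per item")
    correct = sum(
        resp.strip().lower() == ink
        for (_, ink), resp in zip(items, responses)
    )
    return correct / len(items)

rng = random.Random(0)
trial = make_incongruent_list(5, rng)

# Two hypothetical responders: one reads the word, one names the ink color.
word_reader = [word for word, _ in trial]
ink_namer = [ink for _, ink in trial]

print(color_naming_accuracy(trial, word_reader))  # 0.0
print(color_naming_accuracy(trial, ink_namer))    # 1.0
```

Because every item is incongruent, a responder that simply reads the word scores zero on color naming, which is the failure mode the test is designed to expose. Lengthening the list (for example, passing 40 instead of 5) gives the longer-list condition the article describes.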




