OpenAI's GPT-5.5 Matches Anthropic's Mythos Preview in Cybersecurity Tests
GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests
Ars Technica
New research from the UK's AI Security Institute finds that OpenAI's recently launched GPT-5.5 matches Anthropic's Mythos Preview in cybersecurity evaluations. Both models were tested on a range of Capture the Flag challenges, and GPT-5.5 scored slightly higher than Mythos Preview on expert-level tasks, which points to significant advances in AI capabilities for cybersecurity.
- GPT-5.5 achieved an average score of 71.4% on expert cybersecurity tasks.
- Mythos Preview scored 68.6% on the same tasks, showing competitive performance.
- GPT-5.5 solved a complex disassembler task in just over 10 minutes.
- Both models struggled with the Cooling Tower simulation, failing to disrupt control software.
- GPT-5.5 succeeded in 3 out of 10 attempts on a data extraction attack simulation.
Recent evaluations by the UK's AI Security Institute (AISI) show that OpenAI's GPT-5.5, which launched publicly last week, matches Anthropic's Mythos Preview in cybersecurity performance. The AISI tested both models on 95 Capture the Flag challenges covering skills such as reverse engineering and cryptography. On expert-level tasks, GPT-5.5 achieved an average score of 71.4%, slightly ahead of Mythos Preview's 68.6%. Notably, GPT-5.5 completed a challenging disassembler task in 10 minutes and 22 seconds, at a cost of $1.73 in API calls. In a simulated data extraction attack, GPT-5.5 succeeded in 3 out of 10 attempts, compared with 2 out of 10 for Mythos Preview. However, both models struggled with the Cooling Tower simulation, which tests the ability to disrupt power plant control software and has stumped previous AI models as well.




