Quality Assessment of Generative AI in Cybersecurity Certification
Abstract
Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs), is rapidly changing how higher education approaches teaching, learning, and assessment. In cybersecurity education, professional certification exams are key for measuring competence and helping professionals find better job offers, but there is little research on how GenAI systems perform in these exam settings. This study looks at how three popular LLMs, ChatGPT-5, Gemini-2.5 Pro, and Copilot-2.5 Pro, handle 183 practice questions from the CompTIA Security+ certification. The study used a two-phase evaluation: a domain-based assessment and a full-length practice exam that mirrors real certification tests. The researchers measured model performance with accuracy scores, chi-square tests for statistical differences, and an error taxonomy to spot patterns of mistakes important for education. All three GenAI systems scored above the passing mark, and there were no significant differences between them. Still, the error analysis showed ongoing conceptual and classification mistakes that did not show up in the overall accuracy scores. Our results show that GenAI systems can pass structured certification tests, but accuracy by itself does not fully measure professional skills. The study points out important issues for the reliability and validity of AI-based assessments in higher education and stresses the need for more realistic, concept-focused ways to evaluate GenAI in cybersecurity education.
Más información
| Título según WOS: | ID WOS:001749749800001 Not found in local WOS DB |
| Título de la Revista: | INFORMATICS-BASEL |
| Volumen: | 13 |
| Número: | 4 |
| Editorial: | MDPI |
| Fecha de publicación: | 2026 |
| DOI: |
10.3390/informatics13040053 |
| Notas: | ISI |