INTEGRATING AN LLM-BASED CYBERSECURITY CONSULTATION LAYER INTO A NATIONAL AWARENESS BENCHMARKING SYSTEM
Keywords:
Cybersecurity Awareness, Human–AI Evaluation, Personalized Learning, Large Language Models (LLMs), Security Behavior Intentions (SeBIS)

Abstract
This study integrates a large language model (LLM) consultation service into Indonesia’s national Cyber Security Awareness Survey (“Survei Kesadaran Keamanan Siber”, SKKS) to convert survey benchmarking into immediate, personalized cybersecurity remediation, and evaluates its safety, usability, and short-term shifts in security behavior intentions among Generation Z respondents. In a two-phase, multi-method design, Phase I conducted a model-centric expert evaluation of LLM-generated recommendations across 20 standardized synthetic SKKS profiles, assessing relevance, accuracy, completeness, clarity, and safety. Phase II implemented a single-session within-subject study (N = 104) that measured post-interaction user experience and pre–post changes in security behavior intentions using an adapted Security Behavior Intentions Scale (SeBIS). Expert results showed consistently high ratings across all dimensions (all means > 4.0/5), no safety-veto triggers, and strong inter-rater reliability (ICC(2,k) = 0.82–1.00). Users reported a positive experience (means ≈ 3.84–3.96/5) and sustained engagement, and SeBIS total scores increased significantly (dz = 0.42), with the largest gains in password-management intentions. The novelty lies in embedding LLM-based, profile-driven consultation within a national-scale awareness survey and validating it through both expert human review and behavioral-intention measurement. Beyond cybersecurity, this work contributes to the broader literature on AI-mediated educational systems in safety-critical domains by demonstrating how adaptive dialogue systems can operationalize assessment-to-action loops and support scalable, human-centered personalization.
License
Copyright (c) 2026 Raden Budiarto Hadiprakoso, Rakhmat Dramaga, Nurul Qomariasih

This work is licensed under a Creative Commons Attribution 4.0 International License.