Generative AI Models: A Comparison of Application Analysis on Web AI-Based Decision Support Systems for Satellite Anomaly Identification

Abdul Mutholib; Nadirah Abdul Rahim; Teddy Surya Gunawan; Ahmad Shah Hizam Md Yasir

doi:10.31436/iiumej.v27i2.4061

Authors

Abdul Mutholib, International Islamic University Malaysia, https://orcid.org/0009-0008-8718-2930
Nadirah Abdul Rahim, International Islamic University Malaysia, https://orcid.org/0000-0003-2508-5998
Teddy Surya Gunawan, International Islamic University Malaysia, https://orcid.org/0000-0003-3345-4669
Ahmad Shah Hizam Md Yasir, Rabdan Academy, https://orcid.org/0000-0003-0253-0796

DOI:

https://doi.org/10.31436/iiumej.v27i2.4061

Keywords:

GenAI, Gemma 3, Llama 4 Maverick, Nemotron Nano V2, Devstral Small, Satellite Anomaly Identification, Clarity, Accuracy, Completeness, Relevance, Web AI-Based Decision Support System

Abstract

The rapid innovation of Generative Artificial Intelligence (GenAI) has transformed Decision Support Systems (DSS) across various domains, including satellite operations. This paper presents a comparative analysis of four free GenAI models, namely Gemma 3 by Google, Llama 4 Maverick by Meta AI, Nemotron Nano 2 by NVIDIA, and Devstral Small by Mistral, in the context of integrating them into a Web AI-based DSS for satellite anomaly identification. The study uses a dataset of over 4,455 satellite anomaly records covering 1957 to 2024, provided by Seradata, that is scalable and adaptable to diverse mission profiles. We evaluate the anomaly analyses generated by these models for clarity, accuracy, completeness, and relevance across the Incident Overview, Reliability Trend, Insight, and Stakeholder Recommendation categories, using a 5-point Likert scale and Fleiss' Kappa for internal consistency. The evaluation shows a clear performance stratification: Nemotron Nano 2 and Llama 4 Maverick scored 4.44 and 4.39, respectively, making them the leading models on overall Likert score. However, the analysis also revealed a critical trade-off between absolute quality and internal consistency (Fleiss' Kappa). The top-scoring models, Nemotron Nano 2 and Llama 4 Maverick, achieved high Likert scores through pronounced performance peaks but showed the lowest internal predictability, with κ = 0.18 for Llama 4 Maverick, indicating a highly volatile output structure in which strong clarity often masked critical incompleteness. Conversely, Devstral Small, despite its lower mean score of 3.44, demonstrated the highest internal consistency, with κ = 0.66. This robust predictability, even at a lower quality level, has a significant implication for DSS development: model selection must balance the absolute performance ceiling against the predictability of the output structure. The findings highlight the potential of GenAI implementation in DSS to enhance the reliability of satellite operations and point to future directions for research and development in this area. This research contributes to the development of more resilient and intelligent satellite anomaly identification systems, with broader implications for space mission safety, resource optimization, cost reduction, and the future of AI-driven aerospace technologies.
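The two summary statistics named in the abstract, the mean Likert score and Fleiss' Kappa, can be illustrated with a minimal sketch. The abstract does not state how subjects and raters were defined for the consistency analysis, so the layout below is an assumption: each row is one scored output, and each entry in the row is one 1-to-5 rating of it. The function names and the sample scores are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def mean_likert(ratings):
    """Average of all ratings across all rows."""
    flat = [score for row in ratings for score in row]
    return sum(flat) / len(flat)


def fleiss_kappa(ratings, categories=(1, 2, 3, 4, 5)):
    """Fleiss' Kappa for rows that each hold the same number of ratings.

    ratings: list of rows; each row lists the category chosen by each rater.
    Returns NaN when every rating falls in one category (kappa is undefined).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every row needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over rows.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if math.isclose(p_expected, 1.0):
        return math.nan
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores: four outputs, each rated four times on a 1-5 scale.
sample = [
    [5, 5, 4, 5],
    [4, 4, 4, 3],
    [5, 3, 4, 5],
    [4, 4, 5, 4],
]
print(f"mean Likert = {mean_likert(sample):.2f}")
print(f"Fleiss' kappa = {fleiss_kappa(sample):.2f}")
```

A model can score a high mean while its ratings disagree with one another, which is the kind of gap between quality and consistency that the abstract reports.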

ABSTRAK: Inovasi pesat Kecerdasan Buatan Generatif (GenAI) telah mengubah Sistem Sokongan Keputusan (DSS) merentasi pelbagai domain, termasuk operasi satelit. Kajian ini membentangkan analisis perbandingan empat model AI Generatif percuma termasuk: Gemma 3 oleh Google, Llama 4 Maverick oleh Meta AI, Nemotron Nano 2 oleh NVIDIA dan Devstral Small oleh Mistral, dalam menyepadu Sistem Sokongan Keputusan (DSS) berasaskan web AI bagi mengenal pasti anomali satelit. Mengguna pakai lebih 4,455 rekod set data anomali satelit meliputi tahun 1957 hingga 2024 oleh Seradata, set data ini mempunyai ciri kebolehskalaan dan kebolehsuaian pada profil misi yang pelbagai. Kajian ini menilai kesemua model melalui penjanaan analisis anomali melalui kejelasan, ketepatan, kelengkapan dan kerelevanan merentasi kategori Gambaran Keseluruhan Insiden, Trend Kebolehpercayaan, Wawasan dan Cadangan Pihak Berkepentingan, mengguna skala Likert 5-mata dan Kappa Fleiss bagi kekonsistenan dalaman. Penilaian komprehensif model GenAI DSS berasaskan web AI menggariskan stratifikasi prestasi yang jelas pada skor 4.44 dan 4.39 untuk Nemotron Nano 2 dan Llama 4 Maverick masing-masing, mengesahkan keupayaan sebagai sistem utama berdasarkan skor Likert keseluruhan. Walau bagaimanapun, analisis selanjutnya mendedahkan pertukaran kritikal antara kualiti mutlak dan kekonsistenan dalaman (Fleiss' Kappa). Model terbaik, Nemotron Nano 2 dan Llama 4 Maverick, mencapai skor Likert tertinggi dengan puncak prestasi yang tinggi tetapi kebolehramalan dalaman terendah iaitu κ = 0.18 pada Llama 4 Maverick, menandakan dapatan struktur yang sangat tidak menentu di mana kejelasan yang kuat sering menutupi ketidaklengkapan kritikal. Sebaliknya, Devstral Small menunjukkan kekonsistenan dalaman tertinggi κ = 0.66 walaupun skor min suboptimumnya 3.44. Kebolehramalan yang mantap ini, walaupun pada tahap lebih rendah, menggariskan implikasi ketara bagi pembangunan DSS. Pemilihan model mesti mengutamakan keseimbangan yang diperlukan antara siling prestasi mutlak dan kebolehramalan dapatan struktur. Penemuan ini mengetengah potensi pelaksanaan GenAI DSS dalam meningkatkan kebolehpercayaan operasi satelit sambil meneroka hala tuju masa depan bagi tujuan penyelidikan dan pembangunan. Penyelidikan ini menyumbang kepada pembangunan sistem mengenal pasti anomali satelit berdaya tahan dan pintar, dengan implikasi yang lebih luas bagi keselamatan misi angkasa lepas, mengoptimum sumber, mengurangkan kos operasi dan masa depan teknologi aero angkasa yang dipacu AI.
