A Detector for Textual-Visual Fake News Using Text Summarization and Contrastive Language-Image Pretraining Embedding Model

DOI:

https://doi.org/10.31436/iiumej.v26i3.3615

Keywords:

False news, multimodal data, BERT, Gate fusion, Deep learning

Abstract

Social media have recently become an influential channel for spreading news in multiple forms, including text, images, audio, and video. Because people now rely heavily on these platforms to share content online, social media have also become a channel for spreading false stories. This study proposes a model that identifies false information across multiple media types and determines its authenticity. Textual and visual data are first pre-processed independently; the proposed CLIPCrossAttGFN model is then used to build the fused feature vector that is passed to a fully connected layer for classification. The CLIPCrossAttGFN model begins by passing the image to the Contrastive Language-Image Pretraining (CLIP) visual encoder for feature extraction. The text is summarized to 512 words using Bidirectional Encoder Representations from Transformers (BERT), which can identify meaningful words from the semantic relationships of the text as a whole, and is then passed to the CLIP encoder. The initial features from both encoders are sent through several CNN and LSTM layers and a cross-attention mechanism, which together form the final feature extraction stage. Finally, the multimodal features are merged using a gate fusion network. The results show that the proposed model achieves higher accuracy than other available approaches, supporting more reliable detection.
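The last two stages named in the abstract, cross-attention between modalities followed by gate fusion, can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layer sizes or equations, so the feature dimension, the random stand-in features, the omission of learned query/key/value projections, the mean pooling, and the specific gate form `g * t + (1 - g) * v` are all assumptions chosen for illustration. The CLIP encoders, BERT summarization, CNN, and LSTM stages are not modeled here.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query attends over keys_values.
    Assumption: keys and values are the same vectors (no learned projections)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, keys_values))
                    for i in range(d)])
    return out

def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def gated_fusion(text_vec, image_vec, W_gate, b_gate):
    """Assumed gate form: g = sigmoid(W [t; v] + b), fused = g*t + (1-g)*v."""
    concat = text_vec + image_vec
    pre = matvec(W_gate, concat)
    gate = [1.0 / (1.0 + math.exp(-(p + b))) for p, b in zip(pre, b_gate)]
    return [g * t + (1.0 - g) * v for g, t, v in zip(gate, text_vec, image_vec)]

# Hypothetical stand-ins for per-token text features and per-patch image features.
random.seed(0)
dim = 4
text_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
image_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]

# Text attends to image and image attends to text, then each is pooled.
text_attended = mean_pool(cross_attention(text_feats, image_feats))
image_attended = mean_pool(cross_attention(image_feats, text_feats))

# Untrained gate parameters; in a real model these would be learned.
W_gate = [[random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
b_gate = [0.0] * dim

fused = gated_fusion(text_attended, image_attended, W_gate, b_gate)
print(len(fused))
```

With this gate form, each element of `fused` is a convex combination of the corresponding text and image elements, so the gate decides per dimension how much each modality contributes before the vector reaches a classifier.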

ABSTRAK (translated from Malay): Recently, social media has become an influential channel for spreading news in various forms, including text, images, audio, and video. With today's rapid development and society's heavy reliance on social media for sharing content online, social media platforms have become a medium for spreading inauthentic information. This study proposes a model to identify false information across several media and determine whether it is real or fake. After the textual and visual data are pre-processed separately, the proposed CLIPCrossAttGFN model is used to build the fused feature vector that is sent to the fully connected layer as the classification step. The CLIPCrossAttGFN model begins by sending the image to the Contrastive Language-Image Pretraining (CLIP) visual encoder as a feature extraction method. The text is summarized to 512 words based on Bidirectional Encoder Representations from Transformers (BERT), which can identify meaningful words from the semantic relationships of the text as a whole, and is then sent to the CLIP encoder before the initial features are passed to several layers of CNN, LSTM, and a cross-attention mechanism as the final feature extraction technique. Finally, the multimodal features are merged using a gated fusion network. The findings show that the proposed model has the highest accuracy, ensuring reliable detection compared with existing approaches.

Published

2025-09-09

How to Cite

Abduljaleel, I. Q., & Ali, I. H. (2025). A Detector for Textual-Visual Fake News Using Text Summarization and Contrastive Language-Image Pretraining Embedding Model. IIUM Engineering Journal, 26(3), 185–199. https://doi.org/10.31436/iiumej.v26i3.3615

Section

Electrical, Computer and Communications Engineering