Enhanced Beach Photo Translation using Modified Unsupervised GAN with Regularization

Kartika Fithriasari; Benedictus Kenny Tjahjono

doi:10.31436/iiumej.v27i1.3824

Authors

Kartika Fithriasari, Sepuluh Nopember Institute of Technology, https://orcid.org/0000-0003-4543-4884
Benedictus Kenny Tjahjono, Sepuluh Nopember Institute of Technology, https://orcid.org/0009-0006-2989-9648

Keywords:

Beach Photo, CycleGAN, Neural Network, Unsupervised GAN, Image Translation

Abstract

To save time and cost, tourists often want tools that modify the sky and atmosphere in beach photos, for example replacing a blue sky with a sunset or vice versa. Modifying the sky and sea independently is difficult because their color palettes are similar. Image translation is further hampered by the scarcity of paired datasets, and beach photo datasets in particular offer limited variety in lighting conditions, weather, and viewing perspectives. These limitations can cause Generative Adversarial Network (GAN) models to generalize poorly, overfit, and produce visual artifacts in their outputs. This study therefore proposes an unsupervised GAN approach based on a modified CycleGAN, and improves its performance on beach image translation by integrating identity mapping, λ-parameter optimization, a multiscale kernel, and regularization techniques. CycleGAN consists of two generators and two discriminators. The sunset generator translates a blue sky into a sunset sky; its output is then passed to the sunset discriminator, which judges whether it is real or fake. Generator input images are resized and normalized during preprocessing. The generator architecture is structured to enhance image reconstruction and feature extraction. Details of the translation results are refined using a 30x30 PatchGAN discriminator and a multiscale kernel convolutional layer. The study also investigates the effect of the hyperparameter λ, which balances cycle consistency, structural preservation, and color fidelity. The findings indicate that higher λ values improve consistency but also increase generator loss and make dark objects and white clothing harder to handle. To overcome this issue, regularization techniques, namely photometric augmentation and spectral normalization (SN), are applied together with multiscale kernel convolution (MSCov). Photometric augmentation and MSCov enhance the model's robustness to photographic variations, while SN improves its efficiency and stability. The results show that the proposed method improves image translation accuracy as measured by Mean Squared Error (MSE), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS).
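
As a rough illustration of how the cycle-consistency weight enters the generator objective, the following Python sketch combines adversarial, cycle-consistency, and identity terms in the way the standard CycleGAN formulation does. It is not the authors' implementation: the least-squares adversarial term, the identity weight of 0.5 times the cycle weight, the name `generator_objective`, and the toy stand-ins for the generators and discriminators are all assumptions made for illustration.

```python
import numpy as np


def l1(a, b):
    """Mean absolute difference between two image arrays."""
    return float(np.mean(np.abs(a - b)))


def generator_objective(real_x, real_y, G, F, D_x, D_y, lam=10.0, id_weight=0.5):
    """Weighted generator loss for one unpaired sample (illustrative only).

    G maps domain X to Y (e.g. blue sky to sunset) and F maps Y back to X.
    D_x and D_y return patch-wise realness scores, as a PatchGAN would.
    lam and id_weight are assumed defaults, not values from the study.
    """
    fake_y, fake_x = G(real_x), F(real_y)
    # Least-squares adversarial term: the generators want patch scores near 1.
    adv = float(np.mean((D_y(fake_y) - 1.0) ** 2) + np.mean((D_x(fake_x) - 1.0) ** 2))
    # Cycle consistency: X -> Y -> X and Y -> X -> Y should recover the input.
    cycle = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)
    # Identity mapping: a target-domain input should pass through unchanged.
    identity = l1(G(real_y), real_y) + l1(F(real_x), real_x)
    total = adv + lam * cycle + lam * id_weight * identity
    return {"adv": adv, "cycle": cycle, "identity": identity, "total": total}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.random((8, 8, 3)), rng.random((8, 8, 3))
    # Hypothetical stand-ins; real generators and discriminators are networks.
    G = lambda im: 0.9 * im + 0.05
    F = lambda im: 0.8 * im
    D = lambda im: np.full((30, 30), 0.5)  # one score per patch
    for lam in (1.0, 10.0):
        print(lam, generator_objective(x, y, G, F, D, D, lam=lam))
```

With the cycle and identity terms held fixed, raising `lam` raises the total, which is in line with the reported increase in generator loss at higher values; the sketch says nothing about visual quality or about how the networks respond during training.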
