MAINTAIN AGENT CONSISTENCY IN SURAKARTA CHESS USING DUELING DEEP NETWORK WITH INCREASING BATCH

Authors

Rian Adam Rajagede

DOI:

https://doi.org/10.31436/iiumej.v23i1.1807

Keywords:

Reinforcement Learning, Dueling Deep Q-Network, Surakarta Chess, Incremental Batch Size

Abstract

Deep reinforcement learning has shown outstanding performance in creating intelligent agents for various tasks, particularly with the Q-Learning algorithm. Deep Q-Network (DQN) is a reinforcement learning algorithm that combines Q-Learning with a deep neural network as a function approximator. In single-agent environments, DQN models have surpassed human ability several times over. However, when other agents are present in the environment, DQN performance may degrade. This research evaluated a DQN agent playing the two-player traditional board game Surakarta Chess. One drawback we found when using DQN in two-player games is its lack of consistency: the agent's performance degrades when it faces different opponents. This research shows that using a Dueling Deep Q-Network with an increasing batch size can improve the consistency of the agent's performance. Our agent was trained against a rule-based agent that acts on Surakarta Chess positional properties and was then evaluated against different rule-based agents. The best agent used the Dueling DQN architecture with an increasing batch size, achieving a 57% average win rate against ten different agents after training for a short period.
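
To make the two ideas in the abstract concrete, the sketch below shows a dueling Q-network head, which aggregates a state-value stream and an advantage stream as Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a'), together with a batch-size schedule that grows during training. It is a minimal PyTorch sketch, not the paper's implementation: the board encoding (STATE_DIM), action space (N_ACTIONS), layer widths, and the doubling schedule are illustrative assumptions.

# Minimal sketch of a dueling Q-network with an increasing batch size.
# STATE_DIM, N_ACTIONS, layer widths, and the batch schedule are illustrative
# assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn

STATE_DIM = 36 * 2      # e.g. a flattened 6x6 Surakarta board, one plane per player (assumed)
N_ACTIONS = 36 * 36     # e.g. every (from-square, to-square) pair (assumed)

class DuelingQNet(nn.Module):
    def __init__(self, state_dim=STATE_DIM, n_actions=N_ACTIONS, hidden=256):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s): state-value stream
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a): advantage stream

    def forward(self, state):
        h = self.feature(state)
        v = self.value(h)      # shape (batch, 1)
        a = self.advantage(h)  # shape (batch, n_actions)
        # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')
        return v + a - a.mean(dim=1, keepdim=True)

def batch_size_at(episode, start=32, growth_every=500, cap=512):
    """Double the replay batch size every `growth_every` episodes (assumed schedule)."""
    return min(cap, start * (2 ** (episode // growth_every)))

if __name__ == "__main__":
    net = DuelingQNet()
    states = torch.randn(batch_size_at(0), STATE_DIM)
    print(net(states).shape)  # torch.Size([32, 1296])

Subtracting the mean advantage keeps the value/advantage decomposition identifiable, and growing the batch size during training plays a role similar to decaying the learning rate.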

ABSTRAK: Deep reinforcement learning performs best when used to create intelligent agents for solving various tasks, especially when it involves the Q-Learning algorithm. The Deep Q-Network (DQN) algorithm is reinforcement learning based on a combination of the Q-Learning algorithm and a neural network as an approximation function. In single-agent environments, the DQN model has surpassed human ability several times. However, when other agents are present in the environment, DQN may be less successful. This study involves a DQN agent playing the traditional two-player board game Surakarta Chess. One of the shortcomings found is consistency: the agent performs worse when facing different opponents. The study shows that using a Dueling Deep Q-Network with an increasing batch size can improve the consistency of the agent's performance. The agent was trained against another rule-based agent that acts on Surakarta Chess positional properties and was then evaluated against different rule-based agents. The best agent used the Dueling DQN design with an increasing batch size. It won an average of 57% of games against ten other agents after training for a short period.

Published

2022-01-04

How to Cite

Rian Adam Rajagede. (2022). MAINTAIN AGENT CONSISTENCY IN SURAKARTA CHESS USING DUELING DEEP NETWORK WITH INCREASING BATCH. IIUM Engineering Journal, 23(1), 159–171. https://doi.org/10.31436/iiumej.v23i1.1807

Issue

Section

Electrical, Computer and Communications Engineering