Sample Efficient Reinforcement Learning Through Learning From Demonstrations In Minecraft


Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.
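The training procedure described above (pretrain on human data, then fine-tune with RL while an auxiliary imitation loss guards against catastrophic forgetting) can be sketched roughly as a combined objective. The sketch below is illustrative only, not the paper's exact algorithm: the function names, the REINFORCE-style surrogate, and the `bc_weight` coefficient are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bc_loss(logits, demo_actions):
    """Behavioral-cloning loss: cross-entropy between the policy's
    action distribution and the actions taken in human demonstrations.
    Used both for pretraining and, later, as an anchor during RL."""
    probs = softmax(logits)
    n = len(demo_actions)
    return -np.mean(np.log(probs[np.arange(n), demo_actions] + 1e-8))

def pg_loss(logits, actions, advantages):
    """REINFORCE-style policy-gradient surrogate loss on environment
    rollouts (stand-in for whatever RL objective is actually used)."""
    probs = softmax(logits)
    n = len(actions)
    log_p = np.log(probs[np.arange(n), actions] + 1e-8)
    return -np.mean(advantages * log_p)

def finetune_loss(logits_env, actions, advantages,
                  logits_demo, demo_actions, bc_weight=0.1):
    """RL fine-tuning objective with an additional imitation term that
    keeps the policy close to the demonstrations, mitigating
    catastrophic forgetting of the pretrained behavior.
    bc_weight is a hypothetical trade-off coefficient."""
    return (pg_loss(logits_env, actions, advantages)
            + bc_weight * bc_loss(logits_demo, demo_actions))
```

In this framing, pretraining minimizes `bc_loss` alone on the demonstration dataset; fine-tuning then minimizes `finetune_loss`, where the demonstration term prevents the RL updates from erasing the imitated behavior.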



BibTeX

@InProceedings{pmlr-v123-scheller20a,
  title     = {Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft},
  author    = {Scheller, Christian and Schraner, Yanick and Vogel, Manfred},
  booktitle = {Proceedings of the NeurIPS 2019 Competition and Demonstration Track},
  pages     = {67--76},
  year      = {2020},
  editor    = {Escalante, Hugo Jair and Hadsell, Raia},
  volume    = {123},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--14 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v123/scheller20a/scheller20a.pdf},
  url       = {https://proceedings.mlr.press/v123/scheller20a.html},
  abstract  = {Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.}
}

EndNote

%0 Conference Paper
%T Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft
%A Christian Scheller
%A Yanick Schraner
%A Manfred Vogel
%B Proceedings of the NeurIPS 2019 Competition and Demonstration Track
%C Proceedings of Machine Learning Research
%D 2020
%E Hugo Jair Escalante
%E Raia Hadsell
%F pmlr-v123-scheller20a
%I PMLR
%P 67--76
%U https://proceedings.mlr.press/v123/scheller20a.html
%V 123
%X Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.

APA

Scheller, C., Schraner, Y. & Vogel, M. (2020). Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft. Proceedings of the NeurIPS 2019 Competition and Demonstration Track, in Proceedings of Machine Learning Research 123:67-76. Available from https://proceedings.mlr.press/v123/scheller20a.html.