Abstract: Deep reinforcement learning leverages deep neural networks to make significant progress on complex tasks. However, policy inference is computationally costly, which limits its practicality; reducing inference cost is therefore an important challenge for deploying this technology. This article shows that not all states in a target task are equally difficult to make decisions for. Inspired by this observation, this work proposes an adaptive policy inference framework that achieves low policy inference cost with quality guarantees. A dynamic policy training algorithm is proposed: first, to accelerate inference at easy states, sub-policy networks ordered by capacity are generated; second, a meta-policy is trained to dynamically choose a suitable sub-policy network according to state difficulty. To improve the efficiency of meta-policy inference, the framework shares neural network parameters between the sub-policy and meta-policy networks and trains the meta-policy under an extended Markov Decision Process. Extensive experiments conducted in Gym show that the adaptive inference framework reduces FLOPs by a factor of 3.4 while maintaining similar quality.
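To make the adaptive inference idea concrete, the following is a minimal PyTorch sketch, not the paper's actual architecture: the class name AdaptivePolicy, the layer sizes, and the argmax selection rule are illustrative assumptions. It shows a shared encoder feeding both a meta-policy head and several sub-policy heads of increasing capacity, with only the selected sub-policy evaluated per state.

```python
import torch
import torch.nn as nn

class AdaptivePolicy(nn.Module):
    """Illustrative sketch: a meta-policy head selects among sub-policy
    heads ordered by capacity, sharing a common feature encoder."""

    def __init__(self, obs_dim, act_dim, hidden_sizes=(32, 64, 128)):
        super().__init__()
        # Shared encoder: its features feed both the meta-policy and the sub-policies.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        # Sub-policy heads ordered by capacity (small -> large).
        self.sub_policies = nn.ModuleList([
            nn.Sequential(nn.Linear(64, h), nn.ReLU(), nn.Linear(h, act_dim))
            for h in hidden_sizes
        ])
        # Meta-policy head: scores each sub-policy for the current state.
        self.meta_head = nn.Linear(64, len(hidden_sizes))

    def forward(self, obs):
        feat = self.encoder(obs)
        # Choose the sub-policy the meta-policy deems sufficient for this state.
        idx = torch.argmax(self.meta_head(feat), dim=-1).item()
        # Only the selected head runs, so easy states incur the cheap head's cost.
        return self.sub_policies[idx](feat), idx

# Example usage (hypothetical dimensions):
policy = AdaptivePolicy(obs_dim=4, act_dim=2)
action_logits, chosen = policy(torch.randn(4))
```

In this sketch the savings come from evaluating a single, state-appropriate head at inference time; how the meta-policy is trained under the extended MDP is described in the body of the paper and is not reproduced here.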