Workflow Scheduling Strategy for Reasoning Task of Autonomous Driving
LIN Kai¹, LU Yu¹, CHEN Xing², LIN Bing¹
¹ College of Physics and Energy, Fujian Normal University, Fuzhou 350117, China
² College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Abstract: At present, the key problem in scheduling reasoning tasks for autonomous driving is how to schedule real-time reasoning tasks, within each time slot, onto the appropriate processing devices so that they finish within their tolerable time constraints. Because the number of edge nodes and the reasoning tasks vary from one time slot to the next, we design a reinforcement-learning-based workflow scheduling strategy for the edge environment. First, a workflow scheduling algorithm for reasoning tasks computes the task completion times. Second, Q-learning based on simulated annealing (SA-QL) is used to minimize the completion time of the reasoning tasks. Finally, reinforcement learning based on simulated annealing (SA-RL) and particle swarm optimization (PSO) are compared in terms of feasibility, convergence, effectiveness, and exploration. The experimental results show that both SA-RL and PSO are feasible and effective, that the single-step temporal-difference algorithm TD(0) explores more strongly, and that the multi-step temporal-difference algorithm TD(λ) converges better.
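The abstract names SA-QL without detailing it. The sketch below shows one common way to combine Q-learning with simulated annealing: Metropolis-criterion action selection under a geometrically cooled temperature, with the workflow completion time (makespan) computed by simple list scheduling. Everything here is a hypothetical illustration, not the authors' implementation: the four-task diamond workflow (`WORK`, `PREDS`), the device speeds (`SPEED`), the state encoding, the reward (negative increase in makespan, so an episode's return equals the negative completion time), the omission of data-transfer delays, and all hyperparameters are assumptions.

```python
import math
import random
from itertools import product

# Hypothetical workflow: task workloads and predecessor lists (a small diamond DAG).
WORK = [4.0, 2.0, 3.0, 2.0]
PREDS = [[], [0], [0], [1, 2]]
# Hypothetical processing speeds of the available devices (e.g. edge nodes).
SPEED = [1.0, 2.0]


def step(state, node):
    """Place the next task (in topological order) on `node`; return (next_state, reward)."""
    i, ready, finish = state
    # A task starts when its device is free and all its predecessors have finished.
    start = max([ready[node]] + [finish[p] for p in PREDS[i]])
    end = start + WORK[i] / SPEED[node]
    ready = ready[:node] + (end,) + ready[node + 1:]
    old_makespan = max(finish, default=0.0)
    finish = finish + (end,)
    # Reward = -(increase in makespan), so the episode return equals -makespan.
    return (i + 1, ready, finish), old_makespan - max(finish)


def initial_state():
    return (0, (0.0,) * len(SPEED), ())


def makespan(assignment):
    """Workflow completion time for a full task-to-node assignment."""
    state = initial_state()
    for node in assignment:
        state, _ = step(state, node)
    return max(state[2])


def metropolis_action(q, state, temp, rng):
    """Simulated-annealing action choice: accept a random action via the Metropolis rule."""
    actions = range(len(SPEED))
    greedy = max(actions, key=lambda a: q.get((state, a), 0.0))
    candidate = rng.choice(actions)
    delta = q.get((state, candidate), 0.0) - q.get((state, greedy), 0.0)
    if delta >= 0 or rng.random() < math.exp(delta / temp):
        return candidate
    return greedy


def sa_q_learning(episodes=2000, alpha=0.5, gamma=1.0, temp=10.0,
                  decay=0.995, temp_min=0.01, seed=0):
    rng = random.Random(seed)
    q = {}  # tabular Q-values keyed by (state, action); missing entries read as 0
    n = len(WORK)
    for _ in range(episodes):
        state = initial_state()
        while state[0] < n:
            a = metropolis_action(q, state, temp, rng)
            nxt, r = step(state, a)
            # Terminal states have no successor actions, so their value is 0.
            future = 0.0 if nxt[0] == n else max(
                q.get((nxt, b), 0.0) for b in range(len(SPEED)))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (r + gamma * future - old)  # one-step Q update
            state = nxt
        temp = max(temp * decay, temp_min)  # geometric annealing schedule
    # Extract the greedy assignment from the learned Q-table.
    state, assignment = initial_state(), []
    while state[0] < n:
        a = max(range(len(SPEED)), key=lambda b: q.get((state, b), 0.0))
        assignment.append(a)
        state, _ = step(state, a)
    return assignment


def brute_force_optimum():
    """Exhaustive search over all assignments, feasible only for tiny workflows."""
    return min(makespan(a) for a in product(range(len(SPEED)), repeat=len(WORK)))


if __name__ == "__main__":
    best = sa_q_learning()
    print(best, makespan(best), brute_force_optimum())
```

On an instance this small, the learned greedy assignment can be checked against the exhaustive search in `brute_force_optimum`. The update above uses a one-step (TD(0)-style) Q-learning target; a TD(λ) variant such as Watkins's Q(λ) would instead spread each TD error over recently visited state-action pairs through eligibility traces.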