Manav Mishra, Hritik Bana, Saswata Sarkar, Sujeevraja Sanjeevi, PB Sujit, and Kaarthik Sundar
This article presents a deep reinforcement learning-based approach to tackle a persistent monitoring mission requiring a single unmanned aerial vehicle initially stationed at a depot with fuel or time-of-flight constraints to repeatedly visit a set of targets with equal priority. The objective of the problem is to determine an optimal sequence of visits to the targets that minimizes the maximum time elapsed between successive visits to any target while ensuring that the vehicle never runs out of fuel or charge. We present a deep reinforcement learning algorithm to solve this problem and present the results of numerical experiments that corroborate the effectiveness of this approach in comparison with greedy heuristics and min-max latency TSP solutions.