Comparative Analysis of DRL for UAVs: Achieving Superior Performance with ETGL-DDPG

شاهمیر, محمدباقر; دهقانی, فرشته

doi:10.22052/scj.2026.258546.1338

Articles in Press

Article type: Research article

Authors

Faculty of Electrical and Computer Engineering, Department of Artificial Intelligence, University of Kashan, Kashan, Iran.

Abstract

Precise and robust quadcopter trajectory tracking is challenged by non-linear dynamics, cross-axis coupling, and external aerodynamic disturbances. Classical control paradigms suffice under nominal conditions, but environmental uncertainties often degrade their reliability. Deep reinforcement learning (DRL) offers a model-free alternative; however, standard continuous actor-critic methods differ widely in optimization stability, sample efficiency, and computational overhead. To address these limitations, this paper presents a comparative evaluation of five continuous DRL frameworks: DDPG, PPO, SAC, SD3, and our proposed ETGL-DDPG architecture, benchmarked within a high-fidelity, six-degrees-of-freedom quadcopter simulation. All controllers are implemented under identical environmental constraints and directly modulate continuous motor velocities, using a unified quadratic-linear reward function designed through a structured grid search. We analyze performance across multiple independent seeds using cumulative reward trajectories, policy variance envelopes, convergence speed, and 3D flight paths. Experimental results show that PPO stabilizes early updates but is sample-inefficient, while SAC reaches high rewards but exhibits steady-state tracking oscillations. In contrast, the proposed ETGL-DDPG framework outperforms the baselines on every metric evaluated. Driven by a Goal-Conditioned Dual Replay Buffer and an adaptive longest-step return propagation mechanism, ETGL-DDPG avoids catastrophic policy collapse, maintains tight variance bounds, and achieves a final task success rate of $98.4\%$ together with a tracking Root Mean Square Error (RMSE) of $0.206\text{ m}$. Finally, hardware latency profiles confirm that the framework preserves real-time operational efficiency, supporting its viability for resource-constrained onboard flight deployment.
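For readers unfamiliar with the reported metrics, the following is a minimal sketch of how a tracking RMSE, a task success rate, and a per-seed summary could be computed. It is not taken from the paper: the function names, the use of a Euclidean position error, and the RMSE-threshold success criterion are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def tracking_rmse(positions, references):
    """RMSE of the Euclidean position error along one trajectory.

    Both arguments are equal-length sequences of (x, y, z) tuples in metres.
    Using the Euclidean error norm is an assumption, not the paper's definition.
    """
    if not positions or len(positions) != len(references):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((p - r) ** 2 for p, r in zip(pos, ref))
        for pos, ref in zip(positions, references)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def success_rate(episode_rmses, threshold_m):
    """Fraction of episodes whose RMSE is at or below threshold_m.

    The threshold-based success criterion is hypothetical.
    """
    if not episode_rmses:
        raise ValueError("need at least one episode")
    return sum(1 for e in episode_rmses if e <= threshold_m) / len(episode_rmses)


def seed_summary(per_seed_values):
    """Mean and population standard deviation of a metric across seeds."""
    return mean(per_seed_values), pstdev(per_seed_values)
```

Reporting the mean together with the spread across seeds, as `seed_summary` does, is one common way to express the kind of variance envelope mentioned in the abstract.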

Keywords

Subjects

Artificial Intelligence

Article title [English]