FPGA Acceleration of Deep Reinforcement Learning Using On-chip Replay Management

Yuan Meng, Zhang Chi, Viktor Prasanna

May, 2022

DRL System

Abstract

A major bottleneck in parallelizing deep reinforcement learning (DRL) is the high latency of the operations that update the Prioritized Replay Buffer on the CPU. The low arithmetic intensity of these operations leads to severe under-utilization of the SIMT computation power of GPUs. In this work, we propose a high-throughput on-chip accelerator for the Prioritized Replay Buffer and learner that efficiently allocates computation and memory resources to saturate the FPGA's computation power. Our design features hardware pipelining on the FPGA such that the latency of replay operations is completely hidden.
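To make the replay operations mentioned in the abstract concrete, here is a minimal software sketch of a prioritized replay index built on a sum tree, a data structure commonly used for prioritized experience replay. The abstract does not describe the data structure used in the accelerator, so the sum tree, the class name `SumTree`, and its methods are illustrative assumptions and not the paper's design.

```python
class SumTree:
    """Hypothetical sum tree: leaves hold priorities, inner nodes hold sums."""

    def __init__(self, capacity):
        # Assumption: capacity is a power of two, which keeps indexing simple.
        self.capacity = capacity
        # nodes[1] is the root; leaves occupy nodes[capacity : 2 * capacity].
        self.nodes = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index`, then refresh sums up to the root."""
        pos = index + self.capacity
        self.nodes[pos] = priority
        pos //= 2
        while pos >= 1:
            self.nodes[pos] = self.nodes[2 * pos] + self.nodes[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities, stored at the root."""
        return self.nodes[1]

    def sample(self, value):
        """Return the leaf whose cumulative priority range contains `value`."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.nodes[left]:
                pos = left
            else:
                value -= self.nodes[left]
                pos = left + 1
        return pos - self.capacity


tree = SumTree(4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)
```

In this sketch, each update or sample walks one root-to-leaf path and performs a single addition or comparison per level, so the work is dominated by dependent memory accesses and not by arithmetic. That is one way to picture the low arithmetic intensity the abstract refers to.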

Type

Conference paper

Publication

In Proceedings of the 19th ACM International Conference on Computing Frontiers

Yuan Meng

Senior SDE - AI Engine Architecture Team