Yuan Meng

Senior SDE - AI Engine Architecture Team

Advanced Micro Devices, Inc. (AMD)

Biography

Yuan Meng is an AI/ML Performance Software Engineer at AMD, where she works on optimizing the performance of DNN workloads and drives the architecture of next-generation AI Engines. Before joining AMD, she obtained her Ph.D. in Computer Engineering from the University of Southern California under the advisement of Professor Viktor K. Prasanna. Her research interests are accelerating computationally intensive algorithms on heterogeneous computing platforms and developing portable software and libraries for them. Her thesis focuses on accelerating Reinforcement Learning algorithms and Deep Learning models on emerging infrastructures, including shared-memory, data-parallel (CPU, GPU), and spatial (FPGA) architectures.

Interests
  • Heterogeneous Computing
  • Artificial Intelligence
  • FPGA
  • AI Engines, NPUs
  • Parallel Programming
Education
  • PhD in Computer Engineering, 2024

    University of Southern California

  • BSc in Electrical and Computer Engineering, 2019

    Rensselaer Polytechnic Institute

Recent Publications

(2023). Accelerating Deep Neural Network guided MCTS using Adaptive Parallelism. In ACM/IA3 (SC'23 Workshop).

(2022). Accelerator Design and Exploration for Deformable Convolution Networks. In IEEE/SiPS.

(2022). Accelerating Monte-Carlo Tree Search on CPU-FPGA Heterogeneous Platform. In IEEE/FPL.

(2022). FPGA Acceleration of Deep Reinforcement Learning Using On-chip Replay Management. In ACM/CF.

(2021). PPOAccel: A High-Throughput Acceleration Framework for Proximal Policy Optimization. In IEEE/TPDS.

(2021). FGYM: Toolkit for Benchmarking FPGA based Reinforcement Learning Algorithms. In IEEE/FPL.

(2021). Dynamap: Dynamic Algorithm Mapping Framework for Low Latency CNN Inference. In ACM/FPGA.

(2021). How to Avoid Zero-spacing in Fractionally-Strided Convolution? A Hardware-Algorithm Co-design Methodology. In IEEE/HiPC.

(2020). How to Efficiently Train Your AI Agent? Characterizing and Evaluating Deep Reinforcement Learning on Heterogeneous Platforms. In IEEE/HPEC.

(2020). QTAccel: A Generic FPGA based Design for Q-Table based Reinforcement Learning Accelerators. In IEEE/IPDPSW.

(2020). Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms. In IEEE/FCCM.

News

Academic Service
I am serving as the Publications Chair for FCCM 2023
Academic Service
I am serving as the Proceedings Chair for HiPC 2022 and 2023
Scholarship Nomination
I was selected as one of the Ming Hsieh Ph.D. Scholar finalists!
Best Paper Award
Our paper “FPGA Acceleration of Deep Reinforcement Learning Using On-Chip Replay Management” received the Best Paper Award at the 2022 ACM International Conference on Computing Frontiers!
Outstanding Student Paper Award
Our paper “How to Efficiently Train Your AI Agent? Characterizing and Evaluating Deep Reinforcement Learning on Heterogeneous Platforms” received the Outstanding Student Paper Award at the 2020 IEEE High Performance Extreme Computing Virtual Conference!

Experience

University of Southern California
Graduate Teaching Assistant
January 2020 – December 2021, California

Taught courses:

  • Parallel and Distributed Computing
  • Accelerated Computing using FPGAs
  • Parallel Programming
Rensselaer Polytechnic Institute
Undergraduate Teaching Assistant
January 2017 – January 2019, Troy, New York

Taught courses:

  • Embedded Control
  • Foundations of Computer Science
Hasbro, Inc.
Electronics Engineer (Intern)
January 2018 – June 2018, California
Prototyping for animatronics and games; research on embedded voice recognition and computer vision applications in toys.

Skills

Parallel Programming

Pthreads, OpenMP, MPI, SYCL

Accelerator Design

High-Level Synthesis, OpenCL

Embedded Prototyping

Arduino, STM32 microcontrollers, Raspberry Pi
