File(s) under permanent embargo
Resource-constrained FPGA implementation of YOLOv2
journal contributionposted on 2022-10-28, 03:09 authored by Z Zhang, M A Parvez Mahmud, Abbas KouzaniAbbas Kouzani
Progress is being made to deploy convolutional neural networks (CNNs) into the Internet of Things (IoT) edge devices for handling image analysis tasks locally. These tasks require low-latency and low-power computation on low-resource IoT edge devices. However, CNN-based algorithms, e.g. YOLOv2, typically contain millions of parameters. With the increase in the CNN’s depth, filters are increased by a power of two. A large number of filters and operations could lead to frequent off-chip memory access that affects the operation speed and power consumption of the device. Therefore, it is a challenge to map a deep CNN into a low-resource edge IoT platform. To address this challenge, we present a resource-constrained Field-Programmable Gate Array implementation of YOLOv2 with optimized data transfer and computing efficiency. Firstly, a scalable cross-layer dataflow strategy is proposed which allows on-chip data transfer between different types of layers, and offers flexible off-chip data transfer when the intermediate results are unaffordable on-chip. Next, a filter-level data-reuse dataflow strategy together with a filter-level parallel multiply-accumulate operation computing processing elements array is developed. Finally, multi-level sliding buffers are developed to optimize the convolutional computing loop and reuse the input feature maps and weights. Experiment results show that our implementation has achieved 4.8 W of low-power consumption for executing YOLOv2, an 8-bit deep CNN containing 50.6 MB weights, using low-resource of 8.3 Mbits on-chip memory. The throughput and power efficiency are 100.33 GOP/s and 20.90 GOP/s/W, respectively.