Deep Learning Using Zynq US+ FPGA

Deep learning algorithms are becoming increasingly popular for IoT applications at the edge because they achieve human-level accuracy in object recognition and classification. Use cases include, but are not limited to, face detection and recognition in security cameras, video classification, speech recognition, real-time multiple-object tracking, character recognition, gesture recognition, financial forecasting and medical diagnostic systems.

Deep learning algorithms, a subset of machine learning, are inspired by the neural networks of the human brain. Applying these biological concepts to machine learning has proved highly effective at solving learning problems that were not possible before. In particular, Convolutional Neural Networks (CNNs) have delivered fast and reliable image detection and recognition for computer vision applications. Stacking many such layers produces the deep neural network that serves as the model in deep learning.

As the name suggests, implementing these algorithms in IoT edge devices requires processing systems that can run such computationally heavy multi-layer networks at low power consumption. CPUs fail to achieve the desired performance because these workloads demand intensive compute and memory bandwidth. FPGAs, ASICs and GPUs are therefore in demand as the edge processing core. Thanks to their reconfigurability, parallelism and energy efficiency, FPGAs have demonstrated better performance in computationally intensive image and voice recognition applications. We have analyzed GPUs vs. FPGAs for machine learning applications in more detail in a separate article.

Developing a deep learning application on an FPGA might sound difficult. At Aldec, we have paved the path by providing ready-to-use, FPGA-based object detection solutions using CNNs, so our customers can kick off their projects fast.
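To make the CNN building block concrete, the snippet below is a minimal NumPy sketch of the sliding-window multiply-accumulate that a convolutional layer performs, and that accelerators like the DPU implement in hardware. It is illustrative only; the function name and the edge-detector kernel are our own choices, not part of Aldec's designs.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution: the core operation of a CNN layer
    (shown here without padding, stride, or multiple channels)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Multiply-accumulate over the receptive field: this inner
            # loop is what FPGA fabric parallelizes so effectively.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A simple vertical-edge detector applied to a tiny dark/bright image.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
print(conv2d(image, kernel))  # strong response at the dark-to-bright edge
```

A real CNN stacks many such convolutions with non-linearities and pooling in between; it is the sheer number of these multiply-accumulates that makes dedicated hardware acceleration attractive.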
In these applications, Deep Learning Processing Units (DPUs) are implemented in the FPGA fabric for acceleration, which yields 45 fps for a 3-channel input. As a rule, the bigger the FPGA, the more DPU units we can add, and the better the resulting performance. The TySOM-3A-ZU19EG embedded prototyping board provides 1,143K logic cells, enough to implement multiple (1-3) DPUs, which is critical for multi-channel processing applications. The image below shows the structure of the demo and the results on the output screen.

TySOM-3A-ZU19EG Embedded Prototyping Board

The user can feed this reference design with either a live video camera or pre-recorded video. To connect external cameras, Aldec provides the FMC-ADAS card, which features five FPD-Link III connections with HSD connectors. The card can also be used as an extension to Aldec's ADAS application, which has several camera inputs. Pre-recorded video can be supplied to the board from a micro SD card, over SATA, or from the cloud.

In addition to the object detection application, Aldec provides an SDx platform for face detection, gesture detection, pedestrian detection and segmentation. These demos are provided as reference designs to customers of our Zynq-based prototyping boards. They have been tested with different inputs: a USB camera, the FMC-ADAS card connected to a Blue Eagle camera over FPD-Link III, and pre-recorded videos stored on an SD card. The following table summarizes the performance.

Application            Input      TySOM-3A        TySOM-3
Face Detection         FMC-ADAS   Up to 30 FPS    Up to 30 FPS
Gesture Detection      Video      Up to 18 FPS    Up to 17 FPS
                       FMC-ADAS   Up to 30 FPS    Up to 30 FPS
Pedestrian Detection   Video      Up to 25 FPS    Up to 25 FPS
                       FMC-ADAS   Up to 30 FPS    Up to 30 FPS
Segmentation           Video      Up to 24 FPS    Up to 24 FPS
                       FMC-ADAS   Up to 10 FPS    Up to 10 FPS
Traffic Detection      Video      Up to 36 FPS    Up to 35 FPS
                       FMC-ADAS   Up to 30 FPS    Up to 30 FPS

Table 1.
DNN-Based Design Performance Analysis on TySOM-3A and TySOM-3

Main Features
- Includes either the TySOM-3A-ZU19EG or the TySOM-3-ZU7EV board
- Includes reference designs and instructions on how to create the DNN design
- ADAS-based solution includes the FMC-ADAS card and a Blue Eagle camera with a 192-degree wide-angle lens
- Up to 40 fps performance on each video channel
- Prebuilt, ready-to-use files pre-loaded onto the SD card

Solution Contents
- Face detection, gesture detection, pedestrian detection, segmentation and traffic detection reference designs using DNN
- SDx platform for all the designs on TySOM-3A-ZU19EG and TySOM-3-ZU7EV
- Prebuilt PetaLinux embedded OS to run the designs
- Instructions and source files to run the designs