The efficacy of Convolutional Neural Networks (CNNs) has been proven in a wide range of machine learning applications. However, the high computational complexity of CNNs often hinders their adoption in real-time and low-power applications. FPGAs are poised to play a significant role in high-performance, low-energy computation of CNNs in both mobile (e.g., UAVs or self-driving cars) and cloud computing domains. However, implementing an effective and efficient CNN system on FPGAs is challenging. To address these challenges, we propose a novel automated toolchain called Open-DNN. Our toolchain takes trained CNN models specified in either Caffe or annotated TensorFlow as input, performs a set of transformations, and maps the model to a cloud-based FPGA. Open-DNN can significantly improve the design productivity of neural networks on FPGAs while also satisfying emerging computational and energy-efficiency requirements. Our design presents an alternative to other cloud-based options (e.g., CPUs or GPUs), offering flexibility, low power and energy consumption, and high performance. Open-DNN also provides additional features, such as support for quantized network models with fixed-point representation and balancing of on-chip resource usage during implementation.
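To illustrate what a quantized network model with fixed-point representation involves, the following is a minimal sketch and not Open-DNN's actual quantizer. The function names, the 8-bit word length, and the 4 fractional bits are assumptions chosen for illustration only.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=4):
    """Map floats to signed fixed-point integer codes.

    Hypothetical helper: total_bits and frac_bits are illustrative
    defaults, not values taken from Open-DNN.
    """
    scale = 1 << frac_bits                  # 2^frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))        # most negative code
    q_max = (1 << (total_bits - 1)) - 1     # most positive code
    codes = []
    for v in values:
        code = int(round(v * scale))        # round to the nearest step
        code = max(q_min, min(q_max, code)) # saturate instead of wrapping
        codes.append(code)
    return codes


def dequantize_fixed_point(codes, frac_bits=4):
    """Convert fixed-point integer codes back to floats."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


weights = [0.5, -1.25, 100.0]
codes = quantize_fixed_point(weights)
restored = dequantize_fixed_point(codes)
```

Values that fit the chosen format survive the round trip exactly, while out-of-range values saturate at the largest representable code. Narrower formats reduce on-chip storage and arithmetic cost at the price of precision.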
Technology Features, Specifications and Advantages
Specifically, the Open-DNN toolchain integrates numerical techniques into an automated framework for analyzing, generating, and implementing trained neural network models on cloud-based FPGA platforms, taking advantage of the High-Level Synthesis (HLS) design methodology.
Our core features include:
• A fully hardware-friendly, high-level-language template library containing all fundamental CNN functionalities.
• An automated generation flow for CNN implementation on cloud-based FPGAs with Caffe and TensorFlow input.
• An accurate model of the accelerator template, along with model-based system optimization to generate an optimal system configuration.
• Full software stack generation, together with the accelerator IP system construction, which provides a system level solution for the input network models.
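The model-based system optimization listed above can be pictured as a small design-space search: an analytical model estimates the cost of each candidate configuration, and the best one that fits the resource budget is kept. The sketch below is illustrative only. The cycle model, the one-DSP-per-multiply-accumulate assumption, the parameter names `tm` and `tn`, and the candidate values are all hypothetical and are not taken from Open-DNN.

```python
from itertools import product
from math import ceil


def estimate_cycles(layer, tm, tn):
    """Rough cycle count for one convolution layer.

    layer = (out_channels, in_channels, out_height, out_width, kernel_size)
    tm, tn = assumed parallelism over output and input channels.
    """
    out_ch, in_ch, out_h, out_w, k = layer
    return ceil(out_ch / tm) * ceil(in_ch / tn) * out_h * out_w * k * k


def explore(layers, dsp_budget, candidates=(1, 2, 4, 8, 16, 32, 64)):
    """Return (cycles, tm, tn) with the fewest estimated cycles, or None."""
    best = None
    for tm, tn in product(candidates, repeat=2):
        dsp_used = tm * tn  # assumption: one DSP block per parallel MAC
        if dsp_used > dsp_budget:
            continue        # configuration does not fit on the device
        cycles = sum(estimate_cycles(layer, tm, tn) for layer in layers)
        if best is None or cycles < best[0]:
            best = (cycles, tm, tn)
    return best


# One hypothetical layer: 64 output channels, 3 input channels,
# a 32x32 output feature map, and a 3x3 kernel.
best = explore([(64, 3, 32, 32, 3)], dsp_budget=64)
```

A real flow would also model memory bandwidth, on-chip buffer sizes, and other resource types; the point here is only that an analytical model lets the search run without synthesizing each candidate.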
Target applications include:
• High-throughput, low-energy image and video analysis tasks with neural networks as the core functionality, such as high-accuracy, large-scale facial recognition and surveillance data processing.
• Most current Artificial Intelligence applications with CNNs as the major functionality, such as intelligent traffic control and video-based anomaly detection systems.
The Open-DNN toolchain is designed to provide high computational capability and usability, and to fully exploit the computational potential of FPGA devices. Experimental results demonstrate usability and flexibility comparable to CPU and GPU implementations, along with strong result quality.