{"id":243,"date":"2020-04-27T08:33:32","date_gmt":"2020-04-27T06:33:32","guid":{"rendered":"http:\/\/blog.zhaw.ch\/high-performance\/?p=243"},"modified":"2020-04-27T08:36:30","modified_gmt":"2020-04-27T06:36:30","slug":"znnn-the-framework-to-port-neural-networks-to-fpga","status":"publish","type":"post","link":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/","title":{"rendered":"ZNNN the Framework to Port Neural Networks to FPGA"},"content":{"rendered":"\n<p>BY TOBIAS WELTI AND HANS-JOACHIM GELKE<\/p>\n\n\n\n<p>Due to their hardware architecture, Field Programmable Gate Arrays (FPGAs) are optimally suited for the execution of machine learning algorithms. These algorithms require the calculation of millions or even billions of multiplications for each input. To successfully accelerate a neural network, parallel execution of multiplication is the key. The obvious suggestion for parallel execution is a Graphics Processing Unit (GPU), offering hundreds of execution cores. For years, GPU vendors have been adapting the capabilities of their GPUs to meet the demand for narrow integer and floating-point data types used in AI. 
But still, a GPU will execute one Neural Network (NN) layer after the other, with data transfers between computation cores and memory.<\/p>\n\n\n\n<p>Implementing Neural Networks in FPGAs has several advantages:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Flexible bit widths for both integer and fixed-point data types.<\/li><li>Large numbers of scalable hardware multiplier cores.<\/li><li>Flexibility due to tightly coupled memory blocks with wide parallel interfaces, allowing access to vast numbers of data points in each clock cycle.<\/li><\/ol>\n\n\n\n<p>Considering the previous points, the FPGA clearly provides all the resources required for highly parallel execution of NN algorithms.<\/p>\n\n\n\n<p><strong>Existing\nframeworks<\/strong><\/p>\n\n\n\n<p>Unfortunately,\nthe act of porting a trained network to HDL code for implementation in the FPGA\nis not trivial. FPGA vendors have started to provide frameworks for running NNs\nin their devices. These include HDL-coded NN-coprocessor cores as IP blocks and\nmatching compilers to convert a trained NN into a binary executable which will\nrun on the coprocessor. However, these frameworks are based on a specific\nsoftware library and therefore require a processor core running an operating\nsystem and controlling software. This means that the NN input data and network\nparameters are transferred from the software to the coprocessor in order to\ncalculate the output of the NN. The output values are then transferred back to\nthe software for interpretation.<\/p>\n\n\n\n<p>This is\nsubstantial overhead, especially if the input data is sampled or preprocessed\nin the FPGA fabric. 
It would be preferable to implement the neural network entirely in the FPGA fabric, so that it can run independently of software.<\/p>\n\n\n\n<p><strong>ZHAW Native Neural Network<\/strong><\/p>\n\n\n\n<p>The ZHAW Native Neural Network (ZNNN) framework is aimed at the following goals:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Input may be received directly from the FPGA fabric<\/li><li>Inference independent of CPU and software<\/li><li>Minimal latency<\/li><li>Maximal throughput<\/li><li>No access to DRAM required<\/li><\/ul>\n\n\n\n<p>With these goals in mind, we clearly trade flexibility for performance and simplicity. The NN is implemented as a rigid block, designed for a single NN application. To allow for minimum latency, we use dedicated multipliers for each neuron, and each layer has its own memory block for the weights and biases. Ping-pong buffers allow one layer to process one input vector while the next input vector is being received. With this structure, pipelining delays can be minimized to the execution time of the largest layer.<\/p>\n\n\n\n<p>Our framework takes as input a structured text file with a description of the NN, including the number of inputs, data-bit widths, fixed-point precision, the number of neurons per layer for fully connected layers, the number of filters and kernel size for convolutional layers, and max-pooling and flattening layers. 
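As an illustration, a description file of this kind might look roughly as follows; the field names and the INI-like syntax here are purely hypothetical, chosen only to mirror the parameters listed above:

```ini
; hypothetical ZNNN network description (illustrative only)
[network]
inputs = 64
data_bit_width = 16          ; width of the data words in bits
fraction_bits = 8            ; fixed-point precision

[layer_1]
type = conv1d
filters = 8
kernel_size = 3

[layer_2]
type = maxpool

[layer_3]
type = flatten

[layer_4]
type = dense
neurons = 10
```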
From this configuration file and a training and verification data set, it will generate:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>A trained NN model<\/li><li>A behavioural model written in the C programming language to generate a data set for verifying the VHDL code in simulation<\/li><li>A test bench for verification<\/li><li>The VHDL code of the NN, ready for instantiation in your design.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"369\" src=\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png\" alt=\"\" class=\"wp-image-244\" srcset=\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png 1024w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-300x108.png 300w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-768x277.png 768w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-676x244.png 676w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow.png 1102w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Dedicated multipliers for the neurons use a significant share of the available resources, so larger networks will require considerably larger devices. This will not be suitable for all NN applications. 
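To see why dedicated multipliers dominate resource usage, consider a rough sizing sketch. It assumes one dedicated multiplier per weight, i.e. per (input, neuron) pair, of each fully connected layer; the layer sizes are invented for illustration, and the model ignores convolutional layers and any multiplier sharing:

```python
def dense_layer_multipliers(n_inputs: int, n_neurons: int) -> int:
    """Multipliers for a fully parallel dense layer: one per weight."""
    return n_inputs * n_neurons

# Hypothetical fully connected network: 64 inputs -> 32 neurons -> 10 neurons.
layers = [(64, 32), (32, 10)]
total = sum(dense_layer_multipliers(i, n) for i, n in layers)
print(total)  # 64*32 + 32*10 = 2368 multipliers
```

Even this small made-up network already needs over two thousand multipliers under that assumption, which matches the observation that larger networks quickly call for considerably larger devices.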
Our ZNNN framework is optimally suited for applications such as industrial machine surveillance, where only small networks will meet the latency requirements while still achieving the required accuracy.<\/p>\n\n\n\n<p><strong>Performance<\/strong><\/p>\n\n\n\n<p>A direct comparison of ZNNN with the Deep Learning Processing Unit (DPU) coprocessor from Xilinx shows that both have their justification, depending on the application at hand:<\/p>\n\n\n\n<p>If you need to run multiple different neural networks on your FPGA with fair performance, you should go with the Xilinx solution. The DPU can process different NNs on the same implementation but is restricted to software-controlled operation.<\/p>\n\n\n\n<p>If performance is essential and your application needs a single neural network, you should use the ZNNN.<\/p>\n\n\n\n<p>The amount of resources used by the different solutions in a Xilinx Zynq UltraScale+ EG9 device is shown in the following table. The &#8216;Xilinx DPU&#8217; always uses roughly the same amount of resources, depending only on its configuration. It can process various neural networks, including very large ones, with a trade-off in throughput and processing time (latency). The resource requirements of ZNNN strongly depend on the size and type of NN you implement. &#8216;ZNNN MNIST&#8217; is a NN with only dense layers, trained for the well-known MNIST example, which recognizes handwritten digits. &#8216;ZNNN CONV&#8217; is a NN using 1D-convolutional layers for non-linear signal processing in an industrial application, which accepts 64 data points as input. &#8216;ZNNN VIS&#8217; is a dense network with 2304 inputs and a single output, used for an industrial application. 
Owing to the large number of inputs, the number of multipliers required is very large.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"\"><thead><tr><td class=\"has-text-align-center\" data-align=\"center\">NN<\/td><td class=\"has-text-align-center\" data-align=\"center\">LUT<\/td><td class=\"has-text-align-center\" data-align=\"center\">BRAM<\/td><td class=\"has-text-align-center\" data-align=\"center\">DSP<\/td><td class=\"has-text-align-center\" data-align=\"center\">Throughput (FPS)<\/td><\/tr><\/thead><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\">Xilinx DPU<\/td><td class=\"has-text-align-center\" data-align=\"center\">47 k (17%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">132 (14%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">326 (13%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">2.5 k<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">ZNNN MNIST<\/td><td class=\"has-text-align-center\" data-align=\"center\">34 k (12%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">182 (20%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">947 (37%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">8510 k<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">ZNNN CONV<\/td><td class=\"has-text-align-center\" data-align=\"center\">20 k (7%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">124 (13%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">712 (28%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">291 k<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">ZNNN VIS<\/td><td class=\"has-text-align-center\" data-align=\"center\">87 k (32%)
<\/td><td class=\"has-text-align-center\" data-align=\"center\">182 (20%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">2467 (98%)<\/td><td class=\"has-text-align-center\" data-align=\"center\">4081 k<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The throughput of a NN can be measured as the number of inputs processed per second (FPS). On the Xilinx DPU, the whole NN is processed for one set of input data before the next set can be passed in. Our ZNNN framework implements layer pipelining, meaning that as soon as the first layer has been processed, the next input set can be accepted, as shown in the following figure.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"314\" src=\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/Pipelining-1024x314.png\" alt=\"\" class=\"wp-image-245\" srcset=\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/Pipelining-1024x314.png 1024w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/Pipelining-300x92.png 300w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/Pipelining-768x236.png 768w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/Pipelining-676x207.png 676w, https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/Pipelining.png 1112w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The latency is slightly increased because not all layers have the same processing time. In return, since all the layers are processed in parallel, the delay between two inputs is greatly reduced, allowing more inputs to be processed per second. Because ZNNN includes all the required weight parameters in the design, these don&#8217;t need to be loaded into the FPGA at runtime. 
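The effect of layer pipelining on throughput and latency can be sketched numerically; the per-layer processing times below are hypothetical:

```python
# Hypothetical per-layer processing times in microseconds.
layer_us = [5.0, 12.0, 7.0]

# Without pipelining, a new input can only enter after the whole network finishes.
sequential_period_us = sum(layer_us)          # 24.0 us between inputs

# With layer pipelining, a new input can enter as soon as the slowest layer is done.
pipelined_period_us = max(layer_us)           # 12.0 us between inputs

# Latency rises slightly: every stage advances at the slowest layer's pace.
pipelined_latency_us = len(layer_us) * max(layer_us)  # 36.0 us instead of 24.0 us

print(1e6 / sequential_period_us)  # throughput without pipelining
print(1e6 / pipelined_period_us)   # throughput with pipelining: 2x here
```

With more layers of similar duration, the throughput gap grows roughly with the number of layers.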
This increases the FPS by orders of magnitude in comparison with the Xilinx DPU.<\/p>\n\n\n\n<p><strong>Conclusion<\/strong><\/p>\n\n\n\n<p>Both the power and the cost of ZNNNs become visible in comparison with the DPU: the DPU offers the flexibility to run various NNs on one implementation, including larger NNs like ResNet50. The DPU is controlled by software and therefore requires a CPU running a Linux operating system. ZNNN implementations are ideal for small NNs: they run independently of software, take their input directly from the FPGA fabric and process orders of magnitude faster than the DPU!<\/p>\n\n\n\n<p>The ZNNN framework is suitable for low-latency, high-throughput execution of small convolutional and fully connected NNs. It generates VHDL code for a specific NN implementation in an FPGA without the development overhead of hand-written HDL code and testbenches. The processing performance of the ZNNN is orders of magnitude faster than that of Xilinx&#8217; DPU thanks to a high level of pipelining.<\/p>\n\n\n\n<p>We are aware that the ZNNN implementation can require more FPGA resources than the DPU, but there are industrial applications where this approach is a perfect fit and the achieved performance meets the requirements. With the ZNNN running independently of CPU and software and the input data coming directly from the FPGA fabric, there are in principle no bottlenecks in the design.<\/p>\n\n\n\n<p>Our team will continually improve the ZNNN framework by making trade-offs between resource requirements and performance configurable.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>BY TOBIAS WELTI AND HANS-JOACHIM GELKE Due to their hardware architecture, Field Programmable Gate Arrays (FPGAs) are optimally suited for the execution of machine learning algorithms. These algorithms require the calculation of millions or even billions of multiplications for each input. 
To successfully accelerate a neural network, parallel execution of multiplication is the key. The [&hellip;]<\/p>\n","protected":false},"author":271,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1],"tags":[],"features":[],"class_list":["post-243","post","type-post","status-publish","format-standard","hentry","category-allgemein"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>ZNNN the Framework to Port Neural Networks to FPGA - Embedded High Performance Multimedia Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"ZNNN the Framework to Port Neural Networks to FPGA\" \/>\n<meta property=\"og:description\" content=\"BY TOBIAS WELTI AND HANS-JOACHIM GELKE Due to their hardware architecture, Field Programmable Gate Arrays (FPGAs) are optimally suited for the execution of machine learning algorithms. These algorithms require the calculation of millions or even billions of multiplications for each input. To successfully accelerate a neural network, parallel execution of multiplication is the key. 
The [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/\" \/>\n<meta property=\"og:site_name\" content=\"Embedded High Performance Multimedia Blog\" \/>\n<meta property=\"article:published_time\" content=\"2020-04-27T06:33:32+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-04-27T06:36:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png\" \/>\n<meta name=\"author\" content=\"gelk\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"gelk\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/\"},\"author\":{\"name\":\"gelk\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/#\/schema\/person\/f4fab1587a03110cf79f1bf51f32ebfa\"},\"headline\":\"ZNNN the Framework to Port Neural Networks to 
FPGA\",\"datePublished\":\"2020-04-27T06:33:32+00:00\",\"dateModified\":\"2020-04-27T06:36:30+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/\"},\"wordCount\":1268,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png\",\"articleSection\":[\"Allgemein\"],\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/\",\"url\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/\",\"name\":\"ZNNN the Framework to Port Neural Networks to FPGA - Embedded High Performance Multimedia 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png\",\"datePublished\":\"2020-04-27T06:33:32+00:00\",\"dateModified\":\"2020-04-27T06:36:30+00:00\",\"author\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/#\/schema\/person\/f4fab1587a03110cf79f1bf51f32ebfa\"},\"breadcrumb\":{\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage\",\"url\":\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png\",\"contentUrl\":\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Startseite\",\"item\":\"https:\/\/blog.zhaw.ch\/high-performance\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"ZNNN the Framework to Port Neural Networks to 
FPGA\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/#website\",\"url\":\"https:\/\/blog.zhaw.ch\/high-performance\/\",\"name\":\"Embedded High Performance Multimedia Blog\",\"description\":\"A Blog of the ZHAW Zurich University of Applied Sciences\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.zhaw.ch\/high-performance\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/#\/schema\/person\/f4fab1587a03110cf79f1bf51f32ebfa\",\"name\":\"gelk\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2016\/05\/blog_portrait-e1464078746248.png\",\"url\":\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2016\/05\/blog_portrait-e1464078746248.png\",\"contentUrl\":\"https:\/\/blog.zhaw.ch\/high-performance\/files\/2016\/05\/blog_portrait-e1464078746248.png\",\"caption\":\"gelk\"},\"url\":\"https:\/\/blog.zhaw.ch\/high-performance\/author\/gelk\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"ZNNN the Framework to Port Neural Networks to FPGA - Embedded High Performance Multimedia Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/","og_locale":"en_GB","og_type":"article","og_title":"ZNNN the Framework to Port Neural Networks to FPGA","og_description":"BY TOBIAS WELTI AND HANS-JOACHIM GELKE Due to their hardware architecture, Field Programmable Gate Arrays (FPGAs) are optimally suited for the execution of machine learning algorithms. 
These algorithms require the calculation of millions or even billions of multiplications for each input. To successfully accelerate a neural network, parallel execution of multiplication is the key. The [&hellip;]","og_url":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/","og_site_name":"Embedded High Performance Multimedia Blog","article_published_time":"2020-04-27T06:33:32+00:00","article_modified_time":"2020-04-27T06:36:30+00:00","og_image":[{"url":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png","type":"","width":"","height":""}],"author":"gelk","twitter_card":"summary_large_image","twitter_misc":{"Written by":"gelk","Estimated reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#article","isPartOf":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/"},"author":{"name":"gelk","@id":"https:\/\/blog.zhaw.ch\/high-performance\/#\/schema\/person\/f4fab1587a03110cf79f1bf51f32ebfa"},"headline":"ZNNN the Framework to Port Neural Networks to 
FPGA","datePublished":"2020-04-27T06:33:32+00:00","dateModified":"2020-04-27T06:36:30+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/"},"wordCount":1268,"commentCount":0,"image":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png","articleSection":["Allgemein"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/","url":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/","name":"ZNNN the Framework to Port Neural Networks to FPGA - Embedded High Performance Multimedia 
Blog","isPartOf":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage"},"image":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png","datePublished":"2020-04-27T06:33:32+00:00","dateModified":"2020-04-27T06:36:30+00:00","author":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/#\/schema\/person\/f4fab1587a03110cf79f1bf51f32ebfa"},"breadcrumb":{"@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#primaryimage","url":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png","contentUrl":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2020\/04\/ZNNN_workflow-1024x369.png"},{"@type":"BreadcrumbList","@id":"https:\/\/blog.zhaw.ch\/high-performance\/2020\/04\/27\/znnn-the-framework-to-port-neural-networks-to-fpga\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Startseite","item":"https:\/\/blog.zhaw.ch\/high-performance\/"},{"@type":"ListItem","position":2,"name":"ZNNN the Framework to Port Neural Networks to FPGA"}]},{"@type":"WebSite","@id":"https:\/\/blog.zhaw.ch\/high-performance\/#website","url":"https:\/\/blog.zhaw.ch\/high-performance\/","name":"Embedded High Performance Multimedia Blog","description":"A Blog of the ZHAW Zurich University of Applied 
Sciences","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.zhaw.ch\/high-performance\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/blog.zhaw.ch\/high-performance\/#\/schema\/person\/f4fab1587a03110cf79f1bf51f32ebfa","name":"gelk","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2016\/05\/blog_portrait-e1464078746248.png","url":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2016\/05\/blog_portrait-e1464078746248.png","contentUrl":"https:\/\/blog.zhaw.ch\/high-performance\/files\/2016\/05\/blog_portrait-e1464078746248.png","caption":"gelk"},"url":"https:\/\/blog.zhaw.ch\/high-performance\/author\/gelk\/"}]}},"_links":{"self":[{"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/posts\/243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/users\/271"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/comments?post=243"}],"version-history":[{"count":3,"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/posts\/243\/revisions"}],"predecessor-version":[{"id":248,"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/posts\/243\/revisions\/248"}],"wp:attachment":[{"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/media?parent=243"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/categories?post=243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.zhaw.ch\/high-per
formance\/wp-json\/wp\/v2\/tags?post=243"},{"taxonomy":"features","embeddable":true,"href":"https:\/\/blog.zhaw.ch\/high-performance\/wp-json\/wp\/v2\/features?post=243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}