# Nyuzi: An Open Source GPGPU for Graphics, Enhanced with OpenCL Compiler for Calculations

Jeff Bush<sup>†</sup>, Nima Taherinejad\*, Edwin Willegger\*, Mariusz Wojcik\*, Markus Kessler\*, Johannes Blatnik\*, Ioannis Daktylidis\*, Jonas Ferdig\*, and Daniel Haslauer\*

\* TU Wien, Vienna, Austria. E-mail: nima.taherinejad@tuwien.ac.at

† E-mail: jeffbush001@gmail.com

Many details of Graphics Processing Unit (GPU) hardware have traditionally been kept as trade secrets. This makes researching, studying, and teaching many important aspects of these critical high-performance computing units extremely hard, and in some cases impossible. Open source GPUs facilitate this work. Moreover, in our experience, seeing and working hands-on with the details of a GPU has proven very inspiring and highly educational for students. Lastly, with the rise of wearable and Internet of Things (IoT) devices, open source GPUs can significantly reduce production costs and help smaller companies in the field, in particular start-ups.

Nyuzi is an open source processor designed for highly parallel, computationally intensive tasks and General Purpose Graphics Processing Unit (GPGPU) applications. It was inspired by Intel's Larrabee, although its instruction set and microarchitecture, shown in Figure 1, are substantially different. Among fully open source GPUs with soft IPs, such as [1], [2], Nyuzi provides the most complete tool set. It is the only open source GPGPU with proven support for graphics applications (see, e.g., Figure 2 for sample images rendered by Nyuzi). Moreover, we have recently added OpenCL compilation capabilities, enabling it to perform scientific calculations as well. Nyuzi can hence be used to experiment with microarchitectural and instruction set design trade-offs for both graphics and scientific applications.

The project includes a synthesizable hardware design written in SystemVerilog, an instruction set emulator, an LLVM-based C/C++/OpenCL compiler, software libraries, and tests. It has been implemented and tested on both Altera and Xilinx Field Programmable Gate Arrays (FPGAs). The main project was conceived and is led by Jeff Bush. The original project and its implementation on the Altera platform are available at https://github.com/jbush001/NyuziProcessor.

At TU Wien, we (led by Dr. Nima TaheriNejad) have contributed to the project by porting the design to Xilinx platforms and testing it there, improving the modularity of the design, enhancing certain modules and aspects, and expanding the design for larger, higher-performance implementations. The resulting system, implemented on a Xilinx ZCU102 UltraScale+ board (https://github.com/mkessler001/NyuziProcessor), contains four Nyuzi GPGPU cores in conjunction with a general-purpose Central Processing Unit (CPU) that acts as a host for the graphics processor. The setup can render simple 3D scenes at moderate frame rates and is easily extendable to other tasks.

To the best of our knowledge, existing open source GPUs, such as MIAOW [1] and FGPU [2], support only OpenCL compilation and no graphics applications. In contrast, Nyuzi has so far supported only graphics applications and no OpenCL programs. Our team at TU Wien has therefore recently added the necessary capabilities for OpenCL compilation, enabling Nyuzi to be used not only for graphics but also for scientific calculations. The OpenCL compilation capabilities were tested using the AMD APP benchmarks, namely the BinarySearch, BitonicSort, MatrixTranspose, PrefixSum, and Reduce applications. This latest addition makes Nyuzi the open source GPGPU with the most complete tool-chain.

Fig. 1: Nyuzi Architecture.

Fig. 2: Photographs of sample images rendered by Nyuzi.

## REFERENCES

- [1] R. Balasubramanian, V. Gangadhar, Z. Guo, C. Ho, C. Joseph, J. Menon, M. P. Drumond, R. Paul, S. Prasad, P. Valathol, and K. Sankaralingam. MIAOW - an open source RTL implementation of a GPGPU. In *2015 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XVIII)*, pages 1–3, 2015.
- [2] Muhammed Al Kadi, Benedikt Janssen, and Michael Huebner. FGPU: An SIMT-architecture for FPGAs. In *Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays*, FPGA '16, pages 254–263, New York, NY, USA, 2016. Association for Computing Machinery.