Abstract
Heterogeneous computing platforms that combine general-purpose processing elements with accelerators (such as GPUs or FPGAs) are well suited to efficient processing of compute-intensive data analytics kernels. In this chapter, we focus on accelerating data analytics kernels on heterogeneous computing systems with FPGAs. Adoption of FPGAs for data analytics is hindered by the difficulty of programming such systems, given the increasing complexity of FPGA-based accelerators. This makes high-level synthesis (HLS) an attractive way to improve designer productivity by raising the programming effort above the register-transfer level (RTL). HLS exposes architectural design options with different trade-offs through pragmas (loop unrolling, loop pipelining, array partitioning). However, the non-negligible runtime of HLS makes exhaustive architectural exploration, whether manual or automated, practically infeasible. To address this challenge, we have developed Lin-Analyzer, an accurate high-level performance analysis tool that enables rapid design space exploration with various pragmas for FPGA-based accelerators without requiring RTL implementations. We show how Lin-Analyzer enables easy yet performance-efficient implementation of computational kernels from a variety of data analytics applications on FPGA-based heterogeneous systems.
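As a rough illustration of the three pragma types named above, the following minimal sketch annotates a small kernel in the style of Vivado HLS directives. The kernel (`dot_product`), the vector length `N`, and the factor of 4 are hypothetical choices made for this example and are not taken from the chapter. An ordinary C compiler does not act on the `#pragma HLS` lines (it may warn about them), so the function can also be run in software for functional checking.

```c
#include <stddef.h>

#define N 64 /* hypothetical vector length, chosen for illustration */

/* Dot product annotated with Vivado HLS style pragmas (assumed syntax).
 * A regular C compiler does not act on them, so the function behaves as
 * plain software. Each pragma is one knob in the design space. */
int dot_product(const int a[N], const int b[N])
{
/* Array partitioning: split each array across 4 memory banks so that
 * 4 elements can be read in the same cycle. */
#pragma HLS array_partition variable=a cyclic factor=4 dim=1
#pragma HLS array_partition variable=b cyclic factor=4 dim=1
    int sum = 0;
    for (size_t i = 0; i < N; i++) {
/* Loop pipelining: try to start a new iteration every cycle (II = 1). */
#pragma HLS pipeline II=1
/* Loop unrolling: replicate the loop body 4 times. */
#pragma HLS unroll factor=4
        sum += a[i] * b[i];
    }
    return sum;
}
```

Changing any factor, or switching pipelining on or off, yields a different accelerator. Evaluating each variant with a full HLS run is what makes exhaustive exploration expensive.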
Alok completed this project while working at SoC, NUS
Acknowledgements
This work was partially supported by the Singapore Ministry of Education Academic Research Fund Tier 2 MOE2015-T2-2-088.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Zhong, G., Prakash, A., Mitra, T. (2017). Accelerating Data Analytics Kernels with Heterogeneous Computing. In: Chattopadhyay, A., Chang, C., Yu, H. (eds) Emerging Technology and Architecture for Big-data Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-54840-1_2
DOI: https://doi.org/10.1007/978-3-319-54840-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54839-5
Online ISBN: 978-3-319-54840-1
eBook Packages: Engineering, Engineering (R0)