Sheng Ma
Title | Cited by | Year
DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip
S Ma, N Enright Jerger, Z Wang
Proceedings of the 38th annual international symposium on Computer …, 2011
Cited by 223 | 2011
Whole packet forwarding: Efficient design of fully adaptive routing algorithms for networks-on-chip
S Ma, NE Jerger, Z Wang
IEEE International Symposium on High-Performance Computer Architecture, 1-12, 2012
Cited by 111 | 2012
Low-cost binary128 floating-point FMA unit design with SIMD support
L Huang, S Ma, L Shen, Z Wang, N Xiao
IEEE Transactions on Computers 61 (5), 745-751, 2011
Cited by 49 | 2011
Supporting efficient collective communication in NoCs
S Ma, NE Jerger, Z Wang
IEEE International Symposium on High-Performance Computer Architecture, 1-12, 2012
Cited by 47 | 2012
Leaving one slot empty: Flit bubble flow control for torus cache-coherent NoCs
S Ma, Z Wang, Z Liu, NE Jerger
IEEE Transactions on Computers 64 (3), 763-777, 2013
Cited by 45 | 2013
A high performance reliable NoC router
L Wang, S Ma, C Li, W Chen, Z Wang
Integration 58, 583-592, 2017
Cited by 37 | 2017
Novel flow control for fully adaptive routing in cache-coherent NoCs
S Ma, Z Wang, NE Jerger, L Shen, N Xiao
IEEE Transactions on Parallel and Distributed Systems 25 (9), 2397-2407, 2013
Cited by 33 | 2013
SIF: Overcoming the limitations of SIMD devices via implicit permutation
L Huang, L Shen, Z Wang, W Shi, N Xiao, S Ma
HPCA-16 2010 The Sixteenth International Symposium on High-Performance …, 2010
Cited by 26 | 2010
A low-cost conflict-free NoC for GPGPUs
X Zhao, S Ma, Y Liu, L Eeckhout, Z Wang
Proceedings of the 53rd Annual Design Automation Conference, 1-6, 2016
Cited by 21 | 2016
Networks-on-chip: from implementations to programming paradigms
S Ma, L Huang, M Lai, W Shi
Morgan Kaufmann, 2014
Cited by 21 | 2014
Configurable multi-directional systolic array architecture for convolutional neural networks
R Xu, S Ma, Y Wang, X Chen, Y Guo
ACM Transactions on Architecture and Code Optimization (TACO) 18 (4), 1-24, 2021
Cited by 20 | 2021
A heterogeneous low-cost and low-latency ring-chain network for GPGPUs
X Zhao, S Ma, C Li, L Eeckhout, Z Wang
2016 IEEE 34th International Conference on Computer Design (ICCD), 472-479, 2016
Cited by 19 | 2016
Priority-based PCIe scheduling for multi-tenant multi-GPU systems
C Li, Y Sun, L Jin, L Xu, Z Cao, P Fan, D Kaeli, S Ma, Y Guo, J Yang
IEEE Computer Architecture Letters 18 (2), 157-160, 2019
Cited by 16 | 2019
Heterogeneous systolic array architecture for compact CNNs hardware accelerators
R Xu, S Ma, Y Wang, Y Guo, D Li, Y Qiao
IEEE Transactions on Parallel and Distributed Systems 33 (11), 2860-2871, 2021
Cited by 14 | 2021
A comprehensive comparison between virtual cut-through and wormhole routers for cache coherent Network on-Chips
P Wang, S Ma, H Lu, Z Wang
IEICE Electronics Express 11 (14), 20140496, 2014
Cited by 14 | 2014
Holistic routing algorithm design to support workload consolidation in NoCs
S Ma, NE Jerger, Z Wang, M Lai, L Huang
IEEE Transactions on Computers 63 (3), 529-542, 2012
Cited by 12 | 2012
HeSA: Heterogeneous systolic array architecture for compact CNNs hardware accelerators
R Xu, S Ma, Y Wang, Y Guo
2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 657-662, 2021
Cited by 10 | 2021
CMSA: Configurable multi-directional systolic array for convolutional neural networks
R Xu, S Ma, Y Wang, Y Guo
2020 IEEE 38th International Conference on Computer Design (ICCD), 494-497, 2020
Cited by 10 | 2020
Coordinated DMA: improving the DRAM access efficiency for matrix multiplication
S Ma, Z Liu, S Chen, L Huang, Y Guo, Z Wang, M Zhang
IEEE Transactions on Parallel and Distributed Systems 30 (10), 2148-2164, 2019
Cited by 10 | 2019
Dycache: Dynamic multi-grain cache management for irregular memory accesses on GPU
H Guo, L Huang, Y Lü, S Ma, Z Wang
IEEE Access 6, 38881-38891, 2018
Cited by 10 | 2018