A Reconfigurable 64-Dimension K-Means Clustering Accelerator With Adaptive Overflow Control
L Du, Y Du, MCF Chang
IEEE Transactions on Circuits and Systems II: Express Briefs, 2019
This brief presents a novel reconfigurable K-means clustering accelerator that is suitable for integration in both IoT and data center systems. High vector-dimension reconfigurability and reduced design cost are achieved through vector streaming and adaptive overflow control, which adapts the distance computation to use as-needed precision (dynamic 16-bit fixed-point data format). A two-stage shift-bit counted comparator is proposed. It determines most results by turning on only the 3-bit shift-bit comparator, reducing power consumption by 7× compared to a direct full-dynamic-range comparison. Four vectors with two cluster centroids are processed simultaneously. Up to 8-dimension cluster vectors are stored in a local buffer to reduce data exchange between the main memory and the processing engine. A prototype accelerator was implemented in TSMC 65-nm technology. The accelerator occupied 0.26 mm² and can support up to 64-D vector clustering. It achieved 31.2M query vectors/s with 41-mW power consumption at a 250-MHz clock (cluster number: 2, vector dimension: 64) and an energy efficiency of 0.41 TOPS/W at a 30-MHz clock.
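
The two-stage comparison described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' datapath: it assumes each squared distance is held as a 16-bit mantissa plus a 3-bit shift count, and that the shift count is incremented (with the mantissa halved) whenever the accumulator overflows. Under that assumption a larger shift count always implies a larger value, so the full 16-bit comparison is needed only when the shift counts are equal. The names `accumulate`, `squared_distance`, and `closer` are hypothetical.

```python
MANT_BITS = 16
MANT_MAX = (1 << MANT_BITS) - 1   # largest 16-bit mantissa
SHIFT_MAX = 7                     # largest value of a 3-bit shift counter


def accumulate(terms):
    """Sum non-negative integer terms into a (mantissa, shift) pair.

    Assumed overflow rule: when the mantissa exceeds 16 bits, halve it and
    increment the shift count, so the value is roughly mantissa << shift.
    """
    mant, shift = 0, 0
    for t in terms:
        mant += t >> shift            # align the new term to the current scale
        while mant > MANT_MAX:
            if shift == SHIFT_MAX:    # counter exhausted: saturate
                mant = MANT_MAX
                break
            mant >>= 1
            shift += 1
    return mant, shift


def squared_distance(x, c):
    """Squared Euclidean distance, streamed one dimension at a time."""
    return accumulate((xi - ci) ** 2 for xi, ci in zip(x, c))


def closer(d_a, d_b):
    """Return (index of the smaller distance, whether stage 2 was needed)."""
    (mant_a, shift_a), (mant_b, shift_b) = d_a, d_b
    if shift_a != shift_b:            # stage 1: 3-bit shift counts decide
        return (0 if shift_a < shift_b else 1), False
    return (0 if mant_a <= mant_b else 1), True   # stage 2: 16-bit mantissas


if __name__ == "__main__":
    x = [100] * 64                    # one 64-D query vector
    centroids = [[90] * 64, [0] * 64]
    dists = [squared_distance(x, c) for c in centroids]
    print(dists, closer(dists[0], dists[1]))
```

In this sketch the wider mantissa comparison runs only when both distances have overflowed the same number of times, which is the property the abstract credits for the power saving. The saturation behaviour at the maximum shift count is an assumption added here for completeness.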
ieeexplore.ieee.org