For real-time image processing applications in consumer electronic products, highspeed preprocessing algorithms are necessary and have been widely investigated. This article presents a highly efficient very large scale integrated (VLSI) circuit implementation of Canny edge detection. We employed an approximation method that reduces hardware costs without affecting computation results. Additionally, we divided the whole image into several blocks for processing to obtain superior detection performance. It can efficiently prevent missing the real edge in low-contrast regions. The VLSI architecture of our design yields a processing rate of approximately 250 MHz using the Xilinx Virtex-5 field-programmable gate array. The simulation result shows that the proposed circuit takes 0.14ms for processing 512 x 512 test image database and requires the least number of operations compared with previous techniques; therefore, it is suitable for low-cost high-performance system on chip systems.