Abstract:
Artificial intelligence paradigms continue to achieve state-of-the-art accuracy across a wide variety of
tasks. The future of artificial intelligence-based solutions will be populated with smart devices that
demand low computational power and inexpensive hardware platforms. Machine learning algorithms are
typically complex, iterative, and time-consuming, and are therefore executed on general-purpose
or high-performance computers. The computational engine required for inference, however, is significantly
less complex than these learning algorithms. While standard computers can incorporate sophisticated
floating-point units (FPUs), embedded processors may host only a simplified FPU, and on
dedicated inference engines an FPU may not be feasible at all because of its cost in space, power, and speed. To solve
this problem, we focus on efficient algorithms for the various building blocks of neural networks. Our approach
starts by introducing the square law, which significantly reduces the computational requirements of
machine learning models by eliminating the need for mathematical operators such as the exponential, floating-point
division, the square root, and the logarithm. The square-law algorithm reduces computation time
by 1.3x to 4.3x on CPUs and by 4x to 169x on ARM processors without hurting prediction accuracy.
We also discovered that the square law applies across a wide variety of machine learning building
blocks. We propose distinct technologies to realize a complete neural network on a chip. The square-law-based
solutions use standard digital building blocks and can be implemented on ASICs or FPGAs.
On an ASIC platform, our algorithm achieves an area efficiency (throughput per gate) of 1.79x to 3.75x over
the baselines, again without hurting prediction accuracy.
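
For intuition only (the abstract does not define the square law itself), the sketch below shows one well-known second-order approximation in this spirit: a piecewise quadratic sigmoid that replaces the exponential and the division with additions, multiplications, and a power-of-two scale, which reduces to a shift in fixed-point hardware. The function name sigmoid_sq and the breakpoints at |x| = 4 are illustrative assumptions, not the paper's exact formulation.

```c
#include <stdio.h>
#include <math.h>

/* Illustrative piecewise second-order ("square law") sigmoid approximation.
 * It needs only add, multiply, and a constant scale of 1/32 (a shift in
 * fixed-point hardware); no exp and no general division. This is a sketch
 * of the kind of operator elimination described above, not the paper's
 * algorithm. */
static double sigmoid_sq(double x)
{
    if (x <= -4.0) return 0.0;
    if (x >=  4.0) return 1.0;
    if (x < 0.0)
        return (x + 4.0) * (x + 4.0) * 0.03125;      /* (x + 4)^2 / 32 */
    return 1.0 - (x - 4.0) * (x - 4.0) * 0.03125;    /* 1 - (x - 4)^2 / 32 */
}

int main(void)
{
    /* Compare the quadratic approximation against the exact sigmoid. */
    for (double x = -6.0; x <= 6.0; x += 2.0)
        printf("x=%5.1f  approx=%.4f  exact=%.4f\n",
               x, sigmoid_sq(x), 1.0 / (1.0 + exp(-x)));
    return 0;
}
```

At x = 2, for example, the approximation gives 0.875 against an exact value of about 0.881, showing how second-order arithmetic can track a transcendental activation closely while staying friendly to inexpensive hardware.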