ABSTRACT
This study proposes a direct exponential function computer based on the CORDIC algorithm to address the neuron realization problem on Evolvable Hardware (EHW). The direct exponential function computer combines the essential character of the exponential function with the CORDIC algorithm and departs from the conventional approach, which computes the exponential function indirectly by adding sinh(x) and cosh(x). Because the direct computer applies shift-add operations to the exponential function itself and eliminates the computation of sinh(x) and cosh(x), it not only calculates much faster than the indirect computer but also saves a large number of logic gates. In addition, a rigorous and complete analysis proves that the direct computer converges: it approaches the real value with arbitrary precision and performs its shift-add operations without any unnecessary rotation. Owing to this efficient direct shift-add operation, the direct computer is well suited to neural network applications on EHW.
DOI: 10.3923/itj.2008.1156.1162
URL: https://scialert.net/abstract/?doi=itj.2008.1156.1162
INTRODUCTION
Evolvable hardware is a new concept in the development of online adaptive machines. In contrast to conventional hardware where the structure is irreversibly fixed in the design process, EHW is designed to adapt to changes in task requirements or changes in the environment through its ability to reconfigure its own hardware structure online (dynamically) and autonomously.
This study is supported by the National Natural Science Foundation of China (No. 60874047) and the Hubei Science Foundation of China (No. 2007ABA281). Many studies combine EHW with artificial neural networks to design complex configurations that adapt to changes in the environment (Gallagher et al., 2005; Vigraham and John, 2005, 2006; Wang Piao et al., 2007; Rocke et al., 2005; Saumil et al., 2006; Glette et al., 2007; Murakawa et al., 1999; De et al., 2000, 2001). The CAM-Brain Machine (CBM) (De et al., 2000, 2001) is the best-known application combining EHW with neural networks: an FPGA-based tool for evolving a 75-million-neuron artificial brain to control a life-sized kitten robot. So many neurons require large-scale logic to realize the functions of the network, and the realization of the neuron activation function is clearly the key factor deciding the scale of logic gates. Usually the activation function is a nonlinear function such as the sigmoid (Yongquan et al., 2006), and direct implementations of nonlinear activation functions can be expensive. To simplify the realization of neural networks, many implementations use hard limiters or saturated linear activation functions to avoid building the sigmoid or hyperbolic tangent functions that provide smooth transitions between excitation and inhibition and thereby improve the neural response (Jiang et al., 2005; Rocke et al., 2007; De, 2006; Upegui et al., 2005). However, different problems need neural networks with different activation functions and structures; if the sigmoid or hyperbolic tangent function is necessary for the problem, it complicates the realization of the network. Porrmann et al. (2002) use a piecewise-linear approximation of the nonlinear activation function to realize the sigmoid.
Although the piecewise-linear approximation realizes the sigmoid function successfully, it needs several different circuits to approximate the entire curve. Each individual circuit usually contains a multiplier, which occupies a large number of logic gates; the more precision required, the more multipliers needed. Krips et al. (2002) calculate the sigmoid activation function with a lookup-table (LUT) method that pre-stores the output values of the activation function in a table. The data precision is then driven by the size of the lookup table: higher precision requires a bigger table. So far, the most efficient approach to realizing the sigmoid function is the Coordinate Rotation Digital Computer (CORDIC), which can compute a wide range of functions, including certain trigonometric, hyperbolic, linear and logarithmic functions, using only shift-and-add operations. In several designs the sigmoid function is realized by calculating the exponential function as e^x = sinh(x)+cosh(x), with the hyperbolic sine and cosine computed by the CORDIC algorithm (Liddicoat et al., 2006; Chen et al., 2007; Meng, 2006). Although the sigmoid function can be calculated by the CORDIC algorithm indirectly in this way, the calculation has two defects. First, the indirect calculation of the exponential function e^x prolongs the computing time. For instance, for e^1.386 (e^1.386 ≈ 4) a direct CORDIC scheme need only shift e^0.693 (e^0.693 ≈ 2) left by one bit, whereas the indirect calculation performs many shift-add rotations to evaluate sinh(x) and cosh(x). Compared with the direct scheme, the indirect calculation is inefficient. Second, each iteration of the sinh(x) and cosh(x) calculation contains a multiplicative operation, which not only prolongs the computing time but also occupies many more logic gates. For a single neuron these defects attract little attention.
But at the macro scale, if a neural network contains 75 million neurons, as the CBM does, the indirect calculation of the exponential function expends a great many more logic gates than a direct CORDIC scheme. In real-world applications every EHW chip has a finite number of logic gates, and the more complex the problem, the more gates are needed. With an inefficient realization of the activation function, it can be difficult to realize the neural network in practice even when it is possible in theory.
To solve complex problems with simple neural networks, an efficient CORDIC algorithm is needed to realize the nonlinear activation function. This study proposes a direct CORDIC algorithm that calculates the sigmoid function efficiently. It exploits the properties of the exponential function and CORDIC to compute the exponential function using only shift-add operations, taking ln(2^i) or ln(1+2^-i) as the rotation angle at each step. Without the unnecessary calculation of sinh(x) and cosh(x), the scale of logic gates used to realize a neuron is reduced significantly, and the computing time of the sigmoid function is also reduced distinctly.
DIRECT CORDIC IN EXPONENTIAL FUNCTION
Direct CORDIC: The symmetric sigmoid function is the most common activation function used in neural networks. Its expression is written as:

f(x) = (1 - e^(-x)) / (1 + e^(-x))    (1)
Fig. 1: Curve of sigmoid function

Fig. 2: Character of exponential function
The curve of the sigmoid function is shown in Fig. 1. As the curve shows, for x ∈ [-5.3, 5.3] the sigmoid function is nonlinear; for x > 5.3 or x < -5.3 it stays at 1 or -1 throughout. Because the curve is odd-symmetric, it is only necessary to compute the sigmoid function for positive (or negative) x.
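For reference, the evaluation of the sigmoid from a single exponential of |x| can be sketched as follows (a minimal sketch assuming Eq. 1 is the usual symmetric sigmoid (1 - e^-x)/(1 + e^-x); the Python function name is illustrative only):

```python
import math

def symmetric_sigmoid(x):
    """Symmetric sigmoid of Eq. 1, computed from one exponential of |x|."""
    if x > 5.3:                    # saturated region described in the text
        return 1.0
    if x < -5.3:
        return -1.0
    e = math.exp(abs(x))           # the only expensive operation
    y = (e - 1.0) / (e + 1.0)      # f(|x|) = (e^|x| - 1)/(e^|x| + 1)
    return y if x >= 0 else -y     # odd symmetry: f(-x) = -f(x)
```

This is why an efficient e^x computer for positive arguments suffices for the whole activation function.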
Clearly, the difficulty in realizing the sigmoid function is computing the exponential function. Setting aside the relation between the exponential function and the hyperbolic sine/cosine functions, it is necessary to analyze the essential characteristic of the exponential function together with CORDIC. For this characteristic, suppose x1, x2 and x3 are arbitrary positive values with x1 = x2 + x3. Figure 2 shows the relation among e^x1, e^x2 and e^x3. If y2 = e^x2 and y3 = e^x3, then e^x1 can be computed as:

e^x1 = e^x2 · e^x3 = y2 · y3    (2)

However, x2 and x3 are arbitrary positive values, so it is impossible to know y2 and y3 beforehand. It is therefore necessary to find e^x2 and e^x3 based on CORDIC.
According to the CORDIC principle, CORDIC is a bit-recursive implementation of the forward and backward Givens rotations (Jack, 2000; Walther, 2000). The basic CORDIC iteration equations at the ith step are (Walther, 2000):

x_(i+1) = x_i - m·σ_i·y_i·2^(-S(m,i))
y_(i+1) = y_i + σ_i·x_i·2^(-S(m,i))
z_(i+1) = z_i - σ_i·θ_(m,i)    (3)

where the iterations are performed for i = 0, 1, 2, …, N-1. The coordinate parameter m identifies the coordinate system type (circular, linear and hyperbolic coordinates for m equal to 1, 0 and -1, respectively). The rotation direction is σ_i = -sign(z_i) for rotation, while for vectoring it is σ_i = -sign(x_i·y_i). For m = 1, the shift sequence is S(1, i) = 0, 1, 2, 3, 4, 5, …, and the rotation angle θ_(m,i) is given by θ_(1,i) = arctan(2^(-S(1,i))) = arctan(2^(-i)). Each rotation introduces a scale factor K_i, which corrects the amplification introduced by the linearized rotation in the x and y coordinates; after all iterations the total scale factor is K = Π K_i. With different coordinate parameters, CORDIC can calculate functions of different types. For example, Chen et al. (2007) select m = -1 and compute the exponential function as e^x = sinh(x)+cosh(x); this is an indirect application of the CORDIC algorithm that does not grasp the essence of CORDIC. From Eq. 3, the essential principle of CORDIC is to establish an iteration of the form:
f_(i+1) = g(f_i, 2^(±i)),  z_(i+1) = z_i - v(2^(±i))    (4)

where g(·) is the mapping function between 2^(±i) and the target function value, and v(·) is the rotation angle, which depends on 2^(±i) and f(x). As Eq. 2 shows, in the computation of the exponential function g(·) is a product: e^x1 = e^x2·e^x3. If e^x2 and e^x3 are expressed in terms of powers of 2, then e^x1 can be computed purely by shift-add operations. In order to use 2^i to express the factors, it is necessary to divide e^x into two ranges, since e^x ∈ (1, ∞) while the powers of two cover (0, 1) ∪ [2, +∞). When e^x ∈ (1, 2), it is easy to get e^x - 1 ∈ (0, 1); if e^x - 1 = 2^(-i) (i > 0), then x = ln(1 + 2^(-i)). When e^x ∈ [2, +∞), if e^x = 2^i, then x = ln(2^i). Table 1 lists ln(2^i) and ln(1+2^(-i)) for i ∈ [1, 12].
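As a sketch, the two angle series of Table 1 can be generated as follows (the variable names are illustrative; in hardware only ln(2) and the fine series need be stored, since ln(2^i) = i·ln 2):

```python
import math

# Rotation angles of the direct computer, for i = 1..12 as in Table 1.
coarse = [math.log(2.0**i) for i in range(1, 13)]    # ln(2^i): covers e^x in [2, inf)
fine = [math.log1p(2.0**-i) for i in range(1, 13)]   # ln(1+2^-i): covers e^x in (1, 2)

print(round(coarse[0], 4), round(fine[0], 4))        # prints 0.6931 0.4055
```

The first entries, ln(2) = 0.6931 and ln(1.5) = 0.4055, are exactly the thresholds used in the analysis below.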
Based on the above analysis, it is easy to compute an arbitrary e^x by subtracting a series of ln(2^i) and ln(1+2^(-i)) terms from x. Following the standard CORDIC principle, the iteration of the direct CORDIC computation of e^x can be synthesized as:

z_0 = x - ln(2^k), y_0 = 2^k (with the largest k such that ln(2^k) ≤ x);
y_(i+1) = y_i + σ_i·y_i·2^(-i), z_(i+1) = z_i - σ_i·ln(1 + 2^(-i)), σ_i ∈ {0, 1}    (5)
After some iterative shift-add operations, when z_n = 0 the computation stops and e^x is given out. For instance, if x = 5.3, then since 5.3 > 4.852 = ln(2^7), e^5.3 = e^4.852 × e^0.448 with z = 5.3 - 4.852 = 0.448. Comparing this residue with the series of ln(1+2^(-i)) values, it is easy to get e^0.448 = e^0.4055 × e^0.0308 × e^0.0078 × e^0.0039. The integrated computation of e^5.3 is therefore e^5.3 = e^4.852 × e^0.4055 × e^0.0308 × e^0.0078 × e^0.0039 = 2^7 × (1+2^-1) × (1+2^-5) × (1+2^-7) × (1+2^-8).
Clearly, without any multiplicative operation, only five shift-add operations are needed to compute e^5.3. To illustrate the direct CORDIC distinctly, Fig. 3 shows the flow of the direct CORDIC computation of e^x.
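The flow of Fig. 3 can be sketched in software as a behavioral model (the function name and the 20-entry fine-angle table are assumptions of this sketch; the multiplications by powers of two stand in for the hardware shifts):

```python
import math

LN2 = math.log(2.0)
# Fine rotation angles ln(1 + 2^-i); 20 terms leave roughly a 2^-20 residual.
ANGLES = [math.log1p(2.0**-i) for i in range(1, 21)]

def direct_exp(x):
    """Direct shift-add evaluation of e^x for x >= 0."""
    # Coarse stage: remove the largest ln(2^k) <= x, i.e. e^x = 2^k * e^(x - k*ln2).
    k = int(x // LN2)
    z = x - k * LN2                  # residual angle, now < ln 2
    y = float(1 << k)                # 2^k: a left shift in hardware
    # Fine stage: subtract each ln(1+2^-i) that still fits and apply
    # y <- y + y*2^-i, i.e. multiply by (1 + 2^-i) via shift-add.
    for i, theta in enumerate(ANGLES, start=1):
        if z >= theta:
            z -= theta
            y += y * 2.0**-i
    return y
```

For x = 5.3 the coarse stage gives k = 7, and the fine stage begins with the angles i = 1, 5, 7 and 8 of the worked example (followed by tiny corrective angles below that example's precision).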
Convergence of direct CORDIC: In the standard CORDIC algorithm, rotation through different angles does not guarantee convergence of the computation in all cases (Lang and Antelo, 1998). It is therefore necessary to analyze the convergence of the direct CORDIC computation of the exponential function. Suppose x1 is an arbitrary positive value. As in the preceding analysis, x1 is divided into two ranges.
First, if x1 < 0.6931, then e^x1 ∈ (1, 2) and e^x1 - 1 ∈ (0, 1). According to computer arithmetic theory (Cowlishaw et al., 2001), e^x1 - 1 can easily be expressed as a binary code, which differs with the chosen precision. For instance, using 20 bits to express an arbitrary positive value, e^x1 can be expressed as the sum of 1 and a fractional part as follows:
Table 1: The ln(2^i) and ln(1+2^-i) values

Fig. 3: The flow of the direct exponential function computer
e^x1 = 1 + d1 = 1 + Σ_(i=1..20) λ_i·2^(-i)    (6)
where d1 is the fractional part, d1 ∈ (0, 1), and λi is the weight of each 2^-i, with λi = 1 or 0. Suppose λm = 1 and 2^-m is bigger than every other λi·2^-i in Eq. 6 (i.e., 2^-m is the leading term of d1). Then Eq. 6 can be transformed into:
e^x1 = (1 + 2^(-m))·(1 + d2),  d2 = (d1 - 2^(-m)) / (1 + 2^(-m))    (7)
Since d2 < 1, the factor (1 + d2) has the same form as Eq. 6. Now suppose λn = 1 and 2^-n is bigger than every other λi·2^-i in the expansion of d2. Then Eq. 7 can be transformed into:
e^x1 = (1 + 2^(-m))·(1 + 2^(-n))·(1 + d3)    (8)
Obviously, if d3 > 0, Eq. 8 can also be written as:

e^x1 = (1 + 2^(-m))·(1 + 2^(-n))·(1 + 2^(-p))·(1 + d4)    (9)

where 2^-p is the leading term of d3 and d4 < d3 < 1. Analogously, while the fraction di exceeds 9.5×10^-7 the transformation of Eq. 9 continues. Because di < di-1 < … < d2 < d1 < 1, the limit of di is 0. Once di ≤ 9.5×10^-7 (about 2^-20), it can no longer be expressed by the 20-bit code and is discarded. Thus, the final expression of e^x1 is:
e^x1 = Π_k (1 + 2^(-i_k))    (10)
where each factor corresponds to a rotation angle x_i = ln(1+2^(-i)). By shift-add operations, the direct CORDIC converges to e^x1 accurately. The longer the binary code used to express the fractional part, the more accurately the direct CORDIC converges to e^x1. Based on the above proof, the direct CORDIC obviously converges to the real e^x.
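The shrinking sequence d1 > d2 > … of this proof can be traced numerically; the following sketch (function name hypothetical) applies the factorization step of Eq. 7 until the fraction falls below the 20-bit precision 9.5×10^-7:

```python
import math

def trace_fractions(x1, eps=9.5e-7):
    """Return the shrinking fractions d1 > d2 > ... and the extracted exponents m."""
    assert 0.0 < x1 < 0.6931
    d = math.exp(x1) - 1.0                     # d1 in (0, 1), per Eq. 6
    ds, exps = [d], []
    while d > eps:
        m = 1
        while 2.0**-m > d:                     # leading term: largest 2^-m <= d
            m += 1
        d = (d - 2.0**-m) / (1.0 + 2.0**-m)    # the factorization step of Eq. 7
        ds.append(d)
        exps.append(m)
    return ds, exps
```

For x1 = 0.448 the extracted exponents begin 1, 5, 7, 8, matching the factors (1+2^-1)(1+2^-5)(1+2^-7)(1+2^-8) of the worked example; since 2^-m > d/2 at each step, each new fraction is less than half the previous one, so the loop terminates.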
Second, if x1 ≥ 0.6931, it is necessary to compare x1 with the series of ln(2^i) values, since the x-axis is divided into consecutive districts of the form (ln(2^i), ln(2^(i+1))) (i ≥ 0). By this comparison, x1 falls into one of the districts, and e^x1 can then be expressed as:
e^x1 = 2^i · e^(x1 - ln(2^i))    (11)
After the shift operation in Eq. 11, it remains to compute e^(x1 - ln(2^i)). Because ln(2^(i+1)) - ln(2^i) = ln(2) = 0.6931, the width of each district is 0.6931, so x2 = x1 - ln(2^i) < 0.6931. Then, according to Eq. 10 and 11, the final computation of e^x1 is:
e^x1 = 2^i · Π_k (1 + 2^(-i_k))    (12)
Of course, by the shift-add operations in Eq. 12, the direct CORDIC also converges to e^x1 accurately. By analyzing the convergence for 0 < x < 0.6931 and x ≥ 0.6931 separately, the direct CORDIC is proved convergent in all cases. Without any oscillation around e^x, it approaches e^x step by step.
CIRCUIT DESIGN AND SIMULATION OF DIRECT CORDIC
To design an efficient direct CORDIC circuit, it is necessary to pay attention not only to convergence and computation speed but also to the logic gate scale. Based on the analysis of the preceding section, the length of the binary code used to express the fractional parts of ln(2^i) and ln(1+2^-i) decides the precision of the direct CORDIC.
Table 2: Precision and error of direct CORDIC

Fig. 4: The data format of ln(2^i) and ln(1+2^-i)

Fig. 5: The simulation result of (a) the direct and (b) the indirect computer
The more precise the computation, the more logic gates are needed, so it is important to design the data format for the series of ln(2^i) and ln(1+2^-i). If the fractional part is expressed with n bits, the code precision is 1/2^n. Suppose s is the biggest fractional value expressible by the binary code; whatever n is selected, s is smaller than 1. According to Eq. 6 and 12, the error between the direct CORDIC output and the real e^x is 2^i × (e^(1-s) - 1) (i > 0), so the error differs distinctly with the precision. Since x ∈ (0, 5.3) in the computation of the sigmoid function, the biggest error for a given precision is a = 2^7 × (e^(1-s) - 1). Table 2 shows s and a. Because high precision needs more logic gates to construct the CORDIC circuit, precision and logic scale must be considered together. An error of 0.01 satisfies the requirements of engineering applications, and from Table 2 the reasonable length of the fractional code is n = 14. Because x ∈ (0, 5.3) in the computation of the sigmoid function, the integer part of x can be expressed with 3 bits; with the sign bit, the shortest reasonable code length is 18 bits. For instance, if x = 0.6931, the binary code is x = 000010110001011100b. The data format is shown in Fig. 4. With this data format, the direct CORDIC is a truly efficient exponential function computer.
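This data format can be checked in software; the following sketch assumes a sign-magnitude layout with 1 sign bit, 3 integer bits and 14 fraction bits (the helper name is illustrative):

```python
FRAC_BITS = 14  # 2^-14 precision, per Table 2

def to_fixed18(x):
    """Encode x as an 18-bit sign-magnitude word: 1 sign + 3 integer + 14 fraction bits."""
    assert abs(x) < 8.0, "three integer bits cover |x| < 8"
    mag = round(abs(x) * (1 << FRAC_BITS))          # quantize to 2^-14 steps
    code = (mag | (1 << 17)) if x < 0 else mag      # bit 17 carries the sign
    return format(code, '018b')
```

Under this assumed layout, to_fixed18(0.6931) yields '000010110001011100', the bit pattern quoted above.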
This study describes the direct CORDIC circuit in Verilog and selects a Xilinx Spartan-3E 500 as the platform to simulate its function. Based on the above analysis, the program follows Fig. 3. Because a neuron computes at each biological neural impulse (Schreiner, 2001), the exponential function computer is triggered by a clock pulse. Testing different clock periods shows that the shortest clock period is 200 ns. The functional simulation result of the exponential function computer is shown in Fig. 5a: the computer calculates the exponential value during the positive pulse period, so the computation time of the direct exponential function computer is 100 ns. For comparison with the other approach, this study also simulates the indirect CORDIC of Chen et al. (2007). With the same clock period and data format, the simulation result is shown in Fig. 5b; because of the indirect computation, the computation takes 5 μs.
Table 3: Logic gate scale in the exponential function computers

Table 4: The output of the exponential function computers
Obviously, the direct exponential function computer calculates much faster than the indirect computer. With regard to the logic gate scale, the numbers of slices and lookup tables (LUTs) used in the exponential function computers are shown in Table 3; the direct computer saves many more logic gates than the indirect one.
Besides computation time and logic scale, the computation precision is shown in Table 4. To test the smoothness of the exponential function computer over the different districts, the input data are distributed uniformly over x ∈ (0, 5.3). As Table 4 shows, the precision of the two approaches is equivalent.
CONCLUSION
This study proposes a new direct CORDIC to compute arbitrary e^x values in EHW applications. Compared with the indirect CORDIC approach, the direct computer not only calculates much faster but also saves many more logic gates, making the improved computer useful for real-world applications. With regard to logic gate scale, the direct computer increases the probability of solving complex problems, since researchers can design more neurons under the restriction of chip resources. As for computation time, in many fields the faster the computation, the better the performance; this is useful in areas such as control systems, signal processing and general computing.
ACKNOWLEDGMENT
The authors thank the National Natural Science Foundation of China (No. 60874047) and the Hubei Natural Science Foundation of China (No. 2007ABA281) for their support.
REFERENCES
- Upegui, A., C.A. Pena-Reyes and E. Sanchez, 2005. An FPGA platform for on-line topology exploration of spiking neural networks. Microprocessors Microsyst., 29: 211-223.
- Liddicoat, A.A., L.A. Slivovsky, T. McLenegan and D. Heyer, 2006. FPGA-based artificial neural network using CORDIC modules. Proceedings of the SPIE: The International Society for Optical Engineering, August 15, 2006, San Diego, CA., USA., pp: 63130-63130.
- Chen, X., G. Wang and K. Liu, 2007. Implementation of activated function of neural networks by using hybrid CORDIC. J. Huazhong Univ. Sci. Technol. (Natural Sci. Edn.), 35: 114-117.
- Gallagher, J.C., K.B. Sanjay and S. Vigraham, 2005. A reconfigurable continuous time recurrent neural network for evolvable hardware applications. Proceedings of the 2005 NASA/DoD Conference on Evolvable Hardware, June 29-July 1, 2005, Washington, DC., USA., pp: 247-250.
- De, G.H., A. Buller, P.L. De, M. Korkin, G. Fehr and T. Chodakowski, 2001. Initial evolvability experiments on the CAM-Brain machines (CBMs). Proceedings of the Congress on Evolutionary Computation, May 27-30, 2001, Seoul, South Korea, pp: 635-642.
- De, G.H., M. Korkin, P. Guttikonda and D. Cooley, 2000. Evolving detectors of 2D patterns on a simulated CAM-Brain Machine, an evolvable hardware tool for building a 75 million neuron artificial brain. Proceedings of the Critical Technologies for the Future of Computing, July 31, 2000, San Diego, USA., pp: 13-13.
- De, G.H., 2006. Hardware accelerators for evolving building block modules for artificial brains. Proceedings of the 1st NASA/ESA Conference on Adaptive Hardware and Systems, June 15-18, 2006, Piscataway, NJ, USA., pp: 222-224.
- Schreiner, K., 2001. Neuron function: The mystery persists. IEEE Intelligent Syst., 16: 4-7.
- Glette, K., J. Torresen and M. Yasunaga, 2007. An online EHW pattern recognition system applied to sonar spectrum classification. Proceedings of the 7th International Conference on Evolvable Systems: From Biology to Hardware, September 21-23, 2007, Springer Verlag, Heidelberg, pp: 1-12.
- Saumil, M., D. Peterson Gregory and K. Park Sang, 2006. FPGA implementation of evolvable block-based neural networks. Proceedings of the Congress on Evolutionary Computation, September 11, 2006, Vancouver, BC., Canada, pp: 3129-3136.
- Murakawa, M., S. Yoshizawa and I. Kajitani, 1999. GRD chip: Genetic reconfiguration of DSPs for neural network processing. IEEE Trans. Comput., 48: 628-639.
- Porrmann, M., U. Witkowski, H. Kalte and U. Ruckert, 2002. Implementation of artificial neural networks on a reconfigurable hardware accelerator. Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, January 9-11, 2002, Los Alamitos, CA., USA., pp: 243-250.
- Krips, M., T. Lammert and A. Kummert, 2002. FPGA implementation of a neural network for a real-time hand tracking system. Proceedings of the 1st International Workshop on Electronic Design, Test and Applications, January 29-31, 2002, Christchurch, New Zealand, pp: 313-317.
- Cowlishaw, M.F., E.M. Schwarz, R.M. Smith and C.F. Webb, 2001. A decimal floating-point specification. Proceedings of the 15th Symposium on Computer Arithmetic, June 11-13, 2001, Vail, CO., USA., pp: 147-154.
- Rocke, P., J. Maher and F. Morgan, 2005. Platform for intrinsic evolution of analogue neural networks. Proceedings of the International Conference on Reconfigurable Computing and FPGAs, September 28-30, 2005, Piscataway, USA., pp: 8-8.
- Rocke, P., B. McGinley, F. Morgan and J. Maher, 2007. Reconfigurable hardware evolution platform for a spiking neural network robotics controller. Proceedings of the 3rd International Workshop on Applied Reconfigurable Computing, March 27-29, 2007, Springer Verlag, Heidelberg, pp: 373-378.
- Meng, Q., 2006. Application of CORDIC algorithm to neural networks VLSI design. Proceedings of the Multiconference on Computational Engineering in Systems Applications, October 4-6, 2006, Beijing, pp: 504-508.
- Lang, T. and E. Antelo, 1998. CORDIC vectoring with arbitrary target value. IEEE Trans. Comput., 47: 736-749.
- Vigraham, S.A. and C.G. John, 2006. CTRNN-EH in silicon: Challenges in realizing configurable CTRNNs in VLSI. Proceedings of the Congress on Evolutionary Computation, September 11, 2006, Vancouver, BC., Canada, pp: 2807-2813.
- Vigraham, S.A. and C.G. John, 2005. A case for using minipop as the evolutionary engine in a CTRNN-EH control device: An analysis of area requirements and search efficacy. Proceedings of the NASA/DoD Conference on Evolvable Hardware, June 29-July 1, 2005, Washington, DC., USA., pp: 221-228.
- Jiang, W., S.G. Kong and G.D. Peterson, 2005. ECG signal classification using block-based neural networks. Proceedings of the International Joint Conference on Neural Networks, July 31-August 4, 2005, Montreal, QC., Canada, pp: 326-331.
- Wang Piao, J., H. Chang and H. Lee Chong, 2007. FPGA implementation of evolvable characters recognizer with self-adaptive mutation rates. Proceedings of the 8th International Conference on Adaptive and Natural Computing Algorithms, April 11-14, 2007, Warsaw, Poland, pp: 286-295.
- Yongquan, Z., L. Daozhu and Y. Yindong, 2006. Functional network and tunable activation function neural network. Proceedings of the 6th World Congress on Intelligent Control and Automation, October 23, 2006, Dalian, pp: 2757-2762.