BoxMuller AWGN Signal Generator For FPGA
Acknowledgement
Box-Muller AWGN signal generator implemented for FPGA/ASIC is presented. Mainly refer to [6] and [7], thanks to the authors.
Features
- Based on the Box-Muller algorithm with no central limit theorem required
- Piecewise polynomial based Chebyshev function approximation units with range reduction
- Period of generated noise sequence is 2^88 (about 10^25)
- 18-bit noise samples with 5 bits of integer and 13 bits of fraction, accurate to one unit in the last place (ulp) up to 8.16*sigma, which models the true Gaussian PDF accurately for a simulation size of over samples 10^15
- Core can be reset to its initial state and work on clock enable signal
- Bit-true MATLAB and C mex programs included
- Maximum clock rate and output sample rate of 320MHz on Xilinx VCU108 Board
Box-Muller Algorithm
The Box–Muller transform, by George Edward Pelham Box and Mervin Edgar Muller [1], is a pseudo-random number sampling method for generating pairs of independent, standard, normally distributed (zero expectation, unit variance) random numbers, given a source of uniformly distributed random numbers.
The Box-Muller method starts with two independent uniform random variables, and, over the interval [0, 1). The following mathematical operations are performed to generate two samples, and, of a Gaussian distribution N(0,1).
Fixed Point Method
Generally, when evaluating channel codes, one needs 100 to 1000 bits in error to draw conclusions on a simulation with enough confidence. Hence, with samples, one can examine channel code behavior for bit error rates as low as 10^-12 to 10^-13.
Two key specifications are set for our noise generator: periodicity of and 16-bit noise samples. In order to meet the periodicity requirement, the URNGs should have a period of at least 10^15.
By examining the normal distribution N(0,1) using norminv function in MATLAB, we observe that we need to be able to represent up to for a population of samples. In other words, the probability of the absolute value of a single sample from that population being larger thanis less than 0.5 (0.444 to be exact).
Bit-width for u0 and u1
Bit-width for e, f and g
After generating the two 32-bit Tausworthe URNGs numbers u0 and u1. The implementation of (3)~(5) in hardware utilizes look up tables where the coefficients were found using Chebyshev series approximations [5].
Moreover, function (3) and (4) processing need to split into three steps: range reduction, function approximation and range reconstruction [6][7]. These steps are utilized because it facilitates the implementation by lowering the complexity to only approximating the function on a smaller input interval. Truncation is used here to further decrease hardware complexity.
Chebyshev coefficients for (3) (4) and (5) approximation are generated with MATLAB codes, refer to “chebv_approx.m” file. For example, the Degree-2 Chebyshev approximation coefficients for –ln(x) calculation is like this:
The error analysis starts at the end and makes its way back to the beginning because the main goal is for the output’s error to meet the faithful rounding requirement. Please refer to [6] and [7] for detail analysis description. The result of fixed point from [6] is as Figure 1 (f and g width expanded).
Figure 1 Hardware Computation procedure diagram from [6]
Hardware Implementation
Tausworthe URNG Generation
Although traditional linear feedback shift registers (LFSRs) are often sufficient as a uniform random number generator (URNG), Tausworthe URNGs [3] are fast and occupy less area. Tausworthe URNG follows the algorithm presented by L’Ecuyer [4], which combines three LFSR-based URNGs to obtain improved statistical properties. It generates a 32-bit uniform random number per clock and has a large period of (about).
Figure 2 Tausworthe URNG generation codes
Normal Distribution Performance Test
Test Environment
Taus_URNG, seeds are 2846420573 2846420573 2846420573
Taus_URNG, seeds are 912462866 912462866 912462866
Test Generated samples number for analysis: 1e7
Fixed point Accuracy Evaluation
Fixed point output is compared to float point model (with same URNG input):
Floor truncation | Round | |
e max error value (abs) | 6.3224e-08 | 3.4341e-08 |
f max error value (abs) | 3.1588e-05 | 2.3977e-05 |
g0/g1 max error value (abs) | 1.0174e-04
1.0174 e-04 |
1.1467e-04
1.1467 e-04 |
x0/x1 max error value (abs) | 4.8891e-04
4.8123e-04 |
5.5051e-04
6.1058e-04 |
x0/x1 mean value | -5.6479e-04
8.4952e-05 |
-5.6477e-04
8.4947e-05 |
x0/x1 variance value | 0.99978
1.0001 |
0.99982
1.0001 |
The results show that accuracy loss caused by truncation can be neglected. So truncation method is used in hardware.
Anderson-Darling test result
adtest result (vs. MATLAB randn() function):
H | P | adsta | cv | |
This design | 0 | 0.7810 | 0.2398 | 0.7519 |
MATLAB randn() | 0 | 0.3270 | 0.4209 | 0.7519 |
Distribution Figure
Figure 3. PDF Distribution of generated 1e7 samples
Resource Utilization and Throughput
Xilinx VCU108 FPGA Evaluation Board
Fmax : 330MHz (can be improved further more)
Latency : 16 cycles
Figure 4. Resource Utilization
More Methods to generate AWGN
Wallace method
D.-U. Lee, W. Luk, J. D. Villasenor, G. Zhang, and P. H. W. Leong, “A hardware gaussian noise generator using the wallace method,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 8, pp. 911–920, Aug 2005.
Ziggurat methods
Hörmann and J. Leydold, “Continuous random variate generation by fast numerical inversion,” ACM Trans. Model. Comput. Simul., vol. 13, no. 4, pp.347–362, Oct. 2003
Inversion method
Gutierrez, V. Torres, and J. Valls, “Hardware architecture of a gaussian noise generator based on the inversion method,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 8, pp. 501–505, Aug 2012.
C. C. Cheung, D. U. Lee, W. Luk, and J. D. Villasenor, “Hardware generation of arbitrary random number distributions from uniform distributions via the inversion method,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 8, pp. 952–962, Aug 2007.
CORDIC use
S. Malik, A. Hemani, and N. D. Gohar, “Unifying cordic and box-muller algorithms: An accurate and efcient gaussian random number generator,” in 2013 IEEE 24th International Conference on Application-Specifc Systems, Architectures and Processors, June 2013, pp. 277–280.
Wang and Z. Bie, “A novel hardware gaussian noise generator using box-muller and cordic,” in 2014 Sixth International Conference on Wireless Communications and Signal Processing (WCSP), Oct 2014, pp. 1–6.
Reference
[1] G. E. P. Box and Mervin E. Muller, A Note on the Generation of Random Normal Deviates, The Annals of Mathematical Statistics (1958), Vol. 29, No. 2 pp. 610–611 doi:10.1214/aoms/1177706645, JSTOR 2237361
[2] Box-Muller Transform wiki page https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform#cite_note-1
[3] R.C. Tausworthe, “Random Numbers Generated by Linear Recurrence Modulo Two,” Math. and Computation, vol. 19, pp. 201-209, 1965.
[4] P. L’Ecuyer, “Maximally Equidistributed Combined Tausworthe Generators,” Math. Computation, vol. 65, no. 213, pp. 203-213, 1996.
[5] M. J. Schulte and E. E. Swartzlander, “Hardware designs for exactly rounded elementary functions,” IEEE Transactions on Computers, vol. 43, no. 8, pp. 964–973,Aug 1994.
[6] D. U. Lee, J. D. Villasenor, W. Luk, and P. H. W. Leong, “A hardware Gaussian noise generator using the box-muller method and its error analysis,” IEEE Transactions on Computers, vol. 55, no. 6, pp. 659–671, June 2006.
[7] Lincoln Glauser, “The Design of a 24-bit Hardware Gaussian Noise Generator via the Box-Muller Method and its Error Analysis”, (2017). Tesis. Rochester Institute of Technology.