# An Optimized Ultrasound Digital Beamformer with Dynamic Focusing Implemented on FPGA

Mohamed Almekkawy<sup>1</sup>, Jingwei Xu<sup>2</sup> and Mohan Chirala<sup>3</sup>

*Abstract*— We present a resource-optimized dynamic digital beamformer for an ultrasound system based on a field-programmable gate array (FPGA). A 64-channel receive beamformer with full dynamic focusing is embedded in an Altera Arria V FPGA. To improve spatial and contrast resolution, full dynamic beamforming is implemented with a novel, resource-optimized method that splits each channel's delay into a bulk (coarse) delay and a fractional (fine) delay before summation. The sampling frequency is 40 MHz, and the beamformer includes a polyphase filter with an effective rate of 240 MHz that improves the temporal resolution of the system while relaxing the analog-to-digital converter (ADC) bandwidth requirement. The results indicate that the 64-channel dynamic beamformer architecture is amenable to a low-power FPGA-based implementation in a portable ultrasound system.

#### I. INTRODUCTION

Ultrasound imaging is an important, noninvasive and safe tool for medical diagnosis. Typical systems operate at frequencies between 1 and 10 MHz: piezoelectric transducers convert electrical signals into pressure waves, and images are formed from the echoes of this mechanical energy, which is reflected and scattered at tissue boundaries. The spatial pulse-echo response of a single scatterer is the convolution of the spatial impulse responses of the transmit and receive apertures. Ultrasound systems are increasingly designed to be portable, low-cost and low-power, and some have been implemented as application-specific integrated circuits (ASICs) [1]. However, an ASIC design offers little flexibility for evolving applications. Other ultrasound systems use digital signal processors (DSPs) to improve flexibility [2]. However, because of the limited transfer bandwidth of a DSP, it is challenging to beamform many channels on a single DSP chip. Although multi-core DSP solutions have been proposed recently [3], the DSP-based approach suffers from higher power consumption and non-prototypical design. Other ultrasound systems have been implemented on FPGAs [4] because of their flexibility and their trade-off between performance and efficiency. In this paper, an FPGA is chosen as the platform for the proposed beamformer.

Besides the choice of platform, beamformers also differ in the signal processing schemes they implement. The echoes are received by a group of transducer elements, and the echo signals of different elements must be aligned; the beamformer performs this alignment by delaying each signal appropriately. Accurate delays are needed to improve resolution in ultrasound imaging, but the sampling rate of the analog-to-digital converters (ADCs) limits the time difference between two samples. There are two common ways to improve the delay accuracy of a beamformer. The first is phase rotation, also known as direct sampled in-phase/quadrature (DSIQ) beamforming; because it assumes a narrowband signal, this method yields degraded image quality [3], [5]. The second is an interpolation filter. Traditionally, interpolation requires up-sampling followed by a low-pass filter [6], which demands a high operating frequency in the DSP section. This approach is not suitable for a low-cost FPGA because it does not use resources efficiently, so this paper employs a more efficient polyphase filter instead. Besides implementing the delay units, computing the delays is challenging for dynamic focusing, which requires the delays to be updated continuously. An efficient delay-computation method is proposed in [7], but its delay resolution is bounded by the period of the operating clock, so finer delays require a higher operating frequency. Another widely used method pre-computes the delays and stores them in memory. Owing to memory limitations, related works implement only pseudo-dynamic focusing, in which the delays are updated only at predetermined depths [4], [7] instead of continuously.
In this paper, we present an FPGA-based ultrasound beamformer that uses a novel and economical scheme to support dynamic focusing. The 64-channel beamformer is embedded in an Altera Arria V FPGA, and the performance of the implementation is evaluated with real-data experiments.

#### II. BEAMFORMING TECHNIQUE

In the near field, echoes from the same point reach different channels with considerable time differences, so a beamformer is needed to align them and obtain a high-quality ultrasound image. Fig. 1 shows how the signals are aligned to detect a focal point in an ultrasound system. Suppose that focal point G is being imaged: the signals echo back from G and arrive at

<sup>1</sup>M. Almekkawy is with the Department of Electrical and Computer Engineering, University of Minnesota, Twin Cities, Minneapolis, USA, alme0078 at umn.edu

<sup>2</sup>J. Xu is with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA, xujw07 at tamu.edu

<sup>3</sup>M. Chirala is with Samsung Research America, Dallas, Texas, USA, m.chirala at samsung.com



Fig. 1: Signals Alignment in Ultrasound System



Fig. 2: Variations of Echo Paths In Received Signals.

the different channels at different times. The role of beamforming is to delay and sum the echo signals from the transducers appropriately. After the delays are applied, the signals are aligned in the same time interval, and all aligned signals are summed to obtain the signal corresponding to point G. During the receive phase, dynamic focusing requires the beamformer to move the focal point continuously to the depth of the current sample by adjusting the delays of all channels. Fig. 2 shows how the flight paths of the signals change during the receive phase. Multiple channels sense the returning echoes. An entire ultrasound image consists of multiple scan lines, and to form a single scan line a series of focal points is sensed. Assuming that the scan line starts at the position of channel A, the locations of the focal points are determined by the samples taken on channel A. As Fig. 2 shows, consecutive samples correspond to focal points at different depths. The distance $d$ between two consecutive focal points is $d = v T_s / 2$, where $T_s$ is the sampling interval and $v$ is the speed of sound in tissue; the factor of 2 accounts for the round trip. A dynamic beamformer must therefore update the delay of every channel for every sample. For focal point 1 in the figure, let the time of flight from transmission to reception by channel A be $T_1$, and the time for channel B to receive the echo be $T_2$, where $T_2 > T_1$. The echo from point 1 in channel A must therefore be delayed by $(T_2 - T_1)$ to align with the signal in channel B. For the subsequent focal point 2, the times at which the corresponding signals are received by channel A and channel B are, respectively:

$$
\hat{T}_1 = \Delta T_1 + T_1, \quad \hat{T}_2 = \Delta T_2 + T_2 \tag{1}
$$

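By the triangle inequality, $b - a < d$, where $a$ and $b$ are the distances from channel B to focal points 1 and 2 (Fig. 2). Channel A lies on the scan line, so both its transmit and receive paths grow by $d$, giving $\Delta T_1 = 2d/v = T_s$. Thus:

$$
\Delta T_2 = \frac{d+b-a}{v} < \frac{2d}{v} = \Delta T_1 \tag{2}
$$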

This implies that the echo from focal point 2 reaches channel B less than one sampling interval after the echo from focal point 1. If $\Delta T_1 - \Delta T_2$ is small, the fractional delay between two samples can be retrieved once the next sample of channel B has been acquired. However, if $\Delta T_1 - \Delta T_2$ is large enough, the corresponding signal has to be accessed without waiting for the next sample. In this case, the previously sampled signal from channel B is used twice, aligned with the channel A signals for both focal point 1 and focal point 2. Although the same sample is used twice, different fractional delays must be applied to it to achieve finer alignment. To obtain fractional delays, some techniques use a 100 MHz sampling rate with digital interpolation filters [8]; such filters allow the signals to be sampled at a lower frequency while still achieving finer delays [9], [10]. This paper employs an efficient polyphase interpolation filter to obtain finer delays. The details of the implementation are presented in the following sections.
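A short numerical check of Eq. (2) may help. The sketch below is illustrative only: it assumes $v = 1540$ m/s, a straight transmit path down the scan line through channel A, and a hypothetical 5 mm lateral offset for channel B; only $F_s = 40$ MHz comes from the paper. It confirms that $\Delta T_1 = T_s$, that $\Delta T_2 < \Delta T_1$, and that channel B's arrival times advance by less than one sample per focal point, so integer sample indices repeat (the sample reuse described above).

```python
import math

# Assumed values for illustration; only FS is taken from the paper.
V_SOUND = 1540.0        # speed of sound in soft tissue, m/s (assumption)
FS = 40e6               # sampling frequency, Hz
TS = 1.0 / FS           # sampling interval T_s, s
D = V_SOUND * TS / 2    # spacing between focal points, d = v*T_s/2, m


def round_trip_time(depth: float, lateral_offset: float) -> float:
    """Transmit straight down the scan line to `depth`, then receive on an
    element `lateral_offset` metres from the scan line (simplified model)."""
    return (depth + math.hypot(depth, lateral_offset)) / V_SOUND


def delta_times(k: int, offset_b: float) -> tuple[float, float]:
    """(dT1, dT2): growth in arrival time on channel A (offset 0) and on
    channel B when focusing moves from focal point k to focal point k + 1."""
    z1, z2 = k * D, (k + 1) * D
    dt1 = round_trip_time(z2, 0.0) - round_trip_time(z1, 0.0)
    dt2 = round_trip_time(z2, offset_b) - round_trip_time(z1, offset_b)
    return dt1, dt2


def arrival_positions(n_points: int, offset_b: float) -> list[float]:
    """Arrival time on channel B for focal points 0..n_points-1, in samples."""
    return [round_trip_time(k * D, offset_b) / TS for k in range(n_points)]


dt1, dt2 = delta_times(k=1000, offset_b=5e-3)
print(f"d = {D * 1e6:.2f} um, dT1 = {dt1 * 1e9:.2f} ns, dT2 = {dt2 * 1e9:.2f} ns")

pos = arrival_positions(200, offset_b=5e-3)
reused = sum(int(b) == int(a) for a, b in zip(pos, pos[1:]))
print(f"channel B integer sample index repeated {reused} times in 199 steps")
```

Each repeated integer index is a channel B sample that must be aligned twice, with two different fractional delays.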

#### III. SYSTEM ARCHITECTURE

Fig. 3 shows the diagram of the proposed 64-channel ultrasound system. Sixty-four transducer elements receive the echo signals, and the analog front end converts them to digital signals. Each channel path processes the signal from its transducer and can delay it. The digital signal processing block contains two main parts: the Midend beamformer and the Backend processing. The Midend delays the signals in the different channels to align them before summation. The summed signal is fed into the Backend, where it is demodulated and an envelope detector then computes its amplitude. A control block generates the control signals from locally stored, pre-computed data. Because on-chip memory is limited, the local memory is updated for each scan line, and all delay information is stored in an external memory. The Midend contains 64 paths, and the delay of each path is set dynamically by the control signals from the control block. Each signal is delayed by two units in sequence: a bulk delay and a fractional delay. In addition, each path applies an apodization weight. The bulk delay delays the signal by an integer number of sampling periods, while the fractional delay is a polyphase filter that provides interpolated data without up-sampling. Apodization multiplies each channel by a coefficient, assigning different weights to different channels.
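The per-channel delay split and the apodized sum can be illustrated with a small behavioural sketch. It assumes delays expressed in sampling periods and quantized to 1/6 of a sample (the six coefficient groups used by the fractional delay); the function names are hypothetical. The toy sum applies only the bulk part, leaving the phase index for a polyphase fractional-delay stage.

```python
N_PHASES = 6  # fractional-delay coefficient groups


def split_delay(delay_samples: float, n_phases: int = N_PHASES) -> tuple[int, int]:
    """Quantise a delay (in sampling periods) to 1/n_phases of a sample and
    split it into a bulk part (whole samples) and a phase index."""
    fine_units = round(delay_samples * n_phases)
    bulk, phase = divmod(fine_units, n_phases)
    return bulk, phase


def delay_and_sum(channels: list[list[float]], bulk_delays: list[int],
                  weights: list[float]) -> list[float]:
    """Delay each channel by whole samples (zeros shifted in), apply its
    apodization weight, and sum across channels."""
    n = len(channels[0])
    out = [0.0] * n
    for x, delay, w in zip(channels, bulk_delays, weights):
        for t in range(delay, n):
            out[t] += w * x[t - delay]
    return out


print(split_delay(3.5))  # (3, 3): three whole samples plus 3/6 of a sample
print(delay_and_sum([[0, 1, 0, 0], [1, 0, 0, 0]], [0, 1], [1.0, 1.0]))
# [0.0, 2.0, 0.0, 0.0]: delaying channel 1 by one sample aligns the pulses
```

Keeping the bulk delay as a whole-sample shift means only the small phase index has to drive the filter coefficients.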

#### IV. DIGITAL BEAMFORMING IMPLEMENTATION

The hardware architecture provides dynamic focusing with apodization. Dynamic focusing is necessary to achieve a high-performance imaging system. More specifically, the bulk delay delays the signals by an integer number of sampling periods, while the fractional delay provides data interpolated between two samples. The bulk delay is implemented as a first-in first-out (FIFO) buffer that outputs a sample only when the number of samples stored in it equals the desired (control) delay. This mechanism supports dynamically increasing delays: when the control value increases by 1, the output is not updated, because the equality between the control value and the number of buffered samples no longer holds. As discussed in Section II, genuine dynamic beamforming requires some samples to be used twice for alignment. To hold a sample, the bulk delay is increased by 1, which prevents the output from being updated. Because the points along a scan line are focused from near to far, the desired delays of all channels increase monotonically. Whenever the bulk delay increases by one and its output is therefore not updated, the registers of the polyphase filter must reject the repeated sample to keep the filter functionally correct. A clock-gating scheme on the fractional delay guarantees this: once the bulk delay is increased by 1, the pipeline Enable signal is pulled low to gate the clock, so the fractional-delay pipeline stalls for one cycle and rejects the input. Envelope detection requires the in-phase and quadrature components of the received signal, which are obtained by demodulating the received signal. For a more detailed view of this implementation, a Simulink model of the method is shown in Fig. 4.

Fig. 3: Architecture of the Digital Beamforming Technique.

Fig. 4: Simulink Model for Bulk and Fractional Delay.
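The FIFO hold and the clock-gated stall can be modelled at cycle level. This is a behavioural sketch under our own assumptions (one sample written per cycle, an 8-register delay line for the fractional filter, hypothetical class names), not the authors' RTL: when the requested delay grows by one, the FIFO skips its read, repeats its previous output with `enable` low, and the gated delay line ignores that repeat.

```python
from collections import deque


class BulkDelayFifo:
    """Behavioural (not RTL) model of a FIFO bulk delay.

    One sample is written per cycle. A sample is read out only when the FIFO
    would otherwise hold more samples than the requested delay, so after the
    read the occupancy equals the delay. If the requested delay grows by one,
    no read happens: the previous output is held (that sample is reused) and
    `enable` is low for that cycle."""

    def __init__(self) -> None:
        self.fifo = deque()
        self.last_out = 0.0

    def step(self, sample: float, delay: int) -> tuple[float, bool]:
        self.fifo.append(sample)
        if len(self.fifo) > delay:
            self.last_out = self.fifo.popleft()
            return self.last_out, True
        return self.last_out, False


class GatedShiftRegister:
    """Delay line of a fractional-delay FIR; it shifts only when `enable` is
    high, mimicking the clock-gated stall that rejects the repeated sample."""

    def __init__(self, taps: int = 8) -> None:
        self.regs = [0.0] * taps

    def clock(self, sample: float, enable: bool) -> list[float]:
        if enable:
            self.regs = [sample] + self.regs[:-1]
        return list(self.regs)


fifo, line = BulkDelayFifo(), GatedShiftRegister()
samples = [10, 20, 30, 40, 50, 60]
delays = [2, 2, 2, 3, 3, 3]  # bulk delay grows by one at cycle 3
for x, d in zip(samples, delays):
    out, enable = fifo.step(x, d)
    regs = line.clock(out, enable)
    print(out, enable, regs[:3])
```

At cycle 3 the FIFO repeats the value 10 with `enable` low, so the delay line keeps `[10, 0.0, 0.0]` instead of shifting in a second copy; the filter would instead apply a different coefficient group to the held data.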
In Simulink, the FIFO of the bulk delay block is implemented as a variable integer delay block. Fig. 5 shows the architecture of the fractional delay, which provides interpolation through an efficient polyphase filter and thereby avoids up-sampling. The polyphase delay filter is derived from a 48-tap low-pass filter that acts as the interpolation low-pass filter. Its 48 coefficients, C0 to C47, are set according to the carrier frequency and the signal bandwidth. Instead of up-sampling and down-sampling, the polyphase filter directly computes the interpolated samples without changing the sampling rate. The filter is implemented as a $7^{th}$-order finite impulse response (FIR) filter with six groups of coefficients (48/6 = 8 coefficients per group), each corresponding to one interpolation position. The control signals select the coefficient group according to the desired phase, and registers store the 7 previous samples that the filter needs in addition to the current one. The design assumes that the required resolution is at least $\lambda/16$ and that the maximum fundamental frequency is 12 MHz with a 40 MHz sampling frequency. The number of coefficient groups is chosen to satisfy the required resolution stated in equation (3).
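The polyphase structure just described can be checked numerically: splitting a 48-tap prototype into six groups of eight coefficients and applying one group at the original rate gives the same samples that zero-stuffing by six and filtering with the full prototype would produce. The Hamming-windowed-sinc prototype below is a hypothetical stand-in; the paper designs C0 to C47 from the carrier frequency and bandwidth, and its hardware phase indexing may differ.

```python
import math

L = 6                  # coefficient groups (interpolation phases)
TAPS = 48              # prototype length C0..C47
PER_GROUP = TAPS // L  # 8 coefficients per group, i.e. a 7th-order FIR


def prototype_lowpass(taps: int = TAPS, up: int = L) -> list[float]:
    """Hypothetical prototype: Hamming-windowed sinc with its cutoff at the
    original Nyquist frequency (pi/up at the up-sampled rate)."""
    mid = (taps - 1) / 2
    h = []
    for k in range(taps):
        t = (k - mid) / up
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (taps - 1))
        h.append(sinc * window)
    return h


def polyphase_groups(h: list[float], up: int = L) -> list[list[float]]:
    """Group r holds h[r], h[r + up], h[r + 2*up], ..."""
    return [h[r::up] for r in range(up)]


def fractional_sample(x: list[float], q: int, phase: int,
                      groups: list[list[float]]) -> float:
    """Output number q*up + phase of the interpolated stream, computed at the
    original rate from the current sample x[q] and up to 7 previous ones."""
    return sum(c * x[q - m] for m, c in enumerate(groups[phase]) if q - m >= 0)


def upsample_then_filter(x: list[float], h: list[float], up: int = L) -> list[float]:
    """Conventional reference: insert up-1 zeros between samples, then convolve."""
    xu = [0.0] * (len(x) * up)
    xu[::up] = x
    return [sum(h[k] * xu[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(xu))]


h = prototype_lowpass()
groups = polyphase_groups(h)
x = [math.sin(0.3 * i) for i in range(20)]
reference = upsample_then_filter(x, h)
worst = max(abs(fractional_sample(x, q, r, groups) - reference[q * L + r])
            for q in range(len(x)) for r in range(L))
print(len(groups), PER_GROUP, worst)  # worst-case mismatch is at rounding level
```

The polyphase form evaluates only the 8 products per output that are nonzero, and it does so at the 40 MHz input rate, whereas the reference convolves 48 taps at the six-times (240 MHz) rate.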

$$
RFR = \frac{16 F_{max}}{F_s} \tag{3}
$$

$RFR$ is the required resolution factor, i.e. the minimum number of interpolation positions per sampling interval, $F_{max}$ is the maximum frequency and $F_s$ is the sampling frequency. It follows from requiring the delay step $1/(F_s n)$ obtained with $n$ coefficient groups to be finer than the $\lambda/16$ propagation time $1/(16 F_{max})$. The number of coefficient groups $n$ should therefore satisfy equation (4).

$$
n > RFR \tag{4}
$$

From equations (3) and (4), $RFR = 16 \times 12/40 = 4.8$, so at least five groups are required. The design uses $n = 6$, which gives a delay step of $1/(240\ \text{MHz}) \approx 4.17$ ns, finer than the required $1/(192\ \text{MHz}) \approx 5.21$ ns.

Instead of storing all the delays of a frame in local memory, as is traditionally done, we propose a scheme that stores only the delays for the current scan line. The microcontroller updates the delay information for each scan line, and all delay information resides in an external DDR3 memory. To improve efficiency, a double-buffering scheme is used, as shown in Fig. 6: of the two local memories, one supplies the delays for the current scan line while the other is loaded with the delays for the next scan line. In addition, delta coding is used to reduce the memory needed for the delay information. Assuming $64\times2$ scan lines and 5200 samples per scan line, the memory for each channel is 41 Kb, compared with 7.6 Mb for the conventional method, a substantial saving.
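The sketch below illustrates delta coding and double buffering for one channel. Its details are hypothetical: the paper does not specify the delta-code word format, so we delta-code the channel's required read position in 1/6-sample units, generated from the simplified geometry $t = (z + \sqrt{z^2 + x^2})/v$ with an assumed lateral offset of 130 sample lengths ($v T_s$). Under that model each increment lies between 3 and 6 fine units (half a sample to one sample), so a few bits per sample suffice. The last lines assume 12-bit delay words for a conventional per-frame table, which is consistent with the 7.6 Mb figure above.

```python
import math

N_PHASES = 6


def read_positions(n_samples: int, x_samples: float, n_phases: int = N_PHASES) -> list[int]:
    """Required read position on one channel for each focal point k, in units
    of 1/n_phases sample, for a simplified geometry: depth z = k*v*T_s/2 and
    lateral element offset x = x_samples*v*T_s (hypothetical model)."""
    return [round(n_phases * (k / 2 + math.hypot(k / 2, x_samples)))
            for k in range(n_samples)]


def delta_encode(values: list[int]) -> tuple[int, list[int]]:
    """Keep the first value and the successive differences."""
    return values[0], [b - a for a, b in zip(values, values[1:])]


def delta_decode(first: int, deltas: list[int]) -> list[int]:
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


class DoubleBuffer:
    """Two local memories: one is read for the current scan line while the
    other is filled with the next scan line's delays (e.g. from DDR3)."""

    def __init__(self) -> None:
        self.banks: list = [None, None]
        self.active = 0

    def load_next(self, encoded) -> None:
        self.banks[1 - self.active] = encoded

    def swap(self) -> None:
        self.active = 1 - self.active

    def current(self) -> list[int]:
        return delta_decode(*self.banks[self.active])


positions = read_positions(5200, x_samples=130.0)
first, deltas = delta_encode(positions)
buf = DoubleBuffer()
buf.load_next((first, deltas))  # fill the idle bank during the previous line
buf.swap()                      # start of the new scan line
assert buf.current() == positions
print(min(deltas), max(deltas))  # both lie within 3..6 under this model

conventional_bits = 128 * 5200 * 12  # 64x2 lines, 5200 samples, 12-bit words (assumed)
print(f"{conventional_bits / 2**20:.2f} Mib per channel per frame")  # 7.62
```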

#### V. RESULTS AND DISCUSSION

The above architecture was described in Verilog, then synthesized and implemented with Quartus II for an Altera Arria V device. A 40 MHz sampling rate was used for the 12 MHz excitation. The sum across all channels is pipelined so that many inputs can be processed at a high clock frequency. The first pipeline stage contains 32 adders, each summing the incoming 12-bit samples of two channels. The resulting 32 outputs are summed in



Fig. 5: Polyphase Fractional Delay Filter Block Diagram.



Fig. 6: Block Diagram of Local Memory Management.

pairs in the second pipeline stage, and so on. The last summation stage produces one 18-bit number, which is truncated to 12 bits and fed into the high-pass filter. Table I shows the device utilization of the whole architecture, comprising the Midend and the control block. The 64-channel beamforming block consumes 63,696 adaptive logic modules (ALMs) and the control block 1,290 ALMs, for a total of 64,986 ALMs. The board carries the FPGA and a DDR3 device that serves as the external memory for all delay information. An FMC connector feeds test data into the FPGA from stimulus boards, and a USB interface returns the data for display. To validate the design, software beamforming in Simulink serves as a reference, and data generated by a quantized Matlab module are used to verify the output of the



Fig. 7: Images Processed by Simulink and RTL using Verilog.

TABLE I: Implementation Parameters

| Parameter         | Value   | Utilization |
|-------------------|---------|-------------|
| ALMs              | 64,986  | 48%         |
| Registers         | 21,374  |             |
| Pins              | 54      | 10%         |
| Block memory bits | 393,216 | 2%          |
| DSP blocks        |         | 6%          |

board. Fig. 7 shows the output images from the quantized Simulink model and from the Altera Arria V FPGA. The image was acquired from a Gammex Model 1425A flow phantom with a linear ultrasound probe array. The output images show that a full dynamic beamforming ultrasound system can be implemented in the proposed fashion for a resource-limited hand-held device without loss of performance.
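The pipelined summation described in this section can be sketched as a binary adder tree. Assuming signed 12-bit inputs and one bit of growth per stage, 64 channels need six stages and an 18-bit result, which matches the 18-bit figure above. The truncation rule (dropping the six least significant bits) is our assumption; the paper does not state how the 18-bit sum is reduced to 12 bits.

```python
def adder_tree(samples: list[int], in_bits: int = 12) -> tuple[int, int, int]:
    """Sum samples pairwise, one level per pipeline stage, and track the word
    width, assuming one bit of growth per stage (full precision)."""
    if not samples or len(samples) & (len(samples) - 1):
        raise ValueError("this sketch expects a power-of-two channel count")
    level, bits, stages = list(samples), in_bits, 0
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        bits += 1
        stages += 1
    return level[0], stages, bits


def truncate_to(value: int, from_bits: int, to_bits: int = 12) -> int:
    """Keep the top `to_bits` bits of a signed value by dropping LSBs
    (one possible truncation; an arithmetic right shift in Python)."""
    return value >> (from_bits - to_bits)


worst = [-2048] * 64             # most negative 12-bit two's-complement input
total, stages, bits = adder_tree(worst)
print(total, stages, bits)       # -131072 6 18
print(truncate_to(total, bits))  # -2048
```

In hardware each `level` would be a row of registered adders, so one 64-channel sum completes every clock cycle after a six-cycle latency.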

#### VI. CONCLUSION

In this paper, we presented a full dynamic beamformer for a hand-held, real-time ultrasound system. The proposed architecture has been synthesized and implemented in an FPGA, and it paves the way for high-resolution ultrasound imaging using low-cost FPGAs in a portable, hand-held form factor. The technique improves on the traditional method of precomputing delays so that the delays vary continuously, as dynamic beamforming requires, and the results indicate that a genuine dynamic beamformer can be implemented in an FPGA-based portable ultrasound system.

#### REFERENCES

- [1] V. S. Gierenz, R. Schwann, and T. G. Noll, *"A low power digital beamformer for handheld ultrasound systems,"* in Proc. 27th European Solid-State Circuits Conference, pp. 261-264, 18-20 Sept. 2001.
- [2] V. Shamdasani, R. Managuli, S. Sikdar, and Y. Kim, *"Ultrasound color-flow imaging on a programmable system,"* IEEE Transactions on Information Technology in Biomedicine, vol. 8, no. 2, pp. 191-199, June 2004.
- [3] Jieming Ma, *"Software-based ultrasound phase rotation beamforming on multi-core DSP,"* Master's thesis, University of Washington, pp. 69, 2012.
- [4] Gi-duck Kim, Changhan Yoon, Sang-Bum Kye, Youngbae Lee, Jeeun Kang, Yangmo Yoo, and Tai-Kyong Song, *"A single FPGA-based portable ultrasound imaging system for point-of-care applications,"* IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 59, no. 7, pp. 1386-1394, July 2012.
- [5] C. Yoon, J. Lee, Y. M. Yoo, and T.-K. Song, *"Display pixel based focusing using multi-order sampling for medical ultrasound imaging,"* Electronics Letters, vol. 45, no. 25, pp. 1292-1294, Dec. 3, 2009.
- [6] Alan V. Oppenheim and Ronald W. Schafer, *"Discrete-Time Signal Processing,"* 3rd Edition, Prentice-Hall Signal Processing Series, 28 August 2009.
- [7] H. T. Feldkamper, R. Schwann, V. Gierenz, and T. G. Noll, *"Low power delay calculation for digital beamforming in handheld ultrasound,"* in Proc. IEEE Ultrasonics Symposium, vol. 2, pp. 1763-1766, Oct. 2000.
- [8] C. H. Hu, X. C. Xu, J. M. Cannata, J. T. Yen, and K. K. Shung, *"Development of a real time high frequency ultrasound digital beamformer for high frequency linear array transducers,"* IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 53, no. 2, pp. 317-323, 2006.
- [9] R. A. Mucci, *"A comparison of efficient beamforming algorithms,"* IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, no. 3, pp. 548-558, 1984.
- [10] T. I. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine, *"Splitting the unit delay,"* IEEE Signal Processing Mag., vol. 13, pp. 30-60, 1996.