# Distributed Clock Gating for Power Reduction of a Programmable Waveform Generator for Neural Stimulation

Emilia Noorsal<sup>1</sup>, Kriangkrai Sooksood<sup>2</sup>, Ulrich Bihr<sup>3</sup>, Joachim Becker<sup>3</sup> and Maurits Ortmanns<sup>3</sup>

Abstract: This paper describes how distributed clock gating can be employed to achieve an overall low-power design of a programmable waveform generator intended for a neural stimulator. Power efficiency is enabled by global timing control combined with local amplitude distribution over a bus to the local stimulator frontends. This allows local and global clock gating to be combined for complete sub-blocks of the design. A counter and a shifter employed at the local digital stimulator reduce the design complexity of the waveform generation and thus the overall power consumption. The average power results indicate that 63% of the power can be saved in the global stimulator control unit and 89-96% in the local digital stimulator by using the proposed approach. The circuit has been implemented and successfully tested in a 0.35 $\mu$m AMS HVCMOS technology.

## I. INTRODUCTION

The design complexity of implantable stimulators has increased over the last decade due to growing requirements on flexibility, such as programmability of stimulus parameters, arbitrary wave pattern generation, stimulation strategies and, especially, an increasing number of stimulation sites [1]-[5]. Several design methods have therefore been employed in the digital part of the stimulator, such as the use of a dedicated microprocessor, DSP, and memory units [1], [2], [6]. As a result, the digital stimulator has become more complex and requires relatively high power and area. For implantable stimulators, especially those with a large number of electrodes, power and area are the critical issues that need to be considered during the design phase. To reduce the power consumption of the complex digital circuits in implantable stimulators, various design techniques have been employed, such as global timing assignment [3], a programmable low-frequency clock [4] and clock gating [6]. In this paper, in addition to the stimulator frontend and system architecture proposed in [7], the global stimulation control is explained and tested. Moreover, the effect of global timing assignment with distributed clock gating on the power of the digital control units is discussed and analyzed. It is shown that the architecture allows major parts of the chip to be disconnected from the clock, thereby achieving power savings of up to 96% in the local digital stimulator.

This paper is organized as follows. Section II discusses the state of the art for flexible stimulators. Then, the overall architecture and the low-power design techniques employed in the digital stimulator are described in Section III. Section IV briefly explains the power analysis methodology, which is followed by results and discussion in Section V. Lastly, Section VI presents the conclusion.

<sup>3</sup>U. Bihr, J. Becker and M. Ortmanns are with the Institute of Microelectronics, University of Ulm, 89081 Ulm, Germany

## II. STATE OF THE ART OF FLEXIBLE STIMULATORS

Due to the increasing demand for flexible stimulation patterns in various neural applications, several design methods have been proposed and employed. For multichannel stimulation with a large number of electrodes, such as in retinal implants, the wave patterns are mostly limited to symmetric and asymmetric shapes with flexibility only in the timing and the amplitude level [3]-[5]. A global timing controller was employed in [3] to reduce the area and hardware complexity caused by the large number of stimulation sites. In [4], by contrast, a local timing controller was implemented at the local stimulation unit for more precise control of the stimulation refresh rate, to generate pulse trains, and to control the pulse-width duration. This method increases the area and hardware complexity because a timing generator is required at each stimulation site. To circumvent the area overhead and the hardware complexity, the timing generator or controller was implemented externally in [5], [8]. The pulse timing was controlled by a sequence of successive command frames sent wirelessly from an external unit. The drawbacks of this method are the high risk of harmful operation if the link breaks and the difficulty of operating two stimulation sites simultaneously.

For other functional electrical stimulation (FES) applications that require arbitrary wave patterns, a memory unit (RAM) was mostly employed to store the stimulation profiles at each stimulator generator [1], [2]. In addition, a dedicated microprocessor (RISC), a DSP or an FPGA was used to control the stimulation operation as well as to generate flexible wave patterns [1], [6]. The drawback of these methods is the high area and power consumption, especially when the number of stimulator generators increases with the number of stimulation electrodes.

The state of the art shows that demanding high flexibility from the neural stimulator electronics drastically increases the complexity of the digital circuits, which in turn increases the power consumption. In the following, the global stimulation control of the architecture proposed in [7] is outlined, and the power-efficient operation of the architecture by means of distributed clock gating is explained.

<sup>1</sup>E. Noorsal is with the Faculty of Electrical Engineering, University Teknologi MARA (UiTM), 13500, Pulau Pinang, Malaysia emilia659@ppinang.uitm.edu.my

<sup>2</sup>K. Sooksood is with the Department of Electronic Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand



Fig. 1. Overall neural stimulator architecture with global and local functions [7].

## III. PROPOSED DESIGN ARCHITECTURE

### A. Overall Stimulator Design Architecture

Fig. 1 illustrates the overall neural stimulator architecture, which consists mainly of global and local function blocks. This paper focuses on the design techniques used in the Global Stimulation Unit (GSU) and the Pixel Control Unit (PCU) to achieve low power consumption. Area and power efficiency at each local stimulator is achieved by global timing and local amplitude control over a bus from the GSU. The GSU is the main control unit and controls the operation of the whole implant chip. The stimulation commands ($stim\_cmd$), a global system clock ($clk\_gbl$), the address bus, and the local stimulation data are distributed globally to all local stimulation units (LSUs). Each LSU, which is addressed and digitally programmed by the GSU, provides a current stimulation pulse to the attached electrodes. Each LSU consists of a digital PCU, a 5-bit current-steering DAC, a high-voltage (HV) output current driver with 1:4 demultiplexing, and charge balancers. With the 10-bit electrode addressing of the currently implemented design, up to 256 LSUs with a total of 1024 electrodes are supported.
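The relation between the 10-bit electrode address, the 256 LSUs and the 1024 electrodes can be illustrated with a short sketch. The bit layout used here (upper 8 bits select the LSU, lower 2 bits select one of the four demultiplexed outputs) is an assumption made for illustration only; the paper does not specify how the address is partitioned, and the function name `split_address` is hypothetical.

```python
ADDRESS_BITS = 10      # electrode address width stated in the text
OUTPUTS_PER_LSU = 4    # each LSU drives a 1:4 demultiplexed output


def split_address(addr: int) -> tuple[int, int]:
    """Split a 10-bit electrode address into (lsu_index, output_index).

    Assumed (hypothetical) layout: the upper 8 bits select the LSU and
    the lower 2 bits select one of its four demultiplexed outputs.
    """
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address outside the 10-bit range")
    return addr >> 2, addr & 0b11


total_electrodes = 1 << ADDRESS_BITS                  # 1024
total_lsus = total_electrodes // OUTPUTS_PER_LSU      # 256
```

Under this assumed layout, the highest address 1023 maps to LSU 255, output 3, which matches the stated maximum of 256 LSUs and 1024 electrodes.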

Fig. 2. Stimulation commands for programmable waveform generation.

The GSU mainly consists of the Data Sampling Unit, Main Control Unit, Data Distributor, Stimulator and Clock Generator. The Main Control Unit controls the GSU operation, such as receiving serial Manchester data from the external control unit, distributing and storing the received stimulation data, and starting and ending the stimulation process, using a 4-bit finite state machine (FSM). The master clock of the GSU is 13.56 MHz. The Clock Generator generates several lower clock frequencies for the GSU internal operation as well as the $clk\_gbl$ that is distributed to all LSUs. The $clk\_gbl$, which is used by all LSUs, operates at 1 MHz. The received serial stimulation data in Manchester code are first checked for errors using a 16-bit CRC in the Data Sampling Unit. There are two types of stimulation data: global stimulation profiles and local amplitude data [7]. The global stimulation profiles consist of pulse durations and stimulation commands. These stimulation data are distributed by the Data Distributor and stored in the Stimulator's register file and the PCU's storage registers. During the stimulation process, the Stimulator decodes the global stimulation profiles and generates the 5-bit *stim\_cmd*, which are distributed to all LSUs.
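As a rough illustration of the Manchester-coded serial input mentioned above, the following sketch decodes a stream of half-bit samples. The polarity convention (a low-to-high transition encodes 1, as in IEEE 802.3) is an assumption, since the paper does not state which convention the Data Sampling Unit uses. The 16-bit CRC check is left out because its polynomial is not given. The function names are hypothetical.

```python
def manchester_encode(bits: list[int]) -> list[int]:
    """Encode data bits as half-bit pairs (assumed convention: 1 -> 0,1 and 0 -> 1,0)."""
    halfbits = []
    for b in bits:
        halfbits.extend((0, 1) if b else (1, 0))
    return halfbits


def manchester_decode(halfbits: list[int]) -> list[int]:
    """Decode half-bit pairs back into data bits.

    A pair without a mid-bit transition (0,0 or 1,1) is not valid
    Manchester code and is reported as an error.
    """
    if len(halfbits) % 2:
        raise ValueError("odd number of half-bit samples")
    bits = []
    for i in range(0, len(halfbits), 2):
        pair = (halfbits[i], halfbits[i + 1])
        if pair == (0, 1):
            bits.append(1)
        elif pair == (1, 0):
            bits.append(0)
        else:
            raise ValueError(f"missing mid-bit transition at pair {i // 2}")
    return bits
```

Because every valid bit contains a transition, a receiver can detect some link errors before the CRC stage, which is one reason this code is common on inductive links.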

Fig. 2 illustrates the 5-bit *stim\_cmd* that are sent sequentially from the GSU to all LSUs at a given instant of time. The 5-bit *stim\_cmd* are used to control the stimulation operation in the LSU such as to start and end the stimulation, to turn on and off the current, to invert the polarity, to initialize or change the amplitude level for wave shaping and to execute charge balancing.

The PCU provides local amplitude storage, decodes the 5-bit *stim\_cmd* and interfaces the digital commands to the analog circuitry for tasks such as stimulation pulse generation and charge balancing. The PCU architecture is simplified by omitting a local timing controller. This omission reduces the power in each LSU and in the whole implant chip, and the benefit grows as the number of LSUs increases with the number of electrodes. Additionally, a counter and a shifter are employed in the DACreg of the PCU for programmable wave pattern generation. By sending the appropriate 5-bit *stim\_cmd* from the GSU, the local amplitude in the DACreg can be increased or decreased by 1 LSB, or doubled or halved. This method requires an overhead of only a few combinatorial logic gates, which reduces the power consumption considerably compared to using a memory unit at each local stimulator. Detailed functionality of the stimulator frontend and the overall system architecture can be found in [7].
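The counter and shifter behaviour of the DACreg described above can be modelled in a few lines. This is a behavioural sketch only: the operation names in `AmpOp` are hypothetical (the actual 5-bit command encoding is given in [7], not here), and saturating at the 5-bit limits is an assumption about overflow handling.

```python
from enum import Enum


class AmpOp(Enum):
    """Hypothetical names for the four amplitude-changing operations."""
    INC = "increase by 1 LSB"     # counter up
    DEC = "decrease by 1 LSB"     # counter down
    DOUBLE = "double"             # shift left by one bit
    HALVE = "halve"               # shift right by one bit


DAC_BITS = 5
DAC_MAX = (1 << DAC_BITS) - 1     # 31 for a 5-bit DAC


def apply_op(value: int, op: AmpOp) -> int:
    """Return the new DACreg value after one amplitude operation."""
    if op is AmpOp.INC:
        new = value + 1
    elif op is AmpOp.DEC:
        new = value - 1
    elif op is AmpOp.DOUBLE:
        new = value << 1
    else:  # AmpOp.HALVE
        new = value >> 1
    # Assumption: the register saturates instead of wrapping around.
    return max(0, min(DAC_MAX, new))


def ramp_up(start: int, steps: int) -> list[int]:
    """Amplitude levels of a ramp built from repeated INC operations."""
    levels = [start]
    for _ in range(steps):
        levels.append(apply_op(levels[-1], AmpOp.INC))
    return levels


print(ramp_up(0, 4))  # [0, 1, 2, 3, 4]
```

Each step of such a ramp needs only one command from the GSU and no local waveform memory, which is the point of the counter and shifter approach.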

### B. Local and Global Clock Gating

Clock gating is a well-known technique that is widely used to reduce the dynamic power consumption caused by clock switching activity in sequential circuits. It shuts off the clock signal to selected register banks while their stored values remain unchanged [6].
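As a reminder of why this works, the standard first-order expression for dynamic power in CMOS logic can be written as follows. This relation is textbook background rather than a result of this paper, and the symbols are introduced here only for illustration: $\alpha$ is the switching activity factor, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage and $f_{clk}$ the clock frequency.

$$
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f_{clk}
$$

A free-running clock net toggles on every cycle, so the clock tree and the register clock inputs contribute dynamic power even when the stored data do not change. Gating the clock reduces the activity $\alpha$ seen by the gated registers and their clock buffers toward zero during idle periods.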

The digital GSU and PCU are designed in a 0.35 $\mu$m AMS HVCMOS technology with local clock gating employed. In the GSU, 531 out of 757 registers are clock gated. In the PCU, 20 out of 23 registers are clock gated. Unlike in the GSU, the purpose of local clock gating in the PCU is mainly to shut off the local clock to most of the registers when the PCU is not active. Note that not all registers in the GSU and the PCU can be locally clock gated, because correct timing and functionality must be ensured.



Fig. 3. Chip photo of GSU and LSU with the routed GSU and PCU blocks.

Global clock gating is feasible in this design because it uses a global timing assignment technique controlled by the GSU. Since the timing is controlled globally, the global clock to all LSUs can be shut off when it is not needed. Therefore, the $clk\_gbl$ in Fig. 1 is shut off when no $stim\_cmd$ is being sent from the GSU to the LSUs or during command "0" as shown in Fig. 2. Most $stim\_cmd$ are sent for only 1 clock cycle, except for the charge balancing commands. As a result, for a simple rectangular shape, the $clk\_gbl$ is shut off for almost 90% of the whole stimulation time. This is only possible with global timing and local waveform execution.
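The fraction of time for which $clk\_gbl$ can be gated follows directly from how often a non-zero command is on the bus. The sketch below computes that fraction for a command timeline sampled once per $clk\_gbl$ cycle. The timeline itself (60 cycles, six single-cycle commands, the command codes 1 to 4) is entirely hypothetical and chosen only to illustrate the calculation; it does not reproduce the measured waveform.

```python
def gated_fraction(cmd_per_cycle: list[int]) -> float:
    """Fraction of clk_gbl cycles in which the global clock can be shut off.

    A value of 0 stands for command "0" (no command on the bus); any
    other value means a stim_cmd is being sent and the clock is needed.
    """
    if not cmd_per_cycle:
        raise ValueError("empty command timeline")
    idle_cycles = sum(1 for cmd in cmd_per_cycle if cmd == 0)
    return idle_cycles / len(cmd_per_cycle)


# Hypothetical rectangular-pulse timeline: 60 cycles with six
# single-cycle commands (codes and positions are illustrative only).
timeline = [0] * 60
for cycle, code in [(0, 1), (1, 2), (20, 3), (22, 2), (41, 3), (50, 4)]:
    timeline[cycle] = code

fraction = gated_fraction(timeline)  # 54 idle cycles out of 60
```

With a local timing controller in every LSU, the local clock would have to keep running to count pulse durations, so this idle fraction could not be exploited in the same way.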

## IV. POWER ANALYSIS METHODOLOGY

The power consumption of the designed GSU and PCU is simulated under two conditions: without local clock gating (NOCG) and with local clock gating (CG). In addition, the designed PCUs are simulated under two global clock conditions: first with $clk\_gbl$ supplied to the PCU all the time, and second with the global clock gating described in Section III-B. For each design method, the average power is taken from two types of PCUs, namely Active and Non-Active PCUs. The Active PCU is programmed and stimulating, while the Non-Active PCU is idle during the stimulation mode. Two wave patterns are tested: rectangular and ramp up shapes. The post Place and Route (PnR) simulation is done using Cadence Nclaunch, and the toggling signals of the GSU and PCUs are saved as a switching file in value change dump (vcd) format. The dynamic, static and leakage power are then calculated for an average power analysis using Synopsys PrimeTime PX, which takes as input the switching file, the parasitic file, the gate-level netlist of the routed GSU and PCU, the constraints file and the 0.35 $\mu$m AMS HVCMOS technology library file.

## V. RESULTS AND DISCUSSION

Fig. 3 depicts the routed and fabricated GSU and LSU in 0.35 $\mu$m AMS HVCMOS technology. The stimulator frontend (LSU) is reused from [7]. The size of the routed GSU is 460 × 1430 $\mu$m<sup>2</sup> with a total gate count of 6977, while the size of a single PCU is 100 × 400 $\mu$m<sup>2</sup> with a total gate count of 433.

### A. Measured Waveform Generation and Average Power

Fig. 4 shows the stimulation wave patterns measured from the fabricated chip. The measured wave patterns are Waveform1 (rectangular) and Waveform2 (ramp up) shapes with active spike charge balancing, shown together with the $clk\_gbl$ with global clock gating employed. For these waveforms, 6 and 39 commands, respectively, are sent from the GSU to the PCU. Therefore, the $clk\_gbl$ is needed less often for the rectangular shape than for the ramp up shape. In the rectangular case, the $clk\_gbl$ is shut off for almost 90% of the stimulation time. The measured average power of the fabricated digital control units, which comprise 1 GSU, 1 activated PCU and 3 non-activated PCUs, is approximately 1.22 mW for both types of waveforms.

Fig. 4. Measured output waveforms with global clock activity ($clk\_gbl$). Waveform1 (rectangular) and Waveform2 (ramp up) shapes with active spike charge balancing.

Fig. 5. Simulated average power of GSU with and without local clock gating.

### B. GSU local clock gating power analysis

Fig. 5 illustrates the simulated average power of the GSU with and without local clock gating. In both cases, the average power of the GSU is the same for the two waveform shapes. Without local clock gating, the GSU consumes an average power of 3.2 mW; with local clock gating, the average power drops to 1.2 mW, a saving of 63%.

The total power of the GSU top level can be divided into two main parts: the GSU sub-modules and the clock buffers. The rectangular pulse shape is used as a reference to analyse the power distribution and power saving for the GSU top level and sub-modules. Table I lists the power of the GSU sub-modules and the clock buffers for the NOCG and CG methods together with the percentage saved. With the CG method, the power of the GSU sub-modules and the clock buffers is reduced by 72% and 41%, respectively. The results indicate that a large amount of power can be saved in the GSU sub-modules and clock buffers by employing local clock gating.

TABLE I

POWER SAVING IN GSU TOP LEVEL

| Cells                  | NOCG (mW) | CG (mW) | Saving (%) |
|------------------------|-----------|---------|------------|
| GSU sub-modules        | 2.220     | 0.613   | 72         |
| Clock tree and buffers | 1.023     | 0.603   | 41         |
| Total                  | 3.243     | 1.216   | 63         |

TABLE II

POWER SAVING IN GSU SUB-MODULES

| GSU sub-modules   | NOCG (mW) | CG (mW) | Saving (%) | Gated registers |
|-------------------|-----------|---------|------------|-----------------|
| Data Sampling     | 0.2330    | 0.1848  | 21         | 16              |
| Clock Generator   | 0.1456    | 0.1620  | -11        | -               |
| Main Control Unit | 0.0974    | 0.0997  | -2         | -               |
| Stimulator        | 1.6650    | 0.0863  | 95         | 501             |
| Data Distributor  | 0.0438    | 0.0447  | -2         | -               |
| Error counter     | 0.0001    | 0.0000  | 65         | 14              |
| Others            | 0.0351    | 0.0357  | -2         | -               |

The power saving for each GSU sub-module is listed in Table II. The sub-modules with CG employed, namely *Data Sampling*, *Stimulator* and *Error Counter*, show clear power savings. The *Stimulator* sub-module has the highest power saving since 501 of its registers are clock gated, 473 of which belong to the *Stimulator*'s register file. The other sub-modules, which have no gated registers, show almost the same or slightly higher power in order to drive the extra logic gates from the clock gating.
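The percentage figures in Table I can be reproduced from the NOCG and CG power values with a one-line formula. The sketch below uses the Table I values as printed; the dictionary and function names are hypothetical and exist only for this illustration.

```python
# Power values in mW, copied from Table I: (NOCG, CG).
TABLE_I = {
    "GSU sub-modules": (2.220, 0.613),
    "Clock tree and buffers": (1.023, 0.603),
}


def percent_saving(nocg_mw: float, cg_mw: float) -> float:
    """Relative power saving of CG with respect to NOCG, in percent."""
    return 100.0 * (nocg_mw - cg_mw) / nocg_mw


savings = {name: round(percent_saving(*vals)) for name, vals in TABLE_I.items()}

# Totals are the column sums of the two rows above.
total_nocg = sum(nocg for nocg, _ in TABLE_I.values())
total_cg = sum(cg for _, cg in TABLE_I.values())
total_saving = round(percent_saving(total_nocg, total_cg))
```

A negative result, as for several rows of Table II, means that the CG version of that sub-module consumes slightly more power than the NOCG version.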

### C. PCU local and global clock gating power analysis

Fig. 6 illustrates the simulated average power of Active and Non-Active PCUs with and without local and global clock gating for one stimulation electrode. In general, the average power for the ramp up shape is slightly higher than for the rectangular shape since more commands are sent to the PCU. Without local and global clock gating, the Active and Non-Active PCUs produce the highest and second highest average power, around 11 $\mu$W and 10 $\mu$W, respectively. When local clock gating is applied to the PCU, the average power is reduced by 11-14% for the Active PCU and by 75-76% for the Non-Active PCU. The higher power saving in the Non-Active PCU is due to the local clock gating, which shuts off the clock to most of the registers during inactive mode. However, if the global clock is shut off whenever no communication takes place, the power of the Active and Non-Active PCUs is reduced considerably, by 89-96%. Note that, in this global clock gating mode, the average power of the Active PCU with local clock gating is slightly higher than that of the Active PCU without local clock gating due to the extra logic gates from the clock gating. However, the Non-Active PCU with local and global clock gating produces the highest power saving of 96%, which is beneficial for the overall implant power since not all PCUs are activated during the stimulation mode. For more than one stimulated electrode, the average PCU power reported here is multiplied by the number of stimulated electrodes.

Fig. 6. Simulated average power for Active and Non-active PCUs with and without local and global clock gating. Results for 1 stimulation electrode.

## VI. CONCLUSION

The architecture and design techniques for a low-power programmable digital stimulator have been presented. Global clock gating is achieved by globally controlled timing and purely local execution of globally distributed commands. This allows major parts of the digital stimulator control to be completely deactivated during operation. The average power analyses conducted on two stimulation wave patterns using PrimeTime PX demonstrated that a large amount of power can be saved by using global and local clock gating in the designed circuits. The GSU achieves a 63% power saving when local clock gating is applied to 531 registers. For the PCU, combined global and local clock gating provides the highest power saving, around 89-96%. For a 1024-electrode implementation, which requires 256 LSUs, this corresponds to a power saving of approximately 2 mW for the GSU and an accumulated 2.5 mW for the LSUs. In conclusion, proper design techniques with global and local clock gating can greatly increase the power efficiency of the digital stimulator control units.
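The system-level figures can be approximated from the numbers reported in Section V. The sketch below is one plausible reading, not the authors' stated derivation: it assumes that each of the 256 PCUs would consume about 10 $\mu$W without any clock gating and that the 96% saving applies to all of them. All constant and function names are hypothetical.

```python
# GSU totals from Table I, in mW.
GSU_NOCG_MW = 3.243
GSU_CG_MW = 1.216

NUM_LSUS = 256              # LSUs needed for 1024 electrodes
# Assumptions (not stated as such in the paper): every PCU draws about
# 10 uW without gating, and the 96% saving applies to every PCU.
PCU_UNGATED_UW = 10.0
PCU_SAVING = 0.96


def gsu_saving_mw() -> float:
    """Absolute GSU power saved by local clock gating, in mW."""
    return GSU_NOCG_MW - GSU_CG_MW


def lsu_saving_mw(num_lsus: int, ungated_uw: float, saving: float) -> float:
    """Accumulated PCU power saved across all LSUs, in mW."""
    return num_lsus * ungated_uw * saving / 1000.0


gsu_mw = gsu_saving_mw()                                       # about 2.0 mW
lsu_mw = lsu_saving_mw(NUM_LSUS, PCU_UNGATED_UW, PCU_SAVING)   # about 2.5 mW
```

Under these assumptions the estimate lands close to the 2 mW and 2.5 mW quoted above; a different mix of Active and Non-Active PCUs would shift the LSU figure somewhat.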

## REFERENCES

- [1] J. Mouine, K. A. Ammar, and Z. Chtourou, "A completely programmable and very flexible implantable pain controller," *Proc. IEEE EMBS*, vol. 2, pp. 1104–1107, 2000.
- [2] A. Ba and M. Sawan, "Multiwaveforms generator dedicated to selective and continuous stimulations of the bladder," *Proc. IEEE EMBS*, vol. 2, pp. 1569–1572, 2003.
- [3] M. Ortmanns, A. Rocke, M. Gehrke, and H.-J. Tiedtke, "A 232-channel epiretinal stimulator ASIC," *IEEE J. Solid-State Circuits*, vol. 42, no. 12, pp. 2946–2959, 2007.
- [4] K. Chen, Z. Yang, L. Hoang, J. Weiland, M. Humayun, and W. Liu, "An integrated 256-channel epiretinal prosthesis," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1946–1956, 2010.
- [5] A. Rothermel, L. Liu, N. P. Aryan, M. Fischer, J. Wuenschmann, S. Kibbel, and A. Harscher, "A CMOS chip with active pixel array and specific test features for subretinal implantation," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 290–300, 2009.
- [6] S. Mai, C. Zhang, and Z. Wang, "An application-specific low power speech processor for cochlear implants," *Proc. IEEE BioCAS*, pp. 177–180, 2009.
- [7] E. Noorsal, K. Sooksood, H. Xu, R. Hornig, J. Becker, and M. Ortmanns, "A neural stimulator frontend with high-voltage compliance and programmable pulse shape for epiretinal implants," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 244–256, Jan. 2012.
- [8] M. Ghovanloo and K. Najafi, "A modular 32-site wireless neural stimulation microsystem," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2457–2466, 2004.