# Fully Integrated 60 GHz Power Amplifiers in 45 nm SOI CMOS

by

Hassan Shakoor

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science in Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2016

© Hassan Shakoor 2016

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

#### Abstract

With the rapid growth of consumer demand for high data rates and high-speed communications, the wireless spectrum has become increasingly precious. This has promoted the evolution of new standards and modulation schemes to improve spectral efficiency. Allocating larger bandwidths is an alternative way to increase channel capacity and data rate; however, the availability of spectrum below 10 GHz is very limited. Recently, the 60 GHz spectrum has emerged as a potential candidate to support multi-Gb/s applications. It offers 7 GHz of unlicensed spectrum for the development of Wireless Personal Area Networks (WPAN) and wireless HD streaming. Meanwhile, the scaling and advancement of low-cost complementary metal-oxide semiconductor (CMOS) technologies have enabled the use of CMOS devices at millimeter-wave frequencies, and the integration of analogue and digital circuitry has created a platform for single-chip radio development. However, low power density, low optimum load resistance and poor-quality integrated passives (due to the lossy silicon substrate) make CMOS technology a poor candidate for power amplifier (PA) design compared to silicon germanium and Group III-V technologies (gallium nitride, gallium arsenide and indium phosphide).

To overcome the above-mentioned challenges in CMOS, this thesis explores FET-stacking as a power combining technique at 60 GHz using 45 nm silicon-on-insulator (SOI) CMOS for millimeter-wave PAs. The stacking approach enables the use of higher supply voltages to obtain higher output power, and its higher load line resistance $R_{opt}$ allows for the use of low impedance transformation matching networks. The reliability of CMOS PAs under large-signal operation is also addressed and improved with the FET-stacking approach applied in this work.

This thesis divides the millimeter-wave PA design problem into two areas, active and passive, both of which are critically designed for optimum performance in terms of efficiency and output power while taking device and substrate parasitics into consideration. A transistor unit cell combination topology, the 'Manifold', has been analyzed and applied in 45 nm SOI CMOS for large RF power transistor cells. Moreover, various topologies of slow-wave coplanar waveguide (CPW) lines are analyzed and implemented on the SOI substrate to synthesize inductors for matching networks at 60 GHz.

To demonstrate the active and passive design performance in 45 nm SOI CMOS at 60 GHz, a two-stage cascode PA is presented. Measurement under continuous wave (CW) stimulus shows 18.1 dB gain, a 3 dB bandwidth of 19%, and 14 dBm saturated output power at 21% peak power-added efficiency (PAE). Moreover, to validate the FET-stacking analysis, a three-stack PA is designed with an output performance of 8.8 dB gain, a 3 dB bandwidth of 20%, and 16 dBm saturated output power at 14% peak PAE. Finally, a wideband three-stage amplifier is designed utilizing the two-stage cascode and three-stack PA, achieving 21.5 dB flat gain over a fractional bandwidth of 20%, and 16 dBm saturated output power at 13.8% PAE.

#### Acknowledgements

I would like to express my utmost gratitude to Dr. Slim Boumaiza for introducing me to the magnificent world of microwave and RF. His constant support, guidance and encouragement during tough times have made the pursuit of my goals possible. I would also like to thank Dr. John Long and Dr. Manoj Sachdev for reading my thesis and providing valuable feedback.

I would like to thank my colleagues Hamed, Peter, Yushi, Mingming, Kasyap and Stanley, for sharing their knowledge and experience with me. Thanks to my EmRG family - for being a great support and for putting up with me during these two years.

I would also like to thank Steve Kovacic and Foad Arfaei Malekzadeh from Skyworks Solutions for their helpful advice and fabrication support.

Finally, I would like to thank the three most important people in my life: my parents and my sister. Their love, support and encouragement are the greatest gifts in life.

# **Table of Contents**

| Section | Title |
|---------|-------|
|         | List of Tables |
|         | List of Figures |
| 1       | Introduction |
| 1.1     | Motivation for 60 GHz Radio |
| 1.2     | Problem Statement |
| 1.3     | Thesis Organization |
| 2       | Overview of Millimeter-Wave Power Amplifiers |
| 2.1     | Classical Power Amplifiers |
| 2.2     | Waveform Engineered and Switching Mode Power Amplifiers |
| 2.3     | Previous Work on Millimeter-Wave CMOS Power Amplifiers |
| 2.3.1   | 60 GHz PA in CMOS and SiGe |
| 2.3.2   | 45 nm CMOS SOI for Millimeter-Wave PAs |
| 2.3.3   | Literature Review Summary |
| 3       | Analysis of 45 nm CMOS SOI Active and Passive Components |
| 3.1     | Overview of CMOS SOI Technology |
| 3.2     | Active Device Layout Analysis for Millimeter Wave Power Amplifiers |
| 3.2.1   | Unit Cell Analysis |
| 3.2.2   | Manifold Layout |
| 3.2.3   | Grid Layout |
| 3.2.4   | Round-Table Layout |
| 3.2.5   | Overall Performance Summary |
| 3.3     | Passives Analysis |
| 3.3.1   | Substrate Shielding |
| 3.3.2   | Slow-wave CPW and Grounded Shielded CPW |
| 3.3.3   | Fully Shielded and Top Shielded CPWs |
| 3.3.4   | Performance Summary |
| 4       | Millimeter-wave Cascode and Stacked-FET Power Amplifier |
| 4.1     | Cascode and FET Stacking for Millimeter-Wave Power Transistor |
| 4.1.1   | Cascode Cell Analysis |
| 4.1.2   | FET-Stacking Analysis |
| 4.2     | A 2-stage 60 GHz Cascode Power Amplifier in 45 nm CMOS SOI |
| 4.2.1   | Output Power Stage |
| 4.2.2   | Input Stage and Inter-stage Matching |
| 4.2.3   | Final Circuit |
| 4.2.4   | Small and Large-signal Measurement Setup |
| 4.2.5   | Simulation and Measurement Results |
| 4.3     | A 3-stack 60 GHz Power Amplifier in 45 nm CMOS SOI |
| 4.3.1   | Inter-stack Matching |
| 4.3.2   | Final Circuit |
| 4.3.3   | Simulation Results |
| 4.4     | A 3-stage 60 GHz Power Amplifier in 45 nm CMOS SOI |
| 4.4.1   | Input, Inter-stage and Output Matching Networks |
| 4.4.2   | Simulation Results |
| 4.5     | Higher Order Harmonic Control for 60 GHz PA Design |
| 5       | Conclusions and Future Work |
| 5.1     | Conclusions |
| 5.2     | Future Work |
|         | References |


# List of Tables

| Caption |
|---------|
| … |
| …ture |
| … PA Literature |
| … Manifold and Grid Layout Topologies |
| …s CPW Lines Analyzed for Fixed Characteristic Impedance |
| … Summary of Cstack and Lstack |

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1  | Availability of 60 GHz spectrum worldwide |
| 2.1  | DC characteristics of ideal FET showing (a) $V_{gs}$ and $I_{ds}$ bias points for various class of operation and (b) class A and B loadlines |
| 2.2  | Fourier analyses of reduced conduction angle current waveforms |
| 2.3  | Class F topology showing (a) general implementation of class-F PA and (b) class-F current and voltage waveforms |
| 2.4  | A generalized 4-way 3-stage power combining PA architecture |
| 2.5  | Entire 3-stage DAT PA topology |
| 2.6  | Schematic of 3-stage transformer-coupled 4-way combining differential PA |
| 2.7  | Schematic of 90 GHz multi-drive stacked-FET PA |
| 2.8  | A 4-stacked PA comprising of two common source and two common gate cells |
| 2.9  | Schematic of the 33-46 GHz watt-class PA array prototype |
| 2.10 | System block diagram for PA utilizing spatial combining with 2 x 2 antenna array system |
| 3.1  | IBM 45 nm CMOS SOI stackup |
| 3.2  | Optimized 30 $\mu$m unit cell layout |
| 3.3  | Simulated (a) $f_{max}$ and (b) $f_t$ of regular pitch floating body transistors of various widths |
| 3.4  | View of a high power LDMOS transistor with lid removed |
| 3.5  | Layout view of a 180 $\mu$m manifold transistor |
| 3.6  | Layout view of a 4x4 grid layout (left) utilizing 15 x 2 $\mu$m unit cell (right) |
| 3.7  | Staircase structure for source and drain, implemented for grid topology |
| 3.8  | Illustration of the round table layout concept |
| 3.9  | Equivalent ColdFET model for parasitic extraction |
| 3.10 | (a) CPW cross section on silicon substrate and (b) single $\pi$-section lumped element circuit model of transmission line on silicon substrate |
| 3.11 | Three-dimensional view of (a) SW-CPW and (b) GS-CPW transmission line |
| 3.12 | Cross-section of (a) fully shielded and (b) top shielded transmission lines |
| 3.13 | Simulation results for (a) Q factor, (b) attenuation per mm of length and (c) phase constant (Beta) in radians per mm for SW-CPW, GS-CPW and unshielded CPW (as a reference), with $SL = 2\ \mu$m and $SS = 2\ \mu$m |
| 3.14 | Simulation results for (a) Q factor, (b) attenuation per mm of length and (c) phase constant (Beta) in radians per mm for FS-CPW, TS-CPW and unshielded CPW (as a reference) |
| 3.15 | Illustration of slot dimensions including slot length and slot spacing for SW-CPW and GS-CPW topologies |
| 3.16 | Simulation results for (a) Q factor and (b) attenuation per mm of length for various shield lengths and shield spacings |
| 4.1  | Cascode transistor topology with associated interstage node parasitics |
| 4.2  | Maximum available gain of a 30 um common source and cascode transistor in 45 nm SOI technology |
| 4.3  | Common gate circuit used for stability analysis |
| 4.4  | (a) Cascode topology and (b) 2-stack topologies depicting voltage swings (red) at each node |
| 4.5  | (a) Generalized stacked topology with capacitive parasitic for k stages and (b) a small signal model of $k^{th}$ stacked transistor |
| 4.6  | Variation in $OP_{1dB}$ and ITR for various transistor widths at 60 GHz |
| 4.7  | Output power and PAE contours for 90 um cascode device at 60 GHz |
| 4.8  | LC output matching topology realized with SW CPW |
| 4.9  | L-C-L based inter-stage matching topology between driver and output stages |
| 4.10 | Simplified schematic of the single-ended 2-stage cascode PA |
| 4.11 | Die photomicrograph of 2-stage single-ended cascode PA |
| 4.12 | Measurement setup for (a) small-signal and (b) large-signal PA characterization |
| 4.13 | Results for (a) forward gain ($S_{21}$), (b) input reflection coefficient ($S_{11}$), (c) output reflection coefficient ($S_{22}$) and (d) isolation ($S_{12}$) of the 2-stage cascode PA |
| 4.14 | Large signal performance showing (a) AM-AM and efficiency performance at 60 GHz, (b) AM-PM at 60 GHz and (c) measured PAE, OP-1dB and $P_{sat}$ across 17% fractional bandwidth of 2-stage PA |
| 4.15 | Small signal model of inter-stack network between the $k^{th}$ and $(k+1)^{th}$ stage |
| 4.16 | Matching techniques: (a) shunt capacitor (b) series inductance (c) shunt inductance |
| 4.17 | Simulated drain to source voltage swing Vds (a) with and (b) without inter-stack matching network synthesized by shunt inductor |
| 4.18 | Simulated drain to gate voltage swing Vdg (a) with and (b) without inter-stack matching network synthesized by shunt inductor |
| 4.19 | Simulated gate to source voltage swing Vgs (a) with and (b) without inter-stack matching network synthesized by shunt inductor |
| 4.20 | Simulated (a) AM-AM and (b) drain efficiency performance of the 3-stack PA with and without inter-stack matching network synthesized by shunt inductor |
| 4.21 | Simplified schematic of the single-ended 3-stack PA |
| 4.22 | Simplified schematic of the single-ended 3-stack PA |
| 4.23 | Small signal results for (a) forward gain ($S_{21}$), (b) input reflection coefficient ($S_{11}$), (c) output reflection coefficient ($S_{22}$) and (d) isolation ($S_{12}$) for 3-stack PA |
| 4.24 | Simulated large signal performance showing (a) AM-AM and efficiency and (b) AM-PM performance of 3-stack PA at 60 GHz |
| 4.25 | L-L section matching topology for (a) input and (b) output of the 3-stage PA |
| 4.26 | Simplified schematic of the single-ended 3-stage PA |
| 4.27 | Layout of proposed 3-stage PA |
| 4.28 | Simulated small signal performance for (a) forward gain ($S_{21}$), (b) input reflection coefficient ($S_{11}$), (c) output reflection coefficient ($S_{22}$) and (d) isolation ($S_{12}$) of the 3-stage PA |
| 4.29 | Simulated large signal performance showing (a) AM-AM and PAE at 60 GHz (b) AM-PM 60 GHz and (c) PAE, OP-1dB and $P_{sat}$ across 20% fractional bandwidth for the 3-stage PA |
| 4.30 | Simulated output power and PAE contours for second harmonic, 120 GHz, (a) load pull and (b) source pull on a 90 $\mu$m cascode transistor |
| 5.1  | Traditional Doherty PA |

### Chapter 1

### Introduction

#### 1.1 Motivation for 60 GHz Radio

Serving a growing population of users around the world, wireless communication has become the backbone of modern communication, supporting applications ranging from household Wi-Fi networks to satellite communication. Over the past decade, connectivity through wireless communication has seen a significant shift in its utilization, increasingly moving from voice-centric to data-centric applications. This is primarily due to the introduction of smartphones and an ever-increasing user base of cellular communication systems, whose demands for higher throughput and bandwidth are difficult to meet. The evolution of new standards and modulation schemes, such as Long Term Evolution (LTE), which uses carrier aggregation to use bandwidth efficiently and increase bit rate [1], has ameliorated the situation; however, the need for larger bandwidths for high data rate applications remains.

The Shannon-Hartley theorem gives the maximum theoretical channel capacity $C$ (bits/s) as:

$$C = BW \log_2(1 + SNR) \tag{1.1}$$

The relationship above shows that channel capacity is directly proportional to bandwidth (BW) and grows logarithmically with the signal-to-noise ratio (SNR). Hence, the quest for high data rates has led academia and industry to utilize unlicensed spectrum to obtain larger bandwidth and channel capacity. The evolution of mobile communications to fifth generation (5G) technologies is driven by the above-mentioned factors, along with the desire to service future use cases of voice and data communications that will require low-latency ultra-reliable communications, machine-type communications and enhanced mobile broadband.
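As a rough numerical illustration of Equation 1.1, the sketch below compares the capacity bound for a narrow channel, a wide channel and a narrow channel with much higher SNR. The function name and the specific bandwidth and SNR values (20 MHz, 7 GHz, 20 dB, 40 dB) are assumptions chosen for illustration, not figures from this thesis; the SNR is converted from dB to a linear power ratio before it is used in the formula.

```python
import math


def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Upper bound on channel capacity in bits/s, per Equation 1.1."""
    # Equation 1.1 expects SNR as a linear power ratio, not in dB.
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)


# Illustrative values only (assumptions, not taken from the thesis).
narrow = shannon_capacity_bps(20e6, snr_db=20)   # 20 MHz channel
wide = shannon_capacity_bps(7e9, snr_db=20)      # 7 GHz of spectrum, same SNR
boosted = shannon_capacity_bps(20e6, snr_db=40)  # 20 MHz, 100x higher SNR

print(f"20 MHz at 20 dB SNR: {narrow / 1e9:.3f} Gb/s")
print(f"7 GHz at 20 dB SNR:  {wide / 1e9:.3f} Gb/s")
print(f"20 MHz at 40 dB SNR: {boosted / 1e9:.3f} Gb/s")
print(f"Gain from bandwidth: {wide / narrow:.1f}x")
print(f"Gain from SNR:       {boosted / narrow:.2f}x")
```

Under these assumed values, widening the channel by a factor of 350 scales the bound by the same factor, whereas raising the SNR a hundredfold only roughly doubles it, which is why wide unlicensed bands are attractive for multi-Gb/s links.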

Several potential unlicensed frequency bands above 6 GHz are available for implementing the next-generation communication standard. The 6 to 30 GHz band is already mostly licensed and does not offer unused multi-gigahertz bandwidth. Most of the spectrum in this region has been allotted for satellite communication, mainly in the Ka and Ku bands [2]. The 28 GHz band, initially allotted for Local Multipoint Distribution Service (LMDS) in the United States (US), has recently gained attention as a potential spectrum for 5G communication [2].

Moving above 30 GHz, the 40 to 45 GHz spectrum is currently used infrequently in Europe, although it has secondary allocations in China, South Korea, Japan and the US. The E band across 70 GHz to 100 GHz has gained popularity for automotive radar (76 GHz to 81 GHz) and wireless backhaul purposes [2].

In 2001, the Federal Communications Commission (FCC) in the US allocated 57 GHz to 64 GHz for unlicensed use [3]. The Conference on Postal and Telecommunication Administration also opened up the 57 GHz to 60 GHz bandwidth for exploration. Figure 1.1 shows the availability of the 60 GHz spectrum for unlicensed use worldwide [4].



Figure 1.1: Availability of 60 GHz spectrum worldwide [4]

While the abundance of 60 GHz unlicensed spectrum around the world provides the ability to support high-rate communication, this spectrum poses several implementation difficulties and challenges. Studies on 60 GHz channel characterization have shown 20 to 40 dB free space path loss, 15 to 30 dB/km atmospheric absorption (depending on atmospheric conditions) and difficult non-line-of-sight communication due to significant multipath effects [5]. Nevertheless, many applications for the 60 GHz spectrum target its short range characteristics for point-to-point links such as WiGig (IEEE 802.11ad), wireless high definition video streaming and wireless backhaul.
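The free space path loss figure quoted above can be put in context with the Friis free-space formula. The sketch below follows one plausible reading, which is an assumption made here rather than a statement from [5]: it computes how much more free-space loss a 60 GHz carrier suffers than 2.4 GHz and 5 GHz carriers over the same distance. The reference frequencies, the 10 m distance and the function name `fspl_db` are illustrative choices.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB between isotropic antennas (Friis)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C)

distance = 10.0  # assumed link distance in metres (illustrative)
loss_60 = fspl_db(distance, 60e9)

for name, freq in (("2.4 GHz", 2.4e9), ("5 GHz", 5e9)):
    extra = loss_60 - fspl_db(distance, freq)
    print(f"extra loss at 60 GHz relative to {name}: {extra:.1f} dB")
```

The extra loss depends only on the frequency ratio, not on the distance, so under this reading the penalty relative to either reference carrier falls within the 20 to 40 dB range quoted above.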

The attractive opportunity to explore the 60 GHz spectrum for high bandwidth applications puts a strain on existing semiconductor technologies for radio hardware and poses new challenges for performance delivery on newer technology nodes. Silicon germanium (SiGe) and group III-V technologies are generally more suitable for millimeter wave (mm-wave) applications due to their lower loss substrates and high speed active devices. However, complementary metal-oxide semiconductor (CMOS) technology provides significant potential for low cost integration with other parts of the radio system. Moreover, with shrinking CMOS gate lengths, high active device speeds are attainable at deep nano-scale nodes, with peak unity current gain frequency ($f_t$) and unity unilateral power gain frequency ($f_{max}$) of 250 GHz and 280 GHz, respectively, for 45 nm CMOS silicon-on-insulator (SOI) technologies.

#### 1.2 Problem Statement

With interest growing in utilizing mm-wave bands for applications in wireless communication, automotive radar and satellite radio, nano-scale CMOS technology provides an attractive design platform due to its low manufacturing cost, integration capability and high cut-off frequency, $f_t$, and maximum oscillation frequency, $f_{max}$ (> 200 GHz). However, CMOS suffers from its own set of challenges when used in the design of power amplifiers (PAs), which are key elements in wireless communication systems. First, the low breakdown voltage limits the maximum voltage swing, which makes CMOS a poor candidate technology for high output power applications in transmitters. Second, the poor quality of integrated passives on the low resistivity silicon substrate degrades the efficiency and bandwidth of matching networks and power combiners. Lastly, the low optimum load resistance, $R_{opt}$, of CMOS transistors leads to a large impedance transformation ratio (up to 50 $\Omega$), making on-chip matching networks large and difficult to realize.
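The penalty of a low $R_{opt}$ can be sketched with the textbook single-section L-match, for which the network quality factor is fixed by the transformation ratio. The efficiency expression used below is a common first-order approximation that assumes all loss comes from one component with a finite quality factor. The $R_{opt}$ values, the component Q of 15 and the function names are illustrative assumptions, not measured values from this work.

```python
import math

def l_match_q(r_high: float, r_low: float) -> float:
    """Loaded Q of a single-section L-match between two real impedances."""
    return math.sqrt(r_high / r_low - 1)

def l_match_efficiency(q_match: float, q_component: float) -> float:
    """First-order efficiency when one lossy component dominates (assumption)."""
    return 1 / (1 + q_match / q_component)

R_SYSTEM = 50.0     # system impedance in ohms
Q_COMPONENT = 15.0  # assumed on-chip inductor quality factor (illustrative)

for r_opt in (5.0, 10.0, 25.0):  # hypothetical optimum load resistances
    q = l_match_q(R_SYSTEM, r_opt)
    eff = l_match_efficiency(q, Q_COMPONENT)
    print(f"R_opt = {r_opt:4.1f} ohm: ratio = {R_SYSTEM / r_opt:4.1f}, "
          f"Q = {q:.2f}, efficiency = {100 * eff:.1f}%")
```

A larger transformation ratio forces a higher network Q, which both narrows the bandwidth and increases the share of power lost in low Q on-chip passives.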

Power combining is commonly used in PAs designed for both low frequency and mm-wave operation to obtain large output power from multiple on-chip unit PA cells. However, such architectures are limited by the efficiency of the output power combiner. The large area consumption and low quality factor of passive combiners on silicon, such as the Wilkinson combiner, degrade the overall performance, primarily gain, output power and efficiency. The use of transformers for power combining has been reported in the literature on multistage and differential PAs due to their compact layout, low loss and impedance transformation, which facilitates the design of matching networks. The purpose of this thesis is to investigate the design of mm-wave PAs with enhanced efficiency and linearity using an alternative power combining technique, the stacked field-effect transistor (FET) approach. Stacking helps to overcome the low breakdown voltage of CMOS FETs while providing a higher radio frequency (RF) voltage swing at the output. Various single-ended design topologies are presented and measured to understand the feasibility of their implementation and their performance in transmitter systems. Apart from the PA design, a further objective of this thesis is to evaluate and experiment with the IBM 45 nm CMOS SOI technology as a potential candidate technology for 60 GHz mm-wave transmitters.

#### 1.3 Thesis Organization

The organization of the thesis is as follows. Chapter 2 starts by introducing different PA classes of operation including linear classes (Class A, B, AB and C) as well as waveform engineered PAs (class F,  $F^{-1}$ ) followed by a literature review of existing PA designs in the 60 GHz band. Moreover, the literature review will also focus on PAs implemented in 45 nm CMOS SOI technology.

Chapter 3 discusses the 45 nm CMOS SOI technology and presents a thorough analysis of different transistor layouts and their performance under mm-wave operation. Limitations of the CMOS SOI process will also be discussed for some layout topologies, along with a proposed optimized PA layout design to minimize device parasitics and thermal effects. A study on passives is also presented and focuses on various coplanar waveguide (CPW) topologies, including grounded shielded CPW (GS-CPW), slow wave CPW (SW-CPW), top shielded CPW (TS-CPW) and fully shielded CPW (FS-CPW). The analysis compares these CPW topologies in order to obtain compact, low insertion loss CPW lines that can be realized as inductors for matching networks.

Moving forward, Chapter 4 presents the FET-stacking topology as an alternative power combining technique to cascode for PAs, and the performances of the two approaches are compared. The implementation of a 2-stage cascode PA at 60 GHz is also presented along with simulation and measurement results. Chapter 4 goes on to discuss the extension of the FET-stacking technique to a three-stack (3-stack) stage PA at 60 GHz, along with its simulated and measured results. This chapter concludes by presenting a 3-stage single ended PA, utilizing the 3-stack stage as the power stage, with simulated and measured results at 60 GHz. Finally, Chapter 5 will present conclusions based on the work done in the thesis and provide some suggestions for future work.

### Chapter 2

## Overview of Millimeter-Wave Power Amplifiers

It is challenging to design PAs in silicon and complex to implement them in mm-wave transceivers. This complexity arises from various factors that must be considered in the PA design, such as output power, efficiency, linearity and reliability. Overall, the design of CMOS PAs poses a multi-dimensional problem and requires the designer to balance trade-offs among the above mentioned factors to achieve the best overall performance.

This chapter discusses the basic background literature on PAs and their various classes of operation. A literature survey of PAs designed in nano-scale CMOS and 130 nm SiGe BiCMOS for the mm-wave spectrum is also presented. Finally, a comparison is made among the various PA topologies and combining techniques for single stage, multistage and multipath architectures.

### 2.1 Classical Power Amplifiers

It is possible to think of PAs as DC to RF power converters. The amplification takes place by taking an input RF signal at the carrier frequency (frequency of interest) and amplifying it through an active device. The amplified output power is then transmitted through an antenna over the air for short or long range applications. The amount by which the signal is amplified can be expressed as power gain G, for a given input power level  $P_{in}$ and obtained power output  $P_{out}$  (input and output power expressed in watts),

$$G = \frac{P_{out}}{P_{in}} \tag{2.1}$$

The overall efficiency of the PA, referred to as the drain efficiency, describes how effectively the supplied DC power is converted to RF power and is expressed as a percentage, as shown in (2.2). At 60 GHz, the operating frequency lies well within a decade of the $f_{max}$ of the CMOS technology, so the gain of the transistor is considerably lower. The input power is then no longer negligible and is accounted for through the power added efficiency (PAE) of the PA, as shown in (2.3).

$$\text{Drain Efficiency } (\eta) = \frac{P_{out}}{P_{dc}} \tag{2.2}$$

$$PAE = \frac{P_{out} - P_{in}}{P_{dc}} = \left(1 - \frac{1}{G}\right)\eta \tag{2.3}$$
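The relationship between gain, drain efficiency and PAE can be checked numerically. The sketch below takes the gain as output power over input power, consistent with the second form of (2.3), and uses an assumed operating point (5 dBm in, 15 dBm out, 150 mW of DC power) that is purely illustrative. The function names are hypothetical.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)

def pa_metrics(p_in_dbm: float, p_out_dbm: float, p_dc_w: float):
    """Return (linear gain, drain efficiency, PAE) for one operating point."""
    p_in = dbm_to_watts(p_in_dbm)
    p_out = dbm_to_watts(p_out_dbm)
    gain = p_out / p_in              # linear power gain
    eta = p_out / p_dc_w             # drain efficiency, (2.2)
    pae = (p_out - p_in) / p_dc_w    # power added efficiency, (2.3)
    return gain, eta, pae

# Assumed operating point (illustrative only).
gain, eta, pae = pa_metrics(p_in_dbm=5.0, p_out_dbm=15.0, p_dc_w=0.150)
print(f"G = {gain:.1f}, drain efficiency = {100 * eta:.1f}%, PAE = {100 * pae:.1f}%")

# Both forms of (2.3) agree.
assert abs(pae - (1 - 1 / gain) * eta) < 1e-12
```

With a gain of only 10 dB, a tenth of the output power is supplied by the driver, so the PAE is noticeably lower than the drain efficiency. At low frequencies, where gain is high, the two figures nearly coincide.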

Traditional PAs consist of a single transistor biased for a particular class of operation (through a biasing network) that is presented with the optimal load and source impedances through matching networks. For conventional classes of operation, the transistor is modelled as a transconductor, i.e. a voltage controlled current source, that converts input voltage into output current.

For the class A amplifier, the gate is biased halfway between the saturation and cut off limits, as shown by the ideal FET $V_{gs}$ versus $I_{ds}$ transfer curve in Figure 2.1a. If the input voltage swing, $V_g$, is kept within the limits of saturation and cut off, a linear drain current can be generated, leading to an undistorted output voltage swing across the load. From Figure 2.1b, it can be seen that the slope of the load line for class A determines the optimum fundamental load impedance that results in maximum voltage swing, output power and efficiency. The optimum load line impedance and maximum output power can be expressed as,

$$R_{opt} = \frac{V_{DSmax}/2}{I_{max}/2} \tag{2.4}$$

$$P_{out} = \frac{1}{8} V_{DSmax} I_{max} \tag{2.5}$$
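A short numerical sketch of (2.4) and (2.5) is given below. It assumes an ideal device with zero knee voltage biased at half of $V_{DSmax}$ and half of $I_{max}$, so that the DC power is the product of those two bias values. The 2 V and 100 mA limits are illustrative assumptions rather than values for a particular transistor, and the function name is hypothetical.

```python
def class_a_load_line(v_ds_max: float, i_max: float):
    """Ideal class A load line: returns (R_opt in ohm, P_out in W, efficiency)."""
    r_opt = (v_ds_max / 2) / (i_max / 2)   # (2.4)
    p_out = v_ds_max * i_max / 8           # (2.5)
    p_dc = (v_ds_max / 2) * (i_max / 2)    # bias voltage times bias current
    return r_opt, p_out, p_out / p_dc

# Assumed device limits (illustrative only).
r_opt, p_out, efficiency = class_a_load_line(v_ds_max=2.0, i_max=0.1)
print(f"R_opt = {r_opt:.1f} ohm, P_out = {1e3 * p_out:.1f} mW, "
      f"efficiency = {100 * efficiency:.0f}%")
```

The ratio of (2.5) to the DC power is independent of the device limits, which is where the ideal 50% class A efficiency comes from.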

Figure 2.1: DC characteristics of ideal FET showing (a)  $V_{gs}$  and  $I_{ds}$  bias points for various classes of operation and (b) class A and B loadlines

Under the above mentioned conditions, a class A PA can theoretically achieve a maximum of 50% efficiency at maximum voltage swing and peak output power, suffering from efficiency degradation as the output power level is backed off. Moreover, when the non-ideal aspects of the transistor are considered, such as the knee voltage  $(V_{knee})$ , the efficiency of the transistor degrades from its theoretical 50% peak efficiency. In order to improve efficiency over that of class A operation, reduced conduction angle modes such as classes AB, B and C can be attained by lowering the gate bias point further towards the cut off point. For example, for a class B PA, the gate is biased at the threshold voltage of the transistor. A more intuitive understanding of the efficiency improvement obtained with reduced conduction angle modes is shown in Figure 2.2. Here it can be seen from the Fourier analysis of the drain current that the fundamental component becomes larger relative to the DC component as the conduction angle is reduced (for  $\alpha < 2\pi$ ). Moreover, with a class B bias no drain current flows when there is no RF signal at the input of the transistor, which conserves power and thus improves efficiency. Since a class B amplifier conducts for only half of the input cycle, a conduction angle of  $\pi$  (as shown in Figure 2.2), it can achieve a theoretical efficiency of 78.5% at peak output power. Class AB operation is defined by a region rather than a single bias point, and this region lies between the class A and B bias points. If the gate bias point is moved below the threshold voltage, which is the regime of class C operation, the theoretical efficiency approaches 100% as the conduction angle shrinks towards zero, at the expense of output power and gain. Each class of operation provides a trade-off in terms of gain, efficiency and linearity, which is qualitatively summarized in Table 2.1 below.

Table 2.1: Comparative Performance of Conventional Linear and Reduced Conduction Angle Mode Classes of Operation

| Class | Conduction<br>Angle $(\alpha)$ | Gain         | Efficiency   | Linearity    |
|-------|--------------------------------|--------------|--------------|--------------|
| A     | $2\pi$                         | Excellent    | Poor         | Excellent    |
| AB    | $\pi < \alpha < 2\pi$          | Good         | Satisfactory | Satisfactory |
| B     | $\pi$                          | Satisfactory | Good         | Excellent    |
| C     | $\alpha < \pi$                 | Poor         | Excellent    | Poor         |
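The efficiency trend across the classes can be reproduced from the standard Fourier analysis of a truncated cosine drain current. The sketch below assumes an ideal transistor with zero knee voltage, full voltage swing and all harmonics shorted at the output. Under these assumptions the drain efficiency depends only on the conduction angle. The function name and the sampled angles are illustrative choices.

```python
import math

def drain_efficiency(alpha: float) -> float:
    """Ideal drain efficiency for a conduction angle alpha in radians.

    Ratio of the fundamental to twice the DC component of a truncated
    cosine current waveform, assuming full voltage swing (zero knee voltage).
    """
    fundamental = alpha - math.sin(alpha)
    dc = 2 * math.sin(alpha / 2) - alpha * math.cos(alpha / 2)
    return fundamental / (2 * dc)

cases = (
    ("A", 2 * math.pi),
    ("AB (example)", 1.5 * math.pi),
    ("B", math.pi),
    ("C (example)", 0.5 * math.pi),
)
for label, alpha in cases:
    print(f"class {label:13s} alpha = {alpha / math.pi:.2f} pi, "
          f"efficiency = {100 * drain_efficiency(alpha):.1f}%")
```

The class A and class B entries recover the 50% and 78.5% limits quoted in the text, and the efficiency keeps rising as the conduction angle is reduced further into class C.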

### 2.2 Waveform Engineered and Switching Mode Power Amplifiers

The motivation for using waveform engineering in PA design is to overcome the narrow design space of the conventional PA classes. For instance, for optimum performance, class B requires the higher harmonics to be shorted at the output and the fundamental to be presented with  $R_{opt}$ . Such harmonic conditions are difficult to achieve over a wide bandwidth, which makes the design inherently narrowband.

Figure 2.2: Fourier analyses of reduced conduction angle current waveforms [6]

Waveform engineered PAs, such as classes  $F/F^{-1}$ , utilize harmonic tuning to increase the efficiency. Class F PAs are ideally designed to achieve perfectly square waveforms for the output voltage and half sine-waves for the output current, by shorting all even harmonics and presenting an open circuit to all odd harmonics at the output, as shown in Figure 2.3. A theoretical efficiency of 100% can be achieved if non-overlapping drain voltage and current waveforms are obtained. Moreover, the class B/J amplifier increases the design space and improves bandwidth. It provides a range of equivalent load impedances that yield the same gain, efficiency and linearity as a class B amplifier terminated with short-circuited harmonics [7].

Switch mode classes such as Class  $D/D^{-1}$  and E utilize the transistor as a switch rather than as a voltage controlled current source. Switching PAs are highly efficient but non-linear. Moreover, the lack of fast switching devices and the proportional increase in switching losses with frequency prevent their widespread application at mm-wave frequencies.



Figure 2.3: Class F topology showing (a) general implementation of class-F PA and (b) class-F current and voltage waveforms

### 2.3 Previous Work on Millimeter-Wave CMOS Power Amplifiers

This section provides an overview of mm-wave PAs implemented for Q band and V band applications. The literature review covers various technologies including bulk CMOS, CMOS SOI and SiGe. Several PA topologies will be discussed with the aim of analyzing various power combining methods to obtain higher power output in the mm-wave regime. The goal is to provide an up to date overview of the past and recent work done in mm-wave PA design and provide some insight into the methodologies that are applied to design those PAs.

#### 2.3.1 60 GHz PA in CMOS and SiGe

At mm-wave frequencies, simply using large transistors to obtain more power is not viable because of the limitations posed by parasitics. Early work in 60 GHz PA design presented various power combining techniques to overcome the low power density of CMOS devices. Figure 2.4 shows the generalized structure of a four way multipath PA topology.



Figure 2.4: A generalized 4-way 3-stage power combining PA architecture

For example, the work in [8] utilized a Wilkinson power combiner in 90 nm bulk CMOS for a 4-way, four stage single ended common source PA architecture. Wilkinson combiners are widely used in PA design and are known for their isolation and port matching. However, such structures rely on quarter-wavelength transmission lines, which consume a large area and introduce loss that lowers the combining efficiency. Transmission line based combiners were applied in [9] within a thirty-two way, three stage single ended common source structure. The multi transmission line combining topology was introduced to overcome the difficulty of realizing low characteristic impedance lines on chip: several high characteristic impedance lines are connected in parallel to perform the impedance transformation for wideband matching. Nevertheless, the overall topology still suffers from large area consumption (total die area of 2.047  $mm^2$ ) and requires a multi-stage architecture to recover the combining losses. The design achieved a total PAE of 10% at 23.2 dBm saturated output power.

A general shift can be observed in the combining network topologies used in the mm-wave regime to overcome the issues of low combining efficiency and area consumption. Transformers are increasingly being used for on-chip power combining and splitting. This is due to their compact size, wide-band operation and ability to provide impedance matching (depending on the turns ratio between the primary and secondary sides). Moreover, transformers have become even more valuable in differential topologies, where they also provide differential-to-single-ended conversion and a path for DC bias.

A number of PAs using a variety of transformer based combining architectures, including voltage, current and hybrid (voltage-current) combining structures, implemented in 65 nm and 90 nm bulk CMOS were analyzed. The distributed active transformer (DAT) topology [10], applied in [11] and [12] for voltage combining of individual unit PA cells at the secondary, facilitates the impedance matching of low output impedance CMOS transistors, as shown in Figure 2.5. However, both of these works show below 15% PAE for the same output power level due to combining losses. The bottleneck in such large scale combining architectures is the efficiency of the output combiner, and design optimization is required to minimize the losses of these passives. Unfortunately, CMOS technology already suffers from low Q passives and a lossy silicon substrate. A 60 GHz 130 nm SiGe BiCMOS PA implemented in [13] addressed the insertion loss of the combiner by utilizing a self-shielded floating-compensated balun combiner. Common-base power stages were employed to extend the collector-emitter voltage of the BJT, and a transformer coupled, four way, three stage differential architecture was employed to provide power gain, as can be seen from the full schematic in Figure 2.6.
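The impact of combiner loss on the delivered power can be sketched with simple dB arithmetic. The example below assumes identical, perfectly in-phase unit cells and lumps all combiner non-idealities into a single insertion loss figure. The unit cell power, number of ways and loss values are illustrative assumptions, not results from the cited works, and the function names are hypothetical.

```python
import math

def combined_output_dbm(unit_dbm: float, n_ways: int, insertion_loss_db: float) -> float:
    """Output power of an N-way combiner fed by identical in-phase unit cells."""
    return unit_dbm + 10 * math.log10(n_ways) - insertion_loss_db

def combiner_efficiency(insertion_loss_db: float) -> float:
    """Fraction of the available power that survives the combiner."""
    return 10 ** (-insertion_loss_db / 10)

UNIT_DBM = 14.0  # assumed unit cell output power (illustrative)
N_WAYS = 4       # assumed number of combined cells (illustrative)

for loss_db in (0.0, 1.0, 2.0):
    p_out = combined_output_dbm(UNIT_DBM, N_WAYS, loss_db)
    eff = combiner_efficiency(loss_db)
    print(f"loss = {loss_db:.1f} dB: P_out = {p_out:.2f} dBm, "
          f"combiner efficiency = {100 * eff:.1f}%")
```

Since the DC power of the unit cells is unchanged by the combiner, the drain efficiency of the overall PA scales directly with the combiner efficiency, which is why even one or two dB of insertion loss is costly.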

Experiments have also been made with 60 GHz PA designs in nano-scale bulk CMOS technologies such as the 40 nm and 28 nm nodes. A two way, three stage transformer coupled fully differential PA was implemented in 40 nm [14], with one unit PA cell dynamically controlled for low power and high power modes to improve back-off efficiency. Similarly, a 28 nm PA adopted a two way, three stage transformer coupled fully differential topology that included coplanar strip lines for output and interstage matching [15].



Figure 2.5: Entire 3-stage DAT PA topology [11]



Figure 2.6: Schematic of 3-stage transformer-coupled 4-way combining differential PA [13]

#### 2.3.2 45 nm CMOS SOI for Millimeter-Wave PAs

With the arrival of scaled technologies and advanced fabrication processes in CMOS, an increase in the use of SOI can be observed in the literature. The purpose of using a silicon-on-insulator process is to benefit from the reduced parasitics and superior performance of the FET device.

The unit cell power combining topology exploited for 60 GHz PAs has emerged as an important and popular technique for realizing high output power. An alternative to unit cell power combining, which has been commonly applied in 45 nm CMOS SOI technology over the past few years, is the FET-stacking technique. The stacking idea was proposed in 2003 as a novel device configuration for less complex PA designs and was demonstrated using gallium arsenide (GaAs) FETs [16]. A high-voltage/high-power device (HiVP) was introduced by connecting several GaAs FETs in series and adding gate capacitors to adjust the drain impedance level seen by each stage to its optimum. The idea behind the HiVP device was to capitalize on a higher supply voltage to increase the output power, and to increase the optimum output impedance in order to reduce the impedance transformation ratio to 50  $\Omega$ .
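The benefit of stacking can be sketched by applying the class A load line expressions (2.4) and (2.5) to a series stack. The example assumes an ideal stack in which the drain voltage divides equally among the FETs and the same current flows through all of them. It ignores the parasitic leakage and inter-stack mismatch discussed later in this section. The per-FET limits of 2 V and 200 mA and the function name are illustrative assumptions.

```python
def stacked_pa(n_stack: int, v_ds_max: float, i_max: float, r_system: float = 50.0):
    """Ideal N-stack: returns (R_opt in ohm, P_out in W, transformation ratio)."""
    v_total = n_stack * v_ds_max            # swings add in series (ideal assumption)
    r_opt = (v_total / 2) / (i_max / 2)     # (2.4) applied to the whole stack
    p_out = v_total * i_max / 8             # (2.5) applied to the whole stack
    return r_opt, p_out, r_system / r_opt

# Assumed per-FET limits (illustrative only).
for n in (1, 2, 3, 4):
    r_opt, p_out, ratio = stacked_pa(n, v_ds_max=2.0, i_max=0.2)
    print(f"{n}-stack: R_opt = {r_opt:4.1f} ohm, P_out = {1e3 * p_out:5.1f} mW, "
          f"transformation ratio to 50 ohm = {ratio:.2f}")
```

Under these idealized assumptions both the output power and the optimum load resistance grow linearly with the number of stacked devices, so the transformation ratio to 50 $\Omega$ shrinks by the same factor.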

The work in [17] applied the FET-stacking technique in Q-band PA designs in 45 nm SOI. Performances for 2-, 3- and 4-stack configurations were demonstrated to show the limits of stacking in terms of the number of stages and the overall performance achieved. Higher output power was achieved after moving from a 2-stack to a 4-stack configuration, but there was a considerable trade-off in efficiency. The work in [17] also highlighted the critical importance of inter-stack matching for the in-phase combining of the output voltages of the stages, as well as the performance and sensitivity of various matching topologies. Apart from improving the performance of the stacked-FET based PA, an improvement in reliability was also demonstrated through a reduction of the gate to drain, gate to source and drain to source voltage swings, made possible by the stacking capacitors at the gates of the stacked FETs. The stacked-FET topology has also been extended to 90 GHz, and the challenges encountered were described in [18]. With increasing frequency, the device parasitics become significant and cause leakage, resulting in deviation of the current gain from unity between the stages of the series connected stacked-FET. The work in [18] identified the parasitic gate resistance of the stacked FETs to be the dominant source of the losses and tried to mitigate it by adding a separate stage to drive the gates of the stacked FETs, as shown in Figure 2.7. However, this technique requires optimal phase alignment at the gates of each FET, which may limit the bandwidth of the PA.

Figure 2.7: Schematic of 90 GHz multi-drive stacked-FET PA [18]

A different approach within the FET-stacking architecture was used in [19]. An alternate design methodology was applied, where the widths, number of stages in the stack and topology of the transistor in each stage were optimized for optimum input and output impedances. This was done to reduce the impedance transformation ratio and obtain larger bandwidths. In [19], a four stacked structure utilizing two common source and two common gate structures, shown in Figure 2.8, is demonstrated to achieve a 1 dB bandwidth from 6 GHz to 26 GHz. Unlike traditional stacked-FET structures, which utilize common source and common gate-like unit cells, this work utilized combinations of common source and cascode cells to achieve higher gain and stability. Moreover, an input transformer based matching network was used to ease the impedance transformation from 50  $\Omega$  to the low source impedances of the transistors. Moving beyond conventional class PAs, a class E-like FET-stacked PA was proposed in [20] at 45 GHz. Although the design of switching mode PAs at mm-wave frequencies is challenging due to the high switching losses, this work reformulated the class E design conditions to account for these losses, including the switch on-resistance  $R_{on}$  and the finite DC-feed inductance, without imposing the zero-voltage switching and zero-derivative voltage switching constraints that are important principles of class E operation.

Unit cell power combining has also been utilized in 45 nm SOI. Work in [21] targeted watt level power output utilizing an eight-way combination of 4-stack unit cell PAs as discussed in [20]. The combiner was synthesized using a quarter-wave lumped L-C-L topology, shown in Figure 2.9, for a total output power of 27 dBm and a maximum PAE of 10.7%.

Figure 2.8: A 4-stacked PA comprising two common source and two common gate cells [19]

Figure 2.9: Schematic of the 33-46 GHz watt-class PA array prototype [21]

Spatial combining has also been demonstrated as a power combining technique for PAs at mm-wave in 45 nm CMOS SOI [22]. Spatial combining involves the summing of the power output from each unit cell in free space rather than using an on-chip or off-chip passive combiner. The work in [22] used a 3-stage architecture with a single ended 4-stack primary driver, a 3-stack single-ended secondary driver, and a 4-stack pseudo-differential power stage, feeding four patch antennas (2 x 2 configuration) implemented off chip, as shown in Figure 2.10. An overall output power of 28 dBm was obtained with a peak PAE of 13.5% at 45 GHz.



Figure 2.10: System block diagram for PA utilizing spatial combining with 2 x 2 antenna array system [22]

#### 2.3.3 Literature Review Summary

A summary of the literature published on PAs for the 60 GHz spectrum is presented in Table 2.2. PAs implemented in 45 nm CMOS SOI for other mm-wave spectra (above 30 GHz) have been summarized in Table 2.3.

| Year | Topology | Technology | Frequency/BW<br>(GHz) | S21<br>(dB) | OP-1dB<br>(dBm) | Psat<br>(dBm) | PAE (%) |
|------|----------|------------|-----------------------|-------------|-----------------|---------------|---------|
| 2009<br>[11] | 3-stage, 4-way DAT combiner | 90 nm CMOS | 60/16 (-1.5 dB BW) | 26.6 | 14.5 | 18 | 12.2 |
| 2010<br>[8] | 2-stage, 4-way common source, Wilkinson output combiner | 90 nm CMOS | 60/8 (-3 dB BW) | 20.3 | 18.2 | 19.9 | 14.2 |
| 2010<br>[37] | 3-stage, 2-way transformer based output combiner | 65 nm CMOS | 62/7 (-3 dB BW) | 16 | 7 | 11.5 | 15.2 |
| 2012<br>[13] | 3-stage, 4-way differential, transformer coupled PA | 130 nm SiGe BiCMOS | 60/10 (-3 dB BW) | 20.6 | 19.7 | 20.1 | 18 |
| 2013<br>[9] | 3-stage, 32-way, single ended, common source, TL based output combiner | 65 nm CMOS | 64/25 (-3 dB BW) | 16.3 | 19.6 | 23.2 | 10 |
| 2013<br>[14] | 3-stage, 2-way transformer coupled differential PA | 40 nm CMOS | | 21.2 | 14 | 17.4 | |
| 2012<br>[12] | DAT combiner | 65 nm CMOS | | | 11 | 16.6 | |
| 2014<br>[15] | 3-stage, 2-way transformer coupled differential PA | 28 nm CMOS | 60/11 (-3 dB BW) | 24.4 | 11.7 | 16.5 | 12.6 |
| 2014<br>[36] | 4-stage, 8-way DAT combiner with 2-way zero degree current combiner | 65 nm CMOS | 60/9 (-3 dB BW) | 32.4 | 17.2 | 19.9 | 20 |

Table 2.2: Summary of Published 60 GHz PA Literature

| Year | Topology | Frequency/BW<br>(GHz) | S21 (dB) | OP-1dB<br>(dBm) | Psat (dBm) | PAE (%) |
|------|----------|-----------------------|----------|-----------------|------------|---------|
| 2013<br>[19] | 4-stacked PA comprising 2 common source and 2 cascode stages | 18/20 (-1 dB BW) | 6 | 22.5 | 26.1 | 11 |
| 2013<br>[17] | 2-, 3- and 4-stacked | 46 (2 stacked) | 9.4 | | 15.9 | 32.7 |
| | | 46 (3 stacked) | 9.4 | | 19.8 | 26.3 |
| | | 41 (4 stacked) | 8.9 | | 21.6 | 25.1 |
| 2014<br>[18] | 2 stage, 3-stacked | 91/15 (-3 dB BW) | 12.4 | 16 | 19.2 | 14 |
| 2014<br>[19] | 2-stacked, 4-stacked and 2 + 4-stacked, two stage | 47/27 (2 stacked) | 13 | 12 | 17.6 | 34.6 |
| | | 47.5/19 (4 stacked) | 12.8 | 14 | 20.3 | 19.4 |
| | | 47/10 (2+4 stacked) | 24.9 | 15 | 20.1 | 15.4 |
| 2015<br>[21] | 4-stacked 2-stage, 8-way power combined | 40/13 | 19.4 | 21 | 27.2 | 10.7 |
| 2015<br>[22] | 4-way spatial combining | 45 | 17 | | 28 | 13.5 |

Table 2.3: Summary of Published 45 nm CMOS SOI PA Literature

### Chapter 3

# Analysis of 45 nm CMOS SOI Active and Passive Components

In this chapter, an overview of the CMOS SOI technology is presented. The floating body feature of the device is exploited, and the device is laid out for optimal mm-wave performance in PAs. To this end, different layouts are discussed and their merits compared in terms of parasitics,  $f_t$  and  $f_{max}$  performance. Passive components are also discussed, in particular CPWs, and various CPW topologies are analyzed with a view to their implementation in matching networks for PA designs.

### 3.1 Overview of CMOS SOI Technology

The introduction of SOI substrates was made in an effort to minimize device parasitics for low power and high performance applications. The presence of a buried oxide (BOX) layer beneath the thin active silicon layer reduces the junction capacitances, primarily the source-to-bulk and drain-to-bulk capacitances, Csb and Cdb. Moreover, the BOX layer prevents the extension of depletion layers, thus mitigating short channel effects such as drain induced barrier lowering and punchthrough, and reducing the source-to-drain capacitance, Cds. However, SOI technology has its own set of drawbacks, including a visible kink effect in the DC-IV characteristics [23] and poor thermal dissipation.

The designs described in this work use the partially depleted IBM 45 nm CMOS SOI technology. The partially depleted transistor body is enclosed inside a 225 nm BOX layer which isolates the body of the transistor from the silicon substrate. The substrate resistivity of the silicon bulk is 13.5  $\Omega$ -cm. Figure 3.1 shows the metal stackup of the 45 nm SOI process.



Figure 3.1: IBM 45 nm CMOS SOI stackup

### 3.2 Active Device Layout Analysis for Millimeter Wave Power Amplifiers

Scaling of active devices to nanometer dimensions has made CMOS an attractive platform for PA development, given its high  $f_t$  and  $f_{max}$  at mm-wave frequencies. However, device and interconnect parasitics impose limits on the output power and efficiency that can be achieved from nano-scale transistors at 60 GHz. Transistor layout optimization is critical in PA design in order to mitigate the effects of unwanted parasitics, primarily the gate-to-drain capacitance (Cgd), gate-to-source capacitance (Cgs), drain-to-source capacitance (Cds), series drain resistance (Rd), gate resistance (Rg) and degenerative source resistance (Rs). Optimization and topology experimentation with the device layout are required to minimize these parasitics. In the context of PA design, power dissipation in the FET devices can also lead to self-heating. This poses an important concern for the reliability and long-term performance of PAs using nano-scale CMOS technology.

Several layout topologies pertinent to amplifier design have been discussed in the literature, such as round-table [24] and grid layout, that are thought to improve performance at mm-wave frequencies. In this thesis, three device layouts: 1) manifold 2) grid and 3) round-table layout are considered. All of these layouts require the division of one large power cell into multiple unit transistor cells with the output current combining at the global drain junction. As each of the transistor topologies is based on a unique arrangement of unit cells, the unit cell size must also be analyzed and purposefully selected for optimum mm-wave performance.

#### 3.2.1 Unit Cell Analysis

The unit cell constitutes the basic building block of the overall mm-wave PA transistor. Hence the optimization of the unit cell is critical to the overall performance of the transistor. In order to maximize the performance of a device through layout optimization, some key figures of merit are considered. As the goal is to reduce the unwanted capacitive and resistive parasitics of the device, the values of these parasitics can be extracted and compared. The short circuit unity current gain frequency,  $f_t$  (extrapolated from  $h_{21}$  of the device), provides an overall representation of the intrinsic device characteristic, including the parasitics, and is expressed as [25],

$$f_t = \frac{g_m}{2\pi(Cgs + Cgd)} \tag{3.1}$$

However, in PA design, the power gain of the device is more critical than the current gain. For this reason, the maximum unilateral gain cut-off frequency,  $f_{max}$ , is a more meaningful indicator of transistor performance. Unlike  $f_t$ ,  $f_{max}$  also accounts for the parasitic resistances of the device, and hence is sensitive to parasitic losses in the layout. It is expressed as [25],

$$f_{max} = \sqrt{\frac{f_t}{8\pi RgCgd}} \tag{3.2}$$
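As a minimal numerical sketch of (3.1) and (3.2), the snippet below evaluates both figures of merit for one set of small-signal values. The values of `gm`, `cgs`, `cgd` and `rg` are hypothetical placeholders chosen only for illustration; they are not extracted from the 45 nm SOI device models.

```python
import math

def f_t(gm, cgs, cgd):
    """Unity current gain frequency from Eq. (3.1); S and F in, Hz out."""
    return gm / (2 * math.pi * (cgs + cgd))

def f_max(ft, rg, cgd):
    """Maximum unilateral gain cut-off frequency from Eq. (3.2); Hz out."""
    return math.sqrt(ft / (8 * math.pi * rg * cgd))

# Hypothetical small-signal values (assumptions, not thesis data)
gm = 40e-3    # transconductance, S
cgs = 20e-15  # gate-to-source capacitance, F
cgd = 8e-15   # gate-to-drain capacitance, F
rg = 15.0     # gate resistance, ohm

ft = f_t(gm, cgs, cgd)
fmax = f_max(ft, rg, cgd)
print(f"f_t   = {ft / 1e9:.0f} GHz")
print(f"f_max = {fmax / 1e9:.0f} GHz")

# Ideal width scaling: gm, Cgs and Cgd all double, so f_t is unchanged.
print(f"f_t at 2x width = {f_t(2 * gm, 2 * cgs, 2 * cgd) / 1e9:.0f} GHz")
```

Because Rg appears under the square root in (3.2), a given reduction in gate resistance improves  $f_{max}$  only by the square root of that reduction, while  $f_t$  is unaffected.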

Figure 3.2 shows the implementation of a 30  $\mu$ m unit cell consisting of 30 x 1  $\mu$ m gate fingers. An inner gate ring (metals M1 to M3) is designed to provide double gate contacts, which theoretically reduces the series gate resistance by a factor of four and so helps to increase the  $f_{max}$  of the transistor. Similar to the gate connection, the source connection is constructed using an outer ring in metal layer C1 to reduce the source resistance of the transistor, minimizing degeneration of the transconductance and gain of the transistor. However, the reduction in series source resistance, Rs, is accompanied by an increase in Cgs, as the two rings are spaced only 0.115  $\mu$ m apart (Figure 3.1). Any overlap between the rings is minimized to prevent any significant increase in the parasitic capacitance. Another reason for using the ring structure is to allow for easy access to ground around the source of the transistor (common source configuration), resulting in good grounding and lower ground impedance. The drain line, metal UB, is routed on top of the transistor to simplify the routing. The B3 to UA metal layers in the stack are used to construct the drain line to minimize the overlap capacitance, Cgd, between the polysilicon gate fingers and the overall gate ring implemented in M3.
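The factor-of-four reduction from double gate contacts can be illustrated with the commonly used distributed gate resistance approximation, in which a finger driven from one end contributes one third of its end-to-end poly resistance and a finger driven from both ends contributes one twelfth. The sketch below assumes that approximation; the sheet resistance and gate length are hypothetical values, and contact and metal routing resistance are ignored.

```python
def gate_resistance(r_sheet, finger_width, gate_length, n_fingers, double_contact):
    """Distributed poly gate resistance of n parallel fingers, in ohms."""
    # Distributed-RC factor: 3 for a single-ended contact, 12 for contacts at both ends.
    factor = 12 if double_contact else 3
    return r_sheet * finger_width / (factor * gate_length * n_fingers)

R_SHEET = 10.0       # ohm/square, hypothetical silicided poly value
FINGER_WIDTH = 1e-6  # 1 um finger, as in the 30 x 1 um unit cell
GATE_LENGTH = 40e-9  # hypothetical drawn gate length, m
N_FINGERS = 30

rg_single = gate_resistance(R_SHEET, FINGER_WIDTH, GATE_LENGTH, N_FINGERS, False)
rg_double = gate_resistance(R_SHEET, FINGER_WIDTH, GATE_LENGTH, N_FINGERS, True)
print(f"single-contact Rg = {rg_single:.2f} ohm")
print(f"double-contact Rg = {rg_double:.2f} ohm")
print(f"reduction factor  = {rg_single / rg_double:.1f}")
```

Under this approximation the ratio is independent of the assumed sheet resistance and gate length, which is why the reduction is described as theoretical.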



Figure 3.2: Optimized 30  $\mu$ m unit cell layout

IBM's 45 nm SOI technology offers two different gate poly pitches for the floating-body SOI device: 190 nm, called the regular pitch, and 380 nm, called the relaxed pitch. The two types of transistors show different parasitic performance and are compared in the analysis below. Figure 3.3 shows the simulated  $f_t$  and  $f_{max}$  curves of regular pitch floating-body unit cells of various sizes, implemented in the same double ring layout as described previously. From (3.1),  $f_t$  should ideally not scale with the width, since gm and the capacitances Cgs and Cgd all scale proportionally with the width. However, using RC extracted models of the laid out regular pitch transistor, variation in  $f_t$  can be observed (see Figure 3.3b). Up to a 15% increase in  $f_t$  is observed at a current density of 0.4 mA/ $\mu$ m as the width is increased from 10  $\mu$ m to 60  $\mu$ m. The opposite trend is observed in  $f_{max}$ , with a variation of 30% as the width is increased from 10  $\mu$ m to 50  $\mu$ m. In the case of  $f_{max}$ , although the gate resistance Rg scales down and  $f_t$  scales up with increasing width, the significant increase in Cgd as the device is scaled up degrades  $f_{max}$ , as seen from the relationship in (3.2).

For these reasons, the selection of the unit cell size cannot be limited to the smallest and largest sizes. Although larger sizes show a higher  $f_t$  for the bias current density range of interest, smaller sizes show a higher  $f_{max}$ . Compared to  $f_t$ ,  $f_{max}$  represents the frequency range over which power gain is available and is therefore a more valid metric of the usable limits of a transistor for PA design. However, using small unit sizes would increase the number of unit cells required to meet the overall width requirement of the transistor (a maximum width of 180  $\mu$ m is used in this thesis). The drawback to using a large number of unit cells in the manifold topology, for example, is that the longer manifold line increases the drain and other routing line inductances. A unit cell size of 30 x 1  $\mu$ m was selected as a compromise between optimal device parasitics,  $f_t$ ,  $f_{max}$  and compact layout size.



Figure 3.3: Simulated (a)  $f_{max}$  and (b)  $f_t$  of regular pitch floating body transistors of various widths

#### 3.2.2 Manifold Layout

The concept of a manifold is derived from the arrangement of high-power RF transistors. A typical high power RF transistor die has a large periphery, and the signals to the gate and drain electrodes of the transistor are fed using large manifold structures as shown in Figure 3.4.

Figure 3.4: View of a high power LDMOS transistor with lid removed [26]

The manifold topology of the transistor (shown in Figure 3.5) utilizes the 30  $\mu$ m unit cell discussed in the previous section. The source ring of each unit cell is overlapped with the adjacent unit cell to form a large source connection. Moreover, the drain and gate manifolds are laid out to form a global connection to the individual unit cell terminals. The manifold topology works on the principle of current combining at the output drain manifold through each unit finger. Figure 3.5 shows the layout for a 180  $\mu$ m transistor utilizing 6 x 30  $\mu$ m unit cells in a manifold topology. For a common source configuration, a slotted ground plane is constructed from M1 to C2 to provide good grounding to the source. Perforations are used to meet the metal density rules for the process.

#### 3.2.3 Grid Layout

Similar to the manifold layout, the grid layout is based on combining unit cells to form a large power transistor. In the grid layout, unit cells are arranged in a square pattern with source and drain traces connected globally at the drain and source nodes in a grid pattern as shown in Figure 3.6.

Figure 3.5: Layout view of a 180  $\mu$ m manifold transistor

Unlike the double ring unit cell structure discussed in Section 3.2.1, the grid layout uses a different unit cell design with the goal of optimizing parasitic overlap capacitance. One approach to minimizing the parasitic capacitance is to use a staircase via and metal topology as shown in Figure 3.7 [27]. The staircase structure allows for minimal overlap capacitance between the source and the drain by arranging the metal layers and vias far apart, thus minimizing the side coupling capacitance. Moreover, the gate is routed as a mesh in the lowest metal layers (in this work, M1 was used) to reduce gate resistance and increase  $f_{max}$ . Unlike a single manifold structure, the grid layout utilizes multipath drain and source routings to their respective global connections. The trace widths of the multipath routings are kept small to minimize the overlap capacitance.



Figure 3.6: Layout view of a 4x4 grid layout (left) utilizing 15 x 2  $\mu$ m unit cell (right)



Figure 3.7: Staircase structure for source and drain, implemented for grid topology [27]

#### 3.2.4 Round-Table Layout

The round-table layout first proposed in [24] attempts to improve  $f_{max}$  performance by up to 20%. As in the two topologies discussed above, the round-table layout also utilizes a circular combined unit cell architecture. The structure uses double contacts between the unit cells and multi-path connections between the source and drain cells. Figure 3.8 illustrates the round table topology [28].



Figure 3.8: Illustration of the round table layout concept [28]

Although the round-table has been shown to provide significant performance gains at 90 nm CMOS, applying this topology on 45 nm CMOS SOI requires a change in the orientation of the transistor unit cell. As only one gate orientation is permitted within the design rule guidelines for this technology, the decision was made not to implement, analyze or further investigate the round-table topology within this thesis.

#### 3.2.5 Overall Performance Summary

In order to make comparisons between the layouts, the parasitic capacitances and resistances are presented below. The parasitic parameters were extracted using the ColdFET approach on the RC extracted BSIMSOI transistor model. The ColdFET technique in [29] was initially applied to FET devices with package parasitics such as pad capacitances and bond wire inductances. In the case of on-wafer transistors, such package parasitics do not exist and the layout dependent parasitics are part of the intrinsic region of the transistor. The equivalent ColdFET model used for the parasitic extraction is shown in Figure 3.9. When a transistor is off, it behaves like a passive network that can be characterized by S-parameters. It is important to note that, since RC extracted transistor models have been used in the analysis, the parasitic inductances (Lg, Ld and Ls) are not modelled (finding those parameters would require electromagnetic (EM) simulation). Nevertheless, the extracted parasitic values were considered sufficient for the purpose of comparing the layout topologies of interest.



Figure 3.9: Equivalent ColdFET model for parasitic extraction

From [29], the parasitic capacitances were extracted using the equations below,

$$Cgs = \frac{\Im(y_{11} + y_{12})}{\omega} \tag{3.3}$$

$$Cgd = -\frac{\Im(y_{12})}{\omega} \tag{3.4}$$

$$Cds = \frac{\Im(y_{22} + y_{21})}{\omega} \tag{3.5}$$

The resistances are given as,

$$Rg = \Re(Z_{11} - Z_{12}) \tag{3.6}$$

$$Rd = \Re(Z_{22} - Z_{12}) \tag{3.7}$$

$$Rs = \Re(Z_{12}) \tag{3.8}$$
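The extraction in (3.3) to (3.8) can be sanity-checked on a synthetic network. The sketch below builds the Y and Z matrices of an assumed off-state model (a capacitive  $\pi$  core with series gate and drain resistances and a common source resistance, no inductances), then recovers the element values, taking the drain-side resistance from  $\Re(Z_{22} - Z_{12})$ . The element values are taken from the regular pitch manifold column of Table 3.1; the 1 GHz extraction frequency and the helper names are assumptions made for illustration.

```python
import math

def inv2(m):
    """Invert a 2x2 matrix given as nested lists (complex entries allowed)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def synthetic_coldfet(cgs, cgd, cds, rg, rd, rs, omega):
    """Return (Y, Z) of the assumed off-state model at angular frequency omega."""
    y_core = [[1j * omega * (cgs + cgd), -1j * omega * cgd],
              [-1j * omega * cgd, 1j * omega * (cds + cgd)]]
    z_core = inv2(y_core)
    # Series resistances add directly to the Z-parameters of the core.
    z = [[z_core[0][0] + rg + rs, z_core[0][1] + rs],
         [z_core[1][0] + rs, z_core[1][1] + rd + rs]]
    return inv2(z), z

def coldfet_extract(y, z, omega):
    """Apply Eqs. (3.3)-(3.8) to 2x2 Y and Z matrices."""
    return {
        "Cgs": (y[0][0] + y[0][1]).imag / omega,
        "Cgd": -y[0][1].imag / omega,
        "Cds": (y[1][1] + y[1][0]).imag / omega,
        "Rg": (z[0][0] - z[0][1]).real,
        "Rd": (z[1][1] - z[0][1]).real,
        "Rs": z[0][1].real,
    }

omega = 2 * math.pi * 1e9  # assumed low extraction frequency
y, z = synthetic_coldfet(cgs=217e-15, cgd=206e-15, cds=203e-15,
                         rg=0.8, rd=0.05, rs=1.3, omega=omega)
extracted = coldfet_extract(y, z, omega)
for name, value in extracted.items():
    print(name, value)
```

For this model the resistances are recovered exactly from the Z-parameters, while the capacitances from the Y-parameters are approximate and are accurate only while the series resistances are small compared with the capacitive reactances.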

From Table 3.1, it can be clearly seen that the parasitic capacitance values and the  $f_t$  and  $f_{max}$  performance vary among the different manifold and grid layout topologies for a large power cell of 480  $\mu$ m. Three different unit cell sizes were experimented with to obtain the optimal performance. The topology consisting of an 8 x 6 grid with a 10  $\mu$ m unit cell size presents the lowest parasitics and the highest  $f_t$  and  $f_{max}$  performance. On the other hand, the manifold topology with relaxed gate polysilicon pitch provides superior performance over the regular pitch manifold topology, primarily due to a 42% decrease in Cds that results from the increase in drain to source pitch with the poly pitch. However, this increased performance comes at the cost of increased area (50% greater area required).

| Unit Cell Combine Topology | Manifold, regular | Manifold, relaxed | Grid, regular, 4x4 layout | Grid, regular, 4x4 layout | Grid, regular, 8x6 layout |
|----------------------------|-------------------|-------------------|---------------------------|---------------------------|---------------------------|
| Unit Cell Size ($\mu$m)    | 30 x 1            | 30 x 1            | 30 x 1                    | 15 x 2                    | 10 x 1                    |
| No. of Units               | 16                | 16                | 16                        | 16                        | 48                        |
| Peak $f_t$ (GHz)           | 244               | 215               | 207                       | 207                       | 257                       |
| Peak $f_{max}$ (GHz)       | 243               | 267               | 177                       | 139                       | 257                       |
| $C_{gs}$ (fF)              | 217               | 206               | 198                       | 192                       | 193                       |
| $C_{ds}$ (fF)              | 203               | 118               | 102                       | 100                       | 111                       |
| $C_{gd}$ (fF)              | 206               | 186               | 200                       | 195                       | 185                       |
| $R_g$ ($\Omega$)           | 0.8               | 0.74              | 1                         | 1.45                      | 0.71                      |
| $R_s$ ($\Omega$)           | 1.3               | 1                 | 1.2                       | 0.8                       | 0.85                      |
| $R_d$ ($\Omega$)           | 0.05              | 0.05              | 0.3                       | 0.6                       | 0.1                       |
| Area ($\mu$m$^2$)          | 45x11             | 48x23             | 27x9                      | 16x13                     | 23x11                     |

Table 3.1: Parasitic Performance Summary of the Manifold and Grid Layout Topologies

Overall, the 8 x 6 grid topology minimizes the parasitics and provides the best performance in comparison to the relaxed pitch manifold topology. However, for RF power transistors, self-heating and thermal runaway under non-pulsed DC and continuous wave (CW) signals pose an issue that also needs to be considered during layout topology selection. The BOX layer under the channel is a poor heat conductor, which poses another concern for large signal CW operation. With a more widely spaced topology, such as the relaxed pitch manifold, the self-heating effects are reduced. However, a more thorough analysis using electro-thermal simulation of the transistor would be required to understand the thermal behaviour of the layout topologies, which is not within the scope of this thesis. Although the overall size of the relaxed pitch manifold is 4.4 times the size of the 8 x 6 grid, the area occupied by active devices is significantly smaller than the chip area consumed by on-chip passive devices.
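Two of the figures quoted in this comparison, the 42% reduction in Cds and the 4.4 times area ratio, follow directly from the entries of Table 3.1. A short arithmetic sketch using only the tabulated values (the dictionary keys are illustrative names):

```python
# Values copied from Table 3.1
cds_fF = {"manifold_regular": 203, "manifold_relaxed": 118}
area_um2 = {
    "manifold_relaxed": 48 * 23,  # 1104 um^2
    "grid_8x6": 23 * 11,          # 253 um^2
}

# Fractional decrease in Cds going from regular to relaxed pitch manifold
cds_drop = 1 - cds_fF["manifold_relaxed"] / cds_fF["manifold_regular"]

# Area of the relaxed pitch manifold relative to the 8 x 6 grid
area_ratio = area_um2["manifold_relaxed"] / area_um2["grid_8x6"]

print(f"Cds decrease: {cds_drop * 100:.0f}%")
print(f"Area ratio:   {area_ratio:.1f}x")
```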

The 45 nm SOI process also offers body-contacted transistors that have some advantages over the floating-body transistors. Firstly, the kink effect, which introduces DC-IV nonlinearity and is detrimental to the overall linearity of the PA, is not observed in the DC-IV characteristic of the body-contacted transistor. Secondly, the body-contacted devices have higher output impedance and intrinsic gain in comparison to floating-body devices. However, these benefits come at the cost of increased device parasitics and lower  $f_t$  and  $f_{max}$ , and  $f_t$  and  $f_{max}$  performance was given preference for PA design at 60 GHz.

### 3.3 Passives Analysis

This section discusses the implementation of passives relevant to the design of a mm-wave PA. Most of the discussion is based on CPWs and on substrate shielding to synthesize high quality, low loss CPWs. For mm-wave designs, frequency dependent resistive losses in the conductor (skin effect and proximity effect) and substrate capacitive coupling are more prominent than in lower frequency designs. A deep submicron technology such as IBM 45 nm SOI offers 11 metal layers; the thick topmost layers can be used to design passives, while the bottom layers can be utilized for shielding.

CPWs are often preferred over conventional microstrip transmission lines at mm-wave frequencies, due to their improved shielding and isolation and their reduced cross talk between signal conductors (due to the adjacent ground planes). Figure 3.10a shows a generalized unshielded CPW structure and Figure 3.10b shows its equivalent lumped electrical model [30].

The series resistance  $R_s(f)$  represents the frequency dependent losses of the conductor, primarily due to skin and proximity effects. Capacitor  $C_{ox}$  provides a path for the current to flow between the signal line and the substrate, where the substrate losses are modeled by resistor  $R_{si}$ . The capacitor  $C_{si}$  is added to augment the behavior of the quasi-TEM mode at higher frequencies [30]. Capacitor  $C_{sg}$  models the capacitance between the signal line and the adjacent ground planes of the CPW structure.



Figure 3.10: (a) CPW cross section on silicon substrate and (b) single  $\pi$ -section lumped element circuit model of transmission line on silicon substrate

#### 3.3.1 Substrate Shielding

The application of CMOS technology for mm-wave design poses some significant challenges. As CMOS suffers from a low optimum load line resistance  $R_{opt}$ , wide band matching to  $R_L$  (i.e., 50  $\Omega$ ) is difficult because the required impedance transformation becomes more severe. The impedance transformation ratio (ITR) is expressed as,

$$ITR = \frac{R_{opt}}{R_L} \tag{3.9}$$

Not only does the ITR impose a bandwidth limitation for matching, but it also affects the insertion loss of the matching network. Q factor on the other hand is crucial in reducing the losses. The insertion loss from an L-section (comprised of L and C) of a matching network can be given by [31],

$$\text{Insertion Loss} = 1 + \frac{\sqrt{1/ITR - 1}}{Q_L} \tag{3.10}$$

Where the Q factor of the inductor is denoted by  $Q_L$ . Equation 3.10 assumes that the Q factor of the capacitor is greater than  $Q_L$ . Hence, if a CPW section is used to synthesize an inductor, the losses must be minimized to improve  $Q_L$  thus reducing losses in the matching network. This is imperative, given the already poor ITR performance of CMOS transistors.
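To give a feel for (3.9) and (3.10), the sketch below evaluates the L-section insertion loss for two load line resistances. It assumes that (3.10) is a linear power ratio, so that it can be converted to dB with  $10\log_{10}$ ; the  $R_{opt}$  values and the inductor Q of 13 are hypothetical inputs chosen for illustration.

```python
import math

def l_section_insertion_loss(r_opt, r_load, q_l):
    """Insertion loss of an L-section per Eq. (3.10), as a linear ratio."""
    itr = r_opt / r_load  # Eq. (3.9)
    if not 0 < itr <= 1:
        raise ValueError("Eq. (3.10) as written requires 0 < R_opt <= R_L")
    return 1 + math.sqrt(1 / itr - 1) / q_l

def to_db(power_ratio):
    """Convert a linear power ratio to dB (assumes Eq. (3.10) is a power ratio)."""
    return 10 * math.log10(power_ratio)

R_LOAD = 50.0  # ohm
Q_L = 13.0     # hypothetical inductor quality factor

for r_opt in (10.0, 25.0):  # hypothetical load line resistances, ohm
    loss = l_section_insertion_loss(r_opt, R_LOAD, Q_L)
    print(f"R_opt = {r_opt:4.1f} ohm: ITR = {r_opt / R_LOAD:.2f}, "
          f"loss = {to_db(loss):.2f} dB")
```

The lower  $R_{opt}$  case shows the higher loss for the same  $Q_L$ , which is the motivation for both raising  $R_{opt}$  through stacking and improving the Q factor of the passives.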

The substrate losses (modeled in Figure 3.10b) present the biggest constraint on achieving a high Q factor and must be minimized. Previous works have implemented various shielding methods, such as floating metals and various patterned shields, for inductors and transmission lines. In the next sections, various CPW topologies will be analyzed and compared to determine which are best suited to obtaining high quality transmission lines.

#### 3.3.2 Slow-wave CPW and Grounded Shielded CPW

One major advantage of using CPWs over microstrip lines in PA design is the flexibility to use a wider signal path. For microstrip lines, the capacitance of the transmission line is dictated by the physical gap between the signal and ground, which is only a few microns in the case of silicon monolithic microwave integrated circuits (MMICs). Hence, synthesizing a 50  $\Omega$  microstrip line will result in a narrower signal path, which might not be suitable for large current PA operation (due to electro-migration concerns). In the case of CPWs, the parasitic capacitance required to synthesize a 50  $\Omega$  impedance is dictated by the gap to the grounds adjacent to the signal line [32]. This gap can be increased to reduce the capacitance, allowing for the implementation of wider signal conductor lines to meet large current limit requirements, ideal for PA applications. Various CPW topologies were implemented and EM simulated in Momentum to study the impacts of various shield types and metal layers on the physical length and Q factor. The CPW topologies investigated were standard unshielded, grounded shielded (GS-CPW), slow wave (SW-CPW), fully shielded (FS-CPW) and top shielded (TS-CPW).

First proposed by Seki and Hasegawa, the SW-CPW architecture was meant to reduce the dimension of the line and to reduce the signal speed [33]. As shown in Figure 3.11a, the SW-CPW is constructed by adding perpendicular floating metal strips underneath the signal and ground planes of the CPW line. This technique increases the distributed capacitance and inductance of the line simultaneously. Hence the phase velocity  $v_p$  and wavelength  $\lambda$  both decrease, as they are inversely proportional to the square root of the line inductance and capacitance (LC) product, as expressed below.

$$v_p \propto \lambda \propto \frac{1}{\sqrt{LC}} \tag{3.11}$$

The equation highlights that a shorter wavelength means a shorter physical line is needed to synthesize a given electrical length at a fixed characteristic impedance. A grounded version of the SW-CPW was also analyzed for comparison. The GS-CPW topology is similar to the SW-CPW topology, except that the floating shield is connected to the top grounds through vias. Figure 3.11 shows the three-dimensional topology of both the SW-CPW and GS-CPW.
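The slow-wave effect in (3.11) can be illustrated for an ideal lossless line, for which  $v_p = 1/\sqrt{LC}$ ,  $\lambda = v_p/f$  and  $Z_0 = \sqrt{L/C}$ . In the sketch below, the per-unit-length inductance and capacitance and the slow-wave scaling factor of 1.5 are hypothetical values chosen so that the reference line is 50  $\Omega$ ; they are not simulated results.

```python
import math

def line_params(l_per_m, c_per_m, freq):
    """Return (phase velocity, wavelength, Z0) of an ideal lossless line."""
    v_p = 1 / math.sqrt(l_per_m * c_per_m)
    wavelength = v_p / freq
    z0 = math.sqrt(l_per_m / c_per_m)
    return v_p, wavelength, z0

FREQ = 60e9        # Hz
L_REF = 400e-9     # H/m, hypothetical
C_REF = 160e-12    # F/m, hypothetical
SLOW_FACTOR = 1.5  # hypothetical: shield raises L and C by the same factor

v_ref, lam_ref, z_ref = line_params(L_REF, C_REF, FREQ)
v_sw, lam_sw, z_sw = line_params(SLOW_FACTOR * L_REF, SLOW_FACTOR * C_REF, FREQ)

print(f"reference: Z0 = {z_ref:.1f} ohm, wavelength = {lam_ref * 1e3:.2f} mm")
print(f"slow-wave: Z0 = {z_sw:.1f} ohm, wavelength = {lam_sw * 1e3:.2f} mm")
```

When L and C increase by the same factor, the characteristic impedance is unchanged while the wavelength, and therefore the physical length needed for a given phase shift, shrinks by that factor.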

#### 3.3.3 Fully Shielded and Top Shielded CPWs

Figure 3.11: Three-dimensional view of (a) SW-CPW and (b) GS-CPW transmission line

The FS-CPW topology utilizes grounded metal conductor layers above and below the signal line of the waveguide to form a cage-like structure. In other words, the CPW line is shielded at the top, bottom and side walls as depicted in Figure 3.12a below. On the other hand, a top and side wall shield configuration is shown in Figure 3.12b. Neither configuration provides superior performance in reducing the mutual coupling between parallel CPW lines when compared to the SW-CPW (as will be shown in the next section).



Figure 3.12: Cross-section of (a) fully shielded and (b) top shielded transmission lines

#### 3.3.4 Performance Summary

For a fair comparison of the implemented CPW lines, all of the topologies under investigation were EM simulated with a constant line length of 100  $\mu$ m. The signal width, W, and signal to ground gap, G, were set to synthesize a 50  $\Omega$  characteristic impedance. The dimensions, along with the attenuation and Q factor, were extracted and compared as figures of merit. The Q-factor of a transmission line is expressed as,

$$Q = \frac{\beta}{2\alpha} \tag{3.12}$$

where  $\beta$  is the phase constant in radians/m, and  $\alpha$  is the attenuation per unit length in Nepers/m.
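Since simulated attenuation is often reported in dB per mm rather than Nepers per m, a unit conversion is needed before applying (3.12). The sketch below shows that conversion (1 Np is approximately 8.686 dB) followed by the Q calculation. The phase constant and attenuation values are hypothetical and serve only to demonstrate the arithmetic.

```python
import math

NEPER_IN_DB = 20 / math.log(10)  # 1 Np expressed in dB, approximately 8.686

def db_per_mm_to_np_per_m(alpha_db_per_mm):
    """Convert attenuation from dB/mm to Np/m."""
    return alpha_db_per_mm * 1e3 / NEPER_IN_DB

def line_q(beta_rad_per_m, alpha_np_per_m):
    """Transmission line Q factor from Eq. (3.12)."""
    return beta_rad_per_m / (2 * alpha_np_per_m)

beta = 2.6e3                          # rad/m (2.6 rad/mm), hypothetical
alpha = db_per_mm_to_np_per_m(0.87)   # 0.87 dB/mm, hypothetical
q = line_q(beta, alpha)
print(f"alpha = {alpha:.1f} Np/m, Q = {q:.1f}")
```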

From Figure 3.13 it is clear that the shielded topologies provide an increase in Q of at least 1 at 60 GHz when compared to the unshielded CPW line. The SW-CPW line implemented with a shield in the B3 layer provides the highest Q factor of 13. The GS-CPW shows a Q factor 0.5 lower than the SW-CPW. This can be explained by the doubling of the effective dielectric thickness between the LB layer and the B3 shield in the SW configuration, as the electric field couples from the conductor to the shield and back to the CPW grounds. In the case of the GS-CPW, the dielectric thickness does not double, and the line capacitance increases [32]. There are also losses due to fluctuations in the ground potential (the line lacks a true ground), which cause the shield potential to fluctuate with the ground.

Other topologies such as the FS-CPW and TS-CPW suffer from considerable losses based on the simulated Q factor at 60 GHz, as shown in Figure 3.14. The FS-CPW topology suffers the highest losses, as the signal line sees increased capacitance from both the top and bottom shield layers. Moreover, the high line capacitance of the FS-CPW makes it impractical for PA applications, as a 50  $\Omega$  line can only be realized with a narrow signal line and a large gap, as shown in Table 3.2.

Figure 3.13: Simulation results for (a) Q factor, (b) attenuation per mm of length and (c) phase constant (Beta) in radians per mm for SW-CPW, GS-CPW and unshielded CPW (as a reference), with  $SL = 2 \ \mu m$  and  $SS = 2 \ \mu m$

For the SW-CPW topology, the loss is sensitive to the arrangement of the slots underneath the CPW line. The slots of the floating shield help reduce the induced current flow, owing to the smaller cross sectional area and magnetic flux linkage. Hence, a minimum slot length, SL, and slot spacing, SS, are ideal for lower losses, as depicted in Figure 3.15. This is confirmed by EM simulations of an SW-CPW implemented in the LB layer with various slot dimensions implemented in the B3 layer. A slight increase in the Q factor and less loss are observed as the shield is changed from a coarse implementation  $(SL = 3 \ \mu m \text{ and } SS = 3 \ \mu m)$  to a more finely slotted shield  $(SL = 1 \ \mu m \text{ and } SS = 1 \ \mu m)$ . A fine shield is therefore preferable in order to benefit from the increased Q factor of the CPW line.



Figure 3.14: Simulation results for (a) Q factor, (b) attenuation per mm of length and (c) phase constant (Beta) in radians per mm for FS-CPW, TS-CPW and unshielded CPW (as a reference)



Figure 3.15: Illustration of slot dimensions including slot length and slot spacing for SW-CPW and GS-CPW topologies

Table 3.2: Dimension and Layer Details for Various CPW Lines Analyzed for Fixed Line Length of 100  $\mu m$  and 50  $\Omega$  Characteristic Impedance

| CPW Topology   | Signal Layer | Top Shield Layer | Bottom Shield Layer | Signal Conductor Width ($\mu$m) | Signal to Ground Gap, G ($\mu$m) | Ground Width ($\mu$m) | SL/SS ($\mu$m) |
|----------------|--------------|------------------|---------------------|---------------------------------|----------------------------------|-----------------------|----------------|
| Unshielded CPW | LB           | -                | -                   | 15                              | 5.5                              | 30                    | -              |
| SW-CPW         | LB           | -                | B3                  | 13                              | 7                                | 30                    | 2/2            |
| SW-CPW         | LB           | -                | M3                  | 13.5                            | 7                                | 30                    | 2/2            |
| GS-CPW         | LB           | -                | B3                  | 13                              | 7                                | 30                    | 2/2            |
| GS-CPW         | LB           | -                | M3                  | 13.5                            | 7                                | 30                    | 2/2            |
| FS-CPW         | UA           | LB               | B1                  | 4                               | 40                               | 10                    | -              |
| TS-CPW         | UA           | LB               | -                   | 6                               | 12.7                             | 20                    | -              |



Figure 3.16: Simulation results for (a) Q factor and (b) attenuation per mm of length for various shield lengths and shield spacings

### Chapter 4

# Millimeter-wave Cascode and Stacked-FET Power Amplifier

FET stacking can achieve higher output power than commonly applied techniques (e.g. output power combining), by providing reliable operation at higher voltages (above the breakdown voltage of a single FET). However, FET stacking sacrifices large signal gain performance, and so cascode topologies become more attractive for high gain and low power applications. This chapter will introduce the concepts of cascode and FET stacking and discuss the design and performance of PAs based on these topologies.

### 4.1 Cascode and FET Stacking for Millimeter-Wave Power Transistor

It is very challenging to design power transistors in submicron CMOS technologies that achieve high output power, gain and stability while ensuring reliability throughout the input range and over their operational lifetime. For both the cascode and FET-stacking topologies, the non-ideal behaviour and other issues they face arise from device parasitics encountered at mm-wave frequencies. This section discusses the core differences between the conventional cascode cell and FET-stacking topologies.

#### 4.1.1 Cascode Cell Analysis

The cascode is a commonly applied topology in analog circuit design. The advantages of using cascode transistors include a higher tolerable drain to source voltage across the pair, which enables the use of a large voltage supply (for high power applications), increased gain, and reduced reverse gain  $S_{12}$ , which results in improved isolation. However, the interstage node in the cascode structure poses concerns at mm-wave frequencies due to its parasitic capacitance, as shown in Figure 4.1. The interstage parasitic capacitance causes leakage of the current to the gate source capacitance ( $Cgs_2$ ) of the common gate stage and to the capacitances present at the drain of the common source stage. Thus, at high frequency the small signal current is reduced, which degrades the small signal gain. Moreover, the out of phase combining of the drain voltages of each stage also causes efficiency degradation, as the impedance seen by the drain of each transistor is reactive at high frequencies.



Figure 4.1: Cascode transistor topology with associated interstage node parasitics

Apart from the above mentioned advantages of using a cascode topology for high gain and isolation, the stability of the cascode approach needs to be given special attention. Although cascode cells show more unilateral behavior than common source transistors (due to increased isolation), their stability is a concern if the interstage node parasitics are not compensated for during design. Figure 4.2 compares the MAG of a common source and a cascode transistor built from a 30  $\mu$ m unit cell in CMOS 45 nm SOI technology. Instability in the cascode topology arises from the capacitive degeneration [34] of the common gate stage by the parasitic interstage node capacitances shown in Figure 4.1. This results in a negative resistance at the input of the common gate stage, as demonstrated in (4.4).

Figure 4.2: Maximum available gain of a 30 $\mu$m common source and cascode transistor in 45 nm SOI technology

Looking at Figure 4.3, the input impedance looking into the common gate transistor M2 can be expressed as,

$$Z_{IN} = \frac{1}{j\omega Cgs_2} + (\beta(j\omega) + 1)\frac{1}{j\omega Cds_1} \tag{4.1}$$

Where  $\beta(j\omega)$  is the short circuit current gain as a function of frequency and is given as,

$$\beta(j\omega) = \frac{\omega_t}{j\omega} \tag{4.2}$$

where  $\omega_t$  is the cut off frequency of the transistor (unity short circuit current gain). Hence (4.1) can be simplified to,

$$Z_{IN} = \frac{1}{j\omega Cgs_2} + \frac{1}{j\omega Cds_1} - \frac{\omega_t}{\omega^2 Cds_1} \tag{4.3}$$

$$R_{IN} = \Re\{Z_{IN}\} = -\frac{\omega_t}{\omega^2 Cds_1} \tag{4.4}$$

The negative resistance $R_{IN}$ in (4.4) needs to be compensated for to avoid instability.
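As a rough numerical illustration of (4.4), the sketch below evaluates $R_{IN}$ at two frequencies. The transit frequency and $Cds_1$ values are hypothetical placeholders chosen only to show the trend; they are not values extracted from the 45 nm SOI device.

```python
import math


def cascode_input_resistance(freq_hz, ft_hz, cds1_farad):
    """Real part of Z_IN per (4.4): R_IN = -omega_t / (omega^2 * Cds1)."""
    omega = 2 * math.pi * freq_hz
    omega_t = 2 * math.pi * ft_hz
    return -omega_t / (omega ** 2 * cds1_farad)


# Hypothetical device values (assumptions for illustration only)
FT_HZ = 250e9     # assumed transit frequency
CDS1_F = 20e-15   # assumed interstage capacitance Cds1

r_in_60 = cascode_input_resistance(60e9, FT_HZ, CDS1_F)
r_in_30 = cascode_input_resistance(30e9, FT_HZ, CDS1_F)

print(f"R_IN at 60 GHz: {r_in_60:.1f} ohm")
print(f"R_IN at 30 GHz: {r_in_30:.1f} ohm")
print(f"ratio R_IN(30 GHz) / R_IN(60 GHz): {r_in_30 / r_in_60:.2f}")
```

The factor of four between the two frequencies follows directly from the $1/\omega^2$ dependence in (4.4): the negative resistance grows in magnitude as the frequency drops.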



Figure 4.3: Common gate circuit used for stability analysis

#### 4.1.2 FET-Stacking Analysis

FET stacking can be viewed as a modified cascode cell topology. In FET stacking, the gate of the common gate stage is not RF-grounded at the frequency of operation; instead it is terminated in a finite impedance set by a stack capacitor, which results in a finite RF swing at the gate. The voltage division of the capacitive network formed by the parasitic capacitances (primarily Cgs and Cgd) and the stack capacitor determines the gate voltage swing of each stage, as shown in Figure 4.4b. The benefit of introducing a finite AC swing at the gate of each stage is that it reduces the drain-to-gate and drain-to-source voltages. This helps to protect nano-scale devices from stress and breakdown, primarily gate oxide breakdown and hot carrier degradation. These breakdown mechanisms are more pronounced when nano-scale transistors are operated under large signal conditions with high gate-to-source, gate-to-drain and drain-to-source voltages. In addition, when FET stacking is applied to three or more transistors, the top transistors are required to withstand higher voltage swings, as the supply voltage scales with the number of transistors in the stack, which makes the use of stack capacitors essential. Hence, by utilizing the stack capacitor and inter-stack matching network (as discussed in the later sections), a FET-stacking approach keeps the transistor swings in check for large input drives more effectively than is possible in a cascode arrangement.

Figure 4.4: (a) Cascode topology and (b) 2-stack topologies depicting voltage swings (red) at each node

As mentioned previously, PAs designed with scaled CMOS devices suffer from low $R_{opt}$ due to the low breakdown voltage limits of the transistors. Moreover, the knee voltage, Vknee, contributes significantly to the overall drain swing, limiting the linear operation range of the PA. An added advantage of using cascode and stacked topologies is the increase in $R_{opt}$ under constant drain current. For a general FET-stacking topology with K stages, $R_{opt}$ can be shown to scale linearly under a constant drain current. The ideal (lossless) scaling of the load-line impedance and supply voltage by a factor of K improves the output power by a factor of K, as shown below:

$$R_{opt} = \frac{(Vmax - Vknee)/2}{Imax/2} \tag{4.5}$$

$$Pout = \frac{1}{2} \frac{(Vmax - Vknee) * Imax}{4} \tag{4.6}$$

$$R_{opt,k}(K^{th}stage) = \frac{K * (Vmax - Vknee)/2}{Imax/2} = K * R_{opt} \tag{4.7}$$

$$Pout_k(Kstages) = \frac{1}{2} \frac{K * (Vmax - Vknee) * Imax}{4} = K * Pout \tag{4.8}$$

Another technique discussed in the literature on stacking topologies for mm-wave applications [17] is the constant load line stacking technique. In this technique the current is scaled together with the voltage as the number of stages increases, so that $R_{opt}$ stays constant. The advantage of this technique is that the total output power increases by a factor of $K^2$, as shown below,

$$Pout_k(Kstages) = \frac{1}{2} \frac{K * (Vmax - Vknee) * K * Imax}{4} = K^2 * Pout \tag{4.9}$$

The constant load-line technique was not applied to the stacked-FET designs presented in this thesis, as current scaling through the stack requires transistor widths to be scaled for a fixed gate length of the device. The use of very large device sizes should be avoided due to the associated parasitics at millimeter-wave frequencies which impact the gain of the FET-stacking topology severely.
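The ideal scaling relations (4.5) to (4.9) can be checked numerically with a short sketch. The device values ($V_{max}$, $V_{knee}$, $I_{max}$) below are hypothetical, and losses are ignored exactly as in the equations.

```python
import math


def ideal_loadline(v_max, v_knee, i_max, k=1, scale_current=False):
    """Return (R_opt, P_out) of an ideal lossless K-stack.

    scale_current=False -> constant-current stacking, (4.7) and (4.8)
    scale_current=True  -> constant load-line stacking, (4.9)
    """
    v_swing = k * (v_max - v_knee)
    i_swing = (k if scale_current else 1) * i_max
    r_opt = (v_swing / 2) / (i_swing / 2)
    p_out = 0.5 * v_swing * i_swing / 4
    return r_opt, p_out


# Hypothetical single-device values (assumptions for illustration only)
V_MAX, V_KNEE, I_MAX = 2.2, 0.4, 0.1  # volts, volts, amperes

r1, p1 = ideal_loadline(V_MAX, V_KNEE, I_MAX)
r3, p3 = ideal_loadline(V_MAX, V_KNEE, I_MAX, k=3)
r3c, p3c = ideal_loadline(V_MAX, V_KNEE, I_MAX, k=3, scale_current=True)

print(f"single device:        R_opt = {r1:.1f} ohm, P_out = {p1 * 1e3:.1f} mW")
print(f"3-stack, constant I:  R_opt x{r3 / r1:.0f}, P_out x{p3 / p1:.0f}")
print(f"3-stack, constant RL: R_opt x{r3c / r1:.0f}, P_out x{p3c / p1:.0f}")
print(f"power gain of constant-current 3-stack: {10 * math.log10(p3 / p1):.2f} dB")
```

The constant load-line case delivers the $K^2$ power increase only because the current, and hence the device width, is also scaled by K, which is the cost discussed in the paragraph above.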

The value of the stack capacitance not only plays an important role in controlling the swing at the gate, but also affects the impedance seen by the drain of the preceding stage in the stack. Hence the stack capacitance of the $k^{th}$ transistor in the stack, $Cstack_k$, can be utilized to set the output impedance of the preceding stage, i.e. the $(k-1)^{th}$ transistor in the stack, to its optimal resistance. Ideally, the $k^{th}$ stage in the stack should observe a loading of $kR_{opt}$ to achieve its maximum and undistorted drain voltage swing, where k is the level of the transistor in the stack as shown in Figure 4.5a; the preceding stage should therefore observe $(k-1)R_{opt}$.

Analyzing the small signal model of the $k^{th}$ stacked transistor in Figure 4.5b shows how the stacking capacitor controls the Vgs and Vdg swings. The capacitances $Cstack_k$, $Cgs_k$ and $Cgd_k$ form a capacitive divider network, which gives,

$$Vgs_k = \frac{Cgd_k - (k-1)Cstack_k}{Cgs_k + Cstack_k + Cgd_k}Vds \tag{4.10}$$

$$Vgd_k = -\frac{kCstack_k + Cgs_k}{Cgs_k + Cstack_k + Cgd_k}Vds \tag{4.11}$$

The derivations in (4.10) and (4.11) are based on the assumption that all the stages experience the same Vds swing i.e.  $Vds_{k-1} = Vds_k$ . This can be ensured by providing the optimum output impedance to each stage, which is further discussed in this section.
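A minimal sketch of the divider in (4.10) and (4.11) is given below. It borrows the 180 $\mu$m device capacitances and the k = 2 design value of $Cstack_k$ quoted in Section 4.3.1, and normalizes the swings to Vds = 1 (a normalization chosen here for illustration). The final line checks the identity $Vgs_k - Vgd_k = Vds$, which both expressions must satisfy.

```python
def stack_gate_swings(k, c_gs, c_gd, c_stack, v_ds=1.0):
    """Gate-source and gate-drain swings of the k-th stacked FET.

    Implements the capacitive divider of (4.10) and (4.11), assuming every
    stage sees the same Vds swing.
    """
    c_total = c_gs + c_stack + c_gd
    v_gs = (c_gd - (k - 1) * c_stack) / c_total * v_ds
    v_gd = -(k * c_stack + c_gs) / c_total * v_ds
    return v_gs, v_gd


# Capacitances quoted for the 180 um device in Section 4.3.1,
# and the k = 2 design value of Cstack from Table 4.1
C_GS, C_GD, C_STACK_2 = 130e-15, 60e-15, 375e-15

v_gs, v_gd = stack_gate_swings(k=2, c_gs=C_GS, c_gd=C_GD, c_stack=C_STACK_2)

print(f"Vgs_2 / Vds = {v_gs:+.3f}")
print(f"Vgd_2 / Vds = {v_gd:+.3f}")
print(f"Vgs_2 - Vgd_2 = {v_gs - v_gd:.3f} (should equal Vds = 1)")
```

Increasing $Cstack_k$ moves the gate closer to RF ground, which is the cascode limit in which the full drain swing appears across the gate-drain junction.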

In order to analyze the output impedance at the load of the  $(k-1)^{th}$  stage,  $Zout_{k-1}$ , an equivalent small signal model of the stacked transistor of the  $k^{th}$  stage is used for analysis as shown in Figure 4.5b. It is important to note that although Cds is not negligible for scaled MOSFETs operating at high frequency, effect of Cds is omitted in the analysis as it is small in comparison to Cgs and Cgd.

By applying a test voltage $V_t$ and measuring the test current $I_t$ at node S (the source of the transistor), the impedance $Zout_{k-1}$ can be calculated as the ratio of $V_t$ to $I_t$. $V_t$ and $I_t$ can be expressed as:

$$V_t = Vg - Vgs = \frac{i1 - i2}{sCstack_k} - \frac{i2}{sCgs_k} \tag{4.12}$$

$$I_t = -i2 - gm_k Vgs = -i2 - \frac{gm_k\, i2}{sCgs_k} \tag{4.13}$$



Figure 4.5: (a) Generalized stacked topology with capacitive parasitic for k stages and (b) a small signal model of  $k^{th}$  stacked transistor

$$\frac{V_t}{I_t} = Zout_{k-1} = \frac{i1\,Cgs_k - i2(Cgs_k + Cstack_k)}{-i2\,Cstack_k(sCgs_k + gm_k)} \tag{4.14}$$

The current *i1* flowing through capacitor Cgd can be written as:

$$-i1 = \frac{gm_k i2}{sCgs_k} + \frac{Vout}{Z_L} = \frac{gm_k i2}{sCgs_k} + \frac{\frac{i1}{sCgd_k} + \frac{i2}{sCgs_k} + V_t}{Z_L} \tag{4.15}$$

Substituting (4.15) in (4.14) gives the following expression for $Zout_{k-1}$:

$$Zout_{k-1} = \frac{Cgs_k + Cstack_k + Cgd_k(1 + gm_kZ_L + sCgs_kZ_L + sCstack_kZ_L)}{(gm_k + sCgs_k)(Cgd_k + Cstack_k + sCgd_kCstack_kZ_L)} \tag{4.16}$$

The product term $gm_kZ_L$ is approximately 10 times larger than the product terms $sCgs_kZ_L$ and $sCstack_kZ_L$. Moreover, the product term $sCgd_kCstack_kZ_L$ is significantly smaller than the individual capacitances $Cgd_k$ and $Cstack_k$, even at high frequency, so (4.16) can be simplified to,

$$Zout_{k-1} = \frac{Cgs_k + Cstack_k + Cgd_k(1 + gm_kZ_L)}{(gm_k + sCgs_k)(Cgd_k + Cstack_k)} \tag{4.17}$$

Hence the real part of the above expression can be adjusted by $Cstack_k$, and can be described as,

$$Rout_{k-1} = \Re\{Zout_{k-1}\} = \frac{Cgs_k + Cstack_k + Cgd_k(1 + gm_kZ_L)}{gm_k(Cgd_k + Cstack_k)} \tag{4.18}$$

As mentioned previously, to achieve the optimum power match at each stage of the FET-stack topology, each transistor should see its optimum impedance. Hence, the optimum impedance is $(k-1)R_{opt}$ for the $(k-1)^{th}$ transistor in the stack and $kR_{opt}$ for the $k^{th}$ stage. Substituting these conditions for $Rout_{k-1}$ and $Z_L$ respectively in (4.18) gives $Cstack_k$ as,

$$Cstack_{k} = \frac{Cgs_{k} + Cgd_{k}(1 + gm_{k}R_{opt})}{(k-1)gm_{k}R_{opt} - 1} \tag{4.19}$$
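For reference, the algebra between (4.18) and (4.19) can be written out explicitly. This is only a restatement of the substitution $Rout_{k-1} = (k-1)R_{opt}$ and $Z_L = kR_{opt}$ described above, with no additional assumptions:

$$
\begin{aligned}
(k-1)R_{opt}\,gm_k(Cgd_k + Cstack_k) &= Cgs_k + Cstack_k + Cgd_k(1 + k\,gm_kR_{opt}) \\
Cstack_k\left[(k-1)gm_kR_{opt} - 1\right] &= Cgs_k + Cgd_k(1 + k\,gm_kR_{opt}) - (k-1)gm_kR_{opt}\,Cgd_k \\
&= Cgs_k + Cgd_k(1 + gm_kR_{opt})
\end{aligned}
$$

Dividing both sides by the bracketed term gives (4.19). The result is a positive capacitance only when $(k-1)gm_kR_{opt} > 1$.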

In order to simplify the analysis, the output resistance *ro* is neglected. However, the floating body transistors in the 45 nm SOI process suffer from low output impedance, so the above equations are only an approximation. The preceding analysis is referred to in Section 4.3 when describing the design procedure for stacked-FET PAs.

# 4.2 A 2-stage 60 GHz Cascode Power Amplifier in 45 nm CMOS SOI

This section describes the design and discusses the performance of a 2 stage PA based on a cascode transistor cell. The design makes use of the optimized manifold layout of the 45 nm SOI floating body device (discussed in section 3.2) for improved parasitic and thermal performance. Moreover, SW-CPW transmission lines, as analyzed in Section 3.3, are used to synthesize the matching network of the PA.

According to the IEEE 802.15.3c standard, in order to meet the requirement for field emissions with an antenna gain of up to 30 dBi, the total transmit power delivered to the antenna is required to be 10 dBm [35]. Hence, a design target of 10 dBm was set for the 2-stage PA to serve applications involving this standard as well as WirelessHD streaming and transfer. Another objective of this design was to benchmark IBM's 45 nm CMOS SOI technology for mm-wave PA applications. A cascode topology was chosen to obtain higher gain in comparison to the stacked-FET topology while maintaining reliability and output power. During the design of a cascode-based PA, the voltage swings (Vdg and Vgs) must be carefully monitored to ensure reliable operation of the transistors over the operating input range.

Figure 4.6 shows the variation of the output power at 1 dB compression ($OP_1$ dB) with transistor width. The $OP_1$ dB levels and corresponding load impedances for each transistor width were obtained through load pull simulation at 60 GHz.



Figure 4.6: Variation in  $OP_1$  dB and ITR for various transistor widths at 60 GHz

Figure 4.6 shows that as the transistor size is increased under a constant supply voltage (2.2 V was used for the floating body NMOS device), the output power at 1 dB compression increases non-linearly. However, the load resistance required to achieve the $OP_1$ dB power level decreases as the transistor size increases, which places a critical limitation on the bandwidth and complexity of the matching network. The resistance at the 1 dB output compression point reduces by a factor of 3.3, with an ITR, with respect to 50 $\Omega$, of 0.147 for the 180 $\mu$m size. As the design specification for saturated output power is 10 dBm, a 90 $\mu$m transistor was considered suitable as it provides an $OP_1$ dB of 10 dBm with a reasonable ITR of 0.44. Moreover, larger devices such as the 150 $\mu$m and 180 $\mu$m units suffer from low gain due to higher parasitics, which puts strain on the driver and reduces the overall gain of the 2-stage PA when attempting to achieve 10 dBm output power.
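To make the bandwidth argument concrete, the sketch below converts the quoted ITR values to load resistances and estimates the loaded Q of an ideal single-section L-match to 50 $\Omega$. It assumes that the ITR is the load resistance divided by 50 $\Omega$ and that the load is purely resistive; both are simplifying assumptions made here, not statements taken from the load pull data.

```python
import math

Z0 = 50.0  # reference impedance in ohms


def load_resistance_from_itr(itr, z0=Z0):
    """Load resistance implied by an ITR, assuming ITR = R_load / z0."""
    return itr * z0


def l_match_q(r_load, z0=Z0):
    """Loaded Q of an ideal single-section L-match between two resistances."""
    r_high, r_low = max(r_load, z0), min(r_load, z0)
    return math.sqrt(r_high / r_low - 1)


# ITR values quoted in the text for the 90 um and 180 um devices
itr_by_width_um = {90: 0.44, 180: 0.147}

q_by_width_um = {}
for width_um, itr in itr_by_width_um.items():
    r_load = load_resistance_from_itr(itr)
    q_by_width_um[width_um] = l_match_q(r_load)
    print(f"{width_um} um: R_load = {r_load:.2f} ohm, "
          f"L-match Q = {q_by_width_um[width_um]:.2f}")
```

Under these assumptions the 180 $\mu$m device needs roughly twice the matching Q of the 90 $\mu$m device, which is consistent with the narrower bandwidth and more complex matching network described above.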

Although the above analysis does not account for the input and output matching network losses, the load pull results provide a general idea of the power level that can be achieved while keeping some design margin for losses.

## 4.2.1 Output Power Stage

The output matching network of the power stage was designed to deliver maximum output power. The load pull contours for a 90 $\mu$m transistor show the optimum load impedance to be approximately $Z_{opt} = 24+j35\ \Omega$ at $OP_1$ dB compression, as shown in Figure 4.7. The 90 $\mu$m transistor was selected to facilitate matching the output to 50 $\Omega$ without multi-section matching networks, which incur additional losses. A single-section LC network is sufficient to provide the required impedance matching.
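As an illustration of how a single-section LC network can present $Z_{opt}$, the sketch below solves for ideal element values at 60 GHz. The topology (a shunt inductor at the drain followed by a series capacitor to the 50 $\Omega$ load) is an assumption made for this example, since the actual network is defined by Figure 4.8; lossless lumped elements are assumed as well.

```python
import math


def shunt_l_series_c_match(z_target, r_load, freq_hz):
    """Ideal shunt-L / series-C values that present z_target at the drain.

    Assumed topology: drain -> shunt L to ground, then series C to r_load.
    """
    omega = 2 * math.pi * freq_hz
    y_target = 1 / z_target
    # Series branch (C + r_load): Re{Y} = r_load / (r_load^2 + Xc^2)
    xc_squared = r_load / y_target.real - r_load ** 2
    if xc_squared <= 0:
        raise ValueError("target conductance is too large for this topology")
    xc = math.sqrt(xc_squared)
    b_branch = xc / (r_load ** 2 + xc ** 2)  # susceptance of the series branch
    b_shunt = y_target.imag - b_branch       # remainder supplied by the shunt element
    if b_shunt >= 0:
        raise ValueError("shunt element would have to be capacitive")
    inductance = -1 / (omega * b_shunt)
    capacitance = 1 / (omega * xc)
    return inductance, capacitance


def presented_impedance(inductance, capacitance, r_load, freq_hz):
    """Impedance seen at the drain looking into the terminated network."""
    omega = 2 * math.pi * freq_hz
    z_branch = r_load + 1 / (1j * omega * capacitance)
    y_total = 1 / z_branch + 1 / (1j * omega * inductance)
    return 1 / y_total


Z_OPT = complex(24, 35)  # from the load pull result quoted above
l_match, c_match = shunt_l_series_c_match(Z_OPT, 50.0, 60e9)
z_check = presented_impedance(l_match, c_match, 50.0, 60e9)

print(f"L = {l_match * 1e12:.1f} pH, C = {c_match * 1e15:.1f} fF")
print(f"presented impedance = {z_check.real:.2f} + j{z_check.imag:.2f} ohm")
```

In the fabricated design the inductor is a SW-CPW stub with finite Q, so these ideal values would only serve as a starting point for electromagnetic optimization.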

A SW-CPW was used to synthesize the inductor. It was implemented in the topmost metal layer (LB) with a slotted shield (at B3), as discussed in Section 3.3.3. The dimensions were optimized to obtain a high Q-factor and to meet the current density requirement, which prevents electromigration effects and improves the reliability of the design. Figure 4.8 shows the overall matching topology with an optimized CPW width of 20 $\mu$m, gap of 18 $\mu$m and total length of 250 $\mu$m. The current density limitation is critical for the output stub as it is also used as a DC feed for drain biasing of the output stage.



Figure 4.7: Output power and PAE contours for 90 um cascode device at 60 GHz

Moreover, as a gain of 15 dB or higher was required from the overall 2-stage PA, the limited MAG at mm-wave frequencies made it necessary to use class AB biasing for the output stage and class A for the driver stage. This poses further reliability concerns as the maximum voltage swing observed is up to twice the supply voltage, putting a constraint on the maximum supply that can be used.



Figure 4.8: LC output matching topology realized with SW CPW

## 4.2.2 Input Stage and Inter-stage Matching

It was necessary to add a driver stage to drive the output stage into saturation and to contribute to the overall power gain. A 60 $\mu$m device was used for the driver, which was configured identically to the cascode output stage. A smaller device was selected for the driver stage to reduce power consumption while keeping the drive level at the input of the output stage high enough to drive it into saturation. In order to interface the driver stage with the output stage, an inter-stage matching network was designed. The key criterion of the inter-stage matching was to provide the optimum load impedance to the first stage so that it would deliver enough power to drive the second stage. Figure 4.9 shows the L-C-L based inter-stage topology that matches the input impedance of the output stage ($Z_{in} = 8-j20\ \Omega$) to the $Z_{opt}$ of the driver stage. The matching network topology is similar to that of the output stage. The input matching network, on the other hand, consists of a single-section L-C network that matches a 50 $\Omega$ generator impedance to the optimum source impedance of the PA, obtained through source pull simulation.



Figure 4.9: L-C-L based inter-stage matching topology between driver and output stages

#### 4.2.3 Final Circuit

Finally, the manifold layout discussed in Section 3.1 was utilized for optimized performance due to its minimized parasitics. The low impedance stubs in Figure 4.10, CPW1 and CPW3, were realized with a width of 10 $\mu$m, a gap of 11.5 $\mu$m and a total length of 250 $\mu$m. The output matching stub CPW2 and interstage matching stub CPW4 are also used to provide drain bias to the cascodes. High quality, high sheet resistivity polysilicon resistors from the IBM 45 nm CMOS SOI design kit were used as the bias resistors for the common gate stages. The capacitors used for decoupling between the two stages, the RF shunts at the end of the CPWs and the gate decoupling capacitors were implemented as metal-oxide-metal (MOM) capacitors. The MOM capacitors in the IBM design kit are constructed using inter-digitated metal fingers. More metal layers can be used to achieve higher capacitance density in a smaller area, at the cost of higher parasitic capacitive coupling to the substrate. The large capacitors used for AC coupling (e.g. 0.5 pF) were implemented in a 15 $\mu$m x 15 $\mu$m area utilizing the full metal layer stack, i.e. M1-B3. Moreover, 1 pF bias decoupling capacitors were implemented with an area of 30 $\mu$m x 15 $\mu$m. The input of the common source device of both stages uses an RC-based damping circuit (100 $\Omega$, 120 fF) to keep the stage stable at lower frequencies (where the gain is higher). Figure 4.11 shows the microphotograph of the die, which has a total area of 814 $\mu$m x 646 $\mu$m.

#### 4.2.4 Small and Large-signal Measurement Setup

As measurements at mm-wave frequencies are prone to errors due to the offsets and insertion loss introduced by components in the RF path, calibration is required to shift the reference plane of measurement to the device under test (DUT). Figure 4.12a shows the setup for small signal measurements, with calibration performed up to the probe tips using a two-port Line-Reflect-Reflect-Match (LRRM) procedure on an impedance standard substrate. A vector network analyzer was used to measure the small signal frequency response of the fabricated PAs.

Figure 4.10: Simplified schematic of the single-ended 2-stage cascode PA

Figure 4.11: Die photomicrograph of 2-stage single-ended cascode PA

A different setup seen in Figure 4.12b was used for large signal measurements. An external PA driver was used to drive the fabricated DUT to saturation and to provide a broadband 50  $\Omega$  matching through the isolator. Moreover, a power sensor and power meter were required to measure the absolute output power. This is necessary when measuring large signal power and efficiency characteristics of a PA. Apart from the LRRM calibration, a power calibration was required to de-embed the insertion loss of test fixtures, such as cables, adapters and probes to shift the reference plane to the input and output of the DUT.
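The power calibration step amounts to simple bookkeeping in dB, sketched below. The function names, the loss values and the DC power are hypothetical placeholders; the actual fixture losses of the measurement setup are not given in this section.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000


def deembed_power(p_source_dbm, p_meter_dbm, loss_in_db, loss_out_db):
    """Shift the source and meter readings to the DUT reference planes.

    loss_in_db  : insertion loss between the source and the DUT input
    loss_out_db : insertion loss between the DUT output and the power sensor
    """
    p_in_dut = p_source_dbm - loss_in_db
    p_out_dut = p_meter_dbm + loss_out_db
    return p_in_dut, p_out_dut, p_out_dut - p_in_dut


def power_added_efficiency(p_in_dbm, p_out_dbm, p_dc_watt):
    """PAE = (Pout - Pin) / Pdc, with RF powers given in dBm."""
    return (dbm_to_watt(p_out_dbm) - dbm_to_watt(p_in_dbm)) / p_dc_watt


# Hypothetical readings and fixture losses (assumptions for illustration only)
p_in, p_out, gain = deembed_power(p_source_dbm=0.0, p_meter_dbm=5.0,
                                  loss_in_db=3.0, loss_out_db=4.0)
pae = power_added_efficiency(p_in, p_out, p_dc_watt=0.05)

print(f"Pin at DUT  = {p_in:.1f} dBm")
print(f"Pout at DUT = {p_out:.1f} dBm")
print(f"Gain        = {gain:.1f} dB")
print(f"PAE         = {100 * pae:.1f} %")
```

Because the output loss is added back to the meter reading, an error in the assumed fixture loss translates directly into an error in the reported output power and PAE, which is why the power calibration is performed in addition to the LRRM calibration.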

#### 4.2.5 Simulation and Measurement Results

Using the small and large signal setups described in Figure 4.12, the S-parameters and the AM-AM and AM-PM characteristics of the 2-stage cascode PA were measured (shown in Figures 4.13 and 4.14 respectively). A gain of 18.1 dB was achieved, with an output 1 dB compression power of 9 dBm and a saturated output power of 13.6 dBm in measurement. A measured peak PAE of 21% was obtained, with an overall 3 dB bandwidth of approximately 19% (55.5-67 GHz). During measurement of the S-parameters, a pulsed RF input from the vector network analyzer source was used with a pulse width of 100 $\mu$s and a 10% duty cycle. All simulations and measurements were done with a supply voltage of 2.4 V for both stages. The gate biasing, stage by stage, was set to $Vgs1_{Stage1} = 0.5$ V, $Vgs2_{Stage1} = 1.6$ V, $Vgs1_{Stage2} = 0.45$ V and $Vgs2_{Stage2} = 1.5$ V to ensure an equal Vds drop (1.1 V) across each transistor in the cascode stage.
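The quoted fractional bandwidth and power levels can be cross-checked with a few lines of arithmetic. The sketch assumes the fractional bandwidth is referred to the arithmetic centre of the band, which is one common convention and not necessarily the one used for the figures.

```python
def fractional_bandwidth(f_low_ghz, f_high_ghz):
    """Fractional bandwidth relative to the arithmetic centre frequency."""
    f_centre = (f_low_ghz + f_high_ghz) / 2
    return (f_high_ghz - f_low_ghz) / f_centre


def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


fbw = fractional_bandwidth(55.5, 67.0)
p_sat_mw = dbm_to_mw(13.6)
p_1db_mw = dbm_to_mw(9.0)

print(f"fractional 3 dB bandwidth: {100 * fbw:.1f} %")
print(f"Psat = {p_sat_mw:.1f} mW, OP1dB = {p_1db_mw:.1f} mW")
```

With the band edges quoted above, this convention gives a value close to the stated 19%.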

# 4.3 A 3-stack 60 GHz Power Amplifier in 45 nm CMOS SOI

As mentioned previously, the addition of stack capacitors to the gate of the common gate transistors allows for additional headroom for reliable operation, for large signal swings, by allowing for gate voltage swings in the stacked transistor such that Vdg and Vgs voltages are within the breakdown limits of the device. To demonstrate the feasibility of the stacked topology amplifier, a 3-stacked amplifier will be discussed in this section targeting 50 mW (17 dBm) of saturated output power.



Figure 4.12: Measurement setup for (a) small-signal and (b) large-signal PA characterization



Figure 4.13: Results for (a) forward gain  $(S_{21})$ , (b) input reflection coefficient  $(S_{11})$ , (c) output reflection coefficient  $(S_{22})$  and (d) isolation  $(S_{12})$  of the 2-stage cascode PA



Figure 4.14: Large signal performance showing (a) AM-AM and efficiency performance at 60 GHz, (b) AM-PM at 60 GHz and (c) measured PAE, OP-1dB and  $P_{sat}$  across 17% fractional bandwidth of 2-stage PA

Using a stack of three allows the voltage swing to be scaled to approximately three times the $Vds_{max}$ of the transistor. Hence a 6 dB increase in output power should be obtained using a 3-stack device, compared to a single common source device, under ideal and lossless conditions. The number of stacked stages was limited to three for this analysis. Although theoretically more power can be obtained by stacking a larger number of devices, previous work on mm-wave stacked amplifiers has shown only a marginal improvement of less than 2 dB for stacks of more than 4 stages [17]. Based on load pull simulation, the implemented 3-stack device generated approximately 17 dBm at $OP_1$ dB with a $Z_{opt}$ of around $18 + j20\ \Omega$.

### 4.3.1 Inter-stack Matching

The main idea behind the stacked topology is to obtain larger output power through the in-phase combining of the voltage swings on each stack. This is primarily achieved by providing optimal load line resistance to the output of each stack. Parasitic capacitances at the inter-stack nodes cause leakage of the drain current, primarily through Cgs, which impacts the efficiency and output power. Moreover, the use of stacking capacitors to obtain *Ropt*, as analyzed in (4.19), leads to an unwanted susceptance at the inter-stack node which affects the phase alignment of the output stack voltages. The purpose of the inter-stack matching network is to provide the optimal impedance at each stage for maximum voltage and in-phase swing. An expression for the required optimal susceptance  $Y_{opt}$  is derived using the small signal model shown in Figure 4.15.

In order to derive the optimal susceptance seen at the output of the  $k^{th}$  stage for optimal voltage  $V_{max}$  (i.e  $Yout_k$ ) the overall current  $Iout_k$  can be expressed as [17],

$$Iout_k = gm_k Vgs_k - Vgd_k sCgd_k + V_{max} sCds_k + kV_{max} sCdsub_k \tag{4.20}$$

It is important to note that it is preferable for $Vds_k$ to be held constant at $V_{max}$, whereas $Vd_k$ will be a scaled multiple of $V_{max}$, i.e. $kV_{max}$ for the $k^{th}$ stage.

Optimum susceptance  $Yout_k$  (to achieve  $V_{max}$  swing at each stage) can be calculated as,

$$Yout_k = \frac{Iout_k}{Vd_k} = \frac{Iout_k}{kV_{max}} = \frac{gm_k Vgs_k - Vgd_k sCgd_k}{kV_{max}} + \frac{sCds_k}{k} + sCdsub_k \tag{4.21}$$

Figure 4.15: Small signal model of inter-stack network between the $k^{th}$ and $(k+1)^{th}$ stage

By substituting (4.10), (4.11) and (4.19) in (4.21), $Yout_k$ can be simplified to,

$$Yout_k = \frac{gm_k Vgs_k}{kV_{max}} - \frac{s}{k}(Cds_k + kCdsub_k) - \frac{s}{k}\left(1 + \frac{1}{gm_k R_{opt}}\right)Cgd_k \tag{4.22}$$

From (4.22) it can be observed that the imaginary part of this equation corresponds to a capacitive susceptance, whose capacitance will be denoted as $Cout_k$, i.e.,

$$Cout_{k} = \frac{1}{k}(Cds_{k} + kCdsub_{k}) + \frac{1}{k}\left(1 + \frac{1}{gm_{k}R_{opt}}\right)Cgd_{k} \tag{4.23}$$

Substituting the optimum conductance of  $1/kR_{opt}$  in (4.22) gives,

$$Yout_k = \frac{1}{kR_{opt}} - \frac{s}{k}(Cout_k) \tag{4.24}$$

Previous works have analyzed three different topologies to combat the inter-stage node parasitics and ensure proper phase alignment of the voltage swing across each stage. In the first technique, proposed by Ezzedine et al. [16], external capacitors are added across the drain and source terminals of each stage to ensure proper phase alignment of $V_{ds}$ across the stacked transistor stages, as shown in Figure 4.16a. However, this technique increases the effective Cds, which means the inductive tuning requirement will be higher in the output matching network. The second technique, shown in Figure 4.16b, involves the addition of a series inductance at the inter-stack node to present the required $R_{opt}$. As described in [17], this technique provides a scaled version of $R_{opt}$, causing the $Cstack_k$ capacitor values to be greater than in the case where there is no inter-stack matching network.

In the third technique, shown in Figure 4.16c, a shunt inductance is added to the matching network to ensure in-phase alignment of $V_{ds}$ at each stage. The technique as applied in [17] has been shown to give superior performance compared to the other two techniques. In addition, the efficiency and output power are less sensitive to the absolute value of the inductance, which minimizes process variation effects. Once an appropriate value of $Cstack_k$ is selected to set $R_{opt}$, the shunt inductance $Lstack_k$ can be added to ensure that the $k^{th}$ stage sees the optimal susceptance $Y_{opt}$. The source current $Is_{k+1}$ seen at the source of the $(k + 1)^{th}$ stage, and the corresponding admittance $Ys_{k+1}$, can be expressed as,

$$Is_{k+1} = gm_{k+1}Vgs_{k+1} + sCgs_{k+1}Vgs_{k+1} + sCds_{k+1}V_{max} \tag{4.25}$$

$$Ys_{k+1} = -\frac{Is_{k+1}}{kV_{max}} = -\frac{gm_{k+1}Vgs_{k+1} + sCgs_{k+1}Vgs_{k+1}}{kV_{max}} - \frac{sCds_{k+1}V_{max}}{kV_{max}} \tag{4.26}$$

By substituting (4.10), (4.11) and (4.19), and setting $\Re\{Ys_{k+1}\}$ equal to the optimum conductance $1/kR_{opt}$ in (4.26), $Ys_{k+1}$ can be simplified to [17],

$$Ys_{k+1} = \frac{1}{kR_{opt}} - \frac{sCds_{k+1}}{k} + \frac{sCgs_{k+1}}{kgm_{k+1}R_{opt}} \tag{4.27}$$

As shown in Figure 4.16c, the addition of a shunt inductor $Lstack_k$ allows the $k^{th}$ stage to be matched to the desired optimum susceptance $Yout_k$ as calculated in (4.22) and (4.24). The value of $Lstack_k$ required to achieve the optimum susceptance can be expressed as,

$$\Im\{Ys_{k+1}\} + \frac{1}{\omega Lstack_k} = \Im\{Yout_k\} \tag{4.28}$$

Substituting imaginary coefficients from (4.22) and (4.27) in (4.28),  $Lstack_k$  simplifies to be,

$$\frac{1}{Lstack_k} = \frac{\omega^2 (Cds_k - Cds_{k+1})}{k} + \frac{\omega^2 Cgs_{k+1}}{kgm_{k+1}R_{opt}} + \omega^2 \left(\frac{Cgd_k + kCdsub_k}{k}\right) \tag{4.29}$$



Figure 4.16: Matching techniques:(a) shunt capacitor (b) series inductance (c) shunt inductance

Using the above analysis, calculations were performed to obtain initial values to verify the operation of the shunt inductor based inter-stack matching network. A 3-stack PA was designed to verify the discussed theory, using (4.19) to obtain values for $Cstack_k$ and (4.29) for the required shunt inductance $Lstack_k$, for a device size of 180 $\mu$m in a 3-stack configuration. The extracted parasitic capacitances Cgs, Cgd and Cds for a 180 $\mu$m manifold-layout based transistor in 45 nm CMOS are 130 fF, 60 fF and 53 fF, respectively. Cdsub is neglected in the analysis as it is assumed to be small in comparison to Cds and the other parasitic capacitances. The same transistor biased in class AB, with a Vgs of 0.45 V, has a gm of 260 mS, which was the value used in the analysis. The $Z_{opt}$ of the 3-stack FET configuration using 180 $\mu$m devices is around $18+j20\ \Omega$. Table 4.1 summarizes the calculated values and the design values used for Cstack and Lstack with lossless, ideal input and output matching networks.

| k | Calculated $Cstack_k$ (fF) | Design $Cstack_k$ (fF) | Calculated $Lstack_k$ (pH) | Design $Lstack_k$ (pH) | Simulated $Zopt_k$ ($\Omega$) |
|---|----------------------------|------------------------|----------------------------|------------------------|-------------------------------|
| 1 | -                          | -                      | 53.9                       | 62                     | 7.3-j1.4                      |
| 2 | 367                        | 375                    | 97.4                       | 110                    | 13+j2                         |
| 3 | 114                        | 150                    | -                          | -                      | 17+j15                        |

Table 4.1: Calculated and Design Parameter Summary of Cstack and Lstack

It can be seen from the table that the calculated Lstack and Cstack values are in fair agreement with the final design values used in simulation to obtain the required phase alignment. Moreover, the simulated $R_{opt}$ is close to the load pull simulated $R_{opt}$ value. The optimum load line does not scale linearly, which affects the voltage swing of the first and second stages. Both stages see a slightly larger $R_{opt}$ than desired, which causes them to saturate and swing below the knee region of the transistor, in turn causing non-linear operation of the transistor and generating undesired higher order harmonics.
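The calculated columns of Table 4.1 can be approximately reproduced from (4.19) and (4.29), as sketched below. The single-device $R_{opt}$ is not stated explicitly in this section, so the value of 7 $\Omega$ used here is an assumption, chosen because it is close to the simulated $Zopt_1$ in the table. All three transistors are assumed identical and Cdsub is neglected, as in the text.

```python
import math

# Extracted values quoted in the text for the 180 um device
C_GS, C_GD, C_DS = 130e-15, 60e-15, 53e-15  # farads
GM = 260e-3                                 # siemens
FREQ_HZ = 60e9
R_OPT = 7.0  # ASSUMPTION: single-device R_opt in ohms, not given in the text


def c_stack(k, gm=GM, r_opt=R_OPT, c_gs=C_GS, c_gd=C_GD):
    """Stack capacitance of the k-th device from (4.19), valid for k >= 2."""
    x = gm * r_opt
    return (c_gs + c_gd * (1 + x)) / ((k - 1) * x - 1)


def l_stack(k, freq_hz=FREQ_HZ, gm=GM, r_opt=R_OPT,
            c_gs=C_GS, c_gd=C_GD, c_ds_k=C_DS, c_ds_k1=C_DS, c_dsub=0.0):
    """Shunt inter-stack inductance above the k-th device from (4.29)."""
    omega = 2 * math.pi * freq_hz
    c_eff = ((c_ds_k - c_ds_k1) / k
             + c_gs / (k * gm * r_opt)
             + (c_gd + k * c_dsub) / k)
    return 1 / (omega ** 2 * c_eff)


c2, c3 = c_stack(2), c_stack(3)
l1, l2 = l_stack(1), l_stack(2)

print(f"Cstack_2 = {c2 * 1e15:.0f} fF, Cstack_3 = {c3 * 1e15:.0f} fF")
print(f"Lstack_1 = {l1 * 1e12:.1f} pH, Lstack_2 = {l2 * 1e12:.1f} pH")
```

With this assumed $R_{opt}$, both capacitances and $Lstack_1$ land within a few percent of the calculated column. $Lstack_2$ comes out somewhat higher than the tabulated 97.4 pH, so the original calculation presumably used inputs or terms that are not captured by this literal reading of (4.29).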

Figure 4.17 shows a comparison of the simulated drain-to-source voltage Vds for each stage in the 3-stack FET configuration. Figure 4.17a clearly demonstrates the importance of the shunt inductor for the phase alignment and symmetry of the output voltages in comparison to Figure 4.17b, where the symmetry is broken due to the absence of the inter-stack matching network. Moreover, the junction voltage swings in Figures 4.18 and 4.19 show that an even distribution of voltages across each FET in the stack, improved reliability and in-phase alignment are achieved in the presence of $C_{stack}$ and $L_{stack}$.

Figure 4.20 compares the output power and drain efficiency of the 3-stack PA with voltage swing alignment and misalignment, as shown in Figure 4.17. Figure 4.20a shows a decrease of around 0.8 dB at $OP_1$ dB when no inter-stack matching networks are used. Moreover, Figure 4.20b shows a degradation of 2.9% in the drain efficiency at peak saturated output power without inter-stack matching. It is important to note that the above simulations were based on ideal lossless input and output matching networks in order to isolate the impact of the inter-stack matching network. Although the performance loss without the inter-stack matching network may seem marginal, in a full design the loss contribution from the output and input matching networks and the addition of more stages make the use of inter-stack matching crucial to achieving optimum performance.



Figure 4.17: Simulated drain to source voltage swing Vds (a) with and (b) without interstack matching network synthesized by shunt inductor

### 4.3.2 Final Circuit

Similar to the 2-stage driver, the manifold layout based 30 $\mu$m unit cell was used to construct a 180 $\mu$m transistor. Furthermore, a 3-stacked 180 $\mu$m cell was created with inter-stack access nodes to connect the shunt inductors. The shunt inductors were realized using a top and sidewall shielded CPW as shown in Figure 3.12b. The advantage of using such stubs is the reduced coupling between the inter-stack matching stubs and the input and output matching stubs, given that they are implemented in different metal layers and are shielded from one another. As the physical size of the active area of the transistor is small in comparison to the CPW lines used for the design, the close proximity of the parallel stubs caused unwanted mutual coupling affecting the synthesized inductances used for matching.

Figure 4.18: Simulated drain to gate voltage swing Vdg (a) with and (b) without inter-stack matching network synthesized by shunt inductor

Figure 4.19: Simulated gate to source voltage swing Vgs (a) with and (b) without inter-stack matching network synthesized by shunt inductor

Figure 4.20: Simulated (a) AM-AM and (b) drain efficiency performance of the 3-stack PA with and without inter-stack matching network synthesized by shunt inductor

The nominal variation in the inductance value provided by the shunt inter-stack CPWs, CPW3 and CPW4 in Figure 4.21, only marginally affected the output performance of the PA [17]. However, coupling of these stubs with the input and output stubs can severely degrade the overall performance. One solution is to place the stubs orthogonally to prevent coupling, at the cost of larger area consumption. Alternatively, implementing the shunt CPW lines in a top and sidewall shielded configuration (Figure 3.12b) allows the lower metal layer UA to be used as the CPW line and LB as the top shield. One disadvantage of this topology is the capacitance added by the top shield, which reduces the inductance of the stub. Hence, longer and narrower lines were required to increase the effective inductance and characteristic impedance of the stub. The inter-stack stubs CPW3 and CPW4 were implemented with a width of 5 $\mu$m and a gap of 14 $\mu$m. CPW1, the input matching stub, has a width of 7 $\mu$m and a gap of 12.5 $\mu$m and is implemented as a SW-CPW in the topmost metal layer (LB) with a slotted shield in B3, as discussed in Section 3.3.3. The output matching stub CPW2 is similar to the input stub, with a width of 11 $\mu$m and a gap of 12.5 $\mu$m. The 3-stack PA occupies a total area of 860 $\mu$m x 680 $\mu$m. The final layout of the 3-stack PA is shown in Figure 4.22.
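The trade-off between stub length, characteristic impedance and synthesized inductance can be illustrated with the standard lossless-line relation for a short-circuited stub, $L_{eq} = Z_0 \tan(\beta l)/\omega$. The following is a minimal sketch only: it assumes the shunt stubs behave as ideal AC-shorted lines, and the characteristic impedances, effective permittivity and lengths are hypothetical values chosen for illustration, not dimensions or extracted parameters from this design.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def shorted_stub_inductance(z0_ohm, eps_eff, length_m, freq_hz):
    """Equivalent inductance (H) of a lossless short-circuited line.

    Valid only while the stub is shorter than a quarter wavelength,
    where its input impedance j*Z0*tan(beta*l) is inductive.
    """
    beta = 2.0 * math.pi * freq_hz * math.sqrt(eps_eff) / C0  # phase constant, rad/m
    theta = beta * length_m  # electrical length, rad
    if not 0.0 < theta < math.pi / 2.0:
        raise ValueError("stub must be shorter than a quarter wavelength")
    return z0_ohm * math.tan(theta) / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    f0 = 60e9  # design frequency from the text
    # Hypothetical Z0, eps_eff and lengths (assumptions, not thesis data)
    for z0 in (40.0, 60.0):
        for length_um in (100.0, 200.0):
            l_eq = shorted_stub_inductance(z0, 4.0, length_um * 1e-6, f0)
            print(f"Z0={z0:.0f} ohm, l={length_um:.0f} um -> L={l_eq * 1e12:.1f} pH")
```

Under these assumptions, a lower $Z_0$ (for example from added shield capacitance) yields a smaller inductance for the same length, which is consistent with the need for longer and narrower lines noted above.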



Figure 4.21: Simplified schematic of the single-ended 3-stack PA

#### 4.3.3 Simulation Results

The plots in Figure 4.23 show the simulated small signal parameters of the three-stack PA using the setup shown in Figure 4.12a. Moreover, the AM-AM and AM-PM curves in Figure 4.24 were obtained under large signal simulation. The three-stack PA achieved a measured small signal gain of 8.8 dB. Under large signal simulation, a saturated output power of 16 dBm is obtained at a PAE of 14%. Simulations are performed at a supply voltage, VDD, of 3.5 V for the 3-stack. The gate bias of each transistor in the stack is set to ensure an equal Vds drop (1.1 V), with gate biasing values Vgs1 = 0.45 V, Vgs2 = 1.5 V and Vgs3 = 2.6 V.
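As a quick sanity check on the bias point, the sketch below converts the three listed gate bias values into per-device gate-to-source voltages. It rests on an assumption that is not stated in the text: that the listed values are gate node voltages referred to ground, and that each device drops exactly the quoted 1.1 V, so the source of the n-th device sits at (n-1) x 1.1 V. The function name is hypothetical.

```python
def stack_vgs(gate_voltages, vds_per_device):
    """Per-device gate-to-source voltage in a series FET stack.

    Assumes (not confirmed by the source) that gate_voltages are
    referred to ground and every device drops the same Vds.
    """
    vgs = []
    for index, vg in enumerate(gate_voltages):
        source_voltage = index * vds_per_device  # stacked drops below this device
        vgs.append(round(vg - source_voltage, 3))
    return vgs


if __name__ == "__main__":
    gate_bias = [0.45, 1.5, 2.6]  # values quoted in the text, in volts
    print(stack_vgs(gate_bias, 1.1))
```

Under this reading, the three devices see gate-to-source voltages of about 0.45 V, 0.4 V and 0.4 V, i.e. a similar bias for every transistor in the stack.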



Figure 4.22: Layout of the 3-stack PA

## 4.4 A 3-stage 60 GHz Power Amplifier in 45 nm CMOS SOI

The 3-stage design combines the 2-stage cascode driver and the 3-stack PA discussed in Sections 4.2 and 4.3, respectively. The 3-stage PA was targeted to achieve 16 dBm from the 3-stack FET PA. To improve the overall gain, the 2-stage cascode driver was cascaded with the 3-stack device to achieve a target gain of 20 dB or higher.

This section describes the design of the cascode driver and the stacked-FET power stage, which are interfaced with each other through an interstage matching network.



Figure 4.23: Small signal results for (a) forward gain  $(S_{21})$ , (b) input reflection coefficient  $(S_{11})$ , (c) output reflection coefficient  $(S_{22})$  and (d) isolation  $(S_{12})$  for 3-stack PA



Figure 4.24: Simulated large signal performance showing (a) AM-AM and efficiency and (b) AM-PM performance of 3-stack PA at 60 GHz

#### 4.4.1 Input, Inter-stage and Output Matching Networks

During the design of the interstage matching, two possible directions can be adopted. To achieve the maximum power transfer condition between the stages, conjugate matching can be applied between the driver stage and the output stage. However, a conjugate match does not guarantee optimum power and efficiency performance of the driver stages. Hence, an interstage matching network is needed to transform the optimum source impedance of the power stage to the optimum load of the driver stage, to maintain the output power and efficiency of the system. Similar to the interstage topology applied for the two-stage cascode driver (Figure 4.9), an L-C-L based inter-stage topology was applied between the driver and the output stage to match $Z_{opt}$ of the driver stage to $Z_{source}^*$ of the power stage. Figure 4.26 shows the overall schematic of the 3-stage single-ended PA. CPW lines in a SW-CPW configuration, with the signal and ground in the LB layer and a slotted shield in the B3 layer, were used to realize inductances at 60 GHz. CPW3 and CPW5 were implemented as TS-CPW with the signal layer in UA to reduce the mutual coupling between the lines. Moreover, the inter-stack stub was implemented as a microstrip line due to area constraints around the inter-stack junction of the transistor.

The inter-stage matching network between the two stages of the cascode driver was reduced from L-C-L based matching to a single-section L-C matching. Furthermore, one of the inter-stack matching stubs between transistor stages M5 and M6 was removed because of area and coupling concerns. Although the inter-stack stubs are important to provide the correct phase alignment and optimum load impedance to the stage for optimum efficiency, the degradation was observed to be within 1% in terms of PAE. For the input and output matching networks, an L-L section matching topology is applied, as shown in Figure 4.25. The lengths *l1* and *l2* were optimized to obtain the right impedances while minimizing the chip area.
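To give a feel for the element values involved in a single-section L-C match at 60 GHz, the following sketch sizes a textbook low-pass L-section (series inductor, shunt capacitor across the larger resistance). It is a simplified illustration, not the procedure used for this design: it assumes purely real terminations, whereas $Z_{opt}$ and $Z_{source}^*$ are in general complex, and the 12 $\Omega$ and 50 $\Omega$ values are hypothetical.

```python
import math


def l_section(r_a_ohm, r_b_ohm, freq_hz):
    """Size a low-pass L-section between two real resistances.

    Returns (Q, series inductance in H, shunt capacitance in F).
    The shunt capacitor is placed across the larger resistance and
    the series inductor faces the smaller one.
    """
    r_low, r_high = sorted((r_a_ohm, r_b_ohm))
    if r_low <= 0.0:
        raise ValueError("resistances must be positive")
    q = math.sqrt(r_high / r_low - 1.0)   # loaded Q of the section
    omega = 2.0 * math.pi * freq_hz
    series_l = q * r_low / omega          # X_L = Q * R_low
    shunt_c = q / (r_high * omega)        # B_C = Q / R_high
    return q, series_l, shunt_c


if __name__ == "__main__":
    # Hypothetical terminations (assumptions, not thesis data)
    q, l_h, c_f = l_section(12.0, 50.0, 60e9)
    print(f"Q={q:.2f}, L={l_h * 1e12:.1f} pH, C={c_f * 1e15:.1f} fF")
```

With terminations of this order, the required inductance is in the tens of picohenries, which is the range that short on-chip CPW stubs can realize.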



Figure 4.25: L-L section matching topology for (a) input and (b) output of the 3-stage PA

#### 4.4.2 Simulation Results

The overall schematic and layout of the implemented 3-stage PA are shown in Figures 4.26 and 4.27, respectively. Its performance was verified through post-layout simulation. The plots in Figure 4.28 show the simulated small signal parameters of the 3-stage PA. Moreover, the AM-AM and AM-PM characteristics are shown in Figure 4.29. The simulation results show the three-stage PA achieving a flat small signal gain of 21.5 dB with a 1 dB bandwidth from 52 GHz to 64 GHz, and a saturated output power of 16 dBm at a PAE of 13.8% at 60 GHz. The design occupies a total die area of 990 $\mu$m x 760 $\mu$m. From Figure 4.29c, it can be seen that from 52 GHz to 60 GHz a saturated output power of 16 dBm and above was obtained, with a PAE in the range of 11% to 13.8%.
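As a cross-check of the quoted bandwidth, and assuming the fractional bandwidth is referred to the arithmetic centre of the 1 dB band, the 52 GHz to 64 GHz range corresponds to

$$
f_c = \frac{52 + 64}{2} = 58\ \text{GHz}, \qquad \text{FBW} = \frac{64 - 52}{58} \approx 20.7\%,
$$

which agrees with the roughly 20% fractional bandwidth cited for the 3-stage PA.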



Figure 4.26: Simplified schematic of the single-ended 3-stage PA

## 4.5 Higher Order Harmonic Control for 60 GHz PA Design

Figure 4.27: Layout of proposed 3-stage PA

In traditional PA designs, higher order harmonics can be controlled to shape the output voltage and current waveforms to attain high efficiency. For linear PA classes, such as class B, it is important to provide a short circuit at the higher order harmonics for optimum performance. However, the designs at 60 GHz showed minimal difference in performance with and without an ideal higher order harmonic short. This was observed because the device parasitic capacitances provide very low impedances (at 60 GHz), causing any higher order harmonics to be shorted. Figure 4.30 shows the second harmonic load pull and source pull simulations performed on a 90 $\mu$m cascode device used as the output stage for the two-stage PA analyzed in Section 4.2. The contours show insensitivity to the second harmonic load and source impedance, 2fo = 120 GHz, as the impedance is swept across the Smith chart. At the 1 dB output compression point (Figure 4.30a), the second harmonic (120 GHz) load pull contours show a total variation of 3.8% in PAE and 0.5 dB in output power. The second harmonic sweep of the source impedance of the transistor also shows less sensitivity when the device is driven beyond the $OP_{1dB}$ point of 15 dBm output power: the PAE changes by only 0.6%, whereas the output power remains constant. In conclusion, synthesizing higher order harmonic terminations at 120 GHz and 180 GHz (2fo and 3fo) is difficult, as the output capacitance of the transistors provides a very low impedance at these frequencies [17]. However, it is important to set appropriate biasing (e.g. to prevent knee region intrusion under large signal) to minimize higher order harmonic generation, or to shape the current and voltage waveforms.
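The argument that the device capacitance itself terminates the harmonics can be made concrete with the reactance of an ideal capacitor, $|X_C| = 1/(2\pi f C)$. The sketch below evaluates it at the fundamental and at 2fo and 3fo. The 50 fF value is a hypothetical placeholder, not an extracted output capacitance of the 90 $\mu$m device, and a real transistor output is not a pure capacitor.

```python
import math


def capacitor_reactance_ohm(c_farad, freq_hz):
    """Magnitude of the reactance of an ideal capacitor, in ohms."""
    return 1.0 / (2.0 * math.pi * freq_hz * c_farad)


if __name__ == "__main__":
    c_out = 50e-15  # hypothetical output capacitance (assumption)
    f0 = 60e9       # fundamental frequency from the text
    for harmonic in (1, 2, 3):
        x_c = capacitor_reactance_ohm(c_out, harmonic * f0)
        print(f"{harmonic}fo = {harmonic * f0 / 1e9:.0f} GHz: |Xc| = {x_c:.1f} ohm")
```

Because the reactance falls inversely with frequency, whatever shunt impedance the capacitance presents at 60 GHz is halved at 120 GHz and reduced to a third at 180 GHz, so any external harmonic termination appears in parallel with an already low impedance.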



Figure 4.28: Simulated small signal performance for (a) forward gain $(S_{21})$, (b) input reflection coefficient $(S_{11})$, (c) output reflection coefficient $(S_{22})$ and (d) isolation $(S_{12})$ of the 3-stage PA



Figure 4.29: Simulated large signal performance showing (a) AM-AM and PAE at 60 GHz, (b) AM-PM at 60 GHz and (c) PAE, $OP_{1dB}$ and $P_{sat}$ across 20% fractional bandwidth for the 3-stage PA



Figure 4.30: Simulated output power and PAE contours for second harmonic, 120 GHz, (a) load pull and (b) source pull on a 90  $\mu$ m cascode transistor

# Chapter 5

## **Conclusions and Future Work**

### 5.1 Conclusions

The primary impetus behind moving to the untapped mm-wave spectrum is to meet the need for high data rate communications. However, high-frequency communication provides an unfavourable environment for signal propagation, resulting in more attenuation for a given distance. The key motivation driving exploration of the 60 GHz spectrum is the availability of 5-7 GHz of unlicensed bandwidth. The investigations began with a focus on applications in wireless personal area networks (WPAN) to enable high data-rate applications such as high-speed internet access, high-definition video streaming and a wireless data bus for cable replacement.

Moreover, the scaling of CMOS technology has improved its high-frequency capabilities, providing a cheaper and more integrated alternative to existing non-silicon based technologies (e.g. GaAs, InP and GaN). Apart from CMOS scaling, the use of insulating substrates such as silicon on insulator (SOI) has favourably impacted the use of CMOS technology in RF applications, where the performance is critically affected by conductor and substrate losses.

This thesis has explored the use of nano-scale CMOS-SOI technology for PA applications in the 60 GHz spectrum. Emphasis was placed on the stacked-FET power combining technique as a way to increase the output power, the reliability under large signal operation and the ease of wide-band matching. Chapter 2 gave an overview of various PA classes and their pertinence to mm-wave PA designs. Moreover, an overview of various mm-wave PA designs for the Q- and V-bands examined the need for power combining architectures and lower loss passives to maximize combining efficiency. Chapter 3 presented an analysis of various CPW topologies to realize on-chip transmission lines for the matching networks. The SW-CPW provided the best performance, with the highest Q-factor of 13 at 60 GHz. Chapter 4 presented the design of a 3-stage amplifier consisting of a 3-stack power stage and a 2-stage cascode based driver amplifier. The power stage and the driver stage were each analyzed in separate breakout circuits, with performance results presented.

Overall, the feasibility of SOI for output powers ranging from 10 dBm to 18 dBm was tested by the designed PAs. A total measured gain of 18.1 dB and an output power of 13.6 dBm were achieved for the 2-stage cascode driver stage, with a peak PAE of 21%. Similarly, the 3-stack achieved 8.8 dB, 16 dBm and 16% for gain, output power and PAE, respectively. The post-layout simulated results for the integrated full 3-stage design showed 21.5 dB gain across 20% bandwidth, 16 dBm output power and a peak PAE of 13.8%.

### 5.2 Future Work

Two issues raised by this work are worth further investigation. Firstly, the measured and simulated AM-AM characteristics exhibit soft compression, which indicates non-linear behaviour of the PA. Improving the linearity of PAs at mm-wave should be investigated with on-chip approaches. Linearization techniques can be categorized into predistortion, feedback, feedforward and other techniques that combine these approaches. Traditional techniques such as digital predistortion, commonly applied to high-power PAs, incur a power overhead when implemented for low output power PAs. Moreover, processing in the digital domain for multi-GHz bandwidths would require high-speed analogue-to-digital and digital-to-analogue converters with a sampling frequency of at least five to ten times the signal bandwidth. On the other hand, feedback techniques raise concerns of instability and low gain, which are critical factors to consider at mm-wave frequencies.
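To put the converter requirement in numbers, the five-to-ten-times rule of thumb stated above can be written for a signal bandwidth $B$ as

$$
f_s \geq kB, \qquad 5 \leq k \leq 10 .
$$

As an illustration only, taking a hypothetical modulated signal bandwidth of $B = 2$ GHz (an assumed value, not one used in this work) gives $f_s$ between 10 GS/s and 20 GS/s, which indicates why a purely digital approach is costly for low output power mm-wave PAs.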

The work in [38] applies predistortion to a 60 GHz CMOS PA using an on-chip predistortion linearizer. A cold-mode MOSFET linearizer at the input of the PA provides a gain expansion characteristic that counteracts the output gain compression of the PA; this is achieved through the variable resistance that the cold-mode MOSFET presents at the input of the PA. Furthermore, a post-linearization technique was recently proposed in [39] that uses an additional linearization amplifier at the output of the PA. Although that work is demonstrated for GaN PAs, the integration of linearizing amplifiers offers a potential on-chip solution for integrated mm-wave PAs.

Another potential area for future investigation is improving the back-off efficiency of PAs. The measurement results presented in this work are based on a CW stimulus; in practice, however, PAs are driven by complex modulated signals with varying PAPR and suffer from poor back-off efficiency. Previous work has demonstrated improved back-off efficiency for high-PAPR signals using the Doherty technique at E-band [40]. Figure 5.1 below shows the traditional parallel load connected Doherty amplifier. As the Doherty amplifier relies on impedance inverters, implementing quarter-wavelength lines in silicon poses a bottleneck in achieving high performance, due to conductor and substrate losses. To avoid the losses of the quarter-wave lines and improve the gain of the peaking amplifier, the work in [41] demonstrates the use of a pre-amplifier at the input of the peaking amplifier, at the cost of reduced efficiency. Apart from the Doherty technique, another efficiency enhancement technique applied at mm-wave is outphasing, demonstrated at 60 GHz with modulated signals [42].
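The back-off problem and the benefit of the Doherty architecture can be illustrated with the textbook idealized drain efficiency expressions for a class-B amplifier and a symmetric two-way Doherty amplifier. This is a sketch under strong assumptions (ideal devices with zero knee voltage, lossless combiner and inverter, class-B bias), so the absolute numbers are upper bounds and do not represent the PAs designed in this work or the results of [40] and [41]. The function names are hypothetical.

```python
import math


def class_b_efficiency(v):
    """Ideal class-B drain efficiency at normalized output voltage v (0 < v <= 1)."""
    if not 0.0 < v <= 1.0:
        raise ValueError("v must be in (0, 1]")
    return (math.pi / 4.0) * v


def doherty_efficiency(v):
    """Ideal symmetric two-way Doherty drain efficiency at normalized output voltage v."""
    if not 0.0 < v <= 1.0:
        raise ValueError("v must be in (0, 1]")
    if v <= 0.5:
        # Only the main amplifier conducts, loaded with twice its optimum resistance
        return (math.pi / 2.0) * v
    # Peaking amplifier active, main amplifier held at voltage saturation
    return (math.pi / 2.0) * v * v / (3.0 * v - 1.0)


if __name__ == "__main__":
    for backoff_db in (0.0, 3.0, 6.0, 9.0):
        v = 10.0 ** (-backoff_db / 20.0)  # output voltage relative to peak
        print(f"{backoff_db:>4.1f} dB back-off: "
              f"class B {100 * class_b_efficiency(v):5.1f}%, "
              f"Doherty {100 * doherty_efficiency(v):5.1f}%")
```

In this idealized picture, the Doherty amplifier returns to its peak efficiency at 6 dB back-off, whereas the class-B efficiency falls in proportion to the output voltage. Passive losses at mm-wave frequencies reduce this advantage in practice, which is the bottleneck noted above.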



Figure 5.1: Traditional Doherty PA

# References

- [1] J. Wannstrom, "Carrier aggregation explained," June 2013. [Online]. Available: //www.3gpp.org/technologies/keywords-acronyms/101-carrier-aggregation-explained
- [2] 4G Americas, "5G spectrum recommendations," August 2015. [Online]. Available: http://www.4gamericas.org/files/6514/3930/9262/4G\_Americas\_5G\_Spectrum\_Recommendations\_White\_Paper.pdf
- [3] Federal Communications Commission, "Part 15 rules for unlicensed operation in the 57-64 GHz band," August 2009. [Online]. Available: https://www.fcc.gov/document/part-15-rules-unlicensed-operation-57-64-ghz-band
- [4] T. Baykas, C. S. Sum, Z. Lan, J. Wang, M. A. Rahman, H. Harada, and S. Kato, "IEEE 802.15.3c: the first IEEE wireless standard for data rates over 1 Gb/s," *IEEE Communications Magazine*, vol. 49, no. 7, pp. 114–121, Jul. 2011.
- [5] R. C. Daniels and R. W. Heath Jr., "60 GHz wireless communications: emerging requirements and design recommendations," *IEEE Vehicular Technology Magazine*, vol. 2, no. 3, pp. 41–50, Sep. 2007.
- [6] S. C. Cripps, *RF Power Amplifiers for Wireless Communications*, 2nd ed. Artech House, 2006.
- [7] S. C. Cripps, P. J. Tasker, A. L. Clarke, J. Lees, and J. Benedikt, "On the continuity of high efficiency modes in linear RF power amplifiers," *IEEE Microwave and Wireless Components Letters*, vol. 19, no. 10, pp. 665–667, Oct. 2009.
- [8] C. Y. Law and A. V. Pham, "A high-gain 60GHz power amplifier with 20dBm output power in 90nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. - (ISSCC)*, Feb. 2010, pp. 426–427.

- [9] Y. H. Hsiao, Z. M. Tsai, H. C. Liao, J. C. Kao, and H. Wang, "Millimeter-wave CMOS power amplifiers with high output power and wideband performances," *IEEE Transactions on Microwave Theory and Techniques*, vol. 61, no. 12, pp. 4520–4533, Dec. 2013.
- [10] I. Aoki, S. D. Kee, D. B. Rutledge, and A. Hajimiri, "Distributed active transformer-a new power-combining and impedance-transformation technique," *IEEE Transactions on Microwave Theory and Techniques*, vol. 50, no. 1, pp. 316–331, Jan. 2002.
- [11] Y. N. Jen, J. H. Tsai, T. W. Huang, and H. Wang, "Design and analysis of a 55–71-GHz compact and broadband distributed active transformer power amplifier in 90-nm CMOS process," *IEEE Transactions on Microwave Theory and Techniques*, vol. 57, no. 7, pp. 1637–1646, Jul. 2009.
- [12] S. Aloui, B. Leite, N. Demirel, R. Plana, D. Belot, and E. Kerherve, "High-gain and linear 60-GHz power amplifier with a thin digital 65-nm CMOS technology," *IEEE Transactions on Microwave Theory and Techniques*, vol. 61, no. 6, pp. 2425–2437, Jun. 2013.
- [13] Y. Zhao and J. R. Long, "A wideband, dual-path, millimeter-wave power amplifier with 20 dBm output power and PAE above 15% in 130 nm SiGe-BiCMOS," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 9, pp. 1981–1997, Sep. 2012.
- [14] D. Zhao and P. Reynaert, "A 60-GHz dual-mode class AB power amplifier in 40-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 10, pp. 2323–2337, Oct. 2013.
- [15] S. V. Thyagarajan, A. M. Niknejad, and C. D. Hull, "A 60 GHz drain-source neutralized wideband linear power amplifier in 28 nm CMOS," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 8, pp. 2253–2262, Aug. 2014.
- [16] A. K. Ezzeddine and H. C. Huang, "The high voltage/high power FET (HiVP)," in *Proc. IEEE Radio Frequency Integrated Circuits (RFIC) Symp*, Jun. 2003, pp. 215–218.
- [17] H. T. Dabag, B. Hanafi, F. Golcuk, A. Agah, J. F. Buckwalter, and P. M. Asbeck, "Analysis and design of stacked-FET millimeter-wave power amplifiers," *IEEE Transactions on Microwave Theory and Techniques*, vol. 61, no. 4, pp. 1543–1556, Apr. 2013.

- [18] A. Agah, J. A. Jayamon, P. M. Asbeck, L. E. Larson, and J. F. Buckwalter, "Multidrive stacked-FET power amplifiers at 90 GHz in 45 nm SOI CMOS," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 5, pp. 1148–1157, May 2014.
- [19] J. H. Chen, S. R. Helmi, R. Azadegan, F. Aryanfar, and S. Mohammadi, "A broadband stacked power amplifier in 45-nm CMOS SOI technology," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 11, pp. 2775–2784, Nov. 2013.
- [20] A. Chakrabarti and H. Krishnaswamy, "High-power high-efficiency class-E-like stacked mmWave PAs in SOI and bulk CMOS: Theory and implementation," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 8, pp. 1686–1704, Aug. 2014.
- [21] R. Bhat, A. Chakrabarti, and H. Krishnaswamy, "Large-scale power combining and mixed-signal linearizing architectures for Watt-class mmwave CMOS power amplifiers," *IEEE Transactions on Microwave Theory and Techniques*, vol. 63, no. 2, pp. 703–718, Feb. 2015.
- [22] B. Hanafi, O. Gürbüz, H. Dabag, J. F. Buckwalter, G. Rebeiz, and P. Asbeck, "Q-band spatially combined power amplifier arrays in 45-nm CMOS SOI," *IEEE Transactions on Microwave Theory and Techniques*, vol. 63, no. 6, pp. 1937–1950, Jun. 2015.
- [23] M. Kaifi and M. J. Siddiqui, "Kink model for SOI MOSFET," in *Proc. Int Multimedia, Signal Processing and Communication Technologies (IMPACT) Conf*, Dec. 2011, pp. 216–219.
- [24] B. Heydari, P. Reynaert, E. Adabi, M. Bohsali, B. Afshar, M. A. Arbabian, and A. M. Niknejad, "A 60-GHz 90-nm CMOS cascode amplifier with interstage matching," in *Proc. European Microwave Integrated Circuit Conf. EuMIC 2007*, Oct. 2007, pp. 88–91.
- [25] O. Inac, M. Uzunkol, and G. M. Rebeiz, "45-nm CMOS SOI technology characterization for millimeter-wave applications," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 6, pp. 1301–1311, Jun. 2014.
- [26] J. P. Aaen and J. Wood, *Modeling and Characterization of RF and Microwave Power FETs*. Cambridge University Press, 2007.
- [27] J. Zhan, "Millimeter-wave transistor device," U.S. Patent US 8,653,564 B1, February, 2014.

- [28] B. Heydari, M. Bohsali, E. Adabi, and A. M. Niknejad, "Millimeter-wave devices and circuit blocks up to 104 GHz in 90 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 12, pp. 2893–2903, Dec. 2007.
- [29] G. Dambrine, A. Cappy, F. Heliodore, and E. Playez, "A new method for determining the FET small-signal equivalent circuit," *IEEE Transactions on Microwave Theory and Techniques*, vol. 36, no. 7, pp. 1151–1159, Jul. 1988.
- [30] H. Hasegawa, M. Furukawa, and H. Yanai, "Properties of microstrip line on Si-SiO<sub>2</sub> system," *IEEE Transactions on Microwave Theory and Techniques*, vol. 19, no. 11, pp. 869–881, Nov. 1971.
- [31] A. M. Niknejad, *Electromagnetics for High-Speed Analog and Digital Communication Circuits*. Cambridge University Press, 2007.
- [32] T. S. D. Cheung and J. R. Long, "Shielded passive devices for silicon-based monolithic microwave and millimeter-wave integrated circuits," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 5, pp. 1183–1200, May 2006.
- [33] S. Seki and H. Hasegawa, "Cross-tie slow-wave coplanar waveguide on semi-insulating GaAs substrates," *Electronics Letters*, vol. 17, no. 25, pp. 940–941, Dec. 1981.
- [34] A. M. Niknejad and H. Hashemi, Eds., *mm-Wave Silicon Technology*. Springer, 2007.
- [35] N. Guo, R. C. Qiu, S. S. Mo, and K. Takahashi, "60 GHz millimeter-wave radio: Principle, technology, and new results," *EURASIP Journal on Wireless Communications and Networking*, vol. 2007, Article ID 68253, 8 pages, doi:10.1155/2007/68253, 2006.
- [36] A. Larie, E. Kerhervé, B. Martineau, V. Knopik, and D. Belot, "A 1.2V 20 dBm 60 GHz power amplifier with 32.4 dB gain and 20% peak PAE in 65nm CMOS," in *Proc. ESSCIRC 2014 - 40th European Solid State Circuits Conf. (ESSCIRC)*, Sep. 2014, pp. 175–178.
- [37] W. L. Chan and J. R. Long, "A 58–65 GHz neutralized CMOS power amplifier with PAE above 10% at 1-V supply," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 3, pp. 554–564, Mar. 2010.
- [38] J. H. Tsai, C. H. Wu, H. Y. Yang, and T. W. Huang, "A 60 GHz CMOS power amplifier with built-in pre-distortion linearizer," *IEEE Microwave and Wireless Components Letters*, vol. 21, no. 12, pp. 676–678, Dec. 2011.

- [39] Y. Hu and S. Boumaiza, "Power-scalable wideband linearization of power amplifiers," *IEEE Transactions on Microwave Theory and Techniques*, vol. 64, no. 5, pp. 1456–1464, May 2016.
- [40] E. Kaymaksut, D. Zhao, and P. Reynaert, "Transformer-based Doherty power amplifiers for mm-wave applications in 40-nm CMOS," *IEEE Transactions on Microwave Theory and Techniques*, vol. 63, no. 4, pp. 1186–1192, Apr. 2015.
- [41] A. Agah, H. T. Dabag, B. Hanafi, P. M. Asbeck, J. F. Buckwalter, and L. E. Larson, "Active millimeter-wave phase-shift Doherty power amplifier in 45-nm SOI CMOS," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 10, pp. 2338–2350, Oct. 2013.
- [42] D. Zhao, S. Kulkarni, and P. Reynaert, "A 60-GHz outphasing transmitter in 40-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 12, pp. 3172–3183, Dec. 2012.