# **Euro DesignCon 2004**

# Enabling the PCI Express<sup>™</sup> ramp – ATE based testing of PCI Express architecture

Hubert Werkmann, PCI-SIG<sup>®</sup>/Agilent Technologies [e-mail : hubert\_werkmann@agilent.com]

# Abstract

This paper will start with an introduction into PCI Express architecture. It will highlight what the pre-requisites for testing PCI Express are and what test strategies have to be followed for testing this type of device efficiently. Besides the raw speed support and the adaptation to embedded clock interface specifics, emphasis will be put on the protocol induced difficulties for high volume testing. Proposals regarding design for testability (DfT) features that go beyond the ones that are mandatory for PCI Express will be made to reduce test and configuration time of the device under test. Besides these DfT features, the key electrical parameters and ways to test them efficiently on an ATE system will be described based on measurements on real PCI Express silicon.

# Author(s) Biography

Hubert Werkmann received his diploma in Computer Science and his Ph.D. from the University of Stuttgart, Germany. From 1991 to 2000, he was with IMS-Chips, where he worked in the areas of Multi-Chip-Module and chip design. In 2000, Hubert joined Agilent Technologies as an Application Engineer. He is now with the Center of Expertise of Agilent's SOC business unit, working on ATE test solutions for future high-speed computation applications. Hubert is a member of PCI-SIG and was involved in multiple ATE based test implementations of PCI Express devices.

# Disclaimer

PCI-SIG disclaims all warranties and liability for the use of this document and the information contained herein and assumes no responsibility for any errors that may appear in this document, nor does PCI-SIG make a commitment to update the information contained herein.

PCI-SIG, PCI Express and PCIe are registered trademarks or trademarks of PCI-SIG.

\*) All other product names are trademarks, registered trademarks or service marks of their respective owners.

# Introduction

Over the last two decades, PC desktop systems developed from office workhorses to home multimedia and entertainment platforms. This transformation went hand in hand with a continuously increasing bandwidth demand on the interfaces between the core components of the systems and their front-end components like graphics cards, add-in cards and network connections. The need for higher bandwidth on these interfaces was satisfied by scaling the data transfer rates and bus widths within one I/O interface generation. This scaling within I/O interface generations led, for example, to the evolution of the EISA interface out of ISA or AGP out of PCI. The only I/O generation step so far occurred in the 1990s from the ISA era to PCI and was driven by the need for a common standard that offered better scalability options for the future than ISA. With the current transition of PC desktops to home server systems, and the resulting further increase in bandwidth needs, the scalability of the existing PCI and AGP interfaces is running out of steam. In addition, more modular and more user-friendly system designs are required to facilitate a wider acceptance of such systems as a replacement for existing consumer devices in a living environment. Thus, as a logical consequence, a new generation step for I/O interfaces is required that, over the next years, will replace PCI and the derivatives that evolved from it with the new I/O interface standard PCI Express (PCIe<sup>™</sup>)[1] as shown in **Figure 1**.



Figure 1: PC desktop bus evolution

The PCI-SIG (Peripheral Component Interconnect Special Interest Group) that already released the well known PCI and PCI-X standards published the first release version of the PCIe base specification in July 2002. Besides just serving as a desktop PC I/O interface, it is also intended that PCIe will play a substantial role in server systems as well as other application areas. First steps in this direction are made with multiple standards PCI-SIG is currently working on, such as the Server IO Module standard, the PCIe cable standard or the wireless form factor standard. Besides PCI-SIG, other standardization organizations also build on the PCIe base specification, like, for example, the Advanced Switching Interconnect SIG (ASI SIG), which works on standards enhancing PCIe with capabilities required for switch-fabric implementations.

Since the initial release of the PCIe base specification, a substantial portion of silicon vendors worked on PCIe implementations and currently are in the rollout phase of their devices. In order to execute this transition from the design verification and characterization to the technology ramp, and finally to volume manufacturing in a smooth way, sufficient test strategies for each of these steps are required. Automated test equipment (ATE) that is capable of testing PCIe silicon during each of these device life cycle steps is a fundamental requirement to enable a smooth ramp of this new I/O interface.

The first section of this paper will give an overview of the PCIe technology and its application areas. After a detailed review of the physical layer of the PCIe architecture, test requirements, PCIe test support and test challenges are identified, and useful design for testability (DfT) implementations to address some of these challenges are described. In the second section, ATE based test implementations are discussed that address the needs and requirements of PCIe device test in the three test relevant device life cycle steps mentioned above.

# **PCI Express Overview**

#### **Fundamentals**

The guiding principle for the development of the PCIe standard was to design an I/O interface that allows scalability along the bandwidth needs of the next decade with system manufacturing costs at or below currently used I/O interfaces. In order to allow a smooth transition from the existing conventional PCI to PCIe interfaces, software compatibility between the two I/O interfaces had to be ensured while considering software handles for the requirements of the future like improved hot-plug support, advanced power management, etc.

Maintaining and emulating the existing operational and configuration capabilities of conventional PCI on the one hand, and enhancing these capabilities with the requirements of the future on the other hand, ensures software compatibility of PCIe with the existing conventional PCI standard. Since configuration parameters are passed from software into hardware on the transaction layer interface of the protocol stack, the levels from this layer down to the physical layer need to be defined by PCIe as shown in **Figure 2**. The interface from the upper layers to the transaction layer is defined in the PCIe architecture as a superset of conventional PCI. With this approach, existing software based on conventional PCI is also usable with PCIe hardware. If new PCIe functionality is to be used, of course, software modifications are required.



Figure 2: PCI Express Protocol Stack

#### **Transaction Layer**

This layer of the PCIe protocol stack is responsible for the formatting of the payload data it gets from the higher layers (usually the operating system interface) into so-called transaction layer packets (TLP). A TLP consists of a header and a payload data section. Depending on the packet type, the header contains information about the packet format, address and routing information and a transaction descriptor that includes a transaction ID, traffic class selectors and other attributes. The packet types supported by the PCIe transaction layer are memory-, I/O-, and configuration-read/write transactions, as well as message transactions. Besides TLP assembly and disassembly, the transaction layer is also responsible for storing link configuration and link capability information. With regards to traffic prioritization, this layer implements virtual channels and traffic classes that allow isochronous data transfers via a PCIe link and the usage of a single physical link as multiple logical links.

#### Data Link Layer

The main purpose of the data link layer is to ensure correct communication between two link partners. In the transmit data direction, the data link layer receives TLPs from the transaction layer. It complements these TLPs with a CRC code for error detection and a packet sequence number and hands this packet to the physical layer below, which is responsible for physical data transmission. In the receive direction, the data link layer receives packets from the physical layer that were formatted in the same way by the data link layer of the partner device. On these packets, this layer performs an error check on the CRC code and, if no error occurred, passes the received packets in TLP format to the transaction layer in the order given by the sequence number of the received packets. In order to recover from potential errors, the data link layer stores transmitted TLPs for potential retries in case the data link layer of the partner device detects an error in a packet and requests re-transmission of the data from the faulty packet onward. Data re-transmission is requested and acknowledged by sending packets that are generated solely within the data link layer. Besides indicating re-transmission requests and acknowledgements, these data link layer packets (DLLP) also exchange flow control information [2] like buffer size updates from link partner to link partner. In its function as a link between the transaction layer and the physical layer, the data link layer also conveys information about link states and power state requests between these two layers.
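The framing and retry behavior described above can be sketched in a few lines of Python. This is a minimal illustration only: `frame_tlp`, `check_frame` and `ReplayBuffer` are hypothetical names, the 12-bit sequence number width is an assumption not stated in this paper, and `zlib.crc32` merely stands in for the link CRC defined by the specification.

```python
import zlib
from collections import OrderedDict

SEQ_MODULO = 4096  # assumption: 12-bit sequence number space


def frame_tlp(seq, tlp):
    """Prefix a 2-byte sequence number and append a 4-byte CRC to a TLP."""
    header = (seq % SEQ_MODULO).to_bytes(2, "big")
    crc = zlib.crc32(header + tlp).to_bytes(4, "big")  # stand-in for the link CRC
    return header + tlp + crc


def check_frame(frame):
    """Return (seq, tlp) if the CRC matches, otherwise None."""
    body, crc = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return int.from_bytes(body[:2], "big"), body[2:]


class ReplayBuffer:
    """Keeps transmitted TLPs until the link partner acknowledges them."""

    def __init__(self):
        self._pending = OrderedDict()

    def store(self, seq, tlp):
        self._pending[seq] = tlp

    def ack(self, seq):
        """Drop every stored TLP up to and including `seq`."""
        if seq not in self._pending:
            return
        for s in list(self._pending):
            del self._pending[s]
            if s == seq:
                break

    def replay_from(self, seq):
        """Return the frames to resend, starting at the faulty packet."""
        keys = list(self._pending)
        start = keys.index(seq)
        return [frame_tlp(s, self._pending[s]) for s in keys[start:]]
```

A transmitter would call `store` for every TLP it sends, `ack` when an acknowledgement DLLP arrives, and `replay_from` when the partner requests re-transmission.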

#### **Physical Layer**

The physical layer of PCIe architecture consists of a logical sub-block and an electrical sub-block. The logical sub-block defines the logical behavior of a data connection like signaling type, data scrambling, data coding and the behavior of two devices connected via a PCIe link over time whereas the electrical sub-block defines the low level electrical details of a connection such as electrical parameters, etc.

On the physical layer, PCIe is defined as a multilane embedded clock interface that uses differential signaling on point-to-point connections without the need for physical sideband signal connections. With embedded clock signaling, clock and data signals are physically combined on the same connection traces by serializing the data to be transferred and embedding clock information in this serial data stream. The transferred data contains not only payload data, but also link control data. Embedded clocking was chosen to eliminate divergence between clock and data information. This divergence occurs due to PCB trace length mismatches between traces that route clock signals and traces that route the distinct data signals. Trace length mismatches lead to relative displacements between related clock and data signals or between related data signals. This clock-to-data and data-to-data divergence is one of the biggest problems limiting data rate scaling of classical clocked bus or source synchronous interfaces. The embedded clock technique allows excellent data rate scalability for future generations of PCIe architecture on FR4 PCBs without the need to fundamentally change the underlying signal generation and data recovery mechanisms. In order to be immune against external influences on the low-voltage swings required in the multi-gigabit range, PCIe uses differential signaling. Embedded clock connections are unidirectional by definition, and thus PCIe interfaces are implemented as dual simplex connections to allow bi-directional data transfers. Another drawback of established interfaces in the computation area that limits data rate scaling is the fact that these interfaces usually are implemented as multi-drop bus structures. To avoid signal reflections caused by trace branches and stub connections, especially at high data rates, PCIe exclusively uses a point-to-point interconnection scheme.
Thus, a fundamental PCIe link consists of two low-voltage, differentially driven signal pairs, a transmit pair, and a receive pair that establish a connection between two partner devices as shown in Figure 3. This combination of a single transmitter and single receiver also is referred to as one lane.



Figure 3: Fundamental PCI Express Link

In order to achieve scalability not only on the data rate axis, but also within one data rate class, PCIe makes use of a so-called multilane concept. This means that a data link between devices can consist of multiple lanes. If there is more than one lane that makes up the link, a logical data stream in the transmitting device is split into several sub-streams, which are serialized separately as shown conceptually in **Figure 4**. Each of these sub-streams then is transmitted via a separate lane to the receiving device, which has to re-construct the original logical data stream after de-serializing the single sub-streams. Since each of the lanes carries its own clock information, multilane embedded clock interfaces can consider relative timing-displacements of data packets on the single lanes after they passed the transmission media during the re-construction of the original logical data stream. The relative displacements between the lanes are determined during link training in the protocol engine when a link is powered up. This allows system and PCB designers to have substantial differences in the physical trace length from differential pair to differential pair. This simplifies system and PCB layout dramatically compared to existing clocked bus or source synchronous interfaces. Depending on the number of sub-streams or lanes, the PCIe standard distinguishes x1, x2, x4, x8, x12, x16 and x32 links.



Figure 4: PCI Express Multilane Concept
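The multilane concept can be illustrated with a small Python sketch. It assumes that the logical stream is distributed byte by byte in round-robin order, that the per-lane skew is already known from link training, and that the data length is a multiple of the lane count. The function names are hypothetical.

```python
def stripe(data, lanes):
    """Distribute the bytes of a logical stream round-robin over `lanes` sub-streams."""
    return [data[i::lanes] for i in range(lanes)]


def deskew(sub_streams, skews):
    """Remove the per-lane skew (in symbols) that was determined during link training."""
    aligned = [s[k:] for s, k in zip(sub_streams, skews)]
    common = min(len(s) for s in aligned)
    return [s[:common] for s in aligned]


def unstripe(sub_streams):
    """Interleave equally long, aligned sub-streams back into one logical stream."""
    out = bytearray()
    for group in zip(*sub_streams):
        out.extend(group)
    return bytes(out)
```

For a x4 link, `stripe(data, 4)` yields four sub-streams. If each lane arrives with a different number of leading filler symbols, `unstripe(deskew(received, skews))` reconstructs the original stream.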

With embedded clock signaling, the clock data recovery that separates the clock information from the data information in the receiving device relies on a sufficient transition density of the incoming data stream. In order to achieve this transition density, PCIe uses an 8b/10b coding scheme [3]. Besides the required transition density, 8b/10b coding also delivers DC balanced signals that avoid baseline wander on the transmission media and allow AC coupling of data connections. AC coupling simplifies system design, since device pairs can be used regardless of the common mode voltage used on their differential signals. This allows combinations of devices that are manufactured in different processes, use different termination schemes, or use different supply voltages. The usage of scrambling on the serial data streams avoids power peaks at distinct frequencies and thus helps to minimize EMI effects.
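As a rough worked example of what 8b/10b coding means for throughput, the following sketch derives the raw line rate from the nominal 400 ps unit interval listed in the electrical parameters and the usable byte rate per direction for each link width. It ignores all packet and protocol overhead above the coding layer, and the function names are hypothetical.

```python
UI_PS = 400.0  # nominal unit interval in picoseconds
LINK_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def line_rate_gbps(ui_ps=UI_PS):
    """Raw serial bit rate per lane and direction in Gbit/s."""
    return 1000.0 / ui_ps


def payload_rate_mbytes(lanes, ui_ps=UI_PS):
    """Byte rate per direction after 8b/10b coding in MB/s (10 line bits per byte)."""
    bytes_per_second = line_rate_gbps(ui_ps) * 1e9 / 10
    return lanes * bytes_per_second / 1e6


if __name__ == "__main__":
    for width in LINK_WIDTHS:
        print(f"x{width}: {payload_rate_mbytes(width):.0f} MB/s per direction")
```

Under these assumptions a single lane runs at 2.5 Gbit/s on the wire and carries about 250 MB/s of coded bytes per direction.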

#### **Application Areas**

Although PCIe is mainly in the public discussion regarding the replacement of AGP and conventional PCI in PC desktop and server systems, it is not limited to these application areas. PCI-SIG actively works on form factor standards that expand the usage of PCIe beyond the PC platform to, e.g., mobile and small form factor devices with the PCIe Mini Card specification or the PCIe Wireless form factor. Distributed PCIe based systems will be enabled by the PCIe Cabling standard and modular server systems will be facilitated with the PCIe Server IO Module specification.

Besides PCI-SIG, other standardization organizations also use PCIe as a foundation for their specification developments. PCMCIA<sup>\*</sup>, for example, uses PCIe or USB for its new Express Card standard. This standard defines a form factor for hot-pluggable I/O cards that offer features similar to the well-known PC Cards with higher interface throughput and smaller physical dimensions.

Another example for standards that are PCIe architecture based is the Advanced Switching Core Standard that was developed by ASI SIG. This standard defines a switched fabric extension to PCIe that allows the replacement of proprietary backplane implementations for applications like bladed computation systems, storage systems or communication systems as shown in **Figure 5**, with a common high performance interface.



Figure 5: PCI Express Based Communication System

Potential other applications that will benefit from the Advanced Switching Standard are switches and routers, media gateways, firewalls, shared I/O communications servers, or distributed I/O communications servers.

These examples show that PCIe has the potential to become the pervasive interface standard over a tremendous range of applications throughout the electronic systems industry. Such a widely adopted standard will result in large device volumes that require efficient silicon testing infrastructure from characterization to high volume manufacturing.

#### Link Training and Status State Machine (LTSSM)

The LTSSM represents the heart of the logical sub-block in the physical layer of PCIe architecture. It is implemented as a finite state machine. Embedded clock interfaces require that the receiving device synchronizes itself on the incoming data stream. This synchronization usually is done in multiple steps with the first step performing bit synchronization to allow the clock data recovery (CDR) to sample the data bits with maximum setup-hold time margins. After bit synchronization is achieved, the receiving device identifies the word boundaries of the incoming data stream that is the basis for later identification of packet frames. With multilane interfaces, the electrical length differences between lane traces making up a link connection between two devices are compensated in the receiving device to allow correct re-construction of the data distributed over the lanes for transmission. For PCIe, one responsibility of the LTSSM is to perform these steps that are fundamental to start up a data link correctly.

Besides these synchronization tasks that are common for all embedded clock interfaces, the LTSSM also performs PCIe specific tasks such as receiver detection or link and lane configuration negotiation. The LTSSM also contains states that implement PCIe specific DfT features and power management. A high level diagram of the LTSSM with its states and state transitions is shown in **Figure 6**. A short description of these high level states follows.



Figure 6: PCI Express Link Training and Status State Machine (LTSSM) Copyright © 2002, 2003 PCI-SIG

#### **Detect:**

In the detect state of the LTSSM, a PCIe device is checking whether a link partner is connected to its transmitters. Detection is done for each transmitter separately and is the basis for later link width negotiation between two link partners. From electrical idle signal state, which pulls both legs of the differential signal via a high impedance termination to a common mode voltage, a common mode voltage pulse is sent into the media connected to the transmitters of a link. Depending on the time constant of the media-response to this pulse, a transmitter can judge whether a far end differential termination is connected or whether it is driving into an open ended connection medium as shown in **Figure 7**.



Figure 7: Receiver Detection Pulse Response x-Axis: time/usec ; y-Axis: line response/V Copyright © 2002, 2003 PCI-SIG
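The decision principle can be approximated with a first-order RC estimate. The sketch below is an assumption-laden illustration, not the detection circuit of any real device: it assumes 50 Ω source and termination impedances, the minimum AC coupling capacitance from the transmitter parameter list, and a hypothetical 3 pF parasitic capacitance for the open-ended case.

```python
def time_constant_s(resistance_ohm, capacitance_f):
    """First-order RC time constant in seconds."""
    return resistance_ohm * capacitance_f


def receiver_present(measured_tau_s, threshold_s):
    """A slow line response indicates a far-end termination behind the AC coupling capacitor."""
    return measured_tau_s > threshold_s


# Assumed values for illustration only
Z_TX_OHM = 50.0      # assumed transmitter source impedance
Z_RX_OHM = 50.0      # assumed receiver termination
C_AC_F = 75e-9       # minimum AC coupling capacitor value
C_PARASITIC_F = 3e-12  # hypothetical pad and trace capacitance of an open line

TAU_WITH_RX = time_constant_s(Z_TX_OHM + Z_RX_OHM, C_AC_F)   # microsecond range
TAU_OPEN = time_constant_s(Z_TX_OHM, C_PARASITIC_F)          # sub-nanosecond range
THRESHOLD = (TAU_WITH_RX * TAU_OPEN) ** 0.5                  # geometric mean as a simple threshold
```

With these numbers the two cases differ by several orders of magnitude, so a coarse timing comparison is sufficient to distinguish them.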

#### **Polling:**

The polling state of the LTSSM is responsible for establishing bit and word synchronization between two connected link partners. These synchronization tasks are achieved by exchanging, analyzing and adapting to training sequence ordered sets (TS1 and TS2) sent between two connected devices. In this state, polarity inversion per differential connection is also applied where PCB layout restrictions force a system designer to connect the positive leg of a transmitter to the negative leg of a receiver. For future generations of PCIe, speed negotiation between link partners takes place in the polling state of the LTSSM.
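Word synchronization on an 8b/10b stream is commonly done by searching for the comma bit sequence. The sketch below assumes that the training sequences begin with the K28.5 comma symbol, which is not stated in this paper, and uses hypothetical function names. Bits are modeled as a string of `0` and `1` characters.

```python
# Leading seven bits of the K28.5 symbol in either running disparity
COMMA_PATTERNS = ("0011111", "1100000")


def find_symbol_boundary(bits):
    """Return the bit offset (0..9) of the 10-bit symbol grid, or -1 if no comma is found."""
    for i in range(len(bits) - 6):
        if bits[i:i + 7] in COMMA_PATTERNS:
            return i % 10
    return -1


def split_symbols(bits, offset):
    """Cut the bit stream into complete 10-bit symbols starting at `offset`."""
    return [bits[i:i + 10] for i in range(offset, len(bits) - 9, 10)]
```

Once the offset is known, every following group of ten bits can be decoded as one symbol.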

#### **Configuration:**

This LTSSM state performs lane-to-lane de-skew, evaluates the data content of the training sequences that are exchanged between link partners and branches into other LTSSM states (e.g., Disabled, Loopback...) if the appropriate bits in the training sequences are set. Besides forced branch requests to other states, the training sequences also contain information about desired link configurations like scrambler control, the minimum number of training sequences required to establish bit and word lock, as well as link and lane numbering. To ensure a consistent link and lane numbering between link partners and to guarantee that an established link fulfills the link width capabilities of both link partners, link and lane numbers are negotiated between the two devices while in configuration state of the LTSSM.

#### L0, L0s, L1 and L2:

After leaving the configuration state of the LTSSM to L0 state, a link between two partners is fully up and running and can be used for packet exchange. If no data exchange takes place, idle data characters are sent in L0 to keep bit and word lock on the receiver connected to a transmitter. L0s, L1 and L2 are different power saving states that implement three levels of power saving potential, with the lowest power saving in L0s and the highest power saving in L2. Depending on the power saving level, resuming data exchange mode L0 follows different return paths through the LTSSM. These different return paths require different amounts of time to re-establish a fully operational link. The options to transition to L0 from one of the power saving states vary from just sending a certain number of training sequences for L0s to a whole cycle through the LTSSM starting from detect state for L2.

#### **Recovery:**

Based on the negotiated link and lane numbers, the recovery state re-establishes bit and word synchronization and serves as a branching state to L0, Configuration, Loopback, Hot Reset and Disabled.

#### Loopback:

The LTSSM loopback state establishes loopback paths between the receivers and transmitters of a PCIe physical layer implementation (PHY). Data patterns stimulated on the Rx pins of a PCIe device appear with some latency exactly as received on the associated Tx pins of the device.
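A loopback check on a tester can be sketched as follows: the expected Tx data is the Rx stimulus shifted by an unknown latency. The helper names are hypothetical, and the sketch assumes that the device returns the data unmodified, as described above.

```python
def find_latency(stimulus, response, max_latency):
    """Return the smallest bit shift at which the response reproduces the stimulus, or None."""
    for latency in range(max_latency + 1):
        if response[latency:latency + len(stimulus)] == stimulus:
            return latency
    return None


def bit_errors(stimulus, response, latency):
    """Count mismatching bits between the stimulus and the response at a known latency."""
    window = response[latency:latency + len(stimulus)]
    return sum(a != b for a, b in zip(stimulus, window))
```

In practice the latency would be determined once on a known-good pattern and then reused for bit error counting on subsequent patterns.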

#### **HotReset:**

In this state, transmitters signal to their partner devices that a reset action was initiated from a higher protocol level by sending training sequences with the reset bit asserted. Upon receipt of training sequences with the reset bit asserted, a device sends training sequences with the reset bit asserted and enters detect state after a timeout period.

#### **Disabled:**

In disabled state, a link switches to electrical idle signaling after the partner devices have been informed about this by sending out training sequences with the disable link bit asserted.
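The state descriptions above can be condensed into a transition map. The following sketch is a simplified reading and not the complete state diagram of Figure 6: several transitions (for example the exits of L1, Loopback and Disabled, and the Recovery to Disabled branch) are assumptions added to make the graph connected, and all timeout paths are omitted.

```python
from collections import deque

# Simplified subset of LTSSM transitions; entries marked "assumed" are not stated in the text
LTSSM_TRANSITIONS = {
    "Detect": {"Polling"},
    "Polling": {"Configuration"},
    "Configuration": {"L0", "Disabled", "Loopback"},
    "L0": {"L0s", "L1", "L2", "Recovery"},
    "L0s": {"L0"},
    "L1": {"Recovery"},                      # assumed
    "L2": {"Detect"},
    "Recovery": {"L0", "Configuration", "Loopback", "HotReset", "Disabled"},
    "Loopback": {"Detect"},                  # assumed
    "HotReset": {"Detect"},
    "Disabled": {"Detect"},                  # assumed
}


def shortest_path(start, goal):
    """Breadth-first search for the shortest state sequence from `start` to `goal`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(LTSSM_TRANSITIONS[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Calling `shortest_path("L0s", "L0")` and `shortest_path("L2", "L0")` on this map shows why the deeper power saving states need longer to return to an operational link: the path from L2 runs through Detect, Polling and Configuration again.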

#### **Electrical Specifications**

The electrical PCIe specifications, like most other embedded clock signal interface standards, are oriented on the data eye openings that a transmitter has to ensure and that a receiver has to be able to identify in order to guarantee error-free communication between two partner devices. The difference between the minimum transmitter eye and the minimum receiver eye defines the level and timing budget a system design can use for its connection losses. The minimum data eyes for a compliant PCIe transmitter and receiver are shown in **Figure 8**.



Figure 8: PCI Express Transmitter and Receiver Minimum Eye Masks Copyright © 2002, 2003 PCI-SIG

These two data eyes nail down the most important level and timing parameters for PCIe such as minimum differential peak-to-peak voltages, minimum data eye widths and maximum timing jitter numbers. The importance of these data eyes indirectly is also contained in the detailed electrical parameters for PCIe transmitters. For the transition time of a transmitter signal, only a minimum value of 0.125 UI is specified. There is no maximum specification. The only rule that limits the maximum transition time is the transmitter data eye compliance. Thus, for devices with low jitter, slower transition times are allowed than for devices with higher jitter numbers, since the transition time in combination with timing jitter on the transitions defines the data eye boundaries.
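The relationship between the two eyes can be checked with simple arithmetic on the values from the transmitter and receiver parameter lists. The sketch below treats the minimum differential peak-to-peak voltages as the eye heights and ignores de-emphasis, which is a simplification. It also assumes that the eye width equals one UI minus twice the median-to-max jitter. Function names are hypothetical.

```python
import math

UI_PS = 400.0
TX_MIN_VPP, RX_MIN_VPP = 0.800, 0.175          # minimum differential peak-to-peak voltages in V
TX_MIN_EYE_UI, RX_MIN_EYE_UI = 0.70, 0.40      # minimum eye widths in UI
TX_JITTER_UI, RX_JITTER_UI = 0.15, 0.30        # median-to-max jitter limits in UI


def eye_width_ui(median_to_max_jitter_ui):
    """Eye width left when jitter closes the eye symmetrically from both sides."""
    return 1.0 - 2.0 * median_to_max_jitter_ui


def level_budget_db():
    """Amplitude loss the interconnect may introduce between the two minimum eyes."""
    return 20.0 * math.log10(TX_MIN_VPP / RX_MIN_VPP)


def timing_budget_ps():
    """Eye width the interconnect may consume between transmitter and receiver."""
    return (TX_MIN_EYE_UI - RX_MIN_EYE_UI) * UI_PS
```

Under these assumptions the interconnect budget is roughly 13 dB in amplitude and 120 ps in eye width, and the jitter limits reproduce the specified minimum eye widths of 0.70 UI and 0.40 UI.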

Besides the data eyes, another important parameter group for differential low swing signal interfaces are the DC parameters that define termination impedances of the transmitters and receivers. These impedances have substantial influence on the signal and data eye shape and the correct alignment of the positive and negative signal leg that form a differential signal. In addition to these DC impedance values, PCIe also specifies parameters for the AC impedance matching in the form of return loss parameters for both, transmitters and receivers.

Besides signal transition time, the electrical specification contains timing-related parameters that define the allowed bit time or unit interval (UI) range as well as the lane-to-lane skew values that have to be met by the transmitters of a link and tolerated by the receivers of a link. The UI width specification, in fact, defines the allowed $\pm 300$ ppm frequency offset between either the nominal 100 MHz reference clock [4] and the actual reference clock frequency, or the nominal local reference clock and the reference clock used to generate the data stream received by the local device. This follows because a relative frequency offset from nominal on the reference clock input of a device translates directly into the same relative change in UI width on a transmitter.
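The link between the frequency tolerance and the UI limits can be verified numerically. This short sketch, with a hypothetical function name, converts a ppm frequency offset into the resulting UI range.

```python
def ui_limits_ps(nominal_ui_ps=400.0, tolerance_ppm=300.0):
    """Return (min, max) unit interval for a symmetric frequency tolerance in ppm."""
    offset = tolerance_ppm * 1e-6
    shortest = nominal_ui_ps / (1.0 + offset)  # clock running fast
    longest = nominal_ui_ps / (1.0 - offset)   # clock running slow
    return shortest, longest
```

Rounded to two decimals, the result matches the 399.88 ps and 400.12 ps limits given for the unit interval.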

Another set of electrical parameters defines the level values that various common mode voltage measurements have to fulfill. The most common of these parameters, which is also found in most other differential signal interfaces, is the AC peak common mode voltage parameter. PCIe defines this value as an rms value that is measured over 250 consecutive UIs, which differs from the maximum absolute peak-peak values used in most other standard specifications. Another special parameter in the common mode voltage parameter group of PCIe is the absolute delta between the common mode voltages of the positive and the negative leg of a differential signal. At first glance, there is no obvious reason for this parameter because DC balanced signal coding is used and the two legs of the differential signal usually are physically tied to the same current source in the transmitter. Thus, no difference in common mode voltage between the legs should be expected. The underlying physical cause addressed by this parameter, however, is duty cycle distortion of the transmitting clock, which shows up as a relative common mode shift between the two legs of a differential signal, even if DC balanced data streams are used.

Additional electrical parameters defined in the parameter lists of PCIe are voltage levels for PCIe specific operation modes like electrical idle and detect, or time constants that apply to operation mode changes of the PCIe front ends. An overview of all electrical parameters for PCIe transmitters and receivers can be found in **Figure 9** and **Figure 10**.

| Symbol | Parameter | Min | Nom | Max | Units |
|--------|-----------|-----|-----|-----|-------|
| UI | Unit Interval | 399.88 | 400 | 400.12 | ps |
| VTX-DIFFp-p | Differential Peak to Peak Output Voltage | 0.800 | | 1.2 | V |
| VTX-DE-Ratio | De-Emphasized Differential Output Voltage (Ratio) | -3.0 | -3.5 | -4.0 | dB |
| TTX-EYE | Minimum TX Eye Width | 0.70 | | | UI |
| TTX-EYE-MEDIAN-to-MAX-JITTER | Maximum time between the jitter median and maximum deviation from the median | | | 0.15 | UI |
| TTX-RISE, TTX-FALL | D+/D- TX Output Rise/Fall Time | 0.125 | | | UI |
| VTX-CM-ACp | AC Peak Common Mode Output Voltage (rms) | | | 20 | mV |
| VTX-CM-DC-ACTIVE-IDLE-DELTA | Absolute Delta Between DC Common Mode During L0 and Electrical Idle | 0 | | 100 | mV |
| VTX-CM-DC-LINE-DELTA | Absolute Delta Between DC Common Mode of D+ and D- | 0 | | 25 | mV |
| VTX-IDLE-DIFFp | Electrical Idle Differential Peak Output Voltage | 0 | | 20 | mV |
| VTX-CM-RCV-DETECT | The amount of common mode voltage change allowed during Receiver Detection | | | 600 | mV |
| TTX-IDLE-MIN | Minimum time spent in Electrical Idle | 50 | | | UI |
| TTX-IDLE-SET-TO-IDLE | Maximum time to transition to a valid Electrical Idle after sending an Electrical Idle ordered-set | | | 20 | UI |
| TTX-IDLE-TO-DIFF-DATA | Maximum time to transition to valid TX specifications after leaving electrical idle condition | | | 20 | UI |
| VTX-DC-CM | Tx DC Common Mode Voltage | 0 | | 3.6 | V |
| ITX-SHORT | Tx Short Circuit Current Limit | | | 90 | mA |
| RLTX-DIFF | Differential Return Loss | 12 | | | dB |
| RLTX-CM | Common Mode Return Loss | 6 | | | dB |
| ZTX-DIFF-DC | DC Differential TX Impedance | 80 | 100 | 120 | Ω |
| ZTX-DC | Transmitter DC Impedance | 40 | | | Ω |
| LTX-SKEW | Lane-to-Lane Output Skew | | | 500+2UI | ps |
| CTX | AC Coupling Capacitor | 75 | | 200 | nF |
| Tcrosslink | Crosslink Random Timeout | 0 | | 1 | ms |

Figure 9: PCI Express Electrical Parameters for Transmitters



| Symbol                 | Parameter                               | Min    | Nom | Max    | Units |
|------------------------|-----------------------------------------|--------|-----|--------|-------|
| UI                     | Unit Interval                           | 399.88 | 400 | 400.12 | ps    |
| VRX-DIFFp-p            | Differential Input Peak to Peak Voltage | 0.175  |     | 1.200  | V     |
| TRX-EYE                | Minimum Receiver Eye Width              | 0.4    |     |        | UI    |
| TRX-EYE-MEDIAN-to-MAX-JITTER | Maximum time between the jitter median and maximum deviation from the median. |        |     | 0.3    | UI    |
| VRX-CM-ACP             | AC Peak Common Mode Input Voltage       |        |     | 150    | mV    |
| RL <sub>RX-DIFF</sub>  | Differential Return Loss                | 15     |     |        | dB    |
| RL <sub>RX-CM</sub>    | Common Mode Return Loss                 | 6      |     |        | dB    |
| ZRX-DIFF-DC            | DC Differential Input Impedance         | 80     | 100 | 120    | Ω     |
| ZRX-DC                 | DC Input Impedance                      | 40     | 50  | 60     | Ω     |
| ZRX-HIGH-IMP-DC        | Powered Down DC Input Impedance         | 200k   |     |        | Ω     |
| VRX-IDLE-DET-DIFFp-p   | Electrical Idle Detect Threshold        | 65     |     | 175    | mV    |
| TRX-IDLE-DET-DIFF-ENTERTIME | Unexpected Electrical Idle Enter Detect Threshold Integration Time |        |     | 10     | ms    |
| LRX-SKEW               | Total Skew                              |        |     | 20     | ns    |

Figure 10: PCI Express Electrical Parameters for Receivers

Two parameters of this list that require special consideration are  $V_{TX-DE-Ratio}$  and the values around jitter and eye openings. In order to compensate for the expected losses caused by the bandwidth limitation of the PCB connection between two devices, the first bit after a transition (transition bit) in the data stream is driven at a higher level than the subsequent bits representing the same logical level. From a frequency domain point of view, this increases the magnitude of the frequency components of the data stream for the first bit after a transition. Thus, effects that are caused by the low pass characteristic of a typical PCB interconnection are minimized. One example of such an effect is signal droop, as shown in **Figure 11**, which translates into data dependent jitter due to Inter-Symbol Interference (ISI).



Figure 11: Signal Degradation through PCB Connection without De-Emphasis (top) and with De-Emphasis (bottom); Stimulation Signal (left) and Signal after PCB Trace (right)
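The level scheme described above can be illustrated with a minimal sketch. It assumes an ideal two-level differential driver, treats the very first bit of a sequence as a transition bit, and uses an illustrative full swing of 0.5 V and a ratio of -3.5 dB. The function and parameter names are hypothetical and are not taken from the specification or from any PCI-SIG tool.

```python
import math


def de_emphasis_ratio_db(v_transition, v_de_emphasized):
    """Ratio of the de-emphasized level to the transition bit level in dB."""
    return 20.0 * math.log10(v_de_emphasized / v_transition)


def apply_de_emphasis(bits, v_full=0.5, ratio_db=-3.5):
    """Return one differential level (in volts) per bit.

    The first bit after a transition is driven at +/- v_full. Following bits
    with the same logical value are driven at the reduced level.
    v_full and ratio_db are illustrative assumptions.
    """
    v_de = v_full * 10.0 ** (ratio_db / 20.0)
    levels = []
    previous = None  # no previous bit: the first bit counts as a transition bit
    for bit in bits:
        amplitude = v_de if bit == previous else v_full
        levels.append(amplitude if bit else -amplitude)
        previous = bit
    return levels


if __name__ == "__main__":
    for bit, level in zip([1, 1, 1, 0, 0, 1], apply_de_emphasis([1, 1, 1, 0, 0, 1])):
        print(bit, round(level, 3))
```

Extracting the ratio from a measured waveform is the reverse operation: the settled levels of a transition bit and of a non-transition bit are passed to `de_emphasis_ratio_db`.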

Regarding jitter parameters, PCIe uses a jitter extraction algorithm to obtain jitter numbers from a measured data stream [5]. This algorithm takes a database of 3500 measured UIs and performs a software CDR function on it. The jitter numbers that apply to the transmitters and receivers are then evaluated by analyzing a 250 UI window in the center of the 3500 UI. The reason for calculating the jitter numbers with this algorithm is to weight low frequency jitter components less than high frequency jitter components. This weighting is done because the PLLs providing the sampling clock for the CDR can track low frequency jitter on the incoming data streams, but not high frequency jitter. Thus, high frequency jitter is more likely to cause wrong data latching than low frequency jitter. The jitter transfer characteristic that describes the jitter tracking capability of the CDR algorithm used for PCIe over jitter frequency is shown in **Figure 12**.
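The windowing idea can be sketched as follows. This is not the normative algorithm referenced in [5]: as a stand-in for the software CDR, the sketch fits a constant-rate clock to all measured crossings of the 3500 UI record by least squares, and then evaluates the deviation of the crossings from that clock only inside the central 250 UI window. The function name and the input format (pairs of UI index and crossing time in ps) are assumptions made for illustration.

```python
import statistics


def median_to_max_jitter(edges, total_ui=3500, window_ui=250):
    """Return (recovered_ui_ps, median_to_max_jitter_ps).

    edges: list of (ui_index, crossing_time_ps) pairs covering total_ui UIs.
    A least-squares line through all crossings stands in for the recovered
    clock (an assumption of this sketch).
    """
    n = len(edges)
    mean_i = sum(i for i, _ in edges) / n
    mean_t = sum(t for _, t in edges) / n
    sxx = sum((i - mean_i) ** 2 for i, _ in edges)
    sxy = sum((i - mean_i) * (t - mean_t) for i, t in edges)
    ui_rec = sxy / sxx              # recovered unit interval
    t0 = mean_t - ui_rec * mean_i   # recovered clock phase

    # Only crossings in the central window contribute to the jitter number.
    start = (total_ui - window_ui) // 2
    stop = start + window_ui
    tie = [t - (t0 + i * ui_rec) for i, t in edges if start <= i < stop]

    median = statistics.median(tie)
    return ui_rec, max(abs(x - median) for x in tie)


if __name__ == "__main__":
    # Synthetic record: 400 ps UI with +/- 5 ps alternating edge displacement.
    record = [(i, 400.0 * i + (5.0 if i % 2 else -5.0)) for i in range(3500)]
    print(median_to_max_jitter(record))
```

Because the clock is fitted over the long record while the jitter is evaluated over the short window, slow drifts are largely absorbed by the fit, which mirrors the weighting of low frequency jitter described above.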



Figure 12: Jitter Transfer Characteristic of PCI Express 3500UI/250UI CDR Algorithm x-Axis: Jitter frequency/Hz ; y-Axis: jitter transfer/dB Copyright © 2004 PCI-SIG

#### Notice:

This paper is based on revision 1.0a of the PCI Express base specification. Revision 1.1 of the PCI Express base specification currently is under review by the PCI-SIG members. If the review results in an approval of the proposed changes, electrical parameters regarding jitter will change. This most likely will affect the specified jitter numbers as well as the jitter calculation algorithm.

#### **Test Requirements**

If the life cycle of a device is analyzed regarding the usage of ATE equipment in the various test steps that apply, one can see that there are mainly three phases that usually require the involvement of ATE systems, with their ability to integrate into a production environment that allows a substantial amount of device data to be gathered efficiently. The first of these phases is the transition from design to manufacturing, where ATE equipment is used to support design verification and device characterization in combination with box or bench test equipment. The second life cycle phase where ATE equipment usually is seen is the production and technology ramp. Especially for new technologies like PCIe today, the test quality and the minimization of test escapes in this phase of a device's life are of utmost importance to ensure that the technology will be successful in its target markets. Once this technology ramp is completed, devices usually are fully understood and the third phase, volume manufacturing, begins. In this phase, test optimization regarding cost of test is done, using test methodologies based on ATE equipment that offers the best integration in high volume manufacturing lines.

All three of the phases above pose different requirements on the ATE instrumentation and test methodologies used. For the design verification and characterization phase, the most accurate tests on the complete parameter list are required. In this phase, even parameters that can be guaranteed by design are verified directly to ensure that the design implementation is correct. For PCIe such parameters are, for example, timing parameters that are generated by device internal timers such as T<sub>TX-IDLE-MIN</sub>, T<sub>TX-IDLE-SET-TO-IDLE</sub>, T<sub>TX-IDLE-TO-DIFF-DATA</sub> and T<sub>crosslink</sub>. For design verification and characterization, ATE based test is complemented by box and bench equipment measurements or system level tests for test items that, once verified, can be guaranteed by design and which are more efficiently tested in a bench or system environment. For the parameters that are tested on the ATE, test instrumentation has to offer optimum performance regarding DC accuracy using integrated parametric measurement units (PMUs) to be able to accurately characterize DC parameters such as termination impedances and DC common mode voltages. Best AC level and timing accuracy is important to be able to characterize data eye parameters such as differential swing and jitter parameters. For characterizing the de-emphasis behavior of PCIe, the most critical instrumentation characteristic is analog bandwidth. Since de-emphasis is used to compensate for bandwidth limitations of device connections, the test instrumentation must not exhibit similar bandwidth limitations, because the effect to be measured would no longer be visible. Another important instrument requirement is given by the multilane architecture of PCIe. On the one hand, multilane links require accurate relative timing measurements among ATE tester channels in the ps resolution range for Tx lane-to-lane skew measurements. On the other hand, they require wide relative timing shifts between these channels in the 10s of ns range for the Rx lane-to-lane skew tolerance characterization.

For the production and technology ramp phase of the device life cycle, ATE based test flows usually do not test for parameters that are guaranteed by design and that have proven to be stable with enough distance to the allowed value boundaries during characterization. The most important task for PCIe testing in this phase is to ensure data eye compliance and DC parameter accuracy. Tracking of other parameters that have exhibited narrow test margins during characterization is also important during technology ramp. Since test time is already under tight control in this test phase, only the values of the most critical parameters that were identified during characterization are tracked. The compliance of all other parameters is ensured by guard-banded pass-fail tests. Since the measurement accuracy has to be in the same range as for characterization, requirements for the test equipment do not change substantially compared to the characterization phase. In fact, some of the requirements might even be tougher since test time is of more importance due to the higher volume that needs to be tested in this ramping phase. One example of this is the flexibility in termination mode switching that ATE instrumentation has to provide. Whereas multiple test programs with different hardware configurations are acceptable for characterization to change ATE termination modes from, e.g., differential termination to high impedance termination, ATE equipment has to be capable of switching between these modes on the fly within one program for tests in the technology and device ramp phase.

In the volume-manufacturing phase, the main goal is to find the optimum balance between test cost and test coverage. In order to achieve this balance, DfT approaches such as Tx to Rx loop back configurations (far end loop back) are used widely. Since most of these DfT approaches are either rather new or lack in parametric variability to really stress device components such as the PCIe receivers, ATE equipment has to offer capabilities to support the DfT based testing of PCIe devices. One example for such a DfT support is to provide instrumentation that allows timing and level parameter variation in a loopback path between a Tx and Rx lane of the PCIe interface [8][9].

#### **Test Support**

The LTSSM of PCIe implements two states that are useful for testing the parametrics and a substantial portion of the logic inside the PCIe physical frontend (PHY) implementation. The first of these states is the Polling.Compliance sub-state of the Polling state. This state is entered if a transmitter detects a far end termination, but does not receive a valid differential signal level on its associated receiver. This behavior indicates that the device is operated in a test environment, and the transmitter starts to send a compliance pattern continuously. The content of the compliance pattern is defined to maximize ISI on the lanes and crosstalk between the lanes of links with a width beyond x1. Maximum ISI is achieved by using a character with the maximum runlength that 8b/10b codes allow in combination with a character containing the shortest runlength. This leads to a pattern containing an alternating flow of a comma character (K28.5 with maximum runlength 5) and a clock pattern character (D10.2 or D21.5 with maximum runlength 1). To obtain a well-defined compliance pattern, the standard specifies that the first K28.5 character of the K28.5-D21.5-K28.5-D10.2 compliance pattern building block has to have negative disparity. In order to achieve crosstalk maximization, it is required that not all of the lanes within a link generate this pattern synchronously, but that one victim lane stays stable for a certain number of UIs while all other lanes operate as aggressors and perform maximum switching. This is achieved with the K28.5-D21.5-K28.5-D10.2 building block by inserting two additional delay characters in one of the lanes before and after this building block. If a lane without delay character insertion is overlaid with a lane that contains delay character insertion, one can see that the K28.5 characters with minimum switching activity on one lane coincide with the maximum switching activity characters D21.5 and D10.2 on the other lane. The delay characters are selected as K28.5 characters. A pair of two delay characters is required to keep the data insertion disparity-neutral. The delay character insertion starts at lanes 0, 8, 16 and 24 of a link, continues to shift through all lanes and starts over on the initial lanes once each lane has inserted a building block with delay characters once. With this insertion scheme, which is visualized in Figure 13, each of the lanes serves as victim once while all other lanes operate as aggressors.



Figure 13: PCI Express Compliance Pattern with Delay Character Insertion
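The insertion scheme can be modeled on the symbol level with a short sketch. Two points are modeling assumptions rather than statements from this paper: the sketch divides time into slots of eight symbols, so that a lane sends either one building block framed by delay characters or two plain building blocks per slot, and it identifies the victim slot of a lane by its lane number modulo 8. The 10-bit code groups are the standard 8b/10b encodings and are included only to check the runlengths quoted above. All names are hypothetical.

```python
BLOCK = ["K28.5", "D21.5", "K28.5", "D10.2"]  # compliance building block
DELAY = ["K28.5", "K28.5"]                    # pair of delay characters

# 10-bit code groups; K28.5 is shown for negative running disparity.
CODE_GROUPS = {
    "K28.5": "0011111010",
    "D21.5": "1010101010",
    "D10.2": "0101010101",
}


def max_run(bits):
    """Longest run of identical consecutive bits in a bit string."""
    longest = run = 1
    for a, b in zip(bits, bits[1:]):
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    return longest


def compliance_symbols(lane, rotations=1):
    """Symbol sequence of one lane for full rotations through 8 slots.

    In its own slot the lane sends DELAY + BLOCK + DELAY (victim), in all
    other slots it sends two plain blocks (aggressor). The 8-symbol slot
    length is an assumption of this sketch.
    """
    symbols = []
    for _ in range(rotations):
        for slot in range(8):
            if slot == lane % 8:
                symbols += DELAY + BLOCK + DELAY
            else:
                symbols += BLOCK + BLOCK
    return symbols


if __name__ == "__main__":
    for lane in range(2):
        print(lane, " ".join(compliance_symbols(lane)[:16]))
    print({name: max_run(code) for name, code in CODE_GROUPS.items()})
```

Printing two lanes next to each other shows the overlay described in the text: the second delay character of the victim lane lines up with D21.5 on the aggressor lane before the building block and with D10.2 after it.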

The second LTSSM state supporting the test of PCIe devices is the Loopback state that implements a far end loopback path between the receivers of a PCIe PHY with their associated transmitters. This mode of operation is activated by sending a training sequence ordered set (TS1 or TS2) with the loopback bit of the sequence enabled to a device receiver.

#### **Test Challenges**

Embedded clock interfaces in general and PCIe technology in particular pose multiple challenges for ATE based testing using general-purpose test instrumentation. With PCIe, there are two notions of non-determinism in the data streams a device generates [6][7]. The first notion of non-determinism, which is inherent to all embedded clock interfaces, is the variable latency of the transmit data stream. This means that the exact position where a bit/word starts and where a bit/word ends is not predictable with respect to an external frequency reference (which the ATE runs on) and that the timing relationship between an embedded clock data stream and an external reference frequency might change if the reference clock feeding the PCIe device's PLL is shut down and re-started. In order to deal with this challenge, ATE equipment has to be able to provide a continuously running reference clock and needs to have the capability to adapt its timing system to the timing dictated by the device under test (DUT). One way to address this challenge with an ATE system is to use output dependent timing approaches, which identify bit and word boundaries using timing searches and match loop implementations. The timing of the ATE is adapted to the timing values found during the searches to align the compare strobe positions to the correct positions with regard to the DUT data stream. A search function on the compare strobe position for the Rx pins is done to identify the center of the data eyes and the loopback latency. After these two variables have been identified, the strobe positions of the ATE comparators are programmed to the center of the data eye with the appropriate latency per lane as shown in Figure 14.

Figure 14: Compare Strobe Alignment Principle to Enable Functional Tests on Rx Pins

The critical parameters of an ATE that are important to be successful with this approach are the range in which the timing can be moved, the at-speed match loop capabilities and the capability to keep the reference clock of the DUT running during all the searches and timing re-programming. Once the correct strobe positions are identified and set, correct functional tests can be performed on general purpose ATE hardware.
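The eye center part of such a timing search can be sketched as a sweep of the compare strobe over one UI. The sketch assumes a hypothetical callback `strobe_passes(offset_ps)` that runs a functional test with the strobe at the given offset and reports pass or fail. It does not cover the word boundary and latency search, and it does not handle a data eye that wraps around the end of the sweep range.

```python
def find_eye_center(strobe_passes, offsets_ps):
    """Return (center_ps, width_ps) of the widest passing strobe window.

    strobe_passes: callable taking a strobe offset in ps, returning True on
                   a passing functional test (hypothetical ATE hook).
    offsets_ps:    increasing strobe offsets to try.
    """
    best = None          # (first, last) offset of the widest window so far
    window_start = None  # first offset of the current passing window
    for offset in offsets_ps:
        if strobe_passes(offset):
            if window_start is None:
                window_start = offset
            if best is None or offset - window_start > best[1] - best[0]:
                best = (window_start, offset)
        else:
            window_start = None
    if best is None:
        raise ValueError("no passing strobe position found")
    return (best[0] + best[1]) / 2.0, best[1] - best[0]


if __name__ == "__main__":
    # Simulated lane whose data eye is open between 120 ps and 360 ps.
    def simulated_lane(offset_ps):
        return 120 <= offset_ps <= 360

    print(find_eye_center(simulated_lane, range(0, 400, 10)))
```

In a multilane setup the search would be repeated per lane, since each lane can show a different eye position and latency.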

The second notion of non-determinism is non-deterministic data that occurs on PCIe devices. Non-deterministic data occurs because embedded clock devices like PCIe components have to be able to handle frequency offsets between link partners on the one hand and rely on continuously running data streams on the other hand. If two link partners communicate with a slight frequency offset, the slower partner is not able to process all the data at the pace it arrives. Since the data stream is continuously running, the receiving device has to skip some of the incoming data when its input buffers are full (or close to full). In order to prevent the device from skipping payload data, PCIe requires that transmitters insert so-called skip ordered sets into the data stream they send at specified intervals. Receiving devices then can remove dummy data (SKP symbols) from, or add it to, these skip ordered sets to adapt to a potential frequency offset without losing payload data. The problem with skip ordered sets is that they are inserted into the data stream in a non-deterministic manner. It cannot be predicted with the help of simulations or other means when an ATE has to expect such a skip ordered set. Thus, tests requiring real-time compares to pre-stored vector patterns are not possible in the normal operation mode of PCIe. Moreover, undersampling techniques are not possible either, since the skip ordered sets occur at different positions from run to run for each sampling point. The only possibilities to test a device with this native behavior are to do real-time sampling at the native data rate followed by time consuming post-processing of the gathered data, or to do the tests for this operation mode in a loopback configuration that allows the generation of data streams without SKP ordered sets. A better way to test the functional behavior of PCIe, however, would be to enable such a test by means of DfT. There is either the possibility to make the occurrence of skip ordered sets predictable using DfT or to switch off the skip ordered set generation of the DUT completely. Since the ATE provides a well-controlled environment, it is ensured that no frequency offset will occur and thus no skip ordered set generation is necessary in the communication with ATE equipment.
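The post-processing option mentioned above can be sketched on the symbol level. The sketch assumes that the captured data has already been decoded into symbol names and that a skip ordered set appears as a COM symbol (K28.5) followed by one or more SKP symbols (K28.0). The function name and the list-of-strings representation are assumptions made for illustration.

```python
COM = "K28.5"
SKP = "K28.0"


def strip_skip_ordered_sets(symbols):
    """Return the symbol stream with all skip ordered sets removed.

    A COM that is directly followed by at least one SKP is dropped together
    with all SKP symbols that follow it. A COM followed by anything else is
    kept, since it belongs to a different ordered set.
    """
    cleaned = []
    i = 0
    while i < len(symbols):
        starts_skip_set = (
            symbols[i] == COM
            and i + 1 < len(symbols)
            and symbols[i + 1] == SKP
        )
        if starts_skip_set:
            i += 1  # drop the COM
            while i < len(symbols) and symbols[i] == SKP:
                i += 1  # drop every SKP of this set
        else:
            cleaned.append(symbols[i])
            i += 1
    return cleaned


if __name__ == "__main__":
    captured = ["D0.0", COM, SKP, SKP, SKP, "D1.0", COM, SKP, SKP, "D2.0"]
    print(strip_skip_ordered_sets(captured))
```

After this step the cleaned stream can be compared against pre-stored expected data regardless of where the DUT inserted its skip ordered sets in a particular run.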

The LTSSM poses another challenge for testing PCIe devices. In order to walk through the state machine in a mission mode manner, handshaking between the DUT and the ATE is required since in such a case, the ATE is considered to be a partner device on the link. This, however, requires real-time response calculation by the ATE, which is not feasible with general purpose ATE equipment. There are several ways to solve this problem. The first of these solutions performs bit and word lock as described above in the LTSSM Polling.Compliance state. Once bit and word lock on the ATE is achieved, the LTSSM can be executed from Polling.Compliance to L0 by sending the data to the DUT that is expected within a certain time window. This data is independent of the DUT's response as long as the implementation of the LTSSM states that are not required in an ATE environment. Since the DUT is well known by the test engineer, handshaking procedures like link width negotiation, link speed negotiation, link and lane numbering negotiation and polarity inversion are not required in the ATE environment. Since all critical handshaking LTSSM states deal with the uncertainties a device might see in a system environment, a second option is to bypass these states in the controlled ATE environment by means of DfT in the DUT. A third option to address this issue would be to implement protocol aware ATE hardware or to use a golden device approach. However, since protocol testing and the test of low level electrical parameters, which is the target for ATE based testing, access the device on very different levels, it might be difficult to get appropriate access to the low level electrical parameters from a higher protocol level for measurement and debug purposes.

The last test challenges to be mentioned here are challenges on the lowest electrical level, such as low swing capabilities in the range of less than 30mV single ended, analog bandwidth requirements in the range of 4 GHz for 1<sup>st</sup> generation PCIe devices and raw data rate capabilities with sufficient signal quality. All of these low level requirements have to be fulfilled on multiple parallel ATE channels to address the multilane architecture of PCIe that can accommodate up to 32 differential lanes for a single link.

# **ATE based PCI Express Testing**

#### **Design Verification/Characterization**

The test configuration proposed for the verification and characterization of PCIe devices is to measure the required transmit parameters on the PCIe compliance pattern that is generated by the device after power up if no valid differential signal is received on the Rx pins as shown in **Figure 15**a. After the verification of the transmit parameters, the receiver parameters are checked in loopback mode as shown in **Figure 15**b by varying the parameter to be tested on the data stream that is sent to the Rx pins. The sensitivity of the receivers to this parameter variation then is checked by a comparison of the looped back data sent by the transmitters to expected data on the ATE. Since the correct functionality of the transmitters has been checked before, failures in this test can be isolated to the parameter variation that was done. It is important to notice that AC coupling between the DUT pins and the ATE test resource in both cases is not required if the termination voltage of the test resource can be controlled and thus is adaptable to the common mode voltage the DUT provides on its transmitters and expects on its receivers. Leaving out the coupling capacitors that are optional for testing PCIe allows testing DC parameters without the need to insert switching relays into the sensitive high-speed connection.



Figure 15: PCI Express Test configuration for a) Transmitter and b) Receiver

In order to cover most of the specification parameters without introducing bandwidth limiting or other signal degrading components like relays into the high speed signal path, the ATE hardware used to test the PCIe parameters needs to offer the flexibility to implement all required configuration modes on multiple parallel lanes. Currently, integrated high-speed digital ATE cards offer the best choice that fulfills these requirements. Characterization measurements using this type of hardware can be done if the test resource used offers the required bandwidth, accuracy and resolution. Even though these cards are targeted for digital test applications, they can be used to sample the PCIe compliance pattern with high accuracy and resolution in an analog manner on all the lanes of a link in parallel. In **Figure 16**, a PCIe compliance waveform sampled with such a differential digital high-speed ATE channel at a resolution of 1ps and 1mV is shown. Since the sampling algorithm that was applied in this case does not rely on a passing functional test, but only on the analysis of functional test result transitions, variable latencies are no issue for this type of measurement as long as the reference clock of the DUT is applied continuously. Thus, timing searches to adapt the timing references between the ATE and the DUT are not required for this measurement. As one can see from this measurement, the de-emphasis effect can be clearly identified on the sampled waveform, which is a sign that the ATE integrated digital high-speed pin electronics used for this particular measurement offer sufficient analog bandwidth headroom to perform PCIe characterization measurements.



Figure 16: PCI Express Compliance Pattern Sampled with High-Speed Digital ATE Channel (1ps/1mV resolution)

It is obvious that parameters like differential peak-to-peak output voltage, transition times and the de-emphasis ratio can be extracted from this waveform. Moreover, standard compliant jitter analysis is possible on this waveform using the CDR algorithm and filter function defined in the PCIe base specification and associated documents. If the waveform of **Figure 16** is fed into a tool available from PCI-SIG implementing standard compliant signal analysis, test results and data eyes as shown in **Figure 17** are obtained.



Figure 17: PCI Express Compliant Signal Analysis Results of ATE-Sampled Compliance Pattern Waveform (PCI-SIG analysis tool overall result window with transition and non-transition data eyes)

Of course the functionality of such a tool also can be implemented in an ATE environment since the algorithms extracting the reported numbers are described in the PCIe base specification as mentioned before. In case more thorough characterization of the transmitters is required, such as spectral jitter decomposition [11] to identify potential jitter sources or RJ-DJ separation, digital high-speed ATE channels can also accomplish these tasks [12].

If the waveforms for multiple lanes within a link are sampled simultaneously, it is possible to analyze the relative timing between the lanes (lane-to-lane skew) based on the sampled waveforms as well as the optimum compare strobe positions for potential subsequent functional tests. For parameters requiring information about single ended voltages such as the AC peak common mode output voltage, the same sampling approach is used for characterization measurements. The voltage level information of the positive and negative leg is acquired separately and the relevant PCIe parameters are extracted in post processing steps on the acquired waveforms. All of these waveforms can also be used to calculate values that specify DC level behavior of single legs or the differential signal such as the absolute delta between DC common mode between positive and negative leg by calculating the medium level values the waveforms represent. This signal sampling approach can also be used to measure all of the PCIe parameters related to electrical idle signaling. In order to do so, the measurement resources of the ATE have to be flexible enough to switch termination modes from 100-Ohm differential termination to high impedance termination within a test program or test flow. A low impedance termination of the measurement resources during electrical idle signaling of the DUT is not allowed with the DC coupling used between the ATE and the DUT since low impedance DC terminations on the signal traces would influence the electrical idle signals provided by the DUT via high impedance resistors. Depending on the type of termination, the positive and negative leg of a lane would be pulled together to the same voltage via the 100-Ohm termination inside the measurement resource instead of the high impedance terminations within the DUT. For a 100-Ohm center-tapped ATE termination, both legs would be pulled to the center tap termination voltage of the ATE. 
If the used ATE equipment is able to provide high impedance termination up to the required data rates, the two legs of a lane are sampled in high impedance mode to extract electrical idle related parameters as shown in Figure 18.
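The post-processing of separately acquired leg waveforms can be sketched as follows. The sketch assumes that both legs were sampled at the same time points and are given as equally long lists of voltages. The result names are hypothetical, and the exact parameter definitions (for example peak versus RMS values) have to be taken from the PCIe base specification rather than from this example.

```python
def leg_analysis(v_pos, v_neg):
    """Derive differential and common mode values from two sampled legs.

    v_pos, v_neg: voltages of the positive and negative leg, sampled at
                  identical time points (an assumption of this sketch).
    """
    if len(v_pos) != len(v_neg) or not v_pos:
        raise ValueError("both legs need the same non-zero number of samples")

    v_diff = [p - n for p, n in zip(v_pos, v_neg)]
    v_cm = [(p + n) / 2.0 for p, n in zip(v_pos, v_neg)]
    dc_cm = sum(v_cm) / len(v_cm)  # mean level of the common mode waveform

    return {
        "diff_peak_to_peak": max(v_diff) - min(v_diff),
        "dc_common_mode": dc_cm,
        "ac_peak_common_mode": max(abs(v - dc_cm) for v in v_cm),
        # Delta between the mean levels of the two single legs.
        "dc_leg_delta": abs(sum(v_pos) / len(v_pos) - sum(v_neg) / len(v_neg)),
    }


if __name__ == "__main__":
    positive = [1.0, 0.5, 1.0, 0.5]
    negative = [0.5, 1.0, 0.5, 1.0]
    print(leg_analysis(positive, negative))
```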



Figure 18: Single Leg PCI Express Signals with Transition to Electrical Idle

The same termination capabilities are also required if the receiver detection mechanism of PCIe is to be characterized since this makes use of a detection pulse that is driven into the pins of a lane via high impedances inside the DUT. In order to measure the pulse generated by the device, low impedance terminations are not allowed at the far end of the line, which is represented by the ATE measurement resource, since only the termination voltage would be measured instead of the detection pulse.

A very important group of parameters that need to be characterized are the on-die termination impedances of the Tx and Rx pins. These impedances directly influence the signal shape and relative displacement between the positive and negative leg that make up the differential signal. ATE systems provide parametric measurement units (PMUs) that are used to do highly accurate two-point DC measurements of the termination characteristic. Two-point measurements are required to become independent of potential error terms inside the DUT that can cause an offset of the I-V impedance characteristic from its ideal behavior. The PMUs of the ATE are also used to measure DC voltage or current parameters that are not extracted from the sampled waveforms such as the specified Tx short current. In order to be able to do PMU based measurements of the termination impedances on Tx pins, it is required that the pins are held in a static condition during the measurement.
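The benefit of the two-point measurement can be shown with a small worked example. The numbers are illustrative assumptions: a 50 Ω termination in series with an unknown 0.2 V offset inside the DUT, measured by forcing two currents and reading back the voltages. The function name is hypothetical.

```python
def two_point_impedance(v1, i1, v2, i2):
    """Impedance from the slope between two (voltage, current) points.

    A constant offset voltage or current inside the DUT shifts both points
    equally and therefore cancels out in the differences.
    """
    if i1 == i2:
        raise ValueError("the two measurement points need different currents")
    return (v2 - v1) / (i2 - i1)


if __name__ == "__main__":
    offset_v, r_term = 0.2, 50.0  # illustrative DUT model
    i1, i2 = 0.001, 0.005         # forced currents in A
    v1 = offset_v + r_term * i1   # 0.25 V
    v2 = offset_v + r_term * i2   # 0.45 V

    print("two-point:", two_point_impedance(v1, i1, v2, i2))
    print("single-point:", v1 / i1)
```

The single-point quotient is dominated by the offset and reports roughly 250 Ω in this example, whereas the two-point slope recovers the assumed 50 Ω.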

Characterizing the receiver AC parameters of a lane requires the execution of functional tests on the Tx lane associated with the Rx lane that is tested. This is the case since Rx parameters are varied in loopback mode until a functional test result change is observed on the associated Tx pin. In order to be able to execute functional tests on the transmitters, it is necessary to align the strobe positions of the comparators on the ATE's high-speed digital card as described in the test challenges chapter. Once functional testing is possible on the Tx pins, reducing or increasing the voltage levels of the ATE high-speed drivers that provide the data streams to the Rx pins is used to characterize the AC level parameters of the receivers. Since the AC input level specification for PCIe receivers distinguishes between four states (electrical idle, electrically valid differential signal, logically valid differential signal, and invalid signal swing), three level boundaries have to be characterized. Starting from a pass condition in the nominal Rx level range with, e.g., a compliance pattern, the upper and lower boundaries for the differential input peak-to-peak voltage are measured. This range represents the area of a logically valid differential signal. The lower boundary value of this parameter at the same time represents the transition boundary from a logically valid differential signal to an electrically valid signal, which is recognized as a differential signal by the receiver, but which does not allow correct data extraction anymore. The lower boundary of this electrically valid differential signal range has to be within the specification range for the electrical idle detect threshold parameter. The electrical idle detect threshold is usually measured in a special configuration of the DUT that allows monitoring of an internal electrical idle detection or squelch detection signal. This signal indicates whether a receiver sees a valid logical differential signal or whether it interprets the differential level at its inputs as an electrical idle signal. If such a monitoring signal is not available, a functional test that compares the Tx pins to a static electrical idle level is used. In such a case, a search with an increasing Rx input level is done, starting with a value that causes an electrical idle state of the DUT and thus a pass for this functional test. A pass to fail transition on the Tx identifies the lower boundary value for the electrical idle detect threshold parameter. Of course, this search can also be done in the reverse direction, searching for a functional fail to pass transition. Since the voltage levels, especially for the electrical idle threshold detection, require single ended swings in the range of 30mV, this test represents the toughest level related challenge of the PCIe specification for ATE pin electronics in two dimensions: accuracy and programmable level swing.
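A level boundary search of this kind can be sketched as follows. The text describes a search with a stepwise changing level; the sketch uses a binary search instead, which is a substitution made here and is only valid if there is a single pass/fail boundary between the two start levels. The callback `test_passes(level_mv)` is a hypothetical hook that programs the ATE driver level and returns the functional test result.

```python
def find_level_boundary(test_passes, pass_level_mv, fail_level_mv,
                        resolution_mv=1.0):
    """Return the passing level closest to the pass/fail boundary.

    test_passes:   callable taking a driver level in mV and returning True
                   on a functional pass (hypothetical ATE hook).
    pass_level_mv: level that is known to pass.
    fail_level_mv: level that is known to fail.
    """
    if not test_passes(pass_level_mv) or test_passes(fail_level_mv):
        raise ValueError("start levels do not bracket a pass/fail boundary")

    passing, failing = pass_level_mv, fail_level_mv
    while abs(passing - failing) > resolution_mv:
        middle = (passing + failing) / 2.0
        if test_passes(middle):
            passing = middle
        else:
            failing = middle
    return passing


if __name__ == "__main__":
    # Simulated receiver that extracts data correctly from 120 mV upwards.
    def simulated_receiver(level_mv):
        return level_mv >= 120.0

    print(find_level_boundary(simulated_receiver, 400.0, 0.0))
```

The same function covers the reverse search direction by swapping the meaning of pass and fail in the callback, for example when the functional test compares the Tx pins to a static electrical idle level.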

In order to characterize the minimum receiver eye width and the jitter related receiver specification values, jitter injection capabilities of the ATE high-speed pin electronics drivers are required to close the stimulated data eye according to the specification requirements. Since these jitter injection capabilities are a fundamental requirement for all high-speed embedded clock signal standards, ATE vendors are aware of this requirement from their activities in the communication arena, where this type of interface has been in use for quite some time. Thus, state-of-the-art high-speed pin electronics cards have such jitter injection capabilities integrated, as shown in **Figure 19**.



Figure 19: ATE High-Speed Digital Driver Data Eyes with Jitter Injection

As mentioned before, ATE based tests in the design verification and characterization phase usually are complemented by tests on bench and box equipment. Parameters that are difficult to test on ATE equipment are verified on specialized box equipment. For PCIe, these parameters are return loss measurements and frequency offset measurements if appropriate DfT features are not implemented in the DUT. For return loss measurements, the main difficulty is the electrical connection to the DUT that usually consists of PCB traces of considerable length that influence the measurements. The difficulty with frequency-offset measurements on the electrical sub-block of the PHY mainly is the difficult synchronization between the frequency domains that are offset by minimal fractions of the nominal frequency. Thus, both domains only are in sync after very long time intervals that need to be identified by the stimulating and receiving frequency domain. This task is much better achieved on test equipment that operates on higher levels of the protocol such as protocol analyzers.

#### **Production/Technology Ramp**

One goal of the production and technology ramp phase is to get an understanding of the failure mechanisms that are present in new technologies such as PCIe and to separate parameters that are critical for the standard compliant function of the DUT from parameters that are well controlled within the required boundaries by the manufacturing processes applied. Another goal is to replace the time-consuming, fully specification compliant test implementations by test setups that improve test time and reduce test cost per device, which is of utmost importance for high volume manufacturing (HVM). In order to be able to do so in the short time frames dictated by the market, easy correlation between the characterization tests and the production ramp tests is required until a full understanding of the new device and technology is obtained. The easiest and fastest way to achieve this correlation is to use measurement equipment that has the same capabilities as the equipment used for characterization, such as an ATE integrated digital high-speed pin electronic. On the one hand, this allows tests to be performed using the same test methodologies and accuracy as for characterization. On the other hand, new test time optimized test approaches for the parameters to be measured can be implemented. Correlation between these two test implementations then can easily be checked by applying the two tests to the same DUT, even in the same test program.

In the production and technology ramp phase, parameters that can be guaranteed by design usually are not tested any more since their specification compliance was already verified before, and as long as no design changes take place, these parameters also are not expected to change. Examples for such parameters in the PCIe specification are all parameters that are based on digital counters, such as most of the timeout parameters and transition times from one state to another. The correct implementation of such counters is preferably checked by digital means such as scan tests instead of a direct measurement of the effect the counters trigger.

The tests for most of the parameters that cannot be guaranteed by design are reduced from value measurements to pass/fail tests with appropriate guardbanding. This is especially true for DC parameters such as the specified impedance values. For the remaining parameters that cannot be covered with simplified versions of the characterization measurements, test methodologies are applied that differ from the characterization measurements. Very often, these test methodologies cover more than one parameter in one test execution. One example of such a test that is commonly used on ATE equipment is the fast eye mask test on the data stream generated by a transmitter. As already mentioned in the paragraph on the electrical specifications of PCIe, the most important compliance criterion for PCIe devices is that they do not violate the transmit and receive data eyes of Figure 8. These data eyes contain information about the most critical timing and level parameters. A fast eye mask test checks whether a signal generated by a device is within the given boundaries by applying functional tests with the timing and level parameters of the ATE pin electronic set to the corners of the specified data eye, as shown in **Figure 20**.



Figure 20: Fast Eye Mask Test Implementation on High-Speed ATE Pin Electronic

In order to be able to perform fast eye mask tests reliably on an ATE, the high-speed pin electronic cards used have to offer best accuracy and programming resolution in both the time and the level dimension. From **Figure 20** it can easily be seen that a misalignment between the generated data eye and the test points will directly affect the yield that is achieved using this test. Since the timing reference of the ATE is based on timing searches over the data stream generated by the DUT and since the test points usually are programmed relative to this timing reference, the programming accuracy of the test points is defined by the accuracy of the alignment measurements and the programming linearity of the compare levels and compare strobe positions combined with the available programming resolution. Level and timing programming resolutions of 1mV and 1ps with lowest differential and integral non-linearities are an absolute requirement for fast eye mask implementations. On the receive side of the DUT, a similar test is implemented by testing at the specified eye corner levels with injection of the maximum peak-to-peak jitter allowed by the specification.
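The principle of the fast eye mask test, and its sensitivity to timing misalignment, can be illustrated with a short Python sketch. The mask corners in `HYPOTHETICAL_MASK`, the diamond-shaped eye model `make_diamond_eye` and all numeric values are assumptions chosen for illustration; they are not taken from the PCIe specification or from any ATE programming interface.

```python
from typing import Callable, Iterable, Tuple

# (compare strobe offset from eye center in ps, compare level in mV)
Corner = Tuple[float, float]

# Hypothetical hexagonal mask; NOT the PCIe compliance eye values.
HYPOTHETICAL_MASK = [
    (-100.0, 0.0), (-50.0, 80.0), (50.0, 80.0),
    (100.0, 0.0), (50.0, -80.0), (-50.0, -80.0),
]


def fast_eye_mask_test(
    functional_test: Callable[[float, float], bool],
    corners: Iterable[Corner],
    alignment_error_ps: float = 0.0,
) -> bool:
    """Run one functional test per mask corner; the eye passes only if all pass.

    alignment_error_ps models an error of the ATE timing reference relative
    to the true center of the data eye generated by the DUT.
    """
    return all(functional_test(t + alignment_error_ps, v) for t, v in corners)


def make_diamond_eye(half_width_ps: float, half_height_mv: float):
    """Hypothetical DUT model: a diamond-shaped eye opening around t = 0."""
    def functional_test(t_ps: float, v_mv: float) -> bool:
        if abs(t_ps) >= half_width_ps:
            return False  # strobe lies in the transition region
        opening_mv = half_height_mv * (1.0 - abs(t_ps) / half_width_ps)
        return abs(v_mv) < opening_mv  # test point lies inside the open eye
    return functional_test


good_dut = make_diamond_eye(half_width_ps=180.0, half_height_mv=400.0)
marginal_dut = make_diamond_eye(half_width_ps=110.0, half_height_mv=120.0)
```

In this model, `good_dut` passes with a correctly aligned timing reference but fails once `alignment_error_ps` pushes a test point into the transition region. This is the yield effect of misalignment noted above.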

If more information about the jitter performance of a transmitter is required than just eye compliance, ATE high-speed pin electronics that provide certain hardware features can also be used to perform fast jitter measurements during production ramp or even in production testing. If the pin electronic used offers a hardware based error counter per pin, jitter can be measured by first placing a compare strobe in a constant error count region (preferably a pass region with an error count of zero). If the compare strobe for a pin is then moved over the data stream and the error count is reported for each compare strobe position, the primitive (antiderivative) of the probability density function (PDF) for the data transitions, that is, their cumulative distribution, is obtained. Differentiation of the retrieved error count function leads to the jitter histogram, which can be used for further parameter extraction such as calculation of the median and of the maximum outlier distance from the median. The principle of this measurement is shown in **Figure 21**. Especially for multilane interfaces such as PCIe, it is important to have the error count capabilities integrated into the measurement hardware per pin in order to achieve test times that make this test approach usable for device ramp and production testing.



Figure 21: ATE Based Jitter Test Principle
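The post-processing of such an error count scan can be sketched as follows. This is a minimal illustration under stated assumptions: the error counts are simulated here for a purely Gaussian edge distribution, and the function names `jitter_histogram`, `histogram_stats` and `simulated_error_counts` as well as all numeric values are hypothetical. On real hardware, the counts would come from the per-pin error counters.

```python
import math
from typing import List, Sequence, Tuple


def jitter_histogram(error_counts: Sequence[int]) -> List[int]:
    """Differentiate the error count curve (first differences between strobe steps)."""
    return [abs(b - a) for a, b in zip(error_counts, error_counts[1:])]


def histogram_stats(positions_ps: Sequence[float], hist: Sequence[int]) -> Tuple[float, float]:
    """Return (median, maximum outlier distance from the median) in ps."""
    # Each histogram bin sits midway between two adjacent strobe positions.
    centers = [(a + b) / 2.0 for a, b in zip(positions_ps, positions_ps[1:])]
    total = sum(hist)
    if total == 0:
        raise ValueError("no transitions found in the scanned range")
    cumulative = 0
    median = centers[-1]
    for center, count in zip(centers, hist):
        cumulative += count
        if cumulative >= total / 2.0:
            median = center
            break
    occupied = [c for c, count in zip(centers, hist) if count > 0]
    return median, max(abs(c - median) for c in occupied)


def simulated_error_counts(positions_ps, n_edges=100_000, mean_ps=0.0, sigma_ps=5.0):
    """Hypothetical scan: an error is counted when an edge arrived before the strobe."""
    counts = []
    for t in positions_ps:
        cdf = 0.5 * (1.0 + math.erf((t - mean_ps) / (sigma_ps * math.sqrt(2.0))))
        counts.append(round(n_edges * cdf))
    return counts


positions = [float(t) for t in range(-30, 31)]  # 1 ps strobe steps
counts = simulated_error_counts(positions)
hist = jitter_histogram(counts)
median_ps, max_outlier_ps = histogram_stats(positions, hist)
```

The timing resolution of the extracted histogram equals the strobe step, so the programmable timing resolution of the pin electronics directly limits the resolution of the extracted jitter values.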

The technology and production ramp phase is also widely used to introduce test methodologies that are far closer to the production testing of a device than to the characterization testing. If this direct step from characterization measurement setups to HVM test methodologies is done for new technologies, the correlation effort is of course higher than when switching to HVM test methodologies after gaining experience with the new technology using test methodologies that are closer to characterization. Nevertheless, it is important to mention that the test methodologies described in more detail in the next chapter are also applied in the ramp phase in some cases. This is especially true for HVM oriented test methodologies that also offer limited characterization capabilities that simplify correlation to the characterization results. Examples of such test methodologies are vectorless test approaches [8], based either on pure DfT solutions [9][10] or on ATE supported loopback paths that allow controlled parameter modifications (high integrity loopback) before the data received from a transmitter is sent back to the receiver of the DUT.

#### **Production Testing**

The ultimate goal of production or HVM testing is to reduce test cost and test time while not losing test coverage. Only this combination allows the shipment of high volume device quantities at the required quality. In principle, there are three options to achieve this goal in a high volume test environment that usually makes use of ATE systems. The first option is to use ATE high-speed pin electronic cards. The capabilities of these high-speed cards regarding accuracy and feature sets are dictated by the characterization requirements of the latest device technologies they need to test. Thus, high-speed pin electronic cards targeted for new technologies such as PCIe today have to make use of the most sophisticated components accessible to the ATE manufacturers. This of course results in best accuracy and, as a result, in best test coverage on the one hand, but also makes it difficult for ATE suppliers to provide such test equipment for leading edge technologies at price points that allow its application in HVM. This is especially true if test cost is the only driver for HVM decisions and test coverage is only a second order selection criterion, as is the case for thoroughly characterized devices and mature technologies.

On the other end of the test equipment spectrum, pure DfT based loopback solutions are applied. Based on pattern generators and pattern analyzers inside the DUT, tests are performed by looping back the data from the transmitters to the receivers. In its simplest implementation, this loopback approach loops the data from the transmitter as is via a plain DUT-external Tx-Rx connection back to the receiver. It is obvious that with such a plain loopback approach, test quality is not predictable upfront since none of the Tx parameters are tested and the receivers might not be stressed to the specification limits. It could easily happen that a device only passes this kind of test because it generates very clean data eyes on its transmitters or has very stable receivers that even tolerate data eyes from the transmitters that are far outside the specification limits. In both cases, severe interoperability issues are not detected. Moreover, if an external loopback is implemented on the loadboard, the pins of a device are no longer accessible to test equipment such as a PMU for DC tests without a loadboard reconfiguration. Since the advantages of loopback based testing are obvious, such as making test equipment synchronization and protocol awareness of the test equipment obsolete, ways to overcome the disadvantages of the plain wire loopback were developed. One of these approaches [10] is completely DfT/BIST based and only requires highly accurate clock signals from the ATE. Of course, this approach requires additional silicon area for the circuitry that modifies the parametrics of the generated data stream as well as for the analysis circuitry required for measurements. Besides, good knowledge of the high-speed I/O buffers is required to implement this test approach. Especially for SOC devices that make use of third party IP building blocks, this knowledge is often not available.

A third option that combines the parametric advantages traditional ATE equipment offers with the advantages of loopback based testing is the high integrity loopback approach. This test solution is based on DfT features like a pattern generator and pattern analyzer inside the DUT. Pattern generators and analyzers are DfT capabilities that are already contained in a wide range of high-speed I/O IP cores because they do not require sophisticated parameter variation circuitry, so the chance to find a suitable IP core on the IP marketplace for a SOC design is quite high. If the required DfT features are available, an ATE based loopback path is provided that allows the measurement of key transmitter parameters, as well as the parametric modification of the data stream that is looped back to the receivers in order to stress these receivers to the limits of the specification. Since the only effect of a test the outside world sees is a pass or fail obtained from the DUT's pattern analyzer, measurements are implemented in a way that causes a pass/fail transition in the pattern analyzer if a measurement threshold is reached. In order to measure Tx and Rx parameters separately using a high integrity loopback, the handling of the data stream that is received from the transmitters (receive data) has to be separated from the parameter variations that are done to stress the receivers (drive data). In order to measure the DUT Tx voltage levels, for example, the threshold values of the high integrity loopback receiver are programmed in a way that modifies the looped data in case the DUT Tx voltages are below the thresholds. If the drive parameters of the high integrity loopback are kept within the specification boundaries, a failure that is reported from the DUT pattern analyzer can be traced back to the modified receive threshold of the high integrity loopback and thus to the transmitter levels of the DUT. In order to isolate failures on the receivers of a device, receive parameters in the high integrity loopback are kept within safe specification boundaries and drive parameters of the high integrity loopback are varied. This leads to a loopback principle for this approach as shown in Figure 22.



Figure 22: ATE High Integrity Loopback Principle
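The separation of transmitter and receiver failures described above can be expressed as a small decision sketch. The following Python fragment is an illustration only: the `LoopbackSetup` fields, the simulated DUT model and all numeric values are hypothetical and do not represent the programming interface of any loopback card or values from the PCIe specification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LoopbackSetup:
    receive_threshold_mv: float  # compare threshold applied to data from the DUT Tx
    drive_swing_mv: float        # swing of the data driven back into the DUT Rx
    drive_jitter_ps: float       # jitter injected on the data driven back


def isolate_fault(
    run_bist: Callable[[LoopbackSetup], bool],
    tx_test: LoopbackSetup,
    rx_test: LoopbackSetup,
) -> str:
    """run_bist returns the pass/fail result of the DUT pattern analyzer."""
    # Tx test: tight receive threshold, drive side kept well inside the spec.
    if not run_bist(tx_test):
        return "tx"
    # Rx test: relaxed receive threshold, drive side stressed to the limits.
    if not run_bist(rx_test):
        return "rx"
    return "pass"


def make_simulated_dut(tx_swing_mv: float, rx_min_swing_mv: float, rx_max_jitter_ps: float):
    """Hypothetical DUT with built-in pattern generator and pattern analyzer."""
    def run_bist(setup: LoopbackSetup) -> bool:
        # The loopback receiver corrupts the looped data if the Tx swing is too low.
        data_intact = tx_swing_mv >= setup.receive_threshold_mv
        rx_copes = (setup.drive_swing_mv >= rx_min_swing_mv
                    and setup.drive_jitter_ps <= rx_max_jitter_ps)
        return data_intact and rx_copes
    return run_bist


# Illustrative setups; the values are assumptions, not specification limits.
TX_TEST = LoopbackSetup(receive_threshold_mv=400.0, drive_swing_mv=600.0, drive_jitter_ps=20.0)
RX_TEST = LoopbackSetup(receive_threshold_mv=100.0, drive_swing_mv=100.0, drive_jitter_ps=150.0)
```

The attribution of a failure in the first step to the transmitter relies on the drive parameters being kept inside the specification boundaries during that step, as stated in the text.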

Besides providing a parameterized loopback path, it is important for a high integrity loopback card to be flexible enough to also allow DC tests on the high-speed pins as well as scan data access. The main parameters that need to be varied within the high integrity loopback ATE card are the ones that are required to close the data eye according to the compliance data eyes defined by the PCIe specification. This requires level and relative timing control of the looped back data as shown in **Figure 23**. The variation of these parameters allows stressing a PCIe receiver according to the specifications and enables the detection of problems isolated to transmitters or receivers even in a loopback configuration.



Figure 23: High Integrity Loopback Parameter Variations (jitter injection and common mode voltage modulation)

High integrity loopback configurations can also be used for limited characterization measurements if parameter controllability is not restricted to the absolute minimum that is required to generate the required data eyes. One example of such a limited characterization measurement is shown in **Figure 24**. Here, a jitter tolerance measurement is shown that is enabled by a frequency and amplitude controllable jitter injection source on the high integrity loopback ATE card.



Figure 24: Jitter Tolerance Measurement with High Integrity Loopback ATE Card
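A jitter tolerance measurement of this kind can be thought of as an amplitude search per injected jitter frequency, with the DUT pattern analyzer as the only pass/fail source. The sketch below assumes a hypothetical callback `bist_passes(frequency_hz, amplitude_ps)` and a hypothetical receiver model whose tolerance rises towards low jitter frequencies. Neither the model nor the numeric values are taken from the measurements in this paper.

```python
from typing import Callable, Dict, Iterable, Optional


def measure_jitter_tolerance(
    bist_passes: Callable[[int, int], bool],
    frequencies_hz: Iterable[int],
    step_ps: int = 5,
    max_ps: int = 1000,
) -> Dict[int, Optional[int]]:
    """For each jitter frequency, return the largest injected amplitude (ps)
    at which the DUT pattern analyzer still reports a pass.

    The result is None if the DUT fails even without injected jitter, and it
    is capped at max_ps, the assumed limit of the injection source.
    """
    tolerance: Dict[int, Optional[int]] = {}
    for freq in frequencies_hz:
        last_pass = None
        for amplitude in range(0, max_ps + 1, step_ps):
            if not bist_passes(freq, amplitude):
                break  # pass-to-fail transition reached
            last_pass = amplitude
        tolerance[freq] = last_pass
    return tolerance


def simulated_receiver(freq_hz: int, amplitude_ps: int) -> bool:
    """Hypothetical Rx model: 120 ps tolerance floor, rising below 1.5 MHz."""
    tolerance_ps = 120.0 * max(1.0, 1.5e6 / freq_hz)
    return amplitude_ps <= tolerance_ps


curve = measure_jitter_tolerance(simulated_receiver, [100_000, 1_000_000, 10_000_000])
```

Plotting `curve` over frequency gives a tolerance curve of the kind such a measurement produces. A binary search over the amplitude would reduce the number of BIST runs if the pass/fail behavior is monotonic.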

## Summary

As a new interface technology, PCI Express architecture introduces a fundamental break with the I/O technologies used so far in the computation arena for the sake of future scalability headroom and application coverage, and it poses considerable test challenges for ATE equipment. Despite these challenges, it has been shown using real world examples that today's ATE equipment is capable of covering the test needs of PCI Express devices in all steps of a device's life cycle. The availability of these ATE based test solutions is key to enabling the volume ramp of PCI Express based products.

All measurements presented in this paper were performed on an Agilent 93000 SOC scalable platform architecture equipped with NP2500/NP3G-XS digital high-speed pin electronic cards and BIST Assist 6.4 high integrity loopback cards.

#### References

- [1] PCI-SIG, "PCI Express Base Specification Revision 1.0a," April 15<sup>th</sup>, 2003
- [2] H.T. Kung, Robert Morris, "Credit-Based Flow Control for ATM Networks," IEEE Network Magazine, March 1995
- [3] Albert X. Widmer, Peter A. Franaszek, "A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code," IBM Journal of Research and Development, Volume 27, Number 6, pp. 440-451, November 1983
- [4] PCI-SIG, "PCI Express Card Electromechanical Specification Revision 1.0a," April 15<sup>th</sup>, 2003
- [5] PCI-SIG, "PCI Express Jitter Modelling Revision 1.0RD," July 14<sup>th</sup>, 2004
- [6] Agilent Technologies, "Testing for the PCI Express Interfaces," go/semiconductor quarterly Newsletter, Spring 2003
- [7] Hubert Werkmann, "Testing PCI Express<sup>™</sup>: Challenges and Solutions," 21st VLSI Test Symposium, 2003
- [8] Bernd Laquai, "Vectorless test: best bet for high-speed I/O," EE Times, October 24<sup>th</sup>, 2003
- [9] Jay J. Nejedlo, "IBIST (Interconnect Built-in-Self-Test) Architecture and Methodology for PCI Express," IEEE International Test Conference 2003
- [10] T.M. Mak, Mike Tripp, Anne Meixner, "Testing Gbps Interfaces without a Gigahertz Tester," IEEE Design & Test of Computers, July-August 2004
- [11] Rainer Plitschka, Bernd Laquai, "Testing of High-Speed Serial I/O Interfaces Based on Spectral Jitter Decomposition," IEC DesignCon 2004
- [12] Gert Hänsel, Korbinian Stieglbauer, Guido Schulze, Jose Moreira, "Implementation of an Economic Jitter Compliance Test for a Multi-Gigabit Device on ATE," IEEE International Test Conference 2004