Propagating variability from technology to system level


Propagating variability from technology to system level Bart Dierickx, Miguel Miranda, Petr Dobrovolny, Florian Kutscherauer, Antonis Papanikolaou, Pol Marchal

Abstract— As CMOS technology feature sizes decrease, variability increasingly jeopardizes system-level parametric and functional yield. This paper proposes a framework that can capture variability at all levels in the design flow. It offers a correlated view on yield, timing, and dynamic and static energy. Preservation of rare events in variability distributions is obtained by the Weighted Monte Carlo technique.

Index Terms— variability, variability aware modeling, system level, yield estimation, technology aware design, Weighted Monte Carlo.

I. MOTIVATION

TECHNOLOGY scaling into the deep sub-micron era is disrupting well-established design practices and flows. Process and material variability introduces functional problems and parametric uncertainty. New tools enter the design flow to counter its impact [1], such as Design for Manufacturing (DFM), a collection of measures aiming to improve functional yield by correctly modeling and providing solutions for systematic manufacturing errors [2], [3], [4], and Statistical Static Timing Analysis (SSTA), which is used to estimate the impact of variability on parametric specifications, timing and leakage [5], [6].

Variability is not of a single type. Variability exhibits geometrical or spatial correlation: it may be local (commonly known as "mismatch") or global ("die to die", "across wafer", "wafer to wafer"); random, e.g. due to Random Dopant Fluctuations, versus systematic; or non-reproducible versus reproducible. Moreover, technology, device and circuit variability parameters exhibit complex correlations, which are not easily represented in a generic fashion. Variation of underlying technology parameters may affect devices and circuits in a correlated fashion.

In this paper we propose a methodology and framework to represent variability over the whole manufacturing and design flow. The goal is to propagate "almost any" variability distribution over all levels of abstraction that occur in a typical CMOS SoC design flow. The methodology builds on top of existing signoff and simulation flows, and "adds" or "enables" variability. In order to reach that goal, we adhere to the following:
- Any variability distribution may be represented. We do this by casting distributions systematically to a numerical representation (WMC), superseding analytical representations.
- A specific treatment and representation is used for different "types" of variability: random versus systematic, local versus global.
- Correlation between parameters of the same or different objects (devices, circuit parts) is systematically represented and preserved.
- Propagation of variability may happen through "Weighted" Monte Carlo sampling, which needs a fairly low number of samples while still yielding sufficient accuracy in estimating the frequency of outliers or of distribution tails.

Manuscript received 30 September 2007. All authors are with IMEC, Kapeldreef 75, 3001 Leuven, Belgium (e-mail: bart.dierickx@imec.be).
978-1-4244-1728-5/07/$25.00 ©2007 IEEE

II. STATE OF THE ART

State of the art EDA tools that handle variability are in the domains of DFM and SSTA. The advent of sub-wavelength lithography has provoked inaccuracies and shape perturbations that influence the properties of the printed devices. Modeling approaches and solution techniques for these problems have been developed. This broad class of tools is known as Design-for-Manufacturing (DFM). It aims to provide models of systematic effects in geometrical imperfections, and solutions that are applied during mask generation and manufacturing [7], [8]. Examples are Optical Proximity Correction and CMP compensation. DFM tackles predictable, reproducible ("systematic") effects and impacts functional yield. In recent years, DFM approaches that compensate for the impact of physical distortions on the parametric properties of the device have emerged; they are called electrical DFM. A comprehensive overview of DFM is presented in [9].

Scaling further into the deep-deep sub-micron regime has aggravated problems due to "random variability". RDF (random dopant fluctuations) and LER (line edge roughness) are representative problems of this class. They are stochastic effects impacting the performance of the transistors.

SSTA (statistical static timing analysis) handles random variability by modeling statistically the impact of random effects on the timing of the circuit. SSTA emerged as a response to the difficulty of achieving timing closure under random variability. It is also used to obtain better estimates for required design margins and guardbands. Recently, commercial SSTA tools also incorporate a statistical estimation of the leakage power consumption of the chip under variability [10], [11]. This provides the designer a partial view on the power consumption of the chip.

III. LIMITATION OF THE STATE OF THE ART

Various DFM techniques have found widespread use; SSTA is still gaining acceptance. One observes that further acceptance of today's variability modeling techniques is hampered by remaining inaccuracy gaps, which preclude the techniques from reaching a comprehensive predictive capability. Challenges are:
- A systematic treatment of local versus global, and systematic (reproducible) versus random (stochastic) variability.
- Models need more detail than normal (Gaussian) distributions or linear sensitivities to small-signal perturbations can offer. This is all the more true as perturbations become "large signal", and distributions can have uncommon shapes.
- Maintaining correlations of common underlying physical variability sources.
- The need to maintain a correlated view between speed performance and static/dynamic power, hence modeling the correlated variability of dynamic energy, static power and speed performance.

Variability of dynamic power

How can device variability create dynamic power variability? Timing uncertainty on the toggling activity of converging nets results in unnecessary switching or glitches; these can propagate through the logic gates and create a multiplying effect. Figure 1 is a simulation example that shows the impact of variability on the active power of the circuit of section VII.

Figure 1: Example of simulated dynamic power (arbitrary units), comparing the nominal (invariable) case and the distribution due to MOSFET variability. The simulation used a normal-use set of vectors.

This and similar experiments show that the variability of dynamic energy is of the same order of magnitude as the timing variability. SSTA can capture the timing variations due to variability and is currently being extended toward capturing the leakage variations. Estimating dynamic energy, however, requires the introduction of application-dependent information into the characterization and analysis tools, and this is not possible today in any academic or industrial tool flow for SSTA analysis. Thus the final variability characterization must be performed one level higher: it has to rise from the gate abstraction to the Register Transfer Level, including realistic application test-benches/traces, as we do in our environment.

IV. WEIGHTED MONTE CARLO DISTRIBUTION

Figure 2: Point-wise described distributions in one dimension (frequency F(i) versus entry i). (a) Top: the result distribution of a "classic" Monte Carlo experiment: the list of entries equals the list of population members. (b) Bottom: "Weighted" Monte Carlo distribution: members ("entries") of the population have individual frequencies or "probabilities to occur in reality" (ptoir).

WMC is a way to represent any uni- or multidimensional distribution, be it coming from measurements or from analytical expressions. In our framework we represent all "objects" with all their variability parameters in one large WMC table.

entry | ptoir     | delay     | static    | dynamic
1     | 1.697E-03 | 2.976E-09 | 5.098E-07 | 1.267E-09
2     | 5.340E-02 | 3.550E-09 | 5.622E-07 | 1.322E-09
3     | 5.510E-02 | 2.876E-09 | 4.842E-07 | 1.212E-09
4     | 1.419E-02 | 3.178E-09 | 5.350E-07 | 1.200E-09
5     | 8.651E-02 | 3.506E-09 | 5.498E-07 | 1.230E-09
6     | 2.317E-02 | 2.415E-09 | 4.220E-07 | 1.282E-09
7     | 1.029E-02 | 2.970E-09 | 4.943E-07 | 1.156E-09
8     | 3.293E-02 | 2.730E-09 | 4.865E-07 | 1.312E-09
9     | 8.554E-02 | 3.507E-09 | 5.710E-07 | 1.248E-09

Figure 3: Example of a WMC set for a digital block. Three correlated object parameters are listed versus their frequency or "probability to occur in reality".

Such an object can, for example, be an nMOSFET, in the simplest case with parameters Vth and Beta, or a NAND gate with parameters static power, dynamic energy, and delays for certain load and input conditions (Figure 3).

(Weighted) Monte Carlo experiments sampling such objects will automatically sample correlated parameters. In addition, correlation between objects – e.g. nMOSFETs are correlated to pMOSFETs, NAND delay is correlated to NOR delay, etc., as they share underlying common technology variability – is enabled by creating the WMC tables for each object concurrently, and implying correlation between the row indexes of the WMC tables of the different objects.
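To make this representation concrete, the sketch below (in Python, an illustrative choice since the paper prescribes no implementation language) stores such a WMC table for one object, reusing the numbers of Figure 3; the helper function and its name are illustrative, not part of the VAM tools.

```python
import numpy as np

# Illustrative WMC table for one "object" (here a digital block), mirroring Figure 3.
# Each row is one entry: its ptoir (probability to occur in reality) plus the
# correlated parameter values that belong to that same entry.
wmc_block = {
    "ptoir":   np.array([1.697e-3, 5.340e-2, 5.510e-2, 1.419e-2, 8.651e-2,
                         2.317e-2, 1.029e-2, 3.293e-2, 8.554e-2]),
    "delay":   np.array([2.976e-9, 3.550e-9, 2.876e-9, 3.178e-9, 3.506e-9,
                         2.415e-9, 2.970e-9, 2.730e-9, 3.507e-9]),   # seconds
    "static":  np.array([5.098e-7, 5.622e-7, 4.842e-7, 5.350e-7, 5.498e-7,
                         4.220e-7, 4.943e-7, 4.865e-7, 5.710e-7]),   # watts
    "dynamic": np.array([1.267e-9, 1.322e-9, 1.212e-9, 1.200e-9, 1.230e-9,
                         1.282e-9, 1.156e-9, 1.312e-9, 1.248e-9]),   # joules
}

def weighted_mean(table, param):
    """Weighted (ptoir-normalized) mean of one parameter of the object."""
    w = table["ptoir"]
    return np.sum(w * table[param]) / np.sum(w)

print("mean delay  :", weighted_mean(wmc_block, "delay"))
print("mean dynamic:", weighted_mean(wmc_block, "dynamic"))
```

Because all parameters of one entry are read from the same row, the correlation between delay, static and dynamic values is kept automatically; sharing the row index across the tables of different objects likewise keeps inter-object correlation.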

V. WEIGHTED MONTE CARLO SAMPLING STRATEGY

Apart from being a representation, WMC is also a sampling strategy for Monte Carlo experiments. Figure 4 shows the results of a simple sampling experiment: we create populations of NANDs from 4 randomly sampled MOSFETs, and for each NAND its delay (and energy) is calculated or simulated.

Figure 4: "Probability to occur in reality" (not normalized; y-axis: WMC frequency or ptoir) versus gate delay (x-axis: NAND nominal delay), for samples of a 1000-sample NAND population. Populations were generated in two ways: (a) classic Monte Carlo (γ=0), (b) Entry Sampling (γ=1).

Classic Monte Carlo and Entry Sampling are two singular cases of a more generic "Weighted" Monte Carlo sampling method. Classic Monte Carlo sampling can be described as the following two-step process: (a) in a WMC distribution such as in Figure 2b, pick an entry i using a random number, with probability proportional to F(i); (b) when the object (e.g. a MOSFET) corresponding to that entry is used to build a random object at a higher level (e.g. a NAND), each such lower-level object has equal "probability to occur". The Entry Sampling strategy says: (a) pick the entry with a random number that gives each entry equal probability; (b) assign to the object corresponding to that entry a "probability to occur" at the higher level that is proportional to F(i). Classic Monte Carlo fails in tracing distribution tails and outliers; Entry Sampling over-emphasizes them.

Observe that the product of the probabilities of the two steps is constant and proportional to F(i). This allows us to define the Weighted Monte Carlo sampling strategy as a two-step process that is a generic representation of the above processes: (a) pick the entry with a random number that gives each entry a probability to be picked proportional to F(i)^(1-γ); (b) assign to the resulting object a probability to occur of F(i)^γ. The product of both steps is again F(i). By choosing the optimum value for γ, one can greatly improve the speed by which the sampled population converges to its asymptotic distribution. WMC is a "variance reduction technique", as in [12].
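As an illustration only (not the authors' code), the following Python sketch implements the two-step γ-weighted sampling described above: entries are drawn with probability proportional to F(i)^(1-γ) and each drawn object carries a weight proportional to F(i)^γ, so that the product stays proportional to F(i). The MOSFET table, the NAND delay model (worst of four device delays), the combination of the four weights into one NAND weight, and all numbers are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def wmc_sample(freq, gamma, n_samples, rng):
    """Two-step weighted Monte Carlo sampling.
    (a) pick entry i with probability proportional to freq[i]**(1 - gamma)
    (b) assign the picked object a weight proportional to freq[i]**gamma
    gamma = 0 reproduces classic Monte Carlo, gamma = 1 reproduces Entry Sampling."""
    pick = freq ** (1.0 - gamma)
    pick = pick / pick.sum()
    idx = rng.choice(len(freq), size=n_samples, p=pick)
    weights = freq[idx] ** gamma
    return idx, weights

# Stand-in MOSFET WMC distribution: frequencies F(i) plus a correlated delay per entry.
n_entries = 10_000
mos_freq  = rng.random(n_entries)
mos_delay = 1e-10 * (1.0 + 0.1 * rng.standard_normal(n_entries))  # seconds, illustrative

# Build NANDs from 4 sampled MOSFETs each; the NAND delay is taken as the slowest
# of its four devices, purely as a placeholder delay model.
gamma = 0.3
n_nands = 1000
idx, w = wmc_sample(mos_freq, gamma, 4 * n_nands, rng)
nand_delay = mos_delay[idx].reshape(n_nands, 4).max(axis=1)
nand_ptoir = w.reshape(n_nands, 4).prod(axis=1)   # combined weight of the 4 picks (assumed independent)
```

With γ=0 the weights are all equal (classic Monte Carlo); with γ=1 every entry is picked equally often and the weights carry all the frequency information (Entry Sampling); intermediate γ trades the two off, as in Figures 4-6.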

Figure 5 illustrates the efficiency of WMC in simulating distribution tails and outliers. In this experiment, NANDs are composed from MOSFETs that have a 1/1e4 outlier fraction. We ran a large number of WMC experiments to estimate the fraction of NANDs that contain outlier MOSFETs.

Figure 5: Accuracy (average, and overall +1σ and -1σ) of the estimation of the outlier fraction (OF) of NANDs versus the number of samples in the population (x-axis: NAND population size, 10 to 10000; y-axis: NAND outlier fraction). In this experiment, γ is 0.6. The theoretical outlier fraction is 0.04% (thick line). For comparison, the +1σ accuracy line when classic Monte Carlo is used (MC @1σ) is shown too.

The outlier fraction estimate requires about 100x fewer samples with WMC than with classic Monte Carlo: in this case, we are able to estimate the number of outliers with an accuracy of 1 sigma using a 100-sample WMC population with a properly chosen γ. This improvement is consistent with other observations; the factor depends on the actual outlier "rareness", the γ used, the number of bricks in the wall (such as MOSFETs in a NAND), and other peculiarities of the circuit.

Figure 6 represents a set of experiments creating larger random digital blocks from these random NANDs. Observe that classic Monte Carlo and Entry Sampling fail in representing the full output distribution with main distribution, outliers and defect items. Good detail for the whole distribution is obtained when using a γ between 0.1 and 0.4.
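As a minimal sketch of how such a weighted population is turned into an outlier-fraction estimate like the one in Figure 5: the estimate is the weight of the samples that fall in the tail divided by the total weight. The threshold and the sample arrays are assumed to come from a sampling experiment such as the one sketched above; this does not reproduce the authors' exact experiment.

```python
import numpy as np

def weighted_tail_fraction(values, weights, threshold):
    """Estimate the fraction of the real population whose value exceeds `threshold`,
    from a weighted Monte Carlo sample: sum of tail weights over total weight."""
    values = np.asarray(values)
    weights = np.asarray(weights)
    return weights[values > threshold].sum() / weights.sum()

# Illustrative use with the arrays produced by the sampling sketch above:
# outlier_fraction = weighted_tail_fraction(nand_delay, nand_ptoir, 1.4e-10)
```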

Figure 6: Weighted Monte Carlo experiments creating 100 random digital blocks, each containing 100 NANDs as from Figure 4b. X-axis: block delay [ns]; Y-axis: WMC frequency or probability to occur in reality. Curves: classic MC sampling (γ=0), γ=0.1 through 0.5, and entry sampling (γ=1); the regions with outlier blocks and defect blocks are indicated.

VI. VARIABILITY-AWARE MODELING FRAMEWORK

The WMC distribution and sampling strategy are components of the Variability-Aware Modeling (VAM) framework. This framework intends to streamline characterization and propagation of variability and degradation effects over the entire semiconductor manufacturing and design flow. VAM builds wherever possible on top of mainstream commercial EDA solutions. VAM discriminates 5 levels of design abstraction, being technology (measurement data), compact models (MOSFETs, R, C, interconnects…), standard cells, digital blocks and system (being the combination of the IC and the software running on it), as shown in Figure 7.

Figure 7: Schematic view of the Variability Aware Modeling framework illustrating the 5 design abstraction levels (chapters) it spans. From bottom to top: technology insights; compact device models (e.g. BSIM) with variability scaling rules for β, Vth… as a function of W, L, T; variability-aware standard cell library characterization and variability-aware macro block (memories, PLL,…) characterization; concurrent statistical timing/energy analysis for standard-cell-based logic; and, at the top, system-level statistical yield/timing/power analysis feeding a system-level cost model. Variability information propagates from bottom to top.

Key in our development is the standardization of the interfaces between the different abstraction levels. Variability propagation from process technology to analog simulation can be done in a number of different ways and no standard is emerging yet in the industry. Similar considerations apply to the interface between standard cell calibration and digital simulation. Existing SSTA solutions use proprietary standards for the standard cell library characterized for variability. Finally, a unified information format is required for the inputs to the chip integration analysis tool, which can capture the top-level component energy and timing statistics.

A. Technology insights

The VAM flow starts from information about the process technology, as measurement data or as data estimated by TCAD or known physical rules. These translate to transistor compact models, such as BSIM, and a complementary set of variability scaling rules. A similar approach is taken for passives and interconnects (R, C…). Such "rules" include models of how variability affects transistors of different gate lengths or widths, such as Pelgrom's mismatch model for FETs [13]. Today we model most of the known time-invariant variability effects, be they systematic or random. In the future, rules will be extended to model the variability due to degradation and reliability mechanisms such as soft breakdown, electromigration etc., based on the impact of supply voltages or temperature conditions on the switching times and current amplitudes of the circuit parts over time.
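As an example of such a variability scaling rule, the sketch below evaluates Pelgrom's mismatch law, in which the standard deviation of the threshold-voltage mismatch between two identically drawn transistors scales as A_Vth / sqrt(W·L) [13]. The A_Vth value and the device dimensions used here are made-up placeholders, not calibrated technology numbers.

```python
import math

def sigma_delta_vth(width_m: float, length_m: float, a_vth: float = 2.5e-9) -> float:
    """Pelgrom mismatch rule: sigma(delta Vth) = A_Vth / sqrt(W * L).
    a_vth is in V*m (placeholder value); W and L are in metres."""
    return a_vth / math.sqrt(width_m * length_m)

# Example: a small device in a hypothetical 32nm technology (dimensions are made up).
print(sigma_delta_vth(64e-9, 32e-9))  # sigma of the Vth mismatch, in volts
```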

B. Standard cell library and macro characterization

The goal of this characterization step is to generate a standard cell library which includes information about the statistical properties of timing and energy (dynamic, leakage and short circuit) per cell. Nassif et al. [14] have outlined the main challenges that one faces during this characterization. Our approach generates a statistical population of the standard cells, again using weighted Monte Carlo techniques. For this purpose, it creates multiple versions of each cell by modifying its default transistor-level description according to specific variations, such as litho, threshold voltage and beta variations. The results are collected and stored in a single library in ".lib" format, containing multiple versions of the same cell.

In the case of macro blocks (e.g. memories, ADCs, PLLs), the complete block's timing and energy are characterized using circuit-specific analog simulation. Two major ways exist to introduce MOSFET variability in analog simulations. The first is to introduce the variability information in the parameters of the model card; current HSPICE versions can handle sensitivities on the model card parameters. The second is to perform (weighted) Monte Carlo simulations and to introduce into the transistor extra netlist elements that mimic the impact of variability per simulation run.
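A minimal sketch of the cell-variant generation idea, assuming a simplified cell description and independent Gaussian Vth and beta perturbations per transistor (a simplification of the litho/Vth/beta variations the text mentions); the characterization call named in the comment is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_cell_variants(transistors, n_variants, sigma_vth=0.02, sigma_beta=0.05):
    """Create `n_variants` perturbed copies of a cell's transistor-level description.
    `transistors` is a list of dicts with nominal 'vth' (V) and 'beta' parameters.
    The sigmas are illustrative spreads, not calibrated values."""
    variants = []
    for _ in range(n_variants):
        variant = [{**t,
                    "vth": t["vth"] + rng.normal(0.0, sigma_vth),             # absolute shift
                    "beta": t["beta"] * (1.0 + rng.normal(0.0, sigma_beta))}  # relative shift
                   for t in transistors]
        variants.append(variant)
    return variants

# Each variant would then be characterized (e.g. by a SPICE run via a hypothetical
# run_spice_characterization(variant)) and the results collected into one ".lib"
# library containing multiple versions of the same cell.
nand2 = [{"name": f"M{i}", "vth": 0.3, "beta": 1.0} for i in range(4)]
library_population = make_cell_variants(nand2, n_variants=50)
```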

C. Statistical concurrent Timing-Energy Analysis for standard cell logic

This step aims to characterize logic blocks in terms of correlated statistical timing and energy consumption for the execution of a given application task. This task can typically be a single operation of the component, i.e. the execution of an instruction, a bus transfer, etc. This step is the lowest abstraction where application-dependent information, such as bit toggling activity, starts playing a role. Thus, in order to properly capture the dynamic energy variations, this step includes digital simulation capabilities. Moreover, this simulation is performed in a hierarchical manner: first the input activity is propagated through the system using RTL simulation and the input test-benches per system IP component are obtained; then each component is individually characterized. SSTA currently cannot provide this functionality, because it lacks the activity view of the application. This step can be seen as equivalent to an SSTA functionality which has been extended to provide a full statistical characterization of the correlated timing and total energy properties of the standard cell logic. We have implemented it using weighted Monte Carlo simulations around an existing flow using commercial tools. Variability information is captured by randomly substituting nominal standard cells in the gate-level netlist by standard cell instances that have been characterized for variability in the previous step. Timing/energy characterization of an average-size logic component can take from several hours to several days of CPU time.

D. System level Timing/Energy Analysis

At the top level, the system's yield is evaluated as a function of its specifications, which are expressed as clock frequency and power budget, but also supply voltage range, temperature range, and application code. An instance of this method has been outlined in [15]. The concept can be compared to a block-based SSTA that additionally calculates total energy consumption. The algorithm is analytical and uses only the variability-calibrated results of top-level digital blocks and macros. Therefore it is very fast, and is suitable for scaling to large SoCs.

Apart from these variability-calibrated top-level blocks, the estimation of system-level (parametric) timing/energy requires application information. Typically, different components have different duty cycles during the application execution; not all of them are activated permanently. This information is very important in order to correctly estimate dynamic energy consumption. One way to automatically generate it is to post-process the results of the digital simulation performed in the timing/energy analysis of standard cell logic. The analysis at the integration level is actually performed hierarchically: if a component is so large that digital simulation becomes too time consuming, we apply simulation to sub-components and then obtain the component results using this technique. At the integration stage it can then be applied to bring together all top-level component statistical properties.
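To illustrate the system-level step, the sketch below combines block-level WMC samples into a parametric-yield estimate against a clock-period and energy budget. The "max of block delays / duty-cycle-weighted sum of energies" combination is a simplifying assumption for illustration, not the analytical algorithm of [15], and all numbers are invented.

```python
import numpy as np

def system_parametric_yield(block_delays, block_energies, duty_cycles,
                            ptoir, clock_period, energy_budget):
    """Estimate parametric yield from correlated per-block WMC samples.
    block_delays, block_energies: arrays of shape (n_samples, n_blocks)
    duty_cycles: activity factor per block (application dependent)
    ptoir: weight of each system-level sample."""
    delays = block_delays.max(axis=1)                       # slowest block limits the clock
    energies = (block_energies * duty_cycles).sum(axis=1)   # activity-weighted total energy
    ok = (delays <= clock_period) & (energies <= energy_budget)
    return ptoir[ok].sum() / ptoir.sum()

# Illustrative numbers only:
rng = np.random.default_rng(2)
n, blocks = 2000, 3
d = 8e-9 * (1.0 + 0.1 * rng.standard_normal((n, blocks)))
e = 20e-12 * (1.0 + 0.1 * rng.standard_normal((n, blocks)))
w = rng.random(n)
print(system_parametric_yield(d, e, np.array([1.0, 0.5, 0.1]), w, 10e-9, 70e-12))
```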

VII. RESULTS

To demonstrate the VAM framework we have applied it to a real-life Very Long Instruction Word processor architecture from the portable wireless terminal domain [16], [17]. The circuit has 120K gates. The test-benches (application input) used were derived from Wireless LAN functionality simulations. On the technology side we have used a PTM compact model [18] for the 32nm technology node. We created in fact a hypothetical transistor compact model that corresponds to the low standby power transistor from the ITRS roadmap [19]. The model parameters used were the following: printed channel length 32nm, Leff 13nm, tox 1nm, Vth 300mV for the NMOS and -300mV for the PMOS device, VDD 0.8V and Ron 150Ω. In this experiment the underlying MOSFET variability was a synthetic distribution based on 1 sigma = 20% of Vth, as predicted in the ITRS. In variants of the experiments we added "outliers" to this distribution.

Figure 8 illustrates the joint probability density function of longest path delay and dynamic energy of the complete processor architecture. The "invariable" results (i.e. using nominal device parameters, not assuming any variability) for longest path delay, dynamic energy and leakage power are 8ns, 57pJ and 633uW respectively. When variability is included, the whole system suffers a significant shift of the mean value for all three metrics: longest path delay (+35%), dynamic energy (+35%) (Figure 1) and leakage power (+10%); and a considerable ±3σ spread in each metric: longest path delay (52%), dynamic energy (35%) and leakage power (10%). Figure 8 also shows that this circuit has a significant correlation between variability-induced dynamic energy and path delay; a similar correlation exists with static power.

Figure 8: Probability density function versus dynamic energy consumption and longest path delay for the complete processor. Dynamic energy is calculated for processing one symbol of data, as an average over a 1000-vector test bench. The "invariable" case is shown as reference.
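A joint density like the one in Figure 8 can be estimated from a weighted sample by a two-dimensional histogram that uses the ptoir values as weights; the sketch below uses numpy's histogram2d for this. The sample arrays would come from a system-level WMC experiment; here they are synthetic and the bin count is arbitrary.

```python
import numpy as np

def joint_pdf(delay, dynamic_energy, ptoir, bins=40):
    """Weighted 2D histogram approximating the joint probability density of
    longest path delay and dynamic energy from a WMC sample."""
    hist, delay_edges, energy_edges = np.histogram2d(
        delay, dynamic_energy, bins=bins, weights=ptoir, density=True)
    return hist, delay_edges, energy_edges

# Example with synthetic, correlated samples (illustrative only):
rng = np.random.default_rng(3)
z = rng.standard_normal(5000)
delay = 8e-9 * (1.35 + 0.17 * z + 0.05 * rng.standard_normal(5000))
energy = 57e-12 * (1.35 + 0.12 * z + 0.04 * rng.standard_normal(5000))
weights = rng.random(5000)
pdf, d_edges, e_edges = joint_pdf(delay, energy, weights)
```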


These results were obtained by introducing only random (stochastic) variability on the threshold voltage of the low-standby-power transistors. The introduction of the complete range of systematic and random variations on all the relevant transistor parameters (e.g. W, L, tox, beta, Vth) has a larger impact on the offsets in timing and energy as well as on the variation of each of the system parametric specifications.

Extensions

Having a complete framework for variability impact propagation enables the exploration of the design and manufacturing space. This can include technology options, like evaluating the impact of high-k metal gates or strain; manufacturing options, like restricted design rules; circuit options, like threshold voltage assignment in MTCMOS libraries or different circuit architectures; and architecture options, like memory organization partitioning or the introduction of run-time configurable components [20].

VIII. CONCLUSION

In this paper we proposed a general modeling framework which propagates variability information from the technology level up to the system level. It can be used to evaluate the impact of process variation on the main system parametric specifications: timing and total energy consumption. This enables designers to predict the parametric yield of their chip early in the design cycle.

REFERENCES

[1] X.-W. Lin, B. Nikolic, P. Habitz, R. Radojcic, "Practical Aspects of Coping with Variability: An Electrical View", Tutorial at ACM/IEEE Design Automation Conf., 2006.
[2] Y. Chen, A. Kahng, G. Robins, A. Zelikovsky, "Area Fill Synthesis for Uniform Layout Synthesis", IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 21, no. 10, pp. 1132-1147, 2002.
[3] P. Gupta, F-L. Heng, "Toward a systematic-variation aware timing methodology", Proc. ACM/IEEE Design Automation Conference, pp. 321-326, 2004.
[4] L. Stok, J. Koehl, "Structured CAD: technology closure for modern ASICs", Tutorial at IEEE Design Automation & Test in Europe (DATE), 2004.
[5] E. Jacobs, M. Berkelaar, "Gate sizing using a statistical delay model", Proc. IEEE Design Automation & Test in Europe (DATE), pp. 283-290, 2000.
[6] C. Visweswariah, K. Ranvindran, K. Kalafala, S.G. Walker, S. Narayan, "First-Order Incremental Block-Based Statistical Timing Analysis", Proc. ACM/IEEE Design Automation Conf., pp. 331-336, 2004.
[7] W. Grobman, R. Tian, E. Demircan, R. Wang, C. Yuan, M. Thompson, "Reticle Enhancement Technology: Implications and Challenges for Physical Design", Proc. ACM/IEEE Design Automation Conf., pp. 72-78, 2001.

[8] D. Sylvester, P. Gupta, A.B. Kahng, J. Yang, "Toward performance-driven reduction of the cost of RET-based lithography control", Proc. SPIE, pp. 123-133, 2003.
[9] P. Gupta, A.B. Kahng, "Manufacturing-Aware Physical Design", Proc. IEEE/ACM Intl. Conference on Computer-Aided Design, pp. 681-687, 2003.
[10] R. Rao, A. Devgan, D. Blaauw, D. Sylvester, "Parametric yield estimation considering leakage variability", Proc. ACM/IEEE Design Automation Conference, pp. 442-447, 2004.
[11] M. Ashouei, A. Chatterjee, A. Singh, V. De, T. Mak, "Statistical estimation of correlated leakage power variation and its application to leakage-aware design", Proc. Intl. Conf. on VLSI Design, 2006.
[12] P. Bratley, B. Fox, L. Schrage, "A Guide to Simulation", Springer-Verlag, New York, 1983.
[13] M. Pelgrom et al., "Matching properties of MOS transistors", IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1433-1439, 1989.
[14] D. Nassif, D. Boning, N. Hakim, "The care and feeding of your statistical static timer", IEEE/ACM Intl. Conf. on Computer-Aided Design, pp. 138-139, 2004.
[15] A. Papanikolaou, T. Grabner, M. Miranda, P. Roussel, F. Catthoor, "Yield prediction for architecture exploration in nanometer technology nodes: a model and case study for memory organizations", Workshop on Hardware/Software Co-Design and Intl. System-level Synthesis Symposium (CODES-ISSS), pp. 253-258, 2006.
[16] L. Van der Perre, B. Bougard, J. Craninckx, W. Dehaene, L. Hollevoet, M. Jayapala, P. Marchal, M. Miranda, P. Raghavan, T. Schuster, P. Wambacq, F. Catthoor, P. Vanbekbergen, "Architectures and circuits for software defined radios: scaling and scalability for low cost and low energy", Proc. Intl. Solid-State Circuits Conf., 2007.
[17] A. Papanikolaou, M. Miranda, P. Marchal, B. Dierickx, F. Catthoor, "At tape-out: Can system yield in terms of timing/energy specification be predicted?", IEEE Custom Integrated Circuits Conf., Sep. 2007.
[18] W. Zhao, Y. Cao, "New generation of Predictive Technology Model for sub-45nm design exploration", ISQED, pp. 585-590, 2006.
[19] International Technology Roadmap for Semiconductors, 2005 update, http://public.itrs.net.
[20] A. Papanikolaou, F. Lobmaier, H. Wang, M. Miranda, F. Catthoor, "A system-level methodology for fully compensating process variability impact of memory organizations in periodic applications", Proc. Workshop on Hardware/Software Co-Design and Intl. System-level Synthesis Symposium (CODES-ISSS), pp. 117-122, 2005.
