Adder Tree (12-input → 4-bit) — Schematic-to-Layout Integration Attempt

H. Choi
Digital Circuit Design Course Project

This project documents the design and attempted layout integration of a tree-based compressor that reduces 12 one-bit inputs (XT0–XT11) to a 4-bit output (bS0–bS3), using a hierarchy of Full Adders (FA: 3:2 compressor) and Half Adders (HA: 2:2 compressor).

Status: DRC passed; hierarchical LVS not closed.

Overview

  • Technology: FreePDK45 (45nm educational PDK)
  • Tools: KLayout, ngspice
  • Flow: Cell design (FA/HA) → Cell DRC/LVS → Tree schematic → Tree layout (hierarchical) → Tree DRC → Tree LVS (not closed)

1. Motivation

A straightforward ripple-carry-style accumulation of 12 inputs would introduce a long carry-propagation path. Instead, this design uses a compressor tree, whose parallel reduction stages shorten the critical path so that logic depth grows roughly as O(log N) in the number of inputs N rather than linearly.

Key goals:

  • Minimize worst-case delay (T_worst) of the overall tree
  • Reduce total dynamic energy (E_total) and physical area via cell sizing and placement strategy
  • Validate correctness and feasibility through a full schematic-to-layout verification flow

2. Architecture Overview (Compressor Tree)

Figure: 12-input → 4-bit compressor tree architecture.

2.1 Design Rationale

The compressor tree was selected for:

  • Speed: parallel reduction reduces sequential carry chain length
  • Compression efficiency: staged reduction using FA (3:2) and HA (2:2)
  • Modularity: bottom-up approach (verify FA/HA first, then integrate)

However, this bottom-up strategy also exposed a classic physical design tradeoff: aggressive per-cell transistor sizing improved local delay, but hurt standard-cell regularity (mismatched cell heights / irregular pin locations), increasing routing complexity and making hierarchical LVS closure harder.

2.2 Functional Correctness (Schematic Level)

A full tree schematic was built using exported symbols from the chosen FA/HA cells. Schematic simulations confirmed that the output bits match the expected arithmetic results, with full-swing levels observed on the outputs.
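The same functional check can be mirrored in software. The following is a minimal behavioral sketch in Python, not the project's actual netlist: it assumes ideal FA/HA behavior and a generic column-by-column reduction order, which may differ from the tree topology that was drawn. All function names are hypothetical.

```python
from itertools import product


def half_adder(a, b):
    """Ideal 2:2 compressor: returns (sum, carry)."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Ideal 3:2 compressor: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compress_12_to_4(bits):
    """Reduce 12 one-bit inputs (XT0..XT11) to output bits, LSB first (bS0..bS3)."""
    assert len(bits) == 12
    columns = {0: list(bits)}  # bit weight -> bits still waiting in that column
    out = []
    weight = 0
    while weight in columns:
        col = columns[weight]
        while len(col) > 1:
            if len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
            else:
                s, c = half_adder(col.pop(), col.pop())
            col.insert(0, s)  # sum stays in this column, behind the older bits
            columns.setdefault(weight + 1, []).append(c)  # carry moves up one weight
        out.append(col[0])
        weight += 1
    return out


def verify_all():
    """Exhaustively compare the model against a plain count of ones (4096 vectors)."""
    for bits in product((0, 1), repeat=12):
        out = compress_12_to_4(bits)
        if sum(b << i for i, b in enumerate(out)) != sum(bits):
            return False
    return True
```

Calling `verify_all()` walks all 4096 input combinations, which is small enough to run exhaustively rather than sampling.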

3. Cell Selection (FA / HA)

3.1 Full Adder: FA_36p

Multiple FA topologies were evaluated. The final tree uses FA_36p, which reduces transistor count by using:

  • MUX-based logic decomposition
  • CMOS Transmission Gate (CMOSTG) blocks
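As an illustration of what a MUX-based decomposition can look like, here is a small logic-level sketch in Python. It shows one common textbook formulation (propagate signal P = A xor B selecting between A and Cin for the carry); whether FA_36p uses exactly this decomposition is an assumption, and the function names are hypothetical.

```python
def mux2(sel, d0, d1):
    """Ideal 2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def fa_mux(a, b, cin):
    """Full adder built only from 2:1 muxes and inverted inputs."""
    p = mux2(a, b, 1 - b)        # p = a XOR b
    s = mux2(p, cin, 1 - cin)    # sum = p XOR cin
    cout = mux2(p, a, cin)       # if a != b the carry is cin, otherwise it is a
    return s, cout
```

In a transmission-gate implementation each `mux2` would map to a pair of transmission gates sharing an output node, which is where the transistor-count saving comes from.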

Final sizing (best observed delay point):

  • PMOS W ≈ 100 nm
  • NMOS W ≈ 120 nm

3.2 Half Adder: HA2 (XOR + NAND + inverter)

HA2 uses an XOR + NAND structure with an inverter for signal restoration. The best performance in the tree was obtained with differentiated sizing per block:

  • inverter ≈ 300 nm
  • XOR ≈ 100 nm
  • NAND ≈ 180 nm
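The logic function of this structure can be written out as a short Python sketch. It assumes the inverter follows the NAND to produce the carry (NAND + inverter = AND) while the XOR produces the sum; the exact internal wiring of HA2 is not stated above, so treat this as an illustrative model with hypothetical names.

```python
def xor2(a, b):
    return a ^ b


def nand2(a, b):
    return 1 - (a & b)


def inv(a):
    return 1 - a


def ha2(a, b):
    """Half adder: sum from XOR, carry from NAND followed by an inverter."""
    s = xor2(a, b)
    c = inv(nand2(a, b))
    return s, c


# Truth table as {(a, b): (sum, carry)}
HA2_TABLE = {(a, b): ha2(a, b) for a in (0, 1) for b in (0, 1)}
```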

4. Delay & Energy Measurement

4.1 Results Summary (Schematic-Level Tree)

  • T_worst ≈ 384 ps (the worst case occurred on the first rising transition)
  • E_total ≈ 4.29 × 10⁻¹⁴ J
  • T_worst × E_total ≈ 1.65 × 10⁻²³ J·s
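The delay-energy product follows directly from the two measured values; a quick arithmetic check in Python (variable names are illustrative):

```python
t_worst = 384e-12   # worst-case delay in seconds (384 ps)
e_total = 4.29e-14  # total energy in joules

delay_energy_product = t_worst * e_total  # units: J*s
print(f"{delay_energy_product:.3e} J*s")  # prints 1.647e-23 J*s
```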

Contrary to the initial hypothesis, the worst delay was not always on the falling transition.
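Because the worst case can fall on either edge, each transition has to be measured and compared explicitly. The sketch below shows one way to do this in Python on sampled waveforms. It assumes a 50%-to-50% crossing definition of propagation delay and linearly interpolated samples; the measurement definition actually used in this project is not stated, and all names and numbers here are hypothetical.

```python
def crossing_time(t, v, level, rising):
    """First time the waveform v(t) crosses `level` in the given direction."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed_up = rising and (v0 < level <= v1)
        crossed_down = (not rising) and (v0 > level >= v1)
        if crossed_up or crossed_down:
            # Linear interpolation between the two neighbouring samples
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("no crossing found")


def worst_delay(delays):
    """delays: {transition label: delay in seconds}; returns the largest entry."""
    label = max(delays, key=delays.get)
    return label, delays[label]


# Toy waveforms (normalized to a 1.0 V swing): input rises, output falls.
t = [0.0, 100e-12, 200e-12]
v_in = [0.0, 1.0, 1.0]
v_out = [1.0, 1.0, 0.0]
t_pd = crossing_time(t, v_out, 0.5, rising=False) - crossing_time(t, v_in, 0.5, rising=True)
```

Collecting one such delay per input transition into a dictionary and passing it to `worst_delay` avoids assuming in advance which edge is slowest.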

5. Layout Strategy (Cell → Tree Integration)

5.1 Cell Layout Summary (Completed)

Both FA and HA cells were designed with shared diffusion maximization and consistent power rail placement.

  • FA_36p: 3130 nm × 1445 nm (DRC/LVS PASSED)
  • HA2: 1945 nm × 1695 nm (DRC/LVS PASSED)

5.2 Tree Layout (Hierarchical Assembly)

  • Tree area: 29230 nm × 1695 nm
  • DRC: PASSED
  • LVS: NOT PASSED
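For comparison with other designs it can help to convert the reported dimensions into areas. The following Python sketch does only that arithmetic on the numbers listed in this section; the dictionary and function names are illustrative.

```python
# (width_nm, height_nm) as reported for each block
DIMENSIONS_NM = {
    "FA_36p": (3130, 1445),
    "HA2": (1945, 1695),
    "tree": (29230, 1695),
}


def area_um2(width_nm, height_nm):
    """Rectangle area in square micrometres (1 um^2 = 1e6 nm^2)."""
    return width_nm * height_nm / 1e6


AREAS_UM2 = {name: area_um2(w, h) for name, (w, h) in DIMENSIONS_NM.items()}
# FA_36p ≈ 4.52 um^2, HA2 ≈ 3.30 um^2, tree ≈ 49.54 um^2

# The two cells differ in height, which is the irregularity noted in Section 2.1
HEIGHT_MISMATCH_NM = DIMENSIONS_NM["HA2"][1] - DIMENSIONS_NM["FA_36p"][1]  # 250
```

The tree height equals the HA2 height (1695 nm), so the taller cell sets the row height.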

6. LVS Status: Root Cause Analysis

Even though individual cells passed LVS, the top-level hierarchical LVS did not close.

Primary suspected causes:

  1. Incomplete physical connectivity at top level: some intended schematic connections were represented by labels/pins but not fully realized as continuous metal routing
  2. Power net naming / consistency issues (VDD/VSS/GND): naming and connection conventions were inconsistent across hierarchy levels
  3. Pin definition and net management ambiguity: top-level pins were not mapped consistently to subcell pins
  4. Hierarchy handling (flattened vs. hierarchical LVS): LVS flows are sensitive to hierarchy mismatches between the schematic and the layout

7. Improvement Plan (How to Close LVS Next Time)

Concrete next steps to achieve LVS closure:

  • (A) Enforce explicit top-level metal routing for all subcell interconnects (avoid relying on label-only "virtual" connections)
  • (B) Unify global nets: standardize to VDD / GND naming across schematic, extracted netlists, and layout texts
  • (C) Define pins unambiguously: ensure each subcell uses consistent pin shapes/layers expected by LVS extraction rules
  • (D) Consider flattening: perform LVS on a flattened version of the tree to remove hierarchical ambiguity
  • (E) Standard-cell regularity: constrain FA/HA cell height and rail alignment even at the cost of minor cell-level delay, improving routability and integration stability
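Step (B) is easy to check mechanically before running LVS. The following is a minimal Python sketch, not part of the original flow: it scans `.subckt` headers in a SPICE netlist and reports power nets that appear under more than one spelling. The alias table, the example netlist, and its pin names are all hypothetical, and `+` continuation lines are not handled.

```python
import re

# Hypothetical alias table: spelling -> canonical net name
POWER_ALIASES = {"VDD": "VDD", "VCC": "VDD", "GND": "GND", "VSS": "GND"}


def subckt_pins(netlist_text):
    """Map each subcircuit name to its pin list, read from .subckt header lines."""
    pins = {}
    for line in netlist_text.splitlines():
        m = re.match(r"\s*\.subckt\s+(\S+)\s+(.*)", line, flags=re.IGNORECASE)
        if m:
            pins[m.group(1)] = m.group(2).split()
    return pins


def inconsistent_power_nets(netlist_text):
    """Return {canonical net: spellings found} for nets with more than one spelling."""
    used = {}
    for pin_list in subckt_pins(netlist_text).values():
        for pin in pin_list:
            canonical = POWER_ALIASES.get(pin.upper())
            if canonical:
                used.setdefault(canonical, set()).add(pin)
    return {net: sorted(names) for net, names in used.items() if len(names) > 1}


# Hypothetical netlist fragment: the two cells disagree on the ground name.
EXAMPLE = """
.subckt FA_36p A B CIN S COUT VDD VSS
.ends
.subckt HA2 A B S C VDD GND
.ends
"""
report = inconsistent_power_nets(EXAMPLE)  # {'GND': ['GND', 'VSS']}
```

Spellings are compared case-sensitively on purpose, since a layout text label `vdd` and a schematic net `VDD` may or may not be merged depending on the LVS setup.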

Key takeaway: In physical design, local cell-level optimization must be balanced with system-level regularity and routability.

8. What This Project Demonstrates

  • Built a non-trivial compressor tree from verified transistor-level cells
  • Performed delay/energy extraction based on transition-level measurements
  • Implemented full-custom layout with DRC-clean results at top level
  • Encountered and analyzed a real-world integration challenge: hierarchical LVS closure
  • Extracted actionable next steps reflecting a system-level physical design perspective

9. Technical Stack

FreePDK45, KLayout, ngspice, full-custom layout, DRC/LVS verification, transistor-level design