Preliminary Results

Page contents

VPR parameters

Mapping statistics

Experimental results (complete flow with IECAS numbers before optimization)

Area representation (before optimization)

Delay comparison (IECAS/FPGA'12 before optimization)

Delay/Area comparison (IECAS/FPGA'12 after optimization)

Preliminary results (after optimization)

What configurations shall we use to run VPR?

A set of experiments was designed to study how sensitive the maximum achievable frequency is to some of the packing options. The experimental setup is the following:

- 72 flag variants were used (24 packing options x 3 different seeds).

- 5 VPR benchmarks (the fastest ones) were processed for both AIC and LUT mapping. The run with the default flags is colored in red; runs with the same default packing options but different seeds are shown in a lighter color.
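As a rough illustration of how the 72 variants arise, the sketch below enumerates every (packing option, seed) pair. The option labels and seed values are hypothetical placeholders, since the 24 actual packing-option sets are not listed on this page.

```python
from itertools import product

# Hypothetical labels: the 24 real packing-option sets are not listed here.
packing_options = [f"pack_opt_{i:02d}" for i in range(24)]
# Placeholder seed values (assumption, the actual seeds are not given).
seeds = [1, 2, 3]

# One run per (packing option, seed) pair.
variants = list(product(packing_options, seeds))
print(len(variants))
```

Each benchmark would then be run once per entry of `variants`, for each of the two mappings.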

Click here to download the spreadsheet with the detailed results

Conclusions:

- For AICs we observe a large variability, with a frequency increase of more than 70% in some cases. For example, the same packing options for ch_intrinsics result in 220, 289 or 335 MHz (a more than 50% increase from the slowest to the fastest run) depending only on the seed.

- LUTs show less variability than AICs, but it is still significant (40% in some cases).
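The seed-only variability quoted for ch_intrinsics can be checked with a few lines. This is a minimal sketch that assumes the percentage is measured as the gain of the fastest run over the slowest one; `relative_spread` is a hypothetical helper name, not part of any tool used here.

```python
def relative_spread(frequencies_mhz):
    """Return (max - min) / min: the gain of the best run over the worst."""
    lowest = min(frequencies_mhz)
    highest = max(frequencies_mhz)
    return (highest - lowest) / lowest

# Frequencies reported for ch_intrinsics with identical packing options
# and three different seeds.
ch_intrinsics_mhz = [220, 289, 335]
spread = relative_spread(ch_intrinsics_mhz)
print(f"{spread:.1%}")  # 52.3%
```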


How efficiently are we using the AICs?

A set of experiments was designed to study the utilization of AICs when mapping a set of representative circuits. The experimental setup is the following:

- 17 VPR benchmarks (all but the 2 largest ones).

- For each benchmark and each type of cone, we count the inputs that are not connected to gnd (gnd is only used for bypassing, not for real computation).

For example, the following 6-level cone has 50 of its 64 inputs connected to gnd, leading to a ground occupation of 78.1%.
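The ground occupation figure can be reproduced as follows. This is a sketch under the assumption that a k-level cone exposes 2^k inputs, which is inferred from the example (50 grounded inputs giving 78.1% implies 64 inputs for 6 levels); `ground_occupation` is a hypothetical helper name.

```python
def ground_occupation(levels, grounded_inputs):
    """Fraction of a cone's inputs that are tied to gnd.

    Assumes a cone with `levels` levels has 2**levels inputs.
    """
    total_inputs = 2 ** levels
    if not 0 <= grounded_inputs <= total_inputs:
        raise ValueError("grounded_inputs must be between 0 and 2**levels")
    return grounded_inputs / total_inputs

occupation = ground_occupation(levels=6, grounded_inputs=50)
print(f"{occupation:.1%}")  # 78.1%
```

Counting cones per occupation value and per cone type then gives the per-benchmark distribution.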

For a given benchmark, we can see the number of cones with a particular ground occupation per type of cone:

Click here to download the results for each benchmark

Conclusions:

- We observe a substantial number of large AICs that are largely underutilized.

- We should explore some local restructuring to see whether the size of the cones can be reduced while keeping the same functionality.


LUT vs AIC reconfigurability using IECAS delay numbers

A set of circuits was mapped, then placed and routed (P&R) using VPR on 2 different architectures: AIC and LUT. The experimental setup is the following:

- 20 MCNC and 15 VPR (all but the 4 largest ones) benchmarks.

- Each architecture was annotated with the delay numbers extracted from the full custom design of the LUT and AIC logic cluster, respectively.

The maximum frequency in the LUT architecture is on average 67% higher than in the AIC architecture:

The delay of the critical paths is distributed between logic delay (logic + internal routing) and net delay (external routing) as follows:

The average external routing wirelength is 25% higher in the AIC architecture:

In addition, the AIC architecture requires on average 7% more logic clusters than the LUT architecture:


Representation of the LUT/AIC cluster area (IECAS numbers)

The area representation in terms of the number of minimum-width transistors.

Percentage area decomposition of the LUT/AIC cluster.


Delay comparison (IECAS vs FPGA'12)

The IECAS implementation is in 40 nm while the FPGA'12 implementation used 90 nm, so the absolute delays are not directly comparable. The table below nevertheless lists the delay of each component in the AIC cluster (in ps). The comparison aims at highlighting any striking differences in ratio between the two implementations (e.g., the crossbar delays).

Component	IECAS delay (ps)	FPGA'12 delay (ps)	FPGA'12/IECAS
2-AIC	93	167	1.80
3-AIC	135	248	1.84
4-AIC	173	331	1.91
5-AIC	214	417	1.95
6-AIC	260	496	1.91
Input xbar	283	80	0.28
Middle xbar	196	75	0.38
Output xbar	282	50	0.18
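The ratio column can be recomputed from the two delay columns. The sketch below does so with the values from the table; the dictionary layout and variable names are illustrative choices, not taken from any existing script.

```python
# (IECAS delay, FPGA'12 delay) in ps, copied from the table above.
delays_ps = {
    "2-AIC": (93, 167),
    "3-AIC": (135, 248),
    "4-AIC": (173, 331),
    "5-AIC": (214, 417),
    "6-AIC": (260, 496),
    "Input xbar": (283, 80),
    "Middle xbar": (196, 75),
    "Output xbar": (282, 50),
}

# FPGA'12 delay divided by IECAS delay, rounded to two decimals.
ratios = {
    name: round(fpga12 / iecas, 2)
    for name, (iecas, fpga12) in delays_ps.items()
}

for name, ratio in ratios.items():
    print(f"{name}\t{ratio:.2f}")
```

The AIC ratios stay within a narrow band (1.80 to 1.95), while the crossbar ratios are all below 0.4, which is the discrepancy the comparison is meant to expose.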