Crossbar design
The paper defining the full, minimal and sparse crossbars.
The paper proposing the crossbar area estimation formula (equations 3 & 4).
AIC crossbar
Each AIC node is an AND/NAND gate, and both of these gates have symmetric inputs (e.g. a.b = b.a).
This symmetry allows us to reduce the crossbar area by reducing the number of switches used.
For each AIC input, only one switch can be removed without affecting the input connectivity. Removing additional switches makes some input combinations impossible to implement.
The full crossbar flexibility can still be maintained by simply swapping the nodes' inputs whenever a connection is not possible.
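The input-swapping idea can be sketched as follows. This is only an illustration under stated assumptions: the function `route_pair` and the choice of which switch is removed on each gate input (source 0 for input a, source 1 for input b) are hypothetical and not taken from the figure. The check at the end enumerates every ordered pair of distinct sources.

```python
from itertools import permutations

def route_pair(allowed_a, allowed_b, x, y):
    """Return (signal_on_a, signal_on_b) for a symmetric 2-input gate, or None."""
    if x in allowed_a and y in allowed_b:
        return (x, y)
    if y in allowed_a and x in allowed_b:
        return (y, x)  # swapped: valid because AND/NAND is commutative
    return None

N = 6
sources = set(range(N))

# Hypothetical depopulation: one switch removed per gate input,
# and the two removed switches belong to different sources.
allowed_a = sources - {0}
allowed_b = sources - {1}

# Collect every ordered pair of distinct sources that cannot be routed.
unroutable = [
    (x, y)
    for x, y in permutations(sources, 2)
    if route_pair(allowed_a, allowed_b, x, y) is None
]
print(unroutable)
```

In this sketch the two removed switches are on different sources, so at least one of the two orderings is always available for a pair of distinct signals.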
The following figure shows an example of such a crossbar with N=6 crossbar inputs and M=4 crossbar outputs, feeding a 2-AIC.
The AIC crossbar has N=112 inputs and M=192 outputs. Using the area estimation formula, we can compute the following transistor count:
- This crossbar: 47 424 transistors
- Full crossbar: 47 616 transistors
- Difference (area reduction): around 0.4%
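The reduction figure can be checked with a few lines of arithmetic. The sketch below uses only the two transistor counts quoted above; the helper name `area_reduction_percent` is illustrative, and the area estimation formula itself is not reproduced here.

```python
def area_reduction_percent(full, reduced):
    """Relative saving of the reduced crossbar with respect to the full one, in percent."""
    return 100.0 * (full - reduced) / full

aic_full = 47_616     # full crossbar, from the list above
aic_reduced = 47_424  # AIC crossbar with one switch removed per AIC input

saved = aic_full - aic_reduced
reduction = area_reduction_percent(aic_full, aic_reduced)
print(f"{saved} transistors saved, {reduction:.2f}% reduction")
# prints "192 transistors saved, 0.40% reduction"
```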
Note: The area of this crossbar can be further reduced but at the expense of some flexibility.
LUT/ALM crossbar
Brief architecture description:
- The cluster consists of 10 ALMs
- Each ALM contains four 4-input LUTs (some of which are composed of two 3-input LUTs). These LUTs share inputs, as can be seen in the cluster architecture. The ALM has 8 inputs in total.
Note: If the ALM consisted of one big 8-input LUT, one could assume that the order of the inputs is irrelevant, and a minimal crossbar could be used while maintaining full flexibility. However, the ALM consists of several LUTs sharing different inputs, which adds constraints.
If we consider each 3-LUT:
The LUT has full flexibility, so the input order (or assignment) is not important: the LUT configuration can be adapted to whatever order the inputs arrive in. A minimal crossbar can therefore guarantee that any of the cluster inputs can reach the LUT. The figure below shows a minimal crossbar for two 3-LUTs, assuming that the cluster has 6 inputs.
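As an illustration of why a minimal crossbar is enough for an order-insensitive LUT, the sketch below builds one possible minimal pattern for a single 3-LUT with 6 cluster inputs and checks it exhaustively. The banded layout (pin i connected to inputs i through i + n - k) is an assumption made for this example and is not necessarily the pattern drawn in the figure; `minimal_crossbar` and `route` are hypothetical helper names.

```python
from itertools import combinations

def minimal_crossbar(n, k):
    """Pin i (0-based) gets switches to cluster inputs i .. i + n - k."""
    return [set(range(i, i + n - k + 1)) for i in range(k)]

def route(crossbar, signals):
    """Assign k distinct signals to the k LUT pins, or return None.

    The LUT does not care about input order, so the signals are sorted
    and the i-th smallest signal is placed on pin i.
    """
    assignment = sorted(signals)
    if all(s in pins for s, pins in zip(assignment, crossbar)):
        return assignment
    return None

n, k = 6, 3
xbar = minimal_crossbar(n, k)
switches = sum(len(pins) for pins in xbar)  # k * (n - k + 1) switches

# Every choice of 3 distinct cluster inputs must be routable to the LUT.
all_routable = all(
    route(xbar, combo) is not None for combo in combinations(range(n), k)
)
print(switches, n * k, all_routable)
```

Under this assumed layout each 3-LUT needs 12 switches instead of the 18 of a full crossbar, and the LUT configuration absorbs the resulting input permutation.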
Now considering the entire ALM:
The inputs shared among the LUTs must be taken into consideration. Inputs e0, e1, f0 and f1 are not shared among the 3-LUTs/4-LUTs, and some of them are used through the multiplexers. These inputs should be given full flexibility, i.e. connected to all cluster inputs. Inputs a, b, c and d, however, are shared (in particular combinations). So the crossbar for the ALM becomes as follows:
Using the area estimation formula on the whole cluster:
- This crossbar: 15 120 transistors
- Full crossbar: 15 200 transistors
- Difference (area reduction): around 0.5%
Note: A smaller crossbar may exist, but this is the safest design we currently have that maintains full capacity.
AIC/ALM crossbar comparison:
The AIC crossbar is around 3.1x bigger than the ALM crossbar (47 424 vs. 15 120 transistors).
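The size ratio follows directly from the transistor counts given in the two sections above. A minimal sketch of the comparison, where the dictionary names are illustrative only:

```python
# Transistor counts quoted in the AIC and LUT/ALM sections.
aic = {"reduced": 47_424, "full": 47_616}
alm = {"reduced": 15_120, "full": 15_200}

# AIC-to-ALM crossbar size ratio, for the reduced and the full variants.
ratios = {kind: aic[kind] / alm[kind] for kind in aic}
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.2f}x")
```

The ratio is essentially the same for the reduced and the full variants, since both reductions are well under 1%.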