Difference between revisions of "Select"


Latest revision as of 02:56, 24 January 2013

Block Diagram

SelectBlockDiagram.jpg

Design Specifications

The select stage sits between the Scheduler and the Compute Engine (CE). Its main purpose is to avoid RAW hazards between clusters and to handle structural hazards caused by the limited number of units available. The select unit has two stages, SE0 and SE1, and each stage takes one clock cycle. A further stage, CE0, overlaps the Select module and the CE module. The select module takes in 4 instructions per clock cycle and puts each one into a FIFO queue. There are 4 queues, and each holds 8 instructions.

Instruction FIFO queues

FIFO 0   I29  I25  I21  I17  I13  I9   I5  I1
FIFO 1   I30  I26  I22  I18  I14  I10  I6  I2
FIFO 2   I31  I27  I23  I19  I15  I11  I7  I3
FIFO 3   I32  I28  I24  I20  I16  I12  I8  I4

Select Module        SE0, SE1
Overlapping Module   CE0
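
The queue layout above can be modelled in a few lines of Python. This is only a sketch: the names `make_fifos` and `accept` are hypothetical, and the stall policy (reject the whole group when a target queue is full) is an assumption, not something the design states.

```python
from collections import deque

NUM_FIFOS = 4    # queues in the select module (from the text)
FIFO_DEPTH = 8   # entries per queue (from the text)


def make_fifos():
    """Create the empty instruction queues."""
    return [deque() for _ in range(NUM_FIFOS)]


def accept(fifos, group):
    """Push one cycle's group of up to NUM_FIFOS instructions, one per queue.

    Returns False and pushes nothing if any target queue is already full.
    This all-or-nothing stall policy is an assumption of the sketch.
    """
    if len(group) > NUM_FIFOS:
        raise ValueError("at most %d instructions per cycle" % NUM_FIFOS)
    if any(len(fifos[i]) >= FIFO_DEPTH for i in range(len(group))):
        return False
    for i, inst in enumerate(group):
        fifos[i].append(inst)
    return True


# Eight cycles of four instructions each reproduce the table: I1..I32.
fifos = make_fifos()
for cycle in range(8):
    first = 4 * cycle + 1
    accept(fifos, ["I%d" % n for n in range(first, first + 4)])
```

With this fill order, instruction I1 sits at the head of FIFO 0 and I32 at the tail of FIFO 3, matching the table.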

Each of the five clusters in the CE holds 128 registers in its Register File, for a total of 640 registers. A 10-bit binary number is assigned to each register. The highest 3 bits indicate the CE cluster/unit to which the register belongs, and the remaining 7 bits select one of the 128 registers within that unit.

Register Assignments

Register Number   Highest 3 bits   Compute Engine (CE)
0 - 127           000              A Unit RF
128 - 255         001              B Unit RF
256 - 383         010              C Unit RF
384 - 511         011              L Unit RF
512 - 639         100              S Unit RF
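
The table can be read as a simple bit split. The sketch below assumes a 7-bit offset inside each unit, which is what 128 registers per unit implies; the function name `decode_register` and the one-letter unit labels are illustrative only.

```python
OFFSET_BITS = 7                      # 2**7 = 128 registers per unit
UNITS = ["A", "B", "C", "L", "S"]    # unit codes 000..100 from the table


def decode_register(reg):
    """Split a register number into (unit letter, offset within the unit)."""
    if not 0 <= reg < (len(UNITS) << OFFSET_BITS):   # valid range is 0..639
        raise ValueError("register number out of range: %r" % reg)
    unit_code = reg >> OFFSET_BITS                   # the highest bits
    offset = reg & ((1 << OFFSET_BITS) - 1)          # the low 7 bits
    return UNITS[unit_code], offset
```

For example, `decode_register(384)` gives `("L", 0)`, the first register of the L Unit RF.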

There are three vectors, READY, SCHEDULING, and PENDING, each with 640 rows and one column. Each element is either 0 or 1, and the row index is the register number. The READY vector is initialized to all x’s (unknown). The SCHEDULING and PENDING vectors are both initialized to zeros.

Register   Ready   Scheduling   Pending
0          x       0            0
...        ...     ...          ...
127        x       0            0
128        x       0            0
...        ...     ...          ...
255        x       0            0
256        x       0            0
...        ...     ...          ...
383        x       0            0
384        x       0            0
...        ...     ...          ...
511        x       0            0
512        x       0            0
...        ...     ...          ...
639        x       0            0

READY bit

The READY bit indicates that the contents of a register are ready to be used. While the destination register of an instruction is not yet valid, its READY bit is 0. When the DONE signal from the CE is received and the content of the destination register becomes valid, the bit is set to 1. In other words:

@ SE0:
      READY [DINST.DEST] = 0
@ WB stage of the CE:
      READY [DINST.DEST] = 1
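
As a rough software model of these two updates (not the RTL), the READY vector can be kept as a list. Here `None` stands in for the unknown `x` value, and the function names are hypothetical.

```python
NUM_REGS = 640   # one READY bit per register
X = None         # stand-in for the unknown 'x' initial value

ready = [X] * NUM_REGS


def mark_not_ready(dest):
    """@ SE0: the destination register is about to be overwritten."""
    ready[dest] = 0


def mark_ready(dest):
    """@ WB stage of the CE: the destination register now holds a valid value."""
    ready[dest] = 1
```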

SCHEDULING bit

At the first stage, SE0, the scheduling bit for the destination register is set to 1 upon reading. When the instruction has been successfully sent out and the corresponding CE resource is not busy, the scheduling bit of the destination register is set to 0.

@ SE0:
        SCHEDULING [DINST.DEST] = 1
@ CE0: //if sent & !busy
        SCHEDULING [DINST.DEST] = 0
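
The same kind of model works for the SCHEDULING bit. In this sketch the `sent` and `busy` arguments are assumed to be per-instruction flags supplied by the caller; how they are produced in hardware is not covered here.

```python
NUM_REGS = 640

scheduling = [0] * NUM_REGS   # initialized to all zeros


def mark_scheduling(dest):
    """@ SE0: the instruction writing `dest` has been read."""
    scheduling[dest] = 1


def clear_scheduling(dest, sent, busy):
    """@ CE0: clear the bit only if the instruction was sent and the resource is free."""
    if sent and not busy:
        scheduling[dest] = 0
```

If the CE resource is busy, the bit simply stays at 1 until a later cycle in which the instruction is sent.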

PENDING bit

This bit is required to let the CE know when the content of a register from an external source is needed. For example,

     RA1 ← RA2 + RB1 

The arithmetic operation takes place in the AUnit of the CE, but the source register RB1 is located in the Register File of the BUnit. The BUnit therefore needs a signal, sent in advance, that tells it to forward that register to the AUnit. The signal is simply the address of RB1. Before the forwarding is performed, the PENDING bit is set to 1, so the pending bit is set and reset based on the source registers. The register also has to be ready: the READY bit must be set before any forwarding is issued. The complete check for forwarding is performed as follows:

IF (READY & PENDING)
        SEND it to be forwarded //send the address
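
A minimal sketch of this check over whole vectors is shown below. The function name `forward_requests` is made up for the example, and the vectors are assumed to be plain lists in which `None` represents an unknown `x` entry.

```python
def forward_requests(ready, pending):
    """Return the addresses of registers whose READY and PENDING bits are both 1."""
    return [
        reg
        for reg, (r, p) in enumerate(zip(ready, pending))
        if r == 1 and p == 1
    ]
```

An unknown (`None`) READY entry never qualifies, so a register that has not been written yet is not forwarded.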

The difficulty with waiting until (READY & PENDING)[dinst.src1] is true is that an instruction can never be sent and have its forwarding done in the same cycle, even if the register is ready, because it takes one cycle to set the pending bit.

One solution is to set the FWD value as soon as the register is READY. Since many instructions may compete for the forwarding port (a structural hazard), the select stage (SE1) should decide which instruction gets its value forwarded. Once an instruction has had a value forwarded, the pending bit should be cleared.


The pending bit is cleared when forwarding is finished.

@ CE0:
        // set to 1
        IF (DINST.SRC1.id != DINST.DEST.id)
            PENDING [DINST.SRC1] = 1
        IF (DINST.SRC2.id != DINST.DEST.id)
            PENDING [DINST.SRC2] = 1
        // set to 0 whenever a forwarding is done
        PENDING [fwd] = 0
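
The pseudocode can be exercised with the `RA1 ← RA2 + RB1` example. This sketch assumes that `.id` means the unit a register belongs to (register number shifted right by 7 bits) and that RA1, RA2 and RB1 map to register numbers 1, 2 and 129. Both are assumptions, as are the function names.

```python
OFFSET_BITS = 7   # assumed: 128 registers per unit
NUM_REGS = 640


def unit_id(reg):
    """Assumed meaning of '.id': the unit that owns the register."""
    return reg >> OFFSET_BITS


def set_pending(pending, dest, srcs):
    """@ CE0: mark every source that lives in a different unit than the destination."""
    for src in srcs:
        if unit_id(src) != unit_id(dest):
            pending[src] = 1


def clear_pending(pending, fwd):
    """Clear the bit once register `fwd` has been forwarded."""
    pending[fwd] = 0


pending = [0] * NUM_REGS
RA1, RA2, RB1 = 1, 2, 129          # hypothetical register numbers
set_pending(pending, RA1, [RA2, RB1])
```

Only RB1 is marked pending here, because RA2 already lives in the same unit as the destination.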

Up to 4 forwarding signals can be sent in each clock cycle.

# AUNIT  - se_aunit_fwd_pos
# BUNIT0 - se_bunit_fwd_pos
# BUNIT1 
# CUNIT  - se_cunit_fwd_pos
# LUNIT  - se_lunit_fwd_pos
# SUNIT0 - se_sunit_fwd_pos
# SUNIT1

How to Compile

To compile any of the Select modules, go to the Select directory and run:

rake fpga:<ModuleName> full=1 v=1
rake asic:<ModuleName> full=1 v=1

This will create a script.

./synplify_ModuleName.sh 

Run the shell script to create an srr file in the synplify directory, named ModuleName.srr. Open this file in any editor and look for errors and warnings. For timing information, search for "MHz". To list the reported errors without opening the file, run the following command in the Select directory.

grep @E synplify/ModuleName.srr 
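
If line numbers are wanted alongside the matches, a small script can do the same search. This is a sketch only: `scan_srr` is a hypothetical helper, and it simply looks for the `@E` and `MHz` substrings mentioned above without assuming anything else about the report format.

```python
def scan_srr(path):
    """Collect (line number, text) pairs for error lines and timing lines."""
    errors, timing = [], []
    with open(path, encoding="utf-8", errors="replace") as report:
        for lineno, line in enumerate(report, start=1):
            text = line.rstrip()
            if "@E" in text:
                errors.append((lineno, text))
            if "MHz" in text:
                timing.append((lineno, text))
    return errors, timing
```

It could be called as `scan_srr("synplify/ModuleName.srr")` from the Select directory.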

--Elnaz 22:14, 27 March 2009 (PDT)