Select Module Design Specifications
The select stage sits between the Scheduler and the Compute Engine (CE). Its main purpose is to avoid RAW hazards between clusters and to resolve structural hazards caused by the limited number of units available. The select unit has two stages, SE0 and SE1, each of which takes one clock cycle. A further stage, CE0, overlaps the Select module and the CE module.
The select module takes in 6 instructions per clock cycle. Each instruction is placed in a FIFO queue. There are 6 queues, and each holds 6 instructions.
Instruction FIFO queues
FIFO 6 | I36 | I35 | I34 | I33 | I32 | I31
FIFO 5 | I30 | I29 | I28 | I27 | I26 | I25
FIFO 4 | I24 | I23 | I22 | I21 | I20 | I19
FIFO 3 | I18 | I17 | I16 | I15 | I14 | I13
FIFO 2 | I12 | I11 | I10 | I9 | I8 | I7
FIFO 1 | I6 | I5 | I4 | I3 | I2 | I1
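The queue structure described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design: the class name `InstructionQueues` and its methods are hypothetical, and because the text does not say how the 6 incoming instructions are distributed among the queues, the caller chooses the queue index explicitly.

```python
from collections import deque

NUM_QUEUES = 6    # from the text: 6 FIFO queues
QUEUE_DEPTH = 6   # from the text: each queue holds 6 instructions


class InstructionQueues:
    """Hypothetical software model of the six bounded instruction FIFOs."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]

    def push(self, queue_index, instruction):
        """Append at the tail; return False if the queue is already full."""
        q = self.queues[queue_index]
        if len(q) >= QUEUE_DEPTH:
            return False
        q.append(instruction)
        return True

    def pop(self, queue_index):
        """Remove and return the oldest instruction, or None if empty."""
        q = self.queues[queue_index]
        return q.popleft() if q else None
```

A full queue rejects further pushes, which in hardware would correspond to some form of back-pressure; the exact stall mechanism is not described in this section.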
Each of the four clusters in the CE holds 128 registers in its Register File, for a total of 512 registers. Each register is assigned a 9-bit binary number. Because 512 = 4 × 128, the highest 2 bits identify the CE cluster/unit to which a register belongs, and the remaining 7 bits select the register within that unit.
Register Numbers and Corresponding Units
Register Number | Highest 2 bits | Compute Engine (CE)
0 - 127 | 00 | A Unit RF
128 - 255 | 01 | B Unit RF
256 - 383 | 10 | C Unit RF
384 - 511 | 11 | M Unit RF
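The mapping in the table above amounts to splitting the 9-bit register number into a 2-bit unit field and a 7-bit offset. A minimal sketch follows; the function name `decode_register` and the constants are illustrative and not part of the specification.

```python
OFFSET_BITS = 7                    # 128 registers per unit = 2**7
UNIT_NAMES = ["A", "B", "C", "M"]  # indexed by the highest 2 bits: 00, 01, 10, 11


def decode_register(reg):
    """Split a 9-bit register number into (unit name, offset within the unit)."""
    if not 0 <= reg < 512:
        raise ValueError("register number must be in 0..511")
    unit = reg >> OFFSET_BITS                 # highest 2 bits
    offset = reg & ((1 << OFFSET_BITS) - 1)   # lowest 7 bits
    return UNIT_NAMES[unit], offset
```

For example, register 130 is binary 01 0000010, so `decode_register(130)` returns `("B", 2)`.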
There are three vectors, READY, SCHEDULING, and PENDING, each with 512 one-bit entries (512 rows, one column). Each element is either 0 or 1, and the index is the register number.
The READY vector is initialized to all x’s. The SCHEDULING and PENDING vectors are both initialized to zeros.
Ready, Scheduling, and Pending Vectors
(Table: READY, SCHEDULING, and PENDING bits for each register, starting with the A Unit registers.)
READY bit
The READY bit indicates that the contents of a register are valid. At SE0, the READY bit of an instruction's destination register is set to 0, because the result is not yet available. When the DONE signal from the CE is received and the contents of the destination register become valid, the bit is set to 1.
In other words:
@ SE0:
READY [DINST.DEST] = 0
@ WB stage of the CE:
READY [DINST.DEST] = 1
SCHEDULING bit
In the first stage, SE0, the SCHEDULING bit of the destination register is set to 1 when the instruction is read. Once the instruction has been sent out successfully and the corresponding CE resource is not busy, the SCHEDULING bit of the destination register is set back to 0.
@ SE0:
SCHEDULING [DINST.DEST] = 1
@ CE0: //if sent & !busy
SCHEDULING [DINST.DEST] = 0
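The READY and SCHEDULING updates above can be modelled together as a small scoreboard. The sketch below is a software illustration only: the class `Scoreboard`, its method names, and the `sent`/`busy` arguments are hypothetical, and `None` stands in for the initial x value of the READY vector.

```python
NUM_REGS = 512


class Scoreboard:
    """Hypothetical model of the READY and SCHEDULING vectors."""

    def __init__(self):
        self.ready = [None] * NUM_REGS   # None models the initial 'x'
        self.scheduling = [0] * NUM_REGS

    def on_se0(self, dest):
        # @ SE0: the destination is no longer valid and is being scheduled.
        self.ready[dest] = 0
        self.scheduling[dest] = 1

    def on_ce0(self, dest, sent, busy):
        # @ CE0: clear SCHEDULING only if the instruction was sent
        # and the CE resource was not busy.
        if sent and not busy:
            self.scheduling[dest] = 0

    def on_writeback(self, dest):
        # @ WB stage of the CE: the destination now holds valid contents.
        self.ready[dest] = 1
```

Calling `on_se0(5)`, then `on_ce0(5, sent=True, busy=False)`, then `on_writeback(5)` walks register 5 through the full sequence described in the two subsections above.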
PENDING bit
This bit lets the CE know when the contents of a register held by another unit are needed. For example,
RA1 ← RA2 + RB1
The arithmetic operation takes place in the A unit of the CE, but the source register RB1 is located in the Register File of the B unit. A signal is therefore needed to tell the B unit in advance to forward that register to the A unit. The signal is simply the address of RB1. Before forwarding is performed, the PENDING bit is set to 1. The PENDING bit is thus set or reset based on the source registers of an instruction. The register must also be ready: its READY bit has to be set before any forwarding is issued. The complete check for forwarding is:
IF (READY & PENDING)
SEND it to be forwarded //send the address
The pending bit is cleared when forwarding is finished.
@ SE0: //setting to zero
IF forwarding is needed
PENDING [fwd] = 0
@ CE0: //setting to 1
IF (DINST.SRC1.POS != DINST.DEST.POS)
PENDING [DINST.SRC1] = 1
IF (DINST.SRC2.POS != DINST.DEST.POS)
PENDING [DINST.SRC2] = 1
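The PENDING logic can be sketched as follows. This assumes that `.POS` in the pseudocode refers to the unit a register belongs to, i.e. the highest 2 bits of its 9-bit number; the function names are hypothetical, and the sketch does not model which pipeline stage performs each step.

```python
NUM_REGS = 512


def unit_of(reg):
    """Unit index from the highest 2 bits of a 9-bit register number."""
    return reg >> 7


def mark_pending(pending, dest, sources):
    """Set PENDING for every source that lives in a different unit than dest."""
    for src in sources:
        if unit_of(src) != unit_of(dest):
            pending[src] = 1


def registers_to_forward(ready, pending):
    """Addresses that pass the IF (READY & PENDING) check."""
    return [r for r in range(NUM_REGS) if ready[r] == 1 and pending[r] == 1]


def finish_forwarding(pending, reg):
    """Clear PENDING once the register has been forwarded."""
    pending[reg] = 0
```

For the example RA1 ← RA2 + RB1, taking RA1 = 1, RA2 = 2 and RB1 = 129 (register 1 of the B unit), only entry 129 is marked pending, and it is offered for forwarding only once its READY bit is 1.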
Up to 4 forwarding signals can be sent in each clock cycle. The predetermined order is:
- LOAD
- B unit – ALU
- A unit
- C unit
- B unit – Branch
- STORE
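One way to read the list above is as a fixed priority order from which at most 4 requests are granted per cycle. The sketch below works under that assumption, plus the assumption of at most one request per source per cycle; the source labels and the function `select_forwards` are illustrative names, not part of the specification.

```python
# Order taken from the list above; the string labels are illustrative.
FORWARD_ORDER = ["LOAD", "B-ALU", "A", "C", "B-BRANCH", "STORE"]
MAX_FORWARDS_PER_CYCLE = 4


def select_forwards(requests):
    """Pick up to 4 forwarding requests in the predetermined order.

    requests maps a source label to the register address to forward.
    """
    selected = []
    for source in FORWARD_ORDER:
        if len(selected) == MAX_FORWARDS_PER_CYCLE:
            break
        if source in requests:
            selected.append((source, requests[source]))
    return selected
```

With requests from all six sources, only LOAD, B-ALU, A, and C would be granted in that cycle under this reading; how the remaining requests are handled in later cycles is not described in this section.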