Difference between revisions of "Select"
(→PENDING bit) |
(→Design Specifications) |
||
(16 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | == | + | == Block Diagram == |
+ | [[Image:SelectBlockDiagram.jpg|500 px]] | ||
+ | |||
+ | == Design Specifications == | ||
The select stage is between the Scheduler and the Compute Engine (CE). The main purpose of this unit is to avoid RAW hazards between clusters and to enforce structural hazards due to limited amount of units available. The select unit has two stages: SE0 and SE1. Each stage is processed in one clock cycle. There is also an overlapping stage of CE0 between the Select module and CE module. | The select stage is between the Scheduler and the Compute Engine (CE). The main purpose of this unit is to avoid RAW hazards between clusters and to enforce structural hazards due to limited amount of units available. The select unit has two stages: SE0 and SE1. Each stage is processed in one clock cycle. There is also an overlapping stage of CE0 between the Select module and CE module. | ||
− | The select module takes in | + | The select module takes in 4 instructions per clock cycle. Each instruction is put into a FIFO queue. There are 4 queues and each hold 8 instructions. |
{| | {| | ||
Line 7: | Line 10: | ||
{| border="1" width="300pt" style="height:150px" cellspacing="0" | {| border="1" width="300pt" style="height:150px" cellspacing="0" | ||
|+ Instruction FIFO queues | |+ Instruction FIFO queues | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
|- align="center" | |- align="center" | ||
− | !style="background:#efefef;"| FIFO | + | !style="background:#efefef;"| FIFO 0 |
− | | | + | | I29 || I25 || I21 || I17 || I13 || I9 || I5 || I1 |
|- align="center" | |- align="center" | ||
− | !style="background:#efefef;"| FIFO | + | !style="background:#efefef;"| FIFO 1 |
− | | | + | | I30 || I26 || I22 || I18 || I14 || I10 || I6 || I2 |
|- align="center" | |- align="center" | ||
!style="background:#efefef;"| FIFO 2 | !style="background:#efefef;"| FIFO 2 | ||
− | | | + | | I31 || I27 || I23 || I19 || I15 || I11 || I7 || I3 |
|- align="center" | |- align="center" | ||
− | !style="background:#efefef;"| FIFO | + | !style="background:#efefef;"| FIFO 3 |
− | | | + | | I32 || I28 || I24 || I20 || I16 || I12 || I8 || I4 |
|} | |} | ||
+ | |||
+ | <!-- SELECT stages--> | ||
| style="text-align:center"| | | style="text-align:center"| | ||
{| border="1" width="300pt" cellspacing="0" | {| border="1" width="300pt" cellspacing="0" | ||
Line 34: | Line 33: | ||
|} | |} | ||
+ | <!-- SELECT third stage --> | ||
| style="text-align:center"| | | style="text-align:center"| | ||
{| style="border:2px dashed blue" width="300pt" cellspacing="0" | {| style="border:2px dashed blue" width="300pt" cellspacing="0" | ||
Line 41: | Line 41: | ||
|} | |} | ||
|} | |} | ||
− | Each of the four clusters in the CE holds 128 registers in its Register File. There are a total of 512 registers. A 9-bit binary number is assigned to each register. Since there are only | + | Each of the four clusters in the CE holds 128 registers in its Register File. There are a total of 512 registers. A 9-bit binary number is assigned to each register. Since there are only 640 registers, the highest 3-bits are indicative of the CE cluster/unit to which they belong. |
− | {| border="1" cellspacing="0" | + | {| border="1" align="center" cellspacing="0" |
− | |+ Register | + | |+ Register Assignments |
|- | |- | ||
!style="background:#efefef;"| Register Number | !style="background:#efefef;"| Register Number | ||
− | !style="background:#efefef;"| Highest | + | !style="background:#efefef;"| Highest 3 bits |
!style="background:#efefef;"| Compute Engine (CE) | !style="background:#efefef;"| Compute Engine (CE) | ||
|- align="center" | |- align="center" | ||
− | |0 - 127 || | + | |0 - 127 || 000 || A Unit RF |
|- align="center" | |- align="center" | ||
− | |128 - 255 || | + | |128 - 255 || 001 || B Unit RF |
|- align="center" | |- align="center" | ||
− | |256 - 383 || | + | |256 - 383 || 010 || C Unit RF |
|- align="center" | |- align="center" | ||
− | |384 - 511 || | + | |384 - 511 || 011 || L Unit RF |
+ | |||
+ | |- align="center" | ||
+ | |512 - 639 || 100 || S Unit RF | ||
|} | |} | ||
− | There are three vectors called READY, SCHEDULING, and PENDING and each have | + | There are three vectors called READY, SCHEDULING, and PENDING and each have 640 rows with one column. The value of each element is either 0 or 1. The horizontal index is the register name/position. |
The READY vector is initialized to all x’s. The SCHEDULING and PENDING vectors are both initialized to zeros. | The READY vector is initialized to all x’s. The SCHEDULING and PENDING vectors are both initialized to zeros. | ||
+ | {| | ||
+ | <!-- CE Registers --> | ||
+ | | style="text-align:center" width="100pt"| | ||
+ | {| border="0" cellspacing="0" | ||
+ | |+ Ready | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 127 | ||
+ | |- | ||
+ | | 128 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 255 | ||
+ | |- | ||
+ | | 256 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 383 | ||
+ | |- | ||
+ | | 384 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 511 | ||
+ | |- | ||
+ | | 512 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 639 | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | <!-- The Ready Vector --> | ||
+ | | style="text-align:center" width="100pt"| | ||
+ | {| border="1" cellspacing="0" | ||
+ | |+ Ready | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | x | ||
+ | |- | ||
+ | |} | ||
+ | <!-- The Scheduling Vector --> | ||
+ | | style="text-align:center" width="100pt"| | ||
+ | {| border="1" cellspacing="0" | ||
+ | |+ Scheduling | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | |} | ||
+ | <!-- The Pending Vector --> | ||
+ | | style="text-align:center" width="100pt"| | ||
+ | {| align="center" border="1" cellspacing="0" | ||
+ | |+ Pending | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | | ... | ||
+ | |- | ||
+ | | 0 | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | |} | ||
=== READY bit === | === READY bit === | ||
− | It represents the fact that the | + | It represents the fact that the contents of a register are ready to be used. When a register of an instruction is not ready, the ready bit is set to 0, and when the DONE signal from the CE is received and the content of the destination register becomes valid, it is set to 1. |
In other words | In other words | ||
@ SE0: | @ SE0: | ||
Line 84: | Line 235: | ||
This bit is required to let the CE know when the content of a register from an external source is needed. For example, | This bit is required to let the CE know when the content of a register from an external source is needed. For example, | ||
RA1 ← RA2 + RB1 | RA1 ← RA2 + RB1 | ||
− | The arithmetic operation is taking place in the | + | The arithmetic operation is taking place in the aunit of the CE. The source register, RB1, is located in the Register File of the bunit. Therefore, there needs to be a signal which lets the BUnit know in advance to forward that particular register to the aunit. The signal is simply the address of the RB1 which is to be forwarded to aunit. Before the forwarding is performed, the PENDING bit is set to 1. As it can be concluded the pending bit is set or reset based on the source registers. Another important point to remember is that the register has to be ready. Meaning, the READY bit has to be set before any forwarding is issued. The more complete check for forwarding is performed as follows: |
+ | |||
IF (READY & PENDING) | IF (READY & PENDING) | ||
SEND it to be forwarded //send the address | SEND it to be forwarded //send the address | ||
+ | |||
+ | A challenge of waiting until (READY & PENDING)[dinst.src1] is true is that it is never possible to send | ||
+ | an instruction and the do the forwarding simultaneously even if the register is ready. The reason is that it requires | ||
+ | one cycle to set the pending bit. | ||
+ | |||
+ | The solution can be to set the FWD value if the register is READY. Since there can be many instructions and structural | ||
+ | hazards for the forwarding port, the select stage (SE1) should decide which instruction gets the value forwarded. If an | ||
+ | instruction got a value forwarded the pending bit should be cleared. | ||
+ | |||
The pending bit is cleared when forwarding is finished. | The pending bit is cleared when forwarding is finished. | ||
− | @ | + | @ CE0: |
− | + | //setting to 1 | |
− | + | IF (DINST.SRC1.id != DINST.DEST.id) | |
− | + | ||
− | IF (DINST.SRC1. | + | |
PENDING [DINST.SRC1] = 1 | PENDING [DINST.SRC1] = 1 | ||
− | IF (DINST.SRC2. | + | IF (DINST.SRC2.id != DINST.DEST.id) |
− | PENDING [DINST. | + | PENDING [DINST.SRC2] = 1 |
− | We can send up to 4 forwarding signal in each clock cycle. | + | //setting to zero whenever a forwarding is done |
− | # | + | PENDING [fwd] = 0 |
− | # | + | |
− | # | + | We can send up to 4 forwarding signal in each clock cycle. |
− | # | + | # AUNIT - se_aunit_fwd_pos |
− | # | + | # BUNIT0 - se_bunit_fwd_pos |
− | # | + | # BUNIT1 |
+ | # CUNIT - se_cunit_fwd_pos | ||
+ | # LUNIT - se_lunit_fwd_pos | ||
+ | # SUNIT0 - se_sunit_fwd_pos | ||
+ | # SUNIT1 | ||
+ | |||
+ | == How to Compile == | ||
+ | In order to compile any of Select Modules, go to directory Select and type in | ||
+ | rake fpga:<ModuleName> full=1 v=1 | ||
+ | rake asic:<ModuleName> full=1 v=1 | ||
+ | |||
+ | This will create a script. | ||
+ | ./synplify_ModuleName.sh | ||
+ | |||
+ | Run the shell script to create an srr file in synplify directory which will be named ModuleName.srr | ||
+ | Open this file in any editor and look for errors and warnings. If you would like info about the timing just find "MHz". | ||
+ | If you would like to specify the line numbers of errors and warnings just type in the following command in the select directory. | ||
+ | grep @E synplify/ModuleName.srr | ||
+ | --[[User:Elnaz|Elnaz]] 22:14, 27 March 2009 (PDT) |
Latest revision as of 02:56, 24 January 2013
Block Diagram
Design Specifications
The select stage sits between the Scheduler and the Compute Engine (CE). The main purpose of this unit is to avoid RAW hazards between clusters and to handle structural hazards caused by the limited number of units available. The select unit has two stages, SE0 and SE1, and each stage takes one clock cycle. There is also an overlapping stage, CE0, shared between the Select module and the CE module. The select module takes in 4 instructions per clock cycle. Each instruction is put into a FIFO queue. There are 4 queues and each holds 8 instructions.
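Instruction FIFO queues
FIFO 0 | I29 | I25 | I21 | I17 | I13 | I9 | I5 | I1 |
FIFO 1 | I30 | I26 | I22 | I18 | I14 | I10 | I6 | I2 |
FIFO 2 | I31 | I27 | I23 | I19 | I15 | I11 | I7 | I3 |
FIFO 3 | I32 | I28 | I24 | I20 | I16 | I12 | I8 | I4 |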
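The queue arrangement described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware implementation: the names `make_fifos` and `accept` are hypothetical, and the policy of refusing the whole group when a target queue is full is an assumption. The instruction ordering (I1, I5, I9, ... in FIFO 0) follows the Instruction FIFO queues table.

```python
from collections import deque

NUM_FIFOS = 4    # from the text: there are 4 queues
FIFO_DEPTH = 8   # from the text: each queue holds 8 instructions


def make_fifos():
    """Create the four empty instruction queues."""
    return [deque() for _ in range(NUM_FIFOS)]


def accept(fifos, instructions):
    """Place up to 4 instructions, one per queue, in arrival order.

    Returns False and changes nothing if any target queue is full
    (assumed stall policy, not stated in the specification).
    """
    if len(instructions) > NUM_FIFOS:
        raise ValueError("at most 4 instructions per clock cycle")
    if any(len(fifos[i]) >= FIFO_DEPTH for i in range(len(instructions))):
        return False
    for i, inst in enumerate(instructions):
        fifos[i].append(inst)
    return True


# Eight cycles of four instructions each fill all four queues.
fifos = make_fifos()
for cycle in range(8):
    base = 4 * cycle
    accept(fifos, [f"I{base + k}" for k in range(1, 5)])
```

After these eight cycles FIFO 0 holds I1, I5, ..., I29 with I1 at the head, and a further call to `accept` returns False because every queue is full.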
Each of the five units in the CE (A, B, C, L, and S) holds 128 registers in its Register File, for a total of 640 registers. A 10-bit binary number is assigned to each register. Since each Register File holds 128 registers, the lower 7 bits select the register within a unit and the highest 3 bits indicate the CE cluster/unit to which the register belongs.
Register Number | Highest 3 bits | Compute Engine (CE) |
---|---|---|
0 - 127 | 000 | A Unit RF |
128 - 255 | 001 | B Unit RF |
256 - 383 | 010 | C Unit RF |
384 - 511 | 011 | L Unit RF |
512 - 639 | 100 | S Unit RF |
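As a rough illustration of the table above, the unit can be recovered from a register number by dropping the low 7 bits (128 registers per Register File), which leaves the 3-bit unit code. The function names `unit_of` and `local_index` are hypothetical and exist only for this sketch.

```python
REGS_PER_UNIT = 128   # from the text: 128 registers per Register File
NUM_REGS = 640        # from the table: registers 0 to 639

# Highest 3 bits -> unit, copied from the Register Assignments table.
UNIT_NAMES = {0b000: "A", 0b001: "B", 0b010: "C", 0b011: "L", 0b100: "S"}


def unit_of(reg):
    """Return the unit letter whose Register File holds register `reg`."""
    if not 0 <= reg < NUM_REGS:
        raise ValueError("register number out of range")
    return UNIT_NAMES[reg >> 7]   # 128 == 2**7, so shift out 7 bits


def local_index(reg):
    """Return the position of `reg` inside its unit's Register File."""
    return reg & (REGS_PER_UNIT - 1)
```

For example, `unit_of(129)` gives `"B"` and `local_index(129)` gives `1`.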
There are three vectors called READY, SCHEDULING, and PENDING, and each has 640 rows and one column. The value of each element is either 0 or 1. The row index is the register name/position. The READY vector is initialized to all x’s. The SCHEDULING and PENDING vectors are both initialized to zeros.
READY bit
The READY bit indicates that the contents of a register are ready to be used. When an instruction enters SE0, the ready bit of its destination register is set to 0. When the DONE signal from the CE is received and the content of the destination register becomes valid, the bit is set to 1. In other words:
@ SE0:
  READY [DINST.DEST] = 0
@ WB stage of the CE:
  READY [DINST.DEST] = 1
SCHEDULING bit
At the first stage, SE0, the scheduling bit for the destination register is set to 1 upon reading. When the instruction has been successfully sent out and the corresponding CE resource is not busy, the scheduling bit of the destination register is set to 0.
@ SE0:
  SCHEDULING [DINST.DEST] = 1
@ CE0: //if sent & !busy
  SCHEDULING [DINST.DEST] = 0
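The READY and SCHEDULING updates described above can be modeled as a small Python class. This is a behavioral sketch under stated assumptions, not the RTL: the class and method names are hypothetical, and `None` is used here to stand for the unknown value x of the uninitialized READY vector.

```python
NUM_REGS = 640  # one entry per register


class SelectVectors:
    """Software model of the READY and SCHEDULING bit vectors."""

    def __init__(self):
        self.ready = [None] * NUM_REGS    # None models the initial 'x'
        self.scheduling = [0] * NUM_REGS  # initialized to zeros

    def on_se0(self, dest):
        """SE0: the destination is no longer ready and is being scheduled."""
        self.ready[dest] = 0
        self.scheduling[dest] = 1

    def on_ce0(self, dest, sent, busy):
        """CE0: clear SCHEDULING only if the instruction was sent and the
        corresponding CE resource is not busy."""
        if sent and not busy:
            self.scheduling[dest] = 0

    def on_writeback(self, dest):
        """WB stage of the CE: the destination content is valid again."""
        self.ready[dest] = 1


# Walk one instruction with destination register 5 through the stages.
v = SelectVectors()
v.on_se0(5)
v.on_ce0(5, sent=True, busy=False)
v.on_writeback(5)
```

At the end of this walk-through register 5 has READY set and SCHEDULING cleared, while untouched registers still carry the unknown READY value.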
PENDING bit
This bit is required to let the CE know when the content of a register from an external source is needed. For example,
RA1 ← RA2 + RB1
The arithmetic operation takes place in the AUnit of the CE. The source register, RB1, is located in the Register File of the BUnit. Therefore, a signal is needed that tells the BUnit in advance to forward that register to the AUnit. The signal is simply the address of RB1, the register to be forwarded. Before the forwarding is performed, the PENDING bit is set to 1. The pending bit is therefore set or reset based on the source registers. The register also has to be ready: the READY bit must be set before any forwarding is issued. The more complete check for forwarding is performed as follows:
IF (READY & PENDING)
  SEND it to be forwarded //send the address
A challenge of waiting until (READY & PENDING)[dinst.src1] is true is that it is never possible to send an instruction and do the forwarding simultaneously, even if the register is ready. The reason is that it takes one cycle to set the pending bit.
The solution can be to set the FWD value if the register is READY. Since there can be many instructions and structural hazards for the forwarding port, the select stage (SE1) should decide which instruction gets the value forwarded. Once an instruction has had a value forwarded, the pending bit should be cleared.
The pending bit is cleared when forwarding is finished.
@ CE0:
  //setting to 1
  IF (DINST.SRC1.id != DINST.DEST.id)
    PENDING [DINST.SRC1] = 1
  IF (DINST.SRC2.id != DINST.DEST.id)
    PENDING [DINST.SRC2] = 1
  //setting to zero whenever a forwarding is done
  PENDING [fwd] = 0
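The PENDING rules and the (READY & PENDING) check can be sketched in Python as follows. Several points are assumptions made for illustration only: the `.id` field in the pseudocode is taken to be the unit code (the register number with its low 7 bits dropped), RA1, RA2, and RB1 are taken to be registers 1, 2, and 129, and all function names are hypothetical.

```python
NUM_REGS = 640


def unit_id(reg):
    """Assumed meaning of '.id': the unit code in the highest 3 bits."""
    return reg >> 7


def mark_pending(pending, dest, sources):
    """CE0: set PENDING for each source that lives in a different unit
    than the destination, since it must be forwarded."""
    for src in sources:
        if unit_id(src) != unit_id(dest):
            pending[src] = 1


def forward_candidates(ready, pending):
    """Registers whose address may be sent for forwarding:
    both READY and PENDING are set."""
    return [r for r in range(NUM_REGS) if ready[r] == 1 and pending[r] == 1]


def forwarding_done(pending, fwd):
    """Clear PENDING once register `fwd` has been forwarded."""
    pending[fwd] = 0


# Example from the text: RA1 <- RA2 + RB1
ready = [0] * NUM_REGS
pending = [0] * NUM_REGS
RA1, RA2, RB1 = 1, 2, 129            # assumed register numbering
mark_pending(pending, RA1, [RA2, RB1])
before_ready = forward_candidates(ready, pending)   # RB1 is not ready yet
ready[RB1] = 1                                      # RB1 becomes valid
when_ready = forward_candidates(ready, pending)
forwarding_done(pending, RB1)
after_forward = forward_candidates(ready, pending)
```

Only RB1 is marked pending, because RA2 sits in the same unit as the destination. It becomes a forwarding candidate once its READY bit is set, and it drops out again after the forwarding is done.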
We can send up to 4 forwarding signals in each clock cycle.
# AUNIT - se_aunit_fwd_pos
# BUNIT0 - se_bunit_fwd_pos
# BUNIT1
# CUNIT - se_cunit_fwd_pos
# LUNIT - se_lunit_fwd_pos
# SUNIT0 - se_sunit_fwd_pos
# SUNIT1
How to Compile
To compile any of the Select modules, go to the Select directory and type in
rake fpga:<ModuleName> full=1 v=1
rake asic:<ModuleName> full=1 v=1
This will create a shell script:
./synplify_ModuleName.sh
Run the shell script to create an srr file in the synplify directory, which will be named ModuleName.srr. Open this file in any editor and look for errors and warnings. If you would like info about the timing, just search for "MHz". If you would like to list only the error lines, type the following command in the Select directory (adding grep's -n option also prints their line numbers).
grep @E synplify/ModuleName.srr
--Elnaz 22:14, 27 March 2009 (PDT)