Difference between revisions of "Leon3"

From Vlsiwiki
Latest revision as of 11:46, 28 August 2009

Tom Golubev's Leon3 Based Logic Analyzer

Using NIOSII Cyclone dev board

Leon3 grlib-gpl-1.0.20-b3403

 FPGA Usage (1): EP1C20F400C7
  Configuration                   LUT Usage (%)   Mem Usage (%)   System MHz
  (1) Default: no FPU, no MMU          56              36            135.9
  (2) No FPU, MMU                      65              37            134.8
  (3) Bare                             38              18            126.1




(1) Command Used: make distclean && time `make synplify && make quartus-synp`


Adding ADCDAC Tutorial

1) Edit config.in to add options to xconfig

2) Add to leon3XX.vhd in the designs/your_board directory:
 library gleichmann;
 use gleichmann.hpi.all;
 use gleichmann.miscellaneous.all;
 use gleichmann.multiio.all;
 use gleichmann.dac.all;
 use gleichmann.sspi.all;
 use gleichmann.ge_clkgen.all;
 use gleichmann.ac97.all;
3) Add signals for the ADC in the structure (leon3XX.vhd)
4) Add a generate section to the structure (leon3XX.vhd)
5) Remove gleichmann from LIBSKIP (Makefile)
6) Copy the leon3hpe.h file into the project directory, because the config.vhd.in file imports it


Programming Board

Using grmon:
Add grmon to the path:
 #export PATH=$PATH:/home/tom/leon3/grmon-eval/linux
Add the 32-bit libjtag.so from Quartus to the library path:
 #export LD_LIBRARY_PATH=/software/altera/quartus/linux
Invoke grmon:
 #grmon-eval -altjtag
To use grmon-rcp, a 32-bit libwidget_gtk2.so needs to be present.
A 32-bit Java also needs to be used:
Install 32-bit openjdk.
Switch from 64-bit to 32-bit Java:
 #alternatives --config java
 #alternatives --config javac

Hardware in the Loop (HIL) Consideration

The idea for the ML510 Board is outlined:

1) Have the board run a Leon3 core, with many peripherals.

2) Run an MMU-less Linux version (MMU-less so that software is SCOORE compatible)

3) Create an AHB component that will allow communication with a DUT (device under test).

4) Extend ATC to allow VPI testbenches to run on the Leon3

5) Current testbenches can test modules within the FPGA, running at native speeds

6) On error, transfer state information back to computer.

7) Rerun testbench with current state information -- essentially resuming from where the Leon3 left off.

8) Fix error, resynthesize, load back onto the Leon3, resume testing.


Essentially, an HDL module would be tested in hardware. This allows speeds in excess of 100 MHz, much faster than the 10 kHz simulation speed of PC-based simulators. By saving state information for the last 1000 cycles, for example, the hope is that "rewinding" that far back allows the bug to be reproduced. Debugging can then proceed in simulation, using VCS or Modelsim.
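The "last 1000 cycles" idea is essentially a ring buffer of state snapshots. A minimal Python sketch of that bookkeeping follows; the `StateHistory` class, its method names, and the toy `counter` state are hypothetical and only illustrate the idea, not how the Leon3 design would actually capture DUT state.

```python
from collections import deque


class StateHistory:
    """Keep only the most recent `depth` per-cycle snapshots (hypothetical helper)."""

    def __init__(self, depth=1000):
        # A deque with maxlen silently drops the oldest entry when full.
        self.snapshots = deque(maxlen=depth)

    def record(self, cycle, state):
        # Copy the state so later changes do not alter the recorded history.
        self.snapshots.append((cycle, dict(state)))

    def rewind_point(self):
        """Oldest retained snapshot: where a replay in the simulator would start."""
        return self.snapshots[0]


history = StateHistory(depth=1000)
for cycle in range(5000):
    # Toy stand-in for DUT state; a real design would capture registers.
    history.record(cycle, {"counter": cycle % 256})

start_cycle, start_state = history.rewind_point()
print(start_cycle, start_state)
```

If an error were flagged at the last recorded cycle, the replay would start from `start_cycle` with `start_state` loaded into the simulator.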

The hardware is used to accelerate HDL verification. If more transparency is needed, either FPGA-based scopes such as ChipScope can be used, or switching to the PC simulator can happen almost transparently. The end goal is to save time in discovering the presence of bugs, and to focus mostly on debugging the bugs themselves.

A mechanism such as DSP-Builder's Hardware in the Loop (HIL) allows for the instantiation of a module within the FPGA, while the logic to interface with it is created by the tool automatically. The main difference is that our design executes the testbench on the FPGA as well, saving bandwidth and avoiding potential problems in interfacing the PC to the module.

HIL Ideas

Use internal RAM blocks to provide input vectors and store output vectors.

Use a FIFO within the Sim module to increase bandwidth / allow decoupling of clocks.

When using FIFOs, does size matter?
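One way to get a feel for the FIFO size question is to model a burst of writes on one clock being drained on a slower clock and track the peak occupancy. The Python sketch below does that under idealised assumptions (no synchroniser latency, the reader pops on every read edge when data is available, integer clock frequencies); `max_fifo_occupancy` is a hypothetical helper, not part of any Leon3 or GRLIB tool.

```python
from fractions import Fraction


def max_fifo_occupancy(burst_len, f_write, f_read):
    """Peak words queued while `burst_len` words are written at f_write
    and drained at f_read (both integers in the same unit, e.g. MHz)."""
    t_w = Fraction(1, f_write)   # write clock period
    t_r = Fraction(1, f_read)    # read clock period
    occupancy = peak = 0
    read_edge = 0                # index of the next read clock edge
    for written in range(burst_len):
        t_write = written * t_w
        # Read edges up to and including this instant pop first, since a
        # read cannot see a word written on the same edge.
        while read_edge * t_r <= t_write:
            if occupancy > 0:
                occupancy -= 1
            read_edge += 1
        occupancy += 1
        peak = max(peak, occupancy)
    return peak


# A 100-word burst written at 100 MHz and read at 50 MHz.
print(max_fifo_occupancy(100, 100, 50))
# Matched clocks need almost no buffering in this model.
print(max_fifo_occupancy(100, 100, 100))
```

In this model the required depth grows with the burst length and with the ratio between the two clocks, so size matters mainly when the reader is slower than the writer.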

Hardware in the Loop (HIL) Consideration

The case for HILs

The number one motivation is speed. Imagine running SpecInt 2000 on a simulated processor. VCS might well chug along at ~100 cycles per second, while SpecInt takes billions of instructions. Achieving a speedup of several orders of magnitude is no small feat. Scalability is a major simulator disadvantage. Simulating a complete processor at the RTL level

Read the HIL article from embedded.com [ http://www.embedded.com/15201692 ]

Example product: PCI accelerator card for Simulink DSP. [ http://www.gidel.com/ProcHils.htm ]

HIL Downsides

HIL is a tool, which has its pros and cons. Strengths and weaknesses should be weighed when determining if it is right for you.

Big HDL designs may take ~20 hours to synthesize. A redesign may necessitate a full recompile, requiring a full day. Intel implemented a full Pentium processor in a Virtex 4: synthesis took 20-24 hours, and the design achieved 25 MHz while utilizing 46% of the slices of a V4LX200. Source [ http://portal.acm.org/ft_gateway.cfm?id=1331901&type=pdf ]
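The trade-off between long synthesis and fast execution can be put into rough numbers. The Python sketch below is a back-of-the-envelope estimate, not a measurement: it combines the ~20 hour synthesis time and 25 MHz clock quoted here with the ~100 cycles per second simulator speed quoted earlier for a full processor, which is an assumption since those figures come from different designs. The function names are hypothetical.

```python
def run_seconds(cycles, rate_hz):
    """Wall-clock seconds to execute `cycles` at `rate_hz` cycles per second."""
    return cycles / rate_hz


def break_even_cycles(synth_seconds, sim_hz, hw_hz):
    """Cycle count at which simulation time equals synthesis plus hardware time.

    Solves cycles / sim_hz = synth_seconds + cycles / hw_hz for cycles.
    """
    return synth_seconds / (1.0 / sim_hz - 1.0 / hw_hz)


SYNTH_SECONDS = 20 * 3600   # ~20 hours of synthesis
SIM_HZ = 100                # ~100 cycles per second in the simulator (assumed)
HW_HZ = 25e6                # 25 MHz on the FPGA

cycles = break_even_cycles(SYNTH_SECONDS, SIM_HZ, HW_HZ)
print(f"break-even at about {cycles:,.0f} cycles")

sim_days = run_seconds(1e9, SIM_HZ) / 86400
hw_secs = run_seconds(1e9, HW_HZ)
print(f"1e9 cycles: {sim_days:.1f} days simulated, {hw_secs:.0f} s on hardware plus synthesis")
```

Under these assumptions, a test shorter than a few million cycles finishes sooner in the simulator, while a billion-cycle workload strongly favours the hardware even after paying for synthesis.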

Multiple clocks and clock gating pose a challenge.

Links: (1) http://www.altera.co.jp/literature/ug/ug_dsp_design_flow.pdf (See HIL section)

Design Questions

1) How easy is it to have mixed VHDL / Verilog Design using Synplify?

  ->Very easy

2) How much throughput is needed?

3) Which bus to use: AHB or APB? Or tightly couple to the processor (difficult)?

  Wishbone / AHB / CoreConnect   http://es.elfak.ni.ac.yu/Papers/ICEST%20'06.pdf
  BEE2 Leon2 ahb bus explanation http://cadlab.cs.ucla.edu/software_release/bee2leon3port/files/Timothy_Wong-MS_Report.pdf

AMBA Bus Notes (from the 1999 ARM IHI 0011A document)

AMBA signal names

         All AMBA signals are named such that the first letter of the name indicates which bus
         the signal is associated with. A lower case n in the signal name indicates that the signal
         is active LOW, otherwise signal names are always all upper case.
         Test signals have a prefix T regardless of the bus type. More information on test signals
         can be found in Chapter 6 AMBA Test Methodology.
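The naming rule quoted above is mechanical enough to sketch in code. The Python snippet below is only an illustration: `classify_signal` and `BUS_PREFIXES` are hypothetical names, the H (AHB) and P (APB) prefix letters are assumed from the AMBA 2.0 signal lists rather than from the excerpt itself, and the ASB prefixes are left out for brevity.

```python
# Prefix letter -> bus. Only H, P and T are covered in this sketch.
BUS_PREFIXES = {
    "H": "AHB",
    "P": "APB",
    "T": "test",   # test signals use T regardless of the bus type
}


def classify_signal(name):
    """Return (bus, active_low) for an AMBA-style signal name."""
    bus = BUS_PREFIXES.get(name[0], "unknown")
    # Names are otherwise all upper case, so any lower case n marks active LOW.
    active_low = "n" in name
    return bus, active_low


for signal in ("HRESETn", "HADDR", "PSEL", "TCLK"):
    print(signal, classify_signal(signal))
```

A check like this could be used to sanity-check port names when wrapping a Verilog peripheral for the AHB or APB bus.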


Important Sources / Documents:

Nios II / Leon2 Comparison CoSCPU.pdf

Leon2 / MicroBlaze / OpenRISC1200 Comparison Evaluation_of_synthesizable_CPU_cores.pdf

the Linux boot itself lasts about 1 billion cycles - http://eve-team.com/demos/linux-boot-demo.php

3 Reviews of Tharas accelerator - http://www.deepchip.com/items/0408-03.html

Slides of EVE innovations - http://university.eve-team.com/files/ASAP2009.pdf

Links

VHDL For Verilog Designers Cheat Sheet: vhdl4verilog_summary.pdf 
Comparison of many soft cpus - http://www.1-core.com/library/digital/soft-cpu-cores/
Great Class, APB bus verilog LEON peripheral, includes verilog Leon top & much more http://classes.engineering.wustl.edu/cse465/
SOC Leon Verilog APB & simulation HOWTO Includes AMBA verilog code - http://vcag.ecen.okstate.edu/projects/scells/download/MOSIS_SCMOS/socks/socks_documentation.doc
 
Evaluation of synthesizable CPU cores - Paper of several cores
NIOS2 to Leon2 - http://www.altera.com/literature/dc/2007/in_2007_multiproc.pdf
Tutorial of adding AHB Slave to Leon2 - http://www.ece.ncsu.edu/muse/soc_information/tutorial/leon/
Another paper of Leon2 APB addon - http://www.iberchip.org/iberchip2006/ponencias/45.pdf
Dutch university research into SOCs, includes NIOS2 Cyclone Dev board  - http://emsys.denayer.wenk.be/?project=empro&page=about
Porting to AHB / Wishbone buses report - http://emsys.denayer.wenk.be/empro/OCIDEC%20case%20report.pdf
Leon 2 dev board seller (mostly Xilinx parts) - http://www.pender.ch/products.shtml
Stratix 1S80 Leon board adaptation - http://www.nouiz.org/leon2.html

Print

GRADCDAC - tmtc.pdf (13-24)
AMBA Spec IHI0011A_AMBA_SPEC.pdf
GRADCDAC RTEMS Driver  rtems-gaisler-drivers-1.1.99.4.0.pdf (92-101)
Gaisler IP grip.pdf (533-558,563-570)

Sources

(ASAP 2006).
S. Tillich and J. Großschädl. Instruction Set Extensions for
Efficient AES Implementation on 32-bit Processors. In
Cryptographic Hardware and Embedded Systems — CHES
2006, vol. 4249 of Lecture Notes in Computer Science, pp.
270–284. Springer Verlag, 2006.
A. Hodjat, I. Verbauwhede : Interfacing a high speed crypto
accelerator to an embedded CPU. In: Proceedings of the 38th
Asilomar Conference on Signals, Systems, and Computers, vol.
1, pp. 488–492. IEEE, New York (2004)
P. Schaumont, K. Sakiyama, A. Hodjat, and I. Verbauwhede.
Embedded Software Integration for Coarse-Grain Reconfigurable Systems. 
In Proceedings of the 18th International Parallel and Distributed 
Processing Symposium (IPDPS 2004), pp. 137–142, IEEE CS Press, 2004.
An AES Tightly Coupled Hardware Accelerator in an FPGA-based
Embedded Processor Core