SFL: Structured Function description Language Tutorial

SFL is an RTL (Register Transfer Level) hardware description language and
serves as the input language of the PARTHENON system. It has a simple clock
model that allows describing synchronous, single-phase clocked digital
circuits. The main characteristics of SFL are as follows:

   * Control- and data path separation: SFL explicitly supports this concept
     by providing two types of I/O terminals: control terminals ('instrin',
     'instrout') and data terminals ('input', 'output'). Signals declared
     as control inputs are always considered to be driven with either '0'
     (inactive) or '1' (active), i.e. positive logic. Output control
     terminals ('instrout') are driven with '0' by default. If the
     programmer asserts such a signal, it is true for exactly one clock
     cycle and then changes back to false (light-up principle). Data
     terminals, on the other hand, have undefined values ('u'= unknown,
     'z'= high impedance) if not explicitly driven.
   * Functional description: SFL does not mix behavior with interconnection
     description but uses an object-oriented style for describing the module
     hierarchy. An object can have multiple behaviors. Every behavior is
     associated with a control signal. A behavior is "executed" when the
     corresponding control signal is triggered. For example, let's say 'Alu'
     is an object with two behaviors 'add' and 'sub'. Registers can then be
     assigned by simply writing 'reg1:= Alu.add(in1,in2).out' or 'reg2:=
     Alu.sub(in1,in2).out'.
   * Explicit construct for describing state machines: finite state machines
     (FSM) play a major role in digital system design. SFL allows the
     programmer to describe a set of communicating state machines in a
     compact manner.
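
The light-up principle and the default values of the two terminal types can
be illustrated with a few lines of Python. This is only a sketch of the rule
stated above, not SFL and not part of PARTHENON; the function names
'control_trace' and 'data_terminal' are made up for this illustration.

```python
def control_trace(asserted_cycles, n_cycles):
    """Value of a control output ('instrout') in each clock cycle.

    The terminal is driven with 0 by default and is 1 only in the
    cycles in which the behavior asserts it (light-up principle).
    """
    trace = []
    for cycle in range(n_cycles):
        value = 0                       # default: driven with '0'
        if cycle in asserted_cycles:    # asserted in this cycle only
            value = 1
        trace.append(value)
    return trace


def data_terminal(driven_value=None):
    """A data terminal shows 'z' unless it is explicitly driven."""
    return "z" if driven_value is None else driven_value


# Asserting the signal in cycle 2 lights it up for that cycle only.
print(control_trace({2}, 5))
print(data_terminal(), data_terminal(0x12))
```

Note that nothing has to switch the control signal off again; it simply
falls back to its default in every cycle in which it is not asserted.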

The following tutorial contains three simple examples that stress the above
concepts. We will simulate them with the PARTHENON simulator SECONDS. Here
is the full SFL syntax (Backus-Naur Form).

   * Full-adder: this purely combinatorial design will familiarize us with
     the basic SFL syntax and the module concept.
   * Timer: this is a sequential design and introduces the constructs for
     describing finite state machines. It also shows the advantage of
     separating control and data signals.
   * CPU: this example demonstrates the concept of communicating state
     machines and the usage of modules out of the PARTHENON library
     (memories, adders, etc.)

Full-adder

Let's have a look at the whole source code before going through it line by
line. The design holds two modules: 'adder1' (a one-bit full-adder) is a
submodule used in the top module 'adder4' (bit-slice implementation).

Code file 'adder4.sfl' (with line numbering)

As in the C language, SFL distinguishes between module declaration and
module definition. Before the 'adder1' circuit can be used in the top module
'adder4', it must be declared. This is done in lines 7..13. The definition
of the 'adder1' behavior can be found in lines 55..67. This definition code
could also be located in a separate file. Furthermore, it is not necessary
to declare the top module 'adder4'.

Comments are delimited with /*this is a comment*/. SFL module syntax always
starts with the facility declaration. The 'adder1' module (line 7) declares
three one-bit data inputs 'a', 'b', 'cin' and two one-bit data outputs, the
sum bit 's' and the carry-out bit 'cout'. The control terminal 'add' in line
10 is important. As mentioned in the introduction, a behavior of an object
can be associated with a control signal. In that case the 'add' behavior is
only "executed" if this control signal is active. Line 12 shows the argument
binding of the 'add' behavior with the 'instr_arg' construct. This makes it
very easy to access the object 'adder1' and its behavior 'add' from other
modules (module instantiation).

Before we use our 'adder1' object in the top module 'adder4', let's take a
look at the actual implementation. After the facility declaration, the
'instruct' keyword on line 60 starts the behavior definition related to the
'add' control signal. The 'par {..}' statement says what has to be done in
parallel during that clock period. The sum bit 's' is realized with xor
operations, while the carry output bit 'cout' is formulated as a sum of
products (SOP). Although this description holds some redundancy, we stay
with this straightforward "truth table" definition and let the underlying
PARTHENON synthesizer do the optimization work. In real designs it is
anyway appropriate to use building blocks such as adders or multipliers from
a library that is optimized for the target technology (CMOS, FPGA, etc).
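
The logic of the one-bit full-adder can be sketched in Python as follows.
This is not the SFL source: the exact product terms used in 'adder4.sfl' are
not reproduced here, and the majority form below is just one common
sum-of-products expression for the carry, chosen as an assumption. The
function name mirrors the module name for readability.

```python
from itertools import product


def adder1(a, b, cin):
    """One-bit full-adder: returns (s, cout) for the bits a, b, cin."""
    s = a ^ b ^ cin                              # sum bit via xor
    cout = (a & b) | (a & cin) | (b & cin)       # carry as sum of products
    return s, cout


# Exhaustive check against integer addition over the whole truth table.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = adder1(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

With only eight input combinations, checking the complete truth table is
cheap and leaves no doubt about the two expressions.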

Now that the 'adder1' module is declared and defined, we can use it in the
top module 'adder4'. Again we start with the facility declaration in line
19. Apart from inputs and outputs we have some internal facilities, i.e.
four 1-bit adders ('ADDER0'.. 'ADDER3') and three selectors ('CARRY1'..
'CARRY3') for the carry calculation. The 'adder4' object binds its behavior
to the 'ADD' control signal. Lines 27..35 show how easily submodules can be
connected in SFL. The most significant bit (MSB) of the sum vector 'S' is
assigned the output of the 'ADDER3' object by simply "calling" its behavior
'add' with the corresponding arguments. The SFL operator '||' stands for
concatenation in the order MSB..LSB. Thus the sum vector 'S=[S<3>.. S<0>]'
can be assigned as in lines 27..30. From line 31 on we do not have to repeat
the argument binding (i.e. 'COUT= ADDER3.add( A<3>, B<3>, CARRY3)' ) since a
hardware object can only be connected once. Because all the assignments are
within the 'par' statement, their order does not matter.

Now let's simulate the 'adder4.sfl' module with the PARTHENON simulator
SECONDS using the script file 'adder4.scr'. Here is the result of the
following command.

   %seconds < adder4.scr
    ...

    --- simulation modules:
    adder4                          (module          )
    adder1                          (module          )

    --- facilities of top module:
    /                               (module          )
      A                               (term input      )
      ADD                             (instr input     )
      ADDER0                          (submodule       )
      ADDER1                          (submodule       )
      ADDER2                          (submodule       )
      ADDER3                          (submodule       )
      B                               (term input      )
      CARRY1                          (term internal   )
      CARRY2                          (term internal   )
      CARRY3                          (term internal   )
      CIN                             (term input      )
      COUT                            (term output     )
      S                               (term output     )

    --- simulation start:

    input pins | internal signals                              | output pins
    A B CIN ADD| ADDER0.a ADDER0.b ADDER0.cin ADDER0.add CARRY1| S COUT
    ------------------------------------------------------------------------
    1 1 0   1  | 1        1        0          1          1     | 2 0
    1 6 1   1  | 1        0        1          1          1     | 8 0
    7 2 0   1  | 1        0        0          1          0     | 9 0
    9 8 0   1  | 1        0        0          1          0     | 1 1
    1 1 0   0  | z        z        z          0          z     | z z
    1 6 1   0  | z        z        z          0          z     | z z

After listing the simulated modules and the facilities of the top module
'adder4', simulation starts. Our circuit does what we want. The control
signals 'ADD' and 'ADDER0.add' are of particular interest: if they are zero,
the outputs 'S' and 'COUT' are not driven and show 'z'.
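
The rows of the trace above can be cross-checked with a small Python model
of the bit-slice structure. It is a sketch of the behavior described in the
text, not a translation of 'adder4.sfl'. The function names mirror the
module names, the carry expression is an assumed sum-of-products form, and
returning 'z' for an inactive 'ADD' is a simplification of what the
simulator prints.

```python
def adder1(a, b, cin):
    """One-bit full-adder: returns (s, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def adder4(A, B, CIN, ADD=1):
    """4-bit ripple-carry adder built from four one-bit slices.

    Returns (S, COUT). If the control input ADD is 0, nothing is
    driven and both outputs are reported as 'z'.
    """
    if not ADD:
        return "z", "z"
    carry = CIN
    S = 0
    for i in range(4):                      # slice 0 is the LSB
        s, carry = adder1((A >> i) & 1, (B >> i) & 1, carry)
        S |= s << i
    return S, carry


# Input rows (A, B, CIN, ADD) taken from the simulation trace.
for A, B, CIN, ADD in [(1, 1, 0, 1), (1, 6, 1, 1), (7, 2, 0, 1),
                       (9, 8, 0, 1), (1, 1, 0, 0)]:
    print(A, B, CIN, ADD, "->", adder4(A, B, CIN, ADD))
```

The carry variable in the loop plays the role of 'CARRY1'.. 'CARRY3': each
slice hands its carry-out to the next one.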

Let's go down to gates now: The following shows a schematic (adder4.jpg,
adder4.ps) and the listing of the netlist (adder4.edif) after PARTHENON
synthesis with the demonstration CMOS library:

%auto adder4 ps DEMO demo

Where are our control signals 'ADD' and 'add'? They are gone. This is
actually not surprising since the full-adder consists of only combinatorial
circuitry. Because nothing has to be controlled, the PARTHENON synthesizer
has optimized them away. Whether to introduce them at all is a question of
style. If we omit them, however, module instantiation becomes clumsy, as in
structural description languages. That is why we still use them even in
combinatorial designs.

Timer

The second example shows the implementation of an 8-bit timer. Here is the
interface specification:

   Inputs   SET       sets the timer with the 'INIT<8>' value
            INIT<8>   8-bit initialization value
            RESET     resets timer by deactivating the 'EXPIRE' signal
   Outputs  EXPIRE    asserted if the timer reaches zero
            ENABLE    active if 'COUNT<8>' is valid, otherwise inactive
            COUNT<8>  shows timer countdown

The 'SET' signal initializes the timer with the 'INIT' value and the
countdown starts. While counting down, the 'COUNT' output shows the current
value of the timer. The 'ENABLE' signal is '1' whenever 'COUNT' is valid,
otherwise it is '0'. If the timer reaches zero, the 'EXPIRE' signal is
activated. It stays active until 'RESET' is triggered. If 'RESET' becomes
active during countdown, the timer stops and the 'COUNT' output is a don't
care.

The first step is the interface declaration of the top module:

declare timer8 {
  instrin  SET, RESET;
  input    INIT<8>;
  instrout EXPIRE, ENABLE;
  output   COUNT<8>;
  instr_arg SET(INIT);
}/*timer8*/

We have declared four control signals: 'SET', 'RESET' as inputs and
'EXPIRE', 'ENABLE' as outputs. The reason is that these signals are supposed
to be driven at all times, i.e. they are always '0' or '1'. The input value
'INIT<8>', on the other hand, is a data terminal which only has to be driven
while 'SET' is active. The same is true for the output 'COUNT<8>': if
'ENABLE' is zero it can have any value. Don't cares are very important when
it comes to logic synthesis. A high degree of freedom allows the optimizer
to find better solutions (e.g. shorter critical paths, fewer gates) for a
given specification. The 'SET' behavior takes an argument 'INIT', which is
declared with the keyword 'instr_arg'.

After the interface declaration we can start now with the implementation of
the timer. It consists of an 8-bit decrementor submodule 'dec8' and the top
module 'timer8':

Code file 'timer8.sfl' (with line numbering)

After its declaration, the decrementor module 'dec8' is defined as a
'circuit' in lines 17..23. A circuit is a library module that is already
synthesized. This makes it possible to reuse third party designs. Thus we
only have to declare its interface and a quasi behavior for simulating it.
This is done in line 22 with the 'instruct' command. Since the operator set
has no subtraction, we use addition with the two's complement instead. Here
is an overview of the SFL operators.

  priority  symbol  operation        code     result            remarks
                                     (examples use a<4>=1011, b<4>=1111)
  high      <n:m>   extraction       a<2:1>   01
            <n>     extraction       a<2>     0
  middle    ^       not of all bits  ^a       0100
            /|      or of all bits   /|a      1
            /@      xor of all bits  /@a      1
            /&      and of all bits  /&a      0
            /       decode           /a       0000100000000000  only circuit
            \       encode           \a       011               only circuit
            #       bit expansion    8#a      11111011
  low       |       or               a|b      1111
            @       xor              a@b      0100
            &       and              a&b      1011
            ||      concatenation    a||b     10111111
            +       addition         a+b      11010             only circuit
            >>      bit shift right  a>>0x2   0010              only circuit
            <<      bit shift left   a<<0x2   1100              only circuit
            ==      comparison       a==b     0                 only circuit, ok if
                                                                rvalue is constant

Operators only supported within circuit modules have an entry in the remarks
column. The comparison operator can be synthesized if the right value is a
constant, e.g. 'a== 0xf8' works fine. In the future SFL will also allow
multiplication and other operations for behavioral simulation within
SECONDS.
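
Most example results in the operator table, as well as the decrement by
two's complement addition, can be reproduced with ordinary integer
arithmetic. The Python sketch below does this for a<4>=1011 and b<4>=1111.
The helper names are hypothetical, and treating 'bit expansion' as sign
extension is an assumption that happens to match the table entry
8#a=11111011. The encode operator is left out because its exact definition
is not given here.

```python
a, b, WIDTH = 0b1011, 0b1111, 4


def extract(x, hi, lo):
    """Bits hi..lo of x, like x<hi:lo>."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def invert(x, width):
    """Bitwise not, limited to the given width."""
    return ~x & ((1 << width) - 1)


def reduce_xor(x):
    """Xor of all bits (parity)."""
    return bin(x).count("1") & 1


def sign_extend(x, width, new_width):
    """Widen x by repeating its MSB (assumed meaning of 'n#x')."""
    if (x >> (width - 1)) & 1:
        x |= ((1 << new_width) - 1) & ~((1 << width) - 1)
    return x


def concat(x, y, y_width):
    """x || y with x as the more significant part."""
    return (x << y_width) | y


def dec8(x):
    """8-bit decrement: add 0xff, the two's complement of 1."""
    return (x + 0xFF) & 0xFF


print(format(invert(a, WIDTH), "04b"))            # not of all bits
print(format(1 << a, "016b"))                     # decode
print(format(sign_extend(a, WIDTH, 8), "08b"))    # bit expansion
print(format(concat(a, b, WIDTH), "08b"))         # concatenation
print(dec8(4), dec8(1), dec8(0))                  # decrement wraps at 0
```

The addition a+b is printed in the table with five bits (11010), i.e. the
carry is kept, while the left shift stays within the four bits of 'a'.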

Let's come back to the code. The comments in the code of the timer8 module
(line 28..) show the SFL syntax skeleton. It is divided into a "facility
declaration F1..F4" (lines 33..48) and a "behavior definition part B1..B3"
(lines 53..89). The facility declaration contains four subsections:

   * We already know the "I/O facilities" (F1) from the full-adder example.
   * As "internal facilities" (F2) we declare the register 'REMAINED<8>' and
     an 8-bit decrementor 'DEC'.
   * In the "argument binding" section (F3) we spot something new.
     Previously we bound an 'instrin' signal to a data input. Now we
     associate the 'ENABLE' output control signal with the 'COUNT' data
     output. In programming, "binding together what belongs together" is as
     important as the "divide and conquer" principle (module concept).
   * We finish the declaration part with the last section, "state machines"
     (F4), by introducing the sequential circuit 'MAIN' with the task 'RUN'.
     State machines can have multiple tasks, and every task can bind
     different arguments. In our case, the 'RUN' task is associated with
     the 'REMAINED' register. As soon as it becomes active it loads the
     register with the specified value (see line 59).

The behavior definition part can be divided into three subsections B1..B3.

   * The first one is the "core behavior" (B1). It gets executed in every
     clock cycle. In our module we do not utilize it. For illustration we
     introduce the empty statement.
   * The "control related behavior" (B2) starts in line 58 with the
     'instruct' keyword. This code is only executed if the related control
     signal, in our case 'SET', is active. We 'generate' the task 'RUN' of
     the state machine 'MAIN' and load the 'REMAINED' register with the
     'INIT' value.
   * The last section is the "state machine behavior" (B3), a very special
     feature of SFL. In line 63 we see that the 'stage' (i.e. state machine)
     'MAIN' holds two states 'DOWN' (countdown) and 'ASSERT' (count= zero).
     After initialization, the first state will be 'DOWN'. The behavioral
     part of a state machine consists of two subsections.
   * The "core behavior" (B3.1) is executed in every clock cycle in which
     the state machine is active, regardless of the actual state. As our
     specification requires, we feed the contents of the 'REMAINED' register
     through the 'COUNT' data terminal to the outside world, as long as
     'RESET' is inactive. Because we defined 'ENABLE' as an 'instrout'
     terminal, it is driven with '0' by default and gets "turned on" with
     the 'ENABLE(..)' command. This is very practical for the designer, who
     only has to switch on the control signals and does not have to care
     about switching them off again. This job is done automatically by the
     PARTHENON synthesizer. SFL owes this feature to the simple clock model
     (one clock, synchronous, single-phase), in which every action is bound
     to a single clock cycle.
   * In the last section "state behavior" (B3.2) we define the actions for
     every state separately. We start with the 'DOWN' state in line 73. The
     'any' operator is a multiple 'if' construct. The conditions are
     separated from the actions by a colon, i.e. 'any{condition1: action1;
     condition2: action2;... else: action;}'. In our case, we 'finish'
     execution of the state machine 'MAIN' if the expression 'RESET | SET'
     is true. Otherwise we decrement the 'REMAINED' register by one and
     'goto' the 'ASSERT' state if the decrementor reaches zero. For the
     condition evaluation we take the output of the decrementor 'DEC.out'
     and not the register 'REMAINED', because the value of the register
     changes only in the next clock cycle (':=' operator). Signals available
     in the same clock cycle are assigned with the '=' operator. Finally we
     define the behavior of the 'ASSERT' state. If the condition 'RESET |
     SET' is true we finish the state machine and change to the 'DOWN'
     state. Otherwise we keep asserting the control signal 'EXPIRE( )'.
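
The behavior sections B2, B3.1 and B3.2 described above can be summarized in
a cycle-level Python model. This is a sketch derived from the prose, not a
translation of 'timer8.sfl', and it is not guaranteed to be cycle-exact with
SECONDS. In particular, letting a 'generate' in the same cycle win over a
'finish' is an assumption, and the class and method names are made up for
this illustration.

```python
class Timer8:
    """Cycle-level model of the 8-bit timer (illustrative only)."""

    def __init__(self):
        self.running = False     # is the task MAIN.RUN active?
        self.state = "DOWN"      # first state of the stage MAIN
        self.remained = None     # register content unknown at power-on

    def clock(self, SET=0, RESET=0, INIT=None):
        """Evaluate one clock cycle, return (EXPIRE, ENABLE, COUNT)."""
        expire, enable, count = 0, 0, "z"    # instrout: 0, data: undriven
        nxt_running, nxt_state = self.running, self.state
        nxt_remained = self.remained

        if self.running:
            if not RESET:                    # B3.1: core behavior
                enable, count = 1, self.remained
            if RESET or SET:                 # B3.2: finish in both states
                nxt_running, nxt_state = False, "DOWN"
            elif self.state == "DOWN":
                dec_out = (self.remained + 0xFF) & 0xFF   # DEC.out
                nxt_remained = dec_out       # ':=' is visible next cycle
                if dec_out == 0:
                    nxt_state = "ASSERT"
            else:                            # state ASSERT
                expire = 1

        if SET:                              # B2: generate RUN, load INIT
            # Assumption: generate wins over a finish in the same cycle.
            nxt_running, nxt_remained = True, INIT

        self.running, self.state = nxt_running, nxt_state
        self.remained = nxt_remained
        return expire, enable, count


timer = Timer8()
trace = [timer.clock(SET=1, INIT=0x04)]          # CLK=1: set the timer
trace += [timer.clock() for _ in range(7)]       # CLK=2..8: let it run
trace.append(timer.clock(RESET=1))               # CLK=9: reset
for clk, row in enumerate(trace, start=1):
    print(clk, row)
```

All next-state values are computed from the current state first and stored
at the end of 'clock', which mirrors the difference between ':=' (register
update in the next cycle) and '=' (value available in the same cycle).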

Here is the result 'timer.sim' of the 'timer8.sfl' simulation with the
SECONDS script 'timer8.scr':

Again the simulation modules and the facilities of the top module 'timer8'
are listed first. The first column 'CLK' shows the clock period. At CLK=1 we
set the timer with the initialization value 'INIT'=0x04. As defined in our
code, the state machine 'MAIN' is in the 'DOWN' state and the task
'MAIN.RUN' is not running. At power on, the content of the register
'REMAINED' is unknown. The output signals behave correctly. As specified,
the control terminals 'EXPIRE' and 'ENABLE' are driven with '0'. 'COUNT', on
the other hand, is not driven. From CLK=2..5 the timer is counting down in
the 'DOWN' state. Now the output 'COUNT' shows the current value. In CLK=6
the count has reached zero and the 'EXPIRE' signal is asserted until the
'RESET' in CLK=9.

CPU

In the last example we build a simple 8-bit MISC (Minimum Instruction Set
Computer!?). It is a register-memory architecture with only 16 instructions.
We use circuits out of the system library and show the concept of
communicating state machines. First let's have a look at the full source
code.

Code file 'cpu.sfl' (with line numbering)

In line 8 an SRAM memory 'r256_8.h' with 256 cells is included from the
PARTHENON system library. As in the C language, the system library is
searched if the object is in <brackets>. User defined includes are marked
with "double quotes". Apart from the memory, an 8-bit incrementor 'inc8.h'
and a carry look-ahead adder 'cla8.h' are included.

The encoding of the instruction set is done with the '%d' (define)
preprocessor statements in lines 15..30. Instructions with a memory operand
(LDAI, LDXI, LDXM, STXM, BC) have their MSB encoded with '1'. The rest of
the encoding is done with the 4 LSBs.

Lines 35..42 declare the 'cpu' module. It has a simple memory interface with
two data buses 'dti', 'dto', one address bus 'adrs' and two control signals
for read and write operations. Furthermore, the 'cpu' is activated with the
'start' terminal.

The behavior of the 'top' module is defined in lines 47..55. Thanks to the
object-oriented SFL syntax, connecting the 'ram' memory and the 'cpu' is a
piece of cake. The whole behavior is packed into three 'instruct'
statements.

The definition of the 'cpu' module starts with the I/O facilities
declaration in line 60. The program counter 'pc' is an 8-bit register with
reset 'reg_wr'. At power on it will be initialized with '0x00', ready to
fetch the first instruction. Register 'a' is an accumulator, 'x' the memory
address register, 'c' a carry or condition register. The fetched instruction
is stored in 'op1', while 'op2' holds the immediate memory operand. The
memory data register 'md' is used for the register deferred addressing mode
of the ALU instructions ADCX and ANDX.

In lines 81..86 two state machines and their corresponding tasks are
declared. The 'if.ift' task controls the instruction fetch and the
'exec.ext' task takes care of proper execution.

   * The 'if' stage contains two states. In state 'fetch1' the instruction
     is stored in the 'op1' register and the program counter is incremented.
     Only the MSB of the data bus 'dti<7>' is decoded. In case of an
     immediate instruction we goto state 'fetch2' and load the operand into
     'op2'. Otherwise we 'relay' execution to the 'exec.ext' task. The
     statement 'relay exec.ext( );' is a shortcut for 'finish; generate
     exec.ext( );'. Thus the state machine 'if' is deactivated in the next
     clock cycle (leave behind control).
   * State machine 'exec' also holds two states. In state 'exec1' we first
     decode the ALU instructions ADCX and ANDX since they take a memory
     operand. If the expression in line 118 is true, the operand is loaded
     into the 'md' register and control is transferred to 'exec2'. We also
     'generate' the instruction fetch task 'if.ift( )'. With this, ADCX and
     ANDX are pipelined instructions. During the next clock period both
     tasks ('if.ift' in state 'fetch1' and 'exec.ext' in state 'exec2')
     will be active. In state 'exec2' we only need the resources of the
     carry look-ahead adder. Thus the memory buses are free for instruction
     fetch. All other instructions are decoded and executed in lines
     124..139. Here again we 'relay' execution to the 'if.ift( )' task,
     which means that we finish the 'exec.ext' task and generate the
     'if.ift' one.
With these constructs control can be easily transferred from state machine
to state machine and various kinds of parallelism can be utilized (e.g.
leave behind, pipeline, etc.)
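
The difference between 'relay' and a plain 'generate' can be made concrete
with a tiny Python model of task activation. It only tracks which tasks are
active in the next clock cycle and ignores everything else. The function
'step' and the tuple format of the actions are hypothetical and not part of
SFL or SECONDS.

```python
def step(active, actions):
    """Return the set of tasks that are active in the next clock cycle.

    active  -- set of task names active in the current cycle
    actions -- list of (issuer, verb, target) issued in this cycle,
               with verb being 'finish', 'generate' or 'relay'
    """
    nxt = set(active)
    for issuer, verb, target in actions:
        if verb in ("finish", "relay"):      # the issuing task stops
            nxt.discard(issuer)
        if verb in ("generate", "relay"):    # the target task starts
            nxt.add(target)
    return nxt


# Ping-pong: 'if.ift' relays to 'exec.ext' (leave behind control).
print(step({"if.ift"}, [("if.ift", "relay", "exec.ext")]))

# Pipelining: 'exec.ext' generates 'if.ift' but keeps running itself.
print(sorted(step({"exec.ext"}, [("exec.ext", "generate", "if.ift")])))
```

A 'relay' leaves exactly one task active, whereas a 'generate' without a
'finish' leaves two tasks active in the following cycle, which is the
overlap used by the pipelined ALU instructions.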

Now let's simulate our machine with the PARTHENON simulator SECONDS. The
code file is 'cpu.sfl' and the corresponding simulator script is 'cpu.scr'.
Here is the result 'cpu.sim' of the following little program which computes
the sum '0x12+ 0x34= 0x46':

   address  contents     assembler         meaning

   0x00     0x83 0xfd    LDXI 0xfd         x <- 0xfd
   0x02     0x01         LDAX              a <- (0xfd)
   0x03     0x08         CLC               c <- 0
   0x04     0x83 0xfe    LDXI 0xfe         x <- 0xfe
   0x06     0x0b         ADCX              a <- a+ (0xfe)+ c
   0x07     0x83 0xff    LDXI 0xff         x <- 0xff
   0x09     0x02         STAX              (x) <- a
   0x0a     0x07         SEC               c <- 1
   0x0b     0x8d 0x0b    BC 0x0b           if (c) pc <- op2

This code is loaded in the simulator script 'cpu.scr' with 'memset /ram/cell
X00 0X83 0Xfd..'. Data is set with 'memset /ram/cell Xfd 0X12 0X34 0X00'.
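
The program can also be traced by hand with a small instruction-level
interpreter in Python. It uses only the opcodes and meanings from the table
above and does not model the state machines or the clock cycles of
'cpu.sfl'. Two points are assumptions: SEC is taken to set the carry to 1
(otherwise the final BC would not loop), and ADCX is taken to write its
carry-out into 'c'. The function name 'run' is made up.

```python
PROGRAM = [0x83, 0xFD,   # LDXI 0xfd
           0x01,         # LDAX
           0x08,         # CLC
           0x83, 0xFE,   # LDXI 0xfe
           0x0B,         # ADCX
           0x83, 0xFF,   # LDXI 0xff
           0x02,         # STAX
           0x07,         # SEC
           0x8D, 0x0B]   # BC 0x0b
DATA = [0x12, 0x34, 0x00]            # stored at 0xfd..0xff


def run(max_instructions=20):
    """Execute the program for a bounded number of instructions."""
    mem = [0] * 256
    mem[0x00:len(PROGRAM)] = PROGRAM
    mem[0xFD:0x100] = DATA
    pc = a = x = c = 0
    for _ in range(max_instructions):
        op1 = mem[pc]
        pc = (pc + 1) & 0xFF
        op2 = None
        if op1 & 0x80:               # MSB set: fetch the operand byte
            op2 = mem[pc]
            pc = (pc + 1) & 0xFF
        if op1 == 0x83:              # LDXI
            x = op2
        elif op1 == 0x01:            # LDAX
            a = mem[x]
        elif op1 == 0x08:            # CLC
            c = 0
        elif op1 == 0x0B:            # ADCX
            total = a + mem[x] + c
            a, c = total & 0xFF, total >> 8   # assumed carry-out into c
        elif op1 == 0x02:            # STAX
            mem[x] = a
        elif op1 == 0x07:            # SEC (assumed: c <- 1)
            c = 1
        elif op1 == 0x8D:            # BC
            if c:
                pc = op2
        else:
            raise ValueError(f"unknown opcode {op1:#04x}")
    return mem, a, pc


mem, a, pc = run()
print(hex(mem[0xFF]), hex(a), hex(pc))
```

The instruction limit is needed because the program ends in a branch to
itself and would otherwise never return.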

After listing the simulated modules and the facilities of the 'top' module,
simulation starts. The first column 'CLK' numbers the clock cycles. After
triggering the 'start' control signal in CLK=2, the 'ift' task is activated
for two clock cycles: first in the 'fetch1' state (instruction fetch of
LDXI) and then in the 'fetch2' state (operand fetch 0xfd). In CLK=4 the task
'ift' finishes and the execution task 'ext' is activated (relay statement).
The operand 0xfd is transferred to the memory address register 'x'. If we
compare the values in the column 'ift' with the ones in 'ext' during the
following clock cycles, we can see the ping-pong control between these two
tasks. Only in CLK=14 are both tasks active, because the instruction ADCX is
pipelined. While the 'ift' task fetches the next instruction LDXI, the 'ext'
task adds the values 'a'= 0x12, 'md'= 0x34, 'c'= 0 and writes the result
back into the accumulator register 'a'=0x46. The program hangs in an
endless loop after 'CLK'=21.

Further examples

Please try also these code examples:

   * PCI-bus interface simulation kit: 'pcisim96.tar'
   * 32 bit RISC engine (DLX): 'proc32.tgz'

Conclusion

A quick overview of the SFL hardware description language was given by
coding and simulating three examples. In order to compare SFL with other
languages have a look at similar tutorials (LOLA, Verilog HDL, VHDL).

The main advantages of SFL are:

   * Functional description: SFL is very close to software programming
     languages like C or Pascal, which stress behavior rather than
     structure. Thanks to the RTL level coding, the gap between modeling and
     circuit implementation (ASIC, FPGA) can be closed with the powerful
     synthesis tools of the PARTHENON system.
   * Simple clock model: This allows the separation into control and data
     signals, a unique feature of SFL. Thanks to the "light-up principle"
     the SFL code is very compact and easily readable. Further, faster
     simulation is possible because of the event-driven execution.
   * Future: dynamic hardware programming: In order to close the
     hardware-software gap we are working on a new architecture called PCA
     (Plastic Cell Architecture). It will be a platform that supports
     dynamic module instantiation. SFL will be enhanced with constructs like
     'malloc' or 'free' in the C language. Hardware will become soft: a big
     challenge and an opportunity to get rid of the bottleneck of the
     von Neumann-type computer paradigm.
