-
Notifications
You must be signed in to change notification settings - Fork 50
C Puzzle
Imagine a sequence of zeros and ones: 1,1,0,1,0,0,1
etc
Binary bytes (ASCII characters) have been encoded into the sequence based on some given integer constant N(ex.=1,2,etc) as follows:
- The sequence begins with 2N or more 1's
- Upon seeing the first 1->0 transition skip the next 3N-1 values
- The value arrived at is the first and least significant bit in the byte
- The next seven more-significant bits of the byte are obtained by skipping every 2N-1 values
- After the 8th bit completing the byte, repeat the above for next byte, beginning with looking for the 1->0 transition
Fill in this C function such that it returns zero/NULL until a full eight bit byte/character is parsed.
- The function is run once for each element in the sequence in order
- Only add code inside the function
- Only return a completed non-zero/null character once
- Use a single return statement at the end of the function
- No global variables, no pointers, pass by value only
- No libraries, no OS functionality, etc
- Use
+=1
or-=1
instead of++
or--
- Use
bool|u/int_t
types - Hint: static local variables maintain state
uint8_t the_function(bool seq_val){
uint8_t return_value = 0;
// CODE HERE
// One return statement at end of function
return return_value;
}
You can compile and run your version of the_function
(along with a working solution) as shown in c_puzzle.c:
Set #define N
to either 1
or 2
. The input_values
array sequences encodes two characters "Hi"
.
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#define N 2 // Some integer ex. 1,2
#define SKIP3 ((3*N)-1)
#define SKIP2 ((2*N)-1)
uint8_t the_function(bool seq_val){
// Hint: static uint8_t char_buffer; // Buffer to hold eight bits
uint8_t return_value = 0;
// CODE HERE
// One return statement at end of function
return return_value;
}
// Wrapper function for unrelated reasons...
void testbench(){
// These input_values sequences encode "Hi" with N=1 or N=2
#if N==1
// N=1
#define TEST_SIZE 45
bool input_values[TEST_SIZE] = {
1, // 2*N=2 or more 1's to start
1, // Ex. 3
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=2
// 'H' = 0x48 = 0b01001000
0, // Skipped
0, // Bit[0]
//
0, // Skip 2*N - 1=1
0, // Bit[1]
//
0, // Skip 2*N - 1=1
0, // Bit[2]
//
1, // Skip 2*N - 1=1
1, // Bit[3]
//
0, // Skip 2*N - 1=1
0, // Bit[4]
//
0, // Skip 2*N - 1=1
0, // Bit[5]
//
1, // Skip 2*N - 1=1
1, // Bit[6]
//
0, // Skip 2*N - 1=1
0, // Bit[7]
//
1, // 2*N=2 or more 1's
1, // Ex. 3
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=2
// 'i' = 0x69 = 0b01101001
1, // Skipped
1, // Bit[0]
//
0, // Skip 2*N - 1=1
0, // Bit[1]
//
0, // Skip 2*N - 1=1
0, // Bit[2]
//
1, // Skip 2*N - 1=1
1, // Bit[3]
//
0, // Skip 2*N - 1=1
0, // Bit[4]
//
1, // Skip 2*N - 1=1
1, // Bit[5]
//
1, // Skip 2*N - 1=1
1, // Bit[6]
//
0, // Skip 2*N - 1=1
0, // Bit[7]
//
1, // 2*N=2 or more 1's
1, //
1
};
#endif
#if N==2
// N=2
#define TEST_SIZE 87
bool input_values[TEST_SIZE] = {
1, // 2*N=4 or more 1's to start
1, // Ex. 5
1,
1,
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=5
0, // Skipped
0, // Skipped
// 'H' = 0x48 = 0b01001000
0, // Skipped
0, // Skipped
0, // Bit[0]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[1]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[2]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[3]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[4]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[5]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[6]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[7]
0,
//
1, // 2*N=4 or more 1's
1, // Ex. 5
1,
1,
1,
//
0, // The 1->0 transition \/
0, // Skip 3*N - 1=5
0, // Skipped
0, // Skipped
// 'i' = 0x69 = 0b01101001
1, // Skipped
1, // Skipped
1, // Bit[0]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[1]
0, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[2]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[3]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[4]
0, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[5]
1, // Skip 2*N - 1=3
//
1, // Skipped
1, // Skipped
1, // Bit[6]
1, // Skip 2*N - 1=3
//
0, // Skipped
0, // Skipped
0, // Bit[7]
0,
//
1, // 2*N=4 or more 1's
1, // Ex. 5
1,
1,
1
};
#endif
static uint32_t seq_index = 0;
uint8_t the_char = the_function(input_values[seq_index]);
if(the_char != 0){
printf("the_char = %c\n", the_char);
}
seq_index += 1;
}
// gcc c_puzzle.c -o c_puzzle && ./c_puzzle
int main(int argc, char** argv)
{
int i = 0;
while(i<TEST_SIZE)
{
testbench();
i++;
}
return 0;
}
Expected output:
the_char = H
the_char = i
Why is this a page on a hardware description language wiki?
Read this and see if it sounds familiar at all 😏
This puzzle is the same problem present in digital universal asynchronous receivers(/transmitters) UART
. N=one half of one bit period in time
. 2N=uart bit period, 3N is skipping 1.5 bits to get into center of bit for sampling (minus 1 is for counting/off by one reasons). Units are the clock cycle time
(ex. 100MHz = 10ns per tick, per run of function).
C code in c_puzzle.c written to solve this puzzle can be used with the PipelineC tool (see comments in code) to simulate and synthesize for FPGAs + ASICs.
Looking at the examples/c_puzzle.c/the_function/pipeline_map.gv.png
dataflow graph you can see how each operation maps to which locations in the C code. The size of each operation's 'box'/node in the graph is scaled to the time+space requirement of the computation ~= combinatorial delay x a rough estimate of size based on number of IO wires (TODO measuring/optimizing for fewer registers/LUTs).
This solution was kindly submitted by Discord|GitHub user siddhpant and was not intended to target hardware, so it makes a great experiment for evaluating 'what is natural for software folks to write' vs. 'what hardware it makes':
Dataflow:
In the the above dataflow graph, some of the larger and first items you might see are:
-
BIN_OP_SL_uint8_t_uint32_t
line 112
4.4ns
-
BIN_OP_PLUS_int32_t_int33_t
line 109
2.9ns
After synthesizing for an example Xilinx Artix 7
part, Vivado
logs produced by the tool show how the_function
has a maximum operating frequency of ~142 MHz
:
Report Cell Usage:
+------+-------+------+
| |Cell |Count |
+------+-------+------+
|1 |CARRY4 | 43|
|2 |LUT1 | 5|
|3 |LUT2 | 2|
|4 |LUT4 | 18|
|5 |LUT5 | 40|
|6 |LUT6 | 55|
|7 |FDRE | 113|
+------+-------+------+
Source: the_function_0CLK_a6fd0288/bit_pos_reg[4]/C
Destination: return_output_output_reg_reg[0]/R
Data Path Delay: 6.522ns (logic 2.414ns (37.013%) route 4.108ns (62.987%))
Logic Levels: 9 (CARRY4=6 LUT4=1 LUT5=1 LUT6=1)
The reported critically long delay path is from the bit_pos
register to the return output
. This can be confirmed by tracing the path in the dataflow from the bit_pos | NOW
current value of the bit_pos
register through the noted-as-large BIN_OP_SL_uint8_t_uint32_t
line 112
4.4ns
node all the way to the right OUT | return_output
return value of the function.
Based on the results below for the original solution, these are the recommended changes for a better hardware result:
- Don't use a variable-shifter
<<
, ex.letter |= seq_val << bit_pos;
- Instead of shifting
seq_val
by some 'run-time' value = bit_pos, it uses less resources to shift by a constant amount for each bit (one shift per bit in letter).
- Instead of shifting
- Use smaller width integer types
- Of course on a CPU this barely matters - but in hardware every bit counts, can reduce counter sizes to as small as
N
allows.
- Of course on a CPU this barely matters - but in hardware every bit counts, can reduce counter sizes to as small as
This solution was kindly submitted by Discord user can.l
, as an almost adversarial example of very concise C code. The code was broken apart into multiple lines for PipelineC + readability reasons.
Dataflow:
After synthesizing for an example Xilinx Artix 7
part, Vivado
logs produced by the tool show how the_function
has a maximum operating frequency of ~223 MHz
:
Report Cell Usage:
+------+-------+------+
| |Cell |Count |
+------+-------+------+
|1 |CARRY4 | 2|
|2 |LUT1 | 1|
|3 |LUT2 | 10|
|4 |LUT3 | 5|
|5 |LUT4 | 5|
|6 |LUT5 | 15|
|7 |LUT6 | 22|
|8 |FDRE | 33|
+------+-------+------+
Source: the_function_0CLK_f98a8a12/b_reg[0]/C
Destination: the_function_0CLK_f98a8a12/s_reg[2]/D
Data Path Delay: 4.429ns (logic 1.917ns (43.283%) route 2.512ns (56.717%))
Logic Levels: 6 (CARRY4=2 LUT2=1 LUT4=1 LUT5=1 LUT6=1)
FMAX ~= 223MHz
This next solution was submitted by PoignardAzur and has the highest estimated operating frequency (i.e. shortest critical path length) of all solutions thus far.
Dataflow:
After synthesizing for an example Xilinx Artix 7
part, Vivado
logs produced by the tool show how the_function
has a maximum operating frequency of ~311 MHz
:
Report Cell Usage:
+------+-----+------+
| |Cell |Count |
+------+-----+------+
|1 |LUT2 | 3|
|2 |LUT3 | 7|
|3 |LUT4 | 12|
|4 |LUT5 | 11|
|5 |LUT6 | 15|
|6 |FDRE | 35|
+------+-----+------+
Source: the_function_0CLK_8f4d4fc5/remaining_reg[5]/C
Destination: return_output_output_reg_reg[0]/D
Data Path Delay: 3.158ns (logic 0.999ns (31.634%) route 2.159ns (68.366%))
Logic Levels: 3 (LUT4=1 LUT6=2)
FMAX ~= 311MHz
The critical path has is shown to be from the remaining
count storage register to the return
ed output character (a generated output register).
Here is a short ranking of how slow operations are, fastest first, prefer selecting from the top of the list:
- Constant amount bit shifts / selecting some constant Nth bit / constant index local variable array/struct access
- Bitwise operators (&,|,etc)
- Multiplexing (if-else logic)
- Integer add/sub
- Integer mult
- Variable amount bit shifting (barrel shifters)
- Random array access
- Float mult
- Float add
- Integer divide
- Float divide
Dataflow:
In the the above dataflow graph, some of the larger and first items you might see are:
-
BIN_OP_MINUS_uint8_t_uint1_t
line 88
2.3ns
-
BIN_OP_EQ_uint8_t_uint3_t
line 78
2.3ns
This version of the_function
has a maximum operating frequency of ~290 MHz
:
Report Cell Usage:
+------+-----+------+
| |Cell |Count |
+------+-----+------+
|1 |LUT3 | 1|
|2 |LUT4 | 3|
|3 |LUT5 | 5|
|4 |LUT6 | 14|
|5 |FDRE | 29|
+------+-----+------+
Source: the_function_0CLK_83e31706/counter_reg[0]/C
Destination: return_output_output_reg_reg[0]/R
Data Path Delay: 2.925ns (logic 0.875ns (29.915%) route 2.050ns (70.085%))
Logic Levels: 2 (LUT4=1 LUT6=1)
The reported critically long delay path is from the counter
register to the returned output
. This can be confirmed by tracing the path in the dataflow from the counter | NOW
current value of the counter
register through the noted-as-large BIN_OP_MINUS_uint8_t_uint1_t
line 88
2.3ns
node all the way to the right OUT | return_output
return value of the function.