Portable and Optimized C for 8-bit Systems

This article describes some techniques for optimizing ANSI C code for 8-bit vintage systems, i.e., computers, consoles, hand-helds and scientific calculators from the late '70s to the mid '90s, and in particular for systems based on the following architectures (and their derived and compatible architectures):

Intel 8080 (*)
MOS 6502
Motorola 6809
Zilog Z80 (*)

(*) The Zilog Z80 is an extension of the Intel 8080. Therefore an Intel 8080 binary is compatible with a Z80-based system but not the other way round.

Most of the techniques presented here remain valid on other 8-bit architectures such as the COSMAC 1802 and the Intel 8051.

The goal of this article is two-fold:

present general techniques to optimize C code for all 8-bit systems;
present general techniques to write portable C code, i.e., valid and compatible code for all 8-bit systems, including systems that are not natively and explicitly supported by C compilers.

Preconditions

This article is neither an introduction to nor a manual for the C language, and it has the following preconditions:

knowledge of the C language;
knowledge of structured and object-oriented programming;
familiarity with compilers and linkers.

Moreover, this article does not cover in depth some advanced topics such as:

coding in specific domains such as graphics, sound, input/output;
interaction between C and Assembly.

These advanced topics are very important and would require separate articles.

Definitions

In this article we will refer to system, target and architecture with the following meanings:

A system is any kind of processor-equipped machine such as computers, consoles, hand-helds, scientific calculators, embedded systems, etc.
A target of a compiler is any kind of system supported by the compiler, i.e., a system for which the compiler provides specific libraries and can generate a binary in a format that is usable by an emulator and/or real hardware.
An architecture is a processor family (e.g., Intel 8080, MOS 6502, etc.). A target has the architecture of its processor, so it has only one architecture unless it has two or more processors belonging to different families (such as the Commodore 128, which has both a Z80 and a 6502-derived processor).

Multi-target cross-compilers

In order to produce binaries from source code we recommend multi-target cross-compilers (i.e., compilers that run on a modern PC and produce binaries for multiple targets).

Cross-compilers vs native compilers

We do not recommend native compilers because they are inconvenient (even when used inside an accelerated emulator) and, due to the limited resources of 8-bit systems, they cannot produce code as well optimized as the code produced by cross-compilers.

In particular we will refer to the following multi-target cross-compilers:

Architecture	Compiler/Dev-Kit	Web Page
Intel 8080	ACK	https://github.com/davidgiven/ack
MOS 6502	CC65	https://github.com/cc65/cc65
Motorola 6809	CMOC	https://perso.b2b2c.ca/~sarrazip/dev/cmoc.html
Zilog Z80	SCCZ80/ZSDCC (Z88DK)	https://github.com/z88dk/z88dk

We mention other multi-target cross-compilers that we do not cover here but for which most of the described general techniques are valid:

SDCC (http://sdcc.sourceforge.net/) for several architectures including the 8-bit Zilog Z80 and Intel 8051;
LCC1802 (https://sites.google.com/site/lcc1802/) for the COSMAC 1802 8-bit processor;
GCC-6809 (https://github.com/bcd/gcc) for the Motorola 6809 (GCC adaptation);
GCC-6502 (https://github.com/itszor/gcc-6502-bits) for the MOS 6502 (GCC adaptation);
SmallC-85 (https://github.com/ncb85/SmallC-85) for the Intel 8080/8085;
devkitSMS (https://github.com/sverx/devkitSMS) for Sega Z80-based consoles (Sega Master System, Sega Game Gear, Sega SG-1000).

We remark that the Z88DK dev-kit provides two compilers:

the more reliable SCCZ80 that also offers fast compilation,
the experimental ZSDCC (Z80-only optimized SDCC version) that can produce faster and more compact code than SCCZ80 at the cost of much slower compilation and the risk of introducing erratic behavior.

Almost all of the considered compilers generate code for just one architecture (they are mono-architecture) even though they are multi-target. ACK is an exception because it is multi-architecture (Intel 8080, Intel 8088/8086, I386, 68K, MIPS, PDP11, etc.).

This article is neither an introduction to nor a manual for these compilers, and it will not cover the following topics:

compiler installation
basic usage of the compiler

For details on these topics we refer to the compiler’s manuals and web pages.

A sub-set of ANSI C

In this article, by ANSI C we mean a large subset of the C89 standard in which float and long long are optional but pointers to functions and pointers to structs are available.
We will not consider earlier versions of C such as K&R C.

Motivation

Why should we use C to code for vintage 8-bit systems?
Traditionally these systems were coded in either Assembly or interpreted BASIC or a mix of the two.
Given the limited resources, Assembly was often necessary. BASIC was convenient for its simplicity and because an interpreter was often present on the system’s ROM.

If we limit our comparison to just Assembly, BASIC and C, the following table summarizes the reasons for using C:

language	simplicity	portability	efficiency
Assembly	low	no	optimal
BASIC	high	low	low
C	high	high	good

Very high portability

In particular ANSI C allows us:

to ease porting between different architectures
to write “universal” code, i.e., code that is valid for different targets without any modification

Good performance

Some see C as a sort of universal Assembly language. I do not fully agree with this statement because optimally-written C will never beat optimally-written Assembly.
Nevertheless, C is the closest language to Assembly among the languages that allow high level programming.

“Sentimental drawbacks”

One not fully rational reason for not using C in this context is that coding in C provides a less vintage experience than BASIC and Assembly, because C was less common on 80s home computers (although it was common on 8-bit business computers, such as those running the CP/M operating system).
On the other hand, I believe that a good reason for coding in C is that C allows us to code for any 8-bit system.

Writing portable code

Writing easily portable code or even directly compilable code for different architectures is possible in C through different strategies:

Write code that is hardware-agnostic through abstract interfaces (i.e., hardware-independent APIs)
Use different implementations of the interfaces and select them at compilation-time (by using pre-compiler directives or by providing different files at link-time)

Writing portable code for targets of a dev-kit

This is trivial if our multi-target dev-kit provides a multi-target library or if we just use standard C libraries (e.g., stdio, stdlib, etc.). Under these conditions we just need to recompile our code. The multi-target library will do the “magic” for us.

Unfortunately only CC65 and Z88DK provide significant multi-target libraries for input and output other than standard C libraries:

Dev-Kit	Architecture	multi-target libraries
Z88DK	Zilog Z80	standard C lib, conio, vt52, vt100, sprite software, UDG, bitmap
CC65	MOS 6502	standard C lib, conio, tgi (bitmap)
CMOC	Motorola 6809	standard C lib
ACK	Intel 8080	standard C lib

In particular Z88DK has very powerful libraries for multi-target graphics and even provides APIs for software sprites (https://github.com/z88dk/z88dk/wiki/monographics) and redefined characters for most of its 80 targets.
Example: The code of the game H-Tron (https://sourceforge.net/projects/h-tron/) uses Z88DK’s APIs for low resolution bitmap graphics for a multitude of Z80-based targets.

Therefore if we were to use exclusively the standard C libraries we could compile our code with ACK, CMOC, CC65 and Z88DK. If we used conio we could compile the code with CC65 and Z88DK (maybe with minor adaptations).

In all other cases, if we want to write portable code on different architectures and systems, we would need to write a “hardware abstraction layer” that allows us to separate:

the code that does not depend on the hardware (e.g., the logic part)
the code that depends on the hardware (e.g., input/output in a videogame)

This pattern is very common in modern programming and it is not exclusive to C. C provides a set of tools to implement it by selecting, at compilation-time, the portions of code required by each system.
In particular C provides a powerful pre-compiler with commands such as:

#define -> to define a macro
#if … defined(...) … #elif … #else -> to select code portions that depend on the existence or value of a given macro

Moreover, all the compilers under consideration provide the -D option to pass a macro to the pre-compiler. Some compilers such as CC65 implicitly define a macro that depends on the selected target (e.g., __VIC20__ for the Vic 20 target).

In our code we may have something like:

...
		#elif defined(__PV1000__)
			#define XSize 28
		#elif defined(__OSIC1P__) || defined(__G800__) || defined(__RX78__) 
			#define XSize 24
		#elif defined(__VIC20__) 
			#define XSize 22
...

When we compile for the Vic 20 target, the pre-compiler will select for us the Vic 20-specific definition of XSize.
This also allows us to select specific options for the configuration of the target (additional memory, video card, video mode, debug compilation, etc.).
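A minimal sketch of the same idea applied to a tiny hardware abstraction layer. `__VIC20__` is the macro CC65 defines for its Vic 20 target; the name `displayCharAt`, the function `drawTopBorder`, the Vic 20 screen address (assuming an unexpanded machine, screen memory at $1E00), the generic 40x25 default and the RAM-backed fallback screen are all hypothetical choices for illustration, not taken from Cross-Chase.

```c
/* Target-specific part: sizes and the displayCharAt() interface */
#if defined(__VIC20__)
    #define XSize 22
    #define YSize 23
    /* Assumption: unexpanded Vic 20, screen memory at $1E00 */
    #define displayCharAt(x, y, ch) \
        (*(unsigned char *) (0x1E00 + (y) * XSize + (x)) = (ch))
#else
    /* Hypothetical generic fallback, e.g., for testing the logic on a PC */
    #define XSize 40
    #define YSize 25
    unsigned char fakeScreen[YSize][XSize];
    #define displayCharAt(x, y, ch) (fakeScreen[(y)][(x)] = (ch))
#endif

/* Hardware-independent part: it only uses XSize and displayCharAt() */
void drawTopBorder(unsigned char ch)
{
    unsigned char x;
    for(x = 0; x < XSize; ++x)
    {
        displayCharAt(x, 0, ch);
    }
}
```

Only the first block changes from one target to another; `drawTopBorder()` compiles unchanged for every target.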

As main example we refer to the Cross-Chase project:
https://github.com/Fabrizio-Caruso/CROSS-CHASE

The code of Cross-Chase provides an example of how to write universal code for any system and architecture:

the code of the game (src/chase directory) is hardware-independent
the code of the crossLib library (src/cross_lib directory) implements all the hardware-specific details

Writing portable code even for unsupported systems

The dev-kits under our consideration support a list of targets for each architecture by providing specific libraries. Nevertheless it is possible to exploit these dev-kits for other systems with the same architecture but we will have to implement all the hardware-specific code:

the necessary code for input/output (e.g., graphics, sounds, keyboard, joystick, etc.)
the necessary code for correct machine initialization

Alternatively, it is possible to extend a dev-kit to support new targets.

In many cases, we can use the ROM routines to do this (see the section on the ROM routines).

Moreover we may have to convert the binary to a format that can be accepted by the system.

Therefore, we can indeed write portable code for even these unsupported systems.

For example, CC65 supports neither the BBC Micro nor the Atari 7800, and CMOC does not support the Olivetti Prodest PC128. Yet, it is possible to use these dev-kits to produce binaries for such targets:

Cross Chase (https://github.com/Fabrizio-Caruso/CROSS-CHASE) supports (theoretically) any system, even those not supported by the dev-kits, such as the Olivetti Prodest PC128.
The game Robotsfindskitten has been compiled for the Atari 7800 with CC65 (https://sourceforge.net/projects/rfk7800/files/rfk7800/).
The BBC Micro has been added unofficially as an experimental new target in CC65 (https://github.com/dominicbeesley/cc65).

Compilation for unsupported targets

We give a list of compilation options for generic targets for each dev-kit. These options tell the compiler to produce code without any dependence on a specific target. For more details we refer to the manuals of the dev-kits.

Architecture	Dev-Kit	Option(s)
Intel 8080	ACK	(*)
MOS 6502	CC65	`+none`
Motorola 6809	CMOC	`--nodefaultlibs`
Zilog Z80	SCCZ80/ZSDCC (Z88DK)	`+test`, `+embedded` (new lib), `+cpm` (generic CP/M target)

(*) ACK officially only supports CP/M-80 for the Intel 8080 architecture. It is possible to use ACK to build generic Intel 8080 binaries, but this is not very simple because ACK uses a sequence of commands to produce intermediate results (including the “EM” byte-code):

cpp.ansi: C precompiler
em_cemcom.ansi: compiles precompiled C code into “EM” byte-code
em_opt: optimizes “EM” byte-code
cpm/ncg: generates Intel 8080 Assembly from “EM” bytecode
cpm/as: generates Intel 8080 binary from Assembly
em_led: links object files

General C code optimization

We describe some general rules to improve the code that do not depend on whether the architecture is 8-bit or not.

Re-use same functions

In general, in whatever programming language we want to code, it is important to avoid code duplication and unnecessary code.

Structured programming

We have to examine each function in order to find common portions that we can factor out into sub-functions that the original functions can call.
However we must take into account that, beyond a certain limit, excessive code granularity has negative effects because each function call has a computational and memory cost.

Passing variables

If two functions do the same thing on different objects, then we should simply use one function and pass the specific object to it as a parameter.

Passing pointers to functions

In other cases, some portions of the code differ only by an applied function. In such cases, we should write one function to which we pass a pointer to the specific function we want to apply.

Not everyone is familiar with the C syntax for pointers to functions. Therefore we give here a simple example in which we define sumOfSomething(range, something), which sums something(i) over the values of i from 0 to range-1:

unsigned short sumOfSomething(unsigned char range, unsigned short (* something) (unsigned char))
{
    unsigned char i;
    unsigned short res =0;
    for(i=0;i<range;++i)
    {
        res+=something(i);
    }
    return res;
}

Hence given the two functions:

unsigned short square(unsigned char val)
{
        return val*val;
}

unsigned short next(unsigned char val)
{
    return ++val;
}

we can use sumOfSomething on either of the two:

printf("%d\n",sumOfSomething(4,square));

prints 14, i.e., the sum of squares of 0,1,2,3.

printf("%d\n",sumOfSomething(4,next));

prints 10, i.e., the sum of 0+1,1+1,2+1,3+1.
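The same example can be made easier to read with a typedef for the function-pointer type. `ByteFunction` is a hypothetical name chosen here for illustration; the behavior is the same as above.

```c
/* Name for "pointer to a function taking an unsigned char and
   returning an unsigned short" */
typedef unsigned short (*ByteFunction)(unsigned char);

/* Sums something(i) for i = 0 .. range-1 */
unsigned short sumOfSomething(unsigned char range, ByteFunction something)
{
    unsigned char i;
    unsigned short res = 0;
    for(i = 0; i < range; ++i)
    {
        res += something(i);
    }
    return res;
}

unsigned short square(unsigned char val)
{
    return val * val;
}

unsigned short next(unsigned char val)
{
    return ++val;
}
```

With the typedef, a function pointer can also be stored in a variable, e.g., `ByteFunction f = square;`, and passed around like any other value.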

Passing offsets to struct

In some cases it is possible to generalize the code by passing a parameter to avoid writing very similar functions.
An advanced example is in https://github.com/Fabrizio-Caruso/CROSS-CHASE/blob/master/src/chase/character.h where, given a struct with two fields _x and _y,
we want to be able to change the value of one field or the other in different situations:

	struct CharacterStruct
	{
		unsigned char _x;
		unsigned char _y;
		...
	};
	typedef struct CharacterStruct Character;

We avoid two different functions for _x and _y by creating one function to which we pass an offset to select the field:

	unsigned char moveCharacter(Character* hunterPtr, unsigned char offset)
	{
		if((unsigned char) * ((unsigned char*)hunterPtr+offset) < ... )
		{
			++(*((unsigned char *) hunterPtr+offset));
		}
		else if((unsigned char) *((unsigned char *) hunterPtr+offset) > ... )
		{
			--(*((unsigned char *) hunterPtr+offset));
		}
	...
	}

In this case, we use the fact that the field _y is exactly one byte after the field _x. Therefore with offset=0 we access _x and with offset=1 we access _y.

Warning: We must always remember that adding a parameter has a cost and we must verify that the cost of the parameter is lower than the cost of an extra function (e.g., by looking at the size of the obtained binary).
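A self-contained sketch of the same technique. Instead of hard-coding 0 and 1 it uses the standard offsetof macro from stddef.h, so the code stays correct even if the fields are reordered. The helper moveTowards() and its target parameter are hypothetical: the original moveCharacter() compares against values that are elided above.

```c
#include <stddef.h>

struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
};
typedef struct CharacterStruct Character;

/* Offsets of the two coordinates, computed by the compiler */
#define X_OFFSET offsetof(Character, _x)
#define Y_OFFSET offsetof(Character, _y)

/* Hypothetical helper: move the coordinate selected by offset
   one step towards target */
void moveTowards(Character *characterPtr, unsigned char offset, unsigned char target)
{
    unsigned char *fieldPtr = (unsigned char *) characterPtr + offset;

    if(*fieldPtr < target)
    {
        ++(*fieldPtr);
    }
    else if(*fieldPtr > target)
    {
        --(*fieldPtr);
    }
}
```

A call such as `moveTowards(hunterPtr, X_OFFSET, playerX)` then replaces a dedicated function for the _x field, and the same function with Y_OFFSET handles _y.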

Same code on similar objects

We can do even better and use the same code on objects that are not identical but share some common features, by using offsets within structs, pointers to functions, etc. In general this is possible through object-oriented programming, whose light-weight implementation for 8-bit systems is described in a subsequent section of this article.

Pre-increment/decrement vs Post-increment/decrement

We must avoid post-increment/decrement operators (i++, i--) when they are not needed, i.e., when we do not need the original value, and replace them with pre-increment/decrement operators (++i, --i). The reason is that a post-increment requires at least one extra operation to save the original value.
Remark: In the increment expression of a for loop the original value is discarded, so a post-increment brings no benefit there.

Constant vs Variables

Any architecture will perform better if variables are replaced by constants.

Use constants

Therefore if a variable has a known value at compilation-time, it is important to replace it with a constant.
If its value depends on some compilation option, then we should use a macro to set its value.
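A minimal sketch, assuming a hypothetical option GHOSTS_NUMBER: the default value is a compile-time constant that can be overridden from the command line (e.g., with -DGHOSTS_NUMBER=4), so the compiler never needs a variable to hold it.

```c
/* Default value, overridable with -DGHOSTS_NUMBER=... */
#if !defined(GHOSTS_NUMBER)
    #define GHOSTS_NUMBER 8
#endif

/* Both the array size and the loop bound are compile-time constants */
unsigned char ghostStatus[GHOSTS_NUMBER];

void resetGhosts(void)
{
    unsigned char i;
    for(i = 0; i < GHOSTS_NUMBER; ++i)
    {
        ghostStatus[i] = 1;
    }
}
```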

Help the compiler recognize constants

For single pass compilers (the majority of 8-bit cross-compilers, e.g., CC65), it is important to help the compiler decide whether a given expression is a constant.

Example (from https://www.cc65.org/doc/coding.html):
A single pass compiler may evaluate the following expression from left to right and miss the fact that OFFS+3 is a constant:

	#define OFFS   4
	int  i;
	i = i + OFFS + 3;

In this case it would be better to re-write i = i + OFFS+3 as i = OFFS+3+i or i = i + (OFFS+3):

	#define OFFS   4
	int  i;
	i = OFFS + 3 + i;

8-bit specific code optimization

The C language has both high level constructs (such as struct, functions as parameters, etc.) and low level constructs (such as pointers, bitwise operators, etc.).
This alone is not enough to make C well-suited to programming 8-bit systems: we also need some 8-bit-specific techniques.

Implement `peek` and `poke` in C

Most probably we will need to read and write single bytes from and to specific memory locations. In old BASIC this was done through peek and poke commands. In C we must do this through pointers whose syntax is not very readable. In order to make our code more readable we can create the following macros:

    #define POKE(addr,val)  (*(unsigned char*) (addr) = (val))
    #define PEEK(addr)      (*(unsigned char*) (addr))

Remark: The compilers will produce optimal code when we use constants as parameters of these macros.

For more details we refer to: https://github.com/cc65/wiki/wiki/PEEK-and-POKE
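A short usage sketch. On a real Commodore 64, screen memory starts at $0400 and the border colour register is at $D020; since those addresses are only meaningful on the target, the hypothetical fillArea() helper below takes the start address as a parameter, so it can also be exercised on a PC against an ordinary buffer. It assumes uintptr_t from stdint.h is available; on an 8-bit target an unsigned short address would do.

```c
#include <stdint.h>

#define POKE(addr,val)  (*(unsigned char*) (addr) = (val))
#define PEEK(addr)      (*(unsigned char*) (addr))

/* Hypothetical helper: write val into len bytes starting at address start */
void fillArea(uintptr_t start, unsigned short len, unsigned char val)
{
    unsigned short i;
    for(i = 0; i < len; ++i)
    {
        POKE(start + i, val);
    }
}

/* On a C64 one could write, e.g.:
     fillArea(0x0400, 1000, 32);    fill the 40x25 screen with spaces
     POKE(0xD020, 0);               black border                      */
```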

The “best types” for 8-bit systems

First of all we must take into account that we have the following situation:

all arithmetic operations are just 8-bit
most other operations use 8 bits while some may use 16 bits and none uses 32 bits
signed operations are slower than unsigned operations
the hardware does not support floating point operations

Integer vs floating point types

The C language provides signed integer types (char, short, int, long, long long, etc.) and their unsigned counterparts.
Most cross-compilers (but not CC65) support the float type (for floating point numbers), which we do not cover here. We only remark that floating point numbers on 8-bit architectures are always implemented in software and therefore have a high computational cost. Hence we should only use them when strictly necessary.

Our friend unsigned

Since the 8-bit architectures under consideration do NOT handle signed types well, we must avoid them whenever possible.

“Size matters!”

The size of the standard integer types depends on the compiler and the architecture, not on the C standard (which only specifies minimum sizes).
The C99 standard introduced some types that have an unambiguous size (e.g., uint8_t for an 8-bit unsigned integer).

In order to use these types in our code we should include stdint.h with:

	#include <stdint.h>

Not all 8-bit cross-compilers support these types.

Fortunately for most 8-bit compilers we have the following situation:

type	number of bits	`stdint.h`	alternative name
`unsigned char`	8	`uint8_t`	`byte`
`unsigned short`	16	`uint16_t`	`word`
`unsigned int`	16	`uint16_t`	`word`
`unsigned long`	32	`uint32_t`	`dword`

Therefore we must:

use unsigned char (or uint8_t) for arithmetic operations whenever possible;
use unsigned char (or uint8_t) and unsigned short (or uint16_t) for all other operations and avoid all 32-bit operations.

Remark: When the fixed-size types are not available we can introduce them by using typedef:

	typedef unsigned char uint8_t;
	typedef unsigned short uint16_t;
	typedef unsigned long uint32_t;
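A sketch that combines both approaches, assuming a hypothetical macro NO_STDINT that we pass with -D on compilers lacking stdint.h. The negative-array-size trick is a C89-compatible compile-time check: compilation fails if a type does not have the expected size on the current compiler.

```c
#if defined(NO_STDINT)
    /* Fallback for compilers without stdint.h (sizes as in the table above) */
    typedef unsigned char uint8_t;
    typedef unsigned short uint16_t;
    typedef unsigned long uint32_t;
#else
    #include <stdint.h>
#endif

/* Compile-time size checks: an array with negative size is a compile error */
typedef char check_uint8_size[(sizeof(uint8_t) == 1) ? 1 : -1];
typedef char check_uint16_size[(sizeof(uint16_t) == 2) ? 1 : -1];
typedef char check_uint32_size[(sizeof(uint32_t) == 4) ? 1 : -1];
```

The checks cost nothing at run-time and catch, for example, a compiler on which unsigned long is not 32 bits.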

Choice of the operations

When writing code for an 8-bit architecture we must avoid inefficient operations or operations that force us to use inefficient types (such as signed or 32-bit types).

Avoid signed

In particular, it is often possible to rewrite the code in a way that avoids subtractions or, when this is not possible, at least in a way that never produces negative results.
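For example, the distance between two screen coordinates can be computed without ever producing a negative value, so that everything stays unsigned char. distance() is a hypothetical helper:

```c
/* Absolute difference of two unsigned coordinates: the larger value is
   always the minuend, so the result is never negative. */
unsigned char distance(unsigned char a, unsigned char b)
{
    if(a > b)
    {
        return a - b;
    }
    return b - a;
}
```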

Avoid explicit products

With the only exception of the Motorola 6809 (which has a MUL instruction), none of the architectures under consideration has an instruction to multiply two 8-bit values. Therefore, if possible, we should avoid products or limit ourselves to products and divisions by powers of 2, which we can implement with the << and >> operators:

	unsigned char foo, bar;
	...
	foo = foo << 2; // multiply by 2^2=4
	bar = bar >> 1; // divide by 2^1=2
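Products by constants that are not powers of 2 can often be split into a sum of shifts. A common case is computing a row offset on a 40-column screen, since 40 = 32 + 8. A sketch, assuming a 40-column display; rowOffset() is a hypothetical helper, and whether it actually beats the compiler's own multiplication routine should be verified on the produced binary:

```c
/* y * 40 computed as y * 32 + y * 8, i.e., two shifts and one addition.
   y is copied into a 16-bit variable so the shifts are clearly done on
   16 bits: the result (up to 24 * 40 = 960 on a 25-row screen) does not
   fit in 8 bits. */
unsigned short rowOffset(unsigned char y)
{
    unsigned short y16 = y;
    return (y16 << 5) + (y16 << 3);
}
```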

Rewrite some operations

Other operations such as modulo can be rewritten in a more efficient way for the 8-bit systems by using bit-wise operators because the compiler is not always capable of optimizing these operations:

	unsigned char foo;
	...
	if(foo&1) // same as if(foo%2) for unsigned foo
	{
		...
	}
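More generally, for an unsigned value x and a power of two 2^n, x % 2^n is equal to x & (2^n - 1). This does not hold for negative signed values, which is one more reason to prefer unsigned types. A sketch with a hypothetical animation counter that cycles through 8 frames:

```c
/* Frame index in 0..7: counter & 7 is the same as counter % 8 for
   unsigned values, but only needs an AND. */
unsigned char frameIndex(unsigned char counter)
{
    return counter & 7;
}

/* Returns 1 when counter is a multiple of 4 (counter % 4 == 0),
   using a mask instead of a modulo. */
unsigned char isEveryFourth(unsigned char counter)
{
    return !(counter & 3);
}
```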

Variables and parameters

One of the greatest limitations of the MOS 6502 architecture is not the lack of registers, as one may think, but the small size of its hardware stack (in page one: $0100-$01FF), which makes it unsuitable for managing the scope of C variables and parameter passing.
Therefore a C compiler for the MOS 6502 may have to use a software stack:

to manage the scope of local variables,
to manage parameter passing.

The other 8-bit architectures under our consideration may suffer less from this problem, but local variables and parameter passing still have a cost even when a hardware stack can be used.

An antipattern may help us

One way to mitigate this problem is to reduce the use of local variables and passed parameters. This is clearly an antipattern and if we were to apply it to all our code we would get some spaghetti code.
We must therefore wisely choose which variables deserve to be local and which variables can be declared as global.
We would then have less re-usable code but we would gain in efficiency. I am NOT suggesting using only global variables and giving up all function parameters.
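A minimal sketch of the trade-off, with hypothetical names: a small function called in a tight loop reads its inputs from global variables instead of receiving them as parameters, so nothing has to be passed through a (software) stack. The price is that the caller must set the globals before each call, and the function is harder to reuse.

```c
/* Global "parameters" shared by the caller and drawCell() */
unsigned char cellX;
unsigned char cellY;

unsigned char grid[8][8];

/* Uses the globals cellX and cellY instead of parameters */
void drawCell(void)
{
    grid[cellY][cellX] = 1;
}

void drawDiagonal(void)
{
    for(cellX = 0; cellX < 8; ++cellX)
    {
        cellY = cellX;
        drawCell();
    }
}
```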

[6502] Do not use re-entrant functions

The CC65 compiler for the MOS 6502 architecture provides the -Cl option, which tells the compiler to treat all local variables as static, i.e., to allocate them at fixed memory locations like global variables.
This has the effect of avoiding the use of the software stack for their scope. This also has the effect of making all the functions non-reentrant.
In practice this prevents us from using recursive functions. This is not a serious loss because recursion would be a costly operation, which we should avoid on 8-bit systems.
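When only a few functions are critical, a more selective alternative to -Cl is to declare individual local variables as static: standard C then gives them a fixed memory location instead of a place on the stack, and the same source compiles with every compiler. A sketch with a hypothetical function; as with -Cl, such a function must not be called recursively or re-entrantly.

```c
/* i and total live at fixed addresses rather than on the (software) stack.
   Static locals are initialized only once, so total is reset explicitly
   at each call. */
unsigned short sumArray(const unsigned char *values, unsigned char length)
{
    static unsigned char i;
    static unsigned short total;

    total = 0;
    for(i = 0; i < length; ++i)
    {
        total += values[i];
    }
    return total;
}
```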

[6502] Use page zero

Standard C provides the register keyword to give a hint to the compiler to use a register for a given variable.
Most modern compilers simply ignore this keyword because their optimizers can choose better than the programmer.
This is also true for most of the compilers under consideration but not for CC65, which uses this keyword to tell the compiler to use page zero for a given variable. The MOS 6502 can access this page more efficiently than any other memory area. The operating system already uses this page but the CC65 compiler leaves a few bytes available for the programmer. By default CC65 reserves 6 bytes in page zero for variables declared as register.
One may think that all variables should be declared as register but things are NOT so simple because everything has a cost. In order to store a variable in page zero, some extra operations are required. Hence, page zero provides an advantage only for variables that are heavily used.
In practice the two most common scenarios where this is the case are:

parameters of type pointer to struct that are used at least 3 times within the function scope;
variables inside a loop that is repeated at least about 100 times.

A reference with more details is: https://www.cc65.org/doc/cc65-8.html

My personal advice is to compile and verify if the produced binary is shorter/faster.
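A sketch of the first scenario, with a hypothetical resetCharacter() function. The register keyword is standard C, so the code still compiles everywhere; CC65 may place the pointer in page zero, while other compilers may simply ignore the hint.

```c
struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
    unsigned char _status;
};
typedef struct CharacterStruct Character;

/* characterPtr is dereferenced three times, which makes it a candidate
   for page zero under CC65. Note that the address of a register
   variable cannot be taken. */
void resetCharacter(register Character *characterPtr, unsigned char x, unsigned char y)
{
    characterPtr->_x = x;
    characterPtr->_y = y;
    characterPtr->_status = 1;
}
```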

Binary structure

If our program uses data in a specific memory area, it would be better to have the data already stored in the binary and have the load process of the binary copy the data at the expected locations without any code to do actual copying of the data.
If the data is in the source code instead, we will have to copy them and we will also end up having them twice in memory.
The most common case is the data for sprites and redefined characters or tiles.

Different compilers provide different tools to define the final binary structure.

[CC65] Let us instruct the linker

It is possible to configure CC65’s linker through a .cfg file that describes the structure of the binary that we want to produce.
This is not very simple and a description of the linker would go beyond the scope of this article. For details we refer to
https://cc65.github.io/doc/ld65.html
We advise reading the manual and starting from the default .cfg file, modifying it to adapt it to one’s use-case.

Exomizer can help us (also) on this

In some cases we may have graphics data in a memory area far from the code and have them both on the same binary. If we do this, we may end up with a “hole” between the two areas.
A common example is provided by the C64 where graphics data may be in higher memory than the code.
In this case I recommend the exomizer tool to compress the binary: https://bitbucket.org/magli143/exomizer/wiki/Home

[Z88DK] Appmake does almost everything for us

Z88DK makes our life easier: its powerful appmake tool automatically builds binaries in the correct format for most scenarios.
Z88DK also allows the user to define memory sections and to redefine the binary “packaging” but doing this is quite complicated.
This topic is treated in detail in:
z88dk/z88dk#860

Code on multiple files

Usually separating the source code into multiple files is a good practice but it may produce poorer code because 8-bit optimizers do not perform link-time optimization, i.e., they cannot optimize code between two or more files and only optimize each file separately.
For example if we have a function that is called only once and the function is defined in the same file where it is invoked, then the optimizer may be able to inline it but this would never be possible if the function were defined and invoked in different files.
My advice is not to create one or few huge files but to take into account how separating the code into multiple files can affect the optimization.

Advanced memory use

The C compiler usually produces a unique binary that contains both code and data, which will be loaded in specific memory locations (even with non contiguous memory areas).

In many architectures some RAM areas are used as buffers for the ROM routines or are used only in some special cases (e.g., some graphics modes).
My advice is to study the memory map. For example for the Vic 20 we would have to look at:
http://www.zimmers.net/cbmpics/cbm/vic/memorymap.txt

In particular we should look for:

cassette buffer, keyboard buffer, printer buffer, disk buffer, etc.
memory used by ROM routines and in particular by BASIC routines
memory areas used by special graphics modes
small portions of free memory that are not usually used by code because they are not contiguous with the main code memory area.

These memory areas could be used by our code if they do not serve their standard purpose in our use-case, e.g., if we do not intend to use the tape after the program has been loaded (including from the tape), then we can use the tape buffer in our code to store some variables.

Useful cases
We list some of these useful memory areas for some systems including many with very limited RAM:

computer	description	area
Commodore 16/116/+4	BASIC input buffer	$0200-0258
Commodore 16/116/+4	tape buffer	$0333-03F2
Commodore Pet	system input buffer	$0200-0250
Commodore Pet	tape buffer	$033A-03F9
Commodore 64 & Vic 20	BASIC input buffer	$0200-0258
Commodore 64 & Vic 20	tape buffer	$033C-03FB
Galaksija	variable a-z	$2A00-2A68
Sinclair Spectrum 16K/48K	printer buffer	$5B00-5BFF
Mattel Aquarius	random number space	$381F-3844
Mattel Aquarius	input buffer	$3860-38A8
Oric	alternate charset	$B800-BB7F
Oric	grabable hires memory	$9800-B3FF
Oric	Page 4	$0400-04FF
Sord M5	RAM for ROM routines (*)	$7000-73FF
TRS-80 Model I/III/IV	RAM for ROM routines (*)	$4000-41FF
VZ200	printer buffer & misc	$7930-79AB
VZ200	BASIC line input buffer	$79E8-7A28

(*): Multiple buffers and auxiliary RAM for ROM routines. For more details please refer to:
http://m5.arigato.cz/m5sysvar.html and http://www.trs-80.com/trs80-zaps-internals.htm

In standard C we can only access specific memory locations through pointers (possibly used as arrays) initialized with those addresses.

In the following we give a theoretical example of how to define such pointers for addresses starting at 0xC000 where, given a 5-byte struct type Character, we want to handle the following variables:

player of type Character,
ghosts, an array with 8 Character elements (40=$28 bytes)
bombs, an array with 4 Character elements (20=$14 bytes)

	Character *ghosts = (Character *) 0xC000;
	Character *bombs = (Character *) (0xC000+0x28);
	Character *player = (Character *) (0xC000+0x28+0x14);

This generic solution with pointers does not always produce optimal code because it forces us to dereference our pointers and creates pointer variables (usually 2 bytes per pointer) that the compiler has to allocate in memory.

No standard solution exists to store any other type of variables in a specific memory area but the CC65 and Z88DK linkers provide a special syntax to do this and let us save hundreds or even thousands of precious bytes. Some examples are in
https://github.com/Fabrizio-Caruso/CROSS-CHASE/tree/master/src/cross_lib/memory

In particular, we will have to create an Assembly file (a .s file under CC65 or a .asm file under Z88DK) that we will link to our binary. In this file we will be able to assign each variable to a specific memory area.
Remark: In the Assembly file, each C variable name must be prefixed with an underscore.

CC65 syntax (Commodore Vic 20 example)

	.export _ghosts;
	_ghosts = $33c
	.export _bombs;
	_bombs = _ghosts + $28 
	.export _player;
	_player = _bombs + $14

Z88DK syntax (Galaksija example)

	PUBLIC _ghosts, _bombs, _player
	defc _ghosts = 0x2A00
	defc _bombs = _ghosts + $28 
	defc _player = _bombs + $14
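On the C side, the variables defined in the Assembly file are declared as extern, without the underscore. A sketch, assuming a 5-byte Character and a hypothetical USE_FIXED_ADDRESSES macro passed with -D when the Assembly file is linked; without it, the same source falls back to ordinary globals (e.g., for a test build on a PC). The function placeGhosts() is also hypothetical.

```c
/* Hypothetical 5-byte Character, matching the $28 (8 x 5) and $14 (4 x 5)
   sizes used above */
struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
    unsigned char _status;
    unsigned char _extra[2];
};
typedef struct CharacterStruct Character;

#if defined(USE_FIXED_ADDRESSES)
    /* Defined in the .s/.asm file as _ghosts, _bombs and _player */
    extern Character ghosts[8];
    extern Character bombs[4];
    extern Character player;
#else
    /* Portable fallback: ordinary global variables */
    Character ghosts[8];
    Character bombs[4];
    Character player;
#endif

/* Plain variable syntax: no pointer variables and no dereferencing */
void placeGhosts(void)
{
    unsigned char i;
    for(i = 0; i < 8; ++i)
    {
        ghosts[i]._x = i;
        ghosts[i]._y = 1;
    }
    player._x = 10;
}
```

Compared with the pointer-based solution, no 2-byte pointer variables are allocated and every access uses a fixed address known at link-time.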

CMOC provides the --data=<address> option to allocate all writable global variables at a given starting memory address.

ACK’s documentation does not say anything about this. We could nevertheless use pointers to given free memory locations through the generic standard syntax.

Object-oriented programming

Contrary to common belief, object-oriented programming is possible in ANSI C and can help us produce more compact code in certain situations. There are complete object-oriented frameworks for ANSI C (e.g., GNOME is written with GObject, which is one of these frameworks).

We can implement classes, polymorphism and inheritance very efficiently even for memory-limited 8-bit systems.

A detailed description of object-oriented programming goes beyond the purpose of this article.
Here we describe how to implement its main features:

Use pointers to functions to implement "polymorphic" methods, i.e., methods with dynamic binding, whose behavior is defined at run-time. It is possible to avoid the implementation of a vtable if we limit ourselves to classes with just one polymorphic method.
Use pointers to struct and composition to implement sub-classes: given a struct A, we implement a sub-class with a struct B whose first field is of type A. The C language guarantees that there is no padding before the first field of a struct, so the fields of A are at the same offsets inside B and a pointer to B can be cast into a pointer to A.

Example (taken from https://github.com/Fabrizio-Caruso/CROSS-CHASE/tree/master/src/chase)
Let us define Item as a sub-class of Character, to which we add some variables and a polymorphic method _effect():

	struct CharacterStruct
	{
		unsigned char _x;
		unsigned char _y;
		unsigned char _status;
		Image* _imagePtr;
	};
	typedef struct CharacterStruct Character;
	...
	struct ItemStruct
	{
		Character _character;
		void (*_effect)(void);
		unsigned short _coolDown;
		unsigned char _blink;
	};
	typedef struct ItemStruct Item;

We can then pass a pointer to Item as if it were a pointer to Character (by performing a simple cast):

	Item *myItem;
	void foo(Character * aCharacter);
	...
	foo((Character *)myItem);

Why can we save memory by doing this?
Because we may treat different, yet similar, objects with the same code and so avoid code duplication.
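The following self-contained sketch makes the polymorphic method concrete by reusing the two structs above. It rests on several hypothetical names, none taken from CROSS-CHASE:

- the Image type;
- the effect functions pointsEffect() and extraLifeEffect();
- the helpers moveRight() and pickUp();
- the points and lives counters.

moveRight() works on any object whose first field is a Character. pickUp() calls whatever function is stored in _effect, so the behavior is chosen at run-time.

```c
/* Hypothetical placeholder for the image type used by the game */
typedef struct { unsigned char _glyph; } Image;

struct CharacterStruct
{
    unsigned char _x;
    unsigned char _y;
    unsigned char _status;
    Image* _imagePtr;
};
typedef struct CharacterStruct Character;

struct ItemStruct
{
    Character _character;    /* "base class": must be the first field */
    void (*_effect)(void);   /* polymorphic method */
    unsigned short _coolDown;
    unsigned char _blink;
};
typedef struct ItemStruct Item;

/* Hypothetical game state */
static unsigned char points;
static unsigned char lives;

/* Two different implementations of the polymorphic method */
static void pointsEffect(void)    { points += 10; }
static void extraLifeEffect(void) { ++lives; }

/* Code shared by every "sub-class" of Character */
static void moveRight(Character *aCharacter)
{
    ++aCharacter->_x;
}

/* Dynamic binding: the function that runs depends on the object */
static void pickUp(Item *anItem)
{
    anItem->_character._status = 0;   /* deactivate the item */
    anItem->_effect();
}
```

For example, `moveRight((Character *)&coin);` moves an Item with the same code used for any other Character. Calling pickUp() on two items with different _effect pointers runs two different effects through the same code.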

Optimized compilation

We won’t exhaustively cover all the compilation options of the cross-compilers under consideration; we refer to their respective manuals for the details.
Here we give a list of options to produce optimized code with our compilers.

“Aggressive” compilation

The following options apply the highest optimization levels to produce faster and, above all, more compact code:

Architecture	Compiler	Options
Intel 8080	ACK	`-O6`
Zilog Z80	SCCZ80 (Z88DK)	`-O3`
Zilog Z80	ZSDCC (Z88DK)	`-SO3` `--max-allocs-per-node20000`
MOS 6502	CC65	`-O` `-Cl`
Motorola 6809	CMOC	`-O2`

Speed vs Memory

The most common problem for many 8-bit systems is the small amount of memory available for code and data. Optimizing for speed usually also reduces memory usage, but this is not always the case. In other cases, our goal is speed even at the cost of extra memory. Some compilers provide options to express our preference between speed and memory:

Architecture	Compiler	Options	Description
Zilog Z80	ZSDCC (Z88DK)	`--opt-code-size`	Optimize memory
Zilog Z80	SCCZ80 (Z88DK)	`--opt-code-speed`	Optimize speed
MOS 6502	CC65	`-Oi`, `-Os`	Optimize speed

Known problems

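CC65: -Cl prevents the use of recursive functions, because it makes local variables static, so all the nested calls of a recursive function would share the same copy of each local variable.
ZSDCC: has some bugs that do not depend on the options, as well as specific bugs that are triggered by -SO3 when the --max-allocs-per-node20000 option is not provided.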

In order to avoid these problems and to reduce compilation time, we recommend using only SCCZ80 for the Z80 during development and debugging, and resorting to ZSDCC only for the final optimization and tests.
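The following sketch shows why -Cl and recursion do not mix. It runs on any ANSI C compiler and emulates by hand what -Cl does to a local variable, by declaring that variable static. The functions sum_to_auto() and sum_to_static() are purely illustrative, and both are meant to compute 1 + 2 + ... + n. In the static version, every nested call overwrites the single shared copy of saved.

```c
/* Correct: each call has its own automatic copy of n */
static unsigned short sum_to_auto(unsigned char n)
{
    unsigned short partial;

    if (n == 0)
        return 0;
    partial = sum_to_auto((unsigned char)(n - 1));
    return (unsigned short)(partial + n);
}

/* Emulates -Cl: the local is static, i.e., shared by all nested calls */
static unsigned short sum_to_static(unsigned char n)
{
    static unsigned char saved;
    unsigned short partial;

    if (n == 0)
        return 0;
    saved = n;
    partial = sum_to_static((unsigned char)(n - 1)); /* overwrites saved */
    return (unsigned short)(partial + saved);        /* saved now holds 1, not n */
}
```

For instance, sum_to_auto(3) returns 6, whereas sum_to_static(3) returns 3.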

Avoid linking useless code

Our compilers will not always be able to detect and remove unused code from the binary. Therefore we should avoid including it in the first place.

With some of the compilers we can do even better, by instructing them not to include some standard libraries, or even portions of the libraries, that we are sure we will not use.

Avoid the standard library

Avoiding the standard library can reduce the size of the generated code. This has a significant impact when using ACK to produce CP/M-80 binaries: whenever possible, we should try to replace functions such as printf and scanf with just getchar() and putchar(c).
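The following minimal sketch prints a string and an unsigned 16-bit number with putchar() alone, instead of printf("%s") and printf("%u"). It assumes that unsigned short is 16 bits wide, as it is with our compilers. The helper names print_str(), u16_to_dec() and print_u16() are hypothetical. How much space this actually saves depends on the compiler and its library, so it is worth comparing the binary sizes.

```c
#include <stdio.h>

/* Writes a NUL-terminated string using only putchar() */
static void print_str(const char *s)
{
    while (*s != '\0')
        putchar(*s++);
}

/* Converts n to decimal text; out must have room for 6 chars
   (at most 5 digits for 65535 plus the terminating NUL) */
static void u16_to_dec(unsigned short n, char *out)
{
    char digits[5];
    unsigned char len = 0;

    do {
        digits[len++] = (char)('0' + n % 10); /* least significant first */
        n /= 10;
    } while (n != 0);

    while (len != 0)
        *out++ = digits[--len];               /* reverse into out */
    *out = '\0';
}

/* Replacement for printf("%u", n) */
static void print_u16(unsigned short n)
{
    char buf[6];

    u16_to_dec(n, buf);
    print_str(buf);
}
```

For example, `print_str("SCORE "); print_u16(1234);` prints SCORE 1234.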

[z88dk] Special `pragma`s to remove code

Z88DK provides several pragma commands to instruct the compiler and the linker not to include some unneeded code.

For example:

	#pragma printf = "%c %u"

includes only the %c and %u conversions and excludes all the others;

	#pragma output CRT_INITIALIZE_BSS = 0

does not generate the code that initializes the BSS memory area;

	#pragma output CRT_ON_EXIT = 0x10001

the program does not return (e.g., to BASIC) when it exits;

	#pragma output CLIB_MALLOC_HEAP_SIZE = 0

no heap memory is reserved (i.e., malloc cannot be used);

	#pragma output CLIB_STDIO_HEAP_SIZE = 0

removes the stdio heap (no files can be opened).

More examples can be found at: https://github.com/Fabrizio-Caruso/CROSS-CHASE/blob/master/src/cross_lib/cfg/z88dk

Use ROM routines

Most 8-bit systems (almost all computers) have plenty of routines in ROM. It is important to know them and to use them when they are needed. In order to use them explicitly in our code, we may have to write some inline Assembly in our C code (or use separate Assembly routines). How to do this differs in every dev-kit, and we refer to their respective manuals for the details.

This is very important for systems that are not natively supported by the compilers, for which all the input/output routines have to be written by us.

Example (taken from https://github.com/Fabrizio-Caruso/CROSS-CHASE/blob/master/src/cross_lib/display/display_macros.c)

In order to display characters on the screen of the Thomson Mo5, Mo6 and Olivetti Prodest PC128 (which are not supported by CMOC), we can call the ROM routine with a little inline Assembly code:

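	void PUTCH(unsigned char ch)
	{
		/* Load the character into register B and call the ROM through SWI;
		   the byte that follows SWI selects the ROM function to execute. */
		asm
		{
			ldb ch
			swi
			.byte 2
		}
	}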

Libraries may already use ROM routines

Luckily, we may already be using ROM routines implicitly, just by using the libraries provided by the dev-kit. This saves a lot of RAM because that code is already stored in ROM.
Nevertheless, we must take into account that using a ROM routine may add some constraints to our code. We cannot modify the routine, and it may use some auxiliary RAM locations (e.g., buffers) that we will not be allowed to use.

BASCK: let us find the ROM routines

Information on the ROM routines of a lesser-known system may be scant, or we may not know the entry points (start addresses) of such routines. In that case we may resort to BASCK (https://github.com/z88dk/z88dk/blob/master/support/basck/basck.c, developed by Stefano Bodrato), which is distributed as part of Z88DK. BASCK takes as input the ROM files of Z80- and 6502-based systems and searches them for known patterns of ROM routines. Once the routines and their entry points are found, using them is not always simple, but in some cases it is trivial.

Example

Let us say we are looking for the PRINT routine in the ROM: we run BASCK and filter its output with the “PRS” string (e.g., with the Unix “grep” command):

	> basck -map romfile.rom | grep PRS
	PRS = $AAAA ; Create string entry and print it

This gives us the address of the PRINT routine.

Now we can write C or Assembly code to use it:

	extern void rom_prs(char *str) __z88dk_fastcall @0xAAAA;

	int main(void)
	{
		rom_prs("Hello WORLD !"); /* call the ROM PRINT routine directly */

		while (1) {} /* loop forever */
	}

Exploit the graphics chips

As seen in the previous section, even though we write our code in C, we should not forget the specific hardware. In some cases the hardware can help us write more compact and faster code. In particular, the graphics chip can help us save lots of RAM.

Example (TI VDP chip such as the TMS9918A used in the MSX, Spectravideo, Memotech MTX, Sord M5, etc.)
In some cases we can exploit a special text mode (Mode 1, also known as Graphics I). In this mode the colors of a character are not stored per screen position but are implied by the group of 8 consecutive characters it belongs to. In such a case, a single byte is sufficient to define a character and its color. The 8-bit Atari computers have a similar text mode (graphics mode 1+16, Antic mode 6).
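The following minimal sketch shows how colors work in this mode. It assumes the TMS9918A Graphics I layout:

- a 32-byte color table with one entry per group of 8 character codes;
- the high nibble of an entry is the foreground color and the low nibble is the background color.

The names color_table, group_of(), set_group_color() and cell_color() are hypothetical. color_table is a RAM mirror; on a real machine its bytes would have to be copied into the color table in video RAM, whose address depends on how the VDP registers are set.

```c
/* RAM mirror of the Graphics I color table (hypothetical) */
static unsigned char color_table[32];

/* Character codes 0-7 share entry 0, codes 8-15 share entry 1, etc. */
static unsigned char group_of(unsigned char ch)
{
    return (unsigned char)(ch >> 3);
}

/* Sets the foreground/background colors (0-15) of the whole group of ch */
static void set_group_color(unsigned char ch, unsigned char fg, unsigned char bg)
{
    color_table[group_of(ch)] = (unsigned char)((fg << 4) | (bg & 0x0F));
}

/* The screen only needs the character code (one byte per cell):
   its color is found through the group it belongs to. */
static unsigned char cell_color(unsigned char ch)
{
    return color_table[group_of(ch)];
}
```

'A' (65) and 'B' (66) are both in group 8, so `set_group_color('A', 15, 1);` also makes 'B' white on black. It is therefore convenient to place characters that share a color in the same group of 8.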

Example (VIC chip used in the Commodore Vic 20)
The Commodore Vic 20 is a special case, both because of its hardware limits (total RAM: 5K, RAM available for code: 3.5K) and because of some of the tricks it provides to reduce the impact of these limits.
One surprising feature of this chip is its ability to map just a subset of its characters to RAM while mapping the rest to ROM. If we only need n (<= 64) redefined characters, we can map just 64 of them onto RAM with POKE(0x9005,0xFF);. In this way we can also use fewer than 64 of them.
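The following minimal sketch shows how such characters could be redefined. It assumes that, after POKE(0x9005,0xFF); on an unexpanded Vic 20, the definitions of codes 0-63 (8 bytes each) are taken from RAM starting at $1C00 (7168). The function redefine_char() and the macros VIC20_CHARSET_RAM and RAM_CHARS are hypothetical. The character set base is passed as a parameter so that the same code can be tested on a PC with an ordinary buffer.

```c
#include <string.h>

/* Assumed start of the RAM character definitions after
   POKE(0x9005,0xFF); on an unexpanded Vic 20 */
#define VIC20_CHARSET_RAM 0x1C00

#define RAM_CHARS 64   /* only codes 0-63 are taken from RAM */

/* Copies the 8-byte bitmap of character 'code' into the character set.
   Returns 1 on success, 0 if the code is not one of the RAM characters. */
static unsigned char redefine_char(unsigned char *charset, unsigned char code,
                                   const unsigned char bitmap[8])
{
    if (code >= RAM_CHARS)
        return 0;
    memcpy(charset + code * 8, bitmap, 8);
    return 1;
}
```

On the Vic 20 (with CC65 and its <peekpoke.h> header), we would first call `POKE(0x9005,0xFF);`. We would then call `redefine_char((unsigned char *)VIC20_CHARSET_RAM, 0, ghost_bitmap);`, where ghost_bitmap is a hypothetical 8-byte image.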

Moreover, in some cases we can use the separate video RAM accessed by some graphics chips (e.g., the TI VDP, the MOS VDC of the C128, etc.) for purposes other than graphics, such as storing data. This is possible, but it has a very high computational cost because the CPU can only access this separate RAM indirectly, through the graphics chip.
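The following sketch shows where this cost comes from on the TI VDP, where video RAM is accessed through two I/O ports:

- the 14-bit VRAM address is written to the control port as two bytes: the low byte first, then the high 6 bits with bit 6 set for a write;
- every single byte is then transferred through the data port, which auto-increments the address.

The functions vdp_out_ctrl(), vdp_out_data() and vdp_in_data() are hypothetical stand-ins for the system-specific port accesses, since port numbers differ between machines. Here they drive a simplified simulation of the VDP so that the sketch can run on a PC. The VRAM address in the usage example is also hypothetical: which areas are free depends on the screen mode.

```c
/* ---- Simplified host-side simulation of the VDP (hypothetical) ---- */
static unsigned char vram[0x4000];   /* 16 KB of video RAM */
static unsigned short vram_addr;     /* VDP internal address counter */
static unsigned char ctrl_low;       /* first byte written to the control port */
static unsigned char ctrl_pending;   /* 1 if the second byte is expected */

static void vdp_out_ctrl(unsigned char value)
{
    if (!ctrl_pending) {
        ctrl_low = value;
        ctrl_pending = 1;
    } else {
        /* Register writes (bit 7 set) are not simulated */
        vram_addr = (unsigned short)(((value & 0x3F) << 8) | ctrl_low);
        ctrl_pending = 0;
    }
}

static void vdp_out_data(unsigned char value)
{
    vram[vram_addr] = value;
    vram_addr = (unsigned short)((vram_addr + 1) & 0x3FFF); /* auto-increment */
}

static unsigned char vdp_in_data(void)
{
    unsigned char value = vram[vram_addr];
    vram_addr = (unsigned short)((vram_addr + 1) & 0x3FFF);
    return value;
}

/* ---- Using video RAM as extra data storage ---- */
static void vram_write(unsigned short addr, const unsigned char *src, unsigned short len)
{
    vdp_out_ctrl((unsigned char)(addr & 0xFF));
    vdp_out_ctrl((unsigned char)(((addr >> 8) & 0x3F) | 0x40)); /* 0x40: write */
    while (len--)
        vdp_out_data(*src++);                /* one port access per byte */
}

static void vram_read(unsigned short addr, unsigned char *dst, unsigned short len)
{
    vdp_out_ctrl((unsigned char)(addr & 0xFF));
    vdp_out_ctrl((unsigned char)((addr >> 8) & 0x3F));          /* read */
    while (len--)
        *dst++ = vdp_in_data();
}
```

For example, `vram_write(0x3800, level_data, 64);` would store 64 bytes of a hypothetical level_data array, and `vram_read(0x3800, level_data, 64);` would get them back. Each byte costs at least one I/O port access instead of a plain memory access.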
