Smaller C Compiler Driver Wiki
What is Smaller C Compiler Driver?
What features does Smaller C Compiler Driver have?
How do I compile Smaller C Compiler Driver?
How do I run Smaller C Compiler Driver?
Limitations and implementation details
Smaller C Compiler Driver (henceforth, driver) is a utility that simplifies the use of the Smaller C Core Compiler. It invokes the core compiler (smlrc), the assembler (nasm) and the linker (smlrl) as necessary and with appropriate parameters, making it easier to do common compilation tasks. It is expected to be the primary user interface for the compiler as a whole.
The driver functions similarly to gcc, letting the user:
- Compile, or, more appropriately, translate C files into assembly files for further compilation into object files or for inspection and troubleshooting
- Compile C and assembly files into object (x86 ELF) files, optionally storing the object files in a standard archive file that can then be used as a library
- Link object and archive files into DOS, Windows, Linux and flat executables
- Compile and link C, assembly, object and archive files into executables, all in one command
- Specify compilation options and define macros at compile time (as opposed to defining the macros in the source code)
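Which of the tasks listed above a single smlrcc invocation performs depends on the -c and -S options and on the name given with -o, as described under Options. The C sketch below illustrates that decision. The enum and function names are hypothetical and are not taken from smlrcc.c. Ignoring -S when -c is absent is an assumption, since the text only says -S is to be used with -c.

```c
#include <string.h>

/* Hypothetical names: not taken from smlrcc.c. */
enum driver_mode {
    MODE_TO_ASSEMBLY,  /* -c -S: translate C files into assembly files       */
    MODE_TO_OBJECTS,   /* -c: compile C/assembly files into .o files          */
    MODE_TO_ARCHIVE,   /* -c -o name.a: also store the .o files in a library  */
    MODE_TO_EXECUTABLE /* no -c: compile as needed, then link                 */
};

/* Returns nonzero if string s ends with the given suffix. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* opt_c, opt_S: whether -c and -S were given.
   output: the argument of -o, or NULL if -o wasn't given. */
enum driver_mode select_mode(int opt_c, int opt_S, const char *output)
{
    if (!opt_c)
        return MODE_TO_EXECUTABLE; /* assumption: -S without -c is ignored */
    if (opt_S)
        return MODE_TO_ASSEMBLY;
    if (output != NULL && ends_with(output, ".a"))
        return MODE_TO_ARCHIVE;
    return MODE_TO_OBJECTS;
}
```

For example, `smlrcc -dosh -c file5.c file6.c file7.asm -o lib567.a` corresponds to `select_mode(1, 0, "lib567.a")`, which yields `MODE_TO_ARCHIVE`.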
Use your favorite C compiler (or Smaller C itself). You will need to define one of the following macros at compile time to select the host platform: HOST_DOS, HOST_WINDOWS or HOST_LINUX. The host platform is the platform where you intend to run the driver and the rest of Smaller C. For example, you could do it like this:
gcc -Wall -Wextra -O2 -DHOST_LINUX smlrcc.c -o smlrcc
or
smlrcc -Wall -dosh -DHOST_DOS smlrcc.c -o smlrcc.exe
Irrespective of the host platform, the driver and the rest of Smaller C are able to target (that is, compile and link for) all other platforms.
Typically, you run it like this:
smlrcc [option(s)] <input file(s)>
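The options and input files can appear in one command line, so the driver has to tell them apart. According to the option list further down, only -o, -stack, -map, -entry and -origin take their value as a separate argument. -I, -SI, -D and -SL have the value attached, as in -DVERSION=2. The sketch below collects the input file names under that reading. The function names are hypothetical. It is also an assumption that every other argument not starting with '-' is an input file. @-files are not expanded here.

```c
#include <string.h>

/* Returns nonzero for options whose value is the next argument. */
static int takes_separate_value(const char *opt)
{
    static const char *const opts[] = { "-o", "-stack", "-map", "-entry", "-origin" };
    size_t i;
    for (i = 0; i < sizeof opts / sizeof opts[0]; i++)
        if (strcmp(opt, opts[i]) == 0)
            return 1;
    return 0;
}

/* Stores pointers to the input file names in `inputs` and returns their
   count, or -1 if an option is missing its value. argv[0] is skipped.
   Names beyond max_inputs are ignored in this sketch; @-file arguments
   are simply treated as input names, not expanded. */
int collect_inputs(int argc, char *argv[], const char *inputs[], int max_inputs)
{
    int i, n = 0;
    for (i = 1; i < argc; i++) {
        if (argv[i][0] == '-') {
            if (takes_separate_value(argv[i])) {
                if (i + 1 >= argc)
                    return -1;
                i++; /* skip the option's value, e.g. the name after -o */
            }
        } else if (n < max_inputs) {
            inputs[n++] = argv[i];
        }
    }
    return n;
}
```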
If there are many input files, you can list them all in a special text file and instead provide that file's name prefixed with the at (@) character. This is especially useful when compiling in DOS, where command lines are restricted to some 120+ characters, and when creating an archive file (library) out of many files. You can store the options in this file as well. The options and the file names must be separated by white space (spaces, tabs or newline characters, in any combination). Note: currently, spaces in file names aren't supported inside @-files.
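The @-file rules above can be illustrated with a minimal reader, assuming each option or file name is at most 255 characters long. That limit and the function name `read_at_file` are hypothetical and are not taken from smlrcc. The `%s` conversion of `fscanf` splits on any mix of spaces, tabs and new lines. As a result, a name containing a space comes back as two separate tokens, which is why such names don't work inside @-files.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN 255 /* assumed limit; keep in sync with "%255s" below */

/* Reads the options and file names stored in an @-file into `tokens`.
   Returns the number of tokens read, or -1 on error. */
int read_at_file(const char *path, char tokens[][MAX_TOKEN + 1], int max_tokens)
{
    FILE *f = fopen(path, "r");
    int count = 0;
    if (f == NULL)
        return -1;
    /* %255s skips leading white space, then reads up to the next white space */
    while (count < max_tokens && fscanf(f, "%255s", tokens[count]) == 1) {
        /* A token that fills the whole buffer may have been cut short. */
        if (strlen(tokens[count]) == MAX_TOKEN) {
            fclose(f);
            return -1;
        }
        count++;
    }
    fclose(f);
    return count;
}
```

An @-file containing `-dosh -c` on one line and `file5.c file6.c file7.asm -o lib567.a` on the next yields the same seven tokens as typing them all on one command line.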
Some basic examples follow.
Compile several source files and link them together with the standard library into a DOS executable:
smlrcc -dosh file1.c file2.asm -o file12.exe
Compile one or more source files into object files:
smlrcc -dosh -c file1.c -o file1.o
smlrcc -dosh -c file1.c file2.asm
Link several object files (together with the standard library) into a Windows executable:
smlrcc -win file3.o file4.o -o file34.exe
Compile several source files into a DOS library (note the -c option and the .a extension):
smlrcc -dosh -c file5.c file6.c file7.asm -o lib567.a
Compile one or more C files into assembly files (note the -S option):
smlrcc -linux -c -S file8.c -o file8.asm
smlrcc -linux -c -S file8.c file9.c
Input file types:
- .c: C source code
- .i: C source code (intended for preprocessed C code; currently handled the same way as .c)
- .asm: assembly source code (for NASM)
- .o: 16/32-bit x86 ELF object file (produced with NASM's -f elf option)
- .a: library/archive containing .o files
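A driver can decide what to do with each input by looking at its extension, per the list above. When compiling, it names the generated assembly or object file after the input with a new extension (see -o and the note on intermediate files). The sketch below shows both steps. The function names are hypothetical, and comparing extensions case-sensitively is an assumption.

```c
#include <stdio.h>
#include <string.h>

enum input_kind { IN_C, IN_ASM, IN_OBJECT, IN_ARCHIVE, IN_UNKNOWN };

/* Classifies an input file by its extension (case-sensitive here). */
enum input_kind classify_input(const char *name)
{
    const char *dot = strrchr(name, '.');
    if (dot == NULL)
        return IN_UNKNOWN;
    if (!strcmp(dot, ".c") || !strcmp(dot, ".i"))
        return IN_C;       /* core compiler, then assembler */
    if (!strcmp(dot, ".asm"))
        return IN_ASM;     /* assembler (NASM) only */
    if (!strcmp(dot, ".o"))
        return IN_OBJECT;  /* passed to the linker as is */
    if (!strcmp(dot, ".a"))
        return IN_ARCHIVE; /* passed to the linker as is */
    return IN_UNKNOWN;
}

/* Writes `input` with its extension replaced by `ext` (e.g. ".o" or ".asm")
   into `out`, which holds `size` bytes. A dot is only treated as the start
   of an extension if it's in the last path component ('/' or '\\'). */
void replace_ext(const char *input, const char *ext, char *out, size_t size)
{
    const char *dot = strrchr(input, '.');
    const char *sep = strrchr(input, '/');
    const char *bsep = strrchr(input, '\\');
    size_t base_len = strlen(input);
    if (bsep != NULL && (sep == NULL || bsep > sep))
        sep = bsep;
    if (dot != NULL && (sep == NULL || dot > sep))
        base_len = (size_t)(dot - input);
    snprintf(out, size, "%.*s%s", (int)base_len, input, ext);
}
```

Note that `replace_ext("file123.c", ".asm", ...)` produces `file123.asm`. This is the same name as a separate file123.asm source, which is the clash described under Limitations.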
Options:
- -o <output assembly/object/archive/executable file> Specifies the name of the output file. If you're making an executable and this option isn't given, the executable file will be named aout.com (if you use -dost or -tiny), aout.exe (if you use -doss, -small, -dosh, -huge, -win or -pe), a.out (if you use -linux or -elf) or aout.bin (if you use -flat16 or -flat32). If you're compiling to an object file and this option isn't given, the object file will be named the same as the input file but will have the .o extension. Likewise, if you're translating C code to assembly code (for that you need the -c and -S options) and this option isn't given, the assembly file will be named the same as the C file but will have the .asm extension.
- -stack <number> (passed to the linker) Specifies the stack size (in bytes) for the tiny, small and huge DOS memory models. To be used with the options -dost, -tiny, -doss, -small, -dosh, -huge. If the option isn't given, the linker assumes 8192 for the tiny and small memory models and 32768 for the huge memory model. The number can be decimal, hex or octal (e.g. 8192, 0x2000 and 020000 would all specify the same value).
- -map <somefile.map> (passed to the linker) Creates a map file of the output executable listing the sections and global public symbols and their locations in memory and in the executable. This option is only useful for debugging.
- -entry <name> (passed to the linker) Specifies the name of the entry point symbol. If this option isn't given, the entry point is assumed to be __start. Note the two leading underscore characters. If you want a C function to be the default entry point, it must be named either _start (Smaller C prepends an underscore character by default) or __start (if using Smaller C with the option -no-leading-underscore).
- -origin <number> (passed to the linker) Specifies the origin for the executable file as an integer constant, decimal, hex or octal (e.g. 10, 0xA or 012 would all specify the same value). In Windows/PE and Linux/ELF executables it's the image base address, the address in memory at which the PE/ELF headers will be loaded, which then will be followed by the code and data sections. In flat executables it's the address/offset in memory at which the first byte of the file will be loaded for execution. The (E)IP register is expected to have this address/offset at start. The first byte in flat executables is part of the first executable instruction. Example: when linking into DOS .COM executables, the implicit origin is naturally set to 0x100 by the linker itself. This option should typically be used only when making flat executables (see options -flat16 and -flat32). In other cases the default origin value chosen by the linker should be sufficient.
- -v or -verbose Prints the commands executed during compilation (i.e. the commands that invoke the core compiler, the assembler and the linker). May be useful for troubleshooting or for monitoring compilation progress.
- -signed-char (passed to the core compiler) This makes char signed. This is the default.
- -unsigned-char (passed to the core compiler) This makes char unsigned.
- -leading-underscore (passed to the core compiler) This prefixes global C identifiers with an underscore, so you get labels like _main for main() and _printf for printf() in the generated assembly and object code. This is the default.
- -no-leading-underscore (passed to the core compiler) This results in no underscore prefixing of global C identifiers, so you get the assembly labels main and printf for main() and printf() respectively. This may be useful when compiling to the ELF format on Linux, as it makes object/library files compatible with gcc on Linux.
- -winstack (passed to the core compiler) This option is to be used together with -pe and is automatically included by the -win option. It causes proper stack growth on Windows by calling an equivalent of the Visual C++ _chkstk() function to touch/probe stack pages and move the process guard page whenever necessary (when a function allocates 4096 or more bytes for local variables). If the guard page isn't moved, stack accesses beyond it will cause the process to crash.
- -Wall (passed to the core compiler) Enables printing of compilation warnings.
- -dost Selects the tiny memory model for DOS .COM programs (CS=DS=ES=SS, all code, data and stack are in the same segment). The type long isn't available in this model. Passes -seg16 to the core compiler and -tiny to the linker. Additionally passes -D _DOS to the compiler and, if you're making an executable, it causes the standard library (lcds.a) to be linked in.
- -tiny Same as -dost, except -D _DOS isn't passed to the compiler and the standard library isn't linked.
- -doss Selects the small memory model for DOS .EXE programs (DS=ES=SS, all data and stack are in the same segment, but the code is in a separate segment, CS≠SS). The type long isn't available in this model. Passes -seg16 to the core compiler and -small to the linker. Additionally passes -D _DOS to the compiler and, if you're making an executable, it causes the standard library (lcds.a) to be linked in.
- -small Same as -doss, except -D _DOS isn't passed to the compiler and the standard library isn't linked.
- -dosh Selects the huge memory model for DOS .EXE programs. This model is specific to Smaller C and isn't compatible with huge models supported by other DOS compilers. Passes -huge to the core compiler and -huge to the linker. Additionally passes -D _DOS to the compiler and, if you're making an executable, it causes the standard library (lcdh.a) to be linked in.
- -huge Same as -dosh, except -D _DOS isn't passed to the compiler and the standard library isn't linked.
- -win Selects Windows .EXE as the target. Passes -seg32 and -winstack to the core compiler and -pe to the linker. Additionally passes -D _WINDOWS to the compiler and, if you're making an executable, it causes the standard library (lcw.a) to be linked in.
- -pe Same as -win, except neither -winstack nor -D _WINDOWS is passed to the compiler and the standard library isn't linked.
- -linux Selects Linux/ELF as the target. Passes -seg32 to the core compiler and -elf to the linker. Additionally passes -D _LINUX to the compiler and, if you're making an executable, it causes the standard library (lcl.a) to be linked in.
- -elf Same as -linux, except -D _LINUX isn't passed to the compiler and the standard library isn't linked.
- -flat16 Similar to -tiny; useful for making flat 16-bit binaries resembling DOS .COM programs with the tiny memory model (CS=DS=ES=SS, all code, data and stack are in the same segment). The default origin is 0, but it can be changed (see the -origin option). The type long isn't available in this model. Passes -seg16 to the core compiler and -flat16 to the linker. If the entry point isn't at the very beginning, the linker will insert a jump instruction to the entry point at the beginning. The standard library isn't linked.
- -flat32 Similar to -flat16. Can be useful for creating simple 32-bit hobby OSes or something similar. The default origin is 0, but it can be changed (see the -origin option). Passes -seg32 to the core compiler and -flat32 to the linker. If the entry point isn't at the very beginning, the linker will insert a jump instruction to the entry point at the beginning. The standard library isn't linked.
- -c Skip linking, compile only. If used together with the -o option and the output name ends in .a, a library is created out of the object files.
- -S Compile to assembly instead of to object file(s). To be used with -c.
- -I<path> (passed to the core compiler) This adds a directory to the include file search path. This is for #include "file.h" kind of includes.
- -SI<path> (passed to the core compiler) This adds a directory to the system include file search path. This is for #include <stdio.h> kind of includes. Paths provided using this option take precedence over (are searched by the core compiler before) the path that the driver may find using the environment variables and the path to smlrcc.exe.
- -D<macro[=expansion text]> (passed to the core compiler) This defines a macro. When the =expansion text part is omitted, the macro is defined as 1.
- -SL<path> This tells the driver where to look for the standard library (lcds.a, lcdh.a, lcw.a or lcl.a) instead of the default location. The linker doesn't perform any search and must be given all input files explicitly.
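The numeric values of -stack and -origin may be written in decimal, hex or octal, which are exactly the forms of C integer literals. The standard `strtol()` function with base 0 accepts all three. The sketch below shows how such a value could be parsed and validated. The function name is hypothetical, and rejecting negative values is an assumption of this sketch.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses an option value such as the argument of -stack or -origin.
   Base 0 makes strtol() accept 8192 (decimal), 0x2000 (hex) and
   020000 (octal). Returns 0 on success and -1 on error. */
int parse_option_number(const char *s, long *value)
{
    char *end;
    long v;
    errno = 0;
    v = strtol(s, &end, 0);
    /* out of range, no digits, trailing characters or negative */
    if (errno != 0 || end == s || *end != '\0' || v < 0)
        return -1;
    *value = v;
    return 0;
}
```

With this, `-stack 8192`, `-stack 0x2000` and `-stack 020000` all produce 8192, and `-origin 0xA` produces 10.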
If you don't specify any of the file format options (-dost, -tiny, -doss, -small, -dosh, -huge, -win, -pe, -linux, -elf, -flat16, -flat32), smlrcc will act as if you specified -dosh, -win or -linux, depending on how smlrcc was compiled (with the HOST_DOS, HOST_WINDOWS or HOST_LINUX macro defined). In other words, by default smlrcc targets the host platform it's running on.
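The file format options described above can be summarized as a lookup table. Each entry records what the driver passes to the core compiler and the linker, which macro it predefines, which standard library it links into executables, and the default executable name. The table is followed by the host-default fallback. The struct, the function names and the `HOST_DEFAULT_TARGET` macro are hypothetical and are not smlrcc's own code. Letting the last format option win and falling back to NULL when no HOST_* macro is defined are assumptions of this sketch.

```c
#include <stddef.h>
#include <string.h>

struct target_info {
    const char *option;      /* driver option                               */
    const char *core_flags;  /* passed to the core compiler                 */
    const char *linker_flag; /* passed to the linker                        */
    const char *macro;       /* predefined with -D, or NULL                 */
    const char *stdlib;      /* standard library for executables, or NULL   */
    const char *default_exe; /* executable name when -o isn't given         */
};

static const struct target_info targets[] = {
    { "-dost",   "-seg16",           "-tiny",   "_DOS",     "lcds.a", "aout.com" },
    { "-tiny",   "-seg16",           "-tiny",   NULL,       NULL,     "aout.com" },
    { "-doss",   "-seg16",           "-small",  "_DOS",     "lcds.a", "aout.exe" },
    { "-small",  "-seg16",           "-small",  NULL,       NULL,     "aout.exe" },
    { "-dosh",   "-huge",            "-huge",   "_DOS",     "lcdh.a", "aout.exe" },
    { "-huge",   "-huge",            "-huge",   NULL,       NULL,     "aout.exe" },
    { "-win",    "-seg32 -winstack", "-pe",     "_WINDOWS", "lcw.a",  "aout.exe" },
    { "-pe",     "-seg32",           "-pe",     NULL,       NULL,     "aout.exe" },
    { "-linux",  "-seg32",           "-elf",    "_LINUX",   "lcl.a",  "a.out"    },
    { "-elf",    "-seg32",           "-elf",    NULL,       NULL,     "a.out"    },
    { "-flat16", "-seg16",           "-flat16", NULL,       NULL,     "aout.bin" },
    { "-flat32", "-seg32",           "-flat32", NULL,       NULL,     "aout.bin" },
};

/* Default target chosen by the HOST_* macro used when compiling smlrcc. */
#if defined(HOST_DOS)
#define HOST_DEFAULT_TARGET "-dosh"
#elif defined(HOST_WINDOWS)
#define HOST_DEFAULT_TARGET "-win"
#elif defined(HOST_LINUX)
#define HOST_DEFAULT_TARGET "-linux"
#else
#define HOST_DEFAULT_TARGET NULL /* sketch only: no host macro defined */
#endif

/* Looks up a file format option; returns NULL if it isn't one. */
static const struct target_info *find_target(const char *option)
{
    size_t i;
    if (option == NULL)
        return NULL;
    for (i = 0; i < sizeof targets / sizeof targets[0]; i++)
        if (strcmp(option, targets[i].option) == 0)
            return &targets[i];
    return NULL;
}

/* Returns the target selected on the command line (the last file format
   option wins), or the host default if none was given. */
const struct target_info *select_target(int argc, char *argv[])
{
    const struct target_info *t = NULL;
    int i;
    for (i = 1; i < argc; i++) {
        const struct target_info *found = find_target(argv[i]);
        if (found != NULL)
            t = found;
    }
    return t != NULL ? t : find_target(HOST_DEFAULT_TARGET);
}
```

For example, for `smlrcc -win file3.o file4.o -o file34.exe` the sketch selects the -win entry. That entry passes -pe to the linker and links lcw.a.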
- File names: Avoid using characters other than 0-9, a-z, A-Z and _ in input and output file names and paths, especially spaces. Such names may not work due to command expansion in the shell.
- Lengths of file and path names: Avoid using long paths or putting Smaller C files far away from the root directory of the disk. Paths are naturally short in DOS and so is the DOS command line. Further, the lengths of file names in the core compiler are currently fixed. Placing Smaller C files in C:\SMLRC\BIND, C:\SMLRC\INCLUDE, C:\SMLRC\LIB should work fine in DOS. Stick with that.
- Intermediate/temporary files: Unless you specify the name of the output assembly/object file for every file you compile with smlrcc, smlrcc will by default create the assembly/object file in the same directory as the input C/assembly file and give it the same name, except for the extension. This has two consequences. First, the project directory with the source code must be writable and must have enough free space (generated assembly files can be hundreds of kilobytes or even a couple of megabytes in size). Second, you should avoid giving your C and assembly source files the same base name, e.g. file123.c and file123.asm: file123.asm will be overwritten during compilation of file123.c.
- #include "header.h": the file header.h will be first searched for in the current directory and only then at locations provided with the -I and -SI options. Note that the current directory isn't necessarily the same directory that contains the file that does #include "header.h".
- Location of system header and library files under DOS and Windows: smlrcc will first look for the files limits.h and lc*.a under %SMLRC%/include and %SMLRC%/lib, where %SMLRC% is the value of the environment variable SMLRC. If it doesn't find them there, it will look for them under SMLRCC_PATH/../include and SMLRCC_PATH/../lib, where SMLRCC_PATH is the location of smlrcc.exe itself. If limits.h isn't found, smlrcc won't pass -SI<path_to_limits.h> to the core compiler, which will likely result in a compilation error, unless the proper -SI<path> option is given to smlrcc. If the standard library isn't found and the -SL<path> option isn't given to smlrcc, the standard library won't be linked in, potentially causing a linking error. Note also that if you compile smlrcc.c into smlrcc.exe in the current directory and the compilation succeeds, the next invocation of smlrcc.exe may fail. This happens if the new smlrcc.exe, located in the current directory, looks for the system files relative to the current directory, doesn't find them, and the proper -SI<path> option isn't given to it. To avoid this issue, use a different output name for the executable or prefix it with a path to a directory other than the current one. You could also rename or move the freshly compiled smlrcc.exe.
- Location of system header and library files under Linux: smlrcc will first look for the files limits.h and lc*.a under $SMLRC/include and $SMLRC/lib, where $SMLRC is the value of the environment variable SMLRC. If it doesn't find them there, it will next look for them under $HOME/smlrc/include and $HOME/smlrc/lib, where $HOME is the value of the environment variable HOME, i.e. the user's home directory. If it doesn't find them there, it will look for them under /usr/local/smlrc/include and /usr/local/smlrc/lib. If limits.h isn't found, smlrcc won't pass -SI<path_to_limits.h> to the core compiler, which will likely result in a compilation error, unless the proper -SI<path> option is given to smlrcc. If the standard library isn't found and the -SL<path> option isn't given to smlrcc, the standard library won't be linked in, potentially causing a linking error.
- NASM: you need NASM 2.03 or later, which can automatically choose between the short and long forms of jumps as necessary.
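The Linux lookup order for system headers described above can be sketched as two steps. The first builds the candidate include directories in the order they are checked. The second returns the first candidate that contains limits.h. The lib directories would be searched in the same order for the standard library. The function names are hypothetical. Passing the environment values as parameters instead of calling `getenv()` inside is a choice of this sketch that keeps it easy to test. Skipping candidates that don't fit the buffer is also an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_BUF 512

/* Appends dir + suffix as the next candidate unless it would be truncated. */
static int add_candidate(char out[][PATH_BUF], int n, int max,
                         const char *dir, const char *suffix)
{
    if (n < max && strlen(dir) + strlen(suffix) < PATH_BUF) {
        snprintf(out[n], PATH_BUF, "%s%s", dir, suffix);
        n++;
    }
    return n;
}

/* Fills `out` with the include directories to check, in order, and returns
   their count. A NULL argument means the variable isn't set. */
int linux_include_candidates(const char *smlrc_env, const char *home_env,
                             char out[][PATH_BUF], int max)
{
    int n = 0;
    if (smlrc_env != NULL)
        n = add_candidate(out, n, max, smlrc_env, "/include");
    if (home_env != NULL)
        n = add_candidate(out, n, max, home_env, "/smlrc/include");
    n = add_candidate(out, n, max, "/usr/local/smlrc/include", "");
    return n;
}

/* Returns the first candidate directory that contains limits.h, or NULL. */
const char *first_with_limits_h(char cands[][PATH_BUF], int n)
{
    char path[PATH_BUF + sizeof "/limits.h"];
    int i;
    for (i = 0; i < n; i++) {
        FILE *f;
        snprintf(path, sizeof path, "%s/limits.h", cands[i]);
        f = fopen(path, "r");
        if (f != NULL) {
            fclose(f);
            return cands[i];
        }
    }
    return NULL;
}

/* Puts the two steps together using the real environment. */
const char *locate_linux_include(char cands[][PATH_BUF])
{
    int n = linux_include_candidates(getenv("SMLRC"), getenv("HOME"), cands, 3);
    return first_with_limits_h(cands, n);
}
```

If `locate_linux_include()` returns NULL, you are in the situation described above: a proper -SI<path> option must be given to smlrcc, or compilation will likely fail.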