Skip to content

alorber/C-Compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ECE466 (Compilers) Project

C Compiler

Over the course of the semester, I built a functional compiler for a subset of the C language, written in Lex, Bison, & C.


Usage

To run the compiler, navigate to /Parser and type make compile. Optional fields input, output, and error can be used to change the default input, output, and error files from stdin, stdout, and stderr respectively.


Compiler Stages

The compiler was completed in stages:

1. Lexer

The first stage of the compiler is the lexical analyzer, which was built using Lex. This stage reads in one character at a time and groups them into tokens, determining the token semantic value when applicable. File names and line numbers are also ingested by the lexer for error reporting.

2. Parser

The second stage of the compiler is the parser, which was built using Bison. The parser consumes tokens from the lexer and uses a grammar to perform syntactical & semantic analysis. As the grammar is used to determine the relationship between tokens, an abstract syntax tree is built.

This stage was built in increments to support a majority of the C language:

Expressions

The parser is able to properly handle most of the C expression syntax.

Declarations, Symbols, & Types

The parser is able to properly handle most C declarations. The parser is also able to properly store variables within the symbol table, differentiating by namespace and scope, as well as keep track of the current scope and search for identifiers throughout the scope stack.

Statements

The parser is able to properly handle most C statements, including expression statements, for statements, if-else statements, etc.

3. Quad Generation

In this stage, the abstract syntax tree nodes generated by the parser are used to create quads, an intermediary representation of the program which can then be translated into assembly code.

4. Assembly Generation

Finally, the quads are translated into X86 - 32 bit assembly code, which can then be run by the computer.


Implementation

Abstract Syntax Tree

An abstract syntax tree is a data structure that graphs the abstract relationship between elements of the language as they are parsed. An example can be seen below for the expression xyz = (1024 + abc);.

image

In my implementation, an AST node is represented as a union of various AST node structs, each containing fields relevant to the node type. For example, a binary operation AST node contains pointers to the left and right expression AST nodes, as well as a field to store the operation.

Symbol Table

The symbol table is the data structure used to contain a mapping between variables and values for each scope and namespace. When a variable is added to the symbol table, its identifier, or symbol, is hashed and a pointer to its AST node is inserted at the hashed index.

When the compiler begins parsing the program, a scope stack is created. Each layer, or scope, contains the type of scope (file, block, function, or prototype) and a symbol table for each namespace (label, tag, & other). Additionally, each scope contains the file and line on which it begins for better error reporting.

Variables are always inserted into the innermost scope and specified namespace, and when searching for a variable, the search begins in the innermost scope and specified namespace and works up the scope stack.


Certain areas of the C language were not implemented, including:

  • Initialized declarations
  • Expressions within an array declarator (i.e a[2+2])
  • C99 variable-length arrays
  • Type Qualifiers (the keywords are still accepted in the grammar)
  • Enums
  • Bit Fields
  • Prototype / Formal Parameter Lists
  • Typedef