Merge pull request #31 from tmr232/cpp-support
C++ Support
tmr232 authored Nov 30, 2024
2 parents ea7835a + 2eb54ff commit a5133f4
Showing 45 changed files with 8,647 additions and 4,039 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,16 @@ Check [Keep a Changelog](http://keepachangelog.com/) for recommendations on how
- Added JetBrains frontend for use in JetBrains IDE plugin
- Demo: Added a link to the JetBrains plugin page
- Demo learned to change font-size
- Documented the process of adding a new language
- [Biome](https://biomejs.dev/linter/) has been added as an additional linter
- [Oxlint](https://oxc.rs/docs/guide/usage/linter) has been added to auto-fix some common issues
- The `generate-parsers.ts` script has been updated to support copying existing `.wasm` files from tree-sitter grammar packages
- Initial support for C++
- A basic [typedoc](https://typedoc.org/) configuration was added, to help in rendering docs

### Changed

- Adding a new language now requires less wiring code, as many language declarations were merged.

## [0.0.8] - 2024-10-10

33 changes: 33 additions & 0 deletions biome.jsonc
@@ -0,0 +1,33 @@
{
  "$schema": "https://biomejs.dev/schemas/1.9.4/schema.json",
  "vcs": {
    "enabled": false,
    "clientKind": "git",
    "useIgnoreFile": false,
  },
  "files": {
    "ignoreUnknown": false,
    "ignore": ["./dist", "*.svelte"],
  },
  "formatter": {
    "enabled": true,
    "indentStyle": "space",
  },
  "organizeImports": {
    "enabled": true,
  },
  "linter": {
    "enabled": true,
    "rules": {
      "recommended": true,
      "style": {
        "noParameterAssign": "off",
      },
    },
  },
  "javascript": {
    "formatter": {
      "quoteStyle": "double",
    },
  },
}
Binary file modified bun.lockb
Binary file not shown.
92 changes: 92 additions & 0 deletions docs/AddNewLanguage.md
@@ -0,0 +1,92 @@
---
title: Adding a New Language
group: Documents
category: Guides
---

# Adding a New Language

## Add the Relevant Parser

We're using [tree-sitter] to parse code into ASTs.
Each language requires its own parser.
Find yours in [tree-sitter's list of parsers][tree-sitter parsers].

Once you find the parser, you need to install it:

```shell
bun add --dev tree-sitter-<language>
```

After installing it, add it to `./scripts/generate-parsers.ts`
and run `bun generate-parsers` to generate the `.wasm` parser file from it.

If the package contains a pre-built `.wasm` file, the script copies it and no further setup is needed.
Otherwise, the script tries to build the parser. If that fails, follow the [tree-sitter instructions for generating .wasm language files][build-wasm] to set up Emscripten,
and run `bun generate-parsers` again.

Once the command completes successfully, your new parser should be inside `./parsers`.

## Generating the CFG

Each CFG-builder resides in its own file inside `./src/control-flow`.
Name yours `cfg-<language>.ts`.

Your builder is expected to expose a `createCFGBuilder(options: BuilderOptions): CFGBuilder` function.
A naive implementation to get started with would look something like this:

```typescript
import type Parser from "web-tree-sitter";
import type { BasicBlock, BuilderOptions, CFGBuilder } from "./cfg-defs";
import {
  type Context,
  GenericCFGBuilder,
  type StatementHandlers,
} from "./generic-cfg-builder.ts";

export function createCFGBuilder(options: BuilderOptions): CFGBuilder {
  return new GenericCFGBuilder(statementHandlers, options);
}

const statementHandlers: StatementHandlers = {
  named: {},
  default: defaultProcessStatement,
};

function defaultProcessStatement(
  syntax: Parser.SyntaxNode,
  ctx: Context,
): BasicBlock {
  const newNode = ctx.builder.addNode(
    "STATEMENT",
    syntax.text,
    syntax.startIndex,
  );
  ctx.link.syntaxToNode(syntax, newNode);
  return { entry: newNode, exit: newNode };
}
```

Once you have your initial builder file, there's quite a lot of wiring to do
to register the language.
Search for `ADD-LANGUAGES-HERE` in the code, and add the language at each marker.
These include:

- Language & builder definitions in `src/control-flow/cfg.ts`
- Mapping languages to `.wasm` files in `src/components/utils.ts`
- Mapping VSCode's `languageId` to our language definitions in `src/vscode/extension.ts`
- Adding test-collectors and tests in `src/test/commentTestCollector.ts`
- Adding the language in the demo's UI in `src/components/Demo.svelte`

### Implementing the Builder

Once all the wiring is in place, it's time to actually generate the CFG.
It is highly recommended that you read the other CFG implementations for reference.

While you're working, the [tree-sitter playground] will prove highly valuable in understanding the AST
and creating queries.
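
As a rough illustration of how a handler table shaped like `statementHandlers` might dispatch on the syntax node type, here is a self-contained sketch. The `Node`, `Block`, and `Handlers` types, the `dispatch` function, and the `if_statement` handler are all hypothetical stand-ins. They are not the project's real `Parser.SyntaxNode`, `BasicBlock`, or `StatementHandlers` definitions, and the real `GenericCFGBuilder` may resolve handlers differently.

```typescript
// Hypothetical, simplified stand-ins for the project's real types.
type Node = { type: string; text: string };
type Block = { entry: string; exit: string };
type Handler = (syntax: Node) => Block;

type Handlers = {
  // Handlers keyed by syntax node type (assumed lookup key).
  named: Partial<Record<string, Handler>>;
  // Fallback used when no named handler matches.
  default: Handler;
};

// Fallback: a single node serves as both entry and exit of the block.
const defaultHandler: Handler = (syntax) => ({
  entry: `STATEMENT:${syntax.text}`,
  exit: `STATEMENT:${syntax.text}`,
});

const handlers: Handlers = {
  named: {
    // Illustrative only: a real handler would build condition and merge nodes.
    if_statement: (syntax) => ({
      entry: `CONDITION:${syntax.text}`,
      exit: `MERGE:${syntax.text}`,
    }),
  },
  default: defaultHandler,
};

// Pick the named handler for the node type, or fall back to the default.
function dispatch(table: Handlers, syntax: Node): Block {
  const handler = table.named[syntax.type] ?? table.default;
  return handler(syntax);
}
```

Under these assumptions, supporting a new statement kind means adding one entry to `named`, while unhandled statements still produce a plain block through `default`.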

[tree-sitter]: https://tree-sitter.github.io/tree-sitter/
[tree-sitter parsers]: https://github.com/tree-sitter/tree-sitter/wiki/List-of-parsers
[tree-sitter playground]: https://tree-sitter.github.io/tree-sitter/playground
[build-wasm]: https://github.com/tree-sitter/tree-sitter/blob/master/lib/binding_web/README.md#generate-wasm-language-files
56 changes: 56 additions & 0 deletions docs/CommentTests.md
@@ -0,0 +1,56 @@
---
title: Running & Writing Tests
group: Documents
category: Guides
---

# Comment Tests

The comment-tests framework allows us to define CFG generation tests in the source-code that we test on.
This makes test-writing easier, as we don't need to include code as strings in our tests.

## Running Tests

Use `bun test` to run all the tests.

### Visualizing Failures

If you have failing tests, you might want to visualize them.
To do that, collect the test results as they get updated:

```shell
bun web-tests --watch
```

And run the web server to visualize them:

```shell
bun web
```

## Test Types

The currently available test types are:

1. `nodes`: asserts the expected node-count in the CFG
2. `exits`: asserts the expected exit-node count in the CFG
3. `reaches`: asserts reachability between node pairs
4. `render`: asserts that the CFG for the code renders successfully

Additionally, code-segmentation and snapshot-tests are added automatically for the code used in comment-tests.
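
To make the first three test types concrete, here is a minimal sketch of what each one checks, written against a hypothetical adjacency-list graph. The `Graph` type and the function names are assumptions for illustration, as is the definition of an exit node as one with no outgoing edges. The project's own CFG representation and test code may differ.

```typescript
// Hypothetical adjacency-list CFG: node id -> ids of its successors.
type Graph = Map<string, string[]>;

// `nodes`: total number of nodes in the graph.
function countNodes(graph: Graph): number {
  return graph.size;
}

// `exits`: nodes with no outgoing edges (an assumed definition of "exit").
function countExits(graph: Graph): number {
  let exits = 0;
  for (const successors of graph.values()) {
    if (successors.length === 0) exits++;
  }
  return exits;
}

// `reaches`: breadth-first search from `from`, looking for `to`.
// A node is considered to reach itself.
function reaches(graph: Graph, from: string, to: string): boolean {
  const visited = new Set<string>([from]);
  const queue: string[] = [from];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (current === to) return true;
    for (const next of graph.get(current) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return false;
}

// A diamond-shaped graph: a -> b, a -> c, b -> d, c -> d.
const diamond: Graph = new Map<string, string[]>([
  ["a", ["b", "c"]],
  ["b", ["d"]],
  ["c", ["d"]],
  ["d", []],
]);
```

For the `diamond` graph, a `nodes` test would expect 4, an `exits` test would expect 1, and a `reaches` test from `a` to `d` would pass while one from `d` to `a` would fail.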

## Writing Tests

1. Write your code in a new function in the matching file under `src/test/commentTestSamples`.
2. Add a comment right above the function, declaring the relevant tests.
The comment format is JSON, but without the curly braces.
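
One way to read "JSON, but without the curly braces" is that the comment body becomes valid JSON once it is wrapped in braces. The sketch below assumes exactly that, with strict JSON syntax (quoted keys, no trailing comma). The `parseTestComment` helper and the sample comment body are hypothetical, and the project's actual parser may be more lenient.

```typescript
// Hypothetical result type: test name -> expected value.
type TestRequirements = Record<string, unknown>;

// Wrap the comment body in braces and parse it as strict JSON.
// Throws a SyntaxError if the body is not valid JSON once wrapped.
function parseTestComment(commentBody: string): TestRequirements {
  return JSON.parse(`{${commentBody}}`) as TestRequirements;
}

// Assumed example of a comment body declaring two tests.
const requirements = parseTestComment('"nodes": 3, "exits": 1');
```

Here `requirements` holds the same data as the JSON object `{"nodes": 3, "exits": 1}`.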

## Adding Languages

When we add a new language, we need to add a test-collector for that language.
A test collector exports a `getTestFuncs(code: string): Generator<TestFunction>` function.
To do that, we need to parse the code, and extract all functions and comments inside it.
It's best to look at one of the `collect-<language>.ts` files to see how this is done.

Once we have a collector, we add it in `src/test/commentTestCollector.ts` and map file-extensions to use with it.
Then, we add a test file under `src/test/commentTestSamples`.
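
The shape of a collector and the extension mapping can be sketched as follows. Only the `getTestFuncs(code: string): Generator<TestFunction>` signature comes from the text above. The fields of `TestFunction`, the toy `lineCollector`, the `.txt` extension, and the `collectorFor` lookup are assumptions made for illustration and do not reflect the real contents of `src/test/commentTestCollector.ts`.

```typescript
// Hypothetical shape; the project's real TestFunction type may differ.
type TestFunction = { name: string; code: string };

type Collector = {
  getTestFuncs(code: string): Generator<TestFunction>;
};

// Toy collector for illustration only: treats every non-empty line as a
// separate "function". A real collector would parse the code with tree-sitter
// and pair each function with the comment above it.
const lineCollector: Collector = {
  *getTestFuncs(code: string): Generator<TestFunction> {
    let index = 0;
    for (const line of code.split("\n")) {
      if (line.trim() !== "") {
        yield { name: `line${index}`, code: line };
        index++;
      }
    }
  },
};

// Map file extensions to the collector that handles them.
const collectorsByExtension = new Map<string, Collector>([
  [".txt", lineCollector],
]);

// Look up the collector for a file by its extension, if one is registered.
function collectorFor(filename: string): Collector | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot === -1) return undefined;
  return collectorsByExtension.get(filename.slice(dot));
}
```

With a table like this, registering a new language amounts to adding one entry per file extension.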
3 changes: 3 additions & 0 deletions oxlintrc.json
@@ -0,0 +1,3 @@
{
"$schema": "./node_modules/oxlint/configuration_schema.json"
}
8 changes: 6 additions & 2 deletions package.json
@@ -14,6 +14,7 @@
     "esbuild-plugin-copy": "^2.1.1"
   },
   "devDependencies": {
+    "@biomejs/biome": "1.9.4",
     "@codemirror/lang-cpp": "^6.0.2",
     "@codemirror/lang-go": "^6.0.1",
     "@codemirror/lang-python": "^6.1.6",
@@ -26,16 +27,19 @@
     "eslint": "^9.12.0",
     "graphology-utils": "^2.5.2",
     "lz-string": "^1.5.0",
+    "oxlint": "0.13.2",
     "prettier": "3.3.3",
     "prettier-plugin-svelte": "^3.2.7",
     "svelte": "^4.2.19",
     "svelte-awesome-color-picker": "^3.1.4",
     "svelte-codemirror-editor": "^1.4.1",
     "tree-sitter-c": "^0.23.1",
     "tree-sitter-cli": "^0.23.2",
+    "tree-sitter-cpp": "^0.23.4",
     "tree-sitter-go": "^0.23.1",
     "tree-sitter-python": "^0.23.2",
-    "typescript-eslint": "^8.8.0",
+    "typedoc": "^0.27.1",
+    "typescript-eslint": "^8.16.0",
     "vite": "^5.4.8"
   },
   "peerDependencies": {
@@ -56,7 +60,7 @@
     "build-demo": "bun run --cwd ./src/demo/ vite build --outDir ../../dist/demo --base '/function-graph-overview/'",
     "build-jetbrains": "bun run --cwd ./src/jetbrains/ vite build",
     "format": "bun prettier . --write --log-level silent",
-    "lint": "bun format && bun run eslint || bun run tsc --noEmit",
+    "lint": "bun format && bun run biome lint --fix || bun run oxlint --ignore-pattern=\"*.svelte\" --fix || bun run eslint || bun run tsc --noEmit",
     "generate-parsers": "bun run ./scripts/generate-parsers.ts"
   },
   "//": "START EXTENSION ATTRIBUTES",
Binary file added parsers/tree-sitter-cpp.wasm
Binary file not shown.
10 changes: 5 additions & 5 deletions scripts/collect-comment-tests.ts
@@ -1,9 +1,9 @@
 import { intoRecords } from "../src/test/commentTestUtils";
-import { watch } from "fs";
-import { parseArgs } from "util";
+import { watch } from "node:fs";
+import { parseArgs } from "node:util";
 import { collectTests } from "../src/test/commentTestCollector";

-const watchDir = import.meta.dir + "/../src";
+const watchDir = `${import.meta.dir}/../src`;

 const { values } = parseArgs({
   args: Bun.argv,
@@ -20,13 +20,13 @@ const { values } = parseArgs({
 async function generateJson() {
   try {
     const records = intoRecords(await collectTests());
-    Bun.write("./dist/tests/commentTests.json", JSON.stringify(records));
+    await Bun.write("./dist/tests/commentTests.json", JSON.stringify(records));
   } catch (error) {
     console.log(error);
   }
 }

-generateJson();
+await generateJson();
 if (values.watch) {
   const watcher = watch(
     watchDir,
58 changes: 53 additions & 5 deletions scripts/generate-parsers.ts
@@ -1,10 +1,58 @@
+/**
+ * The `generate-parsers` script copies or builds the relevant tree-sitter
+ * parsers into the `./parsers` directory.
+ *
+ * To add a new parser, add its package name to the `parsersToBuild` array.
+ */
 import { $ } from "bun";
+import * as fs from "node:fs";
+import { fileURLToPath } from "node:url";

-const treeSitter = Bun.file("./node_modules/web-tree-sitter/tree-sitter.wasm");
-await Bun.write("./parsers/tree-sitter.wasm", treeSitter);
+/**
+ * The parsers to include
+ */
+const parsersToBuild = [
+  "tree-sitter-go",
+  "tree-sitter-c",
+  "tree-sitter-python",
+  "tree-sitter-cpp",
+];

-const parsers = ["tree-sitter-go", "tree-sitter-c", "tree-sitter-python"];
+function locatePrebuiltWasm(packageName: string): string {
+  return fileURLToPath(
+    import.meta.resolve(`${packageName}/${packageName}.wasm`),
+  );
+}
+
+function hasPrebuiltWasm(packageName: string): boolean {
+  try {
+    locatePrebuiltWasm(packageName);
+  } catch {
+    return false;
+  }
+  return true;
+}
+
+for (const name of parsersToBuild) {
+  const targetWasmPath = `./parsers/${name}.wasm`;
+  if (await Bun.file(targetWasmPath).exists()) {
+    console.log(`${name}: .wasm found, skipping copy.`);
+  } else if (hasPrebuiltWasm(name)) {
+    console.log(`${name}: copying .wasm`);
+    fs.copyFileSync(locatePrebuiltWasm(name), targetWasmPath);
+  } else {
+    console.log(`${name}: building .wasm`);
+    await $`bun x --bun tree-sitter build --wasm -o ${targetWasmPath} ./node_modules/${name}/`;
+  }
+
+  await $`git add ${targetWasmPath}`;
+}

-for (const name of parsers) {
-  await $`bun x --bun tree-sitter build --wasm -o ./parsers/${name}.wasm ./node_modules/${name}/`;
+const treeSitterPath = "./parsers/tree-sitter.wasm";
+if (!(await Bun.file(treeSitterPath).exists())) {
+  const treeSitter = Bun.file(
+    "./node_modules/web-tree-sitter/tree-sitter.wasm",
+  );
+  await Bun.write(treeSitterPath, treeSitter);
+  await $`git add ${treeSitterPath}`;
 }
2 changes: 1 addition & 1 deletion scripts/watch-with-esbuild.ts
@@ -4,6 +4,6 @@ import config from "./esbuild.config";
 try {
   const context = await esbuild.context(config);
   await context.watch();
-} catch (_e) {
+} catch {
   process.exit(1);
 }
2 changes: 1 addition & 1 deletion src/components/CodeSegmentation.svelte
@@ -82,7 +82,7 @@
   ) {
     const { trim, simplify } = options;
     const tree = parsers[language].parse(code);
-    const functionSyntax = getFirstFunction(tree);
+    const functionSyntax = getFirstFunction(tree, language);
     const builder = newCFGBuilder(language, {});
     let cfg = builder.buildCFG(functionSyntax);