Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: Add Indexes for common queries #308

Merged
merged 4 commits into from
Aug 9, 2023

Conversation

wmanley
Copy link
Contributor

@wmanley wmanley commented Aug 7, 2023

In #306 it's reported that SQLite is slow. This adds indexes so lookups will be faster. The intention is to speed up the bottom 3 queries from #306 (comment) as they are "the ones to focus on".

Specifically this should speed up:

  • Load graph data for a file

      SELECT value FROM graphs WHERE file = ?
    

    Called many times during path stitching

  • Load paths for a file node

      SELECT file,value from file_paths WHERE file = ? AND local_id = ?
    

    Called many times during path stitching.

  • Load paths for root node

      SELECT file, value from root_paths WHERE symbol_stack = ?
    

    Called many times during path stitching.

If the data that you're fetching is in the index then SQLite won't even bother reading the row proper, just pulling the data streight from the index (good for data locality). As such I've included not just the columns we're using in the WHERE clause, but also the ones from SELECT too.

I've not actually tested the performance impact as I'm not familiar with this project (this is more of a drive-by) and don't know what benchmarks to use. The only testing I've done is cargo test --features storage.

In github#306 it's reported that SQLite is slow.  This adds indexes so lookups will be
faster.  The intention is to speed up the bottom 3 queries from
github#306 (comment) as
they are "the ones to focus on".

Specifically this should speed up:

> * Load graph data for a file
>
>         SELECT value FROM graphs WHERE file = ?
>
>   Called many times during path stitching
>
> * Load paths for a file node
>
>         SELECT file,value from file_paths WHERE file = ? AND local_id = ?
>
>   Called many times during path stitching.
>
> * Load paths for root node
>
>         SELECT file, value from root_paths WHERE symbol_stack = ?
>
>   Called many times during path stitching.

If the data that you're fetching is in the index then SQLite won't even bother
reading the row proper, just pulling the data streight from the index (good for
data locality).  As such I've included not just the columns we're using in the
`WHERE` clause, but also the ones from `SELECT` too.

I've not actually tested the performance impact as I'm not familiar with this
project (this is more of a drive-by) and don't know what benchmarks to use.
@hendrikvanantwerpen
Copy link
Collaborator

This is great! I tried it locally and it seems to speed up things quite a bit.

I've made a few adjustments:

  • The covering indexes increased the database size a lot, so I've not included the values for now.
  • The indexes are also created from the Reader, just in case the database was written without it. This shouldn't have much impact, because the indexes are only created if they don't exist yet.

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally by indexing typeorm/typeorm and running a query against it. Query response came in significantly faster than without the index.

@hendrikvanantwerpen hendrikvanantwerpen merged commit a4a726d into github:main Aug 9, 2023
9 checks passed
@wmanley wmanley deleted the sqlite-indexes branch August 9, 2023 14:59
@github github deleted a comment from Ninja-man07 Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants