
Spy Logic is a learning game that teaches users about prompt injection attacks and defences through a series of increasingly difficult challenges.

This wiki is intended to be very high level, acting more as a signposting system to specific parts of the code. Read the code alongside the wiki to get the fullest understanding, or just read the docs to get a broad-strokes overview of how things work.

The best way to get acquainted with the project is to run and play the game. Failing that, you can have a read of the intro to Spy Logic.

Terminology

In these docs I will use the term LLM to describe a generic Large Language Model, usually one created and hosted by OpenAI. In contrast, I will use bot to describe ScottBrewBot, our fictional integrated chatbot. The bot is the system as a whole: it has an LLM at its core, plus the mechanisms we've placed around it (defences, error handling, etc.).

How we Use LLMs

Large Language Models lie at the core of the game. We use models that are hosted on OpenAI's servers, via the OpenAI API. We use LLMs to:

  • respond to user messages
  • query documents
  • evaluate whether a prompt is malicious

We use two libraries to make it easier to interface with LLMs and the OpenAI API: Langchain and OpenAI. Both act as wrappers around the OpenAI API and provide useful abstractions and extra functionality. See OpenAI to understand how we get responses. See Langchain to see how we query documents and evaluate prompts.
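
As a point of reference, a minimal call through the OpenAI Node library looks roughly like this; the model name, system prompt and function name below are illustrative placeholders rather than the project's actual configuration:

```typescript
import OpenAI from 'openai';

// A minimal sketch of getting a chat reply through the OpenAI Node library.
// The model name, system prompt and response handling here are illustrative
// placeholders, not the values Spy Logic actually uses.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getBotReply(userMessage: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [
      { role: 'system', content: 'You are ScottBrewBot, the company chatbot.' },
      { role: 'user', content: userMessage },
    ],
  });
  return completion.choices[0].message.content ?? '';
}
```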

Levels and Game Flow

Since each level follows the same shape (chat to the bot, find out the secret information, email it), we can reuse the same functionality on the frontend and backend for every level. So what's different between levels?

Switching between Levels

The browser is responsible for keeping track of the currentLevel in local storage. When the user selects a new level, we update this value, and a copy is also kept in component state. This currentLevel is passed to the backend with every request, which is how the backend knows which level the user is on and can act accordingly.
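
A minimal sketch of that flow, assuming hypothetical names and an illustrative endpoint path (the real frontend code will differ in detail):

```typescript
// Illustrative sketch only: the real component names, endpoint path and
// request shape in the frontend may differ.
const CURRENT_LEVEL_KEY = 'currentLevel';

function loadCurrentLevel(): number {
  return Number(localStorage.getItem(CURRENT_LEVEL_KEY) ?? 0);
}

function saveCurrentLevel(level: number): void {
  localStorage.setItem(CURRENT_LEVEL_KEY, String(level));
}

// Every request to the backend carries currentLevel, so the backend knows
// which level's state to act on.
async function sendChatMessage(message: string, currentLevel: number) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, currentLevel }),
  });
  return response.json();
}
```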

The backend, meanwhile, is responsible for keeping track of the state of each level. This is kept in the session, in an array of LevelStates (see sessionRoutes.ts). Each LevelState holds the chat history, sent emails and configured defences for that level.

When the user selects a new level, we also load up the chat history and sent emails for the new level, so that they can be shown in the UI.
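
A rough sketch of what that per-level state might look like; the actual type definitions live in the backend source (see sessionRoutes.ts) and may differ:

```typescript
// Illustrative shape of the per-level state kept in the session; the real
// types live in the backend source and may differ in detail.
interface ChatMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

interface EmailInfo {
  address: string;
  subject: string;
  body: string;
}

interface DefenceConfig {
  id: string;
  isActive: boolean;
}

interface LevelState {
  level: number;
  chatHistory: ChatMessage[];
  sentEmails: EmailInfo[];
  defences: DefenceConfig[];
}

// The session holds one LevelState per level. When the user switches level,
// the backend returns that level's chat history and sent emails so the UI
// can display them.
function getLevelState(levelStates: LevelState[], currentLevel: number): LevelState {
  return levelStates[currentLevel];
}
```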

The Win Condition

The user wins a level if they send an email to the correct email address containing the right word, phrase or value. In the backend, once an email has been sent, we call checkLevelWinCondition, which checks the email against the win condition for the currentLevel given in the 'send message' request body. If the email satisfies that win condition, we return wonLevel: true in the response body to the frontend.
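
A hedged sketch of the general idea, with made-up secrets and matching rules standing in for the real ones in checkLevelWinCondition:

```typescript
// Illustrative only: the real secrets and matching rules live in the
// backend's checkLevelWinCondition; this sketch just shows the general idea.
interface SentEmail {
  address: string;
  subject: string;
  body: string;
}

// Hypothetical table of target address plus secret phrase per level.
const winConditions: Record<number, { address: string; secret: string }> = {
  0: { address: 'bob@example.com', secret: 'the secret project name' },
  // ...one entry per level
};

function checkLevelWinCondition(email: SentEmail, currentLevel: number): boolean {
  const condition = winConditions[currentLevel];
  if (!condition) return false;
  const content = `${email.subject} ${email.body}`.toLowerCase();
  return (
    email.address.toLowerCase() === condition.address &&
    content.includes(condition.secret.toLowerCase())
  );
}

// The controller then includes the result in its response, e.g.:
// res.send({ reply, wonLevel: checkLevelWinCondition(email, currentLevel) });
```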

The System Role

The Available Documents

Applying Defences

When the user sends the bot a chat message, defences are only applied on level 3 and in the sandbox. In handleChatToGPT (chatController.ts) we check the currentLevel value in the request body to determine whether we get a response via handleChatWithoutDefenceDetection or handleChatWithDefenceDetection.
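
In outline, that branch looks something like the following simplified sketch (level numbering and handler signatures here are assumptions, not the exact code):

```typescript
// Simplified sketch of the routing described above; the real handleChatToGPT
// in chatController.ts does considerably more (validation, history, errors).
// Level numbering here is assumed for illustration.
const LEVEL_3 = 3;
const SANDBOX = 4;

// Stubs standing in for the real handlers in chatController.ts.
declare function handleChatWithDefenceDetection(
  message: string,
  currentLevel: number
): Promise<unknown>;
declare function handleChatWithoutDefenceDetection(
  message: string,
  currentLevel: number
): Promise<unknown>;

async function handleChatToGPT(req: { body: { message: string; currentLevel: number } }) {
  const { message, currentLevel } = req.body;

  // Defences only come into play on level 3 and in the sandbox.
  const defencesApply = currentLevel === LEVEL_3 || currentLevel === SANDBOX;

  return defencesApply
    ? handleChatWithDefenceDetection(message, currentLevel)
    : handleChatWithoutDefenceDetection(message, currentLevel);
}
```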

Defences

Session and Memory