[Release] 0.1.3
What's changed:

- Created Apps Script files to process Google Docs, PDF, and Gmail threads. See
  `apps_script/README.md`.
- Added `third_party/g2docsmd-html` which is the base for converting files to
  markdown.
- Updated the main `README.md` for clarity.
- Created a `scripts/README.md` to better explain content processing.
nickvander committed Oct 20, 2023
1 parent 13372ab commit da35ab5
Showing 16 changed files with 4,157 additions and 387 deletions.
22 changes: 17 additions & 5 deletions demos/palm/python/docs-agent/README.md
@@ -1,6 +1,6 @@
# Docs Agent

The Docs Agent demo enables [PaLM API][genai-doc-site] users to launch a chat application
The Docs Agent project enables [PaLM API][genai-doc-site] users to launch a chat application
on a Linux-based host machine using their own set of documents as a source dataset.

**Note**: If you're interested in setting up and launching the Docs Agent sample app on your
@@ -57,7 +57,7 @@ content from the source documents given user questions.
Once the most relevant content is returned, the Docs Agent server uses the prompt structure
shown in Figure 3 to augment the user question with a preset **condition** and a list of
**context**. (When the Docs Agent server starts, the condition value is read from the
[`config.yaml`][condition-txt] file.) Then the Docs Agent server sends this prompt to a
[`config.yaml`][config-yaml] file.) Then the Docs Agent server sends this prompt to a
PaLM 2 model using the PaLM API and receives a response generated by the model.

![Docs Agent prompt structure](docs/images/docs-agent-prompt-structure-01.png)
@@ -82,6 +82,9 @@ running on the host machine.
The embeddings in this vector database enable the Docs Agent server to perform semantic search
and retrieve context related to user questions for augmenting prompts.

For more information on the processing of Markdown files, see the [`README`][scripts-readme]
file in the `scripts` directory.

![Document to embeddings](docs/images/docs-agent-embeddings-01.png)

**Figure 4**. A document is split into small semantic chunks, which are then used to generate
@@ -296,6 +299,13 @@ event of "like" for the response.
The user may click this like button multiple times to toggle the state of the like button. But when
examining the logs, only the final state of the like button will be considered for the response.

### Using Google Docs, PDF, or Gmail as input sources

The project includes Apps Script files that allow you to convert various sources of content
(including Google Docs and PDF) from your Google Drive and Gmail into Markdown files. You can then
use these Markdown files as additional input sources for Docs Agent. For more information, see the
[`README`][apps-script-readme] file in the `apps_script` directory.

## Issues identified

The following issues have been identified and need to be worked on:
@@ -427,7 +437,7 @@ To convert Markdown files to plain text files:
cd $HOME/generative-ai-docs/demos/palm/python/docs-agent
```

2. Open the `config.yaml` file using a text editor, for example:
2. Open the [`config.yaml`][config-yaml] file using a text editor, for example:

```
nano config.yaml
@@ -542,7 +552,7 @@ allowing you to easily bring up and destroy the Flask app instance.

To customize settings in the Docs Agent chat app, do the following:

1. Edit the `config.yaml` file to update the following field:
1. Edit the [`config.yaml`][config-yaml] file to update the following field:

```
product_name: "My product"
@@ -636,7 +646,6 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
[set-up-docs-agent]: #set-up-docs-agent
[markdown-to-plain-text]: ./scripts/markdown_to_plain_text.py
[populate-vector-database]: ./scripts/populate_vector_database.py
[condition-txt]: ./config.yaml
[context-source-01]: http://eventhorizontelescope.org
[fact-check-section]: #using-a-palm-2-model-to-fact-check-its-own-response
[related-questions-section]: #using-a-palm-2-model-to-suggest-related-questions
@@ -650,4 +659,7 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
[flutter-docs-src]: https://github.com/flutter/website/tree/main/src
[flutter-docs-site]: https://docs.flutter.dev/
[poetry-known-issue]: https://github.com/python-poetry/poetry/issues/1917
[apps-script-readme]: ./apps_script/README.md
[scripts-readme]: ./scripts/README.md
[config-yaml]: config.yaml
[gen-ai-docs-repo]: https://github.com/google/generative-ai-docs
164 changes: 164 additions & 0 deletions demos/palm/python/docs-agent/apps_script/README.md
@@ -0,0 +1,164 @@
# Convert Google Docs, PDF, and Gmail to Markdown files

The collection of scripts in this `apps_script` directory allows you to convert
the contents of Google Drive folders and Gmail to Markdown files that are
compatible with Docs Agent.

The steps are:

1. [Prepare a Google Drive folder](#1-prepare-a-google-drive-folder).
2. [Mount Google Drive on your host machine](#2-mount-google-drive-on-your-host-machine).
3. [Create an Apps Script project](#3-create-an-apps-script-project).
4. [Edit and run main.gs on Apps Script](#4-edit-and-run-maings-on-apps-script).
5. [Update config.yaml to include the mounted directory](#5-update-configyaml-to-include-the-mounted-directory).

## 1. Prepare a Google Drive folder

First, create a new folder in Google Drive and add the Google Docs that you want
to use as source documents for Docs Agent to the folder.

Do the following:

1. Browse to https://drive.google.com/.
1. Click **+ New** on the top left corner.
1. Click **New folder**.
1. Name your new folder (for example, `my source Google Docs`).
1. To enter the newly created folder, double-click the folder.
1. Add (or move) your source Google Docs to this new folder.

## 2. Mount Google Drive on your host machine

Mount your Google Drive on your host machine so that you can access its folders
directly from the host machine (needed later in step 5).

There are a variety of methods and tools available online that enable this setup
(for example, see [`google-drive-ocamlfuse`][google-drive-ocamlfuse] for Linux machines).

## 3. Create an Apps Script project

Create a new Apps Script project and copy all the `.gs` scripts in this
`apps_script` directory to your new Apps Script project.

Do the following:

1. Browse to https://script.google.com/.
1. Click **New Project**.
1. At the top of the page, click **Untitled Project** and enter a meaningful
title (for example, `gDocs to Docs Agent`).
1. Click the **+** icon next to **Files**.
1. Click **Script**.
1. Name the new script after one of the `.gs` files in this `apps_script` directory
(for example, `drive_to_markdown`).
1. Copy the content of the `.gs` file to the new script on your Apps Script project.
1. To save, click the "Save project" icon in the toolbar.
1. Repeat the previous five steps until all the `.gs` files are copied to your Apps Script project.
1. Click the **+** icon next to **Services**.
1. Scroll down and click **Drive API**.
1. Click **Add**.

You are now ready to edit the parameters in the `main.gs` file to select a folder
in Google Drive and export emails from Gmail.

![Apps Script project](../docs/images/apps-script-screenshot-01.png)

**Figure 1**. A screenshot of an example Apps Script project.

## 4. Edit and run main.gs on Apps Script

Edit the `main.gs` file on your Apps Script project to select which functions
(features) you want to run.

Do the following:

1. Browse to your project on https://script.google.com/.

1. Open the `main.gs` file.

1. In the `main` function, comment out any functions that you don't want to run
(see Figure 1):

* `convertDriveFolderToMDForDocsAgent(folderInput)`: This function converts
the contents of a Google Drive folder to Markdown files (currently only Google
Docs and PDF). Make sure to specify a valid Google Drive folder in the `folderInput`
variable. Use the name of the folder created in **step 1** above, for example:

```
var folderInput = "my source Google Docs";
function main() {
  convertDriveFolderToMDForDocsAgent(folderInput);
  //exportEmailsToMarkdown(SEARCH_QUERY, folderOutput);
}
```
* `exportEmailsToMarkdown(SEARCH_QUERY, folderOutput)`: This function converts
the emails returned from a Gmail search query into Markdown files. Make sure to
specify a search query in the `SEARCH_QUERY` variable. You can test this search
query directly in the Gmail search bar. Also, specify an output directory for the
resulting emails.

1. To save, click the "Save project" icon in the toolbar.

1. Click the "Run" icon in the toolbar.

When this script runs successfully, the Execution log panel prints output similar
to the following:

```
9:55:59 PM Notice Execution completed
```

Also, the script creates a new folder in your Google Drive and stores the converted
Markdown files in this folder. The name of this new folder has `-output` as a postfix.
For example, with the folder name `my source Google Docs`, the name of the new folder
is `my source Google Docs-output`.

With Google Drive mounted on your host machine in step 2, you can now directly access
this folder from the host machine, for example:

```
user@hostname:~/DriveFileStream/My Drive/my source Google Docs-output$ ls
Copy_of_My_Google_Docs_To_Be_Converted.md
```
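
As a quick check after the script runs, you can scan the mounted output folder from the host machine. The following is a minimal Python sketch and is not part of the Docs Agent project. The helper name `list_markdown_files` and the example path are assumptions for illustration, so substitute your own mount location.

```python
from pathlib import Path


def list_markdown_files(output_dir):
    """Return the sorted names of the Markdown files directly inside output_dir."""
    folder = Path(output_dir).expanduser()
    if not folder.is_dir():
        # A missing folder usually means Google Drive is not mounted (step 2)
        # or the Apps Script run has not created the output folder yet.
        raise FileNotFoundError(f"Output folder not found: {folder}")
    return sorted(p.name for p in folder.glob("*.md") if p.is_file())


if __name__ == "__main__":
    # Hypothetical mount path; adjust it to match your own setup.
    example_dir = "~/DriveFileStream/My Drive/my source Google Docs-output"
    try:
        for name in list_markdown_files(example_dir):
            print(name)
    except FileNotFoundError as err:
        print(err)
```

An empty result with no error suggests that the folder exists but the conversion produced no Markdown files, which is worth checking before you continue to step 5.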

## 5. Update config.yaml to include the mounted directory

Once you have your Google Drive mounted on the host machine, you can now
specify one of its folders as an input source directory for Docs Agent.

Do the following:

1. In the Docs Agent project, open the [`config.yaml`][config-yaml] file
with a text editor.

1. Specify your mounted Google Drive folder as an `input` group, for example:

```
input:
- path: "/usr/local/google/home/user/DriveFileStream/My Drive/my source Google Docs-output"
  url_prefix: "docs.google.com"
```

You **must** specify a value for the `url_prefix` field, such as `docs.google.com`.
Currently this value is used to generate hashes for the content.

1. (**Optional**) Add another Google Drive folder for your exported emails,
for example:

```
input:
- path: "/usr/local/google/home/user/DriveFileStream/My Drive/my source Google Docs-output"
  url_prefix: "docs.google.com"
- path: "/usr/local/google/home/user/DriveFileStream/My Drive/psa-output"
  url_prefix: "mail.google.com"
```
1. Save the changes in the `config.yaml` file.
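
Before launching the app, it can help to confirm that every `input` entry has both fields and points at a directory that exists. The following is a minimal Python sketch under stated assumptions: it works on the already-parsed `input` list (loading the YAML file is left to whichever YAML library you use), the helper name `check_input_entries` is hypothetical, and it does not reflect how Docs Agent itself reads the configuration.

```python
from pathlib import Path


def check_input_entries(entries):
    """Return a list of problems found in the parsed `input` entries.

    `entries` is assumed to be a list of dicts, one per input source, each
    expected to carry a `path` and a non-empty `url_prefix`.
    """
    problems = []
    for index, entry in enumerate(entries):
        path = entry.get("path")
        if not path:
            problems.append(f"input[{index}]: missing path")
        elif not Path(path).is_dir():
            # Often a sign that Google Drive is not mounted on this machine.
            problems.append(f"input[{index}]: directory not found: {path}")
        if not entry.get("url_prefix"):
            problems.append(f"input[{index}]: missing url_prefix")
    return problems


if __name__ == "__main__":
    # Hypothetical entry that mirrors the shape of the example above.
    example = [
        {
            "path": "/usr/local/google/home/user/DriveFileStream/My Drive/psa-output",
            "url_prefix": "mail.google.com",
        }
    ]
    for problem in check_input_entries(example):
        print(problem)
```

An empty list means that the entries have the expected shape and that each path resolves to a directory on the host machine.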

You're all set with a new documentation source for Docs Agent. You can now follow the
instructions in the project's main [`README`][main-readme] file to launch the Docs Agent app.

<!-- Reference links -->

[config-yaml]: ../config.yaml
[main-readme]: ../README.md
[google-drive-ocamlfuse]: https://github.com/astrada/google-drive-ocamlfuse