Commit

Merge pull request #5 from CambioML/dev
Add Example folder, use CAMBIO API key from .env, use preprod endpoint, bump up version to 0.0.2
CambioML authored Apr 3, 2024
2 parents 84d4c52 + 0335d65 commit 98129c9
Showing 12 changed files with 278 additions and 189 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -99,7 +99,7 @@ ipython_config.py
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
+poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
Binary file added examples/test1.pdf
Binary file not shown.
Binary file added examples/test2.pdf
Binary file not shown.
143 changes: 143 additions & 0 deletions examples/test_document_extraction.ipynb
@@ -0,0 +1,143 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# File Extraction"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import sys\n",
"\n",
"sys.path.append(\".\")\n",
"sys.path.append(\"..\")\n",
"sys.path.append(\"../..\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"from open_parser import OpenParser\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"\n",
"example_apikey = os.getenv(\"CAMBIO_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Upload response: 204\n",
"Extraction success.\n",
"# Productivity and Business Processes\n",
"\n",
"## Overview\n",
"\n",
"| Investor Metrics | FY23 Q1 | FY23 Q2 | FY23 Q3 | FY23 Q4 | FY24 Q1 |\n",
"|:-------------------------------------------------------------------|:----------|:----------|:----------|:----------|:----------|\n",
"| Office Commercial products and cloud services revenue growth (y/y) | 7% / 13% | 7% 14% | 13% / 17% | 12% / 14% | 15% / 14% |\n",
"| Office Consumer products and cloud services revenue growth (y/y) | 7% 11% | (2)% 3% | 1% 4% | 3% 6% | 3% 4% |\n",
"| Office 365 Commercial seat growth (y/y) | 14% | 12% | 11% | 11% | 10% |\n",
"| Microsoft 365 Consumer subscribers (in millions) | 65.1 | 67.7 | 70.8 | 74.9 | 76.7 |\n",
"| Dynamics products and cloud services revenue growth (y/y) | 15% / 22% | 13% 20% | 17% / 21% | 19% / 21% | 22% / 21% |\n",
"| LinkedIn revenue growth (y/y) | 17% / 21% | 10% / 14% | 8% 11% | 6% 8% | 8% |\n",
"\n",
"Growth rates include non-GAAP CC growth (GAAP %/CC%)\n",
"\n",
"## Press release\n",
"\n",
"## Business Highlights\n",
"\n",
"Revenue in Productivity and Business Processes was $17.0 billion and increased 7% (up 13% in constant currency), with the following business highlights:\n",
"\n",
"Office Commercial products and cloud services revenue increased 7% (up 14% in constant currency) driven by Office 365 Commercial revenue growth of 11% (up 18% in constant currency)\n",
"Office Consumer products and cloud services revenue decreased 2% (up 3% in constant currency) and Microsoft 365 Consumer subscribers grew to 63.2 million\n",
"LinkedIn revenue increased 10% (up 14% in constant currency)\n",
"Dynamics products and cloud services revenue increased 13% (up 20% in constant currency) driven by Dynamics 365 revenue growth of 21% (up 29% in constant currency)\n",
"\n",
"Revenue in Intelligent Cloud was $21.5 billion and increased 18% (up 24% in constant currency), with the following business highlights:\n",
"\n",
"Server products and cloud services revenue increased 20% (up 26% in constant currency) driven by Azure and other cloud services revenue growth of 31% (up 38% in constant currency)\n",
"\n",
"Revenue in More Personal Computing was $14.2 billion and decreased 19% (down 16% in constant currency), with the following business highlights:\n",
"\n",
"Windows OEM revenue decreased 39%\n",
"Windows Commercial products and cloud services revenue decreased 3% (up 3% in constant currency)\n",
"Xbox content and services revenue decreased 12% (down 8% in constant currency)\n",
"Search and news advertising revenue excluding traffic acquisition costs increased 10% (up 15% in constant currency)\n",
"Devices revenue decreased 39% (down 34% in constant currency)\n",
"\n",
"## Financial statement-MD&A\n",
"\n",
"Highlights from the second quarter of fiscal year 2024 compared with the second quarter of fiscal year 2023 included:\n",
"\n",
"Microsoft Cloud revenue increased 24% to $33.7 billion\n",
"Office Commercial products and cloud services revenue increased 15% driven by Office 365 Commercial growth of 17%\n",
"Office Consumer products and cloud services revenue increased 5% and Microsoft 365 Consumer subscribers grew to 78.4 million\n",
"LinkedIn revenue increased 9%\n",
"Dynamics products and cloud services revenue increased 21% driven by Dynamics 365 growth of 27%\n",
"Server products and cloud services revenue increased 22% driven by Azure and other cloud services growth of 30%\n",
"Windows revenue increased 9% with Windows original equipment manufacturer licensing (\"Windows OEM\") revenue growth of 11% and Windows Commercial products and cloud services revenue growth of 9%\n",
"Devices revenue decreased 9%\n"
]
}
],
"source": [
"example_local_file = \"./test2.pdf\"\n",
"\n",
"op = OpenParser(example_apikey)\n",
"\n",
"content_result = op.extract(example_local_file)\n",
"print(content_result)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "open-parser",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
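Review note: the notebook above prints the extracted content as Markdown, including a pipe table. As a minimal sketch, assuming `op.extract` returns that Markdown as a single string (the diff does not show the return type), the first table could be turned into row dictionaries like this. `parse_markdown_table` is a hypothetical helper, not part of `open_parser`.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Return the first pipe table found in `markdown` as a list of row dicts.

    Hypothetical helper: assumes the table has a header row followed by a
    |:---|:---| separator row, as in the notebook output above.
    """
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            if rows:
                break  # first non-table line after the table ends the scan
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the separator row
    return [dict(zip(header, row)) for row in body]


sample = (
    "## Overview\n\n"
    "| Investor Metrics | FY23 Q1 |\n"
    "|:---|:---|\n"
    "| Office 365 Commercial seat growth (y/y) | 14% |\n"
    "| Microsoft 365 Consumer subscribers (in millions) | 65.1 |\n\n"
    "Growth rates include non-GAAP CC growth (GAAP %/CC%)\n"
)
table = parse_markdown_table(sample)
```

Cell values are kept as raw strings, so cells where the extraction dropped the `/` separator (for example `7% 14%`) pass through unchanged and would need their own handling.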
30 changes: 30 additions & 0 deletions examples/test_example.py
@@ -0,0 +1,30 @@
import os
import sys

from dotenv import load_dotenv

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

from open_parser import OpenParser # noqa: E402

if __name__ == "__main__":
    load_dotenv()

    example_apikey = os.getenv("CAMBIO_API_KEY")

    example_local_file = "./test2.pdf"

    op = OpenParser(example_apikey)

    print("file/document extraction test:")
    content_result = op.extract(example_local_file)
    print(type(content_result))
    print(content_result)

    print("information extraction test:")
    example_prompt = "Return table under Investor Metrics in JSON format with year as the key and the column as subkeys."
    qa_result = op.parse(example_local_file, example_prompt)
    print(type(qa_result))
    print(qa_result)
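Review note: `os.getenv("CAMBIO_API_KEY")` returns `None` when the variable is missing, so the script would pass `None` to `OpenParser` and presumably fail later at request time. A minimal sketch of failing early instead; `require_env` is a hypothetical helper, not something this commit adds.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value
```

In the example script this would replace the plain lookup, called after `load_dotenv()`: `example_apikey = require_env("CAMBIO_API_KEY")`.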
95 changes: 95 additions & 0 deletions examples/test_information_extraction.ipynb
@@ -0,0 +1,95 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Information Extraction"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import sys\n",
"\n",
"sys.path.append(\".\")\n",
"sys.path.append(\"..\")\n",
"sys.path.append(\"../..\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"from open_parser import OpenParser\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"load_dotenv(override=True)\n",
"\n",
"example_apikey = os.getenv(\"CAMBIO_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Upload response: 204\n",
"Extraction success.\n",
"[{'result': [{'FY23 Q1': {'Office Commercial products and cloud services revenue growth (y/y)': '7% / 13%', 'Office Consumer products and cloud services revenue growth (y/y)': '7% / 11%', 'Office 365 Commercial seat growth (y/y)': '14%', 'Microsoft 365 Consumer subscribers (in millions)': '65.1', 'Dynamics products and cloud services revenue growth (y/y)': '15% / 22%', 'LinkedIn revenue growth (y/y)': '17% / 21%'}}, {'FY23 Q2': {'Office Commercial products and cloud services revenue growth (y/y)': '7% / 14%', 'Office Consumer products and cloud services revenue growth (y/y)': '2% / 3%', 'Office 365 Commercial seat growth (y/y)': '12%', 'Microsoft 365 Consumer subscribers (in millions)': '67.7', 'Dynamics products and cloud services revenue growth (y/y)': '13% / 20%', 'LinkedIn revenue growth (y/y)': '10% / 14%'}}, {'FY23 Q3': {'Office Commercial products and cloud services revenue growth (y/y)': '13% / 17%', 'Office Consumer products and cloud services revenue growth (y/y)': '1% / 4%', 'Office 365 Commercial seat growth (y/y)': '11%', 'Microsoft 365 Consumer subscribers (in millions)': '70.8', 'Dynamics products and cloud services revenue growth (y/y)': '17% / 21%', 'LinkedIn revenue growth (y/y)': '8% / 11%'}}, {'FY23 Q4': {'Office Commercial products and cloud services revenue growth (y/y)': '12% / 14%', 'Office Consumer products and cloud services revenue growth (y/y)': '3% / 6%', 'Office 365 Commercial seat growth (y/y)': '11%', 'Microsoft 365 Consumer subscribers (in millions)': '74.9', 'Dynamics products and cloud services revenue growth (y/y)': '19% / 21%', 'LinkedIn revenue growth (y/y)': '6% / 8%'}}, {'FY24 Q1': {'Office Commercial products and cloud services revenue growth (y/y)': '15% / 14%', 'Office Consumer products and cloud services revenue growth (y/y)': '3% / 4%', 'Office 365 Commercial seat growth (y/y)': '10%', 'Microsoft 365 Consumer subscribers (in millions)': '76.7', 'Dynamics products and cloud services revenue growth (y/y)': '22% / 21%', 'LinkedIn revenue growth (y/y)': '8%'}}], 'log': {'instruction': 'Return table under Investor Metrics in JSON format with year as the key and the column as subkeys.', 'source': '', 'usage': {'input_tokens': 1758, 'output_tokens': 771}}, 'page_num': 0}]\n"
]
}
],
"source": [
"example_local_file = \"./test2.pdf\"\n",
"example_prompt = \"Return table under Investor Metrics in JSON format with year as the key and the column as subkeys.\"\n",
"\n",
"op = OpenParser(example_apikey)\n",
"qa_result = op.parse(example_local_file, example_prompt)\n",
"\n",
"print(qa_result)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "open-parser",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
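Review note: the printed `qa_result` above is a list with one dictionary per page, each carrying `result`, `log`, and `page_num`, where `result` is a list of single-key `{quarter: {metric: value}}` dictionaries. A minimal sketch of merging that into one lookup table, assuming the response keeps this shape (it is inferred from the single output shown and may differ for other prompts). `flatten_quarters` is a hypothetical helper.

```python
def flatten_quarters(qa_result: list[dict]) -> dict[str, dict[str, str]]:
    """Merge the per-page 'result' lists into one {quarter: {metric: value}} dict."""
    table: dict[str, dict[str, str]] = {}
    for page in qa_result:
        for entry in page.get("result", []):
            # Each entry is assumed to look like {"FY23 Q1": {...metrics...}}.
            for quarter, metrics in entry.items():
                table.setdefault(quarter, {}).update(metrics)
    return table


# Trimmed copy of the structure printed in the notebook output above.
sample_result = [
    {
        "result": [
            {"FY23 Q1": {"Office 365 Commercial seat growth (y/y)": "14%"}},
            {"FY23 Q2": {"Office 365 Commercial seat growth (y/y)": "12%"}},
        ],
        "log": {"usage": {"input_tokens": 1758, "output_tokens": 771}},
        "page_num": 0,
    }
]
by_quarter = flatten_quarters(sample_result)
```

With the table flattened, a single value can be read as `by_quarter["FY23 Q2"]["Office 365 Commercial seat growth (y/y)"]`.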
4 changes: 4 additions & 0 deletions open_parser/__init__.py
@@ -1 +1,5 @@
from open_parser.base import OpenParser

__all__ = ["OpenParser"]

__version__ = "0.0.2"
6 changes: 3 additions & 3 deletions open_parser/base.py
@@ -3,13 +3,13 @@
import requests

CAMBIO_UPLOAD_URL = (
-    "https://p1iz3c1c77.execute-api.us-west-2.amazonaws.com/v1/cambio_api/upload"
+    "https://nl6h9ycq39.execute-api.us-west-2.amazonaws.com/v1/cambio_api/upload"
)
CAMBIO_EXTRACT_URL = (
-    "https://p1iz3c1c77.execute-api.us-west-2.amazonaws.com/v1/cambio_api/extract"
+    "https://nl6h9ycq39.execute-api.us-west-2.amazonaws.com/v1/cambio_api/extract"
)
CAMBIO_PARSE_URL = (
-    "https://p1iz3c1c77.execute-api.us-west-2.amazonaws.com/v1/cambio_api/parse"
+    "https://nl6h9ycq39.execute-api.us-west-2.amazonaws.com/v1/cambio_api/parse"
)
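Review note: switching endpoints here means editing the same host in three constants. One possible alternative, sketched under assumptions, is to derive all three URLs from a single base so a later switch is a one-line or environment-only change. `build_endpoints` and the `CAMBIO_API_BASE_URL` variable are hypothetical names, not part of this commit.

```python
import os

# Base taken from the new URLs in this diff; everything after it is a path suffix.
DEFAULT_BASE_URL = "https://nl6h9ycq39.execute-api.us-west-2.amazonaws.com/v1/cambio_api"


def build_endpoints(base_url: str) -> dict[str, str]:
    """Return the upload/extract/parse URLs for a given API base URL."""
    base = base_url.rstrip("/")  # tolerate a trailing slash in the override
    return {name: f"{base}/{name}" for name in ("upload", "extract", "parse")}


# CAMBIO_API_BASE_URL is a hypothetical override, e.g. set in the same .env file.
ENDPOINTS = build_endpoints(os.getenv("CAMBIO_API_BASE_URL", DEFAULT_BASE_URL))
```

The existing constants could then be assigned from the dictionary, for example `CAMBIO_UPLOAD_URL = ENDPOINTS["upload"]`, leaving the rest of `base.py` unchanged.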

