Merge pull request #18 from piercefreeman/feature/ca-install-cli
Make installation easier for third-party libraries by extending our executable with an install-ca command. It currently supports installation on macOS and Ubuntu.

We also add build logic so the Python library ships to PyPI with the executable bundled. Specifically:
- Add a new build-extension phase that builds the Go executable and deposits it into the correct asset directory. This is a rare pattern, arguably an anti-pattern, in distutils-based pipelines, since we're not building a shared .so library. It is intentional: our goal is to deliver the appropriate Go executable as a separate process, not to integrate it at the code level with our Python application.
- Add separate runners to build component wheels on Ubuntu and macOS
- Combine the wheels in the final workflow job and upload them to PyPI via poetry
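
Because the Go binary ships either under its plain name (local or sdist builds) or under a platform-tagged name (wheels built through the extension phase), the Python side has to try both at runtime. A minimal sketch of that lookup follows. The helper names `candidate_names` and `find_executable` and the `assets_dir` argument are hypothetical, chosen for illustration; the sketch mirrors the idea behind the `executable_path` property in this commit rather than reproducing the project's code.

```python
from pathlib import Path
from sysconfig import get_config_var


def candidate_names(base: str = "grooveproxy") -> list:
    """Return the file names to try, plain name first."""
    names = [base]
    # EXT_SUFFIX is the interpreter's extension-module suffix, for example
    # ".cpython-310-x86_64-linux-gnu.so". It can be None on unusual builds.
    suffix = get_config_var("EXT_SUFFIX")
    if suffix:
        names.append(f"{base}{suffix}")
    return names


def find_executable(assets_dir: Path, base: str = "grooveproxy") -> Path:
    """Hypothetical helper: locate the bundled binary inside assets_dir."""
    for name in candidate_names(base):
        path = assets_dir / name
        if path.exists():
            return path
    raise FileNotFoundError(f"No {base} executable found in {assets_dir}")
```

Trying the plain name first keeps local development builds working, while the suffixed name covers wheels whose file name encodes the target platform.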
piercefreeman authored Oct 18, 2022
2 parents f6ee732 + 10db3b4 commit f8be16b
Showing 18 changed files with 418 additions and 63 deletions.
83 changes: 82 additions & 1 deletion .github/workflows/test-groove.yml
@@ -37,7 +37,7 @@ jobs:
          push: true
          tags: ${{ env.IMAGE }}:groove-${{ github.sha }}

  run_tests:
  run_python_tests:
    name: Run groove-python tests
    runs-on: ubuntu-latest
    needs: build
@@ -54,3 +54,84 @@ jobs:
      - name: Run test
        run:
          docker run ${{ env.IMAGE }}:groove-${{ github.sha }} test-python

  build_python_wheels:
    name: Build wheels ${{ matrix.os }} - python ${{ matrix.python }}
    if: startsWith(github.ref, 'refs/tags/v')
    needs: run_python_tests

    strategy:
      matrix:
        os: [ubuntu-20.04, macos-11]
        python: ["3.10"]

    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python }}

      - uses: actions/setup-go@v3
        with:
          go-version: '^1.18.1'

      - name: Install poetry
        run:
          curl -sSL https://install.python-poetry.org | python3 -

      - name: Build wheels
        run: |
          # $HOME resolves to /Users/runner on macOS and /home/runner on Ubuntu
          export PATH="$HOME/.local/bin:$PATH"
          cd groove
          cp -r proxy groove-python
          cd groove-python
          poetry install
          poetry build
      - name: List wheels
        run: |
          cd groove/groove-python/dist
          ls -ls
      - uses: actions/upload-artifact@v3
        with:
          path: groove/groove-python/dist/*.whl

  publish_python_package:
    name: Publish python package
    needs: [build_python_wheels]

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - uses: actions/download-artifact@v3
        with:
          # unpacks default artifact into dist/
          # if `name: artifact` is omitted, the action will create extra parent dir
          name: artifact
          path: groove/groove-python/dist

      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install poetry
        run:
          curl -sSL https://install.python-poetry.org | python3 -

      - name: Build sdist static artifact
        run: |
          cd groove
          cp -r proxy groove-python
          cd groove-python
          poetry install
          poetry build --format sdist
      - name: Publish
        run: |
          cd groove/groove-python
          poetry publish --username ${{ secrets.PYPI_USERNAME }} --password ${{ secrets.PYPI_PASSWORD }}
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,4 +1,4 @@
__pycache__
.DS_Store

node_modules

@@ -17,4 +17,9 @@ node_modules
.pytest_cache

build
dist
**/assets/grooveproxy

# Exclude built python files
**/*.so
__pycache__
2 changes: 0 additions & 2 deletions groove/build.sh
@@ -11,6 +11,4 @@ mkdir -p build
(cd proxy && go build -o ../build)

# Python
rm -rf ./groove-python/groove/assets/ssl
cp ./build/grooveproxy ./groove-python/groove/assets/grooveproxy
cp -r ./proxy/ssl ./groove-python/groove/assets/ssl
Binary file added groove/groove-python/.DS_Store
Binary file not shown.
81 changes: 79 additions & 2 deletions groove/groove-python/README.md
@@ -1,3 +1,80 @@
# groove-python
# Groove

Python APIs for Groove.
Python APIs for Groove, a proxy server built for web crawling and unit test mocking. Highlights of its primary features:

- HTTP and HTTPS support over HTTP/1 and HTTP/2.
- Local CA certificate generation and installation on Mac and Linux to support system curl and Chromium.
- Different tiers of caching support, from disabling caching completely to aggressively maintaining all body archives.
- Limit outbound requests to the same URL to one concurrent request, saving bandwidth when an identical request is already in flight.
- Record and replay requests made to outgoing servers. Recreate testing flows in unit tests while separating them from crawling business logic.
- Third-party proxy support for commercial proxies.
- Custom TLS Client Hello support to maintain a Chromium-like TLS handshake while intercepting requests and re-forwarding packets.

For more information, see the [GitHub](https://github.com/piercefreeman/grooveproxy) project.

## Usage

Add groove to your project and install the local certificates that allow for HTTPS certificate generation:

```
pip install groove
install-ca
```

Instantiating Groove with the default parameters is fine for most deployments. To ensure resources are cleaned up once you're done with the proxy, wrap your code in the `launch` context manager.

```
from groove.proxy import Groove
from requests import get

proxy = Groove()

with proxy.launch():
    response = get(
        "https://www.example.com",
        proxies={
            "http": proxy.base_url_proxy,
            "https": proxy.base_url_proxy,
        }
    )
    assert response.status_code == 200
```

Create a fully fake outbound for testing:

```
from base64 import b64encode

from groove.proxy import Groove, TapeRecord, TapeRequest, TapeResponse, TapeSession
from requests import get

records = [
    TapeRecord(
        request=TapeRequest(
            url="https://example.com:443/",
            method="GET",
            headers={},
            body=b"",
        ),
        response=TapeResponse(
            status=200,
            headers={},
            body=b64encode("Test response".encode())
        ),
    )
]

proxy = Groove()

with proxy.launch():
    proxy.tape_load(
        TapeSession(
            records=records
        )
    )

    response = get(
        "https://www.example.com",
        proxies={
            "http": proxy.base_url_proxy,
            "https": proxy.base_url_proxy,
        }
    )
    assert response.content == b"Test response"
```
75 changes: 75 additions & 0 deletions groove/groove-python/build.py
@@ -0,0 +1,75 @@
from distutils.command.build_ext import build_ext
from distutils.core import Distribution
from distutils.errors import (CCompilerError, CompileError, DistutilsExecError,
                              DistutilsPlatformError)
from distutils.extension import Extension
from os import chmod, stat
from pathlib import Path
from shutil import copyfile
from subprocess import run


class GoExtension(Extension):
    def __init__(self, name, path):
        super().__init__(name, sources=[])
        self.path = path


extensions = [
    GoExtension(
        #"groove",
        "groove.assets.grooveproxy",
        # Assume we have temporarily copied over the proxy folder into our current path
        "./proxy",
    )
]


class BuildFailed(Exception):
    pass


class GoExtensionBuilder(build_ext):
    def run(self):
        try:
            build_ext.run(self)
        except (DistutilsPlatformError, FileNotFoundError):
            raise BuildFailed("File not found. Could not compile extension.")

    def build_extension(self, ext):
        try:
            if isinstance(ext, GoExtension):
                extension_root = Path(__file__).parent.resolve() / ext.path
                ext_path = self.get_ext_fullpath(ext.name)
                result = run(["go", "build", "-o", str(Path(ext_path).absolute())], cwd=extension_root)
                if result.returncode != 0:
                    raise CompileError("Go build failed")
            else:
                build_ext.build_extension(self, ext)
        # CompileError subclasses CCompilerError, so a failed Go build lands here too
        except (CCompilerError, DistutilsExecError, DistutilsPlatformError, ValueError):
            raise BuildFailed('Could not compile extension.')


def build(setup_kwargs):
    distribution = Distribution({"name": "python_ctypes", "ext_modules": extensions})
    distribution.package_dir = "python_ctypes"

    cmd = GoExtensionBuilder(distribution)
    cmd.ensure_finalized()
    cmd.run()

    # This is somewhat of a hack with go executables; this pipeline will package
    # them as .so files but they aren't actually built libraries. We maintain
    # this convention only for the ease of plugging in to poetry and distutils that
    # use this suffix to indicate the build architecture and run on the
    # correct downstream client OS.
    for output in cmd.get_outputs():
        relative_extension = Path(output).relative_to(cmd.build_lib)
        copyfile(output, relative_extension)
        # Copy each read bit onto the matching execute bit (r--r--r-- becomes --x--x--x)
        mode = stat(relative_extension).st_mode
        mode |= (mode & 0o444) >> 2
        chmod(relative_extension, mode)


if __name__ == "__main__":
    build({})
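
The permission step at the end of `build` copies each read bit into the corresponding execute bit, so the copied binary becomes executable for exactly the users who can already read it. A small worked sketch of that arithmetic is below. The helper name `add_exec_where_readable` is made up for illustration and does not appear in the commit.

```python
def add_exec_where_readable(mode: int) -> int:
    """Set each execute bit whose corresponding read bit is set."""
    # 0o444 selects r--r--r--; shifting right by 2 moves each r (4) onto x (1).
    return mode | ((mode & 0o444) >> 2)


print(oct(add_exec_where_readable(0o644)))  # 0o755
print(oct(add_exec_where_readable(0o600)))  # 0o700
```

A file that is readable only by its owner therefore stays private: it gains the owner execute bit and nothing else.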
10 changes: 10 additions & 0 deletions groove/groove-python/cli.py
@@ -0,0 +1,10 @@
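from subprocess import run

from groove.assets import get_asset_path


def install_ca():
    # subprocess.run expects the command and its arguments as a single list;
    # a second positional argument would be interpreted as bufsize.
    run(
        [
            str(get_asset_path("grooveproxy")),
            "install-ca",
        ]
    )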
20 changes: 16 additions & 4 deletions groove/groove-python/groove/proxy.py
@@ -4,13 +4,15 @@
from gzip import compress, decompress
from json import dumps, loads
from subprocess import Popen
from sysconfig import get_config_var
from time import sleep
from urllib.parse import urljoin

from groove.assets import get_asset_path
from pydantic import BaseModel, validator
from requests import Session

from groove.assets import get_asset_path


class CacheModeEnum(Enum):
# Ensure enum values are aligned with the cache.go definitions
@@ -100,8 +102,6 @@ def __init__(
@contextmanager
def launch(self):
parameters = {
"--ca-certificate": get_asset_path("ssl/ca.crt"),
"--ca-key": get_asset_path("ssl/ca.key"),
"--port": self.port,
"--control-port": self.control_port,
"--proxy-server": self.proxy_server,
@@ -116,7 +116,7 @@ def launch(self):

process = Popen(
[
str(get_asset_path("grooveproxy")),
self.executable_path,
*[
str(item)
for key, value in parameters.items()
@@ -161,3 +161,15 @@ def set_cache_mode(self, mode: CacheModeEnum):
)
)
assert response.json()["success"] == True

    @property
    def executable_path(self) -> str:
        # Support statically and dynamically built libraries
        if (path := get_asset_path("grooveproxy")).exists():
            return str(path)

        wheel_extension = get_config_var("EXT_SUFFIX")
        if (path := get_asset_path(f"grooveproxy{wheel_extension}")).exists():
            # Return the path as a string; calling exit() here would terminate the interpreter
            return str(path)

        raise ValueError("No groove executable file found")
2 changes: 1 addition & 1 deletion groove/groove-python/groove/tests/conftest.py
@@ -1,7 +1,7 @@
import pytest
from playwright.sync_api import sync_playwright

from groove.proxy import Groove
from playwright.sync_api import sync_playwright


# We want this to recreate by default on every unit test to clear the state
14 changes: 5 additions & 9 deletions groove/groove-python/groove/tests/test_auth.py
@@ -1,17 +1,13 @@
from bs4 import BeautifulSoup
from base64 import b64encode

import pytest
from bs4 import BeautifulSoup
from playwright._impl._api_types import Error as PlaywrightError
from requests import get
from groove.assets import get_asset_path

from groove.proxy import (
Groove,
TapeRecord,
TapeRequest,
TapeResponse,
TapeSession,
)
from groove.assets import get_asset_path
from groove.proxy import (Groove, TapeRecord, TapeRequest, TapeResponse,
TapeSession)

AUTH_USERNAME = "test-username"
AUTH_PASSWORD = "test-password"
2 changes: 1 addition & 1 deletion groove/groove-python/groove/tests/test_cache.py
@@ -1,9 +1,9 @@
from uuid import uuid4

from bs4 import BeautifulSoup
from groove.tests.mock_server import MockPageDefinition, mock_server

from groove.proxy import CacheModeEnum
from groove.tests.mock_server import MockPageDefinition, mock_server


def test_cache_off(proxy, browser):