Commit

Updates v1 (#2)
* change name

* more examples in readme

* git ignore

* add delete edges to diff

* Update README.md

* pre-commit hook

* fix docs

* add rust linting action

* typo

* action for pre-commit hooks

* test bad commit

* test rust linter

* fix back code

* run only on main

* Docs + tests + `GraphDiff<..., W = f32>`

* update docs + add tests

* add generic weight

* minor changes

* Update README.md

* add EdgeException

---------

Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
LNS98 and hamishs authored May 22, 2024
1 parent 08c14ab commit c1fe416
Showing 15 changed files with 816 additions and 167 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,29 @@
name: pre-commit

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  run-linters:
    name: Run linters
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"
          architecture: "x64"

      - name: pre-commit-run
        run: |
          pip install pre-commit
          pre-commit run --all-files
29 changes: 29 additions & 0 deletions .github/workflows/rust.yml
@@ -0,0 +1,29 @@
name: Rust code linting

on:
  push:
    paths:
      - '**.rs'
      - '**/workflows/rust.yml'
    branches:
      - 'main'
  pull_request:
    paths:
      - '**.rs'
    branches:
      - 'main'

jobs:
  lint_and_test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Run cargo fmt
        working-directory: src/
        run: cargo fmt -- --check

      - name: Run cargo clippy
        working-directory: src/
        run: cargo clippy --all-targets --all-features -- -Dwarnings
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
target
Cargo.lock*
40 changes: 40 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,40 @@
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
        language_version: python3
        args: ["--line-length", "84"]
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        language_version: python3
        name: isort (python)
        args: ["--profile", "black", "--line-length", "84"]
  - repo: https://github.com/pycqa/flake8.git
    rev: 5.0.4
    hooks:
      - id: flake8
        additional_dependencies:
          - flake8-black>=0.1.1
        language_version: python3
        args: [
          "--ignore", "C901,E203,E741,W503,BLK100",
          "--max-line-length", "84",
          "--max-complexity", "18",
          "--select", "B,C,E,F,W,T4,B9",
          "--per-file-ignores", "__init__.py:F401",
        ]
  - repo: https://github.com/pycqa/pydocstyle
    rev: 6.1.1
    hooks:
      - id: pydocstyle
        args: ["--ignore", "D100,D104,D105,D107,D203,D212"]
        exclude: examples
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-ast
      - id: check-added-large-files
      - id: check-merge-conflict
4 changes: 2 additions & 2 deletions Cargo.toml
@@ -1,10 +1,10 @@
 [package]
-name = "edge-cli"
+name = "drisk-api"
 version = "0.1.0"
 edition = "2021"

 [lib]
-name = "edge_cli"
+name = "drisk_api"
 crate-type = ["lib", "cdylib"]

 [features]
198 changes: 180 additions & 18 deletions README.md
@@ -1,43 +1,205 @@
# Edge Python API
API to connect to dRISK Edge.

### Useful Edge Links
Some useful links for new edge users:

- Log in to edge: [demo.drisk.ai](https://demo.drisk.ai/)
- Documentation: [demo.drisk.ai/docs](https://demo.drisk.ai/docs/)

## Installation
```
pip install drisk_api
```

## Basic Usage

The API supports the basic building blocks for Create/Read/Update/Delete operations on the graph. For example:

```python
from drisk_api import GraphClient

token = "<edge_auth_token>"

# create a graph or connect to an existing one
new_graph = GraphClient.create_graph("a graph", token)
graph = GraphClient("graph_id", token)

# make a new node
node_id = graph.create_node(label="a node")

# get a node
node = graph.get_node(node_id)

# get the successors of the node
successors = graph.get_successors(node_id)

# update the node
graph.update_node(node_id, label="new label", size=3)

# add edges in batch
with graph.batch():
    graph.add_edge(node, other, weight=5.)
```
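The package also exports an `EdgeException` (added in this commit; see `python/drisk_api/__init__.py` below). As a minimal sketch — assuming failed API calls surface as `EdgeException`, which this diff does not document — error handling could look like:

```python
from drisk_api import EdgeException, GraphClient

graph = GraphClient("graph_id", "<edge_auth_token>")

try:
    # hypothetical: we assume a bad node id raises an EdgeException
    node = graph.get_node("nonexistent_node_id")
except EdgeException as err:
    print(f"request failed: {err}")
```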

## More Examples

We can use these building blocks to create whatever graphs we are most interested in. Below are some examples:

### Wikipedia Crawler

In this example we will scrape the main URL links from a given Wikipedia page and build a graph out of them.

Most of the code leverages the [wikipedia api](https://pypi.org/project/wikipedia/) and is not particularly important.
What is more interesting is how we can use the `api` to convert the corresponding information into a graph and then explore it in edge.

First load the relevant modules:

```python
import wikipedia
from wikipedia import PageError, DisambiguationError, search, WikipediaPage
from tqdm import tqdm
from drisk_api import GraphClient
```

Let's define some helper functions that will help us create a graph of wikipedia urls for a given page.
The main function to pay attention to is `wiki_scraper`, which finds the 'most important' links in a
given page and adds them to the graph, linking back to the original page.
It does this recursively for each node until a terminal condition is reached (e.g. a max recursion depth).

```python
def find_page(title):
    """Find the wikipedia page."""
    results, suggestion = search(title, results=1, suggestion=True)
    try:
        title = results[0] or suggestion
        page = WikipediaPage(title, redirect=True, preload=False)
    except IndexError:
        raise PageError(title)
    return page


def top_links(links, text, top_n):
    """Find the most important links in a wikipedia page."""
    link_occurrences = {}
    for link in links:
        link_occurrences[link] = text.lower().count(link.lower())

    sorted_links = sorted(link_occurrences.items(), key=lambda x: x[1], reverse=True)

    return [link for link, count in sorted_links[:top_n]]


def wiki_scraper(
    graph,
    page_node,
    page_name,
    string_cache,
    visited_pages,
    max_depth=3,
    current_depth=0,
    max_links=10,
    first_depth_max_links=100,
):
    """Recursively add a page's most important links to the graph."""
    try:
        page = find_page(title=page_name)
    except (DisambiguationError, PageError):
        return

    # add the url to the page_node (and make sure the label is right)
    graph.update_node(page_node, label=page_name, url=page.url)

    if page_name in visited_pages or current_depth >= max_depth:
        return

    links = top_links(
        page.links,
        page.content,
        first_depth_max_links if current_depth == 0 else max_links,
    )

    if current_depth == 0:
        tqdm_bar = tqdm(total=len(links), desc="wiki scraping")

    for link in links:
        if current_depth == 0:
            tqdm_bar.update(1)

        # see if we have already visited the page
        if link in string_cache:
            new_page_node = string_cache[link]
        else:
            # if we haven't, add a new node and add it to the cache
            new_page_node = graph.create_node(label=link)
            string_cache[link] = new_page_node

        # link the original page to the new one
        graph.create_edge(page_node, new_page_node, 1.)

        # repeat for the new link
        wiki_scraper(
            graph,
            new_page_node,
            link,
            string_cache,
            visited_pages,
            current_depth=current_depth + 1,
            max_links=max_links,
            first_depth_max_links=first_depth_max_links,
        )

    visited_pages.add(page_name)
```
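Since `top_links` is a pure function, we can sanity-check its ranking logic in isolation. A tiny self-contained example (made-up inputs, no network access needed):

```python
links = ["France", "Corsica", "Austerlitz"]
text = "Napoleon was born in Corsica. He ruled France. France crowned him."

# counts case-insensitive occurrences of each link in the text and keeps the top n
print(top_links(links, text, top_n=2))  # -> ['France', 'Corsica']
```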

Then we can connect to our graph (or make one):

```python
TOKEN = "<edge_auth_token>"
graph_id = "graph_id"
home_view = "view_id"
g = GraphClient(graph_id, TOKEN)
```

and run the scraper:

```python
page_name = "Napoleon"
string_cache = {}
visited_pages = set()

page_node = g.create_node(label=page_name)
g.add_nodes_to_view(home_view, [page_node], [(0., 0.)])

with g.batch():
    wiki_scraper(
        g,
        page_node,
        page_name,
        string_cache,
        visited_pages,
        max_depth=3,
        current_depth=0,
        max_links=3,
        first_depth_max_links=2,
    )
```
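As a quick sanity check before opening edge — a sketch, since the exact return type of `get_successors` isn't shown in this diff — we can list the root node's successors:

```python
# the root node should now link to the top-level pages found by the scraper
for successor in g.get_successors(page_node):
    print(successor)
```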

We can then head to edge to interact with the graph:

<p align="center">
<img src="https://raw.githubusercontent.com/driskai/drisk_api/main/docs/images/Napoleon-graph.png" width="80%">
</p>

Binary file added docs/images/Napoleon-graph.png
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -4,10 +4,10 @@ build-backend = "maturin"

 [tool.maturin]
 python-source = "python"
-module-name = "edge_cli"
+module-name = "drisk_api"

 [project]
-name = "edge_cli"
+name = "drisk_api"
 version = "0.0.1"
 requires-python = ">=3.7"
 classifiers = [
1 change: 1 addition & 0 deletions python/drisk_api/__init__.py
@@ -0,0 +1 @@
from .graph_client import EdgeException, GraphClient, Node
