Commit

Updates v1 (#2)
* change name

* more examples in readme

* git ignore

* add delete edges to diff

* Update README.md

* pre-commit hook

* fix docs

* add rust linting action

* typo

* action for pre-commit hooks

* test bad commit

* test rust linter

* fix back code

* run only on main

* Docs + tests + `GraphDiff<..., W = f32>`

* update docs + add tests

* add generic weight

* minor changes

* Update README.md

* add EdgeException

---------

Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
LNS98 and hamishs authored May 22, 2024
1 parent 08c14ab commit c1fe416
Showing 15 changed files with 816 additions and 167 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,29 @@
name: pre-commit

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  run-linters:
    name: Run linters
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"
          architecture: "x64"

      - name: pre-commit-run
        run: |
          pip install pre-commit
          pre-commit run --all-files
29 changes: 29 additions & 0 deletions .github/workflows/rust.yml
@@ -0,0 +1,29 @@
name: Rust code linting

on:
  push:
    paths:
      - '**.rs'
      - '**/workflows/rust.yml'
    branches:
      - 'main'
  pull_request:
    paths:
      - '**.rs'
    branches:
      - 'main'

jobs:
  lint_and_test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Run cargo fmt
        working-directory: src/
        run: cargo fmt -- --check

      - name: Run cargo clippy
        working-directory: src/
        run: cargo clippy --all-targets --all-features -- -Dwarnings
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
target
Cargo.lock*
40 changes: 40 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,40 @@
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
        language_version: python3
        args: ["--line-length", "84"]
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        language_version: python3
        name: isort (python)
        args: ["--profile", "black", "--line-length", "84"]
  - repo: https://github.com/pycqa/flake8.git
    rev: 5.0.4
    hooks:
      - id: flake8
        additional_dependencies:
          - flake8-black>=0.1.1
        language_version: python3
        args: [
          "--ignore", "C901,E203,E741,W503,BLK100",
          "--max-line-length", "84",
          "--max-complexity", "18",
          "--select", "B,C,E,F,W,T4,B9",
          "--per-file-ignores", "__init__.py:F401",
        ]
  - repo: https://github.com/pycqa/pydocstyle
    rev: 6.1.1
    hooks:
      - id: pydocstyle
        args: ["--ignore", "D100,D104,D105,D107,D203,D212"]
        exclude: examples
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-ast
      - id: check-added-large-files
      - id: check-merge-conflict
4 changes: 2 additions & 2 deletions Cargo.toml
@@ -1,10 +1,10 @@
 [package]
-name = "edge-cli"
+name = "drisk-api"
 version = "0.1.0"
 edition = "2021"

 [lib]
-name = "edge_cli"
+name = "drisk_api"
 crate-type = ["lib", "cdylib"]

 [features]
198 changes: 180 additions & 18 deletions README.md
@@ -1,43 +1,205 @@
# Edge Python API
API to connect to dRISK Edge.

### Useful Edge Links
Some useful links for new edge users:

- Log in to edge: [demo.drisk.ai](https://demo.drisk.ai/)
- Documentation: [demo.drisk.ai/docs](https://demo.drisk.ai/docs/)

## Installation
```
pip install drisk_api
```

## Basic Usage

The API supports the basic building blocks for Create/Read/Update/Delete operations on the graph. For example:

```python
from drisk_api import GraphClient

token = "<edge_auth_token>"

# create a graph or connect to an existing one
new_graph = GraphClient.create_graph("a graph", token)
graph = GraphClient("graph_id", token)

# make a new node
node_id = graph.create_node(label="a node")

# get a node
node = graph.get_node(node_id)

# get the successors of the node
successors = graph.get_successors(node_id)

# update the node
graph.update_node(node_id, label="new label", size=3)

# add edges in batch
with graph.batch():
    graph.add_edge(node, other, weight=5.)
```
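The package also exports an `EdgeException` (added in this commit; see `python/drisk_api/__init__.py` below). As a minimal sketch — assuming failed API calls surface as `EdgeException`, which this diff does not document — error handling could look like:

```python
from drisk_api import EdgeException, GraphClient

graph = GraphClient("graph_id", "<edge_auth_token>")

try:
    # hypothetical: we assume a bad node id raises an EdgeException
    node = graph.get_node("nonexistent_node_id")
except EdgeException as err:
    print(f"request failed: {err}")
```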

## More Examples

We can use these building blocks to create whatever graphs we are most interested in. Below are some examples:

### Wikipedia Crawler

In this example we will scrape the main URL links from a given Wikipedia page and build a graph out of them.

Most of the code leverages the [wikipedia api](https://pypi.org/project/wikipedia/) and is not particularly important.
What is more interesting is how we can use the `api` to convert the corresponding information into a graph and then explore it in edge.

First load the relevant modules:

```python
import wikipedia
from wikipedia import PageError, DisambiguationError, search, WikipediaPage
from tqdm import tqdm
from drisk_api import GraphClient
```

Let's define some helper functions that will help us create a graph of wikipedia urls for a given page.
The main function to pay attention to is `wiki_scraper`, which finds the 'most important' links in a
given page and adds them to the graph, linking back to the original page.
It does this recursively for each node until a terminal condition is reached (e.g. a max recursion depth).

```python
def find_page(title):
    """Find the wikipedia page."""
    results, suggestion = search(title, results=1, suggestion=True)
    try:
        title = results[0] or suggestion
        page = WikipediaPage(title, redirect=True, preload=False)
    except IndexError:
        raise PageError(title)
    return page


def top_links(links, text, top_n):
    """Find the most important links in a wikipedia page."""
    link_occurrences = {}
    for link in links:
        link_occurrences[link] = text.lower().count(link.lower())

    sorted_links = sorted(link_occurrences.items(), key=lambda x: x[1], reverse=True)

    return [link for link, count in sorted_links[:top_n]]


def wiki_scraper(
    graph,
    page_node,
    page_name,
    string_cache,
    visited_pages,
    max_depth=3,
    current_depth=0,
    max_links=10,
    first_depth_max_links=100,
):
    """Recursively add a page's most important links to the graph."""
    try:
        page = find_page(title=page_name)
    except (DisambiguationError, PageError):
        return

    # add the url to the page_node (and make sure the label is right)
    graph.update_node(page_node, label=page_name, url=page.url)

    if page_name in visited_pages or current_depth >= max_depth:
        return

    links = top_links(
        page.links,
        page.content,
        first_depth_max_links if current_depth == 0 else max_links,
    )

    if current_depth == 0:
        tqdm_bar = tqdm(total=len(links), desc="wiki scraping")

    for link in links:
        if current_depth == 0:
            tqdm_bar.update(1)

        # see if we have already visited the page
        if link in string_cache:
            new_page_node = string_cache[link]
        else:
            # if we haven't, add a new node and add it to the cache
            new_page_node = graph.create_node(label=link)
            string_cache[link] = new_page_node

        # link the original page to the new one
        graph.create_edge(page_node, new_page_node, 1.)

        # repeat for the new link
        wiki_scraper(
            graph,
            new_page_node,
            link,
            string_cache,
            visited_pages,
            current_depth=current_depth + 1,
            max_links=max_links,
            first_depth_max_links=first_depth_max_links,
        )

    visited_pages.add(page_name)
```
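Since `top_links` is a pure function, we can sanity-check its ranking logic in isolation. A tiny self-contained example (made-up inputs, no network access needed):

```python
links = ["France", "Corsica", "Austerlitz"]
text = "Napoleon was born in Corsica. He ruled France. France crowned him."

# counts case-insensitive occurrences of each link in the text and keeps the top n
print(top_links(links, text, top_n=2))  # -> ['France', 'Corsica']
```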

Then we can connect to our graph (or make one):

```python
TOKEN = "<edge_auth_token>"
graph_id = "graph_id"
home_view = "view_id"
g = GraphClient(graph_id, TOKEN)
```

and run the scraper:

```python
page_name = "Napoleon"
string_cache = {}
visited_pages = set()

page_node = g.create_node(label=page_name)
g.add_nodes_to_view(home_view, [page_node], [(0., 0.)])

with g.batch():
    wiki_scraper(
        g,
        page_node,
        page_name,
        string_cache,
        visited_pages,
        max_depth=3,
        current_depth=0,
        max_links=3,
        first_depth_max_links=2,
    )
```
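As a quick sanity check before opening edge — a sketch, since the exact return type of `get_successors` isn't shown in this diff — we can list the root node's successors:

```python
# the root node should now link to the top-level pages found by the scraper
for successor in g.get_successors(page_node):
    print(successor)
```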

We can then head to edge to interact with the graph:

<p align="center">
<img src="https://raw.githubusercontent.com/driskai/drisk_api/main/docs/images/Napoleon-graph.png" width="80%">
</p>

Binary file added docs/images/Napoleon-graph.png
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -4,10 +4,10 @@ build-backend = "maturin"

 [tool.maturin]
 python-source = "python"
-module-name = "edge_cli"
+module-name = "drisk_api"

 [project]
-name = "edge_cli"
+name = "drisk_api"
 version = "0.0.1"
 requires-python = ">=3.7"
 classifiers = [
1 change: 1 addition & 0 deletions python/drisk_api/__init__.py
@@ -0,0 +1 @@
from .graph_client import EdgeException, GraphClient, Node
