This repository has been archived by the owner on May 3, 2022. It is now read-only.

Graph-Based Modeling for Anti-Gaming and Coverage Analysis #23

Open
evandiewald opened this issue Nov 4, 2021 · 0 comments
Labels
2.accept accepted, move to contracting cat.Tools/Infrastructure category of application: Tools/Infrastructure Data Analytics


@evandiewald

Project:

Adaptive Network Modeling using Graph-Based Representations

Elevator Pitch:

Helium's Blockchain API is an effective way to view historical data stored on-chain, but the ledger-based format is less useful for feeding directly into network models. In this project, we propose to build a framework for a graph-based representation of blockchain activity, including Proof of Coverage and Token Flow. By capturing the natural adjacency between hotspots and accounts, we will be able to build machine learning models to, for instance, identify likely "gaming" behavior and predict coverage maps based on hotspot placement.

Total fiat/hnt ask:

18750 USD

Name and Address:

Please provide your legal name and a link to the submitted issue to grants@dewi.org.
This will streamline the contract process and KYC. A lack of this information will delay the contract.

Team or projects social: (optional)

LinkedIn

About the Applicant:

Evan is a graduate student with years of experience applying machine learning to messy datasets. A longtime member of the Helium Ecosystem, his team won the Grand Prize in the Hackster.io #IoTForGood contest for their predictive beehive monitoring system. He also maintains py-helium-console-client, a Python wrapper for the Console HTTP API. Evan fully embraces open source development and documents his projects in Medium publications like Towards Data Science and Better Programming.

GitHub (evandiewald)

Project Details:

The goal of this project is to create a dynamic, graph-based representation of the Helium Network and develop a preliminary suite of real-time analysis tools to characterize concepts like token flow, coverage mapping, and anomalous hotspot activity. Because Network Graphs natively capture the adjacency between nodes, they are widely used in a variety of applications, including search engines, social media platforms, and even biology. This data structure is also advantageous for the Helium Blockchain, which contains a number of connected elements, such as:

  • Hotspot → Hotspot (Witness Lists)
  • Account → Account (Token Flow)
  • Account → Hotspot (Ownership)
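As a rough illustration of these three relationships, the sketch below keeps typed, directed edges in plain Python dictionaries. The class name `HeliumGraph`, the edge-type labels, and all node IDs are hypothetical placeholders, not part of any Helium API or of the proposed toolkit.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeliumGraph:
    """Tiny in-memory directed graph with typed edges (illustrative only)."""

    # edges[edge_type][source] -> set of targets
    edges: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(set))
    )

    def add_edge(self, edge_type: str, source: str, target: str) -> None:
        self.edges[edge_type][source].add(target)

    def neighbors(self, edge_type: str, source: str) -> set:
        # .get() avoids creating empty entries for unknown types or nodes
        return set(self.edges.get(edge_type, {}).get(source, set()))


g = HeliumGraph()
g.add_edge("witnesses", "hotspot_a", "hotspot_b")  # Hotspot -> Hotspot
g.add_edge("witnesses", "hotspot_a", "hotspot_c")
g.add_edge("pays", "account_1", "account_2")       # Account -> Account
g.add_edge("owns", "account_1", "hotspot_a")       # Account -> Hotspot
g.add_edge("owns", "account_1", "hotspot_b")

# Witness edges whose two endpoints belong to the same owner
owned = g.neighbors("owns", "account_1")
same_owner_witnesses = {
    (s, t) for s in owned for t in g.neighbors("witnesses", s) if t in owned
}
print(same_owner_witnesses)
```

A query like the last one spans two edge types at once, which is the kind of question that is awkward to answer from a ledger-style listing but direct in a graph.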

With this representation in place, we can leverage decades of research in graph theory to extract insights about network behavior. For example, Betweenness Centrality, which uses shortest-path metrics to identify the nodes that uniquely connect disparate portions of a graph, has been used to identify Reddit communities with the most influence on pop culture. In the context of Proof of Coverage, betweenness can help us find the hotspots that, through witness paths, connect distinct neighborhoods in a city (see below).

[Figure: Betweenness in Pittsburgh, PA]
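To make the betweenness idea concrete, here is a minimal, dependency-free sketch of Brandes' algorithm for an undirected, unweighted graph. The toy witness graph (two triangles of hotspots joined by a single "bridge" hotspot) is invented for illustration. A production pipeline would more likely call an existing graph library than hand-roll this.

```python
from collections import deque


def betweenness(adj: dict) -> dict:
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph. `adj` maps each node to a set of neighbors."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from s
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint
    return {v: c / 2 for v, c in score.items()}


# Hypothetical witness graph: two neighborhoods joined by one hotspot
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("a3", "bridge"), ("bridge", "b1"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = betweenness(adj)
top = max(scores, key=scores.get)
print(top, scores[top])
```

In this toy graph, every shortest path between the two triangles passes through `bridge`, so it receives the highest score, while hotspots such as `a1` that sit on no shortest path between other nodes score zero.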

In addition to position, we can also attach relevant features to each node, such as local elevation and PoCv11 antenna characteristics, and to each edge, such as the reported RSSI of that witness path. As demonstrated in this blog post, we can use these features to train Graph Neural Networks for the purpose of, for instance, anomaly detection and predictive modeling.
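One way to picture the featured graph is as three arrays: a node feature matrix, a pair of source/target index lists, and a per-edge feature list. This mirrors the `x` / `edge_index` / `edge_attr` layout conventionally used by PyTorch Geometric, but the sketch below uses plain Python lists rather than tensors. The helper name `to_coo`, the feature choices, and every value are assumptions made for illustration.

```python
def to_coo(nodes: dict, edges: list):
    """Convert a featured graph into list-based COO form.

    nodes: {node_id: [node features]}
    edges: [(source_id, target_id, [edge features])]
    """
    index = {node_id: i for i, node_id in enumerate(nodes)}
    x = [list(features) for features in nodes.values()]
    edge_index = [
        [index[s] for s, _, _ in edges],  # row 0: source node indices
        [index[t] for _, t, _ in edges],  # row 1: target node indices
    ]
    edge_attr = [list(attr) for _, _, attr in edges]
    return x, edge_index, edge_attr


# Hypothetical node features: [elevation_m, antenna_gain_dbi]
nodes = {
    "hotspot_a": [320.0, 3.0],
    "hotspot_b": [298.0, 5.8],
    "hotspot_c": [305.0, 1.2],
}
# Hypothetical edge features: [rssi_dbm] reported for each witness path
edges = [
    ("hotspot_a", "hotspot_b", [-104.0]),
    ("hotspot_b", "hotspot_c", [-117.0]),
]

x, edge_index, edge_attr = to_coo(nodes, edges)
print(edge_index)
```

Wrapping these lists in tensors would then be the job of whichever deep learning framework is chosen.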

The interpretability of Proof of Coverage is a double-edged sword. On the one hand, mining rewards incentivize productive participants to optimize network coverage through well-defined criteria for hotspot placement and configuration. On the other hand, these rules also provide convenient thresholds for malicious actors to work around. By contrast, AI-based approaches can be used to identify nonlinear decision boundaries that are more difficult to circumvent. They also have the benefit of real-time optimization when trained on continuously evolving datasets. While we are not proposing that such a scheme be implemented in the core consensus protocol, it may be useful for analytics, including gaming detection and predictive modeling. For example, given a certain layout of hotspots in a region, what can we expect the coverage map to look like?

From the perspective of Helium's economics, graphs can also inherently capture concepts like token flow between wallets and exchanges, as well as hotspot ownership. While this information can be extracted from the official Helium API, storing the data in a native graph database (such as the open-source ArangoDB) expresses adjacency directly, which simplifies analytics and visualization tools.
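As a sketch of how ledger-style payments might become token-flow edges, the snippet below sums transfers per (payer, payee) pair and emits one edge document per pair. The `_from` and `_to` fields follow ArangoDB's edge-document convention of `collection/key` handles. The collection name `accounts`, the simplified `(payer, payee, amount)` records, and the `total` field are assumptions, not the actual on-chain transaction schema.

```python
from collections import defaultdict


def token_flow_edges(payments, collection="accounts"):
    """Aggregate (payer, payee, amount) records into one edge per pair."""
    totals = defaultdict(int)
    for payer, payee, amount in payments:
        totals[(payer, payee)] += amount
    return [
        {
            "_from": f"{collection}/{payer}",
            "_to": f"{collection}/{payee}",
            "total": total,
        }
        for (payer, payee), total in sorted(totals.items())
    ]


# Hypothetical payments; amounts are in arbitrary integer units
payments = [
    ("acct_a", "acct_b", 5),
    ("acct_a", "acct_b", 7),
    ("acct_b", "exchange_x", 10),
]

flow = token_flow_edges(payments)
for edge in flow:
    print(edge)
```

Collapsing repeated transfers into a single weighted edge keeps the graph small, at the cost of discarding timing information that a real-time tool would probably want to keep in a separate attribute.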

Technical Objectives:

  • Graph Database and Extraction Toolkit: Establish a scalable & modular pipeline for generating and storing the graphs in a database, likely ArangoDB. We will create an API with methods for common queries (e.g. get the graph for a given city), as well as an open-source Python library to transform the extracted graphs into analytics-friendly formats, such as NetworkX and PyTorch Geometric. We want to provide Helium and the community with all the tools they need to leverage the dataset in their own analysis pipelines.

  • Graph Development: Explore different ideas for the graphs themselves, regarding the scale and nature of the adjacency matrix. In the demo implementation, the global Helium Network was segmented on a city-to-city basis, but it may also make sense to try kRings or other, more localized representations.

  • Feature Engineering: With the advent of PoCv11, we can also incorporate local regulations and antenna setups into the feature set, which will help us characterize regional differences and improve our ability to spot anomalous activity. A stretch goal would be to incorporate features that are not stored on-chain, but would be useful for modeling, e.g. local geography and/or elevation.

  • Anomaly Detection: Develop a proof-of-concept, real-time anomaly detection model. We will explore Graph Neural Network-based architectures, as well as more conventional dimensionality-reduction approaches, like PCA and t-SNE (the idea being to capture the main distribution of "nominal" hotspots, so that outliers fall somewhere outside that main cluster).

  • Modeling Coverage Maps: Develop a predictive model that, given a certain arrangement of hotspots, generates the expected coverage map, reward scales, and/or witness paths.

  • Dashboard: Finally, we will present these results to the community with a preliminary visualization tool: a live dashboard indicating, for instance, how many "outlier" hotspots we are detecting at any given moment, how much HNT is being lost to these bad actors, and a graph of real-time token flow. In addition to providing useful metrics, this should give us a sense of the scalability and stability of the ETL pipeline.
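The "main cluster versus outliers" idea behind the anomaly detection objective can be sketched with something much simpler than the proposed models. The snippet below is a deliberately basic stand-in (not PCA, t-SNE, or a Graph Neural Network): it standardizes each feature and scores each hotspot by its distance from the centroid. The feature choices, the values, and the threshold of 2.0 are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def outlier_scores(features: dict) -> dict:
    """Distance of each hotspot from the centroid, in z-score units.

    features: {hotspot_id: [f1, f2, ...]}
    """
    columns = list(zip(*features.values()))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return {
        hotspot: sqrt(
            sum(((v - mu) / sd) ** 2 for v, mu, sd in zip(row, mus, sigmas))
        )
        for hotspot, row in features.items()
    }


def flag_outliers(scores: dict, threshold: float = 2.0) -> list:
    """Return the hotspots whose score exceeds the (arbitrary) threshold."""
    return sorted(h for h, s in scores.items() if s > threshold)


# Hypothetical features: [witness_count, mean_rssi_dbm]
features = {
    "h1": [10, -110],
    "h2": [12, -108],
    "h3": [11, -112],
    "h4": [9, -109],
    "h5": [10, -111],
    "h6": [40, -60],  # far from the rest on both features
}

scores = outlier_scores(features)
flagged = flag_outliers(scores)
print(flagged)
```

A fixed threshold like this is exactly the kind of rule a determined actor could work around, which is part of the motivation for exploring learned, nonlinear models instead.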

Roadmap:

| Milestone + Date | Deliverable | Summary | Cost |
| --- | --- | --- | --- |
| MS1, Dec. 7, 2021 | Graph Database Importer | Automated pipeline for generating and importing network graphs to ArangoDB database. Dataset will include adjacencies between hotspots and accounts, using features contained on-chain. Estimated at 25 developer hours. | 3125 USD |
| MS2, Dec. 14, 2021 | API + Deep Learning Toolkit (alpha) | v1 API to support common queries of the database, as well as initial Python package for converting results into analysis-friendly formats. This phase will include an investigation of the feasibility of embedding off-chain data, such as elevation/geographic features. Estimated at 25 developer hours. | 3125 USD |
| MS3, Dec. 21, 2021 | Model Building Pt. 1 | Preliminary real-time Graph NN-based model(s) for anomaly detection. Goal will be to estimate the percentage of "gaming" activity on the network, by number of hotspots and by rewards. Estimated at 20 developer hours. | 2500 USD |
| MS4, Jan. 10, 2022 | Model Building Pt. 2 | Preliminary model for coverage mapping. Goal will be to predict coverage (if feasible from limited mapper data at this time) and/or estimated rewards by hotspot given a certain configuration, which will (ideally) aid PoC optimization efforts. Estimated at 20 developer hours. | 2500 USD |
| MS5, Jan. 17, 2022 | Model Building Pt. 3 | Real-time token flow analytics tool. Will focus on aggregate movement to and from exchanges, as well as significant transfers. Estimated at 15 developer hours. | 1875 USD |
| MS6, Jan. 27, 2022 | Dashboard | Basic web-based dashboard showing real-time metrics for anomaly detection & token flow models. Visual demo of coverage mapping prediction as function of hotspot placement. Estimated at 25 developer hours. | 3125 USD |
| MS7, Feb. 7, 2022 | Final Deliverables | Open-source repository of analysis tools + Medium article(s) describing the completed work and instructions on how the community can access the dataset to create future models/tools. Estimated at 20 developer hours. | 2500 USD |

@Scottsigel self-assigned this on Nov 4, 2021
@Scottsigel added the labels cat.Tools/Infrastructure (category of application: Tools/Infrastructure) and 2.accept (accepted, move to contracting) on Nov 5, 2021
@evandiewald changed the title from "Adaptive Network Modeling using Graph-Based Representations" to "Graph-Based Modeling for Anti-Gaming and Coverage Analysis" on Nov 5, 2021
@jthiller self-assigned this on Dec 13, 2021