
Graph Summarization #9

Open
Hevia opened this issue Dec 14, 2022 · 2 comments


Hevia commented Dec 14, 2022

Implement graph summarization method similar to: https://github.com/mswellhao/PacSum

Required Tasks:

  1. Tokenize by sentence, and create Sentence nodes that connect to a Document node
  2. Add functionality to SentenceGraph to support sentence/node mapping
  3. Add previous/next sentence relations for sentences in a document
  4. Create sentence similarity relations when sentences meet a threshold (it may or may not be worth saving all edge weights)
  5. Research augmentations that would make this method suitable for multi-document summarization (MDS)
  6. Implement the PacSum extractor algorithm (this might be worth implementing in raw Neo4j as opposed to computing at the API level)

Helpful links
PACSUM extractor code: https://github.com/mswellhao/PacSum/blob/master/code/extractor.py
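Tasks 1–3 above could be prototyped in plain Python before committing to a Neo4j schema. A minimal sketch follows; note that everything in it is an assumption: the in-memory `SentenceGraph` class, its field names, and the regex-based sentence splitter are placeholders (a real implementation would use a proper tokenizer such as nltk's `sent_tokenize` or spaCy).

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the SentenceGraph (tasks 1-3).
# Names and structure are placeholders, not the actual schema.
@dataclass
class SentenceGraph:
    sentences: list = field(default_factory=list)   # task 2: node id <-> sentence mapping
    next_edges: list = field(default_factory=list)  # task 3: previous/next relations

    def add_document(self, text: str) -> list:
        # Task 1: naive sentence tokenization on ., !, ? followed by whitespace;
        # a real implementation would use a proper sentence tokenizer.
        sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        start = len(self.sentences)
        self.sentences.extend(sents)
        ids = list(range(start, start + len(sents)))
        # Task 3: chain consecutive sentences with next-sentence edges.
        self.next_edges.extend(zip(ids, ids[1:]))
        return ids

g = SentenceGraph()
ids = g.add_document("First sentence. Second sentence! Third?")
print(ids)           # [0, 1, 2]
print(g.next_edges)  # [(0, 1), (1, 2)]
```

Task 4 (similarity edges) would then add weighted relations between any pair of sentence ids whose similarity clears the threshold.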


Hevia commented Dec 16, 2022

So ideally we write this using Cypher + APOC: https://github.com/neo4j-contrib/neo4j-apoc-procedures

Looks like the two functions we need to copy are:

It will be worth writing some pseudocode here; that will help narrow down the Cypher required.


Hevia commented Dec 16, 2022

Looks like this is also important: https://github.com/mswellhao/PacSum/blob/67cc8ad370eac160ede997b7c32eb74907728bf8/code/extractor.py#L107

Algorithm:

Inputs: A list of sentence nodes, beta, lambda1, lambda2

  1. Get the minimum and maximum edge weights
  2. Use those values + a provided beta value to compute the minimum edge threshold
  3. We then compute the forward and backward scores (after playing with the code, I have a better idea of how/why this works)
  4. Add each node's forward and backward scores together (multiplying each respective score by a lambda beforehand), and append the result to a list along with the associated node
  5. PacSum randomly shuffles the list to avoid any bias, sorts the list by the highest scores, and extracts the top K sentences from the shuffled/sorted list
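The five steps above can be sketched in Python roughly as follows. This is a rough reading of the linked extractor code, not a faithful copy: the similarity input `sim[i][j]` is assumed to be precomputed (the real PacSum derives it from BERT or TF-IDF representations), and the shuffle-then-sort tie-breaking mirrors step 5.

```python
import random

def pacsum_extract(sim, beta, lambda1, lambda2, k):
    """Sketch of the 5-step scoring. sim[i][j] is an assumed precomputed
    similarity for the sentence pair (i, j) with i < j."""
    n = len(sim)
    edges = [(i, j, sim[i][j]) for i in range(n) for j in range(i + 1, n)]
    weights = [w for _, _, w in edges]
    # Steps 1-2: threshold from the min/max edge weight and beta.
    lo, hi = min(weights), max(weights)
    threshold = lo + beta * (hi - lo)
    # Step 3: for each edge (i, j), the forward score of the later sentence j
    # and the backward score of the earlier sentence i accumulate the
    # thresholded edge weight.
    forward = [0.0] * n
    backward = [0.0] * n
    for i, j, w in edges:
        forward[j] += w - threshold
        backward[i] += w - threshold
    # Step 4: combine the two scores per node, each scaled by a lambda.
    paired = [(lambda1 * forward[i] + lambda2 * backward[i], i) for i in range(n)]
    # Step 5: shuffle to avoid bias among ties, sort by score, take top k.
    random.shuffle(paired)
    paired.sort(key=lambda p: p[0], reverse=True)
    return sorted(i for _, i in paired[:k])
```

With three sentences where sentence 1 is strongly similar to both neighbours, the extractor picks sentence 1 for k = 1, which matches the intuition that central sentences score highest.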

This will be relatively easy to implement in Python; my concern would be grabbing all the sentence nodes from the associated documents using Cypher.
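For that concern, one way to fetch a document's sentences from Python is via the official neo4j driver. The sketch below is only a guess at what the final graph model would require: the labels, relation, and properties (`:Document`, `:HAS_SENTENCE`, `:Sentence`, `text`, `position`) are all hypothetical placeholders for whatever schema SentenceGraph ends up using.

```python
# Hypothetical Cypher for pulling a document's sentences in order; the
# labels/relations/properties here are assumptions, not the actual schema.
SENTENCES_QUERY = """
MATCH (d:Document {id: $doc_id})-[:HAS_SENTENCE]->(s:Sentence)
RETURN s.text AS text, s.position AS position
ORDER BY s.position
"""

def fetch_sentences(session, doc_id):
    # `session` would be a neo4j.Session from the official Python driver
    # (driver.session()); session.run binds $doc_id and streams records.
    result = session.run(SENTENCES_QUERY, doc_id=doc_id)
    return [record["text"] for record in result]
```

The extractor itself could then stay in Python, with Cypher used only for this retrieval step.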
