Geo Vec Model

We introduce a novel document representation learning model, Geometric Document Vectors (Geo-Vec). Inspired by recent developments in geometric deep learning, our model encodes documents as graphs and treats an entire corpus as the result of a latent document topology manifold. Using a modified graph auto-encoder (GAE), our approach propagates complex word relations through shared weights, creating a semantically rich latent space. An attention module serves as a topic filter that compresses the learned embeddings. We compare our model to several classic document representation learning models on an information retrieval task and show that Geo-Vec performs on par with or better than them. The shared weights of the model depend only on the vocabulary, which enables training on very large corpora. Additionally, inference on unseen documents can be done efficiently with a single forward pass.
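To make the idea of vocabulary-sized shared weights and single-pass inference more concrete, here is a minimal sketch in plain Python. It is not the Geo-Vec implementation: the co-occurrence window, the standard GCN-style normalization, the single layer, the mean pooling, and all function names (`build_adjacency`, `normalize`, `embed_document`) are assumptions made for illustration. The attention module and the decoder of the auto-encoder are omitted, and the weights are random rather than trained.

```python
import math
import random


def build_adjacency(tokens, vocab, window=2):
    """Symmetric word co-occurrence graph over the shared vocabulary (assumed graph construction)."""
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    ids = [vocab[t] for t in tokens if t in vocab]
    for i, u in enumerate(ids):
        # Connect each word to the next `window` words in the document.
        for v in ids[i + 1 : i + 1 + window]:
            if u != v:
                adj[u][v] = 1.0
                adj[v][u] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN layer."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def embed_document(tokens, vocab, weights, window=2):
    """One forward pass: ReLU(A_norm @ W), mean-pooled over the words present in the document."""
    a_norm = normalize(build_adjacency(tokens, vocab, window))
    n, d = len(vocab), len(weights[0])
    # With one-hot node features (X = I), the layer reduces to A_norm @ W.
    hidden = [
        [max(0.0, sum(a_norm[i][k] * weights[k][j] for k in range(n))) for j in range(d)]
        for i in range(n)
    ]
    present = sorted({vocab[t] for t in tokens if t in vocab})
    if not present:
        return [0.0] * d
    return [sum(hidden[i][j] for i in present) / len(present) for j in range(d)]


# Toy vocabulary and untrained shared weights of shape |V| x d (hypothetical values).
vocab = {w: i for i, w in enumerate(["graph", "document", "vector", "model", "corpus"])}
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in vocab]

# Embedding an unseen document only needs its graph and the shared weights.
doc_embedding = embed_document("graph model document vector".split(), vocab, weights)
print(doc_embedding)
```

In this sketch the weight matrix has one row per vocabulary entry, so its size is independent of the number of documents, and embedding a new document only requires building its graph and applying the same weights once.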
