Skip to content

Latest commit

 

History

History
3 lines (2 loc) · 948 Bytes

README.md

File metadata and controls

3 lines (2 loc) · 948 Bytes

Geo Vec Model

We introduce a novel document representation learning model, Geometric Document Vectors (Geo-Vec). Inspired by recent developments in geometric deep learning our model encodes documents as graphs and treats an entire corpus as the result of a latent document topology manifold. Using a modified graph auto-encoder (GAE), our approach successfully propagates complex word relations utilizing the shared weights, thus creating a semantically rich latent space. An attention module is included, that serves as a topic filter to compress learned embeddings. We compare our model to several classic document representation learning models on an information retrieval task, and show that Geo-Vec performs on par or outperforms. The shared weights of the model only depend on the vocabulary and can thus enables training on very large corpora. Additionally, inference on unseen documents can be done efficiently by a simple forward pass.