We introduce Geometric Document Vectors (Geo-Vec), a novel document representation learning model. Inspired by recent developments in geometric deep learning, our model encodes each document as a graph and treats an entire corpus as samples from a latent document topology manifold. Using a modified graph auto-encoder (GAE), our approach propagates complex word relations through shared weights, producing a semantically rich latent space. An attention module serves as a topic filter that compresses the learned embeddings. We compare Geo-Vec to several classic document representation learning models on an information retrieval task, and show that it performs on par with or outperforms them. The shared weights of the model depend only on the vocabulary, which enables training on very large corpora. Additionally, inference on unseen documents can be done efficiently with a single forward pass.
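The core pipeline described above (encode a document's word graph with a shared-weight graph auto-encoder, then reconstruct its structure from the latent space) can be sketched as follows. This is a minimal NumPy illustration of a standard GAE forward pass, not the repository's actual implementation; the two-layer GCN encoder, the inner-product decoder, and all function names here are assumptions for illustration.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gae_encode(A, X, W1, W2):
    # Two-layer GCN encoder: Z = A_norm @ relu(A_norm @ X @ W1) @ W2
    # W1, W2 are shared across all documents, so their size depends
    # only on the vocabulary, not on the corpus size.
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)  # ReLU
    return A_norm @ H @ W2

def gae_decode(Z):
    # Inner-product decoder: reconstruct edge probabilities as sigmoid(Z Z^T)
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

# Toy word co-occurrence graph for one document (3 words in the vocabulary)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)                           # one-hot word features
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))        # shared weights (vocab -> hidden)
W2 = rng.standard_normal((4, 2))        # shared weights (hidden -> latent)

Z = gae_encode(A, X, W1, W2)            # latent word embeddings, shape (3, 2)
A_rec = gae_decode(Z)                   # reconstructed adjacency, shape (3, 3)
```

Because the weights are tied to the vocabulary rather than to any specific document, embedding an unseen document is just this forward pass with its own co-occurrence graph `A`.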
gverkes/Geo-Vec-Model
About
Implementation of the Geo-Vec model for embedding documents