Knowledge Hub is a semantic search application for enterprise knowledge, built on an enterprise knowledge graph. It consists of two parts:
- KHub Builder - a pipeline for the automatic construction of knowledge graphs, based on the Apache Jena framework
- KHub Explorer - an overlay search application for interacting with Apache Fuseki via full-text search
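
To illustrate what a full-text search against Fuseki can look like, here is a minimal sketch that builds a SPARQL query using the Jena text index property function `text:query` and prepares an HTTP request for it. It is not taken from KHub Explorer: the class name, the dataset URL `http://localhost:3030/khub/sparql` and the use of `rdfs:label` as the indexed property are assumptions made for illustration only.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class FullTextQuerySketch {

    // Escape backslashes and single quotes so the search term stays inside
    // a single-quoted SPARQL string literal. Lucene special characters are
    // passed through unchanged.
    public static String escapeLiteral(String text) {
        return text.replace("\\", "\\\\").replace("'", "\\'");
    }

    // Build a SELECT query that matches resources through the Jena text index.
    // Assumption: rdfs:label is the indexed property; the real index
    // configuration lives in the Fuseki assembler file.
    public static String buildQuery(String searchTerm, int limit) {
        return String.join("\n",
            "PREFIX text: <http://jena.apache.org/text#>",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "SELECT ?resource ?label WHERE {",
            "  ?resource text:query (rdfs:label '" + escapeLiteral(searchTerm) + "' " + limit + ") ;",
            "            rdfs:label ?label .",
            "}");
    }

    // Prepare (but do not send) a SPARQL protocol request that posts the
    // query directly as the request body.
    public static HttpRequest buildRequest(String sparqlEndpoint, String query) {
        return HttpRequest.newBuilder(URI.create(sparqlEndpoint))
            .header("Content-Type", "application/sparql-query")
            .header("Accept", "application/sparql-results+json")
            .POST(HttpRequest.BodyPublishers.ofString(query))
            .build();
    }

    public static void main(String[] args) {
        String query = buildQuery("onboarding", 10);
        // Hypothetical local Fuseki dataset URL.
        HttpRequest request = buildRequest("http://localhost:3030/khub/sparql", query);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(query);
    }
}
```

The request is only constructed here, so the sketch runs without a Fuseki server. Sending it with `java.net.http.HttpClient` would return the matches as SPARQL JSON results.
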

Enterprise knowledge includes internal company data distributed across multiple platforms such as Atlassian Confluence, Microsoft SharePoint/Teams, Slack, and Notion. This data exists in the form of messages, wiki pages, and documents, and can be integrated into one common system for searching, analysis, and other purposes.

Requirements:
- Java SE 17 or higher
- Docker (Desktop or Engine) with Docker Engine 19.03.0 or higher

To set up and run Knowledge Hub:
- Create access token(s):
  - Confluence API token (see link). The token is tied to an account and has the form `me@example.com:my-api-token`. No token is required for crawling open Confluence groups; in that case, use `null` as `confluence.token`.
  - Microsoft Graph API token (see link). The app must be registered in Azure AD, and the user must be a member of every MS Teams team that should be processed. The following permissions are also required:
    - Team.ReadBasic.All
    - Channel.ReadBasic.All
    - ChannelMessage.Read.All
- Fill in `confluence.endpoint`, `confluence.token` and `teams.token` in the KHub Builder configuration file. For Confluence Cloud use `https://your-domain.atlassian.net/wiki/`, for Confluence Server use `http://your-domain.com/`; the application appends the rest of the endpoint, starting with `rest/api/...`. Use the tokens created in the previous step, or `null` to send crawling requests without an authentication header. Adjust other configuration settings if needed.
- (Optional) Define queries and mapping files for additional knowledge extraction; see the examples in Knowledge Graph Enriching.
- Build the Docker images manually with `docker-compose build` from the resources folder; otherwise some pipeline steps might fail or get stuck and will need to be rerun.
- Run the complete pipeline, or only individual steps, with `java -jar khub-builder.jar`; see additional commands with `--help`. After a successful run, several named graphs are constructed and saved in Jena TDB in the directory configured in `tdb.path`.
- Run `start-khub-explorer.bat` or `docker-compose up` in the explorer directory to start Jena.TextIndexer, Fuseki Server and KHub Explorer. If needed, the Fuseki assembler file and docker-compose.yaml can be modified. KHub Explorer then starts locally and allows full-text search on the constructed knowledge graph.
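
The endpoint and token conventions from the configuration step can be sketched as follows. This is an illustration, not KHub Builder's actual code: the class and method names are hypothetical, `rest/api/content` merely stands in for whichever REST path the application appends, and the use of HTTP Basic authentication (the usual scheme for Confluence Cloud API tokens of the form `email:api-token`) is an assumption about how the token is sent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

public class ConfluenceRequestSketch {

    // Join the configured endpoint and a REST path with exactly one slash
    // between them, whether or not the endpoint ends with "/".
    public static String resolveUrl(String endpoint, String restPath) {
        String base = endpoint.endsWith("/") ? endpoint : endpoint + "/";
        String path = restPath.startsWith("/") ? restPath.substring(1) : restPath;
        return base + path;
    }

    // Return the value of a Basic Authorization header for a token of the
    // form "me@example.com:my-api-token". A missing token (null, blank, or
    // the literal string "null") yields no header, which corresponds to
    // crawling without authentication.
    public static Optional<String> authorizationHeader(String token) {
        if (token == null || token.isBlank() || token.equals("null")) {
            return Optional.empty();
        }
        String encoded = Base64.getEncoder()
            .encodeToString(token.getBytes(StandardCharsets.UTF_8));
        return Optional.of("Basic " + encoded);
    }

    public static void main(String[] args) {
        // Confluence Cloud style endpoint; the path is illustrative only.
        System.out.println(resolveUrl("https://your-domain.atlassian.net/wiki/", "rest/api/content"));
        // Confluence Server style endpoint without a trailing slash.
        System.out.println(resolveUrl("http://your-domain.com", "rest/api/content"));
        System.out.println(authorizationHeader("me@example.com:my-api-token").orElse("(no header)"));
        System.out.println(authorizationHeader(null).orElse("(no header)"));
    }
}
```

Returning an `Optional` makes the unauthenticated case explicit: the caller adds an `Authorization` header only when a value is present.
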
Dependency | Version | License |
---|---|---|
Apache Maven | 3.9.0 | Apache 2.0 |
Apache Jena | 4.7.0 | Apache 2.0 |
Bootstrap | 5.2.3 | MIT |
Google Gson | 2.10.1 | Apache 2.0 |
Jsoup | 1.15.3 | MIT |
MongoDB | latest-image | License |
MongoDB Java Driver | 3.12.11 | Apache 2.0 |
React | 18.2.0 | MIT |
React Bootstrap | 2.7.2 | MIT |
React Scripts | 5.0.1 | MIT |
RML Mapper | latest-image | MIT |