feature: async imports from frontend in LOCAL mode #106

Merged: 28 commits merged on Nov 13, 2023

@ClemDoum ClemDoum commented Oct 11, 2023

TODOs

  • minimize the number of JS dependencies
  • try to avoid core client duplication in the extension
  • clean TODOs

PR description

This PR adds the Graph statistics and Import tasks sections to the neo4j widget. Previously, only an Export section was available in the widget.
The statistics section can be used to compare the number of entities found inside neo4j and to trigger an import (only available in LOCAL mode) if the graph is empty or outdated.

The following screenshot shows an empty neo4j graph:

Screenshot 2023-10-24 at 18 54 17

When the user sees that the graph is empty, they can click the Create graph button and monitor the graph import:

Screenshot 2023-10-24 at 18 54 27

The number of documents and entities found in the graph is then updated like this:

Screenshot 2023-10-24 at 18 59 42

Counting graph documents and named entities

The widget now displays the number of documents and named entities in the graph. In order to match the number of documents and named entities in Datashare (ES), some aggregation is performed on graph nodes. Indeed, the named entity graph model compresses the information available in ES, so there are fewer named entity nodes in the graph than documents in the ES index.
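As a rough illustration of why an aggregation is needed, the sketch below collapses several ES-side records into fewer graph nodes and then recovers the ES-side total by summing a per-node counter. The record layout, the `n_docs` field and the function names are hypothetical and are not taken from the actual graph model.

```python
from collections import Counter

# Hypothetical ES-side records: one entry per (document, mention, category).
es_mentions = [
    {"doc_id": "doc-1", "mention": "Jane Doe", "category": "PERSON"},
    {"doc_id": "doc-2", "mention": "Jane Doe", "category": "PERSON"},
    {"doc_id": "doc-2", "mention": "Paris", "category": "LOCATION"},
]


def compress(mentions):
    """Collapse records into one node per (mention, category), keeping a count per node."""
    counts = Counter((m["mention"], m["category"]) for m in mentions)
    return [
        {"mention": mention, "category": category, "n_docs": n}
        for (mention, category), n in counts.items()
    ]


def count_named_entities(nodes):
    """Recover the ES-side total by summing the counts stored on the nodes."""
    return sum(node["n_docs"] for node in nodes)


nodes = compress(es_mentions)
print(len(nodes), count_named_entities(nodes))
```

Counting nodes alone would under-report the entities compared to ES, which is why the displayed figure has to come from an aggregate rather than a plain node count.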

Changes

plugins/neo4j-dump-widget

Added

  • created a Neo4jGraphCount.vue component to display the counts of documents and entities found in neo4j
  • created a Neo4jGraphImport.vue component to monitor full import tasks and trigger graph update
  • copied the EllipseStatus.vue component from datashare-client in order to display task progress
  • copied humanDate utils from the main client to the extension to nicely display dates
  • copied namedEntities utils from the main client to the extension to nicely display named entities FA icons
  • copied other utils needed by the two previous ones from the main client to the extension
  • added actions to Neo4jModule.js to refresh the list of initialized projects and project graph counts

Changed

  • updated Neo4jGraphCount.vue to trigger the initialization of the current project (creation of the neo4j DB, migration of the DB, and so on)
  • updated Neo4jModule.js to poll running imports
  • integrated graph count and graph import components in WidgetNeo4jDump.vue
  • improved handling of extension backend HTTP errors
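The polling of running imports mentioned above could look roughly like the loop below. It is written in Python for consistency with the backend sketches in this description, whereas the widget itself is JavaScript; `poll_imports`, `fetch_tasks` and the status names are assumptions made for the example.

```python
import time
from typing import Callable, Dict, List

# Hypothetical status names for tasks that are not finished yet.
RUNNING_STATUSES = {"QUEUED", "RUNNING"}


def poll_imports(
    fetch_tasks: Callable[[], List[Dict[str, str]]],
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Fetch import tasks repeatedly until none is running or max_polls is reached."""
    tasks = fetch_tasks()
    for _ in range(max_polls - 1):
        if not any(t["status"] in RUNNING_STATUSES for t in tasks):
            break
        sleep(interval_s)
        tasks = fetch_tasks()
    return tasks
```

Passing `sleep` as a parameter keeps the loop testable without waiting, and `max_polls` bounds the polling if a task never leaves the running state.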

neo4j-app

Added

  • added a GET /graphs/count?project= API route to retrieve the number of entities and documents in the graph
  • added a POST /tasks/search?project= API route to search for tasks, optionally filtering by task type and statuses
  • added the full_import async task to the worker; the task upserts all documents and named entities from ES to neo4j
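The optional filtering performed by the task search route can be pictured as follows. This is an in-memory sketch only: the `Task` fields, the status names and the `search_tasks` helper are hypothetical, and the real route's request body is not shown in this PR description.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Task:
    # Hypothetical minimal task representation.
    id: str
    type: str
    status: str


def search_tasks(
    tasks: Iterable[Task],
    task_type: Optional[str] = None,
    statuses: Optional[Iterable[str]] = None,
) -> List[Task]:
    """Return tasks matching the optional type and status filters (None means no filter)."""
    wanted = set(statuses) if statuses is not None else None
    return [
        t
        for t in tasks
        if (task_type is None or t.type == task_type)
        and (wanted is None or t.status in wanted)
    ]


all_tasks = [
    Task("t1", "full_import", "RUNNING"),
    Task("t2", "full_import", "DONE"),
    Task("t3", "other_task", "RUNNING"),
]
running_imports = search_tasks(all_tasks, task_type="full_import", statuses=["RUNNING"])
```

With both filters left as `None`, the helper returns every task, which mirrors the "optionally filtering" wording of the route.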

Fixed

  • periodically refresh the projects list when polling neo4j

src

Added

  • added a GET /api/neo4j/graphs/count?project= API route to retrieve the number of entities and documents in the graph
  • added a POST /api/neo4j/full-imports?project= API route to trigger full import. The route is restricted to the LOCAL usage
  • added a GET /api/neo4j/full-imports?project= API route to list full import tasks

Changed

  • improved the GET /ping route to check underlying component availability
  • renamed _lifespan_worker_pool into lifespan_worker_pool
  • defaulted tasks inputs to an empty dict when missing
  • raise a WorkerCancelled error rather than a RuntimeError when a worker is killed on purpose
  • return tasks by decreasing creation date
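Three of the changes above (empty-dict default for task inputs, a dedicated cancellation error, newest-first ordering) can be sketched together. Only the `WorkerCancelled` name comes from this PR; its base class, the `Task` fields and the helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


class WorkerCancelled(Exception):
    """Assumed shape: signals an intentional worker shutdown, distinct from RuntimeError."""


@dataclass
class Task:
    id: str
    created_at: datetime
    # Missing inputs default to a fresh empty dict for each task.
    inputs: Dict[str, Any] = field(default_factory=dict)


def newest_first(tasks: List[Task]) -> List[Task]:
    """Sort tasks by decreasing creation date."""
    return sorted(tasks, key=lambda t: t.created_at, reverse=True)


def run_worker(cancelled: bool) -> str:
    """Hypothetical worker entry point that reports an intentional stop."""
    if cancelled:
        raise WorkerCancelled("worker was cancelled on purpose")
    return "done"


old = Task("t1", datetime(2023, 10, 11, tzinfo=timezone.utc))
new = Task("t2", datetime(2023, 11, 13, tzinfo=timezone.utc), inputs={"project": "demo"})
ordered = newest_first([old, new])
```

A dedicated exception type lets callers tell an intentional shutdown from a genuine failure with a plain `except WorkerCancelled` clause.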

Fixed

  • fixed GET /config to correctly return neo4j support
  • move registry DB creation to lifetime injection dependencies
  • improved error handling in ESClientABC.to_neo4j when concurrently polling ES (errors from concurrent tasks were silent and not reraised)
  • fixed count of ES pages to poll in order to correctly log progress
  • fixed the name of the email header field coming from DS ("emailHeaderField" vs. "emailHeader") and added the "tika_metadata_dc_creator" tag to the list of sending tags
  • avoid logging in DEBUG for ES
  • fixed dependency injection to reraise errors in entering and exiting dependencies
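Two of the fixes above, the page count used for progress logging and the reraising of errors from concurrent polling, can be illustrated with plain asyncio. This is a sketch under assumptions and not the code of ESClientABC.to_neo4j: `n_pages`, `fetch_page` and `poll_all` are hypothetical names, and the failure on page 2 is simulated.

```python
import asyncio
import math
from typing import List


def n_pages(total_hits: int, page_size: int) -> int:
    """Number of pages to poll, rounding up so a partial last page is counted."""
    return math.ceil(total_hits / page_size)


async def fetch_page(page: int) -> List[str]:
    """Stand-in for one ES page request; page 2 fails on purpose."""
    await asyncio.sleep(0)
    if page == 2:
        raise ValueError(f"failed to fetch page {page}")
    return [f"hit-{page}"]


async def poll_all(pages: int) -> List[str]:
    """Poll pages concurrently and surface the first error instead of swallowing it."""
    tasks = [asyncio.create_task(fetch_page(p)) for p in range(pages)]
    try:
        results = await asyncio.gather(*tasks)
    except Exception:
        # Stop the remaining work, wait for it to settle, then reraise.
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        raise
    return [hit for page in results for hit in page]
```

Awaiting the tasks through `asyncio.gather` without `return_exceptions=True` is what makes a failure in any page propagate to the caller; tasks that are created and never awaited would only log their error when garbage collected.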

@ClemDoum ClemDoum changed the base branch from main to feature/task-creation-in-java October 11, 2023 11:06
@ClemDoum ClemDoum changed the title Feature/graph import feature: async imports Oct 11, 2023
@ClemDoum ClemDoum changed the title feature: async imports feature: async imports from frontend in LOCAL mode Oct 11, 2023
@ClemDoum ClemDoum force-pushed the feature/task-creation-in-java branch from bfa7d1b to abc7395 Compare October 11, 2023 11:14
@ClemDoum ClemDoum force-pushed the feature/graph-import branch 2 times, most recently from e49691b to 93283e9 Compare October 12, 2023 10:12
@ClemDoum ClemDoum force-pushed the feature/task-creation-in-java branch from abc7395 to 7095123 Compare October 17, 2023 11:04
@ClemDoum ClemDoum force-pushed the feature/graph-import branch from 379c664 to 91e3b8c Compare October 17, 2023 12:10
@ClemDoum ClemDoum force-pushed the feature/task-creation-in-java branch 2 times, most recently from ac79b72 to 9eeed1f Compare October 17, 2023 15:16
@ClemDoum ClemDoum force-pushed the feature/graph-import branch 2 times, most recently from e934960 to f2497a3 Compare October 18, 2023 12:00
@ClemDoum ClemDoum force-pushed the feature/task-creation-in-java branch 2 times, most recently from aa5338f to d230d9f Compare October 19, 2023 15:00
@ClemDoum ClemDoum force-pushed the feature/graph-import branch from 492e6e3 to 627d2a2 Compare October 19, 2023 15:08
@ClemDoum ClemDoum force-pushed the feature/task-creation-in-java branch from d230d9f to f7adb3f Compare October 20, 2023 14:23
@ClemDoum ClemDoum force-pushed the feature/graph-import branch 4 times, most recently from 365601e to 4632ec9 Compare October 23, 2023 16:13
@ClemDoum ClemDoum marked this pull request as ready for review October 24, 2023 17:11
@ClemDoum ClemDoum requested a review from a team as a code owner October 24, 2023 17:11
@ClemDoum ClemDoum self-assigned this Oct 25, 2023
@ClemDoum ClemDoum force-pushed the feature/task-creation-in-java branch from 6f61000 to 83d7af2 Compare October 30, 2023 16:37
@ClemDoum ClemDoum force-pushed the feature/graph-import branch from 9245e77 to e4d9c56 Compare October 31, 2023 17:31
@ClemDoum ClemDoum merged commit d670fa4 into main Nov 13, 2023
3 checks passed
@ClemDoum ClemDoum deleted the feature/graph-import branch November 13, 2023 09:08
@ClemDoum

Self-merging this one as it is pretty monolithic and hard to review; I will submit later PRs for proper review.
