_is_annotation_tid() in data_store exceptions throwing causing (significant) slowing down in typical usage scenarios (such as NLP) #923

Closed
J007X opened this issue Mar 1, 2023 · 1 comment · Fixed by #925

J007X commented Mar 1, 2023

Describe the bug
During routine profiling of performance in typical usage scenarios, a significant slowdown was detected after a recent code change. Analysis identified `_is_annotation_tid()` in `data_store` as the method currently consuming a disproportionate share of the run time. Further debugging suggests two contributing factors: recent underlying changes alter how the dictionaries it queries (populated by the methods affected by those changes) are looked up, and newly added exception-handling code also affects performance.
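A hypothetical sketch of the pattern described above, showing why exception handling inside a frequently called lookup can matter. None of these names come from `data_store`: `TID_TO_ENTRY_TYPE`, `ANNOTATION_TYPES`, and both functions are invented for illustration. The first variant raises and catches `KeyError` for every unknown tid; the second uses `dict.get`, so a miss costs one hash lookup and no exception.

```python
from typing import Dict, Set

# Hypothetical lookup table: tid -> entry type name (not the real data_store layout).
TID_TO_ENTRY_TYPE: Dict[int, str] = {1: "Token", 2: "Sentence", 3: "Link"}
# Hypothetical set of entry type names treated as annotations.
ANNOTATION_TYPES: Set[str] = {"Token", "Sentence"}


def is_annotation_tid_eafp(tid: int) -> bool:
    """Exception-driven variant: every unknown tid raises and catches KeyError."""
    try:
        entry_type = TID_TO_ENTRY_TYPE[tid]
    except KeyError:
        return False
    return entry_type in ANNOTATION_TYPES


def is_annotation_tid_get(tid: int) -> bool:
    """dict.get variant: an unknown tid yields None without raising."""
    entry_type = TID_TO_ENTRY_TYPE.get(tid)
    return entry_type is not None and entry_type in ANNOTATION_TYPES


print([is_annotation_tid_get(t) for t in (1, 3, 99)])  # [True, False, False]
```

Both variants return the same answers. In CPython the raise/catch path of the first is typically far more expensive than the miss path of the second, which becomes noticeable when the check runs once per entry across a large data pack.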

To Reproduce
Steps to reproduce the behavior:

  1. Use standard profiling test code, such as the code in Profiling new data pack speed #805 (a standard pipeline for NLTK-based POS tagging and NER processing).
  2. Run the profiling test in PyCharm.
  3. Compare the performance, along with the cProfile report and the call graph it generates.
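For reproducing step 3 outside PyCharm, a minimal harness using the standard-library `cProfile` and `pstats` modules is sketched below. `run_pipeline()` and `hot_lookup()` are hypothetical placeholders; the actual profiling code from #805 is not reproduced here, so substitute the real pipeline call.

```python
import cProfile
import io
import pstats


def hot_lookup(n: int) -> int:
    """Hypothetical stand-in for a hot method that misses a dict lookup half the time."""
    table = {i: i for i in range(0, n, 2)}  # only even keys are present
    hits = 0
    for i in range(n):
        try:
            table[i]  # raises KeyError for odd keys
            hits += 1
        except KeyError:
            pass
    return hits


def run_pipeline() -> int:
    """Hypothetical stand-in for running the POS/NER pipeline."""
    return hot_lookup(100_000)


def profile_report(top: int = 10) -> str:
    """Profile run_pipeline() and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_pipeline()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

Sorting by cumulative time surfaces the callers whose subtrees dominate, which is how a method like `_is_annotation_tid()` shows up as "hot" in the report.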

Expected behavior
`_is_annotation_tid()` is identified by cProfile as "hot" (consuming significant time). Further debugging also shows that it throws excessive exceptions (more than in the previous version).
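The issue does not include the debugging code used to observe the exceptions. One possible approach, sketched here as an assumption rather than the code actually used, is to install a `sys.settrace` hook and tally `"exception"` events per function. `count_exceptions()` and `lookup_all()` are hypothetical names.

```python
import sys
from collections import Counter
from types import FrameType
from typing import Any, Callable


def count_exceptions(func: Callable[[], Any]) -> Counter:
    """Run func() under a trace hook and count 'exception' events per function."""
    counts: Counter = Counter()

    def local_trace(frame: FrameType, event: str, arg: Any):
        if event == "exception":
            exc_type = arg[0]  # arg is (type, value, traceback)
            counts[f"{frame.f_code.co_name}:{exc_type.__name__}"] += 1
        return local_trace  # keep tracing this frame

    def global_trace(frame: FrameType, event: str, arg: Any):
        return local_trace  # trace every newly entered frame

    previous = sys.gettrace()  # restore any existing tracer (e.g. a debugger)
    sys.settrace(global_trace)
    try:
        func()
    finally:
        sys.settrace(previous)
    return counts


def lookup_all() -> None:
    """Hypothetical workload: three of four lookups miss and raise KeyError."""
    table = {1: "a"}
    for key in (1, 2, 3, 4):
        try:
            table[key]
        except KeyError:
            pass


print(count_exceptions(lookup_all))  # Counter({'lookup_all:KeyError': 3})
```

Note that an exception propagating through several frames produces one event per frame, so the totals count frames touched, not only raise sites. Tracing also slows execution considerably, so it is suited to counting, not timing.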

Environment (please complete the following information):

  • OS: All
  • Version: current code base (0.3, snapshot of Feb. 06 or Feb. 14)
  • Python and package versions: 3.8

Additional context
(Currently investigating) recent underlying code changes related to tids, entries, and the code that populates the related dictionaries.

@J007X J007X self-assigned this Mar 1, 2023
@J007X J007X changed the title _is_annotation_tid() in data_store throwing (significantly) more exceptions (than before) and causing slowing down in typical usage scenarios _is_annotation_tid() in data_store exceptions throwing causing (significant) slowing down in typical usage scenarios (such as NLP) Mar 13, 2023

J007X commented Mar 13, 2023

I adjusted the title and description slightly to reflect the latest investigation results. After adding some tracing/debugging code, it appears that the slowdown is caused by the newly added exception-throwing code (the Call Graph marked the related methods "green" because they are system methods). Dictionary access does differ from before, but those changes are small and do not account for this significant change in performance.
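To check the conclusion that raising exceptions, not the dictionary access itself, dominates the cost, a micro-benchmark with the standard-library `timeit` module can compare a lookup loop that raises on every miss against one that tests membership first. This is a hypothetical sketch; `TABLE`, `KEYS`, and the function names are illustrative and unrelated to the real `data_store` code.

```python
import timeit
from typing import Dict

# Hypothetical table containing only even keys, so half of all lookups miss.
TABLE: Dict[int, int] = {i: i for i in range(0, 1000, 2)}
KEYS = range(1000)


def count_hits_with_exceptions() -> int:
    """Counts hits by raising and catching KeyError on each miss."""
    hits = 0
    for k in KEYS:
        try:
            TABLE[k]
            hits += 1
        except KeyError:
            pass
    return hits


def count_hits_with_membership() -> int:
    """Counts hits with an `in` check, so misses never raise."""
    hits = 0
    for k in KEYS:
        if k in TABLE:
            hits += 1
    return hits


def compare(number: int = 200) -> Dict[str, float]:
    """Returns total seconds for `number` runs of each variant."""
    return {
        "exceptions": timeit.timeit(count_hits_with_exceptions, number=number),
        "membership": timeit.timeit(count_hits_with_membership, number=number),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f}s")
```

Both loops perform one hash lookup per key, so any gap between the two timings is attributable to raising and catching `KeyError` on the misses. Absolute numbers vary by machine and Python version.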
