Data_Pack level attribute(/"primitive-like") direct access interface (for user) design/requirements/considerations #924

Open
6 tasks
J007X opened this issue Mar 9, 2023 · 2 comments · May be fixed by #926

Comments

@J007X
Collaborator

J007X commented Mar 9, 2023

Is your feature request related to a problem? Please describe.
Per discussion in earlier meetings and emails, a "Data_Pack"-level attribute ("primitive-like") direct-access interface that does not go through classes is preferred for batch-like/mass retrieval. Because this interface will be exposed to users (like an API), it needs more discussion, so this ticket captures the high-level design, considerations, and requirements. It also organizes the subtasks identified during the requirement/design phase.

Describe the solution you'd like
This data_pack-level attribute (primitive-like) direct-access interface is intended to provide higher performance for typical batch-like/mass retrieval scenarios, such as NLP pipelines (for example POS tagging and NER) built with Forte. It also adds the ability to access attributes as a range/batch for one or more tids, or by a specific type, so that the data can be accessed without using classes, avoiding the related performance overhead.

Describe alternatives you've considered
Several overall designs were considered, including one discussed around cached data in data_pack. Per a recent discussion with Hector, the preferred approach is now to focus on the current data_store-related implementation, provide a basic interface first, and possibly expand its capabilities later.

Some current method design/considerations and sub tasks

  • Access attributes/primitive-like data by a specified list of attribute names plus a type name, for the most frequently used data types in typical scenarios such as NLP pipelines. (Name suggestion: get_attr_of_type, similar to the "get" method of data_pack but adding attr_names: List[str] and an optional attr_ids list as parameters.)
  • "Range-select" attributes by a specified list of attribute names (or list of attr_id) plus a tid (or list of tids). (Name suggestion: get_attr_data; as suggested, this combines the tid/tids variants and the attr_name and attr_id options into one method.)
  • The return format for attributes can be a dict, for easy access by attribute name. It can also be returned together with the entry, for compatibility/mixed-usage scenarios, which could be common.
  • Write access will very likely be needed in addition to read access, to further boost performance in batch mode.
  • Demo Python script
  • Documentation (in source code)
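
To make the proposed shape easier to discuss, here is a minimal sketch of the two read methods and a batch write. It is only an illustration under stated assumptions: ToyPack is a hypothetical in-memory stand-in, not the Forte DataPack or DataStore, and set_attr_data, add_entry, and the exact parameters are made up for this example. The optional attr_ids parameter is left out to keep it short.

```python
from typing import Any, Dict, Iterator, List, Tuple, Union


class ToyPack:
    """Hypothetical stand-in for a data pack; NOT the real Forte DataPack."""

    def __init__(self) -> None:
        # tid -> (type name, attribute dict); a deliberate simplification.
        self._entries: Dict[int, Tuple[str, Dict[str, Any]]] = {}
        self._next_tid = 0

    def add_entry(self, type_name: str, **attrs: Any) -> int:
        """Store an entry as plain data and return its tid."""
        tid = self._next_tid
        self._next_tid += 1
        self._entries[tid] = (type_name, dict(attrs))
        return tid

    def get_attr_of_type(
        self, type_name: str, attr_names: List[str]
    ) -> Iterator[Dict[str, Any]]:
        """Yield one dict per entry of the given type, keyed by attribute name."""
        for tid, (entry_type, attrs) in self._entries.items():
            if entry_type == type_name:
                yield {"tid": tid, **{n: attrs.get(n) for n in attr_names}}

    def get_attr_data(
        self, tids: Union[int, List[int]], attr_names: List[str]
    ) -> Dict[int, Dict[str, Any]]:
        """Return the requested attributes for a single tid or a list of tids."""
        if isinstance(tids, int):
            tids = [tids]
        return {
            tid: {n: self._entries[tid][1].get(n) for n in attr_names}
            for tid in tids
        }

    def set_attr_data(self, values: Dict[int, Dict[str, Any]]) -> None:
        """Batch write: update attributes in place, keyed by tid."""
        for tid, updates in values.items():
            self._entries[tid][1].update(updates)


pack = ToyPack()
t0 = pack.add_entry("Token", text="Forte", pos="NNP")
t1 = pack.add_entry("Token", text="runs", pos="VBZ")
pack.add_entry("Sentence", text="Forte runs")

# Mass retrieval by type, with no entry objects created.
rows = list(pack.get_attr_of_type("Token", ["text", "pos"]))

# Range selection by tid(s), then a batch write.
by_tid = pack.get_attr_data([t0, t1], ["pos"])
pack.set_attr_data({t1: {"pos": "VB"}})
```

Returning plain dicts keyed by attribute name keeps the caller free of class construction. Whether the tid should be included in each row, or the entry returned alongside, is one of the open questions listed above.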

Additional context
This is a higher-level interface for users, unlike the lower-level interfaces in Data_Store, which provide the related services.

@hunterhector
Member

Some suggestions:

  1. I suggest adding a subtask for the data_store-level implementation. What functions need to be implemented?
  2. I suggest adding subtasks for a demo Python script and for documentation. These tasks are important: they not only help future users, they also help us assess how easy the interface is to use.
  3. The get_data function could be a good starting point: it returns data in primitive "dict" format and takes a "request" in its method interface, so extending it would let you achieve several of the goals you mentioned.
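
As a rough illustration of point 3, the sketch below shows what a request-driven, dict-returning retrieval could look like. It is a toy under stated assumptions: get_data_sketch, ENTRIES, and the request shape (type name mapped to a list of attribute names) are hypothetical and are not claimed to match the real get_data signature.

```python
from typing import Any, Dict, Iterator, List

# Toy records (hypothetical): type name, character span, and attributes.
ENTRIES: List[Dict[str, Any]] = [
    {"type": "Sentence", "begin": 0, "end": 10, "attrs": {}},
    {"type": "Token", "begin": 0, "end": 5, "attrs": {"pos": "NNP"}},
    {"type": "Token", "begin": 6, "end": 10, "attrs": {"pos": "VBZ"}},
    {"type": "Sentence", "begin": 11, "end": 15, "attrs": {}},
    {"type": "Token", "begin": 11, "end": 15, "attrs": {"pos": "UH"}},
]


def get_data_sketch(
    entries: List[Dict[str, Any]],
    context_type: str,
    request: Dict[str, List[str]],
) -> Iterator[Dict[str, Any]]:
    """Yield one plain dict per context entry, filled according to `request`."""
    for ctx in entries:
        if ctx["type"] != context_type:
            continue
        item: Dict[str, Any] = {"context_span": (ctx["begin"], ctx["end"])}
        for type_name, fields in request.items():
            # Entries of the requested type fully covered by the context span.
            covered = [
                e for e in entries
                if e["type"] == type_name
                and ctx["begin"] <= e["begin"]
                and e["end"] <= ctx["end"]
            ]
            # One list per requested attribute, in entry order.
            item[type_name] = {
                f: [e["attrs"].get(f) for e in covered] for f in fields
            }
        yield item


# Ask for the "pos" attribute of every Token, grouped by Sentence.
batches = list(get_data_sketch(ENTRIES, "Sentence", {"Token": ["pos"]}))
```

The request dict is what makes this extensible: adding attribute names or more types to it changes what comes back without adding new methods.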

@J007X
Collaborator Author

J007X commented Mar 9, 2023 via email

J007X linked a pull request Mar 22, 2023 that will close this issue