Is your feature request related to a problem? Please describe.
Per discussion in earlier meetings and emails, a "Data_Pack"-level attribute (primitive-like) direct access interface, one that does not go through classes, is preferred for batch-like/mass retrieval. We need a high-level design/considerations/requirements ticket for it: the interface will be exposed to users (like an API), so more discussion is needed. This ticket is also for organizing the sub-tasks identified during the requirement/design phase.
Describe the solution you'd like
This data_pack-level attribute (primitive-like) direct access interface will provide higher performance for some typical batch-like/mass retrieval scenarios, such as NLP pipelines (for example POS tagging and NER) using Forte. It also extends the capability to access attributes "as a range/batch" for one or more tid(s), or by a specific type, so that the data can be accessed without using classes (thus avoiding the related performance overhead).
Describe alternatives you've considered
Several overall designs were considered (including discussion around cached data in data_pack). Per recent discussion with Hector, it is now preferred to focus on the current data_store-related implementation to first provide a basic interface, and perhaps expand its capabilities later.
Some current method design/considerations and sub tasks
Use a specified list of attribute names and a type name to access the attributes/primitive-like data of the most frequently used data types in typical scenarios such as an NLP pipeline. (Name suggestion: get_attr_of_type, similar to the "get" method of data_pack but with attr_names: List[str] and an optional attr_ids list as parameters.)
Use a specified list of attribute names (or list of attr_id) and a tid (or list of tids) to "range-select" attributes for access. (Name suggestion: get_attr_data; as suggested, this combines the tid/tids methods and the attr_name and attr_id options into one method.)
The return format for attributes can be a dict, for easy access by attribute name (it can also be returned together with the entry, for compatibility/mixed-usage scenarios, which could be common).
Write access is very likely needed in addition to read access, to further boost performance in batch mode.
Demo python script
Documentation (in source code)
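To make the method considerations above easier to discuss, here is a minimal sketch of how the two suggested calls might look. Everything in it is an assumption for illustration: `ToyPack`, its internal dicts, and `set_attr_data` are hypothetical stand-ins, not Forte's DataPack or DataStore, and only the names get_attr_of_type and get_attr_data come from this ticket. The optional attr_ids parameter is left out.

```python
from typing import Any, Dict, Iterator, List, Union


class ToyPack:
    """Hypothetical stand-in for a data pack; NOT Forte's real storage layout."""

    def __init__(self) -> None:
        self._types: Dict[int, str] = {}             # tid -> type name
        self._attrs: Dict[int, Dict[str, Any]] = {}  # tid -> attribute dict

    def add(self, tid: int, type_name: str, **attrs: Any) -> None:
        self._types[tid] = type_name
        self._attrs[tid] = dict(attrs)

    def get_attr_of_type(
        self, type_name: str, attr_names: List[str]
    ) -> Iterator[Dict[str, Any]]:
        """Yield one plain dict per entry of the given type (no entry classes)."""
        for tid, attrs in self._attrs.items():
            if self._types[tid] == type_name:
                row: Dict[str, Any] = {"tid": tid}
                row.update({name: attrs.get(name) for name in attr_names})
                yield row

    def get_attr_data(
        self, tids: Union[int, List[int]], attr_names: List[str]
    ) -> Dict[int, Dict[str, Any]]:
        """Range-select attributes for a single tid or a list of tids."""
        if isinstance(tids, int):
            tids = [tids]
        return {
            tid: {name: self._attrs[tid].get(name) for name in attr_names}
            for tid in tids
        }

    def set_attr_data(self, tids: List[int], values: Dict[str, Any]) -> None:
        """Hypothetical batch write: set the same attribute values on many tids."""
        for tid in tids:
            self._attrs[tid].update(values)


pack = ToyPack()
pack.add(1, "Token", text="cat", pos="NOUN")
pack.add(2, "Token", text="sat", pos="VERB")
pack.add(3, "EntityMention", ner_type="ORG")

token_rows = list(pack.get_attr_of_type("Token", ["pos"]))
mention = pack.get_attr_data(3, ["ner_type"])
```

Returning plain dicts keyed by attribute name matches the return-format consideration above; whether the entry object should be returned alongside is still an open design question.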
Additional context
This is a higher-level interface for users, unlike the lower-level interfaces in Data_Store (which provide the related services).
Suggest adding a subtask for the data_store-level implementation. Which functions need to be implemented?
Suggest adding subtasks for a demo Python script and for documentation. These tasks are important: not only do they help future users, they also help us gauge how easy the interface is to use.
The get_data function could be a good starting point: it returns data in a primitive "dict" format and takes a "request" in its method interface. Extending this function would therefore achieve several of the goals you mentioned.
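As a rough illustration of the request-driven idea, the sketch below takes a request that maps a type name to the attribute names wanted and returns plain dicts of value lists. The function `get_data_sketch`, the request shape, and the flat `entries` list are assumptions made for this example; it is not the actual get_data implementation or signature.

```python
from typing import Any, Dict, List, Tuple

# Assumed request shape: type name -> list of attribute names to retrieve.
Request = Dict[str, List[str]]


def get_data_sketch(
    entries: List[Tuple[str, Dict[str, Any]]], request: Request
) -> Dict[str, Dict[str, List[Any]]]:
    """Collect the requested attributes per type as primitive lists."""
    result: Dict[str, Dict[str, List[Any]]] = {
        type_name: {name: [] for name in names}
        for type_name, names in request.items()
    }
    for type_name, attrs in entries:
        if type_name in request:
            for name in request[type_name]:
                result[type_name][name].append(attrs.get(name))
    return result


entries = [
    ("Token", {"text": "cat", "pos": "NOUN"}),
    ("EntityMention", {"ner_type": "ORG"}),
    ("Token", {"text": "sat", "pos": "VERB"}),
]
out = get_data_sketch(entries, {"Token": ["pos"]})
```

A request-style parameter keeps the method signature stable as new selection options are added, which may be one reason to extend an existing method rather than add several new ones.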
Hi Hector, for 1), the sub-tasks for data_store are in another design ticket, #922 (created for the lower-level service interfaces that are not exposed), to keep each ticket clearly focused, as these two layers could change independently. Please check it out (its method considerations have also been slightly improved per discussion).