Proposal: Remove the concept of "reduction_features" #4488
Reduction features were always a bit of a hack to get around the few bits of information that didn't abide by the "reductions are wholly self-contained" idea. We should probably either fully commit to that concept and prohibit reductions from passing reduction-specific information between each other, or give up on the idea of self-contained reductions and formalize a side channel. We could always split reductions into two classes (self-contained vs. intertwined), but that just seems icky.
The side channel could be achieved by being stricter about what can be passed and formalizing the allowed things as supported "services" (via the Workspace). That is really just glorified global state, but with more accountability about who uses it and an interface that allows for runtime checks.
Maybe a sort of "broadcast" pattern: reductions could subscribe to specific reductions or pieces of data, and those are the only ones they would have access to. The owner of the data would be the only one that could modify it.
Yeah, that sounds good to me.
Reduction features were originally added (#2282) as a means to split out the contents of a label that are also required for prediction.

I think this goal has been unmet by `reduction_features` as they are today, and `reduction_features` have started to be used for other purposes because their general nature makes them attractive as a generic channel. I propose we remove the concept of `reduction_features`, and I propose a short-term and a long-term solution to both the current usages of `reduction_features` and the general requirement of label state and predict-only state.

The lack of explicit association of the fields in `reduction_features` means that you can only implicitly know which fields must be used to retrieve problem-relevant information. There is no explicit way to determine what is relevant, as there is for labels with `VW::label_type_t`. The fact that fields have been added which do not have an associated label type means that retroactively adding this is tricky.

It also makes user code that interacts with input examples more confusing, because there is both a label and potentially a `reduction_features` to edit.

`reduction_features` are not well exposed in bindings despite being critical for describing some existing inputs.

Analysis of current usage
There are 6 fields in the `reduction_features` as of VW 9.7.

`ccb_reduction_features`

This exists in `reduction_features`, but using it was never implemented. So it can be confusing to see this, but the label is still what is actually used.

`continuous_actions::reduction_features`
The chosen action and the entire PDF can be passed to CA; this information is used when predicting. I think it is unused in learn.

`simple_label_reduction_features`

The base and initial values are in this type and are used for prediction. They were removed from the label type, so there is no duplication.
`cb_explore_adf::greedy::reduction_features`

This contains the `epsilon` value for this example. It was added to support the epsilon decay reduction. The value is not actually related to the example; rather, `reduction_features` are used as a channel to communicate between reductions.

`large_action_space::las_reduction_features`
This contains several things and is used as a channel to communicate between reductions:
- Generated interactions are calculated near the base of the stack, and because of the work LAS does, it needs to know what the interactions are for the given input.
- The shared example is used to remove the shared features from an action.
- SquareCB's gamma is used to generate the prediction.
`cb_graph_feedback::reduction_features`

This contains the graph edges for a new enhancement to exploration. It is passed in by the user as input. I am unsure whether it is predict-only or also used in learning.
Proposal
The communication between reductions relies on tight coupling of producers and consumers in all of these situations.
There are two kinds of uses present: problem input information used during prediction, and inter-reduction communication.
For kind 1, as a stopgap measure, we can go back to putting this information in the label structure for the given label type.
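As a rough illustration of this stopgap (the field names are approximate and not meant to match the current headers exactly), the simple label case could look something like:

```cpp
// Hypothetical sketch of the "kind 1" stopgap: predict-time fields that
// currently live in simple_label_reduction_features move back into the
// label struct for the simple label type. Names are illustrative only.
struct simple_label
{
  float label = 0.f;
  float weight = 1.f;
  float initial = 0.f;  // predict-only state, formerly a reduction feature
  float base = 0.f;     // predict-only state, formerly a reduction feature
};
```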
For kind 2, one approach is to create generic publish/subscribe interfaces for this kind of information. This decouples the reductions that need to consume the information from the content itself that is published. It only really works for global information; I don't have a great suggestion for the shared example used by large action spaces. Potentially LAS should sit above shared_feature_merge if this info is necessary. We would need to clearly define the concurrency requirements of the pub/sub architecture, but allowing changes only on learn seems reasonable and allows safe concurrency without locking requirements.
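For concreteness, here is a minimal sketch of what a type-keyed publish/subscribe channel could look like, assuming it hangs off the Workspace; the class and topic names are hypothetical, not an existing VW interface:

```cpp
// Minimal pub/sub sketch (not an existing VW API). Topics are keyed by type;
// the owning reduction publishes, consumers get a read-only view.
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

class reduction_channel
{
public:
  // Called by the owning reduction (e.g. on learn) to update the shared value.
  template <typename T>
  void publish(const T& value)
  {
    _topics[std::type_index(typeid(T))] = std::make_shared<T>(value);
  }

  // Called by subscribing reductions; returns nullptr if nothing was published.
  template <typename T>
  std::shared_ptr<const T> subscribe() const
  {
    auto it = _topics.find(std::type_index(typeid(T)));
    if (it == _topics.end()) { return nullptr; }
    return std::static_pointer_cast<const T>(it->second);
  }

private:
  std::unordered_map<std::type_index, std::shared_ptr<void>> _topics;
};

// Example topic type: the per-example epsilon that epsilon decay currently
// smuggles through cb_explore_adf::greedy::reduction_features.
struct decayed_epsilon { float epsilon = 0.05f; };
```

Accountability then comes from topics being explicit types: only the reduction that owns a topic publishes it, and the setup phase could verify at runtime that every subscriber's required topics have a registered publisher.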
Splitting of label state and predict-only state
For kind 1, a longer-term solution is to allow the Example/MultiEx types to be expanded into problem-specific types that a reduction can specify. This seems to be a requirement we keep hitting up against. It would allow things like shared, slot, etc. to be expressed as first-class concepts in the "input type". One drawback is that parsers would need to be able to produce richer "input types", whereas they have managed to stay quite generic up until now.
In this world, predict would only need the "input type", and learn would require this in addition to the label.
For example, a contextual bandit input type might look something along the lines of:
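A purely illustrative sketch of that shape (none of these type names are existing VW types) might be:

```cpp
// Hypothetical "input type" for contextual bandits with action-dependent
// features: shared context and actions become first-class concepts instead
// of conventions layered over a multi_ex. Placeholder types only.
#include <cstdint>
#include <vector>

struct example;  // stand-in for the existing VW example type

struct cb_adf_input
{
  example* shared_context = nullptr;   // optional shared features
  std::vector<example*> actions;       // one example per action
};

// predict(cb_adf_input) needs only the input type; learn additionally needs
// the label: which action was chosen, its cost, and the logged probability.
struct cb_adf_label
{
  uint32_t chosen_action = 0;
  float cost = 0.f;
  float probability = 0.f;
};
```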