Proposal: Remove the concept of "reduction_features" #4488
Reduction features were always a bit of a hack to get around the few bits of information that didn't abide by the "reductions are wholly self-contained" idea. We should probably either fully commit to that concept and prohibit reductions from passing reduction-specific information between each other, or give up on the idea of self-contained reductions and formalize a side channel. We could always split reductions into two classes (self-contained vs. intertwined), but that just seems icky.
The side channel could be achieved by being stricter about what can be passed and formalizing the allowed things as supported "services" (via the Workspace). That is really just glorified global state, but with more accountability about who uses it and an interface that allows for runtime checks.
Maybe a sort of "broadcast" pattern: reductions could subscribe to specific reductions or pieces of data, and those are the only ones they would have access to. The owner of the data would be the only one that could modify it.
Yeah, that sounds good to me.
Reduction features were originally added (#2282) as a means to split out the contents of a label that are also required for prediction.

I think this goal has been unmet by `reduction_features` as they are today, and `reduction_features` have started to be used for other purposes because their general nature makes them attractive as a generic channel. I propose we remove the concept of `reduction_features`, and I propose a short-term and a long-term solution to both the current usages of `reduction_features` and the general requirement of label state and predict-only state.

The lack of explicit association of the fields in `reduction_features` means that you can only implicitly know which fields must be used to retrieve problem-relevant information. There is no explicit way to determine what is relevant, as there is for labels with `VW::label_type_t`. The fact that fields have been added which do not have an associated label type means that retroactively adding this is tricky.

It also makes user code that interacts with input examples more confusing, because there is both a label and potentially a `reduction_features` to edit.

`reduction_features` are not well exposed in bindings despite being critical for describing some existing inputs.

Analysis of current usage
There are 6 fields in the `reduction_features` as of VW 9.7.

`ccb_reduction_features`

This exists in `reduction_features`, but using it was never implemented. So it can be confusing to see this, but the label is still what is actually used.

`continuous_actions::reduction_features`
The chosen action and the entire PDF can be passed to CA; this information is used when predicting. I think it is unused in learn.

`simple_label_reduction_features`

The base and initial values are in this type and are used for prediction. They were removed from the label type, so there is no duplication.
`cb_explore_adf::greedy::reduction_features`

This contains the `epsilon` value for this example. It was added to support the epsilon decay reduction. The value is not actually related to the example; rather, `reduction_features` are used as a channel to communicate between reductions.

`large_action_space::las_reduction_features`
This contains several things and is used as a channel to communicate between reductions:
- Generated interactions are calculated near the base of the stack, and because of the work LAS does, it needs to know what the interactions are for the given input.
- The shared example is used to remove the shared features from an action.
- SquareCB's gamma is used to generate the prediction.
`cb_graph_feedback::reduction_features`

This contains the graph edges for a new enhancement to exploration. It is passed in by the user as input. I am unsure whether it is predict-only or also used in learning.
Proposal
The communication between reductions relies on tight coupling of producers and consumers in all of these situations.
There are two kinds of uses present: problem input information used during prediction, and inter-reduction communication.
For kind 1, as a stopgap measure, we can go back to putting this information in the label structure for the given label type.
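As a rough illustration of this stopgap (the field names are approximate and not meant to match the current headers exactly), the simple label case could look something like:

```cpp
// Hypothetical sketch of the "kind 1" stopgap: predict-time fields that
// currently live in simple_label_reduction_features move back into the
// label struct for the simple label type. Names are illustrative only.
struct simple_label
{
  float label = 0.f;
  float weight = 1.f;
  float initial = 0.f;  // predict-only state, formerly a reduction feature
  float base = 0.f;     // predict-only state, formerly a reduction feature
};
```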
For kind 2, one approach is to create generic publish/subscribe interfaces for this kind of information. This decouples the reductions that need to consume the information from the content itself that is published. It only really works for global information; I don't have a great suggestion for the shared example used by large action spaces. Potentially LAS should sit above shared_feature_merge if this info is necessary. We would need to clearly define the concurrency requirements of the pub/sub architecture, but allowing changes only on learn seems reasonable and allows safe concurrency without locking requirements.
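For concreteness, here is a minimal sketch of what a type-keyed publish/subscribe channel could look like, assuming it hangs off the Workspace; the class and topic names are hypothetical, not an existing VW interface:

```cpp
// Minimal pub/sub sketch (not an existing VW API). Topics are keyed by type;
// the owning reduction publishes, consumers get a read-only view.
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

class reduction_channel
{
public:
  // Called by the owning reduction (e.g. on learn) to update the shared value.
  template <typename T>
  void publish(const T& value)
  {
    _topics[std::type_index(typeid(T))] = std::make_shared<T>(value);
  }

  // Called by subscribing reductions; returns nullptr if nothing was published.
  template <typename T>
  std::shared_ptr<const T> subscribe() const
  {
    auto it = _topics.find(std::type_index(typeid(T)));
    if (it == _topics.end()) { return nullptr; }
    return std::static_pointer_cast<const T>(it->second);
  }

private:
  std::unordered_map<std::type_index, std::shared_ptr<void>> _topics;
};

// Example topic type: the per-example epsilon that epsilon decay currently
// smuggles through cb_explore_adf::greedy::reduction_features.
struct decayed_epsilon { float epsilon = 0.05f; };
```

Accountability then comes from topics being explicit types: only the reduction that owns a topic publishes it, and the setup phase could verify at runtime that every subscriber's required topics have a registered publisher.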
Splitting of label state and predict-only state
For kind 1, a longer-term solution is to allow the Example/MultiEx types to be expanded into problem-specific types that a reduction can specify. This seems to be a requirement we keep hitting up against. It would allow things like shared, slot, etc. to be expressed as first-class concepts in the "input type". One drawback is that parsers would need to be able to produce richer "input types", whereas they have managed to stay quite generic up until now.
In this world, predict would only need the "input type", and learn would require this in addition to the label.
For example, a contextual bandit input type might look something along the lines of:
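A purely illustrative sketch of that shape (none of these type names are existing VW types) might be:

```cpp
// Hypothetical "input type" for contextual bandits with action-dependent
// features: shared context and actions become first-class concepts instead
// of conventions layered over a multi_ex. Placeholder types only.
#include <cstdint>
#include <vector>

struct example;  // stand-in for the existing VW example type

struct cb_adf_input
{
  example* shared_context = nullptr;   // optional shared features
  std::vector<example*> actions;       // one example per action
};

// predict(cb_adf_input) needs only the input type; learn additionally needs
// the label: which action was chosen, its cost, and the logged probability.
struct cb_adf_label
{
  uint32_t chosen_action = 0;
  float cost = 0.f;
  float probability = 0.f;
};
```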