Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: redact subgraph errors #6344

Open
wants to merge 6 commits into
base: dev
Choose a base branch
from

Conversation

Samjin
Copy link

@Samjin Samjin commented Nov 27, 2024

This breaking change is to update include_subgraph_errors to redact_subgraph_errors with additional blacklist/whitelist capabilities.
Note that test cases are not fully updated yet. I'd like to get input from Apollo team regarding the config proposal.

Reasons we need this:

  1. Some subgraphs may want to pass non-vulnerable data to client via error extensions.
  2. Some clients depend on certain data in error extensions to render UI. Although they shouldn't do it, migrating away from it would take time, especially from natives. While waiting their migration, it allows the original error message and selected keys to be redacted or propagated per subgraph.

Configs

  1. This is a breaking change as long as user defines include_subgraph_errors in router.yaml. However, it shouldn't affect default behavior as the new change will redact subgraph error message and error extensions by default.

Before

include_subgraph_errors:
  all: false

After

redact_subgraph_errors:
  all: true # Will redact error's message and extensions
  1. Additionally, redact specific error extension keys for all subgraphs
redact_subgraph_errors:
    all: true # Will redact message.
    redact_extensions_keys:
      - "service"
      - "exception"
  1. Redact specific subgraphs only
redact_subgraph_errors: 
    all: false # Will not redact error message and extensions.
    redact_extensions_keys:
      - "service"
      - "exception"
    subgraphs:
      product:
        enabled: true # will redact message and keys based on default config. 
      reviews:
        - "service" # Will not redact these keys. Other keys from default config and error message will be redacted.
  1. Redact for all subgraphs, but with subgraph override.
redact_subgraph_errors:
    all: true # Redact all subgraph's error messages. 
    redact_extensions_keys:
      - "service"
      - "exception"
    subgraphs:
      product:
        enabled: false # Will not redact anything.
      reviews:
        - "service" # Will not redact these keys. Other keys from default config and error message will be redacted.

There is no issue number yet.


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@Samjin Samjin requested review from a team as code owners November 27, 2024 04:17
@apollo-cla
Copy link

@Samjin: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Nov 27, 2024

✅ Docs Preview Ready

No new or changed pages found.

@router-perf
Copy link

router-perf bot commented Nov 27, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@Samjin Samjin changed the title WIP: Redact subgraph errors feat: redact subgraph errors Nov 27, 2024
@bnjjj
Copy link
Contributor

bnjjj commented Nov 27, 2024

Hey @Samjin I didn't take a look at the code yet but I already have some questions reading the description. It looks like you're renaming an existing feature. Is it backward compatible with the existing one ? We can't introduce breaking changes so I just would like to know if you introduced a migration or if you thoughts about backward compatibility and non breaking changes. It would also be great to have a comparison in PR's description to know how it was before and how it would be with your changes (config wise)

@Samjin
Copy link
Author

Samjin commented Nov 27, 2024

Hey @Samjin I didn't take a look at the code yet but I already have some questions reading the description. It looks like you're renaming an existing feature. Is it backward compatible with the existing one ? We can't introduce breaking changes so I just would like to know if you introduced a migration or if you thoughts about backward compatibility and non breaking changes. It would also be great to have a comparison in PR's description to know how it was before and how it would be with your changes (config wise)

I updated the description. In short, it's a breaking change. The intention to rename the plugin is to make subgraph's override clear.

However we could also update to something like this if it has to be backward compatible.

include_subgraph_errors: 
  all: false
  redact_extensions_keys:
    - "service"
    - "exception"
  subgraphs: 
    product: true # propagate error
    reviews:
       - "service"

I do find include_subgraph_errors harder to use when setting overrides for subgraphs as I often mistakenly set false in subgraph to turn off redaction, but it ends up to be the opposite. But this is subjective, not sure who else shares the same experience 😁 .

@Samjin
Copy link
Author

Samjin commented Nov 27, 2024

Sorry, I just realized this maybe not a breaking change since it won't allow user to build if they don't update their explicit config. Also, the default behavior is the same as include_subgraph_errors if user never defined it.

If this is correct, I can update the description later with more migration steps.

@bnjjj
Copy link
Contributor

bnjjj commented Nov 28, 2024

@Samjin Not being able to run the router after upgrading the version without changing the configuration is considered as a breaking change for us

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants