fix(core): add detection and recovery for missing mutation events #7576

Merged: 4 commits into next from sdx-1652 on Oct 7, 2024

Conversation

bjoerge
Member

@bjoerge bjoerge commented Oct 3, 2024

Description

In very rare cases, the /listen endpoint may drop mutation events. This PR implements detection of "holes" in the received mutation events, and implements error recovery by restarting the connection.

What to review

Are the threshold values sensible? Error detection and recovery is now triggered by the following thresholds.

  • DEFAULT_MAX_BUFFER_SIZE=20: If we see more than 20 mutation events that can't be applied in order, we treat it as an error.
  • DEFAULT_DEADLINE_MS=30_000: If 30 seconds pass since we last received a message that can't be applied in order, we treat it as an error.
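
A minimal sketch of how the two thresholds above could drive hole detection. This is not the code from this PR: the `MutationEvent` shape (`previousRev`/`resultRev`), the `MutationSequencer` class, and its method names are assumptions made for illustration. Timestamps are passed in explicitly so the logic can be exercised without timers.

```typescript
// Hypothetical event shape; the field names are assumptions for illustration.
interface MutationEvent {
  previousRev: string // revision this mutation applies on top of
  resultRev: string // revision the document has after applying it
}

const DEFAULT_MAX_BUFFER_SIZE = 20
const DEFAULT_DEADLINE_MS = 30_000

type ReceiveResult =
  | {status: 'ok'; applied: MutationEvent[]}
  | {status: 'error'; reason: 'max-buffer-exceeded'}

// Hypothetical helper: buffers events that cannot be applied in order yet.
class MutationSequencer {
  private headRev: string
  private maxBufferSize: number
  private deadlineMs: number
  private pending: MutationEvent[] = []
  private lastOutOfOrderAt: number | null = null

  constructor(
    headRev: string,
    maxBufferSize: number = DEFAULT_MAX_BUFFER_SIZE,
    deadlineMs: number = DEFAULT_DEADLINE_MS,
  ) {
    this.headRev = headRev
    this.maxBufferSize = maxBufferSize
    this.deadlineMs = deadlineMs
  }

  receive(event: MutationEvent, now: number): ReceiveResult {
    this.pending.push(event)
    const applied = this.drain()
    if (this.pending.length === 0) {
      // No hole left: clear the deadline.
      this.lastOutOfOrderAt = null
    } else if (applied.length === 0) {
      // The new event could not be applied in order: restart the deadline.
      this.lastOutOfOrderAt = now
    }
    if (this.pending.length > this.maxBufferSize) {
      return {status: 'error', reason: 'max-buffer-exceeded'}
    }
    return {status: 'ok', applied}
  }

  // True once deadlineMs has passed since the last out-of-order event.
  isDeadlineExceeded(now: number): boolean {
    return this.lastOutOfOrderAt !== null && now - this.lastOutOfOrderAt >= this.deadlineMs
  }

  // Applies every buffered event that chains onto the current head revision.
  private drain(): MutationEvent[] {
    const applied: MutationEvent[] = []
    let next = this.pending.find((e) => e.previousRev === this.headRev)
    while (next !== undefined) {
      const current = next
      applied.push(current)
      this.headRev = current.resultRev
      this.pending = this.pending.filter((e) => e !== current)
      next = this.pending.find((e) => e.previousRev === this.headRev)
    }
    return applied
  }
}
```

In this sketch, a caller would treat an `error` result or a `true` from `isDeadlineExceeded` as the signal to restart the connection and start over with a fresh sequencer.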

Testing

This PR includes unit tests and has also undergone extensive manual testing.

Notes for release

  • Fixes an issue that could in rare cases lead to an outdated version of the document being displayed locally

@bjoerge bjoerge requested a review from ricokahler October 3, 2024 14:37

Contributor

github-actions bot commented Oct 3, 2024

No changes to documentation

Contributor

github-actions bot commented Oct 3, 2024

Component Testing Report Updated Oct 7, 2024 3:50 PM (UTC)

✅ All Tests Passed

| File | Status | Duration | Passed | Skipped | Failed |
| --- | --- | --- | --- | --- | --- |
| comments/CommentInput.spec.tsx | ✅ Passed | 45s | 15 | 0 | 0 |
| formBuilder/ArrayInput.spec.tsx | ✅ Passed | 8s | 3 | 0 | 0 |
| formBuilder/inputs/PortableText/Annotations.spec.tsx | ✅ Passed | 30s | 6 | 0 | 0 |
| formBuilder/inputs/PortableText/copyPaste/CopyPaste.spec.tsx | ✅ Passed | 39s | 11 | 7 | 0 |
| formBuilder/inputs/PortableText/copyPaste/CopyPasteFields.spec.tsx | ✅ Passed | 0s | 0 | 12 | 0 |
| formBuilder/inputs/PortableText/Decorators.spec.tsx | ✅ Passed | 17s | 6 | 0 | 0 |
| formBuilder/inputs/PortableText/DisableFocusAndUnset.spec.tsx | ✅ Passed | 11s | 3 | 0 | 0 |
| formBuilder/inputs/PortableText/DragAndDrop.spec.tsx | ✅ Passed | 3m 0s | 0 | 0 | 0 |
| formBuilder/inputs/PortableText/FocusTracking.spec.tsx | ✅ Passed | 46s | 15 | 0 | 0 |
| formBuilder/inputs/PortableText/Input.spec.tsx | ✅ Passed | 1m 39s | 21 | 0 | 0 |
| formBuilder/inputs/PortableText/ObjectBlock.spec.tsx | ✅ Passed | 1m 16s | 18 | 0 | 0 |
| formBuilder/inputs/PortableText/PresenceCursors.spec.tsx | ✅ Passed | 9s | 3 | 9 | 0 |
| formBuilder/inputs/PortableText/RangeDecoration.spec.tsx | ✅ Passed | 26s | 9 | 0 | 0 |
| formBuilder/inputs/PortableText/Styles.spec.tsx | ✅ Passed | 18s | 6 | 0 | 0 |
| formBuilder/inputs/PortableText/Toolbar.spec.tsx | ✅ Passed | 35s | 12 | 0 | 0 |
| formBuilder/tree-editing/TreeEditing.spec.tsx | ✅ Passed | 0s | 0 | 3 | 0 |
| formBuilder/tree-editing/TreeEditingNestedObjects.spec.tsx | ✅ Passed | 0s | 0 | 3 | 0 |

Contributor

github-actions bot commented Oct 3, 2024

⚡️ Editor Performance Report

Updated Mon, 07 Oct 2024 16:02:37 GMT

| Benchmark | reference (latency of sanity@latest) | experiment (latency of this branch) | Δ (%) latency difference |
| --- | --- | --- | --- |
| article (title) | 17.1 efps (59ms) | 17.1 efps (59ms) | +0ms (-/-%) |
| article (body) | 56.2 efps (18ms) | 54.8 efps (18ms) | +0ms (+2.5%) |
| article (string inside object) | 18.5 efps (54ms) | 17.9 efps (56ms) | +2ms (+3.7%) |
| article (string inside array) | 14.5 efps (69ms) | 13.9 efps (72ms) | +3ms (+4.3%) |
| recipe (name) | 29.4 efps (34ms) | 27.8 efps (36ms) | +2ms (+5.9%) |
| recipe (description) | 33.9 efps (30ms) | 32.3 efps (31ms) | +2ms (+5.1%) |
| recipe (instructions) | 99.9+ efps (7ms) | 99.9+ efps (7ms) | +0ms (-/-%) |
| synthetic (title) | 14.3 efps (70ms) | 14.5 efps (69ms) | -1ms (-1.4%) |
| synthetic (string inside object) | 14.5 efps (69ms) | 14.7 efps (68ms) | -1ms (-1.4%) |

efps — editor "frames per second". The number of updates assumed to be possible within a second.

Derived from input latency. efps = 1000 / input_latency
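
As a quick illustration of the formula, here is a small sketch; the function name `efps` and the millisecond unit of its argument are assumptions taken from how the report labels its values.

```typescript
// efps = 1000 / input_latency, with the latency given in milliseconds.
function efps(inputLatencyMs: number): number {
  if (inputLatencyMs <= 0) {
    throw new RangeError('input latency must be positive')
  }
  return 1000 / inputLatencyMs
}

// A 50ms input latency corresponds to 20 editor "frames" per second.
const example = efps(50) // 20
```

Note that plugging a displayed latency into the formula will not always reproduce the displayed efps exactly (59ms gives about 16.9, while the report shows 17.1), presumably because the report computes efps from the unrounded latency.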

Detailed information

🏠 Reference result

The performance result of sanity@latest

| Benchmark | latency | p75 | p90 | p99 | blocking time | test duration |
| --- | --- | --- | --- | --- | --- | --- |
| article (title) | 59ms | 67ms | 85ms | 194ms | 1272ms | 14.9s |
| article (body) | 18ms | 20ms | 23ms | 142ms | 279ms | 5.9s |
| article (string inside object) | 54ms | 57ms | 62ms | 155ms | 968ms | 8.9s |
| article (string inside array) | 69ms | 73ms | 85ms | 203ms | 1851ms | 10.3s |
| recipe (name) | 34ms | 36ms | 40ms | 88ms | 102ms | 9.3s |
| recipe (description) | 30ms | 32ms | 36ms | 54ms | 40ms | 6.4s |
| recipe (instructions) | 7ms | 9ms | 10ms | 23ms | 0ms | 3.4s |
| synthetic (title) | 70ms | 75ms | 88ms | 625ms | 2979ms | 18.4s |
| synthetic (string inside object) | 69ms | 75ms | 84ms | 312ms | 2562ms | 11.1s |

🧪 Experiment result

The performance result of this branch

| Benchmark | latency | p75 | p90 | p99 | blocking time | test duration |
| --- | --- | --- | --- | --- | --- | --- |
| article (title) | 59ms | 65ms | 73ms | 187ms | 1263ms | 14.0s |
| article (body) | 18ms | 24ms | 40ms | 146ms | 212ms | 6.4s |
| article (string inside object) | 56ms | 58ms | 66ms | 174ms | 1053ms | 8.9s |
| article (string inside array) | 72ms | 76ms | 94ms | 223ms | 2089ms | 10.7s |
| recipe (name) | 36ms | 39ms | 44ms | 98ms | 97ms | 9.4s |
| recipe (description) | 31ms | 35ms | 57ms | 123ms | 93ms | 6.8s |
| recipe (instructions) | 7ms | 9ms | 10ms | 50ms | 0ms | 3.4s |
| synthetic (title) | 69ms | 73ms | 92ms | 356ms | 2280ms | 16.3s |
| synthetic (string inside object) | 68ms | 72ms | 89ms | 497ms | 2214ms | 10.5s |

📚 Glossary

column definitions

  • benchmark: the name of the test, e.g. "article", followed by the label of the field being measured, e.g. "(title)".
  • latency: the time between a key press and the moment the result was rendered, derived from a set of samples. The median (p50) is shown as the typical latency.
  • p75: the 75th percentile of the input latency in the test run. 75% of the sampled inputs in this benchmark were processed faster than this value. This indicates the upper range of typical performance.
  • p90: the 90th percentile of the input latency in the test run. 90% of the sampled inputs were faster than this. This metric helps identify slower interactions that occurred less frequently during the benchmark.
  • p99: the 99th percentile of the input latency in the test run. Only 1% of sampled inputs were slower than this. This represents the worst cases encountered during the benchmark and is useful for identifying performance outliers.
  • blocking time: the total time during which the main thread was blocked, preventing user input and UI updates. This metric helps identify bottlenecks that may make the interface feel unresponsive.
  • test duration: how long the test run took to complete.
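
To make the percentile columns concrete, here is a minimal sketch using the nearest-rank method. The report does not say which percentile method the benchmark uses, so the method and the `percentile` helper are assumptions for illustration only.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('at least one sample is required')
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]')
  }
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p * sorted.length) / 100)
  const value = sorted[rank - 1]
  if (value === undefined) {
    throw new Error('rank out of bounds')
  }
  return value
}

// Ten hypothetical latency samples in milliseconds.
const latencies = [10, 1, 4, 7, 2, 9, 3, 8, 5, 6]
const median = percentile(latencies, 50) // 5
const p90 = percentile(latencies, 90) // 9
```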

Contributor

@ricokahler ricokahler left a comment

looks good!

}
}

const DEFAULT_MAX_BUFFER_SIZE = 10

may be worth mentioning the hard coded value in buffered document in a comment above

Contributor

@ricokahler ricokahler left a comment

👩‍🍳💋

@bjoerge bjoerge added this pull request to the merge queue Oct 7, 2024
Merged via the queue into next with commit 8195c96 Oct 7, 2024
63 checks passed
@bjoerge bjoerge deleted the sdx-1652 branch October 7, 2024 22:30
github-merge-queue bot pushed a commit that referenced this pull request Oct 8, 2024
…ror` (#7595)

### Description

The following builds off of #7576. This PR:

- Propagates the `PairListenerOptions` through the document-store
- Adds telemetry for when the `onSyncErrorRecovery` fires.
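
A rough sketch of the propagation idea described above. Only the names `PairListenerOptions` and `onSyncErrorRecovery` appear in this PR text; every field, signature, and helper below (`DocumentStoreOptions`, `createDocumentStore`, `createPairListener`, `TelemetryLog`) is a hypothetical stand-in and not the actual Sanity API.

```typescript
// Hypothetical shapes: only the names PairListenerOptions and
// onSyncErrorRecovery come from the PR text; the rest is assumed.
interface PairListenerOptions {
  onSyncErrorRecovery?: (error: Error) => void
}

interface DocumentStoreOptions {
  pairListenerOptions?: PairListenerOptions
}

// Stand-in for a telemetry logger; not the real telemetry API.
type TelemetryLog = (eventName: string, data: Record<string, unknown>) => void

// Hypothetical listener that reports when it recovers from a sync error.
function createPairListener(options: PairListenerOptions = {}) {
  return {
    reportSyncError(error: Error): void {
      options.onSyncErrorRecovery?.(error)
    },
  }
}

// Options are set once on the store and forwarded to every listener it
// creates, instead of being passed to each individual method.
function createDocumentStore(options: DocumentStoreOptions = {}) {
  return {
    listen: () => createPairListener(options.pairListenerOptions),
  }
}

// Wires a telemetry logger into the store options.
function withSyncErrorTelemetry(log: TelemetryLog): DocumentStoreOptions {
  return {
    pairListenerOptions: {
      onSyncErrorRecovery: (error) => log('sync-error-recovery', {message: error.message}),
    },
  }
}
```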

### What to review

- Did I cover all APIs that need to propagate this? Seemed best to add
it to the `document-store` options instead of each individual method
- Am I logging the telemetry event correctly?

### Testing

- There were no existing tests for the document-store so I just manually
tested that this fired
- I also went through all the types and ensured that the option was
propagated correctly

### Notes for release

N/A - this one builds off of #7576 but is mostly internal changes

---------

Co-authored-by: Bjørge Næss <bjoerge@gmail.com>

2 participants