Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic caching in Mapper for better performance #17586

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

scofalik
Copy link
Contributor

@scofalik scofalik commented Dec 3, 2024

Suggested merge commit message (convention)

Other (engine): Introduced dynamic caching in Mapper in order to improve view-to-model mapping performance, and as a result improve editor load and data save performance. Closes #17623.

Other (engine): New parameter data was added for change:children event fired by ViewElement and ViewDocumentFragment when their children change. data is an object with index property, which says at which index the change happened. Related to #17623.

Other (engine): Mapper#registerViewToModelLength() is now deprecated and will be removed in one of upcoming releases. Note: if this method is used, the caching mechanism for Mapper will be turned off which may degrade performance when handing big documents. Note: this method is used by the deprecated legacy lists feature. See #17623.

MINOR BREAKING CHANGE (engine): Mapper#registerViewToModelLength() is now deprecated and will be removed in one of the upcoming releases. This method is useful only in obscure and complex converters, where model element, or a group of model elements, are represented very differently in the view. We believe that every feature that uses custom view-to-model length callback can be rewritten in a way that this mechanism is no longer necessary. Note: if this method is used, the caching mechanism for Mapper will be turned off which may degrade performance when handing big documents. Note: this method is used by the deprecated legacy lists feature.

…prove view-to-model mapping performance, and as a result improve editor load and data save performance.
@scofalik
Copy link
Contributor Author

scofalik commented Dec 3, 2024

This PR introduces caching in Mapper which greatly improves performance. Before, we only kept bindings between model elements and view elements, and whenever we wanted to map model position to view, we started in the mapped view element and traversed it until we reach the target model offset. This caused a lot of unnecessary traversals. In this PR, after a node is traversed for the first time, its view position and model offset is cached. Next time Mapper is asked about a model offset, it can use the cache to quickly retrieve the view position.

I included a lot of explanations in the API docs and inline comments, so go through the PR to learn more.

My main assumptions were:

  • Keep the whole logic in one area.
  • Don't force developers to learn/remember about a new mechanism (it should work automagically).

I decided to implement it fully inside Mapper.

Other options I considered were:

  • Make it inside view, i.e. ViewNode, ViewElement, etc. In a similar fashion as with recent model offset evaluation performance improvement. I didn't like it because:
    • View in theory should not care that model exists.
    • View is used in upcast, where it is not mapped, and nothing related to mapping/model is necessary there.
    • Some view elements woud need to know that they are mapped (because we store the cache only for mapped elements).
    • Even though data would be stored in view, creating this data would be inside Mapper, or view would somehow need to know what are views model lengths.
    • Advantage would be that maybe the implementation would be easier to understand. But the way it is now, it is easier to turn off and maybe even to debug, as all is in one place.
  • Make it inside DowncastWriter.
    • Again, changes would need to be introduced both in Writer and Mapper.
    • Changes would need to happen in all Writer methods, which seemed more complex and more prone to errors to me, especially at the beginning.

I decided that making it directly in Mapper makes most sense.


Possible improvement that occured to me at later point, and I didn't try it. Maybe it would be easier to understand and simpler to implement if each traversed node kept total modelOffset so far. Maybe for each mapped element we could keep such flat array of descendants instead of a bit abstract structures that we have so far. It seems that many of the mechanics would be the same, just the structure of MappingCache and CacheItem would be different, possibly simpler. I might it quite abstract in this implementation.

packages/ckeditor5-engine/src/view/documentfragment.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/view/node.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
…rCache` are always "top-most" possible positions for given model offset.

Delete entires from `MapperCache#_nodeToCacheListIndex`.
@scofalik
Copy link
Contributor Author

scofalik commented Dec 9, 2024

Ready for re-review.

@scofalik scofalik requested a review from niegowski December 9, 2024 11:42
@scofalik
Copy link
Contributor Author

scofalik commented Dec 9, 2024

Okay I didn't notice TS failed in other places -_-.

@scofalik
Copy link
Contributor Author

scofalik commented Dec 9, 2024

I reverted some of the changes as they were breaking changes. This is the best I could do without running into more issues.

niegowski
niegowski previously approved these changes Dec 17, 2024
packages/ckeditor5-engine/src/conversion/mapper.ts Outdated Show resolved Hide resolved
@scofalik
Copy link
Contributor Author

Looks like there's a memory leak in the mapper cache. Consecutive editor.getData() calls make the memory snapshot grow with time. I cannot reproduce the same on master.

@scofalik
Copy link
Contributor Author

scofalik commented Dec 18, 2024

I fixed the error in API docs and also fixed the memory leak problem. We cannot be sure if stopTracking() will be called for tracked view elements, hence WeakMaps are used, but since these are WeakMaps, we can't iterate over them to call stopTracking(). (E.g. assuming that Mapper#clearBindings() would call something like MapperCache#clear().)

Another way to go would be to use Maps instead of WeakMaps and try to make sure that stopTracking() and clear() are called when needed, but if possible, I think that using WeakMaps is better, cause it just works.

Unfortunately, this.listenTo() retained view elements that were no longer necessary and should be garbage collected. Hence, I changed it to .on() and now memory is correctly freed.

The problem is that I needed to assign the class methods to properties, so that we can call .off(). If you have a better idea how to do it in a more elegant way, please propose.

On the side note, I am not sure if we need .stopTracking() at all. As I wrote, view elements are not retained by MapperCache, (after .on() is used), so we don't need to actually call .stopTracking() for memory management purposes.

But I am afraid that it may be necessary to prevent some errors? OTOH, it's very unlikely. We stop tracking after view element is unbound in mapper, so the view element at this point is no longer used, most probably. In theory something might "happen" inside it, but probably not, and even if, then MapperCache don't look at Mapper, it has all information it needs to operate. And as we see, we don't even have any pointers to these elements anyway after the conversion is executed. So maybe .stopTracking() is not actually needed and we don't need these property-functions to call .off().

@scofalik
Copy link
Contributor Author

scofalik commented Dec 18, 2024

On the other note, I can see that memory grows anyway, around 0.2 MB after every editor.getData() call. Looks like at least some of that is related to tables/slots. I am afraid that it may grow out of hand after, let's say, an hour of working over a big document. I will create an issue for that. #17668

Fixes in code style and API docs.
@scofalik scofalik requested a review from niegowski December 19, 2024 14:40
niegowski
niegowski previously approved these changes Dec 19, 2024
@@ -893,16 +894,20 @@ export class MapperCache extends /* #__PURE__ */ EmitterMixin() {
// `Mapper#getModelLength()` are implemented so that parents are cached before their children.
//
// So, don't create new cache if one already exists. Instead, only save `_nodeToCacheListIndex` value for the related view node.
if ( viewOffset > 0 ) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it comes to removing this if, I figured out that the way save() is used, we should never pass viewOffset == 0 here. Running all unit tests confirmed this. There were specific unit tests for mapper caching which passed viewOffset = 0 here, but I've changed them as it was an incorrect use of the method.

@scofalik scofalik requested a review from niegowski January 17, 2025 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce caching mechanism in Mapper to improve performance
3 participants