Dynamic import maps #10528

Open
wants to merge 37 commits into
base: main

Conversation

yoavweiss
Contributor

@yoavweiss yoavweiss commented Jul 30, 2024

Introduction

Import maps in their current form provide multiple benefits to web developers. They let developers avoid cache invalidation cascades and work with more ergonomic bare module identifiers, mapping them to URLs conveniently without worrying about versions when importing.

At the same time, the current import map setup suffers from fragility. Only a single import map can be loaded per document, and it can only be loaded before any module script is loaded. Once a single module script is loaded, import maps are disallowed.
That creates a situation where developers have to think twice (or more) before using module scripts in situations that may introduce import maps further down in the document. It also means that using import maps can carry a risk unless you’re certain you can control all the module scripts loaded on the page.

Beyond that, the fact that import maps have to be loaded before any module means that the map itself acts as a blocking resource for any module functionality. Large SPAs that want to use modules have to download, ahead of time, the map of all potential modules they may need during the app’s lifetime.

So, it seems like there’s room for improvement. Enabling more dynamic import maps would allow developers to avoid these issues and fully benefit from import maps’ caching and ergonomic advantages without incurring a cost when it comes to stability or performance.

At the same time, the current static design gives us determinism and isn’t racy. A module identifier that resolves to a certain module will continue to do so throughout the lifetime of the document. It would be good to keep that characteristic.

Objectives

Goals

  • Increase robustness when using ES modules and import maps
  • Enable expanding the Window’s import map throughout its lifetime
  • Satisfy the ECMAScript HostLoadImportedModule requirement that multiple calls will always resolve to the same module
  • Minimize race conditions that can result in different module resolutions on different loading sequences of the same page.

Non-Goals

  • Provide a programmatic way to expand or modify the Window’s import map - out-of-scope for the current effort
  • Completely avoid race conditions that can result in different module resolutions based on network conditions
    • Such races are already possible today (e.g. if an import map is dynamically injected by a classic script which may or may not run before a module is loaded)
    • Specifically for dynamic modules, requiring this would conflict with the “expanding the import map over time” goal.

Use Cases

Third party scripts

When third party scripts integrate themselves into web pages today, they cannot do so as ES modules without taking on some risk. That risk varies somewhat, depending on their form of integration.

Injected without developer supervision

That could include third party scripts injected by the CDN, by a CMS or some other automated system that isn’t content-aware.

For such scripts to be loaded as ES modules, they have to make sure that they are not loaded before any import maps in the content.

They can do that by:

  • Loading at the bottom of the page, which may or may not correspond with the point in which they typically need to load in order to function optimally.
  • Buffering the content and validating that no import maps are present, which can incur performance penalties.

Developer-injected snippets

Snippet-based third parties need to provide instructions so that the developer is aware of any import maps in the page and only injects the snippet after them. That may or may not be a realistic thing to ask. It’d definitely increase the integration’s complexity, resulting in a higher percentage of failures or support calls.

Content Management Systems

Content management systems often have markup and code arriving from multiple different sources. Site owners, theme developers and application/extension/plug-in developers all take part in generating the final markup of the page delivered to the user, which often contains lots of scripts. Some of that code can be static, while other parts can vary per user.

If any of that code contains an import map, extreme caution needs to be taken when integrating all these different script entry points, if any of them is an ES module.

Browser Extensions

A similar problem exists with browser extensions, where if extension-injected code wants to use ES modules or import maps, it needs to verify ahead of time that it doesn’t collide with the content itself and where the code is added relative to the rest of the page.

Large Single-Page Apps

Serving hundreds to thousands of different modules is a reality for large SPAs. While bundling is used to reduce the loading-performance cost of modules, it doesn’t always make sense to bundle in later stages of the application lifetime: while bundling can reduce the weight of modules over the network (by improving compression ratios), it can also cause over-fetching and less-granular caching, which can result in frequent invalidations.

So apps end up with several thousands of modules that may load during the lifetime of the app, using dynamic import.
Using import maps can significantly help such apps avoid cache invalidation cascades, but it also presents a challenge.
An import map for such a site needs to include all the thousands of different modules it may import, and it needs to do that before any module loads. As such, the quite-large import map would be blocking any module-based functionality. That’s a significant performance tradeoff.

Usage examples

Two import maps with no conflicts

When two import maps that have no conflicting rules are being merged, and no resolved modules correspond to the rules defined, the resulting map would be a combination of the two maps.

So, the following existing and new import maps:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

{
  "imports": {
    "/app/main": "./main/index.mjs"
  }
}

Would be equivalent to the following single import map:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js",
    "/app/main": "./main/index.mjs"
  }
}

New import map defining an already-resolved specifier

When an import map tries to define a rule that would have resolved a module that was already resolved, that rule gets dropped from the import map.

So, if the resolved module set already contains the pair (null, "/app/helper"), the following new import map:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

Would be equivalent to the following one:

{
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}
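
As a rough illustration of the rule above, the following sketch filters a new map's top-level "imports" against specifiers that were already resolved. It is not the spec's merge algorithm: `dropResolvedRules` and `ruleMatches` are hypothetical names, the resolved module set is reduced to a plain array of specifier strings (referring scripts are ignored), and scopes are left out.

```javascript
// A rule key matches a specifier exactly, or as a prefix when the key ends in "/".
function ruleMatches(key, specifier) {
  return key === specifier || (key.endsWith("/") && specifier.startsWith(key));
}

// Returns a copy of newMap without rules that would affect resolved specifiers.
// Assumption: resolvedSpecifiers is an array of strings, not (script, specifier) pairs.
function dropResolvedRules(newMap, resolvedSpecifiers) {
  const imports = {};
  for (const [key, target] of Object.entries(newMap.imports ?? {})) {
    const alreadyResolved = resolvedSpecifiers.some((s) => ruleMatches(key, s));
    if (!alreadyResolved) {
      imports[key] = target;
    }
  }
  return { imports };
}

const filtered = dropResolvedRules(
  {
    imports: {
      "/app/helper": "./helper/index.mjs",
      lodash: "/node_modules/lodash-es/lodash.js",
    },
  },
  ["/app/helper"],
);
console.log(JSON.stringify(filtered));
// {"imports":{"lodash":"/node_modules/lodash-es/lodash.js"}}
```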

New import map defining an already-resolved specifier in a specific scope

The same is true for rules defined in specific scopes. If the resolved module set contains the pair ("/app/main.mjs", "/app/helper"), the following new import map:

{
  "scopes": {
    "/app/": {
      "/app/helper": "./helper/index.mjs"
    }
  },
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

Would similarly be equivalent to:

{
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

The script in the pair is the script object itself, rather than its URL, so these examples are somewhat simplistic in that regard.

Already-resolved specifier and multiple rules redefining it

We could also have cases where a single already-resolved specifier has multiple rules for its resolution, depending on the referring script. In such cases, only the most specific rule is dropped; the less specific rules are still added to the map.

For example, if the resolved module set contains the pair ("/app/main.mjs", "/app/helper"), the following new import map:

{
  "scopes": {
    "/app/": {
      "/app/helper": "./helper/index.mjs"
    }
  },
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js",
    "/app/helper": "./other_path/helper/index.mjs"
  }
}

Would similarly be equivalent to:

{
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js",
    "/app/helper": "./other_path/helper/index.mjs"
  }
}

This works because the merge algorithm operates on a copy of the resolved module set, and removes a (referring script, specifier) pair from that copy once the pair has caused a rule to be ignored.
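
The scoped examples can be approximated with the sketch below. It is a simplified reading of the prose, not the algorithm in the PR: `filterNewMap` and `matchesPrefixRule` are hypothetical names, referring scripts are modeled as URL strings (the real pair holds the script object), and scopes are visited longest prefix first as a stand-in for "most specific".

```javascript
// A key matches a value exactly, or as a prefix when the key ends in "/".
function matchesPrefixRule(key, value) {
  return key === value || (key.endsWith("/") && value.startsWith(key));
}

// resolvedModuleSet: array of { referrer: string | null, specifier: string }.
function filterNewMap(newMap, resolvedModuleSet) {
  // Work on a copy so that a pair can only cause one rule to be ignored.
  let remaining = [...resolvedModuleSet];
  const result = { imports: {} };

  // Scoped rules first, most specific (longest) scope prefix first.
  const prefixes = Object.keys(newMap.scopes ?? {}).sort((a, b) => b.length - a.length);
  for (const prefix of prefixes) {
    const kept = {};
    for (const [key, target] of Object.entries(newMap.scopes[prefix])) {
      const hits = remaining.filter(
        (pair) =>
          pair.referrer !== null &&
          matchesPrefixRule(prefix, pair.referrer) &&
          matchesPrefixRule(key, pair.specifier),
      );
      if (hits.length > 0) {
        // Rule ignored; the pairs that caused it are used up.
        remaining = remaining.filter((pair) => !hits.includes(pair));
      } else {
        kept[key] = target;
      }
    }
    if (Object.keys(kept).length > 0) {
      result.scopes ??= {};
      result.scopes[prefix] = kept;
    }
  }

  // Top-level rules apply to any referrer, so only the specifier is compared.
  for (const [key, target] of Object.entries(newMap.imports ?? {})) {
    const hits = remaining.filter((pair) => matchesPrefixRule(key, pair.specifier));
    if (hits.length > 0) {
      remaining = remaining.filter((pair) => !hits.includes(pair));
    } else {
      result.imports[key] = target;
    }
  }
  return result;
}

const filteredMap = filterNewMap(
  {
    scopes: { "/app/": { "/app/helper": "./helper/index.mjs" } },
    imports: {
      lodash: "/node_modules/lodash-es/lodash.js",
      "/app/helper": "./other_path/helper/index.mjs",
    },
  },
  [{ referrer: "/app/main.mjs", specifier: "/app/helper" }],
);
console.log(Object.keys(filteredMap.imports)); // [ 'lodash', '/app/helper' ]
```

In this run the scoped rule is dropped and consumes the pair, so the less specific top-level "/app/helper" rule survives.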

Two import maps with conflicting rules

When existing and new import maps with conflicting rules are merged, and no resolved modules correspond to the rules defined, the rules that were defined first persist.

For example, the following existing and new import maps:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

{
  "imports": {
    "/app/helper": "./main/helper/index.mjs"
  }
}

Would be equivalent to the following single import map:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}
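
A minimal sketch of this "first rule wins" behavior for top-level "imports", assuming exact-match keys only and no resolved modules. The function name `mergeFirstWins` and the `ignored` list are illustrative additions, not part of the PR.

```javascript
// Merges top-level "imports"; when both maps define a specifier, the existing rule is kept.
function mergeFirstWins(existingImports, newImports) {
  const merged = { ...existingImports };
  const ignored = []; // specifiers whose new rule was not applied (hypothetical, for debugging)
  for (const [specifier, target] of Object.entries(newImports)) {
    if (Object.hasOwn(merged, specifier)) {
      ignored.push(specifier);
    } else {
      merged[specifier] = target;
    }
  }
  return { merged, ignored };
}

const { merged, ignored } = mergeFirstWins(
  { "/app/helper": "./helper/index.mjs", lodash: "/node_modules/lodash-es/lodash.js" },
  { "/app/helper": "./main/helper/index.mjs" },
);
console.log(merged["/app/helper"]); // ./helper/index.mjs
console.log(ignored);               // [ '/app/helper' ]
```

With non-conflicting inputs the same function simply returns the union of both maps and an empty `ignored` list.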

High-level design

At a high-level, we want a module resolution cache that will ensure that a resolved module identifier always resolves to the same module. That is implemented using the "resolved module set", which ensures that URLs for modules that were already resolved cannot be added to future import maps.

We also want to ensure that top-level imports that start loading a module tree won’t have that tree change “under their feet” due to an import map that was loaded in parallel. That is achieved by providing a copy of the import map to the module resolution algorithm of these top-level modules and propagating it recursively down their module trees.
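
The snapshotting idea can be pictured with the sketch below. Everything in it is a stand-in: `globalState`, `beginLoad`, `mergeIntoGlobal`, and `resolveWith` are hypothetical names, resolution is reduced to exact-match lookups, and the copy is a shallow clone rather than whatever the spec text ends up requiring.

```javascript
// Hypothetical global state: one import map per global, extended over time.
const globalState = {
  importMap: { imports: { lodash: "/node_modules/lodash-es/lodash.js" } },
};

// Simplified resolution: exact-match rules only; unmapped specifiers yield null.
function resolveWith(importMap, specifier) {
  return Object.hasOwn(importMap.imports, specifier) ? importMap.imports[specifier] : null;
}

// The root of a loading operation copies the global map and threads the copy down.
function beginLoad() {
  const snapshot = { imports: { ...globalState.importMap.imports } };
  return { resolve: (specifier) => resolveWith(snapshot, specifier) };
}

// A later import map only adds rules for specifiers that have none yet.
function mergeIntoGlobal(newImports) {
  for (const [specifier, target] of Object.entries(newImports)) {
    if (!Object.hasOwn(globalState.importMap.imports, specifier)) {
      globalState.importMap.imports[specifier] = target;
    }
  }
}

const inFlight = beginLoad();                         // load starts
mergeIntoGlobal({ "/app/main": "./main/index.mjs" }); // map arrives mid-load
const later = beginLoad();                            // a new load starts afterwards

console.log(inFlight.resolve("/app/main")); // null
console.log(later.resolve("/app/main"));    // ./main/index.mjs
console.log(inFlight.resolve("lodash"));    // /node_modules/lodash-es/lodash.js
```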

And finally, we want a way to create a single, coherent import map from multiple import map scripts loaded on the document. That is done with the "merge new and existing import maps" algorithm.

(See WHATWG Working Mode: Changes for more details.)


/infrastructure.html ( diff )
/links.html ( diff )
/scripting.html ( diff )
/webappapis.html ( diff )

@yoavweiss yoavweiss marked this pull request as draft July 30, 2024 07:55
@yoavweiss yoavweiss marked this pull request as ready for review July 31, 2024 10:57
Member

@domenic domenic left a comment

I ended up doing a relatively detailed review anyway, although for some repeated editorial issues I stopped commenting.

I have two major questions:

  • The merge algorithm needs examples, and maybe explanatory text. I can follow most of the steps (modulo some bugs), but I can't figure out the intent. The examples can either use the JSON syntax, or the normalized syntax seen below https://html.spec.whatwg.org/#parse-an-import-map-string if that is helpful in giving extra clarity. The impact of the resolved set is particularly unclear.

  • I don't understand why the import map is being passed around so much. There's still always one import map per global, and it's easy to get to that global from any algorithm or from the "script" struct. At least one instance of this seems completely redundant, which I commented on. But e.g. why are you storing the import map in [[HostDefined]]? I realize there's probably some complexity here at the particular point in time when you're merging import maps and thus the global's import map changes, but that should be able to happen completely discretely between script parsing and execution, so I don't see why scripts should need to track individual import maps separate from the global one.

@yoavweiss yoavweiss changed the title from "Dynamic module imports" to "Dynamic import maps" Aug 1, 2024
Contributor Author

@yoavweiss yoavweiss left a comment

Thanks! Fixed some comments and addressing the rest soon!

@yoavweiss
Contributor Author

  • I don't understand why the import map is being passed around so much. There's still always one import map per global, and it's easy to get to that global from any algorithm or from the "script" struct. At least one instance of this seems completely redundant, which I commented on. But e.g. why are you storing the import map in [[HostDefined]]? I realize there's probably some complexity here at the particular point in time when you're merging import maps and thus the global's import map changes, but that should be able to happen completely discretely between script parsing and execution, so I don't see why scripts should need to track individual import maps separate from the global one.

My thinking was that we need to do that in order to guarantee that once we're parsing a module tree, all modules in the tree would be resolved by the same import map. E.g. I thought it was possible that a setTimeout would inject a new import map while a module script is being downloaded and parsed, and that new import map would start taking effect after some modules were resolved but before others.
If that's not possible for some reason, I'm happy to revert these changes.

@yoavweiss yoavweiss requested a review from domenic August 1, 2024 18:59
Member

@domenic domenic left a comment

This is very helpful, thanks!

My thinking was that we need to do that in order to guarantee that once we're parsing a module tree, all modules in the tree would be resolved by the same import map. E.g. I thought it was possible that a setTimeout would inject a new import map while a module script is being downloaded and parsed, and that new import map would start taking effect after some modules were resolved but before others.
If that's not possible for some reason, I'm happy to revert these changes.

That makes perfect sense.

Given this, we should explain this in the spec, maybe around #concept-window-import-map. With a note that in general only the root of a loading operation will access concept-window-import-map, and otherwise it'll be threaded through.

With that frame, auditing all the call sites of concept-window-import-map...

  • "resolve a module integrity metadata" seems suspicious. It should probably get an import map threaded to it?
  • "fetch the descendants of and link" seems suspicious. Shouldn't it be getting threaded an import map from its various callers? (per the diagram above it.)
  • "register an import map" has a broken assert

I'm also a bit unsure now about the cases where an import map is not passed in. When is that possible? (Except workers.) We have fallbacks to the Window's import map in those cases, but I'm now questioning whether they're sound.

Contributor Author

@yoavweiss yoavweiss left a comment

Thanks! editorial fixes first :)

@yoavweiss
Contributor Author

Given this, we should explain this in the spec, maybe around #concept-window-import-map. With a note that in general only the root of a loading operation will access concept-window-import-map, and otherwise it'll be threaded through.

Added

  • "resolve a module integrity metadata" seems suspicious. It should probably get an import map threaded to it?

This was indeed lacking. Should be fixed now.

  • "fetch the descendants of and link" seems suspicious. Shouldn't it be getting threaded an import map from its various callers? (per the diagram above it.)

Here I think the current state is fine, as this is being called from all the root module entry points. Therefore we don't need to thread the import map into "fetch the descendants of and link", we need it to do the threading to its descendants, which it does by setting the map on the Record.

  • "register an import map" has a broken assert

Indeed!!

@yoavweiss
Contributor Author

I'm also a bit unsure now about the cases where an import map is not passed in. When is that possible? (Except workers.) We have fallbacks to the Window's import map in those cases, but I'm now questioning whether they're sound.

Let me try to enumerate the cases:

I think that covers all of them but let me know if I missed something.

@yoavweiss yoavweiss requested a review from domenic August 2, 2024 14:50
Member

@domenic domenic left a comment

Here I think the current state is fine, as this is being called from all the root module entry points. Therefore we don't need to thread the import map into "fetch the descendants of and link", we need it to do the threading to its descendants, which it does by setting the map on the Record.

I think I see. Because the callers all operate on URLs or inline scripts, so they didn't need to do any resolution, and so didn't need an import map. It's only for the descendants that you start doing resolution and thus start needing an import map.

This does have the slightly-strange impact that given something like

<script type=module src=my-script.mjs></script>
<script type=importmap>
...
</script>

the modifications that appear after the <script type=module> will apply to the imports of my-script.mjs, because we delay snapshotting the import map until the response from the server comes back. That seems a bit unfortunate; WDYT?
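
The ordering concern can be modeled with a small simulation of the three events involved. This is only a toy model: `simulate`, the two strategy names, and the `dep` rule standing in for the elided map contents are all hypothetical.

```javascript
// Returns what the bare specifier "dep" resolves to for my-script.mjs,
// depending on when the import map snapshot is taken.
function simulate(snapshotWhen) {
  let globalMap = { imports: {} };
  let snapshot = null;

  // 1. The parser reaches <script type=module src=my-script.mjs>; the fetch starts.
  if (snapshotWhen === "at-script-element") {
    snapshot = { imports: { ...globalMap.imports } };
  }

  // 2. The parser reaches the later <script type=importmap> and merges it.
  globalMap = { imports: { ...globalMap.imports, dep: "/mapped/dep.mjs" } };

  // 3. The response for my-script.mjs arrives and its imports get resolved.
  if (snapshotWhen === "at-response") {
    snapshot = { imports: { ...globalMap.imports } };
  }

  return Object.hasOwn(snapshot.imports, "dep") ? snapshot.imports.dep : null;
}

console.log(simulate("at-response"));       // /mapped/dep.mjs
console.log(simulate("at-script-element")); // null
```

Snapshotting when the response arrives lets the later map affect the earlier script, while snapshotting at the script element does not.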

@yoavweiss yoavweiss requested a review from domenic August 5, 2024 14:59
@yoavweiss
Contributor Author

I think I see. Because the callers all operate on URLs or inline scripts, so they didn't need to do any resolution, and so didn't need an import map. It's only for the descendants that you start doing resolution and thus start needing an import map.

This does have the slightly-strange impact that given something like

<script type=module src=my-script.mjs></script>
<script type=importmap>
...
</script>

the modifications that appear after the <script type=module> will apply to the imports of my-script.mjs, because we delay snapshotting the import map until the response from the server comes back. That seems a bit unfortunate; WDYT?

Forgot to address this part.. I agree that this would be weird, and hence it'd be better to pipe in the import map in those cases. I'll do that.

@yoavweiss
Contributor Author

Forgot to address this part.. I agree that this would be weird, and hence it'd be better to pipe in the import map in those cases. I'll do that.

Done!

Member

@domenic domenic left a comment

I have an idea for something that might clean things up. Basically, I find the importMap being optional confusing. It's hard to know whether an algorithm is not getting an import map because we forgot, or because we're coming from a worker. And the fact that sometimes we fall back to the Window's import map, even though we're not in an obviously "top level" algorithm, is extra confusing. (For example, in "create a JavaScript module script".)

I think the following would clean that up:

  • Move "Window's import map" (and Window's resolved module set?) to all global objects. Put them next to https://html.spec.whatwg.org/#in-error-reporting-mode . Add a note explaining that only Window objects have their import maps modified away from the initial empty import map, for now.
  • Make all import map arguments mandatory.
  • Always grab the global object's import map when appropriate. This should now be obviously only at top-level situations.

WDYT?

@yoavweiss
Contributor Author

I have an idea for something that might clean things up. Basically, I find the importMap being optional confusing. It's hard to know whether an algorithm is not getting an import map because we forgot, or because we're coming from a worker. And the fact that sometimes we fall back to the Window's import map, even though we're not in an obviously "top level" algorithm, is extra confusing. (For example, in "create a JavaScript module script".)

I think the following would clean that up:

  • Move "Window's import map" (and Window's resolved module set?) to all global objects. Put them next to https://html.spec.whatwg.org/#in-error-reporting-mode . Add a note explaining that only Window objects have their import maps modified away from the initial empty import map, for now.
  • Make all import map arguments mandatory.
  • Always grab the global object's import map when appropriate. This should now be obviously only at top-level situations.

WDYT?

SG. done!

@yoavweiss
Contributor Author

In case it's useful for the review process - I mapped the high-level relevant spec changes to Chromium's code.

@Jamesernator

Jamesernator commented Aug 10, 2024

At the same time, the current static design gives us determinism and isn’t racy. A module identifier that resolves to a certain module will continue to do so throughout the lifetime of the document. It would be good to keep that characteristic.

Has an alternative design been considered that doesn't mutate a shared global map?

For example perhaps associating maps to individual script tags?:

<!-- importmap="..." would only affect the loading of this graph -->
<script type="module" src="./entry.js" importmap="./entry.importmap.json"></script>

This alternative design could even allow for explaining importmaps in terms of import attributes, i.e. if you want to load a third-party module with a third-party importmap you could do:

import thirdParty from "./dist/third-party.js" with { importmap: "./dist/third-party.importmap.json" }

@jeff-hykin

jeff-hykin commented Aug 10, 2024

I almost gave a nearly equivalent comment yesterday but was afraid I was misunderstanding something. So I'm glad you spoke up, @Jamesernator.

My draft example was literally:

await import("./dist/third-party.js", { withMap: "./dist/third-party.importmap.json"})
import thirdParty from "./dist/third-party.js" withMap "./dist/third-party.importmap.json"

Concerns with mutating a shared global map

  1. I don't see any discussion about risks, possible adverse side effects, or security.

Consider a dynamic const bcrypt = await import("/path/to/bcrypt.js") written by a site author. Right now they have full confidence that the import will either load what they expect or throw an error. I'm not convinced that letting browser extensions break that assumption won't lead to new attack vectors. While I believe browser extensions can already affect the top-level import map, it's still not clear to me that dynamic changes are not more risky.

  2. I don't see discussion about adverse side effects.

As I understand it, the proposal has side effects due to this being a global map. E.g. loading a browser extension can change the import behavior of non-browser-extension imports. Extensions often do not want side effects: case in point, the motivating use case at the top (this one) does not want side effects; it just wants to map imports for extension1, not for extension2.

If extension1 global-mutates import-map for A
If extension2 global-mutates import-map for A

IMO extension-loading order creates an unnecessary race condition where extension1 overrides extension2's import map, and extension2's developer has no good way to debug when and why their extension broke. (And even once they do figure out why, there is basically no solution: since telling the user to change the load order is impractical, they have to go back to square one and bundle everything to be reliable.)

Contributor

@guybedford guybedford left a comment

Amazing to see this, I only had time to do a very very brief review but I really like the overall approach. Specifically, I have some concerns about algorithmic complexity, but haven't had a chance to look more carefully at the algorithm to determine if they are "resolvable" yet.

Then for the copying approach - if we have a well-defined deduping that won't change the nature of mappings, what is the reason for wanting to lock down resolutions during individual load operations? Will this locked map also affect dynamic import or is it only cloned through the static graph? And if so, it might seem strange having different resolution rules for dynamic import and static import, especially when ECMA-262 also maintains its own cache to ensure these are consistent for known imports.

@bakkot
Contributor

bakkot commented Aug 11, 2024

I probably won't have time to do a thorough review of this, but @michaelficarra and I spent quite a while thinking about (some parts of) this problem back when import maps were first being discussed, so I want to call @yoavweiss's attention to this issue which @michaelficarra wrote at the time.

I don't see anything in the OP about the case where the second import map maps to something which is already mapped by the first. (Maybe I missed it?) For example:

First:

{
  "imports": {
    "/app/helper": "./helper/index.mjs"
  }
}

Second:

{
  "imports": {
    "helper": "/app/helper"
  }
}

What's the intended behavior in this case? My inclination is to say that this is equivalent to

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "helper": "./helper/index.mjs"
  }
}

i.e., imports on the RHS of later maps are resolved in the context of earlier maps. This is to preserve the property that if you have a script with import "/foo", you can replace this with a script with import "bar" plus an import map of "bar": "/foo". Plus, one of the use cases for import maps is to rewrite the location of some dependency, and that only works if someone can't re-introduce the original location of the dependency by adding their own import map later.

This is consistent with the way the web usually works: if one script does globalThis.fetch = decorate(globalThis.fetch), and a later script does globalThis.fetch = decorate2(globalThis.fetch), the second script will wrap the fetch from the first script, not the original fetch.
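
The behavior suggested in this comment could be sketched as follows. This models the commenter's inclination, not what the PR specifies: `mergeResolvingTargets` is a hypothetical name, only exact-match top-level rules are handled, and keeping the earlier rule on conflict is an assumption carried over from the PR description.

```javascript
// Merges two "imports" objects, resolving each new target through the earlier map first.
function mergeResolvingTargets(firstImports, secondImports) {
  const merged = { ...firstImports };
  for (const [specifier, target] of Object.entries(secondImports)) {
    if (Object.hasOwn(merged, specifier)) {
      continue; // assumed: the earlier rule wins on conflict
    }
    // If the new rule points at something the earlier map already remaps, follow it.
    merged[specifier] = Object.hasOwn(firstImports, target) ? firstImports[target] : target;
  }
  return merged;
}

const combined = mergeResolvingTargets(
  { "/app/helper": "./helper/index.mjs" },
  { helper: "/app/helper" },
);
console.log(JSON.stringify(combined));
// {"/app/helper":"./helper/index.mjs","helper":"./helper/index.mjs"}
```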

Labels
agenda+ To be discussed at a triage meeting

8 participants