Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitizer::removeRemoteReferences doesn't remove remote images #94

Open
irauta opened this issue Sep 15, 2023 · 2 comments
Open

Sanitizer::removeRemoteReferences doesn't remove remote images #94

irauta opened this issue Sep 15, 2023 · 2 comments

Comments

@irauta
Copy link

irauta commented Sep 15, 2023

When sanitizing an image like the example below, a href attribute pointing to a remote resource in an image element does not get removed, when $sanitizer->removeRemoteReferences(true); has been called before sanitizing:

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 370 500" width="370" height="500">
  <rect x="5" y="5" width="350" height="490" style="fill: #FF6347; stroke: #000000; stroke-width: 10px;" />
  <image x="20" y="10" width="320" height="480" href="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9b/Gustav_chocolate.jpg/800px-Gustav_chocolate.jpg" />
</svg>

I narrowed this down to the hasRemoteReference method of Sanitizer. (Link points to the 0.16.0 version, but the code in master branch is currently identical.) There is an early return in the method, introduced by PR #15, which seems to cause that potentially only attributes that are along the lines of fill="url('http://trackingsite.tld/image.png')" are removed, but the more straightforward references like the cat image from Wikimedia Commons in the example above are left alone.

I tried commenting out the middle part of the hasRemoteReference method, leaving only the first and last line active:

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 370 500" width="370" height="500">
  <rect x="5" y="5" width="350" height="490" style="fill: #FF6347; stroke: #000000; stroke-width: 10px;"></rect>
  <image x="20" y="10" width="320" height="480"></image>
</svg>

I haven't tested older versions of svg-sanitizer, but looking at the code history it seems like this roughly corresponds to the way the method originally worked.

To me removing hrefs that point to remote resources seems like the intended functionality when removeRemoteReferences is enabled, but it's also not how the code currently works. I'm not suggesting to just remove the code that removes url() references - handling them is clearly important too - but am not sure what would be the best change to hasRemoteReference as I don't understand why the early return was added.

@elpas0
Copy link

elpas0 commented Aug 31, 2024

it seems that remoteReference method is exactly for checking url(...) in attributes
href is in allowed attributes list, and it also checked with isHrefSafeValue, which allow relative, http(s) values and different data:img

I guess, easiest way is just to remove image from allowed tags list

@verdy-p
Copy link

verdy-p commented Sep 8, 2024

Are image references using the internal "data:" URI scheme allowed? Do such internal images have to be restricted by MIME types?

This would require to include a subparser for safechecking these allowed data schemes, and possibly sanitize them, however the subparsers would be sort of plugins with their own parser and sanitizer instance, if they are not in SVG. So the SVG-sanitizer alone could not validate and sanitize anything else than SVG-type "data:" references and would have to reject possibly unsafe image types like PNG, GIF, JPEG, or PNG. IF SVG sanitizer does not perform any sanitization on the data content, but just checks its safety, no reencoder of the content would be needed (sanitizing the content may not work on all image types if they contain some digital signature extensions prohibiting most changes, and removing the signatures may also violate some licencing conditions for these embedded images; so this is quite complex to handle in a simple change if you need a way to integrate subparsing for embedded elements).

If only some safe media types are allowed, that do not require a complex subparser instance and reencoder, what is the process for integrating these subparser instances and whitelisted image types (and their own internal allowed features, notably only safe versions of PNG, GIF or JPEG without some of their unsafe extensions)?

The use case is generally for SVG images that need to contain embedded bitmaps, e,g. basic fill patterns (not efficiently represented as SVG if these patterns are quite large, such as textures possibly blended for mipmaps for 3D rendering), or digized photographs (for which a conversion to SVG is generally very poor, or requires a much larger SVG representation).

If these bitmaps cannot be safely embedded in "data:" UTI schemes, may be they could still be simply whitelisted only if the reference use a scheme-less relative URI (and that relative URI does not contain any "../" to bypass a container folder, for example when the sanitized SVG would be extracted and used from a safe temporary subdirectory or sandbox in "/tmp", where the "../" could be used to attempt reading/rendering contents outside of that sandbox, from the host system or from other neighrouring temporary subdirectories used by concurrent SVG santizers running for different remote user contexts).

Then what is the way to sanitize URIs that contain any scheme, or any path? are safe URIs only limited to the SVG-document itself (i.e. only URI starting by "#" followed by a anchor id)? If only safe-containment in the SVG itself is allowed, we still need a way to safely embed some bitmaps (fill patterns and photos) without being forced to convert them to SVG in ugly way.

[We have the same security/safety concerns when using some OpenType or SVG fonts that also need some embedded bitmaps; OpenType addressed that issue by defining a limited but safe encoding profile for such bitmaps; for SVG fonts, nothing like this seems to be defined].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants