Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 BOM handling #2

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

UTF-8 BOM handling #2

wants to merge 5 commits into from

Conversation

hjtappe
Copy link
Contributor

@hjtappe hjtappe commented Mar 27, 2015

Hi!

The BOM handling should take into account the different encoding types indicated by the BOM. Just removing the BOM only works for the most common way. The hungarian translation of the widely-used ckeditor nevertheless is an example where this fails. A derived test file showing the exception is included in the patch.

Thanks for considering this patch.

@mrclay
Copy link
Owner

mrclay commented Mar 30, 2015

I think this is outside the scope of JSMin, but this logic could go in a different class whose only purpose is to detect/convert these particular non-UTF8 encodings. Ideally it would also support iconv.

@mrclay
Copy link
Owner

mrclay commented Mar 30, 2015

I'd recommend JSMin\EncodingDetector with method getEncoding($input) and it would return 'UTF-8' if no BOM was present.

@hjtappe
Copy link
Contributor Author

hjtappe commented Mar 30, 2015

Hi! Thanks for the feedback. I'll take care. Please give me a few days, I'm still learning git & github. :-)
Actually, the class would need to return either the iconv-modified $input or perform the mbstring settings. I'll do some research to extract the functionality it from the JSMin class.

@hjtappe
Copy link
Contributor Author

hjtappe commented Apr 8, 2015

Please review the new encoder class and advise it anything should be changed.
Thanks a lot.

@hjtappe
Copy link
Contributor Author

hjtappe commented Apr 8, 2015

Was it OK to merge with your master or does it cause merge problems? It seems, the already-merged changes are now also in the pull request - but I did want to use all new test files in my feature branch...

@glensc
Copy link
Collaborator

glensc commented Dec 7, 2022

The get_encoding method relies on default_encoding, which is valid for PHP 5.6+, for earlier version should check for previous ini variables:

and also the default value has changed to utf-8, so this would be breaking change because earlier versions the ini values default to to iso8859-1.

and new code does always decode even if mbstring and iconv both are missing, resulting fatal error.

@glensc glensc marked this pull request as draft December 13, 2022 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants