Add an option to disable conversion of unicode chars #100

Open
fmmarzoa opened this issue May 14, 2024 · 2 comments

Comments

@fmmarzoa

Hello!

I have a dictionary that looks like this:

translit_map = { u"\u0027": "", u"\u00C0": "A", u"\u00C1": "A", u"\u00C2": "A", u"\u00C3": "A", u"\u00C4": ["A", "AE"], u"\u00C5": ["A", "AA"], u"\u00C6": "AE", u"\u00C7": "C", ...

When this is minified, the Unicode escape sequences are converted to their literal characters, like in:

 Ac={"'":D,'À':H,'Á':H,'Â':H,'Ã':H,'Ä':[H,'AE'],'Å':[H,'AA'],'Æ':'AE','Ç':O

So it would be nice to have an option to disable this behaviour and keep the keys written as escape sequences like u"\u0027". There are cases where this is needed: in one of mine, I have to upload the code to a server through a web form, and the non-ASCII characters get converted into '?'. I have reported it to the server admin too, but it would still be great to be able to choose not to convert these escapes into literal UTF-8 characters.
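Why the escapes disappear can be seen with the standard `ast` module alone. This sketch assumes, rather than confirms, that the minifier regenerates source from a parsed syntax tree; it does not reproduce python-minifier's internals. It only shows that once `u"\u00C0"` is parsed, the resulting constant is indistinguishable from `"À"`, so whichever spelling gets printed back is the printer's choice. `constant_of` is a hypothetical helper.

```python
import ast

# Two spellings of the same key: an escape sequence and the literal character.
escaped_src = 'k = u"\\u00C0"'
literal_src = 'k = "À"'


def constant_of(src: str) -> str:
    """Return the string constant assigned in a one-line `k = ...` statement."""
    node = ast.parse(src).body[0].value  # an ast.Constant node
    return node.value


# After parsing, both spellings give the same str value; the AST does not
# record whether the author wrote an escape or the literal character.
assert constant_of(escaped_src) == constant_of(literal_src) == "\u00c0"

# repr() keeps printable non-ASCII characters, ascii() escapes them.
print(repr(constant_of(escaped_src)))   # 'À'
print(ascii(constant_of(escaped_src)))  # '\xc0'
```

The built-in `ascii()` produces the escaped form, which is roughly what an ASCII-only output option would have to do for every string constant.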

Thanks!
Fran

@dflook
Owner

dflook commented May 16, 2024

Hello @fmmarzoa.

You can get close to what you are looking for by using code like this:

from python_minifier import minify

with open('snippet.py', 'rb') as f:
    source = f.read()

minified = minify(source)

with open('minified.py', 'w', encoding='ascii', errors='backslashreplace') as f:
    f.write(minified)

which will output:

translit_map={"'":'','\xc0':'A','\xc1':'A','\xc2':'A','\xc3':'A','\xc4':['A','AE'],'\xc5':['A','AA'],'\xc6':'AE','\xc7':'C'}
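The same transformation can be checked in memory, without writing a file: `str.encode` accepts the same `errors='backslashreplace'` handler as `open()`. This is a minimal sketch; `to_ascii_source` is a hypothetical name, and the input string stands in for minifier output rather than being produced by python-minifier.

```python
def to_ascii_source(source: str) -> str:
    """Replace every non-ASCII character with a backslash escape.

    Mirrors open(..., encoding='ascii', errors='backslashreplace') in memory.
    """
    return source.encode("ascii", errors="backslashreplace").decode("ascii")


minified = "translit_map={\"'\":'','À':'A','Ä':['A','AE']}"
escaped = to_ascii_source(minified)
print(escaped)  # translit_map={"'":'','\xc0':'A','\xc4':['A','AE']}

# Inside ordinary string literals the escapes denote the same characters,
# so executing both versions yields an identical dict.
ns_original, ns_escaped = {}, {}
exec(minified, ns_original)
exec(escaped, ns_escaped)
assert ns_original["translit_map"] == ns_escaped["translit_map"]
```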

But this will break any program that uses non-ASCII identifiers, since escape sequences are only valid inside string literals, e.g.

def Á():pass
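A quick way to confirm the breakage is to compile both versions. This sketch uses only the built-in `compile()`; `compiles` is a hypothetical helper.

```python
def compiles(source: str) -> bool:
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


original = "def Á():pass\n"
escaped = original.encode("ascii", errors="backslashreplace").decode("ascii")

print(escaped)             # def \xc1():pass
print(compiles(original))  # True: Á is a valid identifier character
print(compiles(escaped))   # False: a backslash cannot appear in a name
```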

@fmmarzoa
Author

fmmarzoa commented Jul 5, 2024

Hi dflook,

Thanks for that workaround suggestion; I missed the notification.

What I did myself was to run a script after minifying that converts those characters back into escape sequences, using this:

import re

def unicode_to_escape(input_str):
    """
    Convert non-ASCII Unicode characters in the input string to their escape sequences.

    Args:
        input_str (str): The input string containing Unicode characters.

    Returns:
        str: The modified string with non-ASCII Unicode characters converted to escape sequences.
    """
    def replace_unicode(match):
        # e.g. 'À' -> '\xc0'; characters above U+FFFF become '\U........'
        return match.group(0).encode('unicode-escape').decode('ascii')

    # Match every character outside 7-bit ASCII, including those above U+FFFF
    return re.sub(r"[^\x00-\x7f]", replace_unicode, input_str)
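Applied to the whole minified file, the regex above has the same limitation dflook pointed out: it also escapes non-ASCII identifiers such as `def Á():pass`. One way around that, sketched here under the assumption that the input is valid Python, is to use the standard `tokenize` module and escape only inside string tokens. `escape_string_literals` is a hypothetical helper, not part of python-minifier. It deliberately skips raw strings (where `\xc0` would be read as four literal characters) and f-strings (whose expression parts may not contain backslashes before Python 3.12, and which Python 3.12+ tokenizes as separate `FSTRING_*` tokens that this sketch leaves alone).

```python
import io
import re
import tokenize

NON_ASCII = re.compile(r"[^\x00-\x7f]")


def _escape(match) -> str:
    # e.g. 'À' -> '\xc0'
    return match.group(0).encode("unicode-escape").decode("ascii")


def escape_string_literals(source: str) -> str:
    """Escape non-ASCII characters only inside plain string literals.

    Identifiers are left untouched. Raw strings and f-strings are skipped,
    because adding backslashes would change their meaning or make them invalid.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # The prefix is the run of letters before the opening quote.
            prefix = re.match(r"[A-Za-z]*", tok.string).group(0).lower()
            if "r" not in prefix and "f" not in prefix:
                tok = tok._replace(string=NON_ASCII.sub(_escape, tok.string))
        result.append(tok)
    # untokenize() spaces tokens using their original (row, col) positions,
    # so the longer escaped strings do not disturb the surrounding layout.
    return tokenize.untokenize(result)


print(escape_string_literals("def Á():return 'Á'\n"))  # def Á():return '\xc1'
```

With the thread's setup this would run as `escape_string_literals(minify(source))`, assuming `minify` returns a `str`. `tokenize.untokenize` rebuilds spacing from token positions, which works for typical minifier output but is not guaranteed to be character-for-character identical for every input.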

