Add uv_to_utf8 family of functions #22782

Merged
merged 2 commits into from
Dec 3, 2024
11 changes: 11 additions & 0 deletions embed.fnc
@@ -3809,6 +3809,17 @@ Cp |U8 * |uvoffuni_to_utf8_flags_msgs \
|UV input_uv \
|const UV flags \
|NULLOK HV **msgs

Admp |U8 * |uv_to_utf8 |NN U8 *d \
|UV uv
Admp |U8 * |uv_to_utf8_flags \
|NN U8 *d \
|UV uv \
|UV flags
Admp |U8 * |uv_to_utf8_msgs|NN U8 *d \
|UV uv \
|UV flags \
|NULLOK HV **msgs
CDbp |U8 * |uvuni_to_utf8 |NN U8 *d \
|UV uv
EXdpx |bool |validate_proto |NN SV *name \
3 changes: 3 additions & 0 deletions embed.h
@@ -873,6 +873,9 @@
# define utf8n_to_uvchr Perl_utf8n_to_uvchr
# define utf8n_to_uvchr_error Perl_utf8n_to_uvchr_error
# define utf8n_to_uvchr_msgs Perl_utf8n_to_uvchr_msgs
# define uv_to_utf8(a,b) Perl_uv_to_utf8(aTHX,a,b)
# define uv_to_utf8_flags(a,b,c) Perl_uv_to_utf8_flags(aTHX,a,b,c)
# define uv_to_utf8_msgs(a,b,c,d) Perl_uv_to_utf8_msgs(aTHX,a,b,c,d)
# define uvchr_to_utf8(a,b) Perl_uvchr_to_utf8(aTHX,a,b)
# define uvchr_to_utf8_flags(a,b,c) Perl_uvchr_to_utf8_flags(aTHX,a,b,c)
# define uvchr_to_utf8_flags_msgs(a,b,c,d) Perl_uvchr_to_utf8_flags_msgs(aTHX,a,b,c,d)
11 changes: 11 additions & 0 deletions pod/perldelta.pod
@@ -414,6 +414,11 @@ L<perlapi/C<utf8_to_uv>> replaces L<perlapi/C<utf8_to_uvchr>> (which is
retained for backwards compatibility), but you should convert to use the
new form, as likely you aren't using the old one safely.

To convert in the opposite direction, you can now use
L<perlapi/C<uv_to_utf8>>. This is not a new function, but a new synonym
for L<perlapi/C<uvchr_to_utf8>>. It is added so you don't have to learn
two sets of names.

There are also two new functions, L<perlapi/C<strict_utf8_to_uv>> and
L<perlapi/C<c9strict_utf8_to_uv>> which do the same thing except when
the input string represents a code point that Unicode doesn't accept as
@@ -440,6 +445,12 @@ L<perlapi/C<utf8_to_uv_errors>> replaces L<perlapi/C<utf8n_to_uvchr_error>>.
L<perlapi/C<utf8_to_uv_msgs>> replaces
L<perlapi/C<utf8n_to_uvchr_msgs>>.

Also added are the inverse functions L<perlapi/C<uv_to_utf8_flags>>
and L<perlapi/C<uv_to_utf8_msgs>>, which are synonyms for the existing
functions, L<perlapi/C<uvchr_to_utf8_flags>> and
L<perlapi/C<uvchr_to_utf8_flags_msgs>> respectively. These are provided only
so you don't have to learn two sets of names.

=item *

Three new API functions are introduced to convert strings encoded in
9 changes: 9 additions & 0 deletions proto.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 17 additions & 9 deletions utf8.c
@@ -121,14 +121,14 @@ S_new_msg_hv(pTHX_ const char * const message, /* The message text */
=for apidoc uvoffuni_to_utf8_flags

THIS FUNCTION SHOULD BE USED IN ONLY VERY SPECIALIZED CIRCUMSTANCES.
Instead, B<Almost all code should use L<perlapi/uvchr_to_utf8> or
L<perlapi/uvchr_to_utf8_flags>>.
Instead, B<Almost all code should use L<perlapi/uv_to_utf8> or
L<perlapi/uv_to_utf8_flags>>.

This function is like them, but the input is a strict Unicode
(as opposed to native) code point. Only in very rare circumstances should code
not be using the native code point.

For details, see the description for L<perlapi/uvchr_to_utf8_flags>.
For details, see the description for L<perlapi/uv_to_utf8_flags>.

=cut
*/
@@ -155,9 +155,11 @@ const char super_cp_format[] = "Code point 0x%" UVXf " is not Unicode,"
#define MASK UTF_CONTINUATION_MASK

/*
=for apidoc uvchr_to_utf8_flags_msgs
=for apidoc uv_to_utf8_msgs
=for apidoc_item uvchr_to_utf8_flags_msgs

THIS FUNCTION SHOULD BE USED IN ONLY VERY SPECIALIZED CIRCUMSTANCES.
These functions are identical. THEY SHOULD BE USED IN ONLY VERY SPECIALIZED
CIRCUMSTANCES.

Most code should use C<L</uvchr_to_utf8_flags>()> rather than call this directly.

@@ -367,26 +369,32 @@ Perl_uvoffuni_to_utf8_flags_msgs(pTHX_ U8 *d, UV input_uv, UV flags, HV** msgs)
}

/*
=for apidoc uvchr_to_utf8
=for apidoc uv_to_utf8
=for apidoc_item uv_to_utf8_flags
=for apidoc_item uvchr_to_utf8
=for apidoc_item uvchr_to_utf8_flags

These each add the UTF-8 representation of the native code point C<uv> to the
end of the string C<d>; C<d> should have at least C<UVCHR_SKIP(uv)+1> (up to
C<UTF8_MAXBYTES+1>) free bytes available. The return value is the pointer to
the byte after the end of the new character. In other words,

d = uvchr_to_utf8(d, uv);
d = uv_to_utf8(d, uv);

This is the Unicode-aware way of saying

*(d++) = uv;

C<flags> is used to make some classes of code points problematic in some way.
C<uvchr_to_utf8> is effectively the same as calling C<uvchr_to_utf8_flags>
(C<uvchr_to_utf8> is a synonym for C<uv_to_utf8>.)

C<uv_to_utf8_flags> is used to make some classes of code points problematic in
some way. C<uv_to_utf8> is effectively the same as calling C<uv_to_utf8_flags>
with C<flags> set to 0, meaning no class of code point is considered
problematic. That means any input code point from 0..C<IV_MAX> is considered
to be fine. C<IV_MAX> is typically 0x7FFF_FFFF in a 32-bit word.

(C<uvchr_to_utf8_flags> is a synonym for C<uv_to_utf8_flags>).

A code point can be problematic in one of two ways. Its use could just raise a
warning, and/or it could be forbidden with the function failing, and returning
NULL.
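The pointer-advancing idiom and the NULL-on-forbidden behaviour described in this POD can be sketched in standalone C. This is only an illustration, not Perl's implementation: `sketch_uv_to_utf8`, `sketch_uv_to_utf8_flags`, and `SKETCH_DISALLOW_SURROGATE` are hypothetical names invented here, the sketch assumes an ASCII platform, and it handles only code points up to 0x10FFFF, whereas the documented functions accept anything up to `IV_MAX`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag for this sketch only; it is not one of Perl's flags. */
#define SKETCH_DISALLOW_SURROGATE 0x1u

/* Hypothetical stand-in for the documented behaviour, NOT Perl's code.
 * Writes the UTF-8 bytes for uv at d and returns a pointer to the byte
 * after the new character, or NULL if the flags forbid this code point. */
static unsigned char *
sketch_uv_to_utf8_flags(unsigned char *d, uint32_t uv, unsigned flags)
{
    assert(d != NULL);
    assert(uv <= 0x10FFFF);     /* sketch limit; Perl allows up to IV_MAX */

    if ((flags & SKETCH_DISALLOW_SURROGATE) && uv >= 0xD800 && uv <= 0xDFFF) {
        return NULL;            /* forbidden class: fail, write nothing */
    }

    if (uv < 0x80) {            /* 1 byte */
        *d++ = (unsigned char) uv;
    } else if (uv < 0x800) {    /* 2 bytes */
        *d++ = (unsigned char) (0xC0 | (uv >> 6));
        *d++ = (unsigned char) (0x80 | (uv & 0x3F));
    } else if (uv < 0x10000) {  /* 3 bytes */
        *d++ = (unsigned char) (0xE0 | (uv >> 12));
        *d++ = (unsigned char) (0x80 | ((uv >> 6) & 0x3F));
        *d++ = (unsigned char) (0x80 | (uv & 0x3F));
    } else {                    /* 4 bytes */
        *d++ = (unsigned char) (0xF0 | (uv >> 18));
        *d++ = (unsigned char) (0x80 | ((uv >> 12) & 0x3F));
        *d++ = (unsigned char) (0x80 | ((uv >> 6) & 0x3F));
        *d++ = (unsigned char) (0x80 | (uv & 0x3F));
    }
    return d;
}

/* The flag-less form is the flags form with flags set to 0. */
static unsigned char *
sketch_uv_to_utf8(unsigned char *d, uint32_t uv)
{
    return sketch_uv_to_utf8_flags(d, uv, 0);
}
```

A caller appends characters by repeating `d = sketch_uv_to_utf8(d, cp);`, which mirrors the `d = uv_to_utf8(d, uv);` idiom in the POD. With a non-zero flags argument the caller must check for NULL before using the returned pointer.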
13 changes: 8 additions & 5 deletions utf8.h
@@ -142,11 +142,11 @@ typedef enum {
#define uvoffuni_to_utf8_flags(d,uv,flags) \
uvoffuni_to_utf8_flags_msgs(d, uv, flags, 0)

#define Perl_uvchr_to_utf8(mTHX, d, u) \
Perl_uvchr_to_utf8_flags(aTHX, d, u, 0)
#define Perl_uvchr_to_utf8_flags(mTHX, d, u, f) \
Perl_uvchr_to_utf8_flags_msgs(aTHX, d, u, f, 0)
#define Perl_uvchr_to_utf8_flags_msgs(mTHX, d, u, f , m) \
#define Perl_uv_to_utf8(mTHX, d, u) \
Perl_uv_to_utf8_flags(aTHX, d, u, 0)
#define Perl_uv_to_utf8_flags(mTHX, d, u, f) \
Perl_uv_to_utf8_msgs(aTHX, d, u, f, 0)
#define Perl_uv_to_utf8_msgs(mTHX, d, u, f , m) \
Perl_uvoffuni_to_utf8_flags_msgs(aTHX_ d, NATIVE_TO_UNI(u), f, m)

/* This is needed to cast the parameters for all those calls that had them
@@ -173,6 +173,9 @@ typedef enum {
#define Perl_c9strict_utf8_to_uv(s, e, cp_p, advance_p) \
Perl_utf8_to_uv_flags( s, e, cp_p, advance_p, \
UTF8_DISALLOW_ILLEGAL_C9_INTERCHANGE)
#define Perl_uvchr_to_utf8 Perl_uv_to_utf8
#define Perl_uvchr_to_utf8_flags Perl_uv_to_utf8_flags
#define Perl_uvchr_to_utf8_flags_msgs Perl_uv_to_utf8_msgs

#define utf16_to_utf8(p, d, bytelen, newlen) \
utf16_to_utf8_base(p, d, bytelen, newlen, 0, 1)
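The aliasing pattern in these utf8.h hunks, where function-like macros for the new names chain down to one worker and object-like macros map the old names onto the new ones, can be tried out in isolation. The sketch below is a simplified model under stated assumptions: every `sketch_*` name is hypothetical, the worker is a trivial one-byte writer, and the `mTHX`/`aTHX` context parameter used by the real macros is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker standing in for the underlying function:
 * stores one byte and returns the pointer just past it. */
static char *
sketch_base(char *d, unsigned u, unsigned flags, void *msgs)
{
    (void) flags;   /* unused in this sketch */
    (void) msgs;    /* unused in this sketch */
    assert(d != NULL);
    *d = (char) u;
    return d + 1;
}

/* New names: each function-like macro supplies a default for the
 * argument it lacks and forwards to the next, richer form. */
#define sketch_new_msgs(d, u, f, m)  sketch_base((d), (u), (f), (m))
#define sketch_new_flags(d, u, f)    sketch_new_msgs(d, u, f, 0)
#define sketch_new(d, u)             sketch_new_flags(d, u, 0)

/* Old names: object-like aliases. The preprocessor replaces the old
 * name with the new one, then rescans and applies the function-like
 * macro to the argument list that follows. */
#define sketch_old        sketch_new
#define sketch_old_flags  sketch_new_flags
#define sketch_old_msgs   sketch_new_msgs
```

Because the old names expand to the new ones before any arguments are examined, `sketch_old(buf, 'x')` and `sketch_new(buf, 'x')` produce identical code, which is one way to make two names true synonyms without duplicating definitions.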