Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate UTF-8 hPutStr to standard hPutStr #589

Merged
merged 5 commits into from
Jun 22, 2024

Conversation

BebeSparkelSparkel
Copy link
Contributor

Data.Text.IO.Utf8.hPutStr was implemented in #503 and has much better performance than Data.Text.IO.hPutStr when the encoding is "UTF-8" and the neline is LF, so I added it.

Question: Is there a faster way to check for the encoding without checking for string equality?

Copy link
Contributor

@Lysxia Lysxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inspecting the textEncodingName of TextEncoding like you did seems like the right way.

src/Data/Text/IO.hs Outdated Show resolved Hide resolved
src/Data/Text/IO.hs Outdated Show resolved Hide resolved
src/Data/Text/IO.hs Outdated Show resolved Hide resolved
src/Data/Text/IO.hs Outdated Show resolved Hide resolved
src/Data/Text/IO.hs Outdated Show resolved Hide resolved
@Bodigrim Bodigrim requested a review from Lysxia June 2, 2024 21:10
@Lysxia Lysxia force-pushed the directly-write-utf8 branch from 2a5714e to e1068f1 Compare June 3, 2024 15:26
@Lysxia
Copy link
Contributor

Lysxia commented Jun 3, 2024

I merged with master (adapting to changes from #600 which moved the logic of hPutStr to Data.Text.Internal.IO.hPutStream) (this PR is squashable, just not rebaseable) and added a comment about why the pointer-equality is fine (I initially thought it had to be guarded as enc `unsafePtrEquality` utf8 || enc == utf8 but convinced myself otherwise).

@Lysxia Lysxia requested a review from Bodigrim June 3, 2024 15:30
@Bodigrim
Copy link
Contributor

Bodigrim commented Jun 3, 2024

@Lysxia shall we drop GHC 8.2 CI job?

-- | Write a string to a handle, followed by a newline.
hPutStrLn :: Handle -> Text -> IO ()
hPutStrLn h t = hPutStreamOrUtf8 h (streamLn t) (Just putUtf8)
where putUtf8 = hPutBuilder h (encodeUtf8Builder t <> charUtf8 '\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not B.hPutStrLn h (encodeUtf8 t)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have to be B.hPutStrLn h (encodeUtf8 (t <> '\n')). I couldn't find another way to append the '\n' at the end in constant time. Open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect B.hPutStrLn to append '\n' on its own, am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't see your Ln. There is no B.hPutStrLn... right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hPutStrLn :: Handle -> ByteString -> IO ()
hPutStrLn h ps = hPut h ps >> hPut h (L.singleton 0x0a)

https://github.com/haskell/bytestring/blob/46a3aeb179c26d2385f95ba518eaa5464c94ceaa/Data/ByteString/Lazy/Char8.hs#L932

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like the B.hPutStrLn implementation since it breaks the atomic printing that was just implemented for strict hPustStrLn in #600

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that non-100%-atomicity of B.hPutStrLn is for bytestring to resolve, not for us. But does hPutBuilder guarantee atomic printing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

B.hPutBuilder does if it can fit in the buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reopened haskell/bytestring#200 and added a comment.

@Bodigrim Bodigrim merged commit caed573 into haskell:master Jun 22, 2024
24 of 26 checks passed
@Bodigrim
Copy link
Contributor

Thanks a ton!

@Lysxia Lysxia changed the title Integrate utf8 hPutStr to standard hPutStr Integrate UTF-8 hPutStr to standard hPutStr Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants