-
Notifications
You must be signed in to change notification settings - Fork 1
Performance tips
Do not use getLine
, read
, readLn
. They are too slow and can cause TLE as soon as the input becomes moderately large.
In competitive programming the input format is well-defined as ASCII text with space-separated values. Based on that, a simple option that works well is line-by-line input using strict ByteString
s from bytestring
.
import Data.List (unfoldr)
import qualified Data.ByteString.Char8 as C
-- Reads one line of space-separated Ints
-- (<=' ') is a quick way to drop whitespace chars
getInts :: IO [Int]
getInts = unfoldr (C.readInt . C.dropWhile (<=' ')) <$> C.getLine
Sometimes, you want input to be faster, to be as fast as possible. The fastest way I have found is to read the full input into a ByteString
and then consume it as needed. There is a convenient implementation of this in haccepted's Scanner.hs.
Output in competitive programming is usually much smaller compared to input.
show
and putStrLn
are fast and work well enough.
The fastest way to output that I know of is using Data.ByteString.Builder
.
Input method | |
---|---|
getLine and read
|
3.944s |
ByteString line-by-line |
0.339s |
ByteString at once (Scanner ) |
0.184s |
Output method | |
---|---|
putStrLn |
0.574s |
ByteString Builder
|
0.158s |
Data.List.sort
is very slow. A simple
faster alternative is using Data.Sequence.sort/unstableSort
.
import Data.Foldable (toList)
import qualified Data.Sequence as Seq
sort :: Ord a => [a] -> [a]
sort = toList . Seq.unstableSort . Seq.fromList
If you need to be even faster and the vector-algorithms
package
is available, convert to a mutable unboxed/boxed vector and sort using an algorithm of choice.
If vector-algorithms
is not available (which is usually the case), you can use the mergesort implementations in haccepted's
Sort.hs.
Data.List.sort
is lazy, so a rare situation in which it can be the best choice is when you want to consume a small number of elements from
the front of the sorted list.
A summary of some sorting methods benchmarked on a list of length
Source
[Int] |
[(Int, Int)] |
|
---|---|---|
Data.List.sort | 246.3 ms | 274.6 ms |
Seq sort | 169.8 ms | 202.0 ms |
IntMap sort | 147.1 ms | N/A |
Map sort | 269.9 ms | |
Unboxed vector sort | 20.01 ms | 34.96 ms |
haccepted unboxed mergesort | 35.72 ms | N/A |
haccepted boxed mergesort | 106.1 ms |
Use the vector
package if available.
If it is not available, you can use the array
package that comes with GHC. Arrays from array
are less convenient to work with, but they do the job. There are two primary issues with array
:
- Very limited functions to operate on arrays, just the basic stuff
- Very cumbersome to define unboxed arrays for your own types
There is some stuff in haccepted's Array.hs to help with point 2.
array
also supports multidimensional arrays. This is one, and maybe the only, situation in which array
might be preferable over vector
. Of course, with vector
one can make arrays of arrays, but this can be inconvenient and slower.
Always use unboxed arrays over boxed arrays if you have the choice. It will be multiple times faster.
Sometimes you will need boxed arrays, such as when
- Your array element type is not a small fixed size. You may want an array of
Set
s, for instance. - Your array elements are generated by indexing the array they are in. You need the laziness of boxed arrays to make this recursive behavior work. This is an easy way to implement dynamic programming.
Make an array of Int
s and calculate the sum.
array boxed |
168.9 ms |
array unboxed |
9.858 ms |
vector boxed |
162.6 ms |
vector unboxed |
6.877 ms |
This can be encountered when writing nested loops, such as for dynamic programming or Floyd-Warshall. Consider the following code, which appears fine.
import Control.Monad.ST
import Data.Array.ST
import Data.Foldable
nestedLoops :: Int -> Int -> Int
nestedLoops n m = runST $ do
a <- newArray (0, 0) 0 :: ST s (STUArray s Int Int)
for_ [1..n] $ \i ->
for_ [1..m] $ \j ->
writeArray a 0 $ i + j
readArray a 0
Unfortunately it runs poorly.
We expect the compiler to turn the lists [1..n]
and [1..m]
into simple loops, but here it decides to share the inner list [1..m]
.
Note that this only happens if the inner list can be shared. There would no problem if the inner list was [i..m]
, for instance.
There are two ways to avoid this as far as I know.
One is to simply not use a list. With something like the loop
combinator below, for_ [1..m]
can replaced with loop 1 m
.
loop :: Applicative f => Int -> Int -> (Int -> f a) -> f ()
loop l r f = go l where go i = when (i <= r) (f i *> go (i + 1))
Another way is to compile with -fno-full-laziness
, by putting {-# OPTIONS_GHC -fno-full-laziness #-}
at the top of the file. This will apply to the entire file, so
it can affect other places where you may want this optimization.
The possibly negative effect of -ffull-laziness
is a known issue. See
Consider that you want to strict fold over a list of values after traversing it using traverse
, for
or sequenceA
(or their monadic versions). As a simple example, you want the sum of the elements in a mutable array of Int
s.
sum <$> traverse (readArray a) [1..n]
This is... not good. It is much more performant to foldM
directly.
foldM (\acc i -> (acc+) <$!> readArray a i) 0 [1..n]
Since this comes up once in a while, we can have a convenient definition:
foldMComp :: Monad m => (b -> a -> b) -> (c -> m a) -> b -> c -> m b
foldMComp f g = \z x -> f z <$!> g x
And use it as:
foldM ((+) `foldMComp` readArray a) 0 [1..n]
Data.Array.MArray
offers the functions
getElems :: (MArray a e m, Ix i) => a i e -> m [e]
getAssocs :: (MArray a e m, Ix i) => a i e -> m [(i, e)]
Be careful when using them, because they suffer from the same issue.
If you're using mutable vectors from vector
, you can fold over them directly using provided functions, so that's great.
sum <$> mapM (readArray a) [1..n] |
395.0 ms |
foldM (\acc i -> (acc+) <$!> readArray a i) 0 [1..n] |
81.04 ms |
foldM ((+) `foldMComp` readArray a) 0 [1..n] |
81.71 ms |
Vector.Unboxed.Mutable.foldl' (+) 0 a |
29.36 ms |