Skip to content

Performance tips

Soumik Sarkar edited this page Jun 11, 2023 · 9 revisions

Input and output

Input

Do not use getLine, read, readLn. They are too slow and can cause TLE as soon as the input becomes moderately large.

In competitive programming the input format is well-defined as ASCII text with space-separated values. Based on that, a simple option that works well is line-by-line input using strict ByteStrings from bytestring.

import Data.List (unfoldr)
import qualified Data.ByteString.Char8 as C

-- Reads one line of space-separated Ints
-- (<=' ') is a quick way to drop whitespace chars
getInts :: IO [Int]
getInts = unfoldr (C.readInt . C.dropWhile (<=' ')) <$> C.getLine

Sometimes, you want input to be faster, to be as fast as possible. The fastest way I have found is to read the full input into a ByteString and then consume it as needed. There is a convenient implementation of this in haccepted's Scanner.hs.

Output

Output in competitive programming is usually much smaller compared to input.

show and putStrLn are fast and work well enough.

The fastest way to output that I know of is using Data.ByteString.Builder.

Benchmarks

Source

Input method
getLine and read 3.944s
ByteString line-by-line 0.339s
ByteString at once (Scanner) 0.184s
Output method
putStrLn 0.574s
ByteString Builder 0.158s

Sorting

Data.List.sort is very slow. A simple faster alternative is using Data.Sequence.sort/unstableSort.

import Data.Foldable (toList)
import qualified Data.Sequence as Seq

sort :: Ord a => [a] -> [a]
sort = toList . Seq.unstableSort . Seq.fromList

If you need to be even faster and the vector-algorithms package is available, convert to a mutable unboxed/boxed vector and sort using an algorithm of choice.
If vector-algorithms is not available (which is usually the case), you can use the mergesort implementations in haccepted's Sort.hs.

Data.List.sort is lazy, so a rare situation in which it can be the best choice is when you want to consume a small number of elements from the front of the sorted list.

Benchmarks

A summary of some sorting methods benchmarked on a list of length $2\cdot{10}^5$:
Source

[Int] [(Int, Int)]
Data.List.sort 246.3 ms 274.6 ms
Seq sort 169.8 ms 202.0 ms
IntMap sort 147.1 ms N/A
Map sort 269.9 ms
Unboxed vector sort 20.01 ms 34.96 ms
haccepted unboxed mergesort 35.72 ms N/A
haccepted boxed mergesort 106.1 ms

Arrays

vector vs array

Use the vector package if available.

If it is not available, you can use the array package that comes with GHC. Arrays from array are less convenient to work with, but they do the job. There are two primary issues with array:

  1. Very limited functions to operate on arrays, just the basic stuff
  2. Very cumbersome to define unboxed arrays for your own types

There is some stuff in haccepted's Array.hs to help with point 2.

array also supports multidimensional arrays. This is one, and maybe the only, situation in which array might be preferable over vector. Of course, with vector one can make arrays of arrays, but this can be inconvenient and slower.

Boxed vs unboxed

Always use unboxed arrays over boxed arrays if you have the choice. It will be multiple times faster.

Sometimes you will need boxed arrays, such as when

  • Your array element type is not a small fixed size. You may want an array of Sets, for instance.
  • Your array elements are generated by indexing the array they are in. You need the laziness of boxed arrays to make this recursive behavior work. This is an easy way to implement dynamic programming.

Benchmarks

Make an array of Ints and calculate the sum.

Source

array boxed 168.9 ms
array unboxed 9.858 ms
vector boxed 162.6 ms
vector unboxed 6.877 ms

Unwanted sharing

This can be encountered when writing nested loops, such as for dynamic programming or Floyd-Warshall. Consider the following code, which appears fine.

import Control.Monad.ST
import Data.Array.ST
import Data.Foldable

nestedLoops :: Int -> Int -> Int
nestedLoops n m = runST $ do
    a <- newArray (0, 0) 0 :: ST s (STUArray s Int Int)
    for_ [1..n] $ \i ->
        for_ [1..m] $ \j ->
            writeArray a 0 $ i + j
    readArray a 0

Unfortunately it runs poorly.

We expect the compiler to turn the lists [1..n] and [1..m] into simple loops, but here it decides to share the inner list [1..m]. Note that this only happens if the inner list can be shared. There would no problem if the inner list was [i..m], for instance.

There are two ways to avoid this as far as I know.
One is to simply not use a list. With something like the loop combinator below, for_ [1..m] can replaced with loop 1 m.

loop :: Applicative f => Int -> Int -> (Int -> f a) -> f ()
loop l r f = go l where go i = when (i <= r) (f i *> go (i + 1))

Another way is to compile with -fno-full-laziness, by putting {-# OPTIONS_GHC -fno-full-laziness #-} at the top of the file. This will apply to the entire file, so it can affect other places where you may want this optimization.

The possibly negative effect of -ffull-laziness is a known issue. See

Fold after traverse

Consider that you want to strict fold over a list of values after traversing it using traverse, for or sequenceA (or their monadic versions). As a simple example, you want the sum of the elements in a mutable array of Ints.

sum <$> traverse (readArray a) [1..n]

This is... not good. It is much more performant to foldM directly.

foldM (\acc i -> (acc+) <$!> readArray a i) 0 [1..n]

Since this comes up once in a while, we can have a convenient definition:

foldMComp :: Monad m => (b -> a -> b) -> (c -> m a) -> b -> c -> m b
foldMComp f g = \z x -> f z <$!> g x

And use it as:

foldM ((+) `foldMComp` readArray a) 0 [1..n]

Sidenote on mutable vectors

Data.Array.MArray offers the functions

getElems :: (MArray a e m, Ix i) => a i e -> m [e] 
getAssocs :: (MArray a e m, Ix i) => a i e -> m [(i, e)] 

Be careful when using them, because they suffer from the same issue.

If you're using mutable vectors from vector, you can fold over them directly using provided functions, so that's great.

Benchmarks

Source

sum <$> mapM (readArray a) [1..n] 395.0 ms
foldM (\acc i -> (acc+) <$!> readArray a i) 0 [1..n] 81.04 ms
foldM ((+) `foldMComp` readArray a) 0 [1..n] 81.71 ms
Vector.Unboxed.Mutable.foldl' (+) 0 a 29.36 ms