Useful extensions to Scala's Iterator. Think errata for iterators.
Using SBT:
libraryDependencies += "com.timgroup" %% "iterata" % "0.1.6"
Or download the jar directly from Maven Central.
Iterata is currently published for Scala 2.11 only. Please let us know if you'd like a build for a different Scala version.
Use the #par() method to add parallelism when processing an Iterator with functions chained via #map and #flatMap. It will eagerly evaluate the underlying iterator in chunks, and then evaluate the functions on each chunk via the Scala Parallel Collections. For example:
scala> import com.timgroup.iterata.ParIterator.Implicits._
scala> val it = (1 to 100000).iterator.par().map(n => (n + 1, Thread.currentThread.getId))
scala> it.map(_._2).toSet.size
res2: Int = 8 // addition was distributed over 8 threads
You can provide a specific chunk size, for example it.par(100).
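The chunk-then-parallelise behaviour described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not iterata's implementation: it uses scala.concurrent.Future in place of the Scala Parallel Collections so that it compiles on any Scala version, and the names ChunkedParSketch and parMapChunked are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ChunkedParSketch {
  // Hypothetical helper, not part of iterata.
  // Eagerly pulls up to `chunkSize` elements from `it`, applies `f` to them
  // concurrently (Futures stand in for parallel collections here), then
  // emits the results of that chunk in their original order.
  def parMapChunked[A, B](it: Iterator[A], chunkSize: Int)(f: A => B): Iterator[B] =
    it.grouped(chunkSize).flatMap { chunk =>
      val futures = chunk.map(a => Future(f(a)))
      // Block until the whole chunk is done before moving to the next one.
      Await.result(Future.sequence(futures), Duration.Inf)
    }
}
```

Calling ChunkedParSketch.parMapChunked((1 to 10).iterator, 4)(_ * 2) yields the doubled values in order. A larger chunk size allows more work to run concurrently, at the cost of pulling more elements from the underlying iterator before any result is available.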
Note that only the following Iterator methods are implemented (so far) to make use of parallel collections:
#map
#flatMap
#filter
#find
The #par() method is available on any iterator, and takes an optional chunk size parameter. However, if you already have a GroupedIterator, you can simply call #par since it is already grouped. For example:
scala> val it = (1 to 100000).iterator.grouped(4).par
Use the #memoizeExhaustion method to wrap an Iterator so that its #hasNext method will not be called again after returning false. This is useful in cases where it is expensive to check if there is a next element, such as when I/O is involved.
It can also serve as a workaround for SI-9623, where, after concatenating two iterators with ++, the left iterator's #hasNext is called twice for every call to the right iterator's #next(). For example:
scala> import com.timgroup.iterata.MemoizeExhaustionIterator.Implicits._
scala> val it1 = new IteratorWithExpensiveHasNext()
scala> val it2 = new IteratorWithExpensiveHasNext()
scala> (it1.memoizeExhaustion ++ it2).foreach(_ => ())
scala> it1.numTimesHasNextReturnedFalse
res2: Int = 1
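The wrapping idea behind #memoizeExhaustion can be sketched as a small decorator that remembers the first false returned by #hasNext. This is a minimal illustration, not iterata's actual class: MemoizeExhaustionSketch, MemoizedExhaustion and CountingIterator are hypothetical names introduced here.

```scala
object MemoizeExhaustionSketch {
  // Hypothetical wrapper: once the underlying hasNext has returned false,
  // it is never consulted again.
  final class MemoizedExhaustion[A](underlying: Iterator[A]) extends Iterator[A] {
    private[this] var exhausted = false

    def hasNext: Boolean =
      !exhausted && {
        val more = underlying.hasNext
        if (!more) exhausted = true
        more
      }

    def next(): A =
      if (exhausted) throw new NoSuchElementException("next on exhausted iterator")
      else underlying.next()
  }

  // Hypothetical helper for the demonstration: counts how many times
  // the wrapped iterator's hasNext returned false.
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    var falseCount = 0
    def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more) falseCount += 1
      more
    }
    def next(): A = underlying.next()
  }
}
```

Wrapping a CountingIterator in MemoizedExhaustion, draining it, and then calling hasNext again leaves falseCount at 1, because the repeated calls are answered from the remembered flag rather than the underlying iterator.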