tim-group/iterata

iterata

Useful extensions to Scala's Iterator. Think errata for iterators.

Installation

Using SBT:

libraryDependencies += "com.timgroup" %% "iterata" % "0.1.6"

Or download the jar directly from maven central.

Iterata is currently published for Scala 2.11 only. Please let us know if you'd like a build for a different Scala version.

Usage

1. Parallel processing iterator: #par()

Use the #par() method to add parallelism when processing an Iterator with functions chained via #map and #flatMap. It will eagerly evaluate the underlying iterator in chunks, and then evaluate the functions on each chunk via the Scala Parallel Collections. For example:

scala> import com.timgroup.iterata.ParIterator.Implicits._
scala> val it = (1 to 100000).iterator.par().map(n => (n + 1, Thread.currentThread.getId))
scala> it.map(_._2).toSet.size
res2: Int = 8 // addition was distributed over 8 threads

You can provide a specific chunk size, for example it.par(100).

Note that only the following Iterator methods are implemented (so far) to make use of parallel collections:

  • #map
  • #flatMap
  • #filter
  • #find
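The chunk-then-parallelize idea described above can be sketched without iterata at all. The following is a minimal illustration, not the library's implementation: `ChunkedParSketch` and `chunkedParMap` are hypothetical names, and `scala.concurrent.Future` stands in for the Scala Parallel Collections so that the sketch needs no extra dependency on any Scala version.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical sketch, not part of iterata.
object ChunkedParSketch {

  // Reads `it` one chunk at a time, applies `f` to the elements of each
  // chunk concurrently, and emits the results in the original order.
  // A chunk is only read when the resulting iterator is advanced into it.
  def chunkedParMap[A, B](it: Iterator[A], chunkSize: Int)(f: A => B): Iterator[B] =
    it.grouped(chunkSize).flatMap { chunk =>
      val pending = chunk.map(a => Future(f(a)))           // one task per element
      Await.result(Future.sequence(pending), Duration.Inf) // wait for the whole chunk
    }

  def main(args: Array[String]): Unit = {
    val squares = chunkedParMap((1 to 10).iterator, chunkSize = 4)(n => n * n).toList
    println(squares) // List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
  }
}
```

A larger chunk size gives each batch more work to spread across threads, at the cost of reading more of the underlying iterator ahead of the consumer.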

Grouped vs Ungrouped

The #par() method is available on any iterator, and takes an optional chunk size parameter. However, if you already have a GroupedIterator, you can simply call #par since it is already grouped. For example:

scala> val it = (1 to 100000).iterator.grouped(4).par
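To see what "already grouped" means, it may help to look at what the standard library's `grouped` produces on its own. This small sketch uses only the Scala standard library; `GroupedDemo` is a hypothetical name used for illustration.

```scala
// Hypothetical demo object; uses only the standard library.
object GroupedDemo {
  def main(args: Array[String]): Unit = {
    // grouped(4) yields chunks of up to 4 elements; the last chunk may be shorter.
    val chunks = (1 to 10).iterator.grouped(4).map(_.toList).toList
    println(chunks) // List(List(1, 2, 3, 4), List(5, 6, 7, 8), List(9, 10))
  }
}
```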

2. Memoize exhaustion iterator: #memoizeExhaustion

Use the #memoizeExhaustion method to wrap an Iterator so that its #hasNext method will not be called again after returning false. This is useful in cases where it is expensive to check if there is a next element, such as when I/O is involved.

This can serve as a workaround for SI-9623, where, after concatenating two iterators with ++, the left iterator's #hasNext is called twice for every call to the right iterator's #next().

scala> import com.timgroup.iterata.MemoizeExhaustionIterator.Implicits._
scala> val it1 = new IteratorWithExpensiveHasNext()
scala> val it2 = new IteratorWithExpensiveHasNext()
scala> (it1.memoizeExhaustion ++ it2).foreach(_ => ())
scala> it1.numTimesHasNextReturnedFalse
res2: Int = 1
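The transcript above relies on an IteratorWithExpensiveHasNext class that this README does not define. The sketch below shows one way the idea could look in plain Scala. Both `CountingIterator` (a stand-in that counts #hasNext calls) and `MemoizedExhaustion` (a wrapper that remembers exhaustion) are hypothetical names, and the wrapper is an illustration of the technique rather than iterata's actual implementation.

```scala
// Hypothetical stand-in for an iterator with an expensive hasNext:
// it counts how often hasNext reaches the underlying iterator.
final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
  var hasNextCalls = 0
  def hasNext: Boolean = { hasNextCalls += 1; underlying.hasNext }
  def next(): A = underlying.next()
}

// Hypothetical wrapper, not iterata's code: once the underlying hasNext
// has returned false, it is never consulted again.
final class MemoizedExhaustion[A](underlying: Iterator[A]) extends Iterator[A] {
  private var exhausted = false

  def hasNext: Boolean = !exhausted && {
    val more = underlying.hasNext
    if (!more) exhausted = true // remember exhaustion
    more
  }

  def next(): A =
    if (exhausted) throw new NoSuchElementException("next on exhausted iterator")
    else underlying.next()
}

object MemoizeDemo {
  def main(args: Array[String]): Unit = {
    val counting = new CountingIterator(Iterator(1, 2, 3))
    val memoized = new MemoizedExhaustion(counting)
    while (memoized.hasNext) memoized.next()

    val callsAtExhaustion = counting.hasNextCalls
    assert(!memoized.hasNext) // answered from the memoized flag
    assert(!memoized.hasNext)
    assert(counting.hasNextCalls == callsAtExhaustion) // underlying not asked again
  }
}
```

Note that `next()` deliberately does not call `hasNext` itself, so the wrapper adds no extra calls to the expensive check while elements remain.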
