-
Notifications
You must be signed in to change notification settings - Fork 0
/
streaming.cabal
255 lines (243 loc) · 15 KB
/
streaming.cabal
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
name: streaming
version: 0.1.4.5
cabal-version: >=1.10
build-type: Simple
synopsis: an elementary streaming prelude and general stream type.
description: This package contains two modules, <http://hackage.haskell.org/package/streaming/docs/Streaming.html Streaming>
and <http://hackage.haskell.org/package/streaming/docs/Streaming-Prelude.html Streaming.Prelude>.
The principal module, <http://hackage.haskell.org/package/streaming-0.1.4.3/docs/Streaming-Prelude.html Streaming.Prelude>, exports an elementary streaming prelude focused on
a simple \"source\" or \"producer\" type, namely @Stream (Of a) m r@.
This is a sort of effectful version of
@([a],r)@ in which successive elements of type @a@ arise from some sort of monadic
action before the succession ends with a value of type @r@.
Everything in the library is organized to make
programming with this type as simple as possible,
by the simple expedient of making it as close to @Prelude@
and @Data.List@ as possible. Thus for example
the trivial program
.
> >>> S.sum $ S.take 3 (S.readLn :: Stream (Of Int) IO ())
> 1<Enter>
> 2<Enter>
> 3<Enter>
> 6 :> ()
.
sums the first three valid integers from user input. Similarly,
.
> >>> S.stdoutLn $ S.map (map toUpper) $ S.take 2 S.stdinLn
> hello<Enter>
> HELLO
> world!<Enter>
> WORLD!
.
upper-cases the first two lines from stdin as they arise,
and sends them to stdout. And so on,
with filtering, mapping, breaking, chunking, zipping, unzipping, replicating
and so forth:
we program with streams of @Int@s or @String@s directly as
if they constituted something like a list. That's because streams really do constitute something
like a list, and the associated operations can mostly have the same names.
(A few, like @reverse@, don't stream and thus disappear;
others like @unzip@ are here given properly streaming formulation for the first time.)
And we everywhere
oppose \"extracting a pure list from IO\",
which is the origin of typical Haskell memory catastrophes.
Basically any case where you are
tempted to use @mapM@, @replicateM@, @traverse@ or @sequence@
with Haskell lists, you would do better to use something like
@Stream (Of a) m r@. The type signatures are a little fancier, but
the programs themselves are mostly the same. /In fact, they are mostly simpler./ Thus,
consider the trivial demo program mentioned in
<http://stackoverflow.com/questions/24068399/haskell-performance-of-iorefs this SO question>
.
> main = mapM newIORef [1..10^8::Int] >>= mapM readIORef >>= mapM_ print
.
The new user notices that this exhausts memory, and worries about the efficiency of Haskell @IORefs@.
But of course it exhausts memory! Look what it says!
The problem is immediately cured by writing
.
> main = S.print $ S.mapM readIORef $ S.mapM newIORef $ S.each [1..10^8::Int]
.
which really does what the other program was meant to do,
uses no more memory than @hello-world@, /and is simpler anyway/, since it
doesn't involve the detour of \"extracting a list from IO\". Almost
every use of list @mapM@, @replicateM@, @traverse@ and @sequence@ produces
this problem on a smaller scale. People get used to it, as if it were
characteristic of Haskell programs to use a lot of memory. But in truth
\"extracting a list or sequence from IO\" is mostly just bad practice pure and simple.
Of course, @mapM@, @replicateM@, @traverse@ and @sequence@ make sense for lists,
under certain conditions! But @unsafePerformIO@ also makes sense under
certain conditions.
.
The <http://hackage.haskell.org/package/streaming-0.1.4.3/docs/Streaming.html Streaming> module exports the general type,
@Stream f m r@, which can be used to stream successive distinct
steps characterized by /any/
functor @f@, though we are mostly interested in organizing computations
of the form @Stream (Of a) m r@. The streaming-IO libraries have
various devices for dealing
with effectful variants of @[a]@ or @([a],r)@ in which the emergence of
successive elements somehow depends on IO. But it is only with
the general type @Stream f m r@, or some equivalent,
that one can envisage (for example) the connected streaming of their
sorts of stream - as one makes lists of lists in the Haskell
@Prelude@ and @Data.List@. One needs some such type if we are
to express properly streaming equivalents of e.g.
.
> group :: Ord a => [a] -> [[a]]
> chunksOf :: Int -> [a] -> [[a]]
> lines :: [Char] -> [[Char]] -- but similarly with byte streams, etc.
.
to mention a few obviously desirable operations.
(This is explained more elaborately in the <https://hackage.haskell.org/package/streaming#readme readme> below.)
.
One could throw of course throw something
like the present @Stream@ type on top of a prior stream concept: this is how @pipes@ and
@pipes-group@ (which are very much our model here) use @FreeT@.
But once one grasps the iterable stream concept needed to express
those functions then one will also see that,
with it, one is /already/ in possession of a complete
elementary streaming library - since one possesses @Stream ((,) a) m r@
or equivalently @Stream (Of a) m r@. This
is the type of a \'generator\' or \'producer\' or \'source\' or whatever
you call an effectful stream of items.
/The present Streaming.Prelude is thus the simplest streaming library that can replicate anything like the API of the Prelude and Data.List/.
.
The emphasis of the library is on interoperation; for
the rest its advantages are: extreme simplicity, re-use of
intuitions the user has gathered from mastery of @Prelude@ and
@Data.List@, and a total and systematic rejection of type synonyms.
The two conceptual pre-requisites are some
comprehension of monad transformers and some familiarity
with \'rank 2 types\'. It is hoped that experimentation with this
simple material, starting with the ghci examples in @Streaming.Prelude@,
will give people who are new to these concepts some
intuition about their importance. The most fundamental purpose of the
library is to express elementary streaming ideas without reliance on
a complex framework, but in a way that integrates transparently with
the rest of Haskell, using ideas - e.g. rank 2 types, which are here
implicit or explicit in most mapping - that the user can carry elsewhere,
rather than chaining her understanding to the curiosities of
a so-called streaming IO framework (as necessary as that is for certain purposes.)
.
See the
<https://hackage.haskell.org/package/streaming#readme readme>
below for further explanation, including the examples linked there.
Elementary usage can be divined from the ghci examples in
@Streaming.Prelude@ and perhaps from this rough beginning of a
<https://github.com/michaelt/streaming-tutorial/blob/master/tutorial.md tutorial>.
Note also the
<https://hackage.haskell.org/package/streaming-bytestring streaming bytestring>
and
<https://hackage.haskell.org/package/streaming-utils streaming utils>
packages. Questions about usage can be put
raised on StackOverflow with the tag @[haskell-streaming]@,
or as an issue on Github, or on the
<https://groups.google.com/forum/#!forum/haskell-pipes pipes list>
(the package understands itself as part of the pipes \'ecosystem\'.)
.
The simplest form of interoperation with
<http://hackage.haskell.org/package/pipes pipes>
is accomplished with this isomorphism:
.
> Pipes.unfoldr Streaming.next :: Stream (Of a) m r -> Producer a m r
> Streaming.unfoldr Pipes.next :: Producer a m r -> Stream (Of a) m r
.
Interoperation with
<http://hackage.haskell.org/package/io-streams io-streams>
is thus:
.
> Streaming.reread IOStreams.read :: InputStream a -> Stream (Of a) IO ()
> IOStreams.unfoldM Streaming.uncons :: Stream (Of a) IO () -> IO (InputStream a)
.
With
<http://hackage.haskell.org/package/conduit conduit>
one might use, e.g.:
.
> Conduit.unfoldM Streaming.uncons :: Stream (Of a) m () -> Source m a
> \str -> Streaming.mapM_ Conduit.yield (hoist lift str) :: Stream (Of o) m r -> ConduitM i o m r
> \src -> hoist lift str $$ Conduit.mapM_ Streaming.yield :: Source m a -> Stream (Of a) m ()
.
These conversions should never be more expensive than a single @>->@ or @=$=@.
The simplest interoperation with regular Haskell lists is provided by, say
.
> Streaming.each :: [a] -> Stream (Of a) m ()
> Streaming.toList_ :: Stream (Of a) m r -> m [a]
.
The latter of course accumulates the whole list in memory, and is mostly what we are trying
to avoid. Every use of @Prelude.mapM f@ should be reconceived as using the
composition @Streaming.toList_ . Streaming.mapM f . Streaming.each@ with a view to
considering whether the accumulation required by @Streaming.toList_@ is really necessary.
.
Here are the results of some
<https://gist.github.com/michaelt/96606bbf05b29bf43a05aba081dc9bd4#file-benchmachines-hs microbenchmarks>
based on the
<https://github.com/ekmett/machines/blob/master/benchmarks/Benchmarks.hs benchmarks>
included in the machines package:
.
<<http://i.imgur.com/YbQtlXm.png>>
.
Because these are microbenchmarks for individual functions,
they represent a sort of \"worst case\"; many other factors can influence
the speed of a complex program.
.
license: BSD3
license-file: LICENSE
author: michaelt
maintainer: what_is_it_to_do_anything@yahoo.com
stability: Experimental
homepage: https://github.com/michaelt/streaming
bug-reports: https://github.com/michaelt/streaming/issues
category: Data, Pipes, Streaming
extra-source-files: README.md
source-repository head
type: git
location: https://github.com/michaelt/streaming
library
exposed-modules: Streaming,
Streaming.Prelude,
Streaming.Internal,
Control.Applicative.LApplicative
Control.Monad.LMonad
Control.Monad.Morph.LMorph
Control.Monad.Trans.LClass
Data.Functor.LFunctor
Data.Functor.LOf
Data.Linear
-- other-modules:
other-extensions: RankNTypes, CPP,
StandaloneDeriving, FlexibleContexts,
DeriveDataTypeable, DeriveFoldable,
DeriveFunctor, DeriveTraversable,
UndecidableInstances, PartialTypeSignatures,
ExplicitForAll, GADTs,
RebindableSyntax, ScopedTypeVariables
build-depends: base >=4.6
, mtl >=2.1
, mmorph >=1.0
, transformers >=0.4
, transformers-base < 0.5
, resourcet > 1.1.0
, exceptions > 0.5
, monad-control >=0.3.1
, time
, ghc-prim
, containers
hs-source-dirs: src
default-language: Haskell2010
test-suite streaming-tests
type: exitcode-stdio-1.0
hs-source-dirs: test, src
main-is: Spec.hs
build-depends: base >=4.6
, mtl >=2.1
, mmorph >=1.0
, transformers >=0.4
, transformers-base < 0.5
, resourcet > 1.1.0
, exceptions > 0.5
, monad-control >=0.3.1
, time
, ghc-prim
, containers
, streaming
, HUnit