Text trie (#1)
* before removing previous

* remove original modules

* clean up test output

* update license, readme, etc.

* fix markdown tables

* performed a series of individual (benchmarked) optimizations: use lazy text when building text, use BangPatterns in several places, remove elemToNat, replace Word with Word16, add inlining stages, refactor highestBitMask to use let, inline several uses of breakMaximalPrefix; update tests accordingly, added benchmark summary generator (with results), added encoding to UTF16 benchmark

* move benchmarks from README to bench.md

* replace travis with minimal stack (single LTS) test

* add missing copyright notices, remove 'Tested-With'
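
A few of the optimizations listed above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from this commit: `highestBitMask` below is reconstructed only from the commit message's description (a `let`-bound `Word16` version with bang patterns), and `joinChunks` is a hypothetical helper showing what building a result through lazy text can look like.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (shiftR, xor, (.|.))
import Data.Word (Word16)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- | Keep only the highest set bit of a 16-bit word.
-- Assumption: a sketch of the bit-smearing technique, not the
-- package's actual definition.
highestBitMask :: Word16 -> Word16
highestBitMask x0 =
  let !x1 = x0 .|. shiftR x0 1   -- smear the top bit downwards ...
      !x2 = x1 .|. shiftR x1 2
      !x3 = x2 .|. shiftR x2 4
      !x4 = x3 .|. shiftR x3 8   -- ... until every lower bit is set
  in x4 `xor` shiftR x4 1        -- clear everything below the top bit
{-# INLINE highestBitMask #-}

-- | Hypothetical helper: concatenate strict chunks through a Builder,
-- producing lazy Text instead of repeatedly appending strict Text.
joinChunks :: [T.Text] -> TL.Text
joinChunks = B.toLazyText . foldMap B.fromText

main :: IO ()
main = do
  print (highestBitMask 0x0500)                 -- prints 1024
  print (joinChunks (map T.pack ["ab", "cd"]))  -- prints "abcd"
```

The bang patterns force each intermediate word as it is bound, and the `Builder` defers allocation until `toLazyText` runs; whether either pays off in a given call site is what the benchmarks in bench.md are for.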
michaeljklein authored Apr 11, 2019
1 parent a90f48e commit 67b9180
Showing 31 changed files with 1,781 additions and 1,365 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
.DS_Store

# Ignore Haskell building stuff
.stack-work/
.cabal-sandbox/
cabal.sandbox.config
cabal-dev/
185 changes: 16 additions & 169 deletions .travis.yml
@@ -1,178 +1,25 @@
# TODO: replace this file with one auto-generated by
# <https://github.com/haskell-CI/haskell-ci>
# (But in a way that doesn't prevent us from setting -O3 for
# src/Data/Trie/ByteStringInternal/indexOfDifference.c if we so choose)
# https://stackoverflow.com/a/38943257/5154287
sudo: false

language: c

# <http://docs.travis-ci.com/user/languages/haskell/>
# <https://github.com/ZeusWPI/12Urenloop/blob/master/.travis.yml>
# <http://docs.travis-ci.com/user/build-configuration/>

# This language field gives us a "no language set" thing, but if
# we leave it off then it goes through the trouble of installing
# some default ruby environment. Having it here doesn't seem to
# hurt; unlike what hvr/multi-ghc-travis suggests.
language: haskell

# Alas, we need sudo to use hvr's PPA. So we can't use the
# container-based infrastructure, and so we can't cache things :(
# <http://docs.travis-ci.com/user/workers/container-based-infrastructure/>
# <http://docs.travis-ci.com/user/caching/>
#sudo: false
#cache:
# directories:
# - $HOME/.cabal
# - $HOME/.ghc
# We could consider signing up for Amazon S3 to do it...
# <http://looprecur.com/blog/haskell-types-tests-and-fast-feedback/>


# The only versions natively supported by TravisCI are (7.0, 7.4,
# 7.6, 7.8). Other versions (6.8, 6.10, 6.12; 7.2; 7.10) are not
# supported, and patchlevels cannot be secified.
# <http://docs.travis-ci.com/user/ci-environment/#Haskell-VM-images>
#
# So instead, we use <https://github.com/hvr/multi-ghc-travis>
# However, note that these are only for Ubuntu Linux 12.04 LTS
# 64-bit. Will have to come up with something fancy in order to
# also test 32-bit, Windows, and OSX...
env:
# The base/bytestring library versions are from:
# <https://ghc.haskell.org/trac/ghc/wiki/Commentary/Libraries/VersionHistory>
# Coincidentally, base-4.5.0.0 is the lower bound for Tasty :)
# base bytestring binary
- CABALVER=1.16 GHCVER=7.4.1 # 4.5.0.0 0.9.2.1 0.5.1.0
- CABALVER=1.16 GHCVER=7.4.2 # 4.5.1.0 --''-- --''--
- CABALVER=1.16 GHCVER=7.6.1 # 4.6.0.0 0.10.0.0 0.5.1.1
- CABALVER=1.16 GHCVER=7.6.2 # 4.6.0.1 0.10.0.2 --''--
- CABALVER=1.18 GHCVER=7.6.3 # --''-- --''-- --''--
- CABALVER=1.18 GHCVER=7.8.1 # 4.7.0.0 0.10.4.0 0.7.1.0
- CABALVER=1.18 GHCVER=7.8.2 # --''-- --''-- --''--
- CABALVER=1.18 GHCVER=7.8.3 # 4.7.0.1 --''-- --''--
- CABALVER=1.18 GHCVER=7.8.4 # 4.7.0.2 --''-- --''--
- CABALVER=1.22 GHCVER=7.10.1 # 4.8.0.0 0.10.6.0 0.7.3.0
- CABALVER=1.22 GHCVER=7.10.2 # 4.8.1.0 --''-- 0.7.5.0
- CABALVER=1.22 GHCVER=7.10.3 # 4.8.2.0 --''-- --''--
- CABALVER=1.24 GHCVER=8.0.1 # 4.9.0.0 0.10.8.1 0.8.3.0
- CABALVER=1.24 GHCVER=8.0.2 # 4.9.1.0 --''-- --''--
- CABALVER=2.0 GHCVER=8.2.1 # 4.10.0.0 0.10.8.2 0.8.5.1
- CABALVER=2.0 GHCVER=8.2.2 # 4.10.1.0 --''-- --''--
- CABALVER=2.2 GHCVER=8.4.1 # 4.11.0.0 --''-- --''--
- CABALVER=2.2 GHCVER=8.4.2 # 4.11.1.0 --''-- --''--
- CABALVER=2.2 GHCVER=8.4.3 # --''-- --''-- --''--
- CABALVER=2.4 GHCVER=8.6.1 # 4.12.0.0 --''-- 0.8.6.0
- CABALVER=2.4 GHCVER=8.6.2 # --''-- --''-- --''--
cache:
directories:
- ~/.stack

addons:
apt:
packages:
- libgmp-dev

before_install:
# If $GHCVER is the one travis has, don't bother reinstalling it.
# We can also have faster builds by installing some libraries with
# `apt`. If it isn't, install the GHC we want from hvr's PPA along
# with cabal-1.18. This trick was taken from lens
# cf., <https://github.com/ekmett/lens/blob/master/.travis.yml>
- |
if [ $GHCVER = `ghc --numeric-version` ]; then
# Try installing some of the build-deps with apt-get for speed.
travis/cabal-apt-install --enable-tests $MODE
export CABAL=cabal
else
# Install the GHC we want from hvr's PPA
travis_retry sudo add-apt-repository -y ppa:hvr/ghc
travis_retry sudo apt-get update
travis_retry sudo apt-get install cabal-install-$CABALVER ghc-$GHCVER
export PATH=/opt/ghc/$GHCVER/bin:/opt/cabal/$CABALVER/bin:$PATH
# HACK: For some reason CABALVER>=1.24 uses the bare name
# 'cabal' rather than the versioned name. So we try to
# autodetect and autocorrect for that anomaly.
if command -v cabal-$CABALVER >/dev/null 2>&1 ; then
# The usual/old case, for CABALVER<1.24
export CABAL=cabal-$CABALVER
elif command -v cabal >/dev/null 2>&1 ; then
# Found something called 'cabal', but should double-check
# that it's the right one. (Maybe I'm being excessively
# paranoid?)
ACTUAL=$(cabal --version | head -1)
EXPECTED="cabal-install version $CABALVER"
if [[ $ACTUAL =~ $EXPECTED ]] ; then
export CABAL=cabal
else
echo 2>&1 "I found something called cabal, but not the one I expected..."
cabal --version 2>&1
exit 1
fi
else
echo >&2 "Why can't I find $CABAL?!"
exit 1
fi
fi
# The standard configuration for cabal-1.16 gives a horribly
# obscure error message because it cannot parse the "jobs: $ncpus"
# line that's there by default. So we're fixing that.
# cf., <https://ghc.haskell.org/trac/ghc/ticket/7324>
- |
if [ -e ~/.cabal/config ]; then
echo 'Fixing the ~/.cabal/config for Cabal-1.16'
mv ~/.cabal/config{,.bak} && grep -v '^[[:space:]]*jobs:' ~/.cabal/config.bak > ~/.cabal/config
# TODO: Try to remove these other warnings?
# Warning: /home/travis/.cabal/config: Unrecognized stanza on line 117
# /home/travis/.cabal/config: Unrecognized stanza on line 89
# /home/travis/.cabal/config: Unrecognized field extra-prog-path on line 37
fi
# Uncomment this line whenever hackage is down.
#- mkdir -p ~/.cabal && cp travis/config ~/.cabal/config && $CABAL update
- $CABAL update

# TODO: for use with <https://github.com/guillaume-nargeot/hpc-coveralls>
# TODO: figure out how to do the appropriate version test...
#- |
# if [ $CABALVER -ge 1.22 ]; then
# export ENABLE_COVERAGE='--enable-coverage'
# else
# export ENABLE_COVERAGE='--enable-library-coverage'
# fi

# Download and unpack the stack executable
- mkdir -p ~/.local/bin
- export PATH=$HOME/.local/bin:$PATH
- travis_retry curl -L https://www.stackage.org/stack/linux-x86_64 | tar xz --wildcards --strip-components=1 -C ~/.local/bin '*/stack'

install:
- $CABAL --version
- echo "$(ghc --version) [$(ghc --print-project-git-commit-id 2> /dev/null || echo '?')]"
- travis_retry $CABAL update
# TODO: make the --enable-benchmarks flag work
- $CABAL install --only-dependencies --enable-tests
# TODO: make the --enable-benchmarks flag work
- $CABAL configure -v2 --enable-tests
- stack --no-terminal --install-ghc test --only-dependencies


# Here starts the actual work to be performed for the package under
# test; any command which exits with a non-zero exit code causes
# the build to fail.
script:
- $CABAL build
- $CABAL test --show-details=always
# Not passing --hyperlink-source, unless we want to install hscolour>=1.8
# TODO: how to get this to throw an error if we don't have 100% coverage?
- $CABAL haddock
# We ignore the return result of check, because it will warn about
# us demanding -O2 and there's no way to tell it that yes we
# really do want that.
- $CABAL check || true
# tests that a source-distribution can be generated
- $CABAL sdist
# check that the generated source-distribution can be built & installed
- |
export SRC_TGZ=$(cabal info . | awk '{print $2 ".tar.gz";exit}')
cd dist/
if [ -f "$SRC_TGZ" ]; then
cabal install --force-reinstalls "$SRC_TGZ"
else
echo "expected '$SRC_TGZ' not found"
exit 1
fi
# TODO: additional checks:
# * Check for code-smell via hlint
# * Check for build-depends excluding latest package versions with packdeps
# * Check for unused build-depends with packunused
# * Check for 100% Haddock coverage
# * Check for trailing whitespaces and/or tabs in source files
- stack --no-terminal test --haddock --no-haddock-deps
6 changes: 5 additions & 1 deletion AUTHORS
@@ -1,4 +1,8 @@
=== Haskell bytestring-trie package AUTHORS/THANKS file ===
=== Haskell text-trie package AUTHORS/THANKS file ===

The text-trie package was adapted from bytestring-trie by michael j. klein and is
released under the terms in the LICENSE file.


The bytestring-trie package was written by wren gayle romano and is
released under the terms in the LICENSE file. I would also like to
31 changes: 5 additions & 26 deletions CHANGELOG
@@ -1,26 +1,5 @@
0.2.5.0 (2019.02.25):
- Fixing things to compile under GHC 8.4 and 8.6.
- Adds Semigroup (Trie a) instance
- Removed the obsolete/unused "useCinternal" Cabal flag
0.2.4.3 (2019.02.24):
- Moved VERSION to CHANGELOG
- Fixing things to compile under GHC 8.0 and 8.2. N.B., still doesn't compile under 8.4 or 8.6, due to the version limit on `base`.
0.2.4.1 (2015.04.04):
- Data.Trie.Internal: adjusted imports to compule under GHC 7.10.1
0.2.4 (2014.10.09):
- added Data.Trie.Internal.{match_,matches_}, Data.Trie.Base.{match,matches}
0.2.3 (2010.02.12):
- added Data.Trie.Internal.alterBy_
- added Data.Trie.Internal.{contextualMap, contextualMap', contextualFilterMap, contextualMapBy}
- added Data.Trie.Convenience.{fromListWith', fromListWithL, fromListWithL'} as suggested by Ian Taylor
- added Data.Trie.Convenience{insertWith', insertWithKey', unionWith'}
- converted fmap, foldMap, traverse, and filterMap to worker/wrapper
0.2.2 (2010.06.10):
- Corrected a major bug in mergeBy, reported by Gregory Crosswhite
0.2.1.1 (2009.12.20):
- Added a VERSION file
0.2.1 (2009.02.13):
- Most recent release before adding a VERSION file

0.1.4 (2009.01.11):
- The only previous tag
0.2.5.0 (2019.04.02):
- Fixed things to compile with stack lts-13.15
- Modified ByteString cases to use Text
- Modified tests to use Text Trie (e.g. change `Ord` instance to the one produced by `toList16`)
- Modified functions that accept/return built `Text` to use `Data.Text.Lazy`
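
The `Ord` change noted in the changelog matters because comparing keys by UTF-16 code units does not always agree with the code-point order of `Text`'s own `Ord` instance. The sketch below assumes that `toList16` yields a key's UTF-16 code units (the name and the switch to `Word16` suggest this, but the diff does not show it); `utf16Units` and `compareUtf16` are hypothetical stand-ins written for this illustration only.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Ord (comparing)
import Data.Word (Word16)
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical stand-in for the package's `toList16`:
-- the UTF-16 code units of a Text, recovered from its big-endian encoding.
utf16Units :: T.Text -> [Word16]
utf16Units = pairUp . BS.unpack . TE.encodeUtf16BE
  where
    -- Combine each (high byte, low byte) pair into one 16-bit unit.
    pairUp (hi : lo : rest) =
      (fromIntegral hi `shiftL` 8 .|. fromIntegral lo) : pairUp rest
    pairUp _ = []

-- | Compare two keys by their UTF-16 code units.
compareUtf16 :: T.Text -> T.Text -> Ordering
compareUtf16 = comparing utf16Units

main :: IO ()
main = do
  let bmp    = T.singleton '\xFF5E'   -- one code unit: 0xFF5E
      astral = T.singleton '\x10000'  -- surrogate pair: 0xD800 0xDC00
  print (compare bmp astral)       -- LT: code-point order
  print (compareUtf16 bmp astral)  -- GT: 0xFF5E sorts after 0xD800
```

Keys outside the Basic Multilingual Plane are the only ones affected, which is why tests that check ordering need the unit-based comparison.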
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright (c) 2008--2013, wren gayle romano.
Copyright (c) 2008--2013, wren gayle romano, 2019 michael j. klein
ALL RIGHTS RESERVED.

Redistribution and use in source and binary forms, with or without
66 changes: 36 additions & 30 deletions README.md
@@ -1,11 +1,16 @@
bytestring-trie
text-trie
===============
[![Hackage version](https://img.shields.io/hackage/v/bytestring-trie.svg?style=flat)](https://hackage.haskell.org/package/bytestring-trie)
[![Hackage-Deps](https://img.shields.io/hackage-deps/v/bytestring-trie.svg?style=flat)](http://packdeps.haskellers.com/specific?package=bytestring-trie)
[![TravisCI Build Status](https://img.shields.io/travis/wrengr/bytestring-trie.svg?style=flat)](https://travis-ci.org/wrengr/bytestring-trie)
[![CircleCI Build Status](https://circleci.com/gh/wrengr/bytestring-trie.svg?style=shield&circle-token=b57517657c556be6fd8fca92b843f9e4cffaf8d1)](https://circleci.com/gh/wrengr/bytestring-trie)
[![Hackage version](https://img.shields.io/hackage/v/bytestring-trie.svg?style=flat)](https://hackage.haskell.org/package/text-trie)
[![TravisCI Build Status](https://img.shields.io/travis/michaeljklein/bytestring-trie.svg?style=flat)](https://travis-ci.org/michaeljklein/text-trie)

The bytestring-trie package provides an efficient implementation
The `text-trie` package is a lightweight adaptation of `bytestring-trie` to `Text`.

For the differences in performance, see [bench.md](https://github.com/michaeljklein/text-trie/blob/text-trie/bench.md).


## bytestring-trie

The [bytestring-trie](https://github.com/wrengr/bytestring-trie) package provides an efficient implementation
of tries mapping `ByteString` to values. The implementation is
based on Okasaki's big-endian patricia trees, à la `IntMap`. We
first trie on the elements of `ByteString` and then trie on the
@@ -28,37 +33,38 @@ and maximum keys, etc.)
This is a simple package and should be easy to install. You should
be able to use one of the following standard methods to install it.

-- With cabal-install and without the source:
$> cabal install bytestring-trie
```bash
-- With stack and without the source:
$> stack install text-trie

-- With cabal-install and with the source already:
$> cd bytestring-trie
$> cabal install
-- With stack and with the source already:
$> cd text-trie
$> stack install

-- Without cabal-install, but with the source already:
$> cd bytestring-trie
$> runhaskell Setup.hs configure --user
$> runhaskell Setup.hs build
$> runhaskell Setup.hs haddock --hyperlink-source
$> runhaskell Setup.hs copy
$> runhaskell Setup.hs register

The Haddock step is optional.
```


## Portability

The implementation is quite portable, relying only on a few basic
language extensions. The complete list of extensions used is:
The implementation only relies on a few basic
language extensions and `DeriveGeneric`. The complete list of extensions used is:

* `CPP`
* `MagicHash`
* `NoImplicitPrelude`
* `StandaloneDeriving`
* `DeriveGeneric`

* CPP
* MagicHash
* NoImplicitPrelude

## Links

* [Website](http://wrengr.org/)
* [Blog](http://winterkoninkje.dreamwidth.org/)
* [Twitter](https://twitter.com/wrengr)
* [Hackage](http://hackage.haskell.org/package/bytestring-trie)
* [GitHub](https://github.com/wrengr/bytestring-trie)
- [Hackage](http://hackage.haskell.org/package/text-trie)
- [GitHub](https://github.com/michaeljklein/text-trie)

- `bytestring-trie`
* [Website](http://wrengr.org/)
* [Blog](http://winterkoninkje.dreamwidth.org/)
* [Twitter](https://twitter.com/wrengr)
* [Hackage](http://hackage.haskell.org/package/bytestring-trie)
* [GitHub](https://github.com/wrengr/bytestring-trie)

52 changes: 0 additions & 52 deletions TODO

This file was deleted.
