This document describes a set of standards for all code under the Liqwid project. It also explains our reasoning for these choices, and acts as a living document of our practices for current and future contributors to the project. We intend for this document to evolve as our needs change, as well as act as a single point of truth for standards.
The desired outcomes from the prescriptions in this document are as follows.
Inconsistency is worse than any standard, as it requires us to track a large amount of case-specific information. Software development is already a difficult task due to the inherent complexities of the problems we seek to solve, as well as the inherent complexities foisted upon us by decades of bad historical choices we have no control over. For newcomers to a project and old hands alike, increased inconsistency translates to developmental friction, resulting in wasted time, frustration and ultimately, worse outcomes for the code in question.
To avoid putting ourselves into this boat, both currently and in the future, we must strive to be automatically consistent. Similar things should look similar; different things should look different; as much as possible, we must pick some rules and stick to them; and this has to be clear, explicit and well-motivated. This will ultimately benefit us, in both the short and the long term. The standards described here, as well as this document itself, is written with this foremost in mind.
There is a limited amount of space in a developer's skull; we all have bad days, and we forget things or make decisions that, perhaps, may not be ideal at the time. Therefore, limiting cognitive load is good for us, as it reduces the amount of trouble we can inflict due to said skull limitations. One of the worst contributors to cognitive load (after inconsistency) is non-local information: the requirement to have some understanding beyond the scope of the current unit of work. That unit of work can be a data type, a module, or even a whole project; in all cases, the more non-local information we require ourselves to hold in our minds, the less space that leaves for actually doing the task at hand, and the more errors we will introduce as a consequence.
Thus, we must limit the need for non-local information at all possible levels. 'Magic' of any sort must be avoided; as much locality as possible must be present everywhere; needless duplication of effort or result must be avoided. Thus, our work must be broken down into discrete, minimal, logical units, which can be analyzed, worked on, reviewed and tested in as much isolation as possible. This also applies to our external dependencies.
Thus, many of the decisions described here are oriented around limiting the amount of non-local knowledge required at all levels of the codebase. Additionally, we aim to avoid doing things 'just because we can' in a way that would be difficult for other Haskellers to follow, regardless of skill level.
Haskell is a language that is older than some of the people currently writing it, and parts of its ecosystem are not exempt from that age. With age comes legacy, and much of it is based on historical decisions which we now know to be problematic or wrong. We can't avoid our history, but we can minimize its impact on our current work.
Thus, we aim to codify good practices in this document as seen today. We also try to avoid obvious 'sharp edges' by proscribing them away in a principled, consistent and justifiable manner.
As developers, we should use our tools to make ourselves as productive as possible. There is no reason for us to do a task if a machine could do it for us, especially when this task is something boring or repetitive. We love Haskell as a language not least of all for its capability to abstract, to describe, and to make fun what other languages make dull or impossible; likewise, our work must do the same.
Many of the tool-related proscriptions and requirements in this document are driven by a desire to remove boring, repetitive tasks that don't need a human to perform. By removing the need for us to think about such things, we can focus on those things which do need a human; thus, we get more done, quicker.
The words MUST, SHOULD, MUST NOT, SHOULD NOT and MAY are defined as per RFC 2119.
The following warnings MUST be enabled for all builds of any project, or any project component:
-Wall
-Wcompat
-Wincomplete-record-updates
-Wincomplete-uni-patterns
-Wredundant-constraints
-Werror
These options are suggested by Alexis King; the justifications for them can be found at the link. These fit well with our motivations, and thus, should be used everywhere. The `-Werror` flag ensures that warnings cannot be ignored: this means that problems get fixed sooner.
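As a sketch, assuming an hpack-style `package.yaml` (the exact key placement depends on your project layout), these warnings can be applied to every component in one place:

```yaml
# package.yaml (hpack schema): ghc-options at the top level apply to
# the library, executables and test suites alike
ghc-options:
  - -Wall
  - -Wcompat
  - -Wincomplete-record-updates
  - -Wincomplete-uni-patterns
  - -Wredundant-constraints
  - -Werror
```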
Every source file MUST be free of warnings as produced by HLint, with default settings.
HLint automates away the detection of many common sources of boilerplate and inefficiency. It also describes many useful refactors, which in many cases make the code easier to read and understand. As this is fully automatic, it saves effort on our part, and ensures consistency across the codebase without us having to think about it.
Every source file MUST be formatted according to Fourmolu, with the following settings (as per its settings file):
indentation: 2
comma-style: leading
record-brace-space: true
indent-wheres: true
diff-friendly-import-export: true
respectful: true
haddock-style: multi-line
newlines-between-decls: 1
Each source code line MUST be at most 100 characters wide, and SHOULD be at most 80 characters wide.
Consistency is the most important goal of readable codebases. Having a single standard, automatically enforced, means that we can be sure that everything will look similar, and not have to spend time or mind-space ensuring that our code complies. Additionally, as Fourmolu (like the Ormolu it derives from) is opinionated, anyone familiar with its layout will find our code familiar, which eases the learning curve.
Lines wider than 80 characters become difficult to read, especially when viewed on a split screen. Sometimes, we can't avoid longer lines (especially with more descriptive identifiers), but a line length of over 100 characters becomes difficult to read even without a split screen. We don't enforce a maximum of 80 characters for this exact reason; some judgment is allowed.
camelCase MUST be used for all non-type, non-data-constructor names; otherwise, TitleCase MUST be used. Acronyms used as part of a naming identifier (such as 'JSON', 'API', etc.) SHOULD be downcased; thus `repairJson` and `fromHttpService` are correct. Exceptions are allowed for external libraries (Aeson's `parseJSON`, for example).
camelCase for non-type, non-data-constructor names is a long-standing convention in Haskell (in fact, HLint checks for it); TitleCase for type names or data constructors is mandatory. Obeying such conventions reduces cognitive load, as it is common practice among the entire Haskell ecosystem. There is no particular standard regarding acronym casing: examples of always upcasing exist (Aeson), as do examples of downcasing (`http-api-data`). However, one choice should be made for consistency (or as much of it as is possible).
All publicly facing modules (namely, those which are not listed in `other-modules` in `package.yaml`) MUST have explicit export lists.
All modules MUST use one of the following conventions for imports:
import Foo (Baz, Bar, quux)
import qualified Foo as F
Data types from qualified-imported modules SHOULD be imported unqualified by themselves:
import Data.Vector (Vector)
import qualified Data.Vector as Vector
The main exception is if such an import would cause a name clash:
-- no way to import both of these without clashing the Vector type name
import qualified Data.Vector as Vector
import qualified Data.Vector.Storable as VStorable
The sole exception is a 'hiding import' to replace part of the functionality of `Prelude`:
-- replace the String-based readFile with a Text-based one
import Prelude hiding (readFile)
import Data.Text.IO (readFile)
Data constructors SHOULD be imported individually. For example, given the following data type declaration:
module Quux where
data Foo = Bar Int | Baz
Its corresponding import should be:
import Quux (Foo (Bar, Baz))
For type class methods, the type class and its methods MUST be imported as so:
import Data.Aeson (FromJSON (fromJSON))
Qualified imports SHOULD use the entire module name (that is, the last component of its hierarchical name) as the prefix. For example:
import qualified Data.Vector as Vector
Exceptions are granted when:
- The import would cause a name clash anyway (such as different `vector` modules); or
- We have to import a data type qualified as well.
Explicit export lists are an immediate, clear and obvious indication of what publicly visible interface a module provides. They give us stability guarantees (namely, we know we can change things that aren't exported and not break downstream code at compile time), and tell us where to go looking first when inspecting or learning the module. Additionally, it means there is less chance that implementation details 'leak' out of the module due to errors on the part of developers, especially new developers.
One of the biggest challenges for modules which depend on other modules (especially ones that come from the project, rather than an external library) is knowing where a given identifier's definition can be found. Having explicit imports of the form described helps make this search as straightforward as possible. This also limits cognitive load when examining the sources (if we don't import something, we don't need to care about it in general). Lastly, being explicit avoids stealing too many useful names.
In general, type names occur far more often in code than function calls: we have
to use a type name every time we write a type signature, but it's unlikely we
use only one function that operates on said type. Thus, we want to reduce the
amount of extra noise needed to write a type name if possible. Additionally,
name clashes from function names are far more likely than name clashes from type
names: consider the number of types on which a `size` function makes sense.
Thus, importing type names unqualified, even if the rest of the module is
qualified, is good practice, and saves on a lot of prefixing.
The following pragmata MUST be enabled at project level (that is, in `package.yaml`):
DeriveFunctor
DerivingStrategies
EmptyCase
FlexibleContexts
FlexibleInstances
GeneralizedNewtypeDeriving
InstanceSigs
ImportQualifiedPost
LambdaCase
MultiParamTypeClasses
NoImplicitPrelude
OverloadedLabels
OverloadedStrings
TupleSections
Any other LANGUAGE pragmata MUST be enabled per-file. All language pragmata MUST be at the top of the source file, written as `{-# LANGUAGE PragmaName #-}`.
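Assuming hpack's `package.yaml` schema, the project-level pragmata above would be declared once, along these lines (a sketch, not a full configuration):

```yaml
# package.yaml (hpack schema): extensions enabled for every module
default-extensions:
  - DeriveFunctor
  - DerivingStrategies
  - EmptyCase
  - FlexibleContexts
  - FlexibleInstances
  - GeneralizedNewtypeDeriving
  - InstanceSigs
  - ImportQualifiedPost
  - LambdaCase
  - MultiParamTypeClasses
  - NoImplicitPrelude
  - OverloadedLabels
  - OverloadedStrings
  - TupleSections
```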
Furthermore, the following pragmata MUST NOT be used, or enabled, anywhere:
PartialTypeSignatures
`DerivingStrategies` is good practice (and in fact, is mandated by this document); it avoids ambiguities between `GeneralizedNewtypeDeriving` and `DeriveAnyClass`, allows considerable boilerplate savings through use of `DerivingVia`, and makes the intention of the derivation clear on immediate reading, reducing the amount of non-local information about derivation priorities that we have to retain. `DeriveFunctor` and `GeneralizedNewtypeDeriving` are both obvious and useful extensions to the auto-derivation systems available in GHC. Both of these have only one correct derivation (the former given by parametricity guarantees, the latter by the fact that a newtype only wraps a single value). As there is no chance of unexpected behaviour from these, no possible behaviour variation, and they're key to supporting both the `stock` and `newtype` deriving strategies, having them on by default removes considerable tedium and line noise from our code. A good example is newtype wrappers around monadic stacks:
newtype FooM a = FooM (ReaderT Int (StateT Text IO) a)
  deriving newtype
    ( Functor
    , Applicative
    , Monad
    , MonadReader Int
    , MonadState Text
    , MonadIO
    )
`EmptyCase` not being on by default is an inconsistency of Haskell 2010, as the report allows us to define an empty data type, but without this extension, we cannot exhaustively pattern match on it. This should be the default behaviour for reasons of symmetry.
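A minimal sketch of what the extension permits (the names here are invented for the example):

```haskell
{-# LANGUAGE EmptyCase #-}

-- An empty data type, as permitted by the Haskell 2010 report
data Impossible

-- An exhaustive pattern match on an uninhabited type; without
-- EmptyCase, this match cannot be written at all
vacuous :: Impossible -> a
vacuous x = case x of {}

-- vacuous lets us discharge provably-impossible branches
fromRight :: Either Impossible Int -> Int
fromRight (Left x) = vacuous x
fromRight (Right n) = n
```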
`FlexibleContexts` and `FlexibleInstances` paper over a major deficiency of Haskell 2010, which in general isn't well-motivated. There is no real reason to restrict type arguments to variables in either type class instances or type signatures: the reasons for this choice in Haskell 2010 are entirely for the convenience of the implementation. It produces no ambiguities, and in many ways, the fact this isn't the default is more surprising than anything. Additionally, many core libraries rely on one, or both, of these extensions being enabled (`mtl` is the most obvious example, but there are many others). Thus, even for popularity and compatibility reasons, these should be on by default.
`InstanceSigs` is harmless by default, and introduces no complications. Its not being the default is strange. `ImportQualifiedPost` is already a convention of this project, and helps with formatting of imports.
`LambdaCase` reduces a lot of code in the common case of analysis of sum types. Without it, we are forced to either write a dummy `case` argument:
foo s = case s of
-- rest of code here
Or alternatively, we need multiple heads:
foo Bar = -- rest of code
foo (Baz x y) = -- rest of code
-- etc
`LambdaCase` is shorter than both of these, and avoids us having to bind variables only to pattern match them away immediately. It is convenient, clear from context, and really should be part of the language to begin with.
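For comparison, a sketch of the same kind of function written with `LambdaCase` (the `Colour` type and `describe` are invented for the example):

```haskell
{-# LANGUAGE LambdaCase #-}

data Colour = Red | Green | Blue

-- No dummy case argument, no repeated heads: the scrutinee is never
-- bound to a name at all
describe :: Colour -> String
describe = \case
  Red -> "red"
  Green -> "green"
  Blue -> "blue"
```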
`MultiParamTypeClasses` is required by a large number of standard Haskell libraries, including `mtl` and `vector`, and in many situations. Almost any project of non-trivial size must have this extension enabled somewhere, and if the code makes significant use of `mtl`-style monad transformers or defines anything non-trivial for `vector`, it must use it. Additionally, it arguably lifts a purely implementation-driven decision of the Haskell 2010 language, much like `FlexibleContexts` and `FlexibleInstances`. Lastly, although it can introduce ambiguity into type checking, this only applies when we want to define our own multi-parameter type classes, which is rarely necessary. Enabling it globally is thus safe and convenient.
Based on the recommendations of this document (driven by the needs of the project and the fact that it is closely connected with Plutus), `NoImplicitPrelude` is required to allow us to default to the Plutus prelude instead of the one from `base`.
`OverloadedStrings` deals with the problem that `String` is a suboptimal choice of string representation for basically any problem, with the general recommendation being to use `Text` instead. It is not, however, without its problems:

- `ByteString`s are treated as ASCII strings by their `IsString` instance; and
- the overly polymorphic behaviour of many functions (especially in the presence of type classes) forces extra type signatures.

These are usually caused not by the extension itself, but by other libraries and their implementations of either `IsString` or overly polymorphic use of type classes without appropriate laws (Aeson's `KeyValue` is a particularly egregious offender here). The convenience of this extension in the presence of literals, and the fact that our use cases mostly cover `Text`, make it worth using by default.
`TupleSections` smooths out an oddity in the syntax of Haskell 2010 regarding partial application of tuple constructors. Given a function like `foo :: Int -> String -> Bar`, we accept it as natural that we can write `foo 10` to get a function of type `String -> Bar`. However, by default, this logic doesn't apply to tuple constructors. As special cases are annoying to keep track of, and in this case serve no purpose, as well as being clear from their consistent use, this should also be enabled by default; it's not clear why it isn't already.
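A short sketch of the section syntax (function names invented for the example):

```haskell
{-# LANGUAGE TupleSections #-}

-- (1,) partially applies the pair constructor, analogous to (foo 10)
pairWithOne :: [Int] -> [(Int, Int)]
pairWithOne = map (1,)

-- Sections work in any position: (,"tagged") fills the second slot
tagAll :: [Int] -> [(Int, String)]
tagAll = map (,"tagged")
```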
The exclusion of `PartialTypeSignatures` is by design, as it creates confusing situations which are hard to understand.
The `PlutusTx.Prelude` MUST be used. A 'hiding import' to remove functionality we want to replace SHOULD be used when necessary. If functionality from the `Prelude` in `base` is needed, it SHOULD be imported qualified. Other preludes MUST NOT be used.
As this is primarily a Plutus project, we are in some ways limited by what Plutus requires (and provides). Especially for on-chain code, the Plutus prelude is the one we need to use, and therefore, its use should be as friction-free as possible. As many modules may contain a mix of off-chain and on-chain code, we also want to make impedance mismatches as limited as possible.
By the very nature of this project, we can assume a familiarity (or at least, the goal of such) with Plutus and its ecosystem. Additionally, every Haskell developer is familiar with the `Prelude` from `base`. Thus, any replacements of Plutus prelude functionality with the `base` prelude should be clearly indicated locally.
Haskell is a 30-year-old language, and the `Prelude` is one of its biggest sources of legacy. A lot of its defaults are questionable at best, and often need replacing. As a consequence of this, a range of 'better `Prelude`s' have been written, with a range of opinions: while there is a common core, a large number of decisions are opinionated in ways more appropriate to the authors of said alternatives and their needs than those of other users of said alternatives. This means that, when a non-`base` `Prelude` is in scope, it often requires familiarity with its specific decisions, in addition to whatever cognitive load the current module and its other imports impose. Given that we already use an alternative prelude (in tandem with the one from `base`), additional alternatives present an unnecessary cognitive load. Lastly, the dependency footprint of many alternative `Prelude`s is highly non-trivial; it isn't clear if we need all of this in our dependency tree.

For all of the above reasons, the best choice is 'default to Plutus, with local replacements from `base`'.
A project MUST use the PVP. Two, and only two, version numbers MUST be used: a major version and a minor version.
The Package Versioning Policy is the conventional Haskell versioning scheme, adopted by most packages on Hackage. It is clearly described, and even automatically verifiable by use of tools like `policeman`. Thus, adopting it is both in line with community standards (making it easier to remember), and simplifies cases such as Hackage publication or open-sourcing in general.
Two version numbers (major and minor) is the minimum allowed by the PVP, indicating compilation-breaking and compilation-non-breaking changes respectively. As parsimony is best, and more granularity than this isn't generally necessary, adopting this model is the right decision.
Every publicly exported definition MUST have a Haddock comment, detailing its purpose. If a definition is a function, it SHOULD also have examples of use using Bird tracks. The Haddock for a publicly exported definition SHOULD also provide an explanation of any caveats, complexities of its use, or common issues a user is likely to encounter.
If the code project is a library, these Haddock comments SHOULD carry an `@since` annotation, stating what version of the library they were introduced in, or the last version where their functionality or type signature changed.
For type classes, their laws MUST be documented using a Haddock comment.
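As an illustrative sketch (the function, its behaviour and the version number are all invented for the example), a publicly exported definition documented to this standard might look like:

```haskell
import Data.Char (toLower)

-- | Normalise a user-supplied tag by stripping spaces and downcasing.
--
-- > normaliseTag "  Foo " == "foo"
--
-- Note that this performs no length validation; the result can be
-- empty if the input is entirely whitespace.
--
-- @since 1.1.0
normaliseTag :: String -> String
normaliseTag = map toLower . filter (/= ' ')
```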
Code reading is a difficult task, especially when the 'why' rather than the 'how' of the code needs to be deduced. A good solution to this is documentation, especially when this documentation specifies common issues, provides examples of use, and generally states the rationale behind the definition.
For libraries, it is often important to inform users what changed in a given version, especially where 'major bumps' are concerned. While this would ideally be addressed with accurate changelogging, it can be difficult to give proper context. `@since` annotations provide a granular means to indicate the last time a definition changed considerably, allowing someone to quickly determine whether a version change affects something they are concerned with.
As stated elsewhere in the document, type classes having laws is critical to our ability to use equational reasoning, as well as a clear indication of what instances are and aren't permissible. These laws need to be clearly stated, as this assists both those seeking to understand the purpose of the type class, and also the expected behaviour of its instances.
Lists SHOULD NOT be field values of types; this extends to `String`s. Instead, `Vector`s (`Text`s) SHOULD be used, unless a more appropriate structure exists. On-chain code, due to a lack of alternatives, is one place where lists can be used as field values of types.
Partial functions MUST NOT be defined. Partial functions SHOULD NOT be used except to ensure that another function is total (and the type system cannot be used to prove it).
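As a sketch of the preferred approach, here is a total replacement for the partial `head` from `base`:

```haskell
-- head is partial: it crashes at runtime on []. A total version
-- surfaces the empty case in the type, so the compiler forces every
-- caller to handle it.
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x : _) = Just x
```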
Derivations MUST use an explicit strategy. Thus, the following is wrong:
newtype Foo = Foo (Bar Int)
deriving (Eq, Show, Generic, FromJSON, ToJSON, Data, Typeable)
Instead, write it like this:
newtype Foo = Foo (Bar Int)
deriving stock (Generic, Data, Typeable)
deriving newtype (Eq, Show)
deriving anyclass (FromJSON, ToJSON)
Deriving via SHOULD be preferred to newtype derivation, especially where the underlying type representation could change significantly.
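A minimal sketch of a via-derivation (the `Score` type is invented for the example); note that the source of the `Semigroup` and `Monoid` behaviour is named explicitly, rather than left to whatever the representation type provides:

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia #-}

import Data.Monoid (Sum (..))

newtype Score = Score Int
  deriving stock (Eq, Show)
  -- the instances are borrowed, by coercion, from Sum Int: scores
  -- combine by addition, and this stays true even if we later wrap
  -- a different numeric representation
  deriving (Semigroup, Monoid) via Sum Int
```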
`type` SHOULD NOT be used. The only acceptable case is abbreviation of large type-level computations. In particular, using `type` to create an abstraction boundary MUST NOT be done.
Haskell lists are a large example of the legacy of the language: they (in the form of singly linked lists) have played an important role in the development of functional programming (and for some 'functional' languages, continue to do so). However, from the perspective of data structures, they are suboptimal except for extremely specific use cases. In almost any situation involving data (rather than control flow), an alternative, better structure exists. Although it is both acceptable and efficient to use lists within functions (due to GHC's extensive fusion optimizations), from the point of view of field values, they are a poor choice from an efficiency perspective, in both theory and practice.
For almost all cases where you would want a list field value, a `Vector` field value is more appropriate, and in almost all others, some other structure (such as a `Map`) is even better.
Partial functions are runtime bombs waiting to explode. The number of times the 'impossible' happened, especially in production code, is significant in our experience, and most partiality is easily solvable. Allowing the compiler to support our efforts, rather than being blind to them, will help us write more clear, more robust, and more informative code. Partiality is also an example of legacy, and it is legacy of considerable weight. Sometimes, we do need an 'escape hatch' due to the impossibility of explaining what we want to the compiler; this should be the exception, not the rule.
Derivations are one of the most useful features of GHC, and extend the capabilities of Haskell 2010 considerably. However, with great power comes great ambiguity, especially when `GeneralizedNewtypeDeriving` is in use. While there is an unambiguous choice if no strategy is given, it becomes hard to remember. This is especially dire when `GeneralizedNewtypeDeriving` combines with `DeriveAnyClass` on a newtype. Explicit strategies give more precise control over this, and document the resulting behaviour locally. This reduces the number of things we need to remember, and allows more precise control when we need it. Lastly, in combination with `DerivingVia`, considerable boilerplate can be saved; in this case, explicit strategies are mandatory.
The only exception to the principle above is newtype deriving, which can occasionally cause unexpected problems: if we use a newtype derivation, and change the underlying type, we get no warning. Since this can change the behaviour of some instances drastically, it would be good to have the compiler check our consistency.
`type` is generally a terrible idea in Haskell. You don't create an abstraction boundary with it (any operations on the 'underlying type' still work over it), and compiler output becomes very inconsistent (sometimes showing the `type` definition, sometimes the underlying type). If your goal is to create an abstraction boundary with its own operations, `newtype` is both cost-free and clearer; if that is not your goal, just use the type you'd otherwise rename, since it's equivalent semantically. The only reasonable use of `type` is to hide complex type-level computations, which would otherwise be too long. Even this is somewhat questionable, but the questionability comes from the type-level computation being hidden, not from `type` as such.
Boolean blindness SHOULD NOT be used in the design of any function or API. Returning more meaningful data SHOULD be the preferred choice. The general principle of 'parse, don't validate' SHOULD guide design and implementation.
The description of boolean blindness gives specific reasons why it is a poor design choice; additionally, it runs counter to the principle of 'parse, don't validate'. While sometimes unavoidable, in many cases it's possible to give back a more meaningful response than 'yes' or 'no', and we should endeavour to do this. Designs that avoid boolean blindness are more flexible, less bug-prone, and allow the type checker to assist us when writing. This, in turn, reduces cognitive load, improves our ability to refactor, and means fewer bugs from things the compiler could have checked if a function wasn't boolean-blind.
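A sketch of the distinction ('Age', its bounds, and the function names are invented for the example):

```haskell
{-# LANGUAGE DerivingStrategies #-}

-- Boolean-blind: the caller learns only yes/no, and nothing stops
-- them from using an unvalidated Int afterwards
isValidAge :: Int -> Bool
isValidAge n = n >= 0 && n <= 130

-- 'Parse, don't validate': success yields a value whose type carries
-- the evidence, so downstream code can demand an Age, not a bare Int
newtype Age = Age Int
  deriving stock (Eq, Show)

mkAge :: Int -> Maybe Age
mkAge n
  | isValidAge n = Just (Age n)
  | otherwise = Nothing
```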
Any multi-parameter type class MUST have a functional dependency restricting its relation to a one-to-many at most. In cases of true many-to-many relationships, type classes MUST NOT be used as a solution to the problem.
Multi-parameter type classes allow us to express more complex relationships among types; single-parameter type classes effectively permit us to 'subset' `Hask` only. However, multi-parameter type classes make type inference extremely flaky, as the global coherence condition can often lead to the compiler being unable to determine what instance is sought even if all the type parameters are concrete, due to anyone being able to add a new instance at any time. This is largely caused by multi-parameter type classes defaulting to effectively representing arbitrary many-to-many relations.

When we do not have arbitrary many-to-many relations, multi-parameter type classes are useful and convenient. We can indicate this using functional dependencies, which inform the type checker that our relationship is not arbitrarily many-to-many, but rather many-to-one or even one-to-one. This is a standard practice in many libraries (`mtl` being the most ubiquitous example), and allows us the benefits of multi-parameter type classes without making type checking confusing and difficult.
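A sketch of the pattern (the class and instance are invented for illustration):

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}

-- The dependency 'c -> e' declares the relation many-to-one: each
-- container type determines its element type, so inference never has
-- to guess e once c is known
class Container c e | c -> e where
  empty :: c
  insert :: e -> c -> c

instance Container [a] a where
  empty = []
  insert = (:)
```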
In general, many-to-many relationships pose difficult design choices, for which type classes are not the correct solution. If a functional dependency cannot be provided for a type class, it suggests that the current design relies inherently on a many-to-many relation, and should be either rethought to eliminate it, or be dealt with using a more appropriate means.
Any type class not imported from an external dependency MUST have laws. These laws MUST be documented in a Haddock comment on the type class definition, and all instances MUST follow these laws.
Type classes are a powerful feature of Haskell, but can also be its most confusing. As they allow arbitrary ad-hoc polymorphism, and are globally visible, it is important that we limit the confusion this can produce. Additionally, type classes without laws inhibit equational reasoning, which is one of Haskell's biggest strengths, especially in the presence of what amounts to arbitrary ad-hoc polymorphism.
Additionally, type classes with laws allow the construction of provably correct abstractions above them. This is also a common feature in Haskell, ranging from profunctor optics to folds. If we define our own type classes, we want to be able to abstract above them with total certainty of correctness. Lawless type classes make this difficult to do: compare the number of abstractions built on `Functor` or `Traversable` as opposed to `Foldable`.
Thus, type classes having laws provides both ease of understanding and additional flexibility.