What's new in GHC 2021

A complete overview of the modern Haskell defaults

Mar 14, 2023

My general rule for writing about new developments in the Haskell compiler or core libraries is to wait until they’ve been available for a year or so. This avoids putting a reader into a situation where the tutorial they’re reading involves features that they can’t use yet, perhaps because they’re not yet able to install the latest compiler, or because they’re working within a larger codebase where not all of their dependencies support the latest stuff yet. GHC2021 has now been available for about a year and a half, and there are now three major GHC releases — 9.2, 9.4, and 9.6 (as of just a few days ago) — so I’m quite happy to go ahead and begin giving a new recommendation: Use GHC2021 as the language for your Haskell projects.

My skeletal default Cabal file looks like this nowadays:

cabal-version: 3.0
name: my-package-name
version: 0.0.0.0

common base
    default-language: GHC2021
    ghc-options: -Wall
    build-depends: base

library
    import: base
    hs-source-dirs: library
    exposed-modules: ...

The available choices for default-language are Haskell98, Haskell2010, and GHC2021.

A brief history of 27 years

Haskell has gone through a number of versioning schemes as things have evolved over the decades. The recorded history of Haskell begins with Haskell 1.0 in 1990. A few other releases went on in this style; many are lost to time, though we can find Haskell 1.3 from 1996. Then at the end of the decade, a new language standard was published: Haskell 98, this time named for the year in which it was written. With the exception of some lesser revisions a few years later, Haskell 98 was the definition of “official” Haskell for another decade.

There were two major Haskell compilers — Hugs and GHC — and, because coders gonna code, new features continued to be introduced into each of them. I wasn’t around in this era and I don’t know much about Hugs, so I will speak only of GHC. By default, GHC was a Haskell98 compiler; but if you compiled with the -fglasgow-exts flag, then you got all the new stuff that GHC added that wasn’t in the language specification. Some fun additions during the early years were:

Scoped type variables were added in GHC 4.02 (1998) and expanded upon in GHC 6.4 (2005)
Functional dependencies first appeared in GHC 5.00 (2001)
Generalized newtype deriving is from GHC 5.04 (2002)
Template Haskell, GHC 6.0 (2003)
GADTs, GHC 6.4 (2005)

GHC 6.4 was the first version to include the package manager, Cabal. It seems to have been recognized around this time that Haskell was losing sight of the dream of having multiple compilers. If package authors increasingly relied on compiler-specific features — that is, if everybody just kept publishing packages that could only be built with the -fglasgow-exts flag — then there would effectively be two programming languages, GHC Haskell and Hugs Haskell, and libraries could not be shared between them.

Thus the language pragma appears in GHC 6.6. Rather than declaring “I require the GHC features,” this allows a package to specify precisely which non-standard features it requires. It is feasible, then, for two compilers to incorporate some of the same features, creating opportunity for library interoperability without requiring that the two compilers stay perfectly in sync.

In 2010, we got a new official language specification: Haskell 2010. There hasn’t been one since. I can’t speak for anyone else, but my sense is: We have lost the dream. Modern Haskell is GHC. This is not to say that the entire system of having a stable language specification with many named extensions is a pointless exercise; I think the extension mechanism provides excellent points of reference for discussing language features, tracking how the compiler has changed over time, and seeing which features people are using. But it does seem clear that the specification document is losing relevance, and that this has resulted in a general lack of motivation to work on a new version of it. Language extensions are now the thing at the top of everyone’s mind.

It was clear that we needed a new language version, though, because over the past twenty or so years Haskell has been accumulating a hefty pile of language extensions, and many of them have become de facto standards. But if nobody wants to take on the substantial task of writing (or funding the writing of) a sequel to the 300-page document that is Haskell 2010, then it just isn’t happening.

So what we got instead in 2021 was sort of a new language specification, but defined in the laziest way possible. (Not a criticism.) GHC 2021 is defined as Haskell 2010 plus a list of language extensions.

Changes in GHC 2021

Below we give a complete list of the differences between Haskell 2010 and GHC 2021. The list of extension names can be found in the GHC manual. If you haven’t written Haskell in ten years, or if you’ve been using Haskell and need an overview of what language features you don’t have to explicitly turn on anymore, the remainder of this article outlines what you need to know to get up-to-date.

Expanding deriving power

The set of stock-derivable classes (e.g. Eq, Ord, Show) has been expanded to include Functor, Foldable, Traversable, Generic, Data, and Lift.
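
As a rough illustration of the expanded deriving, here is a minimal sketch. The Tree type and the helper names are hypothetical, invented for this example, and the pragma on the first line is only there so the snippet stands alone outside of a Cabal project.

```haskell
{-# LANGUAGE GHC2021 #-}

-- A hypothetical binary tree type, used only for illustration.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Eq, Show, Functor, Foldable, Traversable)

-- A small sample tree holding 1, 2, 3 from left to right.
sample :: Tree Int
sample = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

-- fmap comes from the derived Functor instance.
doubled :: Tree Int
doubled = fmap (* 2) sample

-- sum comes from the derived Foldable instance.
total :: Int
total = sum sample

-- traverse comes from the derived Traversable instance. The result is
-- Nothing if any element fails the check.
allPositive :: Tree Int -> Maybe (Tree Int)
allPositive = traverse (\x -> if x > 0 then Just x else Nothing)
```

Here total is 6, and allPositive sample returns Just sample because every element is positive.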

Standalone deriving declarations are now permitted. For example,

data Ix a = Ix Int a deriving Eq

can now be written as

data Ix a = Ix Int a

deriving instance Eq a => Eq (Ix a)

A newtype is now permitted to receive instances derived from its underlying type. In the following example, Show is a stock-derivable class, but Semigroup and Monoid are newtype-derived using this feature.

import Data.Foldable (fold)
import Data.Monoid (Sum (..))
import Numeric.Natural (Natural)

newtype Total = Total (Sum Natural) 
  deriving (Show, Semigroup, Monoid)

λ> fold [Total 5, Total 10, Total 3]
Total (Sum {getSum = 18})

More explicit type information

The forall keyword is added, letting you explicitly bind type variable names. This is called explicit (universal) quantification, and it makes several other features possible.

If you bind a type variable with forall in parentheses within a larger type, that’s called a higher-rank type, and it is not possible in Haskell 2010.
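
A small sketch of a higher-rank type may help here. The function applyToBoth is a made-up name for illustration; the point is only the position of the forall.

```haskell
{-# LANGUAGE GHC2021 #-}

-- applyToBoth is a hypothetical helper. The forall inside the parentheses
-- means the caller must supply a function that works for every element
-- type x, so the body is free to use it at both Int and Bool.
applyToBoth :: (forall x. [x] -> Int) -> ([Int], [Bool]) -> (Int, Int)
applyToBoth f (xs, ys) = (f xs, f ys)

-- length works on lists of any element type, so it is an acceptable argument.
lengths :: (Int, Int)
lengths = applyToBoth length ([1, 2, 3], [True, False])
```

If the forall were moved to the front of the whole signature, x would be chosen once by the caller, and the body could not apply f to lists of two different element types.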

If you bind a type variable with forall in some definition’s type signature, you can refer to that type variable within the lexical scope of the definition; this feature is called scoped type variables.
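
Here is a minimal sketch of that scoping, using a hypothetical function prependAll. The local signature on cons mentions a, and it only typechecks because the outer forall brings a into scope.

```haskell
{-# LANGUAGE GHC2021 #-}

-- prependAll is a hypothetical example function.
prependAll :: forall a. a -> [[a]] -> [[a]]
prependAll x = map cons
  where
    -- This a is the same a as in the signature above. Without the explicit
    -- forall it would be a fresh variable, and using x here would be a
    -- type error.
    cons :: [a] -> [a]
    cons ys = x : ys
```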

One way to use a scoped type variable is in a type application, written with the @ symbol, which is an important tool for reducing polymorphism to resolve type ambiguities. For example:

{-# language AllowAmbiguousTypes #-}

import Data.List (intercalate)

showAll :: forall a. (Enum a, Bounded a, Show a) => String
showAll = intercalate ", " $ map (show @a) $
    enumFromTo minBound maxBound

λ> showAll @Ordering
"LT, EQ, GT"

Type signatures are now allowed on methods within instance definitions. For example:

import Data.Semigroup (Semigroup (..))

data Pair a = Pair a a

instance (Semigroup a) => Semigroup (Pair a) where
   (<>) :: Pair a -> Pair a -> Pair a
   Pair a b <> Pair c d = Pair (a <> c) (b <> d)

The line beginning with “(<>) ::” is an instance signature. It is not permitted in Haskell 2010 but is allowed in GHC 2021. The instance signature is not allowed to contradict the polymorphic type of (<>) as defined by the Semigroup class; it only reiterates the type for clarity.

Kind annotations and kind signatures are now supported. These look just like type annotations and type signatures, using the :: keyword. The difference is that while a type annotation specifies the type of an expression, a kind annotation specifies the kind of a type. For example, consider the following datatype definition.

import Data.Kind (Type)

newtype NumberAction context number =
  NumberAction (context number)

There are two ways to make the kinds of its type parameters explicit. One is with kind annotations:

newtype NumberAction (context :: Type -> Type) (number :: Type) =
  NumberAction (context number)

The other is by adding a standalone kind signature:

type NumberAction :: (Type -> Type) -> Type -> Type

newtype NumberAction context number =
  NumberAction (context number)

In this example, writing the extra kind information has no consequence; it serves only to make things explicit. Kind annotations are sometimes necessary, however, because kind polymorphism is now enabled by default. This means that, in the absence of an explicit kind annotation, the compiler will now infer polymorphic kinds when possible. This is generally something you will only encounter if you are doing something exotic.
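
To make the kind polymorphism point a little more concrete, here is a sketch with a hypothetical phantom-tagged wrapper. Tagged and TypeTagged are names invented for this example.

```haskell
{-# LANGUAGE GHC2021 #-}

import Data.Kind (Type)

-- The tag parameter is never used on the right-hand side, so its kind is
-- inferred to be polymorphic.
newtype Tagged tag value = Tagged value

untag :: Tagged tag value -> value
untag (Tagged v) = v

-- Both of these are accepted: the tag has kind Type -> Type in the first
-- and kind Type in the second.
byMaybe :: Tagged Maybe Int
byMaybe = Tagged 1

byInt :: Tagged Int Int
byInt = Tagged 2

-- A kind annotation restricts the tag to kind Type, so a type such as
-- TypeTagged Maybe Int would be rejected.
newtype TypeTagged (tag :: Type) value = TypeTagged value
```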

New ways to write numbers

There are some new ways to write numeric literals. Underscores can be inserted between the digits of a number wherever you like. Typically we use this in the same way you would use commas in English to increase readability: for example, we write 1_000_000 to represent one million (1,000,000). In contrast with 1000000, the underscores make it easier to comprehend at a glance how many zeroes there are.

Binary literals are now allowed, prefixed with 0b. The list [0, 1, 2, 3, 4, 5] can be written in binary like so:

[0b0, 0b1, 0b10, 0b11, 0b100, 0b101]

Floating-point literals can now be expressed using hexadecimal notation, prefixed with 0x. For example, a and b below are equivalent:

a = 0xb.fe
b = 11 + (254 / 256)
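
The three literal forms from this section can be tried together in one small module. This is just a sketch; the binding names are made up.

```haskell
{-# LANGUAGE GHC2021 #-}

-- Underscores are ignored by the compiler and only aid the reader.
million :: Int
million = 1_000_000

-- Binary literals for the numbers zero through five.
binaryDigits :: [Int]
binaryDigits = [0b0, 0b1, 0b10, 0b11, 0b100, 0b101]

-- A hexadecimal floating-point literal: 0xb is 11, and .fe is 254/256.
hexFloat :: Double
hexFloat = 0xb.fe
```

Here binaryDigits is equal to [0, 1, 2, 3, 4, 5] and hexFloat is equal to 11.9921875.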

Slightly nicer-looking imports

It is now permitted to move the qualified keyword to the end of an import declaration. This can lead to more compact import lists in some aligned formatting styles. For example,

import           Data.List (intercalate)
import qualified Data.List as List

can now be written as:

import Data.List (intercalate)
import Data.List qualified as List

Bits of abbreviated syntax

(\b -> (a,b)) can be written as (a,) and (\a -> (a,b)) can be written as (,b). The abbreviated forms are called tuple sections.
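
Tuple sections are most often seen as arguments to map or fmap. A minimal sketch, with made-up binding names:

```haskell
{-# LANGUAGE GHC2021 #-}

-- (, True) is shorthand for \x -> (x, True): the missing first component
-- becomes the argument.
flagged :: [(Int, Bool)]
flagged = map (, True) [1, 2, 3]

-- ("key",) is shorthand for \y -> ("key", y): the missing second component
-- becomes the argument.
keyed :: [(String, Int)]
keyed = map ("key",) [10, 20]
```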

Postfix operators are now allowed. The syntax resembles that of operator sections, but it is for operators that are unary rather than binary. This feature is not commonly used.

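-- Hide the Prelude's list (++) so that the definition below is unambiguous.
import Prelude hiding ((++))
import Data.IORef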

(++) :: Num a => IORef a -> IO ()
(++) i = modifyIORef' i (+ 1)

λ> do { i <- newIORef 0 ; (i ++) ; (i ++) ; readIORef i }
2

Bang patterns allow you to prefix a variable pattern with ! to force evaluation of the matched value to weak head normal form, equivalent to applying seq.
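
A common place to see bang patterns is on the accumulator of a hand-written loop. The following is a sketch with a hypothetical function sumStrict, not a recommendation over the library's own folds.

```haskell
{-# LANGUAGE GHC2021 #-}

-- sumStrict is a hypothetical strict sum. The bang on acc forces the running
-- total to weak head normal form each time go is entered, so a long chain of
-- unevaluated additions does not build up.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```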

When you are constructing or deconstructing using record syntax and a local variable has the same name as a corresponding record field, the name no longer has to be written twice. This feature is called named field punning. Consider the following example that does not use field puns:

data ShippingInfo =
    ShippingInfo{ name :: String, address :: String }

readShippingInfo :: IO ShippingInfo
readShippingInfo = do
    name <- getLine
    address <- getLine
    pure ShippingInfo{ name = name, address = address }

displayShippingInfo :: ShippingInfo -> String
displayShippingInfo ShippingInfo{ name = name, address = address } =
    "Name: " <> name <> "\n" <>
    "Address: " <> address <> "\n"

Using named field puns, these functions can be written more conveniently as:

readShippingInfo :: IO ShippingInfo
readShippingInfo = do
    name <- getLine
    address <- getLine
    pure ShippingInfo{ name, address }

displayShippingInfo :: ShippingInfo -> String
displayShippingInfo ShippingInfo{ name, address } =
    "Name: " <> name <> "\n" <>
    "Address: " <> address <> "\n"

Support for Void

Case expressions without any cases are now permitted. This is generally needed only when dealing with the Void type, which has no constructors. For example, the absurd function is defined using an empty case expression:

absurd :: Void -> a
absurd x = case x of { }

Another limitation pertaining to types with no data constructors is relaxed: Previously they weren’t allowed to have a deriving clause, and this is now permitted. For example, the definition of the Void type itself now looks like this:

data Void
  deriving (Eq, Data, Generic, Ord, Read, Show)
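
Both relaxations can be seen together in a small sketch. Never is a hypothetical stand-in for Void, declared locally so that the example does not rely on any import, and fromRightOnly is likewise a made-up name.

```haskell
{-# LANGUAGE GHC2021 #-}

-- A type with no constructors, with a deriving clause attached.
data Never
  deriving (Eq, Show)

-- The Left branch can never hold a real value, and the empty case
-- expression lets us say so without inventing a result.
fromRightOnly :: Either Never a -> a
fromRightOnly (Left n)  = case n of {}
fromRightOnly (Right x) = x
```

The derived Show instance is what allows a value such as Right 3 :: Either Never Int to be shown at all.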

Multi-parameter classes

Perhaps the boldest change in GHC 2021 is that multi-parameter type classes are now permitted. For example, the MonadError class from the mtl package, reproduced below, has two parameters, e and m.

class (Monad m) => MonadError e m | m -> e where
    throwError :: e -> m a
    catchError :: m a -> (e -> m a) -> m a

Like many multi-parameter type classes, the MonadError class also has a functional dependency (the “ | m -> e” at the end of the first line). This is not enabled by GHC 2021; it still requires separately enabling the FunctionalDependencies language extension.
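
Multi-parameter classes are still usable without a functional dependency; the cost is that type inference gets less help. Here is a sketch using a hypothetical Convert class, which is not taken from any library.

```haskell
{-# LANGUAGE GHC2021 #-}

-- Convert is a hypothetical two-parameter class.
class Convert a b where
  convert :: a -> b

instance Convert Int String where
  convert = show

instance Convert Bool Int where
  convert b = if b then 1 else 0

-- With no functional dependency, the input type does not determine the
-- output type, so both have to be fixed by annotations at the use site.
rendered :: String
rendered = convert (42 :: Int)

asNumber :: Int
asNumber = convert True
```

Removing the signature from rendered or asNumber would leave the second class parameter ambiguous, which is the sort of problem a functional dependency is meant to address.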

Relaxed rules for classes, instances, constraints

Flexible contexts and flexible instances are now permitted; these extensions lift some restrictions that were previously imposed upon what sorts of typeclass constraints and instance heads were permitted.
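
As a sketch of what each relaxation allows, consider the following. Describe and showWrapped are hypothetical names made up for this example.

```haskell
{-# LANGUAGE GHC2021 #-}

-- Describe is a hypothetical class used only for illustration.
class Describe a where
  describe :: a -> String

-- Flexible instance: the instance head applies Maybe to the concrete type
-- Int. Haskell 2010 requires a type constructor applied only to distinct
-- type variables.
instance Describe (Maybe Int) where
  describe Nothing  = "no number"
  describe (Just n) = "number " ++ show n

-- Flexible context: the constraint applies Show to Maybe a rather than to a
-- bare type variable, which Haskell 2010 does not allow in a signature.
showWrapped :: Show (Maybe a) => a -> String
showWrapped x = show (Just x)
```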

It is now permissible to use a type alias in an instance definition. For example, consider the following class and instances from Relude. Since String is an alias for [Char], the last instance is not permitted in Haskell 2010, but it is fine in GHC 2021.

class ToText a         where toText :: a -> Text
instance ToText Text   where toText = id
instance ToText LText  where toText = LT.toStrict
instance ToText String where toText = T.pack

Class methods were previously not allowed to have constraints on class variables, such as the (Eq a) constraint in the example below. This restriction has been deemed unhelpful, and so it has been removed.

class Seq s a where
  fromList :: [a] -> s a
  elem     :: (Eq a) => a -> s a -> Bool

A type context (the stuff to the left of the => arrow) can now be anything of the Constraint kind. This includes, most notably, constraint aliases. For example, we can define the following:

import Data.Kind (Type, Constraint)

type All :: Type -> Constraint
type All a = (Enum a, Bounded a)

This defines All to be a constraint satisfied by types that belong to both Enum and Bounded. This alias can then be used in the context of a type signature such as:

all :: All a => [a]
all = enumFromTo minBound maxBound

A new way to write datatypes

There is a new form of data declaration. Instead of an = symbol, the where keyword opens a block which contains a list of type signatures for all the datatype’s constructors. The following two type definitions are equivalent:

data Maybe a = Nothing | Just a

data Maybe a where
    Nothing :: Maybe a
    Just :: a -> Maybe a

Which format you choose is a matter of preference. The old form is usually more concise. The new form is called GADT syntax because it is sometimes more convenient for writing the sorts of complicated datatypes that require the GADTs (Generalised Algebraic Data Types) language extension. GADTs were too exotic to standardize into GHC 2021, but the syntax designed for them is innocuous enough to be supported by default now.

The writing will continue until I run out of money!

Happy Haskelling, and everybody stay (type-) safe out there.