Algebraic Types and Binary Trees

Type Synonyms vs Algebraic Data Types

Important - these notes describe algebraic data types that are constructed with the data keyword in Haskell and that exist as entirely new user-defined types. This must not be confused with the type synonyms that can be defined in Haskell using the type keyword. Type synonyms can make code more readable but do not actually create new types. The Haskell type checker strips away all type synonyms and works with the underlying types during type inference and checking. Type synonyms do not, therefore, enhance type checking. But they definitely can make code easier to read.

Examples of Algebraic Data Types

All new user defined algebraic types are introduced with the data keyword. User defined types allow for more specific typing and strong compile type checking. Haskell algebraic data types uniformly implement some common user-defined data types found in other languages and in addition allows for recursive and parameterized defined types.

Sum types (Enumerated Types)

The data values of enumerated types are listed explicitly and are interpreted literally. The set of values is listed using an or symbol (|) and interpreted as a disjoint union (a logical sum) and thus the designation sum types. The values of literals here are called 0-ary value constructors and must be capitalized. Any finite collection of named values can be specified with sum types.

data Temp = Cold | Hot
data Season = Spring | Summer | Fall | Winter

Functions are defined over sum types by pattern matching cases on the named values.

weather :: Season -> Temp
weather Summer = Hot
weather _      = Cold

Product types

Product types are user defined tuples. They act the same as the pre-defined tuples but type values are named with value constructors so tuples used for distinct purposes cannot be inadvertently mixed.

data Person  = PersonC String Int    -- meaning (name,age)

This is a collection of pairs of Strings and Int values each representing a "Person" by name string and age Int. Here is a list of sample values of the type Person.

  [ PersonC "Neal" 29,
    PersonC "Sam" 24,
    PersonC "Al" 18
  ]

The keyword PersonC is called a binary value constructor for the Person data values. The value constructors for enumerated types like Temp and Season above are called "0-ary" value constructors. User defined product types can be of any finite arity, corresponding to any arity of tuple. Constructors for all algebraic data types must be capitalized.

Product types work the same as tuples, but you have a named value constructor instead of parentheses and commas, eg, ("Neal",29) corresponds to PersonC "Neal" 29. Product types give strong and safer typing because the value constructors act as type tags for the values. Functions can be guaranteed to work on the correct type of values because they can be constrained to work only on the properly tagged values. We saw that with the weather function above. Here are functions on the Person type.


-- The age selector function
age :: Person -> Int
age PersonC _ years = years 

-- The name selector function
name :: Person -> Int
age PersonC nameStr _ = nameStr

oldest :: [Person] -> Int
oldest [] = error "oldest: no Persons in database"
oldest (p:ps) = max p (oldest ps)

oldest' :: [Person] -> Int
oldest' = maxList . map age
  where maxList = foldr1 max 

Important Note: Don't Confuse Type names with Constructors

When an algebraic type has a single value constructor it is legal and common convention to name the contructor the same as the type. Unfortuantely this can be very confusing in the beginning, so I have given the suffix "C" to all the contructors. Do not confuse type names with value constructors in algebraic types!. Both type names and value constructors can appear on the right side of data definitions, but they have very different meaning. Only types can appear on the left side of data definitions, not value constructors. Only value constructors can be used in pattern matching definitions, not type names.

Here is an example of the confusion that can arise.

data Person = Person String Person
This is a recursive type. The Person on the left is the type name. The Person on the right is also a type name. The Person to the immediate right of the equal sign is the constructor for values of the type. Ghci gives the following information about the constructor Person and the type Person.
Prelude> data Person = Person String Person

Prelude> :t Person
Person :: String -> Person -> Person
Prelude> :k Person
Person :: *
Ghci is telling us that the value constructor Person has the indicated type, where the indicated type involves the type Person. The type Person has kind *. We'll discuss kinds and the kind of types later, but for now note that we can only ask for the kind of types and type constructors in the world of types. And be Careful! type constructors are not the same as value constructors! The world of values and the world of types are carefully distinguished in Haskell and cannot be computationally mixed. The type world does calculation at compile time only and is fully terminating.

We can see the distinction more clearly in Ghci when we give the value constructor a different name. We can only ask

Prelude> data Person = PersonC String Person
Prelude> :t PersonC
PersonC :: String -> Person -> Person
Prelude> :k PersonC
< interactive>:1:1:
    Not in scope: type constructor or class `PersonC'
    Perhaps you meant `Person' (line 10)
Prelude> :k Person
Person :: *
The data structure is like a list of Strings, where we expect the strings will be person names, although they don't have to be.

Combining sum and product types

Here we show how a sum-of-products data type can be defined. This allows us to safely implement Union types (from C), that is, constructed values can be of varying forms with each form distinguished by the product value constructor. Notice that here we have included the deriving statement so that the Haskell compiler will automatically produce code for the required operators of the listed type classes.
data Shape = Circle Float
           | Rect Float Float
           deriving (Eq, Ord, Enum, Show, Read)
Shapes can be either circles or Rectangles and we can define functions appropriately for each variant of the data type.
area(Circle r) == pi * r^2
area(Rect x y) = x * y
Constructors are actually a special kind of function that produces values of the algebraic data type. What makes them special is that they do not evaluate by any rewrite rule like regular functions. That is, they have no defining equation and instead they just create a value (although the value constructor arguments will be evaluated as needed).

Haskell record syntax for product types

These are keyword referenced tuples that can sometimes be easier to use or read. Keyword tuples are not positionally ordered. Haskell automatically generates selector functions for product types using record syntax, so this can avoid clutter. See the Haskell Wiki section More_on_datatypes in the intermediate level for a discussion of the Record Syntax. Also you can see RWH pp55-56.

Algebraic Data Types can be Polymorphic

An algebraic data type can be defined with type variables as well as concrete types. For example the following Maybe type can be used to improve error handling by allowing a Nothing return value for failure or else a Just a return value for success, where the type variable can be any type. Here's an example from Graham Hutton's textbook, Chapter 10. (The Maybe type is actually predefined in the Prelude).
data Maybe a = Nothing | Just a

safediv :: Int -> Int -> Maybe Int
safediv _ 0 = Nothing
safediv m n = Just (m `div` n)

Recursive Algebraic Data Types

Here's a data type definition for lists. This is just a more verbose version of the built-in lists which have value constructors [] and :
data MyList a = Nil | Cons a (MyList a)

mylist = Cons 3 (Cons 2 (Cons 1 (Cons 0 Nil)))
usuallist = 3:2:1:0:[]

General form of Algebraic Data Type Definitions

Here's a schema of the general form where each value constructor can have a different number of product terms. The a1,a2, etc are type parameters that can be used in each of the t_m:n which are types, read t subscript m n
data Name a1 a2 ... aj
  = Con_0   t_0:1   t_0:2 ...   t_0:k1
  | Con_1   t_1:1   t_1:2 ...   t_1:k2
  |  ...
  | Con_n   t_n:1   t_n:2 ...   t_n:km

Algebraic data types can easily participate in Haskell type classes

Use the deriving facility to have Haskell automatically generate code for an algebraic data type. Note that the derived equality for lists here means lists must be exactly the same length and have exactly the same order of elements. If you want a different equality, you would not use the deriving clause and instead supply your own equality in an instance statement.
data MyList a = Nil | Cons a (List a)
              deriving (Eq, Ord, Read, Show)

isEqual :: MyList a -> Mylist a -> Bool
isEqual as bs = as == bs

Polymorphic Binary Trees

data Tree a = Nil | Node (Tree a) a (Tree a)
              deriving (Eq, Ord, Show, Read)

myTree = Node (Node Nil 3 (Node Nil 4 Nil)) 5 (Node Nil 2 Nil) 

countT :: Tree a -> Int
countT Nil = 0
count (Node t1 x t2) = 1 + count t1 + count t2

Simple Expression Trees

data Expr = Lit Int
          | Add Expr Expr
          | Sub Expr Expr
          deriving (Eq, Ord, Show, Read)

Write an evaluator for Exprs

Expression Trees with a Factored Definition

data Expr = Lit Int
          | Op Ops Expr Expr
          deriving (Eq, Ord, Show, Read)
data Ops = Add | Sub | Mul | Div
         deriving (Eq, Ord, Show, Read)
Write an evaluator for this new new form of representing expressions