A colleague and I recently found a case where phantom types provided a slightly different benefit. They still helped provide some required type safety, but in this case they also helped to statically associate a value with a generic type parameter, avoiding a runtime lookup and keeping the code succinct using type inference.
This post uses Kotlin, but should be applicable to Java or any other language with generics / parameterised types.
A phantom type is a type that has a type parameter in its declaration that is not actually used by the constructors or fields of that type.
For (a fairly contrived) example:
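A minimal sketch of the sort of declaration described below (reconstructed, not the original snippet):

// T appears in the declaration but is not used by any constructor or field
data class Measure<T>(val value: Double)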
Here we have a Measure type with a single Double field. It is parameterised by some type T, but we could quite easily remove this parameter and the class would still work (i.e. data class Measure(val value: Double)).
Even though T isn’t used in the class definition, it does let us tag references to this type with additional information. The type parameter does not change how the type itself works, but it does let us restrict how it can be used.
Let’s make a plus operator for Measure, but ensure it only works on values of the same measurement units:
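A sketch of how this could look (the Metres and Inches tag types are assumed):

// empty marker types used purely as tags
object Metres
object Inches

operator fun <T> Measure<T>.plus(other: Measure<T>): Measure<T> =
    Measure(value + other.value)

val total = Measure<Metres>(1.0) + Measure<Metres>(2.0)   // compiles
// Measure<Metres>(1.0) + Measure<Inches>(2.0)            // does not compile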
Within Measure<T> we just have a Double; the actual type of T makes no difference at all. But when we need to use Measure<T> values, we can ensure only compatible T values are combined, or define operations only for specific T (such as convert: Measure<Metres> -> Measure<Inches>).
I’ll strip back the actual problem we were facing to something that is convenient to write up, yet still hopefully within the realms of plausibility. Say we have several entity types we want to load from some data store. Values of each type are queried by some schema information.
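A sketch of that setup (names assumed):

class Schema(val name: String)

interface Entity
data class Widget(val id: Int) : Entity {
    companion object { val SCHEMA = Schema("widget") }
}
data class Sprocket(val id: Int) : Entity {
    companion object { val SCHEMA = Schema("sprocket") }
}

// What we'd like, but can't write: no way to reach a static SCHEMA from T
// fun <T : Entity> fetch(): List<T> = runQuery(T.SCHEMA)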
The problem here is that we can’t get the static SCHEMA property from T. There is no way for us to say “all types T where T has a static property SCHEMA: Schema” in Kotlin.
There are a few options here. We can change our fetch method to be fun <T: Entity> fetch(schema: Schema): List<T>, but then we could call fetch<Sprocket>(Widget.SCHEMA) by accident, which will cause all sorts of trouble when we try to cast/convert our data to the wrong type. We could use Kotlin’s reified type parameters or pass in a Class<T>, and switch on type to look up the correct schema for whatever T is passed in, but that gets a little messy and will add a runtime cost when we actually know what we want statically.
Instead, let’s make Schema a phantom type and tag it with the information about the specific entity type it represents.
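A sketch of the change:

// T is not used by any field: Schema<T> is a phantom type
class Schema<T>(val name: String)

data class Widget(val id: Int) : Entity {
    companion object { val SCHEMA: Schema<Widget> = Schema("widget") }
}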
This gives a warning that Type parameter "T" is never used, a convincing indication we have a phantom type. Spooky.
Now we can pass through a schema value to our instance, and the compiler will infer what T we’re after:
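For example (runQuery is an assumed data store helper):

fun <T : Entity> fetch(schema: Schema<T>): List<T> = runQuery(schema)

val widgets = fetch(Widget.SCHEMA)   // T inferred as Widget; returns List<Widget>
// passing the wrong schema for the expected type no longer compiles:
// val sprockets: List<Sprocket> = fetch(Widget.SCHEMA)   // compile error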
Now we have schema values that are also associated with specific types, giving us the ability to write generic code that needs these values.
Adding an essentially unused type parameter to Schema here gives us an easy way to associate schema values with a particular type. It avoids runtime lookups on types, and does not compromise code safety by allowing us to pass through a value that is not appropriate for the expected type T.
Here’s a list of all booleans in GHCi:
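Presumably along the lines of:

ghci> let bools = [False, True]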
We can use the default applicative instance for lists to run all combinations of bools through a function. If we use a tuple constructor, we’ll get truth table inputs (shown below for 2 and 3 argument expressions).
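For example, for the two-argument case:

ghci> (,) <$> bools <*> bools
[(False,False),(False,True),(True,False),(True,True)]

(,,) <$> bools <*> bools <*> bools gives the eight three-argument rows.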
Let’s check a basic expression p, and its equivalent using De Morgan’s laws. Writing x <$> bools <*> bools gets repetitive, so we’ll start off by defining tt2 to get truth-tableish output values for a 2 input function Bool -> Bool -> Bool.
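A sketch of tt2, assuming p was something like not (a && b) with its De Morgan equivalent:

tt2 :: (Bool -> Bool -> Bool) -> [Bool]
tt2 f = f <$> bools <*> bools

p, p' :: Bool -> Bool -> Bool
p  a b = not (a && b)
p' a b = not a || not b

-- ghci> tt2 p
-- [True,True,True,False]
-- ghci> tt2 p == tt2 p'
-- True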
We can then zipWith to get truth table inputs with the corresponding truth table output (using a bit of uncurry trickery):
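A sketch along those lines:

truthTable2 :: (Bool -> Bool -> Bool) -> [((Bool, Bool), Bool)]
truthTable2 f = zipWith (\i o -> (i, o)) inputs (map (uncurry f) inputs)
  where inputs = (,) <$> bools <*> bools

-- ghci> truthTable2 p
-- [((False,False),True),((False,True),True),((True,False),True),((True,True),False)]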
Or for more arguments:
We can write something to format these nicely, but this was enough for me to get the information I wanted about a few expressions.
Once upon a time there was a project that produced a JAR based on some generated code. I needed to make a repeatable, versioned build for the JAR and plug it into TeamCity so I could reference that build artifact from my current project. Simple!
Here is a rough timeline of how this went.
WARNING! In case it is not painfully obvious, this is all me learning stuff so any specifics mentioned are probably sub-optimal at best to destructive at worst. This is more about the journey than the destination.
Use git describe with --first-parent and use this information to shove some version info into the manifest. So far so good! Now to wire this up to TeamCity.
The build fails: --first-parent is not supported by the agent’s git describe. It turns out --first-parent was added to git describe in 1.8.4.
At this point someone suggests trying Docker to isolate the build dependencies. The generator team has an image for the generator I can use as a base. I can then create an image that adds the JDK and the version of Git I need and I should be good to go.
Learn about reclaiming disk space from Docker (docker system df -v, different prune options). Gradually work out the docker run incantation the build needs:
docker run -it image_name bash
docker run -it --rm image_name bash
docker run -it --rm -v ${HOME}/.ssh:/root/.ssh -v $SSH_AUTH_SOCK:/ssh-agent -e SSH_AUTH_SOCK=/ssh-agent image_name bash
docker run -it --rm -v ${HOME}/.gradle:/root/.gradle ...
Commit the Dockerfile and corresponding image so that we always have enough information to build each commit of the project with a compatible Docker image.
Great, home stretch now. TeamCity has some Docker integration so this should be easy.
Needed to wget some other packages and manually install them before installing the Docker package. Then SSH fails because the repo host is missing from ~/.ssh/known_hosts. Update this on the agents (ssh-keyscan -t rsa the_repo >> ~/.ssh/known_hosts), then make sure .ssh from the host gets mapped to the container via Docker parameters.
Then it failed on deployment due to a different credential issue to another internal system. But after temporarily disabling deployment, the build works! Once I sort out the deployment problem I can start on my actual task of adding this library to my project. But that should be simple to fix, right?
Now if you’ll excuse me I’m going to go and have a bit of a lie down on several mountains of yak hair-stuffed cushions.
When switching between my editor and GHCi REPL to test stuff out I often forget to add a deriving (Show, Eq) or similar line to my data types. This normally occurs after I’ve just set up a bit of test data in the REPL, so if I just fix the data declaration and :reload GHCi then my setup will be lost. We can use the StandaloneDeriving GHC extension to help here.
The following example is me playing around with some parsing stuff and forgetting to make ParseError an instance of Show (so it won’t print in the REPL), then using :set -XStandaloneDeriving to fix this:
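A sketch of the sort of session this ends up being (error output abbreviated):

ghci> data ParseError = ParseError String
ghci> ParseError "unexpected token"
<interactive>: error: No instance for (Show ParseError) ...
ghci> :set -XStandaloneDeriving
ghci> deriving instance Show ParseError
ghci> ParseError "unexpected token"
ParseError "unexpected token"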
I’ve trimmed the orphan instance warning. In this case it should not be a problem as I’m just working around my forgetfulness. :)
We can also use this in real .hs files if necessary via a language pragma ({-# LANGUAGE StandaloneDeriving #-}) as described here.
For this post we’ll consider the example of a list of Sample values we want aggregate information for. A Sample includes the month and year it was collected, and an integer representing the value sampled. For each set of samples we are required to show information such as the earliest sample date, the largest sample value, and the total sampled per month.
We could neatly get each individual bit of information by using multiple queries1, but requiring multiple iterations seems quite wasteful, especially for larger data sets. Instead we could use multiple variables, or an aggregate type containing those variables, and update each as we loop or fold over the data set:
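A sketch of that aggregate type (field and type names inferred from the rest of the post):

data class MonthYear(val year: Int, val month: Int) : Comparable<MonthYear> {
    override fun compareTo(other: MonthYear) =
        compareValuesBy(this, other, { it.year }, { it.month })
}
data class Sample(val date: MonthYear, val value: Int)

data class CandidateAggregate(
    val earliestSampleDate: MonthYear?,
    val largestSample: Int,
    val totalByMonth: Map<MonthYear, Int>
)

fun aggregate(samples: List<Sample>): CandidateAggregate =
    samples.fold(CandidateAggregate(null, Int.MIN_VALUE, emptyMap())) { acc, s ->
        val earliest = acc.earliestSampleDate
        CandidateAggregate(
            if (earliest == null) s.date else minOf(earliest, s.date),
            maxOf(acc.largestSample, s.value),
            acc.totalByMonth + (s.date to ((acc.totalByMonth[s.date] ?: 0) + s.value))
        )
    }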
This seems a reasonable approach to me, and we’ll take this and adapt it in an attempt to get a few additional benefits: reusable and individually testable aggregation pieces, and more explicit aggregate types.
First we’ll create a type to represent values that can be combined. We’ll use Kotlin’s plus operator for this purpose.
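A sketch of the interface:

interface Semigroup<T> {
    operator fun plus(other: T): T
}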
We’ll steal the term “semigroup” from mathematics as its definition includes the constraints our plus operation needs2, although we could also call it Combinable or Addable or something else if we prefer.
If you haven’t used Kotlin before, defining a plus operator function lets us also use the + symbol, so a + b will get translated to a.plus(b). Whenever you see two semigroups being added using + for the remainder of this post, keep in mind it will be calling the plus function defined by that semigroup instance. (If you don’t like co-opting + in this way feel free to change the interface to declare fun combine(other: T): T or similar.)
Next, we’ll define instances that represent sum, max, and min aggregation:
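Sketched here, with Sum fixed to Int for simplicity:

data class Sum(val value: Int) : Semigroup<Sum> {
    override fun plus(other: Sum) = Sum(value + other.value)
}

data class Max<T : Comparable<T>>(val value: T) : Semigroup<Max<T>> {
    override fun plus(other: Max<T>) = Max(maxOf(value, other.value))
}

data class Min<T : Comparable<T>>(val value: T) : Semigroup<Min<T>> {
    override fun plus(other: Min<T>) = Min(minOf(value, other.value))
}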
Looking at our CandidateAggregate from earlier, we also need to handle nullable values (earliestSampleDate: MonthYear?), as well as combining Map<MonthYear, Int> values. Rather than building these specifically for this case, we can express these concepts more generally in terms of other semigroups, so they can be reused for different cases:
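Sketches of both:

// Combines nullable values, treating null as “no value yet”
data class Nullable<T : Semigroup<T>>(val value: T?) : Semigroup<Nullable<T>> {
    override fun plus(other: Nullable<T>): Nullable<T> {
        val a = value
        val b = other.value
        return Nullable(if (a == null) b else if (b == null) a else a + b)
    }
}

// Combines maps, merging values for matching keys using their semigroup
data class Mapped<K, V : Semigroup<V>>(val value: Map<K, V>) : Semigroup<Mapped<K, V>> {
    override fun plus(other: Mapped<K, V>): Mapped<K, V> {
        val merged = value.toMutableMap()
        for ((k, v) in other.value) {
            val existing = merged[k]
            merged[k] = if (existing == null) v else existing + v
        }
        return Mapped(merged)
    }
}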
Each of these operations is implemented quite similarly to the code we used for each field in CandidateAggregate, but now we can reuse them for different aggregates, as well as test each in isolation. The cost is we have now spread this code across more types.
We can also write some general functions, concat and concatMap, to combine any list of Semigroup<T> values into a single Semigroup<T> value, effectively combining aggregates3. Here is an example of how to define and use these functions (as well as an example of testing Sum and Max in isolation):
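A sketch of the functions and a couple of isolated tests:

fun <T : Semigroup<T>> concat(empty: T, items: List<T>): T =
    items.fold(empty) { acc, item -> acc + item }

fun <A, T : Semigroup<T>> concatMap(empty: T, items: List<A>, f: (A) -> T): T =
    concat(empty, items.map(f))

// concat(Sum(0), listOf(Sum(1), Sum(2), Sum(3)))      == Sum(6)
// concat(Max(Int.MIN_VALUE), listOf(Max(3), Max(7)))  == Max(7)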
Now we can rewrite CandidateAggregate using our aggregation types:
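A sketch of the composite:

data class Aggregate(
    val earliestSampleDate: Nullable<Min<MonthYear>>,
    val largestSample: Max<Int>,
    val totalByMonth: Mapped<MonthYear, Sum>
) : Semigroup<Aggregate> {
    override fun plus(other: Aggregate) = Aggregate(
        earliestSampleDate + other.earliestSampleDate,
        largestSample + other.largestSample,
        totalByMonth + other.totalByMonth
    )
    companion object {
        val empty = Aggregate(Nullable(null), Max(Int.MIN_VALUE), Mapped(emptyMap()))
    }
}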
The type of aggregation used appears explicitly for each field in Aggregate. For example largestSample: Max<Int> conveys both the type of the result (Int), as well as the process being used to calculate it (Max). In CandidateAggregate only the former was expressed. We also build some field types by composing semigroups, such as Mapped<MonthYear, Sum>, which specifies we will be adding values using Sum rather than some other approach. This also makes it very simple to update the method of aggregation (as illustrated below).
We have made Aggregate itself a semigroup to define how we combine these composite aggregates. We’ve also added an empty property to make it easier to call concat and concatMap.
The last piece we need is to translate a single Sample into an Aggregate, then we can do the entire aggregation using concatMap as shown in the aggregateSamples() test. Each Sample gets transformed into an Aggregate representing that individual sample (an aggregate of 1), then each Aggregate in turn gets combined to calculate the required information across all the samples.
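A sketch of the translation and final aggregation:

fun toAggregate(s: Sample) = Aggregate(
    Nullable(Min(s.date)),
    Max(s.value),
    Mapped(mapOf(s.date to Sum(s.value)))
)

// roughly what the aggregateSamples() test does:
val result = concatMap(Aggregate.empty, samples, ::toAggregate)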
This definitely has more pieces than the CandidateAggregate version (although the code for each piece has not changed much, it is now spread over multiple types). More pieces suggest a performance impact, but I have not measured this.
We do get a few benefits for this price. Firstly, we now have some small, simple, genuinely reusable aggregation types (Sum, Max, Min, Mapped etc.). These can be combined into other aggregates, and they can be tested in isolation. Secondly, we explicitly define aggregate types in terms of the aggregates of which they are composed. We don’t have an aggregate that contains an Int, we have a Sum or a Max<Int> which conveys more information as to the aggregation process, as well as preventing errors (summing two Int values that should have been combined using maxOf for example).
We also make it simpler to change our aggregation. For example, if we wanted to change from reporting the total value to the maximum value for each month, we can change Mapped<MonthYear, Sum> to Mapped<MonthYear, Max<Int>> and the aggregation process will adjust accordingly.
We introduced a Semigroup<T> interface which represents values that can be combined with an associative, binary operation. We also introduced concat and concatMap operations that work for any instance of this interface. We created Sum, Max, Min, Nullable and Mapped instances of this interface to represent common methods of aggregation, then built a custom Aggregate semigroup composed of some of these instances.
This is a bit more complex than manually aggregating a set of values over a loop or fold, but in return gives us reusable and testable aggregate types, more communicative types for our aggregate model, fewer opportunities for bugs in the aggregation process, as well as making the creation of new aggregates and modifications to existing aggregates simpler.
Example of multiple queries:
val minDate = samples.map { it.date }.min()
val maxSample = samples.map { it.value }.max()
val inRangeCount = samples.count { (100..200).contains(it.value) }
A semigroup for a type T consists of a closed binary operation T -> T -> T that is also associative (i.e. a + (b + c) == (a + b) + c). This associativity constraint means we can combine and compose these values fairly flexibly. For example, we can do a + b + c, without having to worry about whether b is itself a composite of x + y, as associativity guarantees a + (x + y) + c is the same as ((a + x) + y) + c. We can’t do the same thing with non-associative operations like subtraction:
100 - (30 - 10) - 5 /= ((100 - 30) - 10) - 5
75 /= 55
The end result is we can use associativity to combine values without having to also take evaluation order into account.↩
Both concat and concatMap take an empty: T value for cases where the items lists are empty. We could use a Monoid constraint instead of Semigroup, which adds the concept of an empty identity element, but I found this messy to implement in Kotlin.↩
FAKE is an F#-based build tool along similar lines to Make, Rake, etc. The FAKE documentation describes one way of setting up dependencies between targets using the ==> operator. For example:
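Presumably a chain along these lines:

"Clean" ==> "Version" ==> "Build" ==> "Test"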
This declaration means that to run the Test target, Build must be run beforehand, which in turn requires Version, which in turn requires Clean to be run.
This approach limits us to a linear build order. I’d prefer to specify these dependencies less prescriptively, and have FAKE calculate the ordering based on whatever target or targets I need.
Continuing the above example, I’d like to quickly build and run the tests during development, but for that case I don’t really need to version the assemblies. I’d also like to avoid running Clean in this case to take advantage of incremental compilation. But if I’m running the Package task to package everything for NuGet then it is essential to run Version before Build to make sure the packaged assemblies have the right version numbers. And I want to make sure I Clean before a full build to avoid any old artefacts making it into a package.
I fairly recently found out that FAKE does support this flexibility, using soft dependencies and the ability to specify multiple dependencies using the <== operator.
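A sketch of the declarations discussed below:

"Clean" ?=> "Build"
"Version" ?=> "Build"

"Test" <== [ "Build" ]
"Package" <== [ "Build"; "Version" ]
"Full" <== [ "Clean"; "Version"; "Build"; "Test"; "Package" ]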
The "Clean" ?=> "Build"
line tells FAKE “if Clean
needs to run, it must run before Build
”. We also tell FAKE that if we are Version
ing, that has to be done before build as well. Unlike the linear definition we are not saying we have to run Clean
or Version
, just that if they need to run, they must go before Build
.
The <== operator lets us make a target depend on multiple other targets. So "Package" <== [ "Build"; "Version" ] tells FAKE that to run Package, we have to run Build and Version. When we run fake Package FAKE knows it has to run both tasks, and it also knows that if it runs Version it must do so before Build. So the final build order for that case is: Version, then Build, then Package.1
This gives me exactly the behaviour I was after. I can run the Test target which will force a build, but won’t run a clean or version the assemblies. I can generate a NuGet package with versioned assemblies (I should probably make that depend on Test as well). Or I can run a Full build which will clean, version, build, test and create the package.
I’d normally specify the task in the expected final order where possible, so "Package" <== [ "Version"; "Build" ], but I just wanted to illustrate that FAKE is working out the required order, it isn’t a side-effect of the order dependencies are specified.↩
At its most basic, pattern matching can be used to represent standard conditionals and switch statements. For example (in F#):
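A sketch of a switch-style match:

let describe x =
    match x with
    | 0 -> "zero"
    | 1 -> "one"
    | _ -> "many"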
This did not initially seem very exciting to me. There has to be more to it than this, right? (Spoiler: yes :) )
Things get more interesting when we are dealing with types whose values can have different shapes. For example, Option<T> (similar to Nullable<T> in C#). In F# Option<T> has an IsSome property (like HasValue for Nullable<T>). If this is true then it is safe to access that value’s Value property. If IsSome is false, then accessing Value will throw a NullReferenceException. So we could (but please don’t) use option types like this:
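A sketch of that (discouraged) style:

let valueOrZero (x : int option) =
    if x.IsSome then x.Value   // throws if we get the check wrong
    else 0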
I don’t like this. I’m not fond of null reference exceptions, and I don’t like checking IsSome before accessing values because I do silly things like messing up the conditional, or forgetting to check and crashing with a NullReferenceException (or if not forgetting, there are always those cases that “will never be null” which end up being just that due to a misunderstanding or a change somewhere along a call stack). And what about more complicated types, where we may have to check several different preconditions before accessing a number of different values?
Instead, we can use pattern matching to match all the possible shapes of our type:
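Something like:

let valueOrZero x =
    match x with
    | Some value -> value
    | None -> 0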
This is great because we don’t need to access the null reference-throwing .Value property. Instead the value is assigned as part of the pattern: Some value. For the None case there is no value we can access within the pattern. If we tried to add one, the compiler will stop and tell us we have the wrong pattern. What is extra great is that if we don’t cover all the possible allowable values of the type we are matching against the compiler will warn us.
So we’ve ruled out a whole bunch of errors, and have very explicit, compiler-checked documentation about valid ways to use values of each type.
This is awesome! Pattern match all teh things!
Say we have a collection of key value pairs, where both keys and values are strings. Maybe we got this from a POST request, or a flattened JSON object or something. We want to get the value for a particular key, and convert it to an integer (or 0 if we can not do the conversion).
So we have two cases that can be None: looking up a value for a key that may not be in the JSON, and trying to convert the value to a valid integer.
Let’s start out with the conditional version:
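A sketch (Map.TryFind returns an option; Int32.TryParse returns a success flag and the parsed value):

let getRows (data : Map<string, string>) key =
    let value = data.TryFind key
    if value.IsSome then
        let (ok, number) = System.Int32.TryParse value.Value
        if ok then number else 0
    else 0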
Yuck, look at all those potentially catastrophic .Value calls! Let’s rewrite it with our new-found hammer:
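A pattern-matched sketch, with a small tryParseInt helper:

let tryParseInt (s : string) =
    match System.Int32.TryParse s with
    | true, number -> Some number
    | _ -> None

let getRows (data : Map<string, string>) key =
    match data.TryFind key with
    | Some value ->
        match tryParseInt value with
        | Some number -> number
        | None -> 0
    | None -> 0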
What isn’t so great is that we are still writing very similar code, just with safer pattern-matching instead of free-form conditionals. But we’re still going through the same code branches.
What I also found alarming when first starting out with this is a side-effect of the compiler warning us about unmatched values – we’re now forced to be explicit everywhere about how to handle all the values. Isn’t this going to get horribly verbose? We already have a good idea about when things are going to be null, so why trade concise code for a little safety?
Well, the good thing is we can have our safety and eat… er… code… concisely too!
Rather than digging into the details of a type by pattern matching all the time, we can define operations for using and combining values of these types. I often see these referred to as “combinators” (although that term seems overloaded). For example, we can rewrite our getRows function using Option.bind and Option.getOrElse1 without ever digging in to grab a value from an Option<T> type.
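A sketch, with getOrElse added to the Option module as per the footnote:

module Option =
    let getOrElse fallback = function
        | Some value -> value
        | None -> fallback

let getRows (data : Map<string, string>) key =
    data.TryFind key
    |> Option.bind tryParseInt
    |> Option.getOrElse 0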
Under the hood this code is still doing exactly the same thing, but we are now expressing the operation in terms of other distinct operations, instead of via the details of deconstructing values2. This allows us to start thinking at a higher level of abstraction. Rather than thinking about things like “if this is Some value return that, or if it is None then return the second option”, we start thinking in terms of the higher-level operations like or and map. These operations allow us to more easily and precisely express more complex ideas.
This was a huge turning point for me. Previously I was worried about things like Option<T> values propagating all over the code, and having to pattern match at each call site. Now we still get propagation (which is completely valid! If we are dealing with a call that can return an empty value, chances are the caller will also need to return an empty value), but there is no cost for this. Combinators make using these values almost as convenient as using the wrapped type3, with the benefit that we are now safely handling empty values instead of relying on us to remember which calls sometimes return null instead of a T.
If we mainly use combinators for combining types of values, this makes pattern matching a less essential part of a language. It is still a very nice feature to have, as it is pretty natural to implement combinators using pattern matching, and pattern matching seems to go hand-in-hand with sum types which I regard as an essential language feature. But for those who still do a lot of work in C# and similar languages this means that we can implement these combinators in other ways (sometimes messy ways, without as much compiler/type system help) and get a lot out of useful, oft-pattern-matched types like Option and Either.
My experience with pattern matching has gone from not understanding why it was useful, then to wanting to use it everywhere, now to favouring combinators and avoiding having to dig in to the details of a type as much as possible. Using these operations defined over types gives me a nice, high-level way of thinking about building up these values.
Pattern-matching is still really useful, particularly for defining operations over a type, but in general I try to use those defined operations instead, only falling back to pattern matching in the cases where it is much simpler (for example: cases like let (a,b) = foo instead of let a = fst foo; let b = snd foo).
If you currently use pattern matching all the time, maybe try to pull out the repeated operations the pattern matches represent and see if you prefer that style. Operations like map, flatMap, apply, reduce/fold, and other combining functions along the lines of +, and, and or are good places to start.
getOrElse is not part of the Option module in F#, but thankfully we can add members to modules.↩
To me using combinators like this is similar to how we tend to use classes. The internal details of the class are stored in private fields, and we define methods to interact with instances of that class without having to know the details of those fields. Combinators give us the same level of abstraction – we can access operations over a type without knowing the patterns / specific constructors of that type.↩
…and every bit as easy as using an object with methods hanging off it, which is one valid way of implementing these combinator functions↩
Consider a call that takes 2 arguments and returns some value2:
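For example (int arguments assumed):

let add (a, b) = a + b   // add : int * int -> int
add (1, 2)               // 3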
Currying is the process of converting this to a function that takes a single argument, and returns another function that takes a single argument.
For functions with more than 2 arguments, we can use currying to convert it to a series of functions that each take a single argument:
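For example:

let add3 a = fun b -> fun c -> a + b + c   // int -> (int -> (int -> int))
let add3' a b c = a + b + c                // F# sugar for the same shape
((add3 1) 2) 3                             // 6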
Partial application is when we can have a function that takes multiple arguments, give it a subset of those arguments, and get back a function that will take the remaining arguments. With curried functions we get this ability for free, but you could imagine a language feature that implements this for uncurried functions:
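With curried functions it just falls out:

let add a b c = a + b + c   // curried
let addTen = add 10         // supply a subset of the arguments
addTen 3 4                  // 17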
I think it is correct to say that all curried functions support partial application, but not all partial application implementations require currying.
Also left as a comment to this post, modified slightly here↩
See Reading type annotations if this style of writing out types is unfamiliar.↩
Other typed languages and programming papers use a notation more like this:
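Roughly:

ReturnType methodName(ArgType0 arg0, ArgType1 arg1)   // C-style
methodName : (argType0, argType1) -> returnType       // the style discussed here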
I found it took a bit of getting used to, but I now much prefer to read and write this style. I think it is worth becoming familiar with, as it is used in quite a few languages1 and in all the programming papers I’ve seen. So here’s a quick guide on how to read this style of type annotation.
From the methodName example above, we can see the structure has changed from “return type - name - arguments” to “name - arguments - return type”. So the main change is moving the return type from the beginning to the end.
A : separates the name from the type signature. : can be read as “has type”. Haskell unfortunately uses :: for this, instead of the : character which seems to be used pretty much everywhere else.
A -> arrow separates function input from function output. So a -> b reads as “I take values of type a and produce values of type b”.
Arguments are shown as a tuple of one or more types. In some languages (like ML, OCaml, and F#) tuple types are denoted by types separated by * characters, so the signature would look like methodName : argType0 * argType1 -> returnType.
There are a few different ways of representing generic parameters. Let’s take a function that, given a single element of some type, returns a singleton list of that type.
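A sketch:

singleton :: a -> [a]        -- Haskell
singleton : 'a -> 'a list    // F#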
In Haskell, any type starting with a lowercase character is a type variable rather than a concrete type. In F# type parameters begin with a quote character '. Not requiring an additional step to list generic parameters is handy.
Where this notation starts to show some advantages is with higher order functions. For example, say we want a generic map function:
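Something like:

map :: (t -> a) -> List t -> List a
// C#, roughly: List<A> Map<T, A>(Func<T, A> f, List<T> items)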
These functions take a function that translates Ts to As, and a list of Ts, to produce a list of As. The parentheses around the (t -> a) in the Haskell-style signature show that this is a single argument (that happens to itself be another function). This is a bit cleaner than the equivalent Func<T, A> in the C# version, particularly when the explicit type parameter declarations are taken into account. The difference becomes more noticeable as we increase the number of functions and type parameters:
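For example:

zipWith :: (a -> b -> c) -> List a -> List b -> List c
// C#, roughly: List<C> ZipWith<A, B, C>(Func<A, B, C> f, List<A> xs, List<B> ys)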
In the map example above a “more exact, less idiomatic translation” was shown:
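That is:

map1 :: (t -> a) -> List t -> List a
map2 :: ((t -> a), List t) -> List a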
map1 takes a function (t -> a) and returns a function List t -> List a. It would also be correct to write it as map1 :: (t -> a) -> (List t -> List a). In contrast, map2 takes a single argument that happens to be a tuple of ((t -> a), List t). If we are supplying both arguments at once there is not much difference, but the map1 version also lets us supply just the (t -> a) argument to create a new function.
Being able to supply less than the full contingent of arguments to a function, and get back a function that takes the remainder of the arguments, is called partial application.
The map1 form of signature, where a function is built out of functions that each take a single argument, is called a curried function (map2 is “uncurried”). We get partial application, the ability to provide one argument at a time, for free with curried functions.
Curried function signatures in C# get unpleasant fairly quickly:
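A sketch of the shape involved:

// Func<Func<T, A>, Func<List<T>, List<A>>>
// - and it only gets worse as arguments are added.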
Some methods take no input and return a value (either a constant, or due to some side-effect). The “no input” value is normally represented by empty parentheses (), and is called “unit” (because there is only a single legal value of this type, ()).
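For example:

now : () -> DateTime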
Similarly for things that take an argument but produce no direct output value (i.e. perform a side-effect)2. Again, this is represented by unit:
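For example:

log : string -> ()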
This starts to look a bit funny when methods take other calls with no input and no direct output:
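For example, a method that takes a side-effecting callback:

subscribe : (() -> ()) -> ()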
It does give some immediate clues as to where side-effects are in a type signature though.
We’ve looked at different forms of type signatures, but this style also tends to work its way into method definitions, again using the form name : type.
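Sketches in the three languages described below:

-- Haskell
singleton :: a -> [a]
singleton x = [x]

// F#
let singleton (x : 'T) : List<'T> = [x]

// Swift
func singleton<T>(x: T) -> [T] { return [x] }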
Haskell tends to split the type signature from the definition. F# specifies the arguments as argName : argType, and then gives the type of the resulting value (in this case List<'T>). Generic type parameters are indicated with a ' prefix. Swift uses a similar style, but an arrow is used for the return type. Swift needs explicit declaration of generic type parameters.
In both the Haskell and F# cases the type information can actually be omitted – the type system will infer the correct type signature.
This has been a quick tour through the things that first tripped me up when reading type signatures from non-C-style languages.
The main habit I needed to break was expecting to see a type then a name. Instead, names are first, then their type shown. So method types change like this:
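That is:

ReturnType MethodName(ArgType arg)
methodName : ArgType -> ReturnType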
Similarly arguments go from ArgType name to name : ArgType.
.
Hope this helps!
Such as Haskell, F#, Swift, Scala, OCaml, ML, Idris, Elm, PureScript, and TypeScript.↩
Note that void in C-style languages is different to the terms “unit” and “Void” in non-C-style languages. In C-style languages void means “has no return value”, where a return type of () means “returns the value ()”. In contrast, the type Void is one with no legal instance. We can never return a value of type Void, so my understanding is a function a -> Void can never return.↩
I’ve found this pattern incredibly useful in F#, Swift and Haskell. The examples here are in F#, but as far as I can tell we can use it anywhere that has generic types and higher-order functions.
Say we have some generic type, let’s call it Widget<T> (we’ll use the term “widget” as a placeholder for a generic type we are working with - feel free to substitute in Option<T>, Either<E,A>, Future<T>, List<T> etc.). There are lots of useful functions that work with non-widget types, and we would like them to work with Widget values without having to re-write them.
We can achieve this aim if the generic type has a map (or Select in C# terminology) and an apply function. Continuing our Widget example:
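The two signatures, sketched in F# notation:

map   : ('a -> 'b) -> Widget<'a> -> Widget<'b>
apply : Widget<'a -> 'b> -> Widget<'a> -> Widget<'b>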
If the type does not have these functions provided we may still be able to write them. We’ll look at this later.
We can use any non-widget function with widget values using map for the first argument, and apply for subsequent arguments.
Say we are using a library with a Result<'Error, 'T> type that represents operations that can fail with a value of type 'Error, or succeed with a value of type 'T. The library also supplies map and apply functions for this type. We want to use this type to try to parse a Person value from a UI form with name, email and age text fields:
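A sketch (the validation helpers, field values, AppError type, and the library’s map/apply names are all assumed):

type Person = { Name : string; Email : string; Age : int }
let create name email age = { Name = name; Email = email; Age = age }

let (<!>) f x = Result.map f x    // map
let (<*>) f x = Result.apply f x  // apply

// map for the first argument, apply for the rest:
let person : Result<AppError, Person> =
    create <!> tryName nameField <*> tryEmail emailField <*> tryAge ageField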
Sometimes a type will not have an apply function provided, but will have map, and also a flatMap/bind function provided with the following type:
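Roughly:

bind : ('a -> Widget<'b>) -> Widget<'a> -> Widget<'b>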
This is the case with the F# Option module, which provides map and bind with the required signatures. In these cases we can implement apply in terms of these other functions:
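For options, a sketch:

let apply fOpt xOpt =
    fOpt |> Option.bind (fun f -> Option.map f xOpt)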
We can now use the pattern with optionals (and any type with map and flatMap/bind):
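For example:

let (<!>) f x = Option.map f x
let (<*>) f x = apply f x

let add a b c = a + b + c
let six  = add <!> Some 1 <*> Some 2 <*> Some 3   // Some 6
let none = add <!> Some 1 <*> None   <*> Some 3   // None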
In cases where we have a mix of arguments, some using our generic type and others not, we can still apply1 the pattern by converting the values to our generic type. For our Person.create example, we could already have the person’s email as a valid string value from earlier in the sign-up process:
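A sketch, with Success as the Result constructor for the happy path:

let email = "someone@example.com"   // already a valid string

let person =
    create <!> tryName nameField <*> Success email <*> tryAge ageField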
Here we convert email from a string to a Result<AppError,string> value first using the Success constructor. Then we have our three Result<AppError,'T> values to use with the apply pattern.
This pattern is useful for being able to reuse all our existing functions in the context of another type, like Future<T>, Option<T>, Result<E,A> and lots, lots more. To do this for some generic type Widget<T> we need map and apply functions with the signatures shown earlier.
We then apply the non-widget function to the first argument using map, and use apply for subsequent applications.
Calls look similar to regular function application, with the additional operators taking care of conversion into our Widget<T> context.
We can mix widget and non-widget arguments by converting non-widgets:
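As in the earlier email example:

create <!> tryName nameField <*> Success email <*> tryAge ageField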
I wrote a bit more about how this works a while back, or search around for “applicative functor” if you are interested in the theory behind the practice. We can effectively use this pattern without delving into the details though - so we can apply now and ask questions later. :)
Sorry.↩
We recently updated a NuGet package, going from packages/FSharp.Formatting.CommandTool-2.8.0 to packages/FSharpFormatting.CommandTool-2.9.1. We’d also taken our own copies of some templates in the package, and I wanted to check if there were any differences between -2.8.0\templates and -2.9.1\templates that I should port across.
Rather than my usual fumbling about (check out both, copy, diff) I thought I’d try to learn the necessary Git incantation to compare the paths. And then blog it, so that when I forget I’ll have a quick reference handy for next time. :)
I ended up using git diff with the COMMIT:PATH format, using HEAD and HEAD~1 as the commit references (shown split over multiple lines):
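Using the two paths from this example:

% git diff "HEAD~1:packages/FSharp.Formatting.CommandTool-2.8.0/templates" \
           "HEAD:packages/FSharpFormatting.CommandTool-2.9.1/templates"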
To get a summary of the file changes instead (in this case, to confirm nothing changed), use the --stat option:
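For example:

% git diff --stat "HEAD~1:packages/FSharp.Formatting.CommandTool-2.8.0/templates" \
                  "HEAD:packages/FSharpFormatting.CommandTool-2.9.1/templates"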
I was pretty impressed that Git’s Bash prompt on Windows gave me autocompletion on the HEAD~1:/...2.8.0/ path despite the path no longer being in the working directory.
If I’ve got anything wrong about reading -> style function signatures please let me know.
The example we were looking at is Seq.unfold, whose signature looks like this:
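Namely:

Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>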
Any type prefixed with a ' character represents a type parameter (or generic type in C# parlance). For unfold this means 'State and 'T can be any type. We can also write this in potentially more familiar .NET syntax:
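Roughly:

// IEnumerable<T> Unfold<TState, T>(Func<TState, Option<Tuple<T, TState>>> generator, TState state)
// (assuming some Option type on the .NET side)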
A lot of the F# code I see follows a more Haskellish (?) convention of using lowercase type variable names, more like:
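That is:

unfold : ('s -> ('t * 's) option) -> 's -> seq<'t>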
Types separated by a * are tupled (or product types, which explains the * symbol). For example, (1, "abc", Foo()) is of type int * string * Foo.
So in unfold, 'T * 'State represents a tuple of 'T and 'State.
F# supports both .NET-style prefix generic syntax and ML-style postfix syntax. So instead of writing int option, we can also write Option<int> (both forms are equivalent). Which means we can re-write unfold as:
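Giving:

Seq.unfold : ('State -> Option<'T * 'State>) -> 'State -> seq<'T>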
With those things in mind, let’s use the unfold signature to work out what it does.
Given a function that can take 's values and return a tuple of an element and next 's value or nothing, and a starting 's, unfold will generate a sequence of 't values until the generator function returns None (i.e. potentially infinite).
We could use this to generate a sequence of all the days since a starting date (infinite, at least until DateTime hits DateTime.MaxValue):
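A sketch:

let daysFrom (start : System.DateTime) =
    start |> Seq.unfold (fun d -> Some (d, d.AddDays 1.0))

// daysFrom (System.DateTime(2015, 1, 1)) |> Seq.take 3
// gives the 1st, 2nd and 3rd of January 2015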
Finally, if you’re more familiar with C# or Haskell, here are my attempted translations:
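Sketches:

-- Haskell (this is Data.List's unfoldr)
unfoldr :: (s -> Maybe (t, s)) -> s -> [t]

// C# (uncurried, assuming some Option type)
// IEnumerable<T> Unfold<S, T>(Func<S, Option<Tuple<T, S>>> generator, S state)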
Haskell uses lowercase type names for generics (instead of ' characters), while concrete types have uppercase names. It also uses the same syntax for tuple types as values, so (1,2) :: (Int, Int). For some odd reason, Haskell uses :: for “type of” instead of a single :.
The C# version is a bit messier due to having to use Func instead of a shorthand for function types, and similarly for declaring tuple types. (I’ve also uncurried the C# version otherwise we end up with nested Func types everywhere, and it is the more typical form for C# functions.)
error FS0729: This field is not a literal and cannot be used in a pattern
I’m not entirely sure it’s a good idea, but I managed to work around this using a partial active pattern.
Here’s the gist of the C# code I was working with:
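A sketch of the shape of it (member names assumed):

public class Thingamabob
{
    public static readonly Thingamabob Foo = new Thingamabob("foo", 1);
    public static readonly Thingamabob Bar = new Thingamabob("bar", 2);

    public string Name { get; private set; }
    public int Value { get; private set; }

    private Thingamabob(string name, int value) { Name = name; Value = value; }
}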
Thingamabobs represent a sort of enum with an associated value - the kind of thing we’d typically use a discriminated union for in F#.
Trying to convert this to my own type using pattern matching resulted in the FS0729 error quoted above.
I couldn’t find much information on this error, but I gather I need to explicitly compare the argument to the field value using if ... else if ... else, something like:
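(Assuming a target union with MyFoo and MyBar cases:)

let toMyType (t : Thingamabob) =
    if t = Thingamabob.Foo then MyFoo
    elif t = Thingamabob.Bar then MyBar
    else failwith "unknown thingamabob"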
I think it looks neater as a pattern match, so worked around this using a partial active pattern to do the comparison:
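A sketch of the workaround:

let (|Field|_|) expected actual =
    if expected = actual then Some () else None

let toMyType (t : Thingamabob) =
    match t with
    | Field Thingamabob.Foo -> MyFoo
    | Field Thingamabob.Bar -> MyBar
    | _ -> failwith "unknown thingamabob"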
I’m not sure if there are any drawbacks to this approach, so if you can think of any please let me know.
Part 3 of the tutorial ends with a bar chart that shows the relative frequency of letters used in the English language.
The creation of each bar per datum is handled by this code:
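From the tutorial, roughly:

svg.selectAll(".bar")
    .data(data)
  .enter().append("rect")
    .attr("class", "bar")
    .attr("x", function(d) { return x(d.name); })
    .attr("y", function(d) { return y(d.value); })
    .attr("height", function(d) { return height - y(d.value); })
    .attr("width", x.rangeBand());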
This says we’re dealing with chart elements of the CSS class .bar for each datum. The .enter() call tells D3 we want to perform the operations that follow for any new data (data that has entered the source). We can also use .exit() for data that is no longer in the source. If we want to handle updated data we can add properties directly (outside of enter()/exit()).
To specify updates I had to change the data join so D3 knows how to differentiate added, removed and updated data. In this case we will use the name property, which is a letter from A to Z.
Next we’ll modify the code to specify how to handle updated and removed data, instead of just what to do on enter() for new data.
This was enough to update the chart, but the y-axis would draw the new axis over the top of the previous axis, so both values would show. This answer on StackOverflow suggested removing the axis and redrawing it each time, which worked well.
The next thing I wanted to try was animating changes to existing data. This turned out to be trivial thanks to D3’s transition() method, which I just dumped prior to the code we used to update each bar.
And that’s it!
Here’s an example of the update in action. Use the radio buttons to alternate between the chart showing frequencies of letters in English and the frequencies of letters used in the source for this post.
I got a Node ARDX kit at the event, and followed the Nodebots AU setup guide to get all the software bits and pieces.
For Haskelling, I used my existing Haskell installation, then created a new cabal sandbox and installed the hArduino package (v0.9) into it.
Here’s a simple circuit that includes a potentiometer and a bunch of LEDs. The idea is that as someone turns up the potentiometer, the number of LEDs switched on increases accordingly. (Yes, this may seem somewhat unimpressive, but as a complete newbie who managed to do this without blowing anything up, I’m calling it a win! :))
So now I’m in my cabal sandbox and it’s time to write some Haskell. Here’s the main outline of the program (with some explanatory comments added).
After the initialisation stuff, the main bit of the program is the run loop, which polls the potentiometer and updates the LEDs whenever the value changes.
The updateLeds and related code looks like this:
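A sketch of it (pin bookkeeping and LED count assumed; digitalWrite and the Pin/Arduino types come from hArduino):

import Data.Foldable (for_)
import System.Hardware.Arduino

numLeds :: Int
numLeds = 8

-- analog readings range over 0-1023
numLedsOn :: Int -> Int
numLedsOn potValue = potValue * numLeds `div` 1024

updateLeds :: [Pin] -> Int -> Arduino ()
updateLeds leds potValue =
    let maxLedNum = numLedsOn potValue
    in  for_ (zip leds [1..]) $ \(led, ledNum) ->
            digitalWrite led (ledNum <= maxLedNum)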
The updateLeds function takes the potentiometer value and works out how many LEDs it needs to turn on based on the numLedsOn function. It then loops through each numbered LED and turns it on or off based on whether ledNum <= maxLedNum, the highest LED number we need to switch on.1
numLedsOn doesn’t need to be a separate function like this, but I found it helped to be able to test my arithmetic independently of hardware. :) (We could also get away without specifying any types, but I find doing so makes it easier for me to read.)
Rather than set up a build, I just ran cabal repl from my sandbox to get a GHCi session with the hArduino package accessible, then loaded and ran the code:
ghci> :load lights.hs
ghci> main
Now I could finally fulfill my life-long dream of adjusting LEDs using a twirly dial! Hooray! :)
The updateLeds loop is a bit neater in applicative form, but assumes familiarity with the operators: for_ (zip leds [1..]) $ digitalWrite <$> fst <*> ((<= maxLedNum) . snd)↩
Exercise 1.41 of SICP asks us to work out what the following expression will evaluate to, given the definition of double:
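The exercise:

(define (inc x) (+ x 1))
(define (double f) (lambda (x) (f (f x))))

(((double (double double)) inc) 5)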
If this expression had side-effects, we’d need to understand the evaluation order and keep track of state changes with each evaluation step. Because this is a pure function, we can substitute the definitions of each sub-expression, and in any order we like. I found these substitutions quite tricky though, because with each evaluation of double I had to take into account its argument. I ended up with something like this:
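Roughly along these lines:

(((double (double double)) inc) 5)
; expand the outer double, with f = (double double):
; => (((lambda (x) ((double double) ((double double) x))) inc) 5)
; => (((double double) ((double double) inc)) 5)
; ((double double) inc) => (double (double inc)); (double inc) adds 2,
; so (double (double inc)) adds 4; applying (double double) to that
; doubles it twice more, giving a procedure that adds 16
; => 21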
I found this quite hard to follow. I had to use additional intermediary steps (not shown above) to evaluate expressions like (double (lambda (x) (inc (inc x)))).
Instead of direct substitution we can transform the expression into equivalent terms we find easier to reason about, such as those with mathematical properties we can apply to simplify the expression. We still use substitution, but with other equalities instead of replacing a term’s name with its definition.
For this example, I found it easier to think of double in terms of function composition.
Function composition is associative (which we can convince ourselves of using equalities1). I find this made it easier to reduce the nested double calls.
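In Haskell-ish notation, a sketch of that view:

double f = f . f
-- double double          = \f -> (f . f) . (f . f)   -- f applied 4 times
-- double (double double) = (double double) . (double double)
--                        = \f -> f applied 16 times
-- so ((double (double double)) inc) 5 = 5 + 16 = 21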
Composition lets us deal with functions as values without having to substitute in for their arguments at each step. The fact composition is associative hides unimportant details, such as not having to worry about the order of composition in expressions like (double . double) . (double . double). I found each step made more sense to me, and I had more confidence in my answer.
I imagine different people will find different forms easier than others, but the point is we can choose whichever transformations we like to get the expression into an equivalent form we can work with more easily.
For a while now I’ve appreciated that pure functions mean we can more easily use substitution to understand code, but it wasn’t until this exercise that I’ve finally started to get a vague idea of what equational reasoning means. It is more than just substitution – it is being able to use all sorts of transformations and properties to understand our code. I don’t think this example really showcases this idea, but I did feel like it was my first glimpse into a different, powerful way of understanding code.
My attempt at showing function composition is associative:
-- Associativity of composition
f . g = \x -> f (g x)
a . (b . c) = \x -> a ( (b.c) x)
= \x -> a ( (\x' -> b (c x')) x)
= \x -> a ( b (c x) )
= \x -> (a . b) (c x)
= \x -> ((a . b) . c) x
= (a . b) . c
I’m using xUnit for these examples, but all of this should apply to NUnit (and other test runners) too. I’ve put all the code in a Sample.fs gist if you want to see it all in one place.
I’ve written about getting started with NUnit in F# before. We can also use xUnit and its built in assertions.
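A sketch, assuming a Sample.incr function that adds one:

module Sample =
    let incr x = x + 1

open Xunit

[<Fact>]
let ``incr adds one to each item`` () =
    Assert.Equal<int list>([2; 3; 4], List.map Sample.incr [1; 2; 3])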
I needed to specify the int list type explicitly to get F# to resolve the correct overload.
Here’s an example of an assertion failure (when I change the expected value to [2;3;5]):
Position: First difference is at position 2
Expected: FSharpList<Int32> { 2, 3, 5 }
Actual: FSharpList<Int32> { 2, 3, 4 }
FsUnit provides helpers for NUnit, xUnit, MbUnit, and MSTest assertions to make them play nicely with F# syntax and type inference. I installed the FsUnit.Xunit package.
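The same test with FsUnit:

open Xunit
open FsUnit.Xunit

[<Fact>]
let ``incr adds one to each item`` () =
    List.map Sample.incr [1; 2; 3] |> should equal [2; 3; 4]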
Sample failure:
Position: First difference is at position 0
Expected: Equals [2; 3; 5]
Actual: was [2; 3; 4]
(I’m not sure why first difference is at position 0 here?)
Unquote lets us use quoted expressions for assertions.
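For example:

open Xunit
open Swensen.Unquote

[<Fact>]
let ``incr adds one to each item`` () =
    test <@ List.map Sample.incr [1; 2; 3] = [2; 3; 4] @>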
If the test fails, Unquote shows each step in reducing the expression so you can see where they start to differ:
List.map Sample.incr [1; 2; 3] = [2; 3; 5]
[2; 3; 4] = [2; 3; 5]
false
This case only shows 3 steps, but more complex expressions will show more.
FsCheck is influenced by Haskell’s QuickCheck and Scala’s scalacheck. Rather than asserting a specific input and output, we define a property that should hold for all values of a type (optionally requiring they meet certain criteria, such as being a positive integer).
The FsCheck.Xunit package has specific support for xUnit through a PropertyAttribute that lets us run properties directly as an xUnit test (otherwise a little more boilerplate is required, see “Using FsCheck with other testing frameworks” in the Quick Start guide).
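A property along the lines of the example below:

open FsCheck.Xunit

[<Property>]
let ``mapping f then g equals mapping (f << g)`` (xs : int list) =
    let f, g = Sample.incr, Sample.incr
    (List.map f << List.map g) xs = List.map (f << g) xs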
If we modify the property to ensure it fails (for example, (List.map f << List.map g) xs = List.map (f << g) (List.filter even xs)), we get this output:
FsCheck.Xunit.PropertyFailedException
Falsifiable, after 3 tests (2 shrinks) (StdGen (267259328,295888818)):
[1]
This shows that given an input of [1] the property does not hold.
Fuchu is more focussed on test organisation than assertions, and can be used with any of the assertion-providing libraries above. If you’d like to try something different to the usual (N|x|Mb)Unit approaches for defining test cases then give it a look.
So I decided to give up and find a way to use hand-drawn sketches instead. Here’s the method I ended up with, based almost entirely on Marc Liberatore’s “Whiteboard Diagrams as PDFs” post and the wonderful ImageMagick and Potrace utilities.
My drawings still look fairly terrible, but at least they convey what I want them to and are quick to produce! :)
brew install imagemagick
Here’s a sample result of me unleashing my inner da Vinci on a poor, defenceless bit of paper.
I’ve found a thick texta/marker works well, but standard ball-point pens can come out alright too.
Next, take a photo with ye olde phone camera. I avoid using the flash - even light (no shadows) is best. (Try holding the paper up vertically so your phone does not cast a shadow over the paper.)
My phone auto-uploads photos to an online thingoe from which I can quickly crop the image and download to my Mac.
Next up I want to convert the photo to a grayscale bitmap and turn up the brightness and contrast to wash out the background and bring out the marker lines. I’m using ImageMagick’s convert to quickly do this from the console.
We’ll then run the bitmap through potrace as described in Marc’s post to create a nice SVG. We can stop there, or use ImageMagick again to get a PNG out.
Here’s the original photo, which I’ve cropped and saved as sketch.jpg:
Then the adjustments:
# Convert to grayscale BMP. Dial up brightness (20) and contrast (10)
% convert sketch.jpg -colorspace Gray -brightness-contrast 20x10 sketch.bmp
# Convert to SVG (-s), set a reasonable height, smooth speckles (-t 10)
% potrace -s -H 400pt -t 10 sketch.bmp
# Convert SVG to PNG (using 256 colours)
% convert sketch.svg PNG8:sketch.png
And here’s the output:
You may need to tweak settings like brightness, contrast, dimensions, PNG quality/size1 and so on.
Might be worth also running Pngcrush or similar optimiser over the resulting PNG: brew install pngcrush; pngcrush sketch.png sketch2.png. A GUI option for Mac is ImageOptim.↩
UPDATE 2019-02: I tend to use ghcup on Mac and Linux these days. Am leaving these steps here as building GHC and Cabal is still a valid way of getting Haskell up and running.
This is how I got it working on my Mac, with loads of help from ddere and bitemyapp on the #haskell-beginners channel on Freenode IRC. It is reasonable to assume all mistakes in this write up are mine, while they deserve the credit for any useful bits.
I’ve got XCode 5.1.1 installed, which I believe is a prerequisite (or at least the dev tools?). Other than that, grab a terminal and a browser, and we’re set to go.
UPDATE 2019-02: Check out ghcup for an easier way to get a platformless Haskell running on Mac or Linux. I’ve switched to using that to manage Haskell installations.
Here’s the summary if you want the steps without explanation:
Build GHC: run configure --prefix=<my-dir> and make install, and add the resulting bin directory to PATH.
Download the cabal binary into GHC’s bin directory.
cabal update; cabal install cabal cabal-install alex happy
Add ~/.cabal/bin/ to PATH (ahead of GHC’s bin).
Use sandboxes for project builds (cabal sandbox init).
For utility binaries, build in a dedicated sandbox and link the result into ~/.cabal/bin; or install directly into ~/.cabal and rm -rf ~/.ghc if we ever get build conflicts. I’m doing the former.
The rest of the post will go through the specific commands used, and explain some of the decisions you might need to make.
Download and extract the GHC binary distribution (I used ~/dev/ghc-7.8.2). Run ./configure --prefix=<my-dir> from the extract directory. I used ./configure --prefix=/Users/dave/dev/ghc. Then make install, and add GHC’s bin directory to PATH; this was ~/dev/ghc/bin for me. We should now be able to run ghc, ghci and co. Success!
Next, download a cabal binary and put it somewhere. I put mine in alongside my GHC binaries in ~/dev/ghc/bin so it is on my PATH and I can quickly fall back to it if I nuke everything else but GHC. Run cabal update to initialise the package database. This will just be used to kick off our cabal-ing. Afterwards we’ll be managing cabal with cabal (for that nice recursive touch).
We’re now going to build and install some final bits and pieces into Cabal’s user-db (stored in ~/.cabal/).
% cabal install cabal cabal-install alex happy
Next up I adjusted my PATH to make sure binaries are loaded from ~/.cabal/bin first1. My PATH now looks like this:
export PATH=~/.cabal/bin:~/dev/ghc/bin:(non-haskell stuff)
We should now have everything we need to build Haskell projects. For projects we’ll run all our cabal install commands within a sandbox.
% mkdir myNewProj
% cd myNewProj
% cabal sandbox init
% cabal init
-- insert joyous haskelling here --
Sometimes we’d like to use cabal to install some binaries like hlint, hoogle or pointfree. I’ve heard a few schools of thought on this.
Here is what I’ve found works reasonably well for me. I’ve created a directory ~/dev/hs/ to build these utilities in. From there:
~/dev/hs/ % mkdir hlint
~/dev/hs/ % cd hlint
~/dev/hs/hlint % cabal sandbox init
~/dev/hs/hlint % cabal install hlint
~/dev/hs/hlint % ln -s "$(pwd)/.cabal-sandbox/bin/hlint" ~/.cabal/bin/
This gives us a fresh hlint binary and creates a symbolic link to it in the .cabal/bin directory (i.e. somewhere on my PATH). Sometimes I’ll copy instead of symlink.
The good thing about this is if I need to use specific versions of a particular dependent library for a build I can cabal install it without worrying about it affecting other builds outside the sandbox.
The catch is some libraries also link against static assets that get put in $(pwd)/.cabal-sandbox/share, which means if we move or delete this sandbox that binary will stop working.
The other approach is to cabal install the utility outside of a sandbox. This means all docs and static assets go into a safe location (~/.cabal), but on the downside we’ll sometimes get build failures due to library version conflicts.
In these cases we need to delete everything in ~/.ghc and try again. I have it on good authority from several sources that this is no problem. All our binaries in ~/.cabal should still work, it just means the next cabal install won’t rely on cached library builds.
Still, I feel more comfortable with the sandboxed build approach (almost definitely because I don’t fully understand what’s going on behind the scenes).
At the time of writing I had some trouble building the wonderful Pandoc library due to a change in a dependent library. Pandoc is a library that relies on statically linked assets by default which was mentioned in the sandboxed builds section as a possible problem. Thankfully it provides a build option to embed these assets.
% cd ~/dev/hs
% mkdir pandoc
% cd pandoc
% cabal sandbox init
% cabal install exceptions-0.4
% cabal install hsb2hs
% cabal install pandoc -fembed_data_files
% cp "$(pwd)/.cabal-sandbox/bin/pandoc" ~/.cabal/bin/
Installing a specific version of exceptions-0.4 fixed the build problem, while passing the -fembed_data_files option to the Pandoc build embeds the static assets so we can move the binary and delete the sandbox without breaking Pandoc.
Thanks to Carter for telling me which version of exceptions I needed, and about -fembed_data_files for Pandoc.
This seems to be working ok for me, but if you can see any problems with this approach or can suggest any improvements please let me know and I’ll update the post.
We’ve now installed a version of the cabal binary into ~/.cabal/bin. By putting that into our PATH first we’ll always use the latest version for our builds. If we lose our ~/.cabal for some reason then we can fall back to the one we put into the ghc folder earlier.↩
Say we want to parse out some information from basic Liquid tags, like this:
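That is, tags of the form:

{% sometag some contents here %}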
We can give ourselves two problems and implement this using System.Text.RegularExpressions (it looks almost identical in C#, see this gist or footnote1):
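A sketch of that implementation:

open System.Text.RegularExpressions

let parseTag input =
    let m = Regex.Match(input, @"\{%\s*(?<tag>\w+)\s+(?<contents>.*?)\s*%\}")
    if m.Success then Some (m.Groups.["tag"].Value, m.Groups.["contents"].Value)
    else None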
First up we need to add the RegexProvider to our project via nuget: PM> Install-Package RegexProvider.
Now we can rewrite our previous implementation like this:
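The same pattern via the type provider, sketched:

open FSharp.RegexProvider

type TagRegex = Regex< @"\{%\s*(?<tag>\w+)\s+(?<contents>.*?)\s*%\}" >

let parseTag input =
    let m = TagRegex().Match(input)
    if m.Success then Some (m.tag.Value, m.contents.Value)
    else None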
This will compile equivalently to our previous implementation2, but we’ve gained some nice static checks.
We can access the tag and contents capture groups of our match as properties. This isn’t a method_missing-style dynamic lookup – if we rename the group in the regex to (?<notTag>\w+) then we get a compile-time error:
error FS0039: The field, constructor or member 'tag' is not defined
Also neat, if we completely muck up our regex, the compiler will let us know:
error FS3033: The type provider ... reported an error: parsing "[asd" -
Unterminated [] set.
Tests would catch both these errors, but feedback doesn’t get much faster than “as we’re typing the code”, plus we get precise line numbers for the errors as well. It also reduces code noise, dealing directly with the capture group names rather than having to specify particular collection lookups.