Elevate Your Elixir With Sigils
Adding support for real-valued intervals implementing the Enumerable Protocol in Elixir.
Motivation
In a previous article of mine, I wrote about using NimbleOptions to add extremely powerful option handling to your Elixir applications. One of the custom validations I was using was a function in_range
that would check if an option fell within a real-valued interval. This differs from Elixir's built-in Range
in that it needed to be real-valued (rather than discrete integer steps). Additionally, mostly due to aesthetic and personal opinion, I wanted to be able to express the intervals using mathematical notation such as (0,1]
to mean "allow any value greater than 0 and less than or equal to 1". I find Elixir to be such a beautiful language with a unique capacity for extensions that it felt wrong to use a function such as in_range
or in_interval
. Additionally, some implementations I've come across have somewhat unintuitive APIs, such as the following spec:
@spec in_range(float, float, float, bool, bool) :: bool
@doc """
* `:value` - Value to test for inclusion
* `:min` - Minimum value in range
* `:max` - Maximum value in range
* `:left` - Whether the left boundary is inclusive (true) or exclusive (false)
* `:right` - Whether the right boundary is inclusive (true) or exclusive (false)
"""
def in_range(value, min \\ 0, max \\ 1, left \\ true, right \\ true)
There's nothing expressly wrong with this implementation, but with my use cases and as Elixir is being used more in the domain of Machine Learning which deals with these intervals quite often, I wanted a solution that felt a bit more integrated.
Solution
This led me to create a small 1-file library called Exterval
which is available on Hex and can be installed with:
def deps do
[
{:exterval, "~> 0.1.0"}
]
end
To make the interval feel more native to Elixir, I implemented it as a sigil that implements the Enumerable
Protocol, which gives you several nice benefits:
- Takes advantage of the
member?/2
function which means we can use thein
keyword to check for membership - Allows for checking of sub-interval membership (with some caveats)
- Implements an optional
step
parameter that allows you to iterate/reduce over the interval - Implements a
size
function (remember,size
refers to the ability to count the number of members without reducing over the whole structure, whereaslengths
implies a need to reduce). - Allows for
:infinity
and:neg_infinity
to be specified in the interval
This lets us write more succinct checks like:
iex> import Exterval
iex> ~i<[1, 10)//2>
[1, 10)//2
iex> ~i<[1, 10)//2> |> Enum.to_list()
[1.0, 3.0, 5.0, 7.0, 9.0]
iex> ~i<[1, 10)//2> |> Enum.sum()
25.0
iex> ~i<[-1, 3)//-0.5> |> Enum.to_list()
[2.5, 2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0]
iex> ~i<[1, 10]> |> Enum.count()
:infinity
iex> ~i<[1, 10)//2> |> Enum.count()
4
iex> ~i<[-2,-2]//1.0> |> Enum.count()
1
iex> ~i<[1,2]//0.5> |> Enum.count()
3
iex> ~i<[-2,-1]//0.75> |> Enum.count()
2
iex> 1 in ~i<[1, 10]>
true
iex> 1 in ~i<[1, 10)//2>
true
iex> 3 in ~i<(1, 10)//2>
true
# You can even do variable substitution using string interpolation syntax, since the sigil parameter is just a string
iex> min = 2
iex> 3 in ~i<(#{min + 1}, 10)//2>
false
Design Details
The decision to implement the interval as a sigil was not as straightforward as it might seem. As I mentioned before, Elixir is an extremely extensible language with superior support for meta-programming, so implementing this as a macro was my first instinct. I considered commandeering the opening brackets (
and [
to trigger the macro, or something similar with the comma ,
, but fortunately, I hit a brick wall with that effort. I say fortunately not only because it would have been a bad idea from a design perspective, but it certainly would have been a messier implementation and would have overly complicated it in addition to actually making the code less clear. I appreciate the usage of the sigil ~I
because it makes it clear that the range that follows is not to be confused with the built-in Range
.
Once I decided on the usage of the Enumerable
protocol, I knew I wanted to allow some way for an optional step size to be specified so that reduce
could be used on the structure. Elixir sigils allow for parameters to be passed after the closing sigil, so initially, I considered passing in the step size as a parameter since zero or more ASCII letters and digits can be given as a modifier to the sigil, but this would prohibit having floats as step sizes. Another constraint to consider when using sigils is that string interpolation is only allowed within the sigil when using a lowercase sigil. Sigils start with ~
and are followed by one lowercase letter or by one or more uppercase letters, immediately followed by one of the allowed delimiter pairs. Within the context of our use case, we happen to be able to get by without having string interpolation since we use it within pre-defined parameters that are hardcoded, but the library becomes much more useful if we can have dynamically defined intervals, so this limits how the sigil is named.
Another major design decision was how to actually parse the sigil. I ultimately landed on the straightforward answer of just using a regex, but I had a decent back-and-forth with my friend Paulo from the Elixir-Nx core team regarding other options. He provided some nice proofs of concept using binary pattern matching as well as NimbleParsec, but I decided on a regex due to my familiarity, its ability to reduce the amount of code, and because I was not too concerned with performance concerns with what will typically be short patterns.
One of the last design details finalized was how to treat the step size and its effect on item membership. Paulo and I discussed whether it should support ranges where the min and max values did not necessarily have to be in the correct order (e.g. ~i<1,-1//0.5>
) which would essentially imply that any iteration would start at 1
in this instance and would work towards -1
in steps of 0.5
. This was discussed since it can be seen in some other implementations throughout other ecosystems. We decided that the most clear solution, as well as the solution that fit best within the spirit of the library, was to enforce that the first value specified be less than or equal to the second value, and any desire to iterate starting with the max value could be specified using a negative step size.
Implementation Details
Creation
An interval is stored as a struct with the following fields:
left
- the left bracket, either[
or(
.right
- the right bracket, either]
or)
.min
- the lower bound of the interval. Can be:neg_infinity
or any number.max
- the upper bound of the interval. Can be:infinity
or any number.step
- the step size of the interval. Ifnil
, the interval is continuous.
To define a sigil, you create a function with the name of the sigil prefixed by sigil_
, so since I wish to use this sigil using ~i
I define it as
def sigil_i(pattern, []) do
end
The second parameter are the options to the sigil I mentioned earlier. For now these are unused.
I parse the input to the sigil using the following regex:
^(?P<left>\[|\()\s*(?P<min>[-+]?(?:\d+|\d+\.\d+)(?:[eE][-+]?\d+)?|:neg_infinity)\s*,\s*(?P<max>[-+]?(?:\d+|\d+\.\d+)(?:[eE][-+]?\d+)?|:infinity)\s*(?P<right>]|\))(?:\/\/(?P<step>[-+]?(?:[1-9]+|\d+\.\d+)(?:[eE][-+]?\d+)?))?$
Using the named capture groups I perform some additional validation such as ensuring that the interval goes from the minimum value to the maximum.
Enumerable – Size / Count
The first function I need to implement for the protocol is the Enumerable.count/1
function. Logically, there are three conditions to account for. First are the instances where the size is either zero or infinity. Since Enumerable.count/1
must return a number on success, I choose to return {:error, Infinity}
from Enumerable.count/1
when I wish to return :infinity
. This would normally be used to return a module which can perform a reduction to compute the count, but if we just make a simple helper module
defmodule Infinity do
@moduledoc false
def reduce(%Exterval{}, _, _), do: {:halt, :infinity}
end
Now I can get my desired behavior. I implement these cases with the following:
def size(interval)
def size(%__MODULE__{step: nil}), do: {:error, Infinity}
def size(%__MODULE__{max: :neg_infinity}), do: 0
def size(%__MODULE__{min: :infinity}), do: 0
def size(%__MODULE__{min: min, max: max})
when min in [:infinity, :neg_infinity] or max in [:infinity, :neg_infinity],
do: {:error, Infinity}
Lastly I separate cases where the step size is negative and where its positive since the logic is different.
def size(%__MODULE__{left: left, right: right, min: min, max: max, step: step}) when step < 0 do
case {left, right} do
{"[", "]"} ->
abs(trunc((max - min) / step)) + 1
{"(", "]"} ->
abs(trunc((max - (min - step)) / step)) + 1
{"[", ")"} ->
abs(trunc((max + step - min) / step)) + 1
{"(", ")"} ->
abs(trunc((max + step - (min - step)) / step)) + 1
end
end
def size(%__MODULE__{left: left, right: right, min: min, max: max, step: step}) when step > 0 do
case {left, right} do
{"[", "]"} ->
abs(trunc((max - min) / step)) + 1
{"(", "]"} ->
abs(trunc((max - (min + step)) / step)) + 1
{"[", ")"} ->
abs(trunc((max - step - min) / step)) + 1
{"(", ")"} ->
abs(trunc((max - step - (min + step)) / step)) + 1
end
end
Enumerable – Reduce
The implementation for reduce
is a great example of how Elixir's pattern matching in function headers can reduce visual complexity and even the implementation itself. First, we return :infinity
if step
is nil
.
def reduce(%Exterval{step: nil}, acc, _fun) do
{:done, acc}
end
Next, we again have different clauses depending on if the step
is positive or negative, since that dictates which direction with respect to the interval the reduction occur.
def reduce(%Exterval{left: left, right: right, min: min, max: max, step: step}, acc, fun)
when step > 0 do
case left do
"[" ->
reduce(min, max, right, acc, fun, step)
"(" ->
reduce(min + step, max, right, acc, fun, step)
end
end
def reduce(%Exterval{left: left, right: right, min: min, max: max, step: step}, acc, fun)
when step < 0 do
case right do
"]" ->
reduce(min, max, left, acc, fun, step)
")" ->
reduce(min, max + step, left, acc, fun, step)
end
end
Notice that these clauses to the reduce/3
implementation return a different reduce/6
function which is specific to our module.
Next we handle conditions where the reduction is halted or suspended:
efp reduce(_min, _max, _closing, {:halt, acc}, _fun, _step) do
{:halted, acc}
end
defp reduce(min, max, closing, {:suspend, acc}, fun, step) do
{:suspended, acc, &reduce(min, max, closing, &1, fun, step)}
end
Next we handle edge cases involving :infinity
and :neg_infinity
where we have no way to begin the reduction since we cannot move step
increments away from either of these when they are our starting point:
defp reduce(:neg_infinity, _max, _closing, {:cont, acc}, _fun, step) when step > 0 do
{:done, acc}
end
defp reduce(_min, :infinity, _closing, {:cont, acc}, _fun, step) when step < 0 do
{:done, acc}
end
Interestingly, these are cases where the size of the intervals would be :infinity
but we cannot reduce over them at all, as opposed to other infinitely sized intervals where we can begin iteration which will never end, such as ~i<[0,:infinity]//1>
which would effectively be an infinite stream starting at 0
and incrementing by 1
.
Next we add all of the main logic for the "typical" cases:
defp reduce(min, max, "]" = closing, {:cont, acc}, fun, step)
when min <= max do
reduce(min + step, max, closing, fun.(min, acc), fun, step)
end
defp reduce(min, max, ")" = closing, {:cont, acc}, fun, step)
when min < max do
reduce(min + step, max, closing, fun.(min, acc), fun, step)
end
defp reduce(min, max, "[" = closing, {:cont, acc}, fun, step)
when min <= max do
reduce(min, max + step, closing, fun.(max, acc), fun, step)
end
defp reduce(min, max, "(" = closing, {:cont, acc}, fun, step)
when min < max do
reduce(min, max + step, closing, fun.(max, acc), fun, step)
end
And lastly we add the final case where the condition that min < max
(or min <= max
depending on the brackets) is no longer met, which means the reduction is complete:
defp reduce(_, _, _, {:cont, acc}, _fun, _up) do
{:done, acc}
end
Just like that the reduce/3
implementation is complete! As I mentioned before and notes in more detail, there are some opinions inherit to this implementation having to do with :infinity
and :neg_infinity
bounds as well as empty intervals, but I tried to keep the behavior consistent throughout.
Enumerable – Membership
Now on to the part that I was most interested in, which is interval membership. First, let's add support for checking membership between two intervals, which is essentially a check for one interval being a sub-interval of another.
Sub-interval must satisfy the following to be a subset:
- The minimum value of the subset must belong to the superset.
- The maximum value of the subset must belong to the superset.
- The step size of the subset must be a multiple of the step size of the superset.
If the superset has no step size, then only the first two conditions must be satisfied.
if the superset has a step size, and the subset doesn't then membership is false
.
def member?(%Exterval{step: nil} = outer, %Exterval{} = inner) do
res = inner.max in outer && inner.min in outer
{:ok, res}
end
def member?(%Exterval{}, %Exterval{step: nil}) do
{:ok, false}
end
def member?(%Exterval{} = outer, %Exterval{} = inner) do
res = inner.max in outer && inner.min in outer && :math.fmod(inner.step, outer.step) == 0
{:ok, res}
end
Then that just leaves the main implementation for membership checks, which is basically just a case
statement which changes the output depending on the brackets supplied. Additionally, if the interval contains a step
then the value being checked must be a multiple of the step
.
def member?(%Exterval{} = rang, value) when is_number(value) do
res =
if Exterval.size(rang) == 0 do
{:ok, false}
else
case {rang.left, rang.min, rang.max, rang.right} do
{_, :neg_infinity, :infinity, _} ->
true
{_, :neg_inf, max_val, "]"} ->
value <= max_val
{_, :neg_infinity, max_val, ")"} ->
value < max_val
{"[", min_val, :infinity, _} ->
value >= min_val
{"(", min_val, :infinity, _} ->
value > min_val
{"[", min_val, max_val, "]"} ->
value >= min_val and value <= max_val
{"(", min_val, max_val, "]"} ->
value > min_val and value <= max_val
{"[", min_val, max_val, ")"} ->
value >= min_val and value < max_val
{"(", min_val, max_val, ")"} ->
value > min_val and value < max_val
_ ->
raise ArgumentError, "Invalid range specification"
end
end
res =
unless is_nil(rang.step) || rang.min == :neg_infinity || rang.max == :infinity do
res && :math.fmod(value - rang.min, rang.step) == 0
else
res
end
{:ok, res}
end
Inspect
Lastly, to make the user experience a bit better, it's not too difficult to implement the Inpect
Protocol to provide a cleaner output:
defimpl Inspect do
import Inspect.Algebra
import Kernel, except: [inspect: 2]
def inspect(%Exterval{left: left, right: right, min: min, max: max, step: nil}, opts) do
concat([string(left), to_doc(min, opts), ",", to_doc(max, opts), string(right)])
end
def inspect(%Exterval{left: left, right: right, min: min, max: max, step: step}, opts) do
concat([
string(left),
to_doc(min, opts),
",",
to_doc(max, opts),
string(right),
"//",
to_doc(step, opts)
])
end
end
Future Plans
Currently I am weighing the options between adding more functionality to the library or keeping it as thin as it currently is. The main additions could be more robust set operations on the intervals, but I currently do not have a need for it so it will probably not make it into the library in the near future.
For now, I hope this provided a detailed look at the process of identifying a problem, and subsequently designing and implementing the solution. I found this to be an elegant solution to the problem, but as I mentioned it was not a straight-line path. I would be interested to hear about any other solutions people have seen!