Elevate Your Elixir With Sigils

Elevate Your Elixir With Sigils

Adding support for real-valued intervals implementing the Enumerable Protocol in Elixir.

Motivation

In a previous article of mine, I wrote about using NimbleOptions to add extremely powerful option handling to your Elixir applications. One of the custom validations I was using was a function in_range that would check if an option fell within a real-valued interval. This differs from Elixir's built-in Range in that it needed to be real-valued (rather than discrete integer steps). Additionally, mostly due to aesthetic and personal opinion, I wanted to be able to express the intervals using mathematical notation such as (0,1] to mean "allow any value greater than 0 and less than or equal to 1". I find Elixir to be such a beautiful language with a unique capacity for extensions that it felt wrong to use a function such as in_range or in_interval. Additionally, some implementations I've come across have somewhat unintuitive APIs, such as the following spec:

@spec in_range(float, float, float, bool, bool) :: bool
@doc """
  * `:value` - Value to test for inclusion
  * `:min` - Minimum value in range
  * `:max` - Maximum value in range
  * `:left` - Whether the left boundary is inclusive (true) or exclusive (false)
  * `:right` - Whether the right boundary is inclusive (true) or exclusive (false)
"""
def in_range(value, min \\ 0, max \\ 1, left \\ true, right \\ true)

There's nothing expressly wrong with this implementation, but with my use cases and as Elixir is being used more in the domain of Machine Learning which deals with these intervals quite often, I wanted a solution that felt a bit more integrated.

Solution

This led me to create a small 1-file library called Exterval which is available on Hex and can be installed with:

def deps do
[
  {:exterval, "~> 0.1.0"}
]
end

To make the interval feel more native to Elixir, I implemented it as a sigil that implements the Enumerable Protocol, which gives you several nice benefits:

  • Takes advantage of the member?/2 function which means we can use the in keyword to check for membership
  • Allows for checking of sub-interval membership (with some caveats)
  • Implements an optional step parameter that allows you to iterate/reduce over the interval
  • Implements a size function (remember, size refers to the ability to count the number of members without reducing over the whole structure, whereas lengths implies a need to reduce).
  • Allows for :infinity and :neg_infinity to be specified in the interval

This lets us write more succinct checks like:

iex> import Exterval
iex> ~i<[1, 10)//2>
[1, 10)//2
iex> ~i<[1, 10)//2> |> Enum.to_list()
[1.0, 3.0, 5.0, 7.0, 9.0]
iex> ~i<[1, 10)//2> |> Enum.sum()
25.0
iex> ~i<[-1, 3)//-0.5> |> Enum.to_list()
[2.5, 2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0]
iex> ~i<[1, 10]> |> Enum.count()
:infinity
iex> ~i<[1, 10)//2> |> Enum.count()
4
iex> ~i<[-2,-2]//1.0> |> Enum.count()
1
iex> ~i<[1,2]//0.5> |> Enum.count()
3
iex> ~i<[-2,-1]//0.75> |> Enum.count()
2
iex> 1 in ~i<[1, 10]>
true
iex> 1 in ~i<[1, 10)//2>
true
iex> 3 in ~i<(1, 10)//2>
true
# You can even do variable substitution using string interpolation syntax, since the sigil parameter is just a string
iex> min = 2
iex> 3 in ~i<(#{min + 1}, 10)//2>
false

Design Details

The decision to implement the interval as a sigil was not as straightforward as it might seem. As I mentioned before, Elixir is an extremely extensible language with superior support for meta-programming, so implementing this as a macro was my first instinct. I considered commandeering the opening brackets ( and [ to trigger the macro, or something similar with the comma , , but fortunately, I hit a brick wall with that effort. I say fortunately not only because it would have been a bad idea from a design perspective, but it certainly would have been a messier implementation and would have overly complicated it in addition to actually making the code less clear. I appreciate the usage of the sigil ~I because it makes it clear that the range that follows is not to be confused with the built-in Range.

💡
You can read more about Elixir sigils here and see their syntax reference here. Of note, you can use any of the allowed delimiter pairs that it lists to capture your sigil. I chose [ and ] so as to not conflict with the brackets used in the interval. You could also use something like ~i|[0,1)| if you prefer.

Once I decided on the usage of the Enumerable protocol, I knew I wanted to allow some way for an optional step size to be specified so that reduce could be used on the structure. Elixir sigils allow for parameters to be passed after the closing sigil, so initially, I considered passing in the step size as a parameter since zero or more ASCII letters and digits can be given as a modifier to the sigil, but this would prohibit having floats as step sizes. Another constraint to consider when using sigils is that string interpolation is only allowed within the sigil when using a lowercase sigil. Sigils start with ~ and are followed by one lowercase letter or by one or more uppercase letters, immediately followed by one of the allowed delimiter pairs. Within the context of our use case, we happen to be able to get by without having string interpolation since we use it within pre-defined parameters that are hardcoded, but the library becomes much more useful if we can have dynamically defined intervals, so this limits how the sigil is named.

Another major design decision was how to actually parse the sigil. I ultimately landed on the straightforward answer of just using a regex, but I had a decent back-and-forth with my friend Paulo from the Elixir-Nx core team regarding other options. He provided some nice proofs of concept using binary pattern matching as well as NimbleParsec, but I decided on a regex due to my familiarity, its ability to reduce the amount of code, and because I was not too concerned with performance concerns with what will typically be short patterns.

One of the last design details finalized was how to treat the step size and its effect on item membership. Paulo and I discussed whether it should support ranges where the min and max values did not necessarily have to be in the correct order (e.g. ~i<1,-1//0.5>) which would essentially imply that any iteration would start at 1 in this instance and would work towards -1 in steps of 0.5. This was discussed since it can be seen in some other implementations throughout other ecosystems. We decided that the most clear solution, as well as the solution that fit best within the spirit of the library, was to enforce that the first value specified be less than or equal to the second value, and any desire to iterate starting with the max value could be specified using a negative step size.

Implementation Details

Creation

An interval is stored as a struct with the following fields:

  • left - the left bracket, either [ or (.
  • right - the right bracket, either ] or ).
  • min - the lower bound of the interval. Can be :neg_infinity or any number.
  • max - the upper bound of the interval. Can be :infinity or any number.
  • step - the step size of the interval. If nil, the interval is continuous.

To define a sigil, you create a function with the name of the sigil prefixed by sigil_, so since I wish to use this sigil using ~i I define it as

def sigil_i(pattern, []) do
end

The second parameter are the options to the sigil I mentioned earlier. For now these are unused.

I parse the input to the sigil using the following regex:

  ^(?P<left>\[|\()\s*(?P<min>[-+]?(?:\d+|\d+\.\d+)(?:[eE][-+]?\d+)?|:neg_infinity)\s*,\s*(?P<max>[-+]?(?:\d+|\d+\.\d+)(?:[eE][-+]?\d+)?|:infinity)\s*(?P<right>]|\))(?:\/\/(?P<step>[-+]?(?:[1-9]+|\d+\.\d+)(?:[eE][-+]?\d+)?))?$

Using the named capture groups I perform some additional validation such as ensuring that the interval goes from the minimum value to the maximum.

Enumerable – Size / Count

The first function I need to implement for the protocol is the Enumerable.count/1 function. Logically, there are three conditions to account for. First are the instances where the size is either zero or infinity. Since Enumerable.count/1 must return a number on success, I choose to return {:error, Infinity} from Enumerable.count/1 when I wish to return :infinity. This would normally be used to return a module which can perform a reduction to compute the count, but if we just make a simple helper module

defmodule Infinity do
  @moduledoc false
  def reduce(%Exterval{}, _, _), do: {:halt, :infinity}
end

Now I can get my desired behavior. I implement these cases with the following:

def size(interval)
def size(%__MODULE__{step: nil}), do: {:error, Infinity}
def size(%__MODULE__{max: :neg_infinity}), do: 0
def size(%__MODULE__{min: :infinity}), do: 0

def size(%__MODULE__{min: min, max: max})
    when min in [:infinity, :neg_infinity] or max in [:infinity, :neg_infinity],
    do: {:error, Infinity}

Lastly I separate cases where the step size is negative and where its positive since the logic is different.

def size(%__MODULE__{left: left, right: right, min: min, max: max, step: step}) when step < 0 do
  case {left, right} do
    {"[", "]"} ->
      abs(trunc((max - min) / step)) + 1

    {"(", "]"} ->
      abs(trunc((max - (min - step)) / step)) + 1

    {"[", ")"} ->
      abs(trunc((max + step - min) / step)) + 1

    {"(", ")"} ->
      abs(trunc((max + step - (min - step)) / step)) + 1
  end
end

def size(%__MODULE__{left: left, right: right, min: min, max: max, step: step}) when step > 0 do
  case {left, right} do
    {"[", "]"} ->
      abs(trunc((max - min) / step)) + 1

    {"(", "]"} ->
      abs(trunc((max - (min + step)) / step)) + 1

    {"[", ")"} ->
      abs(trunc((max - step - min) / step)) + 1

    {"(", ")"} ->
      abs(trunc((max - step - (min + step)) / step)) + 1
  end
end

Enumerable – Reduce

The implementation for reduce is a great example of how Elixir's pattern matching in function headers can reduce visual complexity and even the implementation itself. First, we return :infinity if step is nil.

def reduce(%Exterval{step: nil}, acc, _fun) do
  {:done, acc}
end
    

Next, we again have different clauses depending on if the step is positive or negative, since that dictates which direction with respect to the interval the reduction occur.

def reduce(%Exterval{left: left, right: right, min: min, max: max, step: step}, acc, fun)
    when step > 0 do
  case left do
    "[" ->
      reduce(min, max, right, acc, fun, step)

    "(" ->
      reduce(min + step, max, right, acc, fun, step)
  end
end

def reduce(%Exterval{left: left, right: right, min: min, max: max, step: step}, acc, fun)
    when step < 0 do
  case right do
    "]" ->
      reduce(min, max, left, acc, fun, step)

    ")" ->
      reduce(min, max + step, left, acc, fun, step)
  end
end

Notice that these clauses to the reduce/3 implementation return a different reduce/6 function which is specific to our module.

Next we handle conditions where the reduction is halted or suspended:

efp reduce(_min, _max, _closing, {:halt, acc}, _fun, _step) do
  {:halted, acc}
end

defp reduce(min, max, closing, {:suspend, acc}, fun, step) do
  {:suspended, acc, &reduce(min, max, closing, &1, fun, step)}
end

Next we handle edge cases involving :infinity and :neg_infinity where we have no way to begin the reduction since we cannot move step increments away from either of these when they are our starting point:

defp reduce(:neg_infinity, _max, _closing, {:cont, acc}, _fun, step) when step > 0 do
  {:done, acc}
end

defp reduce(_min, :infinity, _closing, {:cont, acc}, _fun, step) when step < 0 do
  {:done, acc}
end

Interestingly, these are cases where the size of the intervals would be :infinity but we cannot reduce over them at all, as opposed to other infinitely sized intervals where we can begin iteration which will never end, such as ~i<[0,:infinity]//1> which would effectively be an infinite stream starting at 0 and incrementing by 1.

Next we add all of the main logic for the "typical" cases:

defp reduce(min, max, "]" = closing, {:cont, acc}, fun, step)
     when min <= max do
  reduce(min + step, max, closing, fun.(min, acc), fun, step)
end

defp reduce(min, max, ")" = closing, {:cont, acc}, fun, step)
     when min < max do
  reduce(min + step, max, closing, fun.(min, acc), fun, step)
end

defp reduce(min, max, "[" = closing, {:cont, acc}, fun, step)
     when min <= max do
  reduce(min, max + step, closing, fun.(max, acc), fun, step)
end

defp reduce(min, max, "(" = closing, {:cont, acc}, fun, step)
     when min < max do
  reduce(min, max + step, closing, fun.(max, acc), fun, step)
end

And lastly we add the final case where the condition that min < max (or min <= max depending on the brackets) is no longer met, which means the reduction is complete:

defp reduce(_, _, _, {:cont, acc}, _fun, _up) do
  {:done, acc}
end

Just like that the reduce/3 implementation is complete! As I mentioned before and notes in more detail, there are some opinions inherit to this implementation having to do with :infinity and :neg_infinity bounds as well as empty intervals, but I tried to keep the behavior consistent throughout.

Enumerable – Membership

Now on to the part that I was most interested in, which is interval membership. First, let's add support for checking membership between two intervals, which is essentially a check for one interval being a sub-interval of another.

Sub-interval must satisfy the following to be a subset:

  • The minimum value of the subset must belong to the superset.
  • The maximum value of the subset must belong to the superset.
  • The step size of the subset must be a multiple of the step size of the superset.

If the superset has no step size, then only the first two conditions must be satisfied.

if the superset has a step size, and the subset doesn't then membership is false.

def member?(%Exterval{step: nil} = outer, %Exterval{} = inner) do
  res = inner.max in outer && inner.min in outer
  {:ok, res}
end

def member?(%Exterval{}, %Exterval{step: nil}) do
  {:ok, false}
end

def member?(%Exterval{} = outer, %Exterval{} = inner) do
  res = inner.max in outer && inner.min in outer && :math.fmod(inner.step, outer.step) == 0
  {:ok, res}
end

Then that just leaves the main implementation for membership checks, which is basically just a case statement which changes the output depending on the brackets supplied. Additionally, if the interval contains a step then the value being checked must be a multiple of the step.

def member?(%Exterval{} = rang, value) when is_number(value) do
  res =
    if Exterval.size(rang) == 0 do
      {:ok, false}
    else
      case {rang.left, rang.min, rang.max, rang.right} do
        {_, :neg_infinity, :infinity, _} ->
          true

        {_, :neg_inf, max_val, "]"} ->
          value <= max_val

        {_, :neg_infinity, max_val, ")"} ->
          value < max_val

        {"[", min_val, :infinity, _} ->
          value >= min_val

        {"(", min_val, :infinity, _} ->
          value > min_val

        {"[", min_val, max_val, "]"} ->
          value >= min_val and value <= max_val

        {"(", min_val, max_val, "]"} ->
          value > min_val and value <= max_val

        {"[", min_val, max_val, ")"} ->
          value >= min_val and value < max_val

        {"(", min_val, max_val, ")"} ->
          value > min_val and value < max_val

        _ ->
          raise ArgumentError, "Invalid range specification"
      end
    end

  res =
    unless is_nil(rang.step) || rang.min == :neg_infinity || rang.max == :infinity do
      res && :math.fmod(value - rang.min, rang.step) == 0
    else
      res
    end

  {:ok, res}
end

Inspect

Lastly, to make the user experience a bit better, it's not too difficult to implement the Inpect Protocol to provide a cleaner output:

defimpl Inspect do
  import Inspect.Algebra
  import Kernel, except: [inspect: 2]

  def inspect(%Exterval{left: left, right: right, min: min, max: max, step: nil}, opts) do
    concat([string(left), to_doc(min, opts), ",", to_doc(max, opts), string(right)])
  end

  def inspect(%Exterval{left: left, right: right, min: min, max: max, step: step}, opts) do
    concat([
      string(left),
      to_doc(min, opts),
      ",",
      to_doc(max, opts),
      string(right),
      "//",
      to_doc(step, opts)
    ])
  end
end

Future Plans

Currently I am weighing the options between adding more functionality to the library or keeping it as thin as it currently is. The main additions could be more robust set operations on the intervals, but I currently do not have a need for it so it will probably not make it into the library in the near future.

For now, I hope this provided a detailed look at the process of identifying a problem, and subsequently designing and implementing the solution. I found this to be an elegant solution to the problem, but as I mentioned it was not a straight-line path. I would be interested to hear about any other solutions people have seen!

Comments