Thursday, 28 June 2007

Erlang Binary Map


(Updated to clean up some hypertext errors in the code examples.)

The Netflix Challenge dataset is very big, and there is not much RAM in my laptop at all.

To fit as much as possible of the rating data into memory, I converted the data into a bunch of Erlang binaries. Erlang binaries are 'just' contiguous blocks of bytes, so they do not incur the overhead of list cons cells or dictionary indexing. They are fast, but pretty simple, and using them to store data like C arrays means that you have to write your own code to manage access to them. (At least you will get a runtime error if you try to access past the end of a binary, which is a major step up from C.)
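
For example, hand-rolled access to one element looks something like this (a sketch only; element_at is just a made-up helper, not part of my rating-store code):

% Read the unsigned integer stored at 0-based element Index, assuming
% every element is BytesPerElem bytes wide.
element_at(Bin, Index, BytesPerElem) ->
    BytesDiscard = Index * BytesPerElem,
    BitsPerElem = BytesPerElem * 8,
    <<_:BytesDiscard/binary,Elem:BitsPerElem/unsigned-integer,_/binary>> = Bin,
    Elem.

element_at(<<1,2,3,4>>, 1, 2).
-> 772

(And element_at(<<1,2,3,4>>, 2, 2) fails with a badmatch, because the pattern runs past the end of the binary.)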


While wandering through the wasteland that is my code I happened to notice that one section did not smell quite right. This code was taking a chunk of that binary data, converting the chunk into a list of elements, and then mapping a function over this list of elements in order to return a list of function results. Creating that intermediate list seemed like a bit of unnecessary overhead - I would much rather iterate directly across the binary itself and save all of that extra list consing - but I could not find any binary:map-like function anywhere.

So I wrote my own.

It's not very big.


The original code had used a list_from_bin function to turn a binary into a list of elements:

% Convert a binary into a list of unsigned integer elements.
list_from_bin(Bin, BytesPerElem) ->
    list_from_bin(Bin, BytesPerElem, BytesPerElem * 8, size(Bin), []).

list_from_bin(_Bin, _BytesPerElem, _BitsPerElem, 0, Results) -> Results;
list_from_bin(Bin, BytesPerElem, BitsPerElem, BytesOffset, Results) ->
    BytesDiscard = BytesOffset - BytesPerElem,
    <<_:BytesDiscard/binary,Element:BitsPerElem/unsigned-integer,_/binary>> = Bin,
    list_from_bin(Bin, BytesPerElem, BitsPerElem, BytesDiscard, [Element|Results]).


And list_from_bin was used in this manner (with a trivial "2 * Elem" function):

[2 * N || N <- list_from_bin(<<1,2,3,4>>, 1)].
-> [2,4,6,8]

[2 * N || N <- list_from_bin(<<1,2,3,4>>, 2)].
-> [516,1544]

Note that list_from_bin iterates from the end of the binary backwards down to the beginning of the binary, so the list it builds is in the same order as the original binary and does not need reversing.

(If all of my elements were one byte long then I could have just used the standard Erlang binary_to_list/1 function, but sometimes the elements I used were two or three bytes in length. I should probably have included an initial clause of "list_from_bin(Bin, 1) -> binary_to_list(Bin);", but didn't think of it at the time.)


The new map_bin function maps a function over the elements in a binary, and returns a list of the function's results:

map_bin(Fun, Bin, BytesPerElem) ->
    map_bin(Fun, Bin, BytesPerElem, 0, size(Bin) div BytesPerElem, []).

map_bin(_Fun, _Bin, _BytesPerElem, _BytesDiscard, 0, Results) -> lists:reverse(Results);
map_bin(Fun, Bin, BytesPerElem, BytesDiscard, CountElemRemain, Results) ->
    <<_:BytesDiscard/binary,Elem:BytesPerElem/binary,_/binary>> = Bin,
    map_bin(Fun, Bin, BytesPerElem, BytesDiscard + BytesPerElem, CountElemRemain - 1, [Fun(Elem)|Results]).


The main function (map_bin/3) takes these arguments:
  • Fun: A function that takes a single binary element as input. May return any value.
  • Bin: The original binary data block. Its size should be an exact multiple of BytesPerElem; if it is not, any excess bytes at the end of the binary are discarded.
  • BytesPerElem: The number of bytes taken up by each element in the binary.

The helper function (map_bin/6) takes three additional arguments:
  • BytesDiscard: The number of bytes to skip at the beginning of the binary, for the current iteration.
  • CountElemRemain: The number of elements remaining to process.
  • Results: An accumulated, reversed list of the function's results.

The BytesDiscard argument was added to avoid having to recalculate the number of bytes to skip for every iteration (with, for example, something like "BytesDiscard = Offset * BytesPerElem"). I am not sure if this was a good decision or if it reeks too much of premature optimisation. Old C habits die hard.

CountElemRemain starts at the number of elements to process and decrements on each iteration, so the terminating condition can be written simply as 0 rather than needing a "when Offset > CountElems" guard on the function clause.
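
For comparison, here is roughly what that alternative shape would look like (a sketch only; map_bin_offset is just a made-up name, and the guard below uses >= rather than the > quoted above):

map_bin_offset(Fun, Bin, BytesPerElem) ->
    map_bin_offset(Fun, Bin, BytesPerElem, 0, size(Bin) div BytesPerElem, []).

map_bin_offset(_Fun, _Bin, _BytesPerElem, Offset, CountElems, Results) when Offset >= CountElems ->
    lists:reverse(Results);
map_bin_offset(Fun, Bin, BytesPerElem, Offset, CountElems, Results) ->
    BytesDiscard = Offset * BytesPerElem,    % recalculated on every iteration
    <<_:BytesDiscard/binary,Elem:BytesPerElem/binary,_/binary>> = Bin,
    map_bin_offset(Fun, Bin, BytesPerElem, Offset + 1, CountElems, [Fun(Elem)|Results]).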

And here is map_bin in action:

map_bin(fun(<<Elem:8>>) -> 2 * Elem end, <<1,2,3,4>>, 1).
-> [2,4,6,8]

map_bin(fun(<<Elem:16>>) -> 2 * Elem end, <<1,2,3,4>>, 2).
-> [516,1544]


Now with the new map_bin function my code can skip the creation of an intermediate list and, quite by accident, is actually more flexible than before. The original code always produced lists of unsigned integers from the binaries; the new code hands each element to the supplied function as a binary, so the function can decode it however it likes, including pulling several fields out of a single element. For example:

map_bin(fun(<<Elem1:8,Elem2:16>>) -> Elem1 + Elem2 end, <<1,2,3,4,5,6>>, 3).
-> [516,1290]

It's just not quite as nice to look at as a good list comprehension.


Performance

The map_bin function is all well and good, but the real question we all want answered is... does using this new code actually make our program run any faster?

Well, according to my very informal use of now/0 and timer:now_diff/2, with large binaries and a trivial "x2" function for each element, map_bin seems to be around 11% faster than using list_from_bin. That's... not too bad. But we could go faster with multiple processes, I think.
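
For the record, the informal measurement was along these lines (a sketch only; time_it is just a made-up helper, and Bin stands for whatever large test binary is to hand):

% Run Fun() once and return {elapsed microseconds, result}.
time_it(Fun) ->
    T0 = now(),
    Result = Fun(),
    T1 = now(),
    {timer:now_diff(T1, T0), Result}.

{MicrosMap, _} = time_it(fun() -> map_bin(fun(<<N:8>>) -> 2 * N end, Bin, 1) end).
{MicrosList, _} = time_it(fun() -> [2 * N || N <- list_from_bin(Bin, 1)] end).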


Parallelism

Really, what is the point of using Erlang if we don't spawn a few hundred processes for every trivial piece of code?

Luckily for my fingers it is only a minor modification to make a parallel version of map_bin:

pmap_bin(Fun, Bin, BytesPerElem) ->
    pmap_bin(Fun, Bin, BytesPerElem, 0, size(Bin) div BytesPerElem, []).

pmap_bin(_Fun, _Bin, _BytesPerElem, _BytesDiscard, 0, Pids) ->
    [receive {done, Pid, Result} -> Result end || Pid <- lists:reverse(Pids)];
pmap_bin(Fun, Bin, BytesPerElem, BytesDiscard, CountElemRemain, Pids) ->
    PidThis = self(),
    <<_:BytesDiscard/binary,Elem:BytesPerElem/binary,_/binary>> = Bin,
    pmap_bin(Fun, Bin, BytesPerElem, BytesDiscard + BytesPerElem, CountElemRemain - 1,
             [spawn(fun() -> PidThis ! {done, self(), Fun(Elem)} end)|Pids]).


Sadly, I cannot comment much on the speed difference of this conversion because I do not (yet) have access to a multi-core machine. A routine like this is probably best avoided for a single-CPU system as the overhead of spawning many processes would be wasted, but it should perform better on multiple-CPU systems. (Feel free to try it out and let me know how it goes!)


There is still room for improvement in this function, if we wanted to take it further. We might not want to spawn a separate process for each element, for instance, but rather split the original binary into N chunks and spawn one process to apply the function to each chunk. We might also want to expand the parallelism to bring other Erlang nodes into the mix, spreading the calculations across multiple machines.
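
To make the first of those ideas concrete, a rough sketch of the chunked version might look like the following (pmap_bin_chunked and split_chunks are just made-up names, the code assumes the binary holds at least one whole element, and I have not benchmarked it):

% Split the work into NumChunks pieces, run map_bin over each piece in
% its own process, and stitch the results back together in order.
pmap_bin_chunked(Fun, Bin, BytesPerElem, NumChunks) ->
    CountElems = size(Bin) div BytesPerElem,
    ElemsPerChunk = (CountElems + NumChunks - 1) div NumChunks,
    BytesPerChunk = ElemsPerChunk * BytesPerElem,
    PidThis = self(),
    Pids = [spawn(fun() -> PidThis ! {done, self(), map_bin(Fun, Chunk, BytesPerElem)} end)
            || Chunk <- split_chunks(Bin, BytesPerChunk)],
    lists:append([receive {done, Pid, Result} -> Result end || Pid <- Pids]).

% Split a binary into pieces of at most BytesPerChunk bytes.
split_chunks(Bin, BytesPerChunk) when size(Bin) =< BytesPerChunk -> [Bin];
split_chunks(Bin, BytesPerChunk) ->
    <<Chunk:BytesPerChunk/binary,Rest/binary>> = Bin,
    [Chunk | split_chunks(Rest, BytesPerChunk)].

pmap_bin_chunked(fun(<<N:8>>) -> 2 * N end, <<1,2,3,4>>, 1, 2).
-> [2,4,6,8]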

Something for another day.


And now someone is going to tell me that binary comprehensions have been available in the standard Erlang distribution for a while, and I just haven't been paying enough attention to the Erlang mailing-list announcements.

4 comments:

  1. There's a technique called List Fusion which helps avoid the intermediate-list problem you described. In Haskell:

    http://www.haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html#id3159016

    'If a "good consumer" consumes an intermediate list constructed by a "good producer", the intermediate list should be eliminated entirely.'

  2. In recent erlangs there is an undocumented binary comprehension mechanism.

    -module(blb).
    -compile([binary_comprehension,export_all]).

    double() ->
        [2 * N || <<N:16>> <= <<1,2,3,4>>].

    This evaluates to [516,1544], like one of your examples. You must remember to compile with the binary_comprehension option to enable this syntax. See the whitepaper for more information...

  3. Herrmann: I have been considering Haskell for some time now, but I get the feeling that the Erlang VM and OTP modules currently support more features that I [think I] need.

    I am going to have to remember the concept of list fusion for a later project of mine (one in a list of many). I hope I will be able to incorporate the idea because it is definitely useful to have the compiler do it automatically for the developer.


    Anonymous: Thank you for confirming my suspicion. A binary counterpart to list comprehensions is way too useful to be missing from the language. Time to update my code!

  4. "Really, what is the point of using Erlang if we don't spawn a few hundred processes for every trivial piece of code?"

    lol


Obligatory legal stuff

Unless otherwise noted, all code appearing on this blog is released into the public domain and provided "as-is", without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author(s) be liable for any claim, damages, or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.