Understanding and Implementing the famous map and reduce functions - (Functional PHP 5.3 Part II)

Posted Aug 21, 2009 by Kris | Comments ( 2 ) | Filed in: PHP | | | | |

To demonstrate some uses of PHP 5.3's fun new anonymous functions let's implement the famous functions: map and reduce.

In Part I we deconstructed the differences between anonymous functions, lambdas, and closures in PHP. You may want to get comfy with those terms before continuing here. As a refresher, lambdas and anonymous functions are essentially the same idea: functions that are values just like integers and strings. Closures are what enable lambdas to refer to or use variables defined outside of the lambda, even after those variables have fallen out of scope. Now, onto map reduce.

We should chat about the butterflies, the birds, and the bees...

By now you've heard the cool kids at school raving about the euphoria of recursion and lambdas with higher order functions like map and reduce. We'll get there, we will, but first we need to have a little talk about butterflies, birds, and bees.

There's a moment in every little caterpillar's life when it becomes a butterfly. Metamorphosis is a transformative process. It's a transformation function, caterpillar in, butterfly out. (Caterpillar) -> (Butterfly) What is important to understand is not how metamorphosis works but that it is a function, a special kind of function that transforms a caterpillar into a butterfly.

Can you think of other functions that transform an input into an output? Sure: function square($x) { return x * x; } or strtolower or function filter($dirty) { /* clean up */ return $clean; }. There are a lot of transformer functions! Some, like square may take an input and give you the same type of input back int -> int. Others, like metamorphosis, will take one type of input and give you another caterpillar -> butterfly. For our purposes let's classify all of these types of functions simply transformer functions.

Transformer!

Are there other types of functions too? You bet. This is the point where we talk about the birds and the bees. On second thought, I've never really understood how birds and bees combine to make anything, so lets stick to butterflies. When two butterflies are come together in a certain way they create a new butterfly (yes, I know this analogy also has holes, just play along). Let's refrain from thinking about the details of how they come together, and focus on the beauty of what is happening. Two butterflies combining to make one! (Butterfly, Butterfly) -> (Butterfly). It's another function, a combiner function, that takes two like things and results in another like thing.

Combiner!

So, what does this have to do with map / reduce and functional programming in PHP?

map and reduce are higher-order functions. This means each has a parameter that is a function, but not just any function. Each takes a special type of function. map uses a transformer function and reduce uses a combiner function. map transforms a list of things, like say a bunch of caterpillars, into a list of another things, like butterflies. reduce reduces a list of things like butterflies until there's only one left. Here's a visual of how map reduce works:

The Code! For crying out loud, show us The Code!

Ok, ok. You get it! map transforms a list of things into a list of other things, reduce combines a list of things into one thing. What does this look like in PHP?

PHP has two built-in functions: array_map and array_reduce. (Unfortunately the two functions take the array and the callback parameters in different orders. We won't make that mistake when we implement map/reduce will we?) A callback has a lot of different meanings in PHP which we'll cover in the next post in this series. For now take a callback to mean an "anonymous function".

The canonical map/reduce example is word counting. So let's take on the challenge in PHP: given an array of strings we must count the occurrences of each word in all strings. Here is our input and desired output:

<?php
$lines = array(
             'one two three four',
             'two three four',
             'three four',
             'four',
             );
// Desired Output, array of type word => count
// array ( 'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, ) 
?>

Let's break down the problem into what we know:

  • Inputs: (String of Words)
  • Output: Array(Word => Count)

We need to get from our input type to our output type. How can we use a transformer function to get us half-way there? And a reducer to bring us home? Think of the caterpillars and the butterflies.

Our caterpillar is a plain old, space delimited, lowercase string. Our transformer function must take a single string and metamorphosize it into our butterfly, the output type: a single array where keys are words and values are counts. (Line of Words) -> (Array: Word => Count) Let's code this anonymous transformer function up (we'll cheat and use PHP's built-in array_count_values to count the words):

<?php
// Transforms (Line of Words) -> (Array: Word => Count)
$lineToWordCounts = 
    function($line) { 
        return array_count_values(explode(' ', $line));
    };
    
// Test on a single line:
var_export($lineToWordCounts('one two three four'));
// Output: array ( 'one' => 1, 'two' => 1, 'three' => 1, 'four' => 1, )

// Test with array_map:
$counts = array_map($lineToWordCounts, $lines);
var_export($counts);
// Output: array ( 0 => array ( 'one' => 1, 'two' => 1, 'three' => 1, 'four' => 1, ), 
//                 1 => array ( 'two' => 1, 'three' => 1, 'four' => 1, ),
//                 2 => array ( 'three' => 1, 'four' => 1, ),
//                 3 => array ( 'four' => 1, ), )
?>

Sweet mother nature! The strings have transformed into beautiful butterflies! Now all we need to do is combine them down into a single array. If we can write a combiner function that takes two word count arrays and combines them into one we can use reduce to combine all of them into one for a total word count. Let's take a stab:

<?php 
// Combiner (Array:Word=>Count,Array:Word=>Count)->(Array:Word=>Count)
$sumWordCounts =
    function($countsL, $countsR) {
        // Get all the words
        $words = array_merge(array_keys($countsL), array_keys($countsR));
        $out = array();
        // Put them in a new (Array: Word => Count)
        foreach($words as $word) {
            // Sum their counts
            $out[$word] = isset($countsL[$word]) ? $countsL[$word] : 0;
            $out[$word] += isset($countsR[$word]) ? $countsR[$word] : 0;
        }
        return $out;
    };
$totals = array_reduce($counts, $sumWordCounts, array());
var_export($totals);
// Output: array ( 'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, ) 
?>

Just like that we've implemented a multi-line word count using PHP's built-in array_map and array_reduce functions and our very own transformer and combiner lambdas. The map step transforms each line from a string into an array of word counts for that line. The reduce step combines those arrays of word counts into a single total count for all lines.

Now that we understand map's use of a transform function, and reduce's use of combine function, let's implement map and reduce on our own.

So, you're ready to implement map/reduce in PHP?

Map's implementation is so simple, it barely needs an explanation. All we're doing is calling the transform function on every input element and storing each result in an array to be returned. Here it is:

<?php
/**
 * $transformer lambda(caterpillar) -> butterfly
 * $in array of caterpillars
 */
function map($transformer, $in) {
    $out = array();
    foreach($in as $item) {
        $out[] = $transformer($item);
    }
    return $out;
}
?>

Not too bad, eh? Somewhere John Mccarthy's just stooled himself, I know. Hang tight, John, we'll get to recursion! Reduce is a little trickier because of edge cases where arrays with no or only 1 element are provided. Don't get caught in those details, just focus on the snippet where we've got more than 1 element to reduce:

<?php
/**
 * $combiner lambda(butterfly, butterfly) -> butterfly
 * $in array of butterflies
 */
function reduce($combiner, $in, $identity) {
    if(count($in) <= 1) {
        $out = $identity;
    } else if(count($in) > 1) {
        $out = array_shift($in);
        do {
            $next = array_shift($in);
            $out = $combiner($out, $next);
        } while(!empty($in));
    }
    return $out;
}
$totals = reduce($sumWordCounts, map($lineToWordCounts, $lines), array());
var_export($totals);
// Output: array ( 'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, ) 
?>

With reduce we begin by shifting the first butterfly off the list. We combine it with the next butterfly to make a super butterfly. We then combine the super butterfly with the next butterfly on the list, and so on until we have the ultimate single butterfly. This is a little trippy at first so refer back to the diagram for a visual explanation.

Map/Reduce, Take 2: Recursive Implementations, Please

Remember those crazy trips to lambdaland full of recursion all the cool kids on Reddit and HN are taking? Well, your time has come, too. Our initial implementations of map and reduce were done imperatively with loops. The functional folks believe loops are, well, garbage. (Don't worry, they're not.) Why do you need loops when you have recursive functions? How would map and reduce look if we got rid of the loops?

Before we get there, let's implement a few helper functions to make our recursive implementations beautiful. Here they are:

<?php
// First element of an array
function first($in) { 
    return empty($in) ? null : $in[0];
}

// Everything after the first element of an array
function rest($in) {
    $out = $in;
    if(!empty($out)) { array_shift($out); }
    return $out;
}

// Take an element and an array
//  and fuse them together so that the element
//  is at the front of the array
function construct($first, $rest) {
    array_unshift($rest, $first);
    return $rest;
}
?>

Now that we've got those helpers out of the way, allowing us to work with arrays like they're lists, let's do recursive map and reduce functions in PHP.

<?php
/**
 * $transformer lambda(caterpillar) -> butterfly
 * $in array of caterpillars
 */
function map($transformer, $in) {
    return !empty($in) ?    construct(  $transformer(first($in)), 
                                        map($transformer,rest($in)))
                        :    array();
};

/**
 * $combiner lambda(butterfly, butterfly) -> butterfly
 * $in array of butterflies
 */
function reduce($combiner, $in, $identity) {
    return !empty($in) ?    $combiner(first($in),
                                      reduce($combiner, rest($in), $identity))
                         :  $identity;
};

$totals = reduce($sumWordCounts, map($lineToWordCounts, $lines), array());
var_export($totals);
// Output: array ( 'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4, ) 
?>

If you're more comfortable with for loops than recursion this is going to look trippy. Chew on it a little longer.

Some things that may help you reason about the recursive version of map:

  1. If there is at least one caterpillar, it will be transformed. map then calls itself with any remaining caterpillars.
  2. Once there are no caterpillars left, this is our base case: an empty array is returned.
  3. Once the empty array is returned, the previous map will construct a new array with its butterfly and return the array.
  4. The construction of the butterflies array happens one-by-one until every butterfly is in it, and the top-level map returns.

How did that feel for you? Beautifully functional? No loops, yet, we're iterating through a list!

I'll leave the interpretation of reduce for you to mull over. We've now written higher-order functions by implementing map and reduce, imperatively and functionally, in PHP 5.3 with anonymous functions.

So, That Was Fun. What's Next?

We've deciphered lambdas, anonymous functions, and closures, now we've seen higher order functions with callbacks by implementing map/reduce, so what's next? Follow the RSS feed for the upcoming articles in this series on functional PHP 5.3.

You should follow me on Twitter to get updates as this series continues to evolve.

Comments

Comments are moderated. It may take some time for your comment to appear.

Recess can only be as good as the thoughts that go into it. Let us hear yours...

  1. Posted by devsmt on Sep 6, 2009
    extremely informative and clear, tnks a lot for sharing this!
  2. Posted by Mark on Sep 9, 2009
    The word count example would be easier if you weren't counting words that are also numbers! (one, two, three, etc.) The example would be clearer if you counted apple, banana, etc. instead.

You must login to comment on this page.