[Perl] Safe-navigation Monad.

Author: gugod. Published: 2023/06/18, updated: 2023/06/20 #perl #monad

Recently on reddit.com/r/perl, pmz posted a question like this:

XML::Twig, get value (or not) without dying

Currently I do:
if (defined $elt->first_child('addr')->first_child('postalCode')) { $patient{patient_postal_code} = $elt->first_child('addr')->first_child('postalCode')->text ; }
because if I don't check for "defined" and the resulting value is null , it dies.

Link to the original post: https://www.reddit.com/r/perl/comments/1492sc1/xmltwig_get_value_or_not_without_dying/

On one hand, the question is about how to use XML::Twig. On the other hand, the obvious inconvenience is that when first_child('addr') returns undef, meaning there is no <addr> element underneath, the subsequent call to first_child('postalCode') makes the program die. Generally speaking: in a chain of calls we expect an object at every position, but sometimes there is an undef instead. Given that, is there a way to keep the program from dying and have the entire call chain return undef as soon as an undef is encountered?

To state the question more generically: assume a class with instance methods a(), b(), and c(). Each of these methods may return an instance of the same class or, occasionally, undef. Consider the following chain of calls starting from $o:

$res = $o->a()->b()->c();

In case any of a(), b(), or c() returns undef, the program dies with messages like this:

Can't call method "c" on an undefined value

This message tells us that b() returned undef, and since undef is not an object, we cannot call methods on it.
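The failure is easy to reproduce. The following is a minimal sketch: the Node class and its constructor arguments are hypothetical, made up for this illustration, and arranged so that b() returns undef. The eval is there only to capture the message for printing.

```perl
use v5.36;

# Hypothetical class for illustration: each of a(), b(), c() returns
# whatever was stored under that name, which may be undef.
package Node {
    sub new ($class, %next) { bless { %next }, $class }
    sub a ($self) { $self->{a} }
    sub b ($self) { $self->{b} }
    sub c ($self) { $self->{c} }
}

# a() returns a Node, but that Node has nothing stored for b().
my $o = Node->new( a => Node->new() );

my $res   = eval { $o->a()->b()->c() };
my $error = $@;

# The message starts with: Can't call method "c" on an undefined value
print "died: $error";
```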

Now, could we rewrite the same program so that this error never happens, with $res being undef if any of a(), b(), or c() returns undef, and otherwise the return value of c()?

In some other programming languages, this is what the safe-navigation operator is for, such as ?. in JavaScript or Kotlin:

res = o.a()?.b()?.c();

Or in Raku, using .? (which calls a method only when the invocant has one by that name):

$res = $o.a().?b().?c();

However, as of perl 5.38, Perl has nothing similar.

A rather intuitive way to rewrite would be something like this:

$res_a = $o->a();
$res_b = $res_a && $res_a->b();
$res   = $res_b && $res_b->c();

However, besides making the program much longer and harder to grasp, the rewrite is not generic: it has to be redone for every similar statement with different method names. Not a good strategy.

Meanwhile, here's a super simple and generic way:

$res = eval { $o->a()->b()->c() };

However, eval is a blunt instrument: it traps every exception, while we are only interested in ignoring undefined values. That is a lot more than what we want. Simple as it looks, it is probably not the right tool.
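To make the problem with eval concrete, here is a small sketch. The Flaky class and its error message are hypothetical: b() fails for a reason that has nothing to do with undef, yet the caller sees the same result as if b() had returned undef.

```perl
use v5.36;

# Hypothetical class for illustration: a() works, b() throws an
# exception unrelated to undef, c() is never reached.
package Flaky {
    sub new ($class) { bless {}, $class }
    sub a ($self) { $self }
    sub b ($self) { die "disk is on fire\n" }
    sub c ($self) { "value" }
}

my $o   = Flaky->new;
my $res = eval { $o->a()->b()->c() };
my $err = $@;

# $res is undef, indistinguishable from "b() returned undef". The real
# failure is visible only to code that inspects $@ afterwards.
say defined $res ? "got: $res" : "got: undef";
print "hidden error: $err";
```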

Here is a solution using the Monad design pattern.

The rewritten version looks like this:

$res = SafeNav->wrap($o) ->a()->b()->c() ->unwrap();

SafeNav is defined as follows.

use v5.36;
package SafeNav {
    sub wrap ($class, $o) { bless \$o, $class }
    sub unwrap ($self)    { $$self            }

    sub AUTOLOAD {
        our $AUTOLOAD;
        my $method = substr $AUTOLOAD, 2 + rindex($AUTOLOAD, '::');

        my ($self, @args) = @_;

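        # [a] The method is called in scalar context so that a method
        # ending in a bare "return;" yields undef rather than an empty
        # list, which wrap() would reject as a missing argument.
        (defined $$self) ?
            __PACKAGE__->wrap( scalar( $$self -> $method(@args) ) ) :   # [a.1]
            $self;                                                      # [a.2]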
    }

    sub DESTROY {}
};

SafeNav is a class that wraps any scalar value and uses AUTOLOAD to respond to every method call other than wrap() and unwrap().

The core of the logic sits inside AUTOLOAD, at [a]: if we are not wrapping an undef value, we call the original method on it, then re-wrap the return value ([a.1]). If we are wrapping an undef, we ignore the method call, lie low, and keep being ourselves ([a.2]).

Thanks to the mechanism of AUTOLOAD, the original form of ->a()->b()->c() is kept exactly the same after the rewrite. Let's put both versions side-by-side for a quick comparison:

$res = $o ->a()->b()->c();
$res = SafeNav->wrap($o) ->a()->b()->c() ->unwrap();

The wrap() at the front, together with unwrap() at the back, forms a clear boundary within which SafeNav is in effect. Method calls after unwrap() are not guarded by SafeNav.

With that, we ignore undef values and nothing more. If any other kind of exception is thrown from a(), b(), or c(), the program correctly aborts. Of the three rewrites proposed in this article, the SafeNav monad is the one that is generic, adds little verbosity to the original program, and suppresses nothing but undef.
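These claims can be checked with a short script. This is a sketch under stated assumptions: SafeNav is repeated in condensed form so the block runs on its own, and the Node class with its boom() method is hypothetical, invented only to exercise the three cases (complete chain, broken chain, real exception).

```perl
use v5.36;

# SafeNav as described above, condensed so that this block runs on its own.
package SafeNav {
    sub wrap ($class, $o) { bless \$o, $class }
    sub unwrap ($self)    { $$self }
    sub AUTOLOAD {
        our $AUTOLOAD;
        my $method = substr $AUTOLOAD, 2 + rindex($AUTOLOAD, '::');
        my ($self, @args) = @_;
        # scalar() guards against methods that return an empty list.
        defined $$self
            ? __PACKAGE__->wrap( scalar( $$self->$method(@args) ) )
            : $self;
    }
    sub DESTROY {}
}

# Hypothetical class for illustration only.
package Node {
    sub new ($class, %next) { bless { %next }, $class }
    sub a ($self) { $self->{a} }
    sub b ($self) { $self->{b} }
    sub c ($self) { $self->{c} }
    sub boom ($self) { die "boom\n" }
}

# Complete chain: a() and b() return Nodes, c() returns a string.
my $full = Node->new( a => Node->new( b => Node->new( c => "12345" ) ) );
my $got  = SafeNav->wrap($full)->a()->b()->c()->unwrap();

# Broken chain: a() returns a Node for which b() returns undef.
my $short   = Node->new( a => Node->new() );
my $missing = SafeNav->wrap($short)->a()->b()->c()->unwrap();

# A real exception is not swallowed; eval is used here only to observe it.
my $ok  = eval { SafeNav->wrap($full)->a()->boom()->c()->unwrap(); 1 };
my $err = $@;

say "full chain:   $got";
say "broken chain: ", defined $missing ? $missing : "undef";
print "exception:    ", ($ok ? "none\n" : $err);
```

The complete chain yields the value of c(), the broken chain yields undef without dying, and the exception from boom() still reaches the caller.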