Parsing Made Easyish

A Guide to Parsing with Marpa

http://cleverdomain.org/opw-marpa/

About Me

Parsing Quick Intro

Perl Parsing Tools

Story Time

Lexerless Parsing

Enter Marpa

My Favorite Feature of Marpa

Standard Input Model: Earleme-per-Token

while ($input_remaining) {
  my $token = $lexer->next_token($input);
  unless ($parser->read($token)) {
    die "Parse error!";
  }
}
my $value_ref = $parser->value;
unless ($value_ref) {
  die "Parse error!";
}
return $$value_ref;

Earleme-per-Character Model

while ($pos < length $input) {
  my $expected = $parser->terminals_expected;
  for my $token_name (@$expected) {
    if (my $token = $lexer->match($token_name, $input, $pos)) {
      $parser->alternative($token);
    }
  }
  try { $parser->earleme_complete } catch { die "Parse error!" };
  $pos++;
}
my $value_ref = $parser->value;
unless ($value_ref) {
  die "Parse error!";
}
return $$value_ref;

So there’s still a lexer?

Code Sample: MarpaX::Lex::Easy

TOKEN: for my $token_name (@expected) {
  my $token = $tokens->{$token_name};
  die "Unknown token $token_name" unless defined $token;
  next if $token eq 'passthrough';
  my $rule = $token->[0];
  pos($input) = $pos;
  next TOKEN unless $input =~ $rule;
  my $matched_len = $+[0] - $-[0];
  my $matched_value = undef;

  if (defined( my $val = $token->[1] )) {
    if (ref $val eq 'CODE') {
      eval { $matched_value = $val->(); 1 } || do { next TOKEN };
    } else {
      $matched_value = $val;
    }
  } elsif ($#- > 0) { # Captured a value
    $matched_value = $1;
  }

  push @matches, [ $token_name, \$matched_value, $matched_len + $whitespace_consumed ];
}
return @matches;
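
The position-anchored matching used above can be tried in isolation. What follows is a minimal sketch in plain Perl, not the module's actual code: `match_at` and the two-entry `%demo_tokens` table are hypothetical names made up for illustration. It shows how setting `pos()` and anchoring with `\G` forces a match to start at exactly one offset, and how `@-` and `@+` give the match length.

```perl
use strict;
use warnings;

# Hypothetical token table for illustration; each pattern is anchored with \G.
my %demo_tokens = (
    NUMBER => qr/\G([0-9]+)/,
    PLUS   => qr/\G[+]/,
);

# Try to match one named token at exactly $pos in $input.
# Returns (captured value or undef, match length), or an empty list on failure.
sub match_at {
    my ($token_name, $input, $pos) = @_;
    my $rule = $demo_tokens{$token_name} or die "Unknown token $token_name";
    pos($input) = $pos;                    # \G will anchor at this offset
    return unless $input =~ $rule;
    my $len   = $+[0] - $-[0];             # end offset minus start offset
    my $value = ($#- > 0) ? $1 : undef;    # $#- > 0 means a group captured
    return ($value, $len);
}

my @number = match_at('NUMBER', '12+345', 3);   # matches "345" at offset 3
my @plus   = match_at('PLUS',   '12+345', 2);   # matches "+" at offset 2
my @miss   = match_at('NUMBER', '12+345', 2);   # "+" is not a number

print "NUMBER at 3: value=$number[0] length=$number[1]\n";
print "PLUS at 2: length=$plus[1]\n";
print "NUMBER at 2: ", (@miss ? "matched" : "no match"), "\n";
```

Because only the tokens the parser currently expects are tried at each position, a table like this never needs to resolve ambiguity on its own; the grammar decides which matches are usable.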

Math Parser: Grammar

my $grammar = Marpa::R2::Grammar->new({
  actions => 'Math::Parser::Actions',
  start => 'Expression',
  rules => q{
    Expression ::= NUMBER                           action => number
                |  (LPAREN) Expression (RPAREN)     action => parens assoc => group
                || Expression (ASTERISK) Expression action => multiply
                |  Expression (SLASH) Expression    action => divide
                || Expression (PLUS) Expression     action => add
                |  Expression (MINUS) Expression    action => subtract
  },
});
$grammar->precompute;

Math Parser: Tokens

my %tokens = (
  'LPAREN'   => [ qr/\G[(]/ ],
  'RPAREN'   => [ qr/\G[)]/ ],
  'ASTERISK' => [ qr/\G[*]/ ],
  'SLASH'    => [ qr#\G[/]# ],
  'PLUS'     => [ qr/\G[+]/ ],
  'MINUS'    => [ qr/\G[-]/ ],
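  # Digits with an optional fraction, or a bare fraction; rejects "." and "1.2.3"
  'NUMBER'   => [ qr/\G([-+]?(?:[0-9]+(?:[.][0-9]*)?|[.][0-9]+)(?:e[+-]?[0-9]+)?)/, sub { 0 + $1 } ],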
);

Math Parser: Parse Method

method parse_line ($line) {
  my $rec = Marpa::R2::Recognizer->new({
      grammar => $grammar,
      ranking_method => 'rule',
  });

  my $lex = MarpaX::Lex::Easy->new(
    tokens => \%tokens,
    recognizer => $rec,
    automatic_whitespace => 1,
    whitespace_pattern => qr/\G\s+/,
  );

  return $lex->read_and_parse($line);
}

Math Parser: Actions

package Math::Parser::Actions;

sub parens   { $_[1] }
sub multiply { $_[1] * $_[2] }
sub divide   { $_[1] / $_[2] }
sub add      { $_[1] + $_[2] }
sub subtract { $_[1] - $_[2] }
sub number   { $_[1] }
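
To see how these actions compose, they can be called by hand in the order an evaluator would call them. This is a sketch under one assumption: that each action receives a per-parse variable as `$_[0]`, followed by the values of the visible (non-parenthesized) child symbols, which is what the `$_[1]` and `$_[2]` indexing above suggests. The package name `Demo::Actions` and the empty hash standing in for the per-parse variable are placeholders for illustration.

```perl
use strict;
use warnings;

package Demo::Actions;   # stand-in copy of some of the action subs shown above

sub parens   { $_[1] }
sub multiply { $_[1] * $_[2] }
sub add      { $_[1] + $_[2] }
sub number   { $_[1] }

package main;

my $ppv = {};   # placeholder per-parse variable, unused by these actions

# Evaluate "2 * (3 + 4)" bottom-up, mirroring the rule each node came from.
my $two    = Demo::Actions::number($ppv, 2);
my $three  = Demo::Actions::number($ppv, 3);
my $four   = Demo::Actions::number($ppv, 4);
my $sum    = Demo::Actions::add($ppv, $three, $four);
my $group  = Demo::Actions::parens($ppv, $sum);   # LPAREN/RPAREN are hidden
my $result = Demo::Actions::multiply($ppv, $two, $group);

print "2 * (3 + 4) = $result\n";
```

Hiding the operator and parenthesis tokens keeps each action down to one expression over its operands.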

Bonus Technique: Two Parsers

Two Parsers Cont.

TAP: Line Parser

method parse_line ($line) {
  my $rec = Marpa::R2::Recognizer->new(
      { grammar => $self->line_grammar, ranking_method => 'rule' });

  for my $pos (0 .. length($line) - 1) {
    my $expected_tokens = $rec->terminals_expected;
    if (@$expected_tokens) {
      my @matching_tokens = $self->lex(\$line, $pos, $expected_tokens);
      $rec->alternative( @$_ ) for @matching_tokens;
    }
    my $ok = eval { $rec->earleme_complete; 1 };
    if (!$ok) {
      return [ 'Junk_Line', $line ];
    }
  }
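  $rec->end_input;
  # value returns undef when no parse was found; treat that as junk too
  my $value_ref = $rec->value;
  return [ 'Junk_Line', $line ] unless defined $value_ref;
  return ${$value_ref};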
}

TAP: Stream Parser

method parse {
  my $rec = Marpa::R2::Recognizer->new(
      { grammar => $self->stream_grammar, ranking_method => 'rule' });
  my $reader = $self->reader;

  while (defined( my $line = $reader->() )) {
    my $line_token = $self->parse_line($line);
    unless (defined $rec->read(@$line_token)) {
      my $expected = $rec->terminals_expected;
      die "Parse error, expecting [@$expected], got $line_token->[0]";
    }
  }
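  $rec->read('EOF');
  # value returns undef when the stream did not form a complete parse
  my $value_ref = $rec->value;
  die "Parse error, no complete parse at end of input" unless defined $value_ref;
  return ${$value_ref};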
}

The Future

Watch This Space

Marpa Community

Thanks!