Extract matching sequence of tokens #41

@andreasbaumann

Sometimes you want to extract the original text of a matching sequence, for instance:

WORD ^1       : /([\p{L}\d]+)/;
SIGN              : /[.,!?:;]/;

Rule = sequence_imm( c[] = any( WORD | SIGN ) {3, 8} )

(the rule syntax is a complete invention of mine :-) ).

The matching text is:

Uhren, Porzellan GmbH, St. Gallen

I would like to extract the matched text as it appears in the source, including the dots and commas.
So far I am only able to extract the individual tokens:

XX [XXX] : 1 WORD Uhren
XX [XXX] : 2 SIGN ,
XX [XXX] : 1 WORD Porzellan
XX [XXX] : 1 WORD GmbH
XX [XXX] : 2 SIGN ,
XX [XXX] : 1 WORD St
XX [XXX] : 2 SIGN .
XX [XXX] : 1 WORD Gallen

So, it's not easy to reconstruct the text from the c[] array of tokens.
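
One possible way to get the original text back, sketched here outside the rule language and only as an illustration: if each token keeps its start and end offset in the source, the matched text is the slice from the first token's start to the last token's end. The names `tokenize` and `matched_text` are hypothetical, and the WORD pattern is an approximation, since Python's `re` has no `\p{L}`.

```python
import re

# Approximation of the WORD and SIGN patterns from the example above.
# Assumption: [^\W_] (any Unicode letter or digit) stands in for [\p{L}\d].
TOKEN_RE = re.compile(r"(?P<WORD>[^\W_]+)|(?P<SIGN>[.,!?:;])")


def tokenize(text):
    """Return (type, value, start, end) tuples with character offsets."""
    return [(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_RE.finditer(text)]


def matched_text(text, tokens):
    """Slice the source from the first token's start to the last token's end."""
    if not tokens:
        return ""
    return text[tokens[0][2]:tokens[-1][3]]


source = "Uhren, Porzellan GmbH, St. Gallen"
tokens = tokenize(source)
print(matched_text(source, tokens))
```

Because the slice is taken from the source string, whitespace and punctuation between the tokens are preserved without having to guess how the tokens were separated.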
