Sometimes you want to extract the original text of a matching sequence, for instance:
WORD ^1 : /([\p{L}\d]+)/;
SIGN : /[.,!?:;]/;
Rule = sequence_imm( c[] = any( WORD | SIGN ) {3, 8} );
(The rule syntax is entirely my own invention :-) )
For example, the matched text is:
Uhren, Porzellan GmbH, St. Gallen
I would like to extract this text exactly as it appears, including the dots and commas. So far I can only extract
the individual tokens:
XX [XXX] : 1 WORD Uhren
XX [XXX] : 2 SIGN ,
XX [XXX] : 1 WORD Porzellan
XX [XXX] : 1 WORD GmbH
XX [XXX] : 2 SIGN ,
XX [XXX] : 1 WORD St
XX [XXX] : 2 SIGN .
XX [XXX] : 1 WORD Gallen
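So it is not easy to reconstruct the original text from the c[] token array: the tokens do not record the whitespace between them. For example, "St" and "." are adjacent in the source, while "," and "Porzellan" are separated by a space, but the token list looks the same in both cases.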
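One common way around this, assuming the tokenizer can report each token's character offsets in the input (it is not clear from this issue whether the parser in question does), is to slice the source text from the first token's start to the last token's end instead of joining token values. The sketch below uses Python's `re` as a stand-in tokenizer; `TOKEN_RE`, `tokenize`, `original_text`, and `naive_join` are hypothetical names for illustration only, not part of any library's API. Because `re` has no `\p{L}`, WORD is approximated with `[^\W_]+` (Unicode word characters minus underscore).

```python
import re

# Hypothetical stand-in for the issue's token rules (not the parser's real API):
#   WORD ~ [\p{L}\d]+  -> approximated as [^\W_]+ because Python's re lacks \p{L}
#   SIGN ~ [.,!?:;]
TOKEN_RE = re.compile(r"(?P<WORD>[^\W_]+)|(?P<SIGN>[.,!?:;])")


def tokenize(text):
    """Return (kind, value, start, end) tuples; unmatched characters such as spaces are skipped."""
    return [(m.lastgroup, m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def original_text(text, tokens):
    """Recover the exact source text covered by a run of tokens using their offsets."""
    if not tokens:
        return ""
    return text[tokens[0][2]:tokens[-1][3]]


def naive_join(tokens):
    """Join token values with spaces; this loses the original spacing."""
    return " ".join(value for _, value, _, _ in tokens)


text = "Uhren, Porzellan GmbH, St. Gallen"
tokens = tokenize(text)
print(naive_join(tokens))           # Uhren , Porzellan GmbH , St . Gallen
print(original_text(text, tokens))  # Uhren, Porzellan GmbH, St. Gallen
```

Slicing by offsets keeps whatever lies between the tokens (spaces or nothing at all), so the result matches the input exactly without having to guess where whitespace belongs. The same function works for any sub-run, e.g. `original_text(text, tokens[2:5])` yields `Porzellan GmbH,`.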