Skip to content

Backreference language sampling #24

@RunDevelopment

Description

@RunDevelopment

The JS parser is able to resolve backreferences where (among other conditions) the associated capturing group accepts a small finite language.

E.g. /(a|b)\1/ will be parsed as /aa|bb/

This is quite useful but it cannot handle capturing groups that accept infinitely many strings.

E.g. /(=+)a\1/ cannot be parsed.

However, in some use cases (such as static analysis), it might be enough to replace the backreferences with a sample of the language of the capturing group. The parse result will only approximate the input RegExp but this may be good enough for some use cases.

E.g. /(=+)a\1/ might get parsed as /=a=|==a==|===a===|====a====/ depending on the sampling. This isn't correct but it might be useful.

The sampling algorithm has to be provided by the user. The solution will always be imperfect and refa can't know which tradeoffs are acceptable.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions