NAME

Tstregex - A Hybrid Regex Diagnostic Tool (single-file library module and command-line tool)

SYNOPSIS

Shows the longest regular expression match and highlights the rejected part of the string.

Example:

  $ perl lib/Tstregex.pm '/^[a-z]*\d{3}$/' 'abc123' 'abc12a'
  abc123
  abc12a (^[a-z]*\d{3}$)

In a terminal, the normal part of each output string is the longest matching substring, while the bold part highlights the rejected substring.

USAGE

  $ tstregex 'regex' string1 string2 ... stringN

OPTIONS (CLI)

-h --help

Shows this help.

-v --verbose

Shows key information about the matching and non-matching parts.

-d --diag

Triggers the Enriched Diagnostic View, which displays:

  - the string, with the failing part highlighted;
  - the exact regex token that caused the break;
  - a visual pointer (^--- HERE) aligned with the regex syntax;
  - the execution time (useful for spotting ReDoS and exponential backtracking).

-a --assert

Runs a large self-test suite that exercises Tstregex against a broad collection of regular expressions.
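The execution time reported by -d --diag is the kind of measurement that helps spot catastrophic backtracking. As a hedged illustration only, a single match attempt can be timed in plain Perl with the core Time::HiRes module. timed_match is a hypothetical helper, not part of Tstregex, and the -d option may measure time differently.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # high-resolution replacement for the builtin time()

# Hypothetical helper (not part of Tstregex): time one match attempt.
# Returns the match result (1 or 0) and the elapsed wall-clock seconds.
sub timed_match {
    my ($re, $string) = @_;
    my $t0      = time;
    my $matched = ($string =~ $re) ? 1 : 0;
    my $elapsed = time - $t0;
    return ($matched, $elapsed);
}

my ($ok, $secs) = timed_match(qr/^[a-z]*\d{3}$/, 'abc12a');
printf "matched=%d elapsed=%.6fs\n", $ok, $secs;
```

An unusually large elapsed time for a short input is the signal worth investigating; a single measurement is noisy, so repeating the attempt gives a more reliable figure.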

PERL MODULE SYNOPSIS

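  use strict;
  use warnings;
  use Tstregex;

  # Build a diagnostic context from a delimited regex, then test a string
  my $ctx = tstregex_init_desc('/^\d{3}/');
  tstregex($ctx, '12a');    # results are stored in $ctx

  if (!tstregex_is_full_match($ctx))
      {
      my $token = tstregex_get_fail_token($ctx);   # regex token where matching stopped
      my $pos   = tstregex_get_match_len($ctx);    # length of the longest partial match
      print "Failure on token '$token' at column $pos\n";
      }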

API

tstregex_init_desc($raw_re)

Pre-parses the regex: handles delimiters (m!!, //, etc.), extracts modifiers (i, s, m, x), and prepares the nibbling steps. Returns a reference to the context hash used by the other functions.

tstregex($ctx, $string)

Runs the diagnostic of $string against the regex held in $ctx and stores the results in $ctx, where the accessor functions below read them.

tstregex_is_full_match($ctx)

Returns 1 if the input string fully matches the regex, 0 otherwise.

tstregex_get_match_portion($ctx)

Returns the matched portion of the input string in case of a full match. It may be shorter than the input string when the regex is not anchored (for example, without a trailing $).

tstregex_get_match_len($ctx)

Returns the length of the longest matching substring, i.e. the column at which matching stopped.

tstregex_get_fail_token($ctx)

Returns the regex token at which matching failed.

tstregex_get_re_clean($ctx)

Returns the longest subpart of the regexp that matched.

tstregex_get_re_raw($ctx)

Returns the internal representation of the regexp.

tstregex_get_prefix_offset($ctx)

Returns the offset of the original regexp within the raw (internal) regexp.
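The delimiter and modifier handling described for tstregex_init_desc can be approximated as follows. This is a minimal sketch under stated assumptions, not the module's code: parse_delimited_re is a hypothetical helper that only recognizes an optional m followed by a punctuation delimiter (bracketing delimiters such as {} close on their partner), accepts only the i, s, m and x modifiers, and re-applies them inline with (?mods:...).

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the Tstregex implementation): split a delimited
# regex such as '/^\d{3}$/i', 'm!a/b!x' or 'm{\d+}' into body and modifiers.
sub parse_delimited_re {
    my ($raw) = @_;
    my %close = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');

    # Optional leading 'm', then a delimiter that is neither a word char nor a space
    if ($raw =~ /\A m? ([^\w\s]) (.*) \z/xs) {
        my ($open, $rest) = ($1, $2);
        my $end = $close{$open} // $open;

        # The last closing delimiter ends the body; what follows must be modifiers
        my $idx = rindex($rest, $end);
        if ($idx >= 0) {
            my $body = substr($rest, 0, $idx);
            my $mods = substr($rest, $idx + 1);
            return ($body, $mods) if $mods =~ /\A[ismx]*\z/;
        }
    }
    return ($raw, '');    # not delimited: use the whole string as the pattern
}

my ($body, $mods) = parse_delimited_re('/^[a-z]*\d{3}$/i');
# Re-apply the modifiers inline; with no modifiers this is just (?:body)
my $src = '(?' . $mods . ':' . $body . ')';
my $re  = qr/$src/;
print "body=$body mods=$mods\n";
```

Wrapping the body in (?mods:...) keeps the flags attached to the compiled pattern without a separate flags argument.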

DESCRIPTION

tstregex is designed to solve the "black box" problem of regular expressions. When a complex regex fails, Perl only tells you that there was no match. This tool identifies where and why it failed: it finds the longest possible partial match and the regex token that broke it.

EXAMPLE

  $ perl lib/Tstregex.pm '/^[a-z]*\d{3}$/' 'abc123' 'abc12a'
  abc123
  abc12a (^[a-z]*\d{3}$)

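The tool highlights the part of the string where the match failed. In a terminal, the rejected part of the string (here 12a) is shown in bold, as is the regex subpart that did match (here ^[a-z]*).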

The "Nibbling" Engine

The diagnostic logic uses a "nibbling" strategy:

1. Decomposition

The engine breaks down your regex into a hierarchy of valid sub-patterns (lexical groups, atoms, and quantifiers) from longest to shortest.
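One way to picture this decomposition, assuming the sub-patterns are prefixes of a token list: split the pattern into atoms carrying their quantifiers, then list the prefixes from longest to shortest. tokenize_re is a hypothetical, deliberately simplified tokenizer (escapes, [...] classes, balanced (...) groups, single characters). It is not the Tstregex parser and does not handle every construct, for example a ( or ) inside a character class within a group.

```perl
use strict;
use warnings;

# Hypothetical, simplified tokenizer (NOT the Tstregex parser): splits a pattern
# into atoms, each carrying its trailing quantifier. It recognizes escapes,
# [...] classes, balanced (...) groups kept as one token, and single characters.
sub tokenize_re {
    my ($re) = @_;
    my ($i, $n) = (0, length $re);
    my @tokens;
    while ($i < $n) {
        my $start = $i;
        my $c = substr($re, $i, 1);
        if ($c eq '\\') {                      # escape: backslash plus next char
            $i += 2;
        }
        elsif ($c eq '[') {                    # character class up to its closing ]
            $i++;
            $i++ if $i < $n && substr($re, $i, 1) eq '^';
            $i++ if $i < $n && substr($re, $i, 1) eq ']';    # leading ] is literal
            while ($i < $n && substr($re, $i, 1) ne ']') {
                $i += substr($re, $i, 1) eq '\\' ? 2 : 1;
            }
            $i++;                              # consume the closing ]
        }
        elsif ($c eq '(') {                    # group: track nesting depth
            my $depth = 0;
            while ($i < $n) {
                my $d = substr($re, $i, 1);
                if    ($d eq '\\') { $i += 2; next }
                elsif ($d eq '(')  { $depth++ }
                elsif ($d eq ')')  { $depth--; if ($depth == 0) { $i++; last } }
                $i++;
            }
        }
        else {                                 # anchor, dot, or literal character
            $i++;
        }
        $i = $n if $i > $n;                    # tolerate a truncated pattern
        # Attach a quantifier (*, +, ?, {n}, {n,}, {n,m}) and an optional ? or + suffix
        if (substr($re, $i) =~ /\A(?:[*+?]|\{\d+(?:,\d*)?\})[?+]?/) {
            $i += $+[0];
        }
        push @tokens, substr($re, $start, $i - $start);
    }
    return @tokens;
}

my @tokens = tokenize_re('^[a-z]*\d{3}$');
# Candidate sub-patterns, longest first: every prefix of the token list
my @prefixes = map { join '', @tokens[0 .. $_] } reverse 0 .. $#tokens;
print "$_\n" for @prefixes;
```

Keeping a whole group as one token means no prefix ever cuts a group in half, so every candidate is itself a valid pattern.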

2. Iterative Testing

It iteratively tests these sub-patterns against the input string. It does not merely check whether the start of the string matches; it determines the longest sequence of regex instructions the engine can follow before hitting a wall.

3. Failure Point Identification

Once the longest matching sub-pattern is found, the tool identifies the very next token in your regex syntax. This is your "Point of Failure".
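The iterative testing and the failure-point identification can be sketched together under the same prefix assumption: try the prefixes from longest to shortest, anchored at the start of the string with \A, keep the first one that matches, and report the next token as the point of failure. nibble is a hypothetical helper, not the Tstregex engine, and the token list is written out by hand to keep the example self-contained.

```perl
use strict;
use warnings;

# Minimal sketch of the nibbling idea (an assumption, NOT the Tstregex code).
# Returns: length of the partial match, the matching sub-pattern, and the
# failing token (undef when the whole regex matched).
sub nibble {
    my ($string, @tokens) = @_;
    for my $k (reverse 0 .. $#tokens) {
        my $candidate = join '', @tokens[0 .. $k];
        # \A pins the attempt to the start of the string; skip any prefix
        # that does not compile on its own.
        my $re = eval { qr/\A(?:$candidate)/ } or next;
        if ($string =~ $re) {
            my $fail = $k < $#tokens ? $tokens[$k + 1] : undef;
            return ($+[0], $candidate, $fail);    # $+[0] = end offset of the match
        }
    }
    return (0, '', $tokens[0]);                   # not even the first token matched
}

my @tokens = ('^', '[a-z]*', '\d{3}', '$');
my ($len, $matched_re, $fail) = nibble('abc12a', @tokens);
printf "matched %d chars with '%s'; fails on '%s'\n",
    $len, $matched_re, $fail // '(none)';
```

For 'abc12a' the sketch reports a 3-character match with ^[a-z]* and the failing token \d{3}.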

AUTHOR

Olivier Delouya - 2026

LICENSE

Artistic License 2.0
