Regex Quiz #1
Let’s bring in the color tags!
Time to answer Regex Quiz #1. The regex was:
\[(#[a-fA-F0-9]{3,6}|[a-zA-Z]+)\](.+)\[\/\1\]
What can it be used for? This is what we’ll explain in this post.
Dissecting the Regex
The quiz came with a clue: you need a matcher to exploit it, so you will likely do more than just test whether the input matches.[1]
Let’s focus on the regex for a while.
Escaping Brackets
First, we notice the expression is quite unpleasant to read because of the many escapes, especially for square brackets: [ and ] are preceded by \, which means they don’t have their usual role in a regex (delimiting a character class).
Instead, they’re part of the text we’ll be looking for when testing the regex against an input.[2]
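To make the difference concrete, here is a minimal Java sketch comparing an escaped and an unescaped bracket pair. The class name EscapedBrackets and the toy pattern are only illustrative; they are not part of the quiz.

```java
import java.util.regex.Pattern;

public class EscapedBrackets {
    // Escaped brackets: matches the literal three-character text "[a]".
    static final Pattern LITERAL = Pattern.compile("\\[a\\]");
    // Unescaped brackets: a character class matching the single character "a".
    static final Pattern CHAR_CLASS = Pattern.compile("[a]");

    public static void main(String[] args) {
        System.out.println(LITERAL.matcher("[a]").matches());    // true
        System.out.println(LITERAL.matcher("a").matches());      // false
        System.out.println(CHAR_CLASS.matcher("a").matches());   // true
        System.out.println(CHAR_CLASS.matcher("[a]").matches()); // false
    }
}
```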
Capturing Groups
You can see there are two capturing groups:
- Group #1 is (#[a-fA-F0-9]{3,6}|[a-zA-Z]+) and will capture either:
  - # followed by a hexadecimal string of three to six characters;
  - or a string of letters.
- Group #2 is (.+) and will capture everything between the matched ] and [.
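As a quick check of what each group holds, the following sketch prints both groups for a hexadecimal tag and a named tag. The class CapturingGroups and the helper capture are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingGroups {
    static final Pattern TAG =
        Pattern.compile("\\[(#[a-fA-F0-9]{3,6}|[a-zA-Z]+)\\](.+)\\[\\/\\1\\]");

    // Returns {group 1, group 2}, or null when the input contains no match.
    static String[] capture(String input) {
        Matcher m = TAG.matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] inputs = { "[#ff9900]orange[/#ff9900]", "[red]alert[/red]" };
        for (String input : inputs) {
            String[] groups = capture(input);
            // Prints "#ff9900 -> orange", then "red -> alert".
            System.out.println(groups[0] + " -> " + groups[1]);
        }
    }
}
```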
Reuse of Captured Groups
There is one particularity that may not be obvious but is fundamental to understanding this regex: the backreference \1 ensures that the regex matches only if the text at that position is the same as what group #1 captured.
For instance, if group #1 captures cyan, then the regex matches only if the text in place of \1 is cyan, too.
This means the matched text starts with [cyan] and ends with [/cyan].
Yes, this is a tag system.
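Here is a small sketch of the backreference at work: the closing tag has to repeat exactly what the opening tag captured. BackReference and isTagged are illustrative names, and the comparison is case-sensitive because no flag is set on the pattern.

```java
import java.util.regex.Pattern;

public class BackReference {
    static final Pattern TAG =
        Pattern.compile("\\[(#[a-fA-F0-9]{3,6}|[a-zA-Z]+)\\](.+)\\[\\/\\1\\]");

    // True when the whole input is one opening tag, some content, and the same closing tag.
    static boolean isTagged(String input) {
        return TAG.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTagged("[cyan]sky[/cyan]")); // true
        System.out.println(isTagged("[cyan]sky[/red]"));  // false: closing tag differs
        System.out.println(isTagged("[cyan]sky[/Cyan]")); // false: \1 compares case-sensitively here
    }
}
```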
Wrapping Up
So, to sum up, we have:
- a tag system;
- the tag contains an HTML color, either in hexadecimal form or as a color name;
- some content.
Yes, that is a color pipe you can use to turn [red]This is important.[/red] into <span style="color: red">This is important.</span>.
A quick snippet to do it would be:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColorTag {
    public static void main(String[] args) {
        final String regex = "\\[(#[a-fA-F0-9]{3,6}|[a-zA-Z]+)\\](.+)\\[\\/\\1\\]";
        final String input = "[red]This is important.[/red]";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        if (matcher.find()) {
            // group(1) is the color, group(2) is the tagged content
            String output = String.format(
                "<span style=\"color: %s\">%s</span>",
                matcher.group(1),
                matcher.group(2)
            );
            System.out.println(output);
        } else {
            // No tag found: print the input unchanged
            System.out.println(input);
        }
    }
}
The Story Behind
Well, as you can guess, we needed to let users add color to their text in an editor that only handled raw text. Going with phpBB-like tags seemed an easy enough route, and a regex was a quick way to parse them.
It is a basic usage, but it was a nice example of the \1 trick to capture matching tags.
In our case, this allows for nesting of tags if necessary ([red]This is [#ff9900]very[/#ff9900] important.[/red]), but there are many other cases where this might be just as important.
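One way nesting could be handled is to apply the same regex again to whatever group #2 captured. The sketch below is only an assumption about how that might look, not the code we actually shipped; NestedColorTags and render are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedColorTags {
    static final Pattern TAG =
        Pattern.compile("\\[(#[a-fA-F0-9]{3,6}|[a-zA-Z]+)\\](.+)\\[\\/\\1\\]");

    // Replaces the first matching tag pair with a <span>, then recurses into
    // the captured content (nested tags) and into the text after the match.
    static String render(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.find()) {
            return input; // no tag left: keep the text as-is
        }
        String span = String.format(
            "<span style=\"color: %s\">%s</span>",
            m.group(1),
            render(m.group(2))
        );
        return input.substring(0, m.start()) + span + render(input.substring(m.end()));
    }

    public static void main(String[] args) {
        System.out.println(render("[red]This is [#ff9900]very[/#ff9900] important.[/red]"));
        // <span style="color: red">This is <span style="color: #ff9900">very</span> important.</span>
    }
}
```

Keep in mind that (.+) is greedy, so two sibling tags of the same color on one line would be matched as a single span, and this sketch does not escape HTML in the content.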
We’ll see another use of that trick in our next regex quiz, in two months.
Hopefully you’ll remember and spot it.
[1] The quiz was also given with a mistake. When using Java, you have to escape most of your escapes, which makes the regex hard to read and keep track of. In order to focus on the regex per se, I’ll avoid Java for future quizzes.
[2] You could actually leave the closing ] unescaped to make it a bit easier to read: \[(#[a-fA-F0-9]{3,6}|[a-zA-Z]+)](.+)\[\/\1] is thus equivalent.