I’ve reorganized the regular expression content in the new Programming Ruby, and added some cool new advanced examples. This one’s fairly straightforward, but I love the fact that I can now start refactoring my more complex patterns, removing duplication.
The stuff below is an extract from the unedited update. It’ll appear in the next beta. It follows a discussion of named groups, \k
and related stuff.
There’s a trick which allows us to write subroutines inside regular expressions. Recall that we can invoke a named group using \g<name>
, and we define the group using (?<name>...)
. Normally, the definition of the group is itself matched as part of executing the pattern. However, if you add the suffix {0}
to the group, it means “zero matches of this group,” so the group is not executed when first encountered.
sentence = %r{
(?<subject> cat | dog | gerbil ){0}
(?<verb> eats | drinks| generates ){0}
(?<object> water | bones | PDFs ){0}
(?<adjective> big | small | smelly ){0}
(?<opt_adj> (\g<adjective>\s)? ){0}
The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x
md = sentence.match("The cat drinks water")
puts "The subject is #{md[:subject]} and the verb is #{md[:verb]}"
md = sentence.match("The big dog eats smelly bones")
puts "The adjective in the second sentence is #{md[:adjective]}"
sentence =~ "The gerbil generates big PDFs"
puts "And the object in the last is #{$~[:object]}"
produces:
The subject is cat and the verb is drinks
The adjective in the second sentence is smelly
And the object in the last is PDFs
Cool, eh?