Wednesday, October 15, 2008

Fun with Ruby 1.9 Regular Expressions

I’ve reorganized the regular expression content in the new Programming Ruby, and added some cool new advanced examples. This one’s fairly straightforward, but I love the fact that I can now start refactoring my more complex patterns, removing duplication.


The stuff below is an extract from the unedited update. It’ll appear in the next beta. It follows a discussion of named groups, \k and related stuff.




There’s a trick which allows us to write subroutines inside regular expressions. Recall that we can invoke a named group using \g<name>, and we define the group using (?<name>...). Normally, the definition of the group is itself matched as part of executing the pattern. However, if you add the suffix {0} to the group, it means “zero matches of this group,” so the group is not executed when first encountered.


sentence = %r{ 
(?<subject> cat | dog | gerbil ){0}
(?<verb> eats | drinks| generates ){0}
(?<object> water | bones | PDFs ){0}
(?<adjective> big | small | smelly ){0}

(?<opt_adj> (\g<adjective>\s)? ){0}

The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x

md = sentence.match("The cat drinks water")
puts "The subject is #{md[:subject]} and the verb is #{md[:verb]}"

md = sentence.match("The big dog eats smelly bones")
puts "The adjective in the second sentence is #{md[:adjective]}"

sentence =~ "The gerbil generates big PDFs"
puts "And the object in the last is #{$~[:object]}"

produces:


The subject is cat and the verb is drinks 
The adjective in the second sentence is smelly
And the object in the last is PDFs



Cool, eh?