Sunday, March 23, 2008

I'd like to apologize

Just before the weekend, I did something stupid and hurtful, and I’d like to make it right.


For the last few weeks, I’ve been getting e-mails about a book under development over at O’Reilly called Software Craftsmanship—from Apprentice to Journeyman. People kept pointing out that the main title was the same as Pete McBreen’s book (for which I wrote the foreword), and that the overall structure of the title was similar to that of The Pragmatic Programmer.


After a while, this started to get under my skin. I wasn’t so much concerned about the “journeyman” bit, but the duplication of the title just seemed wrong to me—I really liked Pete’s book, and I didn’t want to see it getting eclipsed. I complained about this to a senior editor at O’Reilly, and he said he’d bring it up with the book’s editor, who worked for him. I heard nothing back.


So, at the end of a tiring week, I wrote a blog post, complaining about the title.


That was wrong of me.


It was wrong for a number of reasons.


  • I could, and probably should, have bypassed etiquette and contacted the authors directly, even though they write for a rival publisher.

  • It really wasn’t any of my business.

  • But, most importantly, it took something which was a kind of intellectual annoyance and turned it into something that made the authors of the book feel bad. And for that, I apologize.

This year, I’ve been the target of some cruel blog posts. Most readers of these posts viewed them as fine sport. But as the recipient of the criticism, I’m here to tell you that it hurts. It doesn’t matter whether it is based on truth or whether it isn’t. It doesn’t matter whether the person writing them knows you or is a total stranger. It hurts. Public attacks like this are virtually impossible to defend against, and that is a cruel violation. It’s cruel when it is done to you, and it’s cruel when you do it to others.


So, I of all people should have known better. I should have had the common sense to realize that my comments, aimed at a book, were going to be hurtful to the authors. It’s kind of obvious, really.


I wasn’t thinking straight, and I messed up.


So, Dave and Ade, I’m sorry for any distress I caused.


Good luck with your book.

Thursday, March 13, 2008

Source Code for that Testing Library


Ring the bells that still can ring
Forget your perfect offering 
There is a crack in everything
That’s how the light gets in.
  —Leonard Cohen



Yesterday, I posted on a trivial little testing library I hacked together. I’ve put the source online. Get the source through Rob’s git repository (see below).


In the meantime, I discovered a problem with the idea of intercepting comparison operators, the technique used by the expect method. Ruby doesn’t really have != and !~ methods. Instead, the parser maps (a != b) into !(a == b). This means that the ComparisonProxy cannot intercept calls to either of these. This is a problem because


   expect(1) != 1

actually passes, because it becomes !(expect(1) == 1), and the expect method is happy with that.
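The failure mode is easy to demonstrate with a minimal sketch of the proxy idea (my own sketch, not the actual library code):

```ruby
# Minimal proxy sketch. Ruby (1.8) has no real != method: the parser
# rewrites (a != b) as !(a == b), so only == ever reaches the proxy.
class ComparisonProxy
  def initialize(value)
    @value = value
  end

  def ==(other)
    @value == other
  end
end

def expect(value)
  ComparisonProxy.new(value)
end

# The proxy's == returns true, then Ruby negates it *outside* the proxy,
# so the "assertion" quietly evaluates to false and nothing is reported.
puts !(expect(1) == 1)    # this is what (expect(1) != 1) becomes
```

The proxy never sees the negation, so it has no chance to flag the failed expectation.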


I’m betting there’s a way around this…


Update: 14:26 CDT.


  • Rob Sanheim has set up a Git repository for the code. He says

    I’ve put this up on github to watch what forks or releases develop around it.


    git clone git://github.com/rsanheim/prag_dave_testing.git




  • Michael Neumann suggested a way around the negated == and =~ tests using source inspection:


    class ComparatorProxy
      def ==(obj)
        # try to get the source code position of the call
        # and see if it's a != or a ==
      end
    end


Wednesday, March 12, 2008

Playing with a Testing Library


  • I’d like to be able to express my unit tests fairly naturally, using the conditional operators built into the language. So, for example, I’d want to write:



    expect(factorial(5)) == 120
    expect(factorial(10)) > 10000



  • I’d like the error messages to show both the code that caused the error and the values that caused the error. So, for example, I’d want the following (incorrect) test



    expect(factorial(6)) == 600

    to output something like


    /Users/dave/tmp/tmc/blog_tests.rb:16
    the code was: expect(factorial(6)) == 600,
    but 720 != 600

    and



    expect(1) > 2

    should say



    /Users/dave/Play/tmc/blog_tests.rb:11
    the code was: expect(1) > 2,
    but 1 <= 2

    (Note how the expression showing the actual values negates the comparison operator to make it easier to read.)



  • I annotate my code with comments, so I’d like to be able to annotate my tests the same way.

    expect(factorial(6)) == 600 # Deliberate bad test

    should produce something like


    /Users/dave/tmp/tmc/blog_tests.rb:17
    Deliberate bad test
    the code was: expect(factorial(6)) == 600,
    but 720 != 600

    Sometimes I write longer comments.



    # The factorial of 6 is a special case,
    # because of the labor laws in Las Vegas
    expect(factorial(6)) == 600

    So the resulting errors are longer, too.


               
    /Users/dave/Play/tmc/blog_tests.rb:21
    The factorial of 6 is a special case, because of the labor laws in Las Vegas
    the code was: expect(factorial(6)) == 600,
    but 720 != 600



  • I like to be able to group my tests.


    testing("positive factorials") do
      expect(factorial(1)) == 1
      expect(factorial(2)) == 2
      expect(factorial(5)) == 120
    end

    testing("factorial of zero") do
      expect(factorial(0)) == 1
    end

    testing("negative factorials") do
      expect(factorial(-1)) == 1
      expect(factorial(-5)) == 1
    end



  • I like the description of the group to appear along with any individual test annotation if a test fails.



    testing("factorial of zero") do
      # this test is deliberately wrong
      expect(factorial(0)) == 0
    end

    will produce

    /Users/dave/Play/tmc/blog_tests.rb:31--while testing factorial of zero
    this test is deliberately wrong
    the code was: expect(factorial(0)) == 0,
    but 1 != 0



  • I like to have the flexibility to set up the environment for a group of tests. I also like to have the idea of a global environment which doesn’t get messed up by the running of tests (so that subsequent tests can run in that environment). I don’t see why I should have to package things into methods with magic names to have that happen. Instead, why not just have transactional instance variables? That way, I can use regular methods to set up the state for a test.


    @order = Order.new("Dave Thomas", "Ruby Book")

    testing("normal case") do
      expect(@order.valid?) == true
    end

    testing("missing name in order") do
      @order.name = nil
      expect(@order.valid?) == false
      expect(@order.error) == "missing name"
    end

    # Check that order is reset to valid state here
    expect(@order.valid?) == true

    So, in the preceding case, the second testing block changed the @order object. However, once the block terminated, the object was restored to its initial (valid) state.
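One plausible way to get transactional instance variables is to snapshot and restore the ivars around each block. Here is a rough sketch of the idea (my guess at an implementation, not the actual proof of concept):

```ruby
# Sketch: snapshot instance variables before a testing block, restore
# them afterwards. (New ivars created inside the block are not removed,
# and only shallow copies are taken -- good enough for a sketch.)
def testing(description)
  saved = instance_variables.map do |name|
    [name, instance_variable_get(name).dup]
  end
  yield
ensure
  saved.each { |name, value| instance_variable_set(name, value) }
end

@list = [1, 2, 3]

testing("mutation is rolled back") do
  @list << 99
  puts @list.inspect    # [1, 2, 3, 99]
end

puts @list.inspect      # [1, 2, 3] -- back to the initial state
```

Because the rollback lives in an `ensure` clause, the state is restored even when an expectation inside the block raises.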




So, while waiting for the last day of the Rails Studio to start, I hacked together a quick proof of concept. It’s less than 100 lines of code. All the output shown here was generated by it. Is this worth developing into something usable?
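For a flavor of how the operator-interception part might work, here is a back-of-the-envelope sketch (my own code, not the actual sub-100-line proof of concept; ExpectProxy is a made-up name):

```ruby
# Sketch of expect(...): wrap the actual value in a proxy whose comparison
# operators check the expectation and report failures with the operator
# negated, as in the messages above.
class ExpectProxy
  NEGATED = { :== => "!=", :> => "<=", :< => ">=",
              :>= => "<", :<= => ">", :=~ => "!~" }

  def initialize(actual)
    @actual = actual
  end

  NEGATED.each_key do |op|
    define_method(op) do |expected|
      return true if @actual.send(op, expected)
      puts "but #{@actual.inspect} #{NEGATED[op]} #{expected.inspect}"
      false
    end
  end
end

def expect(actual)
  ExpectProxy.new(actual)
end

expect(120) == 120   # passes silently
expect(1) > 2        # reports: but 1 <= 2
```

Capturing the source line and any preceding comment for the error report is the other half of the trick, and needs the caller’s file and line number.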

Monday, March 10, 2008

The 'Language' in Domain-Specific Language Doesn't Mean English (or French, or Japanese, or ...)

I’m a really big fan of Domain-Specific Languages. Andy and I plugged them back in ‘98 when writing The Pragmatic Programmer. I’ve written my share of them over the years, and I’ve used even more. Which is why it is distressing to see that a whole group of developers are writing DSLs (and discussing DSLs) without seeming to get one of the fundamental principles behind good DSL design.


Domain experts don’t speak a natural language


Let’s say that another way. Whenever domain experts communicate, they may seem to be speaking in English (or French, or whatever). But they are not. They are speaking jargon, a specialized language that they’ve invented as a shorthand for communicating effectively with their peers. Jargon may use English words, but these words have been warped into having very different meanings—meanings that you only learn through experience in the field.


Let’s look at some successful domain-specific languages before turning our attention to the way that some DSLs are trying just a little too hard.


Success Story 1: Dependency Management in Make


The Make utility has been a mainstay of Unix software development for over 30 years. You can complain about some strange syntax rules (some of which involve the invisible difference between tabs and spaces), but it would be hard to argue that Make hasn’t had a major impact in the open source world.


At its heart, Make addresses the building of systems from components in the presence of dependencies. Make lets me express the dependencies between header files, source files, object files, libraries, and executable images. It also lets me specify the commands to execute to resolve those dependencies when certain items are missing. For example, I could say



my_prog.o: my_prog.c common.h

extras.o: extras.c common.h

my_prog: my_prog.o extras.o
	cc -o my_prog -lc my_prog.o extras.o

This example of the Make DSL says that my_prog.o depends on my_prog.c and common.h, and that extras.o depends on extras.c and common.h. The final program, my_prog, depends on the two object files. To build the program, we have to execute the cc command on the line that follows the dependency line. No build command is needed for the object files: in this case Make knows what to do implicitly.


People who build software from source are domain experts in the area of dependencies and build commands. They need concise ways of expressing that expertise, of saying things like “if I ask you to ensure my program is up-to-date, and the common header file has been changed, then I want you to rebuild all the dependent object files before then rebuilding the main program”. Make is by no means perfect, but its longevity shows that it goes a long way as a DSL to meeting its experts’ needs.


Success Story 2: Active Record Declarations


Love it or loathe it, you have to admit that Rails has changed the game. And one reason is its extensive use of DSLs. For example, when you are writing model classes, you are claiming to be an expert on your application’s domain, and in the relationships between objects in that domain. And Rails has a nifty DSL to let you express those relationships.



class Post < ActiveRecord::Base
  has_many :comments
  ...
end

class Comment < ActiveRecord::Base
  belongs_to :post
  ...
end

The two lines containing has_many and belongs_to are part of a data modeling DSL provided by Rails. Behind the scenes, this simple-looking code creates a whole heap of supporting infrastructure in the application, infrastructure that allows the programmer to easily navigate and manage the relationships between (in this case) posts and comments.
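The mechanism behind such declarations is an ordinary class-level method that generates code. A toy version (nothing like Rails’ real implementation; MiniModel is a made-up name for illustration) might look like:

```ruby
# Toy class macro: has_many is just a class method that, when called in a
# class body, defines an instance method returning a collection.
module MiniModel
  def has_many(name)
    define_method(name) do
      @associations ||= Hash.new { |h, k| h[k] = [] }
      @associations[name]
    end
  end
end

class Post
  extend MiniModel
  has_many :comments     # defines Post#comments
end

post = Post.new
post.comments << "First!"
puts post.comments.inspect    # ["First!"]
```

The real thing also generates finders, setters, and the SQL plumbing, but the shape is the same: a declaration in the class body runs code that writes methods.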


At first blush, this might seem like an English-language DSL. But, despite appearances, has_many and belongs_to are not English phrases. They are jargon from the world of modeling. They have a specific meaning in that context, a meaning that is clear to developers using Rails (because those developers take on the role of domain modeler when they start writing the application).


Success Story 3: Groovy Builders


The Groovy language has a wonderful way of expressing data in code. The builder concept lets you construct a set of nodes as a side effect of code execution. You can then express those nodes as (for example) XML, or JSON, or Swing user interfaces. Here’s a trivial example that constructs some nodes describing a person which we can then output as XML.



result = new StringWriter()
xml = new groovy.xml.MarkupBuilder(result)
xml.person(category: 'employee') {
  name('dave')
  likes('programming')
}
println result

This would generate something like



<person category="employee">
  <name>dave</name>
  <likes>programming</likes>
</person>

(Jim Weirich took this idea and created the wonderful Ruby Builder library, the basis of Rails’ XML-generating templates.)


Again, we have a DSL aimed squarely at someone who knows what they are doing. If you’re creating XML, then you know that the elements can be nested, that they can have textual content, and that elements have optional attributes. The Builder DSL takes care of all the details for you—the angle brackets, any quoting, and so on—but you still have to know the underlying concepts. Again, the language of the DSL is the language of the domain.
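The builder trick translates directly to Ruby. Here is a toy version of the idea (my sketch, not Jim Weirich’s Builder gem) that uses method_missing to turn undefined method calls into elements:

```ruby
# A toy builder (not the real Builder gem): method_missing turns
# undefined method calls into nested XML elements.
class MiniXml
  attr_reader :out

  def initialize
    @out = ""
  end

  def method_missing(tag, arg = nil, attrs = {}, &block)
    attrs, arg = arg, nil if arg.is_a?(Hash)   # person(category: '...')
    attr_str = attrs.map { |k, v| %( #{k}="#{v}") }.join
    @out << "<#{tag}#{attr_str}>"
    @out << arg.to_s if arg
    instance_eval(&block) if block             # nested calls build children
    @out << "</#{tag}>"
    self
  end
end

xml = MiniXml.new
xml.person(category: "employee") do
  name("dave")
  likes("programming")
end
puts xml.out
# => <person category="employee"><name>dave</name><likes>programming</likes></person>
```

Notice that the user of this mini-DSL still needs to know the XML domain: nesting, attributes, and text content are all concepts the builder assumes.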


Seduced by Language


Over the years, people have looked at DSLs and wondered just how far they can be taken. Would it be possible to create a DSL that could be used by someone who wasn’t a domain expert? So far, the answer is “no.” The problem is that abstractions leak—to do things in a domain, you need to know the domain. The folks who brought us Star Trek: The Next Generation pretended otherwise. Jean-Luc Picard used an English-language DSL to talk to his food dispenser. It worked every time. But, in the real world, you know that the first time someone said “Earl Grey, hot” to this magic box, they’d be surprised when a naked English peer covered in baby oil popped out.


The reality is that languages such as English, French, and so on, are imprecise. That ambiguity makes them powerful. Because of this, whenever we try to create a DSL that looks like a natural language, we fall short. Take AppleScript as an example. On the face of it, it looks nice and expressive—we’re writing something that looks very natural. Here’s an example from the Apple example scripts.



set this_file to choose file without invisibles
try
  tell application "Image Events"
    launch
    set this_image to open this_file
    scale this_image to size 640
    save this_image with icon
    close this_image
  end tell
on error error_message
  display dialog error_message
end try

Kind of makes sense, doesn’t it? I thought so too. So, for years, I’ve been trying to get into AppleScript. I keep trying, and I keep failing. Because the language is deceptive. They try to make it English-like. But it isn’t English. It’s a programming language. And it has rules and a syntax that are very un-English-like. There’s a major cognitive dissonance—I have to take ideas expressed in a natural language (the problem), then map them into an artificial language (the AppleScript programming model), but then write something that is a kind of faux natural language. (Piers Cawley calls these kinds of DSLs domain-specific pidgin, but my understanding is that pidgins are full languages, and our code hasn’t got that far.)


What’s the point? When you’re writing logic like this, with exception handling, command sequencing, and (in more advanced examples) conditionals and loops, then what you’re doing is programming. The domain is the world of code. If you’re not up to programming, then you shouldn’t be writing AppleScript. And if you are up to programming, then AppleScript just gets in your way.


But this isn’t a discussion of AppleScript. That’s just an example of the kind of trouble you get into when you forget what the domain is and try to create natural language DSLs.


Testing Times


Here’s a little code from a test written using the test/spec framework.



specify "should be a string" do
  @result.should.be.a.kind_of String
end
specify "value should be 'cat'" do
  @result.should.equal "cat"
end

It’s an elegant example of what can be done with Ruby. And, don’t get me wrong. I’m not picking on Chris here. I think he’s created a clever framework, and one that is likely to become quite popular.


But let’s look at it from a DSL point of view. What is the domain? I’m thinking it is the specification of the correct behavior of programs. And who are the domain experts? That’s a trickier question to answer. In an ideal world, it would be the business users. But, the reality is that if the business users had the time, patience, and inclination to write things at this level, they wouldn’t need programmers. Don’t kid yourselves—writing these specs is programming, and the domain experts are programmers.


As a programmer, a couple of things leap out at me from these tests. First, there’s the duplication. The specify lines are a form of grouping, and each contains a string documenting what that group tests. But the whole point of the DSL part of the exercise is to make that blindingly obvious anyway. Now the BDD folks say that you write the specifications first, without any content, and then gradually add the tests in the blocks as you add supporting application code. I’d suggest that you might want to look at ways of removing the eventual duplication by transforming the specification into the test.


But for me the really worrying thing is the syntax. @result.should.be.a.kind_of String. It reads like English. But it isn’t. The words are separated by periods, except the last two, where we have a space. As a programmer, I know why. But as a user, I worry about it. In the first example, we write @result.should.be.a.kind_of. Why not kind.of? If I want to test that floats are roughly equal, I’d have said @result.should.be.close value. Why not close.to value?


Trivial details, but it means that I can’t just write tests using my knowledge of English—I have to look things up. And if I have to do that, why not just use a language/API that is closer to the domain of specifications and testing? Chris’s work is great, but it illustrates how a DSL that pretends to be English can never really get there. The domain of his language is software development—it would be perfectly OK to produce a DSL that makes sense in that domain.


RSpec is another behavior-driven testing framework. Here’s part of a specification (or should it be test?).



describe "(empty)" do

  it { @stack.should be_empty }

  it_should_behave_like "non-full Stack"

  it "should complain when sent #peek" do
    lambda { @stack.peek }.should raise_error(StackUnderflowError)
  end

  it "should complain when sent #pop" do
    lambda { @stack.pop }.should raise_error(StackUnderflowError)
  end

end

Another nice, readable piece of code, full of clever Ruby tricks. But, again, the attempt to create a natural language feel in the DSL leads to all sorts of leaks in the abstraction. Look at the use of should. We have should be_empty. Here, the actual assertion is (somewhat surprisingly) “should be_”. That’s right—the be_ part is really part of the should, indicating that what follows the underscore is a predicate method to be called (after adding a question mark, so we’d call @result.empty? in this case). Then we have another way of using should in the phrase it_should_behave_like—all one word. Then there’s a third way of using should when we reach should raise_error. And, of course, all these uses of should differ from the use in test/spec, even though both strive for an English-like interface. The same kinds of dissonance occur with the use of it in the first three lines (it {...} vs. it_should_... vs. it "...").
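To see why “should be_” behaves as one unit, here is a reconstruction of the trick (my own sketch, not RSpec’s implementation):

```ruby
# Sketch: be_empty isn't English -- it's a matcher factory. A top-level
# method_missing catches any be_xxx call and records the predicate name;
# should then sends that predicate (with "?" appended) to the receiver.
class BePredicate
  attr_reader :predicate

  def initialize(name)
    @predicate = name.to_s.sub(/\Abe_/, "") + "?"   # be_empty -> empty?
  end
end

def method_missing(name, *args)
  name.to_s.start_with?("be_") ? BePredicate.new(name) : super
end

class Object
  def should(matcher)
    send(matcher.predicate) or
      raise "expected #{inspect} to be #{matcher.predicate}"
  end
end

puts [].should(be_empty)    # true -- [].empty? is called behind the scenes
```

So what reads like an English adjective is really an encoding scheme: the underscore marks where the method name has been spliced into the phrase.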


It’s a Domain Language


Just to reiterate, I’m not bashing either of these testing frameworks—they are popular and I’m in favor of anything that brings folks to the practices of testing.


However, I am concerned that the popularity of these frameworks, and other similar uses of English-as-a-DSL, may lead developers astray. Martin Fowler writes about fluent interfaces. I think his work might have been misunderstood—the fluency here is programmer fluency, not English fluency. It’s writing succinct, expressive code (and, in particular, using method chaining where appropriate).


The language in a DSL should be the language of the domain, not the natural language of the developer. Resist the temptation to use cute tricks to make the DSL more like a natural, human language. By doing so you might add to its readability, but I can guarantee that you’ll be taking away from its writability, and you’ll be adding uncertainty and ambiguity (the strengths of natural languages). The second you find yourself writing



def a
  self
end

so that you can use “a” as a connector in


  
add.a.diary.entry.for("Lunch").for(August.10.at(3.pm))

you know you’ve crossed a line. This is no longer a DSL. It’s broken English.

Sunday, January 27, 2008

Pragmatic Dave on Passion, Skill and 'Having A Blast'

Link: Pragmatic Dave on Passion, Skill and 'Having A Blast'

Monday, January 7, 2008

A loud "Huzzah!" was heard throughout the land

Eric Hodel is giving RDoc some love. You can’t imagine how happy that makes me.


When I first wrote RDoc, I was trying to find a way of solving two problems:


  1. Adding comments to the largely uncommented C source of Ruby, and

  2. Providing a means for library writers easily to document their creations.

I’d just finished the PickAxe, and I wanted to take the work Andy and I had done reverse engineering the Ruby API and add it back into the interpreter source code.


I set myself constraints with RDoc and ri:


  • it should produce at least some documentation even on totally uncommented source files

  • it should extract tacit information from the program source (for example guessing good names for block parameters by looking for yield statements inside methods)

  • the markup in the source files should be unobtrusive. In the typical case, someone reading the source should not even notice that the comments follow markup conventions

  • it should only use libraries that come pre-installed with Ruby

  • the documentation it produced should be portable across machines and architectures

  • it should allow incremental documentation. Libraries that you install over time can add methods to existing classes. As you add these libraries, the method lists in the classes you extend should grow to reflect the changes

  • it should be secure. People pushed many times to add the ability to execute code during the documentation process. I didn’t want to have code run on an end user’s machine during a process that ostensibly was simply installing documentation (particularly as these installations often ran as root)

  • it should be throw-away

The last one might be a surprise, but the real objective of RDoc wasn’t the tool. The real objective was to set a standard that meant that future libraries would get documented in a consistent and usable way. And so RDoc and ri compromised like crazy. Rather than a database or some complex binary format, they used a set of directory trees in the user’s filesystem to store documentation. This documentation, which is basically a set of Ruby objects, was stored using YAML, rather than marshaled objects or Ruby source. Even though YAML is slow, it is more portable than marshaled objects, and more secure than Ruby source. The parser in RDoc was a wild hack on the parser in irb. This means it performs a static, not a dynamic, analysis, and that it is sometimes confused by edge cases in Ruby syntax. So be it.


But the very worst part of RDoc/ri is the output side. I wanted to be able to produce output in a variety of formats: HTML, plain text, XML, chm, LaTeX, and so on. So the analysis side of RDoc produces a data structure, and passes it to the output side. Here I made a stupid design decision. What RDoc generates internally is basically nested hashes. This has a couple of major advantages. In particular, there’s a kind of fractal property when traversing it: it doesn’t matter how deep you are in the structure—all you pass to the next routine down is a hash. But it has a major downside—it’s a bitch to work with. If I were doing it again, I’d use Structs.
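The difference the paragraph laments is easy to see side by side (MethodDoc is a made-up struct for illustration, not RDoc’s actual data structure):

```ruby
# Nested hashes are "stringly typed": a misspelled key silently yields nil.
# A Struct gives named accessors and raises on typos.
MethodDoc = Struct.new(:name, :params, :comment)

as_hash   = { "name" => "each", "comment" => "Iterates over items." }
as_struct = MethodDoc.new("each", nil, "Iterates over items.")

puts as_hash["naem"].inspect    # nil -- the typo goes unnoticed
puts as_struct.name             # each
# as_struct.naem would raise NoMethodError instead of returning nil
```

The hash version keeps the fractal pass-anything-down property, but every consumer has to know the key spellings; a Struct moves that knowledge into one definition.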


Finally, there’s the generation of the output itself. I needed a templating system and, for what seemed like good reasons at the time, I wrote my own. It was only a handful of lines of code initially. It’s still only a couple of hundred. It did a few things well, but ultimately it was ugly as sin. But now, as Erb has become something of a standard, it is definitely the right time to replace it.


RDoc and ri are, in a way, the ultimate stone soup. The code itself is not the output of the project. The real output is the thousands of libraries that are now self-documenting. Eric and the crew are busy on the stew, replacing the stones with real and tasty ingredients. When they are finished, we’ll be able to use all that library documentation in remarkable new ways. So, a big thank you to Eric and Seattle.rb, and to all the Ruby coders who’ve created such a great base of documentation for us all.


Here’s to RDoc 2.0.

Tuesday, January 1, 2008

Pipelines Using Fibers in Ruby 1.9—Part II

In the previous post, I developed a class called PipelineElement. This made it relatively easy to create elements that act as producers and filters in a programmatic pipeline. Using it, we could write Ruby 1.9 code like:


    10.times do
      puts (evens | multiples_of_three | multiples_of_seven).resume
    end

The construct in the loop is a pipeline containing three chunks of code: a generator of even numbers, a filter that only passes multiples of three, and another filter that passes multiples of seven. Numbers are passed from the producer to the first filter, and then from that filter to the next, until finally popping out and being made available to puts.


However, creating these pipeline elements is still something of a pain. It turns out that we can simplify things when it comes to creating filters. In the implementation I’ll show here, we’ll only handle the case of simple transforming filters—filters that take an input, do something to it, and write the result to the filter chain.


Let’s revisit the PipelineElement class


    class PipelineElement

      attr_accessor :source

      def initialize
        @fiber_delegate = Fiber.new do
          process
        end
      end

      def |(other)
        other.source = self
        other
      end

      def resume
        @fiber_delegate.resume
      end

      def process
        while value = input
          handle_value(value)
        end
      end

      def handle_value(value)
        output(value)
      end

      def input
        source.resume
      end

      def output(value)
        Fiber.yield(value)
      end
    end

The process method is the driving loop. It reads the next input from the pipeline, then calls handle_value to deal with it. In the base class, handle_value simply echoes the input to the output—real filters subclass PipelineElement and override this method.


Let’s make a small change to the handle_value method.


    def handle_value(value)
      output(transform(value))
    end

    def transform(value)
      value
    end

By doing this, we’ve split the transformation of the incoming value into a separate method. And the work done by this method no longer uses any of the state in the PipelineElement object, which means we can also do it in a block in the caller’s context. Let’s change our PipelineElement class to allow this. We’ll have the constructor take an optional block, and we’ll use that block in preference to the transform method. Here’s another listing, showing just the changed methods.


    class PipelineElement

      def initialize(&block)
        @transformer = block || method(:transform)
        @fiber_delegate = Fiber.new do
          process
        end
      end

      # ...

      def handle_value(value)
        output(@transformer.call(value))
      end
    end

This illustrates a cool (and underused) feature of Ruby. Method objects (created with the method(...) call) are duck-typed with proc objects: we can use .call(params) on both. This is a great way of letting users of a class change its behavior either by subclassing and overriding a method, or by simply passing in a block.
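The duck-typing between Method objects and procs is easy to demonstrate in isolation:

```ruby
# A proc and a Method object respond to the same call interface, so code
# like @transformer.call(value) never needs to know which one it holds.
double_proc = proc { |x| x * 2 }

def double_method(x)
  x * 2
end

callables = [double_proc, method(:double_method)]
callables.each { |c| puts c.call(21) }    # prints 42 twice
```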


With this change in place, we can now write transforming filters using blocks. This is a lot more compact than the previous subclassing approach.


    class Evens < PipelineElement
      def process
        value = 0
        loop do
          output(value)
          value += 2
        end
      end
    end

    evens = Evens.new

    tripler     = PipelineElement.new {|val| val * 3}
    incrementer = PipelineElement.new {|val| val + 1}

    5.times do
      puts (evens | tripler | incrementer).resume
    end

This outputs 1, 7, 13, 19, and 25.


Different Kinds of Filter


This approach works well if all we want is transforming filters. But what if we would also like to simplify filters that either pass or don’t pass values based on some criteria? A block would seem like a great way of specifying the condition, but we’ve already used our one block parameter up. Subclassing to the rescue. We can create two subclasses, Transformer and Filter. One sets the @transformer instance variable to any block it is passed. The other sets @filter. Here’s the relevant code:


    class PipelineElement

      attr_accessor :source

      def initialize(&block)
        @transformer ||= method(:transform)
        @filter ||= method(:filter)
        @fiber_delegate = Fiber.new do
          process
        end
      end

      # ...

      def handle_value(value)
        output(@transformer.call(value)) if @filter.call(value)
      end

      def transform(value)
        value
      end

      def filter(value)
        true
      end
    end

    class Transformer < PipelineElement
      def initialize(&block)
        @transformer = block
        super
      end
    end

    class Filter < PipelineElement
      def initialize(&block)
        @filter = block
        super
      end
    end

Thus equipped, we can write:


    tripler          = Transformer.new {|val| val * 3}
    incrementer      = Transformer.new {|val| val + 1}
    multiple_of_five = Filter.new {|val| val % 5 == 0}

    5.times do
      puts (evens | tripler | incrementer | multiple_of_five).resume
    end

Moving The Blocks Inline


Our final hack lets us move the blocks directly into the pipeline.


Let’s look at the actual pipeline code:


    puts (evens | tripler | incrementer | multiple_of_five ).resume

Those pipe characters are simply calls to the | method in class PipelineElement. And methods can take block arguments, right? So what stops us writing


    puts (evens | {|v| v*3} | {|v| v+1} | multiple_of_five ).resume

It turns out that Ruby stops us. The brace characters are taken to be hash parameters, not blocks, so Ruby gets its knickers in a twist. Fortunately, that’s easily fixed by making the method calls explicit.


    puts (evens .| {|v| v*3} .| {|v| v+1} .| multiple_of_five ).resume

Now we just need to make the | method accept an optional block. If the block is present, we use it to create a new transformer.


    def |(other = nil, &block)
      other = Transformer.new(&block) if block
      other.source = self
      other
    end

Ruby 1.9 lets you chain method calls across lines, so we can tidy up our pipeline visually.


    5.times do
      puts (evens
              .| {|v| v*3}
              .| {|v| v+1}
              .| multiple_of_five
           ).resume
    end

A Palindrome Finder


Let’s finish with another trivial example. We’ll create a generic producer class that takes a collection and passes it, one element at a time, into the pipeline.


    class Pump < PipelineElement
      def initialize(source)
        @source = source
        super()
      end

      def process
        @source.each {|item| Fiber.yield item}
        nil
      end
    end

Now we can write a simple palindrome finder (a palindrome is a word which is the same when spelled backwards).


    words = Pump.new %w{Madam, the civic radar rotator is not level.}
    is_palindrome = Filter.new {|word| word == word.reverse}

    pipeline = words .| {|word| word.downcase.tr("^a-z", '') } .| is_palindrome

    while word = pipeline.resume
      puts word
    end

This outputs: madam, civic, radar, rotator, level.


But what if we instead want to show each word in the input stream, and flag it if it is a palindrome? That’s easily done, but we won’t do it the easy way. Instead, let’s show a more convoluted method, because it might be useful in the general case.


There’s no law to say that a transformer that receives a string as input has to write a string as output. It could, if it wanted to, write an array. Or a structure. So we could write:


    WordInfo = Struct.new(:original, :forwards, :backwards)

    words = Pump.new %w{Madam, the civic radar rotator is not level.}

    normalize = Transformer.new {|word| [word, word.downcase.tr("^a-z", '')] }

    to_word_info = Transformer.new do |word, normalized|
      reversed = normalized.reverse
      WordInfo.new(word, normalized, reversed)
    end

    formatter = Transformer.new do |word_info|
      if word_info.forwards == word_info.backwards
        "'#{word_info.original}' is a palindrome"
      else
        "'#{word_info.original}' is not a palindrome"
      end
    end

    pipeline = words | normalize | to_word_info | formatter

    while word = pipeline.resume
      puts word
    end

This outputs


    'Madam,' is a palindrome
    'the' is not a palindrome
    'civic' is a palindrome
    'radar' is a palindrome
    'rotator' is a palindrome
    'is' is not a palindrome
    'not' is not a palindrome
    'level.' is a palindrome

So, What’s the Point?




Is this a great way of writing a palindrome finder? Not really. But…


What we’ve done here is turned the way a program works on its head. We’ve written chunks of isolated code, each of which either filters or transforms an input. We’ve then independently knitted these chunks together. That’s a high degree of decoupling. We can also leave it until runtime to determine what gets put into the pipeline (and the order that it appears in the pipeline), which means we can move more power into the hands of our users.


Could we have done all this without Fibers? Of course. Could we do it without Ruby 1.9? Absolutely. But sometimes factors come together which lead us to experiment with new ways of thinking about our code.


This pipeline stuff is not revolutionary, and it isn’t generally applicable. But it’s fun to play with. And, for me, that’s the main thing.


A Wee Postscript


All this content is stuff that I decided not to include in the third edition of the PickAxe. It didn’t work in the section on fibers, because it uses programming techniques not yet covered. It didn’t work later because, as an example of various programming techniques, it is just too long.