Terminator - Timeout without Mercy

September 11th, 2008

If you have been following my posts on the Ruby-Talk, Ruby on Rails and even RSpec mailing lists (and who wouldn’t?! I mean, aside from my mother) then you will have noticed I have been banging my head against a brick wall on the subject of system calls not being handled by the Timeout libraries in Ruby…

I’ve done a lot of work with this. I mean a lot.

So I thought I would share.

It all starts with a basic requirement I have, which is to replicate a data set from one side of the planet to the other. This involves logging into a database server remotely using ActiveRecord across some encrypted VPNs, querying some tables, and then inside of a transaction, replicating down certain data on certain rows.

The solution I have works well. I can replicate about 2-5 rows per second across the link, which is more than adequate (the dataset changes at up to about 100 rows per minute). The problem is that occasionally ActiveRecord’s link to the remote database will hit a snag, then Ruby will wait, and wait, and wait to time out… This can take hours, and in fact, sometimes it never does time out, leaving a zombie process.

To avoid multiple replication problems, I have lock files which prevent further copies of the replicator firing up if it detects that one is already running.
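A guard of that kind can be sketched roughly as follows. This is not the replicator's actual code: the lock file path and the with_single_instance helper are names I made up for illustration, and it uses File#flock from Ruby's standard library rather than whatever the real script does.

```ruby
require 'tmpdir'

# Hypothetical lock file location; the real replicator may use something else.
LOCK_PATH = File.join(Dir.tmpdir, 'replicator.lock')

# Runs the block only if no other holder has the lock.
# Returns :already_running when the lock is taken, otherwise the block's value.
def with_single_instance(lock_path = LOCK_PATH)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |file|
    # LOCK_NB makes flock return false straight away instead of waiting.
    return :already_running unless file.flock(File::LOCK_EX | File::LOCK_NB)
    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end

puts with_single_instance { :replicated }
```

Note that a guard like this does exactly what it says: while a hung replicator is still alive and holding the lock, every new copy backs off.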

So, in the end, I get stuck with several replicators “running”, all hung, all waiting for a link to come back that never will, and no replication happening. In the meantime, the rows are still changing and I am getting backlogged. Leave it for a day and you are 80,000 rows backlogged and people start asking questions and I start looking for old travel tickets that I had ‘forgotten about’.

But why does ruby do this?

Shouldn’t the Timeout library protect us from such evilness?

Well, yes and no. The problem is that Ruby uses green threads. Green threads mean that there is only one real Ruby process running on the server, and it “internally” makes other “threads” that it schedules within the main Ruby process’ kernel time. Green threads are very efficient (they save a lot of context switching), but the problem is that if you make a system call that ‘blocks’, then the kernel will stop servicing your main Ruby thread until that system call is complete, and THIS means that the timer behind the little ‘Timeout.timeout(5) { my code }’ block you put in will never fire…
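For reference, this is the shape of the Timeout block in question, sketched with the standard timeout library. The with_timeout helper is my own name for illustration. The sleep stands in for slow work that Ruby itself can interrupt, so here the timeout fires as expected; it is the blocking system call case described above where it does not.

```ruby
require 'timeout'

# Returns the block's value, or :timed_out if it runs longer than `seconds`.
def with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

puts with_timeout(0.2) { sleep 1; :finished }  # the sleep outlasts the limit
puts with_timeout(1) { :finished }             # finishes well inside the limit
```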

And then you get a hang… forever.

I tried a couple of approaches. The first was the SystemTimer library by Philippe Hanrigou. This entry actually has a really good and simple example of what green threads and blocking system calls mean. If you want to brush up, it is a good read.

But SystemTimer didn’t work for me. It uses alarm signals, which can have problems of their own, and in my case I still got never-ending timeouts.

So I asked again on the Ruby-Talk mailing list and Ara T. Howard came back with a quick script that created little homicidal external ruby processes to kill the ruby process I was in if it didn’t “make it” in time.

The approach was actually very clever, and it works!

What he did was go “We can’t be sure that the existing Ruby process will be able to nuke itself, so instead, let’s start up another Ruby instance, running on its own, that has the PID of the Ruby instance that spawned it, and if the time runs out, do a system based kill TERM on that PID.”
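The idea can be sketched in a few lines. This is my own illustration and not the gem's source: with_watchdog and WatchdogTimeout are hypothetical names, it assumes a POSIX system where TERM signals can be sent and trapped, and it leaves a tiny window right at the deadline unhandled.

```ruby
require 'rbconfig'

# Hypothetical error class for this sketch; the gem raises its own Terminator::Error.
class WatchdogTimeout < StandardError; end

def with_watchdog(seconds)
  # Code for the external Ruby process: wait, then send TERM to the PID that spawned it.
  script = "sleep #{Float(seconds)}; Process.kill('TERM', #{Process.pid}) rescue nil"
  active = true
  # Turn an incoming TERM into an exception in this process while the block runs.
  previous = trap('TERM') { raise WatchdogTimeout, "timed out after #{seconds}s" if active }
  watchdog = Process.spawn(RbConfig.ruby, '-e', script)
  begin
    yield
  ensure
    active = false
    # Finished (or failed): call off the watchdog, reap it, restore the old handler.
    begin
      Process.kill('KILL', watchdog)
    rescue Errno::ESRCH
      # The watchdog has already gone away.
    end
    Process.wait(watchdog)
    trap('TERM', previous || 'DEFAULT')
  end
end

begin
  with_watchdog(0.5) { sleep 5 }
rescue WatchdogTimeout => e
  puts "Rescued: #{e.message}"
end
```

The deadline is enforced from outside the process, so it does not depend on the hung Ruby process managing to schedule its own timer. The price is starting a second Ruby interpreter for every guarded block.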

You could say that our little ruby processes have a license to kill. Literally.

So I implemented his idea, found a couple of problems, gave back some ideas, and he coded, I spec’d, tested and described and we released a brand new gem…

The Terminator.

You can get it with:


gem install terminator

Or grab the source code.

And how do you use it? Simple:

require 'terminator'
Terminator.terminate 1 do
  sleep 2
  puts "I'll never print"
end

This will never print because the terminator times out after 1 second, which is before the sleep of 2 seconds inside the block. This will raise a Terminator::Error, which you could catch and try again if you want.
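Catching the error and trying again might look something like the sketch below. The retry_on helper is a name of my own, not part of the gem, and the demonstration uses a stand-in error class so that it runs without Terminator installed; fetch_remote_rows in the comment is hypothetical too.

```ruby
# Retries the block when it raises `error_class`, up to `attempts` tries in total.
# Re-raises the last error if every try fails.
def retry_on(error_class, attempts: 3)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue error_class
    retry if tries < attempts
    raise
  end
end

# With the gem installed, usage might look like this (not run here):
#   retry_on(Terminator::Error, attempts: 3) do
#     Terminator.terminate(2) { fetch_remote_rows }
#   end

# Stand-in demonstration with a plain error class.
class FlakyError < StandardError; end

result = retry_on(FlakyError, attempts: 3) do |try|
  raise FlakyError, "try #{try} failed" if try < 3
  "succeeded on try #{try}"
end
puts result
```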

This will always work. It is because we are starting a separate Ruby process (which has some minor overhead) that waits the specified number of seconds and then simply sends a TERM signal to our misbehaving process.

Why should you use it?

Well, if you are making ANY calls to external web services, external databases, OpenID, Youtube, Google Maps… anything, then you should have a fail fast policy in place and time out rapidly if these fail. As these are external system calls, they will most likely not be caught by Ruby’s timeout.rb library… and that means that your application will just hang, waiting for the call that never comes back.

It is much better to go “Ok, 2 seconds are up, no response, let’s tell the user to try again in a minute” and render an appropriate message, than have the user frustratingly whack the reload button a few hundred more times wondering why your application is not responding.

So yes, I think you should use Terminator.

Besides, how many other gems do you know that have the method ‘plot_to_kill’ ?

blogLater

Mikel

17 Responses to “Terminator - Timeout without Mercy”

  1. Mike H Says:

    I just gave it a quick test on windows and it doesn’t seem to work. Do you happen to know if it’s supposed to work on windows?

    Thanks!

  2. Mikel Lindsaar Says:

    You know, I thought it might, but I just tested it as well and found the same thing. I’ll look into it.

    Mikel

  3. pistos Says:

    You should maybe link to the source code ( http://codeforpeople.com/lib/ruby/terminator ) for those that want to peruse the code online. :)

  4. Mike H Says:

    Thanks!

    I’ve got some code that does a similar thing on Windows, but it’s not nearly as complete as terminator, just a quick hack. If I see anything useful, I’ll let you know.

    Are you guys using git? I have to admit that when I found that the project wasn’t on github (and I couldn’t fork it there), my enthusiasm for possibly contributing a patch dropped off significantly.

  5. Mikel Lindsaar Says:

    @pistos: Good idea, fixed it :)

    @Mike: Let me know how it goes on Windows? I am having mixed reactions. On Git, no it’s a tar file right now. Maybe I’ll get off my butt and go gitify it :)

    Mikel

  6. Chuck Bergeron Says:

    This is super helpful, I have noticed some errors when using Ruby’s timeout on a typical XML call.

    Thanks Mikel!

  7. Justin Says:

    When I do this:

    begin
        Terminator.terminate :seconds => 2 do
            content = open(fix_http(uri))
        end
    rescue Terminator.error
        content = nil
    end

    it throws ‘Terminator::Error’ and terminates my web server process which is not the behavior I want :-).

    Any ideas?

  8. Mikel Lindsaar Says:

    @Justin, you need to:

    rescue Terminator::Error

    Instead of:

    rescue Terminator.error

    Mikel

  9. Ahsan Says:

    So .. a ruby process will be spawned for every request to a web service .. ?

  10. Ahsan Says:

    What if the call finishes within n seconds? SystemTimer doesn’t seem to work in that case, and Terminator seems to terminate the process completely even if the call (system “ls”) was successful.

    Any ideas ?

  11. Mikel Lindsaar Says:

    @Ahsan: That was working here. I just did a clean up on the initial code and added documentation for a 0.4.4 release. Maybe update and try again.

  12. Ahsan Says:

    @Mikel: Ok, I upgraded to 0.4.4, but the problem persists.

  13. Mikel Lindsaar Says:

    @Ahsan, could you pastie your code?

  14. Phil Says:

    The simple example of:

    begin Terminator.terminate(1) do sleep 3 end end

    rescue Terminator::Error
      puts "Timed out"

    does not work for me. I’m using rails 1.2.2, ruby 1.8.5, fattr 1.0.3, and terminator 0.4.4.

    Any help is appreciated.

  15. Doug Says:

    I’m having a similar problem as Phil. I can rescue Terminator::Error in Rails script/console but not when rails is being run with mongrel. Any

  16. Mikel Says:

    @Doug and @Phil

    Hmm… are you guys running on Windows?

    Mikel

  17. Doug Says:

    @Mikel – *Ubuntu (tried on both 7.10 and 8.04) *ruby 1.8.6 *Mongrel 1.1.5 *Rails 2.1.1

    I tried using a different signal in Terminator and commenting out the TERM trap in Mongrel both with no luck. Thanks for the gem, I really like it in theory ;). Doug
