ryah (four) wrote,
@ 2009-01-26 13:47:00
Entry tags: benchmark, rails, stats

nginx vs haproxy vs nginx+maxconn
This data includes two runs of nginx+maxconn (AKA nginx-ey-balancer), because I was testing whether commenting out a certain line of code affects the results (it doesn't).

The setup:

  • haproxy-1.3.15.5 (1 worker process, maxconn 1, minconn 1)
  • nginx-0.6.34 (1 worker process)
  • ngx_max_connections-0.0.3 + nginx-0.6.34 (1 worker process, maxconn 1)
The data was collected with ab -c 30 -t 120, i.e. 30 concurrent clients for 120 seconds. Behind the proxies are three Mongrel processes serving a Rails application (the stats for a single one are here). (raw data and config files)

When you have flaky Rails backend processes, nginx's behavior of dispatching requests to backends as soon as it receives them leads to erratic response times. Sometimes nginx responds quickly (it has the lowest minimum and first quartile of the three), but its third quartile and mean are the worst. Response time summaries in ms:
> nginx
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   18.0   138.0   418.0   534.2   814.0  3056.0 

> nginx+maxconn
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   44.0   412.0   480.0   493.8   557.0  2389.0 

> haproxy
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  247.0   413.0   475.0   479.9   537.0  2644.0 
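Because ab keeps a fixed number of requests in flight (each of the 30 clients sends its next request as soon as the previous one returns), Little's law ties these mean response times directly to throughput: X = N / R. A back-of-the-envelope sketch using the means from the tables above, assuming client-side overhead is negligible; the label nginx_maxconn is just a name chosen here:

```r
# Mean response times (ms) copied from the summaries above.
mean_ms <- c(nginx = 534.2, nginx_maxconn = 493.8, haproxy = 479.9)
clients <- 30   # ab -c 30: requests kept in flight at all times

# Little's law for a closed loop with zero think time: N = X * R, so X = N / R.
throughput <- clients / (mean_ms / 1000)   # requests per second
print(round(throughput, 1))
```

By this estimate nginx handles about 56 requests per second versus roughly 61 for nginx+maxconn and 62.5 for haproxy; the request counts in the raw data are the real check.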

For more consistent and, on average, faster response times, one should queue requests in the proxy and send them out to the backends one or two at a time.
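The effect of where the queue lives can be illustrated with a small simulation in R (the language of the plotting code in the comments). It is a toy sketch, not a reconstruction of this benchmark: it assumes open Poisson arrivals rather than ab's 30 closed-loop clients, three backends that each serve one request at a time in FIFO order, and a made-up service-time mix in which 5% of requests take 1000 ms and the rest take 50 ms. The hypothetical `round_robin` function models handing each request to the next backend on arrival (roughly nginx's default round-robin upstream), and `central_queue` models holding requests in the proxy and giving each one to the first backend that becomes idle (what maxconn 1 aims for).

```r
# Toy queueing simulation; every parameter here is a made-up assumption.
set.seed(1)
n        <- 20000                               # number of requests
backends <- 3                                   # e.g. three mongrels
arrival  <- cumsum(rexp(n, rate = 1 / 40))      # Poisson arrivals, mean gap 40 ms
service  <- ifelse(runif(n) < 0.05, 1000, 50)   # 5% "flaky" slow requests (ms)

# Dispatch on arrival: request i goes straight to backend ((i - 1) mod 3) + 1
# and waits in that backend's own queue.
round_robin <- function(arrival, service, backends) {
  free <- rep(0, backends)          # time at which each backend drains its queue
  done <- numeric(length(arrival))  # completion time of each request
  for (i in seq_along(arrival)) {
    k <- (i - 1) %% backends + 1
    done[i] <- max(arrival[i], free[k]) + service[i]
    free[k] <- done[i]
  }
  done - arrival                    # response time of each request (ms)
}

# Queue in the proxy: one FIFO; each request goes to whichever backend
# becomes idle first, so every backend holds at most one request.
central_queue <- function(arrival, service, backends) {
  free <- rep(0, backends)
  done <- numeric(length(arrival))
  for (i in seq_along(arrival)) {
    k <- which.min(free)            # first backend to become idle
    done[i] <- max(arrival[i], free[k]) + service[i]
    free[k] <- done[i]
  }
  done - arrival
}

rr <- round_robin(arrival, service, backends)
cq <- central_queue(arrival, service, backends)
print(rbind(round_robin = summary(rr), central_queue = summary(cq)))
```

Both strategies see exactly the same arrivals and service times, so any difference comes only from where requests wait. Varying the slow fraction or the arrival rate is an easy way to see how much the gap depends on service-time variance and load.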



(5 comments)

Great timing
geekfun
2009-02-03 07:43 pm UTC
I've been working with our devs to try and understand the performance of a Rails app they've been developing.

I was a little surprised to find that people relied on Mongrel to handle the queuing of requests. It seemed like it would be both more efficient and less variable to do it at the proxy layer. I now see that it is.

From eyeballing the area under the curves in your graphs, it looks as if both maxconn and haproxy could manage their queues better than they do, since their median response time is higher than that of plain nginx. Maybe I'm not thinking about it right, though.

Re: Great timing
four
2009-02-06 11:21 am UTC
These results come from dispatching one request at a time to each Rails backend. Legend has it that sending two at a time gives better results. I will experiment with the settings in a later post to figure out what is optimal.

Re: Great timing
jason_watkins
2009-06-05 12:09 am UTC
There's a general result in queueing theory that a single large queue typically performs better than several smaller parallel queues. This is because the single queue only pays the price of correlated high service times across the workers, whereas with multiple queues any jump in service time at one worker delays every request currently queued at that worker.

So while it may make intuitive sense to always have a request "on deck" for each worker to eliminate idle time, it really depends on the numbers. My guess is that the variations in service time (tens of ms) are so much larger than the time it takes to dispatch a request to an idle worker (likely under 1 ms on a LAN) that it actually isn't a good idea. Like anything with performance, it's always worth measuring to see what's really going on.

Ideally I'd like to see a two-part HTTP proxy where requests are queued at a front-end node (or nodes, for reliability) and worker nodes use work stealing to pass one request at a time to an HTTP backend. I haven't dug into this enough to know whether fuzed or haproxy could run in this mode.

Great analysis
(Anonymous)
2009-04-30 10:32 am UTC
It's a great analysis of this data.

I think you are using the ggplot2 package with R. Can you post the R scripts you used to draw these plots?

Thanks a bunch.

Re: Great analysis
four
2009-06-11 09:34 am UTC
Yes, it's R with ggplot2. Here is the code I use (it's messy):

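library(ggplot2)

# Load one ab results file (tab-separated; the seconds and ttime columns match
# ab's -g gnuplot output) and tag every row with a server name.
ab.load <- function (filename, name) {
  raw <- read.delim(filename, header = TRUE)
  raw$server <- name
  # Elapsed time since the first request. Dividing by 1e6 assumes the
  # "seconds" column holds microseconds.
  raw$time <- raw$seconds - min(raw$seconds)
  raw$time_s <- raw$time / 1e6
  raw
}

# Response time of every request over the run, one panel per server.
ab.tsPoint <- function (d) {
  ggplot(d, aes(x = time_s, y = ttime)) +
    geom_point(alpha = 1/15) +
    facet_grid(server ~ .) +
    ylim(0, 100) +
    labs(x = "time (s)", y = "response time (ms)", title = "c=30, res=26kb")
}

# Same data drawn as a line instead of points.
ab.tsLine <- function (d) {
  ggplot(d, aes(x = time_s, y = ttime)) +
    geom_line() +
    facet_grid(server ~ .) +
    ylim(0, 100) +
    labs(x = "time (s)", y = "response time (ms)", title = "c=30, res=26kb")
}

# Histogram of response times in 1 ms bins, cut off at xmax ms.
ab.histogram <- function (d, xmax) {
  ggplot(d, aes(x = ttime)) +
    geom_histogram(binwidth = 1) +
    facet_grid(server ~ .) +
    xlim(0, xmax) +
    labs(x = "response time (ms)", title = "c=30, res=123bytes")
}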
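The five-number summaries in the post look like the output of R's summary(). A minimal sketch of producing one per server from a data frame shaped like ab.load's result (a ttime column plus a server label); the response times below are invented for illustration and are not the benchmark data:

```r
# Made-up response times (ms), one row per request, labelled by server.
d <- data.frame(
  server = rep(c("nginx", "haproxy"), each = 5),
  ttime  = c(10, 20, 30, 40, 50, 25, 30, 35, 40, 45)
)

# One summary (min, quartiles, mean, max) per server, printed like the
# tables in the post.
by_server <- lapply(split(d$ttime, d$server), summary)
for (name in names(by_server)) {
  cat("> ", name, "\n", sep = "")
  print(by_server[[name]])
}
```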
