ryah ([info]four) wrote,
@ 2009-01-29 02:26:00
Entry tags:rant

Public Service Announcement: the "P" in "HTTP" stands for "protocol"
FastCGI is a special protocol for HTTP proxies to speak to backends. I think most are missing the absurdity of this. Let's review what happens:

  1. the (frontend) proxy server receives an HTTP request.
  2. it parses the HTTP and rewrites it into the FastCGI protocol
  3. the proxy transfers it to your backend process (PHP, Rails, etc)
  4. the backend process parses the FastCGI and processes the request
  5. the backend sends a reply to the proxy using FastCGI
  6. the proxy parses the FastCGI and rewrites it into HTTP
  7. the proxy sends the reply to the client.
This is idiotic because there is already a protocol for encoding HTTP messages: HTTP.

It's not like FastCGI is infinitely easier to parse or extremely more compact. Indeed, FastCGI is hard enough to parse that someone has invented SCGI (Simple CGI). If you notice, the headers and message body in SCGI are still sent in their original encoding. SCGI only saves a couple of characters (a few "\r\n" here and there, "HTTP/1.1", etc.), but at the cost of introducing an entirely new protocol. Even more ridiculous, Phusion Passenger (an Apache module which load balances Rails instances and restarts them when they die) has invented its own HTTP transfer protocol despite the popularity of the Mongrel HTTP server library in Ruby. Who knows how many others are doing this behind closed doors.
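
To make the size comparison concrete, here is a rough sketch that encodes the same small request both as plain HTTP and as SCGI. The request, the header values, and the helper names (`as_http`, `as_scgi`) are made up for illustration. The SCGI layout assumed here is the one in the published SCGI spec: a netstring of NUL-terminated name/value pairs, with `CONTENT_LENGTH` first and `SCGI` set to `1`.

```ruby
# Encode one small POST request as raw HTTP/1.1 and as SCGI, then print sizes.
# The request and the helper names are illustrative assumptions.

BODY = "name=ry"

def as_http(body)
  "POST /path HTTP/1.1\r\n" \
  "Host: example.com\r\n" \
  "Content-Length: #{body.bytesize}\r\n" \
  "\r\n" + body
end

def as_scgi(body)
  headers = [
    ["CONTENT_LENGTH", body.bytesize.to_s], # SCGI wants this header first
    ["SCGI", "1"],
    ["REQUEST_METHOD", "POST"],
    ["REQUEST_URI", "/path"],
    ["HTTP_HOST", "example.com"]
  ].map { |name, value| "#{name}\0#{value}\0" }.join
  # Netstring framing: "<length>:<headers>," followed by the body.
  "#{headers.bytesize}:#{headers},#{body}"
end

http = as_http(BODY)
scgi = as_scgi(BODY)
puts "HTTP: #{http.bytesize} bytes, SCGI: #{scgi.bytesize} bytes"
```

Either way the header names and values travel as plain text; only the framing around them changes.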

These "optimizations" only complicate things. HTTP is a fine protocol: an open standard which quite literally is becoming the backbone of our society. It is understood by countless pieces of software. It supports all sorts of features. It's even compact when compared with things like XMPP.

There are well-tested, full-featured, fast, free, stand-alone, easy-to-use HTTP parsers out there. There is no excuse to use SCGI, FastCGI, or to invent your own HTTP-transferring protocol.

So please, the next time you're thinking about which protocol to use to transfer HTTP messages, just choose HTTP.



(45 comments) - (Post a new comment)


[info]joneshead
2009-01-29 06:04 am UTC (link)
Dear Ry,

As often as I've said it before - the arcane nature of this topic continues to befuddle me, just as I'm sure a baptist preacher does a trout.

I will end this note with an appropriate non sequitur:

Give a man fire, he will be warm for a night; set a man on fire and he will be warm for the rest of his life.

(Reply to this)


[info]hypostatization
2009-01-29 01:45 pm UTC (link)
Landed here via reddit, long time ry.

(Reply to this) (Thread)


[info]four
2009-01-29 03:46 pm UTC (link)
:*

(Reply to this) (Parent)

Yes but
(Anonymous)
2009-01-29 01:49 pm UTC (link)
FastCGI is FAST!!!!!11!1111111

(Reply to this) (Thread)

Re: Yes but
(Anonymous)
2009-01-29 03:02 pm UTC (link)
It's also CGI.

(Reply to this) (Parent)


[info]nikolasco
2009-01-29 02:49 pm UTC (link)
Dood, I just wanna be cool and design protocols and write parsers, like IETF do.

Seriously, this isn't even necessary for X-Sendfile functionality (just have the proxy handle it). These protocols seem to be less about actual protocols and more about process management, which could be done completely externally (e.g. the proxy checks for a pulse, something else handles restarts) or internally (limited to the local machine; I'm not aware of an implementation).

(Reply to this)


(Anonymous)
2009-01-29 03:13 pm UTC (link)
Figuratively. Society doesn't have a backbone. Or any bones.

(Reply to this)


(Anonymous)
2009-01-29 03:15 pm UTC (link)
If the proxy just passes the HTTP message without any modification, then the back-end will have to know everything about this protocol. For example, it will have to distinguish between the real and virtual servers. What about HTTP compression? What about HTTPS? What about a different HTTP version?

(Reply to this) (Thread)


[info]codetoad
2009-01-29 10:55 pm UTC (link)
This is the answer.

(Reply to this) (Parent)


[info]baxil
2009-01-29 11:30 pm UTC (link)
Umm ... so instead we'll abstract it out, and now the back-end has to know everything about FastCGI. Which leads us back to the original post:
It's not like FastCGI is infinitely easier to parse or extremely more compact. Indeed, FastCGI is hard enough to parse that someone has invented SCGI (Simple CGI). If you notice, the headers and message body in SCGI are still sent in their original encoding. SCGI only saves a couple of characters (a few "\r\n" here and there, "HTTP/1.1", etc.), but at the cost of introducing an entirely new protocol.

(Here via Reddit, btw.)

(Reply to this) (Parent)


[info]unnes
2009-01-29 03:21 pm UTC (link)
Ryah on Reddit? Who'da thunk.

(Reply to this)

macournoyer
[info]macournoyer.com
2009-01-29 03:23 pm UTC (link)
Amen!
Sad that CGI headers have become the standard. HTTP_HOST instead of Host, that clearly is legacy.

(Reply to this)


(Anonymous)
2009-01-29 03:35 pm UTC (link)
For someone being pedantic I'm surprised you can't use the word 'literally' correctly. Our society does not *literally* have a backbone of http, it *figuratively* does.

(Reply to this) (Thread)


(Anonymous)
2009-01-29 04:02 pm UTC (link)
Using "literally" as emphasis dates back to at least Mark Twain, who used it himself. It's called hyperbole. It's fair to say that it's cliché, but it's certainly not ungrammatical or an incorrect usage. (Except insofar as all hyperbole is, which is not very.)

(Reply to this) (Parent)(Thread)


(Anonymous)
2009-01-29 06:34 pm UTC (link)
It's not even hyperbole. His statement is simply correct; "literally" modifies "becoming."

Literally, adverb: 3. true to fact; actually; without exaggeration or inaccuracy.
Backbone, noun: 3. A main support or major sustaining factor.

The author's sentence could be rewritten: "...an open standard which [truly] is becoming the [major sustaining factor] of our society."

(Reply to this) (Parent)(Thread)


(Anonymous)
2009-01-30 04:36 am UTC (link)
It is an exaggeration, which is precisely why it is misused.

(Reply to this) (Parent)(Thread)


(Anonymous)
2009-01-30 05:19 am UTC (link)
Seriously. It's not much of an exaggeration, if at all. Even if it is, since when is exaggeration considered a misuse of language?

(Reply to this) (Parent)

FastCGI for stateful CGI
(Anonymous)
2009-01-29 04:07 pm UTC (link)
I thought one of the main reasons to use FastCGI was that CGI is stateless, so you can't initialize anything or maintain any initialization state. FastCGI does allow you to initialize things and use them for each request. I've found that to be useful and don't really care about the protocol used.

(Reply to this) (Thread)

Re: FastCGI for stateful CGI
[info]hattmoward
2009-01-29 06:51 pm UTC (link)
FastCGI is not for "stateful CGI", nor does having some stuff initialized before accepting requests make your program stateful. It just avoids repeating all the work done up until the point you process a CGI request. You have no guarantee that the next FCGI request is from the same client, so you have to pass a session ID and recover session information in order to fake stateful behavior - just like a CGI app.

FastCGI had a specific niche it wanted to fill - it didn't replace, rewrite, or reimplement HTTP because that was outside its scope. FastCGI is an alternative to CGI, not HTTP - its niche was to make CGI scripts easy to retrofit in order to avoid the cost of process startup.

If one wanted to go outside that scope, there are and have been many alternatives to FastCGI. They knew that, and they point out the narrow scope in their design. I don't know about the other projects mentioned, though.

(Reply to this) (Parent)

Other information
[info]ianbicking
2009-01-29 04:29 pm UTC (link)
FastCGI (and SCGI etc) have the ability to send extra data that is not HTTP headers. For instance, REMOTE_USER, or a value like HTTPS=on. You can send these through headers like X-Forwarded-Scheme, but there's no standard for those headers, not even documented conventions, and it introduces worries that someone who gets access to the underlying proxied server might be able to forge these values.
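
One common way to handle the forgery worry over plain HTTP is for the proxy to drop any client-supplied copies of these headers and then set them itself. A minimal sketch, assuming headers are held in a Hash; the function name `forward_headers`, the list of protected names, and the use of `X-Forwarded-For` are illustrative choices, not a standard:

```ruby
# Headers that only the proxy may set. The exact list is an assumption.
PROXY_ONLY = %w[x-forwarded-scheme x-forwarded-for remote-user].freeze

# Build the header set the proxy sends to the backend.
def forward_headers(client_headers, scheme:, client_ip:)
  # Discard anything the client tried to smuggle in under a protected name.
  safe = client_headers.reject { |name, _| PROXY_ONLY.include?(name.downcase) }
  # Add the values the proxy itself observed.
  safe.merge("X-Forwarded-Scheme" => scheme, "X-Forwarded-For" => client_ip)
end

incoming = { "Host" => "example.com", "x-forwarded-scheme" => "https" }
outgoing = forward_headers(incoming, scheme: "http", client_ip: "203.0.113.7")
p outgoing
```

This only helps if the backend is reachable solely through the proxy; anyone who can connect to the backend directly can still send whatever headers they like.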

Also, conventional HTTP implementations are only set up to use inet sockets, not named sockets or pipes. Of course this isn't really HTTP's fault, it can work over any socket, but generally proxying adds an extra complication of managing a set of local ports.

(Reply to this) (Thread)

Re: Other information
[info]four
2009-01-29 04:35 pm UTC (link)
HTTP can work perfectly fine over UNIX sockets. Nginx and Thin pair up and do this regularly.
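
As a minimal sketch of the idea (not how Nginx or Thin actually implement it), the snippet below pushes an ordinary HTTP request and response over a pair of connected UNIX-domain sockets. `UNIXSocket.pair` stands in for a real named socket, and the "proxy" and "backend" roles are just two ends in one process:

```ruby
require "socket"

# A connected pair of UNIX-domain sockets stands in for the proxy/backend link.
proxy_side, backend_side = UNIXSocket.pair

# "Proxy": pass the client's request bytes along untouched.
proxy_side.write("GET /hello HTTP/1.1\r\nHost: example.com\r\n\r\n")

# "Backend": read the request head (up to the blank line), answer in HTTP.
head = backend_side.gets("\r\n\r\n")
request_line = head.lines.first.chomp
body = "you asked for: #{request_line}\n"
backend_side.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n" + body)
backend_side.close

# "Proxy": relay whatever came back.
response = proxy_side.read
proxy_side.close
puts response
```

Nothing in the message format changes when the transport does, so no local TCP ports need to be allocated.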

Adding headers to the request to convey extra information is perfectly valid. Although I agree that there are some security considerations, it does not warrant a new protocol.

(Reply to this) (Parent)


(Anonymous)
2009-01-29 05:47 pm UTC (link)
STFUUUUUUUUUUUUUUUUUUUUUUUUUUUU!!!!!!!!!!!!!!!!!!!!!!!!!ONE

(Reply to this)

I've been saying this for years
(Anonymous)
2009-01-29 06:27 pm UTC (link)
@work, we've been building production systems using *only* HTTP as a communication layer (between proxy/apache & app server) for over 3 years.

I never understood why you'd want to introduce another protocol in your stack when you've got mod_proxy & mod_rewrite.

So thanks!

(Reply to this) (Thread)

Re: I've been saying this for years
(Anonymous)
2009-01-29 07:38 pm UTC (link)
In many countries they use a really screwed up and insanely complicated protocol known as HL7 to transmit records around hospitals etc. ("level 7" is a reference to the OSI stack).

Use of HL7 began just before the Internet took off, but refuses to die in favor of XML/HTTP. One of the reasons your medical insurance premiums are so high is that they waste billions of dollars on rubbish software.

(Reply to this) (Parent)(Thread)

Re: I've been saying this for years
(Anonymous)
2009-01-29 09:50 pm UTC (link)
Methinks you are not familiar with HL7 v3. And no, billions are not spent on "rubbish" software. Billions are spent on life_saving_hardware.

(Reply to this) (Parent)(Thread)

Re: I've been saying this for years
(Anonymous)
2009-01-30 06:30 am UTC (link)
I am familiar with version 3 actually. The CDA might have something going for it as a document standard but it's still rubbish as a messaging protocol. HTTP works brilliantly for the rest of us.

Methinks you are not familiar with the damning reports of the National Audit Office and Public Accounts Committee on the UK £12.7bn National Programme for IT (NPfIT). It is currently running 4 years late with no end in sight, and very little of value produced.

I have worked in the health IT area for about 15 years, and I also know how to do a proper cost utility analysis. I know most of the "life saving" claims for health IT systems are pure bullshit, with numbers for the alleged benefits plucked out of the air. It's not just in the UK but all over the world.

(Reply to this) (Parent)


[info]hattmoward
2009-01-29 06:58 pm UTC (link)
You seem to have forgotten a major offender in your rant:



CGI is a special protocol for HTTP proxies to speak to backends. I think most are missing the absurdity of this. Let's review what happens:

1. the (frontend) proxy server receives an HTTP request.
2. it parses the HTTP and rewrites it into the CGI 1.1 interface
3. the proxy creates the backend process and transfers the request (bash, Perl, PHP, Rails, etc)
4. the backend process parses the CGI request and processes the request
5. the backend sends a reply to the proxy using CGI
6. the proxy parses the CGI and rewrites it into HTTP
7. the proxy sends the reply to the client.

This is idiotic because there is already a protocol for encoding HTTP messages: HTTP.

It's not like CGI is infinitely easier to parse or extremely more compact.
[...]



(Though if you think CGI, FastCGI, or SCGI are egregious, look up the old WinCGI)

(Reply to this) (Thread)


[info]four
2009-01-29 07:02 pm UTC (link)
no one actually uses cgi anymore

(Reply to this) (Parent)(Thread)


[info]hattmoward
2009-01-29 08:07 pm UTC (link)
Who is still using FastCGI in a production environment where the cost of an added protocol matters? Who is using SimpleCGI at all? They all do the same general thing, regardless of how many use them. Why not rant against them all?

Nobody questions the validity of CGI in its place, mostly because we've already moved on. People have been using mod_php, mod_perl, reverse proxies, in-process app servers, and similar systems for nearly a decade now. If you're ranting about the redundancy and cost of massaging HTTP requests, isn't it a bit late? If they're the current offender, why not just send an inquisitive note to the Phusion Passenger mailing list?

(Reply to this) (Parent)(Thread)


[info]four
2009-01-29 08:17 pm UTC (link)
FastCGI, SCGI - these technologies are quite in use. (Most notably with PHP, I think.)

(Reply to this) (Parent)

History lesson
(Anonymous)
2009-01-29 07:40 pm UTC (link)
If you dig into the history of FastCGI a bit, its original purpose was to get around the original CGI spec way back in the mid-90s. The problem was that every web request to a CGI script launched a new copy of that script. For example, if you had 100 simultaneous requests for a perl script, you'd have 100 copies of that program running (starting up, running, exiting = USELESS OVERHEAD) on your server. At the time there was no way with then-available technology to connect a CGI (common gateway interface) request to an already running server (at least in a generic way).

Now, let's say you've got a very large CGI script, say on the order of 20-30 megs (yes, they existed back then). The startup time alone would completely kill your performance. Heck, just loading that script into memory would take a few seconds, never mind running, making database calls and exiting (so the webserver could process the output).

FastCGI (which actually is an appropriate name because it's a more efficient version of CGI) is NOT a protocol, it's an interface, and was designed to simply have a generic way to pass control of the request from the webserver to an already running backend process (Perl, Tcl, etc) that would handle the request and remain running (no startup/shutdown penalty). HTTP never really came into the equation because that's what the webserver did: it talked HTTP to the client (which doesn't really care HOW the request got served). We're talking about the other end of the server entirely. Not doing things in this way would have required building your own webserver (which OpenMarket actually did and did well).

What's kinda shocking to me is that FastCGI is still in use today. Hope someone finds this interesting!

(Reply to this) (Parent)


[info]xtat
2009-01-29 08:57 pm UTC (link)
Do you feel the same way about AJP? AFAIK the argument in favor of AJP is that it will reduce bandwidth between frontend/backend.

I am not sure of the merits, but surely end-to-end HTTP would be easier to manage.

(Reply to this) (Thread)

Re: Phusion Passenger
[info]http://claimid.com/foobarwidget
2009-01-29 09:31 pm UTC (link)
"Even more ridiculous, Passenger Phusion (an apache module which load balances Rails instances and restarts them when they die) has invented their own HTTP transfer protocol despite the popularity of the Mongrel HTTP server library in Ruby. Who knows how many others are doing this behind closed doors."

Ridiculous you say? Do you know how much effort is needed to write an HTTP parser? You say "It's not like CGI is infinitely easier to parse or extremely more compact". SCGI may not be extremely more compact, but it *is* infinitely easier to parse.

Just take a look at the HTTP grammar, it's huge. You need to write a full-blown recursive-descent parser, or another parser which can handle context-free grammars. In case of a recursive-descent parser you will need to write a tokenizer. This is a lot of code and is easy to get wrong. And when you're wrong, it can easily turn into a security vulnerability. Just take a look at the number of HTTP parsing vulnerabilities that Apache has had in its early days. Mongrel has an army of fuzz tests for its HTTP parser.

Or, we can bypass all this code and all this complexity by inventing our own protocol.

The Phusion Passenger backend protocol was designed to mimic HTTP while being as simple as possible - even simpler than SCGI. The protocol parser in Ruby consists of 1 (!) line of code and is extremely fast. Compare this to the amount of code necessary to write an HTTP parser or even an SCGI parser. In the case of SCGI I still have to parse the netstring length component, which means buffering data indefinitely until the end of the length component has been reached. Not so with Phusion Passenger's internal protocol - the length component is just a 32-bit big-endian integer in binary format.
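
For readers who have not seen this style of framing, here is a rough sketch of a length-prefixed message. It is not Passenger's actual wire format: the layout (a 4-byte big-endian length followed by NUL-terminated key/value strings) is an assumption pieced together from the description in this thread, and `encode_message` / `decode_message` are hypothetical names.

```ruby
# Hypothetical framing: 4-byte big-endian length, then "key\0value\0..." pairs.
# Keys and values must not contain NUL bytes.
def encode_message(pairs)
  payload = pairs.map { |key, value| "#{key}\0#{value}\0" }.join.b
  [payload.bytesize].pack("N") + payload
end

def decode_message(data)
  length = data.unpack1("N")            # fixed-size prefix, no scanning needed
  payload = data.byteslice(4, length)
  # Limit -1 keeps empty values; the final element is the empty string left
  # after the last NUL, so it is dropped.
  payload.split("\0", -1)[0...-1].each_slice(2).to_h
end

message = encode_message("REQUEST_METHOD" => "GET", "PATH_INFO" => "/", "QUERY_STRING" => "")
decoded = decode_message(message)
p decoded
```

Because the prefix has a fixed size, a reader knows after four bytes exactly how many more to wait for, which is the buffering advantage over a netstring described above.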

We chose this protocol to minimize the amount of code that we write. If you can show me how to write an HTTP parser in 1 line of code, and the parser is just as fast as our current protocol parser, then maybe I'll consider using HTTP.

(Reply to this) (Parent)(Thread)

Re: Phusion Passenger
[info]four
2009-01-29 09:58 pm UTC (link)
in ruby it's as easy as

require 'mongrel'
p = Mongrel::HttpParser.new
p.execute(data)

Please benchmark it. It's quite efficient. I've written an extension to Zed's parser which allows for easy keep-alive processing and chunked body requests - i linked to it above. This parser has a ruby binding:
http://github.com/ry/flow/blob/4672b8271016d96662f7c795dab67c7c60905f6d/ext/flow_parser.c


even without fancy ragel parsers one can hack out a solution with .split("\r\n")
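
A minimal sketch of that kind of hack, for illustration only. `naive_parse` is a made-up name, and the function deliberately ignores folded header lines, chunked bodies, and malformed input:

```ruby
# Deliberately naive HTTP request parsing built on String#split.
def naive_parse(raw)
  head, body = raw.split("\r\n\r\n", 2)
  request_line, *header_lines = head.split("\r\n")
  verb, path, version = request_line.split(" ", 3)
  headers = header_lines.map do |line|
    name, value = line.split(":", 2)
    [name, value.to_s.strip]
  end.to_h
  { verb: verb, path: path, version: version, headers: headers, body: body.to_s }
end

request = naive_parse(
  "POST /path HTTP/1.1\r\nHeaderField1: Value1\r\nHeaderField2: Value2\r\n\r\nbody"
)
p request
```

This is fine for poking at well-formed requests from a trusted proxy; it is not a substitute for a real parser on anything exposed to clients.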

(Reply to this) (Parent)(Thread)

Re: Phusion Passenger
[info]http://claimid.com/foobarwidget
2009-01-29 10:52 pm UTC (link)
I don't think making Passenger dependent on Mongrel is a good idea. I suppose I could extract his HTTP parser and integrate it into Passenger, but then again I can invent my own format and write a parser for it in 1/10th of the time.

Generating a message in Passenger's internal format is almost trivial. I don't have to deal with escaping at all, as is the case with HTTP; I just have to ensure that my data doesn't contain NUL bytes.

Furthermore, split("\r\n") is a hack, as you already said. It works for the simplest cases, but it won't take long before someone h4x0rs your web server because of a flaw in your parser. Writing a full-blown HTTP parser is far from trivial.

All in all, I see no advantage in using HTTP for Passenger's backends.

(Reply to this) (Parent)(Thread)

Re: Phusion Passenger
[info]four
2009-01-29 11:09 pm UTC (link)
All in all, I see no advantage in using HTTP for Passenger's backends.

it works quite well now - so why change it. :)

sorry to pick on your software - it was just the latest thing i had looked at.

(Reply to this) (Parent)

Re: No it's not
[info]nikolasco
2009-01-30 01:15 am UTC (link)
What part of RFC2616's grammar requires a pushdown automaton? I agree that you need some sort of variable/template support in your parser generator to make your life convenient, but that's it. I don't even think the look-ahead requirement is large (distinguishing date formats and headers of interest being the only serious ones).

(I'm not trying to pick on you. I'm just curious about these things.)

(Reply to this) (Parent)

Waka?
(Anonymous)
2009-01-29 09:32 pm UTC (link)
And on that note, how about Waka (http://en.wikipedia.org/wiki/Waka_(protocol))?

(Reply to this) (Parent)

No it's not
(Anonymous)
2009-01-29 11:42 pm UTC (link)
FastCGI is a special protocol for HTTP proxies to speak to backends.

The 'I' stands for interface, not protocol. And the 'G' for gateway, not proxy.

Move back 7 spaces.

(Reply to this)


(Anonymous)
2009-01-30 01:59 am UTC (link)
It's not like FastCGI is infinitely easier to parse or extremely more compact.


"Infinitely", no. However, the HTTP/1.1 spec (http://www.rfc-editor.org/rfc/rfc2616.txt) is 176 pages long, whereas the FastCGI spec (http://www.fastcgi.com/devkit/doc/fcgi-spec.html) is about 20. So I don't see how you can seriously imply that FastCGI isn't vastly simpler than HTTP.

Incidentally, SCGI (http://python.ca/scgi/protocol.txt) is almost two pages long and is way easier to parse.

(Reply to this) (Thread)


[info]four
2009-01-30 02:15 am UTC (link)
fastcgi still sends all the headers exactly as they were received. this is what rfc 2616 is mostly describing. fcgi does not abstract away, say, the "Accept-Language" header.

the basic layout of a http request: "POST /path HTTP/1.1\r\nHeaderField1: Value1\r\nHeaderField2: Value2\r\n\r\nbody" is rather simple.

(Reply to this) (Parent)


[info]four
2009-01-30 02:19 am UTC (link)
SCGI spec is basically "take everything in rfc 2616. now replace ':' with \0 and \r\n with \0" that's why it's so short - that's also why it's so pointless.

(Reply to this) (Parent)(Thread)


[info]http://claimid.com/foobarwidget
2009-01-31 09:15 am UTC (link)
Not quite. If you look at the HTTP message format (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4), you'll see that a field-value can contain an LWS -- a newline + tab/space indentation -- which is to be ignored. This alone makes HTTP harder to parse than SCGI, the latter of which is just a matter of splitting the data with a \0 delimiter. A naive HTTP parser which splits the data with \r\n is not compliant with the specification and can easily cause security problems if one's not careful.
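
A small sketch of the folding problem, using a made-up `X-Note` header. It assumes the RFC 2616 rule that a CRLF followed by spaces or tabs continues the previous header line and may be replaced by a single space; the helper name `unfold` is hypothetical.

```ruby
raw_head = "GET / HTTP/1.1\r\n" \
           "X-Note: first part\r\n" \
           "\tsecond part\r\n" \
           "Host: example.com"

# Naive splitting treats the continuation as a separate (bogus) line.
naive_lines = raw_head.split("\r\n")

# Unfold first: CRLF followed by spaces/tabs collapses to a single space.
def unfold(head)
  head.gsub(/\r\n[ \t]+/, " ")
end

unfolded_lines = unfold(raw_head).split("\r\n")

puts "naive:    #{naive_lines.length} lines"
puts "unfolded: #{unfolded_lines.length} lines"
```

The unfolding step rewrites the buffer, which is why it gets in the way of a zero-copy parser.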

Furthermore, SCGI has the advantage that all keys and values are essentially null-terminated. This means that I can easily write an SCGI parser with a zero-copy architecture. This is much harder to do with HTTP because of the LWS requirement.

(Reply to this) (Parent)(Thread)


[info]four
2009-01-31 11:08 am UTC (link)
sure - my point is that rfc 2616 describes a lot more than just the message format. saying "SCGI spec is 2 pages and rfc2616 is 176 pages, therefore SCGI is 88 times easier" is ridiculous. this is all rather moot since there are already http parsers written. the whole point is: this problem has been solved

(Reply to this) (Parent)


(Anonymous)
2009-01-30 08:52 am UTC (link)
HTTP/1.1 is in fact insanely hard to parse (correctly). Just look at the kinds of encodings, compression, and chunks. (For this reason HTTP/1.1 is a *bad* standard. But it's a standard nevertheless, we're stuck with it.)

(Reply to this) (Parent)

