ryah ([info]four) wrote,
@ 2009-11-21 16:45:00
Previous Entry  Add to memories!  Share this!  Next Entry
New HTTP Parser
I've implemented a new HTTP/1.1 request and response parser by hand. (My previous parser was written with the help of Ragel.) It requires 124 bytes per HTTP connection, makes zero allocations, has no dependencies, is nearly optimal in its use of CPU instructions, interruptible on any character, has extensive tests, and is MIT licensed.

README

http_parser.h

http_parser.c



(Only one user at the moment: I've just merged it into Node.)



(32 comments) - (Post a new comment)


[info]bachan
2009-11-23 05:43 pm UTC (link)
hahaha, yeah, i can dig it!
that code was a bit of a laaad. :)

(Reply to this)


(Anonymous)
2009-11-25 01:48 pm UTC (link)
congratulations! you are one step closer to dreaming exclusively in hex.

(Reply to this)


(Anonymous)
2009-12-01 03:27 am UTC (link)
I like how simple this is. Very nice.

However, it looks like this only supports PUT, POST, DELETE, HEAD, and GET. What about OPTIONS, CONNECT, and TRACE (or other random HTTP verbs we might want to invent in the future)?

(Reply to this) (Thread)


[info]four
2009-12-01 03:56 am UTC (link)
i think extension verbs are a mistake. i might add support for webdav stuff, but i see the point in supporting extensions (seems like too much overhead)

(Reply to this) (Parent)(Thread)


[info]isaacschlueter
2009-12-01 04:35 am UTC (link)
I'm not sure if I agree that they're a mistake, but that's a completely valid answer. You could add the 3 missing verbs to call it spec-complete without adding too much to the code in terms of size or speed, and keeping the complexity down is a huge win.

(Reply to this) (Parent)


[info]isaacschlueter
2009-12-01 03:29 am UTC (link)
Sorry, that last comment was from me. Forgot to sign in.

(Reply to this) (Thread)


[info]four
2009-12-01 03:56 am UTC (link)
see above

(Reply to this) (Parent)

vs Ragel
[info]http://claimid.com/foobarwidget
2009-12-01 08:54 am UTC (link)
Interesting, but how is this better than the Ragel parser?

(Reply to this)

Why
(Anonymous)
2009-12-01 09:13 am UTC (link)
Why did you do this?

(Reply to this)


(Anonymous)
2009-12-01 01:26 pm UTC (link)
Now do it in assembler :-)

(Reply to this)


[info]s1m
2009-12-01 05:51 pm UTC (link)
> is nearly optimal in its use of CPU instructions

do you really think re-entering the main loop 8 times and then doing 8 if's to match the first 8 bytes of data ('HTTP/1.1') is "nearly optimal" in terms of CPU instructions? :/

(Reply to this) (Thread)


[info]comnimh
2009-12-01 08:31 pm UTC (link)
Agreed.
You should compare first four bytes of a buffer with 0x48545450 for HTTP, 0x504f5354 for POST, etc.

(Reply to this) (Parent)(Thread)


[info]s1m
2009-12-01 09:15 pm UTC (link)
something like that, yes. depends on the CPU.

on a 64-bit machine a single compare is needed to determine whether or not it's a "HTTP/1.1" request. 1 extra compare to determine that it's a rare "HTTP/1.0" request.

one could do DWORD compare for GET/POST (most common), or do a single masking int64 compare on a 64-bit machine to match any of the available HTTP methods.

CRLF can be matched as a single WORD compare.

c = LOWER(ch);
if (c >= 'a' && c <= 'z') break;

can easily be a single compare by checking a masked set of flags.

i could probably go on for a while, but i just wanted to point out that "nearly optimal in its use of CPU instructions" might be a bit of a stretch here :/

(Reply to this) (Parent)(Thread)


[info]comnimh
2009-12-01 09:18 pm UTC (link)
pearls before swine is excellent!

(Reply to this) (Parent)


(Anonymous)
2009-12-08 03:16 pm UTC (link)
I'm nitpicking, but HTTP/1.0 requests may not be rare. For instance nginx only talks HTTP/1.0 to proxied backends.

(Reply to this) (Parent)(Thread)


[info]s1m
2009-12-08 06:24 pm UTC (link)
it's a valid concern. a specialized web server could check for 1.0 first, but a generic one is better off assuming it's 1.1 and then checking if its wrong.

(Reply to this) (Parent)


[info]four
2009-12-08 03:23 pm UTC (link)
yes, a couple tens or maybe hundreds of instructions can be saved per request. most come at the cost of making buffering assumptions.

I'll probably do the bitwise compare for http methods and "HTTP/1.1" at some point like nginx does: http://lxr.evanmiller.org/http/source/http/ngx_http_parse.c?v=nginx-0.7.62#L161

(Reply to this) (Parent)(Thread)


[info]s1m
2009-12-08 06:21 pm UTC (link)
a bit more than that. besides, the reason to have a super fast HTTP parser is to process a lot of requests, right? once you get over 20k RPS it adds up quickly.

it's all relative, of course. i would say that most of the performance gains in your parser could come from reducing the frequency of call backs and improving headers matching strategy.

btw, "buffering assumptions" can speed up parsing greatly as well. when i undertook a similar exercise i wouldn't parse a line until i found CRLF, that made a pretty significant difference. it helped matching dramatically, i used a custom hashing function for the header pattern matching, that allowed me to match most of the common used headers by a several bit-wise operations and an indirect table jump.

(Reply to this) (Parent)(Thread)


[info]four
2009-12-08 06:34 pm UTC (link)
i would like to write a wrapper around this parser which NULLs most callbacks and buffers the request - say the first 8kb. the mark/length elements (for the path, for example) could be read in on_headers_complete without needing a callback. that, combined with nginx strcmp above should be hard to beat. i like having the incremental property for the trivial case - makes throwing together a web server/client very easy.

(Reply to this) (Parent)(Thread)


[info]s1m
2009-12-08 06:51 pm UTC (link)
most of the requests should fit in 8kb, so it's a good assumption.

again, the strategy i used revolved around finding the end of the line (CRLF) and then going from there, rather than reading from the input stream by one byte.

if i reached the end of the buffer before i reached a CRLF i'd copy the unmatched data to the start of the buffer and request a read from the resulting offset.

once the line was completed i could use better methods of parsing it, such as 64/32 bit compares and custom hashing for the headers.

i am not claiming my parser was the fastest, i basically stopped optimizing it as soon as i could fill a gigabit ethernet pipe without using more than 2-3% of the CPU back in 2003.

(Reply to this) (Parent)

Doesn't compile with VS2005
(Anonymous)
2009-12-29 01:46 am UTC (link)
Apparently VS2005 has no support for C99 which I guess is the main reason it won't compile. Any plans to incorporate support for VS2005 users?

Thanks

(Reply to this) (Thread)

Re: Doesn't compile with VS2005
[info]four
2009-12-31 12:23 am UTC (link)
i would gladly accept a patch

(Reply to this) (Parent)


(Anonymous)
2010-01-04 07:52 pm UTC (link)
good day. I have a problem when building node_postgres.
after node-waf configure build error

[1 / 2] cxx: binding.cc -> build/default/binding_1.o
.. / binding.cc: In member function 'bool Connection:: Connect (const char *)':
.. / binding.cc: 70: error: 'Attach' was not declared in this scope
.. / binding.cc: In member function 'void Connection:: Close (v8:: Local )':
.. / binding.cc: 115: error: no matching function for call to 'Connection:: Emit (const char [6], int, NULL)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle , int, v8:: Handle
[Error: Irreparable invalid markup ('<v8::>') in entry. Owner must fix manually. Raw contents below.]

good day. I have a problem when building node_postgres.
after node-waf configure build error

[1 / 2] cxx: binding.cc -> build/default/binding_1.o
.. / binding.cc: In member function 'bool Connection:: Connect (const char *)':
.. / binding.cc: 70: error: 'Attach' was not declared in this scope
.. / binding.cc: In member function 'void Connection:: Close (v8:: Local <v8::Value>)':
.. / binding.cc: 115: error: no matching function for call to 'Connection:: Emit (const char [6], int, NULL)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
.. / binding.cc: 117: error: no matching function for call to 'Connection:: Emit (const char [6], int, v8:: Local <v8::Value> *)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
.. / binding.cc: 119: error: 'Detach' was not declared in this scope
.. / binding.cc: In member function 'void Connection:: MakeConnection ()':
.. / binding.cc: 300: error: no matching function for call to 'Connection:: Emit (const char [8], int, NULL)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
.. / binding.cc: In member function 'void Connection:: EmitResult (PGresult *)':
.. / binding.cc: 431: error: no matching function for call to 'Connection:: Emit (const char [7], int, NULL)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
.. / binding.cc: 436: error: no matching function for call to 'Connection:: Emit (const char [7], int, v8:: Local <v8::Value> *)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
.. / binding.cc: 443: error: no matching function for call to 'Connection:: Emit (const char [7], int, v8:: Local <v8::Value> *)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
.. / binding.cc: 450: error: no matching function for call to 'Connection:: Emit (const char [7], int, v8:: Local <v8::Value> *)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
.. / binding.cc: In member function 'void Connection:: Event (int)':
.. / binding.cc: 481: error: no matching function for call to 'Connection:: Emit (const char [6], int, NULL)'
/ usr / local / include / node / node_events.h: 17: note: candidates are: bool node:: EventEmitter:: Emit (v8:: Handle <v8::String>, int, v8:: Handle <v8:: Value> *)
/ usr / local / include / node / node_events.h: 20: note: static v8:: Handle <v8::Value> node:: EventEmitter:: Emit (const v8:: Arguments &)
Waf: Leaving directory `/ www / devel / node / node_postgres / build '
Build failed
-> Task failed (err # 1):
(task: cxx binding.cc -> binding_1.o)

node and node_postgres from github, the last

(Reply to this)


[info]smystery
2010-01-08 11:15 am UTC (link)
Interesting code very nice.
thanks for http_parser.h link


(Reply to this)

Not detecting Proxy-Connection header...
(Anonymous)
2010-01-09 12:31 am UTC (link)
You mention using the parser for a proxy (which I am trying to); shouldn't the parser then detect the Proxy-Connection header to determine when the request is complete?

(Reply to this) (Thread)

Re: Not detecting Proxy-Connection header...
[info]four
2010-01-09 09:15 am UTC (link)
You're right. I should support that. Does this work for you?

(Reply to this) (Parent)

Callbacks returning non-0...
(Anonymous)
2010-02-02 12:14 am UTC (link)
Hi, I noticed that the parser state is not maintained properly when an event handler returns non-0; is this intentional or a bug?

I want to temporarily stop parsing after having received the Url portion (by having my on_url callback return 1), then resume parsing a bit later. This does however not seem to be working as the parser state is still s_start_req in the subsequent call to http_parse_requests().

The problem seems to be that the CALLBACK() macro more or less bails out when it encounters an event handler that returns non-0, so the stack variable 'state' is never copied back to the http_parser structure...

(Reply to this) (Thread)

Re: Callbacks returning non-0...
[info]four
2010-02-02 12:21 am UTC (link)
that was not an intended feature. do you want to try to patch it? (I guess just copy state onto the parser struct in the CALLBACK?)

(Reply to this) (Parent)(Thread)

Re: Callbacks returning non-0...
(Anonymous)
2010-02-02 12:43 am UTC (link)
I tried that, but unfortunately most of the code in the main parse() function sets the local 'state' _after_ calling the callback(s), so some reordering would be needed throughout in order to make it functional.
What is the reason for having a local 'state' variable rather than using the one from http_parser structure? (I guess performance...). Also would there be other local variables (i.e. header_state, index) that needed to be copied over?

(Reply to this) (Parent)

Header callbacks: same or new header...
(Anonymous)
2010-04-22 12:06 am UTC (link)
As for the extra bookkeeping that the client has to do in order to figure out whether any of the header callbacks are called on the same or a new header, wouldn't it be possible for the parser itself to keep track of that and provide this information as a hint in the callbacks? Seems to be the most logical to solve this once for all in the parser; it should only cost 1 bit to keep track of...

(Reply to this)

(Reply from suspended user)

(Reply from suspended user)

(32 comments) - (Post a new comment)

Image by [info]delightedly. Join the contest in [info]remixed!
Create an Account
Forgot your login or password?
Log in with OpenID
English • Español • Deutsch • Русский…