| node v0.1.3 |
[Aug. 6th, 2009|02:23 pm] |
node.js development is coming along nicely!
I've just made the 10th release, v0.1.3: fixing bugs, adding features. The project is still very rough; I think the average time-to-encountering-a-bug-after-first-compile is about 20 minutes. We've got a little IRC channel with 6 or 7 regulars and an unusually active mailing list (which doubles as a bug tracking system).

| non-blocking stdio hack |
[Aug. 3rd, 2009|11:43 am] |
http://s3.amazonaws.com/four.livejournal/20090803/nonblocking_stdio.tar.gz
The problem with stdout, stderr, and stdin is that they are necessarily
blocking when associated with a file. Like file descriptors for normal
files, they cannot usefully be used with select() or poll(): a regular
file is always reported as ready, even when a write to it will block.
If one does

    my_program > /mnt/nfs_filesystem/output

even the write() system calls behind printf() might block the entire
process's execution.
This is bad for high performance servers which juggle many sockets and
must never block execution.
The nonblocking_stdio() function in the tarball above can help with that.
After calling it, STDOUT_FILENO, STDERR_FILENO, and STDIN_FILENO can be
used with select(). All are set non-blocking and act like pipe end-points
(they actually are).
This is done by internally using a ring buffer, pipe, and child process
to pump data to the blocking file descriptor.
Compile with -pthread. Call nonblocking_stdio() as early as possible
after your program starts. Expect strange behavior when using printf()
and friends: those are buffered and expect blocking file descriptors.
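The C source in the tarball is not reproduced here. As a rough, language-shifted illustration of just the ring-buffer part of the trick, here is a minimal fixed-capacity byte ring buffer in JavaScript; the class and method names are hypothetical. Writes never block: they accept as many bytes as fit and report the count (0 plays the role of EAGAIN), while the draining side (the part that, in the real hack, ends up pumping data toward the blocking descriptor) takes bytes out in FIFO order.

```javascript
// Hypothetical fixed-capacity byte ring buffer (not the tarball's C code).
class RingBuffer {
  constructor(capacity) {
    this.buf = Buffer.alloc(capacity);
    this.head = 0; // index of the next byte to read
    this.size = 0; // number of bytes currently stored
  }

  // Non-blocking write of a Buffer: copies what fits and returns the
  // number of bytes accepted (0 means "full, try again later").
  write(data) {
    const cap = this.buf.length;
    const n = Math.min(data.length, cap - this.size);
    for (let i = 0; i < n; i++) {
      this.buf[(this.head + this.size + i) % cap] = data[i];
    }
    this.size += n;
    return n;
  }

  // Removes and returns up to `max` bytes in FIFO order, wrapping around
  // the end of the underlying storage as needed.
  read(max) {
    const cap = this.buf.length;
    const n = Math.min(max, this.size);
    const out = Buffer.alloc(n);
    for (let i = 0; i < n; i++) out[i] = this.buf[(this.head + i) % cap];
    this.head = (this.head + n) % cap;
    this.size -= n;
    return out;
  }
}
```

Bytes that do not fit are left for the caller to retry, the same contract a non-blocking write() has.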
| popen() for node |
[Jun. 21st, 2009|03:58 pm] |
I just implemented a neat process launching thing in Node. (It's the first time I've ever used vfork()!) It's great because it allows one to stream data into and out of a child process. Here's a simple example:
    var cat = new node.Process("cat");
    cat.onOutput = function (chunk) {
      puts("cat said: " + chunk);
    };
    cat.onExit = function (status) {
      puts("cat exited with status " + status);
    };
    cat.write("hello");
    cat.write(" ");
    cat.write("world");
    cat.close();
It will be easy to implement Web Workers on top of this in pure JavaScript.
Check out the docs.

| Node (Another Server-Side Javascript) |
[May. 31st, 2009|03:21 pm] |
Node is a new server-side JavaScript project. It provides a purely event-based interface to I/O:
* TCP server and client
* Standard setTimeout() and setInterval() timers
* Asynchronous file I/O
* HTTP server and client
Node's main focus is on performance and efficiency. It is built on top of
* V8, the JavaScript engine
* libev, an event loop abstraction
* libeio, a file I/O thread pool
Node has the goal of supporting most POSIX operating systems (including Windows/MinGW) but at the moment it is only being tested on Linux, Macintosh, and FreeBSD. Node has no external dependencies (V8 and other libraries are included in its source tree.)
Node strives to provide complete, high-quality documentation. This goal is not yet met, but there is enough material to allow one to get started.
Node is released under an MIT license.
Release: node-0.0.2.tar.gz
Website: http://tinyclouds.org/node/
API documentation: http://tinyclouds.org/node/api.html
Git Repository: http://github.com/ry/node

| RFC 1122 (Requirements for Internet Hosts -- Communication), section 4.2.2.13: Closing a Connection |
[May. 2nd, 2009|09:34 pm] |
A TCP connection may terminate in two ways:
- the normal TCP close sequence using a FIN handshake, and
- an "abort" in which one or more RST segments are sent and the connection state is immediately discarded.
If a TCP connection is closed by the remote site, the local application MUST be informed whether it closed normally or was aborted.
The normal TCP close sequence delivers buffered data reliably in both directions. Since the two directions of a TCP connection are closed independently, it is possible for a connection to be "half closed," i.e., closed in only one direction, and a host is permitted to continue sending data in the open direction on a half-closed connection.
A host MAY implement a "half-duplex" TCP close sequence, so that an application that has called CLOSE cannot continue to read data from the connection. If such a host issues a CLOSE call while received data is still pending in TCP, or if new data is received after CLOSE is called, its TCP SHOULD send a RST to show that data was lost.
When a connection is closed actively, it MUST linger in TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime). However, it MAY accept a new SYN from the remote TCP to reopen the connection directly from TIME-WAIT state, if it:
- assigns its initial sequence number for the new connection to be larger than the largest sequence number it used on the previous connection incarnation, and
- returns to TIME-WAIT state if the SYN turns out to be an old duplicate.
DISCUSSION:
TCP's full-duplex data-preserving close is a feature that is not included in the analogous ISO transport protocol TP4.
Some systems have not implemented half-closed connections, presumably because they do not fit into the I/O model of their particular operating system. On these systems, once an application has called CLOSE, it can no longer read input data from the connection; this is referred to as a "half-duplex" TCP close sequence.
The graceful close algorithm of TCP requires that the connection state remain defined on (at least) one end of the connection, for a timeout period of 2xMSL, i.e., 4 minutes. During this period, the (remote socket, local socket) pair that defines the connection is busy and cannot be reused. To shorten the time that a given port pair is tied up, some TCPs allow a new SYN to be accepted in TIME-WAIT state.

Here is my guess of what a half-duplex close sequence means in POSIX:
client-program client-kernel server-kernel server-program
| | | |
|--shutdown()--->| | | client: shutdown(fd, SHUT_WR)
| |-----FIN------>| |
| | |--read EOF--->| server: read() returns 0
| |<----ACK-------| |
| | | |
| | | |
| | . | |
| | . | |
| | . | |
| | | |
| | | |
| | |<--write()----| server: optionally still writes data
| |<--------------| |
|<--- read()-----| | | client: still reads data as normal
| |-----ACK------>| |
| | |<--write()----|
| |<--------------| |
|<--- read()-----| | | client: still reads data as normal
| |-----ACK------>| |
| | | |
| | | |
| | . | |
| | . | |
| | . | |
| | | |
| | | |
| | |<--close()----| server: close(fd) or shutdown(fd, SHUT_WR)
| |<-----FIN------| | both have the same effect
|<--read EOF ----| | | client: read() returns 0
| |-----ACK------>| |
|---close()----->| | | client: close(fd)
Is this accurate?
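One way to watch this sequence from user space is Node's net module, sketched below assuming a modern Node.js (not the 2009 API). socket.end() sends a FIN, playing the role of shutdown(fd, SHUT_WR), and the 'end' event corresponds to read() returning 0. The server is created with allowHalfOpen: true so that it can keep writing after the client's FIN; halfCloseDemo is a hypothetical name.

```javascript
const net = require("net");

// Client half-closes first; server keeps writing, then half-closes too.
function halfCloseDemo(done) {
  const server = net.createServer({ allowHalfOpen: true }, (sock) => {
    sock.resume(); // consume input so the 'end' event can fire
    sock.on("end", () => {
      // Client's FIN arrived (read() returned 0); our write side is still open.
      sock.write("still ");
      sock.write("talking");
      sock.end(); // server: shutdown(fd, SHUT_WR), sends FIN
    });
  });

  server.listen(0, "127.0.0.1", () => {
    const { port } = server.address();
    const client = net.connect({ port, host: "127.0.0.1" }, () => {
      client.end(); // client: shutdown(fd, SHUT_WR), sends FIN
    });
    let received = "";
    client.setEncoding("utf8");
    client.on("data", (chunk) => {
      received += chunk; // client still reads normally after its own FIN
    });
    client.on("end", () => {
      // Server's FIN arrived: read() returned 0 on the client.
      server.close();
      done(received);
    });
  });
}
```

If the client still receives "still talking" after it has called end(), the half-closed direction stayed open, as in the middle section of the diagram.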
| TCP |
[May. 2nd, 2009|06:12 pm] |
I think the proper abstraction of a TCP socket (both sides) is something like this:
events:
- connected
- received data
- send buffer drained
- got FIN
- done
methods:
- connect (for clients only)
- send
- close (sends FIN, shutdown(fd, SHUT_WR))
this is rather tricky to get right. (posix's unholy mixture of sockets and files adds to the confusion.) am i missing anything?
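A minimal sketch of that vocabulary on top of modern Node.js's net.Socket, which happens to have an event for each item: 'connect', 'data', 'drain', 'end' (FIN received) and 'close'. The class TcpEndpoint and its event names are hypothetical. allowHalfOpen: true is passed so the endpoint can keep sending after it got FIN, and socket errors are forwarded as an 'error' event because Node throws on unhandled socket errors.

```javascript
const net = require("net");
const { EventEmitter } = require("events");

// Hypothetical wrapper: works with a net.Socket or anything exposing the
// same events plus write() and end().
class TcpEndpoint extends EventEmitter {
  constructor(socket) {
    super();
    this.socket = socket;
    socket.on("connect", () => this.emit("connected"));
    socket.on("data", (chunk) => this.emit("data", chunk)); // received data
    socket.on("drain", () => this.emit("drained"));         // send buffer drained
    socket.on("end", () => this.emit("fin"));               // got FIN (read() == 0)
    socket.on("close", (hadError) => this.emit("done", hadError));
    socket.on("error", (err) => this.emit("error", err));
  }

  // connect (for clients only)
  static connect(port, host) {
    return new TcpEndpoint(net.connect({ port, host, allowHalfOpen: true }));
  }

  // send: returns false when data was queued in user space; wait for "drained"
  send(data) {
    return this.socket.write(data);
  }

  // close: sends FIN (shutdown(fd, SHUT_WR)); reading continues until "fin"
  close() {
    this.socket.end();
  }
}
```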
| HTTP Parser |
[Apr. 27th, 2009|05:21 pm] |
I extracted the HTTP parser from libebb and beefed it up. It now handles HTTP responses; I also gave it a bit of documentation and made various other small improvements.
* No dependencies
* Parses both requests and responses.
* Handles keep-alive streams.
* Decodes chunked encoding.
* Extracts the following data from a message
o header fields and values
o content-length
o request method
o response status code
o transfer-encoding
o http version
o request path, query string, fragment
o message body
source repo: http://github.com/ry/http-parser/tree/master
version 0.1: http://s3.amazonaws.com/four.livejournal/20090427/http_parser-0.1.tar.gz
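The parser's own C API is not shown in this post, so the following is not its interface. As an illustration of what "decodes chunked encoding" involves (RFC 2616 section 3.6.1), here is a minimal, non-incremental decoder for a complete chunked body held in a string. It assumes single-byte (ASCII) characters so that string length equals byte count; a streaming parser would do this incrementally on byte buffers. decodeChunked is a hypothetical name.

```javascript
// Decodes a complete chunked body: "<hex size>[;ext]\r\n<data>\r\n" ... "0\r\n".
function decodeChunked(raw) {
  let pos = 0;
  let body = "";
  for (;;) {
    const lineEnd = raw.indexOf("\r\n", pos);
    if (lineEnd === -1) throw new Error("truncated chunk-size line");
    // chunk-size is hexadecimal; drop any ";name=value" chunk extensions
    const sizeField = raw.slice(pos, lineEnd).split(";")[0].trim();
    const size = parseInt(sizeField, 16);
    if (Number.isNaN(size)) throw new Error("bad chunk size: " + sizeField);
    pos = lineEnd + 2;
    if (size === 0) return body; // last-chunk; trailers, if any, are ignored
    if (raw.slice(pos + size, pos + size + 2) !== "\r\n") {
      throw new Error("missing CRLF after chunk data");
    }
    body += raw.slice(pos, pos + size);
    pos += size + 2;
  }
}
```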
| modules in javascript |
[Apr. 12th, 2009|03:46 pm] |
A small modification of Douglas Crockford's module pattern. The module pattern uses a temporary function that returns an object, to achieve a private scope:
    var module = function () {
      // private
      function world () {
        return "world";
      }
      return {
        // public
        hello: function () {
          return "hello " + world();
        }
      };
    }();
I think it's cleaner to instead use an anonymous constructor:
    var module = new function () {
      // private
      function world () {
        return "world";
      }
      // public
      this.hello = function () {
        return "hello " + world();
      };
    };
Anything assigned to a property of this is exported from the scope. This technique does not have a trailing (), which tends to be easily overlooked, and it doesn't require an extra indentation level.
(This isn't new: comments in the linked-to article have already suggested this.)
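A quick check of both forms in Node.js. The variables are renamed crockfordStyle and constructorStyle (hypothetical names) so they don't shadow Node's own module object.

```javascript
// Crockford's form: an immediately invoked function returning an object.
var crockfordStyle = function () {
  function world () { return "world"; } // private
  return { hello: function () { return "hello " + world(); } };
}();

// Anonymous-constructor form: properties of `this` are the public API.
var constructorStyle = new function () {
  function world () { return "world"; } // private
  this.hello = function () { return "hello " + world(); };
};

console.log(crockfordStyle.hello());        // hello world
console.log(constructorStyle.hello());      // hello world
console.log(typeof constructorStyle.world); // undefined (world stays private)
```

One difference worth knowing: the object built with new function keeps a constructor property pointing at the anonymous function, whereas the object literal's constructor is Object.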
| Infinite IRC |
[Apr. 8th, 2009|01:46 am] |
You're given 50000 servers, a toothpick, and a rubber band. How would
you design a modern IRC-like system?
By IRC-like I mean:
- Users can connect to the system and give themselves a nickname.
- Users can join channels.
- Users can send messages to channels, which get distributed
  to each of the users in the channel.
- Users can send messages directly to other users.
By modern I mean:
- It should scale almost infinitely. A billion messages per second is the
minimum load.
- There can be no single point of failure.
Participating servers are failing constantly and
this should have limited effect on the users.
New servers are added often.
- Web interface. (Just to be a bit more concrete about the technologies
involved.)
My attempt: each of the 50000 servers will run two programs, a web
server and a queue daemon (like RabbitMQ). The web server and the queue
daemon connection library both need to be non-shitty. And by non-shitty I
mean evented. (In the Ruby world there is an EventMachine-based AMQP
client library for use with Thin. I would use my own, as yet unreleased,
tools.)
A user goes to http://irc.org/ and must pick a nickname
before joining. Let's say the user picks jane as her nickname.
The server resolves jane.irc.org (more about the
DNS later) and forwards the user to http://jane.irc.org.
There the user's browser downloads the HTML, JavaScript, CSS and starts
AJAX long polling for updates. (Like Gmail.)
The node jane.irc.org has an internal hostname called
node1234.irc.org.
When the user joins channel #ruby-lang (through some button
in the HTML UI), here's what happens:
- An AJAX request sends a JOIN ruby-lang command to the web
  server. (Encoded in some way or another: perhaps by doing POST
  /channels.)
- The web server, upon receiving that request, opens an AMQP connection to
  ruby-lang.irc.org if one does not exist already.
  (The server maintains a pool of connections to other nodes.)
  Once the connection is established, the web server creates an
  auto-delete, exclusive queue (AMQP terms) called server.node1234 and a
  binding which forwards all messages with the routing key
  channel.ruby-lang to the new queue server.node1234.
  Finally it subscribes the connection to queue server.node1234.
  All messages received on this connection are published to the
  localhost AMQP server. (A master connection from the web server to the
  local AMQP server is always maintained.)
- Via the "master" connection to the local AMQP server, the web server
  creates an exclusive queue called user.jane and binds all messages matching
  channel.ruby-lang to user.jane. The master connection then
  subscribes to user.jane.
- Messages which appear in the user.jane queue allow Jane's long-poll to
  return. That is, when a message from the queue user.jane arrives on the
  master connection and there is a pending long-poll request, the message
  is returned as the response and an acknowledgement of the message's
  receipt is sent to the queue daemon.
  If, on the other hand, there is no pending long-poll, the
  acknowledgement is delayed. (This is so that messages
  stay in the queue until they're ready to be delivered to the user.)
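The delayed-acknowledgement rule can be sketched as a small per-user mailbox. This is a hypothetical model, not an AMQP client: LongPollMailbox is an invented name, the ack callback stands in for whatever acknowledgement call the AMQP library provides, and each long-poll request is represented by a respond(messages) callback.

```javascript
// Hypothetical per-user mailbox on the web server for queue user.NICK.
class LongPollMailbox {
  constructor(ack) {
    this.ack = ack;          // called with a delivery tag once delivered
    this.unacked = [];       // received from the queue, not yet delivered
    this.pendingPoll = null; // respond callback of a waiting long-poll
  }

  // A message from user.NICK arrived on the master connection.
  onQueueMessage(tag, body) {
    if (this.pendingPoll) {
      const respond = this.pendingPoll;
      this.pendingPoll = null;
      respond([body]);
      this.ack(tag); // delivered, so acknowledge now
    } else {
      this.unacked.push({ tag, body }); // no poll waiting: delay the ack
    }
  }

  // The browser issued a long-poll request.
  onLongPoll(respond) {
    if (this.unacked.length > 0) {
      const batch = this.unacked;
      this.unacked = [];
      respond(batch.map((m) => m.body));
      batch.forEach((m) => this.ack(m.tag));
    } else {
      this.pendingPoll = respond; // hold the request open
    }
  }
}
```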
When the user sends a message to the ruby-lang channel, the web server
publishes a message on ruby-lang.irc.org
with the routing key channel.ruby-lang.
I think it's easy to fill in the rest of the IRC functionality. For example, to
send a message to a user "bob", look up the hostname
bob.irc.org and publish a message on that server with the
routing key user.bob.
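To make the routing concrete, here is a toy in-memory model of the bindings described above; it is not an AMQP client. Each ToyBroker stands in for one node's queue daemon with a single direct exchange, a plain object stands in for DNS and the connection pool, and node5678.irc.org as Bob's web node is made up. Binding user.bob to the routing key user.bob on Bob's node is an assumption about how the direct-message case would be wired.

```javascript
// Toy stand-in for one queue daemon: a direct exchange copies each message
// to every queue bound with exactly its routing key.
class ToyBroker {
  constructor() {
    this.bindings = new Map(); // routing key -> Set of queue names
    this.queues = new Map();   // queue name -> array of { key, body }
  }
  declareQueue(name) {
    if (!this.queues.has(name)) this.queues.set(name, []);
  }
  bind(queue, key) {
    if (!this.bindings.has(key)) this.bindings.set(key, new Set());
    this.bindings.get(key).add(queue);
  }
  publish(key, body) {
    for (const q of this.bindings.get(key) || []) this.queues.get(q).push({ key, body });
  }
  drain(queue) {
    const msgs = this.queues.get(queue) || [];
    this.queues.set(queue, []);
    return msgs;
  }
}

// Hostname -> broker, standing in for DNS plus the connection pool.
const brokers = {
  "ruby-lang.irc.org": new ToyBroker(), // channel node
  "node1234.irc.org": new ToyBroker(),  // jane.irc.org
  "node5678.irc.org": new ToyBroker(),  // bob.irc.org (hypothetical)
};
const nickToNode = { jane: "node1234.irc.org", bob: "node5678.irc.org" };
const shortName = (host) => host.split(".")[0];

// Login: queue user.NICK on the user's web node receives direct messages.
function registerUser(nick) {
  const local = brokers[nickToNode[nick]];
  local.declareQueue("user." + nick);
  local.bind("user." + nick, "user." + nick);
}

// JOIN: server.nodeNNNN on the channel node, user.NICK on the web node.
function joinChannel(nick, channel) {
  const webNode = nickToNode[nick];
  const serverQueue = "server." + shortName(webNode);
  const remote = brokers[channel + ".irc.org"];
  remote.declareQueue(serverQueue);
  remote.bind(serverQueue, "channel." + channel);
  brokers[webNode].declareQueue("user." + nick);
  brokers[webNode].bind("user." + nick, "channel." + channel);
}

// The web server's subscription: republish what reached server.nodeNNNN
// on the channel node into the local broker.
function pump(channel, webNode) {
  const remote = brokers[channel + ".irc.org"];
  for (const m of remote.drain("server." + shortName(webNode))) {
    brokers[webNode].publish(m.key, m.body);
  }
}

const sayToChannel = (channel, body) =>
  brokers[channel + ".irc.org"].publish("channel." + channel, body);
const sayToUser = (nick, body) =>
  brokers[nickToNode[nick]].publish("user." + nick, body);
```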
I leverage DNS to keep track of servers. This will be another system unto
itself perhaps requiring a dozen nameservers. The nameserver
system is rather simple; it must be able to:
- register new channels and nicks (new channels/nicks are hashed to
node IPs using a libketama-style
consistent hashing algorithm);
- be notified of downed nodes and switch over previously hashed names
to alternatives;
- be notified of new nodes and begin hashing names to them.
This is a decently difficult, yet tractable engineering task; I leave this
as an exercise to the reader.
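For the hashing part, here is a minimal consistent-hash ring in the spirit of libketama, which also places MD5-derived points for each server on a circle. The number of points per node, the point key format, and the HashRing name are assumptions, not libketama's exact scheme. Removing a node only remaps the names that were hashed to it, which is the property the nameservers need when a node goes down.

```javascript
const crypto = require("crypto");

class HashRing {
  constructor(nodes = [], pointsPerNode = 100) {
    this.pointsPerNode = pointsPerNode;
    this.points = []; // sorted by hash: { hash, node }
    nodes.forEach((n) => this.add(n));
  }

  // First 4 bytes of the MD5 digest as an unsigned 32-bit integer.
  static hash(str) {
    return crypto.createHash("md5").update(str).digest().readUInt32BE(0);
  }

  add(node) {
    for (let i = 0; i < this.pointsPerNode; i++) {
      this.points.push({ hash: HashRing.hash(node + "-" + i), node });
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  remove(node) {
    this.points = this.points.filter((p) => p.node !== node);
  }

  // Node owning the first point at or after the key's hash, wrapping around.
  lookup(key) {
    if (this.points.length === 0) throw new Error("empty ring");
    const h = HashRing.hash(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo === this.points.length ? 0 : lo].node;
  }
}
```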
One bottleneck here is that each channel basically lives on a single
computer. That means that channels cannot scale to a million simultaneous
users. Perhaps a couple hundred users (like normal IRC) would be the limit.
This could be alleviated somewhat by assigning multiple IP addresses to each
channel. Web servers would then create connections to all of the IP
addresses assigned to a channel to receive messages; they would post new
messages to only one of them.

| build systems |
[Apr. 7th, 2009|10:35 am] |
I've switched from scons to WAF. WAF is much better and solves almost all problems. (Building included dependency libraries is still annoying, though.)

| (no subject) |
[Mar. 19th, 2009|08:55 am] |
Dear developers inventing APIs for web servers,
An HTTP header is not a dictionary object. There can exist multiple header lines with the same field name. This happens rarely, but it does happen: for example, for a server to set multiple cookies in one response, multiple "Set-Cookie" header lines are needed. A list is a more appropriate way of storing header lines. Instead of having response.setHeader(field, value), I would suggest response.addHeader(field, value).
Thank you.
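A minimal sketch of the suggested interface, assuming a hypothetical HeaderListResponse class: header lines are kept as an ordered list of [field, value] pairs, so two Set-Cookie lines survive serialization. Lookups compare field names case-insensitively, as HTTP requires.

```javascript
class HeaderListResponse {
  constructor(status, reason) {
    this.statusLine = `HTTP/1.1 ${status} ${reason}`;
    this.headers = []; // ordered list of [field, value], not a dictionary
  }

  // Appends a header line; never overwrites an earlier one.
  addHeader(field, value) {
    this.headers.push([field, value]);
  }

  // All values for a field, in order (field names are case-insensitive).
  getHeaders(field) {
    const f = field.toLowerCase();
    return this.headers
      .filter(([name]) => name.toLowerCase() === f)
      .map(([, value]) => value);
  }

  // Status line plus one line per header, terminated by an empty line.
  serializeHead() {
    const lines = [this.statusLine, ...this.headers.map(([f, v]) => `${f}: ${v}`)];
    return lines.join("\r\n") + "\r\n\r\n";
  }
}
```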
| (no subject) |
[Mar. 17th, 2009|12:44 am] |
This looks like a nice async DNS library: udns.

| FLV seeking |
[Feb. 26th, 2009|08:35 pm] |
The seek functionality in FLV (Flash Video) is implemented using a GET parameter ?start=123, where 123 is the byte offset at which the response should begin. This is dumb because HTTP already has a special header for requesting part of a file (Range, RFC 2616 14.35). This has been noted before. The problems are that:
- Flash will not allow a GET request to have a Range header.
- This string "FLV\x1\x1\0\0\0\x9\0\0\0\x9" needs to be prepended to the data.
Annoying.

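A sketch of serving the ?start= request in Node.js, assuming the whole file is already in memory (a real server would stream from disk). flvSeekBody and startParam are hypothetical names; the 13 prefix bytes are exactly the string quoted above. The bytes selected are the same ones a Range: bytes=N- request would select, which is why the parameter duplicates what HTTP already offers. The server simply trusts the offset the player sends; when start is 0 the file already begins with its own header, so nothing is prepended.

```javascript
// The 13 bytes quoted in the post: "FLV\x1\x1\0\0\0\x9\0\0\0\x9".
const FLV_PREFIX = Buffer.from([
  0x46, 0x4c, 0x56, 0x01, 0x01, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x09,
]);

// Body for GET /video.flv?start=N, given the whole .flv file as a Buffer.
function flvSeekBody(file, start) {
  if (!Number.isInteger(start) || start < 0 || start >= file.length) {
    throw new RangeError("start out of range: " + start);
  }
  if (start === 0) return file; // the file already starts with its header
  // Same bytes as "Range: bytes=<start>-", plus the prefix Flash expects.
  return Buffer.concat([FLV_PREFIX, file.subarray(start)]);
}

// Reads ?start= from a request path (WHATWG URL API); defaults to 0.
function startParam(requestUrl) {
  const v = new URL(requestUrl, "http://localhost").searchParams.get("start");
  return v === null ? 0 : Number(v);
}
```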
| c style |
[Feb. 11th, 2009|12:17 pm] |
i have a new function declaration style. I put each parameter on its own line with commas in front:

    static void op_write
    ( fuse_req_t req
    , fuse_ino_t ino
    , const char *buf
    , size_t size
    , off_t off
    , struct fuse_file_info *fi
    )

or

    void directory_parser_init
    ( struct directory_parser *parser
    , directory_parser_cb cb
    , void *userdata
    );

easy to read and edit (i hate maintaining line breaks in parameter lists to keep them below 80 columns). also, skinny header files can be put in thin vertical windows to the right.
| IPN |
[Feb. 8th, 2009|11:35 am] |
Inter Process Networking sounds interesting:
IPN is an Inter Process Communication service. It uses the same programming interface and protocols used for networking. Processes using IPN are connected to a "network" (many to many communication). The messages or packets sent by a process on an IPN network can be delivered to many other processes connected to the same IPN network, potentially to all the other processes. Different protocols can be defined on the IPN service. The basic one is the broadcast (level 1) protocol: all the packets get received by all the processes but the sender. It is also possible to define more sophisticated protocols. For example it is possible to have IPN sockets dispatching packets using the Ethernet protocol (like a Virtual Distributed Ethernet - VDE switch), or Internet Protocol (like a layer 3 switch). These are just examples, several other policies can be defined.

| (no subject) |
[Feb. 5th, 2009|08:26 pm] |
"JavaScript makes relative times compatible with caching" is a trivial hack (and, imo, rather poorly done), but it's the right kind of thinking! Amazingly, this is the sort of optimization that could cut the hosting bills for some websites in half (in ways that a faster backend server like Ebb never could).
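A sketch of the general idea, not the linked article's code: the server emits an absolute timestamp, which is the same for every visitor and therefore cacheable, and a few lines of client-side script turn it into "5 minutes ago" at view time. relativeTime and its thresholds are hypothetical choices; the commented browser snippet assumes `<time datetime="...">` markup.

```javascript
// Formats the gap between two epoch-millisecond times as coarse English.
function relativeTime(thenMs, nowMs) {
  const s = Math.max(0, Math.round((nowMs - thenMs) / 1000));
  if (s < 60) return "just now";
  const m = Math.floor(s / 60);
  if (m < 60) return m === 1 ? "1 minute ago" : m + " minutes ago";
  const h = Math.floor(m / 60);
  if (h < 24) return h === 1 ? "1 hour ago" : h + " hours ago";
  const d = Math.floor(h / 24);
  return d === 1 ? "1 day ago" : d + " days ago";
}

// In a browser, the cached page could run something like this on load:
// document.querySelectorAll("time[datetime]").forEach((el) => {
//   el.textContent = relativeTime(Date.parse(el.getAttribute("datetime")), Date.now());
// });
```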
| (no subject) |
[Feb. 4th, 2009|12:39 am] |
Yes.
To decrease the amount of bandwidth wasted by web crawlers for search engines, I'd suggest the development of a standard index format that can be linked to from a site's robots.txt. The site admins would run a local crawler which would produce this index and then publish it. Search engines could then grab this compressed index instead of crawling the site (saving bandwidth and computation on both sides), sample it for inaccuracies, and accept it if it looks good.
Additional benefits:
1) the index could be used for implementing search on the website it was produced for, and since it's standardized, a large number of tools could use it;
2) it would make it much easier for other search engines to compete with Google.
HTTP/2.0 please.