ryah


node v0.1.3 [Aug. 6th, 2009|02:23 pm]
[Current Mood | happy]

node.js development is coming along nicely!

I've just made the 10th release, v0.1.3. Fixing bugs, adding features. The project is still very rough: I think the average time-to-encountering-a-bug-after-first-compile is about 20 minutes. We've got a little IRC channel with 6 or 7 regulars and an unusually active mailing list (which doubles as a bug tracking system).

non-blocking stdio hack [Aug. 3rd, 2009|11:43 am]

http://s3.amazonaws.com/four.livejournal/20090803/nonblocking_stdio.tar.gz
 The problem with stdout, stderr, and stdin is that they are necessarily
 blocking when associated with a file. Like file descriptors for normal
 files, they cannot be used with select() or poll().

 If one does 
 
   my_program > /mnt/nfs_filesystem/output

 even printf() calls (which end in write() system calls) might block the entire process's execution.
 This is bad for high performance servers which juggle many sockets and
 must never block execution.

 This function can help with that. Now STDOUT_FILENO, STDERR_FILENO, and
 STDIN_FILENO can be used with select(). All are set non-blocking and will
 act like pipe end-points (which they actually are).

 This is done by internally using a ring buffer, pipe, and child process
 to pump data to the blocking file descriptor. 

 Compile with -pthread. Call nonblocking_stdio() as early as possible
 after your program starts. Expect strange behavior when using printf()
 and friends--those are buffered and expect blocking file descriptors.

popen() for node [Jun. 21st, 2009|03:58 pm]

I just implemented a neat process launching thing in Node. (It's the first time I've ever used vfork()!) It's great because it allows one to stream data in and out of a child process. Here's a simple example:
 var cat = new node.Process("cat");
 cat.onOutput = function (chunk) {
   puts("cat said: " + chunk);
 };
 cat.onExit = function (status) {
   puts("cat exited with status " + status);
 };
 cat.write("hello");
 cat.write(" ");
 cat.write("world");
 cat.close();
It will be easy to implement Web Workers on top of this in pure javascript.

Check out the docs

Why isn't there a bidirectional popen()? [Jun. 19th, 2009|06:29 pm]

Good read: http://lua-users.org/lists/lua-l/2007-10/msg00189.html


In node I think it will look like this
var cat = popen("cat");

cat.onOutput = function (chunk) {
  puts("cat said: '" + chunk + "'");
};

cat.onExit = function (code) {
  puts("the cat program exited with code " + code);
};

cat.write("hello world");
cat.close();
cat.kill(); 

projects using node [Jun. 16th, 2009|07:12 pm]

I think all are in the early stages of development:

redis-node-client, node-json-rpc, and node_http_dispatch by Brian Hammond

pubsub by Malte Ubl

frisbee, a clone of the Disqus blog commenting system, by Urban Hafner

hxV8, the haXe → javascript compiler with modifications to use node I/O, by blackdog66

node_chat by me (live demo at chat.tinyclouds.org)

Node (Another Server-Side Javascript) [May. 31st, 2009|03:21 pm]

Node is a new server-side javascript project. It provides a purely event-based interface to I/O:

* TCP server and client
* Standard setTimeout() and setInterval() timers
* Asynchronous file I/O
* HTTP server and client

Node's main focus is on performance and efficiency. It is built on top of

* V8 javascript
* libev, event loop abstraction
* libeio, file I/O thread pool

Node has the goal of supporting most POSIX operating systems (including Windows/MinGW) but at the moment it is only being tested on Linux, Macintosh, and FreeBSD. Node has no external dependencies (V8 and other libraries are included in its source tree.)

Node strives to provide complete and good documentation. This goal is not yet met but there is enough material to allow one to get started.

Node is released under an MIT license.

Release: node-0.0.2.tar.gz

Website: http://tinyclouds.org/node/

API documentation: http://tinyclouds.org/node/api.html

Git Repository: http://github.com/ry/node

RFC 1122 (Requirements for Internet Hosts -- Communication), section 4.2.2.13: Closing a Connection [May. 2nd, 2009|09:34 pm]

A TCP connection may terminate in two ways:
  1. the normal TCP close sequence using a FIN handshake, and
  2. an "abort" in which one or more RST segments are sent and the connection state is immediately discarded.
If a TCP connection is closed by the remote site, the local application MUST be informed whether it closed normally or was aborted.

The normal TCP close sequence delivers buffered data reliably in both directions. Since the two directions of a TCP connection are closed independently, it is possible for a connection to be "half closed," i.e., closed in only one direction, and a host is permitted to continue sending data in the open direction on a half-closed connection.

A host MAY implement a "half-duplex" TCP close sequence, so that an application that has called CLOSE cannot continue to read data from the connection. If such a host issues a CLOSE call while received data is still pending in TCP, or if new data is received after CLOSE is called, its TCP SHOULD send a RST to show that data was lost.

When a connection is closed actively, it MUST linger in TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime). However, it MAY accept a new SYN from the remote TCP to reopen the connection directly from TIME-WAIT state, if it:
  1. assigns its initial sequence number for the new connection to be larger than the largest sequence number it used on the previous connection incarnation, and
  2. returns to TIME-WAIT state if the SYN turns out to be an old duplicate.


DISCUSSION:

TCP's full-duplex data-preserving close is a feature that is not included in the analogous ISO transport protocol TP4.

Some systems have not implemented half-closed connections, presumably because they do not fit into the I/O model of their particular operating system. On these systems, once an application has called CLOSE, it can no longer read input data from the connection; this is referred to as a "half-duplex" TCP close sequence.

The graceful close algorithm of TCP requires that the connection state remain defined on (at least) one end of the connection, for a timeout period of 2xMSL, i.e., 4 minutes. During this period, the (remote socket, local socket) pair that defines the connection is busy and cannot be reused. To shorten the time that a given port pair is tied up, some TCPs allow a new SYN to be accepted in TIME-WAIT state.

Here is my guess at what a half-closed connection's close sequence looks like in POSIX.
client-program   client-kernel   server-kernel  server-program
      |                |               |              |
      |--shutdown()--->|               |              |        client: shutdown(fd, SHUT_WR)
      |                |-----FIN------>|              |
      |                |               |--read EOF--->|        server: read() returns 0
      |                |<----ACK-------|              |
      |                |               |              |
      |                |               |              |
      |                |      .        |              |
      |                |      .        |              |
      |                |      .        |              |
      |                |               |              |
      |                |               |              |
      |                |               |<--write()----|        server: optionally still writes data
      |                |<--------------|              |
      |<--- read()-----|               |              |        client: still reads data as normal
      |                |-----ACK------>|              |
      |                |               |<--write()----|
      |                |<--------------|              |
      |<--- read()-----|               |              |        client: still reads data as normal
      |                |-----ACK------>|              |
      |                |               |              |
      |                |               |              |
      |                |      .        |              |
      |                |      .        |              |
      |                |      .        |              |
      |                |               |              |
      |                |               |              |
      |                |               |<--close()----|        server: close(fd) or shutdown(fd, SHUT_WR)
      |                |<-----FIN------|              |                both have the same effect 
      |<--read EOF ----|               |              |        client: read() returns 0
      |                |-----ACK------>|              |
      |---close()----->|               |              |        client: close(fd)


Is this accurate?

TCP [May. 2nd, 2009|06:12 pm]

I think the proper abstraction of a TCP socket (both sides) is something like this

events:
- connected
- received data
- send buffer drained
- got FIN
- done

methods:
- connect (for clients only)
- send
- close (sends FIN, shutdown(fd, SHUT_WR))

this is rather tricky to get right. (posix's unholy mixture of sockets and files adds to the confusion.) am i missing anything?
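
To make the proposed interface concrete, here is a toy in-memory model in JavaScript (Node.js). It is only a sketch under assumptions: the Endpoint class and the event names "connected", "data", "drain", "fin" and "done" are invented here to mirror the lists above, and no real sockets are involved.

```javascript
const { EventEmitter } = require("events");

// Hypothetical in-memory model of one end of a TCP connection.
// Events: "connected", "data", "drain", "fin", "done".
class Endpoint extends EventEmitter {
  constructor() {
    super();
    this.peer = null;
    this.sentFin = false;  // we called close()
    this.gotFin = false;   // the peer called close()
    this.finished = false; // "done" has been emitted
  }

  // Clients only: attach to the other end and announce the connection.
  connect(server) {
    this.peer = server;
    server.peer = this;
    server.emit("connected");
    this.emit("connected");
  }

  send(chunk) {
    if (this.sentFin) throw new Error("cannot send after close()");
    this.peer.emit("data", chunk);
    this.emit("drain"); // nothing is buffered in this model
  }

  // Half-close, like shutdown(fd, SHUT_WR): reading is still allowed.
  close() {
    if (this.sentFin) return;
    this.sentFin = true;
    this.peer.receiveFin();
  }

  receiveFin() {
    this.gotFin = true;
    this.emit("fin");
    this.finishIfBothClosed();
    this.peer.finishIfBothClosed();
  }

  // "done" fires once, when both directions have been closed.
  finishIfBothClosed() {
    if (this.sentFin && this.gotFin && !this.finished) {
      this.finished = true;
      this.emit("done");
    }
  }
}

const log = [];
const client = new Endpoint();
const server = new Endpoint();

server.on("data", (chunk) => log.push("server got " + chunk));
server.on("fin", () => {
  log.push("server got FIN");
  server.send("bye"); // still allowed on a half-closed connection
  server.close();
});
client.on("data", (chunk) => log.push("client got " + chunk));
client.on("fin", () => log.push("client got FIN"));
client.on("done", () => log.push("client done"));

client.connect(server);
client.send("hello");
client.close();
```

In this model the client can still receive "bye" after it has called close(), and "done" is only emitted once both sides have sent their FIN.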

HTTP Parser [Apr. 27th, 2009|05:21 pm]

I extracted the HTTP parser from libebb and beefed it up. It now handles HTTP responses, and I gave it a bit of documentation and made various other little improvements.

    * No dependencies
    * Parses both requests and responses.
    * Handles keep-alive streams.
    * Decodes chunked encoding.
    * Extracts the following data from a message
          o header fields and values
          o content-length
          o request method
          o response status code
          o transfer-encoding
          o http version
          o request path, query string, fragment
          o message body


source repo: http://github.com/ry/http-parser/tree/master

version 0.1: http://s3.amazonaws.com/four.livejournal/20090427/http_parser-0.1.tar.gz

multi-line CPP macros in vim [Apr. 27th, 2009|03:26 pm]

How do you edit multi-line cpp macros in vim?

modules in javascript [Apr. 12th, 2009|03:46 pm]

A small modification of Douglas Crockford's module pattern. The module pattern uses a temporary function which returns an object to achieve a private scope
    var module = function () {
        // private 
        function world () { 
            return "world"; 
        }
        
        return {
            // public 
            hello:  function () {
                return "hello " + world();       
            }   
        };
    }();
I think it's cleaner to instead use an anonymous constructor:
    var module = new function () {
        // private 
        function world () { 
            return "world";
        }
    
        // public 
        this.hello = function () {
            return "hello " + world();
        };
    };
Anything prefixed with this is exported from the scope. This technique does not have an ending (), which tends to be easily overlooked, and it doesn't require an extra indentation level.

(This isn't new - comments in the linked-to article have already suggested this.)


Infinite IRC [Apr. 8th, 2009|01:46 am]

You're given 50000 servers, a toothpick, and a rubber band. How would you design a modern IRC-like system?

By IRC-like I mean

  • Users can connect to the system and give themselves a nickname.
  • Users can join channels.
  • Users can send messages to channels, which get distributed to each of the users in the channel.
  • Users can send messages directly to other users.

By modern I mean:

  • It should scale almost infinitely. A billion messages per second is the minimum load.
  • There can be no single point of failure. Participating servers are failing constantly and this should have limited effect on the users. New servers are added often.
  • Web interface. (Just to be a bit more concrete about the technologies involved.)

My Attempt. Each of the 50000 servers will run two programs: a web server and a queue daemon (like RabbitMQ). The web server and queue daemon connection library both need to be non-shitty. And by non-shitty I mean evented. (In the Ruby world there is an EventMachine-based AMQP client library for use with Thin. I would use my own (yet unreleased) tools.)

A user goes to http://irc.org/ and must pick a nickname before joining. Let's say the user picks jane as her nickname.

The server resolves jane.irc.org (more about the DNS later) and forwards the user to http://jane.irc.org. There the user's browser downloads the HTML, JavaScript, CSS and starts AJAX long polling for updates. (Like Gmail.) The node jane.irc.org has an internal hostname called node1234.irc.org.

When the user joins channel #ruby-lang (through some button in the HTML UI), here's what happens:

  1. An AJAX request sends a JOIN ruby-lang command to the web server. (Encoded in some way or another: perhaps by doing POST /channels.)

  2. The web server, upon receiving that request, opens an AMQP connection to ruby-lang.irc.org if it does not exist already. (The server maintains a pool of connections to other nodes.)

    Once the connection is established the web server creates an auto-delete, exclusive queue (AMQP terms) called server.node1234 and a binding which forwards all messages with the routing key channel.ruby-lang to the new queue server.node1234.

    Finally it subscribes the connection to queue server.node1234.

    All messages received on this connection are published to the localhost AMQP server. (A master connection from the web server to the local AMQP server is always maintained.)

  3. Via the "master" connection to the local AMQP server, the web server creates an exclusive queue called user.jane and binds all messages matching channel.ruby-lang to user.jane. The master connection then subscribes to user.jane.

  4. Messages which appear in the user.jane queue allow Jane's long-poll to return. That is, when a message from the queue user.jane arrives on the master connection and there is a pending long-poll request, the message is returned as the response and an acknowledgement of the message receipt is sent to the queue daemon.

    If on the other hand there is no pending long-poll, the message receipt acknowledgement is delayed. (This is so that messages stay in the queue until they're ready to be delivered to the user.)

When the user sends a message to the ruby-lang channel, the web server publishes a message on ruby-lang.irc.org with the routing key channel.ruby-lang.
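
The message flow in steps 2 to 4 can be modelled without a real queue daemon. The sketch below is plain JavaScript (Node.js) with a toy in-memory Broker standing in for each AMQP server; Broker, LongPollUser and their methods are hypothetical names, and real AMQP details (exchanges, exclusive/auto-delete flags, actual acknowledgements) are left out.

```javascript
// Toy stand-in for one node's queue daemon. This is NOT an AMQP client.
class Broker {
  constructor() {
    this.bindings = new Map();  // routing key -> Set of queue names
    this.consumers = new Map(); // queue name -> callback
  }

  bind(queue, routingKey) {
    if (!this.bindings.has(routingKey)) this.bindings.set(routingKey, new Set());
    this.bindings.get(routingKey).add(queue);
  }

  subscribe(queue, callback) {
    this.consumers.set(queue, callback);
  }

  // Deliver to every queue bound to the key (messages for queues without
  // a consumer are simply dropped in this model).
  publish(routingKey, message) {
    const queues = this.bindings.get(routingKey) || new Set();
    for (const queue of queues) {
      const consumer = this.consumers.get(queue);
      if (consumer) consumer(message);
    }
  }
}

// Per-user state on the web server (step 4): a message is handed to a
// pending long-poll if there is one, otherwise it waits unacknowledged.
class LongPollUser {
  constructor() {
    this.unacked = [];
    this.pending = null;
  }

  deliver(message) {
    if (this.pending) {
      const respond = this.pending;
      this.pending = null;
      respond(message);
    } else {
      this.unacked.push(message);
    }
  }

  poll(respond) {
    if (this.unacked.length > 0) respond(this.unacked.shift());
    else this.pending = respond;
  }
}

const channelNode = new Broker(); // stands in for ruby-lang.irc.org
const localNode = new Broker();   // stands in for the daemon on node1234

// Step 2: queue server.node1234 on the channel's node, relayed to localhost.
channelNode.bind("server.node1234", "channel.ruby-lang");
channelNode.subscribe("server.node1234", (msg) =>
  localNode.publish("channel.ruby-lang", msg));

// Step 3: queue user.jane on the local node.
const jane = new LongPollUser();
localNode.bind("user.jane", "channel.ruby-lang");
localNode.subscribe("user.jane", (msg) => jane.deliver(msg));

const received = [];
jane.poll((msg) => received.push(msg));            // long-poll is waiting
channelNode.publish("channel.ruby-lang", "hello"); // answered immediately
channelNode.publish("channel.ruby-lang", "world"); // no poll pending: held
jane.poll((msg) => received.push(msg));            // next poll picks it up
```

The second message shows the delayed-acknowledgement case: it arrives while no long-poll is pending and is held until the browser polls again.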

I think it's easy to extrapolate the rest of the IRC functionality. For example, to send a message to a user "bob", look up the hostname bob.irc.org and publish a message on that server with the routing key user.bob.

I leverage DNS to keep track of servers. This will be another system unto itself perhaps requiring a dozen nameservers. The nameserver system is rather simple - it must have the ability to

  • register new channels and nicks (new channels/nicks are hashed to node IPs using a libketama-style consistent hashing algorithm);
  • be notified of downed nodes and switch over previously hashed names to alternatives;
  • be notified of new nodes and begin hashing names to them.
This is a decently difficult, yet tractable engineering task; I leave this as an exercise to the reader.
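
The hashing part of the nameserver can be sketched as follows. This is a minimal in-memory consistent-hash ring in JavaScript (Node.js), in the spirit of libketama but not a port of it; the HashRing class, the replica count of 100, and the node hostnames are all made up for illustration.

```javascript
const crypto = require("crypto");

// Map a string to an unsigned 32-bit point on the ring
// (the first 4 bytes of its MD5 digest).
function ringPoint(key) {
  return crypto.createHash("md5").update(key).digest().readUInt32BE(0);
}

class HashRing {
  constructor(replicas = 100) {
    this.replicas = replicas; // virtual points per node, to even out load
    this.points = [];         // sorted array of { point, node }
  }

  addNode(node) {
    for (let i = 0; i < this.replicas; i++) {
      this.points.push({ point: ringPoint(node + "#" + i), node });
    }
    this.points.sort((a, b) => a.point - b.point);
  }

  removeNode(node) {
    this.points = this.points.filter((p) => p.node !== node);
  }

  // A name belongs to the first node point at or after its own point,
  // wrapping around to the start of the ring.
  lookup(name) {
    if (this.points.length === 0) throw new Error("no nodes in the ring");
    const target = ringPoint(name);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].point < target) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].node;
  }
}

const ring = new HashRing();
["node1.irc.org", "node2.irc.org", "node3.irc.org"].forEach((n) => ring.addNode(n));

const names = ["jane", "bob", "ruby-lang", "node", "lua", "c", "haxe", "vim"];
const before = names.map((n) => ring.lookup(n));
ring.removeNode("node2.irc.org"); // simulate a downed node
const after = names.map((n) => ring.lookup(n));
```

Removing a node only moves the names that were hashed to it; every other name keeps its node, which is the property the failover step depends on.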

One bottleneck here is that each channel basically lives on a certain computer. That means that channels cannot scale to a million simultaneous users. Perhaps a couple hundred users (like normal IRC) would be the limit. This could be alleviated somewhat by assigning multiple IP addresses to each channel. Web servers would then create connections to all of the IP addresses assigned to a channel to receive messages; they would post new messages to only one of them.


build systems [Apr. 7th, 2009|10:35 am]

I've switched from scons to WAF. WAF is much better and solves almost all problems. (Building included dependency libraries is still annoying though.)

(no subject) [Mar. 19th, 2009|08:55 am]

Dear developers inventing APIs for web servers,

An HTTP header is not a dictionary object. There can exist multiple header lines with the same field name. This happens rarely but it does happen. For example, for a server to set multiple cookies in one response, multiple "Set-Cookie" header lines are needed. A list is a more appropriate way of storing header lines. Instead of having
  response.setHeader(field, value)
I would suggest
  response.addHeader(field, value)
Thank you.
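
A minimal sketch of the idea in JavaScript, assuming a hypothetical HeaderList class (none of these names come from an existing server API): header lines are kept as an ordered list of [field, value] pairs, so repeated fields such as Set-Cookie survive.

```javascript
// Hypothetical sketch: headers stored as an ordered list, not a dictionary.
class HeaderList {
  constructor() {
    this.lines = []; // each entry is [field, value], in insertion order
  }

  // Append a header line; never overwrites an earlier line with the same field.
  addHeader(field, value) {
    this.lines.push([field, String(value)]);
    return this;
  }

  // Return every value for a field (field names are case-insensitive in HTTP).
  getAll(field) {
    const wanted = field.toLowerCase();
    return this.lines
      .filter(([name]) => name.toLowerCase() === wanted)
      .map(([, value]) => value);
  }

  // Serialize to the wire format, one "Field: value" line per entry.
  toString() {
    return this.lines.map(([f, v]) => f + ": " + v + "\r\n").join("");
  }
}

const response = new HeaderList();
response.addHeader("Content-Type", "text/plain");
response.addHeader("Set-Cookie", "a=1");
response.addHeader("Set-Cookie", "b=2");
```

With a dictionary and setHeader(), the second Set-Cookie call would have replaced the first; here both lines are kept and sent.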

(no subject) [Mar. 17th, 2009|12:44 am]

This looks like a nice async DNS library: udns.

FLV seeking [Feb. 26th, 2009|08:35 pm]

The seek functionality in FLV (flash video) is implemented using a GET parameter ?start=123 where 123 represents the beginning byte that the response should send. This is dumb because HTTP already has a special header for requesting a part of a file (Range, RFC 2616 14.35). This has been noted before. The problem is that
  1. Flash will not allow a GET request to have a Range header.
  2. This string "FLV\x1\x1\0\0\0\x9\0\0\0\x9" needs to be prepended to the data.
Annoying.
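
A server-side handler for this scheme might look roughly like the sketch below (JavaScript for Node.js). parseStart and flvSeekBody are hypothetical helpers, not part of any FLV or HTTP library, and returning the file untouched when start is 0 is an assumption (a complete FLV file already begins with its own header).

```javascript
// The 13-byte string from item 2: "FLV\x1\x1\0\0\0\x9\0\0\0\x9".
const FLV_HEADER = Buffer.from([
  0x46, 0x4c, 0x56, // "FLV"
  0x01, 0x01,
  0x00, 0x00, 0x00, 0x09,
  0x00, 0x00, 0x00, 0x09,
]);

// Read the byte offset from a request URL such as "/video.flv?start=123".
// A missing parameter means "from the beginning".
function parseStart(requestUrl) {
  const value = new URL(requestUrl, "http://localhost").searchParams.get("start");
  return value === null ? 0 : Number(value);
}

// Build the response body: the header string followed by the file's
// bytes from `start` onward. `file` is the whole FLV file as a Buffer.
function flvSeekBody(file, start) {
  if (!Number.isInteger(start) || start < 0 || start > file.length) {
    throw new RangeError("start is outside the file");
  }
  if (start === 0) return file; // assumption: send the file as-is
  return Buffer.concat([FLV_HEADER, file.subarray(start)]);
}

// Stand-in data; a real server would read the FLV file from disk.
const file = Buffer.from("0123456789");
const body = flvSeekBody(file, parseStart("/video.flv?start=4"));
```

The extra bytes are the reason a plain Range request cannot be a drop-in replacement: the response is not a literal slice of the file.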

c style [Feb. 11th, 2009|12:17 pm]

i have a new function declaration style. I put each parameter on its own line with commas in front:
static void op_write           
  ( fuse_req_t req
  , fuse_ino_t ino
  , const char *buf
  , size_t size
  , off_t off
  , struct fuse_file_info *fi  
  )
or
void directory_parser_init     
  ( struct directory_parser *parser 
  , directory_parser_cb cb     
  , void *userdata
  );
easy to read and edit (i hate maintaining line breaks in parameter lists to keep it below 80 columns). also skinny header files can be put in thin vertical windows to the right:


IPN [Feb. 8th, 2009|11:35 am]

Inter Process Networking sounds interesting
IPN is an Inter Process Communication service. It uses the same programming interface and protocols used for networking. Processes using IPN are connected to a "network" (many to many communication). The messages or packets sent by a process on an IPN network can be delivered to many other processes connected to the same IPN network, potentially to all the other processes. Different protocols can be defined on the IPN service. The basic one is the broadcast (level 1) protocol: all the packets get received by all the processes but the sender. It is also possible to define more sophisticated protocols. For example it is possible to have IPN sockets dispatching packets using the Ethernet protocol (like a Virtual Distributed Ethernet - VDE switch), or Internet Protocol (like a layer 3 switch). These are just examples, several other policies can be defined.

(no subject) [Feb. 5th, 2009|08:26 pm]

"JavaScript makes relative times compatible with caching" is a trivial hack (and imo rather poorly done) but it's the right kind of thinking! Amazingly, this is the sort of optimization that could cut the hosting bills for some websites in half (in ways that a faster backend server like Ebb never could).

(no subject) [Feb. 4th, 2009|12:39 am]

Yes
To decrease the amount of bandwidth wasted by web crawlers for search engines, I'd suggest the development of a standard index format that can be linked to by a site's robots.txt. The site admins would run a local crawler which would produce this index and then publish it. Search engines could then grab this compressed index instead of crawling the site (saving bandwidth and computation on both sides), sample it for inaccuracies and accept it if it looks good.

Additional benefits 1) the index could be used for implementing search on the web site it was produced for and, since it's standardized, a large number of tools could use it 2) it would make it much easier for other search engines to compete with Google.
HTTP/2.0 please.
