ryah @ 2008-06-29 10:52:00
dedicated cache server?
I'm thinking about writing a website caching server, something in between memcached and Squid. I don't want to write it (it will destroy my flow on other projects), but I think it's necessary. I already have a version written in Ruby, but it's too slow. I considered writing an Nginx module, but if I'm going to invest the time an Nginx module takes, I might as well aim for a proper, web-server-independent solution.
Please let me know if I can get the functionality I describe out of some existing software.
The problem is making caching exact. What's needed is a cache store that attaches the IDs of database objects (a relational DB row or a CouchDB document) to cached output. Additionally, each cache must have a request template. I'll explain what I mean by a request template:
Each dynamically generated web page uses several parameters from the request to generate the response. For example, the HOST and PATH_INFO parameters are used to respond to a GET request to
http://four.livejournal.com/871515.html
Additionally, LiveJournal checks the COOKIE header to authenticate the user. The necessary elements of the request (headers, HTTP version, request URI, query parameters), along with their values, are what I call a request template. In this example the request template might be
{ PATH_INFO: "/871515.html"
, HOST: "four.livejournal.com"
, cookies: { "ljdomsess.four": "v1:u1329:s17..." }
}
Any other request which matches all of the elements and their values in the request template will be served the cache.
Note that each page needs a different set of request parameters. (For example, the "about livejournal" page might not use HOST: four.livejournal.com or the cookie; it might depend only on PATH_INFO.)
Caches are generated from dynamic web pages; each one has a request template and a list of IDs. Expiration is done using the IDs. When I change post 12345, the application server (or the database) should notify the cache server that all caches involving 12345 should be expired. In the case of LiveJournal, that would probably mean expiring various caches of friends' pages, the page for the post itself, and the calendar page which lists post counts for each month.
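To make the matching rule concrete, here is a minimal sketch in Python (the real server would be in C). It assumes a request is represented as a dict of elements, with cookies nested in their own dict as in the template above; `matches`, `same_session`, `no_cookie`, and the `langpref` cookie are hypothetical names for illustration. A request matches when every templated element is present with the same value; anything the template doesn't mention is ignored.

```python
def matches(template: dict, request: dict) -> bool:
    """True if every element in the template appears in the request with
    the same value; request elements the template doesn't mention are ignored."""
    for key, expected in template.items():
        if key not in request:
            return False
        actual = request[key]
        if isinstance(expected, dict):
            # Nested elements such as cookies: each templated sub-key must match.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual != expected:
            return False
    return True


# The request template from the example (cookie value elided as in the text).
post_template = {
    "PATH_INFO": "/871515.html",
    "HOST": "four.livejournal.com",
    "cookies": {"ljdomsess.four": "v1:u1329:s17..."},
}

# Hypothetical incoming requests.
same_session = {
    "PATH_INFO": "/871515.html",
    "HOST": "four.livejournal.com",
    "HTTP_VERSION": "HTTP/1.1",
    "cookies": {"ljdomsess.four": "v1:u1329:s17...", "langpref": "en"},
}
no_cookie = {"PATH_INFO": "/871515.html", "HOST": "four.livejournal.com"}

print(matches(post_template, same_session))  # True: extra elements are ignored
print(matches(post_template, no_cookie))     # False: the session cookie is missing
```

Because unmentioned elements are ignored, a template with fewer elements (like a PATH_INFO-only "about" page) matches a wider range of requests.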
I don't claim this method of caching is right for every case, but I think it will work well for many simple dynamic websites. WordPress blogs and large catalog-style websites, for example, would make good use of such a caching server. The main benefit is that the caching is exact and can be abstracted away from the website programmer: the web framework can record which request parameters were used and which database IDs a generated HTML chunk depends on.
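One way a framework could do this measuring, sketched in Python under assumptions: page code reads request elements and loads database records through a wrapper object that records both. `TrackingRequest`, `post_page`, and the in-memory `db` dict are all hypothetical; a real framework would hook its own request and ORM objects instead.

```python
class TrackingRequest:
    """Hypothetical framework object handed to page code. It records each
    request element read (with its value) and each database ID loaded."""

    def __init__(self, env: dict, db: dict):
        self._env = env
        self._db = db
        self.template = {}  # request elements actually used -> their values
        self.ids = set()    # database IDs the generated output depends on

    def __getitem__(self, key: str):
        value = self._env[key]
        self.template[key] = value
        return value

    def load(self, record_id: str):
        self.ids.add(record_id)
        return self._db[record_id]


def post_page(req: TrackingRequest) -> str:
    # Hypothetical page handler: reads HOST and PATH_INFO, loads one post.
    journal = req["HOST"].split(".")[0]
    post_id = req["PATH_INFO"].lstrip("/").split(".")[0]
    return "<h1>" + journal + ": " + req.load(post_id) + "</h1>"


env = {
    "PATH_INFO": "/871515.html",
    "HOST": "four.livejournal.com",
    "HTTP_USER_AGENT": "Mozilla/5.0",
}
req = TrackingRequest(env, db={"871515": "Hello"})
html = post_page(req)
print(req.template)  # {'HOST': 'four.livejournal.com', 'PATH_INFO': '/871515.html'}
print(req.ids)       # {'871515'}
```

Since the user agent is never read, it stays out of the template, so the cached page would be shared across browsers.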
How I intend to implement this
The caching server should be a simple HTTP server (written in C, using a simple HTTP server library). It will have three functions: serving cache, storing cache, and expiring cache, all done through HTTP.
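A sketch of how the three functions could be dispatched over plain HTTP, assuming the routing implied by the sections that follow: GET serves, POST /_expire expires, and any other POST stores. The function name `route` and the "reject" case are hypothetical.

```python
from urllib.parse import urlsplit


def route(method: str, target: str) -> str:
    """Pick which of the three cache-server functions handles a request.

    Hypothetical dispatch rule: GET serves, POST to /_expire expires,
    any other POST stores under that path; everything else is rejected.
    """
    path = urlsplit(target).path  # drop any ?query so "/_expire?id=1234" matches
    if method == "GET":
        return "serve"
    if method == "POST":
        return "expire" if path == "/_expire" else "store"
    return "reject"
```

One consequence worth deciding deliberately: under this rule a page whose path is literally /_expire can never be cached.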
Serving Cache
The front-end web server (Nginx or whatev) will send all GET requests to the cache server. If the cache server returns 404, the front end forwards the request to the Application Server; otherwise it serves the cache server's response. When the cache server receives a GET request, it matches the request against all of its request templates to find a suitable cache. If no request template matches, it returns 404.
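The serving path, sketched in Python with flat templates for brevity (cookies would need nesting or prefixed keys). `serve` and the stored entries are hypothetical; the 404 return is what lets the front end fall through to the Application Server.

```python
def serve(entries: list, request: dict) -> tuple:
    """Linear scan over (template, body) pairs.

    Returns (200, body) for the first template whose elements all match,
    or (404, "") so the front-end server falls through to the app server.
    """
    for template, body in entries:
        if all(key in request and request[key] == value
               for key, value in template.items()):
            return 200, body
    return 404, ""


# Hypothetical stored entries.
entries = [
    ({"PATH_INFO": "/about"}, "<p>about</p>"),
    ({"PATH_INFO": "/871515.html", "HOST": "four.livejournal.com"}, "<p>post</p>"),
]
print(serve(entries, {"PATH_INFO": "/about", "HOST": "www.livejournal.com"}))
# (200, '<p>about</p>')
print(serve(entries, {"PATH_INFO": "/871515.html", "HOST": "five.livejournal.com"}))
# (404, '')
```

The scan is linear in the number of stored caches. If every template included PATH_INFO, bucketing entries by PATH_INFO first would be one way to avoid scanning everything, though that is an assumption about the workload rather than part of the design.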
Expiring Cache
The Application Server will send POST /_expire?id=1234 to the cache server to expire all caches which are associated with the id 1234.
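Expiring by ID is cheap if the store keeps a reverse index from each database ID to the entries that depend on it. That index is a design choice assumed here, not something the post specifies; `CacheStore`, `handle_expire`, and the sample entries are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


class CacheStore:
    """In-memory store with an (assumed) reverse index from database ID to entries."""

    def __init__(self):
        self.entries = {}               # entry key -> (template, body)
        self.by_id = defaultdict(set)   # database ID -> entry keys
        self._next_key = 0

    def store(self, template: dict, body: str, ids: list) -> int:
        key = self._next_key
        self._next_key += 1
        self.entries[key] = (template, body)
        for record_id in ids:
            self.by_id[record_id].add(key)
        return key

    def expire(self, record_id: str) -> int:
        """Remove every entry that depends on record_id; return how many."""
        removed = 0
        for key in self.by_id.pop(record_id, set()):
            # An entry may already be gone via another of its IDs.
            if self.entries.pop(key, None) is not None:
                removed += 1
        return removed


def handle_expire(cache_store: CacheStore, target: str) -> int:
    """Handle POST /_expire?id=...; repeated id parameters are all expired."""
    ids = parse_qs(urlsplit(target).query).get("id", [])
    return sum(cache_store.expire(record_id) for record_id in ids)


cache = CacheStore()
cache.store({"PATH_INFO": "/871515.html"}, "<p>post</p>", ["1234"])
cache.store({"PATH_INFO": "/calendar"}, "<p>calendar</p>", ["1234", "999"])
cache.store({"PATH_INFO": "/about"}, "<p>about</p>", [])
print(handle_expire(cache, "/_expire?id=1234"))  # 2
print(len(cache.entries))                         # 1
```

Expiring 1234 leaves a stale reference to the calendar entry under 999; the `pop(key, None)` guard makes that harmless, but a long-running server would want to prune such references.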
Storing Cache
The Application Server will send a POST request with the cached output and its associated IDs in the request body. The request's headers, PATH_INFO, and query parameters constitute the request template for that cache. For example,
POST /871515.html HTTP/1.1
Host: four.livejournal.com
Cookie: ljdomsess.four=v1:u1329:s17...
The storage POST should not have any additional headers: its headers are exactly what will be used as the request template to filter GET requests later.
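A sketch of turning a raw storage POST into a template, an ID list, and a body. Two things are left open by the post and assumed here. First, the body format: hypothetically, a first line of comma-separated IDs followed by the cached output. Second, framing headers: HTTP/1.1 needs Content-Length (or Transfer-Encoding) to delimit a POST body, so this sketch skips those instead of treating them as template elements. `parse_store_request` and `FRAMING_HEADERS` are hypothetical names, and the ID 12345 in the usage is illustrative.

```python
from urllib.parse import parse_qsl, urlsplit

# Headers that frame the POST body rather than describe the cached page.
# Assumption: these are excluded from the template.
FRAMING_HEADERS = {"content-length", "content-type", "transfer-encoding"}


def parse_store_request(raw: str):
    """Split a raw storage POST into (template, ids, body).

    ASSUMED body format (the text leaves it open): the first line lists the
    dependent database IDs, comma-separated; the rest is the cached output.
    """
    head, _, payload = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ")
    url = urlsplit(target)

    template = {"PATH_INFO": url.path}
    if url.query:
        template["query"] = dict(parse_qsl(url.query))
    for line in header_lines:
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() in FRAMING_HEADERS:
            continue
        if name.lower() == "cookie":
            cookies = {}
            for pair in value.split(";"):
                cookie_name, _, cookie_value = pair.strip().partition("=")
                cookies[cookie_name] = cookie_value
            template["cookies"] = cookies
        else:
            template[name.upper()] = value

    id_line, _, body = payload.partition("\n")
    ids = [i.strip() for i in id_line.split(",") if i.strip()]
    return template, ids, body


raw = (
    "POST /871515.html HTTP/1.1\r\n"
    "Host: four.livejournal.com\r\n"
    "Cookie: ljdomsess.four=v1:u1329:s17...\r\n"
    "Content-Length: 30\r\n"
    "\r\n"
    "871515,12345\n"
    "<html>post</html>"
)
template, ids, body = parse_store_request(raw)
print(ids)   # ['871515', '12345']
print(body)  # <html>post</html>
```

The resulting template has the same shape as the request template shown earlier (PATH_INFO, HOST, and a nested cookies dict), so the stored entry can be matched against later GET requests directly.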
The cache would then be stored in memory, like memcached.
The key difference between this and memcached is filtering requests by request template. In fact, I might use memcached as the storage back end (although that's probably more overhead than it's worth). The difference between this and Squid is, well, Squid doesn't do this kind of ID-based expiration or template filtering at all (I think?).
suggestions? objections?