Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Brigades and HTTP 1.1 in an Apache module

Jerry Stratton, June 22, 2008

Still taking baby steps on my journey to an Apache module. This version will add two things, one easy and one hard: it will switch from requesting via an IP address to requesting via a hostname, and it will switch to HTTP 1.1 instead of HTTP 1.0. Why? Currently, the module uses the IP address of the authentication server. That’s okay if the authentication server is a standalone server with only one hostname (or where the hostname that the IP address refers to is the default hostname for the server) but it won’t work with virtual hosts, nor will it work with servers whose IP address is dynamic.

Given the application, that may not be a problem (you wouldn’t want to run an authentication server on a shared host) but if the module is being used more generally to provide an XML response, not being able to query virtual hosts is a big drawback.

Hostname instead of IP address

The current version of the module uses the IP address in one place. The new version will use the hostname in more than one place.

Go ahead and grab “3 mod_external_auth.c” from the source archive and save it as mod_external_auth.c.

First, using the hostname instead of the IP address to connect to the remote server is easy: just replace it. Apache’s APR functions will do a DNS lookup automatically if we give apr_sockaddr_info_get a hostname instead of an IP address. Modern servers cache DNS lookups locally, so there should be very little performance hit doing a DNS lookup instead of hard-coding to the current IP address (and requiring a recompile every time the IP address changes).

At the top of the source file, add “char *authHost = "www.hoboes.com";” (or use your own test server, preferably, since there’s no telling what kind of response you’ll get from my test server).

Then, replace your IP address (216.92.252.156 in the example) with authHost. The new apr_sockaddr_info_get should be:

  • if ((status = apr_sockaddr_info_get(&sockaddr, authHost, APR_INET, 80, 0, request->pool)) != APR_SUCCESS) {

Recompile, and the module should continue working exactly as before. (If you’ve been playing around with the authentication response on the other end, make sure you have it set to “let you in” now.)

HTTP 1.1

Because the module doesn’t currently specify HTTP 1.1, the server assumes HTTP 1.0. This makes programming a whole lot easier: we get the server’s response as one big chunk and we don’t need to bother with http headers. Unfortunately, if we want to be able to handle virtual hosts, the module needs to be an HTTP 1.1 client. The server knows which host should respond to the module’s query only if the module provides the HTTP 1.1 header “Host: ”.

That means the module must provide a couple of headers in its request, and it means the module must handle chunked responses collected over more than one apr_bucket_read.

HTTP 1.1 headers

To send a Host header, the request needs to declare HTTP 1.1 on the request line and carry three headers: Host, Content-Length, and Connection (we just want the connection to close and be done with; we don’t want a persistent connection). The request is going to look like this:

  • GET /authorization.php?ip=xxx.xxx.xxx.xxx&page=/wherever/somepage.html HTTP/1.1
  • Host: www.hoboes.com
  • Content-Length: 0
  • Connection: close

There are visible and invisible parts to this request. The visible parts are the headers; the invisible parts are the line endings between them. Each line should end with a carriage return and a new line (CRLF), and the last line needs to be followed by two sets of those. The module will also need to look for that doubled CRLF when it searches the response for the end of the headers. Apache defines CRLF for us, and we can concatenate two of them to create the header ender:

  • #define HEADEREND CRLF CRLF

The line that creates the authRequest gets a lot bigger:

  • authRequest = apr_pstrcat(request->pool, "GET /authorization.php?ip=", remote_ip, "&page=", uri, " HTTP/1.1", CRLF, "Host: ", authHost, CRLF, "Content-Length: 0", CRLF, "Connection: close", HEADEREND, NULL);
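As a rough standalone illustration of what that apr_pstrcat call produces, here is a sketch in plain C that assembles the same request with snprintf instead of APR. The buildRequest name, the caller-supplied buffer and the sample values are assumptions for illustration only; the module itself keeps using apr_pstrcat and the request pool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CRLF "\r\n"
#define HEADEREND CRLF CRLF

/* Hypothetical helper: writes an HTTP 1.1 GET request into buffer.
   Returns the length of the request, or -1 if the buffer is too small. */
static int buildRequest(char *buffer, size_t size, const char *host,
                        const char *remote_ip, const char *uri) {
    int length;

    assert(buffer != NULL);
    /* adjacent string literals are joined by the compiler, so CRLF and
       HEADEREND can sit directly inside the format string */
    length = snprintf(buffer, size,
        "GET /authorization.php?ip=%s&page=%s HTTP/1.1" CRLF
        "Host: %s" CRLF
        "Content-Length: 0" CRLF
        "Connection: close" HEADEREND,
        remote_ip, uri, host);
    if (length < 0 || (size_t) length >= size) {
        return -1;
    }
    return length;
}
```

Calling buildRequest(request, sizeof(request), "www.hoboes.com", ip, page) with a large enough buffer yields a request whose last four bytes are the doubled CRLF that ends the headers.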

More than one read

That’s not enough, though, because as soon as you switch to HTTP 1.1, the data needs to be read in pieces. That’s going to get a little complicated, so I’m going to break it into its own function, readResponse():

  • //read from an existing connection into a bucket brigade and then flatten it into xml_response
  • static int readResponse(request_rec *request, apr_socket_t *sock, const char **xml_response, apr_size_t *xml_length) {
    • int connected;
    • apr_size_t runningLength;
    • apr_status_t status = APR_SUCCESS;
    • apr_bucket *xmlBucket;
    • apr_bucket_brigade *xmlBrigade;
    • xmlBrigade = apr_brigade_create(request->pool, request->connection->bucket_alloc);
    • xmlBucket = apr_bucket_socket_create(sock, request->connection->bucket_alloc);
    • connected = 1;
    • runningLength = 0;
    • while (connected) {
      • if ((status = apr_bucket_read(xmlBucket, xml_response, xml_length, APR_BLOCK_READ)) != APR_SUCCESS) {
        • return logError(status, "receiving response", request);
      • }
      • if (*xml_length <= 0) {
        • apr_bucket_destroy(xmlBucket);
        • connected = 0;
      • } else {
        • runningLength += *xml_length;
        • APR_BRIGADE_INSERT_TAIL(xmlBrigade, xmlBucket);
      • }
      • xmlBucket = apr_bucket_socket_create(sock, request->connection->bucket_alloc);
    • }
    • apr_socket_close(sock);
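    • *xml_length = runningLength;
    • //leave room for a terminator: apr_brigade_flatten does not add one, and the parsing code uses string functions
    • *xml_response = malloc(runningLength + 1);
    • if (*xml_response == NULL) {
      • return APR_ENOMEM;
    • }
    • if ((status = apr_brigade_flatten(xmlBrigade, (char *) *xml_response, &runningLength)) != APR_SUCCESS) {
      • logError(status, "flattening response", request);
    • }
    • ((char *) *xml_response)[runningLength] = '\0';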
    • apr_brigade_destroy(xmlBrigade);
    • return status;
  • }

That function will read anything up until the server stops responding. (If you don’t trust the server on the other end, you will want to put a limit both on the amount of data you’re receiving and on the amount of time it takes to receive it.)

In the getAuthentication function, replace the apr_bucket_socket_create and apr_bucket_read sections with:

  • if ((status = readResponse(request, sock, xml_response, xml_length)) != APR_SUCCESS) {
    • return logError(status, "receiving XML response", request);
  • }

The readResponse function will, instead of reading the entire response into one bucket as in the previous module, read the remote server’s response into multiple buckets, building up a brigade of buckets. When it’s done, it flattens the brigade into the xml_response string.
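The collect-then-flatten idea can be tried outside of Apache. The following is only a sketch in plain C, with an array of pieces standing in for the brigade; flattenPieces, its parameters and the maxTotal cap are hypothetical names, not part of APR. It also shows one way to cap the amount of data accepted from a server you don’t trust.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a brigade: an array of pieces and their lengths.
   Joins them into one malloc'd, NUL-terminated string and stores the data
   length in *totalLength. Returns NULL if the total would exceed maxTotal
   or if memory runs out. */
static char *flattenPieces(const char **pieces, const size_t *lengths,
                           size_t count, size_t maxTotal, size_t *totalLength) {
    size_t total = 0;
    size_t offset = 0;
    size_t i;
    char *flat;

    assert(pieces != NULL && lengths != NULL && totalLength != NULL);
    for (i = 0; i < count; i++) {
        /* refuse oversized responses before allocating anything */
        if (lengths[i] > maxTotal - total) {
            return NULL;
        }
        total += lengths[i];
    }
    /* one extra byte so that strstr and strlen are safe on the result */
    flat = malloc(total + 1);
    if (flat == NULL) {
        return NULL;
    }
    for (i = 0; i < count; i++) {
        memcpy(flat + offset, pieces[i], lengths[i]);
        offset += lengths[i];
    }
    flat[total] = '\0';
    *totalLength = total;
    return flat;
}
```

The caller owns the returned string and frees it when done; inside a module, allocating from the request pool would be the more natural choice.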

Looking at variables

This is still going to fail. It will fail for one to three reasons: first, it might fail because of a bug in the code; second, it will fail because the new HTTP 1.1 response includes headers, and they won’t parse as XML; finally, it might fail if the response is a chunked response, because the chunk sizes are not going to parse as XML either.

The module needs a way of showing off variables and their values, so that we can debug it more easily. Underneath the logError function, add a logNotice function:

  • static void logNotice(char *title, char *message, request_rec *request) {
    • ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, request, "%s: %s", title, message);
  • }

If readResponse does not return APR_SUCCESS, put a notification of the value of authRequest in Apache’s error log:

  • if ((status = readResponse(request, sock, xml_response, xml_length)) != APR_SUCCESS) {
    • logNotice("authRequest", authRequest, request);
    • return logError(status, "receiving XML response", request);
  • }

If the problem was due to an error in the HTTP 1.1 request, this will log the value of authRequest to Apache’s error log.

Also, in the parseAuthentication function, if it’s going to return an error, log the value of the xml first:

  • } else {
    • logNotice("xml", (char *)xml, request);
    • return logError(status, "feeding xml", request);
  • }

Before going any further, you’ll want to debug the module until the error occurs during feeding and the XML response that is logged to the error_log is what you’re expecting (including HTTP headers and possibly with a set of numbers put in between chunks).

Getting past the headers

There are two steps to parsing an HTTP 1.1 response. The first is to get past the headers and into the body; and the second is to (if necessary) remove the chunking sizes from the body. Unfortunately, my experience is that the remote server’s chunks don’t correspond to the apr_bucket_read iterations.

Getting past the headers is pretty easy. Just look for the first doubled CRLF. Here’s a parseResponse that only gets past the headers (so it will not work with chunked responses).

  • //get past the headers of an HTTP 1.1 response
  • static int parseResponse(const char **xml_response, apr_size_t *xml_length, request_rec *request) {
    • apr_status_t status = APR_SUCCESS;
    • //get past the headers
    • const char *body = strstr(*xml_response, HEADEREND);
    • if (body == NULL) {
      • //no blank line, so no complete set of headers
      • return HTTP_SERVICE_UNAVAILABLE;
    • }
    • *xml_response = body+strlen(HEADEREND);
    • *xml_length = strlen(*xml_response);
    • return status;
  • }

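I’m assuming here that the flattened response is a terminated string. apr_brigade_flatten does not add the terminator itself, so readResponse needs to allocate one extra byte and write a zero after the data.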

In getAuthentication, add a parseResponse call after the readResponse call:

  • if ((status = parseResponse(xml_response, xml_length, request)) != APR_SUCCESS) {
    • return logError(status, "unchunking XML response", request);
  • }

At this point, if you are not getting chunked responses from your server, it will be working again. You can tell if you’re getting chunked responses by the presence of the header “Transfer-Encoding” and the value “chunked”.
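That check can be sketched in portable C. This is only an illustration: isChunked and sameIgnoringCase are hypothetical helpers, and the sketch assumes the header is written with exactly one space after the colon. It limits the search to the header section, so the same text appearing in the body does not count.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define CRLF "\r\n"
#define HEADEREND CRLF CRLF

/* Compares the first length characters of a and b, ignoring case. */
static int sameIgnoringCase(const char *a, const char *b, size_t length) {
    size_t i;
    for (i = 0; i < length; i++) {
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i])) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: returns 1 if the headers (not the body) of a
   NUL-terminated response contain "Transfer-Encoding: chunked", else 0. */
static int isChunked(const char *response) {
    const char *wanted = CRLF "transfer-encoding: chunked" CRLF;
    size_t wantedLength = strlen(wanted);
    const char *headerEnd;
    const char *position;

    assert(response != NULL);
    headerEnd = strstr(response, HEADEREND);
    if (headerEnd == NULL) {
        return 0;
    }
    /* keep the first CRLF of the blank line so the final header can match */
    headerEnd += strlen(CRLF);
    for (position = response; position + wantedLength <= headerEnd; position++) {
        if (sameIgnoringCase(position, wanted, wantedLength)) {
            return 1;
        }
    }
    return 0;
}
```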

Parsing HTTP 1.1 chunks

When an HTTP 1.1 server decides it needs to chunk its response, it will add a Transfer-Encoding header and set it to chunked. It will send the headers intact, and then send the body as a series of chunks. Each chunk consists of a hexadecimal number on its own line, followed by the chunk of text and a closing CRLF. The number is the length of the chunk in bytes. The last number is always zero: a zero means that there are no more chunks.
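For example, the nine-character body “Wikipedia” sent as two chunks arrives as 4, CRLF, Wiki, CRLF, 5, CRLF, pedia, CRLF, 0, CRLF, CRLF. As a small sketch of reading one of those size lines, the standard strtoul function can do the hexadecimal conversion; readChunkSize is a hypothetical name, and the strictness checks around strtoul are my own assumptions about what a careful client would want to reject.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: reads the hexadecimal size at the start of a
   NUL-terminated chunk size line. Returns 0 and stores the size in *size
   on success. Returns -1 if the line does not start with a hex digit, if
   the number is out of range, or if it is not followed by CR or by ';'
   (which introduces an optional chunk extension). */
static int readChunkSize(const char *line, unsigned long *size) {
    char *end;

    assert(line != NULL && size != NULL);
    /* strtoul alone would accept leading spaces, a sign, or a 0x prefix */
    if (!isxdigit((unsigned char) line[0])) {
        return -1;
    }
    if (line[0] == '0' && (line[1] == 'x' || line[1] == 'X')) {
        return -1;
    }
    errno = 0;
    *size = strtoul(line, &end, 16);
    if (errno == ERANGE) {
        return -1;
    }
    if (*end != '\r' && *end != ';') {
        return -1;
    }
    return 0;
}
```

Because strtoul accepts both upper and lower case hex digits, a size such as 1A2 and a size such as 1a2 give the same result.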

You should be able to see this format in the error log, since the xml parser won’t be able to parse these chunks and will call logNotice before exiting. If you are not getting chunked responses, you’re going to want to force the issue: whatever you’re using as a response, make it bigger until the beginning of the response is a number. For example, I put the Gettysburg address into one of the XML fields and duplicated it seven times.

First, in order to convert hex to integer, the pow() function is useful. So include math.h at the top of the file:

  • #include <math.h>

Then, replace the parseResponse function with one that checks for chunking:

  • //unchunk an HTTP 1.1 response if necessary
  • static int parseResponse(const char **xml_response, apr_size_t *xml_length, request_rec *request) {
    • //remember the start of headers
    • const char *headerStart = *xml_response;
    • apr_status_t status = APR_SUCCESS;
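    • //get past the headers to the beginning of the body (and the first chunk size)
    • char *chunkSizeStart = strstr(*xml_response, HEADEREND);
    • if (chunkSizeStart == NULL) {
      • //no blank line, so no complete set of headers
      • return HTTP_SERVICE_UNAVAILABLE;
    • }
    • chunkSizeStart += strlen(HEADEREND);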
    • //look for the presence of chunking
    • //this only works because we expect short bodies with well-formed XML
    • //HTTP headers are not case sensitive
    • if (strcasestr(headerStart, CRLF "transfer-encoding: chunked" CRLF)) {
      • char *chunkStart;
      • int chunkLength;
      • int chunkLengthCounter;
      • char *chunkLengthPointer;
      • int chunkLengthPart;
      • //the good data will be less than the size of headers+body+chunksizes
      • char *goodResponse = malloc(*xml_length);
      • int fullLength = 0;
      • do {
        • //get the chunk length: all chunk lengths begin and end with CRLF
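        • chunkStart = strstr(chunkSizeStart, CRLF);
        • if (chunkStart == NULL) {
          • //truncated response: no line ending after the chunk size
          • return HTTP_SERVICE_UNAVAILABLE;
        • }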
        • //work the chunk size hex string backwards to convert from hex to integer
        • chunkLengthPointer = chunkStart-1;
        • chunkLength = 0;
        • chunkLengthCounter = 0;
        • while (chunkLengthPointer >= chunkSizeStart) {
          • chunkLengthPart = *chunkLengthPointer;
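          • if (chunkLengthPart >= 97 && chunkLengthPart <= 102) {
            • //letter a - f
            • chunkLengthPart -= 97;
            • chunkLengthPart += 10;
          • } else if (chunkLengthPart >= 65 && chunkLengthPart <= 70) {
            • //letter A - F: servers may send hex digits in upper case
            • chunkLengthPart -= 65;
            • chunkLengthPart += 10;
          • } else if (chunkLengthPart >=48 && chunkLengthPart <=57) {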
            • //number 0 - 9
            • chunkLengthPart -= 48;
          • } else {
            • logInteger("bad hex character", chunkLengthPart, request);
            • return HTTP_SERVICE_UNAVAILABLE;
          • }
          • chunkLength += chunkLengthPart*pow(16, chunkLengthCounter);
          • chunkLengthCounter++;
          • chunkLengthPointer -= 1;
        • }
        • //copy the chunk to the good data
        • if (chunkLength>0) {
          • //logInteger("chunk length", chunkLength, request);
          • chunkStart += strlen(CRLF);
          • //logNotice("chunk", chunkStart, request);
          • strncpy(goodResponse+fullLength, chunkStart, chunkLength);
          • fullLength += chunkLength;
          • //the next chunk size string starts after this chunk and the CRLF that closes it
          • chunkSizeStart = chunkStart + chunkLength + strlen(CRLF);
        • }
      • } while (chunkLength > 0);
      • *xml_length = fullLength;
      • *xml_response = goodResponse;
      • //logInteger("unchunked length", fullLength, request);
    • } else {
      • *xml_response = chunkSizeStart;
      • *xml_length = strlen(*xml_response);
      • //logInteger("no need to unchunk length", *xml_length, request);
    • }
    • return status;
  • }

That should be it. If there is chunking involved, this function will find each chunk, get its size from the hexadecimal string preceding it, and append it to a clean string (goodResponse).
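To experiment with the unchunking logic outside of Apache, here is a standalone sketch of the same loop in plain C. It is not the module’s code: unchunk is a hypothetical function, it reads the hex digits left to right rather than backwards, it does not handle chunk extensions or trailers, and the eight-digit limit on sizes is an arbitrary assumption. What it adds is a bounds check before every copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CRLF "\r\n"

/* Hypothetical standalone unchunker. body points at the first chunk size
   and bodyLength is the number of bytes available. Returns a malloc'd,
   NUL-terminated copy of the chunk data and stores its length in
   *cleanLength, or returns NULL on malformed or truncated input. */
static char *unchunk(const char *body, size_t bodyLength, size_t *cleanLength) {
    const char *position = body;
    const char *end = body + bodyLength;
    size_t fullLength = 0;
    char *clean;

    assert(body != NULL && cleanLength != NULL);
    /* the clean data is always shorter than data plus sizes */
    clean = malloc(bodyLength + 1);
    if (clean == NULL) {
        return NULL;
    }
    for (;;) {
        unsigned long chunkLength = 0;
        int digits = 0;

        /* convert the hex size one digit at a time */
        while (position < end) {
            int c = (unsigned char) *position;
            int value;
            if (c >= '0' && c <= '9') {
                value = c - '0';
            } else if (c >= 'a' && c <= 'f') {
                value = c - 'a' + 10;
            } else if (c >= 'A' && c <= 'F') {
                value = c - 'A' + 10;
            } else {
                break;
            }
            if (digits == 8) {
                goto fail; /* assumed limit: longer sizes are treated as bad */
            }
            chunkLength = chunkLength * 16 + (unsigned long) value;
            digits++;
            position++;
        }
        /* the size must exist and be followed by CRLF */
        if (digits == 0 || end - position < 2 || memcmp(position, CRLF, 2) != 0) {
            goto fail;
        }
        position += 2;
        if (chunkLength == 0) {
            break; /* a zero size means there are no more chunks */
        }
        /* the chunk and its closing CRLF must fit in what was received */
        if (end - position < 2 || chunkLength > (size_t) (end - position) - 2) {
            goto fail;
        }
        memcpy(clean + fullLength, position, chunkLength);
        fullLength += chunkLength;
        position += chunkLength;
        if (memcmp(position, CRLF, 2) != 0) {
            goto fail;
        }
        position += 2;
    }
    clean[fullLength] = '\0';
    *cleanLength = fullLength;
    return clean;

fail:
    free(clean);
    return NULL;
}
```

The caller frees the returned string. A NULL return is the cue to log the raw response and refuse the request rather than hand a partial body to the XML parser.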

At this point the module should work with any size response and with virtual hosts. There are a couple of places where it trusts the server not to try to inject bad data (by holding the connection open forever, for example). If you create a module that can’t trust the remote server, you’ll want to be very careful to avoid responses monopolizing your local server’s http processes.

Look for this version of the module in “4 mod_external_auth.c” in the archive.

August 15, 2008: Bug in parseResponse fixed

There was a bug in the way that the parseResponse function counted through chunks; under some circumstances it would count wrong. In the process of fixing it, I also made parseResponse return a status code when it runs into problems. It’s always a good idea to not trust even your own servers. Even if they aren’t subject to hacking attempts, error checking helps keep bugs on one server from cascading down through other servers.

Since it was taking me too much time to get around to writing about the change, I just replaced the code in the parent article and in the archive with the new, good code.
