dxpc design by Brian Pane =========== SECTION 1: Data Transfer ======================== Multiplexer class ----------------- each dxpc process contains one instance of a Multiplexer subclass: ClientMultiplexer for a dxpc client or ServerMultiplexer for a dxpc server. Multiplexer is an abstract class that manages all of the X connections. When the main loop of dxpc finds out (from select(2)) that there is activity on some file descriptor, it calls the Multiplexer's handleSelect() method to do the appropriate processing. Channel class ------------- A Multiplexer contains a set of Channel objects: "Channel* channels_[MAX_CONNECTIONS]." Channel is an abstract class representing one end of an X connection. It has two very important abstract methods: doRead() to read data from a file descriptor and compress it, and doWrite() to take a compressed message and uncompress it and write it to the file descriptor. It has two concrete subclasses: * ClientChannel, used in the dxpc client, manages a connection to a real X client application. Thus its doRead() method reads request messages sent from the X application and compresses them, and its doWrite() method takes compressed X event/reply/error messages from the dxpc server, uncompresses them, and sends them to the X application. * ServerChannel, used in the dxpc server, manages a single connection to the real X server . Thus its doRead() method reads X event/reply/error messages from the real X server and compresses them, and its doWrite() method takes compressed X request messages from the dxpc client, uncompresses them, and sends them to the real X server. Connection setup ---------------- Each time an X application connects to the dxpc client, the main select loop calls the ClientMultiplexer's createNewConnection() method. This method assigns a unique _channel_number_ to the connection and creates a ClientChannel object to talk to the X application. It then sends a message to the dxpc server to indicate that a new channel has been created. The ServerMultiplexer in the dxpc server opens a socket connection to the real X server and creates a ServerChannel object to talk to this socket. Within the ServerMultiplexer, this new ServerChannel is indexed with the same channel number used for the corresponding ClientChannel in the dxpc client. Reading of data --------------- Once the connection gets set up, the X application sends an initialization message, as defined in the X protocol, to the dxpc client. The main select loop of the dxpc client calls the Multiplexer's handleSelect() method, which maps the file descriptor to the corresponding channel number using the fdToChannelID() method. It then finds the ClientChannel object for this channel and calls its doRead() method. It passes an EncodeBuffer object (described below) as an argument to doRead(). The ClientChannel's implementation of doRead() reads all of the available X messages from the file descriptor. It compresses these messages and appends the compressed versions to the supplied EncodeBuffer. After the doRead() call completes, the Multiplexer's handleSelect() method writes the contents of the EncodeBuffer to the socket that connects the client proxy to the server proxy. It prepends a length field to this block of messages so that the server proxy can tell when it has received the whole thing. (This makes decoding of the message much earlier.) When the dxpc server receives the data from the dxpc client, its main select loop calls the ServerMultiplexer's handleSelect() method, which figures out which ServerChannel corresponds to this X connection and calls that ServerChannel's doWrite() method. This method decompresses the message(s) and writes the uncompressed form to its socket connected to the real X server. When the X server sends messages (replies, events, or errors) to the dxpc server, this whole process happens in reverse. The ServerChannel's doRead() method reads X messages and compresses them. The dxpc server's ServerMultiplexer object prepends a length field to the compressed message block and sends it to the dxpc client, where the ClientMultiplexer dispatches the data to the right ClientChannel. The ClientChannel uncompresses the data and writes it to the socket connected to the real X client application. ReadBuffer class ---------------- ReadBuffer and its concrete subclasses ClientReadBuffer, ServerReadBuffer, and ProxyReadBuffer are responsible for buffering received data from a socket until a complete message is received. Thus the implementation in the various Multiplexer and Channel classes can just call a ReadBuffer's getMessage() method to get a complete message; compression/uncompression is much easier if you know that you've got the entire message. The internals of the ReadBuffer subclasses do all of the work of figuring out where messages start and end based on the protocol spec. EncodeBuffer and DecodeBuffer classes ------------------------------------- dxpc's compression routines take fields that are 8, 16, or 32 bits long and crunch them into compressed fields that are 'n' bits long, where 'n' is seldom a multiple of 8. To keep the main compression methods free of ugly bit manipulation code, the EncodeBuffer class provides an interface that lets you appends arbitrary-length bit strings to a buffer. The ClientChannel and ServerChannel doRead() methods call the EncodeBuffer's encodeValue() method to generate the compressed form of the X messages. The Multiplexer class rounds up the contents of the resulting bit string to an integral number of bytes and sends the bytes to the peer dxpc process. Here the EncodeBuffer provides one other convenience. Recall that the Multiplexer object has to prepend a length header to the data that it transmits. The EncodeBuffer reserves a few spare bytes before the start of its actual data so that the Multiplexer can fill i the header byte(s). See the comments near the end of Multiplexer::handleSelect() for details. On the receiving side, the DecodeBuffer class provides a wrapper for the received byte string that allows arbitrary- length bit fields to be extracted easily. Its operations mirror those of the EncodeBuffer. The EncodeBuffer and DecodeBuffer classes also contain convenience functions that work with dxpc's caches (described below). Section 2: Compression ====================== Compression Techniques ---------------------- dxpc compresses X messages in six ways: 1. It transmits only the actual data fields. Many X messages, particularly the events generated by the server, contain bytes that aren't used for anything other than padding the message to a fixed length or filling holes in the message that result from the alignment restrictions for 16- and 32-bit fields. 2. For fields with very few possible values, dxpc uses the smallest possible number of bits. For example, some messages contain fields whose values are Boolean. In the X protocol, a Boolean field is one byte long. dxpc, however, transmits this as a single bit value: 0 for False or 1 for True. Some messages contain fields that hold enumerated data types. Look at the "format" field in the X_PutImage request, for example. It is an 8-bit field that can only have three possible values: 0, 1, or 2. dxpc transmits this value using 2 bits (the minimum number of bits needed to represent the values 0-2). The encodeValue() method of class EncodeBuffer lets you write the low order 'n' bits of a value to the output stream. And the decodeValue() method of class DecodeBuffer lets you read the low-order 'n' bits of a value from the input stream. 3. For fields that can have many values but often have small values, dxpc uses encodings biased toward small numbers. For example, look at the X_QueryExtension request message. It contains a "length of name" field that is 16 bits long-- enough to hold a value up to 65535. In practice, extensions never have names this long. They have short names like "LBX" and "MIT-SHM". The lengths of these names could be encoded in a small number of bits, using technique 2 above, but dxpc has to be able to handle cases where the name is something really long that requires far more bits (as many as 16) to represent the length. For fields with this property (you sometimes need a lot of bits to represent the value, but usually the value is very small), dxpc uses a "block encoding": it writes the low-order 'n' bits to the output stream. If all of the remaining bits are equal to the 'n'th bit, it writes a 1 bit to the output stream and stops. If not, it writes a 0 bit to the output stream and repeats this process with the next 'n' bits until either all of the bits in the value have been encoded or all of the remaining bits are equal to the last bit encoded. (Note that this technique works well for both small positive numbers, whose binary encoding has a lot of leading zeros, and small negative numbers, whose binary encoding has a lot of leading ones.) The encodeValue() method of class EncodeBuffer actually looks like this: void encodeValue(unsigned int value, unsigned int numBits, unsigned int blockSize = 0); If blockSize is zero, decodeValue() degenerates to technique 2, as described above. Otherwise, it implements the encoding of technique 3 with 'n' equal to the specified 'blockSize.' DecodeBuffer::decodeValue() works the same way. Look at the code in ClientChannel.C that compresses X_QueryExtension requests. It contains: encodeBuffer.encodeValue(nameLength, 16, 6); which means, "encode the low order 16 bits of this value, but write it in 6-bit blocks and stop when the remaining bits are all ones or all zeros. 4. This is the most important and most complex compression technique used in dxpc. For many message types, each field has a high probability of being equal to the value of the same field in some previous message of that type. For example, every "geometry" request (request that draws lines, arcs, rectangles, text, etc) contains a Graphics Context (GC) value as one of its fields. A GC is a 32-bit value, though the X Protocol specification guarantees that the top 3 bits are zeros. Thus there can be about five hundred million GCs. Most programs, however, use only a few GCs. Thus the value of the GC field in a geometry request is usually equal to the value of the GC field in some previous geometry request. This property applies to many other types of X data in addition to GCs: fonts, colors, Window IDs, and even coordinates. For each field that has this property, dxpc uses a cache object to keep track of the field's recent values. Each Channel object (ClientChannel and ServerChannel) contains both a ClientCache that contains cache objects for fields in X-client-generated messages and a ServerCache that contains cache objects for fields in X-server-generated messages. There are two types of cache objects. Class IntCache holds an arbitrary-sized cache of integers, so it can hold 32-bit values. Class CharCache holds a cache of exactly 7 characters, so it can hold 8-bit values. CharCache is used mostly in text compression, as described below. These two classes have basically the same interface (they once were subclasses of an abstract base class Cache, but that added 50% to the size of each CharCache object, to I eliminated it). A cache object implements a move-to-front list just like those in FHBX. If you haven't read the FHBX paper yet, go read it now; it describes the concept much better than I could. There is, fortunately, an easy way to work with these caches: class EncodeBuffer has an encodeCachedValue() method that takes a value, looks for it a specified cache, inserts it in the cache if it's not there or moves it toward the front of the cache if it's already present, and writes an encoding to the output stream. If the value was already in the cache, the encoding is 'n' + 1 bits long, where 'n' is the value's distance from the front of the cache. If the value is at the front of the cache, the encoding is only one bit long. If the value is at index 1 in the cache, the encoding is 2 bits long. These compact encodings are what enable dxpc to achieve very high compression ratios. If the value is not in the cache already, EncodeBuffer::encodeCachedValue() encodes a special "escape" sequence, followed by the value itself. However, if it has to do this, it tries to minimize the number of bits needed to encode the value by actually sending the difference between the value and the last value that it had to insert into the cache (often the difference is a much smaller number than the value itself) and using the "block" encoding of technique 3, as described above. The EncodeBuffer::decodeCachedValue() class does the corresponding uncompression. (Sorry if this description is a bit unclear. It's actually even more complicated if you read the code, since there's a really ugly mechanism for guessing the optimal block size for escaped values. The good news is that EncodeBuffer::encodeCachedValue() and DecodeBuffer::decodeCachedValue() hide all of this complexity. The compression/decompression code just does a single function call per field.) 5. For some fields, there are strong locality patterns that don't quite fit the above techniques. Consider, for example, the sequence numbers contained in all reply, event, and error messages generated by the X server. These numbers keep increasing, until they wrap around at 65536, so the cache-based approach of technique 4 won't work well (every cache lookup will be a cache miss). However, if you look at difference between the current sequence number and the last sequence number, there is a good chance that it will be a very small number. And if you look at the pattern of these deltas, you'll often see the same small numbers (like 1) appearing over and over. Thus, dxpc computes the delta and encodes that using a cache of the last 6 deltas. This technique of *preconditioning* field values is sometimes very effective at transforming values into something that can be compressed using one of the previous four techniques. 6. Some message types contain huge blocks of data that have the same value every time. For example, most X client applications query the server for all X resources when they start up. It sends back a copy of its entire X resource database. dxpc caches this on the client side, so that it doesn't have to be retransmitted for subsequent clients unless it changes. The BlockCache and BlockCacheSet classes provide the supporting infrastructure. Look at the code for the GetProperty and QueryFont requests for some examples. Text Compression ---------------- dxpc's text comprssor is moderately effective. The class TextCompressor uses an array of 8192 CharCache objects to hold information about what characters usually follow what other characters. Its encodeChar() method hashes together the last three characters to form an index into this table, which contains a move-to-front cache that is used to encode the next character using technique 4. There are other text compression algorithms that probably would work much better. Image Compression ----------------- dxpc compresses images differently, depending upon their type: * For monochrome images with width-to-height ratios greater than 10 and heights less than or equal to 32, it transposes the images from row-major to column-major representation and applies a variant of the text-compression algorithm using 'h'-bit "characters," where 'h' is the image height. (Why would it do something so strange? Well... FrameMaker draws bitmapped text one line at a time, and this algorithm compresses the bitmapped text much more effectively than anything else I could come up with.) * For other monochrome images, it uses a variant of Group 4 FAX encoding. * For 8-bit color images, it uses a CharCache to hold previous pixel values and encodes them using technique 4. * For anything else, it sends the whole image, uncompressed. Framework --------- All X messages share certain common properties. For example, all messages generated by the X server have a 16-bit sequence number in the 3rd and 4th bytes. In dxpc, the doRead() and doWrite() methods of ClientChannel and ServerChannel() take advantage of this by processing all of the common parts and then executing a switch statement to find the compression/decompression code for the specific message type being processed. (There are also default handlers that just copy the bytes and don't do any compression or uncompression, in support of message types that dxpc doesn't know how to compress yet.) As a result, it is fairly easy to add compression for some message type that dxpc doesn't compress yet, or to improve the compression for some message type that it already compresses. The following sections describe how to add compression for new message types. Deciding What to Compress ------------------------- As the first step in improving dxpc, run it with the "-s2" option and then run some X application that you want to speed up. When the application terminates, dxpc prints out a histogram showing how many bits of the total X traffic came from each message type, and how well each message type was compressed. This will help you to decide which message types to work on. Compression of Request Messages ------------------------------- Look at the doRead() method in ClientChannel.C. It contains a big switch statement with a case for every request message type that dxpc knows how to compress. (A default case at the end handles any request message type that it doesn't know how to compress; these remaining message types are just transmitted without compression.) Here's how to add compression for some request type that currently isn't compressed: 1. Add a case for it to the switch statement. (the
include file #define's all of the X_[message-type] values). 2. Don't worry about the request type (first byte of the message); the code just before the switch statement has handled this already. 3. If the message type is variable-length, you have to encode the length somehow. Often, variable-length messages contain a list of fixed-size objects. For messages like this, it's most efficient to encode the number of objects, rather than the length of the message in bytes, since the number of objects is a smaller value that takes fewer bits to represent. See the code for X_PolyFillRectangle for an example. 4. Once you've encoded the length (if needed), then encode the values of the request message's fields. See the list of 6 techniques above, and pick the one that works best for each field. If you don't know what patterns or locality properties each field exhibits, add some code to print out the field values, and run a few different X applications to see what kind of patterns emerge. [WARNING: if you're printing to stdout from dxpc, it's generally a good idea to redirect stdout to a file. In certain places in the dxpc code, the X server may be blocked on a write operation (waiting for dxpc to consume data). If dxpc tries to print to stdout, and stdout is in an xterm window, dxpc and the X server will deadlock.] 5. If you need new caches, or "last-value-of-this-field" variables, add them to the ClientCache class. 6. Use the methods of the encodeBuffer object to encode your compressed field values. If bytes in the request message format are unused, don't bother encoding them. 7. When in doubt, read the code for some similar message type that dxpc already compresses. 8. Next, add decompression code to the server. The doWrite() method in ServerChannel.C contains another big switch statement. Add a case for the new message type. 9. At the start of the case, figure out how big the uncomrpressed message should be, either by hardcoding it (if the message is fixed- size) or by reading and interpreting the length information encoded by the dxpc client (if the message is variable-size). 10. Set the value of the local variable outputLength to the uncompressed message size. 11. Allocate a buffer into which you can write the uncompressed message by calling: "outputMessage = writeBuffer_.addMessage(outputLength);" 12. Don't bother setting the message type (first byte) or length (bytes 3 and 4) in the output message; the code after the switch statement will do so automatically. 13. For each field that you've encoded, decode it and write the value to the right byte(s) in the output message buffer. There are some local variables value and cValue for use as temporaries, just to keep you from having to declare them repeatedly in every case. See the existing cases in the switch statement for examples of how to call the decodeBuffer methods and how to store the results in the output message. 14. Compile and test the changes. For debugging, there's a DumpMessage() function declared in util.H that will display the byte values in a block of memory. Use it to dump the original message on the client side and the uncompressed message on the server side to verify that they're the same (note that bytes defined as "unused" in the protocol spec will have random values on the server side, though). Compression of Reply Messages ----------------------------- Some request messages cause the server to send a reply message. To compress the reply message: 1. Implement compression of the request message, as described above. 2. At the end of the case block for the request message in ClientChannel.C, push the request's opcode and sequence number onto the sequenceNumQueue_ object. (Find the line that does this in the case block for X_GetGeometry, and just copy it.) 3. At the end of the case block for the request message in ServerChannel.C, push the request's opcode and sequence number onto the sequenceNumQueue_ object. (Again, steal the code from X_GetGeometry.) 4. In the doRead() method in ServerChannel.C, there are two switch statements: one for replies and one for events. The first one, for replies, is the one to which you can add a case for the new reply message type. 5. In this case block, encode the message in the same manner described for request messages above. If you need to add caches, put them in the ServerCache class. Don't encode the reply type (1st byte) or sequence number (3rd and 4th bytes); they get handled automatically. If the message is variable-length, encode the length as compactly as possible. 6. In the doWrite() method in ClientChannel.C, decode the message, using the same approach you used to decode request messages in ServerChannel::doWrite(). Note: For some request types, it is helpful to be able to store parameters from the request to help in compressing the reply. For example, the X_AllocColor request specifies requested RGB values and the X_AllocColor reply specifies what actual RGB values the X server was able to provide instead. Often, the actual values will be equal to the requested values. Or the actual values will be very close to the requested values, so that it's more efficient to transmit (actual-requested) than (actual). Thus, the sequenceNumQueue_ object lets you pass up to three unsigned ints as additional request data that you can refer to when compressing and uncompressing the reply. See the code for X_AllocColor for examples. Compression of Event Messages ----------------------------- 1. The second case statement in ServerChannel::doRead() has a case for each type of server-generated event message that dxpc knows how to compress. The same approach used for requests and replies is used here: encode each field values using whichever of the 6 techniques is most suitable. Don't encode the event type (1st byte) or the sequence number (3rd & 4th bytes); these are handled automatically. 2. If you need to add caches, put them in the ServerCache class 3. Add the corresponding decoding logic in doWrite() in ClientChannel.C. Compression of Error Messages ----------------------------- 1. The same switch statements in ServerChannel and ClientChannel that handle events also handle errors. The "case 0:" block is the one with the error compression/uncompression logic. Right now this just handles a few common error types, like the error that is generated when an AllocColor request fails because the X server runs out of colors. If you find that other error types are contributing significantly to the X protocol traffic for some important application, you can enhance this code to compress the additional error types.