dxpc design by Brian Pane

SECTION 1: Data Transfer

Multiplexer class
each dxpc process contains one instance of a Multiplexer subclass:
ClientMultiplexer for a dxpc client or ServerMultiplexer for a
dxpc server.  Multiplexer is an abstract class that manages
all of the X connections.  When the main loop of dxpc finds out
(from select(2)) that there is activity on some file descriptor,
it calls the Multiplexer's handleSelect() method to do the
appropriate processing.

Channel class
A Multiplexer contains a set of Channel objects:
"Channel* channels_[MAX_CONNECTIONS]."  Channel is an abstract
class representing one end of an X connection.  It has two
very important abstract methods: doRead() to read data from
a file descriptor and compress it, and doWrite() to take
a compressed message and uncompress it and write it to the
file descriptor.
It has two concrete subclasses:
  * ClientChannel, used in the dxpc client, manages a
    connection to a real X client application.  Thus its
    doRead() method reads request messages sent from the
    X application and compresses them, and its doWrite()
    method takes compressed X event/reply/error messages
    from the dxpc server, uncompresses them, and sends
    them to the X application.
  * ServerChannel, used in the dxpc server, manages a
    single connection to the real X server .  Thus its
    doRead() method reads X event/reply/error messages
    from the real X server and compresses them, and its
    doWrite() method takes compressed X request messages
    from the dxpc client, uncompresses them, and sends
    them to the real X server.

Connection setup
Each time an X application connects to the dxpc client,
the main select loop calls the ClientMultiplexer's
createNewConnection() method.  This method assigns a
unique _channel_number_ to the connection and creates
a ClientChannel object to talk to the X application.
It then sends a message to the dxpc server to indicate
that a new channel has been created.  The ServerMultiplexer
in the dxpc server opens a socket connection to the real
X server and creates a ServerChannel object to talk to
this socket.  Within the ServerMultiplexer, this new
ServerChannel is indexed with the same channel number
used for the corresponding ClientChannel in the dxpc

Reading of data
Once the connection gets set up, the X application sends
an initialization message, as defined in the X protocol,
to the dxpc client.  The main select loop of the dxpc
client calls the Multiplexer's handleSelect() method,
which maps the file descriptor to the corresponding
channel number using the fdToChannelID() method.  It
then finds the ClientChannel object for this channel
and calls its doRead() method.  It passes an EncodeBuffer
object (described below) as an argument to doRead().
The ClientChannel's implementation of doRead() reads
all of the available X messages from the file descriptor.
It compresses these messages and appends the compressed
versions to the supplied EncodeBuffer.  After the doRead()
call completes, the Multiplexer's handleSelect() method
writes the contents of the EncodeBuffer to the socket that
connects the client proxy to the server proxy.  It prepends
a length field to this block of messages so that the server
proxy can tell when it has received the whole thing.  (This
makes decoding of the message much earlier.)

When the dxpc server receives the data from the dxpc client,
its main select loop calls the ServerMultiplexer's
handleSelect() method, which figures out which ServerChannel
corresponds to this X connection and calls that ServerChannel's
doWrite() method.  This method decompresses the message(s) and
writes the uncompressed form to its socket connected to the
real X server.

When the X server sends messages (replies, events, or errors)
to the dxpc server, this whole process happens in reverse.
The ServerChannel's doRead() method reads X messages and
compresses them.  The dxpc server's ServerMultiplexer object
prepends a length field to the compressed message block and
sends it to the dxpc client, where the ClientMultiplexer
dispatches the data to the right ClientChannel.  The
ClientChannel uncompresses the data and writes it to the
socket connected to the real X client application.

ReadBuffer class
ReadBuffer and its concrete subclasses ClientReadBuffer,
ServerReadBuffer, and ProxyReadBuffer are responsible for
buffering received data from a socket until a complete
message is received.  Thus the implementation in the
various Multiplexer and Channel classes can just call
a ReadBuffer's getMessage() method to get a complete
message; compression/uncompression is much easier if
you know that you've got the entire message.  The internals
of the ReadBuffer subclasses do all of the work of figuring
out where messages start and end based on the protocol spec.

EncodeBuffer and DecodeBuffer classes
dxpc's compression routines take fields that are 8, 16, or
32 bits long and crunch them into compressed fields that
are 'n' bits long, where 'n' is seldom a multiple of
8.  To keep the main compression methods free of ugly bit
manipulation code, the EncodeBuffer class provides an
interface that lets you appends arbitrary-length bit
strings to a buffer.  The ClientChannel and ServerChannel
doRead() methods call the EncodeBuffer's encodeValue()
method to generate the compressed form of the X messages.
The Multiplexer class rounds up the contents of the
resulting bit string to an integral number of bytes and
sends the bytes to the peer dxpc process.  Here the
EncodeBuffer provides one other convenience.  Recall
that the Multiplexer object has to prepend a length
header to the data that it transmits.  The EncodeBuffer
reserves a few spare bytes before the start of its
actual data so that the Multiplexer can fill i the header
byte(s).  See the comments near the end of Multiplexer::handleSelect()
for details.

On the receiving side, the DecodeBuffer class provides a
wrapper for the received byte string that allows arbitrary-
length bit fields to be extracted easily.  Its operations mirror
those of the EncodeBuffer.

The EncodeBuffer and DecodeBuffer classes also contain
convenience functions that work with dxpc's caches (described

Section 2: Compression

Compression Techniques

dxpc compresses X messages in six ways:

  1. It transmits only the actual data fields.  Many X messages,
     particularly the events generated by the server, contain bytes
     that aren't used for anything other than padding the message
     to a fixed length or filling holes in the message that result
     from the alignment restrictions for 16- and 32-bit fields.

  2. For fields with very few possible values, dxpc uses the smallest
     possible number of bits.  For example, some messages contain
     fields whose values are Boolean.  In the X protocol, a Boolean
     field is one byte long.  dxpc, however, transmits this as a single
     bit value: 0 for False or 1 for True.  Some messages contain
     fields that hold enumerated data types.  Look at the "format"
     field in the X_PutImage request, for example.  It is
     an 8-bit field that can only have three possible values:
     0, 1, or 2.  dxpc transmits this value using 2 bits (the
     minimum number of bits needed to represent the values 0-2).
     The encodeValue() method of class EncodeBuffer lets you
     write the low order 'n' bits of a value to the output
     stream.  And the decodeValue() method of class DecodeBuffer
     lets you read the low-order 'n' bits of a value from the
     input stream.

  3. For fields that can have many values but often have small
     values, dxpc uses encodings biased toward small numbers.
     For example, look at the X_QueryExtension request message.
     It contains a "length of name" field that is 16 bits long--
     enough to hold a value up to 65535.  In practice, extensions
     never have names this long.  They have short names like
     "LBX" and "MIT-SHM".  The lengths of these names could be
     encoded in a small number of bits, using technique 2 above,
     but dxpc has to be able to handle cases where the name is
     something really long that requires far more bits (as many
     as 16) to represent the length.  For fields with this
     property (you sometimes need a lot of bits to represent
     the value, but usually the value is very small), dxpc
     uses a "block encoding": it writes the low-order 'n' bits
     to the output stream.  If all of the remaining bits are
     equal to the 'n'th bit, it writes a 1 bit to the output
     stream and stops.  If not, it writes a 0 bit to the
     output stream and repeats this process with the next
     'n' bits until either all of the bits in the value have
     been encoded or all of the remaining bits are equal to
     the last bit encoded.  (Note that this technique works
     well for both small positive numbers, whose binary encoding
     has a lot of leading zeros, and small negative numbers,
     whose binary encoding has a lot of leading ones.)
     The encodeValue() method of class EncodeBuffer actually
     looks like this:
            void encodeValue(unsigned int value, unsigned int numBits,
		             unsigned int blockSize = 0);
     If blockSize is zero, decodeValue() degenerates to technique
     2, as described above.  Otherwise, it implements the encoding
     of technique 3 with 'n' equal to the specified 'blockSize.'
     DecodeBuffer::decodeValue() works the same way.
     Look at the code in ClientChannel.C that compresses X_QueryExtension
     requests.  It contains:
            encodeBuffer.encodeValue(nameLength, 16, 6);
     which means, "encode the low order 16 bits of this value, but
     write it in 6-bit blocks and stop when the remaining bits are
     all ones or all zeros.

  4. This is the most important and most complex compression technique
     used in dxpc.  For many message types, each field has a high
     probability of being equal to the value of the same field in
     some previous message of that type.  For example, every "geometry"
     request (request that draws lines, arcs, rectangles, text, etc)
     contains a Graphics Context (GC) value as one of its fields.
     A GC is a 32-bit value, though the X Protocol specification
     guarantees that the top 3 bits are zeros.  Thus there can be
     about five hundred million GCs.  Most programs, however, use
     only a few GCs.  Thus the value of the GC field in a geometry
     request is usually equal to the value of the GC field in some
     previous geometry request.  This property applies to many other
     types of X data in addition to GCs: fonts, colors, Window IDs, and
     even coordinates.

     For each field that has this property, dxpc uses a cache object to
     keep track of the field's recent values.  Each Channel object
     (ClientChannel and ServerChannel) contains both a ClientCache
     that contains cache objects for fields in X-client-generated
     messages and a ServerCache that contains cache objects for fields
     in X-server-generated messages.  There are two types of cache
     objects.  Class IntCache holds an arbitrary-sized cache of
     integers, so it can hold 32-bit values.  Class CharCache holds
     a cache of exactly 7 characters, so it can hold 8-bit values.
     CharCache is used mostly in text compression, as described
     below.  These two classes have basically the same interface
     (they once were subclasses of an abstract base class Cache,
     but that added 50% to the size of each CharCache object, to
     I eliminated it).

     A cache object implements a move-to-front list just like
     those in FHBX.  If you haven't read the FHBX paper yet, go
     read it now; it describes the concept much better than I could.
     There is, fortunately, an easy way to work with these caches:
     class EncodeBuffer has an encodeCachedValue() method that takes
     a value, looks for it a specified cache, inserts it in the cache
     if it's not there or moves it toward the front of the cache if
     it's already present, and writes an encoding to the output
     stream.  If the value was already in the cache, the encoding
     is 'n' + 1 bits long, where 'n' is the value's distance from
     the front of the cache.  If the value is at the front of the
     cache, the encoding is only one bit long.  If the value is at
     index 1 in the cache, the encoding is 2 bits long.  These
     compact encodings are what enable dxpc to achieve very high
     compression ratios.  If the value is not in the cache already,
     EncodeBuffer::encodeCachedValue() encodes a special "escape"
     sequence, followed by the value itself.  However, if it has
     to do this, it tries to minimize the number of bits needed
     to encode the value by actually sending the difference between
     the value and the last value that it had to insert into the
     cache (often the difference is a much smaller number than
     the value itself) and using the "block" encoding of technique
     3, as described above.  The EncodeBuffer::decodeCachedValue()
     class does the corresponding uncompression.

     (Sorry if this description is a bit unclear.  It's actually
     even more complicated if you read the code, since there's
     a really ugly mechanism for guessing the optimal block size for
     escaped values.  The good news is that EncodeBuffer::encodeCachedValue()
     and DecodeBuffer::decodeCachedValue() hide all of this complexity.
     The compression/decompression code just does a single function
     call per field.)

  5. For some fields, there are strong locality patterns that don't quite
     fit the above techniques.  Consider, for example, the sequence numbers
     contained in all reply, event, and error messages generated by the
     X server.  These numbers keep increasing, until they wrap around
     at 65536, so the cache-based approach of technique 4 won't work
     well (every cache lookup will be a cache miss).  However, if you
     look at difference between the current sequence number and the
     last sequence number, there is a good chance that it will be a
     very small number.  And if you look at the pattern of these
     deltas, you'll often see the same small numbers (like 1) appearing
     over and over.  Thus, dxpc computes the delta and encodes that
     using a cache of the last 6 deltas.

     This technique of *preconditioning* field values is sometimes very
     effective at transforming values into something that can be compressed
     using one of the previous four techniques.

  6. Some message types contain huge blocks of data that have the same
     value every time.  For example, most X client applications query
     the server for all X resources when they start up.  It sends back
     a copy of its entire X resource database.  dxpc caches this on the
     client side, so that it doesn't have to be retransmitted for
     subsequent clients unless it changes.  The BlockCache and
     BlockCacheSet classes provide the supporting infrastructure.
     Look at the code for the GetProperty and QueryFont requests for
     some examples.

Text Compression

dxpc's text comprssor is moderately effective.  The class TextCompressor
uses an array of 8192 CharCache objects to hold information about what
characters usually follow what other characters.  Its encodeChar()
method hashes together the last three characters to form an index
into this table, which contains a move-to-front cache that is used to
encode the next character using technique 4.  There are other text
compression algorithms that probably would work much better.

Image Compression

dxpc compresses images differently, depending upon their type:

  * For monochrome images with width-to-height ratios greater than 10
    and heights less than or equal to 32, it transposes the images from
    row-major to column-major representation and applies a variant of the
    text-compression algorithm using 'h'-bit "characters," where 'h' is
    the image height.  (Why would it do something so strange?  Well...
    FrameMaker draws bitmapped text one line at a time, and this algorithm
    compresses the bitmapped text much more effectively than anything else
    I could come up with.)

  * For other monochrome images, it uses a variant of Group 4 FAX encoding.

  * For 8-bit color images, it uses a CharCache to hold previous pixel
    values and encodes them using technique 4.

  * For anything else, it sends the whole image, uncompressed.


All X messages share certain common properties.  For example, all
messages generated by the X server have a 16-bit sequence number in
the 3rd and 4th bytes.  In dxpc, the doRead() and doWrite() methods
of ClientChannel and ServerChannel() take advantage of this by
processing all of the common parts and then executing a switch
statement to find the compression/decompression code for the
specific message type being processed.  (There are also default
handlers that just copy the bytes and don't do any compression
or uncompression, in support of message types that dxpc doesn't
know how to compress yet.)

As a result, it is fairly easy to add compression for some message
type that dxpc doesn't compress yet, or to improve the compression
for some message type that it already compresses.

The following sections describe how to add compression for new
message types.

Deciding What to Compress

As the first step in improving dxpc, run it with the "-s2" option
and then run some X application that you want to speed up.  When
the application terminates, dxpc prints out a histogram showing
how many bits of the total X traffic came from each message type,
and how well each message type was compressed.  This will help
you to decide which message types to work on.

Compression of Request Messages
Look at the doRead() method in ClientChannel.C.  It contains a big
switch statement with a case for every request message type that
dxpc knows how to compress.  (A default case at the end handles
any request message type that it doesn't know how to compress; these
remaining message types are just transmitted without compression.)

Here's how to add compression for some request type that currently
isn't compressed:

1. Add a case for it to the switch statement.  (the 
   include file #define's all of the X_[message-type] values).
2. Don't worry about the request type (first byte of the message); the
   code just before the switch statement has handled this already.
3. If the message type is variable-length, you have to encode the
   length somehow.  Often, variable-length messages contain a list
   of fixed-size objects.  For messages like this, it's most efficient
   to encode the number of objects, rather than the length of the
   message in bytes, since the number of objects is a smaller value
   that takes fewer bits to represent.  See the code for X_PolyFillRectangle
   for an example.
4. Once you've encoded the length (if needed), then encode the
   values of the request message's fields.  See the list of 6 techniques
   above, and pick the one that works best for each field.  If you don't
   know what patterns or locality properties each field exhibits,
   add some code to print out the field values, and run a few different
   X applications to see what kind of patterns emerge.  [WARNING:
   if you're printing to stdout from dxpc, it's generally a good idea
   to redirect stdout to a file.  In certain places in the dxpc code,
   the X server may be blocked on a write operation (waiting for dxpc
   to consume data).  If dxpc tries to print to stdout, and stdout is
   in an xterm window, dxpc and the X server will deadlock.]
5. If you need new caches, or "last-value-of-this-field" variables,
   add them to the ClientCache class.
6. Use the methods of the encodeBuffer object to encode your compressed
   field values.  If bytes in the request message format are unused,
   don't bother encoding them.
7. When in doubt, read the code for some similar message type that
   dxpc already compresses.

8. Next, add decompression code to the server.  The doWrite() method
   in ServerChannel.C contains another big switch statement.  Add a case
   for the new message type.

9. At the start of the case, figure out how big the uncomrpressed
   message should be, either by hardcoding it (if the message is fixed-
   size) or by reading and interpreting the length information encoded
   by the dxpc client (if the message is variable-size).
10. Set the value of the local variable outputLength to the uncompressed
    message size.
11. Allocate a buffer into which you can write the uncompressed message
    by calling: "outputMessage = writeBuffer_.addMessage(outputLength);"
12. Don't bother setting the message type (first byte) or length (bytes
    3 and 4) in the output message; the code after the switch statement
    will do so automatically.
13. For each field that you've encoded, decode it and write the value
    to the right byte(s) in the output message buffer.  There are some
    local variables value and cValue for use as temporaries, just to
    keep you from having to declare them repeatedly in every case.
    See the existing cases in the switch statement for examples of
    how to call the decodeBuffer methods and how to store the results
    in the output message.

14. Compile and test the changes.  For debugging, there's a
    DumpMessage() function declared in util.H that will display the
    byte values in a block of memory.  Use it to dump the original
    message on the client side and the uncompressed message on the
    server side to verify that they're the same (note that bytes
    defined as "unused" in the protocol spec will have random values
    on the server side, though).

Compression of Reply Messages

Some request messages cause the server to send a reply message.  To
compress the reply message:

1. Implement compression of the request message, as described above.

2. At the end of the case block for the request message in ClientChannel.C,
   push the request's opcode and sequence number onto the sequenceNumQueue_
   object.  (Find the line that does this in the case block for
   X_GetGeometry, and just copy it.)

3. At the end of the case block for the request message in ServerChannel.C,
   push the request's opcode and sequence number onto the sequenceNumQueue_
   object.  (Again, steal the code from X_GetGeometry.)

4. In the doRead() method in ServerChannel.C, there are two switch statements:
   one for replies and one for events.  The first one, for replies, is the
   one to which you can add a case for the new reply message type.

5. In this case block, encode the message in the same manner described for
   request messages above.  If you need to add caches, put them in the
   ServerCache class.  Don't encode the reply type (1st byte) or sequence
   number (3rd and 4th bytes); they get handled automatically.  If the
   message is variable-length, encode the length as compactly as possible.

6. In the doWrite() method in ClientChannel.C, decode the message, using
   the same approach you used to decode request messages in

Note: For some request types, it is helpful to be able to store parameters
from the request to help in compressing the reply.  For example, the
X_AllocColor request specifies requested RGB values and the X_AllocColor
reply specifies what actual RGB values the X server was able to provide
instead.  Often, the actual values will be equal to the requested values.
Or the actual values will be very close to the requested values, so that
it's more efficient to transmit (actual-requested) than (actual).
Thus, the sequenceNumQueue_ object lets you pass up to three unsigned
ints as additional request data that you can refer to when compressing
and uncompressing the reply.  See the code for X_AllocColor for examples.

Compression of Event Messages

1. The second case statement in ServerChannel::doRead() has a case for
   each type of server-generated event message that dxpc knows how to
   compress.  The same approach used for requests and replies is used
   here: encode each field values using whichever of the 6 techniques
   is most suitable.  Don't encode the event type (1st byte) or the
   sequence number (3rd & 4th bytes); these are handled automatically.

2. If you need to add caches, put them in the ServerCache class

3. Add the corresponding decoding logic in doWrite() in ClientChannel.C.

Compression of Error Messages

1. The same switch statements in ServerChannel and ClientChannel that
   handle events also handle errors.  The "case 0:" block is the one
   with the error compression/uncompression logic.  Right now this
   just handles a few common error types, like the error that is generated
   when an AllocColor request fails because the X server runs out of
   colors.  If you find that other error types are contributing
   significantly to the X protocol traffic for some important application,
   you can enhance this code to compress the additional error types.