UNM CS341 S2002 HW2

Bit Diddling

Part Zero: Lame Decryption

Executives of Enron have encrypted definitive evidence of their guilt by taking the incriminating files and, considering them as a sequence of 32-bit unsigned integers, multipling each number by an odd constant C, chosen separately for each document. (Not a very clever bunch - they're accustomed to security through obscurity.) In order to restore the file, one need merely multiply each of the resulting numbers by some number D, again using 32-bit unsigned arithmetic, where D is the multiplicative inverse of C in the integers modulo 232. You have been given a document whose C is equal to the last eight digits of your UNM student ID number (plus one if that would be even, because C must be odd).

(For example, if your SSN were 90-209-7152, then your C would be its last eight digits, plus one to make it odd, so C=2097153. And the corresponding D=4292870145, as it turns out. You can verify this with a little hacking,

  #include <inttypes.h>
  uint32_t c = 2097153;
  uint32_t d = 4292870145;
  printf("decimal: c = %lu, d = %lu, c*d = %lu\n", c, d, c*d);
  printf("hex:     c = %lx, d = %lx, c*d = %lx\n", c, d, c*d);
or by using a programmer's calculator, such as the FOX Calculator below.)

The impoverished retirees of Texas are counting on you: find D for them! And be sure to show your calculations, in case you are subpoenaed by Congress. (You can do this manually if you like, but please express D in both binary and decimal, and show how you figured it out; don't shred the paperwork.)

Part One: Save the World from Viral Death

At the dawn of the twenty-second century you must use the only remaining serviceable computer on Earth, a SPARCstation you buried in your basement at the start of the computer jihad of 2032, to decode the bits on an ancient floppy disk in order to unlock the only remaining copy of the smallpox genome and allow scientists to develop a vaccine for a virus about to be released by a violent faction of the Ultranationalist Deep Eco-Republican Hydrocarbon-Lifeform-Firsters. Your computer suffered a hard disk crash in which the C compiler was lost, so you must program in assembler.

Your floppy disk drive succumbed to rust in the dank basement, so an antique low resolution magnetic force atomic microscope was used to read the raw patterns off the floppy's magnetic surface. In order to avoid having a long stretch without a change in magnetic polizarization (a stretch of more than three 0's), or too rapid a change in polarization (two 1's next to each other), 3.5in floppy disks of the late 20th century represented the data using MFM, a technique in which each input bit is represented on disk using two bits, according to the following table:
Prev BitThis BitBits Stored
0 0 10
1 0 00
- 1 01
so for instance the data 00011111100100101010001110001011 would be stored as 1010100101010101010010010010010001000100101001010100101001000101, as can be seen from the following tableau

 00011111100100101010001110001011

00           ||                     10
 00          ||                     10
  00         ||                     10
   01        ||                     01
    11       \/                     01
     11                             01
      11                            01
       11                           01
        11                          01
         10                         00
          00                        10
           01                       01
            10                      00
             00      ----\          10
              01     ----/          01
               10                   00
                01                  01
                 10                 00
                  01                01
                   10               00
                    00              10
                     00             10
                      01            01
                       11           01
                        11          01
                         10         00
                          00        10
                           00       10
                            01      01
                             10     00
                              01    01
                               11   01

(Technical notes: the 1's indicate a transition from one magnetic state to the other, while the 0's indicate no transition. Transitions too close together can interfere with each other, while no transition for too long a stretch can cause the oscillator in the read head to lose synchronization with the data on the track, or even allow the read head to drift off the track entirely. For more information, see the online floppy disk primer or reference material on MFM.)

The smallpox genome was represented using two bits per base pair, according to the following mapping:
Nucleic Acid Code
Adenine 00
Guanine 01
Thymine 10
Cytosine 11
You must write a program in SPARC assembler running on tkisem that decodes the bits salvaged from the floppy. Your program needs to translate the MFM-encoded bits into ASCII letters corresponding to the indicated base pair (A/G/T/C) and print them.

Everything is consistently big endian. This means in particular that the very first bit read off the floppy disk is the most significant bit of the word whose address is in mfm_data; the next bit is the second-most-significant bit in that same word; etc.

The data (in a form suitable for you to link to) is available here. Turn in the program that decodes the data, along with a file containing the decoded output. (You should link the object file that your code assembles to with the object file generated by assembling the file containing the data. Please don't edit the file containing the data, and don't include the data file with the stuff you hand in - we already have a copy thanks!)

Extra credit for the first five people to find the scientific name of the organism in which this naturally occuring genetic sequence is found (keeping in mind that the label on a 100-year-old floppy disk does not always correspond to its contents), and the number and holder of the US patent you must license to use this particular biosequence.

Part Two: Decode an Alien Message

Your skill with SPARC assembler makes you famous, and soon representatives of the Terran Space Administration retain your services. In the aftermath of the Core Wars of the early 21st century, Scott McNeally used his vast fortune to launch a generation ship out into the galaxy, equipped only with SPARC computers. Many decades later and light years from earth, the ship has intercepted an alien transmission. The amount of data is too voluminous to send home across the vast gulfs of space. Instead, we must send the ship a program to decode the message, so they can discover its meaning and send a summary. The ship's computational resources are stretched to the limit, so the program must be very efficient: it must be written in SPARC assembler.

The aliens that sent the message, called the Beetee, live underwater and have two manipulative limbs, each terminating in a cluster of thirteen tentacles. One limb is attached near the bottom of their body, and its musculature permits its use only for pushing thing up, while the other attaches higher up, and is useful mainly for pushing things down. When moving an object, the Beetee tend to give it successively lighter whacks up and down with either limb until it is at the desired height. As a consequence they developed a number system which we would call base negative thirteen, since alternate digits specify positive and negative terms contributing to the total quantity represented. The sequence of digits
(dn, dn-1, ..., d2, d1, d0)
where 0 <= bi < 13 represents the number
x = d0 - 13 d1 + 169 d2 - 2197 d3 + ... + (-13)n-1 dn-1 + (-13)n dn

Here are some decimal numbers, written as sequences of digits in base minus thirteen.
DecimalBase negative thirteen
1212
-121, 1
131, 12, 0
171, 12, 4
-474, 5
33

The Beetee have no rigid skeleton and are able to deform their bodies quite dramatically. (Such a structure, like an elephant's trunk or an earthworm or a human tongue, is called a hydrostat). For this reason the idea of a fixed word size never occurred to them. The physics of the situation led the Beetee to transmit messages as a sequence of on/off values, just like we do, but they encode their messages with variable length symbols. The pulsed laser they use to transmit messages consumes a great deal of energy for each pulse transmitted, so in this situation they use a code which favors 0's (which are inexpensive, since they correspond to the laser being off) over 1's (which are expensive, since they correspond to the laser being on.)

In particular, the transmission is a sequence of symbols, each represented by a stretch of 0's of some length. The end of each symbol is marked by a 1. Physically this corresponds to using the length of the gap between two successive pulses of the laser to represent each symbol. The length of the run of 0's can be anywhere between zero and thirteen clock tics long, and have the following meanings:
run length symbol
0 *
1 0
2 1
3 2
4 3
......
13 12
The 0,1,2,...,12 are digits of a number, while * is a stop symbol, indicating the end of a number. In honor of their ancestors, Beetees always transmit the most significant digit of a number first, followed by successively less significant digits.

So for example, to transmit the sequence of numbers (12, -12, 13, 17, -47, 3) the Beetee would transmit the sequence of symbols (12, *, 1, 1, *, 1, 12, 0, *, 1, 12, 4, *, 4, 5, *, 3, *) which corresponds to the sequence of bits

 00000000000001 1
 001 001 1
 001 00000000000001 01 1
 001 00000000000001 000001 1
 000001 0000001 1
 00001 1
or, as you would see them packed into words on a SPARC, which is big-endian, the first 86 bits of three 32-bit words
 00000000000001100100110010000000
 00000010110010000000000000100000
 11000001000000110000110000000000

The numbers the Beetee send represent a sequence of pen strokes that draw a picture. Each pen stroke is represented by four numbers, first the (x,y) coordinates of the starting point of the stroke, and then the (x,y) coordinates of the end point of the stroke. x and y are in the range +/- 5,000,000. (Well, actually +/- 4,826,809, which is a number of great significance to the Beetee because, in order to reproduce, groups of three Beetee must vie with each other in a combinatoric mating ritual. But you might as well round up to 5,000,000.)

The bit stream (in a form suitable for linking with your program) is available here. Please include a bitmap of the decoded image (eg grabbed using xv) among the files you turn in. As in problem 1, please don't edit the data file, but rather link your code to it, and don't include the data file with the stuff you hand in - we already have a copy thanks.

Extra credit: compose a suitable reply to the Beetee, encoded in the same format as the message they sent.

Hints

Handing in

The due date for this assignment is noon Friday Mar 1, 2002, but people who turn theirs in early will have our gratitude.

To hand in your code and other files, run

~bap/public_html/bin/hw-handin dirs/files
on a *.cs.unm.edu machine. Put written paper (if any) in BAP's mailbox in the CS mailroom, FEC 156.

Hand in checklist:
- including all source files?
- including any documentation?
- including genome output for problem 1?
- including captured bitmap for problem 2?
- including response to aliens for problem 2, if available?
- not including any object files, or SPARC executables?
- not including the data files you got from us?
- not including backup files or other garbage?


Barak Pearlmutter <bap@cs.unm.edu>