[SGVLUG] bit order (was Linux based web-server appliance)

David Lawyer dave at lafn.org
Wed May 24 14:58:25 PDT 2006


On Fri, May 19, 2006 at 01:51:01PM -0700, Dustin Laurence wrote:
> On Fri, May 19, 2006 at 12:57:42PM -0700, David Lawyer wrote:
> > [snip]
> > Consider transmitting the number 0xABC.  If A is sent first, what does
> > it mean?  10, 160, 2560, etc.?  In order to decipher it you need to
> > scan to C and note that there are 3 hex digits.  But if C, the low order
> > byte is sent first (little endian) then you know what it means: just C
> > (12).  So little endian does make sense, even though we don't write
> > number that way and thus have to scan to the end of a number before
> > we know what the 1st digit represents.
> 
> Quite to the contrary, the provably optimal algorithm is inherently
> big-endian,
Not necessarily.  See my remarks later on.
> and the obvious little-endian algorithm is both expensive and
> numerically unstable (when approximate floating-point numbers are
> being represented--obviously this is irrelevant in the usual
> computer case of exact integers, but as a physicist I felt it
> necessary to stipulate good numerical behavior).  And it's
> interesting, if you're the sort of person that finds these things
> interesting.
> 
> Re-constructing a number from its digits is the problem of
> evaluating a polynomial, since a positional notation simply writes
> the coefficients of the unique canonical form where they have values
> 0 <= a_i < b for the known base b.  Since you say "transmitting" we
> have the constraints that we want the evaluation algorithm to be
> serial, i.e. to be able to work with one digit (bit, digit, word,
> byte, it doesn't matter for this purpose since all of them can be
> viewed as digits in an appropriate base) at a time without backing
> up.  Given that, we also want to need to remember as little state as
> possible for simplicity and efficiency.  And of course we want it to
> be a good algorithm in the same sense as for any polynomial evaluation,
> sequential or not: we require numerical stability and computational
> efficiency.
> 
> By lucky chance the provably optimal general solution, Horner's
> rule, is serializable and needs no state.  It is very familiar to
> those of us who keep an RPN calculator with great fondness and live
> in terror of the day it fails and we can't replace it.
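[Editor's note: a minimal Python sketch of the Horner's-rule evaluation described above, not part of the original email.  It consumes one digit at a time, most significant first, and the running value is the only state carried between digits.]

```python
def horner(digits, base):
    """Evaluate a number from its digit stream, most significant
    digit first (big-endian), using Horner's rule.  The only state
    kept between digits is the running value."""
    value = 0
    for d in digits:
        value = value * base + d  # shift up one place, add new digit
    return value

# 0xABC arrives big-endian as A, B, C:
assert horner([0xA, 0xB, 0xC], 16) == 0xABC
```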

I use an RPN calculator and never thought about the state.  There's
just the contents of the four stack registers.  I always thought of it
as just using postfix notation, where 2 3 + results in 5: you just
place the operator + after the two operands 2 and 3.

But back to the question of efficiency.  First, to put the case in
context: most of the time there is no need to extract a number from
its digits, because the number is in binary and is just kept in binary
form.  So extracting a number from its digits is really just changing
the base, say from binary to decimal or hexadecimal.

Now the value of the number "def" (in base b) is d x b^2  +  e x b^1
+ f x b^0.  But if we are going to do a lot of such calculations in
base b, we precalculate b^0, b^1, b^2, etc.  So let B be the vector of
powers (..., b^3, b^2, b^1, b^0) and X be the vector of the digits
(d, e, f, ...).  The value of the number is B.X, where . is the inner
product and B is a constant vector.  Now the efficiency of calculating
B.X doesn't depend on whether we start multiplying with the low-order
digit of X or with the high-order one.
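[Editor's note: a small Python sketch of this inner-product approach, not part of the original email.  B is the precomputed vector of powers of the base, highest power first, matching the digit order of "def".]

```python
def inner_product_value(digits, powers):
    """Value of a digit vector X as the dot product B.X, where B is a
    precomputed, constant vector of base powers (highest first).
    One multiply and one add per digit, regardless of which end of X
    we start from."""
    return sum(d * p for d, p in zip(digits, powers))

base = 16
powers = [base**2, base**1, base**0]   # B = (b^2, b^1, b^0)
assert inner_product_value([0xA, 0xB, 0xC], powers) == 0xABC
```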

> It is written inside-out on paper, but that is because that is how
> you denote big-digit-first evaluation.  It is also simple: start
> with the value of the first digit, and then for each successive
> digit multiply the current value by the base (with fast shift
> instructions unless a moron chose the base) and add the new digit.

The above takes n - 1 multiplications and n - 1 additions, the same as
the inner product of the B and X vectors I just described.

> You specified that it's nice not to know how many digits there are
> beforehand--this means there is a terminator and this is the best
> case for Horner's rule since the current value is the only required
> state.  The naive little-endian algorithm requires an additional
> piece of state in this case, the current digit count.  It also
> requires O(n^2) multiplications rather than the O(n) of Horner's
> rule.

O(n^2) would be true if we didn't store the values of b^n in a table.
But for repeated operations one would store them in a table, so it's
O(n).  In fact the b^n values can be hard-coded into the hardware or
software.
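[Editor's note: a Python sketch of this table-based little-endian evaluation, not part of the original email.  With b^n looked up in a precomputed table, each arriving digit costs one multiply and one add, so the whole stream is O(n) rather than O(n^2).]

```python
BASE = 16
POWERS = [BASE**i for i in range(16)]   # precomputed table of b^n

def little_endian_value(digits):
    """Accumulate a value from digits arriving least significant
    first.  The digit count n is the extra piece of state mentioned
    above; the table turns each step into one multiply and one add."""
    value = 0
    for n, d in enumerate(digits):
        value += d * POWERS[n]
    return value

# 0xABC arrives little-endian as C, B, A:
assert little_endian_value([0xC, 0xB, 0xA]) == 0xABC
```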

> With numbers known to fit in a machine register this is
> irrelevant because the multiplies are shifts and probably cost less
> than anything else (and will be O(n) even for naive little-endian),
> but with arbitrarily large numbers and bignum style representations
> this can really matter.

For little endian, we can wait until we have all the digits and then
use Horner's rule, inspecting the most significant digit first.  OK, I
admit I was wrong to claim an advantage for little endian.  I guess it
really doesn't make much difference.
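[Editor's note: a Python sketch of this buffer-then-Horner approach, not part of the original email.  The little-endian stream is collected first, then evaluated most significant digit first.]

```python
def buffered_value(digits_lsd_first, base):
    """Handle a little-endian digit stream by buffering it, then
    running Horner's rule over the digits in most-significant-first
    order.  Costs O(n) storage for the buffer, but the evaluation
    itself is plain Horner: one multiply and one add per digit."""
    value = 0
    for d in reversed(digits_lsd_first):
        value = value * base + d
    return value

# 0xABC arrives little-endian as C, B, A:
assert buffered_value([0xC, 0xB, 0xA], 16) == 0xABC
```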

-----------------------topic change-----------------------------------
> 
Dustin:
> > > Even a Pentium or Pentium II can consume 200 W, so I'd have
> > > expected a 486 to consume more.
> > 
David:
> > There's a page that shows how, as speeds increased from the 486 to
> > Pentium I to Pentium ... to Pentium IV, the power kept going up.
> > And within for example Pentium I's the power went up as speed
> > increased.
> 
> Of course it does--for a given design the power consumed is
> proportional to the clock speed (well, to zeroth order).  And there
> is a thermodynamic minimum cost which is also proportional to clock
> speed because every copy that overwrites a value creates entropy,
> and the faster the copies the faster the inherent power.
> 
> Anyway, nobody cared about CPU consumption in that era and so it
> wasn't optimized.

The people who made my 486 cared, since they offered a "green" model
with very low consumption, advertised in the motherboard manual along
with estimates for the HD and monitor.

> If you take your favorite speed rating per watt you'll find that a
> 486, or any x86 chip (at the very least before the Pentium-M), is
> utterly blown away by chips engineered for low power.

But what about the consumption per hour?  I could type and read email
just as fast on the 486 as on a PC of more recent vintage.  And the
modem speeds were about the same, at least with an analog modem.  So
perhaps newer servers save energy, but for home PCs used mostly for
reading and writing, the old 486 PCs might win.  I'll agree that a
desktop PC designed like a laptop, with hibernation mode, etc., will
use less energy.  But how many desktops are designed like this?

[snip]
			David Lawyer