Subject: Re: [IB-Architect] Suppress whitespace in transmit buffers
> Ann Harrison wrote:
> Perhaps if we eliminated every even-valued byte?

Well, there was a scheme I came up with years ago that used an integrated
dictionary to tokenize English text and then compressed the tokens.

There are only 580,000 words in the Oxford English Dictionary. That's less
than 2^20. Add in a few million in specialized vocabulary and call it 2^23.

Add in, say, 2^4 for every possible tense and form.
Add in a "mode" bit for specialized, local vocabulary.
Reserve 2^5 for concepts in other common languages that can't be
represented by English equivalents...
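
A rough sketch of how those fields might be packed into one integer, in Python. The field names and their order are hypothetical. Note that, read literally as bit widths, 23 + 4 + 1 + 5 comes to 33 bits, one more than fits in 4 bytes, so this sketch assumes the word index is trimmed to 22 bits (about 4.19 million entries). That trim is an assumption made only for illustration.

```python
# Hypothetical field widths. The word index is assumed to be 22 bits
# (not 23) so that the four fields total exactly 32 bits.
WORD_BITS = 22   # index into the dictionary
FORM_BITS = 4    # tense / form
MODE_BITS = 1    # flag for specialized, local vocabulary
LANG_BITS = 5    # reserved for concepts from other languages

assert WORD_BITS + FORM_BITS + MODE_BITS + LANG_BITS == 32


def pack_token(word, form=0, mode=0, lang=0):
    """Pack the four fields into a single 32-bit integer."""
    fields = ((word, WORD_BITS), (form, FORM_BITS),
              (mode, MODE_BITS), (lang, LANG_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("field value %r does not fit in %d bits" % (value, bits))
    token = lang
    token = (token << MODE_BITS) | mode
    token = (token << FORM_BITS) | form
    token = (token << WORD_BITS) | word
    return token


def unpack_token(token):
    """Split a 32-bit token back into (word, form, mode, lang)."""
    word = token & ((1 << WORD_BITS) - 1)
    token >>= WORD_BITS
    form = token & ((1 << FORM_BITS) - 1)
    token >>= FORM_BITS
    mode = token & ((1 << MODE_BITS) - 1)
    token >>= MODE_BITS
    lang = token
    return word, form, mode, lang
```

With this layout, unpack_token(pack_token(w, f, m, l)) gives back the same four values, and every token fits in token.to_bytes(4, "little").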


And you end up with every possible representable word taking up exactly 4
bytes, with no need to store spaces in textual strings, and the
integers used for tokens are easily compressed... Sure, you need to ship the
OED with your operating system, but what's 650 MB these days? Everybody
could use a good dictionary on their system anyway...
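
To make the "no spaces stored" part concrete, here is a minimal Python sketch. The eight-word list is a hypothetical stand-in for the real dictionary, and zlib is swapped in as a generic compressor since the scheme does not name one. Punctuation, capitalization and unknown words are ignored here.

```python
import struct
import zlib

# Hypothetical stand-in for the shipped dictionary: position = token id.
WORDS = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
WORD_TO_ID = {word: i for i, word in enumerate(WORDS)}


def encode(text):
    """Turn space-separated words into fixed 4-byte tokens; spaces are not stored."""
    ids = [WORD_TO_ID[word] for word in text.split()]
    return struct.pack("<%dI" % len(ids), *ids)


def decode(data):
    """Rebuild the text; a single space is implied between tokens."""
    ids = struct.unpack("<%dI" % (len(data) // 4), data)
    return " ".join(WORDS[i] for i in ids)


message = "the quick brown fox jumps over the lazy dog"
raw = encode(message)            # 4 bytes per word
packed = zlib.compress(raw)      # generic compression of the token stream
assert decode(zlib.decompress(packed)) == message
```

For short words, 4 bytes per token is no smaller than the characters themselves, so any saving has to come from compressing the token stream afterwards.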

You get a namespace where the character representation is independent of
the concept (take that, UTF-8!) and you get a really fast word-matching
facility, which makes parsing a lot simpler, and you get a rudimentary
(grammar-free) universal translator between languages.
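
A toy Python illustration of the translator idea, assuming each language simply has its own table keyed by the same token ids. The three-word tables and the function names are hypothetical, and since word order is left untouched it is only as good as a word-for-word substitution.

```python
# Hypothetical per-language tables sharing one set of token ids.
CONCEPTS = {
    "en": {0: "dog", 1: "eats", 2: "bread"},
    "fr": {0: "chien", 1: "mange", 2: "pain"},
    "de": {0: "Hund", 1: "isst", 2: "Brot"},
}


def tokenize(text, language):
    """Map each space-separated word to its token id in the given language."""
    reverse = {word: tid for tid, word in CONCEPTS[language].items()}
    return [reverse[word] for word in text.split()]


def render(tokens, language):
    """Spell the same tokens out using another language's table."""
    table = CONCEPTS[language]
    return " ".join(table[tid] for tid in tokens)


tokens = tokenize("dog eats bread", "en")
french = render(tokens, "fr")
```

Word matching on the token side is then just integer comparison, whatever language the text was typed in.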

You do, of course, end up with a system where presenting every word
probably means a disk access...

I was going to seek a patent for this scheme, called "P-Code for
Human Languages", but at the time I thought it was too easy to patent and
pointless to implement.

(At the time (1989), I found the idea of GBs of RAM in a
home computer irresistibly funny... now I'm not laughing... as hard.)

In today's patenting environment... hmmm...

