From: Dave Stampe-Psy+Eng <dstamp@watserv1.waterloo.edu>
Subject: Re: 386 renderer progress
Date: Thu, 9 Jan 1992 14:06:27 GMT
Message-ID: <1992Jan9.140627.24996@watserv1.waterloo.edu>
Organization: University of Waterloo


thinman@netcom.netcom.com (Lance Norskog) writes:

>About step #17: Conversion To Assembler
>
>This, in the form you contemplate, is a bad idea for 2 reasons:
>        1) it renders your work non-portable to other more
>        deserving hardware (PC's are really icky), and
>
>        2) it is sub-optimal
>
>Recently Byron Rakitzis re-wrote a subset of an image-processing language
>(POPI, from the book "The Digital Darkroom") in incremental
>compilation form.  He wrote code generators for MIPS and SPARC
>processors, and wrote the basic language operators to
>generate code through the generation routines.  The generators
>allocate registers on the fly, and use the processor
>very effectively.  The code is free, and the individual
>generators are only five hundred lines of C.  It's in
>comp.sources.misc, volume 26, issue 111.
>
>Such a system would need a processor-independent front end that
>implements basic blocks and translates float ops into
>raw integer arithmetic.  I claim that such a front end can do a
>better job on float->integer than you can coding it by hand
>because it knows when it doesn't need to renormalize results.
>This is especially true if you use rational arithmetic instead
>of radix-point fixed arithmetic.  

Well, I claim otherwise: I am mixing several fixed-point formats
in my code, chosen by the accuracy each step requires.  And at one
step in the scanline code, the straightforward approach would have
needed a 64-bit by 64-bit divide with a 64-bit result, which would be
expensive.  But, knowing what the algorithm does, I could find a pair
of factors in the numerator and denominator terms whose ratio is
expressible in 16 bits, so I predivide these terms, then multiply
instead of dividing.  I'd like to see a compiler figure that one out!
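
A minimal sketch of the predivide-then-multiply idea, in C.  This is
not the renderer's actual code: the function names, the 16.16 format
and the operands are all assumptions made for illustration.  It assumes
non-negative operands and a ratio a/b that fits in 16.16 fixed point.

```c
#include <stdint.h>

#define FRAC_BITS 16   /* assumed 16.16 fixed-point format */

/* Direct form: one wide multiply and one wide divide per value. */
static int32_t scale_direct(int32_t n, int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)n * a) / b);
}

/* Predivide once, outside the inner loop: k = a / b in 16.16. */
static int32_t make_ratio(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a << FRAC_BITS) / b);
}

/* Inner loop: multiply by the stored ratio and shift.  No divide.
   Operands are assumed non-negative, so the right shift is well
   defined. */
static int32_t scale_fast(int32_t n, int32_t k)
{
    return (int32_t)(((int64_t)n * k) >> FRAC_BITS);
}
```

With a = 3 and b = 4, make_ratio gives 49152 (0.75 in 16.16), and
scale_fast(100, 49152) yields 75, the same as scale_direct(100, 3, 4).
When the ratio is not exactly representable the fast form can differ
from the direct one in the low bits, which is why the format has to be
picked with the required accuracy in mind.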

Also, that "five hundred lines of C" would take longer to write
than finishing the conversion of the renderer to assembler.  That code
would just be a list of the tricks I'd use anyway.  Fear not, though:
there will be a C version available, so those of you eager to port
it to other machines can work it out.  I don't have time to.

>More important, the overhead of repetitively walking data structures
>in screen updates can be removed by compiling separate code chunks
>for each polygon vertex.  These can be just procedure calls,
>in-line copies of the clipping code, or whatever there's room for.

Well, I've taken that into account when designing my code.  By
reordering the operations, I walk the vertices only twice per poly, so
that is about 3% of the rendering time.  Also, I forgot to mention
that I'm going to support N-sided convex polygons in the code, as
there is little or no extra cost involved.  So unrolled loops don't
help much in that case.  If I were only supporting triangles, I might
look into it.  But the increase in rendering speed produced by larger
polys more than offsets the time required in the preprocess phase to
walk the vertices.
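
To illustrate why an N-sided convex poly costs little more than a
triangle in the vertex walk, here is a rough C sketch.  It is an
assumption about how such a walk might look, not the renderer's code;
the Vertex type and both function names are hypothetical.  One pass
finds the top and bottom vertices, and the two edge chains between
them are then followed by stepping an index around the vertex list.

```c
#include <stddef.h>

typedef struct { int x, y; } Vertex;   /* hypothetical screen vertex */

/* Single pass over the vertices: record the index of the topmost
   (smallest y) and bottommost (largest y) vertex.  Ties keep the
   first vertex found. */
static void find_extents(const Vertex *v, size_t n,
                         size_t *top, size_t *bottom)
{
    size_t i;
    *top = 0;
    *bottom = 0;
    for (i = 1; i < n; i++) {
        if (v[i].y < v[*top].y)    *top = i;
        if (v[i].y > v[*bottom].y) *bottom = i;
    }
}

/* Number of edges met when stepping forward (index + 1, wrapping at n)
   from vertex 'from' to vertex 'to'.  For a convex poly the chain
   top -> bottom and the chain bottom -> top together cover all n
   edges, so the scan converter visits each edge once. */
static size_t chain_edges(size_t n, size_t from, size_t to)
{
    return (to + n - from) % n;
}
```

For a pentagon (5,0) (10,4) (8,10) (2,10) (0,4), find_extents gives
top = 0 and bottom = 2; one chain has 2 edges and the other 3.  The
loop is the same for any N, so nothing is gained by special-casing
triangles here.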


--------------------------------------------------------------------------
| My life is Hardware,                    |                              | 
| my destiny is Software,                 |         Dave Stampe          |
| my CPU is Wetware...                    |                              | 
| Anybody got a SDB I can borrow?         | dstamp@watserv1.uwaterloo.ca |
__________________________________________________________________________
