(gmp.info.gz) Assembler Loop Unrolling

Info Catalog (gmp.info.gz) Assembler Software Pipelining (gmp.info.gz) Assembler Coding (gmp.info.gz) Assembler Writing Guide
 
 Loop Unrolling
 --------------
 
 Loop unrolling consists of replicating code so that several limbs are
 processed in each pass through the loop.  At a minimum this reduces
 loop overheads by a corresponding factor, but it can also allow better
 register usage, for example alternately using one register combination
 and then another.
 Judicious use of `m4' macros can help avoid lots of duplication in the
 source code.
 
    Any amount of unrolling can be handled with a loop counter that's
 decremented by N each time, stopping when the remaining count is less
 than the N limbs the next pass would process.  Alternatively, if N is
 subtracted at the start, the termination condition becomes the counter
 C being less than 0 (and the count of remaining limbs is then C+N).
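
    The second scheme might look as follows in C, as a minimal sketch
 with N=4.  The names `limb_t' and `xor_n_unrolled' are hypothetical
 stand-ins chosen for illustration, not GMP code, and a real assembler
 loop would keep the counter and pointers in registers.

```c
#include <stddef.h>

typedef unsigned long limb_t;   /* hypothetical limb type, for illustration */

#define XOR_UNROLL 4            /* N, the unrolling factor */

/* dst[i] = a[i] ^ b[i] for 0 <= i < n, unrolled 4 ways. */
void xor_n_unrolled(limb_t *dst, const limb_t *a, const limb_t *b, long n)
{
    long c = n - XOR_UNROLL;    /* subtract N at the start */

    while (c >= 0) {            /* c >= 0 means at least N limbs remain */
        dst[0] = a[0] ^ b[0];
        dst[1] = a[1] ^ b[1];
        dst[2] = a[2] ^ b[2];
        dst[3] = a[3] ^ b[3];
        dst += XOR_UNROLL;
        a += XOR_UNROLL;
        b += XOR_UNROLL;
        c -= XOR_UNROLL;
    }

    c += XOR_UNROLL;            /* c+N limbs remain, somewhere in 0..N-1 */
    while (c-- > 0)             /* simple loop for the excess */
        *dst++ = *a++ ^ *b++;
}
```

    The test `c >= 0' is a plain sign check after the subtract, which
 on many processors needs no separate compare instruction.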
 
    Alternatively, for a power-of-2 unroll, the loop count and remainder
 can be established with a shift and a mask.  This is convenient if also
 making a computed jump into the middle of a large loop.
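
    A minimal C sketch of the shift and mask, assuming a 4-way unroll
 (so a shift by 2 and a mask of 3).  The names `limb_t' and
 `com_shift_mask' are hypothetical, used only for illustration.

```c
#include <stddef.h>

typedef unsigned long limb_t;   /* hypothetical limb type, for illustration */

#define COM_UNROLL_LOG2 2
#define COM_UNROLL (1 << COM_UNROLL_LOG2)

/* dst[i] = ~src[i] for 0 <= i < n, unrolled 4 ways. */
void com_shift_mask(limb_t *dst, const limb_t *src, size_t n)
{
    size_t loops = n >> COM_UNROLL_LOG2;   /* number of full 4-limb passes */
    size_t rem = n & (COM_UNROLL - 1);     /* 0 to 3 limbs left over */

    for (; loops != 0; loops--) {
        dst[0] = ~src[0];
        dst[1] = ~src[1];
        dst[2] = ~src[2];
        dst[3] = ~src[3];
        dst += COM_UNROLL;
        src += COM_UNROLL;
    }

    for (; rem != 0; rem--)                /* simple loop for the excess */
        *dst++ = ~*src++;
}
```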
 
    Limbs left over when the size isn't a multiple of the unrolling can
 be handled in various ways, for example:
 
    * A simple loop at the end (or the start) to process the excess.
      Care is needed that it isn't too much slower than the unrolled
      part.
 
    * A set of binary tests, for example after an 8-limb unrolling, test
      for 4 more limbs to process, then a further 2 more or not, and
      finally 1 more or not.  This will probably take more code space
      than a simple loop.
 
    * A `switch' statement, providing separate code for each possible
      excess, for example an 8-limb unrolling would have separate code
      for 0 remaining, 1 remaining, etc., up to 7 remaining.  This might
      take a lot of code, but may be the best way to optimize all cases
      in combination with a deep pipelined loop.
 
    * A computed jump into the middle of the loop, thus making the first
      iteration handle the excess.  This should make times smoothly
      increase with size, which is attractive, but setups for the jump
      and adjustments for pointers can be tricky and could become quite
      difficult in combination with deep pipelining.
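
    Two of the approaches above, the binary tests and the computed
 jump, can be sketched in C as follows.  This is only an illustration
 under assumptions: the names `limb_t', `COM', `com_binary_tail' and
 `com_jump_in' are hypothetical, and the computed jump is expressed as
 a `switch' into the loop body (the construct often called Duff's
 device) rather than a real address calculation.

```c
#include <stddef.h>

typedef unsigned long limb_t;   /* hypothetical limb type, for illustration */

#define COM(i) (dst[i] = ~src[i])   /* one limb of work at offset i */

/* Binary tests: 8-limb unrolled loop, then 4, 2 and 1 more or not. */
void com_binary_tail(limb_t *dst, const limb_t *src, size_t n)
{
    size_t loops = n >> 3;

    for (; loops != 0; loops--) {
        COM(0); COM(1); COM(2); COM(3);
        COM(4); COM(5); COM(6); COM(7);
        dst += 8; src += 8;
    }
    if (n & 4) { COM(0); COM(1); COM(2); COM(3); dst += 4; src += 4; }
    if (n & 2) { COM(0); COM(1); dst += 2; src += 2; }
    if (n & 1) { COM(0); }
}

/* Jump into the middle of a 4-way loop, so the first pass handles the
   excess.  Post-incremented pointers are used here; with fixed offsets
   the pointers would need adjusting before the jump. */
void com_jump_in(limb_t *dst, const limb_t *src, size_t n)
{
    size_t passes;

    if (n == 0)
        return;
    passes = (n + 3) >> 2;      /* total passes, the first maybe partial */

    switch (n & 3) {
    case 0: do { *dst++ = ~*src++;
    case 3:      *dst++ = ~*src++;
    case 2:      *dst++ = ~*src++;
    case 1:      *dst++ = ~*src++;
            } while (--passes != 0);
    }
}
```

    In `com_jump_in' every limb is processed by the same unrolled code,
 which is why times increase smoothly with size, whereas
 `com_binary_tail' carries separate code for each excess step.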
 