(gmp.info.gz) Assembler Writing Guide

Info Catalog (gmp.info.gz) Assembler Loop Unrolling (gmp.info.gz) Assembler Coding
 
 Writing Guide
 -------------
 
 This is a guide to writing software pipelined loops for processing limb
 vectors in assembler.
 
    First determine the algorithm and which instructions are needed.
 Code it without unrolling or scheduling, to make sure it works.  On a
 3-operand CPU try to write each new value to a new register; this will
 greatly simplify later steps.
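
    As an illustration only, here is a C sketch of such an unscheduled
 reference loop.  The name `ref_addmul_1', the 32-bit limb size and the
 operation itself (add a vector times a single limb into a destination)
 are assumptions made for the example, not code from GMP.  Each
 intermediate value gets its own variable, mirroring the new-register
 advice; only the carry is necessarily reused from one pass to the next.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference loop: rp[i] += up[i] * v for i = 0 .. n-1,
   returning the carry-out limb.  Limbs are assumed to be 32 bits so
   that a 64-bit type can hold every intermediate value. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t u    = up[i];            /* load source limb        */
        uint32_t r    = rp[i];            /* load destination limb   */
        uint64_t prod = (uint64_t)u * v;  /* 32x32 -> 64 bit product */
        uint64_t sum  = prod + r;         /* add destination limb    */
        uint64_t tot  = sum + carry;      /* add incoming carry      */
        rp[i] = (uint32_t)tot;            /* store low half          */
        carry = (uint32_t)(tot >> 32);    /* high half is next carry */
    }
    return carry;
}
```

    A loop like this is also useful later as a check: the scheduled
 version must give the same results on the same inputs.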
 
    Then note for each instruction the functional unit and/or issue port
 requirements.  If an instruction can use either of two units, such as
 U0 or U1, then make a category "U0/U1".  Count the instructions using
 each unit (or combined unit), and count the total number of instructions.
 
    Figure out from those counts the best possible loop time.  The goal
 will be to find a perfect schedule where instruction latencies are
 completely hidden.  The total instruction count might be the limiting
 factor, or perhaps a particular functional unit.  It might be possible
 to tweak the instructions to help the limiting factor.
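
    The counting argument can be sketched in C as follows.  The
 machine model is purely an assumption for the example: two units U0
 and U1, one load/store unit, and a fixed number of instructions issued
 per cycle.  The function name and parameters are hypothetical.  The
 result is only a lower bound; whether a schedule reaching it exists
 still has to be found by trial.

```c
/* Ceiling division for non-negative counts. */
static int ceil_div(int a, int b) { return (a + b - 1) / b; }

static int max_int(int a, int b) { return a > b ? a : b; }

/* Lower bound, in cycles, on one pass of the loop for an assumed CPU
   with units U0 and U1, one load/store unit, and issue_width
   instructions issued per cycle.
     n_u0, n_u1  instructions that can only use U0 or only U1
     n_either    instructions in the "U0/U1" category
     n_ldst      loads and stores
     n_total     all instructions in the loop, branch included */
int loop_time_bound(int n_u0, int n_u1, int n_either, int n_ldst,
                    int n_total, int issue_width)
{
    int bound = ceil_div(n_total, issue_width);   /* issue limit      */
    bound = max_int(bound, n_u0);                 /* U0 alone         */
    bound = max_int(bound, n_u1);                 /* U1 alone         */
    bound = max_int(bound,                        /* U0 and U1 pooled */
                    ceil_div(n_u0 + n_u1 + n_either, 2));
    bound = max_int(bound, n_ldst);               /* load/store unit  */
    return bound;
}
```

    For instance, with one U0-only instruction, three "U0/U1"
 instructions, three loads or stores and eight instructions in total,
 a 2-issue machine is limited by the instruction count, whereas a
 4-issue machine would be limited by the load/store unit.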
 
    If the loop time is N, make N issue buckets, with the final loop
 branch at the end of the last.  Now fill the buckets with
 dummy instructions using the functional units desired.  Run this to
 make sure the intended speed is reached.
 
    Now replace the dummy instructions with the real instructions from
 the slow but correct loop you started with.  The first will typically
 be a load instruction.  Then the instruction using that value is placed
 in a bucket an appropriate distance down.  Run the loop again, to check
 it still runs at target speed.
 
    Keep placing instructions, frequently measuring the loop.  After a
 few you will need to wrap around from the last bucket back to the top
 of the loop.  If you used the new-register-for-new-value strategy above
 then there will be no register conflicts.  If not, take care not to
 clobber something already in use.  Changing registers at this time is
 very error-prone.
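
    The bucket arithmetic behind this placement, including the wrap
 around, can be written down as a small C helper.  It assumes one
 bucket per cycle and a known result latency for the producing
 instruction; the type and function names are hypothetical.

```c
/* Position of an instruction in the scheduled loop. */
typedef struct {
    int bucket;  /* bucket index within the loop, 0 .. n_buckets-1  */
    int lag;     /* how many loop passes after the producer's pass  */
} slot_t;

/* Earliest slot for an instruction that consumes a value produced in
   bucket producer_bucket, when the result is available latency
   buckets later.  Going past the last bucket wraps to the top of the
   loop and counts one further pass. */
slot_t place_consumer(int producer_bucket, int latency, int n_buckets)
{
    int distance = producer_bucket + latency;
    slot_t s;
    s.bucket = distance % n_buckets;  /* position after wrapping      */
    s.lag    = distance / n_buckets;  /* number of wraps, i.e. passes */
    return s;
}
```

    A lag of 1 or more means the consumer, as it appears in the loop
 body, works on a value produced in an earlier pass, so the register
 holding that value must not be overwritten in between.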
 
    The loop will overlap two or more of the original loop iterations,
 and the computation of one vector element result will be started in one
 iteration of the new loop, and completed one or several iterations
 later.
 
    The final step is to create feed-in and wind-down code for the loop.
 A good way to do this is to make a copy (or copies) of the loop at the
 start and delete those instructions which don't have valid antecedents,
 and at the end replicate and delete those whose results are unwanted
 (including any further loads).
 
    The loop will have a minimum number of limbs loaded and processed,
 so the feed-in code must test if the requested size is smaller and skip
 either to a suitable part of the wind-down or to special code for small
 sizes.
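
    The overall shape (size test, feed-in, overlapped loop, wind-down)
 can be sketched in C.  This is an assumed example, not GMP code: the
 operation (multiply a vector by a single limb), the 32-bit limb size
 and the name `pipelined_mul_1' are chosen for illustration.  Each
 pass of the loop stores limb i, multiplies limb i+1 and loads limb
 i+2, so at least 2 limbs are needed before the loop can be entered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-stage pipelined loop: rp[i] = low half of
   up[i] * v + carry, returning the final carry limb. */
uint32_t pipelined_mul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0;
    uint64_t t, p;
    uint32_t u;
    size_t i;

    /* Special code for sizes below the pipeline minimum of 2 limbs. */
    if (n < 2) {
        if (n == 0)
            return 0;
        t = (uint64_t)up[0] * v;
        rp[0] = (uint32_t)t;
        return (uint32_t)(t >> 32);
    }

    /* Feed-in: copies of the loop keeping only the instructions
       whose inputs already exist. */
    u = up[0];                      /* load limb 0     */
    p = (uint64_t)u * v;            /* multiply limb 0 */
    u = up[1];                      /* load limb 1     */

    /* Main loop: three original iterations overlap in each pass. */
    for (i = 0; i + 2 < n; i++) {
        t = p + carry;              /* finish limb i       */
        p = (uint64_t)u * v;        /* multiply limb i + 1 */
        u = up[i + 2];              /* load limb i + 2     */
        rp[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }

    /* Wind-down: copies of the loop with the further loads, and then
       the further multiply, deleted.  Here i == n - 2. */
    t = p + carry;                  /* finish limb n - 2   */
    p = (uint64_t)u * v;            /* multiply limb n - 1 */
    rp[n - 2] = (uint32_t)t;
    carry = (uint32_t)(t >> 32);

    t = p + carry;                  /* finish limb n - 1   */
    rp[n - 1] = (uint32_t)t;
    return (uint32_t)(t >> 32);
}
```

    In this sketch small sizes go to separate code.  In assembler it
 may instead be possible to jump into the wind-down at a point where
 the registers already hold what that code expects.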
 