base:improved_clockslide
Differences
This shows you the differences between two versions of the page.
base:improved_clockslide [2017-02-28 08:14] – lft | base:improved_clockslide [2017-04-27 01:17] (current) – copyfault | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== Improved Clock Slide ==== | + | ====== Improved Clock Slide ====== |
by **lft** | by **lft** | ||
Line 19: | Line 19: | ||
Every additional byte we skip corresponds to one less cycle of delay. Notice that the operands are reinterpreted as opcodes depending on where we land on the slide. For instance, the final '' | Every additional byte we skip corresponds to one less cycle of delay. Notice that the operands are reinterpreted as opcodes depending on where we land on the slide. For instance, the final '' | ||
- | The length of the clockslide | + | The length of the clock slide depends on the maximum jitter we have to support, and we pay a corresponding penalty in the form of useless waiting cycles. In the example, the maximum supported jitter is 10 cycles (41-31), and the minimum overhead cost is 9 cycles (50-41). |
Now here comes the improvement: | Now here comes the improvement: | ||
- | Notice that in the latest case (starting at cycle 41), we still execute a single '' | + | Notice that in the latest case (starting at cycle 41), we still execute a single '' |
But we can use the page-crossing penalty of the branch instruction to add an extra cycle in this particular case! Consider: | But we can use the page-crossing penalty of the branch instruction to add an extra cycle in this particular case! Consider: | ||
Line 38: | Line 38: | ||
; at cycle 49 | ; at cycle 49 | ||
- | The clockslide | + | The clock slide is now one byte shorter, and the minimum overhead cost has been reduced to 8 cycles. |
The downside is that we now have an alignment requirement. Sometimes it may not be possible to adjust the starting address of the delay code. But note that we can insert dummy bytes just after the branch instruction, | The downside is that we now have an alignment requirement. Sometimes it may not be possible to adjust the starting address of the delay code. But note that we can insert dummy bytes just after the branch instruction, | ||
+ | |||
+ | |||
+ | === Slight variation === | ||
+ | |||
+ | by // | ||
+ | |||
+ | Optionally, if the Z-flag reflects the value of the accumulator, | ||
+ | |||
+ | ; delay 18-A cycles | ||
+ | ; 31..41 (A in range 0..10) | ||
+ | | ||
+ | ; 35..45 | ||
+ | branch | ||
+ | ; 37 (if A=0: no add. branch-cycle, | ||
+ | ; 39..47 (if A=1..9: add. branch-cycle) | ||
+ | ; 49 (if A=10: +branch-cycle +page-crossing cycle) | ||
+ | ; --- clock slide starts here --- | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | ; --- end of clock slide --- | ||
+ | ; page-crossing here! | ||
+ | ; at cycle 49 | ||
+ | |||
+ | If A=0, the branch is not taken, so the clock slide wastes a total of 11 cycles before the page break. If A is nonzero, the NOP instruction is skipped (-2 cycles) but the additional " |
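
The cycle annotations in the variation's listing can be cross-checked with a small model. The following is only a sketch in Python, not part of the original page: it assumes the annotations are exact (entry at cycle 31+A, a 4-cycle store, and a branch costing 2, 3 or 4 cycles depending on A), and it deliberately does not model the slide bytes themselves, since the listing is incomplete here. The names `branch_cycles` and `timeline` are made up for this illustration.

```python
# Cycle bookkeeping for the "slight variation" listing.
# All constants are read off the listing's comments; nothing here
# models the actual slide bytes.

FIRST_ENTRY = 31    # earliest entry cycle, reached when A = 0
STORE_CYCLES = 4    # "31..41" becomes "35..45" after the store
TARGET = 49         # cycle at which every path must end up
MAX_A = 10          # largest supported value of A


def branch_cycles(a):
    """Cost of the conditional branch for accumulator value a."""
    if a == 0:
        return 2    # Z set: branch not taken
    if a == MAX_A:
        return 4    # taken, plus the page-crossing penalty
    return 3        # taken, destination on the same page


def timeline(a):
    """Return the annotated cycle counts for a given value of A."""
    entry = FIRST_ENTRY + a
    after_branch = entry + STORE_CYCLES + branch_cycles(a)
    return {
        "entry": entry,
        "after_branch": after_branch,
        # Cycles that still have to pass before TARGET is reached.
        "left_until_target": TARGET - after_branch,
    }


if __name__ == "__main__":
    for a in range(MAX_A + 1):
        t = timeline(a)
        print(a, t["entry"], t["after_branch"], t["left_until_target"])
```

Under these assumptions the total delay from entry to the target cycle is 18-A for every A in 0..10, which matches the header comment of the listing.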
base/improved_clockslide.txt · Last modified: 2017-04-27 01:17 by copyfault