base:fast_sqrt
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionLast revisionBoth sides next revision | ||
base:fast_sqrt [2019-07-23 00:16] – verz | base:fast_sqrt [2019-08-18 10:20] – verz | ||
---|---|---|---|
Line 372: | Line 372: | ||
RTS | RTS | ||
</ | </ | ||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | by Verz: the generic algorithm is: | ||
+ | |||
+ | < | ||
+ | R= 0 | ||
+ | M= N | ||
+ | D= 2^(p-1) | ||
+ | for n= 1 to p | ||
+ | { | ||
+ | T= (R+R+D) ASL (p-1) | ||
+ | if (T <= M) then M=M-T: R=R+D | ||
+ | M= M ASL 1 | ||
+ | D= D LSR 1 | ||
+ | } | ||
+ | </ | ||
+ | where //p// is the number of bits of the result; (or half the bits of the radicand, +1 if the radicand has an odd amount of bits).\\ | ||
+ | // | ||
+ | |||
+ | Incidentally the algorithm provides also the remainder via (M LSR p)\\ | ||
+ | //p// can be greater than necessary, just producing more iterations. A value of p=16 permits to perform the calculations for 32bit numbers and smaller. | ||
+ | |||
+ | This is the code for a 32bit integer Sqrt. Provides the result and the remainder: | ||
+ | <code 6502acme> | ||
+ | ; | ||
+ | ;* sqrt32 | ||
+ | ;* | ||
+ | ;* | ||
+ | ; | ||
+ | ;* by Verz - Jul2019 | ||
+ | ; | ||
+ | ;* | ||
+ | ;* input: | ||
+ | ;* output: sqrt, 16bit value | ||
+ | ;* remnd, | ||
+ | ; | ||
+ | |||
+ | sqrt32 | ||
+ | sta sqrt ; R=0 | ||
+ | sta sqrt+1 | ||
+ | sta M+4 | ||
+ | ;sta T+1 ; (T+1) is zero until last iteration; (T+0) is always 0 | ||
+ | |||
+ | ldy #14 ; 15 iterations (14-->0) + last iteration | ||
+ | loopsq | ||
+ | lda sqrt ; (2*R+D) LSR 1; actually: R+(D LSR 1) | ||
+ | ora stablo, | ||
+ | sta T+2 ; | ||
+ | lda sqrt+1 | ||
+ | ora stabhi,y | ||
+ | sta T+3 | ||
+ | bcs skip0 ; takes care of large numbers; if set, M>T | ||
+ | | ||
+ | lda M+3 | ||
+ | cmp T+3 | ||
+ | bcc skip1 ; T <= M (branch if T>M) | ||
+ | bne skip0 | ||
+ | lda M+2 | ||
+ | cmp T+2 | ||
+ | bcc skip1 | ||
+ | skip0 ;sec | ||
+ | lda M+2 ; M=M-T | ||
+ | sbc T+2 | ||
+ | sta M+2 | ||
+ | lda M+3 | ||
+ | sbc T+3 | ||
+ | sta M+3 | ||
+ | lda sqrt ; R=R+D | ||
+ | ora stablo+1,y | ||
+ | sta sqrt | ||
+ | lda sqrt+1 | ||
+ | ora stabhi+1,y | ||
+ | sta sqrt+1 | ||
+ | skip1 | ||
+ | asl M ; M=M*2 | ||
+ | rol M+1 | ||
+ | rol M+2 | ||
+ | rol M+3 | ||
+ | dey ; implicit: D=D/2, by the decrement of .Y | ||
+ | bpl loopsq | ||
+ | lastiter | ||
+ | bcs skp0 ; takes care of large numbers; if set, M>T | ||
+ | ; during last iteration D=1, so [(2*R+D) LSR 1] makes D the MSB of T+1 | ||
+ | lda M+3 | ||
+ | cmp sqrt+1 | ||
+ | bcc skp1 ; T <= M branch if T>M | ||
+ | bne skp0 | ||
+ | lda M+2 | ||
+ | cmp sqrt ; (T+2) = sqrtLO | ||
+ | bcc skp1 | ||
+ | bne skp0 | ||
+ | lda M+1 | ||
+ | cmp #$80 ; value of (T+1) during last iteration | ||
+ | bcc skp1 | ||
+ | skp0 ;sec | ||
+ | lda M+1 | ||
+ | sbc #$80 ; (T+1) during last iteration | ||
+ | sta M+1 | ||
+ | lda M+2 ; M=M-T | ||
+ | sbc sqrt ; (T+2) | ||
+ | sta M+2 | ||
+ | lda M+3 | ||
+ | sbc sqrt+1 | ||
+ | sta M+3 | ||
+ | inc sqrt ; R=R+D with D=1 | ||
+ | skp1 asl M+1 ; M=M*2; location M+0=0 | ||
+ | rol M+2 | ||
+ | rol M+3 | ||
+ | rol M+4 | ||
+ | rts | ||
+ | |||
+ | ;**** Variables and Shift table | ||
+ | stabhi byte 0, | ||
+ | stablo BYTE $01, | ||
+ | byte 0, | ||
+ | |||
+ | square = $57 ; 5 bytes: input value; during calculation needs the 5th byte | ||
+ | sqrt = $5F ; 2 bytes: result | ||
+ | remnd = M+2 ; 2 B + 1 b: is in the high bytes of M (M LSR 16); msb is in T+0 (the 5th byte of square) | ||
+ | T = $5B ; 4 bytes: could be 2 bytes: T+0 is always 0; T+1 is 0 until last iteration | ||
+ | M = square ; 4 bytes: over the input square | ||
+ | </ | ||
+ | The algorithm is pretty fast: it has a top cycles count of around 1700, but seems to average at 1.3ms (using variables in page zero).\\ | ||
base/fast_sqrt.txt · Last modified: 2019-08-18 20:28 by verz