## Meta

base:filling_the_vectors

# Filling the vectors

By Bitbreaker/Oxyron/Nuance

The attached vector.tar.gz is rather outdated. I rewrote most parts of the filler and ended up with 25% faster results. A new tar.gz will come soon; until then, I have already updated the source for the fill.asm presented within this article. Have fun reading through the source and discovering new ways to solve the same problem.

## Precautions

For filling polygons you will use some sort of scanline-conversion algorithm. If you want to keep it simple, stick to triangles or quads (planar!) that have no angle bigger than 180°. You might also think of tearing the process into two parts, for better understanding and less hassle with the few registers the 6502 offers. Otherwise you have to save and restore registers rather often, which is expensive, and code complexity rises to a level that is a pain in the arse (see the code here: vector.tar.gz, compile with acme -f cbm -o vector.prg vector.asm).

## Preparation

As for a quad/triangle, first of all take the 4/3 vertices and find the vertex with the lowest y position, then the vertex with the highest y position. Then calculate the x positions for each y between y_min and y_max for the lines that span between those two points. If you define the quads in such a way that all lines go clockwise, you can determine whether a line is on the left side (index of vertex is < start point) or on the right side (index of vertex is > start point). This helps a lot when you later fill the lines, as the direction of filling will then always be the same and additional checks (or swapping of x1/x2) can be omitted in the inner loop.

```
      v1 y_min
      /\
   v4/  \v2
     \  /
      \/
      v3 y_max
```

So on the right side there are lines from v1 to v2 and from v2 to v3; on the left side, from v1 to v4 and from v4 to v3.
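To make this preparation step concrete, here is a minimal Python sketch. The name `prepare_edges` and the plain linear interpolation are illustrative only; the article's 6502 code uses Bresenham-style slope routines and precalculated tables instead.

```python
def prepare_edges(verts):
    """verts: list of (x, y) tuples in clockwise order (y grows downwards).

    Returns y_min, y_max and two per-scanline x buffers: xstart for the
    left edges, xend for the right edges."""
    n = len(verts)
    i_min = min(range(n), key=lambda i: verts[i][1])
    i_max = max(range(n), key=lambda i: verts[i][1])
    y_min, y_max = verts[i_min][1], verts[i_max][1]
    xstart, xend = {}, {}
    for i in range(n):
        (x0, y0), (x1, y1) = verts[i], verts[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal edge contributes no span boundaries
        # clockwise winding: edges running downwards (y increasing) lie
        # on the right side, edges running upwards on the left side
        buf = xend if y1 > y0 else xstart
        if y1 < y0:
            (x0, y0), (x1, y1) = (x1, y1), (x0, y0)
        for y in range(y0, y1 + 1):
            # linear interpolation stands in for the Bresenham slope code
            buf[y] = x0 + (x1 - x0) * (y - y0) // (y1 - y0)
    return y_min, y_max, xstart, xend
```

Because the winding fixes which buffer an edge belongs to, the fill loop can always run from xstart to xend without comparing or swapping the two.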

## Filling

For filling we cut the line that spans from x1 to x2 into three pieces, to get the first few pixels until an 8×8 block starts (x1 AND 7) and the last few pixels after the last full 8×8 block (x2 AND 7). Those two pieces are special cases that need extra treatment. In the very special case where the start and end chunk of the line fall within the same block, we need to combine them; otherwise we write them to the bitmap with just a part of the full pattern. Also, the start and end part of the line must be combined (ORA) with the screen content, as we possibly share an edge with other, already drawn faces that would otherwise be trashed. The remaining full blocks (in our example three) can now easily be filled by just storing \$ff (or your desired pattern) to the respective memory locations. Speedcode \o/

```
 ____________________________________________
|    XXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXX     |
     |                                 |
     x1                                x2
```
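The three-piece split can be sketched like this in Python. `fill_span` and the two mask tables are hypothetical stand-ins for the real filler's mask tables and speedcode; pixel 0 is the leftmost pixel of a byte, i.e. bit 7, as on the C64 bitmap.

```python
# left mask: cut off the (x & 7) leading pixels of the first byte
LEFT_MASKS = [0xff >> i for i in range(8)]
# right mask: keep pixels 0 .. (x & 7) of the last byte
RIGHT_MASKS = [(0xff << (7 - i)) & 0xff for i in range(8)]

def fill_span(row, x1, x2, pattern=0xff):
    """row: list of byte values for one scanline; fill pixels x1..x2."""
    b1, b2 = x1 >> 3, x2 >> 3
    left, right = LEFT_MASKS[x1 & 7], RIGHT_MASKS[x2 & 7]
    if b1 == b2:
        # start and end chunk fall into the same block: combine the masks
        row[b1] |= pattern & left & right
        return
    row[b1] |= pattern & left     # partial start block, ORA with screen content
    row[b2] |= pattern & right    # partial end block, ORA as well
    for b in range(b1 + 1, b2):   # the cheap part: full blocks, plain stores
        row[b] = pattern
```

Only the two edge bytes need read-modify-write; everything in between is a bare store, which is what makes the generated speedcode so fast.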

## Examples

Here are some snippets of example code. There are routines to precalculate the x1 and x2 positions as fast as possible; the filler then fills the area enclosed by those values with an 8×2 pattern.

```
!cpu 6510

fill_code  = \$1c     ;location of inner loop
xstart     = \$78     ;slope table for xstart is stored here

tgt_dst    = \$c000
tgt_size   = \$400

cd_d       = \$f500
cd_i       = \$f600
!ifdef MULTI_PATTERN {
patt_0     = \$f880
patt_1     = \$f980
patt_2     = \$fa80
patt_3     = \$fb80
}
to_index_col_b1 = \$fd00
to_index_col_b2 = \$fe00

;--------------------------
;SETUP
;
;--------------------------

!ifdef MULTI_PATTERN {
e_patt
!byte \$11,\$aa,\$ee,\$ff
o_patt
!byte \$44,\$55,\$bb,\$ff
}

;--------------------------
;THE VERY INNER LOOP OF OUR FILLER
;(will be placed into zeropage for max. performance)
;--------------------------

fill_start
!pseudopc fill_code {
fill
;fills a line either in bank 1 or 2 with a pattern
;x = x2
;y = y2
!ifdef BY_2 {
lsr+1 f_err+1
}
outline1 nop                    ;either dex or nop will cause a full area or one with outline on right edges

f_bnk1   lda to_index_col_b1,x
sta+1 f_jmp+1
f_back                          ;directly jump to here if something is wrong with speedcode setup
dey
f_yend   cpy #\$00               ;forces carry to be set
bcc f_end
;--------CALCULATE X2-----------------------------------------
f_err    lda #\$00               ;restore error
f_dx1    sbc #\$00               ;do that bresenhamthingy for xend, code will be setup for either flat or steep slope
f_code   bcs +
bcs_start
dex
sta+1 f_err+1
f_bnk2   lda to_index_col_b1,x  ;load index from \$00..\$20 depending on x -> x / 4 & \$1e | bank_offset (\$00|\$20)
sta+1 f_jmp+1          ;save 1 cycle due to zeropage
jmp ++
bcs_end
+
sta+1 f_err+1          ;save error
++
;-------------------------------------------------------------
sta+1 f_msk+1          ;setup mask without tainting X
arr #\$f8               ;-> carry is still set, and so is bit 7. This way we generate values from \$c0 .. \$fc, a range to which we adapt the memory layout of the row tables
sta+1 f_jmp+2          ;update byte of jump responsible to select all code-segments that start with xstart ;save 1 cycle due to zeropage
f_msk    lda maskl              ;the next two instructions could be moved to speedcode, but would just make it bloated, however meshes that throw errors get a penalty from that as an undef case wastes more cycles that way.
!ifdef MULTI_PATTERN {
f_patt   and patt_0,y           ;fetch pattern
}
f_jmp    jmp (\$1000)            ;do it! \o/
;-------------------------------------------------------------
f_end
rts
}
fill_end

;generate labels for combined chunks to reuse parts of the code
!if (.num = 0) {
s1_1_b1
}
!if (.num = 1) {
s2_2_b1
}
!if (.num = 2) {
s3_3_b1
}
!if (.num = 3) {
s4_4_b1
}
!if (.num = 4) {
s5_5_b1
}
!if (.num = 5) {
s6_6_b1
}
!if (.num = 6) {
s7_7_b1
}
!if (.num = 7) {
s8_8_b1
}
!if (.num = 8) {
s9_9_b1
}
!if (.num = 9) {
sa_a_b1
}
!if (.num = 10) {
sb_b_b1
}
!if (.num = 11) {
sc_c_b1
}
!if (.num = 12) {
sd_d_b1
}
!if (.num = 13) {
se_e_b1
}
!if (.num = 14) {
sf_f_b1
}
}
!if (.num = 0) {
s1_1_b2
}
!if (.num = 1) {
s2_2_b2
}
!if (.num = 2) {
s3_3_b2
}
!if (.num = 3) {
s4_4_b2
}
!if (.num = 4) {
s5_5_b2
}
!if (.num = 5) {
s6_6_b2
}
!if (.num = 6) {
s7_7_b2
}
!if (.num = 7) {
s8_8_b2
}
!if (.num = 8) {
s9_9_b2
}
!if (.num = 9) {
sa_a_b2
}
!if (.num = 10) {
sb_b_b2
}
!if (.num = 11) {
sc_c_b2
}
!if (.num = 12) {
sd_d_b2
}
!if (.num = 13) {
se_e_b2
}
!if (.num = 14) {
sf_f_b2
}
}
}

jmp f_back
}

;left chunk

!ifdef MULTI_PATTERN {
lda (patt),y ;refetch pattern, expensive, but at least less than sta patt, lda patt
} else {
lda #\$ff
}
!for .x, .num {
}

;right chunk
jmp f_back
}

!ifdef MULTI_PATTERN {
patt_ptr_hi
!byte >patt_0, >patt_1, >patt_2, >patt_3
}

;--------------------------
;DRAWFACE
;fill face with 3/4 vertices with pattern
;--------------------------

drawface
;find lowest and highest y-position of rectangle. ATTENTION: This makes your head explode, actually it is the optimized case of a bubblesort of 4 values.
lda verticebuf_y+1       ;v1.y - v0.y
cmp verticebuf_y+0
bcs Ba
;--------------------------
;v0 > v1
;--------------------------
Ab
cpx verticebuf_y+2       ;v3.y - v2.y
;--------------------------
;v0 v2 > v1 v3
;--------------------------
ACbd
lda verticebuf_y+0       ;v0.y - v2.y
cmp verticebuf_y+2
bcs +
cpx verticebuf_y+1       ;v3.y - v1.y
bcc min3_max2
min1_max2
jsr render_xstart_12
clc
jsr draw_face_seg_03+2   ;other segment below y_min
jsr draw_face_seg_10+2   ;other segment below y_min
jmp draw_face_seg_32     ;segment with y_min
+
cpx verticebuf_y+1       ;v3.y - v1.y
bcc min3_max0
min1_max0
jsr render_xstart_12
jsr render_xstart_23
jsr render_xstart_30
clc
jmp draw_face_seg_10
min1_max3
jsr render_xstart_12
jsr render_xstart_23
clc
jsr draw_face_seg_10+2
jmp draw_face_seg_03

;--------------------------
;v0 v3 > v1 v2
;--------------------------
cpx verticebuf_y+0       ;v3.y - v0.y
bcc +
cmp verticebuf_y+2       ;v1.y - v2.y
bcc min1_max3
min2_max3
jsr render_xstart_23
clc
jsr draw_face_seg_10+2
jsr draw_face_seg_21+2
jmp draw_face_seg_03
+
cmp verticebuf_y+2       ;v1.y - v2.y
bcc min1_max0
min2_max0
jsr render_xstart_23
jsr render_xstart_30
clc
jsr draw_face_seg_21+2
jmp draw_face_seg_10
min2_max1
jsr render_xstart_23
jsr render_xstart_30
jsr render_xstart_01
clc
jmp draw_face_seg_21
;--------------------------
;v1 > v0
;--------------------------
Ba
cpx verticebuf_y+2       ;v3.y - v2.y
bcs BDac
;--------------------------
;v1 v2 > v0 v3
;--------------------------
cmp verticebuf_y+2       ;v1.y - v2.y
bcs +
cpx verticebuf_y+0       ;v3.y - v0.y
bcs min0_max2
min3_max2
jsr render_xstart_30
jsr render_xstart_01
jsr render_xstart_12
clc
jmp draw_face_seg_32
+
cpx verticebuf_y+0       ;v3.y - v0.y
bcs min0_max1
min3_max1
jsr render_xstart_30
jsr render_xstart_01
clc
jsr draw_face_seg_32+2
jmp draw_face_seg_21
min3_max0
jsr render_xstart_30
clc
jsr draw_face_seg_21+2
jsr draw_face_seg_32+2
jmp draw_face_seg_10

;--------------------------
;v1 v3 > v0 v2
;--------------------------
BDac
cpx verticebuf_y+1       ;v3.y - v1.y
bcc +
cmp verticebuf_y+2       ;v1.y - v2.y
bcs min2_max3
min0_max3
jsr render_xstart_01
jsr render_xstart_12
jsr render_xstart_23
clc
jmp draw_face_seg_03
+
cmp verticebuf_y+2       ;v1.y - v2.y
bcs min2_max1
min0_max1
jsr render_xstart_01
clc
jsr draw_face_seg_32+2
jsr draw_face_seg_03+2
jmp draw_face_seg_21
min0_max2
jsr render_xstart_01
jsr render_xstart_12
clc
jsr draw_face_seg_03+2
jmp draw_face_seg_32

;--------------------------
;FILLER FUNCTIONS
;
;--------------------------

;macro for setting up coordinates (x1)/x2/y1/y2
!macro draw_face_seg .x, .y {
lda verticebuf_y + .y
;carry is always clear
;clc
;calc dy
sbc verticebuf_y + .x
;negative / zero?
bmi .zero
+
tay
iny

lda verticebuf_y + .x
;setup y endval in filler
sta f_yend+1

;calc dx
lax verticebuf_x + .y
;sec
sbc verticebuf_x + .x
;dx is negative?
bcs +

;yes, do an abs(dx)
eor #\$ff

sta f_dx1+1    ;needed to be able to compare A with Y
cpy f_dx1+1
bcs .x2_steep_
.x2_flat_
;setup err, dy, dx
sta f_dx2+1
sty+1 f_err+1
sty f_dx1+1

;setup code for flat lines
lda #\$e8 ;inx
sta f_code
lda #\$b0
sta f_code+1
lda #\$fb
sta f_code+2
ldy verticebuf_y + .y
jmp fill_code

.x2_steep_
;setup err, dy, dx
sty f_dx2+1
sta+1 f_err+1
;sta f_dx1+1

lda #\$b0
sta f_code
lda #bcs_end-bcs_start
sta f_code+1
lda #\$e8 ;inx
sta f_code+2
ldy verticebuf_y + .y
jmp fill_code
.zero
clc
rts
+
sta f_dx1+1
cpy f_dx1+1
bcs .x2_steep

.x2_flat
;setup err, dy, dx
sta f_dx2+1
sty+1 f_err+1
sty f_dx1+1

;setup code for flat lines
lda #\$ca ;dex
sta f_code
lda #\$b0 ;bcs *-3
sta f_code+1
lda #\$fb
sta f_code+2
ldy verticebuf_y + .y
jmp fill_code

.x2_steep
;setup err, dy, dx
sty f_dx2+1
sta+1 f_err+1
;sta f_dx1+1

lda #\$b0 ;bcs
sta f_code
lda #bcs_end-bcs_start
sta f_code+1
lda #\$ca ;dex
sta f_code+2
ldy verticebuf_y + .y
jmp fill_code

}

;--------------------------
;RENDER A FACE SEGMENT (Values for x1 are already calculated)
;
;--------------------------

draw_face_seg_10
outline6 lda #verticebuf_y+0
+draw_face_seg 1, 0
draw_face_seg_21
outline5 lda #verticebuf_y+1
+draw_face_seg 2, 1
draw_face_seg_32
outline4 lda #verticebuf_y+2
+draw_face_seg 3, 2
draw_face_seg_03
outline3 lda #verticebuf_y+3
+draw_face_seg 0, 3

;--------------------------
;RENDER LINE ON TARGET 1
;
;--------------------------

;macro for setting up coordinates (x1)/x2/y1/y2
!macro render_xstart .x, .y {
;calc dy
lda verticebuf_y + .y
sec
;subtract one too much to make test on <= 0
sbc verticebuf_y + .x
;negative/zero?
bmi .zero
beq .zero
+
tay

;calc dx and prepare xstart-value in X
lax verticebuf_x + .y
sbx #\$80
sec                   ;meh, could be saved, but sbx taints carry
sbc verticebuf_x + .x
;dx is negative?
bcs .dx_positive

;yes, do an abs(dx)
eor #\$ff

sta dx
;choose direction dx>dy or dx<dy? y = dy
cpy dx
bcc .rxs_flat_i

.xstart_i
;now setup jump into code nicely and fast without all that jsr and rts-setting shits
sty dy
sty .jmp_i+1          ;set lowbyte of jump
asl .jmp_i+1          ;and shift left -> 128 different pointers selectable by that. ASL is expensive, but therefore doesn't clobber A

ldy verticebuf_y + .x ;y1 -> + dy (determined by code entry position) -> we start to store @ y2

!ifdef BY_2 {
lsr
}
sec
.jmp_i   jmp (cd_i)
.zero
rts

.dx_positive
sta dx
;choose direction dx>dy or dx<dy? y = dy
cpy dx
bcc .rxs_flat_d

.xstart_d
sty dy
sty .jmp_d+1
asl .jmp_d+1

ldy verticebuf_y + .x

!ifdef BY_2 {
lsr
}
sec
.jmp_d   jmp (cd_d)

;--------------------------
;the flat slopes are done by conventional code
;dx > dy x++ y--
;--------------------------

.rxs_flat_i
;setup inx/dex, dy, dx
sty .rxsdy1+1
sta .rxsdx1+1

;add y1 to stx xstart,y so we can count down by dy
lda verticebuf_y + .x
sta .rxsstx1+1

;dy is counter
tya
!ifdef BY_2 {
lsr
}
sec
-
inx
.rxsdy1  sbc #\$00
bcs -
dey
;yay, zeropage, now we can store x directly!
.rxsstx1 stx xstart,y
bne -
rts

.rxs_flat_d
sty .rxsdy2+1
sta .rxsdx2+1

lda verticebuf_y + .x
sta .rxsstx2+1

tya
!ifdef BY_2 {
lsr
}
sec
-
dex
.rxsdy2  sbc #\$00
bcs -
dey
.rxsstx2 stx xstart,y
bne -
rts

}

render_xstart_01
+render_xstart 0, 1
render_xstart_12
+render_xstart 1, 2
render_xstart_23
+render_xstart 2, 3
render_xstart_30
+render_xstart 3, 0

calc_xstart1_d
!for .x,128 {
sbc dx
bcs +
dex
+
stx xstart+128-.x,y
}
rts

calc_xstart1_i
!for .x,128 {
sbc dx             ;3
bcs +              ;3
inx                ;2
+
stx xstart+128-.x,y;3
}
rts

start_clear
;-----------------------------
;/!\ ATTENTION: All stuff from here on will be overwritten upon codegen of clear
;----------------------------

;just there to be copied to their final destinations @\$c000-\$fc00
targets
!word s0_0_b1, s0_1_b1, s0_2_b1, s0_3_b1, s0_4_b1, s0_5_b1, s0_6_b1, s0_7_b1, s0_8_b1, s0_9_b1, s0_a_b1, s0_b_b1, s0_c_b1, s0_d_b1, s0_e_b1, s0_f_b1
!word s0_0_b2, s0_1_b2, s0_2_b2, s0_3_b2, s0_4_b2, s0_5_b2, s0_6_b2, s0_7_b2, s0_8_b2, s0_9_b2, s0_a_b2, s0_b_b2, s0_c_b2, s0_d_b2, s0_e_b2, s0_f_b2
!word f_back , s1_1_b1, s1_2_b1, s1_3_b1, s1_4_b1, s1_5_b1, s1_6_b1, s1_7_b1, s1_8_b1, s1_9_b1, s1_a_b1, s1_b_b1, s1_c_b1, s1_d_b1, s1_e_b1, s1_f_b1
!word f_back , s1_1_b2, s1_2_b2, s1_3_b2, s1_4_b2, s1_5_b2, s1_6_b2, s1_7_b2, s1_8_b2, s1_9_b2, s1_a_b2, s1_b_b2, s1_c_b2, s1_d_b2, s1_e_b2, s1_f_b2
!word f_back , f_back , s2_2_b1, s2_3_b1, s2_4_b1, s2_5_b1, s2_6_b1, s2_7_b1, s2_8_b1, s2_9_b1, s2_a_b1, s2_b_b1, s2_c_b1, s2_d_b1, s2_e_b1, s2_f_b1
!word f_back , f_back , s2_2_b2, s2_3_b2, s2_4_b2, s2_5_b2, s2_6_b2, s2_7_b2, s2_8_b2, s2_9_b2, s2_a_b2, s2_b_b2, s2_c_b2, s2_d_b2, s2_e_b2, s2_f_b2
!word f_back , f_back , f_back , s3_3_b1, s3_4_b1, s3_5_b1, s3_6_b1, s3_7_b1, s3_8_b1, s3_9_b1, s3_a_b1, s3_b_b1, s3_c_b1, s3_d_b1, s3_e_b1, s3_f_b1
!word f_back , f_back , f_back , s3_3_b2, s3_4_b2, s3_5_b2, s3_6_b2, s3_7_b2, s3_8_b2, s3_9_b2, s3_a_b2, s3_b_b2, s3_c_b2, s3_d_b2, s3_e_b2, s3_f_b2
!word f_back , f_back , f_back , f_back , s4_4_b1, s4_5_b1, s4_6_b1, s4_7_b1, s4_8_b1, s4_9_b1, s4_a_b1, s4_b_b1, s4_c_b1, s4_d_b1, s4_e_b1, s4_f_b1
!word f_back , f_back , f_back , f_back , s4_4_b2, s4_5_b2, s4_6_b2, s4_7_b2, s4_8_b2, s4_9_b2, s4_a_b2, s4_b_b2, s4_c_b2, s4_d_b2, s4_e_b2, s4_f_b2
!word f_back , f_back , f_back , f_back , f_back , s5_5_b1, s5_6_b1, s5_7_b1, s5_8_b1, s5_9_b1, s5_a_b1, s5_b_b1, s5_c_b1, s5_d_b1, s5_e_b1, s5_f_b1
!word f_back , f_back , f_back , f_back , f_back , s5_5_b2, s5_6_b2, s5_7_b2, s5_8_b2, s5_9_b2, s5_a_b2, s5_b_b2, s5_c_b2, s5_d_b2, s5_e_b2, s5_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , s6_6_b1, s6_7_b1, s6_8_b1, s6_9_b1, s6_a_b1, s6_b_b1, s6_c_b1, s6_d_b1, s6_e_b1, s6_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , s6_6_b2, s6_7_b2, s6_8_b2, s6_9_b2, s6_a_b2, s6_b_b2, s6_c_b2, s6_d_b2, s6_e_b2, s6_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , s7_7_b1, s7_8_b1, s7_9_b1, s7_a_b1, s7_b_b1, s7_c_b1, s7_d_b1, s7_e_b1, s7_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , s7_7_b2, s7_8_b2, s7_9_b2, s7_a_b2, s7_b_b2, s7_c_b2, s7_d_b2, s7_e_b2, s7_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , s8_8_b1, s8_9_b1, s8_a_b1, s8_b_b1, s8_c_b1, s8_d_b1, s8_e_b1, s8_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , s8_8_b2, s8_9_b2, s8_a_b2, s8_b_b2, s8_c_b2, s8_d_b2, s8_e_b2, s8_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , s9_9_b1, s9_a_b1, s9_b_b1, s9_c_b1, s9_d_b1, s9_e_b1, s9_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , s9_9_b2, s9_a_b2, s9_b_b2, s9_c_b2, s9_d_b2, s9_e_b2, s9_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sa_a_b1, sa_b_b1, sa_c_b1, sa_d_b1, sa_e_b1, sa_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sa_a_b2, sa_b_b2, sa_c_b2, sa_d_b2, sa_e_b2, sa_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sb_b_b1, sb_c_b1, sb_d_b1, sb_e_b1, sb_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sb_b_b2, sb_c_b2, sb_d_b2, sb_e_b2, sb_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sc_c_b1, sc_d_b1, sc_e_b1, sc_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sc_c_b2, sc_d_b2, sc_e_b2, sc_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sd_d_b1, sd_e_b1, sd_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sd_d_b2, sd_e_b2, sd_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , se_e_b1, se_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , se_e_b2, se_f_b2
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sf_f_b1
!word f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , f_back , sf_f_b2

;copy a lot of stuff to their needed locations and set up the filler code
setup_fill
ldx #\$00
-
lda part_6+\$000,x
sta \$d440,x
lda part_6+\$100,x
sta \$d540,x
lda part_6+\$200,x
sta \$d640,x
lda part_6+\$300,x
sta \$d740,x
lda part_7+\$000,x
sta \$d840,x
lda part_7+\$100,x
sta \$d940,x
lda part_7+\$200,x
sta \$da40,x
lda part_7+\$300,x
sta \$db40,x
lda part_8+\$000,x
sta \$dc40,x
lda part_8+\$100,x
sta \$dd40,x
lda part_8+\$200,x
sta \$de40,x
lda part_8+\$300,x
sta \$df40,x
lda part_9+\$000,x
sta \$e040,x
lda part_9+\$100,x
sta \$e140,x
lda part_9+\$200,x
sta \$e240,x
lda part_9+\$300,x
sta \$e340,x
lda part_10+\$000,x
sta \$e440,x
lda part_10+\$100,x
sta \$e540,x
lda part_10+\$200,x
sta \$e640,x
lda part_10+\$300,x
sta \$e740,x
dex
bne -

-
lda part_1+\$000,x
sta \$c040,x
lda part_1+\$100,x
sta \$c140,x
lda part_1+\$200,x
sta \$c240,x
lda part_1+\$300,x
sta \$c340,x
lda part_2+\$000,x
sta \$c440,x
lda part_2+\$100,x
sta \$c540,x
lda part_2+\$200,x
sta \$c640,x
lda part_2+\$300,x
sta \$c740,x
lda part_3+\$000,x
sta \$c840,x
lda part_3+\$100,x
sta \$c940,x
lda part_3+\$200,x
sta \$ca40,x
lda part_3+\$300,x
sta \$cb40,x
lda part_4+\$000,x
sta \$cc40,x
lda part_4+\$100,x
sta \$cd40,x
lda part_4+\$200,x
sta \$ce40,x
lda part_4+\$300,x
sta \$cf40,x
lda part_5+\$000,x
sta \$d040,x
lda part_5+\$100,x
sta \$d140,x
lda part_5+\$200,x
sta \$d240,x
lda part_5+\$300,x
sta \$d340,x
dex
bne -

ldx #fill_end-fill_start
-
lda fill_start,x
sta fill_code,x
dex
bpl -

ldx #\$00
txa
-
sta maskr,x        ;use offset of 1, as xend has that offset as well
eor #\$ff
eor #\$ff
sec
ror
cmp #\$ff
bne +
lda #\$00
+
inx
bpl -

!ifdef MULTI_PATTERN {
;generate full patterns
ldx #\$00
-
lda e_patt+0
sta patt_0+0,x
lda o_patt+0
sta patt_0+1,x
lda e_patt+1
sta patt_1+0,x
lda o_patt+1
sta patt_1+1,x
lda e_patt+2
sta patt_2+0,x
lda o_patt+2
sta patt_2+1,x
lda e_patt+3
sta patt_3+0,x
lda o_patt+3
sta patt_3+1,x
inx
inx
bpl -
}

lda #\$10
sta tmp1
lda #\$00
tax
--
ldy #\$07
-
sta to_index_col_b1,x
ora #\$20
sta to_index_col_b2,x
and #\$1f
inx
dey
bpl -
clc
dec tmp1
bne --

;copy target pointers for speed_code segments to fit memory layout (\$20 pointers each \$400 bytes from \$c000 on)
ldx #\$3f
-
lda targets+\$000,x
sta tgt_dst+\$0*tgt_size,x

lda targets+\$040,x
sta tgt_dst+\$1*tgt_size,x

lda targets+\$080,x
sta tgt_dst+\$2*tgt_size,x

lda targets+\$0c0,x
sta tgt_dst+\$3*tgt_size,x

lda targets+\$100,x
sta tgt_dst+\$4*tgt_size,x

lda targets+\$140,x
sta tgt_dst+\$5*tgt_size,x

lda targets+\$180,x
sta tgt_dst+\$6*tgt_size,x

lda targets+\$1c0,x
sta tgt_dst+\$7*tgt_size,x

lda targets+\$200,x
sta tgt_dst+\$8*tgt_size,x

lda targets+\$240,x
sta tgt_dst+\$9*tgt_size,x

lda targets+\$280,x
sta tgt_dst+\$a*tgt_size,x

lda targets+\$2c0,x
sta tgt_dst+\$b*tgt_size,x

lda targets+\$300,x
sta tgt_dst+\$c*tgt_size,x

lda targets+\$340,x
sta tgt_dst+\$d*tgt_size,x

lda targets+\$380,x
sta tgt_dst+\$e*tgt_size,x

lda targets+\$3c0,x
sta tgt_dst+\$f*tgt_size,x
dex
bpl -

!ifdef MULTI_PATTERN {
lda #\$80
sta patt
}

ldx #\$00
-
lda cd_d_o,x
sta cd_d,x
lda cd_i_o,x
sta cd_i,x
dex
bne -

rts

;pointers into slope-generation code
cd_d_o
!for .x,128 {
!word (128-.x+1) * 9 + calc_xstart1_d
}
cd_i_o
!for .x,128 {
!word (128-.x+1) * 9 + calc_xstart1_i
}

;speedcode chunks that are jumped to from inner loop
part_1
!pseudopc \$c040 {
s0_0_b1
+comb bank1+\$000
s0_1_b1
+norm bank1+\$000, 0
s0_2_b1
+norm bank1+\$000, 1
s0_3_b1
+norm bank1+\$000, 2
s0_4_b1
+norm bank1+\$000, 3
s0_5_b1
+norm bank1+\$000, 4
s0_6_b1
+norm bank1+\$000, 5
s0_7_b1
+norm bank1+\$000, 6
s0_8_b1
+norm bank1+\$000, 7
s0_9_b1
+norm bank1+\$000, 8
s0_a_b1
+norm bank1+\$000, 9
s0_b_b1
+norm bank1+\$000, 10
s0_c_b1
+norm bank1+\$000, 11
s0_d_b1
+norm bank1+\$000, 12
s0_e_b1
+norm bank1+\$000, 13
s0_f_b1
+norm bank1+\$000, 14

s1_2_b1
+norm bank1+\$080, 0
s1_3_b1
+norm bank1+\$080, 1
s1_4_b1
+norm bank1+\$080, 2
s1_5_b1
+norm bank1+\$080, 3
s1_6_b1
+norm bank1+\$080, 4
s1_7_b1
+norm bank1+\$080, 5
s1_8_b1
+norm bank1+\$080, 6
s1_9_b1
+norm bank1+\$080, 7
s1_a_b1
+norm bank1+\$080, 8
}

part_2
!pseudopc \$c440 {
s1_b_b1
+norm bank1+\$080, 9
s1_c_b1
+norm bank1+\$080, 10
s1_d_b1
+norm bank1+\$080, 11
s1_e_b1
+norm bank1+\$080, 12
s1_f_b1
+norm bank1+\$080, 13

s2_3_b1
+norm bank1+\$100, 0
s2_4_b1
+norm bank1+\$100, 1
s2_5_b1
+norm bank1+\$100, 2
s2_6_b1
+norm bank1+\$100, 3
s2_7_b1
+norm bank1+\$100, 4
s2_8_b1
+norm bank1+\$100, 5
s2_9_b1
+norm bank1+\$100, 6
s2_a_b1
+norm bank1+\$100, 7
s2_b_b1
+norm bank1+\$100, 8
s2_c_b1
+norm bank1+\$100, 9
s2_d_b1
+norm bank1+\$100, 10
s2_e_b1
+norm bank1+\$100, 11
s2_f_b1
+norm bank1+\$100, 12

s3_4_b1
+norm bank1+\$180, 0
s3_5_b1
+norm bank1+\$180, 1
s3_6_b1
+norm bank1+\$180, 2
s3_7_b1
+norm bank1+\$180, 3
s3_8_b1
+norm bank1+\$180, 4
s3_9_b1
+norm bank1+\$180, 5
}

part_3
!pseudopc \$c840 {
s3_a_b1
+norm bank1+\$180, 6
s3_b_b1
+norm bank1+\$180, 7
s3_c_b1
+norm bank1+\$180, 8
s3_d_b1
+norm bank1+\$180, 9
s3_e_b1
+norm bank1+\$180, 10
s3_f_b1
+norm bank1+\$180, 11

s4_5_b1
+norm bank1+\$200, 0
s4_6_b1
+norm bank1+\$200, 1
s4_7_b1
+norm bank1+\$200, 2
s4_8_b1
+norm bank1+\$200, 3
s4_9_b1
+norm bank1+\$200, 4
s4_a_b1
+norm bank1+\$200, 5
s4_b_b1
+norm bank1+\$200, 6
s4_c_b1
+norm bank1+\$200, 7
s4_d_b1
+norm bank1+\$200, 8
s4_e_b1
+norm bank1+\$200, 9
s4_f_b1
+norm bank1+\$200, 10

s5_6_b1
+norm bank1+\$280, 0
s5_7_b1
+norm bank1+\$280, 1
s5_8_b1
+norm bank1+\$280, 2
s5_9_b1
+norm bank1+\$280, 3
s5_a_b1
+norm bank1+\$280, 4
s5_b_b1
+norm bank1+\$280, 5
s5_c_b1
+norm bank1+\$280, 6
s5_d_b1
+norm bank1+\$280, 7
s5_e_b1
+norm bank1+\$280, 8
}

part_4
!pseudopc \$cc40 {
s5_f_b1
+norm bank1+\$280, 9

s6_7_b1
+norm bank1+\$300, 0
s6_8_b1
+norm bank1+\$300, 1
s6_9_b1
+norm bank1+\$300, 2
s6_a_b1
+norm bank1+\$300, 3
s6_b_b1
+norm bank1+\$300, 4
s6_c_b1
+norm bank1+\$300, 5
s6_d_b1
+norm bank1+\$300, 6
s6_e_b1
+norm bank1+\$300, 7
s6_f_b1
+norm bank1+\$300, 8

s7_8_b1
+norm bank1+\$380, 0
s7_9_b1
+norm bank1+\$380, 1
s7_a_b1
+norm bank1+\$380, 2
s7_b_b1
+norm bank1+\$380, 3
s7_c_b1
+norm bank1+\$380, 4
s7_d_b1
+norm bank1+\$380, 5
s7_e_b1
+norm bank1+\$380, 6
s7_f_b1
+norm bank1+\$380, 7

s8_9_b1
+norm bank1+\$400, 0
s8_a_b1
+norm bank1+\$400, 1
s8_b_b1
+norm bank1+\$400, 2
s8_c_b1
+norm bank1+\$400, 3
s8_d_b1
+norm bank1+\$400, 4
s8_e_b1
+norm bank1+\$400, 5
s8_f_b1
+norm bank1+\$400, 6

s9_a_b1
+norm bank1+\$480, 0
s9_b_b1
+norm bank1+\$480, 1
s9_c_b1
+norm bank1+\$480, 2
s9_d_b1
+norm bank1+\$480, 3
s9_e_b1
+norm bank1+\$480, 4
s9_f_b1
+norm bank1+\$480, 5
}

part_5
!pseudopc \$d040 {
sa_b_b1
+norm bank1+\$500, 0
sa_c_b1
+norm bank1+\$500, 1
sa_d_b1
+norm bank1+\$500, 2
sa_e_b1
+norm bank1+\$500, 3
sa_f_b1
+norm bank1+\$500, 4

sb_c_b1
+norm bank1+\$580, 0
sb_d_b1
+norm bank1+\$580, 1
sb_e_b1
+norm bank1+\$580, 2
sb_f_b1
+norm bank1+\$580, 3

sc_d_b1
+norm bank1+\$600, 0
sc_e_b1
+norm bank1+\$600, 1
sc_f_b1
+norm bank1+\$600, 2

sd_e_b1
+norm bank1+\$680, 0
sd_f_b1
+norm bank1+\$680, 1

se_f_b1
+norm bank1+\$700, 0
}

part_6
!pseudopc \$d440 {
s0_0_b2
+comb bank2+\$000
s0_1_b2
+norm bank2+\$000, 0
s0_2_b2
+norm bank2+\$000, 1
s0_3_b2
+norm bank2+\$000, 2
s0_4_b2
+norm bank2+\$000, 3
s0_5_b2
+norm bank2+\$000, 4
s0_6_b2
+norm bank2+\$000, 5
s0_7_b2
+norm bank2+\$000, 6
s0_8_b2
+norm bank2+\$000, 7
s0_9_b2
+norm bank2+\$000, 8
s0_a_b2
+norm bank2+\$000, 9
s0_b_b2
+norm bank2+\$000, 10
s0_c_b2
+norm bank2+\$000, 11
s0_d_b2
+norm bank2+\$000, 12
s0_e_b2
+norm bank2+\$000, 13
s0_f_b2
+norm bank2+\$000, 14

s1_2_b2
+norm bank2+\$080, 0
s1_3_b2
+norm bank2+\$080, 1
s1_4_b2
+norm bank2+\$080, 2
s1_5_b2
+norm bank2+\$080, 3
s1_6_b2
+norm bank2+\$080, 4
s1_7_b2
+norm bank2+\$080, 5
s1_8_b2
+norm bank2+\$080, 6
s1_9_b2
+norm bank2+\$080, 7
s1_a_b2
+norm bank2+\$080, 8
}

part_7
!pseudopc \$d840 {
s1_b_b2
+norm bank2+\$080, 9
s1_c_b2
+norm bank2+\$080, 10
s1_d_b2
+norm bank2+\$080, 11
s1_e_b2
+norm bank2+\$080, 12
s1_f_b2
+norm bank2+\$080, 13

s2_3_b2
+norm bank2+\$100, 0
s2_4_b2
+norm bank2+\$100, 1
s2_5_b2
+norm bank2+\$100, 2
s2_6_b2
+norm bank2+\$100, 3
s2_7_b2
+norm bank2+\$100, 4
s2_8_b2
+norm bank2+\$100, 5
s2_9_b2
+norm bank2+\$100, 6
s2_a_b2
+norm bank2+\$100, 7
s2_b_b2
+norm bank2+\$100, 8
s2_c_b2
+norm bank2+\$100, 9
s2_d_b2
+norm bank2+\$100, 10
s2_e_b2
+norm bank2+\$100, 11
s2_f_b2
+norm bank2+\$100, 12

s3_4_b2
+norm bank2+\$180, 0
s3_5_b2
+norm bank2+\$180, 1
s3_6_b2
+norm bank2+\$180, 2
s3_7_b2
+norm bank2+\$180, 3
s3_8_b2
+norm bank2+\$180, 4
s3_9_b2
+norm bank2+\$180, 5
}

part_8
!pseudopc \$dc40 {
s3_a_b2
+norm bank2+\$180, 6
s3_b_b2
+norm bank2+\$180, 7
s3_c_b2
+norm bank2+\$180, 8
s3_d_b2
+norm bank2+\$180, 9
s3_e_b2
+norm bank2+\$180, 10
s3_f_b2
+norm bank2+\$180, 11

s4_5_b2
+norm bank2+\$200, 0
s4_6_b2
+norm bank2+\$200, 1
s4_7_b2
+norm bank2+\$200, 2
s4_8_b2
+norm bank2+\$200, 3
s4_9_b2
+norm bank2+\$200, 4
s4_a_b2
+norm bank2+\$200, 5
s4_b_b2
+norm bank2+\$200, 6
s4_c_b2
+norm bank2+\$200, 7
s4_d_b2
+norm bank2+\$200, 8
s4_e_b2
+norm bank2+\$200, 9
s4_f_b2
+norm bank2+\$200, 10

s5_6_b2
+norm bank2+\$280, 0
s5_7_b2
+norm bank2+\$280, 1
s5_8_b2
+norm bank2+\$280, 2
s5_9_b2
+norm bank2+\$280, 3
s5_a_b2
+norm bank2+\$280, 4
s5_b_b2
+norm bank2+\$280, 5
s5_c_b2
+norm bank2+\$280, 6
s5_d_b2
+norm bank2+\$280, 7
s5_e_b2
+norm bank2+\$280, 8
}

part_9
!pseudopc \$e040 {
s5_f_b2
+norm bank2+\$280, 9

s6_7_b2
+norm bank2+\$300, 0
s6_8_b2
+norm bank2+\$300, 1
s6_9_b2
+norm bank2+\$300, 2
s6_a_b2
+norm bank2+\$300, 3
s6_b_b2
+norm bank2+\$300, 4
s6_c_b2
+norm bank2+\$300, 5
s6_d_b2
+norm bank2+\$300, 6
s6_e_b2
+norm bank2+\$300, 7
s6_f_b2
+norm bank2+\$300, 8

s7_8_b2
+norm bank2+\$380, 0
s7_9_b2
+norm bank2+\$380, 1
s7_a_b2
+norm bank2+\$380, 2
s7_b_b2
+norm bank2+\$380, 3
s7_c_b2
+norm bank2+\$380, 4
s7_d_b2
+norm bank2+\$380, 5
s7_e_b2
+norm bank2+\$380, 6
s7_f_b2
+norm bank2+\$380, 7

s8_9_b2
+norm bank2+\$400, 0
s8_a_b2
+norm bank2+\$400, 1
s8_b_b2
+norm bank2+\$400, 2
s8_c_b2
+norm bank2+\$400, 3
s8_d_b2
+norm bank2+\$400, 4
s8_e_b2
+norm bank2+\$400, 5
s8_f_b2
+norm bank2+\$400, 6

s9_a_b2
+norm bank2+\$480, 0
s9_b_b2
+norm bank2+\$480, 1
s9_c_b2
+norm bank2+\$480, 2
s9_d_b2
+norm bank2+\$480, 3
s9_e_b2
+norm bank2+\$480, 4
s9_f_b2
+norm bank2+\$480, 5
}

part_10
!pseudopc \$e440 {
sa_b_b2
+norm bank2+\$500, 0
sa_c_b2
+norm bank2+\$500, 1
sa_d_b2
+norm bank2+\$500, 2
sa_e_b2
+norm bank2+\$500, 3
sa_f_b2
+norm bank2+\$500, 4

sb_c_b2
+norm bank2+\$580, 0
sb_d_b2
+norm bank2+\$580, 1
sb_e_b2
+norm bank2+\$580, 2
sb_f_b2
+norm bank2+\$580, 3

sc_d_b2
+norm bank2+\$600, 0
sc_e_b2
+norm bank2+\$600, 1
sc_f_b2
+norm bank2+\$600, 2

sd_e_b2
+norm bank2+\$680, 0
sd_f_b2
+norm bank2+\$680, 1

se_f_b2
+norm bank2+\$700, 0
}
```

## Alternatives

The filler could also be done char-based. That means: for every empty 8×8 block that you draw into, you start a new char in the charset (or modify the existing one if it is not empty), and then place the corresponding char on the screen. That is done for the outlining start/end chunks. The inside is then filled with a single char that represents the filling pattern at 8×8 size; for that, only the screen needs to be touched. Besides possibly faster filling, this would also save the overhead of clearing the charset, as it is simply overwritten as far as it is used in the next turn. Only the 16×16 area on the screen itself needs to be cleared/set to an empty char. However, due to its complexity, I haven't given this a try so far.
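Since this variant was never implemented, here is only a rough Python sketch of the bookkeeping such a char-based filler would need. All names (`CharFiller`, `put_boundary`, `put_full`) are made up for illustration.

```python
FULL_CHAR = 0   # char 0 permanently holds the 8x8 fill pattern
EMPTY_CHAR = 1  # char 1 stays all zero

class CharFiller:
    def __init__(self, width_chars, height_chars):
        self.screen = [[EMPTY_CHAR] * width_chars for _ in range(height_chars)]
        self.next_char = 2  # next free slot in the charset
        self.charset = {}   # char index -> list of 8 byte rows

    def put_boundary(self, cx, cy, rows):
        """Edge block: allocate a new char, or modify the existing one."""
        cur = self.screen[cy][cx]
        if cur == EMPTY_CHAR:
            cur = self.next_char
            self.next_char += 1
            self.charset[cur] = [0] * 8
            self.screen[cy][cx] = cur
        for i in range(8):          # ORA into the existing char data
            self.charset[cur][i] |= rows[i]

    def put_full(self, cx, cy):
        """Interior block: only the screen byte is touched."""
        self.screen[cy][cx] = FULL_CHAR
```

The payoff stated in the text shows here: interior blocks cost one screen byte instead of eight bitmap bytes, and the charset never needs explicit clearing, only reallocation.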

## Further Optimizations

As can be seen, the outlines of each face are calculated per face; however, faces might share parts of their outline with other faces, in which case we would calculate those outlines to target1/2 twice. If we want to avoid that, we have to throw over some parts of the described concept. Each face then needs to consist of 4 indexes to the lines that build its outline, and each line consists of 2 indexes to the respective vertices (remember, so far the faces just consist of 4 indexes to their respective vertices). That way we can render all the lines needed for the mesh first (and keep track of the already rendered lines with an extra table). For that we best use a block of \$80 bytes (the maximum length in y) in memory for each line and build a table of pointers, so that we can index the right line segment later on. It is also obvious that the nice zeropage trick (stx xstart,y) won't work anymore when rendering the outlines, so we get a penalty of 6 cycles in the inner loop, which wastes 1/4 of our expected best-case gain. The filling process then needs to be split up:

• load the left and the right line from the vertex with y_min on
• set up target1 and target2 in the filler to point to the right line segments by fetching the pointers from our index table
• fill until either the end of y_left or y_right is reached
• repeat the last two steps until y_max is reached
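The core of this scheme, rendering each shared line only once and caching it, can be sketched in Python. `render_line_once` and `rasterize` are illustrative names, not routines from the source; the cache dictionary plays the role of the "already rendered" table.

```python
def render_line_once(cache, v_a, v_b, rasterize):
    """cache: dict keyed by the (sorted) vertex index pair.
    rasterize(a, b) must return the per-scanline x values of the line,
    i.e. the $80-byte line segment described in the text."""
    key = (min(v_a, v_b), max(v_a, v_b))
    if key not in cache:          # only rasterize lines we haven't seen yet
        cache[key] = rasterize(v_a, v_b)
    return cache[key]
```

Two faces that share the edge (v_a, v_b) then both get the same cached segment, so its slope is computed only once per frame.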

So far I haven't implemented that case, as it adds a lot of extra complexity. Also, the gain can only be estimated; for meshes that don't share any outlines among faces, this will even perform slower! But it should perform well for rather complex meshes.

## Fast Clearing

Clearing the working buffer can waste a lot of time. The first thought often is to just call the same filler again with a zero pattern, so that only the drawn area is cleared, without any overhead. A silly idea, that is: it is always faster to just brainlessly clear the whole buffer. Still, optimizations are possible here. When just rotating an object, it will only draw within the rotation radius of the object, so all we need to clear is the area within this radius. We can do this block-wise to save some memory and still gain speed, but a speedcode generator (no indexing, only a plain endless line of STAs) brings the best results. In my example, clearing the screen costs \$57 rasterlines, which is pretty fair.
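The speedcode idea, a plain run of absolute STA instructions, can be sketched in Python as a small code generator. `gen_clear` is a made-up helper; the opcodes are standard 6502 (LDA immediate = \$a9, STA absolute = \$8d, RTS = \$60).

```python
def gen_clear(addresses, value=0x00):
    """Emit an unrolled clear routine: lda #value, one sta per address, rts."""
    code = bytes([0xa9, value])  # lda #value
    for addr in addresses:
        # sta addr (3 bytes: opcode, low byte, high byte)
        code += bytes([0x8d, addr & 0xff, addr >> 8])
    return code + bytes([0x60])  # rts

# e.g. clear one 8-byte column block of the bitmap
block = gen_clear(range(0xe000, 0xe008))
```

Each unrolled STA costs 4 cycles and 3 bytes, versus 5+ cycles plus loop overhead for an indexed store: the classic trade of memory for raster time.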